Commit
LIN-351 Include lineapy.save() in code slice (#634)
* Include lineapy.save() in code slice
* Do not include lineapy.save() in code slice by default; but DO include it in pipeline building
* Start slicing from .save() to include all ancestors including lineapy import
* Use consistent input param type
* If not applicable (e.g., lineapy.file_system), use original sink
* Update tests related to pipeline building
* Identify .save() statement more precisely
* Use a clearer param name
* Fix mypy issue
* require lineapy when dumping pipelines because we have that in now.
* Fix typos
* pull out the de-lineazing functions out into utils. (added a new file api_utils to prevent circular dependency thing. Utils is big enough now to warrant a subpackage)
* De-Lineate (i.e., use non-Linea serialization) code slices for pipeline building
* Add tutorials to docs
* mock path example
* Ignore cell output for CI
* Update pipeline tests
* Add tutorials to docs
* sample mock for ipython tests
* Update other IPython-related tests
* no dependency on lineapy
* add test case to run the script to check if it works or not!!
* fix the missing import pickle on top
* fix the snapshots that are affected.

Co-authored-by: Shardul Sardesai <shardul@linea.ai>
Co-authored-by: Yifan Wu <yifan1030@gmail.com>
1 parent 9c16686, commit 272e94a
Showing 22 changed files with 288 additions and 146 deletions.
@@ -1,4 +1,4 @@
 apache-airflow==2.2.4
 pandas
-scikit-learn==1.0.2
-SQLAlchemy==1.3.24
+sklearn
+SQLAlchemy==1.3.24
@@ -0,0 +1,52 @@
+import re
+
+from lineapy.db.db import RelationalLineaDB
+
+
+def de_lineate_code(code: str, db: RelationalLineaDB) -> str:
+    """
+    De-linealize the code by removing any lineapy api references
+    """
+    lineapy_pattern = re.compile(
+        r"(lineapy.(save\(([\w]+),\s*[\"\']([\w\-\s]+)[\"\']\)|get\([\"\']([\w\-\s]+)[\"\']\).get_value\(\)))"
+    )
+    # init swapped version
+
+    def replace_fun(match):
+        if match.group(2).startswith("save"):
+            # FIXME - there is a potential issue here because we are looking up the artifact by name
+            # This does not ensure that the same version of current artifact is being looked up.
+            # We support passing a version number to the get_artifact_by_name but it needs to be parsed
+            # out in the regex somehow. This would be simpler when we support named versions when saving.
+            dep_artifact = db.get_artifact_by_name(match.group(4))
+            path_to_use = db.get_node_value_path(
+                dep_artifact.node_id, dep_artifact.execution_id
+            )
+            return f'pickle.dump({match.group(3)},open("{path_to_use}","wb"))'
+
+        elif match.group(2).startswith("get"):
+            # this typically will be a different artifact.
+            dep_artifact = db.get_artifact_by_name(match.group(5))
+            path_to_use = db.get_node_value_path(
+                dep_artifact.node_id, dep_artifact.execution_id
+            )
+            return f'pickle.load(open("{path_to_use}","rb"))'
+
+    swapped, replaces = lineapy_pattern.subn(replace_fun, code)
+    if replaces > 0:
+        # If we replaced something, pickle was used so add import pickle on top
+        # Conversely, if lineapy reference was removed, potentially the import lineapy line is not needed anymore.
+        remove_pattern = re.compile(r"import lineapy\n")
+        match_pattern = re.compile(r"lineapy\.(.*)")
+        swapped = "import pickle\n" + swapped
+        if match_pattern.search(swapped):
+            # we still are using lineapy.xxx functions
+            # so do nothing
+            pass
+        else:
+            swapped, lineareplaces = remove_pattern.subn("\n", swapped)
+            # logger.debug(f"Removed lineapy {lineareplaces} times")
+
+    # logger.debug("replaces made: %s", replaces)
+
+    return swapped
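To see what this rewrite does to a code slice, the following standalone sketch may help. It is not the code from the commit: `FakeDB` and `FakeArtifact` are hypothetical stubs that expose only the two methods the diff calls (`get_artifact_by_name` and `get_node_value_path`), the pickle paths are made up, and the pattern differs from the diff in one respect, namely that the literal dots after `lineapy` and before `get_value` are escaped.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for the artifact record returned by the DB."""
    node_id: str
    execution_id: str


class FakeDB:
    """Hypothetical stub with only the two methods used by the diff."""

    def __init__(self, paths):
        self._paths = paths  # artifact name -> made-up pickle path

    def get_artifact_by_name(self, name):
        # The stub reuses the artifact name as the node id.
        return FakeArtifact(node_id=name, execution_id="exec-0")

    def get_node_value_path(self, node_id, execution_id):
        return self._paths[node_id]


# Same groups as the pattern in the diff, with the literal dots escaped.
LINEAPY_CALL = re.compile(
    r"(lineapy\.(save\(([\w]+),\s*[\"\']([\w\-\s]+)[\"\']\)"
    r"|get\([\"\']([\w\-\s]+)[\"\']\)\.get_value\(\)))"
)


def de_lineate_sketch(code, db):
    def replace_fun(match):
        if match.group(2).startswith("save"):
            # group 3 is the saved variable, group 4 the artifact name
            artifact = db.get_artifact_by_name(match.group(4))
            path = db.get_node_value_path(artifact.node_id, artifact.execution_id)
            return f'pickle.dump({match.group(3)},open("{path}","wb"))'
        # otherwise a get(...).get_value() call; group 5 is the artifact name
        artifact = db.get_artifact_by_name(match.group(5))
        path = db.get_node_value_path(artifact.node_id, artifact.execution_id)
        return f'pickle.load(open("{path}","rb"))'

    swapped, replaces = LINEAPY_CALL.subn(replace_fun, code)
    if replaces > 0:
        # pickle is now used, so import it at the top
        swapped = "import pickle\n" + swapped
        # drop "import lineapy" only if no lineapy.xxx call remains
        if not re.search(r"lineapy\.", swapped):
            swapped = re.sub(r"import lineapy\n", "\n", swapped)
    return swapped


db = FakeDB({"raw data": "/tmp/raw.pkl", "clean-data": "/tmp/clean.pkl"})
source = (
    "import lineapy\n"
    'x = lineapy.get("raw data").get_value()\n'
    "y = x + 1\n"
    'lineapy.save(y, "clean-data")\n'
)
result = de_lineate_sketch(source, db)
print(result)
```

In this sketch both lineapy calls are swapped for `pickle.load` and `pickle.dump` on the stubbed paths, `import pickle` is prepended, and the `import lineapy` line is replaced by an empty line because no `lineapy.` reference is left. As the FIXME in the diff notes, the lookup is by artifact name only, so the stub likewise ignores versions.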