add license check transform #257
base: dev
378a170 to b539226
transforms/code/license_check/python/src/dpk_license_check_python/internal/__init__.py
Outdated
transforms/code/license_check/python/src/license_check_local_pure.py
Outdated
transforms/code/license_check/python/src/license_check_local_python.py
Outdated
transforms/code/license_check/python/src/dpk_license_check_python/transform_config.py
Outdated
3b48683 to 2066cb8
transforms/code/license_check/python/src/license_check_local_python.py
Outdated
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/python/test-data/expected/metadata.json
Outdated
@shivdeep-singh-ibm This module (folder) should be called license filtering.
@shivdeep-singh-ibm it seems that the workflow build is failing:
make[4]: Leaving directory '/home/runner/work/data-prep-kit/data-prep-kit/transforms/code/ingest_2_parquet'
Using recursive workflow-build rule in license_check/
make[4]: Entering directory '/home/runner/work/data-prep-kit/data-prep-kit/transforms/code/license_check'
make -C kfp_ray/v1 workflow-build
make[5]: *** kfp_ray/v1: No such file or directory. Stop.
2758803 to 7b7597c
41bcc4f to d3d6b41
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/ray/src/license_check_local_ray.py
Outdated
transforms/code/license_check/ray/test/test_license_check_ray.py
Outdated
Minor doc/README comment. Otherwise the changes look good to me.
@@ -0,0 +1,13 @@
# License Check

The License Check transform checks if the '`license` of input data is in approved/denied list as
This sentence looks incomplete. The transform verifies the license and marks the sample accordingly in the output, right? Should we convey that in this description?
c39d8ea to 09221dd
@shivdeep-singh-ibm I suggest you merge dev into this branch, given some recent issues with versioning. See #355.
bdf4f98 to cff9a6d
transforms/code/license_check/python/src/license_check_transform.py
Outdated
transforms/code/license_check/python/src/license_check_transform.py
Outdated
`--lc_license_column_name` - set the name of the column holds license to process
`--lc_allow_no_license` - allow entries with no associated license (default: false)
`--lc_licenses_file` - S3 or local path to allowed/denied licenses JSON file
`--lc_deny_licenses` - allow all licences except those in licenses_file (default: false)
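For readers following the thread, here is a minimal sketch of how the four options above might combine into a per-row decision. It is not the transform's actual code: the function name `is_license_approved`, the case-insensitive comparison, and the treatment of empty strings as "no license" are all assumptions made for illustration.

```python
from typing import Optional, Set


def is_license_approved(
    license_name: Optional[str],
    licenses: Set[str],
    deny_licenses: bool = False,
    allow_no_license: bool = False,
) -> bool:
    """Hypothetical helper: decide whether a row's license passes the check."""
    # Rows without a license are governed only by allow_no_license.
    if license_name is None or license_name.strip() == "":
        return allow_no_license
    # Assumption: names are compared case-insensitively against a lowercase set.
    listed = license_name.strip().lower() in licenses
    # Deny mode rejects listed licenses; allow mode accepts only listed ones.
    return (not listed) if deny_licenses else listed


licenses_from_file = {"mit", "apache-2.0"}  # stand-in for the licenses JSON file
print(is_license_approved("MIT", licenses_from_file))
print(is_license_approved("GPL-3.0", licenses_from_file))
print(is_license_approved("GPL-3.0", licenses_from_file, deny_licenses=True))
print(is_license_approved(None, licenses_from_file, allow_no_license=True))
```

Under these assumptions the same licenses file serves as an allow list by default and as a deny list when `deny_licenses` is set, which matches the wording of the option descriptions.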
Because you have your own DataAccessFactory, aren't there some lc_data params that would allow reading the license file from S3, for example?
A hidden feature: to see the --help text for a given transform, run
make .transforms.test-image-help
This should show the --lc_data_* options.
No additional data options are used.
Unless I'm missing something, the code below from line 134 of this file will add the --lc_data_* parameters (specifically the last line below).
# Create the DataAccessFactory to use CLI args
self.daf = DataAccessFactory(CLI_PREFIX, False)
# Add the DataAccessFactory parameters to the transform's configuration parameters.
self.daf.add_input_params(parser)
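To illustrate the point being made here, the following sketch shows how registering arguments under a prefix makes extra options appear in a transform's `--help` output. It uses only the standard `argparse` module. The helper `add_prefixed_data_params`, the prefix value `"lc_"`, and the two data option names are hypothetical stand-ins, not the real `DataAccessFactory` API.

```python
import argparse

# Assumption: the prefix matches the "lc_" seen in the documented flags.
CLI_PREFIX = "lc_"


def add_prefixed_data_params(parser: argparse.ArgumentParser, prefix: str) -> None:
    """Hypothetical stand-in for a factory's add_input_params() call."""
    # Each option name is built from the prefix, so it becomes --lc_data_*.
    parser.add_argument(f"--{prefix}data_s3_cred", type=str, default=None,
                        help="illustrative S3 credentials option")
    parser.add_argument(f"--{prefix}data_local_config", type=str, default=None,
                        help="illustrative local path configuration option")


parser = argparse.ArgumentParser(prog="license_check")
parser.add_argument("--lc_licenses_file", type=str, required=True)
add_prefixed_data_params(parser, CLI_PREFIX)

args = parser.parse_args(
    ["--lc_licenses_file", "licenses.json", "--lc_data_s3_cred", "{}"]
)
# The prefixed options are now part of the parsed namespace.
print(sorted(vars(args)))
```

If the real factory behaves along these lines, any option it registers would be user-visible and would therefore belong in the README next to the other `--lc_*` flags.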
3ffc512 to c43993e
c43993e to 24121c3
In addition to my specific comments, a more global comment is to rename this to license_select, which would then match the proglang_select naming.
# distribution versions is the same as image version.
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_RAY_VERSION} .transforms.set-versions
This is now back to requiring TOML_VERSION. For example, from noop:
$(MAKE) TRANSFORM_PYTHON_VERSION=$(NOOP_PYTHON_VERSION) TOML_VERSION=$(NOOP_PYTHON_VERSION) .transforms.set-versions
@@ -0,0 +1,11 @@
# License Check

The License Check transform checks if the `license` of input data is in approved/denied list. It is implemented as per the set of [transform project conventions](../../README.md#transform-project-conventions) the following runtimes are available:
Can you be more specific and say that it selects only the specified licenses and filters out the others?
task_image = "quay.io/dataprep1/data-prep-kit/license_check-ray:0.4.0.dev6"

# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.0.dev6"
I think you need to run `make set-versions`, as these are old. They should probably be 0.2.1.dev0 everywhere.
# distribution versions is the same as image version.
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_PYTHON_VERSION} .transforms.set-versions
This is now back to requiring TOML_VERSION. For example, from noop:
$(MAKE) TRANSFORM_PYTHON_VERSION=$(NOOP_PYTHON_VERSION) TOML_VERSION=$(NOOP_PYTHON_VERSION) .transforms.set-versions
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_PYTHON_VERSION} .transforms.set-versions

build-dist:: set-versions .defaults.build-dist
You can remove set-versions here. The convention now is that it is generally only called from the top of the repo, outside of the makefiles (unless you get out of sync, as you are in the _wf.py file and maybe others).
## Summary

This filter scans the license column of an input dataset and appends the `license_status` column to the dataset.
Can you be more specific and say that it selects only the specified licenses and filters out the others? (Same comment as on the top-level README.)
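The distinction raised in this comment, marking rows versus selecting them, can be sketched as below. This is an illustration only: rows are plain dictionaries rather than the transform's real table type, `mark_rows` and `select_rows` are hypothetical names, and the thread does not confirm which of the two behaviours the transform actually implements.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def mark_rows(rows: List[Row], approved: Set[str]) -> List[Row]:
    """Keep every row and append a boolean license_status column."""
    return [{**row, "license_status": row.get("license") in approved} for row in rows]


def select_rows(rows: List[Row], approved: Set[str]) -> List[Row]:
    """Keep only rows whose license is approved; drop the rest."""
    return [row for row in rows if row.get("license") in approved]


rows = [
    {"id": "a", "license": "MIT"},
    {"id": "b", "license": "GPL-3.0"},
    {"id": "c", "license": None},
]
marked = mark_rows(rows, {"MIT"})
selected = select_rows(rows, {"MIT"})
print([row["license_status"] for row in marked])  # [True, False, False]
print([row["id"] for row in selected])  # ['a']
```

Marking preserves the row count and leaves the decision to a later step, while selecting shrinks the dataset, so the README wording should make clear which one the user gets.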
Then

ls output
To see results of the transform.
You should add the same last paragraph that all transforms now have: https://github.com/IBM/data-prep-kit/tree/dev/transforms/universal/noop/python#transforming-data-using-the-transform-image
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=${LICENSE_CHECK_RAY_VERSION} .transforms.set-versions

build-dist:: set-versions .defaults.build-dist
Remove set-versions here.
Also, I'm not sure how far behind dev this is. It's probably a good idea to merge with dev.
ca9a1c4 to fed1461
Signed-off-by: Shivdeep Singh <Shivdeep.Singh@ibm.com>
fed1461 to e4d6efa
Why are these changes needed?
Add license check filter
Related issue number (if any).