ML refactoring #551
Conversation
touch a file to rerun benchmark
experiment/src/data_loader.py
# line_index_set.add((index[0], index[1]))
# rules_set.update(line_data["RuleName"])
# if not reference_line_data:
#     reference_line_data = copy.deepcopy(line_data)

# remove to reduce memory usage
# line_data.pop("line_num")
# line_data.pop("path")
# line_data.pop("value_end")

values.append(line_data)
# values = list(detected_data.values())

# for _, i in meta_data.items():
#     if i["Used"] is True:
#         continue
#     elif i["GroundTruth"] == 'T' \
#             and any(x in rules_set for x in i["Category"].split(':')) \
#             and (i["FilePath"], i["LineStart"]) in line_index_set \
#             and 0 <= i["ValueStart"] < i["ValueEnd"]:
#         print(f"NOT FOUND:{i}")
#         markup_data = {
#             "line": None,  # read
#             "line_num": i["LineStart"],  # not used
#             "path": i["FilePath"],
#             "value": None,
#             "value_start": i["ValueStart"],  # remove
#             "value_end": i["ValueEnd"],  # remove
#             "variable": None,  # ???
#             'RuleName': (x for x in i["Category"].split(':') if x in line_index_set),
#             'GroundTruth': 'T',
#             'ext': Util.get_extension(i["FilePath"]),
#             'type': i["FilePath"].split('/')[-2]
#         }
#         assert markup_data.keys() == reference_line_data.keys(), reference_line_data.keys()
It seems some old code was left behind. Please remove it if you don't need it.
@@ -798,7 +798,6 @@ def test_param_n(self) -> None:
    def test_param_p(self) -> None:
        # internal parametrized tests for quick debug
        items = [  #
            ("prod.py", b"secret_api_key='Ah\\tga%$FiQ@Ei8'", "secret_api_key", "Ah\\tga%$FiQ@Ei8"),  #
It seems this PR introduces some false positives...
Why were those cases removed?
The `\t` character did not appear in the training set, so the sequence gets a low ML probability. I'll update the sample instead of removing it.
Thank you for refactoring the ML model.
I have one suggestion about a typo; please check below.
Co-authored-by: ShinHyung Choi <sh519.choi@samsung.com>
The ML model has been refactored.
LGTM 👍
Description
Please include a summary of the change and which issue is fixed.
How has this been tested?
Please describe the tests that you ran to verify your changes.