
ML refactoring #551

Merged: 104 commits into Samsung:main from the ml branch on May 28, 2024
Conversation

@babenek (Contributor) commented on May 3, 2024

Description

Please include a summary of the change and which issue is fixed.

  • ML refactoring
  • Add separate LSTM layers for the line and variable inputs (see the sketch below)
  • Retrain the model

[Attachment: ml_model (20240520_225355)]
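
For orientation, here is a minimal sketch of what "add line, variable lstm layers" could look like in Keras: separate bidirectional LSTM branches for the line, variable, and value inputs, concatenated with an auxiliary feature vector before a dense head. All layer sizes, input lengths, and tensor names are illustrative assumptions, not the actual CredSweeper configuration.

# Hypothetical sketch only: sizes, shapes, and names are assumptions.
from tensorflow.keras.layers import Bidirectional, Concatenate, Dense, Input, LSTM
from tensorflow.keras.models import Model

MAX_LEN = 160       # assumed character-sequence length
CHARSET_SIZE = 96   # assumed one-hot alphabet size
FEATURE_DIM = 64    # assumed number of hand-crafted features

line_in = Input(shape=(MAX_LEN, CHARSET_SIZE), name="line_input")
variable_in = Input(shape=(MAX_LEN, CHARSET_SIZE), name="variable_input")
value_in = Input(shape=(MAX_LEN, CHARSET_SIZE), name="value_input")
feature_in = Input(shape=(FEATURE_DIM,), name="feature_input")

# One bidirectional LSTM branch per text input, as the description suggests
line_lstm = Bidirectional(LSTM(32))(line_in)
variable_lstm = Bidirectional(LSTM(32))(variable_in)
value_lstm = Bidirectional(LSTM(32))(value_in)

# Merge the recurrent branches with the auxiliary features and classify
merged = Concatenate()([line_lstm, variable_lstm, value_lstm, feature_in])
hidden = Dense(64, activation="relu")(merged)
output = Dense(1, activation="sigmoid", name="probability")(hidden)

model = Model(inputs=[line_in, variable_in, value_in, feature_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")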

How has this been tested?

Please describe the tests that you ran to verify your changes.

  • UnitTest
  • Benchmark

@babenek babenek marked this pull request as ready for review May 21, 2024 13:23
@babenek babenek requested a review from a team as a code owner May 21, 2024 13:23
Comment on lines 166 to 199
# line_index_set.add((index[0], index[1]))
# rules_set.update(line_data["RuleName"])
# if not reference_line_data:
# reference_line_data = copy.deepcopy(line_data)

# remove to reduce memory usage
# line_data.pop("line_num")
# line_data.pop("path")
# line_data.pop("value_end")

values.append(line_data)
# values = list(detected_data.values())

# for _, i in meta_data.items():
# if i["Used"] is True:
# continue
# elif i["GroundTruth"] == 'T' \
# and any(x in rules_set for x in i["Category"].split(':')) \
# and (i["FilePath"], i["LineStart"]) in line_index_set \
# and 0 <= i["ValueStart"] < i["ValueEnd"]:
# print(f"NOT FOUND:{i}")
# markup_data = {
# "line": None, # read
# "line_num": i["LineStart"], # not used
# "path": i["FilePath"],
# "value": None,
# "value_start": i["ValueStart"], # remove
# "value_end": i["ValueEnd"], # remove
# "variable": None, # ???
# 'RuleName': (x for x in i["Category"].split(':') if x in line_index_set),
# 'GroundTruth': 'T',
# 'ext': Util.get_extension(i["FilePath"]),
# 'type': i["FilePath"].split('/')[-2]
# }
# assert markup_data.keys() == reference_line_data.keys(), reference_line_data.keys()
Contributor
It seems some old code was left behind. Please remove it if you don't need it.

@@ -798,7 +798,6 @@ def test_param_n(self) -> None:
def test_param_p(self) -> None:
# internal parametrized tests for quick debug
items = [ #
("prod.py", b"secret_api_key='Ah\\tga%$FiQ@Ei8'", "secret_api_key", "Ah\\tga%$FiQ@Ei8"), #
Contributor
It seems this PR causes some FPs...
Why were those cases removed?

Contributor (Author)
The \t sign did not appear in the training set, so the sequence gets a low ML probability. I'll update the sample instead of removing it.
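
For illustration only, one hypothetical way to update that parametrized item is to keep the same rule hit while using a value without the escaped tab; the replacement string below is an assumption, not the actual change made in this PR.

# Hypothetical update of the parametrized sample: same (path, line, variable, value)
# layout as in the diff above, with a value that avoids the unseen "\t" escape.
items = [  #
    ("prod.py", b"secret_api_key='AhXga%$FiQ@Ei8'", "secret_api_key", "AhXga%$FiQ@Ei8"),  #
]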

@babenek babenek requested a review from csh519 May 27, 2024 12:42
@csh519 (Contributor) left a comment:
Thank you for refactoring the ML model.
I have one suggestion about a typo; please check below.

experiment/src/prepare_data.py (outdated, resolved)
@babenek babenek requested a review from csh519 May 28, 2024 10:11
@csh519 (Contributor) left a comment:
The ML model has been refactored.

LGTM 👍

@babenek babenek merged commit d16a3a5 into Samsung:main May 28, 2024
27 of 28 checks passed
@babenek babenek deleted the ml branch May 28, 2024 10:30