Added Docker Containerization with Run Script for CLI Tool Execution #364

Closed
wants to merge 182 commits into from
72b7b28
initial lsh implementation
sacca97 Oct 10, 2022
c9deda6
keywords extraction test using spacy related to #331, fixes #329
sacca97 Oct 10, 2022
47c6627
fixed json report generation to handle sets and file extensions when …
sacca97 Oct 10, 2022
385cdf7
initial lsh implementation
sacca97 Oct 10, 2022
ee41a3a
add datasketch to requirements.txt
sacca97 Oct 10, 2022
699e393
rules refactoring, now every rule is a sub-class of the class Rule
sacca97 Oct 12, 2022
5125090
finished rules refactoring, added Minhash to Commit: reworked backend…
sacca97 Oct 12, 2022
cae9ddd
code cleanup, report code refactoring
sacca97 Oct 13, 2022
57dfe5f
added find_twin method, testing phase
sacca97 Oct 13, 2022
2a19c86
minor addons and cleanup
sacca97 Oct 13, 2022
43a2a4f
Major refactoring, cleanup and optimization. WIP
sacca97 Oct 13, 2022
5969930
continuing refactoring and fixes
sacca97 Oct 13, 2022
ceded21
almost finished Git, raw_commit and exec refactoring and cleaning
sacca97 Oct 14, 2022
4a60aca
other refactoring, fixed logging level and refactored it
sacca97 Oct 14, 2022
cf4d4b6
fixed typos, updated pytest files, modified db structure and updated …
sacca97 Oct 17, 2022
d1a64de
modified logger class to wrap the default python logger
sacca97 Oct 17, 2022
76e694a
fixed logger, updated some default parameters
sacca97 Oct 17, 2022
52304df
adjusted some imports to match new refactoring, adjusted some types
sacca97 Oct 17, 2022
0590cfe
re-fixed logger
sacca97 Oct 17, 2022
6308672
updated method names and tests, added verb extraction from text
sacca97 Oct 17, 2022
0d7f33b
added automatic filtering for different file extension at git level w…
sacca97 Oct 17, 2022
0ec74c6
minor fixes to raw commit class
sacca97 Oct 17, 2022
5c4cd1d
finished git rework, added setup.py to allow external imports
sacca97 Oct 18, 2022
15ab84e
updated setup.py
sacca97 Oct 18, 2022
4a3c24f
updated makefile and .gitignore
sacca97 Oct 20, 2022
2c77cc2
modified and updated logger
sacca97 Oct 20, 2022
4d8d3e3
reworked RawCommit and Git to reduce the calls to subprocess, overall…
sacca97 Oct 20, 2022
56db736
minor changes
sacca97 Oct 20, 2022
c45a5a8
fixed report json export
sacca97 Oct 20, 2022
c20f560
added lemmatization to spacy, optimized and cleaned jira and github r…
sacca97 Oct 20, 2022
d0791f6
updated helper method to match nlp extraction format
sacca97 Oct 20, 2022
5408d7f
fixed a "black hole" bug that was causing some commits to disappear
sacca97 Oct 20, 2022
1a9455e
fixed imports and modified names to match real usage
sacca97 Oct 20, 2022
ebbf237
fixed imports and modified the names to match real usage
sacca97 Oct 20, 2022
751a582
deleted some useless stuff, renamed test files
sacca97 Oct 24, 2022
09ae1c0
updated test filenames. TODO: Fix stuff
sacca97 Oct 24, 2022
55e5d1a
updated lsh helpers method
sacca97 Oct 25, 2022
1666e01
back to basemodel usage
sacca97 Oct 25, 2022
5103c6a
added XML parser for jira issues
sacca97 Oct 25, 2022
54c3092
updated rules with twins
sacca97 Oct 25, 2022
6d29d54
report handlers now create also the nested directories
sacca97 Oct 25, 2022
59cf94e
refactoring. now handling large requests to the preprocessed_commits …
sacca97 Oct 25, 2022
704575c
minor updates and optimizations
sacca97 Oct 25, 2022
294259a
finished refactoring and optimization only calling "git log" once for…
sacca97 Oct 25, 2022
981cb3d
minor updates to match other changes
sacca97 Oct 25, 2022
06b5a9e
updated database structure, again
sacca97 Oct 25, 2022
09c3ab2
update to match get_hunks
sacca97 Oct 25, 2022
714ed7b
update to python 3.10
sacca97 Nov 2, 2022
47807d9
updated html report format
sacca97 Nov 2, 2022
217e0d5
now jira issues are accessed using XML API
sacca97 Nov 2, 2022
f969c9e
updated rule messages and fully functioning twin commits
sacca97 Nov 2, 2022
79d35f0
reworked communication with backend
sacca97 Nov 2, 2022
969c193
fixed bug in retrieving preprocessed commits
sacca97 Nov 2, 2022
ddaad02
skip github when fetching references, we already do it separately
sacca97 Nov 2, 2022
b3636da
added github apis, fine-tuned word extraction
sacca97 Nov 2, 2022
1439ae6
allow single-hunk commits
sacca97 Nov 2, 2022
9976746
code cleanup
sacca97 Nov 2, 2022
e1733ae
fine-tuned security words extraction
sacca97 Nov 2, 2022
7ed2569
kinda-final lsh implementation
sacca97 Nov 2, 2022
c4e1b8c
back to full basemodel
sacca97 Nov 2, 2022
001455f
adapted tests
sacca97 Nov 3, 2022
0ce19ad
minor changes
sacca97 Nov 3, 2022
da0556a
changed gh workflows for python 3.10
sacca97 Nov 3, 2022
b6d24b4
updated github actions for python
sacca97 Nov 3, 2022
6373a59
updated github actions for python
sacca97 Nov 3, 2022
0f43290
omegaconf first implementation
sacca97 Nov 3, 2022
ebffabd
git cache now set from config file and git class handles the folder c…
sacca97 Nov 7, 2022
9e058e3
added sample config file and updated .gitignore
sacca97 Nov 7, 2022
d8496e6
updated makefile
sacca97 Nov 7, 2022
95d918a
updated docker settings and backend, now downloading NVD advisories on…
sacca97 Nov 7, 2022
2ca147d
use omegaconf+argparse to parse settings
sacca97 Nov 7, 2022
c8b1cfd
code formatting
sacca97 Nov 7, 2022
f4ada04
advisory versions extracted from the dedicated section (if available)…
sacca97 Nov 7, 2022
167eaad
modified report to match advisory changes
sacca97 Nov 7, 2022
91c2e69
main switch to omegaconf
sacca97 Nov 7, 2022
4a5ff16
updated config-sample.yaml
sacca97 Nov 7, 2022
87413ce
removed old main.py file
sacca97 Nov 7, 2022
149110b
vuln_id is now cve_id
sacca97 Nov 7, 2022
60bd7e1
tests failing for new config.yaml
sacca97 Nov 7, 2022
60c9bfb
test
sacca97 Nov 7, 2022
2c5b82a
trying to fix pytest
sacca97 Nov 7, 2022
ff4f38c
added E999 to gh workflow exception for flake8, doesn't like match st…
sacca97 Nov 7, 2022
52883b7
fixed some bugs in the backend and missing Github token in NLP
sacca97 Nov 7, 2022
b5d8048
updated makefile: removed mkdirs
sacca97 Nov 7, 2022
343379b
merged omegaconf with aggregated commits, all rebased on main
sacca97 Nov 4, 2022
ef7a010
only commits relevant to our version are shown, if there is a twin is…
sacca97 Nov 4, 2022
383dc89
removed old import, fixed tests
sacca97 Nov 7, 2022
1bdbac8
fixed commit twins counting itself
sacca97 Nov 7, 2022
be636ae
removed twins from postgresql
sacca97 Nov 7, 2022
f50faa6
check if tags are none before using them
sacca97 Nov 7, 2022
7927179
updated .gitignore
sacca97 Nov 7, 2022
8a9d30f
updated README, makefile and config-sample
sacca97 Nov 8, 2022
9d69e34
updated omegaconf and default parameters to match new stuff
sacca97 Nov 8, 2022
d9e6409
TESTING: new implementation of version_to_tag method
sacca97 Nov 8, 2022
7c3fc35
removed self.commit id from twins of the same commit
sacca97 Nov 8, 2022
8d89d1c
added extra strip for tags. Writing no-tag in report when no tag is a…
sacca97 Nov 8, 2022
d3559cd
add check on the interval if an open end is left
sacca97 Nov 8, 2022
d5a926e
updated makefile with pre-commit in dev option
sacca97 Nov 10, 2022
c77e1e7
moved parse_config out of client/cli, to be used also in the backend …
sacca97 Nov 10, 2022
3865913
lowered jaccard similarity for twins to match same title with differe…
sacca97 Nov 10, 2022
67534fb
commented out unused code, changed third party references to dict
sacca97 Nov 10, 2022
98976ca
changed references to dict
sacca97 Nov 10, 2022
b3c715e
temporary fix to exclude short words
sacca97 Nov 10, 2022
907a3f4
now computing hash on the first 50 chars of the commit msg, added fil…
sacca97 Nov 10, 2022
520c2f1
better find tag: now filtering to look for our exact fixed tag version
sacca97 Nov 10, 2022
418e2f9
testing extraction of important stuff from pages linked in the advisory
sacca97 Nov 10, 2022
2a1a218
updated some rule messages
sacca97 Nov 10, 2022
5f2fe3b
small refactoring and fixes
sacca97 Nov 10, 2022
991a802
modified report frontend
sacca97 Nov 10, 2022
fb03643
add check if the NVD api key is wrong
sacca97 Nov 10, 2022
85f05fb
added tags hint when multiple possible tags are found from versions
sacca97 Nov 10, 2022
0718d77
updated rules, added relevant method in diffs
sacca97 Nov 14, 2022
fc97fd8
major rework of the html report layout
sacca97 Nov 14, 2022
fdff0fd
version to tag improved
sacca97 Nov 14, 2022
1003707
added option to set tags during git output parsing (slow)
sacca97 Nov 14, 2022
eee3005
added console print methods
sacca97 Nov 14, 2022
dbc23bc
renamed report methods
sacca97 Nov 14, 2022
81a7714
small fixes and improvement in nlp code
sacca97 Nov 14, 2022
f86b835
extracting version from the apposite section of the NVD json response…
sacca97 Nov 14, 2022
fe11ae0
remove commented code, small fixes
sacca97 Nov 14, 2022
264eccc
fixed some stuff
sacca97 Nov 14, 2022
2cd2c34
commented out unused code, other minor changes
sacca97 Nov 22, 2022
ef2834e
new logic for version_to_tag method
sacca97 Nov 22, 2022
7f15147
cleaned some code and added a time delta of tolerance when extracting…
sacca97 Dec 2, 2022
23f33c5
refactored main and report
sacca97 Dec 5, 2022
99b2e6e
various refactoring, brought back files extension filtering, unified …
sacca97 Dec 5, 2022
742db77
prepared new rules, cleaned Git and raw_commit code, optimized NLP co…
sacca97 Dec 5, 2022
3d3f622
minor changes
sacca97 Dec 8, 2022
dcfa6f3
updated test classes
sacca97 Dec 8, 2022
950d0bc
fix git test problem
sacca97 Dec 9, 2022
09f1412
updated backend to fully use config.yaml
sacca97 Dec 9, 2022
bdf1653
fixed imports
sacca97 Dec 9, 2022
6aeb89b
added redis to config.yaml
sacca97 Dec 9, 2022
2b2ebe8
Added find twins only features when commit id is in the advisory
sacca97 Dec 12, 2022
0c8deb6
Added find twins only features when commit id is in the advisory
sacca97 Dec 12, 2022
75c0cf6
Merge branch 'commit-in-adv' of github.com:SAP/project-kb into commit…
sacca97 Dec 12, 2022
3723ff7
fixed and separated commit in adv and in references, trying to exclud…
sacca97 Jan 22, 2023
ecede62
deleted unused stuff, helpers file might become useless at a point
sacca97 Jan 22, 2023
cd86a33
improved version-to-tag conversion, added file logging
sacca97 Jan 22, 2023
9199767
add clarifications comment
sacca97 Jan 22, 2023
c5388b6
updated non-relevant-files list
sacca97 Jan 22, 2023
b2797ce
fixed and updated webpage scraping
sacca97 Jan 22, 2023
eaee50d
put tags back, added twins to as_dict method to have them in the json…
sacca97 Jan 22, 2023
61f5fc7
added redhat.com domain to references to check, modified references a…
sacca97 Jan 22, 2023
d6c6ff8
added warning when fixing commit is based on advisory references
sacca97 Jan 22, 2023
7fcc5cb
added exit code to run in batch on large datasets, added twins-lookup…
sacca97 Jan 22, 2023
6c64e72
added RelevantWordsInMsg rule, changed relevances to logarithmic scale
sacca97 Jan 25, 2023
d660b4c
prepared filtering for logging filtered out commits
sacca97 Jan 25, 2023
34b7775
nlp extension filtering in a bad way
sacca97 Jan 25, 2023
375a3f9
updated advisory fixing commit extraction
sacca97 Jan 25, 2023
56f5889
removed useless prints, set some flags to default TRUE
sacca97 Jan 25, 2023
91b00d9
updated filtering in html
sacca97 Jan 25, 2023
88b59f6
feb 2023 study
sacca97 Feb 7, 2023
cc89f9e
updated gitignore
sacca97 Feb 7, 2023
487e227
optimized references fetching and commits extraction
sacca97 Mar 2, 2023
74dcb3c
prepared to use also MITRE APIs
sacca97 Mar 2, 2023
3f05672
various cleanup and fixes, added silent mode and changed behavior whe…
sacca97 Mar 2, 2023
f9970c8
project-kb execution
sacca97 Mar 6, 2023
ee0469d
project-kb execution
sacca97 Mar 6, 2023
7dc605c
project-kb execution
sacca97 Mar 6, 2023
987c301
project-kb execution
sacca97 Mar 6, 2023
9307e4f
Sacchetti thesis
sacca97 Mar 14, 2023
bc76de9
Refactor prospector code structure
matteogreek Apr 6, 2023
6b48e00
fix file path
matteogreek Apr 6, 2023
50f3204
Updated test files, fixed backend container to include new generated …
matteogreek Apr 11, 2023
5f6aa51
Merge branch 'main' into commit-in-adv
matteogreek Apr 11, 2023
e463308
Update python.yml
copernico Apr 13, 2023
d26b4ec
Update python.yml
copernico Apr 13, 2023
7f06dac
Merge branch 'main' into commit-in-adv
copernico Apr 13, 2023
9a70578
Update python.yml
copernico Apr 13, 2023
c6a06ae
Update python.yml
matteogreek Apr 13, 2023
075e270
Update python.yml
matteogreek Apr 13, 2023
27c5ceb
Update python.yml
matteogreek Apr 13, 2023
14a8409
Update python.yml
matteogreek Apr 13, 2023
dc32b38
Update python.yml
matteogreek Apr 13, 2023
7fede42
Update python.yml
matteogreek Apr 13, 2023
c813d47
Update python.yml
copernico Apr 17, 2023
fcd9683
Added run script. Containerized CLI version.
matteogreek May 26, 2023
6cb0211
Merge branch 'commit-in-adv' of https://github.com/matteogreek/projec…
matteogreek May 26, 2023
bd77008
Fixed pytest failures
matteogreek May 27, 2023
cdb1533
Merge branch 'commit-in-adv'
matteogreek May 27, 2023
261cb12
Fix ConsoleWriter error. Changed returning type in advisory.extract_h…
matteogreek May 27, 2023
prospector/Makefile (1 change: 1 addition, 0 deletions)
@@ -24,6 +24,7 @@ dev-setup: setup requirements-dev.txt
@echo "$(DONE) Installed development requirements"

docker-setup:
+docker build -t prospector-base:1.0 -f ./docker/Dockerfile .
docker-compose up -d --build

docker-clean:
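The added Makefile line builds a base image tagged `prospector-base:1.0` from `./docker/Dockerfile` before `docker-compose` brings up the services. The run script mentioned in the commit log is not part of this diff, so the following is only a hypothetical Python sketch of the equivalent commands. It assumes the containerized CLI is started with `docker run` and that the image's entrypoint is the Prospector CLI; the helpers `build_image_cmd`, `run_cli_cmd` and `execute` are illustrative names, not project code.

```python
import shlex
import subprocess
from typing import List, Sequence

# Tag used by the Makefile's docker-setup target.
BASE_IMAGE = "prospector-base:1.0"


def build_image_cmd(tag: str = BASE_IMAGE,
                    dockerfile: str = "./docker/Dockerfile",
                    context: str = ".") -> List[str]:
    """Argument list equivalent to the Makefile's `docker build` line."""
    return ["docker", "build", "-t", tag, "-f", dockerfile, context]


def run_cli_cmd(cli_args: Sequence[str], image: str = BASE_IMAGE) -> List[str]:
    """Hypothetical `docker run` invocation for the containerized CLI.

    --rm removes the container when it exits. The CLI arguments follow the
    image name; this ASSUMES the image's entrypoint is the Prospector CLI.
    """
    return ["docker", "run", "--rm", image, *cli_args]


def execute(cmd: List[str], dry_run: bool = True) -> int:
    """Print the shell-quoted command; only invoke Docker if dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode
```

With `dry_run=True` (the default) the helpers only print commands, which makes it easy to compare them with the Makefile before running anything against a real Docker daemon.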

prospector/cli/main.py (9 changes: 3 additions, 6 deletions)
@@ -30,10 +30,7 @@

def main(argv): # noqa: C901
with ConsoleWriter("Initialization") as console:
-print("before config: ", argv)
config = get_configuration(argv)
-print("after config: ", config.cve_id)

if not config:
logger.error("No configuration file found. Cannot proceed.")

@@ -46,7 +43,7 @@ def main(argv): # noqa: C901
logger.setLevel(config.log_level)
logger.info(f"Global log level set to {get_level(string=True)}")

-if config.cve_id is None:
+if config.vuln_id is None:
logger.error("No vulnerability id was specified. Cannot proceed.")
console.print(
"No configuration file found.",
@@ -64,10 +61,10 @@ def main(argv): # noqa: C901
logger.debug("Using the following configuration:")
pretty_log(logger, config.__dict__)

-logger.debug("Vulnerability ID: " + config.cve_id)
+logger.debug("Vulnerability ID: " + config.vuln_id)

results, advisory_record = prospector(
-vulnerability_id=config.cve_id,
+vulnerability_id=config.vuln_id,
repository_url=config.repository,
publication_date=config.pub_date,
vuln_descr=config.description,
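The changes to `main.py` drop two leftover `print` debugging calls and switch the lookups from `config.cve_id` to `config.vuln_id`. A minimal sketch of the same early-exit checks follows, assuming only a configuration object that exposes `vuln_id`; a `SimpleNamespace` stands in for the real configuration, and `check_config` and the logger name are hypothetical. Debug output goes through `logging`, so it stays hidden unless the log level is DEBUG.

```python
import logging
from types import SimpleNamespace
from typing import Optional

logger = logging.getLogger("prospector.cli")  # hypothetical logger name


def check_config(config: Optional[SimpleNamespace]) -> int:
    """Return 0 if the configuration can be used, 1 otherwise.

    Mirrors the two checks in main(): a missing configuration and a
    missing vulnerability id both stop execution.
    """
    if not config:
        logger.error("No configuration file found. Cannot proceed.")
        return 1
    vuln_id = getattr(config, "vuln_id", None)
    if vuln_id is None:
        logger.error("No vulnerability id was specified. Cannot proceed.")
        return 1
    # Lazy %-formatting: the message is only built if DEBUG is enabled,
    # unlike the removed print() calls, which always wrote to stdout.
    logger.debug("Vulnerability ID: %s", vuln_id)
    return 0
```

Returning a status instead of exiting inside the helper keeps the checks testable; the caller decides whether to exit.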

prospector/config-sample.yaml (4 changes: 2 additions, 2 deletions)
@@ -18,8 +18,8 @@ nvd_token: Null
use_backend: optional

# Optional backend info to save/use already preprocessed data
-backend: http://backend:8000
-#backend: http://localhost:8000
+#backend: http://backend:8000
+backend: http://localhost:8000

database:
user: postgres
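The sample configuration now points `backend` at `http://localhost:8000` and comments out `http://backend:8000`. With Docker Compose, a service name such as `backend` resolves only from containers on the same Compose network, while `localhost:8000` reaches the backend from the host only if port 8000 is published. The sketch below chooses between the two; it assumes the service is named `backend` as in the sample file and uses the presence of `/.dockerenv` as a heuristic (not a guarantee) for running inside a container. `pick_backend_url` and `in_container` are hypothetical helpers.

```python
import os
from typing import Optional

COMPOSE_BACKEND = "http://backend:8000"  # service name on the Compose network
HOST_BACKEND = "http://localhost:8000"   # published port, reached from the host


def in_container(marker: str = "/.dockerenv") -> bool:
    """Heuristic: Docker usually creates /.dockerenv inside containers."""
    return os.path.exists(marker)


def pick_backend_url(inside_container: Optional[bool] = None) -> str:
    """Choose the backend URL, detecting the environment when not told."""
    if inside_container is None:
        inside_container = in_container()
    return COMPOSE_BACKEND if inside_container else HOST_BACKEND
```

A container started with host networking would also reach the backend through `localhost`, which is why the explicit `inside_container` argument overrides the heuristic.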

prospector/core/prospector.py (14 changes: 7 additions, 7 deletions)
@@ -61,7 +61,7 @@ def prospector( # noqa: C901
git_cache: str = "/tmp/git_cache",
limit_candidates: int = MAX_CANDIDATES,
rules: List[str] = ["ALL"],
-tag_commits: bool = False,
+tag_commits: bool = True,
silent: bool = False,
) -> Tuple[List[Commit], AdvisoryRecord] | Tuple[int, int]:
if silent:
@@ -143,7 +143,7 @@ def prospector( # noqa: C901
with ExecutionTimer(
core_statistics.sub_collection("commit preprocessing")
) as timer:
-with ConsoleWriter("\nPreprocessing commits") as writer:
+with ConsoleWriter("\nProcessing commits") as writer:
try:
if use_backend != "never":
missing, preprocessed_commits = retrieve_preprocessed_commits(
@@ -158,7 +158,7 @@ def prospector( # noqa: C901
)
if use_backend == "always":
print("Backend not reachable: aborting")
-sys.exit(0)
+sys.exit(1)
print("Backend not reachable: continuing")

if "missing" not in locals():
@@ -169,7 +169,7 @@ def prospector( # noqa: C901
# preprocessed_commits += preprocess_commits(missing, timer)

pbar = tqdm(
-missing, desc="Preprocessing commits", unit="commit", disable=silent
+missing, desc="Processing commits", unit="commit", disable=silent
)
start_time = time.time()
with Counter(
@@ -215,7 +215,7 @@ def preprocess_commits(commits: List[RawCommit], timer: ExecutionTimer) -> List[
counter.initialize("preprocessed commits", unit="commit")
for raw_commit in tqdm(
commits,
-desc="Preprocessing commits",
+desc="Processing commits",
unit=" commit",
):
counter.increment("preprocessed commits")
@@ -321,8 +321,8 @@ def retrieve_preprocessed_commits(

def save_preprocessed_commits(backend_address, payload):
with ExecutionTimer(core_statistics.sub_collection(name="save commits to backend")):
-with ConsoleWriter("Saving preprocessed commits to backend") as writer:
-logger.debug("Sending preprocessing commits to backend...")
+with ConsoleWriter("Saving processed commits to backend") as writer:
+logger.debug("Sending processing commits to backend...")
try:
r = requests.post(
backend_address + "/commits/",
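Besides the renamed progress labels, two behavioral changes stand out in `prospector.py`: `tag_commits` now defaults to `True`, and when `use_backend` is `"always"` but the backend cannot be reached, the process exits with status 1 instead of 0, so a batch driver or shell script can detect the failure. A minimal sketch of that decision follows, assuming only the three `use_backend` values that appear in this diff and the sample config (`always`, `optional`, `never`); `backend_unreachable_action` is a hypothetical helper, not part of Prospector.

```python
from typing import Optional


def backend_unreachable_action(use_backend: str) -> Optional[int]:
    """Return an exit status if the run should stop, or None to continue.

    "always":   the backend is required, so stop with a non-zero status.
    "optional": fall back to processing the commits locally.
    "never":    the backend is never contacted, so there is nothing to handle.
    """
    if use_backend == "always":
        print("Backend not reachable: aborting")
        return 1  # non-zero signals failure to the calling process
    if use_backend == "optional":
        print("Backend not reachable: continuing")
        return None
    if use_backend == "never":
        return None
    raise ValueError(f"unknown use_backend value: {use_backend!r}")
```

A caller would pass any non-`None` result to `sys.exit`; exiting with 0 after an abort, as before this change, would make a failed run look successful to scripts that check the exit status.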