Initial import restructure #451

Merged on Sep 30, 2020 (72 commits)

1765ced  Speed improvements for initial import of data (P-T-I, Aug 6, 2020)
8367971  black formatting (P-T-I, Aug 6, 2020)
da48802  black formatting (P-T-I, Aug 7, 2020)
a64729c  Added speed improvements for initial import (P-T-I, Aug 7, 2020)
b6cebe4  Added further multiprocessing (P-T-I, Aug 10, 2020)
9506f88  added queues and multiprocessing (P-T-I, Aug 10, 2020)
9bd6693  Added tqdm and ijson requirements (P-T-I, Aug 11, 2020)
0f2d4db  Added logging and file extension specific classes (P-T-I, Aug 11, 2020)
c45741c  added requirements ijson and tqdm (P-T-I, Aug 12, 2020)
d6c9745  moved updates of info collection to DownloadHandler (P-T-I, Aug 12, 2020)
3d33a8c  Added VIADownloads class for update optimalization (P-T-I, Aug 12, 2020)
f75931c  set exit code on errors to 1 (P-T-I, Aug 12, 2020)
6863224  added venv and .idea folders to ignore (P-T-I, Aug 12, 2020)
fdf5d0e  Set debug print to every 10 cycles (P-T-I, Aug 13, 2020)
3b782cc  Minor (P-T-I, Aug 13, 2020)
171836b  Minor (P-T-I, Aug 13, 2020)
49eadd2  Added different handlers (P-T-I, Aug 13, 2020)
a269c28  Added different handlers (P-T-I, Aug 13, 2020)
9eef164  Added different handlers (P-T-I, Aug 13, 2020)
fdc804c  Reset insert to original (P-T-I, Aug 13, 2020)
1cedb94  minor changes (P-T-I, Aug 13, 2020)
48c6d31  added additional logging (P-T-I, Aug 13, 2020)
f09b017  Refactor (P-T-I, Aug 13, 2020)
7adde3a  added database action class (P-T-I, Aug 15, 2020)
3f173e9  added redis queue as a replacement of multiprocessing queue (P-T-I, Aug 15, 2020)
1ced1be  Moved download_site method to DownloadHandler.py (P-T-I, Aug 15, 2020)
89b1b10  added RedisQueue (P-T-I, Aug 15, 2020)
5620eb3  added db (9) for redis queue (P-T-I, Aug 15, 2020)
963fd00  added process_item to XMLFileHandler class (P-T-I, Aug 17, 2020)
292d61a  added method to retrieve the entire redis list (P-T-I, Aug 17, 2020)
5156473  added process_item to DownloadHandler class (P-T-I, Aug 17, 2020)
b8ce094  changed process_item method (P-T-I, Aug 17, 2020)
c24266f  added process methods to class instead (P-T-I, Aug 17, 2020)
cbb96cf  methods refactor (P-T-I, Aug 17, 2020)
839a7a1  Separate file for xml Content Handlers (P-T-I, Aug 17, 2020)
3983659  Separate file for source process classes (P-T-I, Aug 17, 2020)
ffc9b88  refactor (P-T-I, Aug 17, 2020)
9d8ab05  moved process classes to separate file (P-T-I, Aug 17, 2020)
59a64cd  modified update doc versus insert doc (P-T-I, Aug 17, 2020)
196b1d9  refactor and unified logging with process classes (P-T-I, Aug 17, 2020)
3cae9fd  refactor and unified logging with process classes (P-T-I, Aug 17, 2020)
730ada6  added debug counter from processing items from file (P-T-I, Aug 18, 2020)
0343dc3  added debug counter from processing items from file every 1000 items (P-T-I, Aug 18, 2020)
572e5a1  fixed misspelled method (getCVEID instead of getCVEIDs) and black for… (P-T-I, Aug 18, 2020)
97682a1  added logging (P-T-I, Aug 18, 2020)
350f987  added logging and tqdm progressbar (P-T-I, Aug 18, 2020)
793b441  added CPERedisBrowser class (P-T-I, Aug 18, 2020)
775745b  Moved logic to process class (P-T-I, Aug 18, 2020)
ad9b438  Set JSON file progress debug logging to every 5000 items (P-T-I, Aug 18, 2020)
4c5f1b8  Import refactor and minor edit (P-T-I, Aug 18, 2020)
695d9a0  Unified logging with updater and black formatting (P-T-I, Aug 18, 2020)
9c3c7fe  Added description to tqdm progressbar from CPERedisBrowser class (P-T-I, Aug 18, 2020)
4996556  Changed logger name (P-T-I, Aug 18, 2020)
3cd1343  Added additional log entries (P-T-I, Aug 18, 2020)
17e6046  Moved DatabaseIndexer to separate class in Sources_process.py (P-T-I, Aug 18, 2020)
f780d56  Moved DatabaseIndexer to separate class in Sources_process.py (P-T-I, Aug 18, 2020)
0d0bbc9  rebase (P-T-I, Aug 27, 2020)
3bd117a  Corrected update error (P-T-I, Aug 28, 2020)
4c180ae  added variable interval counter for debug logging (P-T-I, Aug 28, 2020)
5fc944f  Merge branch 'master' into import_impr (P-T-I, Sep 10, 2020)
c2067af  troubleshooting build error on feedformatter version (P-T-I, Sep 10, 2020)
9c4fa08  added auto creation of log dir (P-T-I, Sep 10, 2020)
ad15d36  added jsonpickle requirement (P-T-I, Sep 10, 2020)
d6d62d9  fixed syntax warnings (P-T-I, Sep 14, 2020)
7080749  merging (P-T-I, Sep 22, 2020)
3a348a4  Merge branch 'up_master' into import_impr (P-T-I, Sep 24, 2020)
8cc7011  Merge branch 'up_master' into import_impr (P-T-I, Sep 24, 2020)
1e0465f  Merge from master (P-T-I, Sep 28, 2020)
a3b2b98  Minor adjustment travis.yml (P-T-I, Sep 28, 2020)
e37b8b7  Fix for missing last-modified field in cve documents (P-T-I, Sep 29, 2020)
af1893c  Fix for missing last-modified field in cve documents (P-T-I, Sep 29, 2020)
ea18e88  Final fix for missing field (P-T-I, Sep 29, 2020)
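
Several commits tune how often the importer reports progress: every 10 cycles, every 1000 items, every 5000 items for JSON files, and finally a variable interval counter. The following is a minimal sketch of such an interval counter, not the PR's code; the function name `log_progress`, its parameters and the logger name `import` are hypothetical.

```python
import logging
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

logger = logging.getLogger("import")  # hypothetical logger name


def log_progress(
    items: Iterable[T],
    interval: int = 1000,
    log: Callable[[str], None] = logger.debug,
) -> Iterator[T]:
    """Yield every item unchanged, logging a message every `interval` items."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    count = 0
    for item in items:
        yield item
        # Counted after the consumer has handled the item
        count += 1
        # Only emit a line on multiples of the interval to keep logs small
        if count % interval == 0:
            log(f"Processed {count} items")
    # Report the final total, even when it is not a multiple of the interval
    log(f"Finished: {count} items")
```

Making `interval` a parameter is what gives a "variable interval": the caller can pass 10, 1000 or 5000 depending on the source being imported.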
Binary file added .coverage
Binary file not shown.
151 changes: 14 additions & 137 deletions .gitignore
@@ -1,143 +1,20 @@
# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
bin/__pycache__/
lib/__pycache__/
sbin/__pycache__/

# C extensions
*.so
log/
ssl/
tmp/
plugins/
indexdir/
web/templates/plugins/

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
etc/configuration.ini
etc/plugins.ini
etc/plugins.txt

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
.gitignore

.idea
venv
5 changes: 3 additions & 2 deletions .travis.yml
@@ -17,8 +17,6 @@ jobs:
- ./db_mgmt_json.py -p
- ./db_mgmt_cpe_dictionary.py -p
- popd
after_success:
- codecov
- stage: unit tests
python: nightly
script: pytest --cov-report term --cov=./ test/unit -v
@@ -57,3 +55,6 @@ notifications:
email:
on_success: change
on_failure: change

after_success:
- codecov
2 changes: 2 additions & 0 deletions doc/html/Installation.html
@@ -50,6 +50,8 @@ <h2>Requirements</h2>
 <li>Jinja2</li>
 <li>itsdangerous</li>
 <li>click</li>
+<li>ijson</li>
+<li>tqdm</li>
 </ul>
 </p>
 <h2>Setting up CVE-Search</h2>
2 changes: 2 additions & 0 deletions doc/markdown/installation.md
@@ -41,6 +41,8 @@ After installing these packages, you need to install some python modules
 * Jinja2
 * itsdangerous
 * click
+* ijson
+* tqdm

 ## Setting up CVE-Search
 Before setting up CVE-Search, you have to make sure the scripts are
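
The installation docs now list ijson and tqdm. ijson parses JSON incrementally, so a large feed can be processed one entry at a time instead of being loaded whole, and tqdm wraps an iterable to draw a progress bar. A minimal sketch, assuming an NVD-style feed whose entries sit in a top-level `CVE_Items` array (an assumption about the input, not something this diff shows); `iter_cve_items` is a hypothetical helper, and the import fallbacks exist only so the sketch runs when the optional packages are missing.

```python
import io
import json

try:
    import ijson  # streaming JSON parser listed in the new requirements
except ImportError:  # fallback so the sketch still runs without ijson
    ijson = None

try:
    from tqdm import tqdm
except ImportError:  # no-op stand-in: return the iterable without a bar
    def tqdm(iterable, **kwargs):
        return iterable


def iter_cve_items(fileobj, array_key="CVE_Items"):
    """Yield the entries of the top-level `array_key` list one at a time."""
    if ijson is not None:
        # "<array_key>.item" is ijson's prefix for each element of that array,
        # so only the current entry is held in memory
        yield from ijson.items(fileobj, f"{array_key}.item")
    else:
        # Without ijson the whole document is loaded (no memory savings)
        yield from json.load(fileobj)[array_key]


if __name__ == "__main__":
    # Toy feed; real NVD entries carry much more structure
    feed = b'{"CVE_data_type": "CVE", "CVE_Items": [{"id": "CVE-A"}, {"id": "CVE-B"}]}'
    for item in tqdm(iter_cve_items(io.BytesIO(feed)), desc="CVEs"):
        print(item["id"])
```

The file is opened in binary mode (`io.BytesIO` here, `open(path, "rb")` for a real file), which is what ijson expects.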
1 change: 1 addition & 0 deletions etc/configuration.ini.sample
@@ -2,6 +2,7 @@
 Host: localhost
 Port: 6379
 Password: RedisPassword
+redisQ: 9
 VendorsDB: 10
 NotificationsDB: 11
 RefDB: 12
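
The sample configuration adds a `redisQ` key that selects the Redis database (9) used for the queue. `Config.py` reads it through its own `readSetting("Redis", "redisQ", ...)` helper; as a standalone illustration, the standard-library `configparser` can read the same key with a fallback. The section name `Redis` comes from those `readSetting` calls (the hunk itself starts below the section header), and `read_redis_queue_db` is a hypothetical name.

```python
import configparser

SAMPLE = """\
[Redis]
Host: localhost
Port: 6379
Password: RedisPassword
redisQ: 9
VendorsDB: 10
"""


def read_redis_queue_db(text: str, default: int = 9) -> int:
    """Return the Redis DB number for the queue, or `default` if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Option names are case-insensitive in configparser, and `fallback`
    # covers both a missing option and a missing [Redis] section.
    return parser.getint("Redis", "redisQ", fallback=default)
```

An older configuration file without the new key keeps working, because the fallback mirrors the `"redisQ": 9` default added to `Configuration.default`.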
16 changes: 16 additions & 0 deletions lib/Config.py
@@ -32,6 +32,7 @@ class Configuration:
     default = {
         "redisHost": "localhost",
         "redisPort": 6379,
+        "redisQ": 9,
         "redisVendorDB": 10,
         "redisNotificationsDB": 11,
         "redisRefDB": 12,
@@ -186,6 +187,21 @@ def getRedisRefConnection(cls):
             decode_responses=True,
         )

+    @classmethod
+    def getRedisQConnection(cls):
+        redisHost = cls.getRedisHost()
+        redisPort = cls.getRedisPort()
+        redisDB = cls.readSetting("Redis", "redisQ", cls.default["redisQ"])
+        redisPass = cls.readSetting("Redis", "Password", cls.default["redisPass"])
+        return redis.StrictRedis(
+            host=redisHost,
+            port=redisPort,
+            db=redisDB,
+            password=redisPass,
+            charset="utf-8",
+            decode_responses=True,
+        )
+
     # Flask
     @classmethod
     def getFlaskHost(cls):
26 changes: 11 additions & 15 deletions lib/DatabaseLayer.py
@@ -94,6 +94,7 @@ def updateCVE(cve):
"last-modified": cve["Modified"],
}
},
upsert=True,
)


@@ -105,24 +106,21 @@ def bulkInsert(collection, data):
bulk.execute()


-def bulkUpdate(collection, data):
-    if len(data) < 1:
-        return
-
+def bulkUpdate(collection, data, total):
     batch = []
-    count = 1000
-    print("Beginning bulk update of {} items".format(str(len(data))))
+    count = 0
+    print("Beginning bulk update of {} items".format(total))
     for d in data:
-        batch.append(UpdateOne({"id": d["id"]}, {"$set": d}, upsert=True))
+        batch.append(d)
         if len(batch) == 1000:
+            count += 1000
             db[collection].bulk_write(batch)
-            print("Updated {}/{} items".format(str(count), str(len(data))))
+            print("Updated {}/{} items".format(count, total))
             batch = []
-            count += 1000
     if len(batch) > 0:
         db[collection].bulk_write(batch)
         count += len(batch)
-        print("Updated {}/{} items.".format(str(count), str(len(data))))
+        print("Updated {}/{} items".format(count, total))
     print("Finished bulk update.")


@@ -137,9 +135,9 @@ def dropCollection(col):


 def getTableNames():
-    return db.collection_names()
+    # return db.collection_names()
     # jdt_NOTE: collection_names() is depreated, list_collection_names() should be used instead
-    # return db.list_collection_names()
+    return db.list_collection_names()


 # returns True if 'target_version' is less or equal than
@@ -169,9 +167,7 @@ def cvesForCPE(cpe, lax=False, vulnProdSearch=False):
return []

cpe_regex = cpe

final_cves = []

cpe_searchField = (
"vulnerable_product" if vulnProdSearch else "vulnerable_configuration"
)
@@ -205,7 +201,6 @@ def cvesForCPE(cpe, lax=False, vulnProdSearch=False):
cves = colCVE.find(
{cpe_searchField: {"$regex": cpe_regex, "$options": "i"}}
).sort("Modified", -1)

i = 0

for cve in cves:
@@ -240,6 +235,7 @@ def cvesForCPE(cpe, lax=False, vulnProdSearch=False):

# default strict search
cves = colCVE.find({cpe_searchField: {"$regex": cpe_regex}})

final_cves = cves

final_cves = sanitize(final_cves)
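
The reworked `bulkUpdate` now receives prepared write operations plus a separate `total`, which lets `data` be a generator whose length is unknown, and it sends them to MongoDB in batches of 1000. The same batching can be written with `itertools.islice`. In this sketch the writer is passed in as a callable (for example a collection's `bulk_write`) so the logic can be exercised without a database; `bulk_write_in_batches` is a hypothetical name, not part of DatabaseLayer.py. It never calls the writer with an empty batch, which matters because the new signature drops the old `if len(data) < 1` guard.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

Op = TypeVar("Op")


def bulk_write_in_batches(
    ops: Iterable[Op],
    total: int,
    write: Callable[[List[Op]], object],
    batch_size: int = 1000,
) -> int:
    """Send `ops` to `write` in lists of at most `batch_size`; return the count."""
    it = iter(ops)
    count = 0
    print("Beginning bulk update of {} items".format(total))
    while True:
        # islice pulls the next chunk lazily, so `ops` may be a generator
        batch = list(islice(it, batch_size))
        if not batch:
            break  # never hand the writer an empty batch
        write(batch)
        count += len(batch)
        print("Updated {}/{} items".format(count, total))
    print("Finished bulk update.")
    return count
```

With PyMongo, a call could look like `bulk_write_in_batches((UpdateOne({"id": d["id"]}, {"$set": d}, upsert=True) for d in docs), len(docs), db[collection].bulk_write)`, where `docs` and `db` are placeholders. This mirrors the `UpdateOne(..., upsert=True)` operation the old version built internally.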