Over 15,000 documents govern how the Department of Defense (DoD) operates. These documents live in different repositories, often on different networks; they are discoverable by different communities, updated independently, and evolve rapidly. No single capability has ever existed for navigating this vast universe of governing requirements and guidance documents, leaving the Department unable to make evidence-based, data-driven decisions. Today, GAMECHANGER offers a scalable, Artificial Intelligence (AI) enabled solution: an authoritative corpus serving as a single trusted repository of all statutory and policy-driven requirements.
Fundamentally changing the way in which the DoD navigates its universe of requirements and makes decisions
GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements by:
- Building the DoD’s authoritative corpus of requirements and policy to drive search, discovery, understanding, and analytic capabilities
- Operationalizing cutting-edge technologies, algorithms, models and interfaces to automate and scale the solution
- Fusing best practices from industry, academia, and government to advance innovation and research
- Engaging the open-source community to build generalizable and replicable technology
See LICENSE.md (and the licensing intent in INTENT.md) as well as CONTRIBUTING.md.
The following should be done in a macOS or Linux environment (including WSL on Windows).
- Install Google Chrome and ChromeDriver
  - https://chromedriver.chromium.org/getting-started
  - After a successful installation, you should be able to run the following from the shell:

    ```shell
    chromedriver --version
    ```
- Install Miniconda or Anaconda (Miniconda is much smaller)
  - https://docs.conda.io/en/latest/miniconda.html
  - After a successful installation, you should be able to run the following from the shell:

    ```shell
    conda --version
    ```
- Create a Python 3.6 conda environment for the gamechanger crawlers:

  ```shell
  conda create -n gc-crawlers python=3.6
  ```
- Clone the repo and change into that directory:

  ```shell
  git clone https://github.com/dod-advana/gamechanger-crawlers.git
  cd gamechanger-crawlers
  ```
- Activate the conda environment and install the requirements:

  ```shell
  conda activate gc-crawlers
  pip install --upgrade pip setuptools wheel
  pip install -r ./docker/minimal-requirements.txt
  ```
- That's it.
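As a quick sanity check, the prerequisite tools can be verified from Python. This is only a sketch (it is not part of the repo); the tool names come from the setup steps above:

```python
import shutil
import sys

# Sketch: verify the prerequisite tools from the setup steps are on PATH.
for tool in ("chromedriver", "conda"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")

# The crawlers target Python 3.6, so also check the interpreter version
# inside the activated environment.
print("python:", ".".join(map(str, sys.version_info[:3])))
```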
- Follow the environment setup guide above if you have not already
- Change to the gamechanger-crawlers directory and add the repository path to the PYTHONPATH environment variable:

  ```shell
  cd /path/to/gamechanger-crawlers
  export PYTHONPATH="$(pwd)"
  ```
- Create an empty directory for the crawler file outputs:

  ```shell
  CRAWLER_DATA_ROOT=/path/to/download/location
  mkdir -p "$CRAWLER_DATA_ROOT"
  ```
- Create an empty previous-manifest file:

  ```shell
  touch "$CRAWLER_DATA_ROOT/prev-manifest.json"
  ```
- Run the desired crawler spider from the gamechanger-crawlers directory (this example uses executive_orders_spider.py):

  ```shell
  scrapy runspider dataPipelines/gc_scrapy/gc_scrapy/spiders/executive_orders_spider.py \
    -a download_output_dir="$CRAWLER_DATA_ROOT" \
    -a previous_manifest_location="$CRAWLER_DATA_ROOT/prev-manifest.json" \
    -o "$CRAWLER_DATA_ROOT/output.json"
  ```
- After the crawler finishes, all downloaded files should be in the crawler output directory.
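Scrapy's `-o output.json` option writes the scraped items as a JSON array, so the run's metadata can be inspected with a few lines of Python. This is only a sketch: the item field shown (`doc_name`) varies by spider and is an assumption here, illustrated with fabricated sample data rather than real crawler output:

```python
import json
import os
import tempfile

# Hypothetical sample of what a crawler's output.json might contain;
# the actual item fields vary by spider and are an assumption here.
sample_items = [
    {"doc_name": "Executive Order A", "downloadable_items": [{"doc_type": "pdf"}]},
    {"doc_name": "Executive Order B", "downloadable_items": [{"doc_type": "pdf"}]},
]

def summarize(manifest_path):
    """Return the document names recorded in a `scrapy ... -o output.json` file."""
    with open(manifest_path) as f:
        items = json.load(f)  # -o file.json writes a single JSON array
    return [item.get("doc_name", "<unknown>") for item in items]

# Demo against a temporary file standing in for $CRAWLER_DATA_ROOT/output.json:
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "output.json")
    with open(path, "w") as f:
        json.dump(sample_items, f)
    print(summarize(path))  # → ['Executive Order A', 'Executive Order B']
```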