The /data
dir should contain all the data that will be used in the project.
/data/raw/
contains all the raw data used in the project. Raw Data must never be manually changed, any change to the raw data must be scripted and placed in the /src/data/
dir. The output data should be stored in the /data/processed/
dir.
/models/
dir contains all the models created by any scripts in the /src/model/
dir.
/envs/
dir contains all the virtual env.yml
files to be used. A preconfigured one is already created and can modified.
Simply click the Use Template button on the top right of the repo and create your new repository.
Conda
/Miniconda
orMamba
make
make env
conda activate ./.conda
envs/environment.yml
Dockerfile
.github/workflows/*
This template comes pre-configured with git pre-commit hooks. To automatically run the pre-commits, activate the conda env and run pre-commit install
.
More info about the pre-commits used can be found below.
Checks whether the pre-commits pass or not.
- Configures AWS Credentials (Need to have
AWS_SECRET_ACCESS_KEY
andAWS_ACCESS_KEY_ID
added to repo secrets) - Configures
DVC
in case some of the tests need to pull data from remote storage. - Runs all the tests, if the tests pass build the docker image and push to registry.
Prevent giant files from being committed.
- Specify what is "too large" with
args: ['--maxkb=123']
(default=500kB). - Limits checked files to those indicated as staged for addition by git.
Attempts to load all json files to verify syntax.
Attempts to load all xml files to verify syntax.
Attempts to load all yaml files to verify syntax.
Makes sure files end in a newline and only a newline.
Checks that all your JSON files are pretty. "Pretty" here means that keys are sorted and indented.
Sorts entries in requirements.txt and removes incorrect entry for pkg-resources==0.0.0
Sorts simple YAML files which consist only of top-level keys, preserving comments and blocks.
Trims trailing whitespace.
Flake8
runs all the tools by launching the single flake8
command. It displays the warnings in a per-file, merged output.
Run black
code formatter on files.
Run black
on python code blocks in documentation files.