A cookiecutter template for Python data science projects

Initialize a modular, Python-based data science project with an opinionated linter configuration.

[20220126] If you prefer flake8+isort instead of ruff, use branch flake8-isort.

Pre-requisite

You need the cookiecutter CLI available in your Python environment. Please see its installation instructions here.

Usage

To generate the directory structure for a new data science project, run the following command in your Python environment.

cookiecutter https://github.com/aws-samples/python-data-science-template

Alternatively, you can clone this repository and use it as a local template:

# Clone to a local repository in the current directory.
git clone https://github.com/aws-samples/python-data-science-template.git

# The above command creates python-data-science-template/ in the current dir.

# Use the local repo to generate project structure
cookiecutter python-data-science-template
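
If you want to script project generation (for example, to pick the flake8-isort branch mentioned above), the commands can be wrapped in a small Python helper. This is only a minimal sketch: the helper names `build_cookiecutter_command` and `generate_project` are hypothetical and not part of this template, and it assumes the cookiecutter CLI is on your PATH. It relies on cookiecutter's `--checkout`, `--output-dir`, and `--no-input` options.

```python
import subprocess
from typing import List, Optional

# Template location taken from the Usage section above.
TEMPLATE_URL = "https://github.com/aws-samples/python-data-science-template"


def build_cookiecutter_command(
    template: str = TEMPLATE_URL,
    checkout: Optional[str] = None,
    output_dir: Optional[str] = None,
    no_input: bool = False,
) -> List[str]:
    """Build the argument list for a cookiecutter CLI call (hypothetical helper)."""
    cmd = ["cookiecutter", template]
    if checkout:
        # Branch, tag, or commit of the template to use, e.g. "flake8-isort".
        cmd += ["--checkout", checkout]
    if output_dir:
        # Directory in which the generated project is created.
        cmd += ["--output-dir", output_dir]
    if no_input:
        # Accept the defaults from cookiecutter.json without prompting.
        cmd.append("--no-input")
    return cmd


def generate_project(**kwargs) -> None:
    """Run cookiecutter; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_cookiecutter_command(**kwargs), check=True)
```

For instance, `generate_project(checkout="flake8-isort")` would run the same command as in the Usage section, but against the flake8-isort branch, while `generate_project(template="python-data-science-template")` would use a local clone.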

Project Structure

By using this template, your data science project is auto-generated as follows:

.
|-- bin/
|-- notebooks                    # A directory for all notebook files.
|   |-- *.ipynb
|   `-- my_nb_path.py            # Imported by *.ipynb to treat src/ as PYTHONPATH
|-- requirements/
|-- src
|   |-- my_custom_module         # Your custom module
|   |-- my_nb_color.py           # Imported by *.ipynb to colorize their outputs
|   `-- source_dir               # Additional code, such as a SageMaker source dir
|-- tests/                       # Unit tests
|-- MANIFEST.in                  # Required by setup.py (if module name specified)
|-- setup.py                     # To pip install your Python module (if module name specified)

# These sample configuration files are auto-generated too:
|-- .editorconfig                # Sample editor config (for IDE / editor that supports this)
|-- .gitattributes               # Sample .gitattributes
|-- .gitleaks.toml               # Sample Gitleaks config (if pre_commit is advanced)
|-- .gitignore                   # Sample .gitignore
|-- .pre-commit-config.yaml      # Sample pre-commit hooks
|-- LICENSE                      # Boilerplate (auto-generated)
|-- README.md                    # Template for you to customize
|-- pyproject.toml               # Sample configurations for Python toolchains
`-- tox.ini                      # Sample configurations for Python toolchains

This structure has been used in a few other places as well, e.g., aws-samples/sagemaker-rl-energy-storage-system and aws-samples/amazon-sagemaker-gluonts-entrypoint. Feel free to look at those repositories and observe the project structure documented in their README.md.
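
To illustrate the idea behind `my_nb_path.py` in the tree above (letting notebooks import modules from `src/`), here is a minimal sketch of one way such a helper could work. It is not the file shipped by the template: the function name `add_src_to_sys_path` is hypothetical, and the sketch assumes that `notebooks/` and `src/` sit side by side under the project root.

```python
import sys
from pathlib import Path


def add_src_to_sys_path(notebook_dir: Path, src_name: str = "src") -> Path:
    """Put <project_root>/src on sys.path (hypothetical helper).

    Assumes the project root is the parent of the notebooks directory.
    """
    src_dir = notebook_dir.resolve().parent / src_name
    src_str = str(src_dir)
    if src_str not in sys.path:
        # Insert at the front so project modules take precedence.
        sys.path.insert(0, src_str)
    return src_dir
```

A notebook started from `notebooks/` could then call `add_src_to_sys_path(Path.cwd())` in its first cell, after which `import my_custom_module` would resolve against `src/`.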

Related Projects

Ready to start your new data science project on AWS? If so, you may want to check out these related samples.

Do you like to work on EC2 instances? Then why don't you check out this simple template to set up basic Vim, Tmux, and Zsh on the Deep Learning AMI Amazon Linux 2 for data scientists.
Do you like to work on SageMaker classic notebook instances? Then why don't you check out the one-liner customization command that quickly applies common tweaks to a fresh (i.e., newly created or rebooted) SageMaker classic notebook instance, making it a little more ergonomic for prolonged use.
Are you looking for a quickstart to accelerate the delivery of custom ML solutions to production, without having to make too many design choices? Then why don't you check out the ML Max repo, which includes templates for four pillars: training pipeline, inference pipeline, development environment, and data management/ETL.
Are you tired of repeatedly writing the same boilerplate code for common, tactical data science tasks? Then why don't you check out the SageMaker meta-entrypoint utilities and the smallmatter library.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
