Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] New version with cleaner options #162

Merged
merged 29 commits into from Mar 20, 2021
Merged

[WIP] New version with cleaner options #162

merged 29 commits into from Mar 20, 2021

Conversation

pjbull
Copy link
Collaborator

@pjbull pjbull commented Mar 22, 2019

We've seen a lot of potential features for this where we need to handle forking paths gracefully. By default cookiecutter can't do this (see cookiecutter/cookiecutter#848). It's been years, so we can't reasonably expect this to change upstream...

This implements a monkey-patching workaround to enable this behavior. It introduces a couple of major changes, so here are my recommendations.

Here are the big differences for a user:

  • we now need to run ccds <path to repo> instead of cookiecutter <path to repo>
  • the options and their defaults are changing
  • as a consequence of supporting more environments/dependency managers at setup time, we stop supporting them at execution time. this means that teams will have to pick one of each and stick to it across developers. I think this is pretty widespread already. (i.e. make create_environment will only support one of the options rather than multiple like it does now)

Implementation details are:

  • Add setup.py to make this package installable and give it a CLI
  • add monkey_patch.py to patch the cookiecutter functions that we need to handle our use case
  • support a list of dictionaries in cookiecutter.json that let's a use pick an option and then sub-options
  • use post_gen_project.py to create the environment file based on a standard list of libraries. (Add package list that populates requirements.txt to the workflow #5)

There is also work for a number of longstanding items in this branch as well:

Done:

  • tag current master as v1 so anyone relying on the current flow/structure can continue to use it easily
  • implement the rails for the high priority items
  • make comprehensive tests run on CI/CD to de-risk
  • support multiple dependency formats (environment.yml, requirements.txt, pipenv)
  • default pydata dependencies options
  • add support for azure/google cloud
  • add ccds command and make cookiecutter-data-science a proper package

Remaining items

Cookiecutter default structure

  • user supplied config files
  • revise generated python package boilerplate (make optional)
  • mkdocs in place of sphinx
  • add lint command to Makefile

Cookiecutter options

Infrastructure

  • tests passing on windows
  • release command and make PyPI release

Docs

  • add documentation for new installation (pip install cookiecutter-data-science) and new initiation (ccds <path to repo>)
  • add table of options with links to the project documentation in our docs
  • update screencast and add screenshots ( Video of example in documentation is displaying error. #197 ) of the new flow
  • add options for

Copy link
Collaborator

@isms isms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pjbull. Approving general direction of edits on this WIP PR, seems like this hits many of the customization requests that we have been getting and is a reasonable workaround for the cookiecutter limitations.

docs/docs/index.md Outdated Show resolved Hide resolved
@ned2
Copy link

ned2 commented May 13, 2019

I would suggest changing cookiecutter.module_name to cookiecutter.package_name. While it's technically true that all Python packages are modules, when arranged into a collection of other modules and having a __init__.py, they then also become a package, and this is more salient from the end user perspective. For example, the status as package informs the user that it can be pip installed.

@jamesmyatt
Copy link

I would suggest changing cookiecutter.module_name to cookiecutter.package_name. While it's technically true that all Python packages are modules, when arranged into a collection of other modules and having a __init__.py, they then also become a package, and this is more salient from the end user perspective. For example, the status as package informs the user that it can be pip installed.

module_name is correct. For example, "sklearn" is the module name for the package named "scikit-learn".

@ned2
Copy link

ned2 commented May 13, 2019

module_name is correct. For example, "sklearn" is the module name for the package named "scikit-learn".

Ah yes. Good point. Just found it a little counter-intuitive at first.

@tim-werner
Copy link

Are there any updates regarding this pull request? The master branch is quite outdated and I think this pull request adds a lot of value and addresses many open issues.

Copy link

@hoangphan1509 hoangphan1509 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check for this file also https://github.com/drivendata/cookiecutter-data-science/blob/new-cli/%7B%7B%20cookiecutter.repo_name%20%7D%7D/docs/commands.rst.

`
{% if 's3' in cookiecutter.dataset_storage %}

Syncing data to S3
^^^^^^^^^^^^^^^^^^

  • make sync_data_to_s3 will use aws s3 sync to recursively sync files in data/ up to s3://{{ cookiecutter.s3_bucket }}/data/.
  • make sync_data_from_s3 will use aws s3 sync to recursively sync files from s3://{{ cookiecutter.s3_bucket }}/data/ to data/.
    {% endif %}
    `

@hackalog
Copy link

Or perhaps you can expect this to change upstream: New maintainier. Cookiecutter 2.0 coming. cookiecutter/cookiecutter#1256

@pjbull
Copy link
Collaborator Author

pjbull commented Feb 13, 2020

I've got a decent head of steam on a comprehensive set of tests we needed anyway for this change (and for the project generally). I'll keep going to get all the environment options tested on all platforms and then we can assess what to look to cookiecutter for and what to do ourselves.

The biggest decision is whether this becomes a tool where you have to do ccds .... or cookiecutter .... I guess, once this is all working, we can always integrate upstream cookiecutter functionality and remove the monkey_patch. However, I wouldn't want to go back to cookiecutter ... being the entrypoint since that is annoying churn in the command that you run and the packages you need installed to create environments.

drivendata and others added 10 commits July 14, 2020 16:28
* Change archived asciinema example (#163)

* Change archived asciinema example

* Update README.md

Fix Asciinema powerline error

* Update docs to show updated asciinema example

* Added source and destination to Make data target (#169)

* Fix broken Airflow link (#182)

* Fixed: Typo in Makefile (#184)

Fixed typo in Makefile, section "Set up python interpreter environment": intalled --> installed

* Set up CI with Azure Pipelines

[skip ci]

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* str paths for windows support

* handle multiple data providers (#199)

* Add missing env directory bin/activate path

* Remove version from PYTHON_INTERPRETER command

* Search for virtualenvwrapper.sh path if executable not found

* Try chardet for character encoding detection

* Specify python and virtualenv binaries for virtualenvwrapper

* Add shebang to virtualenvwrapper.sh

* Diagnostic

* Try virtualenvwrapper-win

* Set encoding if detected None

* Fixes to Mac and Windows tests on Azure pipelines (#217)

* Temporarily comment out py36

* Update azure-pipelines.yml

* Fix tests on Windows and Mac (#1)

* Temporarily remove py37

* Update virtualenv_harness.sh

* put py37 back in

* Set encoding to utf-8

* Comment out rmvirtualenv

* Update test_creation.py

* Update virtualenv_harness.sh

* Add --show-capture

* Update azure-pipelines.yml

* Update azure-pipelines.yml

* Update test_creation.py

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update virtualenv_harness.sh

* Update cookiecutter.json

* Update cookiecutter.json

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update test_creation.py

* Update azure-pipelines.yml

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update cookiecutter.json

* Update conda_harness.sh

* Update conda_harness.sh

* Update conda_harness.sh

Co-authored-by: Eric Jalbert <ericmjalbert@users.noreply.github.com>
Co-authored-by: Jonathan Raviotta <jraviotta@users.noreply.github.com>
Co-authored-by: Wes Roach <wesr000@gmail.com>
Co-authored-by: Christopher Geis <16896724+geisch@users.noreply.github.com>
Co-authored-by: Peter Bull <pjbull@gmail.com>
Co-authored-by: Ian Preston <17241371+ianepreston@users.noreply.github.com>
Co-authored-by: Jay Qi <jayqi@users.noreply.github.com>
Co-authored-by: inchiosa <4316698+inchiosa@users.noreply.github.com>
@pjbull pjbull changed the base branch from master to v2 March 20, 2021 05:35
@pjbull
Copy link
Collaborator Author

pjbull commented Mar 20, 2021

Moving this work on to an official v2 branch!

@pjbull pjbull merged commit 1fe968d into v2 Mar 20, 2021
@pjbull pjbull deleted the new-cli branch March 20, 2021 05:38
milescsmith pushed a commit to milescsmith/cookiecutter-data-science that referenced this pull request Sep 20, 2023
* WIP - New version with cleaner options

* Fix find-replace error (drivendata#177)

* Remove unnecessary .gitkeep

* Remove unused tox.ini

* Split reqs into dev/non-dev

* Add basic packages support

* Add tests for testing environment creation and requirements

* Set up CI with Azure Pipelines (drivendata#194)

* Change archived asciinema example (drivendata#163)

* Change archived asciinema example

* Update README.md

Fix Asciinema powerline error

* Update docs to show updated asciinema example

* Added source and destination to Make data target (drivendata#169)

* Fix broken Airflow link (drivendata#182)

* Fixed: Typo in Makefile (drivendata#184)

Fixed typo in Makefile, section "Set up python interpreter environment": intalled --> installed

* Set up CI with Azure Pipelines

[skip ci]

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* str paths for windows support

* handle multiple data providers (drivendata#199)

* Add missing env directory bin/activate path

* Remove version from PYTHON_INTERPRETER command

* Search for virtualenvwrapper.sh path if executable not found

* Try chardet for character encoding detection

* Specify python and virtualenv binaries for virtualenvwrapper

* Add shebang to virtualenvwrapper.sh

* Diagnostic

* Try virtualenvwrapper-win

* Set encoding if detected None

* Fixes to Mac and Windows tests on Azure pipelines (drivendata#217)

* Temporarily comment out py36

* Update azure-pipelines.yml

* Fix tests on Windows and Mac (#1)

* Temporarily remove py37

* Update virtualenv_harness.sh

* put py37 back in

* Set encoding to utf-8

* Comment out rmvirtualenv

* Update test_creation.py

* Update virtualenv_harness.sh

* Add --show-capture

* Update azure-pipelines.yml

* Update azure-pipelines.yml

* Update test_creation.py

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update virtualenv_harness.sh

* Update cookiecutter.json

* Update cookiecutter.json

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update Makefile

* Update Makefile

* Update virtualenv_harness.sh

* Update Makefile

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update test_creation.py

* Update azure-pipelines.yml

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update virtualenv_harness.sh

* Update cookiecutter.json

* Update conda_harness.sh

* Update conda_harness.sh

* Update conda_harness.sh

Co-authored-by: Eric Jalbert <ericmjalbert@users.noreply.github.com>
Co-authored-by: Jonathan Raviotta <jraviotta@users.noreply.github.com>
Co-authored-by: Wes Roach <wesr000@gmail.com>
Co-authored-by: Christopher Geis <16896724+geisch@users.noreply.github.com>
Co-authored-by: Peter Bull <pjbull@gmail.com>
Co-authored-by: Ian Preston <17241371+ianepreston@users.noreply.github.com>
Co-authored-by: Jay Qi <jayqi@users.noreply.github.com>
Co-authored-by: inchiosa <4316698+inchiosa@users.noreply.github.com>

* More graceful deprecation

* Make tests pass locally

* test version match installed version

* Remove unused imports

* Unremove used import

* Move to GH Actions

* Fix typo

* Test non-windows

* Add netlify configs

* Update suggestion to keep using deprecated cookiecutter template (drivendata#231)

* Add mkdocs requirements file to docs directory

* Try setting python version in runtime txt for netlify

* Trigger build

* Python 3.8 netlify

* Python 3.6 netlify

* Do not specify python runtime for netlify

* Use 3.7

This reverts commit 898d7d3.

Co-authored-by: James Myatt <james@jamesmyatt.co.uk>
Co-authored-by: drivendata <info@drivendata.org>
Co-authored-by: Eric Jalbert <ericmjalbert@users.noreply.github.com>
Co-authored-by: Jonathan Raviotta <jraviotta@users.noreply.github.com>
Co-authored-by: Wes Roach <wesr000@gmail.com>
Co-authored-by: Christopher Geis <16896724+geisch@users.noreply.github.com>
Co-authored-by: Ian Preston <17241371+ianepreston@users.noreply.github.com>
Co-authored-by: Jay Qi <jayqi@users.noreply.github.com>
Co-authored-by: inchiosa <4316698+inchiosa@users.noreply.github.com>
Co-authored-by: Robert Gibboni <robert@drivendata.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants