Added information on structuring your project

vcalderon2009 committed Dec 4, 2017 — ``docs/source/python_intro.rst``


----------------------------
Structuring your code
----------------------------

Now that you have a working version of *python* on your work computer,
you can start doing research.

One of the key elements of a project is **reproducibility**. Keeping this
in mind when structuring your project will allow others to look at your
code and understand it well enough to **recreate** your results.

I would suggest starting with the
`Cookiecutter Data Science <https://drivendata.github.io/cookiecutter-data-science/>`_
project structure, which describes itself as:

    *A logical, reasonably standardized, but flexible project structure for
    doing and sharing data science work.*

This folder structure allows everyone looking at your code to
understand it right away.

The structure of the project looks like:

.. code::

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    ├── docs               <- A default Sphinx project; see sphinx-doc.org for details
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    ├── src                <- Source code for use in this project.
    │   ├── __init__.py    <- Makes src a Python module
    │   │
    │   ├── data           <- Scripts to download or generate data
    │   │   └── make_dataset.py
    │   │
    │   ├── features       <- Scripts to turn raw data into features for modeling
    │   │   └── build_features.py
    │   │
    │   ├── models         <- Scripts to train models and then use trained models to make
    │   │   │                 predictions
    │   │   ├── predict_model.py
    │   │   └── train_model.py
    │   │
    │   └── visualization  <- Scripts to create exploratory and results oriented visualizations
    │       └── visualize.py
    └── tox.ini            <- tox file with settings for running tox; see tox.testrun.org

It includes *Makefiles*, documentation, dependencies files, etc., to
make it easy to structure your code.
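For instance, the notebook naming convention shown in the tree above (a
number for ordering, the creator's initials, and a short ``-`` delimited
description) can be generated with a small helper. A minimal sketch —
the function name here is my own, not part of the template:

```python
def notebook_name(version, initials, description):
    """Build a notebook filename following the
    `<number>-<initials>-<description>` convention from the tree above."""
    # Lowercase the description and join its words with hyphens
    slug = "-".join(description.lower().split())
    return f"{version}-{initials}-{slug}.ipynb"


print(notebook_name("1.0", "jqp", "initial data exploration"))
# 1.0-jqp-initial-data-exploration.ipynb
```

Sorting the ``notebooks`` folder alphabetically then also sorts it by
the order in which the analysis was done.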

To start a **new project**:

.. code::

    cookiecutter https://github.com/drivendata/cookiecutter-data-science

.. note::

    This folder structure is **really** easy to use, and I really
    advise using it, since it allows for better structure and
    reproducibility.
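Reproducibility also extends to the Python environment itself. A
minimal sketch of snapshotting your packages into the
``requirements.txt`` file from the tree above (the install step is
shown commented, since it is what a *collaborator* would run):

```shell
# Snapshot the packages installed in the current environment;
# this is how the tree above suggests generating requirements.txt
pip freeze > requirements.txt

# A collaborator can then recreate the same environment with:
# pip install -r requirements.txt
```

Committing ``requirements.txt`` alongside the code lets anyone rebuild
the environment your results were produced in.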

For *my version* of the Cookiecutter Data Science *template*, you can
clone `<https://github.com/vcalderon2009/cookiecutter-data-science/>`_
and use that folder structure instead.




