Added information on structuring your project

vcalderon2009 committed Dec 4, 2017 — ``docs/source/python_intro.rst``


----------------------------
Structuring your code
----------------------------

Now that you have a working version of *python* on your work computer,
you can start doing research.

One of the key elements of a project is **reproducibility**. Keeping this
in mind when structuring your project will allow others to look at your
code and understand it well enough to **recreate** your results.

I would suggest starting with the
`Cookiecutter Data Science <https://drivendata.github.io/cookiecutter-data-science/>`_
project structure, which describes itself as:

    *A logical, reasonably standardized, but flexible project structure for
    doing and sharing data science work.*

This folder structure allows everyone looking at your code to
understand it right away.

The structure of the project looks like:

.. code::

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    ├── docs               <- A default Sphinx project; see sphinx-doc.org for details
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    ├── src                <- Source code for use in this project.
    │   ├── __init__.py    <- Makes src a Python module
    │   │
    │   ├── data           <- Scripts to download or generate data
    │   │   └── make_dataset.py
    │   │
    │   ├── features       <- Scripts to turn raw data into features for modeling
    │   │   └── build_features.py
    │   │
    │   ├── models         <- Scripts to train models and then use trained models to make
    │   │   │                 predictions
    │   │   ├── predict_model.py
    │   │   └── train_model.py
    │   │
    │   └── visualization  <- Scripts to create exploratory and results oriented visualizations
    │       └── visualize.py
    └── tox.ini            <- tox file with settings for running tox; see tox.testrun.org

It includes *Makefiles*, documentation, dependencies files, etc., to
make it easy to structure your code.
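For instance, the notebook naming convention shown in the tree above (a
number for ordering, the creator's initials, and a short ``-`` delimited
description) can be generated with a small helper. A minimal sketch —
the function name here is my own, not part of the template:

```python
def notebook_name(version, initials, description):
    """Build a notebook filename following the
    `<number>-<initials>-<description>` convention from the tree above."""
    # Lowercase the description and join its words with hyphens
    slug = "-".join(description.lower().split())
    return f"{version}-{initials}-{slug}.ipynb"


print(notebook_name("1.0", "jqp", "initial data exploration"))
# 1.0-jqp-initial-data-exploration.ipynb
```

Sorting the ``notebooks`` folder alphabetically then also sorts it by
the order in which the analysis was done.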

To start a **new project**:

.. code::

    cookiecutter https://github.com/drivendata/cookiecutter-data-science

.. note::

    This folder structure is **really** easy to use, and I really
    advise using it, since it allows for better structure and
    reproducibility.
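Reproducibility also extends to the Python environment itself. A
minimal sketch of snapshotting your packages into the
``requirements.txt`` file from the tree above (the install step is
shown commented, since it is what a *collaborator* would run):

```shell
# Snapshot the packages installed in the current environment;
# this is how the tree above suggests generating requirements.txt
pip freeze > requirements.txt

# A collaborator can then recreate the same environment with:
# pip install -r requirements.txt
```

Committing ``requirements.txt`` alongside the code lets anyone rebuild
the environment your results were produced in.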

For *my version* of the Cookiecutter Data Science *template*, you can
clone `<https://github.com/vcalderon2009/cookiecutter-data-science/>`_
and use that folder structure instead.




