# Lesson 5: Project Templates (Cookiecutter)

**Module 4b: Advanced Tooling**  
**Estimated Time**: 1 hour  
**Difficulty**: Beginner

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand the value of standard folder structures  
✅ Generate projects with **Cookiecutter**  
✅ Explore the **Cookiecutter Data Science** standard  
✅ Answer interview questions on project organization  

---

## 📚 Table of Contents

1. [The Problem: Spaghetti Folders](#1-the-problem-spaghetti-folders)
2. [The Solution: Cookiecutter](#2-the-solution-cookiecutter)
3. [Standard Structure Overview](#3-standard-structure-overview)
4. [Interview Preparation](#4-interview-preparation)

---

## 1. The Problem: Spaghetti Folders

Does this look familiar?
```
project/
  analysis_v1.ipynb
  analysis_final_v2.ipynb
  data.csv
  clean_data.csv
  model.pkl
  utils.py
```

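This layout quickly becomes unmaintainable. Which notebook is the current one? Is `data.csv` the raw data, or has it already been edited? Which code is reusable, and where is the documentation?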

## 2. The Solution: Cookiecutter

**Cookiecutter** is a CLI tool that creates projects from **templates**.

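A widely used template for data science projects is **Cookiecutter Data Science (CCDS)**, maintained by DrivenData.

Install Cookiecutter (`pip install cookiecutter`), then run:
`cookiecutter https://github.com/drivendata/cookiecutter-data-science`

Recent CCDS releases also provide their own `ccds` command, so check the project README for the currently recommended invocation.

Cookiecutter prompts you for a few values (for example, the project name and Python version) and generates a consistent, standardized folder structure from your answers.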
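Under the hood, a Cookiecutter template is a plain directory whose file names and file contents contain Jinja2 placeholders such as `{{ cookiecutter.project_name }}`, with default values declared in a `cookiecutter.json` file at the template root. The sketch below imitates only that core idea using the standard library: a regular expression stands in for Jinja2 and a dict of answers stands in for the interactive prompts. It is a simplified illustration, not Cookiecutter's actual implementation; `render` and `generate` are hypothetical names.

```python
import json
import re
from pathlib import Path

# Matches placeholders such as {{ cookiecutter.project_name }} (inner whitespace optional).
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: dict) -> str:
    """Replace each {{ cookiecutter.<key> }} placeholder with str(context[<key>])."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), text)


def generate(template_dir: Path, output_dir: Path, answers: dict) -> Path:
    """Copy template_dir into output_dir, rendering both paths and file contents.

    Hypothetical helper: a stdlib stand-in for what the cookiecutter CLI does.
    """
    # Defaults come from cookiecutter.json; `answers` play the role of the prompts.
    defaults = json.loads((template_dir / "cookiecutter.json").read_text())
    context = {**defaults, **answers}

    for src in sorted(template_dir.rglob("*")):
        rel = src.relative_to(template_dir)
        if rel.name == "cookiecutter.json":
            continue  # the settings file itself is not part of the output
        # Render placeholders in the path, e.g. "{{ cookiecutter.project_name }}/README.md".
        dest = output_dir / render(rel.as_posix(), context)
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(render(src.read_text(), context))
    return output_dir
```

For real projects, use the `cookiecutter` command itself, which additionally handles prompting, full Jinja2 syntax, and pre/post-generation hooks.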

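## 3. Standard Structure Overview

A CCDS-style project looks roughly like this (the exact files and folder names depend on the template version you use):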

```text
├── LICENSE
├── Makefile           <- Build commands
├── README.md          <- The top-level README
├── data
│   ├── external       <- Third party sources
│   ├── interim        <- Intermediate data
│   ├── processed      <- The final canonical data sets
│   └── raw            <- The original, immutable data dump
│
├── docs               <- Sphinx project
│
├── models             <- Trained models
│
├── notebooks          <- Jupyter notebooks (Naming convention: 1.0-jqp-initial-exploration)
│
├── pyproject.toml     <- Project configuration
│
├── src                <- Source code (The Python Package)
│   ├── __init__.py
│   ├── data           <- Scripts to download or generate data
│   ├── features       <- Scripts to turn raw data into features
│   ├── models         <- Scripts to train models
│   └── visualization  <- Scripts to create figures
```
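To see the layout above without installing anything, the directories can be created with `pathlib`. This is a minimal sketch: `scaffold` is a hypothetical helper, not part of Cookiecutter or CCDS. It skips the files a real template also generates (README, Makefile, `pyproject.toml`), and it assumes you want an `__init__.py` in each `src` subfolder as well, so they import as regular packages.

```python
from pathlib import Path

# Directories from the layout above (hand-written, not produced by CCDS).
LAYOUT_DIRS = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "models",
    "notebooks",
    "src/data",
    "src/features",
    "src/models",
    "src/visualization",
]

# Folders that should be importable Python packages (assumption: every src subfolder).
PACKAGE_DIRS = ["src", "src/data", "src/features", "src/models", "src/visualization"]


def scaffold(root: Path) -> list[Path]:
    """Create the standard directories under root; safe to run more than once."""
    created = []
    for rel in LAYOUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    for rel in PACKAGE_DIRS:
        (root / rel / "__init__.py").touch()  # marks the folder as a package
    return created
```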

## 4. Interview Preparation

### Common Questions

#### Q1: "Why separate `src` from `notebooks`?"
**Answer**: "Notebooks are for experimentation and EDA. `src` is for production code. Once experimental code in a notebook works, I refactor it into a function in `src` and import it back into the notebook. This ensures the logic is testable and reusable."
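As an illustration of that refactoring step, here is logic that might start life in a notebook cell, moved into a pure function plus a pytest-style test. The module path `src/features/build_features.py`, the function `add_ratio_feature`, and the test file location are hypothetical examples, not names generated by CCDS.

```python
# Hypothetical module: src/features/build_features.py


def add_ratio_feature(rows: list[dict], numerator: str, denominator: str, name: str) -> list[dict]:
    """Return new rows where row[name] = row[numerator] / row[denominator].

    A zero (or missing) denominator yields None instead of raising, and the
    input rows are not modified, which keeps the function easy to test.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is never mutated
        den = row[denominator]
        new_row[name] = row[numerator] / den if den else None
        out.append(new_row)
    return out


# Hypothetical test file: tests/test_build_features.py (run with pytest).
def test_add_ratio_feature():
    rows = [{"clicks": 5, "views": 10}, {"clicks": 3, "views": 0}]
    result = add_ratio_feature(rows, "clicks", "views", "ctr")
    assert result[0]["ctr"] == 0.5
    assert result[1]["ctr"] is None
    assert "ctr" not in rows[0]  # original rows are untouched
```

Back in the notebook, the cell shrinks to `from src.features.build_features import add_ratio_feature`, assuming the project is installed in editable mode (`pip install -e .`) or the project root is on `sys.path`.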

#### Q2: "What is the 'Immutable Raw Data' rule?"
**Answer**: "We NEVER manually edit files in `data/raw`. Keeping raw data immutable means we can always reproduce the whole pipeline from scratch. If I need to clean the data, I write a script that reads from `data/raw` and writes its output to `data/interim` or `data/processed`."
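A cleaning step that respects this rule might look like the sketch below. It uses the standard-library `csv` module rather than pandas to stay self-contained; the function name `clean_raw_to_interim` and the file paths are hypothetical, and "drop rows with an empty field" is just an example cleaning rule.

```python
import csv
from pathlib import Path


def clean_raw_to_interim(raw_path: Path, interim_path: Path) -> int:
    """Read a raw CSV, drop rows with any empty or missing field, write to interim.

    raw_path is only ever opened for reading; returns the number of rows kept.
    """
    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        # Short rows produce None values and overlong rows produce lists, so
        # requiring a non-blank string for every field drops both.
        kept = [
            row for row in reader
            if all(isinstance(v, str) and v.strip() for v in row.values())
        ]

    interim_path.parent.mkdir(parents=True, exist_ok=True)
    with interim_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

Because the script is the only path from `raw` to `interim`, deleting `data/interim` and rerunning it always rebuilds the same output.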