## Lesson 12 — Modules and Classes

So far, we have mostly learned how to use Python for data analysis. This lesson instead focuses on the object-oriented aspects of the language.

There are quite a lot of things to cover, so this lesson concentrates on the basic concepts of object-oriented programming (OOP).

We will also look at some ways to organize and test the code we have.

Readings:

* Shaw Ex40-52 (most important: Ex40-45)
* [_Modules_, Python Official docs](https://docs.python.org/3/tutorial/modules.html)
* [_Classes_, Python Official docs](https://docs.python.org/3/tutorial/classes.html)
* [_5 Useful Dunder methods in Python_, by Indently (video)](https://www.youtube.com/watch?v=y1ZWQQEe5PM)
* [_Structuring Your Project_, by The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/structure/)
* [_Standardized project structure for science work: Opinions_, by Cookiecutter Data Science](https://cookiecutter-data-science.drivendata.org/opinions/)

Topics covered:

* [Modules](#Modules)
* [Classes](#Classes)
* [Object-oriented nomenclature](#Object-oriented-nomenclature)
* [Inheritance and composition](#Inheritance-and-composition)
* [Unit testing](#Unit-testing)
* [Structuring your project](#Structuring-your-project)

## Modules

A **module** is a file containing Python definitions and statements. The file name is the module name with the suffix `.py` appended. Definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).

Using an example from the [Python Docs](https://docs.python.org/3/tutorial/modules.html), we can use our favorite text editor to create a file called `fibo.py` in the current directory with the following contents:

```python
# Fibonacci numbers module

def fib(n: int):
    """
    Write Fibonacci series up to a max value of n
    """
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a + b
    print()


def fib2(n: int):
    """
    Return Fibonacci series up to a max value of n
    """
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result

```

Now enter the IPython (or Python) interpreter and import this module with the following command:

```python
>>> import fibo
```

This does not enter the names of the functions defined in `fibo` directly in the current symbol table; it only enters the module name `fibo` there. Using the module name you can access the functions:

```python
>>> fibo.fib(1000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'
```

If you intend to use a function often, you can import it directly:

```python
>>> from fibo import fib
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
```

#### Variants of the module import statement

We have seen all of these variants of the module import statement before. What does each of these commands do?

```python
>>> from fibo import fib, fib2

>>> from fibo import *  # this is often not recommended!

>>> import fibo as fib

>>> from fibo import fib as fibonacci
```
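
As a hedged illustration of what these variants bind, the sketch below applies three of the forms to the standard-library `math` module, chosen here as a stand-in so that the snippet runs without `fibo.py`. The alias names `m` and `square_root` are made up for the example, and the wildcard form is left out on purpose.

```python
import math

# Bind two names from the module directly into the current namespace.
from math import sqrt, pi

# Bind the whole module under a shorter alias.
import math as m

# Bind a single function under a different name.
from math import sqrt as square_root

# All of these names refer to the same underlying objects.
same_module = m is math
same_function = square_root is sqrt is math.sqrt

print(same_module, same_function)  # True True
print(square_root(16))             # 4.0
```

Note that an alias only adds a new name; it does not create a copy of the module or the function.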

#### Importing from inside Jupyter notebook

It works the same as from the Terminal.

In [1]:
import fibo

In [2]:
fibo.fib(30)

1 1 2 3 5 8 13 21 


In [3]:
fibo.fib2(50)

[1, 1, 2, 3, 5, 8, 13, 21, 34]

#### Executing modules as scripts

When you run a Python module with

```
$ python fibo.py <arguments>
```

the code in the module will be executed, just as if you imported it, but with the `__name__` set to `"__main__"`. That means that by adding this code at the end of your module:

```python
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
```

you can make the file usable as a script as well as an importable module, because the code that parses the command line only runs if the module is executed as the "main" file:

```
$ python fibo.py 50
1 1 2 3 5 8 13 21 34
```

If the module is imported, the code is not run:

```python
>>> import fibo
>>>
```

This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a script executes a test suite).

#### Packages

Packages are a way of structuring Python’s module namespace by using "dotted module names". For example, the module name `A.B` designates a submodule named `B` in a package named `A`. Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or the Python Imaging Library from having to worry about each other's module names.

For example, `pyplot` is a submodule of the `matplotlib` package, which we use all the time:

```python
import matplotlib.pyplot as plt
```
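
A small sketch of the same idea using only the standard library, assuming `json` as the example package (it is not part of the lesson's material): the package carries a `__path__` attribute that tells Python where to look for submodules, while a plain submodule such as `json.decoder` does not.

```python
import json          # a package: a directory containing several modules
import json.decoder  # a submodule, addressed with a dotted module name

is_package = hasattr(json, "__path__")
submodule_is_package = hasattr(json.decoder, "__path__")

print(json.decoder.__name__)  # json.decoder
print(is_package)             # True
print(submodule_is_package)   # False
```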

## Classes

Classes provide a means of bundling data and functionality together. Creating a new **class** creates a new *type* of object, allowing new *instances* (or *tokens*) of that type to be made. Each class instance can have **attributes** attached to it for maintaining its state. Class instances can also have **methods** (defined by its class) for modifying its state.

Put in philosophical terms, *object class* is to *object instance* as *type* is to *token*. For example, many of you have a MacBook (a *class* or *type* of computer), but the physical object you possess is a unique computer (an *instance* or *token* of a MacBook).

Classes are used when you want to create many objects that all have the same properties, and each one won't interfere with the others. A module is typically imported only once for the entire program, but a module can contain classes.
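
The difference can be sketched as follows. The `Counter` class is a hypothetical example, and `json` is used only because it is always available: importing a module a second time hands back the same cached module object, whereas every instance of a class keeps its own state.

```python
import sys
import json
import json as json_again  # a second import reuses the cached module

# Both names point at the single module object stored in sys.modules.
same_module = json is json_again is sys.modules["json"]


class Counter:
    """Hypothetical class: every instance keeps its own count."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


a = Counter()
b = Counter()
a.increment()  # changes a only; b is unaffected

print(same_module)       # True
print(a.count, b.count)  # 1 0
```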

Take this simple example (adapted from Shaw's *Learn Python The Hard Way*):

In [4]:
class MyClass:
    """Class documentation comes here"""

    def __init__(self, name: str):
        """Documentation for the initialization method comes here."""
        self.name = name
        self.walrus = "I am the walrus"

    def __str__(self) -> str:
        """Dunder methods implement how your class interacts as an object."""
        return f"{self.walrus}, my name is {self.name}"

    def choo(self):
        """Custom function defined for the class"""
        print("Goo goo g'joob.")

If this class is saved in a file called `myclass.py`, you can import the module and create (instantiate) a new object of type `MyClass`:

```python
>>> import myclass
>>> wally = myclass.MyClass("Wally Walrus")
>>> wally.walrus
'I am the walrus'
>>> wally.choo()
Goo goo g'joob.
```

In the notebook, we don't have to import the class from another file; we can use it directly:

In [5]:
wally = MyClass("Wally Walrus")

In [6]:
wally.walrus

'I am the walrus'

In [7]:
wally.choo()

Goo goo g'joob.


In [8]:
print(wally)

I am the walrus, my name is Wally Walrus


You *instantiate* (create) a class by calling the class like it's a function. When you instantiate an object from a class (here: `MyClass`), Python does the following things:

1. Python looks for `MyClass()` and sees that it is a class you’ve defined.
2. Python crafts an empty object with all the functions you’ve specified in the class using `def`.
3. Python then looks to see if you made a "magic" `__init__` function, and if you have it calls that function to initialize your newly created empty object.
4. In the `MyClass` method `__init__` you then get this extra variable `self`, which is that empty object Python made for you, and you can set variables on it just like you would with a module, dictionary, or other object.
5. The `self` variable will hold all the information you stored on the class **instance**.
6. In this case, you set `self.walrus` to a song lyric and then you've initialized this object.
7. Now Python can take this newly minted object and assign it to the `wally` variable for you to work with.
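
The steps above can be sketched by hand. This is only a rough illustration (the real call goes through Python's internal machinery), using a trimmed-down copy of `MyClass`: `__new__` creates the bare object, and `__init__` then fills it in.

```python
class MyClass:
    def __init__(self, name: str):
        self.name = name
        self.walrus = "I am the walrus"


# Steps 1-2: create the bare object without running __init__.
obj = MyClass.__new__(MyClass)
empty_state = dict(vars(obj))  # nothing has been set on the instance yet

# Steps 3-6: __init__ receives that object as `self` and sets its attributes.
MyClass.__init__(obj, "Wally Walrus")

# Step 7: the ordinary call does both steps and returns the finished object.
wally = MyClass("Wally Walrus")

print(empty_state)               # {}
print(vars(obj) == vars(wally))  # True
```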

Here is another example:

In [9]:
class Song:

    def __init__(self, lyrics):
        self.lyrics = lyrics

    def print_it(self):
        for line in self.lyrics:
            print(line)

If this class is saved in a file called `song.py`, you can import the module and create (instantiate) some `Song` objects:

```python
>>> from song import *
>>> lyrics1 = ["Who lives in a pineapple under the sea?",  "SpongeBob SquarePants"]
>>> lyrics2 = ["Tell me why", "Ain't nothin' but a heartache"]
>>> song1 = Song(lyrics1)
>>> song2 = Song(lyrics2)
>>> song1.print_it()
Who lives in a pineapple under the sea?
SpongeBob SquarePants
>>> song2.print_it()
Tell me why
Ain't nothin' but a heartache
```

In [10]:
lyrics1 = ["Who lives in a pineapple under the sea?", "SpongeBob SquarePants"]
song1 = Song(lyrics1)
song1.print_it()

Who lives in a pineapple under the sea?
SpongeBob SquarePants


### @classmethod

A [class method](https://docs.python.org/3/library/functions.html#classmethod) receives the class itself (conventionally named `cls`) instead of an instance as its first argument, so it can be called on the class without creating an instance first. A common use is to provide alternative ways of creating instances of your class.

A class method is defined by decorating the method with the `@classmethod` decorator:

In [11]:
import datetime


class Tyre:
    """
    A tyre identified by the four-digit date part (WWYY) of its DOT code.

    The first two digits designate the week and the last two digits the year of production.
    If the code ends in 4714, this means that the tyre was manufactured
    in the 47th week of the year 2014.
    """

    def __init__(self, production_date: datetime.date):
        self.production_date: datetime.date = production_date

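    def __repr__(self) -> str:
        # Use the ISO year together with the ISO week so dates around New Year stay consistent
        iso_year, week, _ = self.production_date.isocalendar()
        year = iso_year % 100  # Last two digits
        # Zero-pad both parts so that, e.g., week 5 of 2009 is shown as 0509
        return f"Tyre manufactured in week {self.production_date.strftime('%Y-%m')} (DOT: {week:02d}{year:02d})"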

    @classmethod
    def from_properties(cls, week: int, year: int):
        # Convert 2-digit year to 4-digit (assumes 20xx)
        full_year = 2000 + year if year < 100 else year
        date = datetime.date.fromisocalendar(full_year, week, 1)
        return cls(date)

    @classmethod
    def from_number(cls, code: str):
        if len(code) != 4:
            raise ValueError("Code must be 4 digits (WWYY)")
        week = int(code[:2])
        year = int(code[2:])
        return cls.from_properties(week, year)

    @classmethod
    def from_dot_code(cls, code: str):
        date_code = code[-4:]
        return cls.from_number(date_code)

In [12]:
tyre1 = Tyre.from_number("4714")
tyre1

Tyre manufactured in week 2014-11 (DOT: 4714)

In [13]:
tyre2 = Tyre.from_properties(47, 14)
tyre2

Tyre manufactured in week 2014-11 (DOT: 4714)

In [14]:
tyre3 = Tyre.from_dot_code("DOT XXXX XXXX 4714")
tyre3

Tyre manufactured in week 2014-11 (DOT: 4714)

In [15]:
tyre4 = Tyre(datetime.date(2024, 12, 1))
tyre4

Tyre manufactured in week 2024-12 (DOT: 4824)

### @property

You can define methods in your class and decorate them with the [`@property` decorator](https://docs.python.org/3/library/functions.html#property) so that they are accessed like attributes, without parentheses.

This is useful to define properties which have to be computed:

In [16]:
class Rectangle:
    def __init__(self, length: float, height: float):
        self.length = length
        self.height = height

    @property
    def area(self) -> float:
        return self.length * self.height

In [17]:
r = Rectangle(3, 10)
r

<__main__.Rectangle at 0x10ace57f0>

Now we don't need to call `r.area()`; we simply access `r.area` instead:

In [18]:
r.area

30
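
A short sketch of two consequences of this design, repeating the `Rectangle` class so the snippet is self-contained: the value is recomputed on every access, and because no setter is defined, assigning to the property raises an `AttributeError`.

```python
class Rectangle:
    def __init__(self, length: float, height: float):
        self.length = length
        self.height = height

    @property
    def area(self) -> float:
        # Recomputed on every access, so it never goes stale.
        return self.length * self.height


r = Rectangle(3, 10)
first_area = r.area   # computed from length=3, height=10
r.length = 5
second_area = r.area  # reflects the new length

# No setter was defined, so assigning to the property fails.
try:
    r.area = 100
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(first_area, second_area, assignment_failed)  # 30 50 True
```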

### Special classes called dataclasses

Sometimes a function needs to accept or return structured data. You might have used dictionaries for this before, but they are not very explicit and are prone to errors, such as a mistyped key.

There is an idiomatic way to do it in Python called [dataclasses](https://docs.python.org/3/tutorial/classes.html#odds-and-ends). Here is one example: 

In [19]:
from dataclasses import dataclass, asdict


@dataclass
class Point:
    x: float
    y: float

The advantage of dataclasses is that they already implement useful functionality on top of normal classes.

For example, both the **constructor** and **stringification** are done by default:

In [20]:
p = Point(3, 4)
p

Point(x=3, y=4)

You can access the fields directly, and you can pass the object to a function:

In [21]:
p.x

3

In [22]:
p.y

4

In [23]:
import math


def vector_length(point: Point) -> float:
    return math.sqrt(point.x**2 + point.y**2)

In [24]:
vector_length(p)

5.0

You can also easily convert this value to a dictionary:

In [25]:
asdict(p)

{'x': 3, 'y': 4}
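
Dataclasses generate more than the constructor and the string representation. The sketch below, which uses a hypothetical `FrozenPoint` class, shows the generated equality comparison, a field with a default value, and the `frozen=True` option that makes instances read-only.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float = 0.0  # fields may declare default values


p1 = FrozenPoint(3, 4)
p2 = FrozenPoint(3, 4)

# The generated __eq__ compares field by field.
points_equal = p1 == p2
default_y = FrozenPoint(1).y

# frozen=True turns attribute assignment into an error.
try:
    p1.x = 10
    mutated = True
except FrozenInstanceError:
    mutated = False

print(points_equal, default_y, mutated)  # True 0.0 False
```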

## Object-oriented nomenclature

From Shaw's *Learn Python The Hard Way* Exercise 41.

### Word Drills

**class** Tell Python to make a new type of thing.

**object** Two meanings: the most basic type of thing, and any instance of some thing.

**instance** What you get when you tell Python to create a new object from a type of class.

**def** How you define a function inside a class.

**self** Inside the functions in a class, self is a variable for the instance/object being accessed.

**inheritance** The concept that one class can inherit traits from another class, much like you and your parents.

**composition** The concept that a class can be composed of other classes as parts, much like how a car has wheels.

**attribute** A property classes have that are from composition and are usually variables.

**method** A function that is defined inside a class and called on its instances.

**is-a** A phrase to say that something inherits from another, as in a "salmon" is-a "fish".

**has-a** A phrase to say that something is composed of other things or has a trait, as in "a salmon has-a mouth".

### Phrase Drills

**class X(Y)** "Make a class named X that is-a Y."

**class X: def __init__(self, J)** "class X has-a `__init__` that takes self and J parameters."

**class X: def M(self, J)** "class X has-a function named M that takes self and J parameters." 

**foo = X()** "Set foo to an instance of class X."

**foo.M(J)** "From foo get the M function, and call it with parameters self, J."

**foo.K = Q** "From foo get the K attribute and set it to Q."
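
The phrase drills map onto code roughly as follows. This is a sketch only: the body of `M` and the default value for `J` are assumptions added so that `X()` can be called without arguments.

```python
class Y:
    pass


class X(Y):                      # "Make a class named X that is-a Y."
    def __init__(self, J=None):  # X has-a __init__ that takes self and J
        self.J = J

    def M(self, J):              # X has-a function named M that takes self and J
        return f"M got {J}"


foo = X()           # set foo to an instance of class X
result = foo.M(42)  # Python passes foo as self automatically
foo.K = "Q"         # set the K attribute on foo

print(result, foo.K, isinstance(foo, Y))  # M got 42 Q True
```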

## Inheritance and composition

### Inheritance

**Inheritance** is used to indicate that one class will get most or all of its features from a parent class. This happens implicitly whenever you write `class Foo(Bar)`, which says "Make a class Foo that inherits from Bar." When you do this, the language makes any action that you do on instances of `Foo` also work as if they were done to an instance of `Bar`. Doing this lets you put common functionality in the `Bar` class, then specialize that functionality in the `Foo` class as needed.

You use inheritance when you want to define objects which have an **is-a** relationship.

For example, observe what happens with the following code, this time executed directly in our IPython notebook. Note that the relationship here makes sense because a cat **is** an animal.

In [26]:
class Animal:
    """Implements an animal."""

    def walk(self):
        print("Walking through the park!")

    def say_hello(self):
        print("Hello!")


class Cat(Animal):
    pass

In [27]:
animal = Animal()
cat = Cat()

In [28]:
animal.walk()
cat.walk()

Walking through the park!
Walking through the park!


The above is called *implicit inheritance*: `Cat` defines nothing of its own, so all of its behavior comes from `Animal`. A child class can also override a function of the parent class by defining a function with the same name (the parent's version stays reachable through `super()`). See the following example with a dog:

In [29]:
class Dog(Animal):

    def say_hello(self):  # overrides the method of the base class
        print("Woof!")

In [30]:
animal.say_hello()

Hello!


In [31]:
dog = Dog()
dog.say_hello()

Woof!
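
A child class can also reuse the parent's version instead of replacing it outright. A minimal sketch, assuming a hypothetical `Puppy` class and methods that return strings rather than print them, uses `super()` to call the parent implementation:

```python
class Animal:
    def say_hello(self) -> str:
        return "Hello!"


class Puppy(Animal):
    def say_hello(self) -> str:
        # Call the parent's version first, then extend its result.
        parent_greeting = super().say_hello()
        return f"{parent_greeting} Woof!"


print(Animal().say_hello())  # Hello!
print(Puppy().say_hello())   # Hello! Woof!
```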


### Composition

**Composition** refers to composing your classes using functions from other classes, rather than relying on implicit inheritance, to arrive at the same result we just saw with inheritance.

Use composition when you want to define objects that have a **has-a** relationship.

For example, a duck **is** an animal, but it **has** wings. In the code below, `Duck` gets its ability to fly from the `Wings` object it holds rather than from a parent class:

In [32]:
class Wings:

    def fly(self):
        print("I believe I can fly!")


class Duck(Animal):
    def __init__(self):
        self.wings = Wings()

In [33]:
duck = Duck()
duck.say_hello()
duck.wings.fly()

Hello!
I believe I can fly!


In [34]:
dog.say_hello()
# A dog cannot fly

Woof!


**Inheritance or composition?** Both inheritance and composition are designed to let you reuse code instead of duplicating it, since duplicated code is unclean and inefficient. Inheritance solves this problem by creating a mechanism for you to have implied features in base classes. Composition solves this by giving you modules and the ability to call functions in other classes.

Use *composition* to package code into modules that are used in many different unrelated places and situations.

Use *inheritance* only when there are clearly related reusable pieces of code that fit under a single common concept or if you have to because of something you’re using.
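
As a sketch of pure composition, here is the "a car has wheels" idea from the word drills. The `Engine`, `Wheel`, and `Car` classes are hypothetical and not part of the lesson's code: `Car` holds its parts as attributes and delegates to them instead of inheriting from them.

```python
class Engine:
    def start(self) -> str:
        return "Engine started"


class Wheel:
    pass


class Car:
    """A car has-a engine and has-a set of wheels; it is not a kind of engine."""

    def __init__(self, wheel_count: int = 4):
        self.engine = Engine()
        self.wheels = [Wheel() for _ in range(wheel_count)]

    def start(self) -> str:
        # Delegate to the part instead of inheriting from it.
        return self.engine.start()


car = Car()
print(car.start())              # Engine started
print(len(car.wheels))          # 4
print(isinstance(car, Engine))  # False
```

Because `Car` only holds an `Engine`, the engine could later be swapped for a different object with a `start()` method without touching the class hierarchy.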

### Polymorphism

[Polymorphism](https://en.wikipedia.org/wiki/Polymorphism_(computer_science)) is the concept that different data types implement the same interface, and are therefore compatible for certain methods.

For example, a function that receives an `Animal` can receive any sub-class while being able to use the methods from the parent class:

In [35]:
def greet_animal(animal: Animal):
    """This works with any Animal subclass - that's polymorphism!"""
    animal.say_hello()

I can now call the function with multiple instances:

In [36]:
greet_animal(cat)

Hello!


In [37]:
greet_animal(dog)

Woof!
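
Polymorphism in Python does not strictly require a shared parent class. The following sketch, with hypothetical `Robot` and `Parrot` classes that return strings instead of printing, relies only on both classes offering the same `say_hello` method; `typing.Protocol` is used to describe that shared interface for type checkers.

```python
from typing import Protocol


class Greeter(Protocol):
    """Anything that offers a say_hello() method returning a string."""

    def say_hello(self) -> str: ...


class Robot:
    def say_hello(self) -> str:
        return "Beep!"


class Parrot:
    def say_hello(self) -> str:
        return "Squawk!"


def greet(thing: Greeter) -> str:
    # Works for any object with a matching say_hello method.
    return thing.say_hello()


greetings = [greet(thing) for thing in (Robot(), Parrot())]
print(greetings)  # ['Beep!', 'Squawk!']
```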


### Some practical examples

#### An astronomy example - defining a base class

We can create a `Planet` class that has attributes `name` and `diameter` and methods `area()` and `volume()`.

In [38]:
import numpy as np

In [39]:
class Planet:

    def __init__(self, name: str, diameter: float | int):
        self.name = name
        self.diameter = diameter

    def __repr__(self) -> str:
        return f"Planet({self.name}, diameter={self.diameter})"

    def area(self):
        return 4 * np.pi * (self.diameter / 2) ** 2

    def volume(self):
        return 4 / 3 * np.pi * (self.diameter / 2) ** 3

In [40]:
# instantiate the class
earth = Planet("Earth", 12742)

In [41]:
earth.name

'Earth'

In [42]:
earth.diameter

12742

In [43]:
earth.area()

510064471.90978825

In [44]:
earth.volume()

1083206916845.7535

In [45]:
# class instantiation also supports keyword arguments
earth = Planet(name="Earth", diameter=12742)

#### An astronomy example - mixing inheritance and composition

We can create a `Moon` class that inherits the attributes and methods of `Planet`, plus an additional attribute for host planet. We do this by creating `Moon` as a subclass of `Planet`, then redefining the `__init__` function with an additional attribute for host planet.

In [46]:
class Moon(Planet):

    def __init__(self, name: str, diameter: float | int, host_planet: Planet):
        super().__init__(name, diameter)  # run the Planet initializer first
        self.host_planet = host_planet

    def __repr__(self) -> str:
        return f"Planet({self.name}, diameter={self.diameter}, hosted_by={self.host_planet.name})"

In [47]:
# instantiate the class
moon = Moon("Moon", 3476, earth)
moon

Planet(Moon, diameter=3476, hosted_by=Earth)

In [48]:
moon.name

'Moon'

In [49]:
moon.diameter

3476

In [50]:
moon.host_planet

Planet(Earth, diameter=12742)

In [51]:
moon.area()

37958531.99804035

In [52]:
moon.volume()

21990642870.864708

#### Built-in functions with classes: A book example

Below is a class called `Book` that takes values for the author, title, and the ID number. 

The class definition uses special ("dunder") methods to control how instances of the class are initialized (`__init__`), how they are informally represented as strings (`__str__`), and how equivalence operations should be evaluated (`__eq__`). It does not define `__repr__`, the official string representation, so Python falls back to the default one.

Note that:

- `__repr__`: Official representation for developers/debugging
- `__str__`: Informal representation for end users

In [53]:
class Book:

    def __init__(self, author: str, title: str, book_id: str):
        self.author = author
        self.title = title
        self.book_id = book_id

    def __str__(self):
        return f"Book({self.author}, {self.title}, {self.book_id})"

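    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Book):
            return NotImplemented  # let Python handle comparisons with other types
        return self.title == other.title and self.author == other.author and self.book_id == other.book_id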

We can instantiate an example book to see how the special methods affect object behavior:

In [54]:
# instantiate the class
iliad = Book("Homer", "The Iliad", "9780140275360")

In [55]:
iliad.author, iliad.title, iliad.book_id

('Homer', 'The Iliad', '9780140275360')

In [56]:
# __repr__ is not defined, so the default representation is shown
iliad

<__main__.Book at 0x10af0ef90>

In [57]:
# repr() also falls back to the default representation
repr(iliad)

'<__main__.Book object at 0x10af0ef90>'

In [58]:
# effect of __str__
str(iliad)

'Book(Homer, The Iliad, 9780140275360)'

In [59]:
# which affects print commands
print(iliad)

Book(Homer, The Iliad, 9780140275360)


In [60]:
# effect of __eq__
iliad == Book("Homer", "The Iliad", "9780140275360")

True

In [61]:
# only if all three are equivalent
iliad == Book("Homer", "The Iliad", "5")

False
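
The `Book` class above does not define `__repr__`, which is why the notebook shows the default `<__main__.Book at 0x...>` form. A hedged sketch of what adding one could look like follows; the exact output format is a choice made for this example, not something the lesson prescribes.

```python
class Book:
    def __init__(self, author: str, title: str, book_id: str):
        self.author = author
        self.title = title
        self.book_id = book_id

    def __repr__(self) -> str:
        # !r inserts the repr of each field, so strings keep their quotes.
        return f"Book(author={self.author!r}, title={self.title!r}, book_id={self.book_id!r})"


iliad = Book("Homer", "The Iliad", "9780140275360")
print(repr(iliad))
# Book(author='Homer', title='The Iliad', book_id='9780140275360')

# Without a __str__ method, str() falls back to __repr__.
print(str(iliad) == repr(iliad))  # True
```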

## Unit testing

Unit tests are a way to make sure your code works by exercising the scenarios in which it might be used. The share of your code that is executed by your tests is called _coverage_. Coverage is usually expressed as a percentage: the proportion of the lines of your code that were run by your tests.

Unit tests specifically target small, testable parts of your code, such as functions.

Software is usually tested to ensure that its functionality is correct. Since Python is an interpreted language, many errors only surface when the code actually runs, so running the code is one of the main ways to check that it works. Tests ensure that:

- Basic functionality of your application works.
- New changes are backward compatible
- New changes don't break old functionality

Here, we are going to use [ipytest](https://jupyter-tutorial.readthedocs.io/de/latest/notebook/testing/ipytest.html). Usually, when running Python with scripts, you want to use [pytest](https://docs.pytest.org/en/stable/) instead, leaving all your tests in a separate folder and running them every time you develop new code.

In [62]:
%pip install ipytest

Note: you may need to restart the kernel to use updated packages.


In [63]:
import ipytest

ipytest.autoconfig()

A test is a function that checks a part of your code. For example:

In [64]:
def add(a: int, b: int) -> int:
    return a + b

Here, we would expect that if we add 1 and 2, we get three, so this will be part of our test:

In [65]:
%%ipytest

def test_add():
    assert add(1, 2) == 3

.                                                                                            [100%]
1 passed in 0.01s


Let's use a more interesting example and revisit the assignment from lesson 02:

> The GCD of two numbers is the largest positive integer that perfectly divides both numbers.
For example, GCD of 48 and 60 is 12, while the GCD of 10 and 7 is 1.
> 
> 1. Create a function called "find_gcd" that takes two numbers as input and outputs the GCD of those two numbers.
> 2. Test with different inputs (try pairs like 48,60 or 54,24 or 10,7)

A [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development) approach means that we first write tests for the functionality we expect, and only then write our code. Let's write our tests first:

```python
def test_gcd_basic_scenarios():
    """Tests cases from the problem description"""
    assert find_gcd(48, 12) == 12
    assert find_gcd(10, 7) == 1

def test_gcd_other_scenarios():
    """Tests additional scenarios from items 1. and 2."""
    assert find_gcd(48, 60) == 12
    assert find_gcd(54, 24) == 6

def test_heavy_calculation():
    """Tests case where we provide a large number"""
    assert find_gcd(43827223, 238429) == 1
```

Now that we have our tests, we can start our function development:

In [66]:
def find_gcd(a: int, b: int) -> int:
    div = 1
    # Include the smaller number itself: it is the GCD when it divides the other one
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            div = i
    return div

Let's now run our tests and make sure they pass:

In [None]:
%%ipytest

def test_gcd_basic_scenarios():
    """Tests cases from the problem description"""
    assert find_gcd(48, 12) == 12
    assert find_gcd(10, 7) == 1

def test_gcd_other_scenarios():
    """Tests additional scenarios from items 1. and 2."""
    assert find_gcd(48, 60) == 12
    assert find_gcd(54, 24) == 6

def test_heavy_calculation():
    """Tests case where we provide a large number"""
    assert find_gcd(43827223, 238429) == 1

The good thing about a test-driven approach is that you can refactor functions to optimize them and then run the tests again to ensure they still work. For example, we can switch the GCD implementation to the faster [Euclidean algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm):

In [None]:
def find_gcd(a: int, b: int) -> int:
    """Calculates the GCD using the Euclidean algorithm"""
    while b:
        a, b = b, a % b
    return a

We can go up and re-run the tests to make sure everything is still working.
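
Outside the notebook, a quick sanity check can also be sketched in plain Python by comparing the function against the standard library's `math.gcd`. The list of input pairs, including the extra pair `(12, 48)`, is an assumption made for this example.

```python
import math


def find_gcd(a: int, b: int) -> int:
    """Calculates the GCD using the Euclidean algorithm"""
    while b:
        a, b = b, a % b
    return a


# Input pairs from the tests above, plus one where the first number divides the second.
cases = [(48, 60), (54, 24), (10, 7), (12, 48), (43827223, 238429)]

# Collect every pair where our result disagrees with the reference implementation.
mismatches = [(a, b) for a, b in cases if find_gcd(a, b) != math.gcd(a, b)]
print(mismatches)  # []
```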

## Structuring your project

There are multiple ways to structure your project, each growing with the needs of the project.

Here is one proposal, taken from [drivendataorg/cookiecutter-data-science](https://github.com/drivendataorg/cookiecutter-data-science). See their [official documentation](https://cookiecutter-data-science.drivendata.org/) for instructions on how to use the template itself.

```
├── .venv              <- Virtual environment for all packages installed (e.g. pandas, numpy, matplotlib)
├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default mkdocs project; see www.mkdocs.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         {{ cookiecutter.module_name }} and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── {{ cookiecutter.module_name }}   <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes {{ cookiecutter.module_name }} a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    │
    └── plots.py                <- Code to create visualizations   
```

They also give some good advice, quoted below. I recommend [reading it in full](https://cookiecutter-data-science.drivendata.org/opinions/):

> - **Raw data is immutable**: This informs the design of the default data/ directory subfolders in which data originates from `raw/` and `external/`, intermediate analytical outputs get serialized or cached in `interim/`, and final products end up in `processed/` (the number or names of these folders is less important than the flow of data between them). 
> - **Notebooks are for exploration and communication, source files are for repetition**: Refactor the good parts into source code.
> - **Keep your modeling organized**: You should implement experiment documentation procedures that enable you to, at minimum, identify the provenance of the data and the version of the code that the experiment used, as well as the metrics used to measure performance.
> - **Encourage adaptation from a consistent default**: A project's organizational needs may differ from the start and can change over time.
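
The data flow described in the first point can be sketched with `pathlib`. The function name `data_dirs` and the project folder `my_project` are hypothetical; this is not the template's own `config.py`, only an illustration of keeping the folder layout in one place.

```python
from pathlib import Path


def data_dirs(project_root: Path) -> dict[str, Path]:
    """Return the data sub-folders of the layout shown above."""
    data = project_root / "data"
    stages = ("raw", "external", "interim", "processed")
    return {stage: data / stage for stage in stages}


dirs = data_dirs(Path("my_project"))

# Raw data is only read; derived files are written to interim/ or processed/.
print(dirs["raw"].as_posix())        # my_project/data/raw
print(dirs["processed"].as_posix())  # my_project/data/processed
```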