# Best Practices in Machine Learning and Code Organization

## Motivation

- What does "best practice" even mean?
- How do I know something is a bad practice?

> It's not wrong, but it feels wrong.

## Overview

Best Practices in:
- Machine Learning Code Bases and Versioning
- Code and Module organization and philosophies

## Bad vs. Best Practices in Python

### Repetition

#### Python is not C - so do ***not*** copy-and-paste!

<img src="https://www.gogagah.com/wp-content/uploads/2019/04/Find-the-Difference-1024x576.jpg" width=380>

#### Instead of copying and pasting:

- write functions!
- compose functions!
- create partial functions!

In [1]:
def add(a, b):
    return a + b

In [3]:
from functools import partial

add2 = partial(add, 2)  # Create a copy of add() with a=2

add2(3)

5

In [5]:
add2 = lambda x: add(2, x)

add2(3)

5

In [7]:
def add2(x):
    return add(2, x)

add2(3)

5
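The cells above cover plain and partial functions. Composition can be sketched in the same spirit. The `compose` helper and the `double` function below are hypothetical names introduced for illustration; Python's standard library does not ship a `compose` function, so this is one possible way to build it from `functools.reduce`.

```python
from functools import partial, reduce


def add(a, b):
    return a + b


def double(x):
    return 2 * x


def compose(*functions):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    return reduce(lambda f, g: lambda x: g(f(x)), functions)


add2 = partial(add, 2)                    # add() with a fixed to 2
add2_then_double = compose(add2, double)  # first add 2, then double

result = add2_then_double(3)  # (2 + 3) * 2
print(result)
```

Each building block stays small and testable on its own, and new behavior comes from combining blocks rather than from pasting and editing a copy.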

### Switch Behavior

#### Python had no switch statement before 3.10's `match`, but don't go around stacking ifs:

<img src="https://i.redd.it/6rbq35occu441.jpg" width=300px>

#### Instead of stacking if-else:

- map things with a dictionary!

Dictionaries are hash maps, meaning they map a hashed key to an object.

Since functions are first-class objects in Python, they can be stored as dictionary values and looked up by key!

In [9]:
def add(a, b):
    return a + b

def add_sum(a, b):
    return sum([a, b])

math_functions = {'add': add, 'add_sum': add_sum}

math_functions['add'](2, 2)

4
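To make the contrast concrete, here is a rough side-by-side sketch of a stacked if-else and the dictionary lookup that replaces it. The names `OPERATIONS`, `calculate_with_ifs`, and `calculate` are made up for this example; only the `operator` module functions come from the standard library.

```python
import operator

# Hypothetical operation table: each name maps directly to a callable.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def calculate_with_ifs(name, a, b):
    """The stacked version: every new operation adds another branch."""
    if name == "add":
        return a + b
    elif name == "sub":
        return a - b
    elif name == "mul":
        return a * b
    else:
        raise ValueError(f"Unknown operation: {name}")


def calculate(name, a, b):
    """The dictionary version: a new operation is one new entry."""
    try:
        operation = OPERATIONS[name]
    except KeyError:
        raise ValueError(f"Unknown operation: {name}") from None
    return operation(a, b)


print(calculate("mul", 3, 4))
```

Both functions behave the same for known and unknown names, but the dictionary version keeps the list of supported operations in one place as data.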

### Depth

#### Making too many layers - inheritance, nesting, etc.

<img src="https://preview.redd.it/3kz7f2k1psx41.jpg?width=640&crop=smart&auto=webp&s=0c026807888b4c611089b31c740947bf78b5a3c5" width=400 />

#### Instead keep things shallow

Ask yourself:
- Do I need this class?
  - Will it be instantiated often?
  - Are there many objects inheriting from it?
  - Does it carry state? Otherwise it's a namespace!
- Does this need to be a submodule or a file?
  - Are there many long functions?
  - Are there a large number of private functions?

Singleton Pattern (a single global instance of a class)
- If it does not carry state, it is a namespace
  - In Python, any file is a namespace! No need for the Object or Instance!
- If it just carries state, you want a database
  - Atomicity of operations can be guaranteed with a database
  - A database runs outside of Python's Global Interpreter Lock (GIL)
  - Databases scale better!
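As a minimal sketch of both points, the functions below live at module level (so the file itself is the namespace) and keep their state in a database instead of a singleton object. It assumes SQLite through the standard library `sqlite3` module; the `settings` table and the `set_setting` / `get_setting` helpers are hypothetical names chosen for this example.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a real project would
# more likely point at a file or a database server.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")


def set_setting(key, value):
    """Store a value; the write is committed or rolled back as one unit."""
    with connection:  # opens a transaction, commits on success
        connection.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_setting(key, default=None):
    """Read a value back, falling back to a default for unknown keys."""
    row = connection.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default


set_setting("learning_rate", "0.01")
print(get_setting("learning_rate"))
```

No class is instantiated anywhere: importing the file gives access to the functions, and the database owns the state.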

### Readability

#### Write code - but write it to be read!

<img src="https://i.redd.it/yl1lu031day41.png" width=400 />

#### Code is written to be read

- Documentation
- Type Hinting
- Naming
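A small before-and-after sketch of the three points. Both functions are invented for illustration and do the same thing; the second one assumes Python 3.9 or newer for the built-in generic type hints.

```python
# Hard to read: the names say nothing and there are no docs or types.
def f(d, t):
    return [x for x in d if x[1] > t]


# Easier to read: descriptive names, type hints, and a docstring.
def filter_by_score(
    predictions: list[tuple[str, float]], threshold: float
) -> list[tuple[str, float]]:
    """Keep the (label, score) pairs whose score is strictly above threshold."""
    return [(label, score) for label, score in predictions if score > threshold]


predictions = [("cat", 0.9), ("dog", 0.2)]
print(filter_by_score(predictions, threshold=0.5))
```

The behavior is identical, but a reader of the second version does not need to open the function body to know what goes in and what comes out.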

### Dependencies

#### Sometimes they're too tempting

<img src="https://i.redd.it/mapjfjami3y41.jpg" width=400 />

#### Why?

- Projects get abandoned
  - Lack of security patches
  - Forced to stay with old versions
  - => Your project becomes ancient

Update regularly!
- Fixing small breakages on a regular basis prevents getting stuck on an abandoned version
- Improved performance
- Additional functionality!

### Keep things short

#### The first law of Software Quality

<img src="https://i.redd.it/tozimpm65gy41.jpg" width=350 />

#### Sometimes less functionality is more maintainability

> Each line of code is a credit you take on and interest is paid in time to maintain the base. Don't default on your code debt.

Finding non-critical code:
- Is this functionality used by many?
- Is this code still used or abandoned?
- Is it relevant to the larger goal?

Solving too much code:
- Spin out functionality into a different module
- Simplify the code
- Delete code
- No really, you should delete code

### Use version control

<img src="https://external-preview.redd.it/u1_S5Vu4FztMR72c9pfl086wbmdlZYVjK77i1IEvTjg.jpg?width=640&crop=smart&auto=webp&s=310af21a5b237f4b53a982afc2077fcdb4b1839c" width="400">