## Breaking code into functions

Functions allow code to be reused and help programmers follow the *Don't
Repeat Yourself* (DRY) principle.
They can take inputs (called _parameters_) and perform the same operations
on different values.
Using the `import` command, functions defined in one module can be used in
many other scripts.

Functions can also be used to break code into logical chunks.
Each function should *do one thing*.
Such functions are easier to test than a large program.
As they have a local scope, they are protected from changes to variables
elsewhere in the code.
If they have descriptive names, the code becomes self-documenting and needs
fewer comments.
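
As a minimal sketch of what local scope means in practice, the example below assigns to a name inside a function that also exists at module level. The names `scale` and `double` are made up for illustration and are not part of the lesson's scripts.

```python
scale = 10  # module-level (global) variable


def double(value):
    """Return twice the input value."""
    # This assignment creates a new local variable called `scale`;
    # it does not modify the module-level `scale` defined above.
    scale = 2
    return value * scale


result = double(5)
print(result)  # 10
print(scale)   # 10 (the global value is unchanged by the call)
```

Note that a function can still *read* a global variable if it does not assign to that name, which is how a module-level constant can be shared between functions.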

Functions can be defined in any order and can call other functions.
This allows the main structure of the program to be written at the top
of the file, before the definitions of the functions that it calls.
By convention, this is often called the `main()` function.
The `main()` function provides a high-level summary of how the script works that is easy to
find, like the abstract of a scientific paper.
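
The ordering described above can be sketched as follows. This is an illustrative toy, assuming made-up helper names (`load_values`, `summarise`, `report`). The important point is that `main()` is only *called* after all the functions it uses have been defined.

```python
def main():
    """High-level summary of the script: load, summarise, report."""
    values = load_values()
    total = summarise(values)
    report(total)
    return total


def load_values():
    """Stand-in for reading input; returns fixed example data."""
    return [1, 2, 3]


def summarise(values):
    """Reduce the data to a single number."""
    return sum(values)


def report(total):
    """Print the result."""
    print(f"Total: {total}")


# Every function has been defined by this point, so the call succeeds
total = main()
```

Moving the final call to just below the definition of `main()` would raise a `NameError`, because `load_values` would not yet exist when `main()` ran.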

Below is an example structure of a simple data analysis script.

In [7]:
# process_instrument_data.py

CALIBRATION_COEFF = 1.2345

def process_instrument_data_file(input_file, output_file):
    """
    This is the main function of the module and calls other functions defined below.
    It isn't called `main()` this time, because we expect to import it into other
    code, so it is useful to have a descriptive name.
    """
    raw_data = read_instrument_data(input_file)
    result = process_raw_data(raw_data)
    save_output(result, output_file)


def read_instrument_data(input_file):
    """Read file and extract data."""
    print(f"Reading {input_file}")
    # <insert code here...>
    return (1, 2, 3)


def process_raw_data(data):
    """
    Process data from the instrument (this uses the global calibration
    coefficient).
    """
    print(f"Processing {data}")
    # <insert code here...>
    intermediate_result = _do_some_tricky_processing_step(data)
    return (4, 5, 6)


def _do_some_tricky_processing_step(data):
    """
    Some logically self-contained processing step within the main processing.
    The underscore at the start of the name says this is a "hidden"
    function.  This means that it is only meant for internal use and we wouldn't
    expect other users to import and call it by itself.
    """
    print(f"Intermediate processing step on {data}")
    # <insert code here...>
    return (2, 3, 4)


def save_output(result, output_file, format='png'):
    """
    Save output.  The `format` parameter is optional and has a default
    value of 'png'
    """
    print(f"Saving {result} to {output_file} in {format} format")
    # <insert code here>
    return


# Call the processing function
process_instrument_data_file('my_input_file.txt', 'my_output_file.txt')

Reading my_input_file.txt
Processing (1, 2, 3)
Intermediate processing step on (1, 2, 3)
Saving (4, 5, 6) to my_output_file.txt in png format


### Importing without running the script

When a Python file is imported as a module, the whole file is run from top to
bottom: `def` statements define functions without running their bodies, and any
other top-level (non-indented) code is executed.

Importing `process_instrument_data.py` from above will run the script.
We don't want this if we are trying to import some of the data or functions.

```python
from process_instrument_data import process_raw_data, CALIBRATION_COEFF
```
> Note: the code is only executed on first import. To reimport a module, it
> may be necessary to restart the IPython kernel or use
> [importlib.reload()](https://docs.python.org/3/library/importlib.html#importlib.reload).
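
A minimal sketch of the reload behaviour is shown below. It assumes a throwaway module called `reload_demo` (a hypothetical name) written to a temporary directory, so that the file can be edited between the import and the reload.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module used only for this demonstration
    module_path = Path(tmp) / "reload_demo.py"
    module_path.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    try:
        import reload_demo
        first_value = reload_demo.VALUE

        # Edit the file on disk; the already-imported module is unaffected
        module_path.write_text("VALUE = 'edited'\n")
        stale_value = reload_demo.VALUE

        # reload() re-executes the source within the existing module object
        importlib.reload(reload_demo)
        fresh_value = reload_demo.VALUE
    finally:
        sys.path.remove(tmp)

print(first_value, stale_value, fresh_value)
```

Names imported with `from module import name` before the reload keep pointing at the old objects, which is one reason restarting the kernel is often the simpler option.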

Adding an `if __name__ == '__main__'` block to the code avoids this.  The
indented block will only be executed when the Python file has been called as a
script.

```python
if __name__ == '__main__':
    process_instrument_data_file('my_input_file.txt', 'my_output_file.txt')
```

`__name__` is a special Python variable that holds the name of the current
module.  When a file is imported, this is the module name (e.g.
`process_instrument_data`).  When a file is run as a script, it has not been
imported, so the value is set to `'__main__'`.

This can be seen by adding `print(f"Hi, my __name__ is '{__name__}'")` to the
`process_instrument_data.py` code and running or importing it.
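
The same experiment can be sketched without editing the lesson's script. The example below assumes a throwaway module called `name_demo` with a variable `NAME_SEEN` (both hypothetical names), and uses the standard library's `runpy.run_path` as a stand-in for running `python name_demo.py` from the command line.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module that records the value of __name__ it sees
    module_path = Path(tmp) / "name_demo.py"
    module_path.write_text("NAME_SEEN = __name__\n")

    # Case 1: import it, so __name__ is the module's own name
    sys.path.insert(0, tmp)
    try:
        name_demo = importlib.import_module("name_demo")
    finally:
        sys.path.remove(tmp)
    imported_name = name_demo.NAME_SEEN

    # Case 2: run it as a script, so __name__ is "__main__"
    script_globals = runpy.run_path(str(module_path), run_name="__main__")
    script_name = script_globals["NAME_SEEN"]

print(imported_name)  # name_demo
print(script_name)    # __main__
```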

### Importing our own modules

In [8]:
# Set up paths so that we can use our own modules
from pathlib import Path
import sys

SCRIPT_DIR = Path(r"C:\Users\jostev\github\python-improvers-2\scripts")
sys.path.insert(0, str(SCRIPT_DIR.absolute()))
print(sys.path)

['C:\\Users\\jostev\\github\\python-improvers-2\\scripts', 'C:\\Users\\jostev\\github\\python-improvers-2\\scripts', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\python313.zip', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\DLLs', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\Lib', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2', '', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\Lib\\site-packages', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\Lib\\site-packages\\win32', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\Lib\\site-packages\\win32\\lib', 'c:\\Users\\jostev\\.conda\\envs\\python-improvers-2\\Lib\\site-packages\\Pythonwin']


In [9]:
# Import the Bad file
import plot_climate_data_bad as pcdb

In [10]:
# Import the Good file
import plot_climate_data_good as pcdg

In [11]:
# See what was imported
from pprint import pprint  # print nice representation of variables

print("Bad:")
pprint(dir(pcdb))
print("Good:")
pprint(dir(pcdg))

Bad:
['ProcessFile',
 'StringIO',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'csv',
 'data_Dir',
 'f',
 'filename',
 'os',
 'plt',
 'results_Dir']
Good:
['DATA_DIR',
 'Figure',
 'Path',
 'RESULTS_DIR',
 'StringIO',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_preprocess_metoffice_file',
 'calculate_mean_maximum_temperature',
 'csv',
 'get_station_name',
 'logger',
 'logging',
 'pd',
 'plot_climate_paper_figs_and_csv',
 'plot_max_temp_figure',
 'plot_max_temp_png',
 'plt',
 'read_metoffice_file',
 'write_max_temps_csv_file']


### Question

+ Why does the `bad` file produce lots of text?
+ What do the `__name__`, `__file__`, and `__doc__` variables contain for each module?

In [12]:
# Use the functions as standalone
pcdg.get_station_name(Path('my_stationdata.txt'))

'my_station'

In [14]:
data = pcdg.read_metoffice_file(Path(SCRIPT_DIR) / "exampledata.txt")
data.head()

| date       | tmax | tmin | frost_days | rain_mm | sun   |
|------------|------|------|------------|---------|-------|
| 1957-01-01 | 8.0  | 1.6  | 7.0        | 70.2    | 59.0  |
| 1957-02-01 | 6.9  | 0.1  | 15.0       | 85.7    | 92.0  |
| 1957-03-01 | 10.4 | 4.6  | 0.0        | 38.5    | 78.2  |
| 1957-04-01 | 11.9 | 4.5  | 1.0        | 11.6    | 164.3 |
| 1957-05-01 | 13.2 | 5.6  | 0.0        | 32.1    | 200.7 |


### Exercise

+ Manually calculate the maximum temperature in Leuchars, using the functions from your module.
+ Make an inline plot of the temperature at Leuchars in the notebook with a custom title.

### Tests

Having functions that can be imported allows them to be tested.
The tests can be run via:

```bash
pytest scripts/test_plot_climate_data_good.py
```

By default, pytest looks for functions named `test_*` within files named `test_*.py` (or `*_test.py`) and runs them.
Look at the test file to see how the tests work.
The `assert` statement will raise an `AssertionError` if the condition is not true.
This is caught by `pytest` and used in the test result.
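
As a rough sketch of what such a test can look like, the example below tests a hypothetical stand-in for `get_station_name`. The real implementation in `plot_climate_data_good` may differ. The stand-in only reproduces the behaviour seen earlier, where `my_stationdata.txt` gives `'my_station'`.

```python
from pathlib import Path


def get_station_name(path):
    """Hypothetical stand-in: use the file stem without a trailing 'data'."""
    return Path(path).stem.removesuffix("data")


def test_get_station_name():
    # pytest would collect this function because its name starts with `test_`
    assert get_station_name(Path("my_stationdata.txt")) == "my_station"


# pytest normally calls the test for us; calling it by hand works too and
# raises AssertionError if the check fails
test_get_station_name()
```

In a real project the test function would live in its own `test_*.py` file and import the function under test, rather than defining it alongside the test.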

Writing tests lets you code faster because:

+ Writing code to be easily tested results in good structure, e.g. functions that do one thing
+ You only have to think about a small scope at a time
+ You can refactor your code with confidence

Further exploration of test methods and strategies is beyond the scope of this lesson.

### Exercise

+ "Test Driven Development (TDD)" refers to the practice of writing the tests before you write
 the code.  Use TDD to add a `get_oldest_data` function to the `plot_climate_data_good` module.