# Synopsis

In this unit we will learn that:

> **Modular** code is more readable, easier to maintain, and less prone to bugs.
>
> **Refactoring** code, that is, rewriting and reorganizing it, is a critical part of developing modular code.

Functions are the building blocks of modular code. They enable a programmer to avoid repeating lines of code across a project.

>    1. Descriptive function names increase code readability.
>    
>    2. Appropriate documentation of a function makes it easier to avoid logical errors.


# Import libraries

In [2]:
%load_ext autoreload
%autoreload 2

import datetime
import json

from colorama import Fore, Style
from pathlib import Path

# Functions

**Writing modular code is good!**

**Functions** are the workhorses of modular programming in Python! So, what's a function?

You were actually exposed to functions when you filled in your answers to the homework questions **inside** of a function structure. So whenever you see this syntax:

```
def function_name():
    
    # Your statements
    
    return something
```

that block of code is a function. 

Functions help us avoid repeating the same set of statements every time we want to perform a task. They increase code readability. They make revising and updating code easier: you do not have to redo a revision in every place in your code where the task is needed. And they make testing your code easier and more reliable.

In order to execute the code in a function, you use the syntax `function_name()`. 

**If you do not "call" your function in your code, then it is never executed.**  However, the Python interpreter will still check its code for syntax errors.
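The difference between errors Python catches as soon as it reads a definition and errors that only appear when the function runs can be seen in a short sketch. The names `uses_missing_name`, `undefined_variable`, and `broken_source` are made up for this illustration, and `compile()` is used only so that the syntax error can be caught instead of stopping the notebook.

```python
def uses_missing_name():
    # Defining this function succeeds: undefined_variable is only looked up
    # when the body actually runs, so the NameError appears at call time.
    return undefined_variable + 1


try:
    uses_missing_name()
except NameError as err:
    print('Raised only when called:', err)

# A syntax error, by contrast, is reported as soon as Python reads the
# definition, even if the function is never called. compile() runs that
# check on a string of source code, so we can catch the error here.
broken_source = "def broken():\n    return (1 +\n"
try:
    compile(broken_source, '<example>', 'exec')
except SyntaxError as err:
    print('Caught before any call:', err.msg)
```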


<br>

<br>

## No inputs

Let's write a really simple function -- a function that "prints hello".


In [None]:
def print_hello():
    '''
    Prints the word "Hello"
    
    input:
        - None
    output:
        - None
    '''
    print('Hello!\n')
    


In [None]:
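# print_hello() runs first and prints "Hello!"; since it has no return
# statement, it returns None, which print() then shows in red
print(Fore.RED, print_hello(), Style.RESET_ALL)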

You just wrote a simple function! 

Notice that when you ran the cell that defines the function, nothing was printed. That is because defining a function does not *call* it. You only defined it, so that `Python` will know what on earth you're talking about whenever you write `print_hello()`.

You *call* a function just by writing its name along with the parentheses:

In [None]:
print_hello()
print('----')
help(print_hello)

<br>

## With inputs

Let's now write a more exciting function that allows for inputs. 

In [None]:
def greeting(name = '!'):
    '''
    Prints "Hello {name}!"
    
    input:
        name - str, default value is '!'
    output:
        - None
    '''
    my_string = f"Hello {name}!"        
    print(my_string)     
    
    return    

In [None]:
greeting('Mary')

greeting()

greeting(3)

a_name = 'Joe'
greeting(a_name)

<br>

Functions can take in inputs. The variable names used in the code that calls a function do not carry over into the function. The local variable names used inside the function definition are only valid within the scope of that definition.

The collection of names that live in this scope is called a **namespace**.

This compartmentalization of variable names makes it much easier to avoid errors caused by accidentally reusing the name of a variable defined in some other part of the code, possibly one we are not even aware of.


In [None]:
who

In [None]:
# my_string only existed inside greeting(), so this raises a NameError
print(my_string)

<br>

<br>

This is an amazingly useful property of **namespaces**. We do not need to keep track of the variables defined within the functions we call, because as soon as a function finishes executing, all the variables defined inside it are discarded. Thus, we do not need to keep inventing new variable names.
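A minimal sketch of this behavior, using the made-up names `shout`, `text`, `loud`, and `result`: the parameter `text` inside the function is unrelated to the global `text` outside it, and `loud` is gone once the call is over.

```python
def shout(text):
    # 'text' and 'loud' live only in shout()'s local namespace
    loud = text.upper() + '!'
    return loud


text = 'outer value'        # a global that happens to share the parameter's name
result = shout('hello')

print(result)               # HELLO!
print(text)                 # outer value  (the global was never touched)
print('loud' in globals())  # False: the local variable vanished after the call
```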

# Namespaces

Generally speaking, a **namespace** (sometimes also called a context) is a naming system for making names unique to avoid ambiguity. A (not very good) namespace system used in daily life is the naming of people with a first name and a surname. A much better namespace system is the directory structure of file systems: the same file name can be used in different directories, and each file can still be uniquely accessed via its full path.

Many programming languages use **namespaces** or contexts for identifiers. An identifier defined in a **namespace** is associated with that **namespace**. This way, the same identifier can be independently defined in multiple **namespaces**, just as the same file name can appear in different directories.
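Python's own modules are a small illustration of this. The standard-library modules `math` and `cmath` both define a function named `sqrt`; because each module is a separate namespace, the module name tells the two apart, much like a directory name in a file path.

```python
import cmath
import math

# Both modules define a function called sqrt. Each module is its own
# namespace, so the two names never clash.
print(math.sqrt(4))     # 2.0
print(cmath.sqrt(-4))   # 2j

try:
    math.sqrt(-4)       # the real-valued version rejects negative input
except ValueError as err:
    print('math.sqrt:', err)
```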

**Note that in Python we can also define variables that are not limited to a single function's `namespace`. Variables defined at the top level of a module (or of a notebook) are `global` variables: they can be read from inside any function defined in that same module or notebook.**




In [5]:
special_text = 'Saying it back at you:'

def nothing_strange(input_string):
    '''
    Adds a special_text to a string

    input:
        input_string - str
        
    output:
        output - str
    '''    
    print(f"--The internal value of special_text is\n'{special_text}'\n")
    
    output = f"{special_text} {input_string.upper()}.\n\n"

    return output

In [6]:
print( nothing_strange('hello') )

print(f"--The external value of special_text is\n\t'{special_text}'\n")

--The internal value of special_text is
'Saying it back at you:'

Saying it back at you: HELLO.


--The external value of special_text is
	'Saying it back at you:'



In [7]:
who

Fore	 NamespaceMagics	 Path	 Style	 datetime	 get_ipython	 json	 nothing_strange	 special_text	 
sys	 


In the above code, we defined a global variable `special_text`, which was accessed inside the function `nothing_strange()`. 

**This is very dangerous!** The function's output now depends on a variable that is defined outside of it. Most of the time, one does not read the code inside every function one calls, let alone all of the code in a large project.

> What if somewhere else in the code someone changes the value of `special_text`? 

In general, **it is not good practice to rely on global variables**.

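Notice that if you assign a value to `special_text` anywhere inside the function, Python treats that name as a **local** variable throughout the whole function, not as a reference to the global variable. The assignment creates a new local `special_text` that hides the global one, and the global value is left unchanged, as the output of the next cell shows. If the function tried to *read* `special_text` before assigning it, Python would raise an `UnboundLocalError`.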


In [8]:
special_text = 'Saying it back to you:'

def nothing_strange(input_string):
    '''
    Adds a special_text to a string

    input:
        input_string - str
        
    output:
        output - str
    '''    
    special_text = Fore.RED + 'Bet you were not expecting this, were you?'
                     
    # print(f"--The internal value of special_text is\n'{special_text}'\n")    
    
    output = f"{special_text} {input_string.upper()}\n\n"

    return output

In [9]:
print( nothing_strange('hello') )

print(f"--The external value of special_text is\n\t'{special_text}'\n")

Bet you were not expecting this, were you? HELLO


--The external value of special_text is
	'Saying it back to you:'



# Computational thinking 

A simple example is one thing, but it can be hard to figure out how to use functions in **real** code. 

In general, **it is good practice to write functions that do only one task**. If a function you write turns out to do two or more tasks, it is worth considering whether to split it into smaller functions.

**If each of your functions does a single task, it will be easier to give it a specific name that refers to the action it performs**: find, print, calculate, connect, extract...

Function names can also be used as a device to sketch out what our program intends to do. To make this clear, let's go back to your administrative job from **notebook 7** and work with the `Roster` data.

Imagine that we want to *calculate the ages of all the people in the Roster folder*. **How would we accomplish this?**


## Sketching out a function


In order to write code, you have to think about how to break down complex goals into smaller, simpler, well-defined tasks. Let's break the goal of finding out the age of every person into smaller tasks.
There are 800 data files in the `Data/Roster` folder.

**Pseudo-code**

>    set relevant inputs: path to folder with data and date at which age is calculated
>    
>    get list of paths to record files
>    
>    iterate over paths to record files:
> >
> >  read file and parse record
> >
> >  extract birthday from record
> >
> >  calculate age at desired date


Each task in this `pseudo-code` suggests a function that implements it. Our pseudo-code also makes it clear how we could add further calculations to our program, or take actions based on the values we calculate.

Let's start by setting the relevant inputs, and then consider the next task in our pseudo-code: *get list of paths to record files*.


In [11]:
# Set relevant inputs
#
folder_path = Path.cwd() / 'Data' / 'Roster'
print(folder_path)
print()

relevant_date = datetime.date(2024, 9, 1)
relevant_date

/Users/amaral/Dropbox/Code_Development/COURSES/Amaral_Lab_Intro_to_Data_Science/Module_An_Intro_to_Python/Data/Roster



datetime.date(2024, 9, 1)

Since we want our code to be readable, we will call the function `get_path_to_records()`.

There are three basic questions that help you sketch a function:

> What does this function do?
>
> What inputs does this function need to do that?
>
> What does the function return?

These questions need to be answered before you start writing any code. Writing down a function's name, inputs, and output, together with a placeholder body that returns a value of the right type, is called **stubbing**, i.e., creating a code **stub**.

They are also the questions that someone planning to use your code later will want answered. For this reason, this is the information you **must** provide in the function's `docstring`.

**What task is the function accomplishing?**

The student files in our project are all in a single directory and all end with the `.txt` extension. Thus, we should write a function that returns the paths of all the files with the `.txt` extension in a given directory.

**What inputs does the function require?**

The function needs a directory as input and, to make it more general, a pattern describing which file names to keep (for example, `'*.txt'`).

**How should the information returned be formatted?**

Our function is collecting the paths to the student records, and our pseudo-code tells us that we will iterate through the elements of that collection. Putting these facts together, a good option for the output of our function is a list of `Path` objects, one per record file.

Let's implement these decisions!

In [14]:
def get_path_to_records( folder_path, pattern = '*.txt' ):
    """
    Return all roster filenames in directory
    
    input:
        folder_path - Path object to the directory that contains the roster 
                      files
        pattern - str for filtering file names (default is *.txt)
        
    output:
        my_paths - list of Paths to record files in folder
    """
    my_paths = []
    
    # Statements here!
    
    return my_paths

<br>

<br>

We can also go ahead and stub out the other functions we will need.

In [15]:
def parse_record( filename ):
    """
    
    """
    record = {}
    
    # Statements here!
    
    return record


In [16]:
def extract_birthdate( date_string ):
    """
    
    """
    birthdate = datetime.date( 1970, 1, 1 )
    # Statements here!
    
    return birthdate

In [17]:
def calculate_age( birthdate, relevant_date ):
    """
    
    """
    age = 0
    
    # Statements here!
    
    return age
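As a preview of where these stubs are heading, here is one possible way to fill in `calculate_age()`, assuming both inputs are `datetime.date` objects. It is a sketch, not a reference solution; other conventions (for example, for birthdays on February 29) are possible.

```python
import datetime


def calculate_age(birthdate, relevant_date):
    """
    Return a person's age in whole years on a given date

    input:
        birthdate - datetime.date of birth
        relevant_date - datetime.date at which the age is calculated
    output:
        age - int, number of completed years
    """
    # Start from the difference in calendar years...
    age = relevant_date.year - birthdate.year

    # ...and subtract one if the birthday has not yet come around this year.
    # Comparing (month, day) tuples does this in a single step.
    if (relevant_date.month, relevant_date.day) < (birthdate.month, birthdate.day):
        age -= 1

    return age


print(calculate_age(datetime.date(2000, 9, 1), datetime.date(2024, 9, 1)))   # 24
print(calculate_age(datetime.date(2000, 9, 2), datetime.date(2024, 9, 1)))   # 23
```

With this rule, someone born on February 29 turns a year older on March 1 in non-leap years.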

## Putting all the stubs together...

A very cool thing is that these stubs let us turn our pseudo-code into real code and run it! Of course, the code does not return anything useful yet, but it lets us test the overall structure as we go along.

In [20]:
my_paths = get_path_to_records( folder_path )
print(my_paths[:2])

records = []
ages = []
for filename in my_paths[:2]:
    record = parse_record(filename)
    records.append(record)
    
    # dict.get() takes the key itself; a list is not a valid dictionary key
    date_string = record.get('Date of Birth')
    birthdate = extract_birthdate( date_string )
    age = calculate_age(birthdate, relevant_date)
    ages.append(age)
        
records[:2]

[]


[]

## Adding details to a stubbed function

With separate functions, we can develop and test each one without having to process the entire data set every time. Let's start by finding all the records.

A powerful capability of the notebook is that it enables us to write and test a piece of code in a single cell until we are confident that it does what we want. We can then copy that code into the cell where we want to define the function.

**Get going!**


In [None]:
def get_path_to_records( folder_path, pattern = '*.txt' ):
    """
    Return all roster filenames in directory
    
    input:
        folder_path - Path object to the directory that contains the roster 
                      files
        pattern - str for filtering file names
        
    output:
        my_paths - list of Paths to roster files in directory
    """
    my_paths = []
    
    # Statements here!
    
    return my_paths
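One possible way to fill in the body, assuming the record files sit directly inside `folder_path` (not in subfolders), uses `Path.glob()` from the standard library. Treat it as a sketch to compare against your own solution.

```python
from pathlib import Path


def get_path_to_records( folder_path, pattern = '*.txt' ):
    """
    Return all roster filenames in directory
    
    input:
        folder_path - Path object to the directory that contains the roster 
                      files
        pattern - str for filtering file names
        
    output:
        my_paths - list of Paths to roster files in directory
    """
    # Path.glob() yields the entries directly inside folder_path whose names
    # match the pattern (it does not look inside subfolders).
    # Keep regular files only, and sort so the order is the same on every OS.
    my_paths = sorted( p for p in folder_path.glob(pattern) if p.is_file() )
    
    return my_paths
```

Called as `get_path_to_records(folder_path)`, it should return one `Path` per `.txt` file in `Data/Roster`, so `len(my_paths)` is a quick sanity check against the number of files you expect.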

# Creating a library

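Since we will be developing all these functions, it makes sense to organize them in such a manner that we can easily access them from any notebook or script.

To this end, we will create a file named `records_lib.py` in the current working folder.

For now, we will place all the stubs in that file, and we will update them there as we develop our code. Note that a `.py` file is its own namespace: it needs its own `import` statements (for example, `import datetime` for `extract_birthdate()`), because the imports made in this notebook are not visible inside it.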

In [22]:
!cat records_lib.py

from pathlib import Path


#############################################################################################
def get_path_to_records( folder_path, pattern = '*.txt' ):
    """
    Return all roster filenames in directory
    
    input:
        folder_path - Path object to the directory that contains the roster 
                      files
        pattern - str for filtering file names
        
    output:
        my_paths - list of Paths to roster files in directory
    """
    my_paths = []
    
    # Statements here!
    
    return my_paths


#############################################################################################
def parse_record( filename ):
    """
    
    """
    record = {}
    
    # Statements here!
    
    return record


#############################################################################################
def extract_birthdate( date_string ):
    """
    
    """
    birthdate = datetime.date( 1970, 1, 1 )
    # Statements here!
 

<br>
<br>

To use those functions, we just need to import them with an `import` statement:

In [23]:
from records_lib import get_path_to_records

<br>

Instead of continuing to work in this notebook, we will now switch to the **dictionaries notebook**, because the variable `record` is going to be a `dictionary`.

[Next lesson](nb_10_Dictionaries.ipynb)