# 03: More about functions

**Author**: Benedikt Goodman\
**Assistants**: Mistral Large, ChatGPT-4


In my opinion, functions are the most important thing to master. They are the key to writing code that is modular and easy to understand, test and maintain. They also open the door to understanding objects, and thereby the code in more or less every Python package. As usual, I have had help from AI in writing this, which is why both the code and the text are in English.

## Outline
1. Function basics
2. Function arguments
    1. Positional arguments
    2. Keyword arguments
    3. Default arguments
    4. Variable-length arguments
3. Return values
4. Function documentation
5. Nested functions
6. Function factories
7. Function decorators
8. Single Responsibility Principle (SRP)
9. Function composition



## A recap of what we covered last time
What are some unique features about functions?

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os

# Will fail, now we'll review how to add packages :) (validate_call requires pydantic v2 or newer)
from pydantic import validate_call


ImportError: cannot import name 'validate_call' from 'pydantic' (/home/jovyan/Git_repos/nr-kurs-python-dypdykk/.venv/lib/python3.10/site-packages/pydantic/__init__.cpython-310-x86_64-linux-gnu.so)

In [1]:
def greet(name):
    return f"Hello, {name}!"

In [2]:
greet('Benedikt')

'Hello, Benedikt!'

### Function arguments

In Python, an `argument` is a value that is passed into a `function` when it is called. Arguments allow you to customize the behavior of a function by providing different input values. When you define a `function`, you specify one or more `parameters` (also called formal parameters) that represent the *input values the function expects to receive*. When you call the function, you provide the actual values (arguments) for those parameters.


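Functions can take several types of arguments:

- `Positional arguments`: Arguments matched to parameters by their position in the call.
- `Keyword arguments`: Arguments passed as `name=value`, so they are matched by parameter name rather than by position.
- `Default arguments`: Parameters given a default value in the definition (like `keyword=None` below), which may be left out of the call.
- `Variable-length arguments`: Parameters that accept an arbitrary number of extra positional arguments (by convention named `*args`) or extra keyword arguments (by convention named `**kwargs`).

NB: In a function definition, `*` and `**` *collect* the extra arguments: `args` becomes a tuple and `kwargs` becomes a dict. At a call site, the same operators do the opposite and *unpack* a sequence or dict into separate arguments.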
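Here is a small sketch of both directions, using a hypothetical function `show`. In the definition, `*` and `**` collect extra arguments. At the call site, they spread a list and a dict back out into separate arguments.

```python
def show(*args, **kwargs):
    # In the definition, * collects extra positional arguments into a tuple
    # and ** collects extra keyword arguments into a dict
    return args, kwargs

numbers = [1, 2, 3]
options = {"sep": ";", "verbose": True}

# At the call site, * unpacks the list and ** unpacks the dict into separate arguments
packed = show(*numbers, **options)
print(packed)  # ((1, 2, 3), {'sep': ';', 'verbose': True})

# Without the stars, the list and the dict are passed as two single objects
print(show(numbers, options))  # (([1, 2, 3], {'sep': ';', 'verbose': True}), {})
```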

In [3]:
def function_example(positional, keyword=None, *args, **kwargs):
    return f"Positional: {positional}, Keyword: {keyword}, Args: {args}, Kwargs: {kwargs}"

In [4]:
# Calling the function with a positional argument
result = function_example("positional_value")
print(result)  # Output: Positional: positional_value, Keyword: None, Args: (), Kwargs: {}

# Calling the function with a keyword argument
result = function_example("positional_value", keyword="keyword_value")
print(result)  # Output: Positional: positional_value, Keyword: keyword_value, Args: (), Kwargs: {}

# Calling the function with additional positional arguments.
# Note: `keyword` is an ordinary parameter with a default, so the first extra value (1) fills it
result = function_example("positional_value", 1, 2, 3)
print(result)  # Output: Positional: positional_value, Keyword: 1, Args: (2, 3), Kwargs: {}

# Calling the function with additional keyword arguments
result = function_example("positional_value", arg1="value1", arg2="value2")
print(result)  # Output: Positional: positional_value, Keyword: None, Args: (), Kwargs: {'arg1': 'value1', 'arg2': 'value2'}

# Calling the function with a mix of argument types (again, 1 ends up in `keyword`)
result = function_example("positional_value", 1, 2, 3, arg1="value1", arg2="value2")
print(result)  # Output: Positional: positional_value, Keyword: 1, Args: (2, 3), Kwargs: {'arg1': 'value1', 'arg2': 'value2'}


Positional: positional_value, Keyword: None, Args: (), Kwargs: {}
Positional: positional_value, Keyword: keyword_value, Args: (), Kwargs: {}
Positional: positional_value, Keyword: 1, Args: (2, 3), Kwargs: {}
Positional: positional_value, Keyword: None, Args: (), Kwargs: {'arg1': 'value1', 'arg2': 'value2'}
Positional: positional_value, Keyword: 1, Args: (2, 3), Kwargs: {'arg1': 'value1', 'arg2': 'value2'}
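The output above shows a common surprise. `keyword` is an ordinary parameter with a default value, so when extra positional values are passed, the first one (`1`) lands in `keyword` instead of in `args`. If you want a parameter that can only be set by name, you can place it after `*args`, which makes it keyword-only. Here is a minimal sketch using a hypothetical variant of the function:

```python
def function_example_kwonly(positional, *args, keyword=None, **kwargs):
    # Parameters placed after *args can only be given by name (keyword-only)
    return f"Positional: {positional}, Keyword: {keyword}, Args: {args}, Kwargs: {kwargs}"

# 1, 2, 3 now all go into args; keyword keeps its default
print(function_example_kwonly("positional_value", 1, 2, 3))
# Positional: positional_value, Keyword: None, Args: (1, 2, 3), Kwargs: {}

# keyword must be passed by name
print(function_example_kwonly("positional_value", 1, keyword="keyword_value"))
# Positional: positional_value, Keyword: keyword_value, Args: (1,), Kwargs: {}
```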


In [5]:
# Lets try it all together
function_example(
    1,  
    'Keyword',
    'These', 
    'are', 
    'args',
    [666],
    {'division': 'national accounts'}, 
    kwarg1='These',
    whatever='are',
    kwarg3='kwargs',
    )

"Positional: 1, Keyword: Keyword, Args: ('These', 'are', 'args', [666], {'division': 'national accounts'}), Kwargs: {'kwarg1': 'These', 'whatever': 'are', 'kwarg3': 'kwargs'}"

## Function Documentation

Using docstrings (triple-quoted strings) immediately after the function definition helps document the function's purpose and usage. Adding type hints to function parameters and return values further enhances the documentation by providing clear information about the expected input and output types.

Type hints can be written with built-in types directly (for example `int | float`, available from Python 3.10) and with Python's standard-library `typing` module, which provides more advanced types and type utilities. Combining well-written docstrings with type hints improves code readability and maintainability, and facilitates better collaboration among developers. Large language models like Mistral, Claude, or ChatGPT-3.5 / ChatGPT-4 are **very** good at writing docstrings and providing type hints for your functions.
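Here is a quick sketch of where this documentation ends up. The docstring and the type hints are stored on the function object itself, and `help()` reads them from there. The function `circle_area` is a simplified, hypothetical stand-in for the examples that follow. In Jupyter, `help(circle_area)` or `circle_area?` shows the same information.

```python
import inspect

def circle_area(radius: float) -> float:
    """Calculate the area of a circle."""
    return 3.14 * (radius ** 2)

# The docstring is stored on the function object
print(circle_area.__doc__)             # Calculate the area of a circle.

# The type hints are stored in __annotations__
print(circle_area.__annotations__)     # {'radius': <class 'float'>, 'return': <class 'float'>}

# inspect.signature shows the parameters and the hints together
print(inspect.signature(circle_area))  # (radius: float) -> float
```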


In [6]:


# Example of a Google-style docstring in a function with type hints
def calculate_area(radius: int | float | np.ndarray):
    """
    Calculate the area of a circle.

    Args:
        radius: The radius of the circle.

    Returns:
        float: The area of the circle.
    """
    return 3.14 * (radius ** 2)


In [7]:
def calculate_area(radius: float | int | np.ndarray) -> float | np.ndarray:
    """
    Calculate the area of a circle.

    Parameters
    ----------
    radius :
        The radius or radii of the circle(s).

    Returns
    -------
    float or ndarray
        The area of the circle(s).

    Examples
    --------
    >>> calculate_area(2.0)
    12.56

    >>> radii = np.array([1.0, 2.0, 3.0])
    >>> calculate_area(radii)
    array([ 3.14, 12.56, 28.26])
    """
    # This pattern of design is known as type-guarding: reject unsupported input early
    if not isinstance(radius, (int, float, np.ndarray)):
        raise TypeError(f'Radius must be int, float or np.ndarray. Argument given was {type(radius)}')
    
    return 3.14 * (radius ** 2)

In [8]:
# Will work; hover the mouse over the function and the documentation will show as well
calculate_area(np.random.randint(0, 10, size=(3,3)))

array([[200.96,   3.14, 200.96],
       [113.04,  50.24, 200.96],
       [153.86,  50.24,  50.24]])

In [9]:
# Will show documentation but raise error
calculate_area([1, 2, 3])


TypeError: Radius must be int, float or np.ndarray. Argument given was <class 'list'>

## Functions can contain functions

...I know we have done this before but it is worth repeating. Do you guys remember what we covered about scoping last time?

In [None]:
def outer_function(variable):
    print(f'This is the variable in the outer function: {variable}')
    
    def inner_function(variable):
        print(f'This is the variable in the inner function: {variable}')
        return variable + variable

    variable = inner_function(variable)
    print(f'This is the variable after triggering the inner function: {variable}')
    
    return variable

# We can call the outer function
outer_function(2)

# But can we call the inner function?
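The answer is no. `inner_function` is a local name inside `outer_function`. It is created each time the outer function runs and is not visible from outside, so calling it directly raises a `NameError`. Here is a simplified sketch with the hypothetical names `outer` and `inner`:

```python
def outer(variable):
    def inner(variable):
        return variable + variable
    # inner is a local name: visible here, inside outer
    return inner(variable)

print(outer(2))  # 4

try:
    inner(2)  # inner only exists in outer's local scope
    inner_reachable = True
except NameError as error:
    inner_reachable = False
    print(f"NameError: {error}")
```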

## Function factories

Functions can be used to make other functions. We often do this when we want to create variations of a function for similar kinds of operations. More in-depth reasons why they can be useful are:


1. `Encapsulation and abstraction`: Factory functions help encapsulate the object creation process, hiding the complexity of creating an object and exposing a simplified interface. This makes it easier to change the implementation details of the object creation process without affecting the client code.

2. `Flexibility and customization`: Factory functions can return different types of objects based on input parameters or configuration settings. This allows for more flexible and customizable object creation, as the specific object type can be determined at runtime.

3. `Simplifying object initialization`: Factory functions can simplify the initialization of objects with complex or verbose constructors. By wrapping the initialization process in a function, you can provide a more user-friendly interface for creating objects.

4. `Resource management`: Factory functions can manage resources more efficiently by reusing existing objects or pooling resources. For example, a factory function can return an object from a cache instead of creating a new one, improving performance and reducing memory usage.

5. `Enforcing constraints`: Factory functions can enforce constraints on object/function creation, such as ensuring that only a single instance of a class is created (singleton pattern) or that objects are created with specific configurations.

The points above highlight how factory functions can be a powerful tool in a developer’s toolkit, particularly for managing complexity and enhancing the flexibility and scalability of software designs. Or... they can help you write concise code which is (hopefully) easier to both test and understand.
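Here is a small illustration of points 2 and 5 using a hypothetical factory, `make_rounder`. It checks its configuration once, when the function is created, and returns a different implementation depending on the `mode` it is given. The names and modes are made up for this example.

```python
import math

def make_rounder(decimals: int, mode: str = "nearest"):
    # Enforce constraints once, at creation time, instead of on every call
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    factor = 10 ** decimals

    # Pick an implementation based on the configuration
    if mode == "nearest":
        def rounder(x: float) -> float:
            return round(x, decimals)
    elif mode == "down":
        def rounder(x: float) -> float:
            return math.floor(x * factor) / factor
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return rounder

round_2 = make_rounder(2)
floor_1 = make_rounder(1, mode="down")
print(round_2(3.14159))  # 3.14
print(floor_1(2.99))     # 2.9
```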

In [None]:
# Instead of returning a variable, the function now returns another function.
def create_logger(level):
    def logger(message):
        print(f"{level}: {message}")
    return logger

# Creating different loggers i.e. different variations of a type of function.
info_logger = create_logger("INFO")
warning_logger = create_logger("WARNING")
error_logger = create_logger("ERROR")

# Using the loggers
info_logger("This is an informational message.")
warning_logger("This is a warning message.")
error_logger("This is an error message.")
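Each logger remembers the `level` it was created with, even though `create_logger` has already returned. That value is kept in the returned function's *closure*. Python lets you inspect it through `__closure__`; this is shown here only to make the idea visible, and you rarely need it in practice.

```python
def create_logger(level):
    def logger(message):
        print(f"{level}: {message}")
    return logger

info_logger = create_logger("INFO")
error_logger = create_logger("ERROR")

# Each returned function carries its own `level` in a closure cell
print(info_logger.__closure__[0].cell_contents)   # INFO
print(error_logger.__closure__[0].cell_contents)  # ERROR
print(info_logger is error_logger)                # False: two separate functions
```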

## Function decorators

Sometimes we want a function to do something to another function, a bit like factories do, but in a way we can apply to any function we define. For this we use what is called a `decorator`. It is sometimes also called a `wrapper`, and the two words are often used interchangeably, although strictly speaking the wrapper is the inner function that the decorator returns. For simplicity's sake I will refer to it as a `decorator`.

### The basic idea of a `decorator`

Think of a decorator as a special wrapper for a present. When you give someone a present, you might put it in a box and then wrap the box in some pretty paper to make it look nice. A decorator in Python does something similar for functions. It takes a function, adds some extra "wrapping" (extra functionality), and then gives it back, still looking like the same function but now with some added features.



In [None]:
# Here we make the decorator which takes in a function and prints stuff before and after triggering it
def hello_goodbye_decorator(func):
    def wrapper():
        print("Hello! (This is the part where you can do things which change the inputs to the function you want to decorate.)")  # This is like saying "Hello!" when you open the present.
        func()  # This is the original function doing what it was supposed to do.
        print("Goodbye! (This is the part where you can do things which change the outputs of a function you want to decorate.) ")  # This is like saying "Goodbye!" after enjoying the present.
    return wrapper

# This is the function we want to decorate
def my_function():
    print('I am the function')

# The decoration of the function
decorated_function = hello_goodbye_decorator(my_function)

# Triggering the function
decorated_function()
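In practice, decorators are usually applied with the `@` syntax, which is shorthand for `my_function = hello_goodbye_decorator(my_function)`. The sketch below is a variation on the decorator above, named `hello_goodbye` to keep it separate. It accepts `*args, **kwargs` so it works for functions with any arguments, and it passes the return value through. It also uses the standard-library `functools.wraps` so the decorated function keeps its original name and docstring.

```python
import functools

def hello_goodbye(func):
    @functools.wraps(func)  # copy __name__, __doc__ etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        print("Hello!")
        result = func(*args, **kwargs)  # call the original with whatever it was given
        print("Goodbye!")
        return result                   # pass the original return value through
    return wrapper

@hello_goodbye  # same as: add = hello_goodbye(add)
def add(a, b):
    """Add two numbers."""
    return a + b

print(add(2, 3))     # prints Hello!, Goodbye!, then 5
print(add.__name__)  # add (it would be 'wrapper' without functools.wraps)
```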

Got it? Let's make a decorator which actually does something useful!

Let’s imagine a theme park called "Giantland," where you must be very old to enter, because in Giantland, everyone ages very, very slowly! Our function will issue an entry pass, and the decorator will make sure everyone's age is at least the minimum we give it (1000 years in the example below). Visitors who are too young are not turned away; the decorator just quietly bumps up their age.

In [None]:
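import functools

def age_requirement_decorator(age_requirement):
    # Outer layer: takes the decorator's own argument (the minimum age)
    def decorator(func):
        # Middle layer: takes the function being decorated
        @functools.wraps(func)  # keep the original function's name and docstring
        def wrapper(*args, **kwargs):
            # Inner layer, runs on every call. Age passed as a keyword, e.g. ("Alice", age=100)
            if 'age' in kwargs and kwargs['age'] < age_requirement:
                print(f"Hmm, you look too young for this place. Let's pretend you're {age_requirement}.")
                kwargs['age'] = age_requirement
            # Age passed positionally (assumed to be the second argument), e.g. ("Alice", 100)
            elif len(args) > 1 and isinstance(args[1], int) and args[1] < age_requirement:
                print(f"Hmm, you look too young for this place. Let's pretend you're {age_requirement}.")
                args = (args[0], age_requirement) + args[2:]
            return func(*args, **kwargs)
        return wrapper
    return decorator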

In [None]:
@age_requirement_decorator(age_requirement=1000)
def issue_giantland_pass(name, age):
    print(f"{name} receives a Giantland pass and is magically {age} years old now!")

In [None]:
issue_giantland_pass("Alice", 100)
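`age_requirement_decorator` has three layers because the decorator itself takes an argument. `@age_requirement_decorator(age_requirement=1000)` first calls the outer function, which returns the actual decorator, which then wraps `issue_giantland_pass`. Here is the same structure in a smaller, hypothetical `repeat` decorator:

```python
import functools

def repeat(times: int):
    # Layer 1: receives the decorator's own argument
    def decorator(func):
        # Layer 2: receives the function being decorated
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Layer 3: runs on every call and collects the result of each repetition
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@repeat(times=3)  # same as: shout = repeat(times=3)(shout)
def shout(word):
    return word.upper()

print(shout("hei"))  # ['HEI', 'HEI', 'HEI']
```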

You might want to create a decorator function when you find yourself applying the same logic or functionality to multiple functions repeatedly. Decorators allow you to encapsulate this common logic in a reusable way, making your code more modular, maintainable, and DRY (Don't Repeat Yourself). It's only really applicable when you are deep into developing source code for objects and functions that are quite complicated. For most use-cases in SSB you typically do not need to concern yourself with decorators. However, it's handy to know what they are so you don't freak out when you see `@validate_call` above a function or a method in an object in the source-code of a library.


Very common uses of decorators are `timer functions` and the `validate_call` decorator from a library called `pydantic`, which adds type-guard functionality to your functions.

In [None]:
from pydantic import validate_call

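# The decorator below validates the arguments against the type hints before the function runs.
# Note: in pydantic's default (lax) mode some values are converted, e.g. the string '3' becomes 3,
# and the return value is only checked if you use @validate_call(validate_return=True)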
@validate_call
def some_function(number: int, other_number: int) -> int:
    return number + other_number

# Will work
some_function(3, 4)

In [None]:
# Pydantic will raise a ValidationError here: 3.14 is a float with a fractional part and cannot be converted to an int
some_function(3.14, 999)
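Timer functions were mentioned above as another common use of decorators. Here is a minimal sketch using only the standard library (`time.perf_counter`). The decorator name `timer` is our own and is not a library function.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # high-resolution clock meant for timing
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result
    return wrapper

@timer
def slow_sum(n: int) -> int:
    return sum(range(n))

total = slow_sum(1_000_000)  # prints something like: slow_sum took 0.0123 seconds
```

The measured time will differ from run to run and from machine to machine.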

## Sidenote: When would you want to apply decorators?

TLDR: Decorators are useful when you need to apply the same logic or functionality to multiple functions. By encapsulating this common logic in a decorator, you can make your code more modular, maintainable, and not repeat yourself a thousand times.

Here are some scenarios where decorators can be useful:

1. `Logging and debugging`: You may want to log the execution time, input arguments, or return values of various functions for debugging or monitoring purposes. Instead of manually adding logging statements to each function, you can create a decorator that handles the logging and apply it to the desired functions.

2. `Authentication and authorization`: In some applications, you may need to restrict access to certain functions based on user permissions or authentication status. A decorator can be used to check these conditions and either allow or deny access to the function.

3. `Caching`: To improve performance, you might want to cache the results of time-consuming functions. A decorator can be used to implement caching, so you don't have to write caching logic for each individual function.

4. `Error handling`: You may want to add custom error handling or fallback behavior to multiple functions. A decorator can encapsulate this error handling logic, making it easy to apply to any function that needs it.

5. `Input validation`: You may want to validate the input arguments of multiple functions to ensure they meet certain criteria. A decorator can be used to perform input validation, so you don't have to write validation code for each function.
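For the caching scenario, Python's standard library already ships a ready-made decorator, `functools.lru_cache`. Here is a small sketch. The function `slow_square` and the `calls` counter are hypothetical and are only there to show when the function body actually runs.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs

@lru_cache(maxsize=None)  # remember the result for every distinct argument
def slow_square(x: int) -> int:
    global calls
    calls += 1
    return x * x

print(slow_square(4), slow_square(4), slow_square(5))  # 16 16 25
print(calls)  # 2: the second call with 4 was served from the cache
print(slow_square.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```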

## The Single Responsibility Principle (SRP) ...or why you should make your functions do one thing

We covered this in the last lecture, but it is really, *really*, **really** important. Bad code usually violates the SRP. It is hard to read, hard to test and hard to maintain. Now, let's write some truly bad code.

In [None]:
def a_truly_shitty_function(df_employees, df_performance):
    plt.figure(figsize=(12, 6))
    df_employees = df_employees[df_employees['Age'] > 30]
    df_employees = df_employees.copy()
    df_performance = df_performance[df_performance['Performance_Score'] > 250]
    df_performance['useless_flag'] = df_performance['Performance_Score'] > 100
    df_employees['Adjusted_Salary'] = np.where(
        df_employees['Years_with_Company'] > 10, 
        df_employees['Salary'] * 1.1 + 1,
        df_employees['Salary'] * 0.9 - 1
    )
    if df_employees['Adjusted_Salary'].mean() > 60000:
        print("Useless Information: Mean salary adjusted!")
        df_employees['Adjusted_Salary'] += np.sin(df_employees['Adjusted_Salary'].mean())
    merged_df = pd.merge(df_employees, df_performance, on='Employee_ID', how='inner')
    merged_df.to_csv('temp_df.csv')
    merged_df = pd.read_csv('temp_df.csv')
    os.remove('temp_df.csv')
    merged_df['Composite_Score'] = merged_df.apply(lambda row: (row['Years_with_Company'] * row['Performance_Score'] / np.sqrt(row['Age'])) if row['useless_flag'] else 0, axis=1)
    if np.random.rand() > 0.99:
        raise Exception("Random error just for extra chaos!")
    if not df_performance[df_performance['useless_flag']].empty:
        print('Completely unnecessary message: All scores are more than 100!')
    plt.subplot(1, 2, 1)
    plt.bar(merged_df['Employee_Name'], merged_df['Adjusted_Salary'], color='blue')
    plt.title('Adjusted Salaries of Employees')
    max_salary = merged_df['Adjusted_Salary'].max()
    avg_composite_score = merged_df['Composite_Score'].mean()
    plt.subplot(1, 2, 2)
    plt.scatter(merged_df['Employee_Name'], merged_df['Composite_Score'], color='red')
    plt.axhline(y=avg_composite_score, color='green', linestyle='--')
    plt.title('Composite Scores with Average Line')
    plt.show()
    return max_salary + np.random.randint(-500, 500), avg_composite_score * np.random.choice([0.98, 1.02])





In [None]:
# Let's trigger this pile of bullshit
data_employees = {
    'Employee_ID': [1, 2, 3, 4, 5],
    'Employee_Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 40, 35, 30, 45],
    'Years_with_Company': [2, 10, 5, 3, 20],
    'Salary': [50000, 75000, 62000, 58000, 95000]
}
df_employees = pd.DataFrame(data_employees)

# Some more data
data_performance = {
    'Employee_ID': [1, 2, 3, 4, 5],
    'Performance_Score': [200, 300, 250, 220, 500]
}


df_performance = pd.DataFrame(data_performance)

# Calling the function
max_salary, avg_composite_score = a_truly_shitty_function(df_employees, df_performance)
print(f"Max Salary: {max_salary}, Average Composite Score: {avg_composite_score}")

And now, let's write code that respects the SRP.

In [None]:
def filter_data(df: pd.DataFrame, column: str, threshold: int) -> pd.DataFrame:
    """
    Filter a dataframe based on the given column and threshold.

    Args:
        df (pd.DataFrame): The dataframe to filter.
        column (str): The column name to filter by.
        threshold (int): The threshold value for filtering.

    Returns:
        pd.DataFrame: The filtered dataframe.
    """
    return df[df[column] > threshold]

def adjust_salaries(df: pd.DataFrame, salary_column: str, years_with_company_column: str, adjustment_factor: float) -> pd.DataFrame:
    """
    Adjust salaries based on years with the company.

    Employees with more than 10 years get a raise of `adjustment_factor`,
    everyone else gets a reduction of the same size.

    Args:
        df (pd.DataFrame): Employees data.
        salary_column (str): Column name for salary.
        years_with_company_column (str): Column name for years with the company.
        adjustment_factor (float): Adjustment factor for salary.

    Returns:
        pd.DataFrame: A copy of the employees data with an 'Adjusted_Salary' column.
    """
    # Work on a copy so the input (possibly a filtered slice) is not modified
    df = df.copy()
    df['Adjusted_Salary'] = np.where(
        df[years_with_company_column] > 10,
        df[salary_column] * (1 + adjustment_factor),
        df[salary_column] * (1 - adjustment_factor)
    )
    return df

def merge_data(df_employees: pd.DataFrame, df_performance: pd.DataFrame, merge_key: str) -> pd.DataFrame:
    """
    Merge employees and performance data on the given key.

    Args:
        df_employees (pd.DataFrame): Employees data.
        df_performance (pd.DataFrame): Performance data.
        merge_key (str): Column name for merging the data.

    Returns:
        pd.DataFrame: Merged employees and performance data.
    """
    return pd.merge(df_employees, df_performance, on=merge_key, how='inner')

def calculate_composite_score(df: pd.DataFrame, composite_score_column: str, years_with_company_column: str, performance_score_column: str, age_column: str) -> pd.DataFrame:
    """
    Calculate composite score based on years with the company, performance score, and age.

    Args:
        df (pd.DataFrame): Merged employees and performance data.
        composite_score_column (str): Column name for composite score.
        years_with_company_column (str): Column name for years with the company.
        performance_score_column (str): Column name for performance score.
        age_column (str): Column name for age.

    Returns:
        pd.DataFrame: A copy of the data with the composite score column added.
    """
    # Work on a copy so the caller's dataframe is not modified
    df = df.copy()
    df[composite_score_column] = df[years_with_company_column] * df[performance_score_column] / np.sqrt(df[age_column])
    return df

def plot_results(df: pd.DataFrame, name_column: str, salary_column: str, composite_score_column: str) -> None:
    """
    Plot adjusted salaries and composite scores of employees.

    Args:
        df (pd.DataFrame): Merged employees and performance data with adjusted salaries and composite scores.
        name_column (str): Column name for employee names.
        salary_column (str): Column name for adjusted salaries.
        composite_score_column (str): Column name for composite scores.
    """
    plt.figure(figsize=(12, 6))

    plt.subplot(1, 2, 1)
    plt.bar(df[name_column], df[salary_column], color='blue')
    plt.title('Adjusted Salaries of Employees')

    plt.subplot(1, 2, 2)
    plt.scatter(df[name_column], df[composite_score_column], color='red')
    plt.axhline(y=df[composite_score_column].mean(), color='green', linestyle='--')
    plt.title('Composite Scores with Average Line')

    plt.show()
    



In [None]:
# Filter employees and performance data
df_employees_filtered = filter_data(df_employees, 'Age', 30)
df_performance_filtered = filter_data(df_performance, 'Performance_Score', 250)

# Adjust salaries of filtered data
df_employees_adjusted = adjust_salaries(
    df_employees_filtered, 'Salary', 'Years_with_Company', 0.1
)

# Merge together and make composite score
merged_df = merge_data(df_employees_adjusted, df_performance_filtered, 'Employee_ID')
final_df = calculate_composite_score(
    merged_df, 'Composite_Score', 'Years_with_Company', 'Performance_Score', 'Age'
)

plot_results(final_df, 'Employee_Name', 'Adjusted_Salary', 'Composite_Score')
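One payoff of the SRP is that each small function can be tested on its own with a tiny, hand-made dataframe. Below is a sketch of such a test for `filter_data`, repeated here so the cell runs on its own; the test name and data are made up. With `pytest` installed, you would typically put tests like this in a `test_*.py` file and run `pytest` from the terminal.

```python
import pandas as pd

def filter_data(df: pd.DataFrame, column: str, threshold: int) -> pd.DataFrame:
    """Keep rows where `column` is strictly greater than `threshold`."""
    return df[df[column] > threshold]

def test_filter_data_keeps_only_rows_above_threshold():
    df = pd.DataFrame({"Age": [25, 30, 45], "Name": ["A", "B", "C"]})
    result = filter_data(df, "Age", 30)
    # 30 is not strictly greater than 30, so only the 45-year-old remains
    assert result["Name"].tolist() == ["C"]
    # The input dataframe itself is left untouched
    assert len(df) == 3

test_filter_data_keeps_only_rows_above_threshold()  # runs silently if it passes
```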

## Function composition

Remember how functions can call other functions? This comes in handy when we want to combine several smaller functions into something bigger. This is called function composition: the output of one function becomes the input of the next.

In [None]:
def double(x):
    """Double the input number."""
    return x * 2


def square(x):
    """Square the input number."""
    return x**2


def double_then_square(x):
    """Compose the double and square functions."""
    return square(double(x))


# Test the composed function
result = double_then_square(3)
print(result)  # Output: 36 (double(3) = 6, square(6) = 36)
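Writing a new function for every combination gets repetitive, so a small helper can build the composition for you. Below is a sketch of a hypothetical `compose` helper, which is our own and not part of the standard library, built with `functools.reduce`. It applies the functions right to left, like mathematical composition.

```python
from functools import reduce
from typing import Callable

def compose(*funcs: Callable) -> Callable:
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        # Apply the last function first, then work backwards through the list
        return reduce(lambda acc, f: f(acc), reversed(funcs), x)
    return composed

def double(x):
    return x * 2

def square(x):
    return x ** 2

double_then_square = compose(square, double)
print(double_then_square(3))  # 36 (double(3) = 6, square(6) = 36)
```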


Let's apply the same idea to the pandas example from earlier. Note that this makes our code a bit more verbose, and normally you would do this inside a module you have designed. To the user, however, it looks like we do a lot with very little code. Now let's compose some of our earlier functions.

In [None]:
def filter_and_adjust_salaries(
    df_employees: pd.DataFrame,
    age_column: str,
    salary_column: str,
    years_with_company_column: str,
    adjustment_factor: float,
    age_threshold: int,
) -> pd.DataFrame:
    # Filter employees data based on the age threshold
    filtered_df = filter_data(df_employees, age_column, age_threshold)

    # Adjust salaries based on years with the company
    adjusted_df = adjust_salaries(
        filtered_df, salary_column, years_with_company_column, adjustment_factor
    )

    return adjusted_df


def merge_and_calculate_composite_score(
    df_employees: pd.DataFrame,
    df_performance: pd.DataFrame,
    merge_key: str,
    composite_score_column: str,
    years_with_company_column: str,
    performance_score_column: str,
    age_column: str,
) -> pd.DataFrame:
    
    # Merge employees and performance data
    merged_df = merge_data(df_employees, df_performance, merge_key)

    # Calculate composite scores based on years with the company, performance score, and age
    calculated_df = calculate_composite_score(
        merged_df,
        composite_score_column,
        years_with_company_column,
        performance_score_column,
        age_column,
    )

    return calculated_df


def filter_merge_and_calculate_composite_score(
    df_employees: pd.DataFrame,
    df_performance: pd.DataFrame,
    merge_key: str,
    age_column: str,
    performance_score_column: str,
    composite_score_column: str,
    years_with_company_column: str,
    age_threshold: int,
    performance_threshold: int,
) -> pd.DataFrame:
    # Filter employees data based on the age threshold
    filtered_employees_df = filter_data(df_employees, age_column, age_threshold)

    # Filter performance data based on the performance score threshold
    filtered_performance_df = filter_data(
        df_performance, performance_score_column, performance_threshold
    )

    # Merge filtered employees and performance data
    merged_df = merge_data(filtered_employees_df, filtered_performance_df, merge_key)

    # Calculate composite scores based on years with the company, performance score, and age
    calculated_df = calculate_composite_score(
        merged_df,
        composite_score_column,
        years_with_company_column,
        performance_score_column,
        age_column,
    )

    return calculated_df


In [None]:
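# This is what the user would see (keyword arguments make the long call easier to read)
filter_merge_and_calculate_composite_score(
    df_employees,
    df_performance,
    merge_key="Employee_ID",
    age_column="Age",
    performance_score_column="Performance_Score",
    composite_score_column="Composite_Score",
    years_with_company_column="Years_with_Company",
    age_threshold=30,
    performance_threshold=250,
)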


#### The pipe function in pandas - Another way of doing composition on dataframes
We'll cover this later, but I'm showing it to you now so you can ask your favourite LLM to help you do something similar. Basically, pandas dataframes have a method called `.pipe()` that lets us chain functions together without creating lots of intermediate dataframes on the way to the end result. It can make for some really concise code, but should be used with caution: name your functions wisely before using it, so it's clear to the reader what you're doing.

To chain functions this way, each function should take a dataframe (or pandas series) as its first argument and return a dataframe (or series), so the next `.pipe()` can be called on the result. More info and examples here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html

In [None]:
df_employees_filtered = filter_data(df_employees, 'Age', 30)
df_performance_filtered = filter_data(df_performance, 'Performance_Score', 250)

final_df = (
    adjust_salaries(df_employees_filtered, 'Salary', 'Years_with_Company', 0.1)
    .pipe(merge_data, df_performance_filtered, 'Employee_ID')
    .pipe(calculate_composite_score, 'Composite_Score', 'Years_with_Company', 'Performance_Score', 'Age')
    )

plot_results(final_df, 'Employee_Name', 'Adjusted_Salary', 'Composite_Score')
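If a function expects the dataframe in some parameter other than the first, `.pipe()` also accepts a `(function, 'parameter_name')` tuple and then passes the dataframe as that keyword argument. Here is a self-contained sketch with a hypothetical helper `add_bonus` and made-up data:

```python
import pandas as pd

def add_bonus(rate: float, data: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: the dataframe is NOT the first parameter here
    out = data.copy()
    out["Bonus"] = out["Salary"] * rate
    return out

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [50000, 75000]})

result = (
    df
    .pipe((add_bonus, "data"), 0.1)        # df is passed as data=df, 0.1 as rate
    .pipe(lambda d: d[d["Bonus"] > 6000])  # lambdas work too; df is the first argument
)
print(result)
```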