# Writing Your Own Functions

<a href="https://colab.research.google.com/github/bradleyboehmke/uc-bana-4080/blob/main/example-notebooks/17_functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook accompanies [this textbook chapter](https://bradleyboehmke.github.io/uc-bana-4080/18-functions.html) and allows you to run the code examples interactively.

## Prerequisites

In [None]:
import numpy as np
import pandas as pd
from completejourney_py import get_data

df = get_data()['transactions']

## When to write functions

In [None]:
# array containing 4 sets of 10 random numbers
x = np.random.random_sample((4, 10))

x[0] = (x[0] - x[0].min()) / (x[0].max() - x[0].min())
x[1] = (x[1] - x[1].min()) / (x[1].max() - x[1].min())
x[2] = (x[2] - x[2].min()) / (x[1].max() - x[2].min())  # copy-paste slip: x[1].max() should be x[2].max()
x[3] = (x[3] - x[3].min()) / (x[3].max() - x[3].min())

In [None]:
x = np.random.random_sample((4, 10))

def rescale(array):
    for index, vector in enumerate(array):
        array[index] = (vector - vector.min()) / (vector.max() - vector.min())

    return array

rescale(x)
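
Note that `rescale()` above overwrites the rows of the array it is given. As a minimal sketch of the same min-max idea that leaves its input untouched, the hypothetical helper `rescale_rows()` below works on plain Python lists and returns a new list. The name, the list-based input, and the choice to map a constant row to zeros are all assumptions made for illustration.

```python
def rescale_rows(rows):
    """Return a new list of rows, each min-max rescaled to the range [0, 1]."""
    rescaled = []
    for row in rows:
        low, high = min(row), max(row)
        span = high - low
        # assumption: a constant row has no spread, so map it to zeros
        # instead of dividing by zero
        rescaled.append([(value - low) / span if span else 0.0 for value in row])
    return rescaled

data = [[2, 4, 6], [10, 20, 40], [5, 5, 5]]
result = rescale_rows(data)

print(result[0])  # first row rescaled
print(data[0])    # the original row is unchanged
```

Returning a new object rather than modifying the argument makes it safe to call the function repeatedly on the same data.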

## Functions vs methods

In [None]:
# standalone function: the built-in sum() adds the rows element-wise (column sums)
sum(x)

In [None]:
# method
x.sum(axis=0)

In [None]:
# overall sum
x.sum()

In [None]:
# sum of each column
x.sum(axis=0)

In [None]:
# sum of each row
x.sum(axis=1)
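
The same function-versus-method distinction shows up in plain Python. As a small illustrative sketch (the list `scores` is made up), `sorted()` is a function that receives the list as an argument and returns a new list, while `.sort()` is a method attached to the list object that sorts it in place and returns `None`.

```python
scores = [3, 1, 2]

# function: called on its own, takes the list as an argument, returns a new list
ordered = sorted(scores)
unchanged = list(scores)  # snapshot showing the original order is still intact

# method: called on the object itself, sorts in place and returns None
returned = scores.sort()

print(ordered, unchanged, scores, returned)
```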

## Defining functions

In [None]:
def yell(text):
    new_text = text.upper()
    return new_text

yell('hello world!')

In [None]:
def store_sales(data, store, week):
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

store_sales(data=df, store=309, week=48)

## Parameters vs arguments

In [None]:
# positional arguments: store sales for store 46 during week 43
store_sales(df, 46, 43)

In [None]:
# swapped positional arguments: silently computes sales for store 43 (does not exist) during week 46
store_sales(df, 43, 46)

In [None]:
# keyword arguments: order no longer matters, store 46 during week 43
store_sales(data=df, week=43, store=46)

In [None]:
def store_sales(data, store, week, qty_greater_than=0):
    filt = (data['store_id'] == store) & (data['week'] == week) & (data['quantity'] > qty_greater_than)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

# you do not need to specify an input for qty_greater_than
store_sales(data=df, store=309, week=48)

In [None]:
# but you can if you want to change it from the default
store_sales(data=df, store=309, week=48, qty_greater_than=2)

In [None]:
def yell(*args):
    new_text = ' '.join(args).upper()
    return new_text

yell('hello world!', 'I', 'love', 'Python!!')

In [None]:
# **kwargs collects any keyword arguments into a dictionary
def students(**kwargs):
    print(kwargs)

students(student1='John', student2='Robert', student3='Sally')

In [None]:
# we can use this dictionary however necessary
def print_student_names(**kwargs):
    for key, value in kwargs.items():
        print(f'{key} = {value}')

print_student_names(student1='John', student2='Robert', student3='Sally')
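
`*args` and `**kwargs` can also appear together after regular parameters. The sketch below uses a hypothetical `summarize_order()` function (not part of the chapter) to show which arguments land where, and how `*` and `**` can unpack an existing list or dictionary at the call site.

```python
def summarize_order(customer, *items, **options):
    # items is a tuple of the extra positional arguments
    item_text = ', '.join(items)
    # options is a dictionary of the extra keyword arguments
    option_text = ', '.join(f'{key}={value}' for key, value in options.items())
    return f'{customer}: [{item_text}] ({option_text})'

summary = summarize_order('Sally', 'milk', 'eggs', gift_wrap=True, rush=False)
print(summary)

# unpack an existing list and dictionary into the call
basket = ['milk']
settings = {'rush': True}
unpacked = summarize_order('John', *basket, **settings)
print(unpacked)
```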

In [None]:
def some_function(name, age):
    return f'{name} is {age} years old'

some_function('Tom', 27)

In [None]:
def some_function(name: str, age: int) -> str:
    return f'{name} is {age} years old'

some_function('Tom', 27)

In [None]:
help(some_function)
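
Type hints document intent, but Python does not enforce them at run time. A minimal sketch, using a hypothetical `describe()` function with the same signature as `some_function()`:

```python
def describe(name: str, age: int) -> str:
    return f'{name} is {age} years old'

# the hint says int, but passing a string still runs without error
result = describe('Tom', 'twenty-seven')
print(result)

# the hints are stored on the function and can be inspected
hints = describe.__annotations__
print(hints)
```

If type checks are required, they have to come from explicit validation inside the function or from an external static type checker.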

In [None]:
def store_sales(data: pd.DataFrame, store: int, week: int) -> float:
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

store_sales(data=df, store=309, week=48)

## Docstrings

In [None]:
def store_sales(data: pd.DataFrame, store: int, week: int) -> float:
    """
    Compute total store sales.

    This function computes the total sales for a given
    store and week based on a user supplied DataFrame that
    contains sales in a column named `sales_value`.

    Parameters
    ----------
    data : DataFrame
        Pandas DataFrame
    store : int
        Integer value representing store number
    week : int
        Integer value representing week of year

    Returns
    -------
    float
        A float object representing total store sales

    See Also
    --------
    store_visits : Computes total store visits

    Examples
    --------
    >>> store_sales(data=df, store=309, week=48)
    395.6
    >>> store_sales(data=df, store=46, week=43)
    60.39
    """
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales
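
The `Examples` section of a docstring can double as a lightweight test. The sketch below assumes a hypothetical `add_tax()` function (its name and default rate are made up for illustration) and uses the standard-library `inspect` and `doctest` modules to read the docstring and run its examples.

```python
import doctest
import inspect

def add_tax(price: float, rate: float = 0.07) -> float:
    """
    Compute a price including sales tax.

    Parameters
    ----------
    price : float
        Pre-tax price
    rate : float
        Tax rate as a fraction (hypothetical default of 0.07)

    Returns
    -------
    float
        Price including tax, rounded to two decimals

    Examples
    --------
    >>> add_tax(100.0)
    107.0
    >>> add_tax(10.0, rate=0.5)
    15.0
    """
    return round(price * (1 + rate), 2)

# first line of the cleaned docstring, as shown at the top of help()
summary = inspect.getdoc(add_tax).splitlines()[0]
print(summary)

# run the examples embedded in the docstring
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add_tax, globs={'add_tax': add_tax}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```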

## Errors and exceptions

In [None]:
#| error: true

def store_sales(data: pd.DataFrame, store: int, week: int) -> float:
    # argument validation
    if not isinstance(data, pd.DataFrame): raise Exception('`data` should be a Pandas DataFrame')
    if not isinstance(store, int): raise Exception('`store` should be an integer')
    if not isinstance(week, int): raise Exception('`week` should be an integer')

    # computation
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

store_sales(data=df, store='309', week=48)

In [None]:
#| error: true

def store_sales(data: pd.DataFrame, store: int, week: int) -> float:
    # argument validation
    if not isinstance(data, pd.DataFrame): raise TypeError('`data` should be a Pandas DataFrame')
    if not isinstance(store, int): raise TypeError('`store` should be an integer')
    if not isinstance(week, int): raise TypeError('`week` should be an integer')

    # computation
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

store_sales(data=df, store='309', week=48)

In [None]:
store_sales(data=df, store=35, week=48)

In [None]:
def store_sales(data: pd.DataFrame, store: int, week: int) -> float:
    # argument validation
    if not isinstance(data, pd.DataFrame): raise TypeError('`data` should be a Pandas DataFrame')
    if not isinstance(store, int): raise TypeError('`store` should be an integer')
    if not isinstance(week, int): raise TypeError('`week` should be an integer')
    if store not in data.store_id.unique():
        raise ValueError(f'`store` {store} does not exist in the supplied DataFrame')


    # computation
    filt = (data['store_id'] == store) & (data['week'] == week)
    total_sales = data['sales_value'][filt].sum()
    return total_sales

store_sales(data=df, store=35, week=48)

In [None]:
def apply_discount(product, discount):
    price = round(product['price'] * (1.0 - discount), 2)
    assert 0 <= price <= product['price']
    return price

In [None]:
# 25% off 3.50 should equal 2.62
milk = {'name': 'Chocolate Milk', 'price': 3.50}
apply_discount(milk, 0.25)

In [None]:
# 200% discount is not allowed
apply_discount(milk, 2.00)

In [None]:
def apply_discount(product, discount):
    price = round(product['price'] * (1.0 - discount), 2)
    assert 0 <= price <= product['price'], 'Invalid discount applied'
    return price

apply_discount(milk, 2.00)
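
Assertions are skipped when Python runs with the `-O` flag, so they are best kept for internal sanity checks. For validating user-supplied input, an explicit `raise` is a common alternative. The sketch below is one possible variant, with the hypothetical name `apply_discount_checked()`, that validates the discount up front.

```python
def apply_discount_checked(product, discount):
    # validate the input explicitly rather than asserting on the result
    if not 0 <= discount <= 1:
        raise ValueError(f'`discount` must be between 0 and 1, got {discount}')
    return round(product['price'] * (1.0 - discount), 2)

milk = {'name': 'Chocolate Milk', 'price': 3.50}
valid_price = apply_discount_checked(milk, 0.25)

try:
    apply_discount_checked(milk, 2.00)
except ValueError as error:
    message = str(error)

print(valid_price)
print(message)
```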

In [None]:
# this discount is created somewhere else in the program
discount = 2

# if the discount triggers the assertion, clamp it to a valid value and retry
try:
    apply_discount(milk, discount)
except AssertionError:
    if discount > 1: discount = 0.99
    if discount < 0: discount = 0
    apply_discount(milk, discount)

In [None]:
try:
    store_sales(data=df, store=35, week=48)
except TypeError:
    print('do something specific for a `TypeError`')
except ValueError:
    print('do something specific for a `ValueError`')
except Exception:
    print('do something specific for all other errors')

In [None]:
try:
    store_sales(data=df, store=35, week=48)
except TypeError:
    raise
except ValueError:
    raise
finally:
    print('Code to close database connection')
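
A `try` statement can carry four kinds of clauses, and the order in which they run is easy to mix up: `except` runs only when a matching exception is raised, `else` runs only when no exception is raised, and `finally` always runs. The hypothetical `safe_divide()` function below records which clauses execute.

```python
def safe_divide(numerator, denominator):
    steps = []
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # runs only when the division fails
        steps.append('except')
        result = None
    else:
        # runs only when the try block raised nothing
        steps.append('else')
    finally:
        # always runs, with or without an exception
        steps.append('finally')
    return result, steps

ok = safe_divide(10, 2)
failed = safe_divide(10, 0)
print(ok)
print(failed)
```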

## Scoping

In [None]:
x = 84

def func(x):
    return x + 1

func(x=50)

In [None]:
x

In [None]:
y = 'Boehmke'

def func(x):
    return x + ' ' + y

func(x='Brad')

In [None]:
y = 'Boehmke'

def my_name(sep):
    x = 'Brad'
    def my_paste():
        return x + sep + y
    return my_paste()

my_name(sep=' ')

In [None]:
y = 8451

def convert(x):
    x = str(x)
    firstpart, secondpart = x[:len(x)//2], x[len(x)//2:]
    global y
    y = firstpart + '.' + secondpart
    return y

convert(8451)

In [None]:
y
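
For contrast, assigning to a name inside a function without the `global` statement creates a new local variable and leaves the global one alone. A small sketch with made-up names:

```python
counter = 0

def set_local():
    counter = 100  # new local name; the global counter is untouched
    return counter

def increment_global():
    global counter  # refer to the module-level name instead
    counter += 1
    return counter

local_value = set_local()
after_local = counter       # still the original global value

global_value = increment_global()
after_global = counter      # the global has now changed

print(local_value, after_local, global_value, after_global)
```

Because `global` lets a function silently change state elsewhere in the program, returning a value and reassigning it at the call site is usually easier to reason about.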

## Anonymous functions

In [None]:
nums = [48, 6, 9, 21, 1]

list(map(lambda x: x ** 2, nums))

In [None]:
(
    df['sales_value']
    .apply(lambda x: 'high value' if x > 10 else 'low value')
)

In [None]:
(
    df[['basket_id', 'sales_value', 'quantity']]
    .groupby('basket_id')
    .apply(lambda x: (x['sales_value'] / x['quantity']).mean())
)
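
A lambda is simply a function without a name, so anywhere a lambda is accepted a regular `def` function works too. The sketch below sorts a made-up list of products by price, first with a lambda and then with an equivalent named function.

```python
products = [('milk', 3.50), ('bread', 2.25), ('eggs', 4.10)]

# anonymous function used once as the sort key
by_price = sorted(products, key=lambda item: item[1])

# equivalent named function
def price_of(item):
    return item[1]

by_price_named = sorted(products, key=price_of)

print(by_price)
print(by_price == by_price_named)
```

A named function is usually the better choice once the logic grows beyond a single short expression or needs a docstring.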

## Exercise: Practicing Function Writing and Application

In this exercise set, you’ll practice defining and applying custom Python functions, using type hints and docstrings, and leveraging methods like `.apply()` to work with real-world data. These tasks will help solidify your understanding of functions and how to use them in data cleaning, feature engineering, and exploratory analysis workflows.

## 1. Load and Inspect the Data

Download the [`companies.csv` dataset](https://github.com/bradleyboehmke/uc-bana-4080/blob/main/data/companies.csv) and load it into a DataFrame. This dataset contains company names and financial attributes.

Inspect the first few rows. What columns are available?

## 2. Define the `is_incorporated()` Function

Write a function `is_incorporated(name)` that checks whether the input string `name` contains the substring `"inc"` or `"Inc"`. If either appears in the name, return `True`; otherwise return `False`.

Test it using a few sample strings like:

```python
is_incorporated("Acme Inc.")
is_incorporated("Global Tech")
```

## 3. Add Type Hints and a Docstring

Now update your `is_incorporated()` function to include:

* A **type hint** for the `name` parameter and the return type
* A **docstring** describing what the function does, the input parameter, and the return value

Use the `help()` function or hover in your IDE to verify the documentation.

## 4. Apply the Function with a Loop

Use a `for` loop to iterate through the `Name` column of the `companies` DataFrame. For each value, call your `is_incorporated()` function and print the company name along with whether it's incorporated.

Your output might look like:

```
Acme Inc. → True  
Global Tech → False  
Bright Inc. → True
```

## 5. Apply the Function with `.apply()`

Now rewrite your logic using the `.apply()` method instead of a `for` loop.

* Apply `is_incorporated()` to the `Name` column
* Store the result in a new column called `"is_incorporated"`
* Print the updated DataFrame to verify the new column