<font color='blue'> First of all, please “Copy to Drive” to get your own copy for editing. </font>

<font color='red'> Run all the cells. For places marked "Complete the code below", replace the "XXX" placeholder with your own code.</font>

# More about Functions

...last time we talked about...
* Function basics
* **Lambda functions**: to define simple one-line functions. These functions are usually anonymous (not named) and are often used for small tasks.

In [1]:
# Example: Sort [(1, 2), (2, 0), (4, 1)] based on the 2nd item in the tuple.

sorted([(1, 2), (2, 0), (4, 1)], key=lambda x: x[1]) #based on the 2nd item in the tuple

[(2, 0), (4, 1), (1, 2)]

**Not that basic**:
* Higher Order Functions
* **Variable Number of Parameters**
* **Generators**
* Decorators
* Recursive Functions
* ...

## Higher Order Functions

A higher-order function is a function that **takes another function as an argument**, returns a function as its result, or both.

In [2]:
# Define a higher-order function that applies a given function to each element in a list
def apply_to_list(func, values):
    return [func(x) for x in values]

# Define some basic functions to pass as arguments
def square(x):
    return x ** 2

def double(x):
    return x * 2

# Example usage of the higher-order function
numbers = [1, 2, 3, 4, 5]

squared_numbers = apply_to_list(square, numbers)
doubled_numbers = apply_to_list(double, numbers)

print(f"Squared: {squared_numbers}")  # Output: Squared: [1, 4, 9, 16, 25]
print(f"Doubled: {doubled_numbers}")  # Output: Doubled: [2, 4, 6, 8, 10]


Squared: [1, 4, 9, 16, 25]
Doubled: [2, 4, 6, 8, 10]
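
A higher-order function can also return a function instead of taking one. The sketch below is an illustrative addition; `make_multiplier` and `triple` are hypothetical names that are not used elsewhere in this notebook.

```python
# Hypothetical helper for illustration: builds and returns a new function
def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply

# triple is itself a function, created by calling make_multiplier
triple = make_multiplier(3)

tripled_numbers = [triple(x) for x in [1, 2, 3, 4, 5]]
print(f"Tripled: {tripled_numbers}")  # Output: Tripled: [3, 6, 9, 12, 15]
```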


<font color='red'>Complete the code in the cell below.  </font>

In [3]:
def app_to_list(some_list, func):
    return [func(x) for x in some_list]

def double(x):
    return x * 2

numlist = [2, 4, 6]

# complete the code to get the output: [4,8,12]
print(app_to_list(numlist, double)) # [4,8,12]

[4, 8, 12]


## Variable Number of Arguments
<font color='blue'> When the **number of arguments** that will be passed into a function is **unknown**, `*args` and `**kwargs` can be used to accept a variable number of positional and keyword arguments.</font>
* `*args` collects an arbitrary number of positional arguments. Inside the function, `args` is a tuple.
* `**kwargs` collects an arbitrary number of keyword arguments. Inside the function, `kwargs` is a dictionary.

Example: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

In [4]:
def print_order(*args, **kwargs):
    # Handling positional arguments (*args) when their number is unknown
    print("Items ordered:")
    for item in args:
        print(f"- {item}")

    # Handling keyword arguments (**kwargs)
    print("\nAdditional details:")
    for key, value in kwargs.items():
        print(f"{key}: {value}")

# Example usage
print_order("Pizza", "Pasta", "Salad", name="John", table=5, takeout=False)

Items ordered:
- Pizza
- Pasta
- Salad

Additional details:
name: John
table: 5
takeout: False


<font color='red'>Complete the code in the cell below.  </font>

In [5]:
def team(*args, **kwargs):
    print("Members: ", end="")
    for arg in args:
        print(arg, end=" ")
    print()
    for key, value in kwargs.items():
        print(f'{key}:{value}')
    print()

team("Sally", "Tom", "Adam", Project='TicTacToe', Status="incomplete")

# expected output:
# Members: Sally Tom Adam
# Project:TicTacToe
# Status:incomplete

Members: Sally Tom Adam 
Project:TicTacToe
Status:incomplete



In [6]:
import pandas as pd

# Function to calculate summary statistics for multiple columns
def summary_statistics(df, *args, **kwargs):
    stats = {}

    # Calculate summary stats for each column in *args
    for column in args:
        stats[column] = {
            'mean': df[column].mean(),
            'median': df[column].median(),
            'std': df[column].std(),
            # Apply each extra statistic passed via **kwargs to the column
            **{name: func(df[column]) for name, func in kwargs.items()}
        }

    return pd.DataFrame(stats)

# Example DataFrame
data = {
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 64000, 80000, 72000, 85000],
    'expenses': [1500, 2000, 2200, 2100, 1900]
}

df = pd.DataFrame(data)

# Example usage
# Passing 'age' and 'income' columns with an additional 'min' statistic through **kwargs.
# Each extra statistic is a function that takes a column and returns a single value.
extra_stats = {'min': lambda col: col.min()}
summary = summary_statistics(df, 'age', 'income', **extra_stats)

print(summary)


              age        income
mean    43.400000  70200.000000
median  47.000000  72000.000000
std     14.876155  13827.508814
min     25.000000  50000.000000


As shown in the example above, the function calculates the mean, median, and standard deviation for the `age` and `income` columns, and then dynamically adds the `min` statistic using `**kwargs`.
* `*args`: Allows you to pass any number of columns from the DataFrame for which you want to calculate statistics.
* `**kwargs`: This can be used to dynamically add any additional summary statistics, such as min, max, or custom functions.
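
Another common use of `**kwargs` is forwarding: a wrapper accepts extra keyword arguments and passes them through to another function, which is how `DataFrame.plot` (linked above) hands extra options on to the underlying plotting method. The following is only a minimal sketch of that pattern; `draw` and `plot_sales` are hypothetical functions made up for illustration and are not part of pandas.

```python
# Hypothetical low-level function with several optional settings
def draw(values, color="black", linewidth=1, title=""):
    return f"{title}: {values} drawn in {color} (width {linewidth})"

# Hypothetical wrapper: it does not need to list draw()'s options,
# it simply forwards whatever keyword arguments it receives
def plot_sales(*values, **kwargs):
    return draw(list(values), **kwargs)

result = plot_sales(10, 20, 30, color="red", title="Sales")
print(result)  # Output: Sales: [10, 20, 30] drawn in red (width 1)
```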


<font color='blue'> In Python, you can also place a bare `*` in the parameter list to make every parameter after it keyword-only. This means that, when calling the function, arguments after the `*` must be passed by name.</font>

Example: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

In [7]:
def my_function(a, b, *, c):
    print(a, b, c)

# Example usage:
my_function(1, 2, c=3)  # Output: 1 2 3

1 2 3
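
To see what "keyword-only" means in practice, the sketch below passes `c` positionally and catches the resulting `TypeError`. For illustration, this version of `my_function` returns a tuple instead of printing; the exact wording of the error message may differ between Python versions, so it is printed rather than assumed.

```python
def my_function(a, b, *, c):
    return (a, b, c)

try:
    my_function(1, 2, 3)  # c is passed positionally, which is not allowed
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

print(my_function(1, 2, c=3))  # Output: (1, 2, 3)
```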


`*` (in function calls): Unpacks a sequence into positional arguments.

In [8]:
def add(x, y, z):
    return x + y + z

numbers = [1, 2, 3]
print(add(*numbers))  # unpack a list

# Output: 6

6
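
The same idea works for keyword arguments: in a function call, `**` unpacks a dictionary into keyword arguments. A short sketch, where the dictionary name `values` is chosen only for illustration:

```python
def add(x, y, z):
    return x + y + z

values = {"x": 1, "y": 2, "z": 3}
total = add(**values)  # same as add(x=1, y=2, z=3)
print(total)

# Output: 6
```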


* <font color='red'> `*args`: Collects an arbitrary number of positional arguments into a tuple.</font>
* <font color='red'> `*`: Can be used to enforce keyword-only arguments.</font>
* <font color='red'> `*` (in function calls): Unpacks a sequence into positional arguments.</font>
* <font color='red'> `**kwargs`: Collects an arbitrary number of keyword arguments into a dictionary.</font>

## Generators
Many objects in Python support **iteration**, such as over objects in a list or lines in a file. This is accomplished by means of the iterator protocol, a generic way to make objects iterable.

In [9]:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

a
b
c


When you write `for key in some_dict`, the Python interpreter first attempts to create an iterator out of `some_dict`:

In [10]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7f555a65d350>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a `for` loop. Most functions that expect a list or list-like object will also accept any iterable object. This includes built-in functions such as **`min()`**, **`max()`**, and **`sum()`**, and type constructors like **`list()`** and **`tuple()`**:

In [11]:
list(dict_iterator)

['a', 'b', 'c']
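
Under the hood, a `for` loop repeatedly calls the built-in `next()` on the iterator until it raises `StopIteration`. The sketch below does this by hand on a fresh iterator; the variable names `first`, `second`, `third`, and `exhausted` are illustrative only.

```python
some_dict = {"a": 1, "b": 2, "c": 3}
dict_iterator = iter(some_dict)

# Each call to next() returns the following key
first = next(dict_iterator)
second = next(dict_iterator)
third = next(dict_iterator)
print(first, second, third)  # Output: a b c

# The iterator is now exhausted: another next() raises StopIteration
try:
    next(dict_iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # Output: True
```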

**DEFINITION:**

A **generator** is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used.

<font color='blue'>To create a generator, use the **`yield`** keyword instead of **`return`** in a function:</font>

In [12]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

In [13]:
# When you actually call the generator, no code is immediately executed
gen = squares()
gen

<generator object squares at 0x7f552852d620>

In [14]:
#It is not until you request elements from the generator that it begins executing its code:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

In [15]:
gen2 = squares(12)
for x in gen2:
    print(x, end="|")

Generating squares from 1 to 144
1|4|9|16|25|36|49|64|81|100|121|144|
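
The pausing and resuming can be made visible by stepping through a generator with `next()`. The sketch below repeats the `squares` definition so that it runs on its own, and also shows that a generator can only be consumed once.

```python
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first = next(gen)   # prints the message, then runs up to the first yield
second = next(gen)  # resumes right after the yield and runs to the next one
rest = list(gen)    # consumes whatever is left
again = list(gen)   # the generator is exhausted, so nothing is produced
print(first, second, rest, again)  # Output: 1 4 [9] []
```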

<font color='red'>Complete the code in the cell below. Please replace the "XXX" placeholder with your own code. </font>

In [17]:
def cubes(n=10):
    print(f"Generating cubes from 1 to {n ** 3}")
    for i in range(1, n + 1):
        yield i ** 3

gen3 = cubes()
for item in gen3:
    print(item, end="|")

# expected output:
# Generating cubes from 1 to 1000
# 1|8|27|64|125|216|343|512|729|1000|

Generating cubes from 1 to 1000
1|8|27|64|125|216|343|512|729|1000|

### Generator expressions
Another way to make a generator is by using a generator expression. This is a generator analogue to list, dictionary, and set comprehensions.

In [18]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x7f5527af5d20>

Generator expressions can be used instead of list comprehensions as function arguments in some cases:

In [19]:
sum(x ** 2 for x in range(100)) # Generates squares of numbers without creating a list in memory, then sum().

328350

In [20]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Because a generator expression produces its values one at a time instead of building the whole list first, it uses less memory, and for a large number of elements it can sometimes be meaningfully faster.
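
One reason a generator expression can help is that it never holds all of its values in memory at once. The rough sketch below compares the two forms with `sys.getsizeof`, which measures only the container object itself (not the numbers it refers to), so treat it as an indication rather than a precise measurement.

```python
import sys

squares_list = [x ** 2 for x in range(100_000)]  # all values stored at once
squares_gen = (x ** 2 for x in range(100_000))   # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)  # Output: True

# Both forms give the same total
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)  # Output: True
```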

In [21]:
# A function that adds one to every number
def process_iterator(numbers):
    lst = []
    for number in numbers:
        lst.append(number+1)

    return lst

# Use a generator expression (that generates cubes of numbers up until number 10) as the function argument
process_iterator(i**3 for i in range(11)) # [1, 2, 9, 28, 65, 126, 217, 344, 513, 730, 1001]

[1, 2, 9, 28, 65, 126, 217, 344, 513, 730, 1001]

### itertools module (optional)

The standard library **`itertools`** module has a collection of generators for many common data algorithms. For example, **`groupby`** takes any sequence and a function, and groups consecutive elements in the sequence by the return value of the function.

In [22]:
import itertools
def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is an iterator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']


**EXPLANATION:**

`itertools.groupby(names, first_letter)`: This function takes an iterable (`names` in this case) and groups consecutive elements that share the same value of the key function (`first_letter`). It returns an iterator where each element is a tuple (**key**, **group**), where **key** is the value of the key function for the group (in this case `letter`), and **group** is an iterator that produces the grouped elements. Because only consecutive elements are grouped, `A` appears twice in the output above.
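
To collect every name with the same first letter into one group, a common approach is to sort by the same key before grouping. A minimal sketch, where the dictionary name `grouped` is chosen only for illustration:

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sort by the same key first so that equal keys become adjacent
sorted_names = sorted(names, key=first_letter)

grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}
print(grouped)
# Output: {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```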

**SOME USEFUL ITERTOOLS FUNCTIONS:**

* **`chain(*iterables)`**: Generates a sequence by chaining iterators together. Once elements from the first iterator are exhausted, elements from the next iterator are returned, and so on.

* **`combinations(iterable, k)`**: Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement.

* **`permutations(iterable, k)`**: Generates a sequence of all possible k-tuples of elements in the iterable, respecting order.

* **`groupby(iterable[, keyfunc])`**: Generates `(key, sub-iterator)` for each run of consecutive elements that share the same key.

* **`product(*iterables, repeat=1)`**: Generates the Cartesian product of the input iterables as tuples, similar to a nested `for` loop.
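
A few of these functions in action, as a small sketch on made-up inputs (the variable names are illustrative only). Each call returns a lazy iterator, so `list()` is used to show the results.

```python
import itertools

chained = list(itertools.chain([1, 2], "ab"))
print(chained)  # Output: [1, 2, 'a', 'b']

combos = list(itertools.combinations("abc", 2))
print(combos)   # Output: [('a', 'b'), ('a', 'c'), ('b', 'c')]

perms = list(itertools.permutations("abc", 2))
print(len(perms))  # Output: 6, since ('a', 'b') and ('b', 'a') both count

pairs = list(itertools.product([0, 1], "xy"))
print(pairs)    # Output: [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]
```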