# Advanced functions

In [None]:
def add_one(x):
  return x + 1

In [None]:
add_one(10)

We can also define functions whose parameters have default values:

In [None]:
def add_n(x, n=5):
  return x + n

In [None]:
add_n(2)

We can also use named arguments:

In [None]:
add_n(x=1, n=100)

Named arguments can be passed in any order:

In [None]:
add_n(n=100, x=1)

# Unpacking lists, tuples, and dictionaries as function arguments

Python also allows unpacking a list, tuple, or dictionary of arguments when calling a function:

In [None]:
# We use * to unpack elements
list_of_arguments = [1, 100]
add_n(*list_of_arguments)

In [None]:
tuple_of_arguments = (1, 100)
add_n(*tuple_of_arguments)

In [None]:
# We use ** to unpack key-value pairs in dictionaries
dict_of_arguments = {"x": 1, "n": 100}
add_n(**dict_of_arguments)
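
Both unpacking forms can also be combined in a single call. The sketch below redefines `add_n` so that it runs on its own; the names `positional` and `named` are only illustrative.

```python
def add_n(x, n=5):
    return x + n

positional = [1]      # unpacked with *, supplies x
named = {"n": 100}    # unpacked with **, supplies n

result = add_n(*positional, **named)
print(result)  # 101
```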

# Functions that accept any number of elements

When a parameter name is preceded by `*`, the function accepts any number of positional (unnamed) arguments and receives them as a tuple. For example, we can create a function that sums any number of elements:


In [None]:
def add_elements(*args):
    total = 0
    for arg in args:
        total += arg
    return total


In [None]:
add_elements(1, 2, 4)

The argument name `*args` is a naming convention – it’s good practice to follow it.
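
A quick way to see what the function actually receives is to inspect `args` itself. In this minimal sketch, `describe_args` is a hypothetical helper introduced only for illustration.

```python
def describe_args(*args):
    # args is always a tuple, even when no arguments are passed
    return type(args).__name__, len(args)

print(describe_args())             # ('tuple', 0)
print(describe_args(1, "a", 3.0))  # ('tuple', 3)
```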

Similarly, a parameter preceded by `**` collects any number of named (keyword) arguments into a dictionary. For example, we can create a function that returns the minimum of the passed values:

In [None]:
def minimum(**kwargs):
    if len(kwargs) == 0:
        return None
    min_value = float("inf")
    for value in kwargs.values():
        if value < min_value:
            min_value = value
    return min_value


In [None]:
minimum(a=1, b=10, c=-5)

The naming convention recommends using `**kwargs` in this case.

We can create functions that use `*args`, `**kwargs`, and other arguments. For example:

In [None]:
def minimum(x=0, *args, **kwargs):
  smallest = x
  all_args = args + tuple(kwargs.values())
  for value in all_args:
    if value < smallest:
      smallest = value
  return smallest

In [None]:
minimum(1, 22, 2, a=3, b=15)
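
To see how Python distributes the arguments of such a call, it can help to return the three parts unchanged. `show_binding` is a hypothetical function with the same signature as `minimum`, used here only as a sketch.

```python
def show_binding(x=0, *args, **kwargs):
    # return the parameters exactly as Python bound them
    return x, args, kwargs

binding = show_binding(1, 22, 2, a=3, b=15)
print(binding)  # (1, (22, 2), {'a': 3, 'b': 15})
```

The first positional argument goes to `x`, the remaining positional arguments land in `args`, and all named arguments land in `kwargs`.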

In [None]:
# That's not going to work: positional arguments cannot follow keyword arguments (SyntaxError)
minimum(a=3, 1, 22, 2, b=15)

## Fun Fact – Merging Dictionaries

Let’s check the signature of the `dict` function, which creates a dictionary:

In [None]:
dict?

Using the technique we just learned, we can merge two dictionaries:

In [None]:
d1 = {"a": 1, "b": 2}
d2 = {"abc": -1}
d3 = {"b": 13, "abc": -1}

In [None]:
dict(**d1, **d2)

In [None]:
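# Won't work: both dictionaries contain the key "b", so dict() receives the same keyword argument twice (TypeError)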
dict(**d1, **d3)

In [None]:
# This works: for duplicate keys, the value from the later dictionary wins
{**d1, **d3}

In [None]:
# But order matters
{**d3, **d1}
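
As an aside, and assuming Python 3.9 or newer, the `|` operator gives the same result as `{**d1, **d3}`. A minimal sketch:

```python
d1 = {"a": 1, "b": 2}
d3 = {"b": 13, "abc": -1}

merged = d1 | d3    # the right-hand operand wins on duplicate keys
reverse = d3 | d1

print(merged)   # {'a': 1, 'b': 13, 'abc': -1}
print(reverse)  # {'b': 2, 'abc': -1, 'a': 1}
```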

## Exercise

Create a function `filter_records` that filters records in an employee table based on the provided conditions.

In [None]:
employees = [
    {"first_name": "Jan", "last_name": "Kowalski", "position": "Data Engineer"},
    {"first_name": "Jan", "last_name": "Nowak", "position": "Data Analyst"},
    {"first_name": "Janina", "last_name": "Nowak", "position": "Data Engineer"},
    {"first_name": "Anna", "last_name": "Wiśniewska", "position": "Data Scientist"},
    {"first_name": "Piotr", "last_name": "Lewandowski", "position": "Data Scientist"},
    {"first_name": "Katarzyna", "last_name": "Zielińska", "position": "Actuary"},
    {"first_name": "Marek", "last_name": "Kaczmarek", "position": "Actuary"},
    {"first_name": "Magdalena", "last_name": "Wójcik", "position": "Data Scientist"},
]


For example:

`filter_records(employees, first_name="Jan", last_name="Nowak")`

will return:

`[{"first_name": "Jan", "last_name": "Nowak", "position": "Data Analyst"}]`




In [None]:
# PLACEHOLDER FOR SOLUTION

In [None]:
# SOLUTION

def filter_records(employees, **kwargs):
    return [
        employee
        for employee in employees
        if all(employee[key] == value for key, value in kwargs.items())
    ]


In [None]:
filter_records(employees, first_name="Jan", last_name="Nowak")

# Functions that return multiple values

We will create a function that returns the minimum and maximum for the given input arguments:

In [None]:
def min_max(*args):
  return min(args), max(args)

In [None]:
min_, max_ = min_max(1, 2, 3, 4, 5)

In [None]:
min_, max_
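
Strictly speaking, such a function returns a single tuple, which the caller then unpacks. The sketch below redefines `min_max` so that it is self-contained; `lowest` and `highest` are illustrative names.

```python
def min_max(*args):
    return min(args), max(args)

result = min_max(3, 1, 2)
print(type(result).__name__, result)  # tuple (1, 3)

# tuple unpacking assigns each element to its own name
lowest, highest = result
print(lowest, highest)  # 1 3
```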

## Exercise

Create a function that, for any named arguments, returns the name and value of the argument with the lowest value.

In [None]:
# PLACEHOLDER FOR SOLUTION

In [None]:
# SOLUTION
def minimum(**kwargs):
    if len(kwargs) == 0:
        return None
    min_value = float("inf")
    for name, value in kwargs.items():
        if value < min_value:
            min_value = value
            min_name = name
    return min_name, min_value

In [None]:
minimum(a=1, b=25, c=-1)

# Enforcing the use of positional and named function arguments

Let’s analyze the following function definition:

In [None]:
def function(first_name, last_name, /, position, *, company):
    print(first_name, last_name, position, company)

How can we call this function? Let’s try a few ways:

In [None]:
# all arguments positional: fails with TypeError, because company is keyword-only
function("Janina", "Nowak", "Data Engineer", "ASEC")

In [None]:
# all arguments as keywords: fails with TypeError, because first_name and last_name are positional-only
function(first_name="Janina", last_name="Nowak", position="Data Engineer", company="ASEC")


In [None]:
# arguments before / must be positional
# arguments after * must be keyword arguments
# arguments in between can be passed either way
function("Janina", "Nowak", position="Data Engineer", company="ASEC")


In [None]:
# also works:
function("Janina", "Nowak", "Data Engineer", company="ASEC")
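
The calls above can be compared side by side without stopping at the first exception. This is only a sketch: `try_call` is a hypothetical helper, `function` is redefined to return a string instead of printing, and the `/` marker requires Python 3.8 or newer.

```python
def function(first_name, last_name, /, position, *, company):
    return f"{first_name} {last_name}, {position}, {company}"

def try_call(*args, **kwargs):
    # forward the call and report a TypeError instead of raising it
    try:
        return function(*args, **kwargs)
    except TypeError as error:
        return f"TypeError: {error}"

# valid: position passed positionally, company by keyword
print(try_call("Janina", "Nowak", "Data Engineer", company="ASEC"))
# invalid: company is keyword-only
print(try_call("Janina", "Nowak", "Data Engineer", "ASEC"))
# invalid: first_name and last_name are positional-only
print(try_call(first_name="Janina", last_name="Nowak",
               position="Data Engineer", company="ASEC"))
```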

Note:

`/` and `*` can be used together, but they don’t have to be. That is, the following definitions are also correct (but behave differently):

In [None]:
# first_name, last_name must be passed as positional arguments
def function(first_name, last_name, /, position, company):
    print(first_name, last_name, position, company)


In [None]:
# position, company must be passed as keyword arguments
def function(first_name, last_name, *, position, company):
    print(first_name, last_name, position, company)


# Namespaces and nested functions

Docs:
1. https://realpython.com/python-namespaces-scope/

## Local Namespace

To explain this concept, let’s use an example:

In [None]:
input_variable = 1
variable_outside_function = 2

def function(parameter):
    local_variable = parameter
    print("variable_outside_function", variable_outside_function)
    print("local_variable", local_variable)
    # ! This will NOT work if uncommented (UnboundLocalError: the assignment makes the name local to the function)
    # variable_outside_function += 1

variable_outside_function += 1
function(input_variable)

# ! Attempting to access the function's local scope – will NOT work if uncommented
# local_variable


In [None]:
variable_outside_function
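
The commented-out assignment fails because any assignment inside a function makes that name local for the whole function body. A minimal sketch that catches the error, with `counter` and `increment_broken` as illustrative names:

```python
counter = 0

def increment_broken():
    # `counter += 1` assigns to counter, so Python treats it as a local
    # variable, and reading it before it has a value fails
    counter += 1

try:
    increment_broken()
    outcome = "no error"
except UnboundLocalError as error:
    outcome = type(error).__name__

print(outcome)  # UnboundLocalError
print(counter)  # 0
```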

## Global Namespace

We can declare a global variable, which is created in the global namespace. Let’s see how it works with an example:

In [None]:
def function():
  global x
  x = 10
  print(x)


In [None]:
function()

In [None]:
x

Using global variables is not recommended!

## Nested Functions

We can declare nested functions inside other functions.

Let’s use an example – we will declare a function that calculates a bonus based on an employee’s annual review, current salary, and job grade:

In [None]:
def calculate_bonus(annual_review, annual_salary, grade):
    bonus_factor = 1 / 20

    def base_bonus():
        return annual_salary * grade * bonus_factor

    def review_multiplier():
        return 2 if annual_review == 5 else 1

    return base_bonus() * review_multiplier()


In [None]:
calculate_bonus(4, 12 * 10000, 2)

The local scope of variables applies separately at each level of function nesting.
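
A small sketch of what this means in practice, using the hypothetical functions `outer` and `inner`: an assignment in the inner function creates a new local variable and leaves the variable of the enclosing function untouched.

```python
def outer():
    message = "outer"

    def inner():
        message = "inner"  # new local variable, separate from outer's message
        return message

    # call inner first, then read outer's own variable
    return inner(), message

print(outer())  # ('inner', 'outer')
```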

We can use the `nonlocal` keyword to overwrite a variable from an outer namespace (we won’t cover this here).

# Mutable Objects as Function Arguments

Again, let's start with an example:

In [None]:
last_name = "Nowak"

def change_last_name(last_name):
    last_name = "Kowalski"
    return last_name


In [None]:
change_last_name(last_name)

In [None]:
last_name

This time, we will replace the `string` with a `dictionary`:

In [None]:
employee_data = {"first_name": "Jan", "last_name": "Nowak"}

def change_last_name_2(data):
    # update the dictionary
    data["last_name"] = "Kowalski"
    return data


In [None]:
employee_data

In [None]:
change_last_name_2(employee_data)

In [None]:
employee_data

What happens if, instead of updating the dictionary, we create a new one?

In [None]:
employee_data = {"first_name": "Jan", "last_name": "Nowak"}

def change_last_name_2(data):
    # create a new dictionary
    data = {"first_name": "Jan", "last_name": "Kowalski"}
    return data


In [None]:
change_last_name_2(employee_data)

In [None]:
employee_data

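Python's argument-passing mechanism is often described as pass by object reference: the function receives a reference to the same object the caller holds, and that reference itself is passed by value. Immutable objects (e.g., `int`, `str`) cannot be changed in place, so any "modification" creates a new object and the caller's object stays untouched. Mutable objects (e.g., `list`, `dict`) can be changed in place, and such changes are visible to the caller. Assigning a new object to the parameter name inside the function only rebinds the local name (reusing the same name makes no difference); from then on, the function and the caller work with separate objects.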
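
The difference between changing an object in place and rebinding a name can be sketched with two small functions. `mutate` and `rebind` are hypothetical names used only for this illustration.

```python
def mutate(items):
    items.append("new")       # changes the caller's list in place

def rebind(items):
    items = items + ["new"]   # builds a new list; the caller's list is untouched
    return items

original = ["a"]
mutate(original)
print(original)  # ['a', 'new']

rebound = rebind(original)
print(original)             # ['a', 'new']
print(rebound)              # ['a', 'new', 'new']
print(rebound is original)  # False
```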

Additional explanation:
https://python.plainenglish.io/pass-by-object-reference-in-python-79a8d92dc493

Additional resources:

https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

https://www.geeksforgeeks.org/use-mutable-default-value-as-an-argument-in-python/

## Exercise

Write a function that takes two input arguments:

1. A list of employees
2. A new employee

The function should add the new employee to the list without using the `return` statement.

In [None]:
employees = [
    ("Jan", "Kowalski", "Data Engineer"),
    ("Janina", "Nowak", "Data Analyst"),
]
new_employee = ("Katarzyna", "Nowak", "Data Engineer")


In [None]:
# PLACEHOLDER FOR SOLUTION

In [None]:
# @title Hint

def add_employee(employees, new_employee):
    employees.append(new_employee)

add_employee(employees, new_employee)
employees

## A digression on mutability

Docs:
1. https://docs.python.org/3/library/copy.html
2. https://realpython.com/copying-python-objects/

## Puzzle

What will be the contents of the list `basket_a`? What will be the contents of the list `basket_b`?

In [None]:
basket_a = []

basket_b = basket_a

basket_a.extend(["Bread", "Butter", "Cheese"])
basket_b.extend(["Beer", "Instant noodles"])


In [None]:
# @title Spoiler
print(basket_a)
print(basket_b)

Explanation

`basket_a` and `basket_b` point to the same object in memory:

In [None]:
id(basket_a) == id(basket_b)

We want to fix this!

## Puzzle 2

In [None]:
from copy import copy

basket_a = []

basket_b = copy(basket_a)

basket_a.extend(["Bread", "Butter", "Cheese"])
basket_b.extend(["Beer", "Instant noodles"])


In [None]:
# @title Spoiler

print(basket_a)
print(basket_b)

Explanation:

In [None]:
id(basket_a) == id(basket_b)

## Puzzle 3

We have a basket of products:

In [None]:
basket_a = [
    {"name": "Bread", "price": 1.69},
    {"name": "Instant noodles", "price": 3.99},
]

basket_b = copy(basket_a)

# Update the price of the bread:
basket_a[0]["price"] = 2.99


What is the value of `basket_b[0]["price"]`?

In [None]:
# @title Spoiler

basket_b[0]["price"]

`copy` creates a new list object but fills it with references to the same elements (a shallow copy). The lists themselves are different objects:

In [None]:
id(basket_a) == id(basket_b)

Copied references:

In [None]:
id(basket_a[0]) == id(basket_b[0])

How can we deal with this?

In [None]:
from copy import deepcopy

basket_a = [
    {"name": "Bread", "price": 1.69},
    {"name": "Instant noodles", "price": 3.99},
]

basket_b = deepcopy(basket_a)

# Update the price of the bread:
basket_a[0]["price"] = 0.99


In [None]:
basket_b[0]["price"]

This time, the element is a completely new object:

In [None]:
id(basket_a[0]) == id(basket_b[0])

Explanation for this behavior:
1. Saves RAM (memory efficiency)
2. Copying references is faster than creating new objects
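
The same shallow behavior applies to other common ways of copying a list, such as `list.copy()`, slicing, and the `list()` constructor. A minimal sketch, with `basket` as an illustrative variable:

```python
from copy import deepcopy

basket = [{"name": "Bread", "price": 1.69}]

shallow_copies = [basket.copy(), basket[:], list(basket)]
deep = deepcopy(basket)

# each shallow copy is a new list...
print([c is basket for c in shallow_copies])        # [False, False, False]
# ...but it holds references to the same dictionaries
print([c[0] is basket[0] for c in shallow_copies])  # [True, True, True]
# deepcopy also copies the nested dictionaries
print(deep[0] is basket[0])                         # False
```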

## Best Practices for Default Values for Mutable Objects

Let's start with bad practices and the problems they cause:

In [None]:
# WARNING: anti-example!
def add_employee(new_employee, employees=[]):
  print(id(employees))
  employees.append(new_employee)
  return employees

In [None]:
add_employee(("Jan", "Kowalski", "Data Engineer"))

In [None]:
add_employee(("Janina", "Kowalska", "Data Engineer"))

In [None]:
add_employee(("Jan", "Nowak", "Data Analyst"), [
("Paweł", "Nowak", "Data Engineer"),
("Paulina", "Nowak", "Data Engineer"),
])

In [None]:
add_employee(("Zofia", "Mickiewicz", "Manager"))
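
The reason for this behavior is that a default value is created once, when the `def` statement runs, and is stored on the function object. The sketch below repeats the anti-pattern (without the `print`) and inspects the function's `__defaults__` attribute:

```python
def add_employee(new_employee, employees=[]):  # anti-pattern, for illustration only
    employees.append(new_employee)
    return employees

print(add_employee.__defaults__)  # ([],)

add_employee("Jan")
add_employee("Janina")

# the single default list has been modified by both calls
print(add_employee.__defaults__)  # (['Jan', 'Janina'],)
```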

A good practice is to assign the value `None` as the default and then create a new object with the expected value, as in the following example:

In [None]:
def add_employee(new_employee, employees=None):
  print("#1: ", id(employees))
  if employees is None:
    employees = []
  print("#2: ", id(employees))
  employees.append(new_employee)
  return employees

In [None]:
add_employee(("Jan", "Kowalski", "Data Engineer"))

In [None]:
add_employee(("Janina", "Kowalska", "Data Engineer"))

In [None]:
add_employee(("Jan", "Nowak", "Data Analyst"), [
("Paweł", "Nowak", "Data Engineer"),
("Paulina", "Nowak", "Data Engineer"),
])

In [None]:
add_employee(("Zofia", "Mickiewicz", "Manager"))

# Higher-order functions

Docs:
1. https://docs.python.org/3/library/functools.html
2. https://www.geeksforgeeks.org/higher-order-functions-in-python/

Functions in Python are objects. This means we can pass them around just like any other argument!

Let's use a simple example. We're given a list of bonuses:

In [None]:
bonuses = [10000, 20000, 5000, 3000]

We can sort the list as follows:

In [None]:
bonuses.sort()

In [None]:
bonuses

What happens if our list becomes more complicated?

In [None]:
bonuses = [
{"position": "Data Engineer", "bonus": 10000},
{"position": "Data Analyst", "bonus": 20000},
{"position": "Manager", "bonus": 5000},
{"position": "Intern", "bonus": 3000},
]

In [None]:
# Raises TypeError: dictionaries do not support comparison with `<`
bonuses.sort()

The `sort` method accepts an additional parameter, `key`. It takes a function that is called once for each element; the returned values are then used for the comparison:

In [None]:
def bonus_sort_key(bonus):
  return bonus["bonus"]

In [None]:
bonuses.sort(key=bonus_sort_key)

In [None]:
bonuses
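
For short key functions, a `lambda` expression or `operator.itemgetter` is a common alternative to a named function. The sketch below uses a shortened copy of the data and the built-in `sorted`, which returns a new list instead of sorting in place.

```python
from operator import itemgetter

bonuses = [
    {"position": "Data Engineer", "bonus": 10000},
    {"position": "Data Analyst", "bonus": 20000},
    {"position": "Manager", "bonus": 5000},
]

# anonymous key function
by_bonus = sorted(bonuses, key=lambda record: record["bonus"])
# itemgetter("bonus") builds an equivalent key function; reverse=True sorts descending
by_bonus_desc = sorted(bonuses, key=itemgetter("bonus"), reverse=True)

print([r["bonus"] for r in by_bonus])       # [5000, 10000, 20000]
print([r["bonus"] for r in by_bonus_desc])  # [20000, 10000, 5000]
```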

## Exercise

Sort the items in the `bonuses` list by position.

In [None]:
# PLACEHOLDER FOR SOLUTION

In [None]:
# SOLUTION
def position_sort_key(bonus):
  return bonus["position"]

bonuses.sort(key=position_sort_key)

bonuses

# Decorators

Docs:
1. https://realpython.com/primer-on-python-decorators/
2. https://docs.python.org/3/glossary.html#term-decorator


Let's assume we want our functions to have common properties. However, we don't want to duplicate code. A common pattern in such cases is the decorator.

Let's use an example. We want to measure the execution time of the `filter` and `average_age` functions:

In [None]:
data = [
{"name": "Jan", "surname": "Kowalski", "age": 30},
{"name": "Janina", "surname": "Nowak", "age": 25},
{"name": "Zofia", "surname": "Mickiewicz", "age": 17},
]

def filter(data):
  result = []
  for record in data:
    if record["age"] >= 18:
      result.append(record)
  return result

def average_age(data):
  sum = 0
  for record in data:
    sum += record["age"]
  return sum / len(data)

Without using decorators, our code would look like this:

In [None]:
from time import time

start_filter = time()
data = filter(data)
print(time() - start_filter)

start_average_age = time()
result = average_age(data)
print(time() - start_average_age)
print(result)

Let's fix it!

In [None]:
def measure_time(function):
    def wrapper(*args, **kwargs):
        start = time()
        result = function(*args, **kwargs)
        print(time() - start)
        return result
    return wrapper

We could call our function as in the example below:

In [None]:
measure_time(filter)(data)

Or create a new function to avoid duplicate code:

In [None]:
def measure_filter(data):
  return measure_time(filter)(data)

In [None]:
measure_filter(data)

But a more common approach is to decorate the function with `@decorator_name` as in the example below:

In [None]:
@measure_time
def filter(data):
  result = []
  for record in data:
    if record["age"] >= 18:
      result.append(record)
  return result

@measure_time
def average_age(data):
  sum = 0
  for record in data:
    sum += record["age"]
  return sum / len(data)

Let's check how it works:

In [None]:
data = [
{"name": "Jan", "surname": "Kowalski", "age": 30},
{"name": "Janina", "surname": "Nowak", "age": 25},
{"name": "Zofia", "surname": "Mickiewicz", "age": 17},
]

data = filter(data)
result = average_age(data)
print(result)
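
One side effect of a plain wrapper is that the decorated function loses its own name and docstring, because the name now refers to `wrapper`. The sketch below is a variant of `measure_time`, not the version used above: it applies `functools.wraps` to preserve that metadata and swaps in `time.perf_counter`, a clock intended for measuring short durations.

```python
from functools import wraps
from time import perf_counter

def measure_time(function):
    @wraps(function)  # copies __name__, __doc__, etc. from the wrapped function
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = function(*args, **kwargs)
        print(f"{function.__name__}: {perf_counter() - start:.6f} s")
        return result
    return wrapper

@measure_time
def average_age(data):
    """Return the mean age of the records."""
    return sum(record["age"] for record in data) / len(data)

print(average_age.__name__)  # average_age
print(average_age([{"age": 30}, {"age": 20}]))
```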

Example uses of decorators:
1. Caching
2. Logging
3. Authentication
4. Function retries
5. Input argument validation
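
As an illustration of the first item, the standard library already ships a caching decorator, `functools.lru_cache`. In this minimal sketch, `square` is a hypothetical function standing in for an expensive computation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def square(x):
    print(f"computing {x} * {x}")
    return x * x

square(4)  # computed, prints the message
square(4)  # served from the cache, prints nothing

info = square.cache_info()
print(info.hits, info.misses)  # 1 1
```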

## Exercise

Prepare two decorators that will be used to decorate the following function:

```
# no-op (no operation)
def noop(x):
    return x
```

1. Adds `!!!` to the function's return value (at the end)
2. Adds `???` to the function's return value (at the end)

Next, try declaring several variants of the `noop` function (using a different name each time):
1. With decorator #1
2. With decorator #2
3. With decorators #1 and #2 in different orders

Does the order of decoration matter?

In [None]:
# PLACEHOLDER FOR SOLUTION

In [None]:
# SOLUTION

def exclamation(fun):
    def wrapper(*args, **kwargs):
        return fun(*args, **kwargs) + "!!!"
    return wrapper

def question_mark(fun):
    def wrapper(*args, **kwargs):
        return fun(*args, **kwargs) + "???"
    return wrapper

@exclamation
def noop_1(x):
    return x

@question_mark
def noop_2(x):
    return x

@question_mark
@exclamation
def noop_3(x):
    return x

@exclamation
@question_mark
def noop_4(x):
    return x

print("noop_1", noop_1("Hello"))
print("noop_2", noop_2("Hello"))
print("noop_3", noop_3("Hello"))
print("noop_4", noop_4("Hello"))
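
The order matters because stacked decorators are applied from the bottom up: the decorator closest to the function wraps it first. The sketch below repeats the two decorators and writes the stacking out as explicit function calls; `stacked` and `stacked_reversed` are illustrative names.

```python
def exclamation(fun):
    def wrapper(*args, **kwargs):
        return fun(*args, **kwargs) + "!!!"
    return wrapper

def question_mark(fun):
    def wrapper(*args, **kwargs):
        return fun(*args, **kwargs) + "???"
    return wrapper

def noop(x):
    return x

# equivalent to writing @question_mark above @exclamation
stacked = question_mark(exclamation(noop))
# equivalent to writing @exclamation above @question_mark
stacked_reversed = exclamation(question_mark(noop))

print(stacked("Hello"))           # Hello!!!???
print(stacked_reversed("Hello"))  # Hello???!!!
```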
