# Functions in Python
A recap on functions

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Commonly used Docstring formats:

Function docstrings usually follow one of these two formats:

1. ### Google-style:

    **Description**: Short info of what the function does. <br>
    **Args**: The name and type of each parameter, and whether it is optional.<br>
    **Returns**: Optional description of the returned value (whether it's a sequence, string, DataFrame, etc.) <br>
    **Raises**: Info on error types that the function intentionally raises.<br>
    **Notes**: Additional notes or links (optional)

2. ### Numpydoc:

    **Description**: Short info of what the function does. <br>
    **Parameters**: The name, type, and description of each parameter. <br>
    **Returns**: The type and description of the returned value. <br>
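
The cells below only use the Numpydoc layout, so for comparison here is a minimal sketch of a Google-style docstring. The function `safe_divide` and its parameters are hypothetical, made up purely to illustrate the section layout.

```python
def safe_divide(numerator, denominator, default=None):
    """Divide two numbers, optionally falling back to a default.

    Args:
        numerator (float): Value to be divided.
        denominator (float): Value to divide by.
        default (float, optional): Returned instead of raising when
            `denominator` is zero. Defaults to None, meaning "raise".

    Returns:
        float: The quotient, or `default` when division is not possible.

    Raises:
        ZeroDivisionError: If `denominator` is zero and no default is given.
    """
    if denominator == 0:
        if default is None:
            raise ZeroDivisionError("denominator is zero and no default was given")
        return default
    return numerator / denominator


print(safe_divide(10, 4))               # 2.5
print(safe_divide(1, 0, default=0.0))   # 0.0
```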


In [2]:
# An example function to split a dataframe into two halves based on columns
def split_stack(df, new_names):
    """
    Split a DataFrame into two column-halves and stack them vertically.

    Parameters
    ----------
    df : pandas.DataFrame
        Input DataFrame whose columns will be split at the midpoint.
    new_names : sequence
        Column names for the returned DataFrame. Should match the number of columns
        in each half after splitting (typically len(df.columns)//2).

    Returns
    -------
    pandas.DataFrame
        A new DataFrame produced by vertically stacking the left half followed by
        the right half of the original DataFrame, with columns renamed to `new_names`.

    Notes
    -----
    - If df has an odd number of columns, the left half will contain floor(n/2)
      columns and the right half will contain the remainder.
    - The function stacks the underlying values (np.vstack) — index is reset in the result.
    """
    halfpoint = int(len(df.columns)/2)
    half1 = df.iloc[:,:halfpoint]
    half2 = df.iloc[:,halfpoint:]
    return pd.DataFrame(data=np.vstack([half1.values,half2.values]),
                        columns=new_names)

In [3]:
# Defining a sample dataframe
data = {
    'EmployeeID': [101, 102, 103, 104, 105, 106],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan', 'Fiona'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Marketing', 'Finance'],
    'Salary': [55000, 62000, 75000, 72000, 58000, 63000],
}

# Convert to DataFrame
employees = pd.DataFrame(data)

In [4]:
len(employees.columns)

4

In [5]:
employees

Unnamed: 0,EmployeeID,Name,Department,Salary
0,101,Alice,HR,55000
1,102,Bob,Finance,62000
2,103,Charlie,IT,75000
3,104,Diana,IT,72000
4,105,Ethan,Marketing,58000
5,106,Fiona,Finance,63000


Using the function on the sample dataframe

In [6]:
split_stack(employees,['df_h1','df_h2'])

Unnamed: 0,df_h1,df_h2
0,101,Alice
1,102,Bob
2,103,Charlie
3,104,Diana
4,105,Ethan
5,106,Fiona
6,HR,55000
7,Finance,62000
8,IT,75000
9,IT,72000


## Retrieving the docstrings from a function

### Using the `__doc__` attribute

In [7]:
print(split_stack.__doc__)


Split a DataFrame into two column-halves and stack them vertically.

Parameters
----------
df : pandas.DataFrame
    Input DataFrame whose columns will be split at the midpoint.
new_names : sequence
    Column names for the returned DataFrame. Should match the number of columns
    in each half after splitting (typically len(df.columns)//2).

Returns
-------
pandas.DataFrame
    A new DataFrame produced by vertically stacking the left half followed by
    the right half of the original DataFrame, with columns renamed to `new_names`.

Notes
-----
- If df has an odd number of columns, the left half will contain floor(n/2)
  columns and the right half will contain the remainder.
- The function stacks the underlying values (np.vstack) — index is reset in the result.



### Using the inspect module 

In [8]:
import inspect

In [9]:
print(inspect.getdoc(split_stack))

Split a DataFrame into two column-halves and stack them vertically.

Parameters
----------
df : pandas.DataFrame
    Input DataFrame whose columns will be split at the midpoint.
new_names : sequence
    Column names for the returned DataFrame. Should match the number of columns
    in each half after splitting (typically len(df.columns)//2).

Returns
-------
pandas.DataFrame
    A new DataFrame produced by vertically stacking the left half followed by
    the right half of the original DataFrame, with columns renamed to `new_names`.

Notes
-----
- If df has an odd number of columns, the left half will contain floor(n/2)
  columns and the right half will contain the remainder.
- The function stacks the underlying values (np.vstack) — index is reset in the result.


## DRY concept in programming functions in Python
The DRY (Don't Repeat Yourself) principle, introduced by Andrew Hunt and David Thomas in The Pragmatic Programmer, promotes the idea that every piece of knowledge should have a single, unambiguous, authoritative representation within a system.<br>
https://www.techtarget.com/whatis/definition/DRY-principle


## 'Do One Thing' concept
The 'Do One Thing' concept, also known as the Single Responsibility Principle (SRP), is a design guideline that says a function or a class should have only one responsibility or purpose. This makes code more maintainable, readable, and reusable by breaking complex tasks into smaller, focused functions that each do a single job and do it well.

For example, a function that loads and plots data should be split into a separate function for loading and another for plotting.
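
The split described above can be sketched as follows. To keep the example free of third-party dependencies, the plotting step is replaced by a plain-text report; the CSV content and the names `load_salaries` and `format_report` are hypothetical.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a file on disk.
RAW_CSV = "name,salary\nAlice,55000\nBob,62000\n"


def load_salaries(csv_text):
    """Only loads: parse CSV text into a list of (name, salary) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], int(row["salary"])) for row in reader]


def format_report(records):
    """Only presents: turn (name, salary) tuples into printable lines."""
    return [f"{name}: {salary}" for name, salary in records]


records = load_salaries(RAW_CSV)
for line in format_report(records):
    print(line)
```

Because each function has one job, the loader can be reused with a different presentation step (a plot, a table) without being modified.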

## Pass by assignment
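
This heading has no body in the notebook, so the following is only a minimal sketch of what the term usually means: a parameter name inside a function is bound to the same object the caller passed. Mutating that object in place is visible to the caller, while rebinding the parameter name is not. The function names are hypothetical.

```python
def append_in_place(items):
    # Mutates the list object that the caller also references.
    items.append(99)


def rebind_locally(items):
    # Rebinds the local name to a new list; the caller's list is untouched.
    items = items + [99]
    return items


original = [1, 2, 3]
append_in_place(original)
print(original)    # [1, 2, 3, 99]

untouched = [1, 2, 3]
new_list = rebind_locally(untouched)
print(untouched)   # [1, 2, 3]
print(new_list)    # [1, 2, 3, 99]
```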

# Context managers

A context manager sets up a context, runs the code inside it, and then tears the context down. For example, the built-in `open` can be used as a context manager so that a file stays open only for the duration of the `with` block.

**`with`** `<context-manager>(<args>)` **`as`** _`<var_name>`_`:`<br>
&ensp;&ensp;`#run code here - assign output to a variable`<br>
`#use output of above in further code`<br>

### Setup code for context manager:

`@contextlib.contextmanager` - decorator<br>
`def function_name():` - function definition<br>
&ensp;&ensp;`#setup code`<br>
&ensp;&ensp;`yield` _`output`_<br>
&ensp;&ensp;`#teardown code`


In [10]:
import contextlib

@contextlib.contextmanager
def open_read_only(filename):
  """Open a file in read-only mode.

  Args:
    filename (str): The location of the file to read

  Yields:
    file object
  """
  read_only_file = open(filename, mode='r')
  try:
    # Yield read_only_file so it can be assigned to my_file
    yield read_only_file
  finally:
    # Close read_only_file even if the with-block raises an error
    read_only_file.close()

with open_read_only('C:/Users/abhijeet.bhambere/Desktop/Resources/DC-ADS/exploratory_data_analysis/importing_data_in_python/seaslug.txt') as my_file:
  print(my_file.read())


Time	Percent
99	0.067
99	0.133
99	0.067
99	0
99	0
0	0.5
0	0.467
0	0.857
0	0.5
0	0.357
0	0.533
5	0.467
5	0.467
5	0.125
5	0.4
5	0.214
5	0.4
10	0.067
10	0.067
10	0.333
10	0.333
10	0.133
10	0.133
15	0.267
15	0.286
15	0.333
15	0.214
15	0
15	0
20	0.267
20	0.2
20	0.267
20	0.437
20	0.077
20	0.067
25	0.133
25	0.267
25	0.412
25	0
25	0.067
25	0.133
30	0
30	0.071
30	0
30	0.067
30	0.067
30	0.133


### Nested context managers
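
This heading is empty in the notebook. As a minimal sketch of what nesting usually looks like, the example below places one `with` block inside another to copy a file line by line. The file names are hypothetical and are created in a temporary directory so that the sketch does not depend on any local path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "source.txt")
    dst_path = os.path.join(tmp_dir, "copy.txt")

    # Create a small source file to copy from.
    with open(src_path, "w") as f:
        f.write("Time\tPercent\n99\t0.067\n")

    # Nested contexts: the inner file is closed first, then the outer one.
    with open(src_path) as src:
        with open(dst_path, "w") as dst:
            for line in src:
                dst.write(line)

    # Read the copy back to confirm the contents.
    with open(dst_path) as check:
        copied = check.read()

print(copied)
```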

# Decorators
**Functions as Objects:** In Python, functions are first-class objects, meaning they can be passed as arguments to other functions, returned from functions, and assigned to variables.



In [11]:
def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        result = func(*args, **kwargs)
        print("Something is happening after the function is called.")
        return result
    return wrapper

In [12]:
@my_decorator
def say_hello(name):
    return f"Hello, {name}!"

In [13]:
say_hello("Alice")

Something is happening before the function is called.
Something is happening after the function is called.


'Hello, Alice!'

## Use of decorator functions in EDA:

**Example - _Handling missing data_ decorator use case**

When cleaning or analyzing datasets, we often repeat lines like `dropna()` / `fillna()` before every aggregation or transformation.

In a large notebook or script, this becomes repetitive, messy, and easy to forget — leading to errors like `ValueError`.

**_Scenario_**: We want to calculate the total compensation for each employee, but some records have `NaN` values.

**Solution**: Create a decorator that automatically handles missing values before any aggregating function is run. This ensures that every aggregation function wrapped with the decorator always receives a clean DataFrame.


In [14]:
# Sample dataset
data = {
    'Employee': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],
    'Salary': [55000, 65000, 72000, None, 58000],
    'Department': ['HR', 'Finance', None, 'IT', 'Marketing'],
    'Bonus': [5000, 3000, None, 6000, None]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Employee,Salary,Department,Bonus
0,Alice,55000.0,HR,5000.0
1,Bob,65000.0,Finance,3000.0
2,Charlie,72000.0,,
3,Diana,,IT,6000.0
4,Ethan,58000.0,Marketing,


In [15]:
df[df['Salary'].isna()]

Unnamed: 0,Employee,Salary,Department,Bonus
3,Diana,,IT,6000.0


Now, say we want to compute the total compensation (fixed salary + bonus), but our data has `NaN` values.

**This is going to raise an `IntCastingNaNError`**

In [16]:
def total_compensation(df):
    # This will fail: the sum contains NaN, which cannot be cast to int
    return (df['Salary'] + df['Bonus']).astype(int)

In [17]:
# total_compensation(df)

# Running above func gives IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

In [18]:
# Define decorator
def handle_na(func):
    """Decorator to drop NaN values before executing a function."""
    def wrapper(df, *args, **kwargs):
        print(f"Cleaning data for {func.__name__}() ...")
        df = df.dropna()      # Drop missing rows
        return func(df, *args, **kwargs)
    return wrapper


In [19]:
@handle_na
def total_compensation(df):
    return round((df['Salary'] + df['Bonus']),2)

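No error is raised now, because the decorator removes the rows containing NaN before the function body runs. Note that this version also rounds the result instead of casting it to `int`.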

In [20]:
# Now running the earlier func with the decorator
total_compensation(df)  

Cleaning data for total_compensation() ...


0    60000.0
1    68000.0
dtype: float64

# Functions as objects

In Python, functions are considered "**first-class objects**", meaning they are treated like any other value, such as an integer, string, or list.

This "first-class" status implies several key characteristics:
- Assignment to Variables: Functions can be assigned to variables, just like any other value. The variable then acts as a reference to the function and can be used to call it.
- Passing as Arguments: Functions can be passed as arguments to other functions. This is a fundamental concept for higher-order functions and callback mechanisms.
- Returning from Functions: Functions can be returned as the result of another function's execution. This is crucial for creating closures and decorators.
- Storing in Data Structures: Functions can be stored within data structures like lists, dictionaries, or sets. This allows for dynamic selection and execution of functions.

In [21]:
# Assigning a function to a variable
# Defining a function
def greet(name):
    return f"Hello, {name}!"

# Assigning to a variable
my_greeting_function = greet
print(my_greeting_function("Alice"))
# Checking the variable's data type
print(type(my_greeting_function))

Hello, Alice!
<class 'function'>


In [22]:
# Storing into a data structure - a list
list_of_funcs = [greet, print]

# Now we call an element of the above list as we would for a normal list
print(list_of_funcs[0]("how you doin'"))
print("\n")
list_of_funcs[1]("this is a sample statement")

Hello, how you doin'!


this is a sample statement


In [23]:
# Storing into another data structure - a dictionary
dict_of_funcs = {'func1':greet, 'func2':print}

# We can call any of the function stored by referencing to the key
dict_of_funcs['func1']("AB")

'Hello, AB!'

In [24]:
# A function can be passed as an argument to another function
def greet1(name):
    """Return a greeting with the name passed in the argument"""
    return f"Hello, {name}!"

# Defining another function that checks for docstrings
def has_docstring(func):
    """Checks if the passed function has a docstring or not,
    Args: 'func' is a function
    Returns: bool"""
    return func.__doc__ is not None


In [25]:
print(has_docstring(greet))
print(has_docstring(greet1))

False
True


In [26]:
# Nested functions - defining one function inside another function
def main_func():
    x = [11,22,33,44]

    def nested_func(y):
        print(y + 10)
    for value in x:
        nested_func(value)

In [27]:
main_func()

21
32
43
54
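
`main_func()` above defines and calls a nested function but does not return it. The "Returning from Functions" characteristic listed earlier can be sketched as follows; `make_adder` is a hypothetical name chosen for this illustration.

```python
def make_adder(increment):
    """Return a new function that adds `increment` to its argument."""
    def add(value):
        # `increment` is looked up in the enclosing scope of make_adder.
        return value + increment
    return add


add_10 = make_adder(10)
print([add_10(v) for v in [11, 22, 33, 44]])   # [21, 32, 43, 54]
```

The returned `add` function remembers the `increment` it was created with, which is the same mechanism decorators rely on.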


## Scope
In Python, the order in which scopes are searched for a name is given by the **LEGB rule**, which stands for Local, Enclosing, Global, and Built-in.

When a program references a variable, Python searches for its definition in a specific, ordered sequence of namespaces. The search stops at the first scope in which the variable is found. 

In [28]:
x = 7
y = 42

In [29]:
def foo():
    x=42
    print(x)
    print(y)

Inside the function, `x` is looked up in the local scope first. Because it is assigned within the function, the local value (42) is printed. `y` is not defined locally, so Python falls back to the global `y` (also 42).

In [30]:
foo()

42
42


Note that x's global value remains unaffected

In [31]:
x

7

In [32]:
def foo():
    global x
    x = 42    
    print(x)
    print(y)

foo()

42
42


Now the global `x` has changed from 7 to 42, because `foo()` declared `x` as `global` before assigning to it.

In [33]:
print(x)

42


In [34]:
x = 50

def one():
  x = 10

def two():
  global x
  x = 30

def three():
  x = 100
  print(x)

for func in [one, two, three]:
  func()
  print(x)

50
30
100
30


Four values are printed because there are two kinds of `print` call: one inside `three()` and one after each `func()` call in the loop. Scope explains the values:

`one()` defines a local `x = 10`, which has no effect on the global. After `one()` returns, `print(x)` prints the global `x` (50).

`two()` declares `global x` and sets `x = 30`, so the global changes. After `two()` returns, `print(x)` prints 30.

`three()` defines a local `x = 100` and calls `print(x)`, which prints 100 from inside the function. `three()` does not change the global `x`, so after it returns the loop's `print(x)` still prints the global value (30). That iteration therefore **executes two print statements**: the one inside `three()`, which shows the local value, and the one in the loop, which shows the global value.
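
The cells above cover the Local and Global parts of LEGB. As a minimal sketch of the remaining two, the example below uses `nonlocal` to rebind a variable in an Enclosing scope and relies on the Built-in scope to find `len`. The names `counter` and `increment` are hypothetical.

```python
def counter():
    count = 0                 # lives in the enclosing scope of increment()

    def increment():
        nonlocal count        # rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment


next_id = counter()
print(next_id(), next_id(), next_id())   # 1 2 3
print(len("abc"))                        # 3, `len` is found in the built-in scope
```

Without the `nonlocal` declaration, `count += 1` would treat `count` as a local variable and raise `UnboundLocalError`.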

In [35]:
def mean(*x):
    """Returns the mean of all the numbers"""
    total_sum = 0
    n = len(x)
    for i in x:
        total_sum = total_sum + i
    return total_sum/n
print((mean(4, 5), mean(40, 45, 50)))

(4.5, 45.0)


# Decorators examples

In [36]:
def double_args(func):
    def wrapper_func(a,b):
        return func(a*2 , b*2)
    
    return wrapper_func

In [37]:
def product_ab(a,b):
    return a*b


In [38]:
@double_args
def product_ab1(a,b):
    return a*b

In [39]:
product_ab1(1,3)

12

Another way to apply the decorator:<br>
Here, we 'decorate' the `product_ab` function by passing it to `double_args` directly, which gives the same result as the earlier `@decorator` syntax.

In [40]:
newfunc = double_args(product_ab)
newfunc(1,3)

12

The decorator `print_before_and_after()` defines a nested function `wrapper()` that calls whatever function gets passed to `print_before_and_after()`.

`wrapper()` adds a little something else to the function call by printing one message before the decorated function is called and another right afterwards.

Since `print_before_and_after()` returns the new `wrapper()` function, we can use it as a decorator to decorate the `multiply()` function.

In [41]:
def print_before_and_after(func):
  def wrapper(*args):
    print('Before {}'.format(func.__name__))
    # Call the function being decorated with *args
    func(*args)
    print('After {}'.format(func.__name__))
  # Return the nested function
  return wrapper

In [42]:
# Running the function with the decorator
@print_before_and_after
def multiply(a, b):
  print(a * b)

multiply(5, 10)

Before multiply
50
After multiply
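
One side effect of wrappers like the ones above is that the decorated name now points at `wrapper`, so `__name__` and `__doc__` (used earlier to retrieve docstrings) describe the wrapper rather than the original function. The standard library's `functools.wraps` copies that metadata across. A minimal sketch, with hypothetical decorator and function names:

```python
import functools


def plain_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving_decorator(func):
    @functools.wraps(func)    # copy __name__, __doc__, etc. from func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def add(a, b):
    """Return a + b."""
    return a + b


@preserving_decorator
def multiply(a, b):
    """Return a * b."""
    return a * b


print(add.__name__, add.__doc__)             # wrapper None
print(multiply.__name__, multiply.__doc__)   # multiply Return a * b.
```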


# More examples on decorators

**⏩ A timer decorator**: prints how long a function took to execute

In [43]:
import time

def timer(func):
    """A decorator that prints how long a function took to run.
    Args: func(callable): the function being timed
    Returns: callable: the decorated function"""
    # Defining the wrapper function
    def wrapper(*args, **kwargs):
        # Get the current time when the func was called
        strt_time = time.time()
        result = func(*args, **kwargs)
        # Get the time it took to run the passed / decorated func
        total_time = time.time() - strt_time
        print(f"{func.__name__} took {total_time}s to run")
        return result
    return wrapper


In [44]:
# Using the decorator
@timer
def sleep_n(n):
    time.sleep(n)

In [45]:
sleep_n(5)

sleep_n took 5.000480651855469s to run


**⏩ A memorize (memoization) decorator function**: stores the results of a decorated function for fast retrieval

In [46]:
def memorize(func):
    """Stores results of a decorated function for fast lookup"""
    # Store results in a dictionary that maps args to results
    cache={}
    # Define wrapper function
    def wrapper(*args,**kwargs):
        kwargs_key = tuple(sorted(kwargs.items()))
        if (args, kwargs_key) not in cache:
            cache[(args, kwargs_key)] = func(*args,**kwargs)
        return cache[(args, kwargs_key)]
    return wrapper


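When the decorated function is called again with the same `*args` and `**kwargs`, the wrapper finds the key already in `cache` and returns the stored result without running the function again. This technique is usually called memoization; the arguments must be hashable because they are used as dictionary keys.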

In [47]:
@memorize
def slow_func(a,b):
    print('Sleeping...')
    time.sleep(5) #delay of 5s before computing
    return a + b

In [48]:
# Executing for first time
slow_func(2,3)

Sleeping...


5

In [49]:
# Executing 2nd time -- instant output -- no 5s delay
slow_func(2,3)

5
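
For comparison, the standard library ships a ready-made memoization decorator, `functools.lru_cache`. The sketch below is not the notebook's `memorize` implementation; it swaps in the stdlib decorator and uses a hypothetical call log instead of `time.sleep` to show that the function body runs only once.

```python
import functools

calls = []   # records every time the function body actually runs


@functools.lru_cache(maxsize=None)
def slow_add(a, b):
    calls.append((a, b))
    return a + b


print(slow_add(2, 3))   # 5, body runs
print(slow_add(2, 3))   # 5, served from the cache
print(len(calls))       # 1

info = slow_add.cache_info()
print(info.hits, info.misses)   # 1 1
```

As with the dictionary-based cache above, the arguments must be hashable.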

**⏩ Validate function output**: To ensure a function is returning its output in the expected data type.

In [50]:
def print_return_type(func):
  # Define wrapper(), the decorated function
  def wrapper(*args, **kwargs):
    # Call the function being decorated
    result = func(*args,**kwargs)
    print('{}() returned type {}'.format(func.__name__, type(result)))
    return result
  # Return the decorated function
  return wrapper

In [51]:
@print_return_type
def foo(value):
  return value

In [52]:
print(foo(42))
print(foo([1, 2, 3]))

foo() returned type <class 'int'>
42
foo() returned type <class 'list'>
[1, 2, 3]


# Decorators with arguments
## Decorator factory
This approach allows the user to pass arguments to the decorator while wrapping a function with it.

**EXAMPLE:** below is a decorator defined to execute any function it wraps 3 times

In [53]:
# Defining a decorator func
def run_3times(func):
    def wrapper(*args,**kwargs):
        for i in range(3):
            func(*args,**kwargs)
    return wrapper

# defining a sample func & Wrapping it with run_3times decorator
@run_3times
def print_sum(a,b):
    print(a+b)

In [54]:
print_sum(2,7)

9
9
9


Now, suppose we want the user to control the number of times the function is repeated.

In this scenario, we define a function that *returns* a decorator instead of defining the decorator directly. The decorator is nested inside an outer function that accepts the user's argument.

In other words, `@run_ntimes(5)` first calls the outer function with `n=5`, and the decorator it returns is then applied to the function being defined.

In [55]:
# defining a func that accepts args for the decorator func
def run_ntimes(n):
    def decorator(func):
        def wrapper(*args,**kwargs):
            for i in range(n):
                func(*args,**kwargs)
        return wrapper
    return decorator
   
# defining a sample func & Wrapping it with run_ntimes decorator
@run_ntimes(5)
def print_sum_new(a,b):
    print(a+b)

In [56]:
print_sum_new(2,7)

9
9
9
9
9


In [57]:
@run_ntimes(3)
def print_sum_new1(a,b):
    print(a+b)

print_sum_new1(3,7)

10
10
10
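
To make the two-step call explicit, the sketch below applies a decorator factory without the `@` syntax. It is a variation on `run_ntimes`, not the same function: as an assumption for illustration, the wrapper collects and returns each call's result in a list instead of discarding it. The names `add` and `add_3times` are hypothetical.

```python
def run_ntimes(n):
    """Return a decorator that calls the wrapped function n times."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Collect each call's return value instead of discarding it.
            return [func(*args, **kwargs) for _ in range(n)]
        return wrapper
    return decorator


def add(a, b):
    return a + b


# Equivalent to writing @run_ntimes(3) above the definition of add.
add_3times = run_ntimes(3)(add)
print(add_3times(3, 7))   # [10, 10, 10]
```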


### Real-world application:

In [58]:
import time
import pandas as pd

# 1️⃣ Create sample data
data = {
    'age': [25, 30, 22, 35, 28, None, 40],
    'salary': [50000, 60000, 45000, 80000, 70000, 65000, None],
    'department': ['HR', 'Finance', 'IT', 'IT', 'HR', 'Finance', 'IT']
}
df = pd.DataFrame(data)

# 2️⃣ Decorator with argument
def timer(enabled=True):
    def decorator(func):
        def wrapper(*args, **kwargs):
            if enabled:
                start = time.time()
                result = func(*args, **kwargs)
                end = time.time()
                print(f"⏱️ {func.__name__} executed in {end - start:.4f} seconds")
                return result
            else:
                return func(*args, **kwargs)
        return wrapper
    return decorator

# 3️⃣ Example EDA functions
@timer(enabled=True)
def calculate_mean(df, column):
    """Calculate mean of a given column"""
    return df[column].mean()

@timer(enabled=False)   # Will skip timing
def get_unique_departments(df):
    return df['department'].unique()

# 4️⃣ Run functions
print("Average Age:", calculate_mean(df, 'age'))
print("Unique Departments:", get_unique_departments(df))


⏱️ calculate_mean executed in 0.0002 seconds
Average Age: 30.0
Unique Departments: ['HR' 'Finance' 'IT']


**EXAMPLE:** a decorator with arguments that automatically handles missing data before running an analysis.

In [59]:
import pandas as pd

# 1️⃣ Create a sample DataFrame
data = {
    'age': [25, 30, None, 35, 40, None],
    'salary': [50000, None, 45000, 80000, 70000, None],
    'department': ['HR', 'Finance', 'IT', 'IT', None, 'Finance']
}
df = pd.DataFrame(data)

# 2️⃣ Decorator factory with argument
def handle_missing(dropna=False, fill_with=None):
    """
    Decorator to handle missing values before running the function.
    dropna=True → drops missing rows
    fill_with=value → fills missing values with given value
    """
    def decorator(func):
        def wrapper(df, *args, **kwargs):
            df_copy = df.copy()
            if dropna:
                df_copy = df_copy.dropna()
                print("✅ Dropped missing rows before running function.")
            elif fill_with is not None:
                df_copy = df_copy.fillna(fill_with)
                print(f"✅ Filled missing values with {fill_with}.")
            else:
                print("⚠️ Missing values not handled.")
            return func(df_copy, *args, **kwargs)
        return wrapper
    return decorator

# 3️⃣ Functions using the decorator

@handle_missing(dropna=True)
def average_salary(df):
    return df['salary'].mean()

@handle_missing(fill_with=0)
def total_age(df):
    return df['age'].sum()

@handle_missing()  # No handling
def dept_count(df):
    return df['department'].value_counts()

# 4️⃣ Run them
print("Average Salary:", average_salary(df))
print("Total Age:", total_age(df))
print("Department Counts:\n", dept_count(df))


✅ Dropped missing rows before running function.
Average Salary: 65000.0
✅ Filled missing values with 0.
Total Age: 130.0
⚠️ Missing values not handled.
Department Counts:
 department
Finance    2
IT         2
HR         1
Name: count, dtype: int64


In [77]:

[i for i in range(5) if i >2]

[3, 4]

In [78]:
import random
random.seed(2427)
def efc(n):
    x = [random.random() for _ in range(n)]
    return x

In [79]:
efc(20)

[0.6709313964859867,
 0.9738115340563225,
 0.6064264401595373,
 0.4066259803173813,
 0.241007454546324,
 0.9570332484250537,
 0.2349020347673353,
 0.8876755137054037,
 0.9720131163571095,
 0.1492980772443857,
 0.9414046155591562,
 0.5597323750561738,
 0.7608989127589141,
 0.5249642801198838,
 0.1344891272249087,
 0.9796560039964438,
 0.06863221669260322,
 0.8766064411366202,
 0.5504489926930571,
 0.4880661379761838]