# Stage 03: Python Fundamentals

**Date:** Aug 14 (Thursday) (C)

This notebook covers reusable functions and decorators, core Python, NumPy, and pandas data structures, vectorization, and basic pandas loading, aggregation, and plotting.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

## Warm-Up: Function and Decorator

In [4]:
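from functools import wraps

def calc_mean_std(lst):
    """Return the mean and population standard deviation (ddof=0) of lst."""
    arr = np.array(lst)
    return arr.mean(), arr.std()

def log_call(func):
    """Print the wrapped function's name and the call time before each call."""
    @wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f"Function {func.__name__} called at {datetime.now()}")
        return func(*args, **kwargs)
    return wrapper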

@log_call
def calc_mean_std_logged(lst):
    return calc_mean_std(lst)

calc_mean_std_logged([1, 2, 3, 4, 5])

Function calc_mean_std_logged called at 2025-08-27 15:54:55.877878


(np.float64(3.0), np.float64(1.4142135623730951))
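
The second value in the output above is the population standard deviation, because NumPy's `std` defaults to `ddof=0`. A minimal sketch of how that differs from the sample standard deviation, which pandas uses by default (the variable names here are illustrative and not part of the notebook):

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, 5]

# NumPy default (ddof=0): divide by N, the population standard deviation.
population_std = np.std(values)

# ddof=1 divides by N - 1, the sample standard deviation.
sample_std = np.std(values, ddof=1)

# pandas defaults to ddof=1, so the same data yields the sample value.
pandas_std = pd.Series(values).std()

print(population_std, sample_std, pandas_std)
```

If a result computed with NumPy is later compared against one computed with pandas, passing `ddof` explicitly on both sides avoids a silent mismatch.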

## Data Structures Comparison

In [5]:
lst = [1, 2, 3]
arr = np.array(lst)
ser = pd.Series(lst)
df = pd.DataFrame({'col1': lst, 'col2': [4, 5, 6]})

print(type(lst), type(arr), type(ser), type(df))

<class 'list'> <class 'numpy.ndarray'> <class 'pandas.core.series.Series'> <class 'pandas.core.frame.DataFrame'>
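
The types differ in more than name: the same operator can mean different things on each. A short sketch, reusing the values from the cell above, of how `* 2` behaves on a list compared with an array or a Series (the `doubled_*` names are illustrative):

```python
import numpy as np
import pandas as pd

lst = [1, 2, 3]
arr = np.array(lst)
ser = pd.Series(lst)

doubled_list = lst * 2  # list repetition: the sequence is concatenated with itself
doubled_arr = arr * 2   # element-wise multiplication
doubled_ser = ser * 2   # element-wise multiplication, index preserved

print(doubled_list)          # [1, 2, 3, 1, 2, 3]
print(doubled_arr.tolist())  # [2, 4, 6]
print(doubled_ser.tolist())  # [2, 4, 6]
```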


## Vectorization vs Loops

In [6]:
big_array = np.arange(1_000_000)

%timeit [x * 2 for x in big_array]
%timeit big_array * 2

61.6 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.98 ms ± 50 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
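
In the timings above, the list comprehension is roughly 30 times slower than the vectorized multiply. `%timeit` is an IPython magic, so it is not available in a plain Python script; the sketch below repeats the comparison with the standard-library `timeit` module. The smaller array size, the repeat count, and the helper names are assumptions chosen to keep the run short, and the absolute numbers will vary by machine.

```python
import timeit

import numpy as np

small_array = np.arange(10_000)  # smaller than the notebook's array, to keep the run short

def double_with_loop(a):
    # Python-level loop: every element is multiplied individually.
    return [x * 2 for x in a]

def double_vectorized(a):
    # One call that runs the loop inside NumPy's compiled code.
    return a * 2

# Both approaches produce the same values.
same = np.array_equal(np.array(double_with_loop(small_array)),
                      double_vectorized(small_array))

loop_seconds = timeit.timeit(lambda: double_with_loop(small_array), number=20)
vector_seconds = timeit.timeit(lambda: double_vectorized(small_array), number=20)
print(f"same result: {same}")
print(f"loop: {loop_seconds:.4f} s, vectorized: {vector_seconds:.4f} s")
```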


## Pandas Basics

In [7]:
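df = pd.read_csv('data/starter_data.csv')  # path is relative to the notebook's working directory
df.info()  # prints its summary and returns None, so call it on its own line
display(df.head())
display(df.describe())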

FileNotFoundError: [Errno 2] No such file or directory: 'data/starter_data.csv'
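
The `FileNotFoundError` above means the relative path did not resolve from the notebook's working directory. One possible way to keep the later cells runnable while the file is missing is sketched below. The helper `load_starter_data` and its placeholder rows are hypothetical and do not represent the real dataset; only the column names `category` and `value` come from the later cells of this notebook.

```python
from pathlib import Path

import pandas as pd

def load_starter_data(path="data/starter_data.csv"):
    """Read the CSV if it exists; otherwise return a tiny placeholder frame."""
    csv_path = Path(path)
    if csv_path.exists():
        return pd.read_csv(csv_path)
    # Hypothetical stand-in rows, NOT the real dataset.
    print(f"{csv_path} not found (cwd: {Path.cwd()}); using placeholder data")
    return pd.DataFrame({
        "category": ["a", "a", "b", "b"],
        "value": [1.0, 2.0, 3.0, 5.0],
    })

df = load_starter_data()
df.info()
print(df.head())
```

Printing the current working directory alongside the missing path usually makes it clear whether the notebook was started from the wrong folder.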

## Groupby & Aggregation

In [None]:
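import os

# Average every numeric column within each category.
summary = df.groupby('category').mean(numeric_only=True).reset_index()
os.makedirs('data/processed', exist_ok=True)  # to_csv does not create missing directories
summary.to_csv('data/processed/summary.csv', index=False)
summary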
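
Because the cell above has no recorded output, here is a small sketch of what the same groupby pattern produces. The frame `toy` and its rows are hypothetical; only the column names `category` and `value` come from this notebook, and the `note` column is added to show the effect of `numeric_only=True`.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's dataset.
toy = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
    "note": ["w", "x", "y", "z"],  # non-numeric column
})

# numeric_only=True averages only numeric columns, so "note" is left out.
toy_summary = toy.groupby("category").mean(numeric_only=True).reset_index()

# as_index=False keeps "category" as a column without a separate reset_index().
toy_summary_alt = toy.groupby("category", as_index=False).mean(numeric_only=True)

print(toy_summary)
```

Category `a` averages to 1.5 and category `b` to 4.0, and both forms give the same two columns.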

## Plotting

In [None]:
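import os
os.makedirs('data/processed', exist_ok=True)  # savefig does not create missing directories
df['value'].hist()
plt.savefig('data/processed/histogram.png')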
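
A slightly more explicit variant of the plotting cell, sketched under a few assumptions: the values are hypothetical, the output goes to a temporary directory instead of `data/processed`, and the `Agg` backend is selected so the code also runs outside a notebook. Using an explicit `fig` and `ax` makes it clear which figure is labelled, saved, and closed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for df['value'].
values = pd.Series([1.0, 2.0, 2.5, 3.0, 5.0], name="value")

out_dir = tempfile.mkdtemp()  # temporary directory used only for this sketch
out_path = os.path.join(out_dir, "histogram.png")

fig, ax = plt.subplots()
values.hist(ax=ax, bins=5)
ax.set_xlabel("value")
ax.set_ylabel("count")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure's memory once it is saved

print(os.path.exists(out_path))
```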