| Category            | Function / Method | Description                |
| ------------------- | ----------------- | -------------------------- |
| **Creation**        | `pd.DataFrame()`  | Create a DataFrame         |
|                     | `pd.Series()`     | Create a Series            |
|                     | `pd.read_csv()`   | Read CSV file              |
|                     | `pd.read_excel()` | Read Excel file            |
|                     | `pd.concat()`     | Concatenate DataFrames     |
|                     | `pd.merge()`      | Merge / join DataFrames    |
| **Info / Overview** | `.head()`         | Show first n rows          |
|                     | `.tail()`         | Show last n rows           |
|                     | `.info()`         | Summary of columns & types |
|                     | `.shape`          | (rows, cols)               |
|                     | `.columns`        | Column names               |
|                     | `.index`          | Row indices                |
|                     | `.dtypes`         | Data types per column      |

Data Selection and Filtering 

| Category      | Function / Method   | Description                |
| ------------- | ------------------- | -------------------------- |
| **Indexing**  | `.iloc[]`           | Integer-based selection    |
|               | `.loc[]`            | Label-based selection      |
|               | `.at[]`             | Fast scalar label access   |
|               | `.iat[]`            | Fast scalar integer access |
| **Filtering** | `df[df['col'] > x]` | Boolean filtering          |
|               | `.query()`          | SQL-like queries           |

Modification / Cleaning

| Category         | Function / Method      | Description                 |
| ---------------- | ---------------------- | --------------------------- |
| **Columns**      | `.rename()`            | Rename columns              |
|                  | `.drop()`              | Drop rows or columns        |
|                  | `.insert()`            | Add column at position      |
|                  | `.assign()`            | Add new columns dynamically |
| **Missing Data** | `.isna()` / `.notna()` | Check missing values        |
|                  | `.fillna()`            | Replace missing values      |
|                  | `.dropna()`            | Remove missing values       |
| **Duplicates**   | `.duplicated()`        | Find duplicate rows         |
|                  | `.drop_duplicates()`   | Remove duplicates           |

Aggregation and Stats

| Category             | Function / Method   | Description                               |
| -------------------- | ------------------- | ----------------------------------------- |
| **Summary**          | `.describe()`       | Summary stats for numeric columns         |
|                      | `.count()`          | Count non-null values                     |
|                      | `.sum()`            | Sum per column/row                        |
|                      | `.mean()`           | Mean                                      |
|                      | `.median()`         | Median                                    |
|                      | `.min()` / `.max()` | Min / Max                                 |
|                      | `.std()` / `.var()` | Std deviation / variance                  |
| **Group Operations** | `.groupby()`        | Group data by one or more keys            |
|                      | `.agg()`            | Aggregate (mean, sum, etc.) after groupby |
|                      | `.apply()`          | Apply custom function                     |
|                      | `.transform()`      | Transform values within groups            |

Merging and Reshaping

| Category           | Function / Method         | Description                   |
| ------------------ | ------------------------- | ----------------------------- |
| **Combining Data** | `pd.merge()`              | SQL-like joins                |
|                    | `pd.concat()`             | Stack vertically/horizontally |
|                    | `.join()`                 | Merge on index                |
| **Reshaping**      | `.pivot()`                | Reshape long to wide format   |
|                    | `.melt()`                 | Unpivot columns into rows     |
|                    | `.stack()` / `.unstack()` | Reshape hierarchical indexes  |

Maths

| Category           | Function / Method  | Description                |
| ------------------ | ------------------ | -------------------------- |
| **Apply**          | `.apply()`         | Apply function across axis |
|                    | `.applymap()`      | Apply function elementwise (deprecated in pandas 2.1 in favor of `DataFrame.map()`) |
|                    | `.map()`           | Apply function to Series   |
| **Vectorized Ops** | `+`, `-`, `*`, `/` | Elementwise arithmetic     |
|                    | `.clip()`          | Limit values               |
|                    | `.round()`         | Round decimals             |

Time Series

| Category       | Function / Method  | Description                 |
| -------------- | ------------------ | --------------------------- |
| **Datetime**   | `pd.to_datetime()` | Convert to datetime         |
|                | `.dt` accessor     | Extract year, month, etc.   |
| **Resampling** | `.resample()`      | Group by time intervals     |
|                | `.rolling()`       | Rolling window calculations |
|                | `.shift()`         | Shift index or values       |

I/O

| Category         | Function / Method | Description    |
| ---------------- | ----------------- | -------------- |
| **Read / Write** | `.to_csv()`       | Write to CSV   |
|                  | `.to_excel()`     | Write to Excel |
|                  | `.to_json()`      | Write to JSON  |
|                  | `pd.read_sql()`   | Read from SQL  |
|                  | `.to_sql()`       | Write to SQL   |

Sorting, Type Conversion, and Correlation

| Category            | Function / Method | Description          |
| ------------------- | ----------------- | -------------------- |
| **Sorting**         | `.sort_values()`  | Sort by values       |
|                     | `.sort_index()`   | Sort by index        |
| **Type Conversion** | `.astype()`       | Convert column types |
| **Categoricals**    | `.unique()`       | Unique values        |
|                     | `.value_counts()` | Frequency count      |
| **Correlation**     | `.corr()`         | Correlation matrix   |
|                     | `.cov()`          | Covariance           |



### Recreate Pandas

In [4]:

# Recreate the pd.DataFrame structure without using pandas

data = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"]
}

columns = list(data.keys())
rows = len(data[columns[0]])  # every column has the same length

for i in range(rows):
    print(f"{data['name'][i]:<10} {data['age'][i]:<5} {data['city'][i]:<10}")
# A simple representation of tabular data without using pandas

Alice      25    London    
Bob        30    Paris     
Charlie    35    Berlin    
Diana      28    London    
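
The loop above hard-codes the three column names and their widths. A minimal sketch of a more general version follows, assuming every column is a list of the same length; `format_table` is a hypothetical helper name, not part of pandas.

```python
# Hypothetical helper: render any column-oriented dict as an aligned text table.
def format_table(table):
    columns = list(table)
    n_rows = len(table[columns[0]]) if columns else 0
    # Each column is as wide as its longest cell (or its header)
    widths = {
        col: max([len(col)] + [len(str(value)) for value in table[col]])
        for col in columns
    }
    lines = ["  ".join(col.ljust(widths[col]) for col in columns)]
    for i in range(n_rows):
        lines.append(
            "  ".join(str(table[col][i]).ljust(widths[col]) for col in columns)
        )
    # Strip the padding that ljust leaves after the last column
    return "\n".join(line.rstrip() for line in lines)


table = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"],
}
print(format_table(table))
```

Computing the widths from the data means a new column or a longer name needs no change to the format string.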


In [5]:
# Display first and last n rows (head and tail) --> remember the dataframe is a hash map of columns to lists

n = 2
for i in range(n):
    print({col: data[col][i] for col in columns})  # head

for i in range(rows - n, rows):
    print({col: data[col][i] for col in columns})  # tail

{'name': 'Alice', 'age': 25, 'city': 'London'}
{'name': 'Bob', 'age': 30, 'city': 'Paris'}
{'name': 'Charlie', 'age': 35, 'city': 'Berlin'}
{'name': 'Diana', 'age': 28, 'city': 'London'}
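
The two loops above can also be written as list slices, which keeps the result in the same column-to-list layout. This is a sketch under the assumption that all columns have equal length; `head` and `tail` here are illustrative functions, not the pandas methods.

```python
# Hypothetical slice-based versions of head() and tail() for a dict of lists.
def head(table, n=5):
    return {col: values[:n] for col, values in table.items()}


def tail(table, n=5):
    # values[-n:] would return the whole list when n == 0,
    # so compute the start position explicitly instead.
    return {
        col: values[max(len(values) - n, 0):]
        for col, values in table.items()
    }


table = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"],
}
print(head(table, 2))
print(tail(table, 2))
```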


In [6]:
# shape and columns 

print("Shape:", (rows, len(columns)))
print("Columns:", columns)
print("Number of rows:", rows)

Shape: (4, 3)
Columns: ['name', 'age', 'city']
Number of rows: 4


In [8]:
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"]
})

print(df.iloc[2])
print(df[df['city'] == 'London'])

name    Charlie
age          35
city     Berlin
Name: 2, dtype: object
    name  age    city
0  Alice   25  London
3  Diana   28  London


In [None]:
# Recreate iloc[2] without pandas: positional access is plain list indexing

print({col: data[col][2] for col in columns})  # iloc[2]


{'name': 'Charlie', 'age': 35, 'city': 'Berlin'}


In [None]:
# Recreate boolean indexing (df[df['city'] == 'London']) without pandas

for j in range(rows):
    if data["city"][j] == "London":
        print({col: data[col][j] for col in columns})


{'name': 'Alice', 'age': 25, 'city': 'London'}
{'name': 'Diana', 'age': 28, 'city': 'London'}


In [18]:
# Now filter rows on a numeric condition (age > 28)

for i in range(rows):
    if data['age'][i] > 28:
        filtered = {col: data[col][i] for col in columns}
        print(filtered)


{'name': 'Bob', 'age': 30, 'city': 'Paris'}
{'name': 'Charlie', 'age': 35, 'city': 'Berlin'}
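
The filtering cells above all follow the same pattern: test each row, keep the ones that pass. A minimal sketch of that pattern as a reusable function is shown here; `filter_rows` and its `predicate` argument are hypothetical names introduced for illustration.

```python
# Hypothetical generic filter: keep the rows for which predicate(row) is true.
def filter_rows(table, predicate):
    columns = list(table)
    n_rows = len(table[columns[0]]) if columns else 0
    # Positions of the rows to keep (the "boolean mask" as a list of indices)
    keep = [
        i for i in range(n_rows)
        if predicate({col: table[col][i] for col in columns})
    ]
    # Rebuild a column-oriented dict from the kept positions
    return {col: [table[col][i] for i in keep] for col in columns}


table = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"],
}
londoners = filter_rows(table, lambda row: row["city"] == "London")
older_non_paris = filter_rows(
    table, lambda row: row["age"] > 28 and row["city"] != "Paris"
)
print(londoners)
print(older_non_paris)
```

Returning a dict of lists rather than printing means the result can be filtered again or passed to other helpers.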


In [21]:
# Recreate simple summary statistics, similar to describe()

ages = data['age']

count_age = len(ages)

sum_age = sum(ages)

mean_age = sum_age / count_age

min_age = min(ages)

max_age = max(ages)

print(
    {
        'count': count_age,
        'sum': sum_age,
        'mean': mean_age,
        'min': min_age,
        'max': max_age
    }
)

{'count': 4, 'sum': 118, 'mean': 29.5, 'min': 25, 'max': 35}
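
pandas' `describe()` also reports the standard deviation and quartiles. The sketch below adds them with the standard-library `statistics` module. It assumes at least two values, uses the sample standard deviation, and uses the `"inclusive"` quantile method, which is intended to mirror linear interpolation between data points. The function name `describe` is illustrative only.

```python
import statistics


# Hypothetical describe() for one numeric column (needs at least two values).
def describe(values):
    ordered = sorted(values)
    # "inclusive" interpolates linearly between the min and max of the data
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),  # sample std (divides by n - 1)
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


summary = describe([25, 30, 35, 28])
for key, value in summary.items():
    print(f"{key:<6}{value}")
```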


In [22]:
print(len(data["city"]))        # total = 4
print(len(set(data["city"])))   # unique = 3
print(data["city"].count("London"))  # London count = 2

4
3
2
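
Counting one value at a time with `list.count` rescans the list for every value. As a sketch of the `.unique()` and `.value_counts()` idea in one pass, `collections.Counter` can tally every value at once; the variable names here are illustrative.

```python
from collections import Counter

cities = ["London", "Paris", "Berlin", "London"]

# Unique values in first-seen order (a set would lose the order)
unique_cities = list(dict.fromkeys(cities))

# Frequency of every value, most common first
city_counts = Counter(cities)

print(unique_cities)
for city, count in city_counts.most_common():
    print(city, count)
```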


In [23]:
# add and drop columns

data["salary"] = [50000, 60000, 65000, 58000]

del data["city"]
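
The cell above changes `data` in place, so later cells no longer see the `city` column. pandas methods such as `.assign()` and `.drop()` return a new object instead. A minimal sketch of that behaviour follows; `assign`, `drop`, and `rename` are hypothetical helpers, and the copies are shallow (the column lists themselves are shared).

```python
# Hypothetical non-mutating column helpers for a dict of lists.
def assign(table, **new_columns):
    n_rows = len(next(iter(table.values()))) if table else 0
    for name, values in new_columns.items():
        if table and len(values) != n_rows:
            raise ValueError(f"column {name!r} has {len(values)} values, expected {n_rows}")
    return {**table, **new_columns}


def drop(table, columns):
    return {col: values for col, values in table.items() if col not in columns}


def rename(table, mapping):
    return {mapping.get(col, col): values for col, values in table.items()}


table = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "city": ["London", "Paris", "Berlin", "London"],
}
with_salary = assign(table, salary=[50000, 60000, 65000, 58000])
without_city = drop(with_salary, ["city"])
renamed = rename(without_city, {"name": "first_name"})

print(list(table))    # the original columns are untouched
print(list(renamed))
```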

In [30]:
# Merge (inner join on the shared "id" key)

data1 = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"}
]

data2 = [
    {"id": 2, "age": 24},
    {"id": 3, "age": 30},
    {"id": 4, "age": 22}
]
  

merged = []
for left in data1:
    for right in data2:
        if left['id'] == right['id']:
            merged.append({**left, **right})
print(merged)

[{'id': 2, 'name': 'Bob', 'age': 24}, {'id': 3, 'name': 'Charlie', 'age': 30}]
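
The nested loop compares every row of one list with every row of the other. A common alternative is to index one side by key first and then look up matches. The sketch below does that and adds a simplified left join. `merge_rows` and its `how` argument are hypothetical, and unlike pandas it leaves missing columns out of unmatched rows instead of filling them with NaN.

```python
from collections import defaultdict


# Hypothetical hash join over two lists of row dicts.
def merge_rows(left_rows, right_rows, on, how="inner"):
    # Index the right side once: key value -> list of matching rows
    index = defaultdict(list)
    for row in right_rows:
        index[row[on]].append(row)

    merged = []
    for row in left_rows:
        matches = index.get(row[on], [])
        if matches:
            merged.extend({**row, **match} for match in matches)
        elif how == "left":
            # Keep unmatched left rows; right-side columns are simply absent
            merged.append(dict(row))
    return merged


people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"},
]
ages_by_id = [
    {"id": 2, "age": 24},
    {"id": 3, "age": 30},
    {"id": 4, "age": 22},
]
inner = merge_rows(people, ages_by_id, on="id")
left = merge_rows(people, ages_by_id, on="id", how="left")
print(inner)
print(left)
```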


In [38]:
# Recreate groupby('city')['age'].mean(); note that `data` is rebound to a list of row dicts here

from collections import defaultdict

data = [
    {"city": "London", "age": 25},
    {"city": "Paris", "age": 30},
    {"city": "London", "age": 35}
]

groupedby_data = defaultdict(list)

for i in data:
    groupedby_data[i["city"]].append(i["age"])

print(groupedby_data)

for city, ages in groupedby_data.items():
    print(city, sum(ages)/len(ages))


defaultdict(<class 'list'>, {'London': [25, 35], 'Paris': [30]})
London 30.0
Paris 30.0
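
The same grouping step can feed several aggregations at once, in the spirit of `.groupby().agg()`. This is a sketch only; `groupby_agg` and its parameters are hypothetical names, and each aggregation is an ordinary function that takes a list of values.

```python
from collections import defaultdict


# Hypothetical groupby + agg over a list of row dicts.
def groupby_agg(rows, key, value, funcs):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply every named aggregation to each group's list of values
    return {
        group: {name: func(values) for name, func in funcs.items()}
        for group, values in groups.items()
    }


records = [
    {"city": "London", "age": 25},
    {"city": "Paris", "age": 30},
    {"city": "London", "age": 35},
]
stats = groupby_agg(
    records,
    key="city",
    value="age",
    funcs={
        "count": len,
        "mean": lambda values: sum(values) / len(values),
        "max": max,
    },
)
for city, result in stats.items():
    print(city, result)
```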
