# ⚡ **PANDAS: Full Brain Upload**

Let's dump pure **theory** — no fluff, no implementation — for both **Pandas** and **Matplotlib**, in that order.

---

# 🧠 PANDAS — Python Data Analysis Library

### 🔷 What is Pandas?
- The name comes from **panel data** (an econometrics term) and doubles as a play on **Python data analysis**.
- High-level data manipulation and analysis library.
- Built on **NumPy**, optimized for **tabular, labeled data** (think: Excel + SQL + NumPy).

---

## 🔹 Core Data Structures

### 1. **Series**  
- 1D labeled array (like a column).  
- Has both **values** and **index**.

### 2. **DataFrame**  
- 2D labeled data structure (like a spreadsheet).  
- Rows = observations, Columns = features.  
- Flexible, heterogeneous (different types per column), fast.
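
A minimal sketch of the two structures, to make "values + index" and "different types per column" concrete. The column names and values are made up for illustration.

```python
import pandas as pd

# A Series pairs values with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame keeps one dtype per column; columns may differ from each other.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob"],    # strings
        "age": [25, 30],             # integers
        "height_m": [1.65, 1.80],    # floats
    }
)

print(s["b"])            # look up a value by its label
print(df.dtypes)         # one dtype per column
print(type(df["age"]))   # a single column comes back as a Series
```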

---

## 🔹 Indexing & Selection

- Custom **row/column labels** (index/columns).
- Supports:
  - Label-based (`.loc`)
  - Integer-based (`.iloc`)
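  - Boolean filtering (`df[mask]` keeps only the rows where the mask is `True`)
  - Condition-based masking (`.where` / `.mask` keep the shape and blank out the other values)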
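
The four selection styles side by side, as a rough sketch. The frame, its string index, and the threshold are assumptions picked so that label and position visibly differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 30, 35], "city": ["Oslo", "Lima", "Pune"]},
    index=["alice", "bob", "carol"],
)

by_label = df.loc["bob", "age"]    # label-based: row "bob", column "age"
by_position = df.iloc[1, 0]        # integer-based: second row, first column

mask = df["age"] > 28              # Boolean Series aligned to df's index
older = df[mask]                   # Boolean filtering: keeps rows where mask is True

# Masking keeps the original shape and puts NaN where the condition fails.
masked_age = df["age"].where(mask)
```

Filtering shrinks the result; masking keeps every row, which helps when the output still has to line up with the original index.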

---

## 🔹 Data Operations

- **Merge / Join / Concat**: SQL-style combining.
- **GroupBy**: Split-Apply-Combine strategy for grouped computations.
- **Pivot / Melt**: Reshaping for tidy data.
- **Apply / Map / Transform**: Functional programming on rows/columns.
- **Missing Data Handling**: `NaN` support, fill, drop, interpolate.
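
Split-apply-combine and reshaping in one small sketch. The `sales` frame and its columns are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 120, 80, 90],
    }
)

# Split by region, apply sum, combine into one value per region.
totals = sales.groupby("region")["revenue"].sum()

# transform returns a result aligned to the original rows.
region_total = sales.groupby("region")["revenue"].transform("sum")
sales["share"] = sales["revenue"] / region_total

# Pivot: long -> wide (one column per quarter).
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# Melt: wide -> long (tidy) again.
tidy = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)
```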

---

## 🔹 Input / Output

- Reads/Writes:
  - CSV
  - Excel
  - JSON
  - SQL
  - Parquet
  - HDF5
- Seamless transition between **raw data and analysis-ready structure**.
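
A CSV round trip, sketched with an in-memory buffer so it runs without touching disk. The data is made up; real code would pass a file path instead of `io.StringIO`.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# Write: index=False leaves the row labels out of the file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read: column types are inferred again from the text.
restored = pd.read_csv(io.StringIO(csv_text))
```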

---

## 🔹 Time Series Handling

- DateTime index, frequency conversion, rolling window functions.
- Resampling (downsampling/upsampling).
- Useful for finance, IoT, logs.
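
Downsampling, upsampling, and a rolling window on a toy daily series. The dates and values are assumptions chosen to keep the arithmetic easy to check.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsampling: daily data into 2-day bins, summed per bin.
two_day = ts.resample("2D").sum()

# Upsampling: back to daily steps; the new slots are forward-filled.
daily_again = two_day.resample("D").ffill()

# Rolling window: mean of the current and two previous observations.
rolling = ts.rolling(window=3).mean()
```

The first two rolling values are `NaN` because a window of 3 is not full yet.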

---

## 🔹 Advantages

| Feature              | Benefit                                |
|----------------------|-----------------------------------------|
| Labeled indexing     | Easy tracking of data context           |
| Data alignment       | Auto-matches on indexes                 |
| Heterogeneous types  | Mix of numeric, categorical, datetime   |
| High-level functions | Simplifies complex operations           |
| Interoperability     | Works with NumPy, Matplotlib, scikit-learn |

---

### 🧠 Summary

| Library     | Role                              | Analogy                |
|-------------|-----------------------------------|------------------------|
| NumPy       | Numerical computing               | Calculator             |
| Pandas      | Data wrangling & manipulation     | Excel + SQL            |
| Matplotlib  | Plotting & visual representation  | Drawing board          |

---



## ✅ 1. Importing
```python
import pandas as pd
```

---

## ✅ 2. Core Data Structures

### 🔹 Series
```python
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
```

### 🔹 DataFrame
```python
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
```

---

## ✅ 3. Reading/Writing Data

```python
pd.read_csv('file.csv')
df.to_csv('file.csv', index=False)

pd.read_excel('file.xlsx')
df.to_excel('file.xlsx', index=False)

pd.read_json('file.json')
pd.read_sql(query, connection)
```

---

## ✅ 4. Inspecting Data

```python
df.head(5)
df.tail(3)
df.info()
df.describe()
df.shape
df.columns
df.index
df.dtypes
```

---

## ✅ 5. Selecting Data

```python
df['column']                  # Series
df[['col1', 'col2']]          # DataFrame

df.iloc[0]                    # Row by position
df.loc[0]                     # Row by label/index
df.loc[:, 'col1']             # All rows, one col
df.loc[df['Age'] > 25]        # Filtering rows

df.at[0, 'col']               # Fast scalar access
df.iat[1, 2]                  # Fast scalar access by position
```

---

## ✅ 6. Boolean Indexing & Filtering

```python
df[df['Age'] > 30]
df[(df['Age'] > 20) & (df['Gender'] == 'F')]
df[~df['col'].isnull()]
```

---

## ✅ 7. Sorting

```python
df.sort_values('Age')
df.sort_values(['Age', 'Salary'], ascending=[True, False])
df.sort_index()
```

---

## ✅ 8. Modifying Data

```python
df['Age'] = df['Age'] + 1
df['NewCol'] = df['A'] * df['B']
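df.rename(columns={'old': 'new'}, inplace=True)   # modifies df in place
df.drop('col', axis=1)                            # returns a new DataFrame; df itself is unchanged
df.drop(index=3)                                  # same: reassign (df = ...) to keep the result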
```

---

## ✅ 9. Missing Data

```python
df.isnull()
df.notnull()

df.dropna()                  # Drop rows with NA
df.fillna(0)                 # Replace NA with 0
df.ffill()                   # Forward fill (fillna(method='ffill') is deprecated since pandas 2.1)
```
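
The same calls on a tiny frame, so the effect of each is visible. The data is invented; `interpolate` is included because the theory section lists it.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan, 3.0], "b": [nan, 5.0, 6.0]})

n_missing = df.isnull().sum()          # missing count per column
complete = df.dropna()                 # only the last row has no NaN
zero_filled = df.fillna(0)
forward = df.ffill()                   # the leading NaN in "b" has nothing before it, so it stays
interpolated = df["a"].interpolate()   # linear by default: 1.0, 2.0, 3.0
```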

---

## ✅ 10. GroupBy Operations

```python
df.groupby('col').mean(numeric_only=True)   # numeric_only skips text columns
df.groupby(['col1', 'col2']).agg({'A': 'sum', 'B': 'mean'})
```

---

## ✅ 11. Aggregations

```python
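# numeric_only=True skips text columns, which otherwise raise a TypeError in pandas 2.x
df.mean(numeric_only=True), df.std(numeric_only=True), df.median(numeric_only=True)
df.sum(), df.min(), df.max(), df.count()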
```

---

## ✅ 12. Value Counts & Unique

```python
df['col'].value_counts()
df['col'].unique()
df['col'].nunique()
```

---

## ✅ 13. Apply & Lambda

```python
import numpy as np

df['col'].apply(lambda x: x**2)   # element-wise on one column
df.apply(np.sum, axis=0)          # column-wise; axis=1 goes row-wise
```

---

## ✅ 14. Merging, Joining, Concatenation

```python
pd.concat([df1, df2])                          # Stack vertically
pd.concat([df1, df2], axis=1)                  # Stack side-by-side

pd.merge(df1, df2, on='key')                   # Inner join
pd.merge(df1, df2, how='left', on='key')       # Left join
```
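
What those joins return on two small frames. `employees` and `salaries` are hypothetical tables sharing a `key` column.

```python
import pandas as pd

employees = pd.DataFrame({"key": [1, 2, 3], "name": ["Alice", "Bob", "Cara"]})
salaries = pd.DataFrame({"key": [2, 3, 4], "salary": [50, 60, 70]})

# Inner join: only keys present in both frames (2 and 3).
inner = pd.merge(employees, salaries, on="key")

# Left join: every employee; key 1 has no salary, so it gets NaN.
left = pd.merge(employees, salaries, how="left", on="key")

# Vertical stack; ignore_index=True rebuilds a clean 0..n-1 index.
stacked = pd.concat([employees, employees], ignore_index=True)
```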

---

## ✅ 15. Pivot Table & Crosstab

```python
df.pivot_table(index='Gender', columns='Dept', values='Salary', aggfunc='mean')

pd.crosstab(df['Dept'], df['Gender'])
```
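
A runnable version with the same column names. The four rows are made up to show one averaged cell and one empty combination.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["F", "M", "F", "M"],
        "Dept": ["Eng", "Eng", "Ops", "Eng"],
        "Salary": [100, 90, 80, 110],
    }
)

# Mean salary per Gender x Dept; combinations with no rows become NaN.
table = df.pivot_table(index="Gender", columns="Dept", values="Salary", aggfunc="mean")

# Row counts per Dept x Gender; combinations with no rows become 0.
counts = pd.crosstab(df["Dept"], df["Gender"])
```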

---

## ✅ 16. Window Functions

```python
df['rolling_avg'] = df['col'].rolling(window=3).mean()
df['exp_avg'] = df['col'].ewm(span=3).mean()
```

---

## ✅ 17. Time Series

```python
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
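df.resample('M').sum()    # month-end bins; pandas 2.2+ spells this alias 'ME'
df.loc['2023-01']         # all rows in January 2023 (partial-string row selection needs .loc)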
```

---

## ✅ 18. String Operations

```python
df['col'].str.lower()
df['col'].str.contains('pattern')
df['col'].str.replace('a', 'b')
df['col'].str.extract(r'(\d+)')
```

---

## ✅ 19. Duplicates

```python
df.duplicated()
df.drop_duplicates()
```

---

## ✅ 20. Export Clean Dataset

```python
df.to_csv('clean.csv', index=False)
```

---

## ✅ 21. Data Types Conversion

```python
df['col'] = df['col'].astype(int)       # raises if the column has NaN; 'Int64' is the nullable alternative
df['date'] = pd.to_datetime(df['date'])
```

---

## ✅ 22. Memory Optimization

```python
df.info(memory_usage='deep')
df['col'] = df['col'].astype('category')
```
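
A quick way to see the saving, assuming a text column with few distinct values. The `city` column is hypothetical.

```python
import pandas as pd

# 3,000 rows but only 3 distinct strings: a good fit for category.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"] * 1000})

before = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")
after = df["city"].memory_usage(deep=True)

print(before, after)   # exact byte counts vary by pandas version
```

Category stores each distinct string once plus a small integer code per row, so the gain shrinks as the number of distinct values grows.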

---

## ✅ 23. Chaining Best Practices

```python
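# BAD: chained indexing may write to a temporary copy (SettingWithCopyWarning)
df[df['col'] > 0]['another'] = 1

# GOOD: a single .loc call selects and assigns in one step
df.loc[df['col'] > 0, 'another'] = 1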
```
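
The `.loc` form on a concrete frame, as a minimal check that the write actually lands. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({"col": [-1, 2, 3], "another": [0, 0, 0]})

# Row condition and target column go into one .loc call,
# so the assignment hits df itself rather than an intermediate object.
df.loc[df["col"] > 0, "another"] = 1
```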

---
