# Daily Blog #12 – Pandas Cheatsheet
### May 12, 2025

---

## 1. Setup & Import

```python
import pandas as pd
import numpy as np
```

---

## 2. Creating DataFrames

```python
# From dictionary
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

# From list of lists
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# From CSV/Excel
df = pd.read_csv('file.csv')
df = pd.read_excel('file.xlsx')

# Preview
df.head()
df.tail()
df.sample(3)
```

---

## 3. Inspecting & Summarizing

```python
df.shape         # (rows, cols)
df.info()        # Column types & non-nulls
df.describe()    # Summary stats (numerical)
df.columns       # Index of columns
df.dtypes        # Data types
df.index         # Row index
df.nunique()     # Unique values per column
```

---

## 4. Selecting Data (Rows & Columns)

```python
df['Name']                   # Column as Series
df[['Name', 'Age']]          # Multiple columns
df.iloc[0]                   # First row (by position)
df.loc[0]                    # Row with index label 0 (first row only on a default index)
df.loc[:, 'Name']            # All rows, Name column
df.loc[0:2, ['Name', 'Age']] # Labels 0 through 2 (end inclusive), selected cols
```
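
The difference between `.loc` and `.iloc` is easiest to see when the index is not the default `0, 1, 2, ...`. Here is a minimal sketch, assuming a made-up `people` frame with labels starting at 10 (the data and variable names are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; the non-default index makes labels and positions differ
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Cara', 'Dan'], 'Age': [25, 30, 35, 40]},
    index=[10, 11, 12, 13],
)

first_by_position = people.iloc[0]   # row at position 0 (its label is 10)
row_by_label = people.loc[10]        # row whose index label is 10
label_slice = people.loc[10:12]      # label slice: end label 12 is included
position_slice = people.iloc[0:2]    # position slice: end position 2 is excluded
```

With this data, `label_slice` has three rows while `position_slice` has two, which is the usual source of off-by-one surprises.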

---

## 5. Filtering Rows

```python
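df[df['Age'] > 25]                               # Boolean mask on one condition
df[(df['Age'] > 20) & (df['Name'] == 'Alice')]   # Combine with & / |, each condition in parentheses
df.query('Age > 25 and Name == "Bob"')           # Same idea as a string expression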
```
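
A small worked example of both filtering styles, assuming a three-row sample frame (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cara'], 'Age': [25, 30, 35]})

# Each condition needs its own parentheses because & binds tighter than > and ==
mask = (df['Age'] > 20) & (df['Name'] == 'Alice')
by_mask = df[mask]

# query() evaluates the same kind of condition from a string
by_query = df.query('Age > 25 and Name == "Bob"')
```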

---

## 6. Modifying Data

```python
df['Age'] = df['Age'] + 1            # Add 1 to every value (vectorized)
df['Adult'] = df['Age'] > 18         # New boolean column
# New column from a per-value Python function
df['Category'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
```
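
`apply` with a lambda calls Python once per value. For a simple two-way split like the one above, `np.where` is a vectorized alternative. This is a sketch on the two-row sample from section 2, not the only way to do it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once
df['Category'] = np.where(df['Age'] < 30, 'Young', 'Old')
```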

---

## 7. Handling Missing Values

```python
df.isnull()                   # Boolean mask
df.isnull().sum()             # Count missing per column
df.dropna()                   # Drop rows with NaN
df.fillna(0)                  # Replace NaN with value
df.fillna(df.mean(numeric_only=True))  # Fill with column mean
```
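
To see what these calls return, here is a minimal sketch on a made-up `scores` frame with one missing age (names and values are illustrative). Note that `fillna` returns a new DataFrame and leaves the original untouched unless you reassign it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Cara'],
    'Age': [20.0, np.nan, 40.0],
})

missing_per_column = scores.isnull().sum()   # Name: 0, Age: 1

# mean(numeric_only=True) skips the text column; fillna matches means to columns by name
filled = scores.fillna(scores.mean(numeric_only=True))
```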

---

## 8. Grouping & Aggregation

```python
df.groupby('Category')['Age'].mean()                              # Mean age per category
df.groupby(['Category', 'Gender']).agg({'Age': ['mean', 'max']})  # Assumes a 'Gender' column exists
```
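
The dict form of `agg` produces two-level column names. Named aggregation gives flat, readable columns instead. A sketch, assuming a sample frame that has the `Category` and `Gender` columns (the data and output column names are made up):

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    'Category': ['Young', 'Young', 'Old', 'Old'],
    'Gender': ['F', 'M', 'F', 'F'],
    'Age': [22, 28, 35, 45],
})

mean_age = people.groupby('Category')['Age'].mean()

# Named aggregation: new_column=(source_column, function)
summary = (
    people.groupby(['Category', 'Gender'])
    .agg(mean_age=('Age', 'mean'), max_age=('Age', 'max'))
    .reset_index()   # turn the group keys back into ordinary columns
)
```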

---

## 9. Sorting & Ordering

```python
df.sort_values('Age')                  # Ascending
df.sort_values('Age', ascending=False) # Descending
df.sort_values(by=['Category', 'Age']) # By Category, then Age within each category
```

---

## 10. Merging & Joining

```python
pd.merge(df1, df2, on='ID')                     # Inner join
pd.merge(df1, df2, how='left', on='ID')         # Left join
pd.concat([df1, df2])                           # Row-wise concat
pd.concat([df1, df2], axis=1)                   # Column-wise concat
```
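
The snippets above assume `df1` and `df2` already exist. Here is a self-contained sketch with two made-up frames that share an `ID` column, to show which rows each join keeps:

```python
import pandas as pd

# Hypothetical sample data: IDs 2 and 3 appear in both frames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Score': [80, 90, 70]})

inner = pd.merge(df1, df2, on='ID')              # only IDs present in both frames
left = pd.merge(df1, df2, how='left', on='ID')   # every row of df1; Score is NaN where unmatched

# ignore_index=True renumbers the rows instead of repeating 0, 1, 2 twice
stacked = pd.concat([df1, df2], ignore_index=True)
```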

---

## 11. Renaming & Dropping

```python
df.rename(columns={'Name': 'FullName'}, inplace=True)
df.drop(columns=['Age'], inplace=True)
df.drop(index=0, inplace=True)
```

---

## 12. Working with Strings

```python
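df['Name'].str.lower()                        # Lowercase every value
df['Name'].str.contains('bob', case=False)    # Case-insensitive match, returns a boolean Series
df['Name'].str.replace('Alice', 'Alicia')     # Replace substring, returns a new Series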
```
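
One thing to watch: string methods pass missing values through, so a mask built with `str.contains` can contain missing entries and fail when used as a filter. A sketch, assuming a made-up Series with one missing name, using the `na` parameter to choose what missing values become:

```python
import pandas as pd

# Hypothetical sample data with a missing name
names = pd.Series(['Alice', 'BOB', None])

lowered = names.str.lower()   # the missing value stays missing

# na=False makes missing values count as "no match", so the mask is purely boolean
mask = names.str.contains('bob', case=False, na=False)
matches = names[mask]
```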

---

## 13. DateTime Operations

```python
df['Date'] = pd.to_datetime(df['Date'])    # Parse strings into datetime64 values
df['Year'] = df['Date'].dt.year            # e.g. 2025
df['Month'] = df['Date'].dt.month          # 1 to 12
df['Weekday'] = df['Date'].dt.day_name()   # e.g. 'Monday'
```
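
A quick worked example, assuming a made-up `events` frame whose `Date` column starts out as ISO-formatted strings:

```python
import pandas as pd

# Hypothetical sample data
events = pd.DataFrame({'Date': ['2025-05-12', '2025-01-01']})

events['Date'] = pd.to_datetime(events['Date'])   # the .dt accessor only works on datetime columns
events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
events['Weekday'] = events['Date'].dt.day_name()
```

Here `events['Weekday']` comes out as `'Monday'` and `'Wednesday'`.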

---

## 14. Useful One-Liners

```python
df.value_counts()                   # Frequency of each unique row
df['col'].value_counts()            # Frequency of each value in one column
df.duplicated().sum()               # Count duplicate rows (first occurrence not counted)
df.drop_duplicates(inplace=True)    # Remove duplicate rows, keeping the first
```
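
To make the counts concrete, here is a sketch on a made-up four-row frame where one row appears three times (column names are placeholders):

```python
import pandas as pd

# Hypothetical sample data: the row ('a', 1) appears three times
df = pd.DataFrame({'col': ['a', 'b', 'a', 'a'], 'n': [1, 2, 1, 1]})

counts = df['col'].value_counts()       # a: 3, b: 1
n_duplicates = df.duplicated().sum()    # 2, because the first ('a', 1) is not flagged
deduped = df.drop_duplicates()          # keeps the first copy of each row
```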

---

## 15. Exporting Data

```python
df.to_csv('output.csv', index=False)      # index=False leaves out the row index
df.to_excel('output.xlsx', index=False)   # Needs an Excel engine installed, e.g. openpyxl
```

---

## Common Errors to Avoid

| Mistake                                                     | Fix                                                              |
| ----------------------------------------------------------- | ---------------------------------------------------------------- |
| `SettingWithCopyWarning` when assigning to a slice          | Assign through `.loc` on the original, or take a `.copy()` first |
| Chained indexing (e.g., `df[df['Age'] > 20]['Name'] = ...`) | Use one `.loc` call: `df.loc[df['Age'] > 20, 'Name'] = ...`      |
| Data type mismatch                                          | Always check `df.dtypes` after reading a file                    |
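
The chained-indexing row is the one that bites most often, so here is a minimal sketch of the fix on the two-row sample from section 2 (the replacement value `'Robert'` is made up):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Chained indexing would assign to a temporary copy and may leave df unchanged:
# df[df['Age'] > 25]['Name'] = 'Robert'

# One .loc call selects rows and column together and writes into df itself
df.loc[df['Age'] > 25, 'Name'] = 'Robert'
```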

---

## BONUS: Power Tools

* `pd.pivot_table(df, index='Gender', columns='Category', values='Age', aggfunc='mean')` – spreadsheet-style pivot with aggregation
* `df.melt()` – unpivot wide to long format
* `df.map()` – element-wise functions on a DataFrame (called `df.applymap()` before pandas 2.1, where that name was deprecated)
* `df.explode()` – turn list-like column entries into rows
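
A short sketch of three of these on made-up frames (all data, and the `Subject`, `Score`, and `Tags` names, are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample data for the pivot
people = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M'],
    'Category': ['Young', 'Young', 'Old', 'Old'],
    'Age': [22, 28, 40, 50],
})
# Rows = Gender, columns = Category, cells = mean Age
pivot = pd.pivot_table(people, index='Gender', columns='Category',
                       values='Age', aggfunc='mean')

# melt: one row per (Name, Subject) pair instead of one column per subject
wide = pd.DataFrame({'Name': ['Alice'], 'Math': [90], 'Art': [80]})
long = wide.melt(id_vars='Name', var_name='Subject', value_name='Score')

# explode: one row per list element; the original index value is repeated
tags = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Tags': [['a', 'b'], ['c']]})
exploded = tags.explode('Tags')
```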
