# 📝 Exploratory Data Analysis (EDA) Cheat Sheet

Exploratory Data Analysis (EDA) is the first step toward understanding a dataset: its shape, types, missing values, and distributions. This cheat sheet lists commonly used **pandas** functions for EDA, with **NumPy** imported alongside for numeric helpers.

---
## 📂 Loading Data
```python
import pandas as pd
import numpy as np

# Load CSV file
df = pd.read_csv('data.csv')

# Load Excel file
df = pd.read_excel('data.xlsx')

# Load JSON file
df = pd.read_json('data.json')
```
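
The loaders above accept many optional arguments. As a minimal sketch, the example below reads a CSV from an in-memory string so it runs without a file on disk; the column names (`id`, `date`, `price`) and the data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as 'data.csv'
csv_text = "id,date,price\n1,2024-01-01,9.5\n2,2024-01-02,\n"

# parse_dates converts the listed columns to datetime while loading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

n_rows, n_cols = df.shape                   # 2 rows, 3 columns
missing_prices = df["price"].isna().sum()   # the empty field is read as NaN
```

Parsing dates at load time avoids a separate `pd.to_datetime` step later.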

---
## 📌 Basic Information About Data
```python
# Display the first 5 rows
df.head()

# Display the last 5 rows
df.tail()

# Display shape (rows, columns)
df.shape

# Display column names
df.columns

# Display index
df.index

# Data types of each column
df.dtypes

# General info
df.info()
```

---
## 🔍 Summary Statistics
```python
# Summary of numerical columns
df.describe()

# Summary of categorical columns
df.describe(include=['O'])

# Count of unique values in each column
df.nunique()

# Get unique values in a column
df['column_name'].unique()
```

---
## 🔄 Handling Missing Values
```python
# Check for missing values
df.isna().sum()

# Drop rows with missing values
df.dropna(inplace=True)

# Fill missing values with a specific value
df.fillna(value=0, inplace=True)

# Fill missing values with the column mean
# (assign the result back; inplace=True on a selected column may not update df)
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
```
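
The calls above can be combined into a small end-to-end pass. This is a sketch on hypothetical toy data (`age` and `city` are illustrative column names), not a recommended imputation strategy for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap in each column
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["Oslo", "Rome", None, "Lima"],
})

# Count missing values per column before changing anything
missing_per_column = df.isna().sum()

# Fill the numeric gap with the column mean, assigning the result back
df["age"] = df["age"].fillna(df["age"].mean())

# Drop rows that still have a missing value in any column
df_clean = df.dropna()
```

Here `df_clean` is a new DataFrame, so the original `df` stays available for comparison.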

---
## 🔀 Handling Duplicates
```python
# Check for duplicates
df.duplicated().sum()

# Remove duplicate rows
df.drop_duplicates(inplace=True)
```

---
## 🔢 Value Counts & Frequency
```python
# Count occurrences of unique values in a column
df['column_name'].value_counts()

# Normalize (show percentage)
df['column_name'].value_counts(normalize=True)
```

---
## 📊 Correlation & Covariance
```python
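# Compute pairwise correlation between numeric columns
# (numeric_only=True skips text columns; pandas 2.0+ raises on them otherwise)
df.corr(numeric_only=True)

# Compute pairwise covariance between numeric columns
df.cov(numeric_only=True)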
```
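
As a small illustration on hypothetical data, the sketch below computes a correlation matrix for a DataFrame that also contains a text column. It assumes pandas 1.5 or later, where `numeric_only` is available on `DataFrame.corr`.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "label": ["a", "b", "c", "d"],
})

# Correlation matrix over numeric columns only; 'label' is left out
corr = df.corr(numeric_only=True)

# Pearson correlation between two specific columns
r = df["height"].corr(df["weight"])
```

Because `weight` is an exact linear function of `height` in this toy data, `r` comes out at 1 (up to floating-point error).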

---
## 🎭 Detecting Outliers
```python
# Identify outliers using IQR
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['column_name'] < (Q1 - 1.5 * IQR)) | (df['column_name'] > (Q3 + 1.5 * IQR))]
```
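
The IQR rule above can be tried on a tiny, hypothetical column. This sketch uses `Series.between` (inclusive on both ends by default) to build the mask; the 1.5 multiplier is the conventional choice, not a fixed requirement.

```python
import pandas as pd

# Hypothetical values with one obvious outlier
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 100]})

q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is flagged as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

inside = df["value"].between(lower, upper)
outliers = df[~inside]         # rows outside the fences
df_no_outliers = df[inside]    # rows kept for further analysis
```

Keeping both `outliers` and `df_no_outliers` makes it easy to inspect what was flagged before deciding whether to drop it.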

---
## 🔄 Data Transformation
```python
# Convert to lowercase
df['column_name'] = df['column_name'].str.lower()

# Apply a function to a column (for plain arithmetic, df['column_name'] * 2 is faster)
df['column_name'] = df['column_name'].apply(lambda x: x * 2)

# Rename columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
```

---
## 📊 Basic Visualizations
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
df['column_name'].hist()
plt.show()

# Boxplot
sns.boxplot(x=df['column_name'])
plt.show()

# Scatter plot
plt.scatter(df['column_x'], df['column_y'])
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
```

---
## 🔍 Grouping & Aggregation
```python
# Group by and aggregate
df.groupby('category_column')['value_column'].mean()

# Multiple aggregations
df.groupby('category_column').agg({'value_column': ['mean', 'sum', 'count']})
```
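
Passing a dict of lists to `agg` produces multi-level column names. As an alternative sketch, named aggregation gives flat, readable columns; the data and the output names (`mean_value`, `total_value`, `n_rows`) are hypothetical.

```python
import pandas as pd

# Hypothetical data with two categories
df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "value": [10, 20, 30, 40, 50],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby("category").agg(
    mean_value=("value", "mean"),
    total_value=("value", "sum"),
    n_rows=("value", "count"),
)
```

The result is indexed by `category`, with one column per named aggregation.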

---
## 🔎 Filtering Data
```python
# Filter rows based on a condition
df_filtered = df[df['column_name'] > 50]

# Filter rows with multiple conditions
df_filtered = df[(df['column1'] > 50) & (df['column2'] == 'A')]
```
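
Boolean masks are not the only way to filter. The sketch below, on hypothetical data, shows the same multi-condition filter alongside `isin` for membership tests and `query` for a string-based expression.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "score": [40, 55, 70, 90],
    "group": ["A", "B", "A", "C"],
})

# Boolean mask with two conditions (each condition needs parentheses)
high_a = df[(df["score"] > 50) & (df["group"] == "A")]

# Keep rows whose value is in a list of allowed values
in_groups = df[df["group"].isin(["A", "C"])]

# Same filter as high_a, written as a query string
via_query = df.query("score > 50 and group == 'A'")
```

`query` can be easier to read for long conditions, while masks are easier to build programmatically.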

---
## 🔀 Sorting Data
```python
# Sort by a column in ascending order (returns a new DataFrame; df is unchanged)
df.sort_values(by='column_name', ascending=True)

# Sort by multiple columns
df.sort_values(by=['column1', 'column2'], ascending=[True, False])
```

---
## 🔁 Pivot Tables
```python
# Create a pivot table
df.pivot_table(values='value_column', index='category_column', aggfunc='mean')
```
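
A pivot table becomes more useful with a second dimension. This sketch adds a `columns` argument and fills empty cells with 0; the data and column names (`region`, `product`, `sales`) are hypothetical.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["x", "y", "x", "y"],
    "sales": [100, 150, 200, 50],
})

# Rows are regions, columns are products, cells are summed sales
table = df.pivot_table(
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0,
)
```

Individual cells can then be read with `table.loc["North", "x"]`.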

---
## 🚀 Ready to Explore!
This cheat sheet provides a **quick reference** for EDA techniques in Python. Happy analyzing! 😊
