## **Understanding Your Data** | [👆](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2004%20Understanding%20Your%20Data%20-%20Descriptive%20Stats)

## **Univariate Analysis** | [👆](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2005%20Univariate%20Analysis) 

## **Bivariate Analysis** | [👆](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2006%20Bivariate%20Analysis)

## **Pandas Profiling** | [👆](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2007%20Pandas%20Profiling)
---------------------------------------------------------------------------------------------------------------------------------

## *EDA on Health and Sleep Data* | [👆](https://www.kaggle.com/code/adilshamim8/eda-on-health-and-sleep-data)

## *EDA on Titanic Dataset* | [👆](https://www.kaggle.com/code/adilshamim8/eda-on-titanic-dataset)

## *EDA on House Prices Dataset* | [👆](https://www.kaggle.com/code/pmarcelino/comprehensive-data-exploration-with-python)

## *EDA on Heart Disease Dataset* | [👆](https://www.kaggle.com/code/kralmachine/analyzing-the-heart-disease) 

## *Olympic Data Set Analysis* | [👆](https://www.kaggle.com/code/adilshamim8/olympic-data-set-analysis) 

---------------------------------------------------------------------------------------------------------------------------------

# 📊 Exploratory Data Analysis (EDA) Notes

## 📌 What is EDA?
Exploratory Data Analysis (EDA) is the process of analyzing and summarizing datasets to understand their main characteristics using **visualization** and **statistical methods**.

### 🎯 Why is EDA Important?
- Helps understand data distribution and patterns.
- Identifies missing values, duplicates, and outliers.
- Finds relationships between features for better modeling.
- Surfaces data-quality problems so they can be fixed before training Machine Learning models.

---

## 🛠️ Steps of EDA

### 1️⃣ Load the Data
First, we load the dataset using `pandas`:
```python
import pandas as pd

df = pd.read_csv("data.csv")  # Load dataset
df.head()  # View first 5 rows
```
### 2️⃣ Check Data Structure
Understanding the basic structure:
```python
df.info()  # Overview of dataset
df.describe()  # Summary statistics
df.shape  # Rows and columns
df.columns  # Column names
```
### 3️⃣ Handling Missing Values
Check for missing data:
```python
df.isnull().sum()  # Count missing values in each column

# Option 1: fill numeric columns with their column mean
df.fillna(df.mean(numeric_only=True), inplace=True)
# Option 2: drop any rows that still contain missing values
df.dropna(inplace=True)
```
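
Filling with the mean only applies to numeric columns, and the mean is sensitive to outliers. As a minimal sketch (the toy DataFrame and the column names `age` and `city` are made up for illustration), numeric columns could be filled with the median and categorical ones with the most frequent value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a categorical column
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
})

filled = toy.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Median is less sensitive to outliers than the mean
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        # mode() returns a Series; take its first (most frequent) value
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

Which strategy is appropriate depends on the column and on why the values are missing, so treat this as one possible default rather than a rule.
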
### 4️⃣ Data Types & Duplicates
Check for data types and duplicates:
```python
df.dtypes  # Check data types
df.duplicated().sum()  # Find duplicate rows
df.drop_duplicates(inplace=True)  # Remove duplicates
```
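
When `df.dtypes` shows a numeric or date column stored as text, it usually needs converting before any statistics are meaningful. A small sketch, assuming hypothetical `price` and `date` columns that were read in as strings:

```python
import pandas as pd

# Hypothetical toy data: numbers and dates stored as strings
toy = pd.DataFrame({
    "price": ["10.5", "20", "n/a"],
    "date": ["2024-01-05", "2024-02-10", "2024-03-15"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising
toy["price"] = pd.to_numeric(toy["price"], errors="coerce")
toy["date"] = pd.to_datetime(toy["date"])

print(toy.dtypes)
```

Note that coercion creates new missing values (here the `"n/a"` entry), so it is worth re-checking `isnull().sum()` afterwards.
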
### 5️⃣ Data Distribution
Checking how numerical data is distributed:
```python
import matplotlib.pyplot as plt
import seaborn as sns

df.hist(figsize=(10, 6))  # Histogram for numerical features
plt.show()

sns.boxplot(x=df['column_name'])  # Boxplot to detect outliers
plt.show()
```
### 6️⃣ Correlation Analysis
Finding relationships between numerical variables:
```python
corr = df.corr(numeric_only=True)  # Correlation matrix of numeric columns only
sns.heatmap(corr, annot=True, cmap="coolwarm")  # Visualize correlations
plt.show()
```
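
A heatmap gets hard to read with many columns. One possible complement, sketched here on made-up data (the columns `x`, `y`, `z` are hypothetical), is to flatten the correlation matrix into a list of column pairs ranked by absolute strength:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y is exactly 2 * x, z is only weakly related
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 1, 4, 2, 3],
})

corr = toy.corr(numeric_only=True)

# Keep the upper triangle only, so each pair appears once and the diagonal is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Keep in mind that Pearson correlation (the default of `corr`) only measures linear relationships.
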
### 7️⃣ Categorical Data Analysis
Understanding categorical features:
```python
df['category_column'].value_counts().plot(kind='bar')  # Bar chart of category counts
plt.show()
```
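
Counts can also be inspected as numbers rather than a plot. The sketch below, using a made-up Titanic-style table (the `sex` and `survived` columns are assumptions for illustration), shows category shares and a cross-tabulation of two categorical columns:

```python
import pandas as pd

# Hypothetical toy data with two categorical columns
toy = pd.DataFrame({
    "sex": ["m", "f", "f", "m", "f"],
    "survived": [0, 1, 1, 0, 0],
})

# Share of each category instead of raw counts
shares = toy["sex"].value_counts(normalize=True)

# Count of every combination of the two columns
table = pd.crosstab(toy["sex"], toy["survived"])

print(shares)
print(table)
```
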
### 8️⃣ Outlier Detection
Using the IQR rule, which keeps only values within 1.5 × IQR of the quartiles:
```python
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
df_no_outliers = df[(df['column_name'] >= Q1 - 1.5 * IQR) & (df['column_name'] <= Q3 + 1.5 * IQR)]
```
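
To make the rule concrete, here is a small worked sketch on made-up values. The helper name `iqr_bounds` and its parameter `k` are hypothetical, introduced only for this example:

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy values with one obvious outlier
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

lower, upper = iqr_bounds(values)
outliers = values[(values < lower) | (values > upper)]

print(lower, upper)
print(outliers.tolist())
```

Flagging outliers this way before dropping them makes it easier to check whether they are data errors or genuine extreme values.
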
### 9️⃣ Feature Engineering (Optional)
Creating new meaningful features:
```python
df['new_feature'] = df['existing_feature'] ** 2  # Example: Squaring a feature
```
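
Squaring is only one kind of transformation. Two other common patterns, sketched on made-up data (the column names and bin edges are arbitrary assumptions), are combining related columns and binning a continuous one:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "siblings": [1, 0, 3],
    "parents": [0, 0, 2],
    "age": [8, 35, 67],
})

# Combine related counts into one feature (+1 counts the person themselves)
toy["family_size"] = toy["siblings"] + toy["parents"] + 1

# Bin a continuous feature into labelled ranges: (0, 18], (18, 60], (60, 120]
toy["age_group"] = pd.cut(
    toy["age"],
    bins=[0, 18, 60, 120],
    labels=["child", "adult", "senior"],
)

print(toy)
```
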