# Auto-mpg Dataset EDA

This notebook performs Exploratory Data Analysis (EDA) on the Auto-mpg dataset.

## Steps:
The dataset is loaded first; the numbered sections below then cover:
1. Identify missing values
2. Estimate skewness and kurtosis
3. Correlation heatmap
4. Scatter plots for different parameters
5. Replace categorical values with numerical values
6. Summary of findings


In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import skew, kurtosis

# Load the dataset
df = pd.read_csv('Unit02 auto-mpg (1).csv')
df.head()

## 1. Identify missing values

In [ ]:
df.info()
df.isnull().sum()
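
One caveat: `isnull()` only counts values that pandas has already parsed as NaN. Some distributions of the Auto-mpg data mark missing horsepower readings with a `?` string, which leaves the column as `object` dtype and reports zero missing values. Whether this particular file does so is an assumption; the toy frame below stands in for it and shows how such placeholders could be surfaced.

```python
import pandas as pd

# Toy stand-in for the real file (assumption: missing values appear as '?').
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": ["130", "?", "95", "?"],
})

# isnull() sees no missing values because '?' is an ordinary string.
naive_missing = int(toy["horsepower"].isnull().sum())

# Coerce to numeric: anything unparseable becomes NaN and is then counted.
toy["horsepower"] = pd.to_numeric(toy["horsepower"], errors="coerce")
true_missing = int(toy["horsepower"].isnull().sum())

print(naive_missing, true_missing)
```

If the real file uses the same placeholder, passing `na_values='?'` to `pd.read_csv` would handle it at load time instead.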

## 2. Estimate skewness and kurtosis

In [ ]:
numerical_cols = df.select_dtypes(include=[np.number]).columns
skewness = df[numerical_cols].skew()
kurt = df[numerical_cols].kurtosis()
print('Skewness:\n', skewness)
print('\nKurtosis:\n', kurt)
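
When interpreting these numbers, it helps to know that pandas reports bias-corrected sample estimates and that its kurtosis is excess kurtosis (a normal distribution scores 0). The `skew` and `kurtosis` functions in `scipy.stats` default to the biased estimators, so the two libraries agree only when `bias=False` is passed. The sketch below checks this on a synthetic right-skewed sample, not on the Auto-mpg data.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

# Synthetic right-skewed sample (not the Auto-mpg data).
rng = np.random.default_rng(0)
sample = pd.Series(rng.exponential(scale=1.0, size=500))

pandas_skew = sample.skew()
pandas_kurt = sample.kurtosis()  # excess kurtosis, bias-corrected

# scipy matches pandas only with bias=False; fisher=True means "excess".
scipy_skew = skew(sample, bias=False)
scipy_kurt = kurtosis(sample, fisher=True, bias=False)

print(f"skew:     pandas={pandas_skew:.4f}, scipy={scipy_skew:.4f}")
print(f"kurtosis: pandas={pandas_kurt:.4f}, scipy={scipy_kurt:.4f}")
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution.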

## 3. Correlation Heatmap

In [ ]:
plt.figure(figsize=(10,7))
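# Correlate numeric columns only; text columns make df.corr() fail in recent pandas versions
sns.heatmap(df.select_dtypes(include=[np.number]).corr(), annot=True, cmap='coolwarm')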
plt.title('Correlation Heatmap')
plt.show()
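
To describe the strongest correlations in words, it can be easier to rank them numerically than to read them off the heatmap. The following is a minimal sketch on synthetic data; the column names `mpg`, `weight`, and `car name` mirror the usual Auto-mpg header but are assumptions about this file, and `noise` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in; column names mirror the usual Auto-mpg header (assumed).
rng = np.random.default_rng(1)
weight = rng.normal(3000, 800, size=200)
toy = pd.DataFrame({
    "mpg": 45 - 0.007 * weight + rng.normal(0, 2, size=200),
    "weight": weight,
    "noise": rng.normal(0, 1, size=200),
    "car name": ["car"] * 200,  # non-numeric column, excluded below
})

# Restrict to numeric columns before computing Pearson correlations.
corr = toy.select_dtypes(include=[np.number]).corr()

# Rank the other variables by absolute correlation with mpg.
ranked = corr["mpg"].drop("mpg").sort_values(key=abs, ascending=False)
print(ranked)
```

Sorting by absolute value keeps strong negative relationships at the top of the list alongside strong positive ones.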

## 4. Scatter plots for different parameters

In [ ]:
sns.pairplot(df[numerical_cols])
plt.show()

## 5. Replace categorical values with numerical values (e.g., USA = 1, Europe = 2, Japan = 3)

In [ ]:
# Check unique values in 'origin' column
print(df['origin'].unique())

# Replace origin labels with numeric codes (assumes the labels are these exact strings)
df['origin'] = df['origin'].replace({'USA': 1, 'Europe': 2, 'Japan': 3})

# If 'origin' is already numeric, replace() leaves it unchanged; in that case, document which code means which region.
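
Because `replace()` silently leaves unmatched labels untouched, a misspelled or unexpected label would go unnoticed. A possible alternative is `map()`, which turns unmatched labels into NaN so they can be checked before committing the change. The sketch below uses a toy frame, and the label strings are the same assumed ones as in the cell above.

```python
import pandas as pd

# Mapping taken from the cell above; the labels are an assumption about the file.
origin_codes = {"USA": 1, "Europe": 2, "Japan": 3}

toy = pd.DataFrame({"origin": ["USA", "Japan", "Europe", "USA"]})

# map() turns unmapped labels into NaN, which makes typos easy to spot.
encoded = toy["origin"].map(origin_codes)
unmapped = toy.loc[encoded.isna(), "origin"].unique()

# Only overwrite the column when every label was recognised.
if len(unmapped) == 0:
    toy["origin"] = encoded.astype(int)

print(toy["origin"].tolist())
print("Unmapped labels:", list(unmapped))
```

Note that `map()` would also turn an already-numeric column into NaN, so this check is only appropriate when the column holds strings.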

## 6. Summary of Findings

- Missing values: [to be filled after running code]
- Skewness and kurtosis indicate [interpret based on output].
- Correlation heatmap shows [describe strongest correlations].
- Scatter plots reveal [describe relationships].
- Categorical values in 'origin' replaced with numeric codes.

---
> Run each cell and fill in the findings based on the output. Save your results for your e-portfolio and seminar discussion.
