
# Lecture 01: Exploratory Data Analysis (EDA) using Python
Exploratory Data Analysis (EDA) is the step in data analysis where we examine a dataset's structure, patterns, trends, and relationships using summary statistics and visualizations, before any modeling is attempted.

This lecture demonstrates how to perform EDA using Python libraries such as **pandas**, **NumPy**, **Matplotlib**, and **Seaborn**.

## Step 1: Importing Required Libraries

In [None]:
# Importing essential libraries for EDA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings as wr

# Ignore warnings for cleaner output
wr.filterwarnings('ignore')


✅ Libraries imported successfully.

## Step 2: Reading the Dataset

In [None]:
# Reading dataset (adjust the path as needed)
df = pd.read_csv("/content/WineQT.csv")

# Display first 5 rows
df.head()


📊 Shows the first five rows of the dataset.

## Step 3: Analyzing the Data

In [None]:
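# Shape of the dataset as (rows, columns)
# (print explicitly: a notebook cell only displays its last expression)
print(df.shape)

# Column dtypes and non-null counts (info() prints its report directly)
df.info()

# Descriptive statistics, transposed so that each feature is a row
print(df.describe().T)

# List of column names
print(df.columns.tolist())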


📈 Overview of dataset size, data types, and statistics.

## Step 4: Checking Missing Values

In [None]:
# Checking for missing values
df.isnull().sum()


📊 Shows the number of missing values in each column.
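Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch on a made-up toy frame (the values and the `toy` and `missing_report` names are illustrative assumptions, not taken from the wine dataset) that reports both the count and the percentage of missing values per column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; values are invented for illustration
toy = pd.DataFrame({
    "alcohol": [9.4, np.nan, 10.5, 11.2],
    "pH": [3.51, 3.20, np.nan, np.nan],
    "quality": [5, 6, 5, 7],
})

# isnull() gives a boolean frame: sum() counts the True values,
# mean() gives the fraction of True values per column
missing_count = toy.isnull().sum()
missing_pct = toy.isnull().mean() * 100

missing_report = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(missing_report)
```

The same two lines can be applied to `df` directly; columns with a high percentage are candidates for imputation or removal.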

## Step 5: Checking for Duplicate or Unique Values

In [None]:
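# Number of fully duplicated rows
print(df.duplicated().sum())

# Number of unique values in each column
print(df.nunique())


📈 Displays the number of duplicated rows and the count of unique values per column.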
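One caveat when looking for duplicates: if the data carries a unique row identifier, no two rows are ever fully identical, so a whole-row check reports zero duplicates even when the measurements repeat. The sketch below assumes a hypothetical identifier column named `Id` on a made-up toy frame and compares a whole-row check with one restricted to the feature columns.

```python
import pandas as pd

# Toy frame; 'Id' is a hypothetical identifier column, values are invented
toy = pd.DataFrame({
    "Id": [0, 1, 2, 3],
    "alcohol": [9.4, 9.8, 9.4, 10.5],
    "quality": [5, 6, 5, 7],
})

# Whole-row comparison: the unique Id hides the repeated measurements
full_row_dupes = toy.duplicated().sum()

# Compare only the feature columns, ignoring the identifier
feature_cols = [c for c in toy.columns if c != "Id"]
feature_dupes = toy.duplicated(subset=feature_cols).sum()

# Keep the first occurrence of each repeated measurement
deduped = toy.drop_duplicates(subset=feature_cols).reset_index(drop=True)
print(full_row_dupes, feature_dupes, len(deduped))
```

Whether repeated measurements should actually be dropped depends on how the data was collected, so treat the last step as optional.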

## Step 6: Univariate Analysis

In [None]:
# Count plot of wine quality
quality_counts = df['quality'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(quality_counts.index, quality_counts, color='deeppink')
plt.title('Count Plot of Quality')
plt.xlabel('Quality')
plt.ylabel('Count')
plt.show()


📊 Displays the count of wine samples per quality rating.
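The bar heights can also be read off as numbers, which helps when judging class imbalance. A minimal sketch on a made-up `quality` series (the values are illustrative only) that prints both counts and relative shares, sorted by rating:

```python
import pandas as pd

# Invented ratings standing in for df['quality']
quality = pd.Series([5, 5, 6, 6, 6, 7, 5, 6], name="quality")

# Absolute counts per rating, ordered by rating rather than by frequency
counts = quality.value_counts().sort_index()

# Relative shares per rating (fractions that sum to 1)
shares = quality.value_counts(normalize=True).sort_index()

print(counts)
print(shares)
```

Sorting by index keeps the ratings in their natural order, which is also a reasonable order for the bars in the plot.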

In [None]:
# Histograms with a KDE overlay for every numerical column
sns.set_style("darkgrid")
numerical_columns = df.select_dtypes(include="number").columns

# Two plots per row, so only half as many grid rows as columns are needed
n_rows = (len(numerical_columns) + 1) // 2
plt.figure(figsize=(14, n_rows * 3))
for idx, feature in enumerate(numerical_columns, 1):
    plt.subplot(n_rows, 2, idx)
    sns.histplot(df[feature], kde=True)
    plt.title(f"{feature} | Skewness: {df[feature].skew():.2f}")
plt.tight_layout()
plt.show()


📈 Shows the distribution and skewness for each numerical feature.
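The skewness values in the plot titles can be turned into a quick shortlist of features that may benefit from a transformation. The sketch below uses a made-up toy frame and a cutoff of 1 on the absolute skewness; that cutoff is a common rule of thumb and an assumption here, not something prescribed by the lecture. It also shows that a `log1p` transform reduces the skew of the right-skewed toy column.

```python
import numpy as np
import pandas as pd

# Toy frame with one symmetric and one right-skewed column (invented values)
toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "right_skewed": [1, 1, 1, 2, 2, 3, 4, 8, 20, 50],
})

# Sample skewness per column: 0 for symmetric data, positive for a long right tail
skew = toy.skew()

# Flag columns whose absolute skewness exceeds the assumed cutoff of 1
highly_skewed = skew[skew.abs() > 1].index.tolist()
print(highly_skewed)

# log1p compresses large values, which pulls in the right tail
log_skew = np.log1p(toy["right_skewed"]).skew()
print(skew["right_skewed"], log_skew)
```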

In [None]:
# Swarm plot of alcohol by quality (individual points make outliers visible)
plt.figure(figsize=(10, 8))
sns.swarmplot(x="quality", y="alcohol", data=df, palette='viridis')
plt.title('Swarm Plot for Quality and Alcohol')
plt.xlabel('Quality')
plt.ylabel('Alcohol')
plt.show()


📉 Shows every sample's Alcohol value grouped by Quality, which makes isolated points (potential outliers) easy to spot.

## Step 7: Bivariate Analysis

In [None]:
# Pair Plot to show relationships between variables
sns.set_palette("Pastel1")
sns.pairplot(df)
plt.suptitle('Pair Plot for DataFrame', y=1.02)
plt.show()


📊 Displays pairwise relationships and distributions.

In [None]:
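# Violin Plot for Alcohol vs Quality
# Plot from a copy with 'quality' cast to str so the palette keys match,
# leaving df['quality'] numeric for the correlation step later on
plot_df = df.assign(quality=df['quality'].astype(str))
plt.figure(figsize=(10, 8))
sns.violinplot(x="quality", y="alcohol", data=plot_df,
               order=sorted(plot_df['quality'].unique()), palette={
    '3': 'lightcoral', '4': 'lightblue', '5': 'lightgreen',
    '6': 'gold', '7': 'lightskyblue', '8': 'lightpink'}, alpha=0.7)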
plt.title('Violin Plot for Quality and Alcohol')
plt.xlabel('Quality')
plt.ylabel('Alcohol')
plt.show()


📈 Shows the density and distribution of Alcohol across quality levels.
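The shapes in the violin plot can be backed up with a per-group summary table. A minimal sketch, using a made-up toy frame in place of `df`, that aggregates Alcohol by Quality:

```python
import pandas as pd

# Invented values standing in for the wine data
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7, 7],
    "alcohol": [9.4, 9.8, 10.2, 10.8, 11.5, 12.1],
})

# One row per quality level with the size, centre, and spread of alcohol
summary = toy.groupby("quality")["alcohol"].agg(["count", "mean", "median", "std"])
print(summary)
```

The `count` column is worth checking first: a violin drawn from only a handful of samples can look smoother than the data justifies.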

In [None]:
# Box Plot for Alcohol vs Quality
sns.boxplot(x='quality', y='alcohol', data=df)
plt.title('Box Plot for Alcohol vs Quality')
plt.show()


📉 Displays the spread, median, and outliers of Alcohol per Quality.
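By default, a box plot draws as outliers the points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The same rule can be applied numerically to list those points. The sketch below uses a made-up `alcohol` series; the values are illustrative only.

```python
import pandas as pd

# Invented alcohol values with one unusually high reading
alcohol = pd.Series([9.4, 9.8, 9.8, 10.0, 10.5, 11.0, 11.2, 14.9], name="alcohol")

# First and third quartiles, and the interquartile range between them
q1, q3 = alcohol.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles, matching the default whisker rule
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside the fences are the ones a box plot marks individually
outliers = alcohol[(alcohol < lower) | (alcohol > upper)]
print(outliers)
```

To mirror the per-quality boxes, the same computation can be run inside a `groupby("quality")`, since each box uses the quartiles of its own group.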

## Step 8: Multivariate Analysis

In [None]:
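# Correlation Heatmap
# Ensure 'quality' is numeric so it is included in the correlation matrix
# (df.corr() fails on string columns in recent pandas versions)
corr = df.assign(quality=pd.to_numeric(df['quality'])).corr()
plt.figure(figsize=(15, 10))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='Pastel2', linewidths=2)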
plt.title('Correlation Heatmap')
plt.show()


📊 Shows pairwise correlations between numerical features.
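A full heatmap can be hard to scan when the main question is which features move with the target. The sketch below, on a made-up toy frame, pulls out the `quality` column of the correlation matrix and orders the features by the absolute strength of their correlation. The values are invented, so the resulting ranking says nothing about the real dataset.

```python
import pandas as pd

# Invented values; column names mimic typical wine features
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0, 12.5],
    "volatile acidity": [0.80, 0.70, 0.65, 0.50, 0.40, 0.35],
    "pH": [3.3, 3.5, 3.2, 3.4, 3.3, 3.4],
    "quality": [4, 5, 5, 6, 7, 7],
})

# Pearson correlation of every feature with the target, excluding the target itself
target_corr = toy.corr()["quality"].drop("quality")

# Order by absolute value so strong negative correlations rank alongside positive ones
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)
print(ranked)
```

Pearson correlation only captures linear association, so a low value here does not rule out a non-linear relationship.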

## Conclusion
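Through EDA, we inspected the dataset's structure, checked for missing and unique values, and explored it using univariate, bivariate, and multivariate analyses.
These insights guide data cleaning and feature selection, and are essential before applying machine learning models.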