# [Dataset Name] - Exploratory Data Analysis

## Overview
Brief description of the dataset and analysis goals.

## Table of Contents
1. [Data Loading and Initial Inspection](#1-data-loading-and-initial-inspection)
2. [Data Cleaning](#2-data-cleaning)
3. [Exploratory Data Analysis](#3-exploratory-data-analysis)
4. [Key Insights](#4-key-insights)
5. [Conclusions](#5-conclusions)

## 1. Data Loading and Initial Inspection

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
# Silence library warnings to keep cell output readable; comment this out while debugging
warnings.filterwarnings('ignore')

# Set style for plots
plt.style.use('default')
sns.set_palette('husl')

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

In [None]:
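# Load the dataset: uncomment the lines below and fill in the bracketed placeholders.
# The cells that follow assume `df` is defined.
# df = pd.read_csv('../../datasets/[dataset_folder]/[dataset_file].csv')
# df.head()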

In [None]:
# Basic information about the dataset
print(f"Dataset shape: {df.shape}")
print("\nColumn names and types:")
df.dtypes

In [None]:
# Basic statistics
df.describe()

## 2. Data Cleaning

In [None]:
# Check for missing values
missing_data = df.isnull().sum()
missing_percentage = 100 * missing_data / len(df)
missing_df = pd.DataFrame({
    'Missing Count': missing_data,
    'Percentage': missing_percentage
})
missing_df[missing_df['Missing Count'] > 0].sort_values('Missing Count', ascending=False)
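
Once the missing-value table is available, one possible follow-up is to drop columns that are mostly empty and impute the rest. The sketch below is only an illustration on a hypothetical toy frame: the column names, the `clean_missing` helper, and the 50% threshold are assumptions, not part of this template, and median/mode imputation may not suit every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame.
toy = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Oslo", "Lima", None, "Lima"],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})


def clean_missing(frame, drop_threshold=0.5):
    """Drop columns whose missing share exceeds drop_threshold, then impute the rest."""
    missing_share = frame.isnull().mean()  # fraction of missing values per column
    kept = frame.loc[:, missing_share <= drop_threshold].copy()
    for col in kept.columns:
        if pd.api.types.is_numeric_dtype(kept[col]):
            # Numeric columns: fill with the median, which is robust to outliers
            kept[col] = kept[col].fillna(kept[col].median())
        else:
            # Other columns: fill with the most frequent value, if there is one
            modes = kept[col].mode(dropna=True)
            if not modes.empty:
                kept[col] = kept[col].fillna(modes.iloc[0])
    return kept


cleaned = clean_missing(toy)
print(cleaned)
```

Whether to drop, impute, or keep missing values depends on why they are missing, so treat the threshold and fill strategy as starting points to adjust.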

In [None]:
# Check for duplicates
duplicates = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")
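
If the count above is non-zero, the duplicates can be removed with `drop_duplicates`. A minimal sketch on a hypothetical toy frame (the `id` and `value` columns are made up for illustration), keeping the first occurrence of each repeated row:

```python
import pandas as pd

# Hypothetical toy data with one fully duplicated row.
toy = pd.DataFrame({"id": [1, 2, 2, 3], "value": ["a", "b", "b", "c"]})

n_before = len(toy)
# keep="first" retains the first occurrence; reset_index gives a clean 0..n-1 index
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)
n_removed = n_before - len(deduped)
print(f"Removed {n_removed} duplicate rows")
```

Pass `subset=[...]` to `drop_duplicates` if rows should count as duplicates based on only some columns.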

## 3. Exploratory Data Analysis

### 3.1 Univariate Analysis

In [None]:
# Analyze numerical columns
numerical_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print(f"Numerical columns: {numerical_cols}")
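
Beyond listing the numerical columns, a compact per-column summary that includes skewness can help flag variables that may need a transformation. The following is a sketch on hypothetical toy data; the column names `x` and `y` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: x has one extreme value, y is evenly spaced.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 100.0],
    "y": [10, 20, 30, 40, 50],
})

num_cols = toy.select_dtypes(include="number").columns
# One row per column, one column per statistic
summary = toy[num_cols].agg(["mean", "median", "std", "skew"]).T
print(summary)
```

A mean far from the median, or a large absolute skew, suggests a long tail worth inspecting with a histogram.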

In [None]:
# Analyze categorical columns (object and pandas 'category' dtypes)
categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
print(f"Categorical columns: {categorical_cols}")
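
For each categorical column, frequency counts and shares are usually the first thing to look at. A small sketch on a hypothetical `color` column (an assumption for illustration), counting missing values as their own group:

```python
import pandas as pd

# Hypothetical toy data with one missing category.
toy = pd.DataFrame({"color": ["red", "blue", "red", None, "red"]})

# dropna=False keeps missing values as a separate row in the result
counts = toy["color"].value_counts(dropna=False)
shares = toy["color"].value_counts(normalize=True, dropna=False)
n_unique = toy["color"].nunique()  # distinct non-missing categories

print(counts)
print(f"Distinct categories: {n_unique}")
```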

### 3.2 Bivariate Analysis

In [None]:
# Correlation matrix for numerical variables
if len(numerical_cols) > 1:
    plt.figure(figsize=(10, 8))
    correlation_matrix = df[numerical_cols].corr()
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
    plt.title('Correlation Matrix')
    plt.show()
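
With many columns a heatmap can be hard to read, so it may help to also list the column pairs ranked by absolute correlation. This is a sketch on hypothetical toy data (columns `a`, `b`, `c` are assumptions); it uses the default Pearson correlation, as `df.corr()` does.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data: b is an exact multiple of a.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

corr = toy.corr()
# Visit each unordered pair once, skipping the diagonal and mirrored entries
pairs = pd.Series(
    {(x, y): corr.loc[x, y] for x, y in combinations(corr.columns, 2)}
)
# Rank by strength of the relationship, regardless of sign
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```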

### 3.3 Data Visualization

In [None]:
# Example visualizations (customize based on your data)
# Distribution plots for numerical variables
# for col in numerical_cols[:4]:  # Limit to first 4 columns
#     plt.figure(figsize=(10, 4))
#     plt.subplot(1, 2, 1)
#     plt.hist(df[col].dropna(), bins=30, alpha=0.7)
#     plt.title(f'Distribution of {col}')
#     plt.xlabel(col)
#     plt.ylabel('Frequency')
#     
#     plt.subplot(1, 2, 2)
#     plt.boxplot(df[col].dropna())
#     plt.title(f'Box Plot of {col}')
#     plt.ylabel(col)
#     
#     plt.tight_layout()
#     plt.show()
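
The box plots above draw whiskers using matplotlib's default 1.5 × IQR rule. The same rule can be applied numerically to list the points that would be drawn as outliers. A minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical values with one obvious extreme point.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr  # lower fence
upper = q3 + 1.5 * iqr  # upper fence

# Points outside the fences are candidate outliers, not automatically errors
outliers = values[(values < lower) | (values > upper)]
print(f"Bounds: [{lower}, {upper}]")
print(outliers)
```

Flagged points deserve a closer look before removal, since they may be valid extreme observations.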

## 4. Key Insights

### Key Findings:
- Insight 1: [Description]
- Insight 2: [Description]
- Insight 3: [Description]

## 5. Conclusions

### Summary:
- [Key takeaway 1]
- [Key takeaway 2]
- [Key takeaway 3]

### Recommendations:
- [Recommendation 1]
- [Recommendation 2]
- [Recommendation 3]

### Next Steps:
- [Future analysis idea 1]
- [Future analysis idea 2]