# Exploratory Data Analysis (EDA)
## In-Vehicle Coupon Recommendation System

**Author:** Mekala Jaswanth  
**Date:** 2025

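This notebook explores the in-vehicle coupon recommendation dataset: loading and inspecting the data, missing values, the target variable (`Y`), categorical and numerical features, and correlations between numerical columns.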

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

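# Silence all warnings to keep cell output compact
# (note: this also hides deprecation warnings from pandas and seaborn)
warnings.filterwarnings('ignore')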

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print('Libraries imported successfully!')

## 1. Load and Inspect Data

In [None]:
# Load dataset
# df = pd.read_csv('../data/in-vehicle-coupon-data.csv')
# Uncomment the line above when the data file is available

# Placeholder output until the dataset is loaded
print('Dataset shape: (rows, columns)')
# print(f'Shape: {df.shape}')
# df.head()
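
Until the CSV is available, the commented-out cells in this notebook can be exercised on a small synthetic stand-in. The sketch below is an assumption, not part of the dataset: `make_demo_frame` is a hypothetical helper, the column names are the ones referenced later in this notebook, and the category values, temperature range, and row count are invented placeholders. Any numbers it produces say nothing about real coupon behaviour.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n_rows=200, seed=0):
    """Build a synthetic frame with the columns this notebook references.

    All values are made-up placeholders, not real data.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'coupon': rng.choice(['coupon_a', 'coupon_b', 'coupon_c'], n_rows),
        'destination': rng.choice(['dest_a', 'dest_b'], n_rows),
        # Column name spelled as it appears in this notebook
        'passanger': rng.choice(['pass_a', 'pass_b'], n_rows),
        'weather': rng.choice(['weather_a', 'weather_b'], n_rows),
        'time': rng.choice(['time_a', 'time_b', 'time_c'], n_rows),
        'gender': rng.choice(['gender_a', 'gender_b'], n_rows),
        'temperature': rng.integers(20, 90, n_rows),
        'Y': rng.integers(0, 2, n_rows),  # 0 = not accepted, 1 = accepted
    })


demo_df = make_demo_frame()
print(f'Shape: {demo_df.shape}')
```

Assigning `df = make_demo_frame()` would let the later cells run end to end for a layout check; swap in the real `pd.read_csv` call before interpreting any plot.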

## 2. Data Overview

In [None]:
# Display basic information
# df.info()

# Statistical summary
# df.describe()

## 3. Missing Values Analysis

In [None]:
# Check missing values
# missing_values = df.isnull().sum()
# missing_percentage = (missing_values / len(df)) * 100

# missing_df = pd.DataFrame({
#     'Missing Values': missing_values,
#     'Percentage': missing_percentage
# })

# print(missing_df[missing_df['Missing Values'] > 0].sort_values('Missing Values', ascending=False))

## 4. Target Variable Analysis

In [None]:
# Analyze coupon acceptance distribution
# plt.figure(figsize=(8, 6))
# sort_index() fixes the bar order to 0, 1 so the colors and x-label stay aligned
# df['Y'].value_counts().sort_index().plot(kind='bar', color=['#ff7f0e', '#1f77b4'])
# plt.title('Distribution of Coupon Acceptance', fontsize=16, fontweight='bold')
# plt.xlabel('Coupon Accepted (1) or Not (0)')
# plt.ylabel('Count')
# plt.xticks(rotation=0)
# plt.tight_layout()
# plt.show()

# Acceptance rate (Y is coded 0/1, so its mean is the share of accepted coupons)
# acceptance_rate = df['Y'].mean() * 100
# print(f'\nOverall Coupon Acceptance Rate: {acceptance_rate:.2f}%')

## 5. Categorical Features Analysis

In [None]:
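# Analyze each categorical feature: category counts and acceptance rate per category
# Note: 'passanger' is written as used in this notebook; verify the spelling against the CSV header
# categorical_cols = ['coupon', 'destination', 'passanger', 'weather', 'time', 'gender']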

# for col in categorical_cols:
#     plt.figure(figsize=(10, 5))
#     
#     # Count plot
#     plt.subplot(1, 2, 1)
#     df[col].value_counts().plot(kind='bar', color='steelblue')
#     plt.title(f'Distribution of {col}')
#     plt.xticks(rotation=45)
#     
#     # Acceptance rate by category
#     plt.subplot(1, 2, 2)
#     acceptance_by_category = df.groupby(col)['Y'].mean() * 100
#     acceptance_by_category.plot(kind='bar', color='coral')
#     plt.title(f'Acceptance Rate by {col}')
#     plt.ylabel('Acceptance Rate (%)')
#     plt.xticks(rotation=45)
#     
#     plt.tight_layout()
#     plt.show()

## 6. Numerical Features Analysis

In [None]:
# Analyze temperature distribution
# plt.figure(figsize=(12, 5))

# plt.subplot(1, 2, 1)
# df['temperature'].hist(bins=30, color='skyblue', edgecolor='black')
# plt.title('Temperature Distribution')
# plt.xlabel('Temperature')
# plt.ylabel('Frequency')

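# ax = plt.subplot(1, 2, 2)
# Pass ax explicitly; otherwise df.boxplot(by=...) opens a new figure instead of using this subplot
# df.boxplot(column='temperature', by='Y', ax=ax)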
# plt.title('Temperature vs Coupon Acceptance')
# plt.suptitle('')

# plt.tight_layout()
# plt.show()

## 7. Correlation Analysis

In [None]:
# Select numerical columns
# numerical_cols = df.select_dtypes(include=[np.number]).columns

# Calculate correlation matrix
# correlation_matrix = df[numerical_cols].corr()

# Plot correlation heatmap
# plt.figure(figsize=(12, 10))
# sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
#             fmt='.2f', square=True, linewidths=1)
# plt.title('Feature Correlation Heatmap', fontsize=16, fontweight='bold')
# plt.tight_layout()
# plt.show()

## 8. Key Insights

The analysis above is set up to test the following patterns; confirm each one against the cell outputs once the dataset is loaded:

1. **Coupon Acceptance Rate**: Overall acceptance rate is approximately [X]% (take the value printed in Section 4)
2. **Time-Based Patterns**: Coupons during commute hours show higher acceptance
3. **Weather Impact**: Favorable weather conditions increase acceptance rates
4. **Passenger Influence**: Coupon type preference varies with passenger type
5. **Proximity Matters**: Distance to venue significantly affects acceptance
6. **Temperature Effect**: Moderate temperatures correlate with higher acceptance
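
Each insight above can be checked with a grouped acceptance rate once `df` is loaded. The sketch below shows the pattern on a tiny hand-made frame. The rows and the helper name `acceptance_by` are illustrative assumptions; only the column names `weather` and `Y` come from this notebook. Reporting the group size next to the rate helps avoid reading too much into small categories.

```python
import pandas as pd


def acceptance_by(frame, column, target='Y'):
    """Return acceptance rate (%) and group size per category of `column`."""
    summary = frame.groupby(column)[target].agg(rate='mean', n='size')
    summary['rate'] = summary['rate'] * 100
    return summary.sort_values('rate', ascending=False)


# Invented toy rows, for illustration only
toy = pd.DataFrame({
    'weather': ['a', 'a', 'a', 'a', 'b', 'b'],
    'Y':       [1,   1,   1,   0,   1,   0],
})

result = acceptance_by(toy, 'weather')
print(result)
```

On the real data, the same call with `'time'`, `'passanger'`, or `'coupon'` as the column would give the numbers needed to support or reject the corresponding insight.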

## Next Steps

1. Feature engineering based on insights
2. Handle missing values appropriately
3. Create interaction features
4. Build and train machine learning models
5. Evaluate model performance
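
Steps 2 and 3 above could start from something like the following. This is a minimal sketch under assumptions: filling categorical gaps with an explicit `'Missing'` label is one option among several (dropping rows or mode imputation are others), the helper names are hypothetical, and the toy rows are invented. Only the column names `coupon`, `passanger`, and `Y` come from this notebook.

```python
import pandas as pd


def fill_categorical_missing(frame, label='Missing'):
    """Replace missing values in non-numeric columns with an explicit label.

    Returns a copy; the input frame is left unchanged.
    """
    out = frame.copy()
    cat_cols = out.select_dtypes(exclude='number').columns
    out[cat_cols] = out[cat_cols].fillna(label)
    return out


def add_interaction(frame, left, right):
    """Add a combined categorical column named '<left>_x_<right>'."""
    out = frame.copy()
    out[f'{left}_x_{right}'] = out[left].astype(str) + '_' + out[right].astype(str)
    return out


# Invented toy rows, for illustration only
toy = pd.DataFrame({
    'coupon': ['a', 'b', None, 'a'],
    'passanger': ['p', None, 'q', 'p'],
    'Y': [1, 0, 1, 0],
})

clean = fill_categorical_missing(toy)
features = add_interaction(clean, 'coupon', 'passanger')
print(features)
```

Filling before building the interaction matters here: combining columns that still contain missing values would bake the string form of the missing marker into the new feature.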

In [None]:
print('EDA completed successfully!')