# EDA Notebook for SalesPrediction AI

This notebook performs Exploratory Data Analysis (EDA) on sales data to understand the dataset and prepare for predictive modeling.

In [ ]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the sales data
data = pd.read_csv('sales_data.csv')
data.head()

## Data Exploration

Let's explore the dataset to understand its structure, feature distributions, and relationships between features.

In [ ]:
# Summary statistics
data.describe()

In [ ]:
# Data information
data.info()
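
`data.info()` reports non-null counts, but a per-column missing-value table is often easier to scan. The sketch below is one possible way to build it. The `sample` frame is a hypothetical stand-in for `sales_data.csv` (whose contents are not shown here), and `missing_summary` is an illustrative helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_data.csv; values are made up for illustration.
sample = pd.DataFrame({
    "Sales": [200.0, np.nan, 150.0, 300.0],
    "Promotion": [1, 0, 1, 0],
    "Holiday": [0, 0, 1, 0],
    "Weather": ["Sunny", "Rainy", None, "Sunny"],
})

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

report = missing_summary(sample)
print(report)
```

On the real dataset, the same helper could be called as `missing_summary(data)`.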

## Data Visualization

Visualize data to gain insights into feature distributions, correlations, and patterns.

In [ ]:
# Histogram of Sales
plt.figure(figsize=(10, 6))
sns.histplot(data['Sales'], bins=20, kde=True)
plt.title('Distribution of Sales')
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.show()

In [ ]:
# Pairplot of numeric features
sns.pairplot(data, vars=['Sales', 'Promotion', 'Holiday'], hue='Weather', diag_kind='hist')
plt.suptitle('Pairplot of Sales, Promotion, Holiday by Weather', y=1.02)
plt.show()

## Correlation Analysis

Explore correlations among the features and between each feature and the target variable (Sales).

In [ ]:
# Correlation heatmap
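# Restrict to numeric columns; since pandas 2.0, corr() raises an error on non-numeric columns
corr_matrix = data.corr(numeric_only=True)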
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()
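
A heatmap shows every pairwise correlation, but for modeling it can help to rank features by how strongly they correlate with the target. The following is a minimal sketch under stated assumptions: `sample` is a small made-up frame reusing the column names from this notebook, and `correlations_with_target` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up data for illustration only; not taken from sales_data.csv.
sample = pd.DataFrame({
    "Sales": [100, 150, 120, 200, 180, 90],
    "Promotion": [0, 1, 0, 1, 1, 0],
    "Holiday": [0, 0, 1, 1, 0, 0],
    "Weather": ["Sunny", "Rainy", "Sunny", "Sunny", "Rainy", "Cloudy"],
})

def correlations_with_target(df: pd.DataFrame, target: str = "Sales") -> pd.Series:
    """Pearson correlation of each numeric feature with the target,
    ordered by absolute strength (strongest first)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

ranked = correlations_with_target(sample)
print(ranked)
```

Pearson correlation only captures linear relationships, so a low value here does not by itself rule a feature out.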

## Conclusion

This EDA notebook provides initial insights into the sales data. Further analysis and preprocessing steps are necessary before training predictive models. Next steps may include handling missing values, encoding categorical variables, and scaling features.
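
The three next steps named above could be sketched roughly as follows, using pandas only. This is one possible approach, not a prescribed pipeline: the `sample` data is invented, `preprocess` is a hypothetical helper, and the choices of median/mode imputation, one-hot encoding, and z-score scaling are assumptions that may not suit the real dataset.

```python
import numpy as np
import pandas as pd

# Invented data for illustration; column names follow this notebook.
sample = pd.DataFrame({
    "Sales": [200.0, np.nan, 150.0, 300.0, 250.0],
    "Promotion": [1, 0, 1, 0, 1],
    "Holiday": [0, 0, 1, 0, 0],
    "Weather": ["Sunny", "Rainy", None, "Sunny", "Rainy"],
})

def preprocess(df: pd.DataFrame, target: str = "Sales") -> pd.DataFrame:
    """Impute missing values, one-hot encode categoricals, z-score numeric features."""
    # Drop rows with a missing target instead of imputing the value to predict.
    out = df.dropna(subset=[target]).copy()

    numeric_cols = list(out.select_dtypes(include="number").columns.drop(target))
    categorical_cols = list(out.select_dtypes(exclude="number").columns)

    # Handle missing values: median for numeric, most frequent value for categorical.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Encode categorical variables as one-hot indicator columns.
    out = pd.get_dummies(out, columns=categorical_cols, dtype=float)

    # Scale numeric features to zero mean and unit variance (constant columns are left centered).
    features = out[numeric_cols].astype(float)
    std = features.std(ddof=0).replace(0, 1.0)
    out[numeric_cols] = (features - features.mean()) / std
    return out

result = preprocess(sample)
print(result)
```

In a real modeling workflow, the imputation and scaling statistics should be computed on the training split only and then applied to the validation and test splits, to avoid leaking information.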