---

# 🌟 **Customer Purchasing Behavior Exploration** 🌟

---

## **🎯 Objective**
Conduct a detailed **exploratory data analysis (EDA)** to evaluate customer purchasing behavior across various stores and features. Specifically, we aim to:

1. Examine **sales behavior** before, during, and after holidays.  
2. Identify **seasonal trends** in customer purchasing patterns.  
3. Evaluate the impact of **promotions** on sales and customer behavior.  
4. Understand the **correlation** between sales, customers, and other key features.  
5. Analyze **store-specific factors** such as opening times, assortment type, and competition distance.  

---

## **📊 Key Questions**

1. **Distribution of Features**: Are promotions and other features distributed similarly in training and test sets?  
2. **Holidays**: How do sales behave before, during, and after holidays (e.g., Christmas, Easter)?  
3. **Seasonal Patterns**: What are the trends in sales across different months and seasons?  
4. **Correlation**: How strongly are sales correlated with the number of customers?  
5. **Promotions**:  
   - Do promotions attract new customers?  
   - How do they affect existing customers?  
   - Can they be deployed more effectively in specific stores?  
6. **Store Dynamics**:  
   - What is the effect of store opening/closing times on sales?  
   - Which stores are open on all weekdays, and how does that affect their weekend sales?  
7. **Assortment Types**: How does the type of assortment affect sales?  
8. **Competition**:  
   - How does competitor distance influence sales?  
   - What happens when new competitors enter the market?  
9. **Data Issues**: How do we handle missing values or changes in competitor distance over time?

---

## **🛠️ Approach**

### **1️⃣ Data Cleaning**

To ensure accurate analysis, the data cleaning process involves:  
- **Handling Missing Values**: Fill or impute missing data to avoid skewed results.  
- **Outlier Detection**: Identify and handle extreme values for features like sales and customers.  
- **Date Formatting**: Convert date columns to a standard datetime format for temporal analysis.
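
The three cleaning steps above could look roughly like the sketch below. It is an illustration only: the median fill, the 1.5 × IQR rule, the helper name `clean_sales`, and the tiny made-up sample are all assumptions, not choices stated in this notebook.

```python
import numpy as np
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: parsed dates, imputed gaps, and an outlier flag."""
    out = df.copy()
    # Date formatting: parse date strings into datetime64 values.
    out["Date"] = pd.to_datetime(out["Date"])
    # Missing values: fill numeric gaps with the column median (one possible policy).
    for col in ["Sales", "Customers"]:
        out[col] = out[col].fillna(out[col].median())
    # Outlier detection: flag rows outside 1.5 * IQR rather than dropping them.
    q1, q3 = out["Sales"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["SalesOutlier"] = (out["Sales"] < q1 - 1.5 * iqr) | (out["Sales"] > q3 + 1.5 * iqr)
    return out


# Made-up sample rows, for illustration only (not taken from train.csv).
sample = pd.DataFrame({
    "Date": ["2015-07-27", "2015-07-28", "2015-07-29", "2015-07-30", "2015-07-31"],
    "Sales": [5000.0, 5200.0, np.nan, 5100.0, 50000.0],
    "Customers": [500.0, 520.0, 510.0, np.nan, 530.0],
})
cleaned = clean_sales(sample)
print(cleaned)
```

Flagging outliers in a separate column keeps the original rows available, so the decision to drop, cap, or keep them can be made later with the plots in hand.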

---

### **2️⃣ Exploratory Data Analysis**

#### **Feature Distribution**  
- Analyze the distribution of key features (e.g., promotions, holidays, etc.) across training and test sets.
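
One way to compare a categorical feature across the two sets is to put its relative frequencies side by side. The sketch below assumes a hypothetical helper `compare_shares` and uses small made-up frames in place of the real training and test data.

```python
import pandas as pd


def compare_shares(train: pd.DataFrame, test: pd.DataFrame, column: str) -> pd.DataFrame:
    """Show the relative frequency of each value of `column` in train and test."""
    shares = pd.concat(
        {
            "train": train[column].value_counts(normalize=True),
            "test": test[column].value_counts(normalize=True),
        },
        axis=1,
    ).fillna(0.0)  # a value missing from one set has a share of 0 there
    shares["abs_diff"] = (shares["train"] - shares["test"]).abs()
    return shares.sort_index()


# Made-up stand-ins for the real training and test frames.
train_demo = pd.DataFrame({"Promo": [1, 0, 0, 1, 0, 0, 1, 0]})
test_demo = pd.DataFrame({"Promo": [1, 1, 0, 0]})
promo_shares = compare_shares(train_demo, test_demo, "Promo")
print(promo_shares)
```

A large `abs_diff` for any value suggests the test period is not representative of the training period for that feature.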

#### **Temporal Trends**  
- Evaluate sales trends before, during, and after holidays.  
- Identify seasonal patterns (monthly and yearly).
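
A possible way to set up the before/during/after comparison is to label each date by its distance from a holiday and then group on that label. The helper `label_holiday_window`, the three-day window, and the sales figures below are assumptions made for illustration.

```python
import pandas as pd


def label_holiday_window(dates: pd.Series, holiday: str, window_days: int = 7) -> pd.Series:
    """Label each date as before, during, after, or outside a single-day holiday."""
    offset = (pd.to_datetime(dates) - pd.Timestamp(holiday)).dt.days
    labels = pd.Series("outside", index=dates.index)
    labels[(offset < 0) & (offset >= -window_days)] = "before"
    labels[offset == 0] = "during"
    labels[(offset > 0) & (offset <= window_days)] = "after"
    return labels


# Made-up daily sales around Christmas Day, for illustration only.
demo = pd.DataFrame({
    "Date": pd.date_range("2014-12-20", periods=10, freq="D"),
    "Sales": [6000, 6500, 7000, 8000, 9000, 0, 0, 5000, 5200, 5100],
})
demo["Window"] = label_holiday_window(demo["Date"], "2014-12-25", window_days=3)

# Average sales per holiday window, and per calendar month for seasonality.
window_means = demo.groupby("Window")["Sales"].mean()
monthly_means = demo.groupby(demo["Date"].dt.month)["Sales"].mean()
print(window_means)
print(monthly_means)
```

The same grouping by `demo["Date"].dt.year` would give the yearly view.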

#### **Correlation Analysis**  
- Compute correlation between sales, customers, promotions, and other features.
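
As a minimal sketch, a pairwise Pearson correlation matrix can be computed with `DataFrame.corr`. The data here is synthetic and built so that sales depend on customers and promotions; it only demonstrates the call, not any result from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: sales are constructed from customers and promo plus noise.
rng = np.random.default_rng(0)
customers = rng.integers(300, 1500, size=200)
promo = rng.integers(0, 2, size=200)
sales = customers * 9 + promo * 800 + rng.normal(0, 300, size=200)

demo = pd.DataFrame({"Sales": sales, "Customers": customers, "Promo": promo})

# Pairwise Pearson correlation between all numeric columns.
corr = demo.corr(method="pearson")
print(corr.round(2))
```

The resulting matrix can be passed to `sns.heatmap(corr, annot=True)` to produce the heatmap mentioned in the summary of steps.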

#### **Promotion Effectiveness**  
- Analyze sales performance with and without promotions.  
- Compare the number of new vs. existing customers during promotions.
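
The with/without comparison can be sketched as a grouped summary on open days. The columns shown in this notebook include only a daily `Customers` count, so this sketch compares footfall and spend per customer rather than new versus existing customers. The helper `promo_summary` and the sample rows are assumptions for illustration.

```python
import pandas as pd


def promo_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Average sales and customers on open days, split by promo flag."""
    open_days = df[df["Open"] == 1]  # closed days would drag the averages to zero
    summary = open_days.groupby("Promo").agg(
        mean_sales=("Sales", "mean"),
        mean_customers=("Customers", "mean"),
    )
    # Ratio of the two means: average spend per customer visit.
    summary["sales_per_customer"] = summary["mean_sales"] / summary["mean_customers"]
    return summary


# Made-up rows, for illustration only.
demo = pd.DataFrame({
    "Open": [1, 1, 1, 1, 0],
    "Promo": [0, 0, 1, 1, 1],
    "Sales": [4000, 5000, 7000, 8000, 0],
    "Customers": [500, 500, 600, 650, 0],
})
summary = promo_summary(demo)
print(summary)
```

If `mean_customers` rises under promotion, the promotion is drawing more visits; if `sales_per_customer` rises, visitors are spending more per visit.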

#### **Store Analysis**  
- Assess the impact of store opening/closing times.  
- Examine assortment types and their influence on sales.  
- Evaluate competitor distance and its effect on sales.
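
Assortment type and competitor distance are not among the columns shown for `train.csv` in this notebook, so the sketch below assumes a separate per-store table with hypothetical `Assortment` and `CompetitionDistance` columns, joined on `Store`. The distance bands and all values are made up.

```python
import pandas as pd

# Made-up daily sales and a hypothetical per-store metadata table.
sales_demo = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3],
    "Sales": [5000, 5400, 7000, 7200, 9000, 9400],
})
store_demo = pd.DataFrame({
    "Store": [1, 2, 3],
    "Assortment": ["a", "a", "c"],                   # assumed column name
    "CompetitionDistance": [300.0, 2500.0, 12000.0],  # assumed column, in metres
})

# Attach store attributes to every daily row.
merged = sales_demo.merge(store_demo, on="Store", how="left")

# Average sales per assortment type.
by_assortment = merged.groupby("Assortment")["Sales"].mean()

# Bucket competitor distance into bands and average sales per band.
merged["DistanceBand"] = pd.cut(
    merged["CompetitionDistance"],
    bins=[0, 1000, 5000, float("inf")],
    labels=["<1 km", "1-5 km", ">5 km"],
)
by_distance = merged.groupby("DistanceBand", observed=True)["Sales"].mean()
print(by_assortment)
print(by_distance)
```

Binning the distance first makes the comparison robust to a few very distant competitors that would otherwise dominate a scatter plot.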

---

### **3️⃣ Statistical Testing**

#### **Tests Conducted**  
1. **Categorical Data**:  
   - **Chi-squared Test**: To test whether two categorical variables are independent (e.g., `Promo` and `StateHoliday`).  
2. **Numerical Data**:  
   - **t-test**: To compare numerical differences (e.g., average sales during holidays vs. non-holidays).  
   - **Correlation Coefficient**: To assess relationships between numerical features (e.g., sales and customers).
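
The three tests listed above might be run with `scipy.stats` as in this sketch. The data is synthetic, the choice of `SchoolHoliday` as the grouping variable is an assumption, and the t-test shown is Welch's variant (`equal_var=False`), which is a substitution for the plain t-test named above rather than something this notebook specifies.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
n = 400
demo = pd.DataFrame({
    "Promo": rng.integers(0, 2, size=n),
    "SchoolHoliday": rng.integers(0, 2, size=n),
    "Customers": rng.integers(300, 1500, size=n),
})
demo["Sales"] = demo["Customers"] * 9 + demo["Promo"] * 1500 + rng.normal(0, 500, size=n)

# Chi-squared test of independence between two categorical columns.
table = pd.crosstab(demo["Promo"], demo["SchoolHoliday"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Welch's t-test: mean sales on school-holiday days vs. other days.
holiday_sales = demo.loc[demo["SchoolHoliday"] == 1, "Sales"]
regular_sales = demo.loc[demo["SchoolHoliday"] == 0, "Sales"]
t_stat, p_t = stats.ttest_ind(holiday_sales, regular_sales, equal_var=False)

# Pearson correlation between two numerical columns.
r, p_r = stats.pearsonr(demo["Sales"], demo["Customers"])

print(f"chi2={chi2:.3f}, p={p_chi2:.3f}, dof={dof}")
print(f"t={t_stat:.3f}, p={p_t:.3f}")
print(f"r={r:.3f}, p={p_r:.3g}")
```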

---

#### **🎯 Decision Rules**
- **Significance Level (α)**: **0.05**  
- **Interpretation**:  
  - If **p-value < 0.05**: Reject the null hypothesis; the relationship is statistically significant.  
  - If **p-value ≥ 0.05**: Fail to reject the null hypothesis; the data show no statistically significant relationship.  
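
Applied in code, the rule above reduces to a single comparison against α. The helper name `interpret_p_value` is hypothetical.

```python
ALPHA = 0.05  # significance level from the decision rule


def interpret_p_value(p_value: float, alpha: float = ALPHA) -> str:
    """Translate a p-value into the wording used by the decision rule."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "statistically significant"
    return "not statistically significant"


print(interpret_p_value(0.01))
print(interpret_p_value(0.20))
```

Note that a p-value exactly equal to α falls on the "not significant" side, matching the ≥ in the rule.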

---

## **✅ Summary of Steps**
1. Clean the data by handling missing values, formatting dates, and handling outliers.  
2. Conduct exploratory analysis to answer key questions using visualizations (e.g., bar plots, line charts, heatmaps).  
3. Perform statistical tests to validate observations.  
4. Summarize findings and actionable insights.


<style>
    h1 {
        color: #ffaa00;
        text-shadow: 2px 2px 5px #000;
        font-family: "Comic Sans MS", sans-serif;
    }
</style>

<h1>✨ Set Up Logging ✨</h1>



In [1]:
import logging

# Configure logging
logging.basicConfig(
    filename="eda_log.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

logger = logging.getLogger()

# Example log
logger.info("Logging setup complete.")



<h1>✨ Import Modules ✨</h1>


In [2]:
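import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# The project-local loader is optional here: the cells below read the CSV
# files directly with pd.read_csv, so a missing module should not stop the run.
try:
    from scripts.Data_loader import load_data
except ModuleNotFoundError:
    load_data = None
    logger.warning("scripts.Data_loader not found; falling back to pd.read_csv.")

logger.info("Imported required libraries.")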

In [7]:
# Load the dataset
# Assumption: test.csv sits next to train.csv in the Data folder that the next cell reads from.
data_dir = r'C:\Users\fikad\Desktop\10acedamy\Rossmann-Pharmaceuticals-Sales-Prediction\Data'
train_data = pd.read_csv(data_dir + r'\train.csv')
test_data = pd.read_csv(data_dir + r'\test.csv')

logger.info("Loaded training and test datasets.")

# Preview the datasets
print(train_data.head())
print(test_data.head())
logger.info("Previewed datasets.")

In [3]:
# Set visualization style
sns.set(style="whitegrid")

# Load the dataset
data = pd.read_csv(r'C:\Users\fikad\Desktop\10acedamy\Rossmann-Pharmaceuticals-Sales-Prediction\Data\train.csv')

# Display the first few rows of the dataset
print(data.head())

   Store  DayOfWeek        Date  Sales  Customers  Open  Promo StateHoliday  \
0      1          5  2015-07-31   5263        555     1      1            0   
1      2          5  2015-07-31   6064        625     1      1            0   
2      3          5  2015-07-31   8314        821     1      1            0   
3      4          5  2015-07-31  13995       1498     1      1            0   
4      5          5  2015-07-31   4822        559     1      1            0   

   SchoolHoliday  
0              1  
1              1  
2              1  
3              1  
4              1  




In [4]:
data.columns

Index(['Store', 'DayOfWeek', 'Date', 'Sales', 'Customers', 'Open', 'Promo',
       'StateHoliday', 'SchoolHoliday'],
      dtype='object')