**Dataset Overview**

The dataset contains synthetic daily sales data for a single product at a hypothetical company, covering January 1, 2021, through December 31, 2023 (1,095 days). Each row holds one day's sales figure along with several features that could plausibly influence sales in a real setting. Note that in the generation code below each feature is drawn independently of sales, so the dataset mimics the structure of real sales data rather than any actual relationship between the features and sales.

**Features in the Dataset**

**Date**: The date for each observation.

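**Sales**: The number of units sold on the given day. Values are drawn from a Poisson distribution with a mean of 200, perturbed with normally distributed noise (standard deviation 20) to simulate day-to-day variability, then rounded to whole units and floored at zero.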

**Promotions**: A binary feature indicating whether there was a promotion on that day (1 for promotion, 0 otherwise). Each day is a single-trial binomial (Bernoulli) draw with a probability of 0.2.

**Holidays**: A binary feature indicating whether the day was a holiday (1 for holiday, 0 otherwise). Each day is a single-trial binomial (Bernoulli) draw with a probability of 0.05, so holidays are placed at random rather than on calendar dates.

**Economic Indicators**: A continuous feature representing some economic indicator value, generated using a normal distribution with a mean of 100 and a standard deviation of 10.

Here the indicator is interpreted as the **Consumer Confidence Index** (CCI), a survey-based measure of how optimistic or pessimistic consumers are about the current and future state of the economy. The surveys ask consumers about their perceptions of current economic conditions and their expectations for the future.

**High CCI Value (above 110)**

Indication: Consumers feel positive about the economy and their financial situation.
Impact on Sales: High confidence usually leads to increased consumer spending as people are more likely to make purchases, take out loans, and invest in high-value items.

**Low CCI Value (below 90)**

Indication: Consumers are worried about the economy and their financial future.
Impact on Sales: Low confidence often results in decreased consumer spending as people tend to save more and avoid big purchases.
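
The two thresholds above can be turned into a simple labelling step. The following is a minimal sketch, not part of the original notebook: the function name `cci_band`, the label strings, and the choice to treat values from 90 to 110 inclusive as "neutral" are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def cci_band(values):
    """Label each CCI value as 'low' (< 90), 'high' (> 110), or 'neutral' otherwise.

    Hypothetical helper: the band names and the handling of the exact
    boundary values (90 and 110 count as 'neutral') are assumptions.
    """
    values = np.asarray(values, dtype=float)
    labels = np.where(values < 90, 'low',
                      np.where(values > 110, 'high', 'neutral'))
    return pd.Series(labels)

# Example values; the first three appear in the sample output of this notebook
bands = cci_band([79.497, 95.056, 107.775, 112.3])
print(bands.tolist())
```

Because the indicator is drawn from a normal distribution with a mean of 100 and a standard deviation of 10, each threshold sits one standard deviation from the mean, so roughly 16% of days should fall in each of the high and low bands.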

In [None]:
import numpy as np
import pandas as pd

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate dates
date_range = pd.date_range(start='2021-01-01', end='2023-12-31', freq='D')
num_days = len(date_range)

In [None]:
# Generate synthetic sales data (in units) with added noise
sales = np.random.poisson(lam=200, size=num_days) + np.random.normal(scale=20, size=num_days)
sales = np.round(sales).astype(int)  # Ensure sales are integers

# Ensure no negative sales values
sales[sales < 0] = 0

# Generate synthetic promotion data (0 or 1)
promotions = np.random.binomial(1, p=0.2, size=num_days)

# Generate synthetic holiday data (0 or 1)
holidays = np.random.binomial(1, p=0.05, size=num_days)

# Generate synthetic economic indicator data
economic_indicators = np.round(np.random.normal(loc=100, scale=10, size=num_days), 3)

# Create DataFrame
data = pd.DataFrame({
    'date': date_range,
    'sales': sales,
    'promotions': promotions,
    'holidays': holidays,
    'economic_indicators': economic_indicators
})

# Save to CSV
data.to_csv('synthetic_sales_data.csv', index=False)

# Display first few rows
print(data.head())

        date  sales  promotions  holidays  economic_indicators
0 2021-01-01    246           1         0               95.056
1 2021-01-02    187           1         0               96.946
2 2021-01-03    193           0         0               87.965
3 2021-01-04    190           0         0              107.775
4 2021-01-05    155           0         0               79.497
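
A quick sanity check can confirm that the generated columns match the parameters described in the feature list. The sketch below is an illustrative addition rather than part of the original workflow: it rebuilds the data with the same calls and seed as the cells above, and the variable names `numeric_cols`, `summary`, `sales_std`, and `corr_with_sales` are assumptions. Since sales combine independent Poisson (variance 200) and normal (variance 400) components, their standard deviation should be close to sqrt(600), about 24.5.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset using the same calls, order, and seed as the cells above
np.random.seed(42)
date_range = pd.date_range(start='2021-01-01', end='2023-12-31', freq='D')
num_days = len(date_range)

sales = np.random.poisson(lam=200, size=num_days) + np.random.normal(scale=20, size=num_days)
sales = np.round(sales).astype(int)
sales[sales < 0] = 0
promotions = np.random.binomial(1, p=0.2, size=num_days)
holidays = np.random.binomial(1, p=0.05, size=num_days)
economic_indicators = np.round(np.random.normal(loc=100, scale=10, size=num_days), 3)

data = pd.DataFrame({
    'date': date_range,
    'sales': sales,
    'promotions': promotions,
    'holidays': holidays,
    'economic_indicators': economic_indicators
})

# Column means: expect roughly 200, 0.2, 0.05, and 100
numeric_cols = ['sales', 'promotions', 'holidays', 'economic_indicators']
summary = data[numeric_cols].mean()
sales_std = data['sales'].std()

# Correlation of each feature with sales: expect values near zero,
# because every column is drawn independently of the others
corr_with_sales = data[numeric_cols].corr()['sales'].drop('sales')

print(summary)
print('sales std:', round(sales_std, 2))
print(corr_with_sales)
```

Near-zero correlations here are expected: the features are sampled independently of sales, so a model trained on this data should not find a real effect of promotions, holidays, or the economic indicator unless such an effect is added to the generation step.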
