# Time Series Analysis and Delivery Time Impact on Review Scores

This notebook performs a time series analysis on e-commerce order data to identify trends, seasonality, and forecast future sales. Additionally, it analyzes the impact of delivery time on customer review scores.

## Objectives
1. Analyze sales volume (order count) and revenue over time.
2. Identify trends and seasonal patterns using seasonal decomposition.
3. Forecast future order counts using ARIMA.
4. Evaluate the relationship between delivery time and review scores.

## 1. Data Preparation

We load the dataset, convert time columns to `datetime`, filter for delivered orders, and calculate delivery time in days.

In [None]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
import numpy as np
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv('FINAL.csv')

# Convert time columns to datetime
df['order_purchase_timestamp'] = pd.to_datetime(df['order_purchase_timestamp'], format='%m/%d/%Y %H:%M')
df['order_delivered_customer_date'] = pd.to_datetime(df['order_delivered_customer_date'], format='%m/%d/%Y %H:%M', errors='coerce')
df['review_creation_date'] = pd.to_datetime(df['review_creation_date'], format='%m/%d/%Y %H:%M')

# Filter for delivered orders (copy to avoid SettingWithCopyWarning when adding columns)
df = df[df['order_status'] == 'delivered'].copy()

# Calculate delivery time in days
df['delivery_time'] = (df['order_delivered_customer_date'] - df['order_purchase_timestamp']).dt.total_seconds() / (24 * 3600)

# Drop rows missing a delivery time or a review score
# Note: these rows are also excluded from the monthly aggregates below
df = df.dropna(subset=['delivery_time', 'review_score'])

# Set index for time series analysis
df_ts = df.set_index('order_purchase_timestamp')

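# Aggregate data by month for time series analysis
# Note: counting 'Unnamed: 0' counts rows, which equals the order count only if
# the file has one row per order. pandas >= 2.2 prefers the alias 'ME' over 'M'.
monthly_sales = df_ts.resample('M').agg({
    'Unnamed: 0': 'count',  # Row count, used as order count
    'payment_value': 'sum'   # Total revenue
}).rename(columns={'Unnamed: 0': 'order_count'})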

# Display first few rows of aggregated data
monthly_sales.head()
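
Counting rows of `Unnamed: 0` matches the number of orders only if the file holds one row per order. The sketch below uses toy data and assumes a hypothetical `order_id` column (not confirmed to exist in `FINAL.csv`) to show how row counts and unique-order counts can diverge when one order spans several rows.

```python
import pandas as pd

# Toy data: order "a" appears on two rows (e.g. two items or two payments).
# The `order_id` column is an assumption made for illustration only.
toy = pd.DataFrame({
    "order_id": ["a", "a", "b", "c"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-05", "2018-01-05", "2018-01-20", "2018-02-02"]
    ),
    "payment_value": [10.0, 5.0, 20.0, 30.0],
})

# Group by calendar month (month-start labels) and compare the two counts
monthly_check = (
    toy.set_index("order_purchase_timestamp")
    .groupby(pd.Grouper(freq="MS"))
    .agg(
        row_count=("order_id", "size"),
        unique_orders=("order_id", "nunique"),
        revenue=("payment_value", "sum"),
    )
)
print(monthly_check)
```

If the two counts differ on the real data, aggregating unique order identifiers is probably the safer definition of sales volume.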

## 2. Time Series Analysis: Trends and Seasonality

We perform seasonal decomposition to identify trends and seasonal patterns in the order count.

In [None]:
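# Seasonal decomposition for order count
# Note: with period=12, seasonal_decompose needs at least 24 monthly observations
decomposition = seasonal_decompose(monthly_sales['order_count'], model='additive', period=12)

# Plot decomposition (DecomposeResult.plot() creates and returns its own figure,
# so calling plt.figure() first would only leave an empty figure behind)
fig = decomposition.plot()
fig.set_size_inches(12, 8)
fig.suptitle('Seasonal Decomposition of Order Count', y=1.02)
plt.show()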
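
To make the additive model concrete (observed = trend + seasonal + residual), the following sketch computes the three components by hand with a centered moving average. It illustrates the general technique under simplifying assumptions and is not statsmodels' implementation; the function name `additive_decompose` and the synthetic series are hypothetical.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period=12):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    # Centered moving-average weights; an even period uses half-weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    values = series.to_numpy(dtype=float)
    # Trend is undefined (NaN) for the first and last `half` observations
    trend = np.full(len(values), np.nan)
    trend[half:len(values) - half] = np.convolve(values, weights, mode="valid")
    detrended = values - trend
    # Seasonal effect: average detrended value per position in the cycle,
    # shifted so that the effects sum to zero over one cycle
    positions = np.arange(len(values)) % period
    seasonal_means = np.array(
        [np.nanmean(detrended[positions == p]) for p in range(period)]
    )
    seasonal_means -= seasonal_means.mean()
    seasonal = seasonal_means[positions]
    resid = values - trend - seasonal
    return pd.DataFrame(
        {"trend": trend, "seasonal": seasonal, "resid": resid}, index=series.index
    )

# Synthetic check: linear trend plus a zero-mean yearly pattern over 36 months
pattern = np.array([-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2], dtype=float)
index = pd.date_range("2016-01-01", periods=36, freq="MS")
toy_series = pd.Series(100 + 2.0 * np.arange(36) + np.tile(pattern, 3), index=index)
components = additive_decompose(toy_series, period=12)
print(components.head(8))
```

On this synthetic series the seasonal component reproduces the injected pattern, which is a quick way to check that a decomposition behaves as expected before reading it on real data.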

## 3. Forecasting

We use a seasonal ARIMA model, ARIMA(1, 1, 1)(1, 1, 1, 12), to forecast order counts for the next 6 months.

In [None]:
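# Fit seasonal ARIMA model: ARIMA(1, 1, 1) x (1, 1, 1, 12)
# Note: seasonal differencing over 12 months leaves few observations on a short
# series, so the parameter estimates may be unstable
model = ARIMA(monthly_sales['order_count'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
arima_result = model.fit()

# Forecast for the next 6 months
forecast = arima_result.forecast(steps=6)
# The forecast already carries a month-end DatetimeIndex inherited from monthly_sales
forecast_index = forecast.index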

# Plot forecast
plt.figure(figsize=(12, 6))
plt.plot(monthly_sales['order_count'], label='Historical Order Count')
plt.plot(forecast_index, forecast, label='Forecast', color='red')
plt.title('Order Count Forecast (6 Months)')
plt.xlabel('Date')
plt.ylabel('Order Count')
plt.legend()
plt.show()

# Print forecast values (the forecast Series is already indexed by date)
print('Forecasted Order Count for Next 6 Months:')
print(forecast)
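
Before relying on a forecast, it can help to score it on held-out months against a simple baseline. The sketch below is one possible check, not part of the original analysis: it holds out the last 6 points and computes the mean absolute error (MAE) of a seasonal-naive forecast, which repeats the value from the same month one year earlier. The helper names `seasonal_naive_forecast` and `holdout_mae` and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(train, steps, period=12):
    """Repeat the last observed seasonal cycle `steps` periods ahead."""
    if len(train) < period:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = train.to_numpy(dtype=float)[-period:]
    return np.array([last_cycle[h % period] for h in range(steps)])

def holdout_mae(series, forecast_fn, horizon=6):
    """Hold out the last `horizon` points and score forecast_fn(train, horizon)."""
    train, test = series.iloc[:-horizon], series.iloc[-horizon:]
    predictions = np.asarray(forecast_fn(train, horizon), dtype=float)
    return float(np.mean(np.abs(test.to_numpy(dtype=float) - predictions)))

# Toy series with a perfectly repeating yearly pattern: the baseline is exact here
pattern = [10, 12, 9, 11, 14, 13, 15, 16, 12, 11, 20, 18]
toy_series = pd.Series(
    np.tile(pattern, 3),
    index=pd.date_range("2016-01-01", periods=36, freq="MS"),
)
baseline_mae = holdout_mae(toy_series, seasonal_naive_forecast, horizon=6)
print(f"Seasonal-naive holdout MAE: {baseline_mae:.2f}")
```

The same `holdout_mae` helper accepts any function of the form `forecast_fn(train, horizon)`, so a wrapper that fits the ARIMA model on `train` and returns its forecast could be scored alongside the baseline. A model that does not beat the seasonal-naive MAE is probably not adding much.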

## 4. Delivery Time Impact on Review Scores

We analyze how delivery time affects review scores using correlation analysis, scatter plots, box plots, and summary statistics.

In [None]:
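# Correlation analysis (Pearson treats the ordinal review score as numeric)
corr, p_value = pearsonr(df['delivery_time'], df['review_score'])
# Use general format for the p-value so very small values do not print as 0.000
print(f'Pearson Correlation between Delivery Time and Review Score: {corr:.3f} (p-value: {p_value:.3g})')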

# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='delivery_time', y='review_score', data=df, alpha=0.5)
plt.title('Delivery Time vs. Review Score')
plt.xlabel('Delivery Time (Days)')
plt.ylabel('Review Score')
plt.show()

# Box plot by review score
plt.figure(figsize=(10, 6))
sns.boxplot(x='review_score', y='delivery_time', data=df)
plt.title('Delivery Time Distribution by Review Score')
plt.xlabel('Review Score')
plt.ylabel('Delivery Time (Days)')
plt.show()

# Summary statistics: Average delivery time by review score
avg_delivery_by_score = df.groupby('review_score')['delivery_time'].mean()
print('\nAverage Delivery Time by Review Score:')
print(avg_delivery_by_score)
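
Because review scores are ordinal, a rank-based measure can complement Pearson's coefficient. The sketch below, on toy data, computes Spearman's correlation as the Pearson correlation of ranks and summarizes the mean review score per delivery-time bucket. The bucket edges (7 and 14 days) and the toy values are assumptions chosen for illustration, not figures from the dataset.

```python
import pandas as pd

def spearman_by_ranks(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    return x.rank().corr(y.rank())

# Toy data for illustration only
toy = pd.DataFrame({
    "delivery_time": [2.0, 5.0, 9.0, 14.0, 30.0],
    "review_score": [5, 5, 4, 2, 1],
})

rho = spearman_by_ranks(toy["delivery_time"], toy["review_score"])
print(f"Spearman correlation: {rho:.3f}")

# Mean review score per delivery-time bucket; intervals are right-closed,
# e.g. (0, 7], (7, 14], (14, inf)
bins = [0, 7, 14, float("inf")]
labels = ["<=7d", "7-14d", ">14d"]
toy["delivery_bucket"] = pd.cut(toy["delivery_time"], bins=bins, labels=labels)
mean_score_by_bucket = toy.groupby("delivery_bucket", observed=True)["review_score"].mean()
print(mean_score_by_bucket)
```

Bucketed means are often easier to communicate than a single coefficient, since they show where along the delivery-time range scores start to drop.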

## 5. Conclusions

### Time Series Analysis
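- **Trends**: The trend component of the seasonal decomposition isolates the long-term direction of order counts; a sustained rise or fall indicates business growth or decline.
- **Seasonality**: Recurring peaks in the seasonal component (e.g., November/December) would suggest holiday-driven sales. If the series covers only two or three yearly cycles, these seasonal estimates should be read cautiously.
- **Forecast**: The seasonal ARIMA model provides a 6-month forecast of order counts, which can inform inventory and marketing planning.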

### Delivery Time Impact
- **Correlation**: A negative correlation between delivery time and review score, if observed, indicates that longer delivery times are associated with lower review scores.
- **Visualization**: The scatter and box plots show how the distribution of delivery times differs across review scores.
- **Summary**: If average delivery times are higher for lower review scores, this is consistent with faster delivery improving customer satisfaction, although correlation alone does not establish causation.

### Recommendations
- **Business Planning**: Use forecast results to prepare for peak seasons and optimize inventory.
- **Delivery Optimization**: Prioritize shorter delivery times, since longer deliveries are associated with lower review scores and customer satisfaction.
- **Further Analysis**: Explore category-specific trends or regional differences in delivery performance.