<a href="https://colab.research.google.com/github/Aidakazemi/BUS650/blob/main/BUSI650_Week6_TimeSeries_ABTesting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 📘 BUSI650 Week 6: Time Series Analysis & A/B Testing

---

## 🎯 Learning Objectives
By the end of this session, you will be able to:
- Understand what time series data is and how to interpret it.  
- Apply forecasting methods (naïve, moving average, exponential smoothing, regression trend).  
- Understand and apply A/B testing for business decisions.  

---

## Warm-Up & Recap
Last week, we learned about **hypothesis testing** and **regression models**: how to predict one variable using another.

Today, we’ll move one step forward:  
👉 How to **predict the future** using **past data**. That’s called *time series forecasting*.



# ⏳ Part 1: Time Series Basics

A **time series** is data collected at regular intervals over time, such as daily, weekly, monthly, or quarterly.

In business, examples include:
- Monthly sales revenue  
- Daily website visitors  
- Quarterly profit  
- Weekly employee hours worked  
- (Beyond business) How Americans spend a typical day, visualized over time: https://flowingdata.com/2015/12/15/a-day-in-the-life-of-americans/

---



### 📊 Example: Monthly Coffee Sales (Café Case)

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate a simple synthetic time series: 24 months with an upward trend
np.random.seed(42)  # fixed seed so everyone gets the same numbers
dates = pd.date_range("2023-01-01", periods=24, freq="MS")  # month-start dates
sales = [200 + i*5 + np.random.randint(-10, 10) for i in range(24)]  # trend + random noise
df = pd.DataFrame({"Month": dates, "Sales": sales})



# Plot the time series
plt.figure(figsize=(10,4))
plt.plot(df["Month"], df["Sales"], marker="o", linestyle="-")
plt.title("☕ Café Monthly Coffee Sales (2023–2024)")
plt.xlabel("Month")
plt.ylabel("Sales ($ in hundreds)")
plt.grid(True)
plt.show()


In [None]:
# Print the data of sales and months
print(df[["Month", "Sales"]])


**Story to Tell:**
> The café wants to know how much coffee it will likely sell next month (Month 25),  
> so it can plan for beans, milk, and staff scheduling.


# 🔢 Part 2: Time Series Forecasting Methods


## 1️⃣ Naïve Forecast: "Tomorrow is Like Today"
**Concept:**  
Assumes that next month’s sales equal this month’s sales.

$$
\hat{Y}_{t+1} = Y_t
$$


In [None]:

# Naive forecast
naive_forecast = df["Sales"].iloc[-1]
print("Naïve forecast for next month:", naive_forecast)


> Simple baseline: assumes next month = last month.


## 2️⃣ Moving Average: "Average of Last Few Months"
**Concept:** Smooths random ups and downs by averaging the most recent *n* months.

$$
\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + \dots + Y_{t-n+1}}{n}
$$


In [None]:

n = 3
moving_avg_forecast = df["Sales"].tail(n).mean()
print("3-Month Moving Average Forecast:", round(moving_avg_forecast, 2))


> Averages the last 3 months to smooth out random ups and downs.
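
To see how the same rule behaves across a whole series rather than only at its end, a rolling version can be sketched as follows. The short `sales` series here is hypothetical illustration data, not the café dataset; the `shift(1)` step is one way to line each 3-month average up with the month it is meant to forecast.

```python
import pandas as pd

# Hypothetical monthly sales, for illustration only (not the café data)
sales = pd.Series([200, 210, 205, 220, 225, 230])

n = 3
# Average of the previous n months, shifted one step so that the value
# in row t uses only the n months before t (no peeking at month t itself)
ma_forecast = sales.rolling(window=n).mean().shift(1)

# Forecast for the month after the last observation
next_forecast = sales.tail(n).mean()

print(pd.DataFrame({"Sales": sales, "MA3_forecast": ma_forecast}))
print("Next-month forecast:", round(next_forecast, 2))
```

The first three rows of `MA3_forecast` are empty because fewer than three earlier months are available there.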


## 3️⃣ Weighted Moving Average: "More Weight to Recent Months"
**Concept:** More recent data is more relevant.

$$
\hat{Y}_{t+1} = w_1Y_t + w_2Y_{t-1} + w_3Y_{t-2}, \quad w_1+w_2+w_3=1
$$


In [None]:

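weights = np.array([0.5, 0.3, 0.2])            # w1 (most recent month), w2, w3; must sum to 1
recent = df["Sales"].tail(3).to_numpy()[::-1]  # reversed so the most recent month comes first
weighted_forecast = np.sum(weights * recent)
print("Weighted Moving Average Forecast:", round(weighted_forecast, 2))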


> Gives more weight to recent months, so the forecast reacts faster to recent changes.
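
As a worked illustration of the formula, suppose (hypothetically) the last three months of sales were $Y_t = 320$, $Y_{t-1} = 310$, and $Y_{t-2} = 300$, with weights $w_1 = 0.5$, $w_2 = 0.3$, $w_3 = 0.2$. These numbers are made up for the arithmetic and are not taken from the café data.

$$
\hat{Y}_{t+1} = 0.5(320) + 0.3(310) + 0.2(300) = 160 + 93 + 60 = 313
$$

A simple 3-month moving average of the same values would give $310$, so the weighted version sits closer to the most recent month.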


## 4️⃣ Regression Trend: "Predict Based on Growth Pattern"
**Concept:** Fit a linear trend model to estimate consistent growth.

$$
Y_t = a + b t + \varepsilon_t
$$


In [None]:

from sklearn.linear_model import LinearRegression

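df["t"] = np.arange(1, len(df) + 1)      # time index: 1, 2, ..., 24
X = df[["t"]]
y = df["Sales"]
model = LinearRegression().fit(X, y)
a, b = model.intercept_, model.coef_[0]  # a = intercept, b = slope (change in sales per month)
t_next = len(df) + 1                     # next month = period 25
regression_forecast = a + b * t_next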

print(f"Regression Equation: Sales = {a:.2f} + {b:.2f} * t")
print("Regression Forecast for next month:", round(regression_forecast, 2))


In [None]:

print("Naïve:", naive_forecast)
print("Moving Average:", round(moving_avg_forecast,2))
print("Weighted MA:", round(weighted_forecast,2))
print("Regression:", round(regression_forecast,2))



**Discussion:**
> Which forecast seems most realistic given the trend?  
> If you were the café manager, which would you rely on?
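
One way to inform this discussion is a small walk-forward check: forecast each month using only the months before it, then compare the average size of the misses. The sketch below is an illustration under assumptions, not part of the lesson's method. It uses a hypothetical trending series and the mean absolute error (MAE), an accuracy measure not covered above, and it fits the trend line with `np.polyfit` instead of scikit-learn to keep the example short.

```python
import numpy as np

# Hypothetical series with a steady upward trend (illustration only)
rng = np.random.default_rng(0)
sales = 200 + 5 * np.arange(24) + rng.integers(-10, 10, size=24)

weights = np.array([0.5, 0.3, 0.2])  # most recent month first
errors = {"Naive": [], "Moving Average": [], "Weighted MA": [], "Regression": []}

# Walk forward: forecast month t from months 0 .. t-1 only
for t in range(6, len(sales)):
    history = sales[:t]
    actual = sales[t]

    naive = history[-1]
    moving_avg = history[-3:].mean()
    weighted = np.dot(weights, history[-3:][::-1])

    # Linear trend fitted on the history, evaluated at the next time index
    time_index = np.arange(1, t + 1)
    slope, intercept = np.polyfit(time_index, history, deg=1)
    regression = intercept + slope * (t + 1)

    forecasts = {"Naive": naive, "Moving Average": moving_avg,
                 "Weighted MA": weighted, "Regression": regression}
    for name, forecast in forecasts.items():
        errors[name].append(abs(actual - forecast))

# Mean absolute error per method (lower is better)
mae = {name: float(np.mean(errs)) for name, errs in errors.items()}
for name, value in mae.items():
    print(f"{name}: MAE = {value:.2f}")
```

On a series with a clear trend, methods that average older values tend to lag behind the latest level, which is worth keeping in mind when reading the results.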



# 🎯 Part 3: A/B Testing (Experiments)

A/B testing compares two versions, A (control) and B (new version), to see which performs better.



### 📈 Example: Email Campaign
A company tests two subject lines for an email campaign. We want to know: is version B significantly better?

| Group | Open Rate (%) |
|--------|---------------|
| A | 25 |
| B | 32 |

**Hypotheses:**
- $H_0$: mean(A) = mean(B)

- $H_1$: mean(B) > mean(A)


In [None]:

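import scipy.stats as stats

# Simulate opens (1 = opened, 0 = not opened) for 200 recipients per group
A = np.random.binomial(1, 0.25, 200)
B = np.random.binomial(1, 0.32, 200)

# One-tailed Welch t-test of H1: mean(B) > mean(A); `alternative` needs SciPy >= 1.6
t, p = stats.ttest_ind(B, A, equal_var=False, alternative="greater")
print(f"t = {t:.2f}, one-tailed p = {p:.4f}")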



If *p < 0.05*, we reject $H_0$ and conclude that Version B performs significantly better.

> Example: “Buy One Get One Free” (A) vs. “Free Muffin with Coffee” (B).  
> If p < 0.05 → B wins; send that campaign!
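
Because an open rate is a proportion, the same question can also be checked with a two-proportion z-test on the observed counts. This is an alternative to the t-test used above, sketched under assumptions: the table does not state sample sizes, so 200 recipients per group is assumed (matching the simulation), which turns the 25% and 32% open rates into 50 and 64 opens.

```python
import math
from scipy.stats import norm

# Assumed sample sizes: 200 recipients per group (not given in the table)
n_a, n_b = 200, 200
opens_a, opens_b = 50, 64  # 25% and 32% of 200

p_a, p_b = opens_a / n_a, opens_b / n_b

# Pooled open rate under H0 (both versions share the same true rate)
p_pool = (opens_a + opens_b) / (n_a + n_b)
std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / std_err
p_value = norm.sf(z)  # one-tailed: P(Z > z) under H0

print(f"z = {z:.2f}, one-tailed p = {p_value:.4f}")
```

With these assumed counts the one-tailed p-value comes out slightly above 0.05, a reminder that a 7-point gap in open rates may not be conclusive at this sample size.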



# 🔚 Wrap-Up

| Concept | Purpose |
|----------|----------|
| Time Series | Understand how data changes over time |
| Forecasting | Predict future using past patterns |
| A/B Testing | Evaluate two strategies to guide decisions |

---

## In-class Activity
# 📊 BUSI 650 In-Class Activity: Forecasting a Time Series Using Regression

**Objective:**  
You will choose a business variable that changes over time (daily, monthly, quarterly, or yearly) and create a small dataset of **10 periods**.  
Then, you will fit a **regression line** to forecast future values and interpret the results.

---

## 🪄 Step 1: Choose Your Variable

Think about something from your **field of interest** that changes over time.

| Field | Possible Variable Examples |
|--------|----------------------------|
| Marketing | Monthly ad spend, weekly website visits, quarterly leads |
| Finance | Quarterly revenue, monthly expenses, yearly profit |
| HR | Monthly hires, quarterly turnover rate |
| Law / Policy | Monthly case filings, quarterly approvals |
| Operations | Daily production output, weekly order volume |

**Write down:**
1. Variable name (e.g., *Quarterly Revenue in $K*)  
2. Frequency (daily / monthly / quarterly / yearly)  
3. Expected trend (increasing / decreasing / flat)  
4. Any possible seasonality (yes/no)

---

## ⚙️ Step 2: Create a 10-Period Dataset

Ask an AI tool (e.g., ChatGPT, Bing Copilot, or your Colab helper GPT) to generate a simple dataset for your chosen variable.

You can prompt like this:

> “Create a synthetic dataset of 10 quarterly revenue values for a small business, starting around $100K and growing by about 8% per quarter with random variation.”

Once you get your data:
- Copy it into a table or CSV format.
- Load it into your Colab environment using `pandas`.
- The table should look like this:

| TimeIndex | Date | Value |
|------------|------|--------|
| 1 | 2022-Q1 | 100 |
| 2 | 2022-Q2 | 108 |
| 3 | 2022-Q3 | 117 |
| … | … | … |
| 10 | 2024-Q2 | 185 |
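
If you would rather build the table directly in Colab, a dataset with the same three columns can be sketched as below. Everything in it is an assumption for illustration: the starting value, the 8% growth rate, the noise level, and the seed are hypothetical choices, and the generated numbers will not match the sample rows above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(650)  # fixed seed so the numbers are reproducible

n_periods = 10
start_value = 100    # in $K, assumed starting level
growth_rate = 0.08   # assumed average growth per quarter

time_index = np.arange(1, n_periods + 1)
trend = start_value * (1 + growth_rate) ** (time_index - 1)
noise = rng.normal(loc=0, scale=3, size=n_periods)  # random variation, in $K

# Quarter labels in the same style as the table: 2022-Q1, 2022-Q2, ...
dates = [f"{2022 + i // 4}-Q{i % 4 + 1}" for i in range(n_periods)]

data = pd.DataFrame({
    "TimeIndex": time_index,
    "Date": dates,
    "Value": np.round(trend + noise, 1),
})
print(data)
```

If your data is saved as a CSV file instead, `pd.read_csv("my_data.csv")` loads it into the same kind of DataFrame; the file name here is a placeholder.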

---

## 📈 Step 3: Visualize the Data

Use Python libraries (like `matplotlib` or `seaborn`) to **plot your variable over time**.

Your plot should:
- Have **time** on the x-axis and **value** on the y-axis.  
- Show whether your data looks upward, downward, or fluctuating.  
- Include a **title**, **labels**, and **grid**.

*Example interpretation:*  
> “The chart shows a gradual upward trend in quarterly revenue with small random fluctuations.”
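
A minimal plotting sketch that meets the checklist above (time on the x-axis, a title, axis labels, and a grid) might look like this. The five data points are hypothetical and only serve to make the example runnable; swap in your own columns.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue in $K (illustration only)
quarters = ["2022-Q1", "2022-Q2", "2022-Q3", "2022-Q4", "2023-Q1"]
revenue = [100, 108, 117, 124, 136]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(quarters, revenue, marker="o", linestyle="-")
ax.set_title("Quarterly Revenue (hypothetical data)")
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue ($K)")
ax.grid(True)
plt.show()
```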

---

## 📉 Step 4: Fit a Regression Line (Forecasting Model)

In Colab, fit a **simple linear regression** model:
- **Dependent variable (Y):** your chosen variable (e.g., revenue, sales, or profit).  
- **Independent variable (X):** the time index (1 to 10).

Then:
1. Generate the **regression equation** (e.g., `Value = 95 + 8.2 × TimeIndex`).
2. Plot the **actual data** and the **fitted regression line** together.
3. Record:
   - The **slope (coefficient)**  
   - The **p-value**  
   - The **R² value**
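
One convenient way to get all three numbers at once is `scipy.stats.linregress`, which reports the slope, intercept, correlation, and p-value for a simple linear regression. The sketch below uses a hypothetical 10-period dataset; it is one option among several, not the only way to complete this step.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical 10-period dataset in $K (illustration only)
time_index = np.arange(1, 11)
values = np.array([100, 108, 117, 121, 133, 140, 152, 158, 171, 185])

result = linregress(time_index, values)

slope = result.slope
intercept = result.intercept
p_value = result.pvalue         # two-sided test of H0: slope = 0
r_squared = result.rvalue ** 2  # share of variation explained by time

print(f"Value = {intercept:.2f} + {slope:.2f} * TimeIndex")
print(f"slope = {slope:.2f}, p-value = {p_value:.2g}, R^2 = {r_squared:.3f}")
```

To draw the fitted line over the data, plot `intercept + slope * time_index` against `time_index` on the same axes as the actual values.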

---

## 🧠 Step 5: Interpret the Results

Write a short paragraph answering the following:

1. **Slope:**  
   - What is the direction and size of the trend?  
   - Example: “Each quarter, revenue increases by an average of $8.2K.”

2. **p-value:**  
   - Is the trend **statistically significant** (p < 0.05)?  
   - Example: “The trend is significant at the 5% level, suggesting a real upward pattern.”

3. **R²:**  
   - What percentage of variation is explained by time?  
   - Example: “R² = 0.82, so 82% of the variation in revenue is explained by time.”

4. **Forecast:**  
   - Predict the next period (e.g., period 11) using your regression equation.

5. **Business meaning:**  
   - What might this mean for decision-making?  
   - Example: “If this trend continues, the company should prepare for higher sales and inventory needs next quarter.”

---

## 🔮 Step 6: One-Step-Ahead Forecast

Manually plug **TimeIndex = 11** into your regression equation to forecast the next value.

> Example:  
> $Y = 95 + 8.2 \times 11 = 185.2$  
> “Predicted quarterly revenue for next period: \$185.2K”
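
To double-check the manual calculation, the same plug-in step can be written as a tiny function. The name `trend_forecast` is a hypothetical helper introduced here for illustration, and the coefficients are the ones from the example equation, not results from real data.

```python
def trend_forecast(intercept, slope, time_index):
    """Evaluate the fitted trend line at a given time index."""
    return intercept + slope * time_index

# Coefficients from the example equation Value = 95 + 8.2 * TimeIndex
forecast = trend_forecast(intercept=95, slope=8.2, time_index=11)
print(f"Predicted value for period 11: {forecast:.1f}")
```

Replace the intercept and slope with the values from your own regression to forecast your variable.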

---

## ✍️ Step 7: Reflect (Short Discussion)

Discuss with your peers:
- What trends did you observe?  
- Which variables showed the strongest time effect?  
- How could forecasting like this help in real business decisions?

---


