In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose

In [None]:
import pandas as pd

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
df = pd.read_csv(url)
df.head()

In [None]:
df.shape

In [None]:
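plt.plot(df["Month"], df["Passengers"])
plt.xticks(range(0, len(df), 12), rotation=45)  # Month is still text here, so label one tick per year
plt.ylabel("Passengers")
plt.show()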

In [None]:
# decompose the data into trend, seasonality and residuals
decompose = seasonal_decompose(df["Passengers"], model="multiplicative", period=12)
decompose.plot()

Seasonality means:

* monthly patterns

* weekly cycles

* hourly repeated behavior

* seasonal peaks (summer/winter)

Clustering can help reveal seasonality by grouping time slices (for example, calendar months) that behave similarly.

Every year repeats the same 12 months, so seasonality shows up as some months being consistently high and others consistently low.

# **Time Series Key Concepts**

## **1. Correlation in Time Series (example: ECG signals)**
Correlation in time series tells us how two variables move together **over time**.  
There are two types:

### **a. Autocorrelation**
- Measures correlation of a series **with its own past values**.
- Example: Today's temperature correlated with yesterday's temperature.
- Useful for identifying **patterns, lags, seasonality**.

### **b. Cross-correlation**
- Measures correlation between **two different time series**.
- Example: Sales vs. advertising spend over time.
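
Cross-correlation at different lags can be sketched with NumPy alone. The data below is simulated (an assumption for illustration): a hypothetical `sales` series that responds to `ad_spend` two steps later. The helper `lagged_corr` is not a library function, just a small illustrative wrapper around `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: ad spend is random, and sales follow it two steps later.
n = 200
ad_spend = rng.normal(size=n)
sales = np.roll(ad_spend, 2) + 0.1 * rng.normal(size=n)

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# Correlation of ad spend with sales 0..5 steps later
corrs = {lag: lagged_corr(ad_spend, sales, lag) for lag in range(6)}
best_lag = max(corrs, key=corrs.get)
print("lag with the strongest correlation:", best_lag)
```

The lag with the largest correlation suggests how long one series takes to respond to the other.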

**Tools:** ACF (Autocorrelation Function), PACF (Partial ACF).

---

## **2. Stationarity in Time Series**
A time series is **stationary** if its statistical properties **do NOT change over time**.

### **A stationary series has:**
- Constant mean  
- Constant variance  
- Constant autocorrelation structure  

### **Why it matters?**
Many classical time series models (AR, MA, ARMA) **assume stationarity**; ARIMA handles a non-stationary trend by differencing the series first.

### **How to make a series stationary?**
- Differencing  
- Log transforms  
- Removing trends and seasonality  
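
A minimal sketch of the first two techniques, using a made-up series with exponential growth (an assumption for illustration). Taking logs turns the exponential growth into a straight line, and differencing then removes that line.

```python
import numpy as np
import pandas as pd

# Hypothetical series growing by about 3% per step
t = np.arange(48)
series = pd.Series(100 * np.exp(0.03 * t))

log_series = np.log(series)               # exponential growth becomes linear
diff_series = log_series.diff().dropna()  # first difference removes the linear trend

print(diff_series.head())
```

For monthly data with a yearly pattern, a seasonal difference such as `series.diff(12)` is a common additional step.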

### **Tests:**
- **ADF test**
- **KPSS test**

---

## 3. **Seasonality in Time Series**
Seasonality is a **regular, repeating pattern** that occurs at fixed intervals.

### **Examples:**
- Sales spike every December.
- Traffic increases every Monday.
- Temperature rises every summer.

### **Characteristics:**
- Periodic
- Predictable
- Fixed frequency (daily/weekly/monthly/yearly)
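
One simple way to see a fixed-frequency pattern is to average the series by calendar month. The sketch below uses a synthetic monthly series (an assumption for illustration) built to peak every July.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical monthly series: 6 years, highest in July, lowest in January
idx = pd.date_range("2015-01-01", periods=72, freq="MS")
month = idx.month.to_numpy()
values = 100 + 10 * np.sin(2 * np.pi * (month - 4) / 12) + rng.normal(scale=0.3, size=72)
s = pd.Series(values, index=idx)

# Average value for each calendar month across all years
monthly_profile = s.groupby(s.index.month).mean()
peak_month = int(monthly_profile.idxmax())
low_month = int(monthly_profile.idxmin())
print(monthly_profile.round(1))
print("peak month:", peak_month, "low month:", low_month)
```

If the monthly averages differ clearly and the same months stand out year after year, the series has a yearly seasonal pattern.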

---

## 4. **Trend in Time Series**
Trend represents the **long-term upward or downward movement** in the data.

### **Types:**
- **Upward trend:** prices rising over years  
- **Downward trend:** decreasing user activity  
- **Flat trend:** no major change  

### **Causes:**
- Population growth  
- Economic changes  
- Technology adoption  

Trends are usually removed for modeling (using detrending or differencing).
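
The two removal approaches can be sketched as follows, on a synthetic series with a linear upward trend (the slope of 0.5 and the noise level are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical series: linear trend plus noise
t = np.arange(60)
y = 5.0 + 0.5 * t + rng.normal(scale=1.0, size=60)

# Detrending: fit a straight line and subtract it
slope, intercept = np.polyfit(t, y, deg=1)
detrended = y - (slope * t + intercept)

# Differencing: replace each value with its change from the previous step
differenced = np.diff(y)

print("estimated slope:", round(slope, 3))
print("mean of differenced series:", round(differenced.mean(), 3))
```

Detrending keeps the series at its original length and needs an explicit trend model; differencing needs no model but shortens the series by one point.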

---

## 5. **Cyclic Patterns in Time Series**
Cyclic patterns are **long-term oscillations** without a fixed period.

### **Key properties:**
- Duration is **not fixed** (unlike seasonality)
- Often influenced by **economic or business cycles**
- Example: Recession → recovery → boom → recession.

### **Difference from seasonality:**
| Feature       | Seasonality | Cyclic |
|---------------|-------------|--------|
| Repeats?      | Yes         | Yes    |
| Fixed period? | Yes         |   No   |
| Driven by     | Calendar    | Economy/other factors |

### **What Is Lag in Time Series?**

- A lag is how far back in time you look at past values of the same time series.

- If you "lag" a series by 1, you're looking at the value 1 time step earlier.
- If you lag by 2, you're looking 2 time steps earlier, and so on.
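
In pandas, lagging is usually done with `shift`. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([5, 7, 9, 8, 12], name="value")

# shift(k) moves values down by k rows, so each row sees the value k steps earlier
lags = pd.DataFrame({
    "value": s,
    "lag_1": s.shift(1),
    "lag_2": s.shift(2),
})
print(lags)
```

The first rows of each lagged column are `NaN` because no earlier value exists for them.
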

# **Temporal data: data based on time**

# **What Is White Noise in Time Series?**
White noise is a time series with no pattern at all.

It is completely random.

**✔ Characteristics of White Noise**

A white-noise series has:

1️⃣ **Mean = constant**

Example: mean around 0.

2️⃣ **Variance = constant**

Spread of values does not change with time.

3️⃣ **No autocorrelation**

Past values do NOT help predict future values.

If you check the ACF plot of white noise:

- Lag 0 = 1 (always)
- All other lags ≈ 0

Meaning: no relationship with its past.

4️⃣ **Looks like random points**

- No trend
- No seasonality
- No cycles
- No structure
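
These properties can be checked on simulated data. The sketch below draws Gaussian white noise (an assumption for illustration) and computes the sample autocorrelation by hand; `autocorr` is an illustrative helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(7)
noise = rng.normal(loc=0.0, scale=1.0, size=5000)

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 0)."""
    if lag == 0:
        return 1.0
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

acf_values = [autocorr(noise, k) for k in range(6)]
bound = 1.96 / np.sqrt(len(noise))  # approximate 95% band around zero for white noise

print("mean:", round(noise.mean(), 3), "variance:", round(noise.var(), 3))
print("ACF :", np.round(acf_values, 3))
print("band: +/-", round(bound, 3))
```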
---

#  **Summary Table**

| Concept        | Meaning | Examples |
|----------------|---------|----------|
| **Correlation** | How values relate to past values or another series | ACF, PACF |
| **Stationarity** | Mean/variance stay constant | Needed for ARIMA |
| **Seasonality** | Repeating pattern with fixed frequency | Monthly sales, weekly traffic |
| **Trend** | Long-term upward/downward movement | Inflation, growth |
| **Cyclic** | Repeating pattern without fixed period | Business cycles |



# 📌 Moving Average (MA)

## 🔍 What is a Moving Average?
A **Moving Average** is a technique in time series analysis where you take
the **average of the last N values** and slide this window forward through time.

It helps to:
- Smooth the data
- Remove noise
- Reveal the underlying trend

---

## 🧾 Example

Given a time series:
5, 7, 9, 8, 12

### 3-Day Moving Average:
- MA at day 3: (5 + 7 + 9) / 3 = **7.0**
- MA at day 4: (7 + 9 + 8) / 3 = **8.0**
- MA at day 5: (9 + 8 + 12) / 3 = **9.67**
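
The same calculation can be reproduced with pandas `rolling`, as a quick check of the arithmetic:

```python
import pandas as pd

# The five values above, indexed by day number
s = pd.Series([5, 7, 9, 8, 12], index=[1, 2, 3, 4, 5])

# 3-day simple moving average; days 1 and 2 have no full window yet
sma3 = s.rolling(window=3).mean()
print(sma3)
```

Days 1 and 2 come out as `NaN` because fewer than three values are available at those points.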

---

## ⭐ Types of Moving Averages
### 1. **SMA – Simple Moving Average**
Equal weight to all values in the window.

### 2. **WMA – Weighted Moving Average**
More weight to recent values.

### 3. **EMA – Exponential Moving Average**
Most weight to the newest value; reacts faster to changes.
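
A side-by-side sketch of the three types in pandas. The linear weights 1, 2, 3 for the WMA and `span=3` for the EMA are assumptions chosen for illustration; other weightings are equally valid.

```python
import numpy as np
import pandas as pd

s = pd.Series([5.0, 7.0, 9.0, 8.0, 12.0])
window = 3

# SMA: equal weights inside the window
sma = s.rolling(window).mean()

# WMA: linearly increasing weights, the newest value gets the largest one
weights = np.arange(1, window + 1)
wma = s.rolling(window).apply(lambda w: np.dot(w, weights) / weights.sum(), raw=True)

# EMA: exponentially decaying weights over the whole history
ema = s.ewm(span=window, adjust=False).mean()

print(pd.DataFrame({"value": s, "SMA": sma, "WMA": wma, "EMA": ema}).round(3))
```

On the final jump to 12, the WMA and EMA move further towards the new value than the SMA does, which is what "reacts faster" means in practice.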

---

## 📈 Why Use Moving Average?
- It removes short-term fluctuations  
- Makes long-term movement easier to see  
- Helps identify trends and cycles  



# **Seasonality check using k-means clustering**

In [None]:
df['Month'] = pd.to_datetime(df['Month'])
df['month_num'] = df['Month'].dt.month
df['year'] = df['Month'].dt.year
df.head()

In [None]:
import numpy as np

months = {m: [] for m in range(1, 13)}

for _, row in df.iterrows():
    m = row["month_num"]
    months[m].append(row["Passengers"])

# convert to a matrix (12 months × 12 years)
month_vectors = np.array([months[m] for m in range(1, 13)])
month_vectors.shape


In [None]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # explicit n_init keeps results consistent across scikit-learn versions
labels = kmeans.fit_predict(month_vectors)

labels

In [None]:
month_names = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]

for name, label in zip(month_names, labels):
    print(f"{name}: Cluster {label}")


In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,5))
plt.plot(df['Month'], df['Passengers'])
plt.title("AirPassengers Data")
plt.show()
