# Time Series Forecasting for Weekly Data

This notebook outlines a workflow for forecasting weekly data. The model is trained on historical data from 2019 to 2022 and tested on data from 2023.

## Steps Overview:

1. **Data Loading**: Load the CSV files for each year.
2. **Data Preprocessing**: Handle missing values and encode categorical variables.
3. **Feature Selection**: Split the training and testing datasets into feature matrices and target vectors.
4. **Model Selection and Training**: Select a regression model and fit it on the 2019 to 2022 data.
5. **Model Evaluation**: Use the model to predict 2023 data and evaluate its accuracy.
6. **Saving Results**: Save the forecasted data for 2023 into a CSV file.

## Data Loading

First, load the data for each year into Pandas DataFrames.

In [1]:
import pandas as pd

# Assumes one CSV file per year, named '<year>.csv', in the working directory
df_2019 = pd.read_csv('2019.csv')
df_2020 = pd.read_csv('2020.csv')
df_2021 = pd.read_csv('2021.csv')
df_2022 = pd.read_csv('2022.csv')
df_2023 = pd.read_csv('2023.csv')
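
The five `read_csv` calls above can also be written as a loop. The following is a minimal sketch, assuming the same `<year>.csv` naming; the `load_years` helper and its `data_dir` parameter are hypothetical names introduced for illustration, and the demo writes tiny synthetic files (with made-up columns) to a temporary directory so that the block runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_years(data_dir, years):
    """Read '<year>.csv' for each year and return a {year: DataFrame} dict."""
    frames = {}
    for year in years:
        path = Path(data_dir) / f"{year}.csv"
        if not path.exists():
            raise FileNotFoundError(f"Missing data file: {path}")
        frames[year] = pd.read_csv(path)
    return frames


# Demo with synthetic files; 'week' and 'value' are made-up column names
with tempfile.TemporaryDirectory() as tmp:
    for year in range(2019, 2024):
        pd.DataFrame({"week": [1, 2], "value": [year, year + 1]}).to_csv(
            Path(tmp) / f"{year}.csv", index=False
        )
    frames = load_years(tmp, range(2019, 2024))

print(sorted(frames))      # the years that were loaded
print(frames[2019].shape)  # rows and columns of one year
```

Keeping the frames in a dictionary keyed by year makes it easy to select the training years later, for example `[frames[y] for y in range(2019, 2023)]`.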

## Data Preprocessing

Combine the training data and handle any missing values. Encode categorical variables if necessary.

In [2]:
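# Combine the 2019-2022 datasets into one training set;
# ignore_index=True avoids duplicate row labels from the yearly files
df_train = pd.concat([df_2019, df_2020, df_2021, df_2022], ignore_index=True)

# Handle missing values; forward fill is one example strategy.
# Rows before the first valid value in a column remain NaN.
df_train = df_train.ffill()
df_2023 = df_2023.ffill()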

# Encode categorical variables if necessary
# For simplicity, this example assumes no encoding is needed
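
If the data does contain categorical columns, one common option is one-hot encoding with `pd.get_dummies`. The sketch below uses synthetic frames; the `region` and `sales` columns are hypothetical and not taken from the notebook's data. It also aligns the test columns with the training columns, since the two sets may not contain the same categories.

```python
import pandas as pd

# Synthetic stand-ins for the training and 2023 frames;
# 'region' and 'sales' are hypothetical column names
train = pd.DataFrame({"region": ["north", "south", "east"], "sales": [10.0, 12.0, 9.0]})
test = pd.DataFrame({"region": ["south", "west"], "sales": [11.0, 8.0]})

# One-hot encode the categorical column in each frame
train_enc = pd.get_dummies(train, columns=["region"], dtype=float)
test_enc = pd.get_dummies(test, columns=["region"], dtype=float)

# Align the test columns with the training columns: categories unseen in
# training ('west') are dropped, and categories absent from the test set
# are added as all-zero columns
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0.0)

print(train_enc.columns.tolist())
print(test_enc)
```

Aligning on the training columns keeps the feature matrix that the model sees at prediction time identical in layout to the one it was fitted on.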

## Feature Selection

Prepare the feature matrices (X) and target vectors (y) for both training and testing datasets.

In [3]:
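# Assumes the target variable (the last week's data in each dataset) is stored
# in a column named 'target_column'. That name is a placeholder: replace it
# with the actual name of the column to forecast.
X_train = df_train.drop(columns=['target_column'])
y_train = df_train['target_column']

X_test = df_2023.drop(columns=['target_column'])
y_test = df_2023['target_column']

KeyError: "['target_column'] not found in axis"

The `KeyError` above shows that the loaded data has no column named `target_column`. Replace the placeholder with the real target column name before running the remaining cells.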
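
A small helper can make this failure easier to diagnose by listing the columns that are actually available. This is a sketch only: `split_features_target` is a hypothetical function, and the `week` and `sales` columns in the demo are synthetic.

```python
import pandas as pd


def split_features_target(df, target):
    """Return (X, y), raising a descriptive error if the target column is absent."""
    if target not in df.columns:
        raise KeyError(
            f"Column {target!r} not found; available columns: {list(df.columns)}"
        )
    return df.drop(columns=[target]), df[target]


# Synthetic example; 'week' and 'sales' are hypothetical column names
demo = pd.DataFrame({"week": [1, 2, 3], "sales": [100.0, 110.0, 105.0]})
X_demo, y_demo = split_features_target(demo, "sales")
print(X_demo.columns.tolist(), y_demo.tolist())

# Asking for a column that does not exist reports what is available instead
try:
    split_features_target(demo, "target_column")
except KeyError as err:
    print(err)
```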

## Model Selection and Training

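Here we choose a linear regression model as a simple, fast baseline. Note that scikit-learn's `LinearRegression` requires numeric features and does not accept missing values, so any remaining `NaN` entries or text columns must be handled before fitting.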

In [None]:
from sklearn.linear_model import LinearRegression

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)
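
Forward fill can leave `NaN` values at the start of a column, and `LinearRegression` rejects them. One possible safeguard, shown here as a sketch rather than as part of the original workflow, is to wrap the model in a scikit-learn pipeline with a `SimpleImputer`. Median imputation is an assumption chosen for illustration, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic training data that follows y = 2*x1 + 3*x2 + 1 exactly
X_demo = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0, 5.0], "x2": [2.0, 1.0, 4.0, 3.0, 5.0]})
y_demo = 2 * X_demo["x1"] + 3 * X_demo["x2"] + 1

# Fill any NaNs with the per-column training median, then fit the regression
pipeline = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
pipeline.fit(X_demo, y_demo)

# A new row with a missing x2: the imputer fills it with the training median (3.0)
X_new = pd.DataFrame({"x1": [6.0], "x2": [np.nan]})
pred = pipeline.predict(X_new)
print(pred)

# The fitted coefficients recover the slopes used to build the data
print(pipeline.named_steps["linearregression"].coef_)
```

Because the imputer is fitted only on the training data, the same fill values are reused at prediction time, which avoids leaking information from the 2023 test set.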

## Model Evaluation

Predict the 2023 data and evaluate the model's performance.

In [None]:
from sklearn.metrics import mean_absolute_error, r2_score

# Make predictions
predictions = model.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print(f'MAE: {mae}, R^2: {r2}')
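
To make the two metrics concrete, the sketch below computes them by hand with NumPy on a tiny made-up example, following the standard definitions: MAE is the mean of the absolute errors, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The function names `mae_manual` and `r2_manual` are hypothetical.

```python
import numpy as np


def mae_manual(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Made-up values: absolute errors are 2, 2, 3, 1
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

mae_value = mae_manual(actual, predicted)
r2_value = r2_manual(actual, predicted)
print(mae_value)  # 2.0
print(r2_value)   # approximately 0.964 (1 - 18/500)
```

MAE is expressed in the units of the target, so it is easy to interpret directly, while R^2 compares the model against always predicting the mean of the actual values.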

## Saving Results

Finally, save the forecasted results into a CSV file.

In [None]:
# Pair each actual 2023 value with its prediction
results_df = pd.DataFrame({'Actual': y_test, 'Predicted': predictions})
# '/path/to/' is a placeholder; point it at an existing output directory
results_df.to_csv('/path/to/result_2023.csv', index=False)
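
As a quick check that the file was written as expected, the results can be read back after saving. The sketch below uses synthetic values in place of `y_test` and `predictions`, writes to a temporary directory instead of the placeholder path, and adds an `Error` column that is not part of the original output.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-ins for y_test and predictions
y_demo = pd.Series([100.0, 110.0, 105.0], name="target")
pred_demo = np.array([98.5, 111.0, 107.25])

results = pd.DataFrame({"Actual": y_demo.to_numpy(), "Predicted": pred_demo})
# Optional extra column (an addition for illustration): signed forecast error
results["Error"] = results["Predicted"] - results["Actual"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "result_2023.csv"
    results.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)  # read back to verify the round trip

print(reloaded)
```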