# Time Series Forecasting for Weekly Data

This Jupyter notebook outlines the process of building a machine learning model to forecast weekly time series data. The dataset contains weekly observations across provinces and districts: data from 2019 to 2022 is used for training, and data from 2023 is held out for testing.

## Objective

The goal is to forecast weekly data for 2023 and evaluate the model's accuracy using the actual 2023 data as a benchmark.

## Approach

1. **Data Loading**: Load the historical data (2019-2022) for training and the current year data (2023) for testing.
2. **Data Preprocessing**: Combine the datasets, handle missing values, and encode categorical variables.
3. **Feature Engineering**: Extract and select relevant features for the forecasting model.
4. **Model Selection**: Choose a suitable model based on the data's characteristics.
5. **Model Training**: Train the model using historical data.
6. **Model Evaluation**: Evaluate the model's performance using the 2023 data.
7. **Results Generation**: Forecast the weekly data for 2023 and save the results to a CSV file.

### Why This Model?

A Linear Regression model is chosen for its simplicity and effectiveness in capturing linear relationships between features and the target variable. It serves as a baseline to understand the dataset's behavior and can be a starting point for more complex models if needed.
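
As a rough illustration of what "capturing linear relationships" means in practice, the sketch below fits a Linear Regression to synthetic data with a known slope and intercept, then to a purely seasonal synthetic series. All data here is made up for illustration and is not taken from the notebook's CSV files; the 52-week period is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship: y = 3*x + 2 (no noise).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

baseline = LinearRegression().fit(X, y)
slope = baseline.coef_[0]          # recovers the slope of the line
intercept = baseline.intercept_    # recovers the intercept of the line

# A purely seasonal series (assumed 52-week cycle) regressed on the week index:
# a straight line cannot follow the cycle, so R² stays low.
weeks = np.arange(104, dtype=float).reshape(-1, 1)
seasonal = np.sin(2 * np.pi * weeks.ravel() / 52)
r2_seasonal = LinearRegression().fit(weeks, seasonal).score(weeks, seasonal)
```

The second fit suggests why the linear model is best treated as a baseline: if the weekly data is strongly seasonal, features that encode the season or a more flexible model would likely be needed.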

### Data Loading

First, we load the data from CSV files for each year.

In [2]:
import pandas as pd

# Load the CSV files into DataFrames
df_2019 = pd.read_csv('2019.csv')
df_2020 = pd.read_csv('2020.csv')
df_2021 = pd.read_csv('2021.csv')
df_2022 = pd.read_csv('2022.csv')
df_2023 = pd.read_csv('2023.csv')
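
The same loading step can be written as a loop that also tags each row with its source year, so the years stay distinguishable once the frames are concatenated. This is a minimal sketch rather than the notebook's own code: the helper name `load_yearly_csvs`, the `data_dir` parameter, and the added `year` column are assumptions introduced for illustration; only the file naming (`2019.csv`, `2020.csv`, ...) comes from the cell above.

```python
from pathlib import Path

import pandas as pd


def load_yearly_csvs(data_dir, years):
    """Read '<year>.csv' for each year and return a dict of DataFrames.

    Hypothetical helper: it adds a 'year' column (an assumption, not part
    of the original files; an existing 'year' column would be overwritten).
    """
    frames = {}
    for year in years:
        df = pd.read_csv(Path(data_dir) / f"{year}.csv")
        df["year"] = year
        frames[year] = df
    return frames


# Example usage, assuming the CSV files sit in the current directory:
# frames = load_yearly_csvs(".", range(2019, 2024))
# df_train = pd.concat([frames[y] for y in range(2019, 2023)], ignore_index=True)
# df_2023 = frames[2023]
```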

### Data Preprocessing

Combine the datasets from 2019 to 2022 into a single DataFrame for training. Check for and handle missing values, ensuring the data is clean for modeling.

In [9]:
# Combine the datasets for 2019 to 2022 into a single DataFrame for training
df_train = pd.concat([df_2019, df_2020, df_2021, df_2022], ignore_index=True)

# Count missing values in the training dataset and report them
missing_values_train = df_train.isnull().sum().sum()
print(f"Missing values before imputation: {missing_values_train}")

# Impute missing values if necessary (example: forward fill for time series)
df_train = df_train.ffill()
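
The approach also lists encoding categorical variables, and a forward fill over the concatenated frame can carry a value from one district into the next. The sketch below shows one possible way to handle both on a toy frame: forward fill within each district, then one-hot encoding. The column names `province`, `district`, `week`, and `value` are assumptions, since the real schema is not shown in this notebook.

```python
import pandas as pd

# Toy data; the column names are assumptions, not the real schema.
toy = pd.DataFrame({
    "province": ["A", "A", "A", "B", "B", "B"],
    "district": ["a1", "a1", "a1", "b1", "b1", "b1"],
    "week": [1, 2, 3, 1, 2, 3],
    "value": [10.0, None, 12.0, None, 20.0, None],
})

# Forward fill within each district so values never leak across districts.
toy = toy.sort_values(["district", "week"])
toy["value"] = toy.groupby("district")["value"].ffill()

# Leading gaps have no earlier value to copy; here they are simply dropped.
toy = toy.dropna(subset=["value"]).reset_index(drop=True)

# One-hot encode the categorical identifiers for use in a linear model.
encoded = pd.get_dummies(toy, columns=["province", "district"], dtype=int)
```

Dropping the leading gaps is only one option; whether to drop or impute them differently depends on how much data is missing.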

### Feature Engineering

Extract features that could be useful for the forecasting task. This may include aggregating the weekly data to a coarser frequency (for example, monthly) or deriving new features from the existing columns.

In [10]:
# Example: Summarize weekly data into monthly data if relevant
# This is a placeholder for actual feature engineering steps based on the dataset's specifics
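
Two features commonly tried for weekly series are a lagged value and a cyclical encoding of the week number. The sketch below builds both on a toy frame. It is an illustration under assumptions, not the notebook's actual feature set: the column names `district`, `week`, and `value` are hypothetical, and the year is assumed to have 52 weeks.

```python
import numpy as np
import pandas as pd

# Toy weekly series for two districts; column names are assumptions.
toy = pd.DataFrame({
    "district": ["a1"] * 4 + ["b1"] * 4,
    "week": [1, 2, 3, 4] * 2,
    "value": [5.0, 6.0, 7.0, 8.0, 50.0, 60.0, 70.0, 80.0],
})
toy = toy.sort_values(["district", "week"]).reset_index(drop=True)

# Lag feature: the previous week's value within the same district.
toy["value_lag1"] = toy.groupby("district")["value"].shift(1)

# Cyclical encoding of the week number so week 52 sits next to week 1.
angle = 2 * np.pi * toy["week"] / 52
toy["week_sin"] = np.sin(angle)
toy["week_cos"] = np.cos(angle)

# The first week of each district has no lag, so those rows are dropped.
features = toy.dropna(subset=["value_lag1"]).reset_index(drop=True)
```

Lag features are only usable at prediction time if the previous week's actual value is already known, which makes the forecast one step ahead rather than a full-year forecast.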

### Model Training

Train a Linear Regression model on the prepared dataset.

In [11]:
from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

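# Example training process (adjust according to actual features and target).
# Assumption: 'target' is a placeholder for the name of the column being forecast;
# every other numeric column is used as a feature.
target_col = 'target'
X_train = df_train.drop(columns=[target_col]).select_dtypes(include='number')
y_train = df_train[target_col]

model.fit(X_train, y_train)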

### Model Evaluation

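Evaluate the model's performance using the testing data from 2023. Mean Absolute Error (MAE) gives the average size of the errors in the target's own units, while R-squared (R²) gives the proportion of the target's variance that the model explains.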

In [7]:
from sklearn.metrics import mean_absolute_error, r2_score

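# Example evaluation; 'target' is a placeholder for the actual target column name,
# and df_2023 is assumed to have the same columns as the training data.
target_col = 'target'
X_test = df_2023.drop(columns=[target_col]).select_dtypes(include='number')
y_test = df_2023[target_col]
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)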
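
To make the two metrics concrete, the sketch below computes MAE and R² by hand on a handful of made-up numbers and checks them against scikit-learn. The values are for illustration only and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up numbers for illustration only; not results from this notebook.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 11.0, 15.0, 15.0])

# MAE: mean of the absolute errors, in the same units as the target.
mae_manual = np.mean(np.abs(y_true - y_pred))

# R²: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Both values agree with scikit-learn's implementations.
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```

Here every prediction is off by exactly 1, so the MAE is 1.0, and the residual sum of squares (4) is one fifth of the total sum of squares (20), so R² is 0.8.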

### Results Generation

Finally, forecast the weekly data for 2023 using the trained model and save the results to a CSV file.

In [None]:
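# Generate predictions for 2023
# Assumption: 'target' is a placeholder for the name of the column being forecast.
X_2023 = df_2023.drop(columns=['target']).select_dtypes(include='number')
predictions_2023 = pd.DataFrame({'prediction': model.predict(X_2023)})

# Save the predictions to a CSV file
predictions_2023.to_csv('result_2023.csv', index=False)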
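
A CSV of bare predictions is hard to read back without knowing which row belongs to which district and week. The sketch below shows one way to attach identifier columns to the predictions before saving. The helper `build_forecast_frame` and the column names `district`, `week`, and `prediction` are hypothetical, and the numbers are made up.

```python
import numpy as np
import pandas as pd


def build_forecast_frame(ids, predictions, prediction_col="prediction"):
    """Attach predictions to identifier columns, row for row.

    Hypothetical helper: `ids` is a DataFrame of identifier columns
    (for example province, district, week) in the same row order as
    the feature matrix passed to model.predict.
    """
    predictions = np.asarray(predictions).ravel()
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    out = ids.reset_index(drop=True).copy()
    out[prediction_col] = predictions
    return out


# Toy identifiers and made-up predictions, for illustration only.
ids = pd.DataFrame({"district": ["a1", "a1", "b1"], "week": [1, 2, 1]})
result = build_forecast_frame(ids, [10.5, 11.0, 20.25])

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(index=False)
```

Passing a file name such as `'result_2023.csv'` to `to_csv` would write the same content to disk.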