# Introduction 

This Jupyter Notebook contains all the code and methodology used to complete part 2 of `main.pdf`, i.e. the baseline.

Our aim is to implement the LLMTIME procedure described in [llmtime.pdf](https://github.com/MatteoMancini01/M2_Cw/blob/main/instructions/llmtime.pdf), one of the files provided in this repository. We were provided with two Python files, `qwen.py` and `lora_skeleton.py`, and created two more, `plotting.py` and `preprocessor.py`. All Python files are stored in the `src` directory; the table below summarises the purpose of each file:

|File Name| Information|
|---------|------------|
|[`qwen.py`](https://github.com/MatteoMancini01/M2_Cw/blob/main/src/qwen.py)| Loads the Qwen2.5-0.5B-Instruct model and tokenizer from Hugging Face, freezes all model parameters except for the LM head bias, and prepares it for inference or fine-tuning.|
|[`lora_skeleton.py`](https://github.com/MatteoMancini01/M2_Cw/blob/main/src/lora_skeleton.py)| Implements LoRA (Low-Rank Adaptation) by wrapping the query and value projection layers of the Qwen2.5 model with trainable LoRA layers, processes the Lotka-Volterra dataset using LLMTIME, tokenizes it, and fine-tunes the model for up to 10,000 optimizer steps using PyTorch and accelerate.|
|[`preprocessor.py`](https://github.com/MatteoMancini01/M2_Cw/blob/main/src/preprocessor.py)| Contains the class `Preprocessor`, which provides the functions required to preprocess the dataset in `lotka_volterra_data.h5`. These include functions that scale the data, convert an array to a string, and convert a string back to an array.|
|[`plotting.py`](https://github.com/MatteoMancini01/M2_Cw/blob/main/src/plotting.py)| Not a requirement of this project. Contains the class `PlotProject`, which holds all the plotting functions used in the Jupyter Notebooks, to keep the Notebooks tidy.|

For more details about each Python file, I encourage the reader to inspect them; every function has a detailed docstring, including examples of how to use it.
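To make the preprocessing steps in the table above concrete, here is a minimal sketch of an LLMTIME-style round trip: scale the series, encode it as text, and decode it back. It is not the code in `preprocessor.py`. The function names `scale_series`, `encode_series` and `decode_series`, the constant `alpha`, the two-decimal precision, and the separator convention (commas between variables, semicolons between timesteps) are all assumptions made for illustration.

```python
import numpy as np


def scale_series(data, alpha=10.0):
    """Divide by a constant so values sit in a small numeric range.

    `alpha` is a hypothetical scaling constant, not the one used in src/.
    """
    return np.asarray(data, dtype=float) / alpha


def encode_series(data, decimals=2):
    """Encode a (timesteps, variables) array as text.

    Assumed convention: commas separate variables within a timestep,
    semicolons separate timesteps.
    """
    return ";".join(
        ",".join(f"{value:.{decimals}f}" for value in row) for row in data
    )


def decode_series(text):
    """Invert encode_series, returning a 2-D float array."""
    return np.array(
        [[float(value) for value in step.split(",")] for step in text.split(";")]
    )


# Toy prey/predator values for two timesteps (made up for this example).
series = np.array([[2.5, 15.0], [2.7, 14.7]])

scaled = scale_series(series, alpha=10.0)
encoded = encode_series(scaled)
decoded = decode_series(encoded) * 10.0  # undo the scaling

print(encoded)
```

Fixing the number of decimals keeps the token count per value predictable, at the cost of a small rounding error when the string is decoded.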


Throughout this Notebook, certain packages and custom functions are required to run the code, so please make sure you run the following cell first.

In [1]:
# Import required packages
import torch
import math
import numpy as np
import matplotlib.pyplot as plt
import h5py
import pandas as pd
from src.preprocessor import Preprocessor
from src.qwen import load_qwen 
from src.plotting import PlotProject


scaling_operator = Preprocessor.scaling_operator # Function used to scale the data
model, tokenizer = load_qwen() # Load the Qwen2.5 model and its tokenizer
array_to_string = Preprocessor.array_to_string # array_to_string(data) converts a time series to a string
string_to_array = Preprocessor.string_to_array # string_to_array(formatted_string) converts a string back to an array

plot_hist_MSE = PlotProject.plot_hist_MSE # Plots MSE histograms
plot_hist_RMSE = PlotProject.plot_hist_RMSE # Plots RMSE histograms
plot_pred_vs_true = PlotProject.plot_pred_vs_true # Plots the predicted vs true system



Generating predictions with `model.generate()` takes a while, so we saved all of the Qwen2.5 outputs and the associated metrics in the directory `saved_predictions_2b`. The reader is welcome to run the cell below to load them.

In [2]:
# Load decoded predictions generated from Qwen2.5, npz file
loaded = np.load("saved_predictions_2b/my_decoded_predictions.npz")
my_decoded_predictions = [loaded[key] for key in loaded]

# Load MSE and RMSE for each system, these are csv files
mse_true_predicted_loaded = pd.read_csv("saved_predictions_2b/mse_true_predicted.csv")
rmse_true_predicted_loaded = pd.read_csv("saved_predictions_2b/rmse_true_predicted.csv")

# Load the errors between true and predicted (prey, predator) pairs for each system, npz file
loaded_error = np.load("saved_predictions_2b/error_per_system.npz")
error_per_system_loaded = [loaded_error[key] for key in loaded_error]
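The list comprehension over `loaded` works because an `.npz` archive behaves like a mapping from keys to arrays. As a minimal sketch of that round trip, assuming the list of arrays was written with `np.savez` and positional arguments (we do not show how the files in `saved_predictions_2b` were actually produced), the following uses a hypothetical file name `example_predictions.npz` and made-up arrays:

```python
import os
import tempfile

import numpy as np

# Toy "decoded predictions": one array per system, lengths may differ.
predictions = [
    np.array([[0.25, 1.50], [0.27, 1.47]]),
    np.array([[0.31, 1.20]]),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_predictions.npz")  # hypothetical name

    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez(path, *predictions)

    # Iterating over the archive yields its keys, as in the cell above.
    with np.load(path) as loaded:
        restored = [loaded[key] for key in loaded]

print(len(restored))
```

Storing one array per key lets each system keep its own length, which a single stacked array would not allow.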