# Getting started

This notebook contains code samples that show how to use the data and the MIPHA implementations in this repository.

## Installing requirements

## Datasets

### Unzipping data
The data is zipped for sharing over git, because the unzipped files exceed the recommended size. In total, the provided datasets should take under 150 MB of disk space.
Execute the following cells to unpack the data from `g1g2_samples.zip` into the `data/` directory.

In [1]:
import zipfile
import os

def unzip_file(zip_path, extract_to):
    """Unzips a file to the specified directory."""
    os.makedirs(extract_to, exist_ok=True)  # Ensure the output directory exists
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
        print(f"Data extracted to {extract_to}.")

In [2]:
unzip_file("data/g1g2_samples.zip", "data")
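
To check the disk-space estimate before extracting, the uncompressed size can be read from the archive's metadata. The following is a minimal sketch using only the standard library; the helper name `uncompressed_size_mb` and the small in-memory demo archive are illustrative assumptions, not part of this repository.

```python
import io
import zipfile

def uncompressed_size_mb(zip_source):
    """Return the total uncompressed size of an archive's members, in MB."""
    with zipfile.ZipFile(zip_source, "r") as zip_ref:
        total_bytes = sum(info.file_size for info in zip_ref.infolist())
    return total_bytes / 1_000_000

# Self-contained demo on a small in-memory archive (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as demo_zip:
    demo_zip.writestr("demo/a.bin", b"\0" * 2_000_000)
    demo_zip.writestr("demo/b.bin", b"\0" * 1_000_000)
buffer.seek(0)

demo_size_mb = uncompressed_size_mb(buffer)
print(f"Uncompressed size: {demo_size_mb:.1f} MB")
```

For the archive used in this notebook, the same helper could be called as `uncompressed_size_mb("data/g1g2_samples.zip")`.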

### Opening data
Each matrix is stored as a pickle file. All provided data sources are 3D arrays of shape `(n_samples, n_timesteps, n_features)`, where:
- `n_samples`: number of records in the data source.
- `n_timesteps`: number of rows in each matrix (e.g. each row corresponds to one month).
- `n_features`: number of columns in each matrix (e.g. each column corresponds to the results of one analysis over `n_timesteps` months).
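
The axis layout described above can be illustrated with a small synthetic array. This is only a sketch: the array `demo` is a made-up stand-in with the same axis order, not the repository's data, and it assumes NumPy is installed.

```python
import numpy as np

# Synthetic stand-in with the same axis layout (not the real data).
n_samples, n_timesteps, n_features = 4, 12, 27
demo = np.arange(n_samples * n_timesteps * n_features, dtype=float).reshape(
    n_samples, n_timesteps, n_features
)

first_record = demo[0]        # one record: all timesteps and features
first_timestep = demo[:, 0, :]  # first timestep of every record
one_feature = demo[:, :, 5]   # one feature across all records and timesteps

print(first_record.shape, first_timestep.shape, one_feature.shape)
```

Indexing along the first axis selects a record, the second axis a timestep, and the third axis a feature.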


In [3]:
import pickle

def load_pickle(filename):
    """
    Loads a pickled object from disk.

    :param filename: file to load data from
    :return: the unpickled object
    """
    print(f"Loading data from {filename}...")
    with open(filename, "rb") as file:
        result = pickle.load(file)
        print(f"Data successfully loaded from {filename}")
        return result

In [6]:
matrix_path = "data/g1g2_samples/most_common_dataset/most_common_analyses.pkl"
matrix = load_pickle(matrix_path)
print(matrix.shape)

Loading data from data/g1g2_samples/most_common_dataset/most_common_analyses.pkl...
Data successfully loaded from data/g1g2_samples/most_common_dataset/most_common_analyses.pkl
(12250, 12, 27)


In [7]:
matrix[0]

array([[  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,  96.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,  96.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,  96.        ,   0.        ,   0.        ,
  