# Readme  
This code is intended as a starting point for building models. It requires the data from [Causal Structure Learning from Event Sequences](https://www.kaggle.com/datasets/lukemiller1987/causal-structure-learning-from-event-sequences).
## Libraries
This code was developed with Python 3.10.12. If your Python version differs, create a matching environment:  
  
`conda create --name myenv python=3.10.12`  
`conda activate myenv`  
`conda install jupyter`  
`jupyter notebook`

### Install Specific Package Versions
- Uses `pip` to install specific versions of the following Python packages:
    - `scipy`: version 1.11.2
    - `numpy`: version 1.23.5
    - `pandas`: version 2.0.3
## Import and Version Check
Imports the installed packages and performs a version check using `assert` statements.
## Check Python Version
Checks if the Python version starts with '3.10.12'.
## Import Standard Libraries
Imports `os` and `pickle`, whose versions are tied to the Python version.

Each `assert` statement checks whether the installed package or Python version matches the expected version. If it does not, it raises an `AssertionError` showing the expected and actual versions.

In [None]:
# Install specific versions
!pip install scipy==1.11.2 numpy==1.23.5 pandas==2.0.3

# Import specific versions
import scipy
assert scipy.__version__ == '1.11.2', f'Expected scipy version 1.11.2, got {scipy.__version__}'

import numpy as np
assert np.__version__ == '1.23.5', f'Expected numpy version 1.23.5, got {np.__version__}'

import pandas as pd
assert pd.__version__ == '2.0.3', f'Expected pandas version 2.0.3, got {pd.__version__}'

import sys
assert sys.version.startswith('3.10.12'), f'Expected Python version 3.10.12, got {sys.version}'

# os and pickle are standard libraries and their versions are tied to Python version.
import os
import pickle


## Function Overview: `convert_compressed_to_uncompressed`

### Purpose
Converts one compressed (sparse) slice of a dataset to its dense, uncompressed form. Intended for unpacking datasets that are part of "Causal Structure Learning from Event Sequences."

### Parameters
- `dataset_num`: The dataset number to process, ranging from 0 to 3.
- `slice_num`: The slice of the dataset to process, ranging from 0 to 1023.
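
The function groups slices into subfolders of 256, so `slice_num // 256` selects the subfolder (0 to 3) that holds a given slice. A minimal sketch of that arithmetic follows. `slice_location` is a hypothetical helper that is not part of the original code, and its range checks are an assumption based on the parameter ranges listed above.

```python
def slice_location(dataset_num, slice_num):
    """Return (subfolder index, file stem) for one slice.

    Hypothetical helper: mirrors the path arithmetic used by
    convert_compressed_to_uncompressed and adds range checks.
    """
    if not 0 <= dataset_num <= 3:
        raise ValueError(f"dataset_num must be in 0..3, got {dataset_num}")
    if not 0 <= slice_num <= 1023:
        raise ValueError(f"slice_num must be in 0..1023, got {slice_num}")
    subfolder = slice_num // 256  # 256 slices per subfolder
    stem = f"dataset_{dataset_num}_{slice_num}"
    return subfolder, stem


print(slice_location(0, 0))    # (0, 'dataset_0_0')
print(slice_location(2, 700))  # (2, 'dataset_2_700')
```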

### Implementation Details

#### 1. Load Compressed Data
- Constructs paths to compressed data files (.npz), causal data files (.pkl), and alarm ID mappings (.pkl).
- Reads the compressed data with NumPy's `np.load` function and rebuilds a compressed sparse row (CSR) matrix from it using SciPy's `csr_matrix`.
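
As an illustration of the CSR step, the sketch below rebuilds a small matrix from the same four components the function reads from the `.npz` file (`data`, `indices`, `indptr`, `shape`). The toy matrix is made up for the example, and a plain dictionary stands in for the loaded file.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy matrix: 2 rows, 6 columns (made-up values).
dense = np.array([[0, 1, 0, 0, 0, 1],
                  [1, 0, 0, 0, 0, 0]], dtype=np.int8)
original = csr_matrix(dense)

# Stand-in for the arrays that np.load would return from the .npz file.
parts = {
    "data": original.data,        # non-zero values
    "indices": original.indices,  # column index of each non-zero value
    "indptr": original.indptr,    # where each row starts in data/indices
    "shape": np.array(original.shape),
}

# Rebuild the sparse matrix from its components.
rebuilt = csr_matrix(
    (parts["data"], parts["indices"], parts["indptr"]),
    shape=tuple(parts["shape"]),
)
assert (rebuilt.toarray() == dense).all()
```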

#### 2. Load Causal Data and Mapping
- Loads causal data and the alarm ID mapping from their respective pickle files.
- Converts causal data to NumPy `int8` arrays.

#### 3. Initialize Dictionary
- Initializes an empty dictionary (`data_dict`) to hold the uncompressed data.

#### 4. Conversion to Uncompressed Form
- Iterates through each alarm ID index in the compressed matrix.
- For each alarm, constructs one 256-element binary array per time stamp (0 to 1023), in which a 1 marks a device involved at that time stamp.
- Adds these arrays to `data_dict` under their respective alarm IDs.
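
In the code, each sparse row is laid out so that column `time_stamp * 256 + device_id` holds the entry for one device at one time stamp. The sketch below shows the same layout on toy sizes (4 time stamps, 3 devices, both chosen only for readability) and uses a NumPy reshape as one possible way to split a flat row into per-time-stamp device arrays.

```python
import numpy as np

n_time_stamps, n_devices = 4, 3  # toy sizes; the dataset uses 1024 and 256

# Flat row with events at (time_stamp=0, device=2) and (time_stamp=3, device=1).
row = np.zeros(n_time_stamps * n_devices, dtype=np.int8)
row[0 * n_devices + 2] = 5
row[3 * n_devices + 1] = 1

# After reshaping, grid[t, d] corresponds to row[t * n_devices + d].
# Any non-zero entry marks the device as involved.
grid = (row.reshape(n_time_stamps, n_devices) != 0).astype(int)

# One device array per time stamp, as stored in data_dict.
device_arrays = list(grid)
print(grid)
```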

#### 5. Final DataFrame
- Converts `data_dict` to a Pandas DataFrame.
- Appends a new column, `causal_data`, to the DataFrame, looking up each row's causal data by its alarm ID.
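
The resulting layout can be sketched on toy data: one row per alarm ID, one column per time stamp, each cell holding a device array, plus the `causal_data` column. The alarm IDs, sizes, and causal values below are made up, and a list comprehension is used for the lookup. Like the function itself, the sketch assumes that each alarm ID is a valid position in `causal_data`.

```python
import numpy as np
import pandas as pd

n_time_stamps, n_devices = 3, 2  # toy sizes

# Hypothetical alarm IDs 0 and 1, each with one device array per time stamp.
data_dict = {
    0: [np.zeros(n_devices, dtype=int) for _ in range(n_time_stamps)],
    1: [np.ones(n_devices, dtype=int) for _ in range(n_time_stamps)],
}
# Made-up causal data, one int8 array per alarm.
causal_data = [
    np.array([0, 1], dtype=np.int8),
    np.array([0, 0], dtype=np.int8),
]

# Rows are alarm IDs, columns are time stamps, cells are device arrays.
df = pd.DataFrame.from_dict(data_dict, orient="index", columns=range(n_time_stamps))

# Assumes every alarm ID in the index is a valid position in causal_data.
df["causal_data"] = [causal_data[idx] for idx in df.index]

print(df.shape)  # (2, 4)
```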

### Output
- Returns an uncompressed DataFrame containing the expanded dataset and causal information.

### Required Libraries
- NumPy
- SciPy (`scipy.sparse.csr_matrix`)
- Pandas
- pickle


In [None]:
from scipy.sparse import csr_matrix  # used below to rebuild the sparse matrix

def convert_compressed_to_uncompressed(dataset_num, slice_num):
    # Load sparse data
    base_path = '/kaggle/input/causal-structure-learning-from-event-sequences/'
    data_path = f'{base_path}CSL Sparse Datasets/dataset_{dataset_num}/subfolder_{(slice_num//256)}/dataset_{dataset_num}_{slice_num}.npz'
    causal_path = f'{base_path}CSL Sparse Datasets/dataset_{dataset_num}/subfolder_{(slice_num//256)}/dataset_{dataset_num}_{slice_num}_causal.pkl'
    mapping_path = f'{base_path}dataset_{dataset_num}_alarm_id_mapping.pkl'
    loaded_data = np.load(data_path)
    sparse_mat = csr_matrix((loaded_data['data'], loaded_data['indices'], loaded_data['indptr']), shape=loaded_data['shape'])
    
    # Load causal data and mapping
    with open(causal_path, 'rb') as f:
        causal_data = pickle.load(f)

    causal_data = [np.array(arr, dtype=np.int8) for arr in causal_data]  # Convert to NumPy int8 arrays

    with open(mapping_path, 'rb') as f:
        alarm_id_mapping = pickle.load(f)

    # Initialize the dictionary to hold the dense data
    data_dict = {}
    
    for alarm_id_idx in range(sparse_mat.shape[0]):
        alarm_id = alarm_id_mapping[alarm_id_idx]  # Retrieve actual alarm_id from the mapping
        row = sparse_mat.getrow(alarm_id_idx).toarray().flatten()
        
        # Column index = time_stamp * 2**8 + device_id, so reshaping the first
        # 1024 * 2**8 entries gives one row per time_stamp (2 ** 10 = 1024)
        # and one column per device; non-zero entries mark involved devices.
        device_grid = (row[:1024 * 2**8].reshape(1024, 2**8) != 0).astype(int)

        # One device array of size 2**8 per time_stamp for this alarm_id
        device_arrays_for_alarm = list(device_grid)
        
        # Insert the full list of device arrays for this alarm_id into the dictionary
        data_dict[alarm_id] = device_arrays_for_alarm

    # Convert to a DataFrame
    final_df = pd.DataFrame.from_dict(data_dict, orient='index', columns=range(1024))

    # Add causal_data column to DataFrame
    final_df['causal_data'] = final_df.index.map(lambda idx: causal_data[idx])

    return final_df

## Code Snippet Overview

### Code Functionality
This code snippet calls the `convert_compressed_to_uncompressed` function with `dataset_num` set to 0 and `slice_num` set to 0. The function's output is then stored in the variable `ds_0_0`.

### Parameters
- `dataset_num = 0`: Targets the first dataset.
- `slice_num = 0`: Targets the first slice within the dataset.

### Output
`ds_0_0` will contain the uncompressed form of the first slice (`slice_num = 0`) from the first dataset (`dataset_num = 0`).

In [None]:
dataset_num = 0
slice_num = 0
ds_0_0 = convert_compressed_to_uncompressed(dataset_num, slice_num)
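
Each time-stamp cell of the returned DataFrame holds one device array, so a model will often need them stacked into a single 2-D array per alarm. The sketch below shows one way to do that. It uses a toy frame in place of `ds_0_0`, since the real data requires the Kaggle files, and `alarm_to_matrix` is a hypothetical helper that is not part of the original notebook.

```python
import numpy as np
import pandas as pd


def alarm_to_matrix(df, alarm_id):
    """Stack one alarm's per-time-stamp device arrays into a 2-D array.

    Hypothetical helper; assumes the layout produced by
    convert_compressed_to_uncompressed: one column per time stamp
    plus a 'causal_data' column.
    """
    cells = df.loc[alarm_id].drop("causal_data")
    return np.stack(cells.tolist())  # shape: (time stamps, devices)


# Toy stand-in for ds_0_0: one alarm (ID 0), 3 time stamps, 2 devices.
toy = pd.DataFrame.from_dict(
    {0: [np.array([1, 0]), np.array([0, 0]), np.array([0, 1])]},
    orient="index",
    columns=range(3),
)
toy["causal_data"] = [np.array([0, 1], dtype=np.int8)]

matrix = alarm_to_matrix(toy, 0)
print(matrix.shape)  # (3, 2)
```

With the real data, the same call would be `alarm_to_matrix(ds_0_0, alarm_id)` for any `alarm_id` in `ds_0_0.index`.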