# Tests for `load_data` 
This notebook addresses [this GitHub issue](https://github.com/ciemss/pyciemss/issues/434).

The new interface for `calibrate` requires CSV files, but a CSV file can fail to provide data in the expected format in many ways. This notebook tests the most common failure modes:

- missing data
- incorrectly typed columns
- mislabeled columns
- header columns have one fewer column than data
- alignment issues
- escaping commas
- missing-value tokens: `Na`, `NaN`, `None`, `''`, and empty fields (`,,`)

All of these issues will make it difficult to convert a dataframe to a correctly typed tensor.
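
To make the conversion problem concrete, here is a minimal sketch of a strict parser in which a nested list of floats stands in for the tensor. The helper name `rows_to_floats` and the sample data are made up for illustration; this is not how `load_data` is implemented.

```python
import csv
import io


def rows_to_floats(csv_text):
    """Parse CSV text into (header, rows) with every data cell converted to float.

    Hypothetical helper: raises ValueError naming the first offending cell.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
        converted = []
        for name, cell in zip(header, row):
            try:
                converted.append(float(cell))
            except ValueError:
                raise ValueError(
                    f"line {line_no}, column {name!r}: cannot convert {cell!r} to float"
                ) from None
        rows.append(converted)
    return header, rows


# Made-up data in the shape of a small SIR dataset.
clean = "Timestamp,S,I\n1.1,15.0,0.1\n2.2,18.0,1.0\n"
header, rows = rows_to_floats(clean)
print(header, rows)

broken = "Timestamp,S,I\n1.1,15.0,\n"  # the last cell is empty
try:
    rows_to_floats(broken)
except ValueError as err:
    print(err)
```

A single bad cell is enough to block the conversion, which is why each failure mode below needs its own test.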

### Load dependencies

In [15]:
import os
import pyciemss
from pyciemss.interfaces import calibrate

### Load models

In [4]:
MODEL_PATH = "https://raw.githubusercontent.com/DARPA-ASKEM/simulation-integration/main/data/models/"

# Petrinets
petri1 = os.path.join(MODEL_PATH, "SEIRD_base_model01_petrinet.json")
petri2 = os.path.join(MODEL_PATH, "SEIRHD_base_model01_petrinet.json")
petri3 = os.path.join(MODEL_PATH, "SEIRHD_with_reinfection01_petrinet.json")
petri4 = os.path.join(MODEL_PATH, "SEIRHD_NPI_Type1_petrinet.json")
petri5 = os.path.join(MODEL_PATH, "SEIRHD_NPI_Type2_petrinet.json")

# Regnets
regnet1 = os.path.join(MODEL_PATH, "LV_goat_chupacabra_regnet.json")
regnet2 = os.path.join(MODEL_PATH, "LV_sheep_foxes_regnet.json")
regnet3 = os.path.join(MODEL_PATH, "LV_rabbits_wolves_regnet.json")
regnet4 = os.path.join(MODEL_PATH, "LV_rabbits_wolves_model02_regnet.json")
regnet5 = os.path.join(MODEL_PATH, "LV_rabbits_wolves_model03_regnet.json")

# Stock-and-Flow
stock1 = os.path.join(MODEL_PATH, "SIR_stockflow.json")
stock2 = os.path.join(MODEL_PATH, "SEIR_stockflow.json")
stock3 = os.path.join(MODEL_PATH, "SEIRD_stockflow.json")
stock4 = os.path.join(MODEL_PATH, "SEIRHD_stockflow.json")
stock5 = os.path.join(MODEL_PATH, "SEIRHDS_stockflow.json")

### Missing data

In [32]:
dataset1 = "/Users/altu809/Projects/pyciemss/docs/source/sa-testing-notebooks/SIR_missing_data.csv"
dataset2 = "/Users/altu809/Projects/pyciemss/docs/source/sa-testing-notebooks/SIR_data_case_hosp.csv"

import csv

# Print every row of the CSV file to inspect its raw contents
with open(dataset1, "r", newline="") as file:
    for row in csv.reader(file):
        print(row)

# TODO: write tests for missing data

['Timestamp', 'S', 'I']
['1.1', '15.0', '0.1']
['2.2', '18.0', '1.0']
['3.3', '20.0', '2.2']


In [33]:
# data_mapping = {"I": "I"}
# results = calibrate(petri2, dataset1, data_mapping=data_mapping)
results

(AutoGuideList(
   (0): AutoDelta()
   (1): AutoLowRankMultivariateNormal()
 ),
 253.0996860563755)

In [None]:
data_mapping = {"case": "I", "hosp": "H"}
results = calibrate(petri2, dataset2, data_mapping=data_mapping)
results
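
One possible starting point for the missing-data tests is a check that reports where the gaps are before the file reaches `calibrate`. The sketch below uses only the standard library; `find_missing_cells` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io


def find_missing_cells(csv_text):
    """Return (line_number, column_name) for every empty or whitespace-only cell."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    missing = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Pad short rows so cells absent from the end of a line also count as missing.
        padded = row + [""] * (len(header) - len(row))
        for name, cell in zip(header, padded):
            if cell.strip() == "":
                missing.append((line_no, name))
    return missing


# Invented sample: one empty cell in the middle, one row cut short.
sample = "Timestamp,S,I\n1.1,15.0,0.1\n2.2,,1.0\n3.3,20.0\n"
missing = find_missing_cells(sample)
print(missing)
```

A test could then assert that the loader either rejects such a file or handles the reported positions in a documented way.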

### Incorrectly typed columns
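
As a rough sketch of what a type check could look like, the hypothetical helper below collects, per column, the cells that Python's `float()` rejects. The sample data is invented. Note that `float()` accepts strings such as `"nan"` and `"inf"`, so those are not flagged here.

```python
import csv
import io


def non_numeric_cells(csv_text):
    """Map each column name to the cells in that column that float() rejects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = {}
    for row in reader:
        for name, cell in row.items():
            try:
                float(cell)
            except (TypeError, ValueError):
                # TypeError covers None, which DictReader uses for short rows.
                bad.setdefault(name, []).append(cell)
    return bad


# Invented sample: a word in a numeric column and a date in the time column.
sample = (
    "Timestamp,S,I\n"
    "1.1,15.0,0.1\n"
    "2.2,eighteen,1.0\n"
    "2024-01-03,20.0,2.2\n"
)
bad = non_numeric_cells(sample)
print(bad)
```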

### Mislabeled columns
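
A mislabeled column can be caught by comparing the CSV header against the keys of `data_mapping`. The sketch below assumes that the keys of `data_mapping` are dataset column names, as the `{"case": "I", "hosp": "H"}` example suggests; `missing_mapped_columns` is a hypothetical helper and the sample header is invented.

```python
import csv
import io


def missing_mapped_columns(csv_text, data_mapping):
    """Return the data_mapping keys that do not appear verbatim in the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in data_mapping if column not in header]


data_mapping = {"case": "I", "hosp": "H"}

# Invented header with two common labeling slips: wrong case and trailing whitespace.
sample = "Timestamp,Case,hosp \n0.0,1.0,0.0\n"
not_found = missing_mapped_columns(sample, data_mapping)
print(not_found)
```

The comparison is deliberately exact, so both `Case` and `hosp ` count as mismatches; whether a loader should normalize such names is a separate design decision.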

### Header columns have one fewer column than data
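
This case deserves an explicit check because it may not raise an error: pandas documents that `read_csv` uses the first column as the row index when a file has one more data column than column names. Whether `load_data` is affected depends on how it reads the file, which is an assumption to verify. The stdlib sketch below, with a hypothetical helper and invented data, flags rows whose width differs from the header.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (header_width, [(line_number, row_width), ...]) for rows that differ from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    mismatches = [
        (line_no, len(row))
        for line_no, row in enumerate(reader, start=2)  # line 1 is the header
        if len(row) != len(header)
    ]
    return len(header), mismatches


# Invented sample: the header names two columns but each data row has three cells.
sample = "Timestamp,S\n1.1,15.0,0.1\n2.2,18.0,1.0\n"
width, mismatches = find_ragged_rows(sample)
print(width, mismatches)
```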

### Other alignment issues

### Escaping commas
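
A field that contains a comma must be quoted, and only a CSV-aware parser will keep it in one piece. The short sketch below contrasts a naive `split(",")` with the standard-library `csv` module; the sample line is invented.

```python
import csv
import io

# Invented line whose middle field contains a comma and is therefore quoted.
line = '1.1,"Seattle, WA",15.0\n'

# Naive splitting breaks the quoted field in two.
naive = line.strip().split(",")
print(naive)

# csv.reader honors the quotes and returns three fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)

# csv.writer restores the quoting when the row is written back out.
buffer = io.StringIO()
csv.writer(buffer).writerow(parsed)
print(repr(buffer.getvalue()))
```

A test file for this failure mode could include the same field once with quotes and once without, since the unquoted version shifts every later cell one column to the right.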

### Na, NaN, None, empty string
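
Python's `float()` accepts `"NaN"` but raises `ValueError` for `"Na"`, `"None"`, and the empty string, so these tokens behave differently unless they are normalized first. The sketch below shows one way to do that; the token set `NA_TOKENS` and the helper `parse_cell` are assumptions for illustration, not the behavior of `load_data`.

```python
import math

# Assumed set of missing-value tokens, compared case-insensitively after stripping.
NA_TOKENS = {"", "na", "nan", "none"}


def parse_cell(cell):
    """Convert one CSV cell to float, mapping missing-value tokens to NaN."""
    text = cell.strip()
    if text.lower() in NA_TOKENS:
        return math.nan
    return float(text)  # still raises ValueError for anything else non-numeric


cells = ["1.5", "Na", "NaN", "None", "", " "]
values = [parse_cell(cell) for cell in cells]
print(values)
```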