## Mouse MPS Study Data Loading & Preprocessing

This script preprocesses the data from the Mouse MPS Study. Its primary purpose is to reorganize the data into a more usable format without removing any of it.

It reads only the files that are needed and writes the processed output to the `data` folder.

In [1]:
```python
import data_loading
import os
import pandas as pd

# the path of the raw dataset folder
source_folder = "../../../University of Adelaide/Mouse MPS Study/"

# the path of the destination dataset folder
destination_folder = "../../data/Mouse MPS Study/"

# create the destination folder if it does not exist
os.makedirs(destination_folder, exist_ok=True)
```
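The cells below build paths by string concatenation (for example `source_folder+"PDF_reports/"`), which only works because both folder strings end in a slash. A minimal alternative sketch uses `os.path.join`, which inserts the separator itself. The helper `build_paths` is hypothetical and not part of `data_loading`, and whether the `data_loading` functions accept folder paths without a trailing slash is not shown in this notebook.

```python
import os

def build_paths(source_folder, destination_folder):
    """Hypothetical helper: collect the input/output paths used in this notebook."""
    return {
        # folder of PDF reports scraped in step 1
        "reports": os.path.join(source_folder, "PDF_reports"),
        # lookup table used to add the Genotype column
        "genotypes": os.path.join(source_folder, "genotypes.csv"),
        # summary spreadsheet written at the end of step 1
        "summary": os.path.join(destination_folder, "report_summary.csv"),
        # source and destination folders for the 3D csv files in step 2
        "csv_src": os.path.join(source_folder, "csv"),
        "csv_dst": os.path.join(destination_folder, "csv"),
    }

# works whether or not the folder strings end in a separator
paths = build_paths("../../../University of Adelaide/Mouse MPS Study",
                    "../../data/Mouse MPS Study/")
print(paths["summary"])
```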

### 1. Generating the report summary

First, we build a summary of the dataset in a single spreadsheet that contains every scan in the study together with its six parameters (VDP, MSV, TV, VH, VHSS, VHLS).

The values are scraped from the PDF ventilation reports.

We also add a `FileName` column recording which report each animal's row came from, so that additional data can be extracted for that animal.
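The scraping itself is done inside `data_loading.create_report_summary`, which is not shown in this notebook. As a rough sketch of the idea only, the function below pulls the six parameters out of text that has already been extracted from a report, assuming (hypothetically) that each value appears as `NAME: number`. The real report layout and the PDF text-extraction step may differ, and the real columns carry units in their names (e.g. `VDP(%)`). It also derives `ScanName` from the file name, following the pattern visible in the summary table below (`474.ventilationReport.pdf` gives `474`). `parse_report_text` is a hypothetical name.

```python
import re

# The six parameters listed in the notebook text
PARAMETERS = ["VDP", "MSV", "TV", "VH", "VHSS", "VHLS"]

def parse_report_text(text, file_name):
    """Hypothetical parser: assumes each parameter appears as 'NAME: value'."""
    # ScanName is the part of the file name before the first dot (kept as text here)
    row = {"ScanName": file_name.split(".")[0], "FileName": file_name}
    for name in PARAMETERS:
        # \b anchors the name at a word start; requiring ':' right after the name
        # (optional spaces allowed) stops 'VH' from matching inside 'VHSS' or 'VHLS'
        match = re.search(rf"\b{name}\s*:\s*([-+]?\d*\.?\d+)", text)
        row[name] = float(match.group(1)) if match else None
    return row

# Usage: values taken from scan 474 in the summary table; the text layout is assumed
sample = "VDP: 10.5\nMSV: 0.32\nTV: 0.144\nVH: 38.94\nVHSS: 14.35\nVHLS: 29.99"
row = parse_report_text(sample, "474.ventilationReport.pdf")
print(row["ScanName"], row["VDP"], row["VHSS"])  # 474 10.5 14.35
```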

In [2]:
```python
df = data_loading.create_report_summary(source_folder+"PDF_reports/")
# preview the first five rows
df.head()
```

|   | ScanName | DatePrepared | VDP(%) | MSV(mL/mL) | TV(L) | VH(%) | VHSS(%) | VHLS(%) | FileName |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 474 | 2023-02-08-22:46:57.616572 | 10.5 | 0.32 | 0.144 | 38.94 | 14.35 | 29.99 | 474.ventilationReport.pdf |
| 1 | 479 | 2023-02-08-22:10:00.609384 | 11.2 | 0.36 | 0.157 | 40.55 | 18.53 | 24.87 | 479.ventilationReport.pdf |
| 2 | 448 | 2023-02-01-02:38:00.518173 | 11.2 | 0.34 | 0.163 | 40.4 | 16.35 | 26.45 | 448.ventilationReport.pdf |
| 3 | 415 | 2023-01-19-22:53:53.501797 | 10.5 | 0.38 | 0.196 | 32.83 | 14.64 | 21.68 | 415.ventilationReport.pdf |
| 4 | 496 | 2023-02-06-22:50:55.357456 | 12.1 | 0.34 | 0.155 | 41.64 | 16.84 | 29.73 | 496.ventilationReport.pdf |


Next, we add a `Genotype` column by looking up each scan's ID (`ScanName`) in the `Rat_ID` column of `genotypes.csv`.

In [3]:
```python
genotypes_df = pd.read_csv(source_folder+"genotypes.csv")
# look up the genotype of each scan by matching ScanName to Rat_ID
genotypes = [genotypes_df[genotypes_df.Rat_ID == int(scan_id)]["Genotype"].iloc[0] for scan_id in df.ScanName]
df["Genotype"] = genotypes
```
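The list comprehension above filters `genotypes_df` once per scan, raises an `IndexError` if a scan has no genotype entry, and silently takes the first row if a `Rat_ID` is listed twice. A minimal alternative sketch uses `pandas.merge`, whose `validate="many_to_one"` option raises a `MergeError` when a `Rat_ID` appears more than once in the lookup table. The helper name `add_genotype` is hypothetical, and the small frames below stand in for the real data: the scan IDs and VDP values come from the summary table, but the genotype labels are invented for illustration.

```python
import pandas as pd

def add_genotype(df, genotypes_df):
    """Hypothetical helper: attach Genotype by matching ScanName to Rat_ID."""
    # the notebook compares Rat_ID against int(ScanName), so build an integer key
    keys = df.assign(Rat_ID=df["ScanName"].astype(int))
    merged = keys.merge(
        genotypes_df[["Rat_ID", "Genotype"]],
        on="Rat_ID",
        how="left",              # keep every scan, in its original order
        validate="many_to_one",  # raises MergeError if a Rat_ID is duplicated
    )
    missing = merged.loc[merged["Genotype"].isna(), "ScanName"].tolist()
    if missing:
        raise ValueError(f"No genotype found for scans: {missing}")
    return merged.drop(columns="Rat_ID")

# Illustrative data only: genotype labels are made up
scans = pd.DataFrame({"ScanName": ["474", "479", "448"], "VDP(%)": [10.5, 11.2, 11.2]})
lookup = pd.DataFrame({"Rat_ID": [448, 474, 479], "Genotype": ["WT", "Het", "WT"]})
result = add_genotype(scans, lookup)
print(result[["ScanName", "Genotype"]])
```

Failing loudly on duplicate or missing IDs is a design choice; the original comprehension's behavior may be preferable if partial genotype tables are expected.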

Because heterozygous (`Het`) animals are treated the same as wild-type (`WT`) animals in this study, we relabel `Het` as `WT`.

In [4]:
```python
df['Genotype'] = df['Genotype'].replace('Het', 'WT')
```

Finally, we drop the `FileName` column and save the DataFrame as `report_summary.csv` in the destination data folder.

In [5]:
```python
df = df.drop(columns='FileName')
df.to_csv(destination_folder+"report_summary.csv", index=False)
```
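Since the stated goal is to reorganize the data without removing any of it, it may be worth confirming that the saved CSV reads back to the same table. A minimal sketch, using a small stand-in DataFrame (values copied from the first two rows of the summary table) and a temporary folder rather than the real destination; `roundtrip_matches` is a hypothetical helper name:

```python
import os
import tempfile
import pandas as pd

def roundtrip_matches(df, path):
    """Write df to CSV without the index and check that it reads back unchanged."""
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    try:
        # compares column names, dtypes and values (floats within a small tolerance)
        pd.testing.assert_frame_equal(df.reset_index(drop=True), reloaded)
    except AssertionError:
        return False
    return True

summary = pd.DataFrame({
    "ScanName": [474, 479],
    "VDP(%)": [10.5, 11.2],
    "MSV(mL/mL)": [0.32, 0.36],
    "TV(L)": [0.144, 0.157],
})

with tempfile.TemporaryDirectory() as tmp:
    ok = roundtrip_matches(summary, os.path.join(tmp, "report_summary.csv"))
print(ok)  # True
```

Because the check also compares dtypes, a change such as scan IDs stored as text reloading as integers would be reported too; whether that matters depends on how the CSV is used downstream.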

### 2. Copying the 3D CSV files

Next, we copy all the CSV files from the raw dataset's `csv/` folder to the `csv/` folder of the destination dataset.

In [6]:
```python
data_loading.copy_3d_csvs(source_folder+"csv/", destination_folder+"csv/")
```
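The implementation of `data_loading.copy_3d_csvs` is not shown in this notebook. As a rough sketch of what such a step could look like, assuming it simply copies every top-level `.csv` file from one folder to another (the real function may also rename, filter, or recurse into subfolders), here is a standard-library version. `copy_csv_files` and the file names in the usage example are hypothetical.

```python
import glob
import os
import shutil
import tempfile

def copy_csv_files(source_dir, destination_dir):
    """Hypothetical sketch: copy every top-level .csv file and return the copied names."""
    # create the destination folder if it does not exist (same pattern as the setup cell)
    os.makedirs(destination_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(source_dir, "*.csv"))):
        name = os.path.basename(src)
        # copy2 copies the file contents and also preserves metadata such as timestamps
        shutil.copy2(src, os.path.join(destination_dir, name))
        copied.append(name)
    return copied

# Usage on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "csv")
    os.makedirs(src)
    for name in ["474.csv", "479.csv", "notes.txt"]:
        with open(os.path.join(src, name), "w") as f:
            f.write("x,y,z\n")
    result = copy_csv_files(src, os.path.join(tmp, "out", "csv"))
print(result)  # ['474.csv', '479.csv']
```

Copying rather than moving leaves the raw dataset intact, which fits the goal of not removing any data.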