## Rat Sterile Bead Study Data Loading & Preprocessing

This notebook preprocesses the data from the Rat Sterile Bead Study. Its primary purpose is to reorganize the data into a more usable format, without removing any data.

It reads only the files it needs from the raw dataset and writes the processed output to the data folder.

In [2]:
import data_loading
import os
import pandas as pd

# the path of the raw dataset folder
source_folder = "../../../University of Adelaide/Rat Sterile Bead Study/"

# the path of the destination dataset folder
destination_folder = "../../data/Rat Sterile Bead Study/"

# create destination folder if not exists
os.makedirs(destination_folder, exist_ok=True)

### 1. Generating report summary

First, we collect a summary of the dataset in a single spreadsheet, which includes all the data from the study and the 6 parameters (VDP, MSV, TV, VH, VHSS, VHLS).

We do this by scraping the values from the PDF reports.

We also add a `FileName` column holding each animal's report file name, so that additional data can be extracted from it.

Note that this study has two folders, `baseline` and `post_beads`, so we add another column, `State`, to the summary to record which folder each row came from.

In [4]:
# scrape the baseline reports and label every row
baseline_df = data_loading.create_report_summary(source_folder + "PDF_report/baseline/")
baseline_df["State"] = "Baseline"

# scrape the post-beads reports and label every row
post_beads_df = data_loading.create_report_summary(source_folder + "PDF_report/post_beads/")
post_beads_df["State"] = "Post beads"

# stack both groups into one summary table with a fresh index
df = pd.concat([baseline_df, post_beads_df], ignore_index=True)
df

| | ScanName | DatePrepared | VDP(%) | MSV(mL/mL) | TV(L) | VH(%) | VHSS(%) | VHLS(%) | FileName | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CF508RAT3245 | 2021-03-17-00:01:03.873649 | 20.4 | 0.200 | 0.100 | 60.32 | 22.15 | 42.36 | 3245.Phe508.ventilationReport.pdf | Baseline |
| 1 | 4572_KO | 2023-05-15-04:01:16.248838 | 16.4 | 0.204 | 1.041 | 48.98 | 19.30 | 36.90 | 4572.KO.ventilationReport.pdf | Baseline |
| 2 | CF508RAT3246 | 2021-03-17-19:39:34.686788 | 19.9 | 0.200 | 0.107 | 58.06 | 21.74 | 47.30 | 3246.Phe508.ventilationReport.pdf | Baseline |
| 3 | S17_508 | 2023-07-05-19:17:38.400397 | 14.8 | 0.212 | 0.912 | 46.75 | 21.80 | 30.24 | S17.Phe508.ventilationReport.pdf | Baseline |
| 4 | S52_KO | 2023-05-15-03:41:04.267753 | 14.0 | 0.252 | 1.106 | 42.60 | 16.41 | 33.38 | S52.KO.ventilationReport.pdf | Baseline |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | CF510RAT3227beads | 2021-02-19-16:52:24.828351 | 35.7 | 0.170 | 0.091 | 106.61 | 21.67 | 99.49 | 3227.KO.beads.ventilationReport.pdf | Post beads |
| 100 | S11_KO_BEADS | 2023-06-29-05:50:41.190628 | 16.4 | 0.199 | 1.114 | 49.53 | 16.80 | 38.10 | S11.KO.beads.ventilationReport.pdf | Post beads |
| 101 | S13_BEADS | 2023-06-13-10:45:21.885873 | 22.3 | 0.139 | 1.117 | 66.04 | 24.10 | 44.15 | S13.WT.beads.ventilationReport.pdf | Post beads |
| 102 | WTRAT3195beads | 2021-02-18-22:25:48.237799 | 36.6 | 0.160 | 0.082 | 107.38 | 32.12 | 94.58 | 3195.WT.beads.ventilationReport.pdf | Post beads |
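
The `FileName` values appear to follow a dotted pattern (an animal identifier, a label such as `Phe508`, `KO` or `WT`, and an optional `beads` marker), which is one way the column could be used to extract additional data. The sketch below is only an illustration under that assumption: the pattern is inferred from the rows shown, and the column names `AnimalID` and `Genotype` are hypothetical, not part of `data_loading`.

```python
import pandas as pd

# A few file names copied from the summary output.
files = pd.DataFrame({
    "FileName": [
        "3245.Phe508.ventilationReport.pdf",
        "S52.KO.ventilationReport.pdf",
        "S13.WT.beads.ventilationReport.pdf",
    ]
})

# Assumed layout: <animal id>.<label>[.beads].ventilationReport.pdf
pattern = (
    r"^(?P<AnimalID>[^.]+)\.(?P<Genotype>[^.]+)"
    r"(?:\.beads)?\.ventilationReport\.pdf$"
)

# Named groups become columns; rows that do not match would be NaN.
parsed = files["FileName"].str.extract(pattern)
files = pd.concat([files, parsed], axis=1)
print(files)
```

File names that do not fit the assumed layout come back as `NaN`, so a quick `parsed.isna().any()` check would reveal whether the pattern holds for the full dataset.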


We then drop the `FileName` column and save the DataFrame as a CSV file in the data folder.

In [5]:
df = df.drop('FileName', axis=1)
df.to_csv(destination_folder+"report_summary.csv", index=False)
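
Passing `index=False` keeps the row index out of the file. Without it, pandas writes the index as an unlabeled first column, which shows up as `Unnamed: 0` when the file is read back. The following is a small round-trip sketch using a temporary folder and two rows taken from the summary; the variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Two rows and three columns borrowed from the report summary.
summary = pd.DataFrame({
    "ScanName": ["S17_508", "S52_KO"],
    "VDP(%)": [14.8, 14.0],
    "State": ["Baseline", "Baseline"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report_summary.csv")
    # index=False: write only the data columns, not the row index
    summary.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(list(reloaded.columns))
```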

### 2. Copying 3D CSV files

Next, we copy all the CSV data from the raw dataset to the destination folder as well.

In [6]:
data_loading.copy_3d_csvs(source_folder+"csv/baseline/", destination_folder+"csv/baseline/")
data_loading.copy_3d_csvs(source_folder+"csv/post_beads/", destination_folder+"csv/post_beads/")
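
The implementation of `data_loading.copy_3d_csvs` is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch built on the standard library. The function name `copy_csvs_sketch` and its behavior (copy every `*.csv` file, leave the source untouched, skip other file types) are assumptions, not the actual module code.

```python
import glob
import os
import shutil
import tempfile


def copy_csvs_sketch(src_dir, dst_dir):
    """Copy every *.csv file in src_dir into dst_dir (hypothetical helper)."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, "*.csv"))):
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        shutil.copy2(src_path, dst_path)  # copy2 also preserves timestamps
        copied.append(dst_path)
    return copied


# Small demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw")
    dst = os.path.join(tmp, "data", "csv")
    os.makedirs(src)
    for name in ("a.csv", "b.csv", "notes.txt"):
        with open(os.path.join(src, name), "w") as f:
            f.write("x,y\n1,2\n")

    copied = copy_csvs_sketch(src, dst)
    copied_names = [os.path.basename(p) for p in copied]
    source_still_there = os.path.exists(os.path.join(src, "a.csv"))

print(copied_names)
```

Copying rather than moving keeps the raw dataset intact, which matches the goal of reorganizing the data without removing anything.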