## Sheep Asthma Study Data Loading & Preprocessing

This notebook preprocesses the data from the Sheep Asthma Study. Its purpose is to reorganize the data into a more usable format without removing any of it.

It reads only the data that is needed and stores the processed output in the data folder.

In [1]:
import data_loading
import os
import pandas as pd

# the path of the raw dataset folder
source_folder = "../../../University of Adelaide/Sheep Asthma Study/"

# the path of the destination dataset folder
destination_folder = "../../data/Sleep Asthma/"

# create destination folder if not exists
os.makedirs(destination_folder, exist_ok=True)

### 1. Generating report summary

First, we collect a summary of the dataset in a single spreadsheet. It contains every scan in the study together with the six parameters (VDP, MSV, TV, VH, VHSS, VHLS).

The values are scraped from the reports.

We also add a `FileName` column holding the report file name for each animal, which is used later to extract additional data.

In [2]:
df = data_loading.create_report_summary(source_folder+"Asthma/Output/XV reports/")
df[:5]

Unnamed: 0,ScanName,DatePrepared,VDP(%),MSV(mL/mL),TV(L),VH(%),VHSS(%),VHLS(%),FileName
0,KG_55_A,2021-03-22-17:05:57.713915,21.4,0.12,285.918,68.44,37.68,36.43,KG_55_A.ventilationReport.pdf
1,KG_14_B,2021-03-17-19:05:11.346516,20.0,0.12,302.865,70.81,40.26,48.16,KG_14_B.ventilationReport.pdf
2,KG_27_B,2021-03-17-19:04:47.532855,14.6,0.13,386.701,49.58,31.02,29.53,KG_27_B.ventilationReport.pdf
3,KG_01_A,2021-03-17-19:03:38.319948,21.5,0.15,253.233,71.34,33.05,39.59,KG_01_A.ventilationReport.pdf
4,KG_56_A,2021-03-22-17:07:06.870849,24.8,0.13,358.385,87.73,43.84,60.38,KG_56_A.ventilationReport.pdf
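
The implementation of `data_loading.create_report_summary` is not shown in this notebook. As a rough illustration of the scraping step only, the sketch below pulls the six parameters out of text that has already been extracted from a report. The helper name `parse_report_text` and the `LABEL: value unit` layout of the sample text are assumptions made for this example. The numbers are copied from the first row of the table above.

```python
import re

# Parameter names as they appear in the summary table.
PARAMETERS = ["VDP", "MSV", "TV", "VH", "VHSS", "VHLS"]


def parse_report_text(text):
    """Return {parameter: float} for every parameter label found in text.

    Hypothetical helper: assumes each value follows its label as
    "LABEL: value" or "LABEL = value".
    """
    values = {}
    for name in PARAMETERS:
        # \b stops "VH" from matching inside "VHSS" or "VHLS".
        match = re.search(rf"\b{name}\b\s*[:=]\s*(-?\d+(?:\.\d+)?)", text)
        if match:
            values[name] = float(match.group(1))
    return values


# Assumed text layout, filled with the values of scan KG_55_A.
sample = (
    "VDP: 21.4 %\nMSV: 0.12 mL/mL\nTV: 285.918 L\n"
    "VH: 68.44 %\nVHSS: 37.68 %\nVHLS: 36.43 %"
)
row = parse_report_text(sample)
print(row)
```

One dictionary per report can then be collected into a list and passed to `pd.DataFrame` to obtain a table like the one above.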


Next, we shorten each scan name to the animal ID and derive a new column, `State`, which contains `Pre` or `Post`.

In [3]:
#rename scan name
new_scan_name = []
for scan_name in df.ScanName:
    if len(scan_name) == 7:
        new_scan_name.append(scan_name[:5])
    else:
        if len(scan_name.split("-")) >=2:
            if len(scan_name.split("-")[-2].split("_")) == 2:
                new_scan_name.append(scan_name.split("-")[-2])
            else:
                t = scan_name.split("-")[-2].split("_")
                new_scan_name.append(t[-2]+"_"+t[-1])
        else:
            new_scan_name.append("N/A")
df.ScanName = new_scan_name


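# add state: use PRE/POST when the file name contains it,
# otherwise treat files marked A as Pre and files marked B as Post
state = []
for file in df.FileName:
    if "POST" in file:
        state.append("Post")
    elif "PRE" in file:
        state.append("Pre")
    elif "A" in file:
        state.append("Pre")
    elif "B" in file:
        state.append("Post")
    else:
        # keep the list aligned with the DataFrame rows when nothing matches
        state.append("N/A")

df["State"] = state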
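
The same two rules can be written as small functions, which makes them easy to check against individual names. This is a sketch rather than a replacement for the cell above. The function names are made up for this example, and `XV_KG_12-POST` is an invented name used only to exercise the long-form branch. The sketch also joins the last two tokens directly, so a segment with a single token is returned as is instead of raising an `IndexError`.

```python
def normalize_scan_name(scan_name):
    """Reduce a raw scan name to the animal ID, e.g. 'KG_55_A' -> 'KG_55'."""
    if len(scan_name) == 7:
        # Short form: drop the trailing "_A" / "_B" suffix.
        return scan_name[:5]
    parts = scan_name.split("-")
    if len(parts) < 2:
        # Unrecognised format.
        return "N/A"
    # Long form: keep the last two "_"-separated tokens of the
    # second-to-last "-"-separated segment.
    tokens = parts[-2].split("_")
    return "_".join(tokens[-2:])


def infer_state(file_name):
    """Return 'Pre' or 'Post' for a report file name, or 'N/A' if unknown."""
    if "POST" in file_name:
        return "Post"
    if "PRE" in file_name:
        return "Pre"
    if "A" in file_name:
        return "Pre"
    if "B" in file_name:
        return "Post"
    return "N/A"


print(normalize_scan_name("KG_55_A"))                # KG_55
print(normalize_scan_name("XV_KG_12-POST"))          # KG_12 (invented name)
print(infer_state("KG_55_A.ventilationReport.pdf"))  # Pre
print(infer_state("KG_14_B.ventilationReport.pdf"))  # Post
```

With these helpers the two columns could be filled with `df["ScanName"].map(normalize_scan_name)` and `df["FileName"].map(infer_state)`.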

We drop the `FileName` column and save the DataFrame as a CSV file in the data folder.

In [4]:
df = df.drop('FileName', axis=1)
df.to_csv(destination_folder+"report_summary.csv", index=False)

### 2. Updating from metadata

We now update the report summary with additional information from `sheep_ids_types.csv` (Challenge, U/S pregnancy, Weight).

In [5]:
# create dict from sheep ids types csv file, keyed by sheep ID
metadata = pd.read_csv(source_folder+"sheep_ids_types.csv")
sheeps = {}
for i, sheep_id in enumerate(metadata.ID):  # sheep_id avoids shadowing the built-in id()
    sheeps[sheep_id] = {
        "Challenge": metadata["Challenge"][i],
        "U/S pregnancy": metadata["U/S pregnancy"][i],
        "Weight (kg)": metadata["Weight (kg)"][i],
    }


# add new info to report summary
df = pd.read_csv(destination_folder+"report_summary.csv")
challenge = []
pregnancy = []
weight = []
for sheep_id in df["ScanName"]:
    # raises KeyError if a scan name has no entry in sheep_ids_types.csv
    challenge.append(sheeps[sheep_id]["Challenge"])
    pregnancy.append(sheeps[sheep_id]["U/S pregnancy"])
    weight.append(sheeps[sheep_id]["Weight (kg)"])

df["Challenge"] = challenge
df["Pregnancy"] = pregnancy
df["Weight (kg)"] = weight

# save csv
df.to_csv(destination_folder+"report_summary.csv", index=False)
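
The dictionary lookup above can also be expressed as a left join. The sketch below is an alternative, not the method used in this notebook. It runs on two tiny made-up tables, so the IDs `KG_99`, the challenge labels, the pregnancy values and the weights are placeholders, not study data. `validate="many_to_one"` makes pandas raise an error if an ID appears twice in the metadata, and `indicator=True` makes it possible to list scans that have no metadata entry.

```python
import pandas as pd

# Placeholder stand-ins for report_summary.csv and sheep_ids_types.csv.
summary = pd.DataFrame(
    {"ScanName": ["KG_55", "KG_14", "KG_99"], "State": ["Pre", "Post", "Pre"]}
)
metadata = pd.DataFrame(
    {
        "ID": ["KG_55", "KG_14"],
        "Challenge": ["group_1", "group_2"],
        "U/S pregnancy": ["value_1", "value_2"],
        "Weight (kg)": [60.0, 65.0],
    }
)

# Align the column names with the ones used in the report summary.
lookup = metadata.rename(columns={"ID": "ScanName", "U/S pregnancy": "Pregnancy"})

merged = summary.merge(
    lookup,
    on="ScanName",
    how="left",               # keep every scan, even without metadata
    validate="many_to_one",   # each ID may appear only once in the metadata
    indicator=True,           # adds a "_merge" column for diagnostics
)

# Scans that found no matching ID in the metadata.
unmatched = merged.loc[merged["_merge"] == "left_only", "ScanName"].tolist()
merged = merged.drop(columns="_merge")

print(unmatched)
print(merged)
```

Unlike the loop, which stops with a `KeyError` on an unknown ID, the left join keeps the row and leaves the new columns empty, so unmatched scans have to be checked explicitly.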

### 3. Copying all csv data

Finally, we copy the folder containing all the CSV data to the destination folder.

In [6]:
data_loading.copy_folder(source_folder+"Asthma/Output/Specific Ventilation", destination_folder+"Specific Ventilation")

Folder '../../../University of Adelaide/Sheep Asthma Study/Asthma/Output/Specific Ventilation' successfully copied to '../../data/Sleep Asthma/Specific Ventilation'.
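
The source of `data_loading.copy_folder` is not included here. A minimal sketch of a helper with the same call shape and the same status message, assuming it is a thin wrapper around `shutil.copytree`, might look like the following. The temporary directory and `example.csv` exist only to make the example runnable.

```python
import shutil
import tempfile
from pathlib import Path


def copy_folder(source, destination):
    """Recursively copy source into destination (hypothetical re-implementation)."""
    source, destination = Path(source), Path(destination)
    if not source.is_dir():
        raise FileNotFoundError(f"Source folder '{source}' does not exist.")
    # dirs_exist_ok=True (Python 3.8+) allows re-running the notebook
    # without first deleting the destination folder.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    print(f"Folder '{source}' successfully copied to '{destination}'.")


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Specific Ventilation"
    src.mkdir()
    (src / "example.csv").write_text("a,b\n1,2\n")

    dst = Path(tmp) / "data" / "Specific Ventilation"
    copy_folder(src, dst)
    copied = sorted(p.name for p in dst.iterdir())

print(copied)
```

Files already present in the destination are overwritten when a file of the same name exists in the source, and files that exist only in the destination are left in place.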
