## 01 | Offspring Codes

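This notebook filters the Offspring_Codes.csv file: it keeps each offspring's pedigree and parental GBS identifiers, then separates out the offspring that are missing a parent identifier.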

**AgAdapt Project**

- Import required libraries.

In [1]:
import pandas as pd

- Load the offspring codes into a Pandas DataFrame.

In [2]:
offspring_codes = pd.read_csv("../Data/B__Intermediate_Data/Offspring_Codes.csv")
offspring_codes

,Code,Pedigree,Female Pedigree,Female ID,Female GBS,Male Pedigree,Male ID,Male GBS
0,2,LH162/CG60,LH162,254,PI539921:250031401,CG60,177,CG60:100001134
1,3,LH82/CG60,LH82,272,PI601170:250032540,CG60,177,CG60:100001134
2,4,LH82/CGR01,LH82,272,PI601170:250032540,CGR01,179,CGR01:100001221
3,5,LH82/LH198,LH82,272,PI601170:250032540,LH198,258,LH198:100000467
4,6,LH198/CGR01,LH198,258,LH198:100000467,CGR01,179,CGR01:100001221
...,...,...,...,...,...,...,...,...
2196,2484,Z037E0054/LH162,Z037E0054,876,Z037E0054:100001184,LH162,254,PI539921:250031401
2197,2485,Z037E0054/PHZ51,Z037E0054,876,Z037E0054:100001184,PHZ51,655,PI601322:250040790
2198,2486,Z038E0057/3IIH6,Z038E0057,877,Z038E0057:100001168,3IIH6,7,3IIH6:100000120
2199,2487,Z038E0057/LH162,Z038E0057,877,Z038E0057:100001168,LH162,254,PI539921:250031401


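- Keep the Pedigree, Female GBS, and Male GBS columns; discard the Code, ID, and parent pedigree columns.
- Rename the GBS columns to Female Parent and Male Parent.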

In [3]:
offspring_codes = offspring_codes[["Pedigree", "Female GBS", "Male GBS"]].copy()
offspring_codes.rename({"Female GBS" : "Female Parent", "Male GBS" : "Male Parent"}, axis = 1, inplace = True)
offspring_codes

,Pedigree,Female Parent,Male Parent
0,LH162/CG60,PI539921:250031401,CG60:100001134
1,LH82/CG60,PI601170:250032540,CG60:100001134
2,LH82/CGR01,PI601170:250032540,CGR01:100001221
3,LH82/LH198,PI601170:250032540,LH198:100000467
4,LH198/CGR01,LH198:100000467,CGR01:100001221
...,...,...,...
2196,Z037E0054/LH162,Z037E0054:100001184,PI539921:250031401
2197,Z037E0054/PHZ51,Z037E0054:100001184,PI601322:250040790
2198,Z038E0057/3IIH6,Z038E0057:100001168,3IIH6:100000120
2199,Z038E0057/LH162,Z038E0057:100001168,PI539921:250031401


- Separate the offspring that have missing information for either parent.
- Count the number of individuals with missing data.

In [4]:
missing_parents = offspring_codes[offspring_codes.isnull().any(axis = 1)].copy()
missing_parents.reset_index(drop = True, inplace = True)

print("Individuals with Missing Parents (#) =", missing_parents.shape[0])

missing_parents

Individuals with Missing Parents (#) = 52


,Pedigree,Female Parent,Male Parent
0,GT603/PHB47,,PHB47:100000755
1,GT603/PHZ51,,PI601322:250040790
2,PHJ31/PHB47,,PHB47:100000755
3,BGEM-0120-N/LH195,,PI537097:250033872
4,BGEM-0122-N/LH195,,PI537097:250033872
5,BGEM-0260-N/LH195,,PI537097:250033872
6,BGEM-0003-N/LH195,,PI537097:250033872
7,BGEM-0088-N/LH195,,PI537097:250033872
8,BGEM-0110-N/LH195,,PI537097:250033872
9,BGEM-0126-N/LH195,,PI537097:250033872
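
In the rows displayed above, the missing value is always in the Female Parent column. A per-column count is one way to check whether that holds across all 52 rows. The sketch below is a minimal illustration on a toy frame built from three rows shown in this notebook's outputs; the names `toy` and `missing_per_column` are assumptions introduced here, not part of the original notebook.

```python
import pandas as pd

# Toy frame (hypothetical name) built from three rows shown in the outputs above.
# It is NOT the full dataset; it only illustrates the technique.
toy = pd.DataFrame({
    "Pedigree": ["LH162/CG60", "GT603/PHB47", "GT603/PHZ51"],
    "Female Parent": ["PI539921:250031401", None, None],
    "Male Parent": ["CG60:100001134", "PHB47:100000755", "PI601322:250040790"],
})

# Count the missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Running the same `isna().sum()` call on `missing_parents` would show how the 52 incomplete rows split between the two parent columns.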


- Remove the offspring with missing data from the main dataset.

In [5]:
offspring_codes = offspring_codes.dropna()
offspring_codes.reset_index(drop = True, inplace = True)
offspring_codes

,Pedigree,Female Parent,Male Parent
0,LH162/CG60,PI539921:250031401,CG60:100001134
1,LH82/CG60,PI601170:250032540,CG60:100001134
2,LH82/CGR01,PI601170:250032540,CGR01:100001221
3,LH82/LH198,PI601170:250032540,LH198:100000467
4,LH198/CGR01,LH198:100000467,CGR01:100001221
...,...,...,...
2144,Z037E0054/LH162,Z037E0054:100001184,PI539921:250031401
2145,Z037E0054/PHZ51,Z037E0054:100001184,PI601322:250040790
2146,Z038E0057/3IIH6,Z038E0057:100001168,3IIH6:100000120
2147,Z038E0057/LH162,Z038E0057:100001168,PI539921:250031401
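
The row counts are consistent: the original frame ends at index 2199 (2,200 rows), the filtered frame ends at index 2147 (2,148 rows), and 52 rows were set aside as missing. The same check can be made explicit in code. The sketch below is a minimal illustration on a toy frame built from rows shown above; `toy`, `partition_ok`, and `no_overlap` are hypothetical names introduced for this example.

```python
import pandas as pd

# Toy frame (hypothetical) built from rows shown in the outputs above.
toy = pd.DataFrame({
    "Pedigree": ["LH162/CG60", "LH82/CG60", "GT603/PHB47", "PHJ31/PHB47"],
    "Female Parent": ["PI539921:250031401", "PI601170:250032540", None, None],
    "Male Parent": ["CG60:100001134", "CG60:100001134",
                    "PHB47:100000755", "PHB47:100000755"],
})

# Split into rows with at least one missing value and rows that are complete.
missing = toy[toy.isna().any(axis=1)]
filtered = toy.dropna()

# The two subsets should cover every row exactly once.
# The index comparison must run before any reset_index call.
partition_ok = len(missing) + len(filtered) == len(toy)
no_overlap = missing.index.intersection(filtered.index).empty

print("Partition is complete:", partition_ok)
print("Subsets are disjoint:", no_overlap)
```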


- Save the Missing Parents dataset to its own .csv file.
- Save the filtered Offspring Codes dataset to a separate .csv file.

In [6]:
missing_parents.to_csv("../Data/C__Processed_Data/Supplemental_Data/Offspring_Missing_Parents.csv", index = False)
print("Missing Parents Dataset saved!")

offspring_codes.to_csv("../Data/C__Processed_Data/Supplemental_Data/Filtered_Offspring_Codes.csv", index = False)
print("Filtered Offspring Codes Dataset saved!")

Missing Parents Dataset saved!
Filtered Offspring Codes Dataset saved!
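
Reading a saved file back is a quick way to confirm that it was written as intended. The sketch below is a minimal illustration, not part of the original notebook: it writes a toy frame to a temporary directory rather than to the project's `../Data` paths, and the names `filtered`, `out_path`, `reloaded`, and `roundtrip_ok` are assumptions made for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame (hypothetical) built from two complete rows shown above.
filtered = pd.DataFrame({
    "Pedigree": ["LH162/CG60", "LH82/CG60"],
    "Female Parent": ["PI539921:250031401", "PI601170:250032540"],
    "Male Parent": ["CG60:100001134", "CG60:100001134"],
})

with tempfile.TemporaryDirectory() as tmp:
    # Write to a throwaway location so the real output files are untouched.
    out_path = Path(tmp) / "Filtered_Offspring_Codes.csv"
    filtered.to_csv(out_path, index=False)

    # Reload the file while the temporary directory still exists.
    reloaded = pd.read_csv(out_path)

# With index=False, the reloaded frame has the same columns and values.
roundtrip_ok = reloaded.equals(filtered)
print("Round trip preserved the data:", roundtrip_ok)
```

Because the frame is written with `index = False`, the reloaded file contains only the three data columns and no extra index column.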
