## SRP144776/GSE114129

**paper:** [PMID: 30545297](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293534/) - The temporal expression patterns of brain transcriptome during chicken development and ageing, 2018

**date, curator:** 2024-10-02, Sara Carsanaro

**notes**
* info added from paper (see methods section)
    * all samples are female
    * all samples are Tibetan chickens
    * all samples are chicken cerebrum
    * Ribo-Zero Gold Kit + NEBNext Ultra Directional RNA Library Prep Kit
    * updated stage for Embryonic day 12 and 16 because I had originally used days post fertilization instead of days post incubation

### annotation summary
run this after annotation is complete

### set variables, import packages, define functions

In [1]:
experiment_id = "SRP144776"

path_to_create_exp_script = "/Users/scarsana/Desktop/git/scRNA-Seq/scripts/Create_ExpLib_tables.py" 
experiment_type = "bulk"

path_to_output_main = "/Users/scarsana/Desktop/git/expression-annotations/Notebooks/bulk/" 
path_to_output = "{}{}/".format(path_to_output_main, experiment_id)
library_path_from_script = "{}RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_path_from_script = "{}RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
library_to_add_path = "{}complete_RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_to_add_path = "{}complete_RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
script_file = "{}.ipynb".format(experiment_id)
commit_message_exp = '"adding annotated bulk experiment {}"'.format(experiment_id)
commit_message_py = '"adding annotation files for {} to notebook folder"'.format(experiment_id)


## to add to git
path_to_git_annotations = "/Users/scarsana/Desktop/git/expression-annotations/RNA_Seq/"
git_library_path = "{}RNASeqLibrary.tsv".format(path_to_git_annotations)
git_experiment_path = "{}RNASeqExperiment.tsv".format(path_to_git_annotations)

library_cols = ['#libraryId', 'experimentId', 'platform', 'SRSId', 'anatId', 'anatName', 'stageId', 'stageName', 'url_GSM', 'infoOrgan', 'infoStage', 'anatAnnotationStatus', 'anatBiologicalStatus', 'stageAnnotationStatus', 'sex', 'strain', 'genotype', 'speciesId', 'protocol', 'protocolType', 'RNASelection', 'globin_reduction', 'replicate', 'lib_name', 'sampleName', 'sampleAge_value', 'sampleAge_unit', 'PATOid', 'PATOname','comment', 'condition', 'physiologicalStatus', 'annotatorId', 'lastModificationDate']

In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
import numpy as np
from IPython.display import display, HTML
import os
import csv

# displays df with the scrollbar next to the DataFrame
def display_df(df):
    pd.set_option("display.max_rows", None)
    pd.set_option("display.max_columns", None)
    display(HTML("<div style='height: 300px; overflow: auto; width: fit-content'>" +
        df.style.to_html(index=False) + "</div>"))

# function that compares two columns in a dataframe and tells you which ones are not equal (case insensitive)
def compare_columns(df, col1, col2, return_col):
    # boolean mask of rows where the two columns differ, ignoring case
    compare_return = df[col1].str.lower() != df[col2].str.lower()
    if not compare_return.any():
        print("The two columns are equal (case insensitive)")
    else:
        print("The following rows are not equal: ")
        print(df.loc[compare_return, return_col])

# fixes formatting of file to match libreoffice settings/historic file format
def update_format(path):
    with open(path, 'r') as file:
        filedata = file.read()
    # Replace the target string
    filedata = filedata.replace("\t\"\"", "\t")
    # Write the file out again
    with open(path, 'w') as file:
        file.write(filedata)

# checks for duplicate values in a specific column and prints those values + the corresponding library id
def dup_check(df, column):
    duplicateCheck = df.duplicated(subset=[column], keep=False)
    # .any() gives a single boolean; comparing .unique() to True/False raises
    # a ValueError when the column holds both duplicated and unique values
    if not duplicateCheck.any():
        print("no duplicate values in " + column)
    elif column != '#libraryId':
        print(df[duplicateCheck].loc[:,['#libraryId', column]])
    else:
        print(df[duplicateCheck].loc[:,['#libraryId']])

# prints all unique values in a specific column
def unique_sorted(df, column):
    unique = df[column].unique()
    unique.sort()
    print(unique)
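
The checks behind `dup_check` and `compare_columns` can be tried on a small table first. The sketch below is illustrative only: the `toy` DataFrame and its IDs are made up and are not part of this experiment.

```python
import pandas as pd

# Toy data; the IDs and names below are invented for illustration only.
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2", "LIB2", "LIB3"],
    "anatName": ["Brain", "brain", "liver", "heart"],
    "infoOrgan": ["brain", "Brain", "Liver", "lung"],
})

# keep=False marks every member of a duplicated group, not just the repeats.
dup_mask = toy.duplicated(subset=["#libraryId"], keep=False)
duplicated_ids = toy.loc[dup_mask, "#libraryId"].tolist()

# Case-insensitive comparison of two columns, as in compare_columns.
mismatch = toy["anatName"].str.lower() != toy["infoOrgan"].str.lower()
mismatched_ids = toy.loc[mismatch, "#libraryId"].tolist()

print(duplicated_ids)
print(mismatched_ids)
```

Both masks are plain boolean Series, so `.any()` is enough to decide whether anything needs to be reported.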

### script

In [3]:
! python3 $path_to_create_exp_script $experiment_id $path_to_output $experiment_type

Be patient, it may take a few minutes.
0it [00:00, ?it/s]
2 samples dont have attributes, try to find them somewhere else
100%|█████████████████████████████████████████████| 2/2 [00:03<00:00,  1.61s/it]
0 samples dont have attributes


### library annotations

In [4]:
library = pd.read_csv(library_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,,,,,,,9031,,,,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,,,,,,,9031,,,,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,,,,,,,9031,,,,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,,,,,,,9031,,,,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,,,,,,,9031,,,,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,,,,,,,9031,,,,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,,,,,,,9031,,,,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,,,,,,,9031,,,,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,,,,,,,9031,,,,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,,,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,,,,,,,9031,,,,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,


#### anatomical entity

In [5]:
unique_sorted(library, "infoOrgan")

['Brain - cerebrum']


In [6]:

# all
library.loc[:,'anatId'] = 'UBERON:0001893'
library.loc[:,'anatName'] = 'telencephalon'
# perfect match, missing child term, other
library.loc[:,'anatAnnotationStatus'] = 'perfect match'
# partial sampling, full sampling, not documented
library.loc[:,'anatBiologicalStatus'] = 'not documented'


# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,,,,,9031,,,,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,,,,,9031,,,,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,,,,,9031,,,,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,,,,,9031,,,,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,,,,,9031,,,,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,,,,,9031,,,,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,,,,,9031,,,,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,,,,,9031,,,,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,,,,,9031,,,,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,,,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,,,,,9031,,,,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,


#### stage
- [species specific developmental ontologies](https://github.com/obophenotype/developmental-stage-ontologies/tree/master/src)

In [7]:
unique_sorted(library, "infoStage")

['1 year old' '100 days old' '3 years old' '300 days old' '5 years old'
 'Embryonic day 12' 'Embryonic day 16' 'Embryonic day 20']
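
The postnatal labels mix days and years, so it can help to convert them to approximate months before picking a stage term. This is a minimal sketch: `approx_months` and `DAYS_PER_MONTH` are hypothetical helpers, and the mean month length of 365.25 / 12 days is an assumption of the sketch, not something stated in the paper.

```python
import re

DAYS_PER_MONTH = 365.25 / 12  # mean month length; an assumption of this sketch

def approx_months(label):
    """Return the approximate age in months for labels like '300 days old'.

    Returns None for labels that are not postnatal ages (e.g. 'Embryonic day 12').
    """
    match = re.fullmatch(r"(\d+) (day|year)s? old", label)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2)
    days = value if unit == "day" else value * 365.25
    return round(days / DAYS_PER_MONTH, 2)

for label in ["100 days old", "300 days old", "1 year old", "Embryonic day 12"]:
    print(label, approx_months(label))
```

Under that assumption, 100 days comes out at about 3.29 months and 300 days at about 9.86 months.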


In [8]:
# 5 years old
library.loc[library["infoStage"] == "5 years old", "stageId"] = "GgalDv:0000080"
library.loc[library["infoStage"] == "5 years old", "stageName"] = "late adult stage"
# Chicken developmental stage that refers to a chicken who is over 4 years old.
# perfect match, missing child term, other
library.loc[library["infoStage"] == "5 years old", "stageAnnotationStatus"] = "missing child term"

# 3 years old
library.loc[library["infoStage"] == "3 years old", "stageId"] = "GgalDv:0000089"
library.loc[library["infoStage"] == "3 years old", "stageName"] = "3-year-old stage"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "3 years old", "stageAnnotationStatus"] = "perfect match"

# 1 year old
library.loc[library["infoStage"] == "1 year old", "stageId"] = "GgalDv:0000008"
library.loc[library["infoStage"] == "1 year old", "stageName"] = "1-year-old stage"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "1 year old", "stageAnnotationStatus"] = "perfect match"

# 300 days old - approx 9.86 months
library.loc[library["infoStage"] == "300 days old", "stageId"] = "GgalDv:0000085"
library.loc[library["infoStage"] == "300 days old", "stageName"] = "9-month-old stage"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "300 days old", "stageAnnotationStatus"] = "missing child term"

# 100 days old - approx 3.29 months
library.loc[library["infoStage"] == "100 days old", "stageId"] = "GgalDv:0000005"
library.loc[library["infoStage"] == "100 days old", "stageName"] = "juvenile stage"
# Chicken developmental stage that covers the period from 4 weeks old, when individuals are fully feathered, until 5 months old.
# perfect match, missing child term, other
library.loc[library["infoStage"] == "100 days old", "stageAnnotationStatus"] = "missing child term"

# Embryonic day 20
library.loc[library["infoStage"] == "Embryonic day 20", "stageId"] = "GgalDv:0000059"
library.loc[library["infoStage"] == "Embryonic day 20", "stageName"] = "Hamburger Hamilton stage 45"
# Usually obtained after 19.0-20.0 days of incubation.
# perfect match, missing child term, other
library.loc[library["infoStage"] == "Embryonic day 20", "stageAnnotationStatus"] = "perfect match"

# Embryonic day 16
library.loc[library["infoStage"] == "Embryonic day 16", "stageId"] = "GgalDv:0000056"
library.loc[library["infoStage"] == "Embryonic day 16", "stageName"] = "Hamburger Hamilton stage 42"
# Usually obtained after 16.0 days of incubation.
# perfect match, missing child term, other
library.loc[library["infoStage"] == "Embryonic day 16", "stageAnnotationStatus"] = "perfect match"

# Embryonic day 12
library.loc[library["infoStage"] == "Embryonic day 12", "stageId"] = "GgalDv:0000052"
library.loc[library["infoStage"] == "Embryonic day 12", "stageName"] = "Hamburger Hamilton stage 38"
# Usually obtained after 12.0 days of incubation.
# perfect match, missing child term, other
library.loc[library["infoStage"] == "Embryonic day 12", "stageAnnotationStatus"] = "perfect match"

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,,,,9031,,,,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,,,,9031,,,,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,,,,9031,,,,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,,,,9031,,,,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,,,,9031,,,,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,,,,9031,,,,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,,,,9031,,,,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,,,,9031,,,,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,,,,9031,,,,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,,,,9031,,,,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
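
The per-label assignments above can also be written as a lookup table, which makes it easy to spot `infoStage` values that have no mapping yet. This is a sketch, not the notebook's own method: `STAGE_MAP` and `apply_stage_map` are hypothetical names, the IDs are copied from the stage cell, and the toy table (including the "2 years old" label) is invented for the demonstration.

```python
import pandas as pd

# infoStage label -> (stageId, stageName, stageAnnotationStatus),
# copied from the assignments in the stage cell.
STAGE_MAP = {
    "5 years old": ("GgalDv:0000080", "late adult stage", "missing child term"),
    "3 years old": ("GgalDv:0000089", "3-year-old stage", "perfect match"),
    "1 year old": ("GgalDv:0000008", "1-year-old stage", "perfect match"),
    "300 days old": ("GgalDv:0000085", "9-month-old stage", "missing child term"),
    "100 days old": ("GgalDv:0000005", "juvenile stage", "missing child term"),
    "Embryonic day 20": ("GgalDv:0000059", "Hamburger Hamilton stage 45", "perfect match"),
    "Embryonic day 16": ("GgalDv:0000056", "Hamburger Hamilton stage 42", "perfect match"),
    "Embryonic day 12": ("GgalDv:0000052", "Hamburger Hamilton stage 38", "perfect match"),
}

def apply_stage_map(df, stage_map):
    """Fill the stage columns in place and return the labels left unmapped."""
    for label, (stage_id, stage_name, status) in stage_map.items():
        rows = df["infoStage"] == label
        df.loc[rows, "stageId"] = stage_id
        df.loc[rows, "stageName"] = stage_name
        df.loc[rows, "stageAnnotationStatus"] = status
    return sorted(set(df["infoStage"]) - set(stage_map))

# Toy table; "2 years old" is a made-up label to show the unmapped check.
toy = pd.DataFrame({
    "infoStage": ["5 years old", "300 days old", "2 years old"],
    "stageId": ["", "", ""],
    "stageName": ["", "", ""],
    "stageAnnotationStatus": ["", "", ""],
})
unmapped = apply_stage_map(toy, STAGE_MAP)
print(unmapped)
print(toy)
```

An empty `unmapped` list means every label in the column received a stage term.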


#### sex, strain, genotype, speciesId
- uniprot [strain list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/strains)
- uniprot [species list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/speclist)
- bgee [strain mapping](https://gitlab.sib.swiss/Bgee/expression-annotations/-/tree/develop/Strains?ref_type=heads)

In [9]:
library.loc[:,'sex'] = 'F'

library.loc[:,'strain'] = 'Tibetan'

#library.loc[:,'genotype'] = ''

#library.loc[:,'speciesId'] = ''

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,,,,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,,,,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,,,,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,,,,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,,,,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,,,,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,,,,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,,,,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,,,,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,,,,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,


#### protocol
see [bulk kits](https://gitlab.sib.swiss/Bgee/scRNA-Seq/-/blob/main/scripts/bulk_kits.csv) for some common protocols

In [10]:
# making these variables because we use them again in the experiment file
my_protocol = 'Ribo-Zero Gold Kit'
# full_length or 3'
my_protocolType = 'full_length'

library.loc[:,'protocol'] = my_protocol
library.loc[:,'protocolType'] = my_protocolType
# polyA, ribo-minus, miRNA, lncRNA, circRNA
library.loc[:,'RNASelection'] = 'ribo-minus'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,,02/10/2024,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,


#### globin, replicates

In [11]:
# check for duplicate SRSId values
dup_check(library, "SRSId")

no duplicate values in SRSId


In [None]:
#library.loc[:,'globin_reduction'] = 'Y'

# replicates
#library.loc[library["#libraryId"] == "old", "replicate"] = "1"
#library.loc[library["#libraryId"].isin(["one", "two"]), "replicate"] = "1"

# view
display_df(library)
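When one replicate label applies to several libraries at once, the row mask can be built with `Series.isin`. This is a minimal sketch on a toy frame; `toy_library` and the IDs `LIB_A`, `LIB_B`, `LIB_C` are hypothetical stand-ins, not values from this experiment.

```python
import pandas as pd

# Toy stand-in for the `library` frame; the IDs below are hypothetical.
toy_library = pd.DataFrame({
    "#libraryId": ["LIB_A", "LIB_B", "LIB_C"],
    "replicate": ["", "", ""],
})

# Python's `in` cannot build a row mask from a Series;
# `isin` returns one boolean per row instead.
mask = toy_library["#libraryId"].isin(["LIB_A", "LIB_B"])
toy_library.loc[mask, "replicate"] = "1"
```

Rows whose ID is not in the list keep their existing `replicate` value.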

#### sample age, pato, physiological status
I set sample age manually.

In [None]:
#library.loc[:,'sampleAge_value'] = ''
#library.loc[:,'sampleAge_unit'] = ''

# ex. castrated male
#library.loc[:,'PATOid'] = ''
#library.loc[:,'PATOname'] = ''

# ex. castrated, pregnant, pre-smoltification, post-smoltification, laying eggs
#library.loc[:,'physiologicalStatus'] = ''

# view
display_df(library)

#### condition

In [None]:
# ex. control, diet, light, reproductive capacity, time post mortem, time post feeding, 
# exercise details, menstruation, personality, litter size 
#library.loc[library["condition"] == "old", "condition"] = "new"

# view
display_df(library)

#### annotator id, last modification date

In [12]:
library.loc[:,'annotatorId'] = 'SAC'
library.loc[:,'lastModificationDate'] = '2024-10-07'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Adult and aged stage,
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,SAC,2024-10-07,"Brains were collected and snap-frozen in liquid nitrogen immediately, and TRIzol Regent was used to isolate total RNA. Sequencing libraries were constructed using Illumina HiSeq X Ten platform with paired-end sequencing length of 150 bp (PE150). rRNA deleted RNA-Seq",,,,Brain,,Rapid growth stage,


#### comments

In [None]:
#library.loc[:,'comment'] = ''

#### save complete file with correct columns

In [13]:
library_file_complete = library[library_cols]
library_file_complete.to_csv(library_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)

# view
display_df(library_file_complete)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
0,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133338,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-2,"SAMN09083321,GSM3133338",5.0,year,,,,,,SAC,2024-10-07
1,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133337,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-1,"SAMN09083322,GSM3133337",5.0,year,,,,,,SAC,2024-10-07
2,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133336,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-2,"SAMN09083293,GSM3133336",3.0,year,,,,,,SAC,2024-10-07
3,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133335,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-1,"SAMN09083306,GSM3133335",3.0,year,,,,,,SAC,2024-10-07
4,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133334,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-3,"SAMN09083307,GSM3133334",1.0,year,,,,,,SAC,2024-10-07
5,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133333,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-2,"SAMN09083308,GSM3133333",1.0,year,,,,,,SAC,2024-10-07
6,SRX4048915,SRP144776,HiSeq X Ten,SRS3266032,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133332,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-1,"SAMN09083309,GSM3133332",1.0,year,,,,,,SAC,2024-10-07
7,SRX4048914,SRP144776,HiSeq X Ten,SRS3266031,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133331,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-3,"SAMN09083310,GSM3133331",300.0,day,,,,,,SAC,2024-10-07
8,SRX4048913,SRP144776,HiSeq X Ten,SRS3266030,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133330,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-2,"SAMN09083311,GSM3133330",300.0,day,,,,,,SAC,2024-10-07
9,SRX4048912,SRP144776,HiSeq X Ten,SRS3266029,UBERON:0001893,telencephalon,GgalDv:0000085,9-month-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3133329,Brain - cerebrum,300 days old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,D300-1,"SAMN09083312,GSM3133329",300.0,day,,,,,,SAC,2024-10-07


### experiment annotations

In [15]:
experiment = pd.read_csv(experiment_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP144776,Integrated analysis of lncRNA and mRNA reveals the temporal expression patterns during chicken brain development and aging,"We systematically investigated the lncRNA and mRNA temporal expression profile of the female chicken brain by high-throughput sequencing 8 stages across their entire lifespan. We identified and classified 39,907 putative lncRNAs, and predicted the potential biological functions of lncRNAs based on WGCNA. Temporal expression patterns were investigated based on a set of age-dependent genes, results showed that genes functioned in development, synapse and axon exhibited a progressive decay; genes related to immune response were up-regulated with age, And some genes showed inversion of their temporal profiles. These results demonstrated dynamic changes in lncRNA and mRNA with age, which may reflect changes in regulation of transcriptional networks and provides non-coding RNA gene candidates for further studies. It would be vital significance in avian epidemic prevention and contribute to comprehensively understand the molecular mechanisms of chicken breeding and reproduction. Besides, birds, as important species to bridge the evolutionary gap between mammals and other vertebrates, would contribute to further improve the understanding of the role in evolution. Overall design: rRNA-deleted RNA-Seq was performed on chicken brain to profile the lncRNAs and mRNAs across the entire life span",SRA,,,,,,GSE114129,PRJNA464381,30545297,,10.1186/s12864-018-5301-x,,


#### experiment and protocol details

In [14]:
# number of rows in the complete library file,
# which should equal the number of annotated libraries
ann_lib = len(library_file_complete.index)
ann_lib

21

In [16]:
# partial or total
experiment.loc[:,'experimentStatus'] = 'total'
#experiment.loc[:,'projectTags'] = '' 
# ann_lib comes from the cell above; the count can also be typed in by hand
experiment.loc[:,'numberOfAnnotatedLibraries'] = ann_lib

# these variables should already exist from above but if not can just add as free text
experiment.loc[:,'protocol'] = my_protocol
experiment.loc[:,'protocolType'] = my_protocolType

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP144776,Integrated analysis of lncRNA and mRNA reveals the temporal expression patterns during chicken brain development and aging,"We systematically investigated the lncRNA and mRNA temporal expression profile of the female chicken brain by high-throughput sequencing 8 stages across their entire lifespan. We identified and classified 39,907 putative lncRNAs, and predicted the potential biological functions of lncRNAs based on WGCNA. Temporal expression patterns were investigated based on a set of age-dependent genes, results showed that genes functioned in development, synapse and axon exhibited a progressive decay; genes related to immune response were up-regulated with age, And some genes showed inversion of their temporal profiles. These results demonstrated dynamic changes in lncRNA and mRNA with age, which may reflect changes in regulation of transcriptional networks and provides non-coding RNA gene candidates for further studies. It would be vital significance in avian epidemic prevention and contribute to comprehensively understand the molecular mechanisms of chicken breeding and reproduction. Besides, birds, as important species to bridge the evolutionary gap between mammals and other vertebrates, would contribute to further improve the understanding of the role in evolution. Overall design: rRNA-deleted RNA-Seq was performed on chicken brain to profile the lncRNAs and mRNAs across the entire life span",SRA,total,,21,Ribo-Zero Gold Kit,full_length,GSE114129,PRJNA464381,30545297,,10.1186/s12864-018-5301-x,,


#### paper and xrefs

In [17]:
#experiment.loc[:,'GSE'] = ''
#experiment.loc[:,'Bioproject'] = '' 
#experiment.loc[:,'PMID'] = ''
experiment.loc[:,'reference_url'] = 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293534/'
#experiment.loc[:,'DOI'] = ''
#experiment.loc[:,'xrefs'] = ''

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP144776,Integrated analysis of lncRNA and mRNA reveals the temporal expression patterns during chicken brain development and aging,"We systematically investigated the lncRNA and mRNA temporal expression profile of the female chicken brain by high-throughput sequencing 8 stages across their entire lifespan. We identified and classified 39,907 putative lncRNAs, and predicted the potential biological functions of lncRNAs based on WGCNA. Temporal expression patterns were investigated based on a set of age-dependent genes, results showed that genes functioned in development, synapse and axon exhibited a progressive decay; genes related to immune response were up-regulated with age, And some genes showed inversion of their temporal profiles. These results demonstrated dynamic changes in lncRNA and mRNA with age, which may reflect changes in regulation of transcriptional networks and provides non-coding RNA gene candidates for further studies. It would be vital significance in avian epidemic prevention and contribute to comprehensively understand the molecular mechanisms of chicken breeding and reproduction. Besides, birds, as important species to bridge the evolutionary gap between mammals and other vertebrates, would contribute to further improve the understanding of the role in evolution. Overall design: rRNA-deleted RNA-Seq was performed on chicken brain to profile the lncRNAs and mRNAs across the entire life span",SRA,total,,21,Ribo-Zero Gold Kit,full_length,GSE114129,PRJNA464381,30545297,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293534/,10.1186/s12864-018-5301-x,,


#### comments

In [None]:
#experiment.loc[:,'comment'] = ''

display_df(experiment)

#### save complete file

In [18]:
experiment.to_csv(experiment_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
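
Both saves write tab-separated text with `csv.QUOTE_ALL`, and the files are re-read with `keep_default_na=False` and `dtype=object`. The sketch below round-trips a toy frame through an in-memory buffer with those same options, to illustrate that empty fields and values containing commas should come back unchanged. The frame `toy` and its contents are made up for illustration.

```python
import csv
import io

import pandas as pd

# Toy rows with hypothetical values: an embedded comma, an empty field,
# and a numeric-looking ID that should stay text.
toy = pd.DataFrame({
    "#libraryId": ["LIB_A", "LIB_B"],
    "sampleName": ["SAMPLE_1,ALIAS_1", "SAMPLE_2"],
    "genotype": ["", ""],
    "speciesId": ["9031", "9031"],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, sep="\t", index=False, quoting=csv.QUOTE_ALL)
buffer.seek(0)

# Read back with the same options the notebook uses for its TSV files.
round_tripped = pd.read_csv(
    buffer, sep="\t", index_col=False, keep_default_na=False,
    na_values=["NULL", "null", "nan", "NaN"], dtype=object,
)

same_columns = list(round_tripped.columns) == list(toy.columns)
same_values = round_tripped.values.tolist() == toy.values.tolist()
```

If either flag is `False` on real data, the likely suspects are values that match one of the `na_values` strings.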

### QA time

In [19]:
library_to_add = pd.read_csv(library_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
experiment_to_add = pd.read_csv(experiment_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

#### to add things here
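
One possible check for this section is a cross-file consistency test: every library row should point at an experiment in the experiment file, the declared `numberOfAnnotatedLibraries` should equal the row count, and `#libraryId` should be unique. This is a rough sketch only; `check_consistency` is a hypothetical helper, not part of this notebook, and the toy frames stand in for `library_to_add` and `experiment_to_add`.

```python
import pandas as pd

# Toy stand-ins for `library_to_add` and `experiment_to_add`; all values are hypothetical.
toy_library = pd.DataFrame({
    "#libraryId": ["LIB_A", "LIB_B", "LIB_C"],
    "experimentId": ["EXP_1", "EXP_1", "EXP_1"],
})
toy_experiment = pd.DataFrame({
    "#experimentId": ["EXP_1"],
    "numberOfAnnotatedLibraries": ["3"],  # text, because the files are read with dtype=object
})


def check_consistency(library_df, experiment_df):
    """Return a list of problem descriptions; an empty list means the files agree."""
    problems = []

    # Library rows must reference an experiment present in the experiment file.
    known = set(experiment_df["#experimentId"])
    stray = sorted(set(library_df["experimentId"]) - known)
    if stray:
        problems.append("unknown experimentId values: {}".format(stray))

    # The declared library count must match the number of library rows.
    for _, row in experiment_df.iterrows():
        found = int((library_df["experimentId"] == row["#experimentId"]).sum())
        declared = int(row["numberOfAnnotatedLibraries"])
        if declared != found:
            problems.append("{}: declared {}, found {}".format(
                row["#experimentId"], declared, found))

    # Library IDs must be unique.
    dup_ids = sorted(set(library_df.loc[library_df["#libraryId"].duplicated(), "#libraryId"]))
    if dup_ids:
        problems.append("duplicate #libraryId values: {}".format(dup_ids))

    return problems


problems = check_consistency(toy_library, toy_experiment)
```

On the real files the call would be `check_consistency(library_to_add, experiment_to_add)`, assuming both frames carry the columns used above.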

#### check columns match

In [20]:
# pull from git, then load the shared library/experiment files
! git pull
git_library = pd.read_csv(git_library_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
git_experiment = pd.read_csv(git_experiment_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

# library file
if set(library_to_add.columns) == set(git_library.columns):
    print('The columns in the library file match')
else:
    print('The columns in the library file DO NOT MATCH')

# experiment file
if set(experiment_to_add.columns) == set(git_experiment.columns):
    print('The columns in the experiment file match')
else:
    print('The columns in the experiment file DO NOT MATCH')


# TODO: consider shorter messages such as "COLUMNS GOOD - LIBRARY" and "COLUMNS BAD - EXPERIMENT"

Already up to date.
The columns in the library file match
The columns in the experiment file match
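
The check above reports only whether the two column sets are equal, and a set comparison ignores column order and repeated names. When the sets differ, it helps to see which names are responsible. This is a minimal sketch; `column_diff` is a hypothetical helper, not part of this notebook, and the column lists are made up.

```python
def column_diff(new_cols, ref_cols):
    """Return (missing_from_new, unexpected_in_new) as sorted lists of column names."""
    new_set, ref_set = set(new_cols), set(ref_cols)
    return sorted(ref_set - new_set), sorted(new_set - ref_set)


# Hypothetical column lists standing in for the new file and the git reference file.
missing, unexpected = column_diff(
    ["#libraryId", "platform", "extra_col"],
    ["#libraryId", "platform", "sex"],
)
```

With the real frames, the call would be `column_diff(library_to_add.columns, git_library.columns)`. Two empty lists correspond to the "match" message.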


#### view files

In [21]:
library_git_plus_new = pd.concat([git_library, library_to_add], ignore_index = True, sort = False)
library_git_plus_new.tail(n=25)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
38694,SRX13781297,SRP354980,NextSeq 500,SRS11664038,UBERON:0002084,heart left ventricle,EcabDv:0000003,immature stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,left ventricle of the heart (H),1 to 2 years,perfect match,not documented,other,M,cold-blooded,,9796,TruSeq RNA Library Prep Kit v2,full_length,polyA,,,30s RNA-Seq,SAMN24964811,1 to 2,year,,,"PMID:39143382, Tissues were sampled from the l...",,uncastrated,ANN,2024-10-01
38695,SRX13781296,SRP354980,NextSeq 500,SRS11664037,UBERON:0001114,right lobe of liver,EcabDv:0000003,immature stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,right lobe of the liver (LR),1 to 2 years,perfect match,not documented,other,M,cold-blooded,,9796,TruSeq RNA Library Prep Kit v2,full_length,polyA,,,27w RNA-Seq,SAMN24964812,1 to 2,year,,,"PMID:39143382, Tissues were sampled from the l...",,uncastrated,ANN,2024-10-01
38696,SRX13781295,SRP354980,NextSeq 500,SRS11664036,UBERON:0002171,lower lobe of right lung,EcabDv:0000003,immature stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,caudal lobe of the right lung (L),1 to 2 years,perfect match,not documented,other,M,cold-blooded,,9796,TruSeq RNA Library Prep Kit v2,full_length,polyA,,,27p RNA-Seq,SAMN24964813,1 to 2,year,,,"PMID:39143382, Tissues were sampled from the l...",,uncastrated,ANN,2024-10-01
38697,SRX13781294,SRP354980,NextSeq 500,SRS11664035,UBERON:0002084,heart left ventricle,EcabDv:0000003,immature stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,left ventricle of the heart (H),1 to 2 years,perfect match,not documented,other,M,cold-blooded,,9796,TruSeq RNA Library Prep Kit v2,full_length,polyA,,,27s RNA-Seq,SAMN24964814,1 to 2,year,,,"PMID:39143382, Tissues were sampled from the l...",,uncastrated,ANN,2024-10-01
38698,SRX4048921,SRP144776,HiSeq X Ten,SRS3266038,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-2,"SAMN09083321,GSM3133338",5,year,,,,,,SAC,2024-10-07
38699,SRX4048920,SRP144776,HiSeq X Ten,SRS3266037,UBERON:0001893,telencephalon,GgalDv:0000080,late adult stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,5 years old,perfect match,not documented,missing child term,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y5-1,"SAMN09083322,GSM3133337",5,year,,,,,,SAC,2024-10-07
38700,SRX4048919,SRP144776,HiSeq X Ten,SRS3266036,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-2,"SAMN09083293,GSM3133336",3,year,,,,,,SAC,2024-10-07
38701,SRX4048918,SRP144776,HiSeq X Ten,SRS3266035,UBERON:0001893,telencephalon,GgalDv:0000089,3-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,3 years old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y3-1,"SAMN09083306,GSM3133335",3,year,,,,,,SAC,2024-10-07
38702,SRX4048917,SRP144776,HiSeq X Ten,SRS3266034,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-3,"SAMN09083307,GSM3133334",1,year,,,,,,SAC,2024-10-07
38703,SRX4048916,SRP144776,HiSeq X Ten,SRS3266033,UBERON:0001893,telencephalon,GgalDv:0000008,1-year-old stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Brain - cerebrum,1 year old,perfect match,not documented,perfect match,F,Tibetan,,9031,Ribo-Zero Gold Kit,full_length,ribo-minus,,,Y1-2,"SAMN09083308,GSM3133333",1,year,,,,,,SAC,2024-10-07


In [22]:
experiment_git_plus_new = pd.concat([git_experiment, experiment_to_add], ignore_index = True, sort = False)
experiment_git_plus_new.tail(n=5)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
783,SRP217229,Differential gene expression in articular cart...,We report differential expression analysis of ...,SRA,total,FAANG,56,standard library preparation,full_length,GSE135322,PRJNA558390,31557843,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6...,10.3390/genes10100745,,this experiment is an example of developmental...
784,ERP119658,Investigating the epithelial barrier and immun...,In order to investigate role of epithelial bar...,SRA,partial,FAANG,18,TruSeq Stranded mRNA,full_length,,PRJEB36462,32343720,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7...,10.1371/journal.pone.0232189,,example of mixed stage annotation on both UBER...
785,SRP340407,Transcriptional signatures of bone marrow mono...,synovial macrophages through joint injection w...,SRA,partial,,8,DirectZol RNA microprep Kit and TruSeq DNA Lib...,full_length,GSE185521,PRJNA769419,34956173,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8...,10.3389/fimmu.2021.734322,,maybe an error from the authors reporting ‘cDN...
786,SRP354980,Transcriptome analysis of the three equine tissue,The aim of the project was to identify the tis...,SRA,total,,12,TruSeq RNA Library Prep Kit v2,full_length,,PRJNA797088,39143382,https://link.springer.com/article/10.1007/s003...,10.1007/s00335-024-10057-0,,
787,SRP144776,Integrated analysis of lncRNA and mRNA reveals...,We systematically investigated the lncRNA and ...,SRA,total,,21,Ribo-Zero Gold Kit,full_length,GSE114129,PRJNA464381,30545297,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6...,10.1186/s12864-018-5301-x,,
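
Before writing the concatenated tables back to git, it may be worth checking that the new rows did not introduce repeated identifiers, for example if this notebook was run twice. A minimal sketch follows. The helper `find_duplicate_ids` and the toy IDs are hypothetical, and whether duplicates are ever legitimate in these files is an assumption to verify.

```python
import pandas as pd


def find_duplicate_ids(df, id_col):
    """Return every row whose identifier occurs more than once (hypothetical helper)."""
    return df[df.duplicated(subset=id_col, keep=False)]


# Toy table with made-up identifiers; "SRX_A" appears twice.
toy = pd.DataFrame(
    {
        "#libraryId": ["SRX_A", "SRX_B", "SRX_A"],
        "experimentId": ["SRP_1", "SRP_1", "SRP_2"],
    }
)
dupes = find_duplicate_ids(toy, "#libraryId")
print(dupes)
```

Here the same check could be applied as `find_duplicate_ids(library_git_plus_new, "#libraryId")` and `find_duplicate_ids(experiment_git_plus_new, "#experimentId")`. An empty result means no identifier is repeated.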


### add annotations to git

In [23]:
! git pull

Already up to date.


In [24]:
library_git_plus_new.to_csv(git_library_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
experiment_git_plus_new.to_csv(git_experiment_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
update_format(git_library_path)
update_format(git_experiment_path)
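
For reference, a small sketch of what the `to_csv` arguments used above produce, written to an in-memory buffer instead of the git files. The toy data is made up, and `update_format` (defined earlier in this notebook) is not reproduced here.

```python
import csv
import io

import pandas as pd

# Toy table with made-up values; one comment contains a comma, one is missing.
toy = pd.DataFrame(
    {
        "#libraryId": ["SRX_A", "SRX_B"],
        "comment": ["has, comma", None],
    }
)

# Same arguments as the cell above: tab-separated, no index column, every field quoted.
buffer = io.StringIO()
toy.to_csv(buffer, sep="\t", index=False, quoting=csv.QUOTE_ALL)
text = buffer.getvalue()
header = text.splitlines()[0]

# Read the text back as strings to confirm the table survives a round trip.
round_trip = pd.read_csv(io.StringIO(text), sep="\t", dtype=str)
print(header)
print(round_trip)
```

With `index=False` the row index is left out of the written file, so only the annotation columns are saved.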

In [25]:
! git status

On branch develop
Your branch is up to date with 'origin/develop'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   ../1_sara_bulk_template.ipynb
	modified:   ../../../RNA_Seq/RNASeqExperiment.tsv
	modified:   ../../../RNA_Seq/RNASeqLibrary.tsv

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	[31m./[m

no changes added to commit (use "git add" and/or "git commit -a")


In [26]:
! git add $git_experiment_path $git_library_path

In [27]:
! git commit -m $commit_message_exp

[develop 2c16766] adding annotated bulk experiment SRP144776
 2 files changed, 374 insertions(+), 352 deletions(-)


In [28]:
! git push

Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 5.58 KiB | 1.86 MiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
remote: 
remote: To create a merge request for develop, visit:
remote:   https://gitlab.sib.swiss/Bgee/expression-annotations/-/merge_requests/new?merge_request%5Bsource_branch%5D=develop
remote: 
To https://gitlab.sib.swiss/Bgee/expression-annotations.git
   91408e4..2c16766  develop -> develop


### add annotation folder and script to git

In [None]:
! git status

In [None]:
! git add $path_to_output

In [None]:
! git commit -m $commit_message_py

In [None]:
! git push