## ERP009469

**paper:** [PMID: 26108680](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4576709/) - Quantitative Mass Spectrometry Reveals Partial Translational Regulation for Dosage Compensation in Chicken, 2015

**date, curator:** 2024-10-09, Sara Carsanaro

**notes**
- partial/full sampling stated in methods
- some infoOrgans are further specified in methods

### annotation summary
run this after annotation is complete

In [49]:
anat_summary = library_to_add[['infoOrgan', 'anatId', 'anatName', 'anatAnnotationStatus']]
unique_anat = anat_summary.drop_duplicates()
display_df(unique_anat)

Unnamed: 0,infoOrgan,anatId,anatName,anatAnnotationStatus
0,kidney,UBERON:0002113,kidney,perfect match
1,testis,UBERON:0000473,testis,perfect match
2,spleen,UBERON:0002106,spleen,perfect match
3,muscle,UBERON:0001495,pectoral muscle,perfect match
4,lung,UBERON:0002048,lung,perfect match
6,heart,UBERON:0000948,heart,perfect match
7,bursa,UBERON:0003903,bursa of Fabricius,perfect match
8,brain,UBERON:0000955,brain,perfect match
13,liver,UBERON:0002107,liver,perfect match
53,ovary,UBERON:0000992,ovary,perfect match


In [50]:
dev_summary = library_to_add[['infoStage', 'stageId', 'stageName', 'stageAnnotationStatus']]
unique_dev = dev_summary.drop_duplicates()
display_df(unique_dev)

Unnamed: 0,infoStage,stageId,stageName,stageAnnotationStatus
0,18 day embryo,GgalDv:0000058,Hamburger Hamilton stage 44,perfect match


### set variables, import packages, define functions

In [1]:
experiment_id = "ERP009469"

path_to_create_exp_script = "/Users/scarsana/Desktop/git/scRNA-Seq/scripts/Create_ExpLib_tables.py" 
experiment_type = "bulk"

path_to_output_main = "/Users/scarsana/Desktop/git/expression-annotations/Notebooks/bulk/" 
path_to_output = "{}{}/".format(path_to_output_main, experiment_id)
library_path_from_script = "{}RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_path_from_script = "{}RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
library_to_add_path = "{}complete_RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_to_add_path = "{}complete_RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
script_file = "{}.ipynb".format(experiment_id)
commit_message_exp = '"adding annotated bulk experiment {}"'.format(experiment_id)
commit_message_py = '"adding annotation files for {} to notebook folder"'.format(experiment_id)


## to add to git
path_to_git_annotations = "/Users/scarsana/Desktop/git/expression-annotations/RNA_Seq/"
git_library_path = "{}RNASeqLibrary.tsv".format(path_to_git_annotations)
git_experiment_path = "{}RNASeqExperiment.tsv".format(path_to_git_annotations)

library_cols = ['#libraryId', 'experimentId', 'platform', 'SRSId', 'anatId', 'anatName', 'stageId', 'stageName', 'url_GSM', 'infoOrgan', 'infoStage', 'anatAnnotationStatus', 'anatBiologicalStatus', 'stageAnnotationStatus', 'sex', 'strain', 'genotype', 'speciesId', 'protocol', 'protocolType', 'RNASelection', 'globin_reduction', 'replicate', 'lib_name', 'sampleName', 'sampleAge_value', 'sampleAge_unit', 'PATOid', 'PATOname','comment', 'condition', 'physiologicalStatus', 'annotatorId', 'lastModificationDate']

In [22]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
import numpy as np
from IPython.display import display, HTML
import os
import csv

# displays df with the scrollbar next to the DataFrame
def display_df(df):
    pd.set_option("display.max_rows", None)
    pd.set_option("display.max_columns", None)
    display(HTML("<div style='height: 300px; overflow: auto; width: fit-content'>" +
        df.style.to_html(index=False) + "</div>"))

# function that compares two columns in a dataframe and tells you which ones are not equal (case insensitive)
def compare_columns(df, col1, col2, return_col):
    compare_return = df[col1].str.lower() != df[col2].str.lower()
    if not any(compare_return):
        print("The two columns are equal (case insensitive)")
    else:
        print("The following rows are not equal: ")
        print(df.loc[compare_return, return_col])

# fixes formatting of file to match libreoffice settings/historic file format
def update_format(path):
    with open(path, 'r') as file:
        filedata = file.read()
    # Replace the target string
    filedata = filedata.replace("\t\"\"", "\t")
    # Write the file out again
    with open(path, 'w') as file:
        file.write(filedata)

# checks for duplicate values in a specific column and prints those values + the corresponding library id
def dup_check(df, column):
    # keep=False flags every row whose value occurs more than once
    duplicateCheck = df.duplicated(subset=[column], keep=False)
    # the mask is left unsorted so it stays aligned with df's index
    if not duplicateCheck.any():
        print("no duplicate values in " + column)
    elif column != '#libraryId':
        dups = df.loc[duplicateCheck, ['#libraryId', column]]
        print(dups.sort_values(by=column))
    else:
        print(df.loc[duplicateCheck, ['#libraryId']])

# prints all unique values in a specific column
def unique_sorted(df, column):
    unique = df[column].unique()
    unique.sort()
    print(unique)
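
A minimal sketch of the duplicate detection that `dup_check` relies on, run on toy data. The library and sample IDs below are hypothetical and are not taken from this experiment.

```python
import pandas as pd

# Toy data: hypothetical IDs, for illustration only
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2", "LIB3", "LIB4"],
    "SRSId": ["S1", "S2", "S1", "S3"],
})

# keep=False marks every member of a duplicated group, not only the later ones
mask = toy.duplicated(subset=["SRSId"], keep=False)

# The mask shares toy's index, so it can select rows directly
dups = toy.loc[mask, ["#libraryId", "SRSId"]].sort_values(by="SRSId")
print(dups)
```

With the default `keep="first"`, only `LIB3` would be flagged, which is why the helper passes `keep=False`.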

### script

In [3]:
! python3 $path_to_create_exp_script $experiment_id $path_to_output $experiment_type

  all_protoc = [w.replace('(', '\(') for w in all_protoc]
  all_protoc = [w.replace(')', '\)') for w in all_protoc] 
Be patient, it may take a few minutes.
0it [00:00, ?it/s]
5 samples dont have attributes, try to find them somewhere else
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00,  1.69s/it]
0 samples dont have attributes


### library annotations

In [4]:
library = pd.read_csv(library_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,,,,,,kidney,18 day embryo,,,,female,White Leghorn,,9031,,,,,,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,,,,,,testis,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,,,,,,spleen,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,,,,,,muscle,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,,,,,,lung,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,,,,,,kidney,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,,,,,,heart,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,,,,,,bursa,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,,,,,,brain,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,,,,,,testis,18 day embryo,,,,male,White Leghorn,,9031,,,,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public


#### anatomical entity

In [5]:
unique_sorted(library, "infoOrgan")

['brain' 'bursa' 'heart' 'kidney' 'liver' 'lung' 'muscle' 'ovary' 'spleen'
 'testis']


In [7]:

# brain
library.loc[library["infoOrgan"] == "brain", "anatId"] = "UBERON:0000955"
library.loc[library["infoOrgan"] == "brain", "anatName"] = "brain"
library.loc[library["infoOrgan"] == "brain", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "brain", "anatBiologicalStatus"] = "full sampling"

# bursa aka bursa Fabricii
library.loc[library["infoOrgan"] == "bursa", "anatId"] = "UBERON:0003903"
library.loc[library["infoOrgan"] == "bursa", "anatName"] = "bursa of Fabricius"
library.loc[library["infoOrgan"] == "bursa", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "bursa", "anatBiologicalStatus"] = "full sampling"

# heart
library.loc[library["infoOrgan"] == "heart", "anatId"] = "UBERON:0000948"
library.loc[library["infoOrgan"] == "heart", "anatName"] = "heart"
library.loc[library["infoOrgan"] == "heart", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "heart", "anatBiologicalStatus"] = "partial sampling"

# kidney
library.loc[library["infoOrgan"] == "kidney", "anatId"] = "UBERON:0002113"
library.loc[library["infoOrgan"] == "kidney", "anatName"] = "kidney"
library.loc[library["infoOrgan"] == "kidney", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "kidney", "anatBiologicalStatus"] = "partial sampling"

# liver
library.loc[library["infoOrgan"] == "liver", "anatId"] = "UBERON:0002107"
library.loc[library["infoOrgan"] == "liver", "anatName"] = "liver"
library.loc[library["infoOrgan"] == "liver", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "liver", "anatBiologicalStatus"] = "partial sampling"

# lung
library.loc[library["infoOrgan"] == "lung", "anatId"] = "UBERON:0002048"
library.loc[library["infoOrgan"] == "lung", "anatName"] = "lung"
library.loc[library["infoOrgan"] == "lung", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "lung", "anatBiologicalStatus"] = "partial sampling"

# muscle aka breast muscle aka pectoral muscle
library.loc[library["infoOrgan"] == "muscle", "anatId"] = "UBERON:0001495"
library.loc[library["infoOrgan"] == "muscle", "anatName"] = "pectoral muscle"
library.loc[library["infoOrgan"] == "muscle", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "muscle", "anatBiologicalStatus"] = "partial sampling"

# ovary or ovarium
library.loc[library["infoOrgan"] == "ovary", "anatId"] = "UBERON:0000992"
library.loc[library["infoOrgan"] == "ovary", "anatName"] = "ovary"
library.loc[library["infoOrgan"] == "ovary", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "ovary", "anatBiologicalStatus"] = "full sampling"

# spleen
library.loc[library["infoOrgan"] == "spleen", "anatId"] = "UBERON:0002106"
library.loc[library["infoOrgan"] == "spleen", "anatName"] = "spleen"
library.loc[library["infoOrgan"] == "spleen", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "spleen", "anatBiologicalStatus"] = "full sampling"

# testis
library.loc[library["infoOrgan"] == "testis", "anatId"] = "UBERON:0000473"
library.loc[library["infoOrgan"] == "testis", "anatName"] = "testis"
library.loc[library["infoOrgan"] == "testis", "anatAnnotationStatus"] = "perfect match"
library.loc[library["infoOrgan"] == "testis", "anatBiologicalStatus"] = "full sampling"

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,,,,kidney,18 day embryo,perfect match,partial sampling,,female,White Leghorn,,9031,,,,,,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,,,,testis,18 day embryo,perfect match,full sampling,,male,White Leghorn,,9031,,,,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,,,,spleen,18 day embryo,perfect match,full sampling,,male,White Leghorn,,9031,,,,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,,,,muscle,18 day embryo,perfect match,partial sampling,,male,White Leghorn,,9031,,,,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,,,,lung,18 day embryo,perfect match,partial sampling,,male,White Leghorn,,9031,,,,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,,,,kidney,18 day embryo,perfect match,partial sampling,,male,White Leghorn,,9031,,,,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,,,,heart,18 day embryo,perfect match,partial sampling,,male,White Leghorn,,9031,,,,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,,,,bursa,18 day embryo,perfect match,full sampling,,male,White Leghorn,,9031,,,,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,,,,brain,18 day embryo,perfect match,full sampling,,male,White Leghorn,,9031,,,,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,,,,testis,18 day embryo,perfect match,full sampling,,male,White Leghorn,,9031,,,,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public
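
The per-organ assignments above all follow one pattern, so they could also be driven from a lookup table. This is only a sketch: `ANAT_MAP`, `ANAT_COLS` and `annotate_anatomy` are hypothetical names, the toy DataFrame stands in for `library`, and `gizzard` is an invented value used to show how an unmapped organ is reported. The UBERON IDs and sampling statuses are copied from the cell above.

```python
import pandas as pd

# infoOrgan -> (anatId, anatName, anatBiologicalStatus), copied from the cell above
ANAT_MAP = {
    "brain":  ("UBERON:0000955", "brain", "full sampling"),
    "bursa":  ("UBERON:0003903", "bursa of Fabricius", "full sampling"),
    "heart":  ("UBERON:0000948", "heart", "partial sampling"),
    "kidney": ("UBERON:0002113", "kidney", "partial sampling"),
    "liver":  ("UBERON:0002107", "liver", "partial sampling"),
    "lung":   ("UBERON:0002048", "lung", "partial sampling"),
    "muscle": ("UBERON:0001495", "pectoral muscle", "partial sampling"),
    "ovary":  ("UBERON:0000992", "ovary", "full sampling"),
    "spleen": ("UBERON:0002106", "spleen", "full sampling"),
    "testis": ("UBERON:0000473", "testis", "full sampling"),
}
ANAT_COLS = ["anatId", "anatName", "anatBiologicalStatus"]

def annotate_anatomy(df, mapping, status="perfect match"):
    """Fill the anatomy columns in place; return infoOrgan values with no mapping."""
    for organ, values in mapping.items():
        rows = df["infoOrgan"] == organ
        for col, value in zip(ANAT_COLS, values):
            df.loc[rows, col] = value
        df.loc[rows, "anatAnnotationStatus"] = status
    return sorted(set(df["infoOrgan"]) - set(mapping))

# Toy stand-in for `library`; "gizzard" is hypothetical and deliberately unmapped
toy = pd.DataFrame({"infoOrgan": ["kidney", "bursa", "gizzard"]})
for col in ANAT_COLS + ["anatAnnotationStatus"]:
    toy[col] = ""

unmapped = annotate_anatomy(toy, ANAT_MAP)
print(toy)
print("unmapped:", unmapped)
```

Returning the unmapped values makes it harder to leave an organ unannotated by accident when a new `infoOrgan` value appears.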


#### stage
- [species specific developmental ontologies](https://github.com/obophenotype/developmental-stage-ontologies/tree/master/src)

In [6]:
unique_sorted(library, "infoStage")

['18 day embryo']


In [8]:
# 18 day embryo (18 days of incubation)
library.loc[:,'stageId'] = 'GgalDv:0000058'
# comment: Usually obtained after 18.0 days of incubation.
library.loc[:,'stageName'] = 'Hamburger Hamilton stage 44'
# perfect match, missing child term, other
library.loc[:,'stageAnnotationStatus'] = 'perfect match'



# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,female,White Leghorn,,9031,,,,,,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,male,White Leghorn,,9031,,,,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,male,White Leghorn,,9031,,,,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public
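
Once the anatomy and stage columns are filled, a quick completeness check can replace scanning the table by eye. The sketch below assumes empty cells are either empty strings or NaN. `REQUIRED` and `rows_missing` are hypothetical names, and the toy rows are invented.

```python
import pandas as pd

# Columns that should be filled at this point in the notebook (assumed list)
REQUIRED = ["anatId", "anatName", "stageId", "stageName"]

def rows_missing(df, columns):
    """Return {column: [libraryIds whose value is empty or NaN]} for columns with gaps."""
    gaps = {}
    for col in columns:
        empty = df[col].isna() | (df[col].astype(str).str.strip() == "")
        if empty.any():
            gaps[col] = df.loc[empty, "#libraryId"].tolist()
    return gaps

# Toy stand-in for `library`: LIB2 has no stage annotation yet
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2"],
    "anatId": ["UBERON:0002113", "UBERON:0000948"],
    "anatName": ["kidney", "heart"],
    "stageId": ["GgalDv:0000058", ""],
    "stageName": ["Hamburger Hamilton stage 44", None],
})

gaps = rows_missing(toy, REQUIRED)
print(gaps)
```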


#### sex, strain, genotype, speciesId
- uniprot [strain list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/strains)
- uniprot [species list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/speclist)
- bgee [strain mapping](https://gitlab.sib.swiss/Bgee/expression-annotations/-/tree/develop/Strains?ref_type=heads)

In [9]:
library.loc[library["sex"] == "male", "sex"] = "M"
library.loc[library["sex"] == "female", "sex"] = "F"

# updating from White Leghorn to White leghorn (uniprot has leghorn not capitalized for some reason)
library.loc[:,'strain'] = 'White leghorn'

#library.loc[:,'genotype'] = ''

#library.loc[:,'speciesId'] = ''

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,,,,,,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,,,,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,,,,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public
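
The sex recoding above can also be written as a dictionary lookup that reports values it does not recognise. This is a sketch on toy data. `SEX_CODES` is a hypothetical name, and lower-casing before the lookup is an added assumption, since the cell above matches `male` and `female` exactly.

```python
import pandas as pd

# Raw value -> single-letter code, as used in the cell above
SEX_CODES = {"male": "M", "female": "F"}

# Toy values; "Male" and "not collected" are invented to exercise the edge cases
toy = pd.DataFrame({"sex": ["male", "female", "Male", "not collected"]})

# Lower-casing first is an assumption, not part of the original cell
recoded = toy["sex"].str.lower().map(SEX_CODES)

# Values absent from SEX_CODES come back as NaN; collect them for manual review
unrecognised = toy.loc[recoded.isna(), "sex"].unique().tolist()

# Keep the original text wherever no code was found
toy["sex"] = recoded.fillna(toy["sex"])
print(toy)
print("unrecognised:", unrecognised)
```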


#### protocol
see [bulk kits](https://gitlab.sib.swiss/Bgee/scRNA-Seq/-/blob/main/scripts/bulk_kits.csv) for some common protocols

In [10]:
# making these variables because we use them again in the experiment file
my_protocol = 'TruSeq Stranded mRNA'
# full_length or 3'
my_protocolType = 'full_length'

library.loc[:,'protocol'] = my_protocol
library.loc[:,'protocolType'] = my_protocolType
# polyA, ribo-minus, miRNA, lncRNA, circRNA
library.loc[:,'RNASelection'] = 'polyA'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public


#### globin, replicates

In [24]:
# check for duplicate SRSId values
dup_check(library, "SRSId")

   #libraryId      SRSId
0   ERX697769  ERS656920
84  ERX697673  ERS656920
59  ERX697698  ERS656944
60  ERX697697  ERS656944
57  ERX697700  ERS656946
56  ERX697701  ERS656946
52  ERX697705  ERS656950
51  ERX697706  ERS656950
41  ERX697716  ERS656960
40  ERX697717  ERS656960
38  ERX697719  ERS656962
37  ERX697720  ERS656962
36  ERX697721  ERS656963
35  ERX697722  ERS656963
33  ERX697724  ERS656965
32  ERX697725  ERS656965
29  ERX697728  ERS656968
28  ERX697729  ERS656968
13  ERX697744  ERS656982
14  ERX697743  ERS656982


  dups = df[duplicateCheck].loc[:,['#libraryId', column]]


In [26]:
#library.loc[:,'globin_reduction'] = 'Y'

# replicates
#library.loc[library["#libraryId"] == "old", "replicate"] = "1"
library.loc[library["#libraryId"].isin(["ERX697769", "ERX697673"]), "replicate"] = "1"
library.loc[library["#libraryId"].isin(["ERX697698", "ERX697697"]), "replicate"] = "2"
library.loc[library["#libraryId"].isin(["ERX697700", "ERX697701"]), "replicate"] = "3"
library.loc[library["#libraryId"].isin(["ERX697705", "ERX697706"]), "replicate"] = "4"
library.loc[library["#libraryId"].isin(["ERX697716", "ERX697717"]), "replicate"] = "5"
library.loc[library["#libraryId"].isin(["ERX697719", "ERX697720"]), "replicate"] = "6"
library.loc[library["#libraryId"].isin(["ERX697721", "ERX697722"]), "replicate"] = "7"
library.loc[library["#libraryId"].isin(["ERX697724", "ERX697725"]), "replicate"] = "8"
library.loc[library["#libraryId"].isin(["ERX697728", "ERX697729"]), "replicate"] = "9"
library.loc[library["#libraryId"].isin(["ERX697744", "ERX697743"]), "replicate"] = "10"

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,1.0,Female2Kidney,SAMEA3242468,,,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Bursa,SAMEA3242536,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Brain,SAMEA3242535,,,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male4Testes,SAMEA3242534,,,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public


#### sample age, pato, physiological status

In [27]:
library.loc[:,'sampleAge_value'] = '18'
library.loc[:,'sampleAge_unit'] = 'embryonic day'

# ex. castrated male
#library.loc[:,'PATOid'] = ''
#library.loc[:,'PATOname'] = ''

# ex. castrated, pregnant, pre-smoltification, post-smoltification, laying eggs
#library.loc[:,'physiologicalStatus'] = ''

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,1.0,Female2Kidney,SAMEA3242468,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Bursa,SAMEA3242536,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Brain,SAMEA3242535,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male4Testes,SAMEA3242534,18,embryonic day,,,,,,,09/10/2024,TruSeq stranded,,Male4Testes,,,,18 day embryo,public
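
The cell above fills `sampleAge_value` and `sampleAge_unit` for every row at once. When only some rows get an age, a quick check can flag rows where one field is filled and the other is not. A minimal sketch, assuming a hypothetical helper `find_age_mismatches` and toy rows that are not taken from the real library file:

```python
import pandas as pd


def find_age_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where exactly one of sampleAge_value / sampleAge_unit is filled."""
    has_value = df["sampleAge_value"].fillna("").astype(str).str.strip() != ""
    has_unit = df["sampleAge_unit"].fillna("").astype(str).str.strip() != ""
    # a row is inconsistent when the two flags disagree
    return df[has_value != has_unit]


# toy rows for illustration only (hypothetical IDs)
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2", "LIB3"],
    "sampleAge_value": ["18", "", "18"],
    "sampleAge_unit": ["embryonic day", "", ""],
})

mismatches = find_age_mismatches(toy)
print(mismatches["#libraryId"].tolist())
```

On the real table this could be called as `find_age_mismatches(library)`; an empty result would mean that value and unit are always set together.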


#### condition

In [None]:
# ex. control, diet, light, reproductive capacity, time post mortem, time post feeding, 
# exercise details, menstruation, personality, litter size 
#library.loc[library["condition"] == "old", "condition"] = "new"

# view
display_df(library)
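
The commented template in this cell rewrites one `condition` value to another with a boolean mask. A small sketch of that pattern on toy data (the IDs and condition labels below are made up for illustration), which also counts how many rows were changed:

```python
import pandas as pd

# toy rows for illustration only
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2", "LIB3"],
    "condition": ["ctrl", "high fat diet", "ctrl"],
})

# select the rows carrying the old label, then overwrite only those rows
mask = toy["condition"] == "ctrl"
toy.loc[mask, "condition"] = "control"

n_changed = int(mask.sum())
print(n_changed, toy["condition"].tolist())
```

Checking `n_changed` after each replacement is a cheap way to notice a typo in the old label, since a misspelled label silently matches zero rows.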

#### annotator id, last modification date

In [28]:
library.loc[:,'annotatorId'] = 'SAC'
library.loc[:,'lastModificationDate'] = '2024-10-10'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,1.0,Female2Kidney,SAMEA3242468,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Female2Kidney,,,,18 day embryo,public
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Testes,,,,18 day embryo,public
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Spleen,,,,18 day embryo,public
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Muscle,,,,18 day embryo,public
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Lung,,,,18 day embryo,public
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Kidney,,,,18 day embryo,public
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Heart,,,,18 day embryo,public
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Bursa,SAMEA3242536,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Bursa,,,,18 day embryo,public
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Brain,SAMEA3242535,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male5Brain,,,,18 day embryo,public
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male4Testes,SAMEA3242534,18,embryonic day,,,,,,SAC,2024-10-10,TruSeq stranded,,Male4Testes,,,,18 day embryo,public
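
In the earlier tables `lastModificationDate` arrives from the script as `09/10/2024`, and the cell above overwrites it with `2024-10-10`. A sketch of how the date could be generated and validated instead of typed by hand; `iso_today` and `is_iso_date` are hypothetical helpers, not part of the annotation scripts:

```python
from datetime import date, datetime


def iso_today() -> str:
    """Today's date as YYYY-MM-DD."""
    return date.today().isoformat()


def is_iso_date(text: str) -> bool:
    """True only for zero-padded YYYY-MM-DD strings that are real calendar dates."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return False
    # the round trip rejects non-padded forms such as 2024-1-5
    return parsed.isoformat() == text


print(is_iso_date("2024-10-10"), is_iso_date("09/10/2024"))
```

With these helpers the assignment could read `library.loc[:, 'lastModificationDate'] = iso_today()`, assuming the annotation date is meant to be the day the cell is run.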


#### comments

In [29]:
library.loc[:,'comment'] = 'PMID:26108680'

#### save complete file with correct columns

In [30]:
library_file_complete = library[library_cols]
library_file_complete.to_csv(library_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)

# view
display_df(library_file_complete)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
0,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,1.0,Female2Kidney,SAMEA3242468,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
1,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
2,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
3,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
4,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
5,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
6,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
7,ERX697750,ERP009469,Illumina HiSeq 2500,ERS656988,UBERON:0003903,bursa of Fabricius,GgalDv:0000058,Hamburger Hamilton stage 44,,bursa,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Bursa,SAMEA3242536,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
8,ERX697749,ERP009469,Illumina HiSeq 2500,ERS656987,UBERON:0000955,brain,GgalDv:0000058,Hamburger Hamilton stage 44,,brain,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Brain,SAMEA3242535,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
9,ERX697748,ERP009469,Illumina HiSeq 2500,ERS656986,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male4Testes,SAMEA3242534,18,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
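
The file is written with `csv.QUOTE_ALL` and later read back with `keep_default_na=False` and `dtype=object`. A self-contained sketch of that round trip on an in-memory buffer, using toy rows with hypothetical IDs, to show that empty fields survive as empty strings rather than becoming NaN:

```python
import csv
import io

import pandas as pd

# toy rows for illustration only; dtype=object mirrors how the notebook reads files
toy = pd.DataFrame({
    "#libraryId": ["LIB1", "LIB2"],
    "genotype": ["", ""],
    "comment": ["PMID:26108680", "PMID:26108680"],
}, dtype=object)

# write with the same options as the notebook, but into memory instead of a path
buffer = io.StringIO()
toy.to_csv(buffer, sep="\t", index=False, quoting=csv.QUOTE_ALL)
header_line = buffer.getvalue().splitlines()[0]

# read back with the same options as the QA cell
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t", index_col=False, keep_default_na=False,
                       na_values=["NULL", "null", "nan", "NaN"], dtype=object)

round_trip_ok = reloaded.values.tolist() == toy.values.tolist()
print(header_line)
print(round_trip_ok)
```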


### experiment annotations

In [31]:
experiment = pd.read_csv(experiment_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,ERP009469,RNA sequencing of ten chicken embryo tissues in comparison to proteome mass spectrometry,We sequenced total mRNA from five male and five female chicken embryos for ten different tissues. The same samples were run through a tandem mass spectrometre for proteome quantification. The study aims at comparing RNA and protein levels and making inferences about gene expression regulation at the translation level.,SRA,,,,,,,PRJEB8390,,,"E,r,r,o,r,:, ,U,n,a,b,l,e, ,t,o, ,r,e,t,r,i,e,v,e, ,d,a,t,a,,, ,S,t,a,t,u,s, ,c,o,d,e, ,4,0,4",,


#### experiment and protocol details

In [32]:
# this will give you the number of rows in the complete library file 
# this should be the number of annotated libraries
ann_lib = len(library_file_complete.index)
len(library_file_complete.index)

97
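
The row count equals the number of annotated libraries only if every `#libraryId` appears once. A minimal sketch of a count that fails loudly on duplicates; `count_annotated_libraries` is a hypothetical helper and the rows are toy data:

```python
import pandas as pd


def count_annotated_libraries(df: pd.DataFrame, id_col: str = "#libraryId") -> int:
    """Return the number of rows, raising if any library ID is repeated."""
    repeated = df.loc[df[id_col].duplicated(keep=False), id_col].unique().tolist()
    if repeated:
        raise ValueError(f"duplicated library IDs: {repeated}")
    return len(df.index)


# toy rows for illustration only
toy = pd.DataFrame({"#libraryId": ["LIB1", "LIB2", "LIB3"]})
n_libraries = count_annotated_libraries(toy)
print(n_libraries)
```

In the notebook this could replace the plain `len(...)` call, for example `ann_lib = count_annotated_libraries(library_file_complete)`.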

In [33]:
# partial or total
experiment.loc[:,'experimentStatus'] = 'total'
experiment.loc[:,'projectTags'] = 'FAANG' 
# see above cell, also can add as free text
experiment.loc[:,'numberOfAnnotatedLibraries'] = ann_lib

# these variables should already exist from above but if not can just add as free text
experiment.loc[:,'protocol'] = my_protocol
experiment.loc[:,'protocolType'] = my_protocolType

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,ERP009469,RNA sequencing of ten chicken embryo tissues in comparison to proteome mass spectrometry,We sequenced total mRNA from five male and five female chicken embryos for ten different tissues. The same samples were run through a tandem mass spectrometre for proteome quantification. The study aims at comparing RNA and protein levels and making inferences about gene expression regulation at the translation level.,SRA,total,FAANG,97,TruSeq Stranded mRNA,full_length,,PRJEB8390,,,"E,r,r,o,r,:, ,U,n,a,b,l,e, ,t,o, ,r,e,t,r,i,e,v,e, ,d,a,t,a,,, ,S,t,a,t,u,s, ,c,o,d,e, ,4,0,4",,


#### paper and xrefs

In [34]:
#experiment.loc[:,'GSE'] = ''
#experiment.loc[:,'Bioproject'] = '' 
experiment.loc[:,'PMID'] = '26108680'
experiment.loc[:,'reference_url'] = 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4576709/'
experiment.loc[:,'DOI'] = '10.1093/molbev/msv147'
#experiment.loc[:,'xrefs'] = ''

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,ERP009469,RNA sequencing of ten chicken embryo tissues in comparison to proteome mass spectrometry,We sequenced total mRNA from five male and five female chicken embryos for ten different tissues. The same samples were run through a tandem mass spectrometre for proteome quantification. The study aims at comparing RNA and protein levels and making inferences about gene expression regulation at the translation level.,SRA,total,FAANG,97,TruSeq Stranded mRNA,full_length,,PRJEB8390,26108680,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4576709/,10.1093/molbev/msv147,,


#### comments

In [None]:
#experiment.loc[:,'comment'] = ''

display_df(experiment)

#### save complete file

In [35]:
experiment.to_csv(experiment_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)

### QA time

In [36]:
library_to_add = pd.read_csv(library_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
experiment_to_add = pd.read_csv(experiment_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

#### TODO: add further QA checks here

#### check columns match

In [37]:
# pull from git and pull in library/experiment file
! git pull
git_library = pd.read_csv(git_library_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
git_experiment = pd.read_csv(git_experiment_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

# library file
if set(library_to_add.columns) == set(git_library.columns):
    print('The columns in the library file match')
else:
    print('The columns in the library file DO NOT MATCH')

# experiment file
if set(experiment_to_add.columns) == set(git_experiment.columns):
    print('The columns in the experiment file match')
else:
    print('The columns in the experiment file DO NOT MATCH')


# TODO: consider more explicit messages, e.g. "COLUMNS GOOD - LIBRARY" and "COLUMNS BAD - EXPERIMENT"

Already up to date.
The columns in the library file match
The columns in the experiment file match
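
The set comparison above reports only whether the columns match, not which ones differ or whether their order changed. A sketch of a more explicit report; `compare_columns` is a hypothetical helper and the column lists below are shortened toy examples:

```python
import pandas as pd


def compare_columns(new_df: pd.DataFrame, ref_df: pd.DataFrame, label: str) -> str:
    """Describe how the columns of new_df differ from those of ref_df."""
    new_cols, ref_cols = list(new_df.columns), list(ref_df.columns)
    missing = [c for c in ref_cols if c not in new_cols]
    unexpected = [c for c in new_cols if c not in ref_cols]
    if missing or unexpected:
        return f"COLUMNS BAD - {label}: missing={missing}, unexpected={unexpected}"
    if new_cols != ref_cols:
        return f"COLUMNS OK BUT REORDERED - {label}"
    return f"COLUMNS GOOD - {label}"


# toy frames for illustration only
reference = pd.DataFrame(columns=["#libraryId", "anatId", "stageId"])
matching = pd.DataFrame(columns=["#libraryId", "anatId", "stageId"])
reordered = pd.DataFrame(columns=["anatId", "#libraryId", "stageId"])
mismatched = pd.DataFrame(columns=["#libraryId", "anatId", "lib_name_2"])

print(compare_columns(matching, reference, "LIBRARY"))
print(compare_columns(reordered, reference, "LIBRARY"))
print(compare_columns(mismatched, reference, "EXPERIMENT"))
```

The order check matters here because `pd.concat` aligns by column name, so a reordered file would still concatenate correctly, while a missing or extra column would introduce empty cells.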


#### view files

In [39]:
library_git_plus_new = pd.concat([git_library, library_to_add], ignore_index = True, sort = False)
library_git_plus_new.tail(n=100)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
40400,SRX3937400,SRP140120,Illumina HiSeq 2000,SRS3169055,UBERON:0001477,infraspinatus muscle,EcabDv:0000005,adulthood stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,M.infraspinatus,"Born 2012, sequenced around 2018",perfect match,not documented,other,M,Jeju,,9796,RiboMinus Eukaryote Kit and TruSeq RNA Kit,full_length,ribo-minus,,,JJH-18,"SAMN08933816,GSM3098096",,,,,"PMID:33213000, Ribosomal RNA was removed from ...",,,ANN,2024-10-08
40401,SRX3937399,SRP140120,Illumina HiSeq 2000,SRS3169054,UBERON:0001507,biceps brachii,EcabDv:0000005,adulthood stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,M.biceps brachii,"Born 2012, sequenced around 2018",perfect match,not documented,other,M,Jeju,,9796,RiboMinus Eukaryote Kit and TruSeq RNA Kit,full_length,ribo-minus,,,JJH-17,"SAMN08933817,GSM3098095",,,,,"PMID:33213000, Ribosomal RNA was removed from ...",,,ANN,2024-10-08
40402,SRX3937398,SRP140120,Illumina HiSeq 2000,SRS3169053,UBERON:0001509,triceps brachii,EcabDv:0000005,adulthood stage,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Caput longum des M. triceps brachii,"Born 2012, sequenced around 2018",missing child term,not documented,other,M,Jeju,,9796,RiboMinus Eukaryote Kit and TruSeq RNA Kit,full_length,ribo-minus,,,JJH-16,"SAMN08933818,GSM3098094",,,,,"PMID:33213000, Ribosomal RNA was removed from ...",,,ANN,2024-10-08
40403,ERX697769,ERP009469,Illumina HiSeq 2500,ERS656920,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,F,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,1.0,Female2Kidney,SAMEA3242468,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40404,ERX697756,ERP009469,Illumina HiSeq 2500,ERS656994,UBERON:0000473,testis,GgalDv:0000058,Hamburger Hamilton stage 44,,testis,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Testes,SAMEA3242542,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40405,ERX697755,ERP009469,Illumina HiSeq 2500,ERS656993,UBERON:0002106,spleen,GgalDv:0000058,Hamburger Hamilton stage 44,,spleen,18 day embryo,perfect match,full sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Spleen,SAMEA3242541,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40406,ERX697754,ERP009469,Illumina HiSeq 2500,ERS656992,UBERON:0001495,pectoral muscle,GgalDv:0000058,Hamburger Hamilton stage 44,,muscle,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Muscle,SAMEA3242540,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40407,ERX697753,ERP009469,Illumina HiSeq 2500,ERS656991,UBERON:0002048,lung,GgalDv:0000058,Hamburger Hamilton stage 44,,lung,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Lung,SAMEA3242539,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40408,ERX697752,ERP009469,Illumina HiSeq 2500,ERS656990,UBERON:0002113,kidney,GgalDv:0000058,Hamburger Hamilton stage 44,,kidney,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Kidney,SAMEA3242538,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10
40409,ERX697751,ERP009469,Illumina HiSeq 2500,ERS656989,UBERON:0000948,heart,GgalDv:0000058,Hamburger Hamilton stage 44,,heart,18 day embryo,perfect match,partial sampling,perfect match,M,White leghorn,,9031,TruSeq Stranded mRNA,full_length,polyA,,,Male5Heart,SAMEA3242537,18.0,embryonic day,,,PMID:26108680,,,SAC,2024-10-10


In [40]:
experiment_git_plus_new = pd.concat([git_experiment, experiment_to_add], ignore_index = True, sort = False)
experiment_git_plus_new.tail(n=5)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
787,SRP144776,Integrated analysis of lncRNA and mRNA reveals...,We systematically investigated the lncRNA and ...,SRA,total,,21,Ribo-Zero Gold Kit,full_length,GSE114129,PRJNA464381,30545297,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6...,10.1186/s12864-018-5301-x,,
788,SRP465528,Transcriptomic data from 100 to 105 tissues fr...,Domesticated herbivores are an important agric...,SRA,total,,1642,NEBNext RNA First Strand Synthesis Module,full_length,,PRJNA1017964,38734729,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1...,10.1038/s41597-024-03338-5,,contact to authors (Wed 10/2/2024) for platfor...
789,SRP222884,Mongolian Horse Muscle RNA-seq,Mongolian Horse Muscle RNA-seq,SRA,total,,24,mRNA Seq sample preparation Kit,full_length,,PRJNA573500,31869634,https://www.sciencedirect.com/science/article/...,10.1016/j.cbd.2019.100649,,there are technical replicates for each of the...
790,SRP140120,Classification and functional identification b...,Purpose : The RNA-seq data of 18 skeletal musc...,SRA,total,,18,RiboMinus Eukaryote Kit and TruSeq RNA Kit,full_length,GSE113147,PRJNA450263,33213000,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7...,10.3390/genes11111359,,
791,ERP009469,RNA sequencing of ten chicken embryo tissues i...,We sequenced total mRNA from five male and fiv...,SRA,total,FAANG,97,TruSeq Stranded mRNA,full_length,,PRJEB8390,26108680,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4...,10.1093/molbev/msv147,,
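
`pd.concat` appends the new rows unconditionally, so rerunning these cells after the files were already updated would presumably add the same experiment twice. A minimal guard, sketched with a hypothetical helper `ids_already_present` and toy frames:

```python
import pandas as pd


def ids_already_present(existing: pd.DataFrame, new: pd.DataFrame, id_col: str) -> list:
    """Return the IDs of `new` that already occur in `existing`, sorted."""
    return sorted(set(existing[id_col]) & set(new[id_col]))


# toy frames for illustration only (hypothetical IDs)
git_toy = pd.DataFrame({"#libraryId": ["LIB1", "LIB2"]})
new_toy = pd.DataFrame({"#libraryId": ["LIB2", "LIB3"]})

clashes = ids_already_present(git_toy, new_toy, "#libraryId")
if clashes:
    print("not appending, already present:", clashes)
else:
    combined = pd.concat([git_toy, new_toy], ignore_index=True, sort=False)
```

The same check could be run with `"#experimentId"` on the experiment tables before they are concatenated.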


### add annotations to git

In [41]:
! git pull

Already up to date.


In [42]:
library_git_plus_new.to_csv(git_library_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
experiment_git_plus_new.to_csv(git_experiment_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
update_format(git_library_path)
update_format(git_experiment_path)

In [43]:
! git status

On branch develop
Your branch is up to date with 'origin/develop'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   ../../../RNA_Seq/RNASeqExperiment.tsv
	modified:   ../../../RNA_Seq/RNASeqLibrary.tsv

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	./

no changes added to commit (use "git add" and/or "git commit -a")


In [44]:
! git add $git_experiment_path $git_library_path

In [45]:
! git commit -m $commit_message_exp

[develop 9df9908] adding annotated bulk experiment ERP009469
 2 files changed, 1785 insertions(+), 1687 deletions(-)
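
The commit cell passes the message through IPython's `$variable` expansion, so a message containing spaces reaches the shell intact only if it is quoted. How `commit_message_exp` is defined is not shown here, so the following is only a sketch of one way to build a safely quoted command with the standard library; `git_commit_command` is a hypothetical helper:

```python
import shlex


def git_commit_command(message: str) -> str:
    """Build a `git commit` command line with the message quoted for the shell."""
    return "git commit -m " + shlex.quote(message)


command = git_commit_command("adding annotated bulk experiment ERP009469")
print(command)

# splitting the command the way a POSIX shell would keeps the message as one argument
print(shlex.split(command))
```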


In [46]:
! git push

Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 18.56 KiB | 188.00 KiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
remote: 
remote: To create a merge request for develop, visit:
remote:   https://gitlab.sib.swiss/Bgee/expression-annotations/-/merge_requests/new?merge_request%5Bsource_branch%5D=develop
remote: 
To https://gitlab.sib.swiss/Bgee/expression-annotations.git
   7bc1187..9df9908  develop -> develop


### add annotation folder and script to git

In [47]:
! git status

On branch develop
Your branch is up to date with 'origin/develop'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	./

nothing added to commit but untracked files present (use "git add" to track)


1. run first two cells (annotation summary)
2. export as html

In [None]:
! git add $path_to_output

In [None]:
! git commit -m $commit_message_py

In [None]:
! git push