## SRP006787 / GSE29278

**paper:** [PMID: 22763441](https://www.nature.com/articles/nature11243) - A map of the cis-regulatory sequences in the mouse genome

**date, curator:** 2024-09-19, Sara Carsanaro

In [1]:
experiment_id = "SRP006787"

path_to_create_exp_script = "/Users/scarsana/Desktop/git/scRNA-Seq/scripts/Create_ExpLib_tables.py" 
experiment_type = "bulk"

path_to_output_main = "/Users/scarsana/Desktop/git/expression-annotations/Notebooks/bulk/" 
path_to_output = "{}{}/".format(path_to_output_main, experiment_id)
library_path_from_script = "{}RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_path_from_script = "{}RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
library_to_add_path = "{}complete_RNASeqLibrary_{}.tsv".format(path_to_output, experiment_id)
experiment_to_add_path = "{}complete_RNASeqExperiment_{}.tsv".format(path_to_output, experiment_id)
script_file = "{}.ipynb".format(experiment_id)
commit_message_exp = '"adding annotated bulk experiment {}"'.format(experiment_id)
commit_message_py = '"adding annotation files for {} to notebook folder"'.format(experiment_id)

## for desktop testing
#path_to_output_desktop = "/Users/scarsana/Desktop/annotate/redo/" 
#path_to_output = "{}{}/".format(path_to_output_desktop, experiment_id)

## to add to git
path_to_git_annotations = "/Users/scarsana/Desktop/git/expression-annotations/RNA_Seq/"
git_library_path = "{}RNASeqLibrary.tsv".format(path_to_git_annotations)
git_experiment_path = "{}RNASeqExperiment.tsv".format(path_to_git_annotations)
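
The paths above are built by string formatting, so they rely on `path_to_output_main` ending in a slash. Below is a minimal sketch of the same layout using `pathlib`, which inserts the separators itself. `build_paths` and the example folder are hypothetical. `Create_ExpLib_tables.py` is called with `path_to_output` as an argument, and it may still expect the trailing slash. That is an assumption, so check the script before switching.

```python
from pathlib import Path

def build_paths(output_main, experiment_id):
    """Per-experiment paths, mirroring the string-formatted variables above (hypothetical helper)."""
    out_dir = Path(output_main) / experiment_id
    return {
        "path_to_output": out_dir,
        "library_path_from_script": out_dir / f"RNASeqLibrary_{experiment_id}.tsv",
        "experiment_path_from_script": out_dir / f"RNASeqExperiment_{experiment_id}.tsv",
        "library_to_add_path": out_dir / f"complete_RNASeqLibrary_{experiment_id}.tsv",
        "experiment_to_add_path": out_dir / f"complete_RNASeqExperiment_{experiment_id}.tsv",
    }

# Example with a hypothetical output folder; a trailing slash on the input makes no difference
paths = build_paths("/tmp/notebooks/bulk/", "SRP006787")
for name, path in paths.items():
    print(name, path.as_posix())
```

If the script does concatenate strings, `str(paths["path_to_output"]) + "/"` restores the trailing slash.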


In [15]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
import numpy as np
from IPython.display import display, HTML
import os
import csv

# displays df in a scrollable box, hiding the pandas index
# (Styler.to_html has no index argument, so index=False was silently ignored; Styler.hide needs pandas >= 1.4)
def display_df(df):
    pd.set_option("display.max_rows", None)
    pd.set_option("display.max_columns", None)
    display(HTML("<div style='height: 200px; overflow: auto; width: fit-content'>" +
        df.style.hide(axis="index").to_html() + "</div>"))

# function that compares two columns in a dataframe and tells you which rows are not equal (case insensitive)
def compare_columns(df, col1, col2, return_col):
    compare_return = df[col1].str.lower() != df[col2].str.lower()
    if not compare_return.any():
        print("The two columns are equal (case insensitive)")
    else:
        print("The following rows are not equal: ")
        print(df.loc[compare_return, return_col])


def update_format(path):
    with open(path, 'r') as file:
        filedata = file.read()
    # Drop empty quoted fields: every tab followed by "" becomes a bare tab
    filedata = filedata.replace("\t\"\"", "\t")
    # Write the file out again
    with open(path, 'w') as file:
        file.write(filedata)

# prints the rows whose value in `column` is duplicated, or a message if there are none
def dup_check(df, column):
    duplicateCheck = df.duplicated(subset=[column], keep=False)
    # .any() avoids the ambiguous truth value raised by comparing unique() when both True and False occur
    if not duplicateCheck.any():
        print("no duplicate values in " + column)
    elif column != '#libraryId':
        print(df.loc[duplicateCheck, ['#libraryId', column]])
    else:
        print(df.loc[duplicateCheck, ['#libraryId']])

def unique_sorted(df, column):
    unique = df[column].unique()
    unique.sort()
    print(unique)
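
The sketch below shows what `update_format` does to a written TSV by applying the same replacement to a temporary file. The file content is made up. Only `""` fields that sit directly after a tab are rewritten, so an empty quoted field at the start of a line is left alone.

```python
import os
import tempfile

def update_format(path):
    """Same logic as the helper above: turn every tab followed by "" into a bare tab."""
    with open(path, "r") as file:
        filedata = file.read()
    filedata = filedata.replace("\t\"\"", "\t")
    with open(path, "w") as file:
        file.write(filedata)

# Toy TSV: an empty quoted field in the middle of line 1 and at the start of line 2
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write('SRX113077\t""\tM\n""\tSRX113076\tM\n')
    tmp_path = tmp.name

update_format(tmp_path)
with open(tmp_path) as fh:
    result = fh.read()
os.remove(tmp_path)
print(repr(result))
```

Because this is a plain text replacement, it would also alter a quoted field whose value starts with an escaped quote (`"""…`). The helper therefore assumes that no field value begins with a double quote.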

### script

In [4]:
! python3 $path_to_create_exp_script $experiment_id $path_to_output $experiment_type

  all_protoc = [w.replace('(', '\(') for w in all_protoc]
  all_protoc = [w.replace(')', '\)') for w in all_protoc] 
Be patient, it may take a few minutes.
19-Sep-2024 15:22:36 DEBUG utils - Directory ./ already exists. Skipping.
19-Sep-2024 15:22:36 INFO GEOparse - Downloading ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE29nnn/GSE29278/soft/GSE29278_family.soft.gz to ./GSE29278_family.soft.gz
100%|██████████████████████████████████████| 4.21k/4.21k [00:00<00:00, 9.42kB/s]
19-Sep-2024 15:22:37 DEBUG downloader - Size validation passed
19-Sep-2024 15:22:37 DEBUG downloader - Moving /var/folders/b5/crkp117d43q5mcndnwlrww3w0000gn/T/tmprrvsoppf to /Users/scarsana/Desktop/git/expression-annotations/Notebooks/bulk/SRP006787/GSE29278_family.soft.gz
19-Sep-2024 15:22:37 DEBUG downloader - Successfully downloaded ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE29nnn/GSE29278/soft/GSE29278_family.soft.gz
19-Sep-2024 15:22:37 INFO GEOparse - Parsing ./GSE29278_family.soft.gz: 
19-Sep-2024 15:22:37 DEBUG GEO

### manual updates
general fix
- copied contents of source name to infoOrgan 

all from supplemental methods:
- Adult bone marrow, cerebellum, cortex, heart, intestine, kidney, liver, lung, olfactory bulb, spleen, testis, and thymus were dissected from 8-week old male C57Bl/6 mice -> info stage + sex update
- for placenta: updating sex to F, see [issue 14](https://gitlab.sib.swiss/Bgee/expression-annotations/-/issues/14). Placenta was dissected from pregnant C57Bl/6 mice at E14.5; updating infoStage to "E14.5, since pregnant this is sexually mature adult", because the tissue comes from the pregnant mother, who is a sexually mature adult
- E14.5 brain, heart, limb and liver, and mouse embryonic fibroblast (MEF) cells were derived from E14.5 C57Bl/6 mouse embryos -> info stage update
- MEF cells were genotyped to select male MEF cells used for this study -> sex update to M
- all other E14.5 libraries should have sex as NA because it's not known 
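
The sex rules from the supplemental methods can be written as a small function and compared against the `sex` column. This is a sketch only:

- `expected_sex` is a hypothetical helper.
- The toy rows borrow library IDs and values from this experiment.
- Unknown sex is an empty string, because that is how the E14.5 tissue rows appear in the library tables. Swap in `"NA"` if the pipeline expects that literal.

```python
import pandas as pd

def expected_sex(info_stage, anat_name, unknown=""):
    """Return the sex implied by the supplemental-methods notes (hypothetical helper)."""
    if info_stage == "8-week old":
        return "M"       # adult tissues dissected from 8-week-old male mice
    if anat_name == "placenta":
        return "F"       # dissected from pregnant females
    if anat_name == "embryonic fibroblast":
        return "M"       # MEF cells were genotyped to select males
    return unknown       # remaining E14.5 embryonic tissues: sex not known

toy = pd.DataFrame({
    "#libraryId": ["SRX113077", "SRX113075", "SRX063005", "SRX113072"],
    "infoStage": ["8-week old", "E14.5, since pregnant this is sexually mature adult", "E14.5", "E14.5"],
    "anatName": ["thymus", "placenta", "embryonic fibroblast", "liver"],
    "sex": ["M", "F", "M", ""],
})
toy["expected_sex"] = [expected_sex(s, a) for s, a in zip(toy["infoStage"], toy["anatName"])]
mismatches = toy[toy["sex"] != toy["expected_sex"]]
print(mismatches if len(mismatches) else "sex matches the curation notes")
```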


removed library
- SRX063006: Bruce4 embryonic stem cells; this is a cell line and should be removed
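
If the removal is done in code rather than by editing the TSV, filtering on `#libraryId` is enough. Here is a minimal sketch with a hypothetical `drop_libraries` helper and a toy table.

```python
import pandas as pd

REMOVED_LIBRARIES = ["SRX063006"]  # Bruce4 embryonic stem cells (cell line)

def drop_libraries(df, library_ids):
    """Return df without the rows whose #libraryId is in library_ids (hypothetical helper)."""
    return df.loc[~df["#libraryId"].isin(library_ids)].reset_index(drop=True)

toy = pd.DataFrame({"#libraryId": ["SRX063005", "SRX063006", "SRX113077"]})
toy = drop_libraries(toy, REMOVED_LIBRARIES)
print(toy["#libraryId"].tolist())
```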

### library annotations

In [8]:
library = pd.read_csv(library_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse thymus,,,
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse testes,,,
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,,,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse placenta,,,
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse olfactory,,,
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse intestine,,,
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 liver,,,
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 limb,,,
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 heart,,,
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 brain,,,
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse embryonic fibroblast,,,
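
The `read_csv` call above combines `keep_default_na=False` with an explicit `na_values` list. As a result, only `NULL`, `null`, `nan` and `NaN` become missing values, while the literal string `NA` and empty fields are kept as strings. This matters for the `sex` column, where the curation notes use NA for unknown sex. A small sketch on a made-up TSV:

```python
import io
import pandas as pd

# Made-up TSV: "NA" as a sex value, "NULL" as a strain, and one empty trailing field
tsv = "#libraryId\tsex\tstrain\nSRX113072\tNA\tNULL\nSRX113077\tM\t\n"
df = pd.read_csv(io.StringIO(tsv), sep='\t', index_col=False, keep_default_na=False,
                 na_values=['NULL', 'null', 'nan', 'NaN'], dtype=object)
print(df)
```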


#### anatomical entity

completed the anatomical entity annotation manually

In [9]:

# partial sampling, full sampling, not documented
library.loc[:,'anatBiologicalStatus'] = 'not documented'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,not documented,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse thymus,,,
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,not documented,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse testes,,,
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,not documented,,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse placenta,,,
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,not documented,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse olfactory,,,
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,not documented,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse intestine,,,
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,not documented,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 liver,,,
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,not documented,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 limb,,,
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,not documented,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 heart,,,
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,not documented,,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 brain,,,
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,,,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,not documented,,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse embryonic fibroblast,,,
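
A quick sanity check on the anatomy columns can catch typos before export. This sketch makes three assumptions:

- Anatomical IDs in this notebook come from UBERON or the Cell Ontology, as in the table above.
- The allowed `anatBiologicalStatus` values are those listed in the comment of the cell above.
- `check_anatomy` and the last toy row are hypothetical.

```python
import pandas as pd

# Allowed values taken from the comment in the anatBiologicalStatus cell
ALLOWED_BIO_STATUS = {"partial sampling", "full sampling", "not documented"}
# Assumption: anatomical IDs come from UBERON or the Cell Ontology (CL)
ALLOWED_ANAT_PREFIXES = ("UBERON:", "CL:")

def check_anatomy(df):
    """List (libraryId, column, value) for anatomy fields that look wrong (hypothetical helper)."""
    problems = []
    for _, row in df.iterrows():
        if not str(row["anatId"]).startswith(ALLOWED_ANAT_PREFIXES):
            problems.append((row["#libraryId"], "anatId", row["anatId"]))
        if not str(row["anatName"]).strip():
            problems.append((row["#libraryId"], "anatName", row["anatName"]))
        if row["anatBiologicalStatus"] not in ALLOWED_BIO_STATUS:
            problems.append((row["#libraryId"], "anatBiologicalStatus", row["anatBiologicalStatus"]))
    return problems

toy = pd.DataFrame({
    "#libraryId": ["SRX113077", "SRX063005", "SRX000000"],  # last ID is hypothetical
    "anatId": ["UBERON:0002370", "CL:2000042", "thymus"],
    "anatName": ["thymus", "embryonic fibroblast", ""],
    "anatBiologicalStatus": ["not documented", "not documented", "unknown"],
})
problems = check_anatomy(toy)
print(problems)
```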


#### stage
- [species specific developmental ontologies](https://github.com/obophenotype/developmental-stage-ontologies/tree/master/src)

In [16]:
unique_sorted(library, "infoStage")

['8-week old' 'E14.5'
 'E14.5, since pregnant this is sexually mature adult']


In [17]:
# 8-week old
library.loc[library["infoStage"] == "8-week old", "stageId"] = "MmusDv:0000154"
library.loc[library["infoStage"] == "8-week old", "stageName"] = "8-week-old stage"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "8-week old", "stageAnnotationStatus"] = "perfect match"

# E14.5
library.loc[library["infoStage"] == "E14.5", "stageId"] = "MmusDv:0000029"
library.loc[library["infoStage"] == "E14.5", "stageName"] = "Theiler stage 22"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "E14.5", "stageAnnotationStatus"] = "perfect match"

# E14.5, since pregnant this is sexually mature adult
library.loc[library["infoStage"] == "E14.5, since pregnant this is sexually mature adult", "stageId"] = "MmusDv:0000110"
library.loc[library["infoStage"] == "E14.5, since pregnant this is sexually mature adult", "stageName"] = "mature stage"
# perfect match, missing child term, other
library.loc[library["infoStage"] == "E14.5, since pregnant this is sexually mature adult", "stageAnnotationStatus"] = "perfect match"

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse thymus,,,
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse testes,,,
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,MmusDv:0000110,mature stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,not documented,perfect match,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse placenta,,,
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse olfactory,,,
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse intestine,,,
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 liver,,,
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 limb,,,
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 heart,,,
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 brain,,,
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse embryonic fibroblast,,,
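
The stage cell repeats the same three assignments for each `infoStage` value. Below is a sketch of an equivalent dictionary-driven version that also reports any `infoStage` value left without a mapping. `annotate_stages` is hypothetical, the IDs are copied from the stage cell, and every mapped stage is marked `perfect match`, as in that cell.

```python
import pandas as pd

# infoStage text -> (stageId, stageName), copied from the stage cell
STAGE_MAP = {
    "8-week old": ("MmusDv:0000154", "8-week-old stage"),
    "E14.5": ("MmusDv:0000029", "Theiler stage 22"),
    "E14.5, since pregnant this is sexually mature adult": ("MmusDv:0000110", "mature stage"),
}

def annotate_stages(df, stage_map, status="perfect match"):
    """Fill stage columns from stage_map and return (annotated copy, unmapped infoStage values)."""
    df = df.copy()
    for info_stage, (stage_id, stage_name) in stage_map.items():
        is_stage = df["infoStage"] == info_stage
        df.loc[is_stage, "stageId"] = stage_id
        df.loc[is_stage, "stageName"] = stage_name
        df.loc[is_stage, "stageAnnotationStatus"] = status
    unmapped = sorted(set(df["infoStage"]) - set(stage_map))
    return df, unmapped

toy = pd.DataFrame({
    "#libraryId": ["SRX113077", "SRX113072", "SRX000001"],  # last ID is hypothetical
    "infoStage": ["8-week old", "E14.5", "P0"],
    "stageId": ["", "", ""],
    "stageName": ["", "", ""],
    "stageAnnotationStatus": ["", "", ""],
})
annotated, unmapped = annotate_stages(toy, STAGE_MAP)
print(annotated[["#libraryId", "stageId", "stageName"]])
print("unmapped:", unmapped)
```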


#### sex, strain, genotype, speciesId
- uniprot [strain list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/strains)
- uniprot [species list](https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/speclist)
- bgee [strain mapping](https://gitlab.sib.swiss/Bgee/expression-annotations/-/tree/develop/Strains?ref_type=heads)

sex was already updated (see manual updates)

C57BL/6 is the appropriate strain notation

no genotype is listed
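
Here is a sketch of a check that `sex` only holds expected codes and that `strain` and `speciesId` are uniform across the experiment. The allowed sex codes are an assumption based on the tables in this notebook: `M`, `F`, or empty for unknown. `check_sex_strain_species` is hypothetical.

```python
import pandas as pd

# Assumption: sex is "M", "F", or empty when unknown; adjust if the pipeline expects "NA"
ALLOWED_SEX = {"M", "F", ""}

def check_sex_strain_species(df):
    """Return {column: offending values} for sex codes and non-uniform strain/speciesId."""
    issues = {}
    sex_values = set(df["sex"].fillna("<missing>"))
    bad_sex = sorted(sex_values - ALLOWED_SEX)
    if bad_sex:
        issues["sex"] = bad_sex
    for column in ["strain", "speciesId"]:
        values = sorted(df[column].fillna("<missing>").unique())
        if len(values) != 1:
            issues[column] = values
    return issues

toy = pd.DataFrame({
    "sex": ["M", "F", "", "male"],
    "strain": ["C57BL/6"] * 4,
    "speciesId": ["10090"] * 4,
})
issues = check_sex_strain_species(toy)
print(issues)
```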

In [None]:
#library.loc[library["sex"] == "male", "sex"] = "M"
#library.loc[library["sex"] == "female", "sex"] = "F"

#library.loc[:,'strain'] = ''

#library.loc[:,'genotype'] = ''

#library.loc[:,'speciesId'] = ''

# view
display_df(library)

#### protocol
see [bulk kits](https://gitlab.sib.swiss/Bgee/scRNA-Seq/-/blob/main/scripts/bulk_kits.csv) for some common protocols

this is full-length sequencing per the paper; protocol and RNASelection were already picked up correctly by the script
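
Protocol, protocolType and RNASelection should be identical for every library here. Once `protocolType` has been set, counting the distinct combinations is a cheap check. Here is a minimal sketch with a hypothetical helper and a toy table.

```python
import pandas as pd

def protocol_combinations(df, columns=("protocol", "protocolType", "RNASelection")):
    """Distinct protocol-related combinations and how many libraries use each (hypothetical helper)."""
    return (df.groupby(list(columns), dropna=False)
              .size()
              .reset_index(name="n_libraries"))

toy = pd.DataFrame({
    "protocol": ["Dynabeads mRNA Purification Kit"] * 3,
    "protocolType": ["full_length"] * 3,
    "RNASelection": ["polyA"] * 3,
})
combos = protocol_combinations(toy)
print(combos)
```

More than one row in the result points to a library whose protocol fields differ from the rest.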

In [18]:
# making these variables because we use them again in the experiment file
# my_protocol = ''

# full_length or 3'
my_protocolType = 'full_length'

#library.loc[:,'protocol'] = my_protocol
library.loc[:,'protocolType'] = my_protocolType
# polyA, ribo-minus, miRNA, lncRNA, circRNA
#library.loc[:,'RNASelection'] = ''

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse thymus,,,
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse testes,,,
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,MmusDv:0000110,mature stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,not documented,perfect match,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse placenta,,,
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse olfactory,,,
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse intestine,,,
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 liver,,,
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 limb,,,
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 heart,,,
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 brain,,,
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,,19/09/2024,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse embryonic fibroblast,,,


#### globin, replicates

In [19]:
# check for duplicate SRSId values
dup_check(library, "SRSId")

no duplicate values in SRSId


In [None]:
#library.loc[:,'globin_reduction'] = 'Y'

# replicates
#library.loc[library["#libraryId"] == "old", "replicate"] = "1"
#library.loc[library["#libraryId"].isin(["one", "two"]), "replicate"] = "1"

# view
display_df(library)
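The replicate cell assigns a value to several libraries at once. Below is a minimal sketch of that pattern using `Series.isin()`. It returns an element-wise boolean mask that `.loc` accepts, which a plain Python `in` test on a Series cannot provide. The toy table and its library IDs are hypothetical placeholders, not accessions from this experiment.

```python
import pandas as pd

# Toy library table; the IDs are hypothetical placeholders
library = pd.DataFrame({
    "#libraryId": ["SRX_A", "SRX_B", "SRX_C"],
    "replicate": ["", "", ""],
})

# Element-wise mask: True where the library ID is in the given list
mask = library["#libraryId"].isin(["SRX_A", "SRX_B"])
library.loc[mask, "replicate"] = "1"
library.loc[~mask, "replicate"] = "2"
print(library)
```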

#### sample age, pato, physiological status

In [None]:
#library.loc[:,'sampleAge_value'] = ''
#library.loc[:,'sampleAge_unit'] = ''

# ex. castrated male
#library.loc[:,'PATOid'] = ''
#library.loc[:,'PATOname'] = ''

# ex. castrated, pregnant, pre-smoltification, post-smoltification, laying eggs
#library.loc[:,'physiologicalStatus'] = ''

# view
display_df(library)
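Where `infoStage` states an age explicitly, the age columns could be filled from it. In this experiment, the adult samples are recorded as `8-week old`. The sketch below assumes that exact text pattern and uses `week` as the unit. Both are illustrative assumptions, not the curation rule applied here. Rows that do not match, such as `E14.5`, are left blank.

```python
import pandas as pd

library = pd.DataFrame({
    "infoStage": ["8-week old", "E14.5", "8-week old"],
    "sampleAge_value": ["", "", ""],
    "sampleAge_unit": ["", "", ""],
})

# Capture N from strings of the assumed form "N-week old"; non-matching rows give NaN
weeks = library["infoStage"].str.extract(r"^(\d+)-week old$", expand=False)
has_age = weeks.notna()

# Fill only the rows that matched; assignment aligns on the index
library.loc[has_age, "sampleAge_value"] = weeks[has_age]
library.loc[has_age, "sampleAge_unit"] = "week"
print(library)
```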

#### condition

In [None]:
# ex. control, diet, light, reproductive capacity, time post mortem, time post feeding, 
# exercise details, menstruation, personality, litter size 
#library.loc[library["condition"] == "old", "condition"] = "new"

# view
display_df(library)
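When several submitter labels need renaming, a dictionary passed to `Series.replace` handles all of them in one step. Values that are not in the mapping are left unchanged. The labels and curated terms below are hypothetical; this experiment does not record a condition.

```python
import pandas as pd

library = pd.DataFrame({"condition": ["ctrl", "HFD", "ctrl", "fasted"]})

# Hypothetical mapping from submitter labels to curated condition terms
condition_map = {"ctrl": "control", "HFD": "high-fat diet"}

# Exact-value replacement; "fasted" is not in the mapping and stays as-is
library["condition"] = library["condition"].replace(condition_map)
print(library)
```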

#### annotator id, last modification date

In [22]:
library.loc[:,'annotatorId'] = 'SAC'
library.loc[:,'lastModificationDate'] = '2024-09-19'

# view
display_df(library)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate,library_contruction_protocol,source_qc,lib_name_2,lib_name_3,source_name,individual,infoStage_2,infoStage_3
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse thymus,,,
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse testes,,,
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,MmusDv:0000110,mature stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,not documented,perfect match,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse placenta,,,
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse olfactory,,,
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,mouse intestine,,,
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 liver,,,
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 limb,,,
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 heart,,,
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse E14.5 brain,,,
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,SAC,2024-09-19,"RNA samples from tissues and primary cells were extracted from Trizol according to protocol (Invitrogen). polyA+RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously in Parkhomchuk, D. et al. (2009).",,,,Mouse embryonic fibroblast,,,


#### comments

In [None]:
#library.loc[:,'comment'] = ''

#### save complete file with correct columns

In [23]:
library_file_complete = library[['#libraryId', 'experimentId', 'platform', 'SRSId', 'anatId', 'anatName',
                                 'stageId', 'stageName', 'url_GSM', 'infoOrgan', 'infoStage',
                                 'anatAnnotationStatus', 'anatBiologicalStatus', 'stageAnnotationStatus',
                                 'sex', 'strain', 'genotype', 'speciesId', 'protocol', 'protocolType',
                                 'RNASelection', 'globin_reduction', 'replicate', 'lib_name', 'sampleName',
                                 'sampleAge_value', 'sampleAge_unit', 'PATOid', 'PATOname', 'comment',
                                 'condition', 'physiologicalStatus', 'annotatorId', 'lastModificationDate']]

library_file_complete.to_csv(library_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)

# view
display_df(library_file_complete)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
0,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850914,mouse thymus,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,SAC,2024-09-19
1,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850913,mouse testes,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,SAC,2024-09-19
2,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,MmusDv:0000110,mature stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850912,mouse placenta,"E14.5, since pregnant this is sexually mature adult",perfect match,not documented,perfect match,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,SAC,2024-09-19
3,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850911,mouse olfactory bulb,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,SAC,2024-09-19
4,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850910,mouse intestine,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,SAC,2024-09-19
5,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850909,Mouse E14.5 liver,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,SAC,2024-09-19
6,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850908,Mouse E14.5 limb,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,SAC,2024-09-19
7,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850907,Mouse E14.5 heart,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,SAC,2024-09-19
8,SRX113069,SRP006787,Illumina HiSeq 2000,SRS283484,UBERON:0000955,brain,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM850906,Mouse E14.5 brain,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-brain,"SAMN00768227,GSM850906",,,,,,,,SAC,2024-09-19
9,SRX063005,SRP006787,Illumina Genome Analyzer II,SRS193192,CL:2000042,embryonic fibroblast,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM723775,Mouse embryonic fibroblast,E14.5,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-MEF,"SAMN00618950,GSM723775",,,,,,,,SAC,2024-09-19
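When the final columns are selected by name, pandas raises a `KeyError` if any of them is missing from `library`. The wrapper sketched below is a hypothetical helper (`select_columns` is not part of the notebook). It lists every missing column in one message and drops working columns such as `lib_name_2` or `source_name`.

```python
import pandas as pd


def select_columns(df, columns):
    """Return df restricted to `columns`, in that order; fail loudly if any are absent."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"columns missing from the library table: {missing}")
    # .copy() so later edits act on a new frame rather than a slice of df
    return df[list(columns)].copy()


df = pd.DataFrame({"b": [1], "a": [2], "source_name": ["x"]})
out = select_columns(df, ["a", "b"])
print(list(out.columns))  # ['a', 'b']
```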


### experiment annotations

In [24]:
experiment = pd.read_csv(experiment_path_from_script, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP006787,A draft map of cis-regulatory sequences in the mouse genome [RNA-Seq],"As the most widely used mammalian model organism, mice play a critical role in biomedical research for mechanistic study of human development and diseases. Today, functional sequences in the mouse genome are still poorly annotated a decade after its initial sequencing. We report here a map of nearly 300,000 cis-regulatory sequences in the mouse genome, representing active promoters, enhancers and CTCF binding sites in a diverse set of 19 tissues and cell types. This map provides functional annotation to nearly 11% of the genome, and over 70% of conserved, non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we demonstrate that cis-regulatory sequences are organized into domains of coordinately regulated enhancers and promoters. Our results provide a valuable resource for the annotation of functional elements in the mammalian genome, and study of regulatory mechanisms for tissue-specific gene expression. Overall design: 19 tissues and primary cell types were examined.",SRA,,,,Dynabeads mRNA Purification Kit,,GSE29278,PRJNA142823,22763441,,10.1038/nature11243,,


#### experiment and protocol details

In [25]:
# number of rows in the complete library file;
# this should equal the number of annotated libraries
ann_lib = len(library_file_complete.index)
ann_lib

18
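The row count is a valid number of annotated libraries only if each `#libraryId` appears exactly once. Below is a hypothetical helper that checks this before counting, shown with a toy table.

```python
import pandas as pd


def count_annotated_libraries(df, id_col="#libraryId"):
    """Number of library rows, after checking that each library ID appears only once."""
    ids = df[id_col]
    if not ids.is_unique:
        dups = sorted(ids[ids.duplicated()].unique())
        raise ValueError(f"duplicate library IDs: {dups}")
    return len(df.index)


toy = pd.DataFrame({"#libraryId": ["SRX1", "SRX2", "SRX3"]})
print(count_annotated_libraries(toy))  # 3
```

In the notebook this would be called as `count_annotated_libraries(library_file_complete)`.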

In [26]:
# partial or total
experiment.loc[:,'experimentStatus'] = 'partial'
#experiment.loc[:,'projectTags'] = '' 
# taken from the count in the cell above (can also be entered as free text)
experiment.loc[:,'numberOfAnnotatedLibraries'] = ann_lib

# these variables should already be defined earlier in the notebook; if not, enter the values as free text
#experiment.loc[:,'protocol'] = my_protocol
experiment.loc[:,'protocolType'] = my_protocolType

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP006787,A draft map of cis-regulatory sequences in the mouse genome [RNA-Seq],"As the most widely used mammalian model organism, mice play a critical role in biomedical research for mechanistic study of human development and diseases. Today, functional sequences in the mouse genome are still poorly annotated a decade after its initial sequencing. We report here a map of nearly 300,000 cis-regulatory sequences in the mouse genome, representing active promoters, enhancers and CTCF binding sites in a diverse set of 19 tissues and cell types. This map provides functional annotation to nearly 11% of the genome, and over 70% of conserved, non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we demonstrate that cis-regulatory sequences are organized into domains of coordinately regulated enhancers and promoters. Our results provide a valuable resource for the annotation of functional elements in the mammalian genome, and study of regulatory mechanisms for tissue-specific gene expression. Overall design: 19 tissues and primary cell types were examined.",SRA,partial,,18,Dynabeads mRNA Purification Kit,full_length,GSE29278,PRJNA142823,22763441,,10.1038/nature11243,,


#### paper and xrefs

In [27]:
#experiment.loc[:,'GSE'] = ''
#experiment.loc[:,'Bioproject'] = '' 
#experiment.loc[:,'PMID'] = ''
experiment.loc[:,'reference_url'] = 'https://www.nature.com/articles/nature11243'
#experiment.loc[:,'DOI'] = ''
#experiment.loc[:,'xrefs'] = ''

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP006787,A draft map of cis-regulatory sequences in the mouse genome [RNA-Seq],"As the most widely used mammalian model organism, mice play a critical role in biomedical research for mechanistic study of human development and diseases. Today, functional sequences in the mouse genome are still poorly annotated a decade after its initial sequencing. We report here a map of nearly 300,000 cis-regulatory sequences in the mouse genome, representing active promoters, enhancers and CTCF binding sites in a diverse set of 19 tissues and cell types. This map provides functional annotation to nearly 11% of the genome, and over 70% of conserved, non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we demonstrate that cis-regulatory sequences are organized into domains of coordinately regulated enhancers and promoters. Our results provide a valuable resource for the annotation of functional elements in the mammalian genome, and study of regulatory mechanisms for tissue-specific gene expression. Overall design: 19 tissues and primary cell types were examined.",SRA,partial,,18,Dynabeads mRNA Purification Kit,full_length,GSE29278,PRJNA142823,22763441,https://www.nature.com/articles/nature11243,10.1038/nature11243,,


#### comments

In [28]:
experiment.loc[:,'comment'] = 'removed SRX063006 (cell culture)'

display_df(experiment)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
0,SRP006787,A draft map of cis-regulatory sequences in the mouse genome [RNA-Seq],"As the most widely used mammalian model organism, mice play a critical role in biomedical research for mechanistic study of human development and diseases. Today, functional sequences in the mouse genome are still poorly annotated a decade after its initial sequencing. We report here a map of nearly 300,000 cis-regulatory sequences in the mouse genome, representing active promoters, enhancers and CTCF binding sites in a diverse set of 19 tissues and cell types. This map provides functional annotation to nearly 11% of the genome, and over 70% of conserved, non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we demonstrate that cis-regulatory sequences are organized into domains of coordinately regulated enhancers and promoters. Our results provide a valuable resource for the annotation of functional elements in the mammalian genome, and study of regulatory mechanisms for tissue-specific gene expression. Overall design: 19 tissues and primary cell types were examined.",SRA,partial,,18,Dynabeads mRNA Purification Kit,full_length,GSE29278,PRJNA142823,22763441,https://www.nature.com/articles/nature11243,10.1038/nature11243,,removed SRX063006 (cell culture)


#### save complete file

In [29]:
experiment.to_csv(experiment_to_add_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)

### QA time

In [31]:
library_to_add = pd.read_csv(library_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
experiment_to_add = pd.read_csv(experiment_to_add_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

#### additional checks (to be added)

#### check columns match

In [32]:
# pull the latest changes from git, then read in the current library/experiment files
! git pull
git_library = pd.read_csv(git_library_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)
git_experiment = pd.read_csv(git_experiment_path, sep='\t', index_col=False, keep_default_na=False, na_values=['NULL','null', 'nan','NaN'], dtype=object)

# library file
if set(library_to_add.columns) == set(git_library.columns):
    print('The columns in the library file match')
else:
    print('The columns in the library file DO NOT MATCH')

# experiment file
if set(experiment_to_add.columns) == set(git_experiment.columns):
    print('The columns in the experiment file match')
else:
    print('The columns in the experiment file DO NOT MATCH')


# TODO: consider more explicit messages, e.g. "COLUMNS GOOD - LIBRARY" and "COLUMNS BAD - EXPERIMENT"

Already up to date.
The columns in the library file match
The columns in the experiment file match
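The column-check cell includes a comment suggesting more explicit messages. One way to implement them is the hypothetical `compare_columns` helper sketched below. It also names the columns that differ and flags differences in column order. The order flag is informational only, because `pd.concat` aligns columns by name.

```python
def compare_columns(new_cols, git_cols, label):
    """Summarise whether two column collections agree, naming any mismatches."""
    new_set, git_set = set(new_cols), set(git_cols)
    if new_set == git_set:
        # Same names; order matters only for readability of the saved file
        note = "" if list(new_cols) == list(git_cols) else " (same names, different order)"
        return f"COLUMNS GOOD - {label}{note}"
    only_new = sorted(new_set - git_set)
    only_git = sorted(git_set - new_set)
    return f"COLUMNS BAD - {label}: only in new file {only_new}; only in git file {only_git}"


print(compare_columns(["a", "b"], ["a", "b"], "LIBRARY"))
print(compare_columns(["a", "c"], ["a", "b"], "EXPERIMENT"))
```

With the notebook's objects this would be, for example, `print(compare_columns(library_to_add.columns, git_library.columns, "LIBRARY"))`.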


#### view files

In [33]:
library_git_plus_new = pd.concat([git_library, library_to_add], ignore_index = True, sort = False)
library_git_plus_new.tail(n=20)

Unnamed: 0,#libraryId,experimentId,platform,SRSId,anatId,anatName,stageId,stageName,url_GSM,infoOrgan,infoStage,anatAnnotationStatus,anatBiologicalStatus,stageAnnotationStatus,sex,strain,genotype,speciesId,protocol,protocolType,RNASelection,globin_reduction,replicate,lib_name,sampleName,sampleAge_value,sampleAge_unit,PATOid,PATOname,comment,condition,physiologicalStatus,annotatorId,lastModificationDate
38336,SRX14667155,SRP366684,Illumina NovaSeq 6000,SRS12431778,UBERON:0005335,chorioallantoic membrane,GgalDv:0000051,Hamburger Hamilton stage 37,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Chorioallantoic membrane,Prehatch : 11 days of incubation,perfect match,not documented,perfect match,F,Ross 308,,9031,NEBNext Ultra RNA Library Prep Kit,full_length,polyA,,,CAM11F_3,SAMN27103228,,,,,,,,SAC,2024-09-19
38337,SRX14667154,SRP366684,Illumina NovaSeq 6000,SRS12431776,UBERON:0005335,chorioallantoic membrane,GgalDv:0000051,Hamburger Hamilton stage 37,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Chorioallantoic membrane,Prehatch : 11 days of incubation,perfect match,not documented,perfect match,M,Ross 308,,9031,NEBNext Ultra RNA Library Prep Kit,full_length,polyA,,,CAM11M_2,SAMN27103229,,,,,,,,SAC,2024-09-19
38338,SRX113077,SRP006787,Illumina HiSeq 2000,SRS283492,UBERON:0002370,thymus,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,mouse thymus,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-thymus,"SAMN00768235,GSM850914",,,,,,,,SAC,2024-09-19
38339,SRX113076,SRP006787,Illumina HiSeq 2000,SRS283491,UBERON:0000473,testis,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,mouse testes,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-testes,"SAMN00768234,GSM850913",,,,,,,,SAC,2024-09-19
38340,SRX113075,SRP006787,Illumina HiSeq 2000,SRS283490,UBERON:0001987,placenta,MmusDv:0000110,mature stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,mouse placenta,"E14.5, since pregnant this is sexually mature ...",perfect match,not documented,perfect match,F,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-placenta,"SAMN00768233,GSM850912",,,,,,,,SAC,2024-09-19
38341,SRX113074,SRP006787,Illumina HiSeq 2000,SRS283489,UBERON:0002264,olfactory bulb,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,mouse olfactory bulb,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-olfactory,"SAMN00768232,GSM850911",,,,,,,,SAC,2024-09-19
38342,SRX113073,SRP006787,Illumina HiSeq 2000,SRS283488,UBERON:0000160,intestine,MmusDv:0000154,8-week-old stage,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,mouse intestine,8-week old,perfect match,not documented,perfect match,M,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-intestine,"SAMN00768231,GSM850910",,,,,,,,SAC,2024-09-19
38343,SRX113072,SRP006787,Illumina HiSeq 2000,SRS283487,UBERON:0002107,liver,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,Mouse E14.5 liver,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-liver,"SAMN00768230,GSM850909",,,,,,,,SAC,2024-09-19
38344,SRX113071,SRP006787,Illumina HiSeq 2000,SRS283486,UBERON:0002101,limb,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,Mouse E14.5 limb,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-limb,"SAMN00768229,GSM850908",,,,,,,,SAC,2024-09-19
38345,SRX113070,SRP006787,Illumina HiSeq 2000,SRS283485,UBERON:0000948,heart,MmusDv:0000029,Theiler stage 22,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?...,Mouse E14.5 heart,E14.5,perfect match,not documented,perfect match,,C57BL/6,,10090,Dynabeads mRNA Purification Kit,full_length,polyA,,,RenLab-RNA-Seq-E14.5-heart,"SAMN00768228,GSM850907",,,,,,,,SAC,2024-09-19
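`pd.concat` does not remove duplicates. Running this cell twice, or annotating an experiment that is already in the git table, would therefore append the same rows again. Below is a hypothetical check to run before concatenating, shown with toy tables.

```python
import pandas as pd


def ids_already_in_git(git_df, new_df, id_col):
    """Return the IDs from new_df that already exist in git_df, sorted."""
    return sorted(set(new_df[id_col]) & set(git_df[id_col]))


git_toy = pd.DataFrame({"#libraryId": ["SRX1", "SRX2"]})
new_toy = pd.DataFrame({"#libraryId": ["SRX2", "SRX3"]})
print(ids_already_in_git(git_toy, new_toy, "#libraryId"))  # ['SRX2']
```

Before the merge, `ids_already_in_git(git_library, library_to_add, "#libraryId")` should return an empty list. The same applies to `ids_already_in_git(git_experiment, experiment_to_add, "#experimentId")`.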


In [34]:
experiment_git_plus_new = pd.concat([git_experiment, experiment_to_add], ignore_index = True, sort = False)
experiment_git_plus_new.tail(n=5)

Unnamed: 0,#experimentId,experimentName,experimentDescription,experimentSource,experimentStatus,projectTags,numberOfAnnotatedLibraries,protocol,protocolType,GSE,Bioproject,PMID,reference_url,DOI,xrefs,comment
772,SRP474581,Pituitary transcriptome data of Leizhou goats ...,Pituitary transcriptome data of Leizhou goats ...,SRA,total,,7,"Ribo-Zero rRNA Removal Kit (Illumina, Inc.) an...",full_length,,PRJNA1043736,38152654,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1...,10.3389/fgene.2023.1303031,,
773,SRP329780,Wu'an goat muscle sequencing data (mRNA+lncRNA),Sequencing data of the longissimus dorsi muscl...,SRA,total,,6,Ribo-zeroTM Gole Kit,full_length,,PRJNA749569,36230427,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9...,10.3390/ani12192683,,no brand protocol named
774,SRP140736,Ovis aries Transcriptome or Gene expression,folate supplentation on the effects of muscle ...,SRA,total,FAANG,18,NEBNext Ultra RNA Library Prep Kit,full_length,,PRJNA450309,31229912,https://www.sciencedirect.com/science/article/...,10.1016/j.jnutbio.2019.05.011,,
775,SRP366684,Transcriptome of the chicken chorioallantoic m...,The aim the the sequencing project was to inve...,SRA,total,,40,NEBNext Ultra RNA Library Prep Kit,full_length,GSE199780,PRJNA821480,36642281,https://www.sciencedirect.com/science/article/...,10.1016/j.ygeno.2023.110564,39263233[PMID],
776,SRP006787,A draft map of cis-regulatory sequences in the...,As the most widely used mammalian model organi...,SRA,partial,,18,Dynabeads mRNA Purification Kit,full_length,GSE29278,PRJNA142823,22763441,https://www.nature.com/articles/nature11243,10.1038/nature11243,,removed SRX063006 (cell culture)
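
Before the combined tables are written back to the repository, it can help to confirm that the new experiment ID is not already in the Git copy. The sketch below uses a hypothetical helper, `assert_unique_ids`, which is not part of the existing curation scripts. It relies only on pandas and runs on toy tables rather than the real annotation files.

```python
import pandas as pd


def assert_unique_ids(df: pd.DataFrame, id_column: str) -> None:
    """Raise ValueError if any value in `id_column` occurs more than once."""
    # keep=False flags every occurrence of a duplicated value, not just the repeats
    dupes = df.loc[df[id_column].duplicated(keep=False), id_column]
    if not dupes.empty:
        raise ValueError(f"duplicated {id_column} values: {sorted(set(dupes))}")


# Toy tables (hypothetical) standing in for git_experiment and experiment_to_add
existing = pd.DataFrame({"#experimentId": ["SRP474581", "SRP366684"]})
new = pd.DataFrame({"#experimentId": ["SRP006787"]})
combined = pd.concat([existing, new], ignore_index=True, sort=False)
assert_unique_ids(combined, "#experimentId")  # passes: all IDs are distinct
```

In this notebook it would presumably be called as `assert_unique_ids(experiment_git_plus_new, "#experimentId")` before the `to_csv` calls.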


### add annotations to git

In [35]:
! git pull

Already up to date.


In [36]:
library_git_plus_new.to_csv(git_library_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
experiment_git_plus_new.to_csv(git_experiment_path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
update_format(git_library_path)
update_format(git_experiment_path)
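
`update_format` is defined elsewhere in the notebook, and its behavior is not shown here. The sketch below therefore covers only the `to_csv` step. It writes a DataFrame as a fully quoted TSV, then reads it back to check that the shape and header survived. `write_and_verify_tsv` is a hypothetical helper. Reading with `dtype=str` is an assumption meant to keep identifiers such as PMIDs as text.

```python
import csv
import os
import tempfile

import pandas as pd


def write_and_verify_tsv(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write df as a tab-separated, fully quoted file and re-read it to check it."""
    df.to_csv(path, sep="\t", index=False, quoting=csv.QUOTE_ALL)
    # dtype=str and keep_default_na=False keep IDs and empty cells as plain strings
    reread = pd.read_csv(path, sep="\t", dtype=str, keep_default_na=False)
    if reread.shape != df.shape or list(reread.columns) != list(df.columns):
        raise ValueError(f"round trip changed the table: {df.shape} -> {reread.shape}")
    return reread


# Toy one-row experiment table written to a temporary directory
toy = pd.DataFrame({
    "#experimentId": ["SRP006787"],
    "PMID": ["22763441"],
    "comment": ["removed SRX063006 (cell culture)"],
})
with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "RNASeqExperiment_toy.tsv")
    roundtrip = write_and_verify_tsv(toy, tsv_path)
    with open(tsv_path) as fh:
        header_line = fh.readline()  # QUOTE_ALL also quotes the header names
```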

In [37]:
! git status

On branch develop
Your branch is up to date with 'origin/develop'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   ../../../RNA_Seq/RNASeqExperiment.tsv
	modified:   ../../../RNA_Seq/RNASeqLibrary.tsv

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	[31m./[m

no changes added to commit (use "git add" and/or "git commit -a")
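
The human-readable `git status` output above is easy to check by eye. A scripted check could instead parse `git status --porcelain`. In its v1 format, each path is prefixed with a two-character status code: `" M"` for a modified, unstaged file and `"??"` for an untracked one. The helpers below are hypothetical. They take a sample string instead of calling Git, so they only sketch the idea. Quoted paths and rename entries get no special handling.

```python
import os


def parse_porcelain(text: str) -> list[tuple[str, str]]:
    """Split `git status --porcelain` (v1) output into (status, path) pairs."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            # columns 0-1 hold the XY status, column 2 is a space, the path follows
            entries.append((line[:2], line[3:]))
    return entries


def unexpected_changes(text: str, allowed_names: set[str]) -> list[str]:
    """Return changed tracked files whose file name is not in allowed_names."""
    return [
        path
        for status, path in parse_porcelain(text)
        if status != "??" and os.path.basename(path.rstrip("/")) not in allowed_names
    ]


# Sample output (hypothetical) mirroring the status shown above
sample = (
    " M RNA_Seq/RNASeqExperiment.tsv\n"
    " M RNA_Seq/RNASeqLibrary.tsv\n"
    "?? Notebooks/bulk/SRP006787/\n"
)
allowed = {"RNASeqExperiment.tsv", "RNASeqLibrary.tsv"}
problems = unexpected_changes(sample, allowed)  # empty: only the two TSVs changed
```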


In [38]:
! git add $git_experiment_path $git_library_path

In [39]:
! git commit -m $commit_message_exp

[develop 571bb8b] adding annotated bulk experiment SRP006787
 2 files changed, 19 insertions(+)
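
The commit summary doubles as a consistency check. SRP006787 has 18 annotated libraries (the `numberOfAnnotatedLibraries` value above) plus one experiment row. Appending those rows should therefore produce 19 inserted lines across 2 files, assuming each row occupies one line and no existing lines were rewritten. The hypothetical `parse_commit_stat` helper below extracts these numbers from the summary line with a regular expression. It is a sketch and not part of the curation scripts.

```python
import re

# Matches summaries such as " 2 files changed, 19 insertions(+), 3 deletions(-)"
STAT_RE = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_commit_stat(line: str) -> tuple[int, int, int]:
    """Return (files_changed, insertions, deletions) from a git summary line."""
    match = STAT_RE.search(line)
    if match is None:
        raise ValueError(f"not a git summary line: {line!r}")
    files, insertions, deletions = (int(g) if g else 0 for g in match.groups())
    return files, insertions, deletions


n_new_libraries = 18  # numberOfAnnotatedLibraries for SRP006787
n_new_experiments = 1
stat = parse_commit_stat(" 2 files changed, 19 insertions(+)")
consistent = stat == (2, n_new_libraries + n_new_experiments, 0)
```

In the notebook itself, the two counts would more naturally come from the lengths of the new library and experiment DataFrames than from hard-coded numbers.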


In [40]:
! git push

Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 2.70 KiB | 1.35 MiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
remote: 
remote: To create a merge request for develop, visit:
remote:   https://gitlab.sib.swiss/Bgee/expression-annotations/-/merge_requests/new?merge_request%5Bsource_branch%5D=develop
remote: 
To https://gitlab.sib.swiss/Bgee/expression-annotations.git
   bb60669..571bb8b  develop -> develop


### add annotation folder and script to git

In [48]:
! git status

On branch develop
Your branch is up to date with 'origin/develop'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	[32mmodified:   SRP006787.ipynb[m



In [47]:
! git add $path_to_output

In [49]:
! git commit -m $commit_message_py

[develop 7a20b63] adding annotation files for SRP006787 to notebook folder
 1 file changed, 1394 insertions(+), 25 deletions(-)


In [50]:
! git push

Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 6.69 KiB | 2.23 MiB/s, done.
Total 6 (delta 4), reused 0 (delta 0), pack-reused 0
remote: 
remote: To create a merge request for develop, visit:
remote:   https://gitlab.sib.swiss/Bgee/expression-annotations/-/merge_requests/new?merge_request%5Bsource_branch%5D=develop
remote: 
To https://gitlab.sib.swiss/Bgee/expression-annotations.git
   24d5a6f..7a20b63  develop -> develop
