# C4 to C6 - Single-cell RNA-seq studies
Single-cell RNA sequencing (scRNA-seq) is arguably the fastest-growing technology today, in both the scale of experiments and the breadth of use.
A curated database of scRNA-seq studies is available at https://www.nxn.se/single-cell-studies. 
Answer the following questions using the snapshot of data from https://github.com/NGSchoolEU/ngs22_registration_form/blob/1cc647a3733e2c8a21b47aa497b4ca8c42457aa8/data/single-cell-studies.tsv

In [2]:
# Import the libraries used in this notebook
import re
import pandas as pd
import numpy as np

In [3]:
# Load the tab-separated snapshot (assumes the file is in the working directory)
data = pd.read_csv('single-cell-studies.tsv', sep="\t")


In [4]:
data.head()

Unnamed: 0,Shorthand,DOI,Authors,Journal,Title,Date,bioRxiv DOI,Reported cells total,Organism,Tissue,...,Number of reported cell types or clusters,Cell clustering,Pseudotime,RNA Velocity,PCA,tSNE,H5AD location,Isolation,BC --> Cell ID _OR_ BC --> Cluster ID,Number individuals
0,Cauli et al PNAS,10.1073/pnas.97.11.6144,"B. Cauli, J. T. Porter, K. Tsuzuki, B. Lambole...",Proceedings of the National Academy of Sciences,Classification of fusiform neocortical interne...,20020726,-,85,Rat,Brain,...,3.0,Yes,No,No,Yes,No,,Patch-clamp,,
1,Malnic et al Cell,10.1016/S0092-8674(00)80581-4,"Bettina Malnic, Junzo Hirono, Takaaki Sato, Li...",Cell,Combinatorial Receptor Codes for Odors,20040410,-,18,Mouse,Brain,...,,,,,,,,,,
2,Tietjen et al Neuron,10.1016/S0896-6273(03)00229-0,"Ian Tietjen, Jason M. Rihel, Yanxiang Cao, Geo...",Neuron,Single-Cell Transcriptional Analysis of Neuron...,20040415,-,37,"Human, Mouse",Brain,...,6.0,,,,,,,"Manual, LCM",,
3,Gallopin et al CCortex,10.1093/cercor/bhj081,"Thierry Gallopin, Hélène Geoffroy, Jean Rossie...",Cerebral Cortex,"Cortical Sources of CRF, NKB, and CCK and Thei...",20051208,-,157,Rat,Brain,...,4.0,Yes,No,No,Yes,No,,Patch-clamp,,
4,Kurimoto et al NAR,10.1093/nar/gkl050,K. Kurimoto,Nucleic Acids Research,An improved single-cell cDNA amplification met...,20060330,-,20,Mouse,ICM,...,2.0,,,,,,,,,


In [5]:
print(data.dtypes)

Shorthand                                     object
DOI                                           object
Authors                                       object
Journal                                       object
Title                                         object
Date                                           int64
bioRxiv DOI                                   object
Reported cells total                          object
Organism                                      object
Tissue                                        object
Technique                                     object
Data location                                 object
Panel size                                    object
Measurement                                   object
Cell source                                   object
Disease                                       object
Contrasts                                     object
Developmental stage                           object
Number of reported cell types or clusters    f

In [6]:
# Shape of the table as (rows, columns); each row is one entry in the database
print(data.shape) 

(1593, 28)


#### C4 - How many studies report data from more than one organism?

In [7]:
# C4: select the studies whose 'Organism' field lists more than one organism
# (multiple organisms are separated by commas)
df = data[data['Organism'].str.contains(',', na=False)]
count = df['Organism'].value_counts()
print(count)
# The number of rows in df is the number of multi-organism studies
print(df.shape)

Human, Mouse                                                                     106
Mouse, Human                                                                       9
Human, Macaque                                                                     2
Cynomolgus macaque, Human, Mouse, Pig, Rhesus macaque                              2
Human, Zebrafish, Chicken, Marmoset, Sheep, Hamster, Mouse, Rat, Mole rat          1
Human, Marmoset, Mouse                                                             1
Human, Mouse, Macaque                                                              1
Mouse, Rat                                                                         1
Drosophila, Human                                                                  1
Human, Rabbit                                                                      1
Human, Macaque, Mouse                                                              1
Bonobo, Chimpanzee, Human, Macaque                               
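
The comma test used for C4 can be checked on a small example. The sketch below is a minimal illustration on a made-up `toy` table (its values are hypothetical, not taken from the snapshot). It assumes that organisms are always separated by commas and that a missing `Organism` value should not count as a multi-organism study.

```python
import pandas as pd

# Hypothetical stand-in for the 'Organism' column (illustrative values only)
toy = pd.DataFrame(
    {"Organism": ["Human", "Human, Mouse", "Mouse", None, "Mouse, Rat"]}
)

# A literal comma marks a multi-organism entry; missing values count as False
is_multi = toy["Organism"].str.contains(",", regex=False, na=False)

# Summing a boolean mask counts the True entries
n_multi = int(is_multi.sum())
print(n_multi)
```

Applied to the real table, the same mask sums to the number of studies asked for in C4. Entries such as `Human, Mouse` and `Mouse, Human` are separate labels in `value_counts()`, but both are picked up by the mask.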

#### C5 - After excluding the studies that report data from more than one organism, which organism has the most published studies, and how many?


In [8]:
# C5: exclude the multi-organism studies, then count studies per organism
df2 = data[~data['Organism'].str.contains(',', na=False)]
count = df2['Organism'].value_counts()
print(count)

Mouse                       632
Human                       584
Zebrafish                    35
Drosophila                   15
Rat                          12
C elegans                     8
Chicken                       5
Arabidopsis thaliana          4
Arabidopsis                   4
Plasmodium                    3
Yeast                         3
Axolotl                       3
Trypanosoma brucei            2
Crab-eating macaque           2
Ciona                         2
Sea urchin                    2
Xenopus                       2
Macaque                       2
Pig                           2
Schmidtea mediterranea        2
Planarian                     2
Dictyostelium                 2
Cynamolgus monkey             1
Horse                         1
Chlamydomonas                 1
Bacillus subtilis             1
Stony coral                   1
Plasmodium falciparum         1
Rabbit                        1
Branchiostoma japonicum       1
Marmoset                      1
Rice    
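
In the counts above, Mouse comes first with 632 single-organism studies. The top entry can also be read off programmatically rather than by eye. The sketch below shows one way to do this on a hypothetical `toy` table (illustrative values only), assuming that rows with a missing `Organism` should be ignored.

```python
import pandas as pd

# Hypothetical stand-in for the 'Organism' column (illustrative values only)
toy = pd.DataFrame(
    {"Organism": ["Mouse", "Human", "Mouse", "Human, Mouse", "Zebrafish", None]}
)

# Drop rows without an organism, then drop comma-separated (multi-organism) rows
single = toy.dropna(subset=["Organism"])
single = single[~single["Organism"].str.contains(",", regex=False)]

# Count studies per organism and pick the most frequent label
counts = single["Organism"].value_counts()
top_organism = counts.idxmax()
top_count = int(counts.max())
print(top_organism, top_count)
```

The tally treats every distinct label as its own organism, so spellings such as `Arabidopsis` and `Arabidopsis thaliana` are counted separately unless they are harmonised first.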

#### C6 - After excluding the studies that report data from more than one organism, which organism has the most reported cells, and how many?

In [9]:
# C6: tabulate the single-organism studies by organism and cell source
# (this counts studies per pair; it does not sum 'Reported cells total')
count2 = df2[['Organism', 'Cell source']].value_counts()
print(count2)

Organism   Cell source                                   
Human      PBMCs                                             25
Mouse      Microglia                                          9
Human      Microglia                                          8
Mouse      mESCs                                              8
Human      Glioblastoma                                       8
                                                             ..
           Small cell lung cancer                             1
           Sorted from PBMCs: DCs, monocytes, progenitors     1
           Spermatogonial stem cells                          1
           Subpallium (ganglionic eminances)                  1
Zebrafish  cd41 cells from kidney and heart                   1
Length: 627, dtype: int64
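
The table above counts studies per organism and cell source, whereas C6 asks for the total number of reported cells per organism. One possible way to get that total is sketched below on a hypothetical `toy` table (illustrative values only). The `dtypes` output lists `Reported cells total` as `object`, so the sketch assumes the values are stored as text; the thousands separators are also an assumption. Values that cannot be parsed are treated as missing and left out of the sum.

```python
import pandas as pd

# Hypothetical stand-in for the two relevant columns (illustrative values only)
toy = pd.DataFrame(
    {
        "Organism": ["Mouse", "Human", "Mouse", "Human, Mouse", "Human"],
        "Reported cells total": ["1,000", "250", "85", "5,000", None],
    }
)

# Keep single-organism studies only
single = toy.dropna(subset=["Organism"])
single = single[~single["Organism"].str.contains(",", regex=False)]

# Assumption: counts are text, possibly with thousands separators;
# anything that cannot be parsed becomes NaN and is skipped by sum()
cells = pd.to_numeric(
    single["Reported cells total"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Sum the reported cells per organism and pick the largest total
totals = single.assign(cells=cells).groupby("Organism")["cells"].sum()
top_organism = totals.idxmax()
top_total = int(totals.max())
print(top_organism, top_total)
```

On the real data, replacing `toy` with `data` should give one total per organism, and the largest of these answers C6.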
