# Getting Started
For this tutorial, we will impute a dataset of melanoma cells that is
freely available from
[GSE99330](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99330).
 
## 1. Download example dataset

The dataset contains 8,640 single cells sequenced on an Illumina
NextSeq 500. The raw RNA-seq count matrix can be downloaded
[here](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE99330&format=file&file=GSE99330%5FdropseqHumanDge%2Etxt%2Egz).

The authors [1] also provide single-molecule RNA FISH measurements of
26 genes in thousands of melanoma cells. These serve as an independent
reference for assessing imputation performance and can be downloaded
[here](https://www.dropbox.com/s/ia9x0iom6dwueix/fishSubset.txt?dl=0).

```python
import numpy as np
import pandas as pd
import h5py
```

Read the downloaded melanoma RNA-seq data that will be imputed:

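```python
# Path to the file downloaded from GEO above; adjust to where you saved it.
melanoma_rnaseq_url = "GSE99330_dropseqHumanDge.txt.gz"
# Genes x cells count matrix (rows: genes, columns: cell barcodes).
melanoma_rnaseq_pd = pd.read_csv(melanoma_rnaseq_url, sep=" ", compression="gzip", index_col=0, skiprows=1)
```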

```python
melanoma_rnaseq_pd
```

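| | CTCGCGAGTAGC | CGGAGGCACTCG | GCAAGTCGATAT | GGACAATTTGTA | TGACAATTGACC | TAAGACTTCCCT | GAGGAAGGACTC | GAAACGGACAGA | TCGATTGGAGAA | ATCTAGTCCCCA | ... | AGCCCTGACAAC | ACTCTCGATTCC | GGTCAAATAAGA | ACCTCCCCTATA | ACCTCCCCTACC | CCATTTTTTCCT | TAAAGCGTGTAC | GATCAGAAGGTA | AGCGAGACGATG | ATTCTTGTGTAC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1BG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A1BG-AS1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A1CF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2M-AS1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2ML1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2ML1-AS1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2MP1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A3GALT2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A4GALT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |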
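
The preview above is almost entirely zeros, which is typical of droplet-based single-cell data and is what imputation tries to address. A minimal sketch for quantifying this sparsity, assuming the genes × cells orientation shown in the preview; `summarize_counts` is a hypothetical helper written for illustration, not part of `DISC`:

```python
import numpy as np
import pandas as pd


def summarize_counts(counts: pd.DataFrame) -> dict:
    """Summarize a genes x cells count table (hypothetical helper)."""
    values = counts.to_numpy()
    n_genes, n_cells = values.shape
    return {
        "n_genes": n_genes,
        "n_cells": n_cells,
        # Fraction of all matrix entries that are exactly zero.
        "zero_fraction": float((values == 0).mean()),
        # Total counts per cell (column sums), summarized by the median.
        "median_counts_per_cell": float(np.median(values.sum(axis=0))),
        # Genes with zero counts in every cell.
        "n_undetected_genes": int((values.sum(axis=1) == 0).sum()),
    }


# Toy table for illustration only (not the real data).
toy = pd.DataFrame(
    [[0, 3, 0], [0, 0, 0], [1, 0, 2]],
    index=["GENE_A", "GENE_B", "GENE_C"],
    columns=["CELL_1", "CELL_2", "CELL_3"],
)
print(summarize_counts(toy))
```

On the real data, `summarize_counts(melanoma_rnaseq_pd)` would report the matrix dimensions and overall zero fraction before imputation.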


Next, load the melanoma FISH data for validation:

```python
# dl=1 makes Dropbox return the file itself instead of a preview page.
melanoma_fish_url = "https://www.dropbox.com/s/ia9x0iom6dwueix/fishSubset.txt?dl=1"
melanoma_fish_pd = pd.read_csv(melanoma_fish_url, sep=" ", index_col=0)
```

```python
melanoma_fish_pd
```

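| | EGFR | SOX10 | CCNA2 | GAPDH | WNT5A | PDGFRB | PDGFC | SERPINE1 | NGFR | NRG1 | ... | FGFR1 | JUN | VGF | BABAM1 | KDM5A | LMNA | KDM5B | C1S | VCL | TXNRD1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| fish1.1 | 0.0 | 191.0 | 2.0 | 270 | 0.0 | 0.0 | 0.0 | 21.0 | 4.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.2 | 1.0 | 115.0 | 2.0 | 241 | 0.0 | 0.0 | 0.0 | 7.0 | 8.0 | 5.0 | ... | 1.0 | 4.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.3 | 0.0 | 86.0 | 7.0 | 192 | 0.0 | 0.0 | 0.0 | 4.0 | 4.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.4 | 0.0 | 40.0 | 4.0 | 87 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.5 | 1.0 | 74.0 | 4.0 | 149 | 1.0 | 0.0 | 0.0 | 2.0 | 1.0 | 2.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.6 | 0.0 | 97.0 | 5.0 | 165 | 0.0 | 0.0 | 0.0 | 6.0 | 4.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.8 | 1.0 | 151.0 | 10.0 | 267 | 0.0 | 0.0 | 0.0 | 16.0 | 5.0 | 11.0 | ... | 0.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.9 | 0.0 | 98.0 | 7.0 | 199 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 3.0 | ... | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.10 | 2.0 | 97.0 | 14.0 | 292 | 0.0 | 0.0 | 0.0 | 7.0 | 2.0 | 2.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fish1.11 | 0.0 | 167.0 | 15.0 | 225 | 0.0 | 0.0 | 0.0 | 12.0 | 3.0 | 6.0 | ... | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |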
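
The FISH table contains missing values because not every gene was measured in every cell. Before using it as a reference, it can help to check how many cells were measured per gene and which FISH genes also appear in the RNA-seq matrix. A minimal sketch, assuming FISH genes are columns and RNA-seq genes form the row index, as in the previews above; `fish_overview` is a hypothetical helper:

```python
import pandas as pd


def fish_overview(fish: pd.DataFrame, rnaseq: pd.DataFrame):
    """Return per-gene FISH coverage and the genes shared with RNA-seq.

    fish:   cells x genes table of FISH counts (NaN = not measured).
    rnaseq: genes x cells table of RNA-seq counts.
    """
    # Number of cells with a non-missing FISH measurement, per gene (column).
    coverage = fish.notna().sum(axis=0)
    # FISH genes that also occur in the RNA-seq gene index, in FISH column order.
    shared = [gene for gene in fish.columns if gene in rnaseq.index]
    return coverage, shared


# Toy tables for illustration only (not the real data).
fish_toy = pd.DataFrame(
    {"SOX10": [191.0, 115.0, 86.0], "GAPDH": [270, 241, 192], "VCL": [None, None, 5.0]},
    index=["fish1.1", "fish1.2", "fish1.3"],
)
rnaseq_toy = pd.DataFrame(
    [[0, 1], [2, 0]], index=["SOX10", "GAPDH"], columns=["CELL_1", "CELL_2"]
)
coverage, shared = fish_overview(fish_toy, rnaseq_toy)
print(coverage)
print(shared)
```

With the tutorial data this would be called as `fish_overview(melanoma_fish_pd, melanoma_rnaseq_pd)`.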


`DISC` uses [loom](http://loompy.org/) as its I/O format, and we
provide a conversion script
[here](https://www.dropbox.com/s/ia9x0iom6dwueix/fishSubset.txt?dl=0).
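
To illustrate what a loom file holds, here is a minimal sketch that writes a genes × cells DataFrame using the basic loom HDF5 layout with `h5py` (already imported above): a main `matrix` dataset plus `row_attrs`, `col_attrs`, `layers`, `row_graphs` and `col_graphs` groups. It assumes the conventional `Gene` and `CellID` attribute names and is not the `DISC` conversion script; `write_loom_sketch` is hypothetical. For real use, `loompy.create(filename, layers, row_attrs, col_attrs)` takes care of the full specification.

```python
import h5py
import numpy as np
import pandas as pd


def write_loom_sketch(counts: pd.DataFrame, path: str) -> None:
    """Write a genes x cells table in a basic loom-style HDF5 layout.

    Hypothetical helper; assumes 'Gene' (row) and 'CellID' (column) attributes.
    """
    str_dtype = h5py.string_dtype(encoding="utf-8")
    with h5py.File(path, "w") as f:
        # Main matrix: rows are genes, columns are cells.
        f.create_dataset("matrix", data=counts.to_numpy(), compression="gzip")
        # Empty groups that the loom layout expects to exist.
        f.create_group("layers")
        f.create_group("row_graphs")
        f.create_group("col_graphs")
        # Intermediate groups row_attrs/ and col_attrs/ are created automatically.
        f.create_dataset("row_attrs/Gene", data=np.asarray(counts.index.astype(str), dtype=object), dtype=str_dtype)
        f.create_dataset("col_attrs/CellID", data=np.asarray(counts.columns.astype(str), dtype=object), dtype=str_dtype)
```

For example, `write_loom_sketch(melanoma_rnaseq_pd, "melanoma.loom")` would store the RNA-seq matrix with gene names and cell barcodes as attributes.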

References:

1. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nature Methods 15, 539–542 (2018).