This notebook runs through a basic tutorial for loading scRNA-seq data with Scanpy. I use a single-cell mouse tissue dataset available from GEO under accession [GSM4230077](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4230077).



# Set-Up and Installation

In [None]:
#Upload your data to Google Drive first
#Mount Google Drive so the notebook can access the data
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
#Install scanpy
!pip install --quiet scanpy

In [None]:
#Import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import numpy as np
import scanpy as sc

# Loading the Data

In [None]:
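#Load the 10x matrix files into an AnnData object
#var_names="gene_ids" indexes genes by Ensembl ID; gene symbols are kept in .var["gene_symbols"]
adata_gsm4230077 = sc.read_10x_mtx(
    "/content/drive/MyDrive/scrna_tutorials/gsm4230077",
    var_names="gene_ids",
    cache=True,
    gex_only=True,
    prefix="gsm4230077_",
)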
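
Before calling `sc.read_10x_mtx`, it can help to confirm that the folder holds the three files the reader looks for under the chosen prefix. The sketch below is not part of Scanpy: `find_10x_layout` is a hypothetical helper, and the two file-name sets it checks (gzipped `matrix.mtx.gz` / `features.tsv.gz` / `barcodes.tsv.gz`, and the older uncompressed `matrix.mtx` / `genes.tsv` / `barcodes.tsv`) are assumptions about a typical 10x output folder.

```python
from pathlib import Path

# Assumed file-name sets for two common 10x layouts (illustrative only).
LAYOUTS = {
    "v3 (gzipped)": ("matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz"),
    "legacy (uncompressed)": ("matrix.mtx", "genes.tsv", "barcodes.tsv"),
}

def find_10x_layout(directory, prefix=""):
    """Return (layout_name, missing_files).

    layout_name is None when neither layout is complete; missing_files then
    lists what is absent from the layout that is closest to complete.
    """
    directory = Path(directory)
    report = {}
    for name, files in LAYOUTS.items():
        missing = [prefix + f for f in files
                   if not (directory / (prefix + f)).is_file()]
        if not missing:
            return name, []
        report[name] = missing
    closest = min(report, key=lambda key: len(report[key]))
    return None, report[closest]

# Same folder and prefix as in the loading cell.
layout, missing = find_10x_layout(
    "/content/drive/MyDrive/scrna_tutorials/gsm4230077",
    prefix="gsm4230077_",
)
print(layout, missing)
```

If `layout` comes back as `None`, the `missing` list names the files to rename or upload before loading.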

In [None]:
#Let's check how many cells and genes the dataset contains
adata_gsm4230077

AnnData object with n_obs × n_vars = 5517 × 27998
    var: 'gene_symbols'

The AnnData object contains 5,517 cells (`n_obs`) and 27,998 genes (`n_vars`).
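
These dimensions can be cross-checked against the raw matrix file without loading it. This is a minimal sketch, assuming the matrix is stored in Matrix Market coordinate format with genes as rows and cell barcodes as columns (the usual 10x layout, which Scanpy transposes on load). `mtx_dimensions` is a hypothetical helper, and the file name in the usage comment is an assumption.

```python
import gzip
from pathlib import Path

def mtx_dimensions(path):
    """Read only the size line of a Matrix Market file.

    Returns (n_obs, n_vars, n_nonzero) in AnnData orientation (cells x genes).
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip the %%MatrixMarket banner, % comments, and blank lines.
            if line.startswith("%") or not line.strip():
                continue
            n_rows, n_cols, n_nonzero = (int(x) for x in line.split())
            # The file is genes x cells; swap to match AnnData's cells x genes.
            return n_cols, n_rows, n_nonzero
    raise ValueError(f"No size line found in {path}")

# Hypothetical usage (the file name is an assumption):
# mtx_dimensions("/content/drive/MyDrive/scrna_tutorials/gsm4230077/gsm4230077_matrix.mtx.gz")
```

If the file is the one loaded above, the first two values are expected to match the 5517 × 27998 reported by AnnData.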

In [None]:
#Per-cell annotations: at this stage .obs holds only the cell barcodes as its index
adata_gsm4230077.obs

AAACCTGAGTCGCCGT-1
AAACCTGAGTTAACGA-1
AAACCTGCAACGATGG-1
AAACCTGCAAGTCATC-1
AAACCTGCAATAGCGG-1
...
TTTGTCAGTTTGACTG-1
TTTGTCATCACCGTAA-1
TTTGTCATCCCAGGTG-1
TTTGTCATCCGTAGTA-1
TTTGTCATCCTTTCGG-1


In [None]:
#Per-gene annotations: Ensembl gene IDs as the index, with gene symbols as a column
adata_gsm4230077.var

,gene_symbols
ENSMUSG00000051951,Xkr4
ENSMUSG00000089699,Gm1992
ENSMUSG00000102343,Gm37381
ENSMUSG00000025900,Rp1
ENSMUSG00000109048,Rp1
...,...
ENSMUSG00000079808,AC168977.1
ENSMUSG00000095041,PISD
ENSMUSG00000063897,DHRSX
ENSMUSG00000096730,Vmn2r122
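
One practical consequence of indexing genes by Ensembl ID is visible in this table: the symbol `Rp1` appears under two different IDs, so symbols alone would not give unique labels. Below is a small sketch of how such repeats could be listed, using a hand-built pandas DataFrame that copies the first five rows shown above as a stand-in for `adata_gsm4230077.var`.

```python
import pandas as pd

# Stand-in for adata_gsm4230077.var, copied from the first rows of the output above.
var = pd.DataFrame(
    {"gene_symbols": ["Xkr4", "Gm1992", "Gm37381", "Rp1", "Rp1"]},
    index=[
        "ENSMUSG00000051951",
        "ENSMUSG00000089699",
        "ENSMUSG00000102343",
        "ENSMUSG00000025900",
        "ENSMUSG00000109048",
    ],
)

# keep=False marks every occurrence of a repeated symbol, not only the later ones.
repeated = var[var["gene_symbols"].duplicated(keep=False)]
print(repeated)
```

On the real object, the same filter would be applied to `adata_gsm4230077.var` directly.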
