# Baglole Lab Tutorial Setup

This file contains the instructions to set up our tutorial.

## Scanpy

To get ready for the [scanpy tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html), we need to import
some packages and download some data.

In [1]:
#
# first import the packages we need to use
#
import scanpy as sc
import pandas as pd
from pathlib import Path
import utils

In [2]:
#
# now set the parameters for the scanpy tutorial
#
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor="white")

scanpy==1.10.0rc2 anndata==0.10.6 umap==0.5.5 numpy==1.26.4 scipy==1.12.0 pandas==2.2.1 scikit-learn==1.4.1.post1 statsmodels==0.14.1 igraph==0.11.4 pynndescent==0.5.11


In [9]:
# set a results file as recommended in the scanpy tutorial
results_file = "write/pbmc3k.h5ad"

### The download problem

Although the tutorial asks us to use the `wget` program to download the data, `wget` is not available on all computers.
Also, the current version of scanpy requires the data files to be compressed with gzip after they are downloaded and unpacked.

To ensure that everyone can do the tutorial regardless of which computer system they're using, we will use the utils.py module
located in the root directory of this project.  It exports some convenience routines that emulate programs 
that are available on Linux machines but would need special installation procedures on Windows.
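
The contents of `utils.py` are not shown in this file, so the following is only a minimal sketch of how such helpers could be written with the Python standard library. The function names `fetch_url`, `extract_tar`, and `gzip_file` are hypothetical stand-ins, not the real `utils.bag_*` routines, and removing the original file after compression is an assumption that mimics the default behaviour of the `gzip` command.

```python
import gzip
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_url(url: str, dest: str) -> Path:
    """Download url to dest, creating parent directories (a stand-in for wget)."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def extract_tar(archive: str, target_dir: str) -> None:
    """Unpack a .tar or .tar.gz archive into target_dir (a stand-in for tar -x)."""
    # "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(target_dir)


def gzip_file(path: str) -> Path:
    """Compress path to path + '.gz' and delete the original (assumed behaviour)."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    src.unlink()
    return dst
```

Because these functions rely only on the standard library, they should behave the same way on Windows, macOS, and Linux without any extra installation.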

In [6]:
#
# download and unpack the data
#
utils.bag_wget(
    "http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz",
    "data/pbmc.tar.gz",
)
utils.bag_extract("data/pbmc.tar.gz", "data")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/barcodes.tsv")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/genes.tsv")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/matrix.mtx")

Now that the files have been downloaded into the appropriate directory and are in the right format, you should be able to follow
the [tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html).
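
Before reading the data, it can help to confirm that the compressed files are where scanpy will look for them. The sketch below is one possible check; the helper name `missing_10x_files` is hypothetical, and the expected file names assume that `utils.bag_gzip` appends a `.gz` suffix the way the `gzip` command does.

```python
from pathlib import Path

# File names from the download cell above, with an assumed ".gz" suffix.
EXPECTED_FILES = ("barcodes.tsv.gz", "genes.tsv.gz", "matrix.mtx.gz")


def missing_10x_files(directory: str) -> list[str]:
    """Return the expected file names that are absent from directory."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


missing = missing_10x_files("data/filtered_gene_bc_matrices/hg19")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```

If any file is reported missing, re-running the download cell is probably the quickest fix.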

In [8]:
adata = sc.read_10x_mtx("data/filtered_gene_bc_matrices/hg19",
                        var_names='gene_symbols',
                        cache=True)
# adata = sc.read_10x_mtx('data')
adata

... writing an h5ad cache file to speedup reading next time


AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

## What about Pandas?

First we need to produce files to work with during the tutorial.  We will do this with a series of Python commands.

In [5]:
# first read in the data in comma separated value format
folks = pd.read_csv("data/folks.csv")
folks.head()

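|   | Name | Occupation | Sex | Age | Weight | Income |
|---|------|------------|-----|-----|--------|--------|
| 0 | Ted | Tinker | Male | 33 | 70 | 50000 |
| 1 | Carol | Tailor | Female | 27 | 50 | 60000 |
| 2 | Alice | Soldier | Female | 48 | 60 | 70000 |
| 3 | Bob | Spy | Male | 61 | 80 | 55000 |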


In [6]:
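# now save the data in tab-separated value format (only if the file does not exist yet)
# note: by default, to_csv also writes the row index as an unnamed first column
folks_tsv_path = Path("data/folks.tsv")
if not folks_tsv_path.exists():
    folks.to_csv(folks_tsv_path, sep='\t')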

In [2]:
# read the data in Excel format
# make sure to install openpyxl first using `conda install -c conda-forge openpyxl -y`
# This may require you to restart the kernel.
#
folks_xl = pd.read_excel("data/folks.xlsx")
folks_xl.head()

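|   | Name | Occupation | Sex | Age | Weight | Income |
|---|------|------------|-----|-----|--------|--------|
| 0 | Ted | Tinker | Male | 33 | 70 | 50000 |
| 1 | Carol | Tailor | Female | 27 | 50 | 60000 |
| 2 | Alice | Soldier | Female | 48 | 60 | 70000 |
| 3 | Bob | Spy | Male | 61 | 80 | 55000 |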


In [8]:
#
# read in the tab-separated value file we created above.
#
folks_tsv = pd.read_csv("data/folks.tsv", sep='\t')
folks_tsv.head()

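|   | Unnamed: 0 | Name | Occupation | Sex | Age | Weight | Income |
|---|------------|------|------------|-----|-----|--------|--------|
| 0 | 0 | Ted | Tinker | Male | 33 | 70 | 50000 |
| 1 | 1 | Carol | Tailor | Female | 27 | 50 | 60000 |
| 2 | 2 | Alice | Soldier | Female | 48 | 60 | 70000 |
| 3 | 3 | Bob | Spy | Male | 61 | 80 | 55000 |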
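
The extra `Unnamed: 0` column in this output appears because `to_csv` writes the row index by default, and `read_csv` then treats it as ordinary data. The sketch below shows two common ways to avoid this. It uses a small stand-in frame called `folks_demo` (a hypothetical name) and in-memory text instead of the real `data/folks.tsv`, so it can run without any files.

```python
import io

import pandas as pd

# A small stand-in frame; the values mirror the first rows of the data above.
folks_demo = pd.DataFrame(
    {"Name": ["Ted", "Carol"], "Occupation": ["Tinker", "Tailor"], "Age": [33, 27]}
)

# Default behaviour: the index is written as an unnamed first column,
# which comes back as a regular column called "Unnamed: 0".
with_index = folks_demo.to_csv(sep="\t")
reread_default = pd.read_csv(io.StringIO(with_index), sep="\t")

# Option 1: do not write the index at all.
without_index = folks_demo.to_csv(sep="\t", index=False)
reread_clean = pd.read_csv(io.StringIO(without_index), sep="\t")

# Option 2: keep the file as it is and tell pandas the first column is the index.
reread_index_col = pd.read_csv(io.StringIO(with_index), sep="\t", index_col=0)

print(list(reread_default.columns))    # ['Unnamed: 0', 'Name', 'Occupation', 'Age']
print(list(reread_clean.columns))      # ['Name', 'Occupation', 'Age']
print(list(reread_index_col.columns))  # ['Name', 'Occupation', 'Age']
```

Either option works; `index=False` is usually simpler when the index carries no information, as is the case for a plain 0, 1, 2, ... row counter.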


In [9]:
#
# list the column names
#
folks.columns

Index(['Name', 'Occupation', 'Sex', 'Age', 'Weight', 'Income'], dtype='object')

In [11]:
#
# create a dataframe of occupations
#
df = pd.DataFrame(folks.iloc[:, 1])
df

|   | Occupation |
|---|------------|
| 0 | Tinker |
| 1 | Tailor |
| 2 | Soldier |
| 3 | Spy |
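
As a small illustration of the last two cells, the sketch below counts the columns of a frame and selects the occupation column both by position and by name. The frame `folks_demo` is a hypothetical stand-in with fewer columns than the real `folks` data, so the example runs on its own.

```python
import pandas as pd

folks_demo = pd.DataFrame(
    {
        "Name": ["Ted", "Carol", "Alice", "Bob"],
        "Occupation": ["Tinker", "Tailor", "Soldier", "Spy"],
        "Age": [33, 27, 48, 61],
    }
)

# Number of columns: the second entry of .shape (len(folks_demo.columns) also works).
n_columns = folks_demo.shape[1]

# Position-based selection, as in the cell above: column 1 is "Occupation".
by_position = pd.DataFrame(folks_demo.iloc[:, 1])

# Name-based selection: double brackets return a one-column DataFrame directly.
by_name = folks_demo[["Occupation"]]

print(n_columns)                    # 3
print(by_position.equals(by_name))  # True
```

Selecting by name tends to be more robust than selecting by position, because it keeps working if columns are later added or reordered.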
