# Legacy Scanpy Tutorial Setup

This file explains how to set up for the [2017 scanpy tutorial](https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering-2017.html). That tutorial uses the "legacy workflow" and is no longer recommended.

To ensure that all packages are compatible with the legacy tutorial, you need to create
a conda environment with specific versions of some packages. This requires using both conda and pip.

You can create a suitable environment using the `baglab.yml` file in the `baglabtut` directory:

```bash
conda env create -n somename -f baglab.yml
```
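
Once the environment is activated, it can help to confirm which package versions were actually installed. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` and the list of packages checked are illustrative assumptions, not something defined in `baglab.yml`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


# Illustrative selection; extend the list with any package you care about.
for name, ver in installed_versions(["scanpy", "anndata", "numpy", "pandas"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Compare the printed versions with the ones reported by `sc.logging.print_header()` later in this file.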

## Getting ready to scanpy

First, we need to import some packages and download some data.

```python
#
# first import the packages we need to use
#
import scanpy as sc
import pandas as pd

from pathlib import Path  # pathlib is part of the standard library

import utils  # custom module in this project that enables downloads
```

```python
#
# now set the parameters for the scanpy tutorial
#
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor="white")
```

```
scanpy==1.10.0rc2 anndata==0.10.6 umap==0.5.5 numpy==1.26.4 scipy==1.12.0 pandas==2.2.1 scikit-learn==1.4.1.post1 statsmodels==0.14.1 igraph==0.11.4 pynndescent==0.5.11
```


```python
# set a results file as recommended in the scanpy tutorial
results_file = "write/pbmc3k.h5ad"
```
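
The results path points into a `write` directory that may not exist yet. Whether missing directories are created automatically when the file is saved may depend on the installed anndata version, so the optional sketch below creates the directory up front with `pathlib`. It is a precaution under that assumption, not a step taken from the tutorial.

```python
from pathlib import Path

results_file = "write/pbmc3k.h5ad"  # same path as set above

# Create the parent directory ("write") if it does not exist yet.
# exist_ok=True makes this safe to run more than once.
Path(results_file).parent.mkdir(parents=True, exist_ok=True)
```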

### The download problem

The tutorial asks us to use the `wget` program to download the data, but `wget` is not available on all computers.
In addition, the current version of scanpy requires the data files to be compressed with gzip after they are downloaded and unpacked.

To ensure that everyone can do the tutorial regardless of which computer system they're using, we will use the utils.py module
located in the root directory of this project.  It exports some convenience routines that emulate programs 
that are available on Linux machines but would need special installation procedures on Windows.

```python
#
# download and unpack the data
#
utils.bag_wget(
    "http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz",
    "data/pbmc.tar.gz",
)
utils.bag_extract("data/pbmc.tar.gz", "data")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/barcodes.tsv")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/genes.tsv")
utils.bag_gzip("data/filtered_gene_bc_matrices/hg19/matrix.mtx")
```
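
The three kinds of helper calls above (download, extract, gzip) can be approximated with the Python standard library alone. The sketch below is not the actual content of `utils.py`, which is not shown here; the function names `fetch`, `extract`, and `gzip_file` are hypothetical, and the real helpers may behave differently (for example, in whether they keep the uncompressed file).

```python
import gzip
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Download url to dest, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract(archive, target_dir):
    """Unpack a .tar.gz archive into target_dir (only use with trusted archives)."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)


def gzip_file(path):
    """Write a gzip-compressed copy of path next to it, with a '.gz' suffix added.

    Unlike the command-line gzip default, the original file is left in place.
    """
    path = Path(path)
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Under these assumptions, `fetch(url, "data/pbmc.tar.gz")` followed by `extract("data/pbmc.tar.gz", "data")` and one `gzip_file(...)` call per file would mirror the sequence of `utils` calls above.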

Now that the files have been downloaded into the appropriate directory and are in the right format, you can read them
as in the [tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html):

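```python
adata = sc.read_10x_mtx(
    "data/filtered_gene_bc_matrices/hg19",  # directory containing the gzipped files
    var_names="gene_symbols",  # use gene symbols for the variable names
    cache=True,  # write a cache file for faster subsequent reading
)
# adata = sc.read_10x_mtx('data')
adata
```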

You should be able to follow the legacy workflow tutorial from this point.