# Tutorial Setup

This notebook should be run once, as early as possible. It installs all the Python packages attendees need and downloads the data. The first download takes a couple of minutes. The data then stays stored in the Codespace even after attendees leave Codespaces, so this notebook only needs to be run once.

After this notebook runs successfully, move on to `experiment.ipynb`, where attendees will analyze the dataset with the algorithms discussed in the meeting.

In [1]:
# Install all required packages
%pip install -r ../requirements.txt

Collecting ipywidgets==8.1.2 (from -r ../requirements.txt (line 1))
  Using cached ipywidgets-8.1.2-py3-none-any.whl.metadata (2.4 kB)
Collecting matplotlib==3.8.4 (from -r ../requirements.txt (line 2))
  Using cached matplotlib-3.8.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.8 kB)
Collecting pandas==2.2.2 (from -r ../requirements.txt (line 3))
  Using cached pandas-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting tqdm==4.66.4 (from -r ../requirements.txt (line 4))
  Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
Collecting scanpy==1.10 (from -r ../requirements.txt (line 5))
  Using cached scanpy-1.10.0-py3-none-any.whl.metadata (8.6 kB)
Collecting GmGM==0.5.6 (from -r ../requirements.txt (line 6))
  Downloading gmgm-0.5.6-py3-none-any.whl.metadata (6.1 kB)
Collecting python-igraph (from -r ../requirements.txt (line 7))
  Downloading python_igraph-0.11.8-py3-none-any.whl.metadata (2.8 kB)
Collecting le
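The pip output is long (and cut off here), so it can help to confirm that the pinned versions actually ended up installed. This is a minimal sketch using the standard library's `importlib.metadata`; the `EXPECTED` dict is copied from the pins visible in the output above (`scanpy==1.10` resolved to `1.10.0`), `requirements.txt` may list more packages than shown, and `check_versions` is a hypothetical helper name used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the visible pip output; requirements.txt may contain more.
EXPECTED = {
    "ipywidgets": "8.1.2",
    "matplotlib": "3.8.4",
    "pandas": "2.2.2",
    "tqdm": "4.66.4",
    "scanpy": "1.10.0",
    "GmGM": "0.5.6",
}


def check_versions(expected):
    """Return {package: (expected_version, installed_version_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)  # reads installed package metadata from disk
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


problems = check_versions(EXPECTED)
if problems:
    for name, (want, have) in problems.items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
else:
    print("All pinned packages are installed at the expected versions.")
```

Because `importlib.metadata` reads metadata from disk, this check reflects what `%pip install` just wrote, even without restarting the kernel.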

In [2]:
import scanpy as sc
from pathlib import Path



We'll use scanpy's "built-in" datasets for this workshop. The next code cell is a quick check that `sc.datasets` works: it loads `blobs`, a small synthetic dataset that scanpy generates locally rather than downloads.

In [None]:
sc.datasets.blobs()

AnnData object with n_obs × n_vars = 640 × 11
    obs: 'blobs'

We'll use the [Spatio-temporal immune zonation of the human kidney](https://www.ebi.ac.uk/gxa/sc/experiments/E-HCAD-10/results/tsne) dataset, which has [an associated paper](https://pubmed.ncbi.nlm.nih.gov/31604275/).

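It has ~30,000 cells (31,711 in the output below), which is about as large a dataset as we can reasonably expect people to download during a workshop. Downloading it will take a few minutes, but once it's downloaded it is saved to `../data/`, so later runs load it from disk instead of downloading it again.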
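The "download once, reuse afterwards" behavior can be illustrated with a generic pattern: only fetch a file if it is not already on disk. This is a minimal sketch, not scanpy's actual implementation; `fetch_once`, `fake_download`, and the `data/example.h5ad` path are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_once(target: Path, download: Callable[[Path], None]) -> Path:
    """Call download(target) only if target does not exist yet; return the path."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)  # create the data directory if needed
        download(target)
    return target


# Stand-in "download" (hypothetical) that writes a tiny file and records each call.
calls = []


def fake_download(path: Path) -> None:
    calls.append(path)
    path.write_text("pretend this is a large .h5ad file")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "example.h5ad"
    fetch_once(target, fake_download)  # first run: file missing, so it is "downloaded"
    fetch_once(target, fake_download)  # second run: file exists, download skipped
    print(len(calls))  # 1
```

The second call returns immediately because the file already exists, which is why re-running a cached download cell is fast.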

In [3]:
# Set the directory scanpy saves downloaded datasets to, via the public settings object
sc.settings.datasetdir = Path('../data/')
adata = sc.datasets.ebi_expression_atlas(
    accession='E-HCAD-10',
)
adata

AnnData object with n_obs × n_vars = 31711 × 26385
    obs: 'Sample Characteristic[organism]', 'Sample Characteristic Ontology Term[organism]', 'Sample Characteristic[individual]', 'Sample Characteristic Ontology Term[individual]', 'Sample Characteristic[sex]', 'Sample Characteristic Ontology Term[sex]', 'Sample Characteristic[gestational age]', 'Sample Characteristic Ontology Term[gestational age]', 'Sample Characteristic[developmental stage]', 'Sample Characteristic Ontology Term[developmental stage]', 'Sample Characteristic[organism part]', 'Sample Characteristic Ontology Term[organism part]', 'Sample Characteristic[immunophenotype]', 'Sample Characteristic Ontology Term[immunophenotype]', 'Sample Characteristic[disease]', 'Sample Characteristic Ontology Term[disease]', 'Sample Characteristic[organism status]', 'Sample Characteristic Ontology Term[organism status]', 'Sample Characteristic[cause of death]', 'Sample Characteristic Ontology Term[cause of death]', 'Factor Value[individu
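The `obs` table carries many metadata columns from the Expression Atlas, and each `Sample Characteristic[...]` column has a parallel `Sample Characteristic Ontology Term[...]` column. A small helper can pull out just the characteristic names. This is a minimal sketch in plain Python; `sample_characteristics` is a hypothetical helper, not part of scanpy, and in the notebook you would pass it `adata.obs.columns`.

```python
import re

# Matches "Sample Characteristic[name]" but not "Sample Characteristic Ontology Term[name]",
# because the "[" must come directly after "Characteristic".
PATTERN = re.compile(r"^Sample Characteristic\[(.+)\]$")


def sample_characteristics(columns):
    """Return the bracketed names of 'Sample Characteristic[...]' columns, in order."""
    names = []
    for col in columns:
        match = PATTERN.match(col)
        if match:
            names.append(match.group(1))
    return names


# Example using a few column names from the output above.
cols = [
    "Sample Characteristic[organism]",
    "Sample Characteristic Ontology Term[organism]",
    "Sample Characteristic[sex]",
    "Sample Characteristic Ontology Term[sex]",
]
print(sample_characteristics(cols))  # ['organism', 'sex']
```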