![igvflogo](images/igvf-winter-logo.png)
# How to access single-cell AnnData (h5ad) files, interrogate them, and display the calculated UMAPs


In [None]:
!pip install -r requirements.txt

In [None]:

import io
import json
from urllib.parse import quote, urlparse

import anndata as ad
import boto3
import pandas as pd
import requests
import scanpy as sc
from IPython.display import display
from scipy import sparse

# Loading the AnnData object
### *The matrix `IGVFFI5345SNRS.h5ad` is from a mouse cerebral cortex PARSE-split-seq scRNA-seq dataset: [IGVFDS4883TFKC](https://data.igvf.org/analysis-sets/IGVFDS4883TFKC/)*
We will start by fetching the file metadata and reading its `s3_uri`.

In [None]:
file_metadata = requests.get("https://api.data.igvf.org/matrix-files/IGVFFI5345SNRS").json()
uri = file_metadata['s3_uri']
uri

In [None]:
parsed = urlparse(uri)

bucket_name = parsed.netloc
object_key = parsed.path.lstrip("/")

print("Bucket:", bucket_name)
print("Key:", object_key)
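The bucket/key split above can be wrapped in a small helper that also rejects anything that is not an `s3://` URI. This is a minimal sketch: the name `split_s3_uri` and the example URI are hypothetical, not part of the IGVF API.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key), rejecting other schemes."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # netloc is the bucket; the path carries a leading slash that S3 keys lack
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, for illustration only
bucket, key = split_s3_uri("s3://example-bucket/some/prefix/matrix.h5ad")
print("Bucket:", bucket)
print("Key:", key)
```

With the metadata fetched earlier, the call would be `bucket_name, object_key = split_s3_uri(file_metadata['s3_uri'])`.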

### From this point on you will need to be authenticated in AWS 

In [None]:

# Initialize S3 client
s3_client = boto3.client('s3')

# Get the object from S3 - requires creds
response = s3_client.get_object(Bucket=bucket_name, Key=object_key)

# Read the content of the object into a BytesIO stream
data_stream = io.BytesIO(response['Body'].read())
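As an alternative to `get_object` followed by `read()`, the boto3 S3 client also offers `download_fileobj`, which writes the object into any binary file-like object. The sketch below assumes an already authenticated client is passed in; the helper name `fetch_to_buffer` is hypothetical.

```python
import io


def fetch_to_buffer(s3_client, bucket, key):
    """Download an S3 object into an in-memory, seekable buffer."""
    buffer = io.BytesIO()
    # download_fileobj(Bucket, Key, Fileobj) writes the object into the buffer
    s3_client.download_fileobj(bucket, key, buffer)
    buffer.seek(0)  # rewind so readers start at the first byte
    return buffer
```

Under those assumptions, `data_stream = fetch_to_buffer(s3_client, bucket_name, object_key)` should yield a stream that can be passed to `sc.read_h5ad` in the same way as the one built above.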


**Load the AnnData object**  (this might take a minute)

In [None]:
adata = sc.read_h5ad(data_stream)
adata

# The "obsm" object holds the multi-dimensional annotations of observations (usually embeddings such as UMAP or PCA coordinates)
**Confirm at least one set of embeddings is present**

In [None]:
adata.obsm
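To make the check explicit rather than visual, the keys and array shapes in `obsm` can be listed, and a missing embedding can be reported before plotting. This is a sketch that assumes `obsm` behaves like a mapping whose values expose `.shape`; the helper names are hypothetical.

```python
def embedding_shapes(obsm):
    """Map each key in an obsm-like mapping to the shape of its array."""
    return {name: tuple(obsm[name].shape) for name in obsm.keys()}


def require_embedding(obsm, basis="X_umap"):
    """Return the requested embedding, or raise a readable error if absent."""
    if basis not in obsm.keys():
        raise KeyError(f"{basis!r} not found; available: {sorted(obsm.keys())}")
    return obsm[basis]


# Intended usage in this notebook (not executed here):
# embedding_shapes(adata.obsm)
# umap_coords = require_embedding(adata.obsm, "X_umap")
```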

**Let's view the UMAP**

In [None]:

e = 'X_umap'
cellpop_field = 'celltype' # this can vary from example to example
sc.set_figure_params(dpi=100, fontsize=8, figsize=(12.0,8.0))
sc.pl.embedding(adata, basis=e, color=cellpop_field, legend_loc='on data')
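Because the name of the cell-population column varies between files, one option is to look it up rather than hard-code it. The sketch below tries a list of candidate names in order; the helper `pick_label_column` is hypothetical, and the default candidates are simply the two field names used in this notebook.

```python
def pick_label_column(columns, candidates=("celltype", "cell_type_annotation")):
    """Return the first candidate name present in columns, else None."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    return None


# Intended usage in this notebook (not executed here):
# cellpop_field = pick_label_column(adata.obs.columns)
```

If the result is `None`, inspecting `adata.obs.columns` by hand is probably the quickest way to find the right field.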


**`uns` holds the unstructured annotations; let's take a look**

In [None]:
adata.uns

# Let's look at 'obs', the one-dimensional annotations of observations

In [None]:
adata.obs.info()

In [None]:
adata.obs

# Finally, we'll take a look at 'var', the one-dimensional data frame of features (typically genes or accessible peaks)


In [None]:
adata.var

### *[IGVFDS4883TFKC](https://data.igvf.org/analysis-sets/IGVFDS4883TFKC/) is a human B-cell 10x Multiomics dataset*
`IGVFFI9438KOCK.h5ad` is the snRNA-seq matrix and
`IGVFFI3541FUQE.h5ad` is the snATAC-seq matrix.

We will condense some of the steps this time.

In [None]:
file_metadata = requests.get("https://api.data.igvf.org/matrix-files/IGVFFI9438KOCK/").json()
uri = file_metadata['s3_uri']
parsed = urlparse(uri)

bucket_name = parsed.netloc
object_key = parsed.path.lstrip("/")

# Get the object from S3 - requires creds
response = s3_client.get_object(Bucket=bucket_name, Key=object_key)

# Read the content of the object into a BytesIO stream
data_stream = io.BytesIO(response['Body'].read())

adata = sc.read_h5ad(data_stream)
adata

# The "obsm" object holds the multi-dimensional annotations of observations (usually embeddings such as UMAP or PCA coordinates)
**Confirm at least one set of embeddings is present**

In [None]:
adata.obsm

**Let's view the UMAP**

In [None]:

e = 'X_umap'
cellpop_field = 'cell_type_annotation' # this can vary from example to example
sc.set_figure_params(dpi=100, fontsize=8, figsize=(12.0,8.0))
sc.pl.embedding(adata, basis=e, color=cellpop_field, legend_loc='on data')


In [None]:
file_metadata = requests.get("https://api.data.igvf.org/matrix-files/IGVFFI3541FUQE/").json()
uri = file_metadata['s3_uri']
parsed = urlparse(uri)

bucket_name = parsed.netloc
object_key = parsed.path.lstrip("/")

# Get the object from S3 - requires creds
response = s3_client.get_object(Bucket=bucket_name, Key=object_key)

# Read the content of the object into a BytesIO stream
data_stream = io.BytesIO(response['Body'].read())

adata = sc.read_h5ad(data_stream)
adata

Here we see the `varm` object (multi-dimensional annotations of variables).

In [None]:
adata.varm