# 1b_load_thalamus_data_custom

This notebook demonstrates how to use the `thalamus_merfish_analysis` module to 
load a customized thalamus subset of the Allen Brain Cell (ABC) Atlas whole 
mouse brain MERFISH dataset (https://portal.brain-map.org/atlases-and-data/bkp/abc-atlas).

For instructions on loading the standard thalamus dataset, as well as 
detailed descriptions of the contents of each data structure, please see `1a_load_thalamus_data_standard.ipynb`.

Additional information on the full ABC Atlas dataset can be found at: https://alleninstitute.github.io/abc_atlas_access/intro.html

In [1]:
from thalamus_merfish_analysis.abc_load_thalamus import ThalamusWrapper
get_ipython().run_line_magic('matplotlib', 'inline') 

## Steps to load a custom thalamus AnnData object

`abc.load_standard_thalamus(data_structure='adata')` includes the following loading & preprocessing steps:
1. `abc.load_adata_thalamus()` loads a TH+ZI spatial subset of the ABC Atlas
2. `abc.filter_by_class_thalamus()` filters the cells by their mapped class
3. `abc.filter_by_thalamus_coords()` further filters the cells based on the CCF thalamus spatial boundaries

Users can generate a more customized version of the thalamus dataset by calling
these 3 functions individually and adjusting their input parameters as desired.
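
As a rough sketch, the three steps above can be chained in a small helper. The helper name `load_custom_thalamus` and its defaults are hypothetical (they are not part of `thalamus_merfish_analysis`); only the method names, keyword arguments, and class-list attributes that appear in this notebook are used.

```python
def load_custom_thalamus(abc, transform="log2cpt", include=None, buffer=0):
    """Chain the three loading steps on a ThalamusWrapper-like object.

    Hypothetical convenience helper; `abc` is expected to expose the three
    methods and two class-list attributes used elsewhere in this notebook.
    """
    # Step 1: load the TH+ZI spatial subset with the chosen counts transform
    adata = abc.load_adata_thalamus(transform=transform)

    # Step 2: keep only the requested classes
    # (the default mirrors the include list used in this notebook)
    if include is None:
        include = abc.TH_ZI_CLASSES + abc.MB_CLASSES
    adata = abc.filter_by_class_thalamus(adata, include=include)

    # Step 3: restrict to the CCF thalamus boundary, optionally dilated by `buffer` px
    adata = abc.filter_by_thalamus_coords(adata, buffer=buffer)
    return adata
```

With a wrapper instantiated as in the cells below, `load_custom_thalamus(abc, transform='raw', buffer=5)` would be one way to request untransformed counts with a slightly enlarged spatial mask.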

### 1. `abc.load_adata_thalamus()`

In [2]:
CURRENT_VERSION = '20230830'  # default version set in abc_load.py
abc = ThalamusWrapper(version=CURRENT_VERSION)  # instantiate the ThalamusWrapper; pass another version string to load an older or newer ABC Atlas release

adata_th_zi = abc.load_adata_thalamus(transform='log2cpt', # {'log2cpt', 'log2cpm', 'log2cpv', 'raw'}, default='log2cpt'
                                                           # (cpt: counts per thousand, cpm: per million, cpv: per cell volume)
                                                           # select 'raw' if implementing custom transformations/analyses
                                      
                                      subset_to_TH_ZI=True, # if False, loads full coronal ABC Atlas dataset
                                      
                                      with_metadata=True, # if False, loads just gene expression counts array 
                                      
                                      flip_y=False, # if writing custom plotting code, you may wish to invert y coords
                                                    # ccf_plots.py assumes y coords are not inverted
                                      
                                      drop_unused=True, # set to False to keep some lesser-used metadata columns
                                      
                                      drop_blanks=True  # if False, keeps 'blank' barcodes, which
                                                        # can be used for QC purposes
                                      ) 




In [3]:
display(adata_th_zi)

AnnData object with n_obs × n_vars = 152248 × 500
    obs: 'brain_section_label', 'average_correlation_score', 'class', 'cluster', 'cluster_alias', 'left_hemisphere', 'neurotransmitter', 'parcellation_division', 'parcellation_index', 'parcellation_structure', 'parcellation_substructure', 'subclass', 'supertype', 'x_ccf', 'x_reconstructed', 'x_section', 'y_ccf', 'y_reconstructed', 'y_section', 'z_ccf', 'z_reconstructed', 'z_section'
    var: 'gene_symbol', 'transcript_identifier'
    uns: 'accessed_on', 'src', 'counts_transform'
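
To make the `transform` options of `load_adata_thalamus()` more concrete, here is a minimal pure-Python sketch of a "log2 counts per N" normalization for a single cell. The exact formula is an assumption (scale each cell to a fixed total, add a pseudocount of 1, take log2); the library's own implementation may differ, for example in its pseudocount or in how `log2cpv` uses cell volume. The function name and the counts are made up for illustration.

```python
import math

def log2_counts_per(counts, scale):
    """Scale one cell's counts to `scale` total counts, then take log2(x + 1).

    ASSUMPTION: this formula is illustrative and not taken from the library.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no detected transcripts stays at zero
        return [0.0 for _ in counts]
    return [math.log2(1 + c * scale / total) for c in counts]

cell = [0, 5, 15, 30]                   # made-up raw counts for one cell (total = 50)
log2cpt = log2_counts_per(cell, 1e3)    # counts per thousand
log2cpm = log2_counts_per(cell, 1e6)    # counts per million
```

Selecting `transform='raw'` skips this kind of normalization, which is the appropriate starting point for applying a custom scheme.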

### 2. `abc.filter_by_class_thalamus()`

In [4]:
# filter_by_class_thalamus() parameters found in load_standard_thalamus():
adata_th_zi_neurons = abc.filter_by_class_thalamus(adata_th_zi,
                                                   include=abc.TH_ZI_CLASSES + abc.MB_CLASSES
                                                   # exclude=abc.NN_CLASSES
                                                   # (replacing the include filter with an explicit
                                                   # exclude will filter out ONLY non-neuronal cells)
                                                   )

Classes present in input data: ['01 IT-ET Glut', '03 OB-CR Glut', '05 OB-IMN GABA', '06 CTX-CGE GABA', '07 CTX-MGE GABA', '08 CNU-MGE GABA', '09 CNU-LGE GABA', '10 LSX GABA', '11 CNU-HYa GABA', '12 HY GABA', '13 CNU-HYa Glut', '14 HY Glut', '17 MH-LH Glut', '18 TH Glut', '19 MB Glut', '20 MB GABA', '21 MB Dopa', '23 P Glut', '24 MY Glut', '26 P GABA', '27 MY GABA', '28 CB GABA', '30 Astro-Epen', '31 OPC-Oligo', '33 Vascular', '34 Immune']
Classes present in output data: ['12 HY GABA', '17 MH-LH Glut', '18 TH Glut', '19 MB Glut', '20 MB GABA']
Classes filtered out of input data: ['01 IT-ET Glut', '03 OB-CR Glut', '05 OB-IMN GABA', '06 CTX-CGE GABA', '07 CTX-MGE GABA', '08 CNU-MGE GABA', '09 CNU-LGE GABA', '10 LSX GABA', '11 CNU-HYa GABA', '13 CNU-HYa Glut', '14 HY Glut', '21 MB Dopa', '23 P Glut', '24 MY Glut', '26 P GABA', '27 MY GABA', '28 CB GABA', '30 Astro-Epen', '31 OPC-Oligo', '33 Vascular', '34 Immune']


In [5]:
display(adata_th_zi_neurons)

View of AnnData object with n_obs × n_vars = 80170 × 500
    obs: 'brain_section_label', 'average_correlation_score', 'class', 'cluster', 'cluster_alias', 'left_hemisphere', 'neurotransmitter', 'parcellation_division', 'parcellation_index', 'parcellation_structure', 'parcellation_substructure', 'subclass', 'supertype', 'x_ccf', 'x_reconstructed', 'x_section', 'y_ccf', 'y_reconstructed', 'y_section', 'z_ccf', 'z_reconstructed', 'z_section'
    var: 'gene_symbol', 'transcript_identifier'
    uns: 'accessed_on', 'src', 'counts_transform'
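
The difference between the `include` and `exclude` filters can be illustrated with a small stand-alone sketch. The function `filter_classes` is hypothetical and only mimics the include/exclude semantics described above. The `include` list is copied from the "Classes present in output data" line, while the list standing in for `abc.NN_CLASSES` is an assumption based on the non-neuronal classes printed above.

```python
def filter_classes(labels, include=None, exclude=None):
    """Return indices of cells whose class passes the include/exclude lists."""
    keep = []
    for i, label in enumerate(labels):
        if include is not None and label not in include:
            continue  # not in the allow-list
        if exclude is not None and label in exclude:
            continue  # explicitly removed
        keep.append(i)
    return keep

# Made-up per-cell class labels
cell_classes = ["18 TH Glut", "30 Astro-Epen", "14 HY Glut", "20 MB GABA", "34 Immune"]

include = ["12 HY GABA", "17 MH-LH Glut", "18 TH Glut", "19 MB Glut", "20 MB GABA"]
nn_classes = ["30 Astro-Epen", "31 OPC-Oligo", "33 Vascular", "34 Immune"]  # ASSUMED stand-in for abc.NN_CLASSES

kept_by_include = filter_classes(cell_classes, include=include)     # indices [0, 3]
kept_by_exclude = filter_classes(cell_classes, exclude=nn_classes)  # indices [0, 2, 3]
```

Note that the `exclude` variant keeps the '14 HY Glut' cell, which the `include` variant drops: excluding only non-neuronal classes retains neurons mapped to classes outside the thalamus and midbrain lists.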

### 3. `abc.filter_by_thalamus_coords()`

In [6]:
adata_th_zi_neurons.obs.columns

Index(['brain_section_label', 'average_correlation_score', 'class', 'cluster',
       'cluster_alias', 'left_hemisphere', 'neurotransmitter',
       'parcellation_division', 'parcellation_index', 'parcellation_structure',
       'parcellation_substructure', 'subclass', 'supertype', 'x_ccf',
       'x_reconstructed', 'x_section', 'y_ccf', 'y_reconstructed', 'y_section',
       'z_ccf', 'z_reconstructed', 'z_section'],
      dtype='object')

In [7]:
# filter_by_thalamus_coords() parameters found in load_standard_thalamus():
adata_th_zi_neurons = abc.filter_by_thalamus_coords(adata_th_zi_neurons, 
                                                    buffer=0,  # if >0px, sets dilation radius of thalamus mask
                                                               # in pixels (1px = 10um)
                                                    # realigned=False
                                                    )

In [8]:
display(adata_th_zi_neurons)

View of AnnData object with n_obs × n_vars = 80170 × 500
    obs: 'brain_section_label', 'average_correlation_score', 'class', 'cluster', 'cluster_alias', 'left_hemisphere', 'neurotransmitter', 'parcellation_division', 'parcellation_index', 'parcellation_structure', 'parcellation_substructure', 'subclass', 'supertype', 'x_ccf', 'x_reconstructed', 'x_section', 'y_ccf', 'y_reconstructed', 'y_section', 'z_ccf', 'z_reconstructed', 'z_section', 'region_mask'
    var: 'gene_symbol', 'transcript_identifier'
    uns: 'accessed_on', 'src', 'counts_transform'
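
To give a feel for the `buffer` parameter, the sketch below converts a distance in microns to pixels (using the 1 px = 10 um scale noted above) and dilates a toy mask by that radius. This is a pure-Python illustration only: the disk-shaped neighborhood is an assumption, and the function names `um_to_px` and `dilate` are hypothetical, not part of `thalamus_merfish_analysis`.

```python
def um_to_px(distance_um, um_per_px=10):
    """Convert a distance in microns to a whole number of pixels."""
    return round(distance_um / um_per_px)

def dilate(mask_pixels, radius_px):
    """Grow a set of (row, col) mask pixels by a Euclidean radius in pixels.

    ASSUMPTION: a disk-shaped neighborhood; the library may use another shape.
    """
    offsets = [(dr, dc)
               for dr in range(-radius_px, radius_px + 1)
               for dc in range(-radius_px, radius_px + 1)
               if dr * dr + dc * dc <= radius_px * radius_px]
    return {(r + dr, c + dc) for (r, c) in mask_pixels for (dr, dc) in offsets}

mask = {(5, 5)}                     # toy one-pixel "thalamus" mask
buffer_px = um_to_px(20)            # a 20 um margin corresponds to 2 px
dilated = dilate(mask, buffer_px)   # cells within 2 px of the mask now fall inside it
```

With `buffer=0`, as in the call above, the mask is left unchanged, so only cells inside the original CCF thalamus boundary are retained.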