# Data Preprocessing

This notebook demonstrates how to generate AnnData `.h5ad` files from the original dataset.

*Dataset Reference*: Jeffrey R. Moffitt et al., *Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region*, *Science*, 362, eaau5324 (2018). DOI: 10.1126/science.aau5324

The original data can be downloaded from [Dryad](https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248). We used this annotated section as a reference and aligned cells from sections of a naïve mouse (ID 5) and a pup-exposed mouse (ID 36) at bregma 0.16 mm for the analysis. The two samples contain 5,839 and 5,835 cells, respectively.

If you have any questions, please contact Yeojin Kim at ykim3030@gatech.edu.

In [30]:
# Import Required Libraries
import warnings
warnings.filterwarnings("ignore")

import scanpy as sc
import pandas as pd
import anndata as ad
import numpy as np

In [31]:
# Load and Filter the Dataset
data = pd.read_csv("./Tutorial_data/Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv")
selected_data = data[
    data['Behavior'].isin(["Naive", "Aggression to pup"])
    & (data['Animal_sex'] == "Male")
    & (data['Bregma'] == 0.16)
]
selected_data.head()

| | Cell_ID | Animal_ID | Animal_sex | Behavior | Bregma | Centroid_X | Centroid_Y | Cell_class | Neuron_cluster_ID | Ace2 | ... | Penk | Scg2 | Sln | Sst | Tac1 | Tac2 | Th | Trh | Ucn3 | Vgf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 216033 | 9ecbfb96-7809-467c-830e-a82e45c1551f | 5 | Male | Naive | 0.16 | -2865.420125 | -878.936333 | Inhibitory | I-17 | 0.0 | ... | 0.000348 | 0.0 | 1.138553 | 0.0 | 0.5152 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01492 |
| 216034 | eb48ded9-fa25-4753-b529-56c4b0148186 | 5 | Male | Naive | 0.16 | -2860.087775 | -807.19821 | Inhibitory | I-5 | 0.0 | ... | 0.067781 | 0.0 | 0.0 | 0.013191 | 0.002197 | 0.0 | 0.091973 | 0.014562 | 0.002329 | 0.0 |
| 216035 | da0668bb-cb1b-4062-99c3-aa845501c320 | 5 | Male | Naive | 0.16 | -2853.80021 | -827.837887 | Inhibitory | I-9 | 0.0 | ... | 0.618006 | 0.035836 | 0.255472 | 0.0 | 0.0 | 0.0 | 0.011101 | 0.0 | 0.0 | 0.144316 |
| 216036 | 824bc22d-6a0c-4310-95e2-ea8eca9a8be8 | 5 | Male | Naive | 0.16 | -2852.524429 | -845.173251 | Inhibitory | I-9 | 0.0 | ... | 0.033687 | 0.0 | 0.910689 | 0.0 | 0.082764 | 0.008978 | 0.0 | 0.0 | 0.020921 | 0.03638 |
| 216037 | 2e06f938-a9bd-4837-a92a-ed9f7b4d2d0d | 5 | Male | Naive | 0.16 | -2848.546867 | -863.592643 | Ambiguous | | 0.0 | ... | 0.012057 | 0.0 | 1.232852 | 0.128085 | 0.381275 | 0.0 | 0.035516 | 0.0 | 0.0 | 0.043004 |
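
As a quick sanity check on the selection, the number of cells per animal can be tallied with `groupby`. The sketch below runs on a small made-up table that reuses the column names of the MERFISH CSV; the rows and values are illustrative and are not taken from the dataset. On the real data, the same `groupby` call can be applied to `selected_data` directly.

```python
import pandas as pd

# Toy stand-in for the MERFISH table; values are illustrative only.
toy = pd.DataFrame({
    "Animal_ID":  [5, 5, 5, 36, 36, 1],
    "Animal_sex": ["Male", "Male", "Male", "Male", "Male", "Female"],
    "Behavior":   ["Naive", "Naive", "Naive",
                   "Aggression to pup", "Aggression to pup", "Naive"],
    "Bregma":     [0.16, 0.16, 0.21, 0.16, 0.16, 0.16],
})

# Same selection logic as the notebook filter: behavior, sex, and bregma.
mask = (
    toy["Behavior"].isin(["Naive", "Aggression to pup"])
    & (toy["Animal_sex"] == "Male")
    & (toy["Bregma"] == 0.16)
)
toy_selected = toy[mask]

# Count the remaining cells per animal and behavior.
cell_counts = (
    toy_selected.groupby(["Animal_ID", "Behavior"])
    .size()
    .rename("n_cells")
    .reset_index()
)
print(cell_counts)
```

A table like this makes it easy to confirm that each animal contributes the expected number of cells before any files are written.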


In [34]:
# Create AnnData Objects for Each Animal
id_list = list(selected_data['Animal_ID'].unique())
print('IDs: ', id_list)

# Metadata columns copied into adata.obs
obs_columns = ['Animal_sex', 'Behavior', 'Bregma', 'Animal_ID', 'Cell_class', 'Neuron_cluster_ID']

for animal_id in id_list:
    temp = selected_data[selected_data['Animal_ID'] == animal_id].copy()
    # Columns from position 9 onward hold gene expression (starting at Ace2)
    adata = ad.AnnData(temp.iloc[:, 9:])
    # Columns 5 and 6 are the cell centroids (Centroid_X, Centroid_Y)
    adata.obsm['spatial'] = temp.iloc[:, 5:7].values
    # Column 0 is Cell_ID, used as the observation name
    adata.obs_names = temp.iloc[:, 0].values
    adata.var_names = temp.columns[9:].values
    for col in obs_columns:
        adata.obs[col] = temp[col].values
    out_path = f'Tutorial_data/animal_id_{animal_id}.h5ad'
    adata.write(out_path)
    print(f'Data for animal ID {animal_id} has been successfully saved to: {out_path}')


IDs:  [5, 6, 7, 34, 35, 36]
Data for animal ID 5 has been successfully saved to: Tutorial_data/animal_id_5.h5ad
Data for animal ID 6 has been successfully saved to: Tutorial_data/animal_id_6.h5ad
Data for animal ID 7 has been successfully saved to: Tutorial_data/animal_id_7.h5ad
Data for animal ID 34 has been successfully saved to: Tutorial_data/animal_id_34.h5ad
Data for animal ID 35 has been successfully saved to: Tutorial_data/animal_id_35.h5ad
Data for animal ID 36 has been successfully saved to: Tutorial_data/animal_id_36.h5ad
