# Make celltype metadata

This notebook maps the id (an integer) of each cluster to the biological name of its cell type.

The names were assigned by reading them off [Figure 5D](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481139/figure/F5/) in the paper.


http://scb.sanger.ac.uk/#/base/experiment/1/sample_metadata

[![sample_metadata_o](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481139/bin/nihms687993f5.jpg)](https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=4481139_nihms687993f5.jpg)

In [3]:
import os
import itertools

import pandas as pd

import common

In [4]:
# Assign notebook and folder names
notebook_name = '02_make_celltype_metadata'
figure_folder = os.path.join(common.FIGURE_FOLDER, notebook_name)
data_folder = os.path.join(common.DATA_FOLDER, notebook_name)
print('Figure folder:', figure_folder)
print('Data folder:', data_folder)

# Make the folders
! mkdir -p $figure_folder
! mkdir -p $data_folder

Figure folder: ../figures/02_make_celltype_metadata
Data folder: ../data/02_make_celltype_metadata


## Cluster IDs (numbers) to the type of cell from the paper

The mapping below was hardcoded by hand from the figure, so it could contain errors.

In [5]:
cluster_name_to_ids = {'Horizontal cells': 1, 
                       'Retinal ganglion cells': 2,
                       'Microglia': 39}

In [6]:
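# Build (cluster_id, celltype) pairs. A value may be a single integer id or
# an iterable of ids; for an iterable, repeat the name once per id.
pairs = [zip(v, itertools.cycle([k])) if not isinstance(v, int) else [(v, k)]
         for k, v in cluster_name_to_ids.items()]
# Flatten the per-celltype groups into a single list of pairs
pairs = list(itertools.chain(*pairs))
pairs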

[(1, 'Horizontal cells'), (2, 'Retinal ganglion cells'), (39, 'Microglia')]
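The list comprehension above also accepts a cell type that covers several cluster ids, although every value shown in the dictionary here is a single integer. As a minimal sketch of that case, the snippet below spells the same flattening out as a loop. The dictionary `example_name_to_ids`, its placeholder names and ids, and the helper `flatten_name_to_ids` are hypothetical and are not taken from the paper.

```python
import itertools

# Hypothetical mapping: these names and ids are placeholders, not from the paper
example_name_to_ids = {
    'Hypothetical type A': 1,          # a single cluster id
    'Hypothetical type B': [3, 4, 5],  # several cluster ids share one name
}


def flatten_name_to_ids(name_to_ids):
    """Return a list of (cluster_id, name) pairs, one per cluster id."""
    pairs = []
    for name, ids in name_to_ids.items():
        # Wrap a bare integer in a list so both cases are handled the same way
        id_list = [ids] if isinstance(ids, int) else list(ids)
        pairs.extend((cluster_id, name) for cluster_id in id_list)
    return pairs


example_pairs = flatten_name_to_ids(example_name_to_ids)
print(example_pairs)
```

Each id in the list ends up in its own pair with the shared name, which is what the `zip(v, itertools.cycle([k]))` branch does in the notebook cell.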

In [7]:
celltypes = [name for i, name in pairs]
ids = ['cluster_' + str(i).zfill(2) for i, name in pairs]
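The notebook does not say why the ids are zero-padded with `zfill(2)`. One likely benefit is that the labels then sort as strings in the same order as the underlying numbers. The sketch below illustrates this, assuming ids below 100; the list `example_ids` reuses the ids above and adds 10 purely for illustration.

```python
# Ids from the mapping above, plus 10 as an illustrative extra
example_ids = [39, 2, 10, 1]

# Without padding, string sorting puts 'cluster_10' before 'cluster_2'
unpadded = sorted('cluster_' + str(i) for i in example_ids)

# With two-digit padding, string order matches numeric order
padded = sorted('cluster_' + str(i).zfill(2) for i in example_ids)

print(unpadded)
print(padded)
```

A width of 2 only keeps this property for ids up to 99; a larger clustering would need a wider pad.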

In [8]:
cluster_celltypes = pd.Series(celltypes, index=ids, name='celltype')
cluster_celltypes

cluster_01          Horizontal cells
cluster_02    Retinal ganglion cells
cluster_39                 Microglia
Name: celltype, dtype: object

In [10]:
csv_filename = os.path.join(data_folder, 'cluster_ids_to_celltypes.csv')
csv_filename

'../data/02_make_celltype_metadata/cluster_ids_to_celltypes.csv'

In [11]:
cluster_celltypes.to_csv(csv_filename, index=True, index_label='cluster_id', header=True)
! head $csv_filename

cluster_id,celltype
cluster_01,Horizontal cells
cluster_02,Retinal ganglion cells
cluster_39,Microglia
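As a quick sanity check on the written file, the rows can be read back and turned into a lookup from cluster id to cell type. This is only a sketch: it uses the standard-library `csv` module and parses the text copied from the `head` output above, so it runs on its own. In the notebook one would open `csv_filename` instead. The names `csv_text`, `rows`, and `id_to_celltype` are introduced here for illustration.

```python
import csv
import io

# Text copied from the `head` output above; in the notebook, open csv_filename instead
csv_text = """cluster_id,celltype
cluster_01,Horizontal cells
cluster_02,Retinal ganglion cells
cluster_39,Microglia
"""

# Parse each line into a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_text)))
id_to_celltype = {row['cluster_id']: row['celltype'] for row in rows}

# A duplicated cluster id would collapse in the dict and make the lengths differ
assert len(id_to_celltype) == len(rows)

print(id_to_celltype['cluster_39'])
```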
