# Neighbor Optimization MRE

This Jupyter notebook (05.02) walks through the steps used to optimize the nearest-neighbor analysis by scanning the number of neighbors at several fixed numbers of principal components (PCs).

**THIS ANALYSIS STEP DOES NOT OUTPUT AN H5AD FILE.**

## Initialize Environment
Import the necessary packages, configure Scanpy, and set the working directory:

In [None]:
# Import necessary packages
import os
import scanpy as sc
import numpy as np
import pandas as pd
import anndata as ad
from datetime import datetime as dt

# Scanpy settings
sc.settings.verbosity = 3
sc.logging.print_header()

# Set working directory
os.chdir("/home/dalbao/2023-012-Runx3mutD8scRNA/AlbaoRunx3Manuscript/single_cell/02_optimization")

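# File name of the preprocessed input; this variable is later overwritten with the loaded AnnData object
adata = "01_25-12-05-19-13_preprocessing_MRE.h5ad"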

Print the working directory and record a timestamp for the run:

In [None]:
# Determine work location
print("The work location for this notebook is: " + os.getcwd() + "\n")

# Get a timestamp for the start of the run
timestamp = dt.now()
print("This notebook was last run on " + timestamp.strftime("%y-%m-%d %H:%M") + "\n")
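
The input file name used in this notebook starts with what looks like a run timestamp (25-12-05-19-13). As a minimal sketch, assuming that prefix follows the pattern %y-%m-%d-%H-%M (an inference from the file name, not something the notebook states), the same kind of tag can be produced with strftime. The helper name run_tag is hypothetical:

```python
from datetime import datetime as dt

def run_tag(moment):
    # Assumed pattern: two-digit year, month, day, hour, minute, joined by hyphens
    return moment.strftime("%y-%m-%d-%H-%M")

# A fixed datetime keeps the example reproducible
example_tag = run_tag(dt(2025, 12, 5, 19, 13))
print(example_tag)
```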

The project contains a folder named "h5ad", located two levels above the working directory, which holds the output of previous analysis steps. Import the data from the latest analysis:

In [None]:
# Read the AnnData object from the h5ad folder
adata = ad.read_h5ad("../../h5ad/" + adata)

# Inspect AnnData
print(adata)

## Optimize Nearest Neighbors

Define a function that repeats the nearest-neighbor analysis over a range of neighbor values:

In [None]:
# Iterate over n_neighbors = neighbors + index for each index in range(min, max),
# holding the number of PCs constant. Note that range() excludes max itself.
def iterate_neighbors(adata_local, pcs = 25, neighbors = 10, min = 0, max = 11):
    # Loop over different nearest neighbor values based on provided min and max values
    for index in range(min, max):
        # Calculate nearest neighbors with n_neighbors of neighbors + index
        adata_mod = sc.pp.neighbors(adata_local, n_neighbors = neighbors + index, n_pcs = pcs, copy = True)
        
        # Do leiden clustering from neighborhood graph
        sc.tl.leiden(adata_mod)
        
        # Compute and plot the PAGA graph; plotting stores the node positions used by init_pos='paga' below
        sc.tl.paga(adata_mod)
        sc.pl.paga(adata_mod, title = "neighbor " + str(neighbors + index) + " pcs " + str(pcs))

        # Do a UMAP based on initialized PAGA graph
        sc.tl.umap(adata_mod, init_pos='paga')

        # Plot UMAP annotated by leiden clusters and original identity
        sc.pl.umap(adata_mod, color = "group", title = "neighbor " + str(neighbors + index) + " pcs " + str(pcs))
        sc.pl.umap(adata_mod, color = "leiden", title = "neighbor " + str(neighbors + index) + " pcs " + str(pcs))

        # Get leiden clusters and original identities
        obs_df = sc.get.obs_df(adata_mod, ["group", "leiden"])
        # Print the number of cells in each leiden cluster per group (similar to R's tapply)
        print(obs_df.groupby("group").value_counts().unstack())

        del adata_mod  # Release the copy before the next iteration
        
    # End of loop
    
    del adata_local  # Drops only the local reference; the caller's object is unaffected
            
# End of function
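
The per-group cluster counts printed inside the loop can be illustrated on toy data. This is a sketch only: the group and leiden labels below are made up, and pd.crosstab is swapped in as an alternative to the groupby/value_counts/unstack chain used in the function. It gives a table with groups as rows and clusters as columns, and always fills empty combinations with 0.

```python
import pandas as pd

# Toy observations; "group_a"/"group_b" and the cluster labels are hypothetical
toy_obs = pd.DataFrame({
    "group": ["group_a", "group_a", "group_a", "group_b", "group_b"],
    "leiden": ["0", "0", "1", "1", "1"],
})

# Rows are groups, columns are leiden clusters, cells are cell counts
counts = pd.crosstab(toy_obs["group"], toy_obs["leiden"])
print(counts)
```

A table like this makes it easy to see whether a cluster is dominated by one group at a given neighbor setting.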

## 25 PCs, neighbors 3 to 20

In [None]:
iterate_neighbors(adata, pcs = 25, neighbors = 3, min = 0, max = 18)
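
Because the function adds each index from range(min, max) to neighbors, and range excludes its upper bound, the call above scans 18 values of n_neighbors, starting at 3 and ending at 20. A short check, using a hypothetical helper scanned_neighbors that mirrors only the loop arithmetic (start and stop stand in for min and max):

```python
def scanned_neighbors(neighbors, start, stop):
    # Same arithmetic as the loop in iterate_neighbors: neighbors + index
    return [neighbors + index for index in range(start, stop)]

values = scanned_neighbors(neighbors=3, start=0, stop=18)
# First value, last value, and number of settings scanned
print(values[0], values[-1], len(values))
```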

## 30 PCs, neighbors 3 to 20

In [None]:
iterate_neighbors(adata, pcs = 30, neighbors = 3, min = 0, max = 18)

## 35 PCs, neighbors 3 to 20

In [None]:
iterate_neighbors(adata, pcs = 35, neighbors = 3, min = 0, max = 18)

## 40 PCs, neighbors 3 to 20

In [None]:
iterate_neighbors(adata, pcs = 40, neighbors = 3, min = 0, max = 18)
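
The four calls in this notebook differ only in pcs, so they could also be driven by a single loop. A minimal sketch of that pattern follows; run_grid and the stand-in record_call are hypothetical and only record the arguments they receive. In the notebook, iterate_neighbors and the loaded adata would be passed instead.

```python
def run_grid(func, data, pcs_values, neighbors=3, start=0, stop=18):
    # Call func once per PC setting, keeping the neighbor range fixed
    for pcs in pcs_values:
        func(data, pcs=pcs, neighbors=neighbors, min=start, max=stop)

calls = []

def record_call(data, pcs, neighbors, min, max):
    # Stand-in with the same keyword names as iterate_neighbors; stores the arguments only
    calls.append((pcs, neighbors, min, max))

run_grid(record_call, data=None, pcs_values=[25, 30, 35, 40])
print(calls)
```

Keeping one section per PC setting, as the notebook does, has the advantage that each batch of plots sits under its own heading.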

In [None]:
# End of Notebook
print("\nNotebook Ends")