# Finding location neurons
A location neurons are neurons which are activated when animal is located in particular place. In this notebook few methods of identifying such neurons will be suggested.
## Method 1. Use by-neuron spatial information

<p>Considering all positions where mouse was we can divide arena in parts which will be called as "bins" further. Such a division can be made by $k$-means algorithm.</p>
<p> The spatial information score is the probability of a mouse being in particular place(i.e. bean) multiplied on the mean value of mouse being in a particular bin when neuron is active. In mathematical terms:
    $$
    \displaystyle
    SI=\sum_{i=1}^{n} \theta(a_i-a_{min})\mathbf{P}(p\in B_i)\mathbf{E}(a|p\in B_i)log_2\left(\frac{\mathbf{E}(a|p\in B_i)}{\mathbf{E}(a)}\right)
    $$
</p>
<p> In order to find best neurons we take $n$ neurons with the highest $SI$ score.

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from tqdm import trange
from math import *

In [2]:
atRoot = 'atRoot' in vars()
if not atRoot:
    os.chdir("..")
atRoot = True

In [3]:
DATASET="data/22ht1_normalized.csv" # Dataset path
N_BINS=10 # How many bins to use 

data=pd.read_csv(DATASET)

In [4]:
extract_pos = lambda data: pd.DataFrame({"x": data["x"],"y": data["y"]},columns = ["x", "y"])
unique = lambda data: list(set(data))
def make_bins(data):
    kmeans = KMeans(n_clusters=N_BINS).fit(data)
    return kmeans.labels_

In [5]:
def calc_p_in_bin(labels):
    p_in_bin=dict()
    for b in labels:
        if b not in p_in_bin:
            p_in_bin[b]=1
        else:
            p_in_bin[b]+=1
    for key in p_in_bin:
        p_in_bin[key]/=len(labels)
    return p_in_bin

In [6]:
labels = make_bins(extract_pos(data))
p_in_bin = calc_p_in_bin(labels)
def si_score(activity, labels, activity_threshold=0.3):
    assert len(activity)==len(labels)
    score=0.0
    exp_activity=np.mean(activity)
    for i in range(len(activity)):
        if activity[i]<activity_threshold:
            continue
        pos=labels[i]
        activity_in_pos = np.mean(activity[labels == pos])
        p_prob=p_in_bin[labels[i]]
        if not np.allclose(activity_in_pos,0.0):
            score+=p_prob*activity_in_pos*np.log2(activity_in_pos/exp_activity)
    return score

In [9]:
# Find SI score for all neurons
n_neurons=554
scores=dict()
for i in trange(n_neurons):
    scores[i]=(si_score(data[str(i)+".0"],labels))
scored=[k for k, v in sorted(scores.items(), key=lambda item: item[1])]

  score+=p_prob*activity_in_pos*np.log2(activity_in_pos/exp_activity)
100%|██████████| 554/554 [45:20<00:00,  4.91s/it]


NameError: name 'x' is not defined

## Method 2. Mutual information
