## Projecting the Social Network in 2025

The social network was developed using a deprecated version of a deep learning framework, TensorFlow 1.0. That version has since been replaced by more modern and easier-to-use releases, but because our model depends on it, some extra steps are required to create accurate social network projections.

#### Accessing the Duke Compute Cluster
There are multiple ways to access the Duke Compute Cluster (DCC). First, you will need account access, which David can grant. Once you have it, you can log in by running the following command in your terminal:

`ssh {netID}@dcc-login.oit.duke.edu`

In the DCC you have 3 primary file locations of interest:

- /hpc/home/{netID} - Permanent storage location for code / results with 10-20GB of storage.
- /work/{netID} - Temporary directory with fast access to data and unlimited storage. Warning: unchanged files are purged every 90 days. You may need to create this directory with `mkdir /work/{netID}` the first time you log in.
- /datacommons/carlsonlab/{netID} - Permanent slow storage that our lab pays for. You may also need to create this directory with `mkdir /datacommons/carlsonlab/{netID}` the first time you log in.
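
The three locations can be collected in one place so that scripts do not hard-code paths. The following is a minimal sketch: the helper name `dcc_paths`, its dictionary keys, and the netID `abc123` are illustrative placeholders, not part of any DCC tooling.

```python
from pathlib import PurePosixPath


def dcc_paths(net_id):
    """Return the three DCC storage locations listed above for a netID.

    Hypothetical helper: the name and the dictionary keys are illustrative.
    """
    return {
        # Permanent storage for code and results, small quota
        "home": PurePosixPath("/hpc/home") / net_id,
        # Fast temporary storage, unchanged files purged every 90 days
        "work": PurePosixPath("/work") / net_id,
        # Permanent slow storage paid for by the lab
        "datacommons": PurePosixPath("/datacommons/carlsonlab") / net_id,
    }


paths = dcc_paths("abc123")  # "abc123" is a placeholder netID
for name, path in paths.items():
    print(f"{name}: {path}")
```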

Lastly, we often want to use a different coding environment for each project. To make this easier, the DCC provides a web portal for running and editing code: http://dcc-ondemand-01.oit.duke.edu

Here, you can create a new session using the Jupyter Lab Apptainer Option. The parameters you want to use are:
- Account: Carlsonlab
- Partition: carlsonlab-gpu
- Walltime: 24 hrs (max)
- CPUs: 16 (more can be available)
- Memory: 80GB (max)
- GPUs: 1
- Apptainer Container File: /work/{yourNetID}/{containerName}

You can copy containers from the datacommons to use for this setup. Once the session starts, you land in your home directory, where you can create Jupyter notebooks and run code. I recommend storing all of your data in the /work/ directory, since storage in home is limited.

The exact container needed for social network projections is outlined below.

#### Setting Up the Environment

To run TensorFlow 1.0 code, we need a Python environment that supports it. Copy the following container to your work directory by running this command once you are logged in to the DCC:

`cp /datacommons/carlsonlab/Containers/social_proj.simg /work/{yourNetID}/`

#### Preparing the Data

All of the data used for the social paper was prepared using the lpne-data-analysis (https://github.com/carlson-lab/lpne-data-analysis) pipeline. Therefore, feature extraction must be done using this code base before any projections can be done. 

To do this, you will first need an Excel file named channel_info.xlsx or something similar. Column A lists all of the recorded electrode names as they appear in the .mat files you wish to process. Column B gives the area name that each electrode should be averaged into. For example, column A may contain [BLA_1,BLA_2,CeA_1,CeA_2] and column B may contain [Amy, Amy, Amy, Amy].

Second, you need all of your LFP .mat files in a Data folder and all of the CHANS files in a CHANS folder. The repository goes into more detail about setting this up.

It is crucial that the column B naming scheme be consistent across all projects!

The naming scheme for the social project is:
['Amy', 'Cg Cx', 'Hipp', 'IL Cx', 'Nac', 'PrL Cx', 'Thal', 'VTA']

Do not swap the word order (for example, Cx_Cg instead of Cg_Cx) or otherwise alter these names. Features are computed and sorted alphabetically by area name, so any change reorders the features and makes your projections invalid.
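
A quick check of the column B names before feature extraction can catch naming mistakes early. The sketch below is only an illustration: `check_area_names` is a hypothetical helper, `column_b` is a plain Python list standing in for the spreadsheet column (reading the Excel file is not shown), and the toy data deliberately contains two mistyped names.

```python
EXPECTED_AREAS = ['Amy', 'Cg Cx', 'Hipp', 'IL Cx', 'Nac', 'PrL Cx', 'Thal', 'VTA']


def check_area_names(column_b):
    """Compare the area names from column B with the social-project scheme.

    Returns (unexpected, missing): names that are not in the scheme, and
    scheme names that never appear. Both lists are empty when the names match.
    """
    found = set(column_b)
    unexpected = sorted(found - set(EXPECTED_AREAS))
    missing = sorted(set(EXPECTED_AREAS) - found)
    return unexpected, missing


# Toy column B in which two names were typed incorrectly.
column_b = ['Amy', 'Amy', 'Cx Cg', 'Hipp', 'IL Cx', 'NAc', 'PrL Cx', 'Thal', 'VTA']
unexpected, missing = check_area_names(column_b)
print("unexpected:", unexpected)
print("missing:", missing)
```

The comparison is case-sensitive and whitespace-sensitive on purpose, since any difference in spelling changes how the names sort.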

Additionally, you must save your features using the "saveFeatures_1.1" version. This is what matches the original work.

From here, you can follow the demos provided in the lpne-data-analysis repository. Alternatively, here is a MATLAB script you can adapt to your dataset:

```
% Paths to the channel info spreadsheet and to the parent directory of Data/ and CHANS/
chan_info_file = "{Your Channel Info File Location}";
base_dir = "{Location of your Data and CHANS subdirs parent directory}";

secs_per_window = 1;
sample_rate = 1000;

opts.mvgcFolder = './mvgc/';
opts.parCores = 32;
% All three feature types must use saveFeatures_1.1 to match the original work
opts.version.power = 'saveFeatures_1.1';
opts.version.coherence = 'saveFeatures_1.1';
opts.version.granger = 'saveFeatures_1.1';
opts.featureList = {'power','coherence','granger'};

% base_dir must end with a file separator because the file name is appended directly
savePath = base_dir + "{name of your saved features}.mat";
formatWindows(char(savePath),false,char(base_dir),char(chan_info_file),sample_rate,secs_per_window)
preprocessData(char(savePath))
saveFeatures(char(savePath),opts)
```

### After feature generation

There are no automated scripts for aligning behavioral labels. Most of the time, this is easiest to accomplish using the time labels from the features.
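
One way to do this alignment by hand is sketched below. It rests on assumptions that are not confirmed by the pipeline: that each window's time label is its start time in seconds, and that behavior is annotated as `(start, end, label)` intervals in the same time base. The function name `label_windows` and the interval format are hypothetical, so adapt them to however your behavior was scored.

```python
def label_windows(window_times, intervals, secs_per_window=1, default="none"):
    """Assign a behavioral label to each feature window.

    window_times: start time of each window in seconds (assumed format).
    intervals: (start, end, label) tuples in seconds (hypothetical format).
    A window receives a label only when it lies entirely inside an interval.
    """
    labels = []
    for t in window_times:
        label = default
        for start, end, name in intervals:
            if start <= t and t + secs_per_window <= end:
                label = name
                break
        labels.append(label)
    return labels


# Toy example: six one-second windows and two annotated behavior intervals.
window_times = [0, 1, 2, 3, 4, 5]
intervals = [(1, 3, "social"), (4, 6, "object")]
print(label_windows(window_times, intervals))
```

Requiring a window to fall entirely inside an interval is a conservative choice: windows that straddle a behavior boundary keep the default label and can be dropped before analysis.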

Once you have your features and your labels, AND YOU ARE CERTAIN THE FEATURE VERSION (1.1) AND THE BRAIN REGION AREA NAMES ARE CORRECT, we are ready to project!

```python
import numpy as np
import pandas as pd
import sklearn.decomposition as dp
import pickle
import sys,os
import numpy.random as rand
from norm_encoded import NMF
from norm_supervised import sNMF
from sklearn.linear_model import LogisticRegression as LR
from sklearn.metrics import auc,roc_curve
from sklearn.utils.random import sample_without_replacement
import scipy.io
from data_tools import load_data
```

### Import your Data

Data is imported using the data_tools library. During preprocessing, Granger features are exponentiated to make them more linear and are then capped at a maximum value of 10, as is done in the paper.

```python
fileName = "/work/mk423/Social_v_1_2/Social_Chunk_1_v_1_1.mat"

# Load the data using the current loader
myList = load_data(fileName,fBounds=(1,56),feature_list=['power','coherence','granger'])

# Get the power, coherence, and exp(granger) features, capping granger at 10
power = myList[0]
coherence = myList[1]
granger = myList[2]
granger = np.exp(granger)
granger[granger>10] = 10

labels = myList[3]
windows = labels['windows']
mouse = windows['mouse']
time = windows['time']
expDate = windows['expDate']
```

```python
# When stacking the features together, the power features are multiplied by 10
# so that the feature magnitudes match more closely.
X = np.hstack([10*power,coherence,granger])
```
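
Before projecting, it can help to compare the width of `X` with the number of columns you expect. The arithmetic below is a sketch under assumptions that are not confirmed by the source: one value per integer frequency from 1 to 56 Hz, one power spectrum per region, one coherence spectrum per unordered region pair, and one Granger spectrum per ordered region pair. The helper `expected_feature_counts` is hypothetical.

```python
from math import comb


def expected_feature_counts(n_regions=8, n_freqs=56):
    """Column counts per feature type under the assumptions stated above."""
    return {
        # One spectrum per region
        "power": n_regions * n_freqs,
        # One spectrum per unordered pair of regions
        "coherence": comb(n_regions, 2) * n_freqs,
        # One spectrum per ordered pair of regions
        "granger": n_regions * (n_regions - 1) * n_freqs,
    }


counts = expected_feature_counts()
print(counts, "total:", sum(counts.values()))
```

If `X.shape[1]` differs from the total, that does not necessarily mean the data are wrong, since the assumptions may not match the loader. It is a prompt to check the region names and frequency bounds before trusting a projection.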

### Construct the Model

These parameters and initializations were defined by Austin in his original projection demo. They retrieve the original model parameters from the paper.

```python
nIter=5
dev=3
number_test = str(3906)
number_components = str(9)
trial = str(6)
dirName = './supervised_rep_' + number_test+'_'+number_components+'_'+trial
supStr=3.0
model = sNMF(5,outerIter=nIter,device=dev,dirName=dirName,LR=1e-5,
                percGPU=.35,
                n_blessed=1,mu=supStr)

# Point the model at the checkpoint metadata stored in the model subdirectory
model.meta = model.dirName + '/' + model.name + '.ckpt.meta'
```
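
Because the model is restored from files in `dirName`, a short pre-flight check that the directory contains checkpoint metadata can save a confusing failure later. This is a sketch, not part of the original demo: `find_checkpoint_meta` is a hypothetical helper, and it matches any file ending in `.ckpt.meta` because the value of `model.name` is not given here.

```python
import glob
import os


def find_checkpoint_meta(dir_name):
    """Return the sorted .ckpt.meta files found in dir_name (possibly empty)."""
    return sorted(glob.glob(os.path.join(dir_name, "*.ckpt.meta")))


# Same directory name as the one built from number_test, number_components, and trial.
dir_name = "./supervised_rep_" + "_".join(["3906", "9", "6"])
print(dir_name)

matches = find_checkpoint_meta(dir_name)
if not matches:
    print("No checkpoint metadata found; check that the model directory is present.")
```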

### Actually Project

```python
# The projection
S_DEP = model.transform(X)

s_social = S_DEP[:,0]
s_unsup = S_DEP[:,1:]

# Save relevant variables to a csv
saveDict = {
    "s_social":s_social,
    "time":time,
    "mouse":mouse,
    "expDate":expDate,
}

df = pd.DataFrame.from_dict(saveDict)
#df.to_csv("demo.csv",index=False)
```