# Connectome Pipeline

Hi and welcome to the Connectome Pipeline!

## 1. Preprocessing

In the first step, you preprocess the CONN MATLAB files into an analysis-ready dataset.

Here is an overview of the parameters of the preprocessing pipeline. Parameters marked with (*) are optional.


+    *matlab_dir*: path to the MATLAB files
+    *excel_path*: path to the Excel list
+    *save_file*: if False, the dataset is returned as a pandas DataFrame instead of being written to disk
+    *preprocessing_type*: "conn" for the connectivity matrix, "aggregation" for the aggregated connectivity matrix, "graph" for graph matrices
+    *write_dir**: path to write the dataset to if save_file = True
+    *network**: Yeo7 or Yeo17 network (only applicable if preprocessing_type = "aggregation")
+    *statistic**: summary statistic to apply (only applicable if preprocessing_type = "aggregation")
+    *upper**: boolean; whether only the upper triangular elements of the connectivity matrices should be used
+    *split_size**: share of the data used for the training set (default .8)
+    *seed**: an int for reproducibility (default 42)
+    *file_format**: "h5" for further modelling in Python or "csv" for R (default "csv")
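
To make the *upper* and *statistic* options more concrete, here is a minimal sketch of the two ideas on a toy matrix. It is not the pipeline's implementation: the helper names `upper_triangle_features` and `aggregate_by_network`, the region-to-network labels and the choice to drop the diagonal are all assumptions made for illustration.

```python
import numpy as np


def upper_triangle_features(conn):
    """Flatten the strictly upper triangular part of a square matrix.

    Hypothetical helper: a symmetric connectivity matrix stores every
    connection twice, so one triangle carries all the information.
    """
    conn = np.asarray(conn)
    rows, cols = np.triu_indices(conn.shape[0], k=1)  # k=1 skips the diagonal
    return conn[rows, cols]


def aggregate_by_network(conn, labels, statistic=np.mean):
    """Summarise region-level connectivity for every pair of networks.

    Hypothetical helper: `labels` assigns each region to a network and
    `statistic` is the summary function applied to each block.
    Within-network blocks include the diagonal in this sketch.
    """
    conn = np.asarray(conn)
    labels = np.asarray(labels)
    networks = sorted(set(labels.tolist()))
    out = np.empty((len(networks), len(networks)))
    for i, a in enumerate(networks):
        for j, b in enumerate(networks):
            block = conn[np.ix_(labels == a, labels == b)]
            out[i, j] = statistic(block)
    return networks, out


# Toy symmetric 4x4 matrix; the network labels are made up for the example.
conn = np.array([
    [1.0, 0.2, 0.4, 0.6],
    [0.2, 1.0, 0.8, 0.0],
    [0.4, 0.8, 1.0, 0.5],
    [0.6, 0.0, 0.5, 1.0],
])
labels = ["DMN", "DMN", "VIS", "VIS"]

features = upper_triangle_features(conn)
networks, aggregated = aggregate_by_network(conn, labels)
```

For an n x n matrix, the flattened upper triangle has n(n-1)/2 entries, so the 4 regions above yield 6 features.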

In [2]:
from src.preprocessing.preprocessing_matlab_files import transform_mat_write_to_hdf

In [16]:
matlab_dir = "C:/Users/likai/Desktop/My Life/Master/3. Semester/Innolabs/Connectome Git/data/Matlab/" # Enter the directory for the matlab files
excel_path = "C:/Users/likai/Desktop/My Life/Master/3. Semester/Innolabs/Connectome Git/data/DELCODE_dataset_910.xlsx" # Enter the directory for the corresponding excel sheet
write_dir = "C:/Users/likai/Desktop/My Life/Master/3. Semester/Innolabs/Connectome Git/data/" # ...
save_file = False # rename to export file
preprocessing_type = 'conn' 

In [18]:
df = transform_mat_write_to_hdf(matlab_dir=matlab_dir, excel_path=excel_path, write_dir=write_dir,
                                save_file=save_file, preprocessing_type=preprocessing_type)

loading files
Starting Preprocessing
Creating Final Dataset
Done!


In [20]:
df.head()

## 2. Modelling

In the second step, you can either run the new input files through a pretrained model or train a new model.

### 2.1  Data preparation
This step prepares the data for modelling: it creates the target variable, drops unnecessary columns and performs a train/test split.

The user has to specify:
- *classification*: whether it is a classification task (True) or a regression task (False)
- *columns_drop*: which variables shouldn't be used for modelling
- *target*: the name of the target variable
- *y_0, y_1* (only relevant for classification tasks): which values of the target variable are mapped to 0 and which to 1
- *train_size*: size of the training data
- *seed*: a seed to ensure reproducibility of the train/test split
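
The preparation steps listed above can be sketched with pandas and scikit-learn as follows. The function `prepare_data` and the toy column names are hypothetical and only mirror the parameters described in the list; the actual helper in `src.models` may behave differently.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df, classification, columns_drop, target,
                 y_0=None, y_1=None, train_size=0.8, seed=42):
    """Hypothetical sketch of the data preparation step."""
    data = df.drop(columns=columns_drop)
    if classification:
        # Keep only rows whose target value belongs to one of the two classes,
        # then encode values in y_1 as 1 and values in y_0 as 0.
        data = data[data[target].isin(list(y_0) + list(y_1))]
        y = data[target].isin(list(y_1)).astype(int)
    else:
        y = data[target]
    X = data.drop(columns=[target])
    return train_test_split(X, y, train_size=train_size, random_state=seed)


# Toy data; all column names and values are made up for the example.
toy = pd.DataFrame({
    "id": range(12),
    "diagnosis": ["HC"] * 5 + ["AD"] * 5 + ["MCI"] * 2,
    "feature_a": [0.1 * i for i in range(12)],
    "feature_b": [1.0 - 0.05 * i for i in range(12)],
})

X_train, X_test, y_train, y_test = prepare_data(
    toy, classification=True, columns_drop=["id"], target="diagnosis",
    y_0=["HC"], y_1=["AD"], train_size=0.8, seed=42,
)
```

In this sketch, rows whose target value appears in neither *y_0* nor *y_1* (here the "MCI" rows) are dropped before the split. That is one possible design choice, not necessarily the one the pipeline makes.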

In [None]:
from src.models import ...
from sklearn.preprocessing import StandardScaler

### 2.2 Run Model or get pretrained model

In [None]:
from src.models import gradient_boosting, pipeline_elastic_net
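
The interfaces of `gradient_boosting` and `pipeline_elastic_net` are not shown here, so the following is only a sketch of the "pretrained or new" decision using scikit-learn directly. The helper `load_or_train`, the pickle file name and the elastic net settings are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def load_or_train(model_path, X_train, y_train):
    """Return the pickled model at model_path if it exists, else train and save one.

    Hypothetical helper; the stand-in model is a scaled elastic net regression.
    """
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f)
    model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1))
    model.fit(X_train, y_train)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model


# Synthetic regression data, only to make the sketch runnable.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

model_path = os.path.join(tempfile.mkdtemp(), "elastic_net.pkl")
trained = load_or_train(model_path, X, y)   # no file yet: trains and saves
reloaded = load_or_train(model_path, X, y)  # file exists: loads the saved model
```

Putting the scaler inside the pipeline means it is fitted on the training data only and stored together with the model, so new input files are scaled consistently at prediction time.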

## 3. Model Evaluation

In this step, you can evaluate the model on a set of predefined metrics:

+ For Classification: Accuracy, Precision, Recall, F1 and AUC
+ For Regression: MSE, MAE and R2

See https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics for details.
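
As a rough illustration of the metrics listed above, the sketch below computes them with `sklearn.metrics` on toy predictions. The helpers `classification_metrics` and `regression_metrics` are hypothetical and are not the `model_evaluation` function of this repository, whose internals are not shown here.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)


def classification_metrics(y_true, y_pred, y_score):
    """Hypothetical helper: y_pred are class labels, y_score are scores for class 1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # AUC needs scores, not labels
    }


def regression_metrics(y_true, y_pred):
    """Hypothetical helper returning the three regression metrics."""
    return {
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }


# Toy values, made up for the example.
clf_scores = classification_metrics(
    y_true=[0, 0, 1, 1], y_pred=[0, 1, 1, 1], y_score=[0.1, 0.6, 0.7, 0.9]
)
reg_scores = regression_metrics(y_true=[1.0, 2.0, 3.0], y_pred=[1.0, 2.0, 4.0])
```

With a fitted scikit-learn classifier, `y_pred` would typically come from `model.predict(X_test)` and `y_score` from `model.predict_proba(X_test)[:, 1]`.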

In [25]:
from src.models.evaluation import model_evaluation

In [None]:
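model = model  # the fitted model from step 2.2
X_test = ...  # test features from the train/test split in step 2.1
y_test = ...  # test target from the train/test split in step 2.1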

In [None]:
model_evaluation(model, X_test, y_test)

## 4. Feature Visualization and Interpretation

In the final step you can choose between several feature visualization and interpretation techniques.

In [26]:
from src.visualization.viz_utils import ordered_regions

In [None]:
model, X, y