# Installing Net2Brain and Relevant Dependencies



In [None]:
# !pip install -U git+https://github.com/cvai-roig-lab/Net2Brain



---



---


## Restart Runtime



---



---



# Net2Brain

<img src="data/Net2Brain_Logo.png" width="25%" />

__Net2Brain__ allows you to use one of over 600 Deep Neural Networks (DNNs) for your experiments comparing human brain activity with the activations of artificial neural networks. The DNNs in __Net2Brain__ are obtained from what we call different _netsets_, which are libraries that provide different pretrained models. 

__Net2Brain__ provides access to the following _netsets_:
- [Standard torchvision](https://pytorch.org/vision/stable/models.html) (`Pytorch`).
This netset is a collection of the torchvision models including models for image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.
- [Timm](https://github.com/rwightman/pytorch-image-models#models) (`Timm`). 
A deep-learning library created by Ross Wightman that contains a collection of state-of-the-art computer vision models.
- [PyTorch Hub](https://pytorch.org/docs/stable/hub.html) (`Torchhub`). 
These models are accessible through the torch.hub API and are trained for different visual tasks. They are not included in the torchvision module.
- [PyTorch Video](https://pytorch.org/docs/stable/hub.html) (`Pyvideo`). 
Offers models for video analysis, including action recognition and motion classification.
- [Unet](https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/) (`Unet`). 
Unet is also available through the torch.hub API and is trained for abnormality segmentation in brain MRI.
- [Taskonomy](https://github.com/StanfordVL/taskonomy) (`Taskonomy`). 
A set of networks trained for different visual tasks, such as keypoint detection, depth estimation, and reshading. The original motivation for these networks was to find relationships between different visual tasks.
- [Slowfast](https://github.com/facebookresearch/pytorchvideo) (`Pyvideo`). 
These models are state-of-the-art video classification models trained on the Kinetics 400 dataset, accessible through the torch.hub API.
- [CLIP](https://github.com/openai/CLIP) (`Clip`). 
CLIP (Contrastive Language-Image Pre-Training) is a vision+language multimodal neural network trained on a variety of (image, text) pairs.
- [CorNet](https://github.com/dicarlolab/CORnet) (`Cornet`). 
A set of neural networks whose structure is designed to resemble that of the ventral visual stream (VVS) and which therefore implement the recurrent connections that are commonplace in the VVS.
- [Huggingface](https://huggingface.co/) (`Huggingface`). 
Features a broad range of advanced language models that operate on text input.
- [Yolo](https://github.com/ultralytics/yolov5) (`Yolo`). 
Includes fast, accurate YOLOv5 models for real-time object detection in images and video streams.
- **Toolbox** (`Toolbox`). 
A set of networks that are implemented within Net2Brain.



---
---

**Net2Brain** consists of 4 main parts:
1. **Feature Extraction**
   > Handles input in the form of images, videos, or text, extracts relevant features for analysis, and saves them as `.npz` files.
2. **Representational Dissimilarity Matrix (RDM) Creation**
   > Uses the numpy arrays (`.npz` files) from the feature extraction step to create RDMs that quantify dissimilarities between data representations with different distance metrics.
3. **Evaluation**
   > Provides a suite of evaluation methods, including Linear Encoding, Representational Similarity Analysis (RSA), and more, to assess and compare model performance.
4. **Plotting**
   > Offers visualization tools to graphically display the results of the various analyses, making the findings easier to interpret and present.

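As an illustration of the second part, an RDM can be computed from a list of feature vectors using correlation distance (1 minus the Pearson correlation), which is one of several possible distance metrics. The following is a minimal plain-Python sketch with made-up feature vectors, not Net2Brain's own implementation; `pearson_r` and `correlation_rdm` are hypothetical helper names.

```python
def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlation_rdm(features):
    """Square matrix of pairwise correlation distances (1 - r)."""
    n = len(features)
    return [[1.0 - pearson_r(features[i], features[j]) for j in range(n)]
            for i in range(n)]


# Made-up activations: one row per stimulus, one column per feature
features = [
    [1.0, 2.0, 3.0, 4.0],   # stimulus A
    [2.0, 4.0, 6.0, 8.5],   # stimulus B, very similar to A
    [4.0, 3.0, 2.0, 1.0],   # stimulus C, the reverse pattern of A
]

rdm = correlation_rdm(features)
for row in rdm:
    print([round(value, 3) for value in row])
```

Entries near 0 mark stimuli that the model represents similarly, while larger entries mark dissimilar representations.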



---



---




# Step 0: Exploring the Toolbox - Model Taxonomy

In [None]:
from net2brain.taxonomy import show_all_architectures
from net2brain.taxonomy import show_all_netsets
from net2brain.taxonomy import show_taxonomy
from net2brain.taxonomy import print_netset_models

from net2brain.taxonomy import find_model_like_name
from net2brain.taxonomy import find_model_by_dataset
from net2brain.taxonomy import find_model_by_training_method
from net2brain.taxonomy import find_model_by_visual_task
from net2brain.taxonomy import find_model_by_custom

To get an overview of what is available, you can use the `show_all_architectures()` and `show_all_netsets()` functions.

In [None]:
show_all_architectures()
show_all_netsets()

You can also inspect the models available from a particular _netset_ using the function `print_netset_models()`.

We also offer a comprehensive model taxonomy to help you find the most suitable model for your study. Each model in our toolbox has distinct attributes that cater to various research requirements. To facilitate your selection process, we provide a taxonomic overview of the models available.

To see the available attributes, use the `show_taxonomy()` function. You can then search for a model based on one or more attributes using the following functions:

- `find_model_like_name(model_name)`
- `find_model_by_dataset(attributes)`
- `find_model_by_training_method(attributes)`
- `find_model_by_visual_task(attributes)`
- `find_model_by_custom([attributes], model_name)`

This taxonomy system is designed to help you easily identify and choose the most appropriate model for your research needs.

In [None]:
show_taxonomy()

Or you can find a model by its name using the function `find_model_like_name()`:

In [None]:
find_model_like_name('ResNet')

The `find_model_by_dataset(attributes)` function enables you to search for models associated with a specific dataset, such as 'ImageNet', 'ImageNet 22K', or 'COCO'.

In [None]:
find_model_by_dataset("Taskonomy")

The `find_model_by_training_method(attributes)` function helps you discover models based on their training methodology, such as 'Supervised', 'Jigsaw', or 'NPID'.

In [None]:
find_model_by_training_method("SimCLR")

The `find_model_by_visual_task(attributes)` function allows you to search for models specifically trained for a particular visual task, such as 'Object Detection', 'Panoptic Segmentation', or 'Semantic Segmentation'. 

In [None]:
find_model_by_visual_task("Panoptic Segmentation")

The `find_model_by_custom([attributes], model_name)` function enables you to search for models based on a combination of the attributes mentioned above. You can provide a list of attributes to filter the models, and optionally specify a particular model name to further refine your search.

In [None]:
find_model_by_custom(["COCO", "Object Detection"], model_name="fpn")



---



---



# Example Study: Predictive capabilities of Large Language Models for the Visual Cortex

<img src="data/Net2Brain_LE_Tutorial.png" width="100%" />

In this tutorial, we explore the **intersection of artificial intelligence and neuroscience**, specifically focusing on the **predictive capabilities of Large Language Models (LLMs)** and **Vision Transformers** in relation to the **Visual Cortex** of the human brain.

For our experiment we use the **Net2Brain toolbox**, which facilitates the comparison of activations within deep neural networks (DNNs) to human brain activity in response to identical stimuli.

By comparing the visual and language components of models like **CLIP**, along with other LLMs, we hope to shed light on how well these systems can mimic, or perhaps even elucidate, the operations of the human visual cortex. While the visual component of CLIP is hypothesized to correlate more strongly with early visual cortex (EVC) activity because it processes the visual stimuli directly, the explanatory power of LLMs for visual processing, especially in other brain areas, remains an open question.

Through this example study, we hope to not only provide insights into the parallels between artificial and biological neural networks but also offer a guide on using the **Net2Brain toolbox**. This tutorial is designed to equip you with the knowledge and tools necessary to make your own explorations.


# Step 1: Preparing the Dataset

The **Net2Brain** toolbox provides access to a wide array of datasets suitable for various studies. For a comprehensive list of available datasets and their descriptions, please refer to the **Net2Brain dataset notebook**.

For our current experiment we will use the **Natural Scenes Dataset (NSD)**. The NSD dataset is known for its rich collection of natural scene images, making it an excellent choice for our comparative study between human brain responses and model activations.

## Downloading the NSD Dataset

The **Net2Brain API** simplifies the process of acquiring datasets. To download the NSD dataset, you can use the following commands within the Net2Brain toolbox:

In [None]:
from net2brain.utils.download_datasets import DatasetAlgonauts_NSD
from pprint import pprint

Algonauts = DatasetAlgonauts_NSD()
available_paths = Algonauts.load_dataset()
pprint(available_paths)

In this tutorial, we conduct our experiment with a single subject. Select the subject of your choice below.

In [None]:
subject = 1

Since we will be using LLMs, we need captions describing the images that the participants saw. Fortunately, NSD is derived from the COCO dataset, and Net2Brain lets you download the COCO captions via the API.

In [None]:
image_stimuli = available_paths[f"subj0{subject}_images"]
brain_data = available_paths[f"subj0{subject}_rois"]
text_stimuli = available_paths[f"subj0{subject}"] + "/training_split/training_text/"

#Algonauts.Download_COCO_Captions(image_stimuli, text_stimuli)

> For a comprehensive tutorial on how to use the datasets API, including details on additional datasets available, please refer to our detailed guide in the following notebook: [Exploring Net2Brain](../0_Exploring_Net2Brain.ipynb).


## Utilizing precomputed data

Given the time-intensive nature of Linear Encoding, especially within a tutorial setting, we recommend using precomputed data to speed things up. The download itself appears further below, right after the Linear Encoding step.



---



---



# Step 2: Feature Extraction

In this step, we extract features from several Large Language Models (LLMs) and from CLIP's Vision Transformer (ViT). These features will later be used to train a linear regression model.

### Model Inputs

- **Large Language Models**: These models will use the original COCO Captions from the NSD images.
- **Vision Models**: These models will use the original NSD (COCO) images, the same ones shown to human participants in the brain studies.

### Models for Comparison

We've chosen a mix of models for this study:

- **CLIP ViT-B/32**: A Vision Transformer known for its strong performance in understanding images.
- **CLIP RN50**: A different CLIP model to see how it stacks up against the ViT version.
- **bert-base-uncased**: A basic but powerful language model.
- **bert-large-cased-whole-word-masking**: A larger, cased BERT variant pretrained with whole-word masking.
- **gpt2**: A well-known model for generating text, included here for a broader view of language model capabilities.



### Extracting model features using `FeatureExtractor`

### CLIP ViT-B/32

In [None]:
from net2brain.feature_extraction import FeatureExtractor
extractor = FeatureExtractor("ViT-B/32", "Clip", device="cpu")

This initializes the feature extractor and loads the model and any specified layers for extraction into the instance. To view the layers that are set to be extracted, you can execute `extractor.layers_to_extract`.

In [None]:
suggested_layers = extractor.layers_to_extract
pprint(suggested_layers)

Note that the layers we suggest for extraction might not be well suited to your experiments. To view a complete list of all available layers, you can use `extractor.get_all_layers()` and overwrite the `layers_to_extract` attribute with your desired subset.

In [None]:
extractor.get_all_layers()

Using this knowledge, we have the option to select the layers we wish to extract from the models.

In [None]:
visual_layers = ["visual.transformer.resblocks.0",
            "visual.transformer.resblocks.1",
            "visual.transformer.resblocks.2",
            "visual.transformer.resblocks.3",
            "visual.transformer.resblocks.4",
            "visual.transformer.resblocks.5",
            "visual.transformer.resblocks.6",
            "visual.transformer.resblocks.7",
            "visual.transformer.resblocks.8",
            "visual.transformer.resblocks.9",
            "visual.transformer.resblocks.10",
            "visual.transformer.resblocks.11"]

text_layers = ["transformer.resblocks.0",
            "transformer.resblocks.1",
            "transformer.resblocks.2",
            "transformer.resblocks.3",
            "transformer.resblocks.4",
            "transformer.resblocks.5",
            "transformer.resblocks.6",
            "transformer.resblocks.7",
            "transformer.resblocks.8",
            "transformer.resblocks.9",
            "transformer.resblocks.10",
            "transformer.resblocks.11"]

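Because the layer names above follow a regular pattern, the two lists could equally be generated rather than typed out. A minimal sketch, assuming the 12 residual blocks listed above; `make_block_names` is a hypothetical helper and not part of Net2Brain.

```python
def make_block_names(prefix, n_blocks=12):
    """Return the names prefix.0 ... prefix.(n_blocks - 1) (hypothetical helper)."""
    return [f"{prefix}.{i}" for i in range(n_blocks)]


# Same contents as the hand-written lists above
visual_layers = make_block_names("visual.transformer.resblocks")
text_layers = make_block_names("transformer.resblocks")

print(visual_layers[0], "...", visual_layers[-1])
print(len(text_layers), "text layers")
```

Generating the names also makes it easy to select a subset, for example every second block, by slicing the resulting list.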
In [None]:
# Only visual layers
extractor = FeatureExtractor("ViT-B/32", "Clip", device="cpu")
extractor.extract(data_path=image_stimuli, save_path=f"Tutorial_subj{subject}/clip_vit_b32_vision_feats", consolidate_per_layer=False, layers_to_extract=visual_layers)

# Only text layers:
extractor = FeatureExtractor("ViT-B/32", "Clip", device="cpu")
extractor.extract(data_path=text_stimuli, save_path=f"Tutorial_subj{subject}/clip_vit_b32_text_feats", consolidate_per_layer=False, layers_to_extract=text_layers)

**Alternative for multimodal models:**

CLIP takes both text and visual input. You can also create a folder containing both kinds of stimuli and pass the entire folder to the extractor, which will take care of the extraction.

### Other models:

In [None]:
# Bert-base-uncased
extractor = FeatureExtractor("bert-base-uncased", "Huggingface", device="cpu")
extractor.extract(data_path=text_stimuli, save_path=f"Tutorial_subj{subject}/bert_base_feats", consolidate_per_layer=False)


# Bert-large-cased-whole-word-masking
extractor = FeatureExtractor("bert-large-cased-whole-word-masking", "Huggingface", device="cpu")
extractor.extract(data_path=text_stimuli, save_path=f"Tutorial_subj{subject}/bert_large_whole_word_feats", consolidate_per_layer=False)


# GPT2
extractor = FeatureExtractor("gpt2", "Huggingface", device="cpu")
extractor.extract(data_path=text_stimuli, save_path=f"Tutorial_subj{subject}/gpt2_feats", consolidate_per_layer=False)



---



---




# Step 3: Linear Encoding

In this step, we use Linear Encoding to compare model activations with human fMRI responses to the same images. This method trains a linear regression model on the model activations and tests how well it predicts brain activity for stimuli it has not seen before. This tells us how well each model's representations account for the measured brain responses.

> **Note**: Due to its computational intensity, Linear Encoding can be time-consuming. For a more efficient tutorial experience, consider using precomputed results from the provided CSV files. You can start the download a few cells further down.

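The core idea can be sketched in a few lines of plain Python. The example below is a deliberately reduced illustration with synthetic data, a single feature, and a single voxel: fit a linear map on a training split, predict the held-out responses, and score the prediction with a Pearson correlation. It is not the implementation behind `Linear_Encoding`, which works on many features and voxels at once; `pearson_r` and `fit_line` are hypothetical helpers.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


rng = random.Random(42)

# Synthetic data: one model feature and one voxel response per stimulus
features = [rng.gauss(0, 1) for _ in range(200)]
voxel = [0.8 * f + rng.gauss(0, 0.5) for f in features]

# Train on the first 80% of stimuli, test on the rest
split = int(0.8 * len(features))
slope, intercept = fit_line(features[:split], voxel[:split])

predicted = [slope * f + intercept for f in features[split:]]
r = pearson_r(predicted, voxel[split:])
print(f"held-out Pearson r = {r:.2f}")
```

The held-out correlation is the quantity of interest: a model whose features carry no information about the voxel would score close to zero on unseen stimuli.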

In [None]:
from net2brain.evaluations.encoding import Linear_Encoding

This step has been automated to cycle through all available models efficiently.

In [None]:
# Dictionary to loop through all models for which we extracted features from
loop_dict = {f"Tutorial_subj{subject}/bert_base_feats": "BERT_Base",
             f"Tutorial_subj{subject}/bert_large_whole_word_feats": "BERT_large_whole_word",
             f"Tutorial_subj{subject}/gpt2_feats": "GPT2",
             f"Tutorial_subj{subject}/clip_vit_b32_text_feats": "CLIP_Text",
             f"Tutorial_subj{subject}/clip_vit_b32_vision_feats": "CLIP_Vision"}



# Loop through our models
for feat_path, model_name in loop_dict.items():
    print(f"Linear Encoding for {model_name} ...")

    # Call the linear encoding function
    results_df = Linear_Encoding(
        feat_path=feat_path,
        roi_path=brain_data,
        model_name=model_name,
        trn_tst_split=0.8,
        n_folds=3,
        n_components=100,
        batch_size=100,
        random_state=42,
        return_correlations=True,
        save_path=f"Tutorial_LE_Results/subj{subject}"
    )

--- 

---

### Alternative: Load Precomputed Data

In [None]:
import pandas as pd
from net2brain.utils.download_datasets import Tutorial_LE_Results
Tutorial_LE_Results.load_dataset()


def open_results(file_path, roi_category=None):
    """
    Loads a DataFrame from a CSV file and optionally filters it by an ROI category.

    Args:
    - file_path (str): The path to the CSV file.
    - roi_category (str, optional): The category key from roi_mappings to filter by.

    Returns:
    - pd.DataFrame: The loaded (and optionally filtered) DataFrame.
    """
    # Load the DataFrame from the CSV file
    dataframe = pd.read_csv(file_path)
    
    # If an ROI category is specified, filter the DataFrame
    if roi_category:
        # Define ROI mappings
        roi_mappings = {
            'prf-visualrois': ['V1', 'V2', 'V3', 'hV4'], 
            'floc-bodies': ['EBA', 'FBA-1', 'FBA-2', 'mTL-bodies'],
            'floc-faces': ['OFA', 'FFA-1', 'FFA-2', 'mTL-faces', 'aTL-faces'],
            'floc-places': ['OPA', 'PPA', 'RSC'],
            'floc-words': ['OWFA', 'VWFA-1', 'VWFA-2', 'mfs-words', 'mTL-words']
        }
        
        # Check if the specified category exists in roi_mappings
        if roi_category not in roi_mappings:
            raise ValueError(f"Category '{roi_category}' not found in roi_mappings.")
        
        # Filter the DataFrame
        dataframe = dataframe.loc[dataframe['ROI'].str.split('_').str[0].isin(roi_mappings[roi_category])]
    
    return dataframe

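The filter in `open_results` keeps a row when the part of its `ROI` label before the first underscore belongs to the chosen category. Below is a plain-Python sketch of that matching rule; the labels are hypothetical and may not match the labels in the actual result files.

```python
# ROI names of the 'prf-visualrois' category, as defined in open_results
visual_rois = ['V1', 'V2', 'V3', 'hV4']

# Hypothetical ROI labels; the labels in the real CSV files may look different
labels = ['V1_left', 'hV4_right', 'FFA-1_left', 'PPA']

# Same rule as dataframe['ROI'].str.split('_').str[0].isin(...)
kept = [label for label in labels if label.split('_')[0] in visual_rois]
print(kept)
```

A label without an underscore is compared as a whole, so it is kept only if it exactly equals one of the ROI names.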
In [None]:
import pandas as pd

roi_category = 'prf-visualrois'  # Display the data only for certain brain regions
# roi_category = 'floc-bodies'
# roi_category = 'floc-faces'
# roi_category = 'floc-places'
# roi_category = 'floc-words'

# Select subject
subject = 1 # Downloaded data only available for subject 1

results_df_clip_vision = open_results(f"Tutorial_LE_Results/subj{subject}/CLIP_Vision.csv", roi_category)
results_df_clip_text = open_results(f"Tutorial_LE_Results/subj{subject}/CLIP_Text.csv", roi_category)
results_df_bert = open_results(f"Tutorial_LE_Results/subj{subject}/BERT_Base.csv", roi_category)
results_df_bert_masking = open_results(f"Tutorial_LE_Results/subj{subject}/BERT_large_whole_word.csv", roi_category)
results_df_gpt = open_results(f"Tutorial_LE_Results/subj{subject}/GPT2.csv", roi_category)


# Step 4: Plotting Results

In [None]:
from net2brain.evaluations.plotting import Plotting


plotter = Plotting([results_df_clip_vision, 
                    results_df_clip_text, 
                    results_df_bert, 
                    results_df_bert_masking,
                    results_df_gpt])

results_dataframe = plotter.plot_all_layers(metric="R", columns_per_row=2, simplified_legend=True)



---



---



# Mixed captions

In our analysis, we observed that while large language models (LLMs) did not quite match the predictive accuracy of the vision-based model, they were still capable of capturing some patterns in the brain data. This raises an important question: are these correlations meaningful or coincidental? There are two strategies we could pursue. One is to initialize the models with random weights by using `FeatureExtractor(model, netset, pretrained=False)`. The other is to shuffle the captions and check whether the models maintain their predictive performance even when the captions no longer match the brain data. Instead of repeating the feature extraction with shuffled captions, we can simply shuffle the extracted feature files.

In [None]:
import os
import random
import shutil
from tqdm import tqdm

def mix_activation_files(base_folder):
    """Copy every "*_feats" folder to "*_mixed_feats" with shuffled file names."""
    # Iterate over subdirectories in the base folder
    for subdir in tqdm(os.listdir(base_folder)):
        # Skip folders that are already shuffled copies from a previous run
        if "_feats" not in subdir or "_mixed_feats" in subdir:
            continue

        source_folder = os.path.join(base_folder, subdir)
        target_folder = os.path.join(base_folder, subdir.replace("_feats", "_mixed_feats"))
        os.makedirs(target_folder, exist_ok=True)

        # Assign the content of each file to a randomly chosen file name
        files = os.listdir(source_folder)
        shuffled_files = random.sample(files, len(files))

        for original, shuffled in zip(files, shuffled_files):
            shutil.copy(os.path.join(source_folder, original), os.path.join(target_folder, shuffled))




In [None]:
base_folder = f"Tutorial_subj{subject}"
mix_activation_files(base_folder)

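One caveat of a random shuffle is that a file can keep its original name by chance, which leaves that stimulus correctly paired with its brain response. The sketch below uses hypothetical file names and a seeded `random.Random` to show how the permutation could be made reproducible and how the remaining matches could be counted; `shuffled_mapping` is an illustrative helper, not part of Net2Brain.

```python
import random


def shuffled_mapping(files, seed=0):
    """Map each original file name to a randomly chosen target name."""
    rng = random.Random(seed)        # seeded, so the shuffle is repeatable
    ordered = sorted(files)          # directory listing order is not guaranteed
    targets = rng.sample(ordered, len(ordered))
    return dict(zip(ordered, targets))


# Hypothetical file names standing in for the extracted feature files
files = [f"stim_{i:04d}.npz" for i in range(1000)]
mapping = shuffled_mapping(files, seed=42)

# Count files whose shuffled name equals their original name
unchanged = sum(1 for source, target in mapping.items() if source == target)
print(f"{unchanged} of {len(files)} files kept their original name")
```

For a uniformly random permutation, the expected number of unchanged files is one regardless of how many files there are, so with many stimuli the mismatch is close to complete.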
### Linear Encoding

In [None]:
loop_dict = {f"Tutorial_subj{subject}/bert_base_mixed_feats": "BERT_Base_mixed",
             f"Tutorial_subj{subject}/bert_large_whole_word_mixed_feats": "BERT_large_whole_word_mixed",
             f"Tutorial_subj{subject}/gpt2_mixed_feats": "GPT2_mixed",
             f"Tutorial_subj{subject}/clip_vit_b32_text_mixed_feats": "CLIP_Text_mixed"}


for feat_path, model_name in loop_dict.items():
    print(f"Linear Encoding for {model_name} ...")

    # Call the linear encoding function
    results_df = Linear_Encoding(
        feat_path=feat_path,
        roi_path=brain_data,
        model_name=model_name,
        trn_tst_split=0.8,
        n_folds=3,
        n_components=100,
        batch_size=100,
        random_state=42,
        return_correlations=True,
        save_path=f"Tutorial_LE_Results/subj{subject}"
    )
    


### Loading Precomputed Data

In [None]:
import pandas as pd

roi_category = 'prf-visualrois'
# roi_category = 'floc-bodies'
# roi_category = 'floc-faces'
# roi_category = 'floc-places'
# roi_category = 'floc-words'

# Select subject
subject = 1 # Downloaded data only available for subject 1

results_df_clip_vision_cls = open_results(f"Tutorial_LE_Results/subj{subject}/CLIP_Vision.csv", roi_category) # Normal CLIP for comparison
results_df_clip_text_mix = open_results(f"Tutorial_LE_Results/subj{subject}/CLIP_Text_mixed.csv", roi_category)
results_df_bert_mix = open_results(f"Tutorial_LE_Results/subj{subject}/BERT_Base_mixed.csv", roi_category)
results_df_bert_masking_mix = open_results(f"Tutorial_LE_Results/subj{subject}/BERT_large_whole_word_mixed.csv", roi_category)
results_df_gpt_mix = open_results(f"Tutorial_LE_Results/subj{subject}/GPT2_mixed.csv", roi_category)


### Plotting

In [None]:
from net2brain.evaluations.plotting import Plotting


plotter = Plotting([results_df_clip_vision_cls, 
                    results_df_clip_text_mix, 
                    results_df_bert_mix, 
                    results_df_bert_masking_mix,
                    results_df_gpt_mix])

results_dataframe = plotter.plot_all_layers(metric="R", columns_per_row=2, simplified_legend=True)