# Installing Net2Brain

<img src="workshops/data/Net2Brain_Logo.png" width="25%" />

In [None]:
#!pip install -U git+https://github.com/cvai-roig-lab/Net2Brain

# Step 1: Feature Extraction

## Using `FeatureExtractor` with a model from Net2Brain

The `FeatureExtractor` class provides an interface for extracting features from a given model. When initializing it, you can customize its behavior with the following parameters:

- `model` (required): The model from which you want to extract features. Either a model name as a string, in combination with a netset (next parameter), or a model object (e.g. a PyTorch model).
- `netset` (optional): The netset (collection of networks) that the model belongs to.
- `device` (optional): The device on which to perform the computations, e.g., 'cuda' for GPU or 'cpu' for CPU. Default is None, which will use the device specified in the global PyTorch settings.
- `pretrained` (optional): A boolean flag indicating whether to use a pretrained model (if available) or to initialize the model with random weights. Default is True, which means that a pretrained model will be used if possible.

- - -


First we need a dataset to play around with. For this we will use the dataset by [Michael F. Bonner (2017)](https://www.pnas.org/doi/full/10.1073/pnas.1618228114), which we can download using the `load_dataset` function:

In [None]:
from net2brain.utils.download_datasets import DatasetBonnerPNAS2017
from pprint import pprint

paths_bonner = DatasetBonnerPNAS2017.load_dataset()
pprint(paths_bonner)


In [None]:
stimuli_path = paths_bonner["stimuli_path"]
roi_path = paths_bonner["roi_path"]

### Initiating FeatureExtractor


To extract the activations of a pretrained model from a netset, you can use the FeatureExtractor class. First, you need to initialize the class by providing the name of the model and the name of the netset. You can find a suitable model and netset by exploring the taxonomy options available in the Net2Brain toolbox, as shown in the previous notebook "0_Exploring_Net2Brain". For instance, in the following example, we will use AlexNet from the standard netset.

In [None]:
from net2brain.feature_extraction import FeatureExtractor
fx = FeatureExtractor(model='AlexNet', netset='Standard', device='cpu')

The `extract` method extracts features from an image dataset. It takes the following parameters:

- `data_path` (required): The path to the images from which to extract the features. The images must be in JPEG or PNG format.
- `save_path` (optional): The path to save the extracted features to. If None, the folder where the features are saved is named after the current date in the format "{year}{month}{day}{hour}{minute}".
- `layers_to_extract` (optional): Either a list of layers to extract from the model, or a single string option from "all", "top_level", and "json", to extract all layers, only the top-level blocks, or predefined layers from a json file. Defaults to "top_level".
- `consolidate_per_layer` (optional): Features are extracted image by image. If True, they are consolidated into one file per layer; set it to False to keep them per image. Defaults to True.
- `dim_reduction` (optional): Type of dimensionality reduction to apply to the extracted features. Choose from `srp` (Sparse Random Projection) and `pca` (Principal Component Analysis). Defaults to None.
- `n_samples_estim` (optional): The number of samples used for estimating the dimensionality reduction. Defaults to 100.
- `n_components` (optional): Number of components for dimensionality reduction. If None, the number of components is estimated. Defaults to 10,000 (a good value for SRP, not for PCA).
- `max_dim_allowed` (optional): The threshold over which the dimensionality reduction is applied. If None, it is always applied. Defaults to None.
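When `save_path` is None, the output folder is named after the current date. The sketch below reproduces that naming with Python's `datetime`, assuming every field is zero-padded (e.g. `202403070905`); the exact padding Net2Brain uses is an assumption, and `default_save_folder` is a hypothetical helper, not part of the toolbox.

```python
from datetime import datetime


def default_save_folder(now=None):
    """Hypothetical sketch of the folder name used when save_path=None.

    Assumes zero-padded fields in the order year, month, day, hour, minute.
    """
    if now is None:
        now = datetime.now()
    return now.strftime("%Y%m%d%H%M")


print(default_save_folder(datetime(2024, 3, 7, 9, 5)))  # 202403070905
```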

In [None]:
from net2brain.feature_extraction import FeatureExtractor

fx = FeatureExtractor(model='AlexNet', netset='Standard', device='cpu')
fx.extract(data_path=stimuli_path, save_path='AlexNet_Feat', consolidate_per_layer=False)

By default, __Net2Brain__ extracts features from the top-level blocks of the model. You can inspect which layers are selected by default through the `layers_to_extract` attribute:

In [None]:
fx.layers_to_extract

These are not all the layers that **can** be extracted. To see every layer that can be extracted, call `get_all_layers()`.

In [None]:
fx.get_all_layers()

If you wish to change the layers to be extracted, pass them to the `extract` function via the `layers_to_extract` parameter:
```
fx.extract(..., layers_to_extract=[your_layers])
```
or set `layers_to_extract="all"` to extract from all layers.
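When `get_all_layers()` returns a long list, a small helper can pick a subset by name prefix before you pass it to `layers_to_extract`. The helper `select_layers` and the layer names below are hypothetical; in the notebook you would use the list actually returned by `fx.get_all_layers()`.

```python
def select_layers(all_layers, prefixes):
    """Keep only the layer names that start with one of the given prefixes."""
    return [name for name in all_layers if name.startswith(tuple(prefixes))]


# Hypothetical layer names, standing in for the output of fx.get_all_layers()
all_layers = ["features.0", "features.3", "features.6", "classifier.1", "classifier.4"]

chosen = select_layers(all_layers, ["features"])
print(chosen)  # ['features.0', 'features.3', 'features.6']
```

The resulting list can then be passed as `fx.extract(..., layers_to_extract=chosen)`.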

- - - 

- - -

# Adding dimensionality reduction
If you wish, you can also reduce the dimensionality of the extracted features using:

- `srp` (Sparse Random Projection)
- `pca` (Principal Component Analysis)
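To illustrate what the `srp` option does, here is a minimal pure-Python sketch of sparse random projection. It follows the scheme documented for scikit-learn's `SparseRandomProjection` (entries are ±sqrt(1/density)/sqrt(n_components) with probability density/2 each, 0 otherwise, default density 1/sqrt(n_features)), together with the `max_dim_allowed` rule described above. This is a generic sketch, not Net2Brain's internal code; `sparse_random_projection` and `maybe_reduce` are hypothetical helpers, and PCA is not shown.

```python
import math
import random


def sparse_random_projection(features, n_components, density=None, seed=0):
    """Project each row of `features` (a list of lists) down to n_components values."""
    n_features = len(features[0])
    if density is None:
        density = 1 / math.sqrt(n_features)
    rng = random.Random(seed)
    scale = math.sqrt(1 / density) / math.sqrt(n_components)
    # Store the projection matrix sparsely: one {input_index: weight} dict per output dim
    matrix = []
    for _ in range(n_components):
        row = {}
        for j in range(n_features):
            u = rng.random()
            if u < density / 2:
                row[j] = scale
            elif u < density:
                row[j] = -scale
        matrix.append(row)
    return [[sum(w * x[j] for j, w in row.items()) for row in matrix] for x in features]


def maybe_reduce(features, n_components, max_dim_allowed=None):
    """Apply SRP only if max_dim_allowed is None or the feature size exceeds it."""
    n_features = len(features[0])
    if max_dim_allowed is None or n_features > max_dim_allowed:
        return sparse_random_projection(features, n_components)
    return features


rng = random.Random(1)
features = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]  # 4 "images"
reduced = maybe_reduce(features, n_components=50)
print(len(reduced), len(reduced[0]))  # 4 50
```

Because each projection entry has variance 1/n_components, squared vector norms are preserved in expectation, which is why SRP works well with a large number of components.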

In [None]:
from net2brain.feature_extraction import FeatureExtractor
fx = FeatureExtractor(model='AlexNet', netset='Standard', device='cpu')
fx.extract(data_path=stimuli_path, save_path='AlexNet_Feat_dim_red', dim_reduction="srp", n_components=50)

If you want to save the original features to disk but still reduce dimensionality for your analyses, you can also do this further down the pipeline when the features are loaded (see the 2_RDM_Creation notebook). In that case, set `dim_reduction` to None in the `extract` function.

---

---

# Extracting Features from Large Language Models

Net2Brain can also extract features from Large Language Models (LLMs) using .txt files. Simply provide the path to your .txt files, in which each line represents one sentence.

Since the features are saved per file, and a .txt file may contain multiple sentences, you can use `consolidate_per_txt_file()` to separate each sentence into its own .npz file!
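The cell below reads from a folder called `textinput_folder`. As a sketch, such a folder can be prepared with the standard library, writing one sentence per line as described above. The helper names `write_text_inputs` and `read_sentences` and the example sentences are hypothetical; `read_sentences` is only a sanity check of how many sentences each file holds, and how Net2Brain itself treats blank lines is not covered here.

```python
from pathlib import Path


def write_text_inputs(folder, files):
    """Write one .txt file per entry of `files`, one sentence per line."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for name, sentences in files.items():
        (folder / f"{name}.txt").write_text("\n".join(sentences) + "\n", encoding="utf-8")


def read_sentences(txt_file):
    """Return the non-empty lines of a .txt file, i.e. one entry per sentence."""
    lines = Path(txt_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


write_text_inputs("textinput_folder", {
    "story_a": ["The cat sat on the mat.", "It was a sunny day."],
    "story_b": ["A dog barked in the distance."],
})
print(read_sentences("textinput_folder/story_a.txt"))
```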

In [None]:
from net2brain.feature_extraction import FeatureExtractor


extractor = FeatureExtractor("facebook/bart-base", "Huggingface", device="cpu")
layers_to_extract = extractor.get_all_layers()
print(layers_to_extract)

extractor.extract(data_path="textinput_folder", 
                  save_path="LLM_output",
                  consolidate_per_layer=True)


- - -

- - -

# Extracting Features from Audio Models


You can also extract features from audio models using any kind of audio data, e.g. *.wav* or *.mp3* files. Simply provide the path to your audio files. Currently, we provide a selection of CNN-based and Transformer-based models, which are listed in the taxonomy notebook. Each model group contains multiple models that differ in design choices such as the time window used or the number of layers. Please refer to the paper of each model for more details:
* CNN-based models: [PANNS](https://arxiv.org/pdf/1912.10211)
* Transformer-based models: [AST](https://arxiv.org/pdf/2104.01778)

In the future we will also add more models to this list.
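If you do not have audio files at hand, a few synthetic test tones are enough to try the pipeline. The sketch below writes mono 16-bit PCM sine waves with Python's built-in `wave` module into a hypothetical folder `my_audios`, which you could then pass as `data_path`. The 16 kHz sample rate is an arbitrary choice here; whether and how the toolbox resamples audio for each model is not covered by this sketch.

```python
import math
import struct
import wave
from pathlib import Path


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, sample_rate=16000):
    """Write a mono 16-bit PCM sine tone to `path`."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


audio_dir = Path("my_audios")
audio_dir.mkdir(exist_ok=True)
for freq in (220.0, 440.0, 880.0):
    write_sine_wav(audio_dir / f"tone_{int(freq)}Hz.wav", freq_hz=freq)
```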

In [None]:
from net2brain.feature_extraction import FeatureExtractor


extractor = FeatureExtractor(model='PANNS_Cnn10', netset='Audio', device='cpu')
print(extractor.layers_to_extract)

extractor.extract(data_path="../net2brain/tests/audios",
                  save_path="Audio_output",
                  consolidate_per_layer=True)

- - - 

- - -

## Using `FeatureExtractor` with your own DNN

You can also incorporate your own custom model with __Net2Brain__. To do this, supply the `FeatureExtractor` with the following components:

1. Your model
2. An existing netset to fall back to (e.g. Standard, Clip, Pyvideo) when loading the data and applying standard functions.
3. Optionally, your custom transform function (if not provided, standard ImageNet transformations will be used)
4. Optionally, your custom extraction function (if not provided, the standard torchextractor will be used)
5. Optionally, your custom feature cleaner (if not provided, no cleaning will be done)
6. The specific layers you want to extract features from
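To make the roles of these components concrete, the conceptual sketch below shows the order in which they are applied to each stimulus: preprocess, extract the requested layers, then clean. It uses toy stand-ins (an "image" is a list of numbers and the "model" is a dict of functions) and the same argument order as the custom functions in the example further below. `run_pipeline` and the toy functions are hypothetical and do not reproduce Net2Brain's internal implementation.

```python
def run_pipeline(images, model, layers, preprocessor, extraction_function,
                 feature_cleaner, device="cpu", model_name="my_model"):
    """Conceptual order in which the custom components are called, per image."""
    results = []
    for image in images:
        data = preprocessor(image, model_name, device)
        features = extraction_function(data, layers, model)
        results.append(feature_cleaner(features))
    return results


# Toy stand-ins: each toy "layer" reads the input directly (a real network chains its layers)
toy_model = {"layer1": lambda x: [v * 2 for v in x], "layer2": lambda x: [sum(x)]}


def toy_preprocessor(image, model_name, device):
    return [v / 255 for v in image]  # scale pixel values to [0, 1]


def toy_extractor(preprocessed_data, layers_to_extract, model):
    return {name: model[name](preprocessed_data) for name in layers_to_extract}


def toy_cleaner(features):
    return features  # no cleaning


out = run_pipeline([[0, 255], [51, 102]], toy_model, ["layer1", "layer2"],
                   toy_preprocessor, toy_extractor, toy_cleaner)
print(out[0])  # {'layer1': [0.0, 2.0], 'layer2': [1.0]}
```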

In [None]:
from torchvision import models

# Define a model
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)  # This one exists in the toolbox as well, it is just supposed to be an example!


# Define extractor 
fx = FeatureExtractor(model=model, device='cpu')

# Run extractor
fx.extract(data_path=stimuli_path, save_path='ResNet50_Feat', layers_to_extract=['layer1', 'layer2', 'layer3', 'layer4'])

Here is an example with custom functions. Make sure the parameters of your custom functions match the ones used here.

In [None]:
from torchvision import models
from torchvision import transforms as trn
import torchextractor as tx
from net2brain.feature_extraction import FeatureExtractor

def my_preprocessor(image, model_name, device):
    """
    Args:
        image (Union[Image.Image, List[Image.Image]]): A PIL Image or a list of PIL Images.
        model_name (str): The name of the model, used to determine specific preprocessing if necessary.
        device (str): The device to which the tensor should be transferred ('cuda' for GPU, 'cpu' for CPU).

    Returns:
        Union[torch.Tensor, List[torch.Tensor]]: The preprocessed image(s) as PyTorch tensor(s).
    """

    # Standard ImageNet preprocessing: resize, convert to tensor, normalize
    transforms = trn.Compose([
        trn.Resize((224, 224)),
        trn.ToTensor(),
        trn.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    # Handle a list of images, as documented above
    if isinstance(image, list):
        return [transforms(img).unsqueeze(0).to(device) for img in image]

    # Add a batch dimension and move the tensor to the requested device
    return transforms(image).unsqueeze(0).to(device)


def my_extractor(preprocessed_data, layers_to_extract, model):
    # Create an extractor instance that records the outputs of the requested layers
    extractor_model = tx.Extractor(model, layers_to_extract)

    # Run the forward pass; the model output itself is not needed
    _, features = extractor_model(preprocessed_data)

    return features


def my_cleaner(features):
    # No cleaning: return the features unchanged
    return features


# Define a model
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)  # This one exists in the toolbox as well, it is just supposed to be an example!

# Define extractor (Note: NO NETSET NEEDED HERE)
fx = FeatureExtractor(model=model, device='cpu', preprocessor=my_preprocessor, feature_cleaner=my_cleaner, extraction_function=my_extractor)

# Run extractor
fx.extract(stimuli_path, layers_to_extract=['layer1', 'layer2', 'layer3', 'layer4'])
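A feature cleaner becomes useful when the raw output of a layer is not a single array. As a hypothetical example, assuming a layer returns a tuple whose first element is its main output (as many Transformer blocks do), a cleaner could keep only that first element. The function `unwrap_tuple_cleaner` is illustrative only, and plain strings stand in for tensors so the sketch runs without PyTorch.

```python
def unwrap_tuple_cleaner(features):
    """Hypothetical cleaner: keep only the first element of tuple-valued layer outputs."""
    cleaned = {}
    for layer, output in features.items():
        # Assumption: the first tuple element is the layer's main output
        cleaned[layer] = output[0] if isinstance(output, tuple) else output
    return cleaned


# Strings stand in for tensors here
raw = {"layer1": "hidden_1", "layer2": ("hidden_2", "attention_2")}
print(unwrap_tuple_cleaner(raw))  # {'layer1': 'hidden_1', 'layer2': 'hidden_2'}
```

It would be passed to the extractor like any other cleaner, via `feature_cleaner=unwrap_tuple_cleaner`.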