<a href="https://colab.research.google.com/github/IvaroEkel/Probabilistic-Machine-Learning_lecture-PROJECTS/blob/main/TEMPLATE_Probabilistic_Machine_Learning_Project_Report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Probabilistic Machine Learning - Project Report - fMRI Vector Embeddings

**Course:** Probabilistic Machine Learning (SoSe 2025)
**Lecturer:** Alvaro Diaz 
**Student(s) Name(s):**  Felix Filius
**GitHub Username(s):**  thehappyson
**Date:**  20.08.2025
**PROJECT-ID:** 13-1FFXXXX_fmri_vector_embeddings

---


## 0. Preface
This project is part of ongoing research at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig with the research group of Dr. Nico Scherf. The work outlined in this document is my own, done either as part of that research engagement or specifically for this course project. Where I relied on or used the work of other researchers, I highlight this in the text and mention their contributions. Proper citations are unfortunately not possible, as none of this work has been published yet. I highly appreciate the collaboration with and guidance from the team.

All code used and referenced here is located in the notebooks inside the repository.
Some path references in the notebooks might be broken because the code was run on an HPC cluster. I acknowledge the support of the Max Planck Computing and Data Facility, where the computations were performed on the HPC systems Raven and Robin.
I advise against re-running the code in the notebooks, as the memory and compute requirements are immense due to the size of the data and the models. I also stored fitted models and some visuals as separate files in the repository to allow more efficient work.

**Please consider the "CEBRA_one_subject.ipynb" notebook in the root folder of this project as the reference unless otherwise stated, as it contains the full workflow with multiple models and results.**

---

## 1. Introduction


The presented work is based on the research and data by [Finn, E.S., Corlett, P.R., Chen, G. et al. Trait paranoia shapes inter-subject synchrony in brain activity during an ambiguous social narrative](https://rdcu.be/eBHsQ)[^1]. The paper investigated a potential correlation between the neural activity of participants reacting to a stimulus and their character traits, in this case a continuous paranoia score.
The data used in this project is the dataset published with that paper and can be accessed here: https://openneuro.org/datasets/ds001338/versions/1.0.0 [^2]
The easiest way to download the data is with the AWS CLI:



```
%%sh
# Jupyter cell magic: %%sh must be the first line of the cell.
# The commands can also be run directly in a terminal.
# Downloads the data into a new folder "ds001338-download/" in the current working directory.

# Install the AWS CLI first if it is not available yet. The two commands below are for macOS
# (sudo asks for a password, so this step is easier to run in a terminal); instructions for
# Linux and Windows: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html#getting-started-install-instructions
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

# Download the data (anonymous access, no AWS account required)
aws s3 sync --no-sign-request s3://openneuro.org/ds001338 ds001338-download/
```

### Dataset overview

The dataset consists of the following components:
- an anatomical (structural) MRI scan for each of the 22 subjects as .nii.gz
- the functional (BOLD) scan of each subject, split into three runs due to the experimental setup
- parameter files of the scanner acquisition
- the stimuli used:
    - the story as .txt and .doc file
    - the audio recording of the story as played to the subjects

The technical details of the scans in the original experiment are not relevant for the scope of this project, as a thorough neurological evaluation is not feasible without extensive domain knowledge.
The neuroimaging data is stored in NIfTI format (.nii.gz files) and represents BOLD signal measurements across the voxels of the brain during the experimental task. A voxel is the smallest volume element the scanner resolves, roughly the 3D equivalent of a pixel. Each voxel has a measured activation for every timepoint of the scan; here the timesteps are 1 second each. "Activation" in this context means this BOLD signal, an indirect measure of all brain activity within the voxel rather than of one specific process. The primary challenge lies in extracting meaningful low-dimensional representations from this high-dimensional, noisy neural data while preserving the underlying neural dynamics and behavioral correlates.


### Motivation

Data from neuroscience experiments naturally has a temporal structure, as subjects are routinely exposed to a stimulus and their reaction to it is what is actually observed and studied. Conventional dimensionality reduction methods struggle to properly incorporate this temporal aspect. For functional data like the data explored here, which measures brain function during a task, the temporal structure is essential: the original experiment investigated how neural activation changes in response to different parts of the story. The CEBRA[^3] model can additionally incorporate auxiliary variables directly during training. Auxiliary variables, or behavioral data in this context, describe the action or stimulus during the functional recording. Incorporating them directly into the model could provide better insight into the relation between stimulus and neural activation.

Investigating functional brain activity promises insight into our current understanding of the brain and of how we perceive things. Can certain traits of a person dictate their perception of reality, and are we able to construct something like an objective reality? This research is also closely tied to prior work on brain waves and neurostimulation: identifying activation patterns in the latent space could in turn enable experiments that use these patterns to elicit specific neurological reactions.
While the current research is at an early stage and purely exploratory, without a concrete hypothesis, its potential applications and implications make it an interesting and worthwhile project.


## 2. Data Loading and Exploration


### Basic Data Exploration
The raw data cannot be explored meaningfully as is: loading it requires specialized libraries and tools, and processing it correctly requires extensive domain knowledge in radiology and/or neuroscience.

The exploration therefore begins after some well-established preprocessing steps, so that the data can actually be interpreted with regard to our problem.

#### Text
The story that was read to the subjects during the scan is unstructured text. By itself, it holds limited information relevant to our research.
For this reason, the story was not reviewed or explored in its unprocessed form.

#### Neural Data
The neural data is loaded from NIfTI files. As mentioned before, this means the data is not fully raw out of the scanner: it was converted from the DICOM format into the .nii.gz format to make it usable with analysis libraries.



## 3. Data Preprocessing

### Textual Data Preprocessing
To work with the story in the context of the latent space analysis, it was converted into a CSV file and split into small chunks, each corresponding to an individual timepoint of the recording. The timepoints follow the time slices of the fMRI scanning procedure, which aligns the text the subject heard with the functional response we observed. The file with these 'tokens' can be found in the project repository as 'tokens.csv'. This chunking and timepoint alignment had already been done manually.
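The manual alignment is not documented in detail. As a hedged sketch of the idea, assuming word-level onset times in seconds were available (the function and variable names are hypothetical), each word can be assigned to the TR in which it starts, using the 1 s repetition time of this dataset:

```python
from collections import defaultdict

def align_words_to_trs(words, onsets_s, n_trs, tr_s=1.0):
    """Group words into one text chunk per TR, based on each word's onset time (hypothetical helper)."""
    chunks = defaultdict(list)
    for word, onset in zip(words, onsets_s):
        tr_index = int(onset // tr_s)  # TR in which the word starts
        if 0 <= tr_index < n_trs:
            chunks[tr_index].append(word)
    # TRs in which no word started (e.g. pauses) get an empty string
    return [" ".join(chunks[i]) for i in range(n_trs)]

words = ["The", "boat", "left", "the", "dock", "at", "dawn"]
onsets = [0.1, 0.5, 1.2, 1.6, 1.9, 3.0, 3.4]
print(align_words_to_trs(words, onsets, n_trs=5))
# ['The boat', 'left the dock', '', 'at dawn', '']
```

This ignores the hemodynamic delay of the BOLD response, which peaks a few seconds after the stimulus; in practice the text chunks may need to be shifted relative to the scans, or the lag modeled explicitly.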

Afterwards, the story was labelled with the help of an LLM, namely the Claude Sonnet model[^4], via API calls. Every token was sent to the API together with a system prompt explaining the task and a prompt containing the instructions and a sliding window of story context. Each token received labels in four categories, each with a confidence score. This approach is outlined in more detail in Section 4.
The labelled and unlabelled texts were embedded using the 'BAAI/bge-large-en-v1.5' model[^5].


### Neural Data Processing
The preprocessing of the neural data was handled by subject-matter experts within the research group and had already been done when I joined the project. I still attempted it myself to identify problems and to understand the process at least roughly.

The entire workflow can be seen in the "MRI_Data_Processing.ipynb" and "cebra_demo.ipynb" notebooks. The demo notebook only contains the numerical adjustments and some raw data inspection.

The first step was the so-called alignment (registration) of the functional scan to the anatomical scan of the brain and its mapping to a reference anatomical model. This step is where my own preprocessing pipeline failed, because mapping the functional scan to a reference model requires considerable domain knowledge. Mapping to a reference model ensures that the data is comparable across subjects, i.e. that the same coordinates refer to the same brain location in every subject. This is similar to genetics, where a human reference genome is used during analysis. Although my alignment was failing, I still carried out the temporal and spatial smoothing.

![Effect of preprocessing](./output/visuals/raw_vs_smoothed_signal.png "Preprocessing effect")

The smoothing and normalization worked and reduced the noise considerably. However, since the initial alignment and mapping did not work as intended, I could not use my results and instead relied on the data already prepared by another researcher, as mentioned above.
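The exact steps are in the notebooks. A minimal sketch of spatial smoothing, temporal smoothing and per-voxel z-scoring with SciPy follows; the 3 mm voxel size, the 6 mm FWHM kernel and the Gaussian temporal filter are illustrative assumptions, not the parameters used in the project (fMRI pipelines often use band-pass filtering in time instead):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

# Conversion factor from a kernel's full width at half maximum to its standard deviation
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # about 0.4247

def smooth_and_normalize(bold, voxel_size_mm=3.0, fwhm_mm=6.0, temporal_sigma_trs=1.0):
    """Smooth a 4D (x, y, z, t) BOLD array in space and time, then z-score each voxel.

    Voxel size, kernel width and temporal sigma are assumed example values.
    """
    bold = np.asarray(bold, dtype=float)
    sigma_vox = fwhm_mm * FWHM_TO_SIGMA / voxel_size_mm
    # Spatial smoothing: Gaussian over x, y, z; sigma 0 leaves the time axis untouched
    smoothed = gaussian_filter(bold, sigma=(sigma_vox, sigma_vox, sigma_vox, 0))
    # Temporal smoothing: 1D Gaussian along the time axis only
    smoothed = gaussian_filter1d(smoothed, sigma=temporal_sigma_trs, axis=-1)
    # z-score every voxel's time series; constant voxels are divided by 1 instead of 0
    mean = smoothed.mean(axis=-1, keepdims=True)
    std = smoothed.std(axis=-1, keepdims=True)
    return (smoothed - mean) / np.where(std > 0, std, 1.0)

rng = np.random.default_rng(0)
toy_bold = rng.standard_normal((10, 10, 10, 60))  # toy data: 10x10x10 voxels, 60 TRs
clean = smooth_and_normalize(toy_bold)
print(clean.shape)  # (10, 10, 10, 60)
```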

The resulting data for a single subject is a 2-dimensional matrix with the shape (1311, 139501). The first dimension is time: there are 1311 timepoints per subject, i.e. roughly 22 minutes at one scan per second.
The second dimension represents the voxels of the brain. The fMRI scanner records the entire brain once per time unit (once per second in this experiment), which results in a 4D data structure: three spatial dimensions for the location of each voxel in the brain and a fourth dimension for time, holding the BOLD signal of each voxel at each timepoint. Flattening the three spatial dimensions into a single voxel axis (typically keeping only voxels inside the brain) gives the 2D matrix above.
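The step from the 4D scan to the 2D matrix can be sketched with NumPy. This assumes the 4D array has already been loaded (for example with nibabel's `nibabel.load(path).get_fdata()`) and that a boolean brain mask selects the in-brain voxels; the 139,501 columns presumably correspond to such masked voxels, but the exact mask used by the group is an assumption here. The helper name is hypothetical:

```python
import numpy as np

def to_time_by_voxel(bold_4d, mask_3d):
    """Convert a 4D (x, y, z, t) array into a (t, n_voxels) matrix using a 3D brain mask."""
    # Boolean indexing with a mask over the first three axes gives shape (n_voxels, t)
    voxel_by_time = bold_4d[mask_3d.astype(bool)]
    return voxel_by_time.T  # rows = timepoints, columns = voxels

rng = np.random.default_rng(0)
bold = rng.standard_normal((4, 5, 6, 10))  # toy scan: 4x5x6 voxels, 10 TRs
mask = np.zeros((4, 5, 6), dtype=bool)
mask[1:3, 1:4, 2:5] = True                 # 2*3*3 = 18 "brain" voxels
X = to_time_by_voxel(bold, mask)
print(X.shape)  # (10, 18)
```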


With the data prepared, we can move on to the next step and start working with our models to gain insight from the data.





## 4. Probabilistic Modeling Approach

In this section, the different probabilistic models and their application in the scope of this project are outlined, starting with the LLM used for labelling the story, followed by the contrastive learning model, which is the main focus of the project. The dimensionality reduction methods addressed afterwards contribute less to the project outcome.

### 4.1 Claude
The commercial model Claude (claude.ai)[^4] was used via its API to label the text. As the prompt below shows, the context window ranged from -5 to +5: the model saw the current token it was to label together with the previous five and the next five tokens, to get some context for the current token.
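The -5/+5 context can be built with a small helper. This is a sketch: the function names are hypothetical, but the dictionary keys mirror the `context_info` fields referenced in the prompt template, and the `[none]` fallback matches the prompt's handling of empty context:

```python
def context_window(tokens, index, before=5, after=5):
    """Return the target token with up to `before` preceding and `after` following tokens."""
    return {
        "target_token": tokens[index],
        "context_before": tokens[max(0, index - before):index],  # clipped at the story start
        "context_after": tokens[index + 1:index + 1 + after],    # slicing clips at the story end
    }

def format_context(tokens):
    """Join context tokens as in the prompt, using '[none]' when there is no context."""
    return " ".join(tokens) if tokens else "[none]"

story = [f"chunk{i}" for i in range(12)]
ctx = context_window(story, 2)
print(format_context(ctx["context_before"]))                       # chunk0 chunk1
print(format_context(context_window(story, 11)["context_after"]))  # [none]
```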

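The prompt template that was used (a Python f-string, so the doubled braces are escapes that appear as single braces in the prompt sent to the model):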
```
You are a semantic token labeling expert. Analyze the given token within its context and provide labels for these four categories:

1. **location** - Physical places, geographical locations, spatial references
2. **characters** - People, character names, pronouns referring to people, roles
3. **emotions** - Emotional states, feelings, mood descriptors
4. **time** - Temporal references, time periods, time-related words

**Target Token:** "{context_info['target_token']}"

**Context Before:** {' '.join(context_info['context_before'][-5:]) if context_info['context_before'] else '[none]'}
**Context After:** {' '.join(context_info['context_after'][:5]) if context_info['context_after'] else '[none]'}

For each category, provide:
- **value**: The specific semantic label (e.g., "office", "Dr. Carmen Reed", "tired", "afternoon") or "null" if not applicable
- **confidence**: Float between 0.0-1.0 indicating your confidence in the label

Make sure the character you assign is one of the following and no other:
- Dr. Carmen Reed
- Dr. John Torreson
- Antonio
- Juan Torres
- Alba
- Maria
- Linda
- Ramiro
- Boat Driver
- Alba's Mother

Remember that you are not strictly constrained to one value per label. If you assign multiple values to one category for any given token make sure it is complying with the JSON formating restrictions:
Consider the context when labeling. If the token doesn't directly contain a category but the context suggests it should be labeled (e.g., pronouns referring to characters, implicit time/location), include those labels.
Null values should be avoided and only used in scenarios where you a are completely unsure about the label. A value with a low confidence is better than a null value in most cases.

Respond in this exact JSON format:
{{
  "location": {{"value": "label_or_null", "confidence": 0.0}},
  "characters": {{"value": "label_or_null", "confidence": 0.0}},
  "emotions": {{"value": "label_or_null", "confidence": 0.0}},
  "time": {{"value": "label_or_null", "confidence": 0.0}}
}}
```

The categories to label each token in were:
- location: Where is the current scene set?
- characters: Who is acting in the current scene?
- emotions: What emotion is the scene conveying?
- time: What is the current time in the story? (not to be confused with the timestamps of the experiment)

These categories were chosen because they cover well-known triggers of responses in humans, each possibly processed in distinct brain regions. The emotions category was of particular interest, as the original story from the paper was written to be intentionally emotionally ambiguous in order to study the different responses of participants with different levels of paranoia.

The strict requirement to format the response in JSON enabled easier extraction of the values and handling in downstream tasks.
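A sketch of that extraction step, assuming the response text is the bare JSON object requested in the prompt (the helper name is hypothetical). Values are kept as returned, so a category with several values stays a list, and the literal string `"null"` is mapped to `None`:

```python
import json

CATEGORIES = ("location", "characters", "emotions", "time")

def parse_labels(response_text):
    """Parse one JSON answer into {category: (value, confidence)}."""
    data = json.loads(response_text)
    labels = {}
    for category in CATEGORIES:
        entry = data.get(category) or {}
        value = entry.get("value")
        if value == "null":  # the prompt asks for the string "null" when no label applies
            value = None
        confidence = float(entry.get("confidence", 0.0))
        labels[category] = (value, min(max(confidence, 0.0), 1.0))  # clamp to [0, 1]
    return labels

answer = ('{"location": {"value": "boat", "confidence": 0.8}, '
          '"characters": {"value": ["Alba", "Boat Driver"], "confidence": 0.6}, '
          '"emotions": {"value": "null", "confidence": 0.0}, '
          '"time": {"value": "morning", "confidence": 1.3}}')
labels = parse_labels(answer)
print(labels["time"])  # ('morning', 1.0)
```

LLMs sometimes wrap JSON in Markdown code fences despite the instruction; such responses would need to be stripped before `json.loads`.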

The commercial LLM fits the probabilistic focus of this project: its output is sampled, so repeated runs are unlikely to label the story identically. The results can be found as JSON files in the "results" folder of this project.

### 4.2 CEBRA: "Consistent EmBeddings of high-dimensional Recordings using Auxiliary variables"
CEBRA[^3] is a contrastive learning model developed specifically for the analysis of neural data with a temporal component. The original paper relied on neural recordings with a much higher temporal resolution than fMRI, such as electrophysiological recordings. While the fMRI data here contains one scan per second, methods like EEG sample in the millisecond range, which provides far more samples within any time window and therefore allows more consistent temporal smoothing.

The model was selected because the paper that introduced it showed promising results for extracting latent components from high-dimensional and highly specialised neural data. Its built-in consideration of time, which is almost always available in neuroscience experiments, addresses a considerable drawback of conventional models. The ability of the contrastive learning model to also incorporate behavioral data during training allows a unique combination of the observed neural activation and the stimulus triggering it. In theory, this should yield a much more meaningful latent representation.

It remains to be seen whether CEBRA is also suitable for fMRI data, as its temporal resolution is considerably lower and the model explicitly relies on temporal distances to build contrastive examples: positive pairs are sampled within a set time window, and negative samples are drawn from elsewhere in the recording.
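To make the sampling idea concrete, here is a simplified NumPy sketch of time-contrastive sampling and an InfoNCE-style loss. It is a generic illustration, not CEBRA's implementation: positives are taken a fixed `time_offset` steps after the reference, negatives are drawn uniformly from the whole recording, and similarity is either cosine or negative squared Euclidean distance (mirroring the `distance` options discussed in Section 5). At one scan per second, a `time_offset` of 10 already spans 10 seconds of the story.

```python
import numpy as np

def sample_time_contrastive(n_timepoints, batch_size, time_offset, rng):
    """Indices of reference samples, positives `time_offset` steps later, and uniform negatives."""
    ref = rng.integers(0, n_timepoints - time_offset, size=batch_size)
    pos = ref + time_offset
    neg = rng.integers(0, n_timepoints, size=batch_size)
    return ref, pos, neg

def _unit(z):
    """Scale each row to unit length."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def infonce_loss(z_ref, z_pos, z_neg, temperature=0.1, distance="cosine"):
    """Mean InfoNCE loss: each reference must pick its positive out of positive + all negatives."""
    if distance == "cosine":
        z_ref, z_pos, z_neg = _unit(z_ref), _unit(z_pos), _unit(z_neg)
        pos_sim = np.sum(z_ref * z_pos, axis=1)  # (B,)
        neg_sim = z_ref @ z_neg.T                # (B, B)
    elif distance == "euclidean":
        pos_sim = -np.sum((z_ref - z_pos) ** 2, axis=1)
        neg_sim = -np.sum((z_ref[:, None, :] - z_neg[None, :, :]) ** 2, axis=2)
    else:
        raise ValueError(f"unknown distance: {distance}")
    logits = np.concatenate([pos_sim[:, None], neg_sim], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
    log_p_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))  # log-softmax of the positive
    return float(-log_p_pos.mean())

rng = np.random.default_rng(0)
ref, pos, neg = sample_time_contrastive(n_timepoints=1311, batch_size=64, time_offset=10, rng=rng)
# Toy embeddings standing in for the encoder outputs at those indices
z_ref = rng.standard_normal((64, 8))
z_neg = rng.standard_normal((64, 8))
aligned = infonce_loss(z_ref, z_ref + 0.05 * rng.standard_normal((64, 8)), z_neg)
unrelated = infonce_loss(z_ref, rng.standard_normal((64, 8)), z_neg)
print(aligned < unrelated)  # True
```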

### 4.3 GMM
A Gaussian mixture model (GMM) was used to identify clusters in the text embeddings that were created together with the assigned labels. This proved difficult because the model ran into memory constraints. The final results of the GMM were barely meaningful: the dimensionality had to be reduced so strongly before training that the model was not usable in this scenario.
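As a hedged sketch of this step (the reduction size, covariance type and candidate cluster counts are assumptions, not the settings used in the notebook), the embeddings can be reduced with PCA and the number of mixture components chosen by the Bayesian information criterion. A diagonal covariance keeps the per-component parameter count linear in the number of dimensions instead of quadratic, which may ease the memory constraints mentioned above. The toy data is 1024-dimensional, matching the output size of bge-large-en-v1.5:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_gmm_with_bic(embeddings, n_pca=10, k_range=range(2, 9), seed=0):
    """PCA-reduce the embeddings, fit one GMM per k and keep the one with the lowest BIC."""
    reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(embeddings)
    best_gmm, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="diag", random_state=seed)
        gmm.fit(reduced)
        bic = gmm.bic(reduced)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm, reduced

# Toy data: three well-separated groups of 1024-dimensional "embeddings"
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 1024))
toy = np.vstack([c + rng.standard_normal((100, 1024)) for c in centers])
gmm, reduced = fit_gmm_with_bic(toy)
print(gmm.n_components)  # 3
```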

### 4.4 UMAP
UMAP was used once to represent the latents learned by CEBRA differently, and once to construct a different latent space directly from the subject data.

Given the size of the neural data matrix, UMAP could not run in a manageable time: each of the 1311 timepoints is a vector of over 130k voxel values, so every neighbor comparison operates in more than 130k dimensions, and the computation on the full matrix could easily take days. The data therefore had to be reduced in dimensionality to make UMAP feasible at all, by applying PCA and keeping the first 50 principal components. The result is less faithful to the data than a run on the unreduced matrix would be, but it was still very distinct. The limitation and the result are discussed in the appropriate section later.
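One way to quantify how much is lost in this reduction is the fraction of total variance the 50 components retain. A short sketch with scikit-learn, using randomized SVD, which avoids a full decomposition of a wide matrix; the random matrix is only a stand-in for `patient_data` (1311 x 139501 in this project), and the helper name is hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_with_pca(X, n_components=50, seed=0):
    """Project X (samples x features) onto its first principal components.

    Returns the reduced data and the fraction of total variance those components keep.
    """
    pca = PCA(n_components=n_components, svd_solver="randomized", random_state=seed)
    reduced = pca.fit_transform(X)
    return reduced, float(pca.explained_variance_ratio_.sum())

rng = np.random.default_rng(0)
X_toy = rng.standard_normal((300, 2000))  # stand-in: 300 timepoints x 2000 "voxels"
reduced, kept = reduce_with_pca(X_toy)
print(reduced.shape)  # (300, 50)
```

Reporting `kept` for the real data would make the limitation discussed later concrete.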

### 4.5 t-SNE
t-SNE was used in the project to create a low-dimensional representation of the subject's functional data for comparison with the results of the CEBRA model. It was also applied to the embeddings learned by CEBRA to create a different visual representation of the same space.
Given the size of the matrix, applying t-SNE to the entire dataset was not feasible: similar to UMAP, training would likely have taken days, if memory allowed it at all. t-SNE therefore used the same 50 PCA components as UMAP. This constraint limits the quality of the results in the same way, but they still illustrate the difference to CEBRA well enough.


### Code for the comparison of the models:

```python
# Compare CEBRA to PCA, t-SNE and UMAP.
# PCA runs on the original neural data; t-SNE and UMAP run on a 50-component PCA reduction,
# because the full (timepoints x voxels) matrix is too large for them.
# Assumes `patient_data` (timepoints x voxels) and `models`, a list of (name, latent) pairs
# for the three CEBRA models, were created in earlier notebook cells.
import numpy as np
import matplotlib.pyplot as plt
import umap
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

neural_corrected = patient_data
n_timepoints = len(neural_corrected)
time_colors = np.arange(n_timepoints)  # color encodes the timepoint

# The baselines do not depend on the CEBRA model, so they are computed once, not once per row
pca_result = PCA(n_components=2).fit_transform(neural_corrected)
neural_reduced = PCA(n_components=50).fit_transform(neural_corrected)
tsne_result = TSNE(n_components=2, random_state=42,
                   perplexity=min(30, n_timepoints // 4)).fit_transform(neural_reduced)
umap_result = umap.UMAP(n_components=2, random_state=42,
                        n_neighbors=min(15, n_timepoints // 10),
                        min_dist=0.1).fit_transform(neural_reduced)

# One row per CEBRA model, one column per method (3x4 for three models)
fig, axes = plt.subplots(len(models), 4, figsize=(20, 4 * len(models)), squeeze=False)

for i, (model_name, latent_data) in enumerate(models):
    panels = [
        ("CEBRA", latent_data[:, :2]),  # first two CEBRA dimensions
        ("PCA", pca_result),
        ("t-SNE", tsne_result),
        ("UMAP", umap_result),
    ]
    for j, (method, coords) in enumerate(panels):
        axes[i, j].scatter(coords[:, 0], coords[:, 1],
                           c=time_colors, cmap='viridis', s=15, alpha=0.7)
        axes[i, j].set_title(f'{model_name}\n{method}')

plt.tight_layout()
# Save before plt.show(): after show() the figure may already be closed and the saved file blank
plt.savefig('./output/visuals/latents_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
```

## 5. Model Training and Evaluation

### Model Training 
The training process for CEBRA is very straightforward as it is shipped in its own package where all required functions are implemented. 

The first step is to define the model we want to use with its hyperparameters:

```python
import cebra

model10 = cebra.CEBRA(
    model_architecture='offset10-model',
    batch_size=1024,
    learning_rate=3e-4,
    output_dimension=8,   # 8 latent dimensions (3 would allow direct 3D plotting)
    num_hidden_units=512,
    max_iterations=10000,
    distance='euclidean',
    conditional='time',   # time-contrastive learning: positives chosen by temporal proximity
    verbose=True
)
```
The `model_architecture` parameter selects one of the encoder architectures that ship with the CEBRA package. According to the CEBRA documentation, the number in names like `offset10-model` is the receptive field of the encoder, i.e. the number of consecutive timepoints it sees to compute one embedding (`offset1-model` uses a single timepoint); the temporal distance between positive pairs is set separately via the `time_offsets` parameter (default 1). In the notebook I used the variants with offset 10, 5 and 1 to compare them.
`batch_size` is the number of samples drawn per training step. It can be set to `None` to train on all data in one batch if memory allows. A comparatively large batch size was chosen here because of the dimensions of the data.
`output_dimension` sets the dimensionality of the latent space. Setting it to 3 yields a latent space that can be plotted directly; I chose 8 because, according to other researchers in the group, this is common practice for this kind of data.
The number of hidden units, the maximum number of iterations and the learning rate play their usual roles as in other neural network models.
The `distance` parameter determines the similarity measure used to compare positive and negative pairs in the latent space. Cosine is the default, but Euclidean distance reportedly performs better for data with actual spatial relationships, like the anatomical space of fMRI. This information also came from other researchers, and I relied on it during this phase of the project. `conditional='time'` selects time-contrastive learning, where positive samples are chosen only by their temporal distance to the reference sample.
The `verbose` parameter toggles the progress bar during training.

Many more parameters can be set according to the documentation, but I kept them at their defaults, as I aimed to observe the differences between the offset architectures given the already mentioned low temporal resolution of the data.

The actual training is then done simply by:

```
model10.fit(patient_data)
```
Then the latent space is constructed with:

```
latent_10off = model10.transform(patient_data)
```
The transformation step computes the latent embedding of the specified data with the fitted model.
`patient_data` is the preprocessed neural data of the subject. If auxiliary variables are prepared, they are passed to `fit` together with the neural data, e.g. `model10.fit(patient_data, auxiliary_labels)` (the name `auxiliary_labels` is illustrative).

### Model Evaluation

![Comparison of latent construction between three different CEBRA models, PCA, t-SNE and UMAP](./output/visuals/latents_comparison.png "Latent space comparison")
The color coding indicates time: dark purple marks the earliest timepoint and bright yellow the latest, so following the color gradient follows the course of the experiment.

- The graphic clearly shows that CEBRA identifies a continuous temporal pattern in all three offset modes, which moves somewhat smoothly through the latent space.
- PCA breaks down completely, with a single outlier dominating one of the two PCs. Removing the outlier would likely not help much, as PCA is a linear projection without any notion of time and cannot represent the complex temporal flow.
- The PCA, t-SNE and UMAP panels are identical in all three rows, because these baselines are computed from the same neural data and do not depend on the CEBRA model; t-SNE and UMAP were also run with fixed random seeds. Beyond that, both t-SNE and UMAP produce one large cluster without visible structure and disregard the temporal pattern completely.

The CEBRA models live up to the expectation: they capture and present the temporal aspects of the data in a meaningful way while also learning structured patterns, which the more classical methods are not able to do.



## 6. Results

### CEBRA models - three different offset models
![Latent embeddings of the three CEBRA offset models](./output/visuals/cebra_models.png)

[3D interactive visual of the latent space (HTML)](./output/visuals/neural_cebra_comparison(2).html)

The different offset architectures of the CEBRA models produce extremely similar results at first glance, but some smaller differences in the path through the latent space are already visible and warrant a closer look.

!["Comparison of the step-wise consistency between models"](./output/visuals/step-wise-consistency.png)

Comparing the step-wise consistency of the different models reveals that the offset-10 and offset-5 models behave very similarly, with the offset-10 model being even more consistent within itself. The offset-1 model, however, clearly shows large inconsistencies at some timepoints.
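The exact definition of step-wise consistency is in the notebook. One plausible version, given here as a hypothetical sketch, is the Euclidean distance between consecutive latent points, with unusually large jumps flagged by a z-score threshold (the threshold of 3 is an assumption):

```python
import numpy as np

def stepwise_distances(latent):
    """Distance between consecutive timepoints of a (T, d) latent trajectory, shape (T-1,)."""
    return np.linalg.norm(np.diff(latent, axis=0), axis=1)

def jump_timepoints(latent, z_thresh=3.0):
    """Timepoints reached by an unusually large step (z-score of the step above z_thresh)."""
    steps = stepwise_distances(latent)
    z = (steps - steps.mean()) / steps.std()
    return np.flatnonzero(z > z_thresh) + 1

t = np.linspace(0.0, 1.0, 100)
trajectory = np.stack([t, t ** 2], axis=1)  # smooth toy path in 2D
trajectory[50:] += 5.0                      # one abrupt jump into timepoint 50
print(jump_timepoints(trajectory))  # [50]
```

Plotting `stepwise_distances` for each model on a shared time axis is one way to produce a comparison of this kind, and the flagged timepoints could then be checked against the story timeline.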

Using UMAP and t-SNE to illustrate the learned latent differently shows even more similarities but also highlights some inconsistencies previously not seen:

!["CEBRA learned latent and the same latent projected into 2d via t-SNE and UMAP"](./output/visuals/cebra_visual_comparison.png)

This approach leverages the fact that the latent space constructed by CEBRA is 8-dimensional, as specified for the models. This size is very manageable for t-SNE and UMAP. All three panels show the same latent space projected into 2D with the respective method. The offset-1 model appears more fragmented in the t-SNE and UMAP figures; this might also be the case in the CEBRA figure but is hard to tell due to the nature of the plot. This fragmentation is to be expected for the offset-1 model, which uses the shortest temporal context of the three.

The offset-10 and offset-5 models both show a smooth progression through the latent space over time, and the UMAP visual is even more consistent than the t-SNE one. Where this difference stems from, and which is the better representation, is not readily visible from the figures.
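One way to make such comparisons less dependent on visual impression is scikit-learn's `trustworthiness` score, which measures how well local neighborhoods of the 8-dimensional latent survive a 2D projection: values close to 1 mean that points appearing as neighbors in 2D are also neighbors in the latent space. A sketch with t-SNE follows; the random walk is only a stand-in for a CEBRA latent, the helper name is hypothetical, and the same `trustworthiness` call could be applied to a UMAP projection:

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

def project_and_score(latent, n_neighbors=10, seed=42):
    """Project a (T, d) latent to 2D with t-SNE and score neighborhood preservation."""
    perplexity = min(30, max(5, len(latent) // 4))  # t-SNE requires perplexity < n_samples
    embedded = TSNE(n_components=2, perplexity=perplexity,
                    init="pca", random_state=seed).fit_transform(latent)
    return embedded, trustworthiness(latent, embedded, n_neighbors=n_neighbors)

rng = np.random.default_rng(0)
toy_latent = np.cumsum(rng.standard_normal((200, 8)), axis=0)  # random walk as a smooth 8-D path
embedded, score = project_and_score(toy_latent)
print(embedded.shape)  # (200, 2)
```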

#### Comparison to text embeddings

Since the embeddings of the labelled text proved to be very inconsistent and not expressive after continued experimentation, a more naive approach was chosen.

The embeddings were computed directly on the unlabelled text, with the labels to be reintroduced later after clustering. Since this was a recent change of direction in the research group, I can only provide the first step of the new process as of now.
For this the text embeddings were directly used to train a CEBRA model and create a latent space with the offset-10 model.

!["Text embeddings latent by CEBRA compared to neural latents"](./output/visuals/neural_textual_comparison.png)

Interestingly, the semantic space seems to be even more sparse than the neural space. However, it is too early to draw conclusions from this, as further analysis is both required and planned.

## 7. Discussion

### Interpretation of Results

The results of the different CEBRA models indicate that, while the overall structure of the latent space is robust, the models learn different features that nonetheless lead to a similar overall picture.

A correlation analysis between the models seems to confirm this:
![Correlation analysis between CEBRA models](./output/visuals/inter_model_comparison.png)

While the overall correlation between the models is weak, certain combinations of dimensions are strongly positively or negatively correlated. If the models learned the same representations for the same features, we would expect strong correlations throughout. The overall weak correlations therefore suggest that the models capture different neural dynamics. While this might seem counterintuitive, it is actually reasonable: models with different offsets learn on different temporal scales of brain activity. The offset-1 model learns the "quick" changes from one state to the next, while the offset-10 model learns longer patterns, which could correspond to longer thought processes or more complex cognitive patterns.
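A sketch of the per-dimension correlation together with a complementary check. Contrastive losses based on cosine or Euclidean similarity do not change when the latent space is rotated, so two models can learn equivalent spaces whose individual dimensions correlate only weakly; fitting a linear map from one latent to the other and reporting R² tests for that case. The function names are hypothetical, and toy data stands in for the model outputs:

```python
import numpy as np

def cross_model_correlation(latent_a, latent_b):
    """Pearson correlation of every dimension of latent_a with every dimension of latent_b."""
    d_a = latent_a.shape[1]
    full = np.corrcoef(latent_a, latent_b, rowvar=False)  # (d_a + d_b) x (d_a + d_b)
    return full[:d_a, d_a:]                                # block: a-dimensions vs b-dimensions

def linear_fit_r2(latent_a, latent_b):
    """R^2 of the best affine map from latent_a to latent_b (least squares)."""
    design = np.column_stack([latent_a, np.ones(len(latent_a))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, latent_b, rcond=None)
    residual = latent_b - design @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((latent_b - latent_b.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
latent_a = rng.standard_normal((500, 8))
rotation, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
latent_b = latent_a @ rotation                            # same space, rotated
print(round(linear_fit_r2(latent_a, latent_b), 6))  # 1.0
```

Here the per-dimension correlations between `latent_a` and `latent_b` are mostly moderate, even though both describe exactly the same space; applying both checks to the real CEBRA latents would help separate "different dynamics" from "same dynamics, different orientation".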

Overall, the CEBRA models capture and illustrate the temporal aspect of neural data well and seem promising for further analysis and for neuroscience research. In comparison, t-SNE and UMAP only show a cluster of seemingly random dots and remove the temporal dynamics from the latent space entirely. This is a clear advantage of CEBRA, as the temporal dimension is of immense value to neuroscience.

### Limitations of the approach
One obvious limitation, already addressed early on, is the lack of models against which to meaningfully compare the CEBRA results. The reason for this is threefold:
1. This specific field at the intersection of neuroscience and computer science is still very new, and few publications apply different models to similar problems, or address similar problems at all.
2. Conventional models that could serve as an easy baseline, like t-SNE, are not suitable for capturing the temporal complexity.
3. A model would need to deliver results in a form that allows meaningful comparison, as these neuroscience topics often lack ground truths to compare against.

The incomplete pipeline for the semantic embeddings of the story text hindered the progress of the project and prevented including the stimulus directly in the model. This is a key capability of CEBRA that we have not yet been able to investigate.

### Possible improvements or extensions

Future research could develop other specialised models to compete with CEBRA, in order to identify recurring neural representations in the latent space and possibly common latent components. Further investigation into enriching the semantic text embeddings with labels before the embedding process could also prove worthwhile, if a meaningful enrichment can be confirmed during clustering.

Generally, more repeated testing with hyperparameter tuning and comparisons against random noise seem imperative, to rule out that the CEBRA model produces visually convincing structure without an actually meaningful or robust underlying representation.

It should also be critically evaluated whether the different neural dynamics learned by the different offset models are actually present in the brain activation patterns. The short-offset model in particular might be prone to picking up fragments, as the BOLD signal lags behind neural activity and remnants of an activation can still be visible at the next timepoint. The preprocessing generally tries to account for this, but a critical check is in order given how different the results are.

## 8. Conclusion

The interpretation and preparation of neural data requires extensive domain knowledge, hence I focused on the exploration of the latent space trying to identify patterns in neural activation.
After initial challenges in training the models, I was able to show the expected strong temporal consistency of the neural activations throughout the observed time. The emerging complex activation pattern was captured similarly by different model architectures, suggesting robust findings, with expected differences due to the architectures. The different representations learned by the offset models and the maintained temporal pattern are consistent with what would be expected given the data, suggesting that CEBRA can offer a considerable benefit to neuroscience, specifically to the analysis of behaviorally driven neural data.
Inconsistencies in the embedding structure when labels were used indicated that this approach was flawed in the setup used and warrants further investigation.
Going forward the training of the CEBRA model with auxiliary variables will be the main focus to investigate the possibility of extracting patterns from the resulting latent space. It might also be worth it to identify emotional triggers within the story and investigate them against expected activation in certain brain regions.


## 9. References


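[^1]: Finn, E.S., Corlett, P.R., Chen, G. et al. Trait paranoia shapes inter-subject synchrony in brain activity during an ambiguous social narrative. Nature Communications 9, 2043 (2018). https://rdcu.be/eBHsQ

[^2]: OpenNeuro dataset ds001338, version 1.0.0. https://openneuro.org/datasets/ds001338/versions/1.0.0

[^3]: Schneider, S., Lee, J.H., Mathis, M.W. Learnable latent embeddings for joint behavioural and neural analysis. Nature (2023). Project page: https://cebra.ai

[^4]: Anthropic. Claude. https://claude.ai

[^5]: BAAI. bge-large-en-v1.5 text embedding model. https://huggingface.co/BAAI/bge-large-en-v1.5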