# Advanced element search with Sentence Embeddings

## Overview

This project offers a framework for analyzing and visualizing a collection of elements using Sentence Embeddings. It includes functionalities for generating embeddings, visualizing them using t-SNE, and interactively finding the closest element based on user input.

## Sentence Embeddings

Sentence embedding models aim to capture the semantic essence of a sentence by representing it as a fixed-length vector. This project uses the SentenceTransformers library for this ([see documentation](https://www.sbert.net/)).

<img src="media/embeddings.png" width="750">

The result is a multidimensional vector describing a sentence:

<img src="media/sentenceembedding.png" width="300">
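
As a minimal sketch of how such a vector could be produced with SentenceTransformers: the helper name `embed_texts` and the model `all-MiniLM-L6-v2` are assumptions chosen for illustration, and the project's notebooks may use a different model.

```python
def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Return one fixed-length embedding per input text.

    `model_name` is an assumed default, not necessarily the model
    used by this project.
    """
    # Imported here so the heavy dependency is only loaded when needed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns a 2D NumPy array of shape (len(texts), embedding_dim).
    return model.encode(list(texts), convert_to_numpy=True)
```

With the assumed model, `embed_texts(["A short description"])` should return an array with a single row of 384 values. The first call downloads the model weights, so it needs network access.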

## t-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding) is an unsupervised, non-linear dimensionality reduction technique for exploring and visualizing high-dimensional data. Because it is non-linear, it can separate data that cannot be separated by a straight line. This project uses it to plot the embedded elements in 2D.
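
The reduction to two components can be sketched with scikit-learn's `TSNE`. This is an illustration under assumptions: the source does not say which t-SNE implementation or parameters the notebook uses, and the helpers `safe_perplexity` and `project_to_2d` are hypothetical names.

```python
def safe_perplexity(n_samples, preferred=30.0):
    """Pick a perplexity that scikit-learn accepts for n_samples points.

    scikit-learn requires perplexity < n_samples, so small collections
    need a smaller value than the usual default of 30.
    """
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two samples")
    return float(min(preferred, n_samples - 1))


def project_to_2d(embeddings, random_state=0):
    """Reduce an (n_samples, n_features) array of embeddings to 2D."""
    # Imported lazily: these are third-party dependencies.
    import numpy as np
    from sklearn.manifold import TSNE

    data = np.asarray(embeddings, dtype=float)
    tsne = TSNE(
        n_components=2,
        perplexity=safe_perplexity(len(data)),
        random_state=random_state,  # fixed seed for a repeatable layout
    )
    return tsne.fit_transform(data)  # shape: (n_samples, 2)
```

The two columns of the returned array can be used as x and y coordinates in a scatter plot. Note that t-SNE layouts depend on the seed and the perplexity, so distances between far-apart clusters should be read with care.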

## Project Structure

- `embeddings_generator.ipynb`: Defines two functions: one summarizes an element's description with a summarization transformer, and the other generates the embeddings for a given dataset.
- `graphical_representation.ipynb`: Plots the embeddings in 2D by reducing each vector to two components.
- `ask.ipynb`: Interactively finds the element closest to the user's input.
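
To illustrate the closest-element lookup, here is a minimal sketch that ranks stored vectors against a query vector. It assumes cosine similarity as the distance measure, which the source does not confirm, and the function names are hypothetical. Plain Python lists are used so the example has no dependencies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def closest_element(query_vector, elements, vectors):
    """Return (element, score) for the vector most similar to the query.

    `elements` and `vectors` are parallel lists: vectors[i] is the
    embedding of elements[i].
    """
    if not elements or len(elements) != len(vectors):
        raise ValueError("elements and vectors must be non-empty and aligned")
    scores = [cosine_similarity(query_vector, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return elements[best], scores[best]
```

In practice the query vector would come from embedding the user's input with the same model that produced the stored vectors; otherwise the similarities are not comparable.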

## Data Format

A `.json` file of data elements must be provided and stored in the `/datafile` folder. Its format must follow this structure:

```json
[{
    "id": "id of the element",
    "name": "name of the element",
    "description": "data describing the element"
},
{
    "id": "id of the element",
    "name": "name of the element",
    "description": "data describing the element"
}]
```
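
A file in this format could be loaded and checked before generating embeddings. The sketch below uses only the standard library; the function name `load_elements` is an assumption, not part of the project's notebooks.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("id", "name", "description")


def load_elements(path):
    """Load the element list and check that each entry has the expected keys."""
    with Path(path).open(encoding="utf-8") as handle:
        elements = json.load(handle)

    if not isinstance(elements, list):
        raise ValueError("the data file must contain a JSON array of elements")

    for index, element in enumerate(elements):
        if not isinstance(element, dict):
            raise ValueError(f"element {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in element]
        if missing:
            raise ValueError(f"element {index} is missing keys: {missing}")
    return elements
```

For example, `load_elements("datafile/elements.json")` would return the list of element dictionaries, where `elements.json` is a hypothetical file name.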

## Initialize project

Create a conda environment: ``conda create -n (env_name)``

Activate the environment: ``conda activate (env_name)``

Start JupyterLab: ``jupyter lab``