## Embedding Techniques

The goal is to convert text into numerical vectors (embeddings) that a program can compare and search.

### OLLAMA Embedding

Note: Before running this code, make sure Ollama is installed and that you have downloaded the specified model (for example, "gemma:2b") with the command `ollama pull gemma:2b`.

In [6]:
from langchain_community.embeddings import OllamaEmbeddings

#Create embeddings using a local model with Ollama
#By default the model is "llama2", here we explicitly use "gemma:2b"
embeddings = OllamaEmbeddings(model="gemma:2b")

#embeddings is an OllamaEmbeddings object with all default parameters
print(embeddings)

base_url='http://localhost:11434' model='gemma:2b' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None show_progress=False headers=None model_kwargs=None


#### Embedding a list of Documents

In [8]:
#Create embedded vectors for a list of short documents
documents = [
    "Alpha is the first letter of the Greek alphabet",
    "Beta is the second letter of the Greek alphabet",
    "Gamma is the third letter of the Greek alphabet",
]

r1 = embeddings.embed_documents(documents)

#The number of vectors equals the number of input sentences
print(len(r1))   #3

#Each vector has dimension 2048 (depends on the chosen model)
print(len(r1[0]))  #2048

3
2048
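The two checks above (one vector per input, a fixed length per vector) can be wrapped in a small helper. This is only a sketch: `embedding_shape` and `fake_vectors` are hypothetical names introduced for illustration, and the stand-in data replaces a real call to `embed_documents` so the block runs without Ollama.

```python
def embedding_shape(vectors):
    """Return (count, dimension) for a list of embedding vectors.

    Raises ValueError if the list is empty or the vectors differ in length.
    """
    if not vectors:
        raise ValueError("no vectors to inspect")
    dim = len(vectors[0])
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has length {len(vec)}, expected {dim}")
    return len(vectors), dim


# Stand-in data with the same layout as the result of embed_documents:
# one list of floats per input document.
fake_vectors = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
count, dim = embedding_shape(fake_vectors)
print(count, dim)
```

With real data, passing `r1` to the helper should report the same count and dimension printed by the cell above.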


Make sure you have run `ollama pull mxbai-embed-large` in the Windows terminal.

In [11]:
#See available embedding models at: https://ollama.com/blog/embedding-models
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

#Embed a single query string
text = "This is a test document"
query_result = embeddings.embed_query(text)

#query_result is a high-dimensional vector representation of the input text
print(len(query_result))  #check vector dimension


1024
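Once documents and a query are embedded, a common next step is to rank the documents by how close their vectors are to the query vector. The sketch below uses cosine similarity for that, which is one reasonable choice rather than something this notebook prescribes. The function names and the hand-made 3-dimensional vectors are assumptions for illustration, so the block runs without Ollama.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must come from the same embedding model")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents, document_vectors):
    """Return (score, document) pairs, best match first."""
    scored = [
        (cosine_similarity(query_vector, vec), doc)
        for doc, vec in zip(documents, document_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hand-made 3-dimensional vectors standing in for real embeddings.
docs = ["doc about cats", "doc about dogs", "doc about cars"]
doc_vectors = [[1.0, 0.1, 0.0], [0.8, 0.6, 0.0], [0.0, 0.1, 1.0]]
query_vector = [0.9, 0.2, 0.0]

for score, doc in rank_documents(query_vector, docs, doc_vectors):
    print(f"{score:.3f}  {doc}")
```

The length check matters in practice: in this notebook the document vectors come from "gemma:2b" (dimension 2048) and the query vector from "mxbai-embed-large" (dimension 1024), so they cannot be compared directly. To rank real documents, embed both the documents and the query with the same model.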


### Hugging Face Embedding

In [12]:
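#Load the Hugging Face API token from a .env file
import os
from dotenv import load_dotenv
load_dotenv()

#Fail early with a clear message if the token is missing
hf_token = os.getenv("HF_TOKEN")
if hf_token is None:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file")
os.environ['HF_TOKEN'] = hf_token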

#### Sentence Transformer on Hugging Face


In [13]:
from langchain_huggingface import HuggingFaceEmbeddings

#Load a Sentence Transformers model from the Hugging Face Hub (downloaded on first use)
embedding=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")



In [14]:
text="this is a test document"
query_result=embedding.embed_query(text)
query_result

[-0.04820788651704788,
 0.11789615452289581,
 -0.03746979311108589,
 0.056620508432388306,
 0.015501769259572029,
 -0.03674933686852455,
 -0.05957154557108879,
 0.057209186255931854,
 -0.020756389945745468,
 0.057084694504737854,
 0.07765144109725952,
 0.018936702981591225,
 0.0006362755666486919,
 -0.0004041104984935373,
 -0.06529418379068375,
 -0.028550026938319206,
 -0.011813565157353878,
 -0.04569055885076523,
 -0.007525651715695858,
 0.08929043263196945,
 0.05310368910431862,
 0.06305599957704544,
 -0.004552426282316446,
 0.0003609247214626521,
 0.008460210636258125,
 0.030092813074588776,
 -0.06308870017528534,
 0.03805897384881973,
 0.08158418536186218,
 -0.05816444009542465,
 0.032057568430900574,
 0.06851984560489655,
 0.07797118276357651,
 0.03416930511593819,
 0.06476923823356628,
 0.004228685982525349,
 0.07276935130357742,
 0.002517967252060771,
 0.038405708968639374,
 0.03519800305366516,
 -0.017656495794653893,
 -0.11470235139131546,
 0.009409678168594837,
 0.03478090092

In [16]:
len(query_result) #dimension is 384 (depends on the chosen model: all-MiniLM-L6-v2)

384
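Besides the dimension, it can be useful to check the length (L2 norm) of an embedding vector, because for unit-length vectors a plain dot product equals the cosine similarity. The helpers below are a minimal sketch with hypothetical names (`l2_norm`, `normalize`) and a toy 2-dimensional vector. Whether a given model already returns unit-length vectors is something to verify on its real output, not something assumed here.

```python
import math


def l2_norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def normalize(vector):
    """Scale a vector to unit length, keeping its direction."""
    norm = l2_norm(vector)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Toy vector standing in for a real embedding such as query_result.
sample = [3.0, 4.0]
unit = normalize(sample)
print(l2_norm(sample))  # 5.0
print(unit)             # [0.6, 0.8]
```

Calling `l2_norm(query_result)` on a real embedding shows whether the vector is already close to unit length.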