# LangChain embeddings classes

Text embedding models

https://python.langchain.com/docs/modules/data_connection/text_embedding/


CacheBackedEmbeddings

https://python.langchain.com/docs/modules/data_connection/text_embedding/caching_embeddings

EmbeddingDistanceEvalChain

https://api.python.langchain.com/en/stable/evaluation/langchain.evaluation.embedding_distance.base.EmbeddingDistanceEvalChain.html#

Check out the LangChain example on Colab

https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/evaluation/string/embedding_distance.ipynb#scrollTo=PoVTQVj9Xp-1



## Setup environment

In [1]:
from dotenv import load_dotenv
import os

import warnings

warnings.filterwarnings("ignore")

# Load the file that contains the API keys
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

True

## 1. HuggingFace embeddings

Check out the Hugging Face embedding models

https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=downloads

When the Inference API is used, the model is NOT downloaded to the local file system.

https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceInferenceAPIEmbeddings.html#

Parameters:

* param model_name: str = 'sentence-transformers/all-MiniLM-L6-v2'
* param api_key: SecretStr. The Hugging Face token; this notebook reads it from the .env file as HUGGINGFACEHUB_API_TOKEN
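
Because the token is read from the environment, a missing or misspelled variable makes `os.getenv` return `None`. The sketch below fails early with a readable message instead. `require_env` is a hypothetical helper, not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check the .env file")
    return value
```

In this notebook it could replace the plain lookup, for example `require_env('HUGGINGFACEHUB_API_TOKEN')`.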

**Note**
Free model access via the Inference API is rate limited. Calls may fail if you make a large number of them.
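
One common way to cope with rate limits is to retry with exponential backoff. The sketch below is generic and makes assumptions: `embed_with_retry` is a hypothetical helper, and it retries on any exception. This notebook does not show how the client reports a rate-limit failure, so the broad `except` may need adjusting.

```python
import time
from typing import Callable, List, Sequence


def embed_with_retry(
    embed_fn: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Call embed_fn(texts), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return embed_fn(texts)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

With the objects defined in this notebook, a call might look like `embed_with_retry(model.embed_documents, test_docs)`.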

In [2]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

model_name = 'sentence-transformers/all-mpnet-base-v2'

HUGGING_FACE_API_KEY=os.getenv('HUGGINGFACEHUB_API_TOKEN')

model = HuggingFaceInferenceAPIEmbeddings(api_key=HUGGING_FACE_API_KEY,model_name=model_name)

test_docs = [
    'Industrial revolution changed the world around us',
    'The Renaissance was a period of cultural rebirth that emerged in Europe during the 14th to 17th centuries'
]

embed_docs = model.embed_documents(test_docs)

In [3]:
# Print the size & count of the embeddings
print('Size = ', len(embed_docs[0]), '  Count = ', len(embed_docs))

Size =  768   Count =  2


In [4]:
query = "when was steam engine invented?"

embed_query = model.embed_query(query)

## 2. Distance Metric

https://python.langchain.com/docs/guides/evaluation/string/embedding_distance



In [5]:
from langchain.evaluation import EmbeddingDistance

list(EmbeddingDistance)

[<EmbeddingDistance.COSINE: 'cosine'>,
 <EmbeddingDistance.EUCLIDEAN: 'euclidean'>,
 <EmbeddingDistance.MANHATTAN: 'manhattan'>,
 <EmbeddingDistance.CHEBYSHEV: 'chebyshev'>,
 <EmbeddingDistance.HAMMING: 'hamming'>]
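
To make the five metric names concrete, here is a minimal pure-Python sketch of their textbook definitions. These are illustrative helper functions, not LangChain's implementation, which may differ in detail. In particular, Hamming distance is taken here as the fraction of positions that differ, which is an assumption.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev_distance(a: Vector, b: Vector) -> float:
    """Largest absolute coordinate difference (L-infinity)."""
    return max(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a: Vector, b: Vector) -> float:
    """Fraction of positions whose values differ (assumed normalization)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

All five return 0 for identical vectors and grow as the vectors diverge, so a lower value means a closer match under each metric.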

## 3. Use evaluator to calculate similarity

https://python.langchain.com/docs/guides/evaluation/string/embedding_distance

#### Pairwise distance evaluation

**NOTE:**

The evaluator returns a distance, not a similarity score, so a lower value means higher similarity. The value is returned under the `score` key of the result.

param distance_metric: EmbeddingDistance = EmbeddingDistance.COSINE




https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.embedding_distance.base.PairwiseEmbeddingDistanceEvalChain.html#

In [11]:
# https://api.python.langchain.com/en/stable/evaluation/langchain.evaluation.loading.load_evaluator.html#
# https://api.python.langchain.com/en/stable/evaluation/langchain.evaluation.schema.EvaluatorType.html#langchain.evaluation.schema.EvaluatorType

from langchain.evaluation import load_evaluator

pairwise_distance_evaluator = load_evaluator("pairwise_embedding_distance", embeddings=model)

type(pairwise_distance_evaluator)

langchain.evaluation.embedding_distance.base.PairwiseEmbeddingDistanceEvalChain

In [12]:
print(query)
print('------------')

distance_metric = EmbeddingDistance.COSINE

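# Compare the query with each document; a lower distance means a closer match.
# The loop variable is named doc so that it does not shadow the built-in str.
for doc in test_docs:
    result = pairwise_distance_evaluator.evaluate_string_pairs(prediction=query, prediction_b=doc, distance_metric=distance_metric)
    distance = round(result['score'], 2)
    print('Distance = ', distance, '  -  ', doc)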

when was steam engine invented?
------------
Distance =  0.64   -   Industrial revolution changed the world around us
Distance =  0.85   -   The Renaissance was a period of cultural rebirth that emerged in Europe during the 14th to 17th centuries
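
Since a lower distance means a closer match, picking the best document is a matter of sorting in ascending order. The sketch below uses the two rounded distances copied from the output above. The conversion `similarity = 1 - distance` applies to cosine distance only, and the variable names are illustrative.

```python
# Cosine distances copied from the output above (rounded to 2 places)
results = [
    (0.64, "Industrial revolution changed the world around us"),
    (0.85, "The Renaissance was a period of cultural rebirth that emerged in Europe during the 14th to 17th centuries"),
]

# Sort ascending: the smallest distance is the most similar document
ranked = sorted(results, key=lambda pair: pair[0])
best_distance, best_doc = ranked[0]

# For cosine distance only, similarity = 1 - distance
best_similarity = round(1 - best_distance, 2)

print("Best match:", best_doc)
print("Cosine similarity:", best_similarity)
```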
