## Hugging Face Embeddings Integration
This document shows how to integrate Hugging Face embeddings into your project. Hugging Face hosts a wide range of pre-trained models for natural language processing tasks, including models that generate text embeddings.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
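hf_token = os.getenv('HF_TOKEN')
if hf_token:  # os.environ rejects None, so only set the variable when a token was loaded
    os.environ['HF_TOKEN'] = hf_token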

### Sentence Transformers on Hugging Face

Hugging Face hosts a variety of Sentence Transformer models that generate embeddings for sentences, paragraphs, or documents. These embeddings can be used in applications such as semantic search, clustering, and classification.
LangChain supports Hugging Face embeddings through the `langchain_huggingface` package. The `HuggingFaceEmbeddings` class lets you use these models in your LangChain applications.
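Applications such as semantic search usually compare embeddings with cosine similarity. The following is a minimal sketch of that comparison in plain Python. The function name `cosine_similarity` and the toy vectors are assumptions made for illustration; they are not part of LangChain or of this notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embedding vectors (assumption).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Scores near 1 indicate vectors pointing in the same direction, and scores near 0 indicate unrelated directions. The same function accepts the lists of floats that an embedding model returns.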


In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-V2')

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [4]:
embeddings

HuggingFaceEmbeddings(model_name='all-MiniLM-L6-V2', cache_folder=None, model_kwargs={}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, show_progress=False)

In [5]:
text = "This is a normal test documents..."

result = embeddings.embed_query(text=text)
result

[0.0078102718107402325,
 0.11162975430488586,
 -0.011739488691091537,
 0.02422947622835636,
 0.022343087941408157,
 -0.045947231352329254,
 -0.07744567096233368,
 0.010366133414208889,
 -0.011178369633853436,
 0.020904934033751488,
 0.0827404111623764,
 0.011732976883649826,
 -0.025348693132400513,
 0.026740677654743195,
 -0.09461788833141327,
 -0.05451967567205429,
 -0.004301006905734539,
 -0.08009646832942963,
 -0.0007337604183703661,
 0.09989404678344727,
 0.005334663670510054,
 0.09380760043859482,
 0.0017471406608819962,
 -0.033042632043361664,
 0.01214210968464613,
 0.037623677402734756,
 -0.03393930941820145,
 -0.02432306297123432,
 0.05901329591870308,
 0.011137271299958229,
 0.045175015926361084,
 0.09230855107307434,
 0.07343574613332748,
 0.07064588367938995,
 0.10530608892440796,
 -0.05023163557052612,
 0.03389428183436394,
 0.032850075513124466,
 0.02319127880036831,
 0.010540527291595936,
 0.04934801161289215,
 -0.11632226407527924,
 -0.0017456774367019534,
 0.01205468922

In [6]:
len(result)

384

In [7]:
result = embeddings.embed_documents([text, 'This is not a good test document..'])

result

[[0.0078102718107402325,
  0.11162975430488586,
  -0.011739488691091537,
  0.02422947622835636,
  0.022343087941408157,
  -0.045947231352329254,
  -0.07744567096233368,
  0.010366133414208889,
  -0.011178369633853436,
  0.020904934033751488,
  0.0827404111623764,
  0.011732976883649826,
  -0.025348693132400513,
  0.026740677654743195,
  -0.09461788833141327,
  -0.05451967567205429,
  -0.004301006905734539,
  -0.08009646832942963,
  -0.0007337604183703661,
  0.09989404678344727,
  0.005334663670510054,
  0.09380760043859482,
  0.0017471406608819962,
  -0.033042632043361664,
  0.01214210968464613,
  0.037623677402734756,
  -0.03393930941820145,
  -0.02432306297123432,
  0.05901329591870308,
  0.011137271299958229,
  0.045175015926361084,
  0.09230855107307434,
  0.07343574613332748,
  0.07064588367938995,
  0.10530608892440796,
  -0.05023163557052612,
  0.03389428183436394,
  0.032850075513124466,
  0.02319127880036831,
  0.010540527291595936,
  0.04934801161289215,
  -0.1163222640752792

In [12]:
len(result[0]), len(result[1]), len(result)

(384, 384, 2)
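
Because `embed_query` returns one vector and `embed_documents` returns one vector per document, a simple semantic search can rank documents against a query. The sketch below normalizes each vector to unit length so that a dot product equals cosine similarity. The helper names `normalize` and `rank_documents` and the toy vectors are hypothetical, chosen only for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    q = normalize(query_vec)
    scores = []
    for i, doc in enumerate(doc_vecs):
        d = normalize(doc)
        scores.append((i, sum(a * b for a, b in zip(q, d))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 2-dimensional vectors stand in for embed_query / embed_documents output (assumption).
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

With the objects from this notebook, the call would look like `rank_documents(embeddings.embed_query(question), embeddings.embed_documents(docs))`, where `question` and `docs` are placeholder names for your own query string and list of texts.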