## Different types of embeddings

### Exact Search - Basic Approach

In [3]:
def string_match(text, pattern):
    n = len(text)
    m = len(pattern)
    
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            print(f"Pattern found at index {i}")

text = "ababcabc"
pattern = "abc"
string_match(text, pattern)

Pattern found at index 2
Pattern found at index 5


In [4]:
#! pip install scikit-learn

## Different techniques

### 1) CountVectorizer
--------------------------
CountVectorizer is a text feature extraction tool in scikit-learn (sklearn) that converts text documents into numerical vectors, specifically vectors of word counts.
* It counts how many times each word appears in a document.
* Each document (sentence, paragraph, etc.) becomes a vector of numbers.
* Each position in that vector corresponds to a specific word in the overall vocabulary.

In [6]:
## A simple model that represents text as a vector of word counts.

from sklearn.feature_extraction.text import CountVectorizer
corpus = ["Embeddings convert text into vectors.", "Vectors can be compared mathematically."]
vectorizer = CountVectorizer()

Example:

Vocabulary is: ['be', 'can', 'compared', 'convert', 'embeddings', 'into', 'mathematically', 'text', 'vectors']

* Each vector is 9-dimensional (one dimension for each unique word).
* Values are word counts: 0 if the word is absent, otherwise the number of times it appears (here every word appears at most once per document, so the values are 0 or 1).
* Limitation: word order is ignored, so "cat dog" and "dog cat" would have identical vectors.

In [7]:
print(vectorizer.fit_transform(corpus).toarray())

[[0 0 0 1 1 1 0 1 1]
 [1 1 1 0 0 0 1 0 1]]
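The counting step can be reproduced without scikit-learn. The sketch below is a plain-Python approximation, assuming lowercase tokens of two or more word characters (which gives the same vocabulary as above for this corpus). The helper names `tokenize` and `count_vectors` are illustrative and are not part of scikit-learn.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of 2+ word characters (an assumption that
    # approximates the default tokenization for this small corpus)
    return re.findall(r"\b\w\w+\b", text.lower())

def count_vectors(docs):
    # Vocabulary: every unique token, sorted alphabetically
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # One count per vocabulary word; missing words count as 0
        rows.append([counts[word] for word in vocab])
    return vocab, rows

corpus = ["Embeddings convert text into vectors.", "Vectors can be compared mathematically."]
vocab, rows = count_vectors(corpus)
print(vocab)
for row in rows:
    print(row)

# Word order is lost: both phrases map to the same count vector
_, order_rows = count_vectors(["cat dog", "dog cat"])
print(order_rows[0] == order_rows[1])  # True
```

The last line illustrates the limitation noted above: a bag-of-words vector cannot tell "cat dog" from "dog cat".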


### 2) TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency.

It's a way to measure how important a word is in a document, relative to all the other documents in a collection.
TF-IDF assigns weights to words based on two factors:

1. Term Frequency (TF): How often a word appears in a document
2. Inverse Document Frequency (IDF): How rare a word is across all documents

Limitation - Still no semantic understanding: "happy" vs "joyful" treated differently

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)

print("TF-IDF Matrix:")
print(tfidf_matrix.toarray())
print("\n=== Dimension Analysis ===")
print(f"Number of documents: {tfidf_matrix.shape[0]}")
print(f"Vocabulary size (dimensions): {tfidf_matrix.shape[1]}")
print(f"Total elements in matrix: {tfidf_matrix.shape[0] * tfidf_matrix.shape[1]}")

TF-IDF Matrix:
[[0.         0.         0.         0.47107781 0.47107781 0.47107781
  0.         0.47107781 0.33517574]
 [0.47107781 0.47107781 0.47107781 0.         0.         0.
  0.47107781 0.         0.33517574]]

=== Dimension Analysis ===
Number of documents: 2
Vocabulary size (dimensions): 9
Total elements in matrix: 18
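To see where values such as 0.471 and 0.335 come from, here is a plain-Python sketch of the weighting. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization of each row, which is what scikit-learn's default settings use; the function names are illustrative only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase tokens of 2+ word characters (assumed tokenization)
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf_vectors(docs):
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (assumed formula): rarer words get larger weights
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = [tf[w] * idf[w] for w in vocab]
        # L2-normalize so each document vector has length 1
        norm = math.sqrt(sum(x * x for x in raw))
        rows.append([x / norm if norm else 0.0 for x in raw])
    return vocab, rows

corpus = ["Embeddings convert text into vectors.", "Vectors can be compared mathematically."]
vocab, rows = tfidf_vectors(corpus)
for row in rows:
    print([round(x, 8) for x in row])
```

"vectors" appears in both documents, so its IDF is the minimum (1.0) and it ends up with the smaller weight in each row, while words unique to one document get the larger weight.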


### 3)  Word2Vec
Word2Vec is a way to turn words into numbers, but unlike CountVectorizer or TF-IDF,
it doesn't just count words: it captures some of their meaning from the contexts they appear in.

It represents each word as a dense vector of numbers (for example, a list of 100 or 300 floating-point values), so that similar words have similar vectors.

* Dense Vectors (not sparse like TF-IDF)
* Words with similar contexts get similar vectors

In [12]:
! pip install gensim[fast]




In [13]:
#Word2Vec learns word associations from large text corpora, producing meaningful word embeddings.
from gensim.models import Word2Vec

# Example corpus (list of tokenized sentences)
sentences = [

    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "drives", "artificial", "intelligence"],
    ["artificial", "intelligence", "is", "the", "future"],
    ["vector", "embeddings", "are", "useful"]
]
# Create and train the model 
# Model learns relationships between words from your sentences
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

# sg=1 → Skip-gram model; sg=0 → CBOW
# window → how many context words to consider on each side of the target word


vector_embedding = model.wv["vector"]

# Print the dimension
print(f"Embedding dimension: {len(vector_embedding)}")

# Also check the vector size from model parameters
print(f"Vector size from model: {model.vector_size}")
print('----------------------------')

print(model.wv["vector"])


Embedding dimension: 50
Vector size from model: 50
----------------------------
[ 0.00855287  0.00015212 -0.01916856 -0.01933109 -0.01229639 -0.00025714
  0.00399483  0.01886394  0.0111687  -0.00858139  0.00055663  0.00992872
  0.01539662 -0.00228845  0.00864684 -0.01162876 -0.00160838  0.0162001
 -0.00472013 -0.01932691  0.01155852 -0.00785964 -0.00244575  0.01996103
 -0.0045127  -0.00951413 -0.01065877  0.01396178 -0.01141774  0.00422733
 -0.01051132  0.01224143  0.00871461  0.00521271 -0.00298217 -0.00549213
  0.01798587  0.01043155 -0.00432504 -0.01894062 -0.0148521  -0.00212748
 -0.00158989 -0.00512582  0.01936544 -0.00091704  0.01174752 -0.01489517
 -0.00501215 -0.01109973]
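To make the `window` parameter more concrete, the sketch below lists the (center word, context word) pairs that a skip-gram model would learn from for one sentence. It is an illustration only, not gensim's implementation: real training adds further details such as negative sampling, and the function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(sentence, window):
    # For every word, pair it with each neighbour at most `window` positions away
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

pairs = skipgram_pairs(["machine", "learning", "is", "fun"], window=1)
for center, context in pairs:
    print(center, "->", context)
```

With `window=1` each word is paired only with its immediate neighbours; a larger window produces more pairs and lets more distant words influence each other's vectors.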


In [14]:
import gensim.downloader as api
# Load pretrained model
model = api.load("word2vec-google-news-300")

# Find similar words to "eyes"
similar_words = model.most_similar("eyes", topn=10)
for word, score in similar_words:
    print(f"{word} → {score:.3f}")

eye → 0.645
Wretched_mortals_open → 0.599
ears → 0.597
gaze → 0.591
Eyes → 0.589
beady_eyes → 0.565
lips → 0.559
cheeks → 0.559
eyelids → 0.555
pupils_dilated → 0.548


### 4)  Transformer-based Embedding

#### Word Embeddings using OpenAI

State-of-the-art embedding models:

| Model                     | ~ Pages per Dollar | Performance (on MTEB eval) | Max Input (tokens) |
|---------------------------|--------------------|----------------------------|--------------------|
| text-embedding-3-small    | 62,500             | 62.3%               | 8192      |
| text-embedding-3-large    | 9,615              | 64.6%               | 8192      |
| text-embedding-ada-002    | 12,500             | 61.0%               | 8192      |


In [15]:
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
# Load environment variables from .env
load_dotenv()

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [16]:
import pandas as pd
words = pd.DataFrame({'text':
  [
      'sad',
      'unhappy',
      'tomato'
  ]})

In [17]:
doc_embeddings = embeddings.embed_documents(texts=list(words['text']))

print(f"Number of texts: {len(doc_embeddings)}")
print(f"Embedding dimension: {len(doc_embeddings[0])}")
print(words)

Number of texts: 3
Embedding dimension: 1536
      text
0      sad
1  unhappy
2   tomato


## Cosine Similarity
Cosine similarity measures the similarity between two non-zero vectors by calculating the cosine of the angle between them. It ranges from -1 (pointing in exactly opposite directions) to 1 (pointing in exactly the same direction), with 0 indicating that the vectors are orthogonal (unrelated).
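As a quick illustration of the definition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The plain-Python sketch below (the helper name `cosine_sim` is illustrative) checks the three reference cases mentioned above on tiny 2-dimensional vectors.

```python
import math

def cosine_sim(u, v):
    # dot(u, v) / (|u| * |v|); both vectors must be non-zero
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])   # same direction, different length
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])       # at right angles
opposite = cosine_sim([1.0, 0.0], [-1.0, 0.0])        # opposite directions

print(round(same_direction, 6), orthogonal, opposite)
```

Note that the first pair differs in length but not in direction, which is why cosine similarity is insensitive to vector magnitude.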

In [18]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Convert list of lists → NumPy array
matrix = np.array(doc_embeddings)

# Compute cosine similarity matrix

similarity_matrix = cosine_similarity(matrix)

import pandas as pd

df_sim = pd.DataFrame(
    similarity_matrix,
    index=words["text"],
    columns=words["text"]
)

print(df_sim.round(3))

text       sad  unhappy  tomato
text                           
sad      1.000    0.504   0.257
unhappy  0.504    1.000   0.183
tomato   0.257    0.183   1.000


#### Sentence Embedding with Transformer package
This section uses the Hugging Face sentence-transformers model all-MiniLM-L6-v2 to create sentence embeddings.

all-MiniLM-L6-v2 is a distilled model:

* Trained to mimic larger models' behavior
* Removes redundant dimensions while preserving semantic meaning
* More efficient without significant quality loss

#### Sentence Embedding using LangChain - HuggingFace

In [22]:
! pip install langchain_huggingface



In [23]:

# Create embeddings for some sample texts
texts = [
    "I love programming in Python",
    "Machine learning is fascinating",
    "Vector databases are efficient"
]


In [24]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

doc_embeddings = embeddings.embed_documents(texts=texts)

print(f"Number of texts: {len(doc_embeddings)}")
print(f"Embedding dimension: {len(doc_embeddings[0])}")

Number of texts: 3
Embedding dimension: 384


In [25]:
text = "This is a blog post on vector embeddings."
embeddings_result = embeddings.embed_query(text)

print(f"Length: {len(embeddings_result)}")

Length: 384


| Function                         | Purpose                                                        | Typical Usage                                            | Expected Input  | Output                                           |
| -------------------------------- | -------------------------------------------------------------- | -------------------------------------------------------- | --------------- | ------------------------------------------------ |
| `embed_query(text)`              | Generates embedding for a **search query**                     | When you have a user query like "What is vector search?" | One string      | One vector (list of floats)                      |
| `embed_documents(list_of_texts)` | Generates embeddings for **your corpus / dataset / documents** | When you index multiple texts for retrieval              | List of strings | List of vectors (each corresponding to one text) |
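The input and output shapes in the table can be illustrated with a toy stand-in. `ToyEmbeddings` below is hypothetical: it only mimics the two method names and their shapes using character counts, and is not a real embedding model or a LangChain class.

```python
class ToyEmbeddings:
    """Hypothetical stand-in with the same two method names; not a real model."""

    def __init__(self, dim=8):
        self.dim = dim

    def embed_query(self, text):
        # One string in, one vector (list of floats) out
        vec = [0.0] * self.dim
        for ch in text.lower():
            vec[ord(ch) % self.dim] += 1.0
        return vec

    def embed_documents(self, texts):
        # List of strings in, list of vectors out (one per text, same order)
        return [self.embed_query(t) for t in texts]

toy = ToyEmbeddings()
query_vec = toy.embed_query("What is vector search?")
doc_vecs = toy.embed_documents(["first text", "second text"])

print(len(query_vec))                 # dimension of a single query vector
print(len(doc_vecs), len(doc_vecs[0]))  # number of documents, dimension of each
```

In this toy both methods share the same logic. Real providers may treat queries and documents differently (for example by adding different prefixes), which is one reason the two methods are kept separate.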


#### LangChain Embeddings using OpenAI Model

In [26]:
from langchain_openai import OpenAIEmbeddings

In [27]:
# Sample documents
documents_a = [
    "How to train neural networks with PyTorch",
    "Guide to deep learning architectures",
    "Introduction to machine learning basics",
    "Building AI models with TensorFlow"
]

In [28]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Encode documents
doc_embeddings_a = embeddings.embed_documents(documents_a)

# Query
query_a = "How to build neural networks?"

query_embedding_a = embeddings.embed_query(query_a)

print(f"Length: {len(query_embedding_a)}")
print(f"Type: {type(query_embedding_a)}")

Length: 1536
Type: <class 'list'>


#### Compare how close two vectors are
Cosine Similarity → Measures the angle between vectors (good for text)

Euclidean Distance → Measures absolute (straight-line) distance (less common in NLP)

In [29]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import euclidean_distances


print(cosine_similarity([query_embedding_a], doc_embeddings_a))
print(euclidean_distances([query_embedding_a], doc_embeddings_a)) # Euclidean distance is the straight-line distance between two points

[[0.63235925 0.5516323  0.42187681 0.51152483]]
[[0.85748557 0.94696111 1.07528899 0.988408  ]]
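The two printed rows are consistent with each other. For vectors of length 1, the squared Euclidean distance equals `2 - 2 * cosine_similarity`, so both metrics produce the same ranking. The check below assumes the embeddings are unit-length, which the numbers suggest but the notebook does not state; it only uses values copied from the output above.

```python
import math

# Values copied from the printed output above
cosine_scores = [0.63235925, 0.5516323, 0.42187681, 0.51152483]
euclidean_scores = [0.85748557, 0.94696111, 1.07528899, 0.988408]

# For unit-length vectors: distance = sqrt(2 - 2 * cosine)
predicted = [math.sqrt(2 - 2 * c) for c in cosine_scores]

for c, d, p in zip(cosine_scores, euclidean_scores, predicted):
    print(f"cosine={c:.4f}  euclidean={d:.4f}  sqrt(2-2cos)={p:.4f}")

max_gap = max(abs(p - d) for p, d in zip(predicted, euclidean_scores))
print(f"largest difference: {max_gap:.2e}")
```

If the vectors were not normalized, the two metrics could rank documents differently, because Euclidean distance is also sensitive to vector length.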


In [30]:
# Find most similar document
similarities = cosine_similarity([query_embedding_a], doc_embeddings_a)
most_similar_idx = np.argmax(similarities)

print(f"Query: {query_a}")
print(f"Most similar: {documents_a[most_similar_idx]}")
print(f"Similarity score: {similarities[0][most_similar_idx]:.3f}")

Query: How to build neural networks?
Most similar: How to train neural networks with PyTorch
Similarity score: 0.632
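`np.argmax` returns only the single best match. To rank every document, or to take the top k, the scores can be sorted instead. The sketch below reuses the similarity values printed earlier for this query; the variable names `ranked` and `top_k` are illustrative.

```python
documents_a = [
    "How to train neural networks with PyTorch",
    "Guide to deep learning architectures",
    "Introduction to machine learning basics",
    "Building AI models with TensorFlow",
]
# Cosine similarities copied from the earlier printed output
scores = [0.63235925, 0.5516323, 0.42187681, 0.51152483]

# Sort (document, score) pairs from most to least similar
ranked = sorted(zip(documents_a, scores), key=lambda pair: pair[1], reverse=True)
top_k = ranked[:2]

for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```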


In [31]:


documents_b = [
    "What is the capital of USA?",
    "Who is the president of USA?",
    "Who is the Prime Minister of India?",
]

query_b = "Narendra Modi is the Prime Minister of India."
query_embedding_b = embeddings.embed_query(query_b)

doc_embeddings_b = embeddings.embed_documents(documents_b)


In [32]:
# Find most similar document
similarities = cosine_similarity([query_embedding_b], doc_embeddings_b)
most_similar_idx = np.argmax(similarities)

print(f"Query: {query_b}")
print(f"Most similar: {documents_b[most_similar_idx]}")
print(f"Similarity score: {similarities[0][most_similar_idx]:.3f}")

Query: Narendra Modi is the Prime Minister of India.
Most similar: Who is the Prime Minister of India?
Similarity score: 0.732


In [33]:
similarities

array([[0.13320951, 0.30048807, 0.73174497]])

### Vector Databases
A vector database stores and manages high-dimensional vector embeddings, which are numerical representations of data such as text, images, or audio.

Semantic Search using LangChain + ChromaDB

In [37]:
import pandas as pd
import os
# Notebooks do not define __file__, so resolve the data path from the current working directory
current_dir = os.getcwd()
file_path = os.path.join(current_dir, 'data', 'wiki_clean.pkl')

wiki_articles = pd.read_pickle(file_path)
#wiki_articles

In [38]:

# ! pip install langchain_chroma

In [39]:

from langchain_chroma import Chroma
from langchain_core.documents import Document

vector_store = Chroma(
    collection_name="wikipedia_articles1",
    embedding_function=OpenAIEmbeddings(),
    collection_metadata={"hnsw:space": "cosine"}   #  default is  "l2" (Euclidean) distance.
)
#collection = client.create_collection(name="wikipedia_articles")

# Add each article to ChromaDB

ids=[]
documents =[]
MAX_CHARS = 4000  # a safe margin
for idx, article_text in enumerate(wiki_articles):
    if isinstance(article_text, str) and len(article_text.strip()) > 100:
        # Extract title from first few words
        first_words = ' '.join(article_text.split()[:3])
        title = f"Article_{idx}_{first_words}"
        document = Document(page_content=article_text[:MAX_CHARS], metadata={
                "title": title,
                "article_id": idx,
                "content_length": len(article_text),
                "word_count": len(article_text.split())
            })
        documents.append(document)
        ids.append(f"article_{idx}")
        
        print(f"✅ Added article {idx}: {title}")

vector_store.add_documents(documents=documents, ids=ids)
print(f"\n📚 Total articles in vector store: {vector_store._collection.count()}")


✅ Added article 0: Article_0_April April (Apr.)
✅ Added article 1: Article_1_People's Republic of
✅ Added article 2: Article_2_Scotland Scotland (,
✅ Added article 3: Article_3_Tire A tire
✅ Added article 4: Article_4_Mary Rose The
✅ Added article 5: Article_5_Esteban Huertas Esteban
✅ Added article 6: Article_6_Monaco Monaco, officially
✅ Added article 7: Article_7_Nudity Nudity (or
✅ Added article 8: Article_8_James A. Garfield
✅ Added article 9: Article_9_Vaccine A vaccine
✅ Added article 10: Article_10_U.S. 1st Infantry
✅ Added article 11: Article_11_Maxwell's equations Maxwell's
✅ Added article 12: Article_12_New York Rangers
✅ Added article 13: Article_13_Krakatoa Krakatoa is
✅ Added article 14: Article_14_Globalization Globalization is
✅ Added article 15: Article_15_Albert Coady Wedemeyer
✅ Added article 16: Article_16_History of India
✅ Added article 17: Article_17_Bielefeld Bielefeld (;
✅ Added article 18: Article_18_Serial ATA Serial
✅

### Basic Query Operations

In [40]:

results = vector_store.similarity_search(
    query="Physics of radiation, types of radiation, and practical applications in medicine and technology", k=2)
print("Search results:", len(results))

for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")


print('--------------------------------------------------------')

Search results: 2
* Radiation In physics, radiation is the emission or transmission of energy in the form of waves or particles through space or through a material medium. This includes: Radiation may also refer to the energy, waves, or particles being radiated. Originally, radiation waves do not contain particles as they are transferred to Earth by the Sun for example. Many people are already familiar with electromagnetic radiation (EMR), including light. The electromagnetic spectrum shows the types of radiation according to their wavelength and frequency. Some kinds are: Ionizing radiation is radiation that carries enough energy to free electrons from atoms or molecules. Only certain types of radiation are harmful to humans. For example, ultraviolet radiation can give people sunburns. X-rays and gamma rays can make a person sick, or even die, depending on the dose they get. Some types of particle radiation can also make people sick and lead to burns. If radiation does not carry high 

### Query 2:  with similarity score

In [41]:
print("\n" + "=" * 60)
print("🔬 SCIENCE & TECHNOLOGY: RADIATION PHYSICS")
print("=" * 60)


results2 = vector_store.similarity_search_with_score(
    query="Physics of radiation, types of radiation, and practical applications in medicine and technology", k=2)
print("Search results:", len(results2))

for doc, score in results2:
    print(f"Score: {score:.4f}")
    print(f"Text: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")




🔬 SCIENCE & TECHNOLOGY: RADIATION PHYSICS
Search results: 2
Score: 0.1484
Text: Radiation In physics, radiation is the emission or transmission of energy in the form of waves or particles through space or through a material medium. This includes: Radiation may also refer to the energy, waves, or particles being radiated. Originally, radiation waves do not contain particles as they are transferred to Earth by the Sun for example. Many people are already familiar with electromagnetic radiation (EMR), including light. The electromagnetic spectrum shows the types of radiation according to their wavelength and frequency. Some kinds are: Ionizing radiation is radiation that carries enough energy to free electrons from atoms or molecules. Only certain types of radiation are harmful to humans. For example, ultraviolet radiation can give people sunburns. X-rays and gamma rays can make a person sick, or even die, depending on the dose they get. Some types of particle radiation can also make 

#### If your collection was created with `collection_metadata={"hnsw:space": "cosine"}`, then:

* The score is the cosine distance (not the cosine similarity).
* Lower means more similar, because distance = 1 − cosine similarity.
* A cosine similarity of 0.9 → distance 0.1 → top of the list.
* A cosine similarity of 0.5 → distance 0.5 → lower in the list.
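As a small worked example of that relationship, the sketch below converts the distance score printed earlier (0.1484) back into a cosine similarity and shows that sorting by distance in ascending order puts the closest item first. The helper name and the `(id, distance)` pairs are hypothetical.

```python
def cosine_distance_to_similarity(distance):
    # Inverse of: distance = 1 - cosine similarity
    return 1.0 - distance

# Score printed for the top radiation result above
similarity = cosine_distance_to_similarity(0.1484)
print(f"distance 0.1484 -> similarity {similarity:.4f}")

# Hypothetical (id, distance) pairs: ascending distance = most similar first
scored = [("doc_far", 0.5), ("doc_near", 0.1)]
ranked = sorted(scored, key=lambda item: item[1])
print([doc_id for doc_id, _ in ranked])
```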

In [42]:
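# Only retrieve documents whose relevance score is above a certain threshold.
# Relevance scores are normalized so that higher means more similar (unlike the distance scores above).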
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.2},
)

retriever.invoke("Physics of radiation, types of radiation, and practical applications in medicine and technology")

[Document(id='article_24', metadata={'content_length': 1039818, 'title': 'Article_24_Radiation In physics,', 'article_id': 24, 'word_count': 177251}, page_content='Radiation In physics, radiation is the emission or transmission of energy in the form of waves or particles through space or through a material medium. This includes: Radiation may also refer to the energy, waves, or particles being radiated. Originally, radiation waves do not contain particles as they are transferred to Earth by the Sun for example. Many people are already familiar with electromagnetic radiation (EMR), including light. The electromagnetic spectrum shows the types of radiation according to their wavelength and frequency. Some kinds are: Ionizing radiation is radiation that carries enough energy to free electrons from atoms or molecules. Only certain types of radiation are harmful to humans. For example, ultraviolet radiation can give people sunburns. X-rays and gamma rays can make a person sick, or even die,

#### ⚙️ MMR (Maximal Marginal Relevance)

MMR balances relevance and diversity.

It's designed to avoid retrieving redundant chunks that are all similar to each other.

Instead of just picking the top-k most similar results (which might overlap), it tries to pick results that are:

* Highly relevant to the query, and
* As different as possible from each other.

| Parameter               | Description                                                                                                                                   |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| **`search_type="mmr"`** | Use *Maximal Marginal Relevance* search instead of plain similarity                                                                           |
| **`fetch_k=10`**        | Fetch 10 most relevant docs first (raw similarity search)                                                                                     |
| **`k=2`**               | Then return top 2 *diverse* ones (after MMR filtering)                                                                                        |
| **`lambda_mult=0.5`**   | Controls the trade-off:<br> • `1.0` → prioritize relevance (standard similarity search)<br> • `0.0` → prioritize diversity<br> • `0.5` → balanced |
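The selection loop behind these parameters can be sketched in plain Python. This is a minimal illustration of the MMR idea under simple assumptions (cosine similarity for both relevance and redundancy), not LangChain's or Chroma's actual implementation; `mmr_select` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mmr_select(query_vec, doc_vecs, k=2, fetch_k=10, lambda_mult=0.5):
    # Step 1: keep the fetch_k documents most similar to the query
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: greedily pick k documents, penalizing similarity to earlier picks
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: documents 0 and 1 are near-duplicates, document 2 is different
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]
query = [1.0, 0.0]

relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
balanced = mmr_select(query, docs, k=2, lambda_mult=0.5)
print(relevance_only)  # indices chosen when only relevance counts
print(balanced)        # indices chosen when diversity is also rewarded
```

With `lambda_mult=1.0` the two near-duplicates are returned together; with `lambda_mult=0.5` the second pick is the less similar but more diverse document.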


In [43]:
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 10, "lambda_mult": 0.5},
)
retriever.invoke("Physics of radiation, types of radiation, and practical applications in medicine and technology")

[Document(id='article_24', metadata={'title': 'Article_24_Radiation In physics,', 'article_id': 24, 'content_length': 1039818, 'word_count': 177251}, page_content='Radiation In physics, radiation is the emission or transmission of energy in the form of waves or particles through space or through a material medium. This includes: Radiation may also refer to the energy, waves, or particles being radiated. Originally, radiation waves do not contain particles as they are transferred to Earth by the Sun for example. Many people are already familiar with electromagnetic radiation (EMR), including light. The electromagnetic spectrum shows the types of radiation according to their wavelength and frequency. Some kinds are: Ionizing radiation is radiation that carries enough energy to free electrons from atoms or molecules. Only certain types of radiation are harmful to humans. For example, ultraviolet radiation can give people sunburns. X-rays and gamma rays can make a person sick, or even die,

### Filter condition

In [44]:
results = vector_store.similarity_search_with_score(
    query="radiation",
    k=3,
    filter={"title":"Article_24_Radiation In physics," }
)

In [45]:
for doc,score in results:
    print(f"Text: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")

Text: Radiation In physics, radiation is the emission or transmission of energy in the form of waves or particles through space or through a material medium. This includes: Radiation may also refer to the energy, waves, or particles being radiated. Originally, radiation waves do not contain particles as they are transferred to Earth by the Sun for example. Many people are already familiar with electromagnetic radiation (EMR), including light. The electromagnetic spectrum shows the types of radiation according to their wavelength and frequency. Some kinds are: Ionizing radiation is radiation that carries enough energy to free electrons from atoms or molecules. Only certain types of radiation are harmful to humans. For example, ultraviolet radiation can give people sunburns. X-rays and gamma rays can make a person sick, or even die, depending on the dose they get. Some types of particle radiation can also make people sick and lead to burns. If radiation does not carry high enough levels 

https://reference.langchain.com/python/integrations/langchain_chroma/