<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from dotenv import load_dotenv, find_dotenv
# import os

if load_dotenv(find_dotenv()):
    print("Environment variables loaded successfully!") 
    # print(os.getenv('OPENAI_API_KEY'))
else:
    print("Could not load environment variables.") 

Environment variables loaded successfully!
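
The success message above only says that a `.env` file was found and loaded, not that every key used later in the notebook is present. A minimal sketch of a follow-up check, assuming the three variable names that appear in this notebook (`OPENAI_API_KEY`, `PINECONE_API_KEY`, `PINECONE_ENV`); the helper `missing_env_vars` is hypothetical and not part of `python-dotenv`:

```python
import os

# Variable names used elsewhere in this notebook.
REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV")

def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty, without exposing any values."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Reporting only the names avoids printing the secrets themselves into the notebook output.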


In [2]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(model_name="gpt-3.5-turbo")  # temperature can also be passed here
messages = [
    SystemMessage(content="You are an expert computer scientist and developer."),
    HumanMessage(content="Explain the process of embeddings and vector databases, and their relationship in at least 500 words.")
]
response = chat(messages)
print(response.content)

Embeddings and vector databases are two important concepts in the field of natural language processing (NLP) and machine learning. They are closely related and play a crucial role in various applications such as information retrieval, recommendation systems, and semantic search. In this explanation, we will discuss the process of embeddings and vector databases and their relationship.

1. Embeddings:
Embeddings are vector representations of words, sentences, or documents in a high-dimensional space. The goal of embeddings is to capture the semantic and syntactic relationships between different entities in a way that is mathematically meaningful and computationally efficient. Embeddings are learned through a process called word embedding or sentence embedding.

a. Word Embedding:
Word embedding is the process of representing words as continuous vectors in a high-dimensional space. It is typically done using neural networks, specifically a technique called word2vec. Word2vec models learn

In [3]:
# Keep the reply text as a plain string so it can be passed to the text splitter
response_str = response.content
response_str

'Embeddings and vector databases are two important concepts in the field of natural language processing (NLP) and machine learning. They are closely related and play a crucial role in various applications such as information retrieval, recommendation systems, and semantic search. In this explanation, we will discuss the process of embeddings and vector databases and their relationship.\n\n1. Embeddings:\nEmbeddings are vector representations of words, sentences, or documents in a high-dimensional space. The goal of embeddings is to capture the semantic and syntactic relationships between different entities in a way that is mathematically meaningful and computationally efficient. Embeddings are learned through a process called word embedding or sentence embedding.\n\na. Word Embedding:\nWord embedding is the process of representing words as continuous vectors in a high-dimensional space. It is typically done using neural networks, specifically a technique called word2vec. Word2vec model

In [5]:
# Import utility for splitting up texts and split up the explanation given above into document chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([response_str])
texts


[Document(page_content='Embeddings and vector databases are two important concepts in the field of natural language', metadata={}),
 Document(page_content='processing (NLP) and machine learning. They are closely related and play a crucial role in various', metadata={}),
 Document(page_content='applications such as information retrieval, recommendation systems, and semantic search. In this', metadata={}),
 Document(page_content='explanation, we will discuss the process of embeddings and vector databases and their relationship.', metadata={}),
 Document(page_content='1. Embeddings:', metadata={}),
 Document(page_content='Embeddings are vector representations of words, sentences, or documents in a high-dimensional', metadata={}),
 Document(page_content='space. The goal of embeddings is to capture the semantic and syntactic relationships between', metadata={}),
 Document(page_content='different entities in a way that is mathematically meaningful and computationally efficient.', metadata={}

In [6]:
# Each chunk's text is available via the "page_content" attribute, e.g. texts[0].page_content
for doc in texts:
    print(doc.page_content)

Embeddings and vector databases are two important concepts in the field of natural language
processing (NLP) and machine learning. They are closely related and play a crucial role in various
applications such as information retrieval, recommendation systems, and semantic search. In this
explanation, we will discuss the process of embeddings and vector databases and their relationship.
1. Embeddings:
Embeddings are vector representations of words, sentences, or documents in a high-dimensional
space. The goal of embeddings is to capture the semantic and syntactic relationships between
different entities in a way that is mathematically meaningful and computationally efficient.
Embeddings are learned through a process called word embedding or sentence embedding.
a. Word Embedding:
Word embedding is the process of representing words as continuous vectors in a high-dimensional
space. It is typically done using neural networks, specifically a technique called word2vec.
Word2vec models learn w
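
The chunks above all break on a space and stay within 100 characters. The sketch below reproduces that behaviour for plain prose by greedily packing words. It is a simplified stand-in, not LangChain's actual algorithm: `RecursiveCharacterTextSplitter` tries a list of separators in turn and supports overlap, whereas the hypothetical `split_on_spaces` here only splits on whitespace and has no overlap (matching `chunk_overlap=0`).

```python
def split_on_spaces(text, chunk_size=100):
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            current = candidate      # the word still fits (or it starts a new chunk)
        else:
            chunks.append(current)   # close the full chunk and start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = (
    "Embeddings and vector databases are two important concepts in the field of "
    "natural language processing (NLP) and machine learning."
)
for chunk in split_on_spaces(sample, chunk_size=100):
    print(len(chunk), chunk)
```

A small `chunk_size` such as 100 yields many short fragments; larger values keep more context per chunk at the cost of fewer, longer vectors.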

In [9]:
# Import and instantiate OpenAI embeddings
from langchain.embeddings import OpenAIEmbeddings

# The keyword argument is `model`; passing `model_name` raises
# "ValidationError: extra fields not permitted" for OpenAIEmbeddings.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
embeddings

In [25]:
# Turn the first text chunk into a vector with the embedding
query_result = embeddings.embed_query(texts[0].page_content)
# print(query_result)

# TODO: review the other OpenAIEmbeddings methods. embed_query embeds a single string; embed_documents embeds a list of strings.
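
Once text has been turned into vectors, two pieces of text can be compared by comparing their vectors. A minimal sketch of cosine similarity, one common metric for this, using toy 3-dimensional vectors rather than real OpenAI embeddings (which are much longer). The helper name `cosine_similarity` and the sample vectors are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings.
v_query = [1.0, 0.0, 1.0]
v_close = [2.0, 0.0, 2.0]   # same direction as v_query
v_far = [0.0, 1.0, 0.0]     # orthogonal to v_query

print(cosine_similarity(v_query, v_close))  # close to 1.0 (same direction)
print(cosine_similarity(v_query, v_far))    # 0.0 (orthogonal)
```

With real embeddings, the same function could be applied to `query_result` and the embedding of another chunk.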

In [26]:
# Import and initialize Pinecone client
import os
import pinecone
from langchain.vectorstores import Pinecone
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [27]:
# Embed the text chunks and upload the resulting vectors to the Pinecone index
index_name = "learn-langchain"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [28]:
# Do a simple vector similarity search
query = "What is the purpose of a vector?"
result = search.similarity_search(query)
result

[Document(page_content='context of a word and its actual context.', metadata={}),
 Document(page_content='detection, and search engines.', metadata={}),
 Document(page_content='approximate similarity search.', metadata={}),
 Document(page_content='embedding space.', metadata={})]
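
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors score highest against it. The sketch below does this by brute force over a tiny in-memory list. It is only an illustration under stated assumptions: the vectors are made-up 2-dimensional toys, `top_k` and `cosine` are hypothetical helpers, and a service such as Pinecone uses approximate indexes rather than scanning every vector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, store, k=4):
    """Return the k (score, text) pairs from store with the highest cosine similarity.

    store is a list of (text, vector) pairs.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy store: chunk labels paired with made-up vectors.
toy_store = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [1.0, 1.0]),
]

for score, text in top_k([1.0, 0.1], toy_store, k=2):
    print(f"{score:.3f}  {text}")
```

The four documents returned above are consistent with `similarity_search` using `k=4` when no `k` is passed.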