In [10]:
import os

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.document_loaders import TextLoader

import pinecone

## API Key Setup

You need an API key from both OpenAI and Pinecone.
Never commit API keys to GitHub or any other shared repository; they must be kept private.

In [2]:
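os.environ['OPENAI_API_KEY'] = 'KEY'  # placeholder: replace with your OpenAI API key
os.environ["PINECONE_API_KEY"] = 'KEY'  # placeholder: replace with your Pinecone API key
os.environ["PINECONE_ENV"] = 'KEY'  # placeholder: replace with your Pinecone environment name (not a key)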
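
Since the keys must stay private, one option is to set them outside the notebook (for example in the shell) and only read them here. The following is a minimal sketch of that idea; the helper name `require_env` is hypothetical and not part of LangChain, OpenAI, or Pinecone.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or blank, so a missing
    key is reported early instead of failing later inside an API call.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of the notebook would then replace the hard-coded assignment, assuming the variable was exported before the notebook server was started.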

## Load the Sample Document

A text loader reads a file into a Document object that holds the text together with its associated metadata.
See https://python.langchain.com/docs/modules/data_connection/document_loaders/ for loaders that handle other document types such as PDF, JSON, and Markdown.

In [3]:
loader = TextLoader("../kb/sample.txt")
document = loader.load()
print(document)



## Chunk the Document
The chunk size is set to 1000 characters, with no overlap between chunks.
The output below is a list of chunks, each carrying the metadata of its source document.

In [4]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
doc = text_splitter.split_documents(document)
print(doc)
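
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that cuts a string into fixed-size windows. It is an illustration only and not how `CharacterTextSplitter` works internally: that class splits on a separator (by default a blank line) and merges the pieces up to the size limit. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so each new
    window starts `chunk_size - chunk_overlap` characters after the last.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```

With `chunk_overlap=0`, as in this notebook, the windows tile the text without repetition. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when a relevant sentence straddles a chunk boundary.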



## Create OpenAI Embeddings

In [5]:
embeddings = OpenAIEmbeddings()

In [28]:
print(embeddings)

client=<class 'openai.api_resources.embedding.Embedding'> model='text-embedding-ada-002' deployment='text-embedding-ada-002' openai_api_version='' openai_api_base='' openai_api_type='' openai_proxy='' embedding_ctx_length=8191 openai_api_key='<REDACTED>' openai_organization='' allowed_special=set() disallowed_special='all' chunk_size=1000 max_retries=6 request_timeout=None headers=None tiktoken_model_name=None show_progress_bar=False model_kwargs={}


## Initialize Pinecone

In [9]:
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),  # find at app.pinecone.io
    environment=os.getenv("PINECONE_ENV"),  # next to api key in console
)

In [6]:
index_name = "demo"

First, check whether the index already exists; if it does not, create it.
The similarity metric used to compare vectors is chosen at this point, when the index is created.
This example uses the cosine metric.

In [11]:
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        metric='cosine',
        dimension=1536,  # must match the embedding model's output size
    )

The OpenAI embedding model `text-embedding-ada-002` produces 1536-dimensional vectors, so the index dimension must be 1536.
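
For intuition about the cosine metric selected above, the sketch below computes it in plain Python for two small vectors. It only illustrates the formula (dot product divided by the product of the vector lengths); it is not Pinecone's implementation, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`.

    The result lies between -1 (opposite direction) and 1 (same direction)
    and ignores vector length, so only direction matters.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero-length vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The dimension check mirrors the constraint on the index: a query vector can only be compared with stored vectors of the same length, which is why the index dimension has to match the embedding model.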

In [14]:
docsearch = Pinecone.from_documents(doc, embeddings, index_name=index_name)

Query the uploaded document. Results are ranked by cosine similarity to the query.

Change the `[0]` index (for example, to `[1]`) to view the other matching chunks, in decreasing order of similarity.

In [20]:
query = "What are the effects of Global Warming"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

The impacts of climate change are widespread and multifaceted. Rising global temperatures have led to melting ice caps and glaciers, causing sea levels to rise. This puts coastal communities at risk of flooding, threatens ecosystems, and poses challenges to food security. Heatwaves have become more frequent and intense, posing threats to human health, particularly among vulnerable populations.

Changes in precipitation patterns have led to altered water availability, affecting agriculture, water supply, and ecosystems. Extreme weather events, such as hurricanes, droughts, and heavy rainfall, have become more frequent and intense, causing devastating consequences for communities, infrastructure, and economies.


## Retrieve Index Information

In [30]:
index = pinecone.Index("demo")

Display the index statistics, including the number of stored vectors.

In [31]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 6e-05,
 'namespaces': {'': {'vector_count': 6}},
 'total_vector_count': 6}

Run a simple query to show a result in raw vector form. The all-zeros query vector is only a placeholder, not a meaningful query.

In [40]:
index.query(
    vector=[0]*1536,
    top_k=1,
    include_values=True,
    include_metadata=True)

{'matches': [{'id': 'eb3d51f0-1c1f-4c94-8f90-353597e06f0d',
              'metadata': {'source': '../kb/sample.txt',
                           'text': 'Adaptation strategies focus on preparing '
                                   'for and minimizing the impacts of climate '
                                   'change that are already underway. This '
                                   'involves building resilient '
                                   'systems for extreme weather events, and '
                                   'developing policies to safeguard '
                                   'vulnerable communities. Coastal regions, '
                                   'for instance, can employ strategies like '
                                   'building sea walls and elevating '
                                   'infrastructure to mitigate the impacts of '
                                   'rising sea levels.\n'
                                   '\n'
                         