# Using Pinecone

The goal of this Colab notebook is to use the Pinecone vector database to store embeddings of new documents, build a document searcher on top of them, and use that searcher to talk to the documents.

### Install libraries
This version uses llama_index's vector store and default query engine modules, plus the transcript loader from llama_hub.
The underlying model is OpenAI's GPT-3.5, but it can be swapped for another LLM.

In [None]:
%pip install llama-index=="0.8.41"
%pip install llama_hub=="0.0.36"
%pip install youtube_transcript_api=="0.6.1"
%pip install pinecone-client=="2.2.4"


# Create a Pinecone Account and Index

### To proceed, we need a few initial steps for Pinecone:

- Register an account at pinecone.io
- Configure and create an index according to your [needs](https://docs.pinecone.io/docs/choosing-index-type-and-size#overview) 

For this demo I used my account on the starter plan, which should be sufficient for the number of documents we need to index.

![](images/pinecone-create-index.png)

- Finally, wait for the index to be initialized and ready

![](images/pinecone-index-view.png)

### Configuration of modules

- Logging and required API keys

In [None]:
import os
import logging
import sys
import getpass

# Send log records to stdout. basicConfig already attaches a stdout handler,
# so adding a second StreamHandler would print every message twice.
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
os.environ["PINECONE_API_KEY"] = getpass.getpass("Pinecone API Key:")
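
When the notebook is re-run, prompting for both keys every time gets tedious. A minimal sketch of a helper that only prompts when a key is missing from the environment; `ensure_env_key` and its `prompt` parameter are hypothetical names introduced for illustration, not part of any library used here.

```python
import getpass
import os
from typing import Callable


def ensure_env_key(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return os.environ[name], prompting for it only when it is missing or empty.

    `prompt` is injectable (an assumption of this sketch) so the helper can be
    exercised without an interactive terminal.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        # Ask once, reject empty input, and remember the key for later cells.
        value = prompt(f"{name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

With this in place, the two assignments above could be written as `ensure_env_key("OPENAI_API_KEY")` and `ensure_env_key("PINECONE_API_KEY")`.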

### Download YouTube video transcripts

- Using llama_hub's [YouTube transcript loader](https://llamahub.ai/l/youtube_transcript)

In [None]:
from llama_hub.youtube_transcript import YoutubeTranscriptReader

# Fetch the transcripts of the two videos as llama_index documents.
loader = YoutubeTranscriptReader()
documents = loader.load_data(
    ytlinks=[
        "https://www.youtube.com/watch?v=VH8lEOmesas",
        "https://www.youtube.com/watch?v=ZSEnQ00iUgI",
    ]
)


### Index in Pinecone with embeddings

- Using llama_index's Pinecone vector store wrapper

In [None]:
import pinecone
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import PineconeVectorStore

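# Connect to Pinecone; the environment must match the one shown for your index.
pinecone_api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=pinecone_api_key, environment="gcp-starter")
# The name must match the index created in the Pinecone console.
pinecone_index = pinecone.Index("playground")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Embed the documents and store the resulting vectors in the Pinecone index.
index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=StorageContext.from_defaults(vector_store=vector_store)
)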


# Use the query engine to talk to the docs

This uses the vector store wrapper from [llama_index](https://gpt-index.readthedocs.io/en/v0.6.5/how_to/integrations/vector_stores.html).

Note: indexing might take a while (a couple of seconds) on pinecone.io, since it is cloud-based and this demo runs on the starter plan, so wait a bit before firing the question.
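
Rather than waiting by hand, the wait can be automated by polling until the vectors are visible. A minimal sketch, assuming the caller supplies a function that returns the current vector count; `wait_for_vectors` and its parameters are hypothetical names, and the default timings are arbitrary.

```python
import time
from typing import Callable


def wait_for_vectors(
    get_vector_count: Callable[[], int],
    min_count: int = 1,
    timeout_s: float = 60.0,
    poll_s: float = 2.0,
) -> int:
    """Poll until at least min_count vectors are visible or timeout_s elapses.

    Returns the last count seen, so the caller can decide what to do on timeout.
    """
    deadline = time.monotonic() + timeout_s
    count = get_vector_count()
    while count < min_count and time.monotonic() < deadline:
        time.sleep(poll_s)
        count = get_vector_count()
    return count
```

With the Pinecone client pinned above, the counter could plausibly be `lambda: pinecone_index.describe_index_stats().total_vector_count`; treat that attribute access as an assumption and check it against the client version you have installed.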

In [12]:
import textwrap

query_engine = index.as_query_engine()
question = "I still run old Software and Services in my company. Who you gonna call?"
response = query_engine.query(question)

wrapped_lines = textwrap.wrap(response.response, 79, break_long_words=False)

for line in wrapped_lines:
    print(line)

You can contact the Deep Shore Cloud Archive with Microservices for assistance
with running old software and services in your company. They offer a cloud-
based archiving solution that is highly available, always-on, and freely
scalable. Their solution is designed to provide unlimited access to company
data, transactions, and documents, while ensuring full regulatory compliance
and security. Additionally, they offer blockchain-based cloud solutions that
can revolutionize your digital archiving by providing incorruptible bookkeeping
of digital transactions.
