<a href="https://colab.research.google.com/github/adas754/generative-AI_class/blob/main/Pinecone_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Pinecone: https://www.pinecone.io/

In [137]:
!pip install langchain
!pip install pinecone-client==2.2.4
!pip install pypdf



In [138]:
!pip install sentence-transformers==2.2.2



In [139]:
!mkdir -p pdfs


## Extract the Text from the PDFs

In [140]:
from langchain.document_loaders import PyPDFDirectoryLoader

In [141]:
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()

In [142]:
data

[Document(page_content='See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/351411017\nReal-T ime Object Detection Using YOLO: A Review\nPreprint  · May 2021\nDOI: 10.13140/RG.2.2.24367.66723\nCITATIONS\n17READS\n16,315\n2 author s:\nUpulie Handalag e\nUniv ersität des Saarlandes\n5 PUBLICA TIONS \xa0\xa0\xa020 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nLakshini K uganandamurthy\nSri Lank a Instit ute of Inf ormation T echnolog y\n3 PUBLICA TIONS \xa0\xa0\xa016 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Upulie Handalag e on 08 May 2021.\nThe user has r equest ed enhanc ement of the do wnlo aded file.', metadata={'source': 'pdfs/Real-TimeObjectDetectionusingYOLOAreview.pdf', 'page': 0}),
 Document(page_content="Real-Time Object Detection using YOLO: A review  \nUpulie H.D.I  \nIT18107074  \nSri Lanka Institute of Information Technology  \nMalabe, Sri Lanka  \nireshaupulie@gmail.co
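The output above shows that `PyPDFDirectoryLoader` yields one `Document` per PDF page, with `source` and `page` stored in its metadata. As a quick sanity check, the sketch below counts loaded pages per file. The `Page` dataclass is a hypothetical stand-in for LangChain's `Document` so the snippet runs without LangChain installed. The real objects expose the same `page_content`/`metadata` attributes, so `pages_per_source(data)` should work the same way.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Page:
    """Hypothetical stand-in for langchain's Document (same two attributes)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def pages_per_source(pages: Iterable[Page]) -> Counter:
    """Count how many loaded pages came from each source file."""
    return Counter(p.metadata.get("source", "<unknown>") for p in pages)


# Toy pages with hypothetical file names
pages = [
    Page("cover page", {"source": "pdfs/a.pdf", "page": 0}),
    Page("abstract", {"source": "pdfs/a.pdf", "page": 1}),
    Page("intro", {"source": "pdfs/b.pdf", "page": 0}),
]
print(pages_per_source(pages))  # Counter({'pdfs/a.pdf': 2, 'pdfs/b.pdf': 1})
```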

## Chunking

In [143]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [144]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)
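Here `chunk_size=500` and `chunk_overlap=20` are measured in characters (the splitter's default length function is `len`). `RecursiveCharacterTextSplitter` first tries to break on paragraph, line, and word separators before falling back to raw characters. The sketch below deliberately drops that separator logic and uses plain fixed-size windows, only to show how the size and overlap parameters interact. It is not the library's algorithm.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> List[str]:
    """Split text into fixed-size character windows; neighbours share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


text = "".join(chr(ord("a") + i % 26) for i in range(1200))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])  # [500, 500, 240]
```

The 20-character overlap means a sentence cut at a window boundary still appears partly in both neighbouring chunks, which helps retrieval when the relevant text straddles a split.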

In [145]:
len(text_chunks)

74

In [146]:
text_chunks[2]

Document(page_content='Real-Time Object Detection using YOLO: A review  \nUpulie H.D.I  \nIT18107074  \nSri Lanka Institute of Information Technology  \nMalabe, Sri Lanka  \nireshaupulie@gmail.com   \nLakshini Kuganandamurthy  \nIT17073592   \nSri Lanka Institute of Information Technology  \nMalabe, Sri Lanka  \nlakkuga@gmail.com  \n \nAbstract—With the availability of eno rmous amounts of data \nand the need to computerize visual -based systems, research on \nobject detection has been the focus for the past decade. This need', metadata={'source': 'pdfs/Real-TimeObjectDetectionusingYOLOAreview.pdf', 'page': 1})

## Embeddings

In [147]:
from langchain.embeddings import HuggingFaceEmbeddings

In [148]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [149]:
query_result = embeddings.embed_query("Hello World")

In [150]:
query_result

[-0.034477271139621735,
 0.0310231801122427,
 0.006734997965395451,
 0.026108959689736366,
 -0.03936203569173813,
 -0.16030244529247284,
 0.06692398339509964,
 -0.006441446021199226,
 -0.0474504791200161,
 0.014758843928575516,
 0.07087528705596924,
 0.055527616292238235,
 0.01919332519173622,
 -0.026251347735524178,
 -0.01010959129780531,
 -0.026940450072288513,
 0.02230745181441307,
 -0.022226683795452118,
 -0.1496926099061966,
 -0.01749301515519619,
 0.007676273118704557,
 0.0543522834777832,
 0.0032544205896556377,
 0.0317259207367897,
 -0.08462149649858475,
 -0.029405983164906502,
 0.051595594733953476,
 0.04812406376004219,
 -0.0033148264046758413,
 -0.058279186487197876,
 0.04196928068995476,
 0.02221069484949112,
 0.128188818693161,
 -0.02233893796801567,
 -0.011656254529953003,
 0.06292837858200073,
 -0.03287634626030922,
 -0.09122609347105026,
 -0.03117532841861248,
 0.05269954726099968,
 0.04703483358025551,
 -0.0842030718922615,
 -0.030056199058890343,
 -0.02074482850730419

In [151]:
print("Length", len(query_result))

Length 384
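Every text, whether a query or a chunk, becomes a 384-dimensional vector. A vector store ranks chunks by a similarity metric between vectors. Cosine similarity is a common choice, but the metric actually used here is whatever the `test` index was created with, which this notebook does not show. A plain-Python sketch of cosine similarity:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

With the real model you could compare, for example, `query_result` against `embeddings.embed_query("Hi there")`. The dimension check also explains why the Pinecone index must be created with dimension 384 to accept these vectors.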


## Initializing Pinecone

In [152]:
import os

In [153]:
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY', '')
PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV', 'gcp-starter')
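Both lookups fall back to defaults, so a missing `PINECONE_API_KEY` silently becomes an empty string and the problem only surfaces later, less clearly, when Pinecone is first called. A small fail-fast helper (a hypothetical addition, not part of the original notebook) reports it at this cell instead:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of environment variable `name`, or raise if it is unset/empty."""
    source = os.environ if env is None else env  # a dict can be passed for testing
    value = source.get(name, "")
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value


# Hypothetical usage in place of os.environ.get('PINECONE_API_KEY', ''):
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
print(require_env("PINECONE_API_KEY", {"PINECONE_API_KEY": "example-key"}))  # example-key
```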

In [154]:
import pinecone
# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "test"  # name of an existing Pinecone index; its dimension must be 384 to match all-MiniLM-L6-v2


In [155]:
from langchain.vectorstores import Pinecone

In [156]:
docsearch = Pinecone.from_texts([t.page_content for t in text_chunks], embeddings, index_name=index_name)
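`from_texts` embeds every chunk and upserts the vectors into the existing `test` index. Conceptually, each upserted record is an id, a vector, and metadata holding the original text. The sketch below assumes LangChain's defaults of random UUID ids, a `text` metadata key, and batches of 32, and uses a hypothetical `toy_embed` function so it runs offline. Because only `page_content` is passed here, the `source`/`page` metadata is dropped, which is consistent with the search results below carrying no metadata.

```python
import uuid
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# (id, vector, metadata): the general shape of a vector-store upsert record
Record = Tuple[str, List[float], Dict[str, str]]


def build_records(
    texts: Sequence[str],
    embed: Callable[[str], List[float]],
    expected_dim: int = 384,
    text_key: str = "text",
) -> List[Record]:
    """Embed each text and pair it with a random id and its raw text as metadata."""
    records = []
    for text in texts:
        vector = embed(text)
        if len(vector) != expected_dim:
            raise ValueError(f"embedding has {len(vector)} dims; index expects {expected_dim}")
        records.append((str(uuid.uuid4()), vector, {text_key: text}))
    return records


def batched(records: Sequence[Record], batch_size: int = 32) -> Iterator[Sequence[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for embeddings.embed_query: returns a 384-dim vector."""
    return [float(len(text))] * 384


records = build_records([f"chunk {i}" for i in range(74)], toy_embed)
print([len(batch) for batch in batched(records)])  # [32, 32, 10]
```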

## If you already have an index, you can load it like this

In [123]:
docsearch = Pinecone.from_existing_index(index_name, embeddings)
docsearch

<langchain_community.vectorstores.pinecone.Pinecone at 0x7c56743f3a00>

## Similarity Search

In [157]:
query = "What is yolo?"

In [158]:
docs = docsearch.similarity_search(query, k=3)

In [159]:
docs

[Document(page_content='algorithm only once to get the output, thus the name. Although \ncomparatively similar to R -CNN, YOLO practically runs a lot \nfaster than Faster R -CNN because of its simpler architecture. \nUnlike Faster R -CNN, YOLO can classify and perform bounding box regression at the same time. With YOLO, the \nclass label containing objects, their location can be predicted \nin one glance. Entirely devia ting from the typical CNN \npipeline, YOLO treats object detection as a regression'),
 Document(page_content="object detection. In comparison with R -CNN architectures, \nunlike running a classifier on a potential bounding box, then \nreevaluating probability scores, YOLO predicts bounding \nboxes and class probability for those boundi ng boxes \nsimultaneously. This optimizes the YOLO algorithm and is \none of the significant reasons why YOLO is so fast and less \nlikely to have errors to be utilizable for real -time object \npredictions.  \nYOLO's architecture is simi

In [160]:
len(docs)

3
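`similarity_search(query, k=3)` embeds the query and asks the index for the three nearest stored vectors. The sketch below is an exact, brute-force stand-in that scores every stored (text, vector) pair with cosine similarity and keeps the best `k`. Pinecone does this server-side and is designed to avoid scanning every vector, so treat this only as an illustration of the ranking step. The 2-D vectors are toy values.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(
    query_vec: Sequence[float],
    store: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = ((cosine(query_vec, vec), text) for text, vec in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


store = [
    ("about YOLO", [1.0, 0.0]),
    ("about R-CNN", [0.9, 0.1]),
    ("unrelated", [0.0, 1.0]),
    ("opposite", [-1.0, 0.0]),
]
print([text for _, text in top_k([1.0, 0.0], store, k=3)])
# ['about YOLO', 'about R-CNN', 'unrelated']
```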

In [161]:
!pip install openai



In [164]:
from langchain.llms import OpenAI

In [165]:
llm = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

In [166]:
from langchain.chains import RetrievalQA

In [167]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
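With `chain_type="stuff"`, the chain takes every chunk the retriever returns and stuffs them all into a single prompt for the LLM. `as_retriever()` with no arguments runs a plain similarity search, and LangChain's `similarity_search` defaults to `k=4`. The template below is an illustrative approximation written for this sketch, not LangChain's exact prompt text:

```python
from typing import Sequence


def stuff_prompt(question: str, contexts: Sequence[str], separator: str = "\n\n") -> str:
    """Concatenate all retrieved chunks into one prompt (the 'stuff' strategy)."""
    context = separator.join(contexts)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(stuff_prompt("What is yolo?", ["YOLO predicts boxes in one pass.", "YOLO is fast."]))
```

Stuffing is the simplest strategy because it needs a single LLM call, but the combined prompt must fit in the model's context window. Here a few 500-character chunks keep it small.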

In [168]:
query = "What is yolo?"

In [169]:
print(qa.run(query))

 YOLO is a fast and efficient object detection algorithm that can classify and predict the location of objects in an image in one glance. It has a simpler architecture compared to other algorithms and is known for its speed and accuracy. However, it also has weaknesses, such as its limited ability to handle complex images.
