# Project 1: Question-Answering

#### This NLP-based notebook demonstrates a question-answering project: it reads either a PDF file or a Wikipedia page and then answers questions about that content

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [2]:
pip install pypdf -q

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install wikipedia -q

Note: you may need to restart the kernel to use updated packages.


#### Functions to Read Documents

In [11]:
# Read a PDF document; PyPDFLoader returns one Document per page
def read_PDF_document(file):
    from langchain.document_loaders import PyPDFLoader
    print(f'Reading {file} ...')
    loader = PyPDFLoader(file)
    data = loader.load()
    print('Done')
    return data

# Read from Wikipedia; load_max_docs caps the number of pages returned (3 by default)
def read_from_wikipedia(query, lang='en', load_max_docs=3):
    from langchain.document_loaders import WikipediaLoader
    loader = WikipediaLoader(query=query, lang=lang, load_max_docs=load_max_docs)
    data = loader.load()
    return data
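
The introduction mentions a Wikipedia link, while `read_from_wikipedia` expects a search query. One possible bridge, sketched here under the assumption that links follow the usual `https://<lang>.wikipedia.org/wiki/<Title>` pattern, is to extract the title and language code from the URL. The helper name `wikipedia_query_from_url` is hypothetical and is not part of LangChain or the original notebook.

```python
from urllib.parse import urlparse, unquote

def wikipedia_query_from_url(url):
    """Return (query, lang) for a link such as https://en.wikipedia.org/wiki/Climate_change.

    Assumes the first label of the host name is the language code.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = host.split('.')
    if len(parts) < 3 or not host.endswith('wikipedia.org') or not parsed.path.startswith('/wiki/'):
        raise ValueError(f'Not a recognised Wikipedia article link: {url}')
    lang = parts[0]
    # Decode percent-escapes and turn underscores back into spaces
    title = unquote(parsed.path[len('/wiki/'):]).replace('_', ' ')
    return title, lang
```

The result could then be passed on as `read_from_wikipedia(query, lang=lang)`.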

#### Function to Chunk Data

In [15]:
def chunk_data(data, chunk_size=256):
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
    chunks = text_splitter.split_documents(data)
    return chunks
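
To get a feel for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch that cuts a string into fixed-width character windows. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries to break on separators such as paragraph breaks, newlines and spaces first, so its chunks are often shorter than `chunk_size`. The function name `simple_chunks` is hypothetical.

```python
def simple_chunks(text, chunk_size=256, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters (simplified illustration)."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = 'a' * 600
print([len(piece) for piece in simple_chunks(sample, chunk_size=256)])
# prints [256, 256, 88]
```

With `chunk_overlap=0`, as in `chunk_data`, neighbouring chunks share no text, so a sentence that straddles a boundary is split between two chunks.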

#### Functions to Embed the data, Upload to a Vector Database and Delete the index

In [20]:
import pinecone
pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

def insert_or_fetch_embeddings(index_name, chunks):    
    from langchain.vectorstores import Pinecone
    from langchain.embeddings.openai import OpenAIEmbeddings
    
    embeddings = OpenAIEmbeddings()
    
    if index_name in pinecone.list_indexes():
        print(f'Index {index_name} already exists. Loading embeddings')
        vector_store = Pinecone.from_existing_index(index_name, embeddings)
        print('Done')
    else:
        print(f'Creating index {index_name} and embeddings')
        pinecone.create_index(index_name, dimension=1536, metric='cosine')
        vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
        print('Done')
    return vector_store

def delete_pinecone_index(index_name='all'):    
    if index_name == 'all':
        indexes = pinecone.list_indexes()
        print('Deleting all indexes')
        for index in indexes:
            pinecone.delete_index(index)
        print('Done')
    else:
        print(f'Deleting index {index_name}')
        pinecone.delete_index(index_name)
        print('Done')
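
The index above is created with `metric='cosine'`, and `dimension=1536` has to match the length of the vectors produced by the embedding model. As a rough illustration of what a cosine-based similarity search does, the sketch below ranks toy 2-dimensional vectors by brute force. It uses only the standard library; Pinecone's actual search is implemented differently, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, vectors, k=3):
    """Return the positions of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

stored = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], stored, k=2))
# prints [1, 2]
```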



#### Function for asking questions and getting answers

In [26]:
def question_answer(vector_store, query):
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI

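    # Chat model that writes the final answer
    llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=1)

    # Retrieve the 3 chunks most similar to the query
    retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 3})

    # "stuff" places all retrieved chunks into a single prompt for the model
    chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)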
    
    answer = chain.run(query)
    return answer  
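
The `"stuff"` chain type places all retrieved chunks into one prompt and sends it to the model in a single call. The sketch below shows the general idea with plain string handling. The prompt wording and the function name `build_stuff_prompt` are assumptions made for illustration and are not LangChain's actual template.

```python
def build_stuff_prompt(question, chunks):
    """Join the retrieved chunks into one context block, followed by the question."""
    context = '\n\n'.join(chunks)
    return (
        'Use the following pieces of context to answer the question.\n\n'
        f'{context}\n\n'
        f'Question: {question}\n'
        'Answer:'
    )

retrieved = [
    'Greenhouse gases absorb some of the heat that the Earth radiates.',
    'Deserts are expanding, while heat waves are becoming more common.',
]
print(build_stuff_prompt('What is climate change?', retrieved))
```

Because everything goes into one prompt, `k` multiplied by the chunk size has to stay within the model's context window.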

### Running Code

In [14]:
data = read_from_wikipedia('climate change')
print(f'You have {len(data)} pages in your data')
print(f'There are {len(data[0].page_content)} characters in the page')

You have 3 pages in your data
There are 4000 characters in the page


In [21]:
print(data[0].page_content)

In common usage, climate change describes global warming—the ongoing increase in global average temperature—and its effects on Earth's climate system. Climate change in a broader sense also includes previous long-term changes to Earth's climate. The current rise in global average temperature is more rapid than previous changes, and is primarily caused by humans burning fossil fuels. Fossil fuel use, deforestation, and some agricultural and industrial practices increase greenhouse gases, notably carbon dioxide and methane. Greenhouse gases absorb some of the heat that the Earth radiates after it warms from sunlight. Larger amounts of these gases trap more heat in Earth's lower atmosphere, causing global warming.
Climate change is causing a range of increasing impacts on the environment. Deserts are expanding, while heat waves and wildfires are becoming more common. Amplified warming in the Arctic has contributed to melting permafrost, glacial retreat and sea ice loss. Higher temperature

In [22]:
chunks = chunk_data(data)
print(len(chunks))

57


In [24]:
index_name = 'langchain-pinecone'
vector_store = insert_or_fetch_embeddings(index_name, chunks)

Creating index langchain-pinecone and embeddings
Done


In [27]:
import time
i = 1
print('Write Quit or Exit to quit.')
while True:
    question = input(f'Question #{i}: ')
    i = i + 1
    if question.lower() in ['quit', 'exit']:
        print('Exiting!')
        time.sleep(2)
        break
    
    answer = question_answer(vector_store, question)
    print(f'\nAnswer: {answer}')
    print(f'\n {"-" * 100} \n')

    

Write Quit or Exit to quit.
Question #1: What is the document about?

Answer: I'm sorry, but without any specific document mentioned or provided, I cannot determine what the document is about. Could you please provide more information or context?

 ---------------------------------------------------------------------------------------------------- 

Question #2: What is climate change

Answer: In common usage, climate change refers to the long-term changes in Earth's climate system, particularly the ongoing increase in global average temperature. It includes the effects of greenhouse gas emissions on the Earth's atmosphere, such as global warming, but can also encompass other changes in climate patterns and phenomena. These can include shifts in precipitation patterns, more frequent and intense extreme weather events, rising sea levels, and the loss of biodiversity.

 ---------------------------------------------------------------------------------------------------- 

Question #3: Wha

In [28]:
delete_pinecone_index()

Deleting all indexes
Done
