# Getting started with LangChain - question answering

This notebook walks through how to use LangChain for question answering over a given document. 

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.

And that is what we will do here!

### Sources
- GitHub: https://github.com/hwchase17/langchain
- Documentation: https://python.langchain.com/en/latest/use_cases/question_answering.html
- PyPI: https://pypi.org/project/langchain/

### Contents
0. Install packages
1. Settings
2. Get the data and create embeddings
3. Search the document
4. Call the LLM to generate an answer

## 0. Install packages

In [1]:
!pip install langchain
!pip install openai




In [2]:
pip show langchain

Name: langchain
Version: 0.0.339
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /Users/michielbontenbal/anaconda3/lib/python3.10/site-packages
Requires: aiohttp, anyio, async-timeout, dataclasses-json, jsonpatch, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install langchain --upgrade

Collecting langchain
  Downloading langchain-0.1.1-py3-none-any.whl.metadata (13 kB)
Collecting langchain-community<0.1,>=0.0.13 (from langchain)
  Downloading langchain_community-0.0.13-py3-none-any.whl.metadata (7.5 kB)
Collecting langchain-core<0.2,>=0.1.9 (from langchain)
  Downloading langchain_core-0.1.13-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.1.0,>=0.0.77 (from langchain)
  Downloading langsmith-0.0.83-py3-none-any.whl.metadata (10 kB)
Collecting packaging<24.0,>=23.2 (from langchain-core<0.2,>=0.1.9->langchain)
  Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Downloading langchain-0.1.1-py3-none-any.whl (802 kB)
Downloading langchain_community-0.0.13-py3-none-any.whl (1.6 MB)

In [None]:
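#optional check: call a local Llama 2 model through Ollama (requires a running Ollama server)
from langchain_community.llms import Ollama
llm = Ollama(model='llama2')
#use invoke(); calling llm(...) directly triggers a deprecation warning
llm.invoke('Tell me about the history of AI')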


## 1. Settings

In [2]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain.indexes.vectorstore import VectorstoreIndexCreator
import config

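ImportError: cannot import name 'URL' from 'sqlalchemy' (/Users/michielbontenbal/anaconda3/lib/python3.10/site-packages/sqlalchemy/__init__.py)

This ImportError typically means the installed SQLAlchemy predates version 2.0, which is where `URL` became importable from the top-level `sqlalchemy` package. Upgrading it (`pip install --upgrade sqlalchemy`) and restarting the kernel should let the imports above succeed.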

In [None]:
#store your OpenAI API key as an environment variable, which is where the OpenAI client looks for it (see previous tutorial)
import os
import config #a local config.py file that stores my API key
os.environ['OPENAI_API_KEY'] = config.openai_key #never keep the key itself in your source code :-)
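
If you would rather not keep a `config.py` around, here is a minimal sketch of reading the key straight from the environment and failing early with a clear message when it is missing. The helper name `get_openai_key` is made up for this example and is not part of LangChain or the OpenAI library.

```python
import os


def get_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error.

    Hypothetical helper for this tutorial, not a LangChain or OpenAI function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or load it from a local config file."
        )
    return key
```

Calling `get_openai_key()` at the top of the notebook surfaces a missing key right away instead of at the first API call.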

## 2. Get the data and create embeddings

In [None]:
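#we will use the wget library to download the file
#note: use the raw.githubusercontent.com URL; the github.com/.../blob/... URL returns an HTML page, not the text
import wget
filename = wget.download('https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt')
filename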
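
A word on GitHub links: a `github.com/<owner>/<repo>/blob/...` address points to an HTML page, while the plain file lives under `raw.githubusercontent.com`. The sketch below converts one form into the other using only the standard library. The function name `github_blob_to_raw` is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a github.com '/blob/' page URL into its raw-file URL.

    Hypothetical helper; URLs that do not match the expected shape are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path to file>
    if len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

The converted URL can then be passed to `wget.download` as in the cell above.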

In [None]:
#split the document into chunks
with open("state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

#create the embeddings
embeddings = OpenAIEmbeddings()
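
To get a feeling for what `chunk_size` does, here is a simplified pure-Python sketch of separator-based splitting: cut the text on blank lines, then merge neighbouring pieces until the next one would push the chunk over the limit. This is an illustration under those assumptions, not LangChain's actual implementation, and it ignores `chunk_overlap`.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedy separator-based chunking (simplified illustration only)."""
    # Cut on the separator and drop empty pieces.
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # Start a new chunk, or keep growing the current one while it fits.
            current = candidate
        else:
            # The next piece would not fit: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Note that a single paragraph longer than `chunk_size` is kept whole in this sketch, so chunks can end up larger than the limit.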

## 3. Search the document
Before calling the LLM, we search the document. The retriever embeds the query and returns the chunks whose embeddings are closest to it, so the match is on meaning rather than on exact words. The result of this search is the most relevant part of the document.
(In the next step we will feed this result to the LLM so it can generate a nice response.)

In [None]:
#embed the chunks, store them in a Chroma vector store and wrap it as a retriever
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))]).as_retriever()

In [None]:
#run a first query against the retriever
query = "What did the president say about Justice Breyer"
docs = docsearch.get_relevant_documents(query)

In [None]:
#print the best result
docs[0]
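
What happens inside the retriever can be sketched without any API calls: turn every chunk and the query into a vector, score each chunk by cosine similarity, and return the best ones. In the sketch below plain word counts stand in for real embeddings, so it only matches on shared words. The function names `embed`, `cosine` and `top_k` are made up for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    # Rank the chunks by similarity to the query and keep the k best.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

A real embedding model would also place texts with similar meaning but different wording close together, which word counts cannot do.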

## 4. Call the LLM to generate an answer
If you just want to get started as quickly as possible, this is the fastest way to do it.

In [None]:
#import two more libraries
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [None]:
#create the chain; the "stuff" chain type puts all retrieved documents into a single prompt
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
#ask your question
query = "What did the president say about Justice Breyer"
chain.run(input_documents=docs, question=query)
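
The idea behind the "stuff" chain type is simple: all retrieved chunks are pasted ("stuffed") into one prompt, followed by the question. Here is a rough sketch of that idea. The template wording and the function name `build_stuff_prompt` are assumptions for illustration and not LangChain's exact prompt.

```python
def build_stuff_prompt(docs, question):
    """Combine all retrieved chunks and the question into one prompt string.

    Illustrative sketch; the wording is an assumption, not LangChain's template.
    """
    context = "\n\n".join(docs)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )
```

Because everything goes into one prompt, this approach only works while the retrieved chunks together fit in the model's context window.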

## Conclusion
That's it! We've taken a text document from the web and run a document search over it. We also used a large language model to generate a good answer from the search results!