## Install All the Requirements


In [1]:
# !pip install langchain unstructured faiss-cpu pdf2image sentence_transformers openai tiktoken

## Define the OpenAI API Key

In [2]:
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'  # placeholder; never commit a real key to a notebook

## Data Collection and Preparation
LangChain can take a list of URLs and scrape their content to use as our data source. The collected text is then converted into embeddings and stored in a vector database.

Here we use ***OpenAI Embeddings*** to generate the embeddings and ***FAISS*** as the vector database.
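
To make the idea concrete, here is a minimal sketch of what "embed the text, then search by similarity" means. It is a toy stand-in only: `embed` uses plain word counts instead of OpenAI Embeddings, `search` is a linear scan instead of a FAISS index, and the function names and sample texts are assumptions made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, texts, k=1):
    """Return the k texts most similar to the query (linear scan, no index)."""
    q = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


# Hypothetical sample texts, not taken from the scraped pages
texts = [
    "zoro is a swordsman",
    "the highest bounty in the series",
    "devil fruit users of the blackbeard pirates",
]
print(search("who has the highest bounty", texts))
```

A real embedding model captures meaning rather than exact word overlap, and a vector database avoids scanning every vector, but the retrieval step follows the same pattern: embed the query, then return the nearest stored vectors.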

#### URLs as the source of our data

In [3]:
urls = ['https://www.opfanpage.com/top-5-zoro-roronoa-future-fights/',
        'https://www.opfanpage.com/all-12-confirmed-devil-fruit-users-of-the-blackbeard-pirates-crew/',
        'https://www.opfanpage.com/all-11-devil-fruit-users-who-have-died-in-one-piece-series-2/',
        'https://www.opfanpage.com/top-50-highest-bounties-ever-in-one-piece/']

#### Scraping the content of websites

In [4]:
from langchain.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(urls=urls)
data = loader.load()

In [5]:
data


[Document(page_content='You are here:\n\nHome\n\nTOP TEN\n\nTop 5 Zoro’s Future Opponents\n\nTop 5 Zoro’s Future Opponents\n\nAdvertisements\n\nBefore he becomes the world’s greatest swordsman, Zoro needs to have a real sword fight. By that, I mean a fight determined by the quickness and subtlety of the blade. I understand that Haki attacks that cut ships and cliffs are cool. But what I would love to see is a fight where every stroke of the blade can dismember limbs or necks, where Zoro barely dodges a thrust and responds with a slice that his opponent paries at just the right time.\n\nAdvertisements\n\nThroughout the series, Zoro’s fights have demonstrated his strength, resolve, and power. His swordsmanship, however has only been demonstrated through these moments where he cuts through powerful bodies or attacks. Ironically, if his opponents will be as strong as he is, then the way to make the fights interesting is classical swordsmanship. Dual weild presents some good opportunities a

#### Splitting the data into smaller chunks

In [6]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(separator=".",
                                      chunk_size=1000,
                                      chunk_overlap=50)
docs = text_splitter.split_documents(data)
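
The three parameters above interact as follows: the text is cut at each separator, the pieces are packed into chunks of at most `chunk_size` characters, and trailing pieces of up to `chunk_overlap` characters are repeated at the start of the next chunk. Below is a simplified sketch of that idea in plain Python. It is not LangChain's implementation, and `split_text` and the sample text are assumptions for illustration.

```python
def split_text(text, separator=".", chunk_size=45, chunk_overlap=20):
    """Greedy separator-based chunking with overlap (simplified sketch)."""
    joiner = separator + " "
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        # Close the current chunk if adding this piece would exceed chunk_size.
        if current and len(joiner.join(current + [piece])) > chunk_size:
            chunks.append(joiner.join(current))
            # Keep trailing pieces as overlap: drop from the front until the
            # kept text fits in chunk_overlap and leaves room for the new piece.
            while current and (
                len(joiner.join(current)) > chunk_overlap
                or len(joiner.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(joiner.join(current))
    return chunks


# Hypothetical sample text
sample = "Zoro trains daily. He uses three swords. Mihawk is his rival. They will fight."
chunks = split_text(sample)
for chunk in chunks:
    print(repr(chunk))
```

In this sketch each chunk starts with the last sentence of the previous one, which is the point of the overlap: a sentence near a chunk boundary keeps some of its surrounding context. A single piece longer than `chunk_size` is emitted as its own oversized chunk.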

In [7]:
len(docs)

29

#### Generating embeddings and storing them in a vector database

In [8]:
import pickle
import faiss
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

In [9]:
embeddings = OpenAIEmbeddings()

In [10]:
db = FAISS.from_documents(docs, embeddings)

# Save the vector store (documents and their embeddings) for future use
with open('one_piece_embeddings.pkl', 'wb') as f:
    pickle.dump(db, f)

## Model Building
The core idea in LangChain is to build a chain suited to the task. Here we use *RetrievalQAWithSourcesChain* because we want both an answer retrieved from our data and the source that answer came from.
The default OpenAI LLM generates the answer.
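
Conceptually, such a chain retrieves the chunks most relevant to the question, passes them to the LLM together with the question, and returns the answer along with the sources. The sketch below illustrates that flow in plain Python. It is an assumption-laden illustration, not LangChain's internals: `retrieve`, `build_prompt`, `answer_with_sources`, `fake_llm`, the prompt wording, and the `example.com` URLs are all hypothetical, and the sources here are simply those of the retrieved chunks.

```python
def retrieve(question, chunks, k=2):
    """Rank chunks by shared question words (toy stand-in for vector search)."""
    words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(words & set(c["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Format the retrieved chunks and their sources into a single prompt."""
    context = "\n\n".join(
        f"Content: {c['text']}\nSource: {c['source']}" for c in retrieved
    )
    return (
        "Answer the question using only the extracts below, "
        "and list the sources you used.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_with_sources(question, chunks, llm, k=2):
    """Retrieve, prompt the LLM, and return the answer with the chunk sources."""
    retrieved = retrieve(question, chunks, k)
    prompt = build_prompt(question, retrieved)
    sources = ", ".join(sorted({c["source"] for c in retrieved}))
    return {"answer": llm(prompt), "sources": sources}


def fake_llm(prompt):
    """Hypothetical stub so the sketch runs without an API call."""
    return "stub answer"


# Hypothetical chunks with placeholder URLs
chunks = [
    {"text": "zoro may fight fujitora in the future", "source": "https://example.com/zoro"},
    {"text": "roger has the highest bounty", "source": "https://example.com/bounties"},
]
result = answer_with_sources("who has the highest bounty", chunks, fake_llm, k=1)
print(result)
```

The returned dictionary mirrors the `answer` and `sources` keys seen in the chain's output in the cells that follow.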

In [11]:
from langchain.chains import RetrievalQAWithSourcesChain
from langchain import OpenAI

# Default OpenAI completion model; reads OPENAI_API_KEY from the environment
llm = OpenAI()

In [12]:
with open('one_piece_embeddings.pkl', 'rb') as f:
    db = pickle.load(f)

#### Building Chain

In [13]:
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=db.as_retriever())

#### Testing the model

In [14]:
chain({'question': 'Who will zoro fight in the future?'}, return_only_outputs=True)

{'answer': ' It is not known who Zoro will fight in the future. Possible opponents include Diamond Jozu, Fujitora, and a Gorosei member.\n',
 'sources': 'https://www.opfanpage.com/top-5-zoro-roronoa-future-fights/'}

In [15]:
chain({'question': 'Who have the highest bounty?'}, return_only_outputs=True)

{'answer': ' Gol D. Roger has the highest bounty with ฿5,564,800,000.\n',
 'sources': 'https://www.opfanpage.com/top-50-highest-bounties-ever-in-one-piece/'}