In [7]:
!pip -q install langchain langchain-core langchain-community faiss-cpu tiktoken langchain-together sentence_transformers gradio -qqq

In [3]:
from google.colab import userdata

import textwrap
import os

os.environ["TOGETHER_API_KEY"] = userdata.get('TOGETHER_API_KEY')
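
A missing key tends to surface later as an opaque authentication error, so it can help to fail early. The following is a minimal sketch; `require_env` is a hypothetical helper that is not part of the notebook or of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it as a notebook secret or export it first."
        )
    return value
```

Calling `require_env("TOGETHER_API_KEY")` right after the assignment above would stop the notebook before any model call is attempted.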

## Q&A Retriever
This tutorial walks through building a question-and-answer (Q&A) retrieval system in a notebook. The main components and their roles are:

1. Installing required libraries: The first cell installs several Python packages: `langchain`, `langchain-core`, `langchain-community`, `faiss-cpu`, `tiktoken`, `langchain-together`, `sentence_transformers`, and `gradio`. These libraries are used to build the Q&A system.

2. Loading the API key: The notebook reads the Together AI API key from Colab's user data (`userdata.get`) and stores it in the `TOGETHER_API_KEY` environment variable, so the `Together` client does not need the key passed explicitly.

3. Setting up the Q&A retriever: The tutorial embeds text with `HuggingFaceEmbeddings` from the `langchain-community` library, using the `intfloat/multilingual-e5-small` model; the OpenAI and Together embedding alternatives are left commented out. The embedded documents are stored in a FAISS (Facebook AI Similarity Search) vector store.

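4. Loading and splitting data: The code can load text from a web page with `WebBaseLoader` or from a local file with `TextLoader`; here the web loader is commented out and a local transcript is loaded. The text is then split with `RecursiveCharacterTextSplitter` into chunks of up to 1000 characters with a 200-character overlap, so each chunk can be embedded and indexed separately.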

5. Creating an example data set with metadata: The split documents are used to create a vector store, which serves as the searchable database for the Q&A system.

6. Building a retriever: A retriever is created from the FAISS vector store so that it can search for documents relevant to a user query. Here, `search_kwargs={"k": 3}` specifies that the three most similar chunks are returned for each query.

7. Integrating a language model: The tutorial uses Together AI's Open Hermes model (`teknium/OpenHermes-2p5-Mistral-7B`) to generate answers based on the retrieved context. The `Together` class from the `langchain-together` library is used for this purpose.

8. Creating a chain: A pipeline, or "chain," is created by combining the retriever, prompt template, language model, and output parser. This chain allows the system to take a question as input and return an answer based on the retrieved context.

9. Invoking the chain: Finally, the chain is invoked with a sample question ("How did they start Instagram?") to demonstrate how the Q&A system works. The answer is generated by combining the relevant documents retrieved from the vector store and using the language model to generate an appropriate response.


In [4]:
#from langchain.embeddings.openai import OpenAIEmbeddings
#from langchain_together.embeddings import TogetherEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

#embeddings = OpenAIEmbeddings()
#embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-2k-retrieval")
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-small")

from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [6]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import TextLoader

In [7]:
import bs4

In [28]:
# Load, chunk and index the contents of the blog.
#loader = WebBaseLoader(
#    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
#    bs_kwargs=dict(
#        parse_only=bs4.SoupStrainer(
#            class_=("post-content", "post-title", "post-header")
#        )
#    ),
#)

loader = TextLoader("/content/Instagram_ Kevin Systrom & Mike Krieger-transcript.txt")

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
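
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraph and line boundaries); `chunk_with_overlap` is a hypothetical name chosen for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it ignores separators, unlike the
    splitter used in the notebook.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij"
print(chunk_with_overlap(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the tail of the previous one, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.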

## Example data with metadata attached

In [27]:
# Embed the chunks and index them in a FAISS vector store
vectorstore = FAISS.from_documents(splits, embeddings)

## Creating our retriever

In [29]:
from langchain_together import Together

In [30]:
llm = Together(
    model="teknium/OpenHermes-2p5-Mistral-7B",
    temperature=0.3,
    max_tokens=256,
    top_k=50,
    # together_api_key="..."
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
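
Conceptually, a retriever with `k=3` ranks the stored vectors against the query vector and keeps the three best matches. The sketch below shows that idea with plain Python lists and cosine similarity on toy 2-D vectors. It is an illustration under stated assumptions: the FAISS index built above may rank by a different distance metric, and `cosine_similarity` and `top_k` are hypothetical helpers, not library functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], docs, k=3))  # [0, 2, 1]
```

A real vector store does the same ranking over hundreds of dimensions, using index structures that avoid comparing the query against every stored vector.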

In [31]:
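# Retrievers are runnables; newer langchain-core releases deprecate
# get_relevant_documents() in favor of invoke().
retriever.invoke('how did they start?')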

[Document(page_content="Huge lesson, number one, people read the docs that, you know, you put out and you should care about them as a founder. So your terms of service, basically what happened? We had these terms of service that we had, I think effectively copied from some other site way back in the day and just find and replace their name with Instagram because we were a startup and we didn't know what we were doing, but we eventually got to a place where when we joined Facebook, Facebook was like, Hey, you actually", metadata={'source': '/content/Instagram_ Kevin Systrom & Mike Krieger-transcript.txt'}),
 Document(page_content='0 (12m 3s):\nHow did people even find out about it Instagram? Like how did they know that they should download this like this app?', metadata={'source': '/content/Instagram_ Kevin Systrom & Mike Krieger-transcript.txt'}),
 Document(page_content="That? No, I mean, a couple of things that showed how naive we were at the time, number one midnight in San Francisco

In [32]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [33]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
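
In the chain above, the retriever's output (a list of `Document` objects) is passed straight into `{context}`, so the prompt likely receives the documents' string representation, metadata included. A common alternative is to join only the text of each document first. The sketch below shows that formatting step in isolation; `format_docs` is an assumed helper name, and `SimpleNamespace` objects stand in for real documents so the example runs without any LangChain install.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Join the page_content of each retrieved document into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for retrieved documents; real ones would come from the retriever.
fake_docs = [
    SimpleNamespace(page_content="First chunk."),
    SimpleNamespace(page_content="Second chunk."),
]
context = format_docs(fake_docs)
print(context)
```

In the notebook's chain, this could plausibly be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; that wiring is a suggestion and is untested against the library versions installed here.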

In [34]:
chain.invoke("How did they start Instagram?")

'Answer: Kevin Systrom and Mike Krieger started Instagram by sketching out the idea and launching it in the Apple store. It took them eight weeks from the time they decided to move away from bourbon to the launch of Instagram on October 10th, 2010.'
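
The raw string returned by the chain starts with an `Answer:` label and is one long line. The notebook imports `textwrap` but never uses it; one plausible use is tidying the answer for display. This is a minimal sketch, and `tidy_answer` is a hypothetical helper rather than anything the notebook defines.

```python
import textwrap


def tidy_answer(raw: str, width: int = 80) -> str:
    """Strip a leading 'Answer:' label and wrap the text to the given width."""
    text = raw.strip()
    prefix = "Answer:"
    if text.startswith(prefix):
        text = text[len(prefix):].lstrip()
    return textwrap.fill(text, width=width)
```

For example, `print(tidy_answer(chain.invoke("How did they start Instagram?")))` would show the answer without the label, wrapped at 80 characters.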