The source for this lab is the LangChain team's updated documentation, found here: https://python.langchain.com/docs/use_cases/question_answering/quickstart

It has been modified to read API keys from Google Colab secrets, and a section has been added showing how to connect to Google Drive.

Big thanks to the LangChain team for the doc updates. 0.1 ROCKS!!!!

# Quickstart

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/colinmcnamara/austin_langchain/blob/main/labs/LangChain_103/ALC_Turbocharge_your_RAG_quickstart.ipynb)

## Setup

### Dependencies

We'll use an OpenAI chat model and embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](https://python.langchain.com/docs/modules/model_io/chat/) or [LLM](https://python.langchain.com/docs/modules/model_io/llms/), [Embeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/), and [VectorStore](https://python.langchain.com/docs/modules/data_connection/vectorstores/) or [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/).
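To make the "any Embeddings" point concrete, here is a toy, dependency-free class that exposes the same two method names LangChain's `Embeddings` interface uses: `embed_documents` and `embed_query`. The class name `HashingEmbeddings` and its hashing scheme are our own illustrative assumptions. The vectors capture word overlap only, not meaning, so treat this as a sketch of the interface's shape rather than a usable embedding model.

```python
import hashlib
import math
from typing import List


class HashingEmbeddings:
    """Toy embeddings (hypothetical): hash each lowercase word into one of `size` buckets.

    Mirrors the method names of LangChain's Embeddings interface
    (embed_documents / embed_query) but does not subclass it.
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size

    def _embed(self, text: str) -> List[float]:
        vector = [0.0] * self.size
        for word in text.lower().split():
            # md5 gives the same bucket on every run (Python's built-in hash() is salted)
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.size
            vector[bucket] += 1.0
        # Scale to unit length so a dot product equals cosine similarity
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        return [v / norm for v in vector]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per document chunk, used when indexing
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # One vector for the user's question, used at retrieval time
        return self._embed(text)
```

For example, `HashingEmbeddings().embed_query("caching in streamlit")` returns a 64-element list of unit length. The design point is that the rest of the pipeline only depends on these two methods, which is why swapping providers should mostly mean passing a different object.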

We'll use the following packages:

In [None]:
%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai chromadb bs4 colab_env



We need to get our API keys from Google Colab's secrets store.
If you are using a local environment, you can uncomment the `dotenv` statements instead.

In [None]:
import getpass
import os
from google.colab import userdata

# Read the OpenAI key from Colab's secrets store
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Uncomment below if you want to enter the API key manually
#os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Uncomment below if you want to use a .env file
# (outside Colab, also comment out the google.colab import and the userdata line above)
# import dotenv
# dotenv.load_dotenv()
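If you switch between Colab and a local machine, a small helper can try each key source in turn so the same cell works in both places. The helper `get_api_key` below is hypothetical (not part of LangChain or Colab); it assumes the key may live in an environment variable (for example loaded from `.env`), in Colab's secrets via `google.colab.userdata`, or be typed in with `getpass`.

```python
import getpass
import os


def get_api_key(name: str, prompt_if_missing: bool = False) -> str:
    """Return an API key, trying sources in order (hypothetical helper).

    1. An environment variable (e.g. loaded from a .env file)
    2. Colab's secrets store via google.colab.userdata (only inside Colab)
    3. An interactive getpass prompt, if prompt_if_missing is True
    """
    value = os.environ.get(name)
    if value:
        return value
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing / notebook access not granted
        value = None
    if not value and prompt_if_missing:
        value = getpass.getpass(f"{name}: ")
    if not value:
        raise KeyError(f"{name} not found in environment or Colab secrets")
    return value
```

Usage: `os.environ["OPENAI_API_KEY"] = get_api_key("OPENAI_API_KEY", prompt_if_missing=True)`. Checking the environment first means a `.env` value wins over Colab secrets, which is one reasonable choice among several.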

### LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com).

LangSmith is optional, but helpful. If you want to use it, sign up at the link above, then set the following environment variables to start logging traces:

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')
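Because LangSmith is optional, you may want tracing to switch on only when a key is actually available. The sketch below is a hypothetical helper, `configure_langsmith`, built on the two variables used above; it also assumes LangSmith reads the optional `LANGCHAIN_PROJECT` variable to group traces under a named project.

```python
import os
from typing import Optional


def configure_langsmith(api_key: Optional[str], project: Optional[str] = None) -> bool:
    """Turn on LangSmith tracing only when a key is available (hypothetical helper).

    Returns True if tracing was enabled.
    """
    if not api_key:
        # No key: keep tracing explicitly off
        os.environ["LANGCHAIN_TRACING_V2"] = "false"
        return False
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    if project:
        # Assumed optional variable: groups traces under a project name
        os.environ["LANGCHAIN_PROJECT"] = project
    return True
```

Calling `configure_langsmith(None)` leaves tracing off, so the rest of the notebook runs unchanged without a LangSmith account.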


## Overview

In this guide we'll build a question-answering (QA) app that uses a retrieval-augmented generation (RAG) chain. We'll download the transcript of our LangChain 101 virtual edition and use the chain to answer questions about it.

We can create a simple indexing pipeline and RAG chain to do this in ~20 lines of code:
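Conceptually, the indexing pipeline turns each chunk into a vector, and the retriever returns the chunks whose vectors are closest to the question's vector. The dependency-free sketch below shows that idea with word-count vectors and cosine similarity. The function names (`vectorize`, `cosine`, `retrieve`), the value of `k`, and the sample chunks are illustrative assumptions; this is not how Chroma or OpenAI embeddings work internally.

```python
import math
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    # Bag-of-words "embedding": counts of lowercase words
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(chunks: List[str], question: str, k: int = 2) -> List[str]:
    # "Index" every chunk once, then rank chunks by similarity to the question
    index = [(chunk, vectorize(chunk)) for chunk in chunks]
    q = vectorize(question)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "streamlit caching keeps expensive results between reruns",
    "chroma is a vector store for embeddings",
    "the meetup is held in austin",
]
print(retrieve(chunks, "what about caching in streamlit", k=1))
# → ['streamlit caching keeps expensive results between reruns']
```

Real embedding models place paraphrases near each other even when they share no words, which is the main thing this word-count sketch cannot do.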

In [None]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


In [None]:
import os
import requests

# Path to your Google Drive directory
directory_path = '/content/gdrive/My Drive/austin_langchain_labs'

# Check if the directory exists within Google Drive
if not os.path.exists(directory_path):
    # If the directory does not exist, create it (makedirs also creates any missing parents)
    os.makedirs(directory_path)
    print(f"Directory {directory_path} created")
else:
    print(f"Directory {directory_path} already exists")

# URL of the file to be downloaded
file_url = "https://github.com/colinmcnamara/austin_langchain/blob/main/resources/transcripts/langchain_101-Transcript.txt?raw=true"

# Filename to save the file as
filename = "langchain_101-Transcript.txt"

# Full path to save the file
file_path = os.path.join(directory_path, filename)

# Download the file (with a timeout so a stalled connection doesn't hang the cell)
response = requests.get(file_url, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    with open(file_path, 'wb') as file:
        file.write(response.content)
    print(f"File successfully saved to {file_path}")
else:
    print(f"Failed to download the file (HTTP {response.status_code}).")
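The cell above re-downloads the transcript every time it runs. Since the file now lives on Google Drive, a variant can reuse the saved copy. The helper `download_if_missing` below is hypothetical; it uses the standard library's `urllib.request` instead of `requests` and accepts an injectable `fetch` function so the caching logic can be tested without network access.

```python
import os
import urllib.request
from typing import Callable


def _fetch(url: str) -> bytes:
    # Default fetcher: HTTP GET with a timeout; urlopen raises HTTPError on 4xx/5xx
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_if_missing(url: str, path: str,
                        fetch: Callable[[str], bytes] = _fetch) -> bool:
    """Download `url` to `path` unless a non-empty file is already there (hypothetical helper).

    Returns True if a download happened, False if the existing copy was reused.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False
    # Create the parent directory (e.g. the Drive folder) if needed
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return True
```

In this notebook it would be called as `download_if_missing(file_url, file_path)`. Treating an empty file as missing is a deliberate choice, so a previously failed write does not get cached.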


In [None]:
# We load the text document from our Google Drive
loader = TextLoader("/content/gdrive/My Drive/austin_langchain_labs/langchain_101-Transcript.txt")

docs = loader.load()

# We split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# We embed the text chunks using OpenAI's embedding model and store them in a Chroma vector store
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Expose the vector store as a retriever that returns the transcript chunks most relevant to a question
retriever = vectorstore.as_retriever()

# We pull our rag prompt from the hub
prompt = hub.pull("rlm/rag-prompt")

# We use gpt-3.5-turbo with temperature 0 so responses are as deterministic as possible
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# We define a helper that joins the retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
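The splitter settings above (`chunk_size=1000`, `chunk_overlap=200`) mean each chunk holds at most 1000 characters and neighbouring chunks share up to 200, so a sentence cut at a boundary has a good chance of appearing whole in one of them. The fixed-window function below, `split_with_overlap`, is a hypothetical simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as blank lines, newlines, and spaces before falling back to raw characters.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Fixed-size character windows that share `chunk_overlap` characters (hypothetical)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts chunk_size - chunk_overlap characters after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(2500))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])  # → [1000, 1000, 900]
```

With these defaults a 2500-character text yields windows starting at 0, 800, and 1600. More overlap improves the odds of keeping context intact at the cost of more chunks to embed.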




In [None]:
# Here we define our LangChain pipeline using LCEL: the | operator feeds each step's output into the next, much like pipes on the Unix command line
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
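To see what the `|` syntax in `rag_chain` is doing, here is a minimal stand-in written in plain Python. `Step` and `as_step` are hypothetical names, not LangChain classes. The sketch shows two ideas: `a | b` builds a new step that runs `a` and then `b`, and a dict on the left fans the same input out to every branch (the question goes both to the retriever branch and, unchanged, to `"question"`). LangChain's real dict handling (a `RunnableParallel`) can run the branches concurrently; this sketch runs them in order.

```python
from typing import Any, Callable


class Step:
    """Minimal stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        # self | other: run self, then pass its output to other
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Lets a plain dict sit on the left of `|`, as in rag_chain
        return as_step(other) | self


def as_step(obj: Any) -> Step:
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # A dict fans the same input out to every branch and collects the results
        branches = {key: as_step(value) for key, value in obj.items()}
        return Step(lambda value: {k: b.invoke(value) for k, b in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot compose {type(obj).__name__}")


# Fake components standing in for the retriever, format_docs, passthrough and prompt
fake_retriever = Step(lambda q: ["chunk about " + q])
join_docs = lambda docs: "\n\n".join(docs)
passthrough = Step(lambda x: x)
fake_prompt = lambda d: f"Context: {d['context']}\nQuestion: {d['question']}"

chain = {"context": fake_retriever | join_docs, "question": passthrough} | Step(fake_prompt)
print(chain.invoke("caching"))
```

The printed text has the same shape as what the real prompt receives: retrieved context plus the original question.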

In [None]:
rag_chain.invoke("What did Ricky say about caching in Streamlit?")

In [None]:
# Clean up: delete the Chroma collection created above
vectorstore.delete_collection()

:::tip

Check out the [LangSmith trace](https://smith.langchain.com/public/1c6ca97e-445b-4d00-84b4-c7befcbc59fe/r)

:::