# Quickstart: QA with Astra and LangChain

### A question-answering demo using Astra DB and LangChain, powered by Vector Search

#### Prerequisites:

You need a _Serverless Cassandra with Vector Search_ database on [Astra DB](https://astra.datastax.com) to run this demo.

1. Find the corresponding [Database ID](https://awesome-astra.github.io/docs/pages/astra/faq/#where-should-i-find-a-database-identifier).
2. Get a [DB Token](https://awesome-astra.github.io/docs/pages/astra/create-token/#c-procedure) with the _Database Administrator_ role. Later in this notebook you will provide the `"AstraCS:..."` credential string found in the token.
3. Get an [OpenAI API Key](https://cassio.org/start_here/#llm-access).

#### What you will do:

- Setup: import dependencies, provide secrets, create the LangChain vector store;
- Load data: you will populate the vector store with a number of "The Onion" headlines from a HuggingFace dataset;
- Run a Question-Answering loop retrieving the relevant headlines and having an LLM construct the answer.
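
The three steps above can be sketched without any external service. The block below is a toy illustration only: `toy_embed`, `cosine`, `retrieve` and `build_prompt` are hypothetical helpers, and a bag-of-words count vector stands in for the OpenAI embeddings that the notebook actually uses. It shows the general shape of the flow (embed, rank by similarity, hand the best matches to an LLM as context), not how LangChain or Astra DB implement it.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, texts, k=2):
    """Return the k texts most similar to the question, best first."""
    q_vec = toy_embed(question)
    scored = [(cosine(q_vec, toy_embed(t)), t) for t in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(question, hits):
    """Assemble the retrieved texts and the question into one LLM prompt."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


headlines = [
    "Biologists Torture Amoeba For Information On Where Life Came From",
    "USDA Approves First Vaccine For Honeybees",
    "ChatGPT Forced To Take Bar Exam Even Though Dream Was To Be AI Art Bot",
]
question = "Did ChatGPT take the bar exam?"
hits = retrieve(question, headlines, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In the rest of the notebook, the vector store takes the place of `retrieve` and the OpenAI LLM consumes a prompt built along these lines.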

Install the required dependencies:

In [1]:
!pip install -q cassio datasets langchain openai tiktoken

Import the packages you'll need:

In [2]:
# LangChain components to use
from langchain.vectorstores.cassandra import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

# Support for dataset retrieval with Hugging Face
from datasets import load_dataset

# With CassIO, the engine powering the Astra DB integration in LangChain,
# you will also initialize the DB connection:
import cassio

### Setup

#### Provide your secrets:

Replace the following with your Astra DB connection details and your OpenAI API key:

In [3]:
ASTRA_DB_APPLICATION_TOKEN = "AstraCS:..." # enter the "AstraCS:..." string found in your Token JSON file
ASTRA_DB_ID = "01234567-..." # enter your Database ID

OPENAI_API_KEY = "sk-..." # enter your OpenAI key
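
If you would rather not paste secrets into a notebook cell, one option is to read them from environment variables and fall back to an interactive prompt. The sketch below is an assumption about how you might organize this, not part of the original demo: `read_secret` is a hypothetical helper, and the environment variable names are simply chosen to match the Python variables above.

```python
import os
from getpass import getpass


def read_secret(name, prefix=""):
    """Return a secret from the environment, prompting only if it is unset.

    `name` is the environment variable to look up (a hypothetical convention).
    `prefix`, if given, is a sanity check on the value's expected start.
    """
    value = os.environ.get(name) or getpass(f"{name}: ")
    if prefix and not value.startswith(prefix):
        raise ValueError(f"{name} should start with {prefix!r}")
    return value


# Example usage in the notebook (commented out so nothing prompts on import):
# ASTRA_DB_APPLICATION_TOKEN = read_secret("ASTRA_DB_APPLICATION_TOKEN", prefix="AstraCS:")
# ASTRA_DB_ID = read_secret("ASTRA_DB_ID")
# OPENAI_API_KEY = read_secret("OPENAI_API_KEY")
```

The prefix check catches the common mistake of pasting the wrong field from the token file, since the credential is expected to start with `AstraCS:`.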

Initialize the connection to your database:

_(Do not worry if you see a few warnings: the drivers are just chatty about negotiating protocol versions with the DB.)_

In [4]:
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN, database_id=ASTRA_DB_ID)

Create the LangChain embedding and LLM objects for later usage:

In [5]:
llm = OpenAI(openai_api_key=OPENAI_API_KEY)
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

Create your LangChain vector store ... backed by Astra DB!

In [6]:
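astra_vector_store = Cassandra(
    embedding=embedding,
    table_name="qa_mini_demo",
    session=None,   # None: reuse the connection set up by cassio.init() above
    keyspace=None,  # None: use the default keyspace from cassio.init()
)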

### Load the dataset into the vector store

> Feel free to raise `NUM_HEADLINES` for added fun (at a moderate additional cost in OpenAI usage and a slightly longer load time).

In [7]:
NUM_HEADLINES = 100

print("Loading data from huggingface ... ", end="")
onion_dataset = load_dataset("Biddls/Onion_News", split="train")
headlines = onion_dataset["text"][:NUM_HEADLINES]
print("Done.")

print("\nGenerating embeddings and storing headlines in AstraDB ... ", end="")
astra_vector_store.add_texts(headlines)

print("Inserted %i headlines." % len(headlines))

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Loading data from huggingface ... Done.

Generating embeddings and storing headlines in AstraDB ... Inserted 100 headlines.
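
When loading more headlines, it can help to insert them in batches so that progress is visible along the way. Judging from the sample output later in this notebook, each dataset entry also appears to consist of a title and a body joined by `#~#`. The two helpers below, `batched` and `split_entry`, are hypothetical names introduced for illustration, and the separator is an assumption based on that output.

```python
def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def split_entry(text, separator="#~#"):
    """Split a dataset entry into (title, body); body is '' if no separator."""
    title, _, body = text.partition(separator)
    return title.strip(), body.strip()


# Example usage in the notebook (commented out: needs the vector store above):
# for batch in batched(headlines, 20):
#     astra_vector_store.add_texts(batch)
#     print("Inserted %i headlines." % len(batch))

title, body = split_entry("USDA Approves First Vaccine For Honeybees #~# The United States ...")
print(title)
```

The batch size of 20 is arbitrary; pick whatever gives you a comfortable progress cadence.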


### Run the QA cycle

Run the cell and ask a question, or type `quit` to stop. (You can also stop execution with the "▪" button on the top toolbar.)

Here are some suggested questions:
- _What are scientists doing with amoebas?_
- _Did ChatGPT take the bar exam?_
- _Are gas stoves a controversial item in a household?_

In [8]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue
    
    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("ANSWER: \"%s\"\n" % answer)

    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=4):
        print("    [%0.4f] \"%s ...\"" % (score, doc.page_content[:84]))


Enter your question (or type 'quit' to exit):  What are scientists doing with amoebas?



QUESTION: "What are scientists doing with amoebas?"
ANSWER: "They are torturing them in an attempt to extract information on where life came from."

FIRST DOCUMENTS BY RELEVANCE:
    [0.9397] "Biologists Torture Amoeba For Information On Where Life Came From #~# CAMBRIDGE, MA— ..."
    [0.8773] "Expectant Couple Hoping For Human Baby #~# CONWAY, AR—Praying to be blessed with a c ..."
    [0.8772] "Dolphin Trained To Kill By U.S. Military In ’60s Now Lying Destitute In Street #~# S ..."
    [0.8740] "USDA Approves First Vaccine For Honeybees #~# The United States Department of Agricu ..."



What's your next question (or type 'quit' to exit):  Did ChatGPT take the bar exam?



QUESTION: "Did ChatGPT take the bar exam?"
ANSWER: "Yes, ChatGPT was reportedly forced to take the bar exam."

FIRST DOCUMENTS BY RELEVANCE:
    [0.9327] "ChatGPT Forced To Take Bar Exam Even Though Dream Was To Be AI Art Bot #~# MINNEAPOL ..."
    [0.9033] "What To Know About ChatGPT #~# The artificially intelligent chatbot ChatGPT has rece ..."
    [0.8993] "CEOs Explain How They Will Use ChatGPT #~# ChatGPT, an AI-based program that creates ..."
    [0.8708] "Alito, Thomas Share Laugh After Discovering They Both Leaked Dobbs Decision #~# WASH ..."



What's your next question (or type 'quit' to exit):  Are gas stoves a controversial item in a household?



QUESTION: "Are gas stoves a controversial item in a household?"
ANSWER: "Yes, gas stoves have become a controversial item in households due to recent suggestions by the Consumer Product Safety Commission that they could be banned."

FIRST DOCUMENTS BY RELEVANCE:
    [0.9431] "Conservatives Defend Their Right To Have Gas Stoves #~# Recently, a member of the Co ..."
    [0.9314] "Experts Warn Gas Stoves May Slowly Ingratiate Selves In Family To Kill And Take Plac ..."
    [0.8843] "Concerning Study Finds 1 In 10 Americans Lack Access To Adequate Food Eating Challen ..."
    [0.8722] "Parents Feel Safer Letting Kids Drink And Drive Under Their Roof #~# ASTORIA, OREGON ..."



What's your next question (or type 'quit' to exit):  quit
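
In the sample runs above, the best match for each question scores above 0.93, while clearly unrelated headlines sit around 0.87 to 0.88. Because the search always returns `k` results, you may want to drop the weak ones before showing them. The sketch below uses a hypothetical `filter_by_score` helper on `(document, score)` pairs shaped like the ones `similarity_search_with_score` yields in the loop above. The cutoff of 0.9 is an illustrative assumption taken from this one run and would need tuning for other data or embedding models.

```python
def filter_by_score(results, min_score=0.9):
    """Keep only the (document, score) pairs whose score reaches min_score."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Scores copied from the first sample run above; plain strings stand in
# for the Document objects that the vector store returns.
sample_results = [
    ("Biologists Torture Amoeba For Information On Where Life Came From", 0.9397),
    ("Expectant Couple Hoping For Human Baby", 0.8773),
    ("Dolphin Trained To Kill By U.S. Military", 0.8772),
    ("USDA Approves First Vaccine For Honeybees", 0.8740),
]

kept = filter_by_score(sample_results)
for doc, score in kept:
    print("    [%0.4f] \"%s\"" % (score, doc))
```

A fixed cutoff is a blunt tool: it trades a shorter, cleaner list for the risk of hiding a borderline but useful match.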
