# Using llama-parse with AstraDB

In this notebook, we show a basic RAG-style example that uses `llama-parse` to parse a PDF document, stores the parsed content in a vector store (`AstraDB`), and finally runs some basic queries against that store. The notebook is modeled after the quick start notebooks and is meant as a way to get started with `llama-parse` backed by a vector database.

### Requirements

In [None]:
# First, install the required dependencies
!pip install --quiet ragstack-ai
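`ragstack-ai` bundles the packages that the later cells import. If one of those imports fails, a quick check can tell you which module is missing. This is a minimal sketch built on the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the imports used further down in this notebook.

```python
import importlib.util
from typing import Iterable, List

# Modules imported later in this notebook (assumed to be provided by ragstack-ai)
NOTEBOOK_MODULES = [
    "requests",
    "nest_asyncio",
    "llama_parse",
    "llama_index.core",
    "llama_index.vector_stores.astra",
    "llama_index.embeddings.openai",
]

def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is not installed
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(NOTEBOOK_MODULES) or "none")
```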

### Configuration

In [None]:
import os
from getpass import getpass

os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Enter your LlamaCloud API key: ")
os.environ["ASTRA_DB_ENDPOINT"] = input("Enter your Astra DB API endpoint: ")
os.environ["ASTRA_DB_TOKEN"] = getpass("Enter your Astra DB token: ")
os.environ["OPEN_AI_KEY"] = getpass("Enter your OpenAI API key: ")
# The query engine's default LLM (OpenAI) reads its key from OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = os.environ["OPEN_AI_KEY"]
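Because `input` and `getpass` accept an empty answer, a typo or an accidental Enter only surfaces later as an authentication error. A minimal sketch of an up-front check follows; the helper `missing_env_vars` is hypothetical, and the variable names are the ones this configuration cell sets.

```python
import os
from typing import Iterable, List

# Environment variables set by the configuration cell of this notebook
REQUIRED_VARS = ["LLAMA_CLOUD_API_KEY", "ASTRA_DB_ENDPOINT", "ASTRA_DB_TOKEN", "OPEN_AI_KEY"]

def missing_env_vars(names: Iterable[str]) -> List[str]:
    """Return the variables that are unset or contain only whitespace."""
    return [name for name in names if not os.environ.get(name, "").strip()]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("All credentials are set.")
```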

In [None]:
# llama-parse is async-first; calling its sync API from a notebook (which already runs an event loop) requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()
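The reason for `nest_asyncio` can be reproduced with the standard library alone. Jupyter already runs an event loop, and a synchronous wrapper that drives a coroutine with `asyncio.run()` fails when called inside a running loop; `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed. In this sketch, `fake_parse` and `sync_wrapper` are hypothetical stand-ins, not llama-parse internals, and `notebook_cell` simulates code executing inside Jupyter's loop. Run it as a plain Python script: inside this notebook, after `nest_asyncio.apply()`, both calls succeed.

```python
import asyncio

async def fake_parse() -> str:
    # Stand-in for an async-first API call such as a remote parse job
    await asyncio.sleep(0)
    return "parsed"

def sync_wrapper() -> str:
    # Sync convenience method that drives the coroutine with asyncio.run()
    coro = fake_parse()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise

async def notebook_cell() -> str:
    # Code here runs while an event loop is already running,
    # which is the situation every Jupyter cell is in
    try:
        return sync_wrapper()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"

print(sync_wrapper())                # outside a running loop: works
print(asyncio.run(notebook_cell()))  # inside a running loop: fails without nest_asyncio
```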

### Using llama-parse to parse a PDF

In [None]:
# Grab a PDF from arXiv for indexing ("Attention Is All You Need")
import requests

# The URL of the file to download
url = "https://arxiv.org/pdf/1706.03762.pdf"
# The local path where the file will be saved
file_path = "./attention.pdf"

# Perform the HTTP request (with a timeout so the cell cannot hang indefinitely)
response = requests.get(url, timeout=60)

# Check whether the request was successful
if response.status_code == 200:
    # Open the file in binary write mode and save the content
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Download complete.")
else:
    print(f"Error downloading the file: HTTP {response.status_code}")
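A 200 status does not guarantee that the body is a PDF: a proxy or mirror can return an HTML page instead, which the parser would then choke on. A cheap safeguard is to check the header bytes before writing the file. This is a minimal sketch; `looks_like_pdf` and `save_pdf` are hypothetical helpers, and the check assumes the file starts with `%PDF-`, which is how PDF files normally begin.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this header

def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header."""
    return data[:len(PDF_MAGIC)] == PDF_MAGIC

def save_pdf(data: bytes, file_path: str) -> Path:
    """Write data to file_path, refusing content that is not a PDF
    (for example an HTML error page served with a 200 status)."""
    if not looks_like_pdf(data):
        raise ValueError("Downloaded content does not look like a PDF")
    path = Path(file_path)
    path.write_bytes(data)
    return path
```

In the download cell, `save_pdf(response.content, file_path)` could replace the `open(...)`/`write` pair.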

In [None]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="text", verbose=True).load_data("./attention.pdf")
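Every `load_data` call sends the file to LlamaCloud to be parsed again. While iterating on the rest of the notebook, it can be convenient to cache the parsed text locally. The sketch below is a generic, hypothetical `cached_parse` helper: it accepts any function that maps a file path to a list of strings, so it runs without LlamaCloud, and the demo uses a fake parser to show that the second call is served from the cache.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

def cached_parse(file_path: str, parse_fn: Callable[[str], List[str]], cache_path: str) -> List[str]:
    """Return parsed text for file_path, calling parse_fn only if no cache exists."""
    cache = Path(cache_path)
    if cache.is_file():
        return json.loads(cache.read_text(encoding="utf-8"))
    texts = parse_fn(file_path)
    cache.write_text(json.dumps(texts), encoding="utf-8")
    return texts

# Demo with a fake parser that records how often it is invoked
calls: List[str] = []

def fake_parse(path: str) -> List[str]:
    calls.append(path)
    return ["first page text", "second page text"]

with tempfile.TemporaryDirectory() as tmp:
    cache_file = str(Path(tmp) / "attention.json")
    first = cached_parse("./attention.pdf", fake_parse, cache_file)
    second = cached_parse("./attention.pdf", fake_parse, cache_file)

print(first == second, len(calls))  # the parser ran only once
```

With llama-parse, `parse_fn` could be `lambda p: [d.get_content() for d in LlamaParse(result_type="text").load_data(p)]`, and the cached strings can be turned back into documents with `Document(text=t)` from `llama_index.core` (document metadata is not cached in this sketch).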

In [None]:
# Take a quick look at some of the parsed text from the document:
documents[0].get_content()[10000:11000]
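The slice `[10000:11000]` is a fixed character window, so what it shows depends on how the PDF was parsed, and it silently returns an empty string if the text is shorter. To inspect how a particular passage came through, it can be easier to search for a phrase. A minimal sketch with a hypothetical `snippet_around` helper:

```python
import re
from typing import Optional

def snippet_around(text: str, phrase: str, width: int = 300) -> Optional[str]:
    """Return up to `width` characters on each side of the first case-insensitive
    match of `phrase`, or None if the phrase does not occur."""
    match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
    if match is None:
        return None
    lo = max(0, match.start() - width)
    hi = min(len(text), match.end() + width)
    return text[lo:hi]
```

For example, `snippet_around(documents[0].get_content(), "Multi-Head Attention")` shows the parsed text around the first mention of that section.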

### Storing into Astra DB

In [None]:
import os
from llama_index.vector_stores.astra import AstraDBVectorStore

astra_db_store = AstraDBVectorStore(
    token=os.environ["ASTRA_DB_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_ENDPOINT"],
    collection_name="astra_v_table_llamaparse",
    embedding_dimension=1536,  # must match the embedding model; 1536 is the output size of OpenAIEmbedding's default model
)
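`embedding_dimension` fixes the vector size of the Astra DB collection, so it has to agree with the embedding model used at indexing time; vectors of a different size are expected to be rejected when they are written. The sketch below checks this before any data is sent. The table lists the native output sizes of OpenAI's embedding models; `check_embedding_dimension` is a hypothetical helper.

```python
# Native output sizes of OpenAI embedding models (text-embedding-3-* models can
# also be asked for shorter vectors; this table assumes the default size)
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_embedding_dimension(model: str, collection_dim: int) -> None:
    """Raise if the collection's vector size does not match the model's output."""
    expected = OPENAI_EMBEDDING_DIMS.get(model)
    if expected is None:
        raise KeyError(f"Unknown embedding model: {model}")
    if expected != collection_dim:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, but the collection expects {collection_dim}"
        )

check_embedding_dimension("text-embedding-ada-002", 1536)  # passes silently
```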

In [None]:
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser()

nodes = node_parser.get_nodes_from_documents(documents)
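The node parser splits each document into smaller chunks (nodes), so that each embedding covers a focused span of text and the retrieved context fits into the LLM prompt. In recent LlamaIndex versions `SimpleNodeParser` is an alias of `SentenceSplitter`, which respects sentence boundaries, measures size in tokens, and exposes `chunk_size` and `chunk_overlap` parameters. The sketch below is a simplified word-based chunker, not LlamaIndex's implementation; it only illustrates how a fixed chunk size with overlap tiles a text.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap`
    words with the previous window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

print(chunk_words(" ".join(str(i) for i in range(20))))
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighbouring chunk, at the cost of storing some text twice.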

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(api_key=os.environ["OPEN_AI_KEY"]),
)
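Building the `VectorStoreIndex` embeds every node with the given embedding model and writes the vectors, together with the node text, to the Astra DB collection. The sketch below mimics that flow in memory so it runs offline: `toy_embed` is a hypothetical hashed bag-of-words stand-in for `OpenAIEmbedding`, and `ToyVectorStore` is a hypothetical stand-in for the Astra DB collection, including a check that every vector has the collection's dimension.

```python
import hashlib
import math
from typing import Dict, List, Tuple

def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class ToyVectorStore:
    """In-memory stand-in for a vector collection with a fixed dimension."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.rows: Dict[str, Tuple[List[float], str]] = {}

    def add(self, node_id: str, text: str, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim}-dim vector, got {len(vector)}")
        self.rows[node_id] = (vector, text)

def build_index(chunks: List[str], store: ToyVectorStore) -> None:
    """Embed each chunk and store it under a sequential node id."""
    for i, chunk in enumerate(chunks):
        store.add(f"node-{i}", chunk, toy_embed(chunk, store.dim))

store = ToyVectorStore(dim=16)
build_index(["attention is all you need", "multi-head attention", "positional encoding"], store)
print(len(store.rows))
```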

### Simple RAG Example

In [None]:
query_engine = index.as_query_engine(similarity_top_k=15)
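`similarity_top_k=15` asks the retriever for the 15 stored chunks whose embeddings are most similar to the query embedding; those chunks become the context the LLM answers from. Astra DB performs this search server-side, and vector databases typically use approximate nearest-neighbour indexes. Conceptually, with cosine similarity as the metric, the search amounts to the brute-force sketch below; `top_k` is a hypothetical helper.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every stored vector, keep the k best."""
    scored = ((node_id, cosine_similarity(query, vec)) for node_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], stored, k=2))
```

If fewer than `k` vectors are stored, all of them are returned, ranked by similarity.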

In [None]:
query = "What is Multi-Head Attention also known as?"

response_1 = query_engine.query(query)
print("\n***********LlamaParse + Basic Query Engine***********")
print(response_1)
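Behind `query_engine.query(...)`, the retrieved chunks are placed into a prompt together with the question, and the LLM generates the answer from that context. LlamaIndex's default response synthesizer uses its own templates and may split the context across several LLM calls when it does not fit in one; the sketch below shows only the basic "put the context into one prompt" idea, with a hypothetical template and a character budget standing in for a token limit.

```python
from typing import List

# Hypothetical template, not LlamaIndex's built-in prompt
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(question: str, chunks: List[str], max_context_chars: int = 4000) -> str:
    """Add retrieved chunks (most relevant first) until the character budget is used up."""
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    return TEMPLATE.format(context="\n\n".join(selected), question=question)

print(build_prompt("What is Multi-Head Attention also known as?", ["chunk one", "chunk two"]))
```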

In [None]:
# Take a look at one of the source nodes from the response
response_1.source_nodes[0].get_content()
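`source_nodes` holds the retrieved chunks as `NodeWithScore` objects, each carrying a similarity `score` and the node's text. Listing all of them with their scores makes it easier to judge whether retrieval surfaced the right passages. A minimal sketch: `summarize_sources` is a hypothetical helper that relies only on the `score` attribute and the `get_content()` method, and the demo uses a small stand-in class so it runs without an index.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

def summarize_sources(source_nodes: Iterable, preview_chars: int = 80) -> str:
    """One line per source node: rank, similarity score, and a whitespace-collapsed preview."""
    lines = []
    for rank, sn in enumerate(source_nodes, start=1):
        score = "n/a" if sn.score is None else f"{sn.score:.3f}"
        preview = " ".join(sn.get_content().split())[:preview_chars]
        lines.append(f"{rank:>2}. score={score}  {preview}")
    return "\n".join(lines)

@dataclass
class FakeSource:
    """Stand-in exposing the two members summarize_sources uses."""
    text: str
    score: Optional[float]

    def get_content(self) -> str:
        return self.text

demo = [FakeSource("first example chunk", 0.8731), FakeSource("second   example\nchunk", None)]
print(summarize_sources(demo))
```

In the notebook, `print(summarize_sources(response_1.source_nodes))` lists all retrieved chunks for the query above.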