In [25]:
import os

from dotenv import load_dotenv
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.groq import Groq
from llama_index.vector_stores.astra_db import AstraDBVectorStore

In [26]:

load_dotenv()

astra_api_endpoint = os.getenv('ASTRA_DB_ENDPOINT')
astra_db_token = os.getenv('ASTRA_DB_TOKEN')
groq_api_token = os.getenv('GROQ_API_TOKEN')

# Validate that all required environment variables are present
if not all([astra_api_endpoint, astra_db_token, groq_api_token]):
    raise ValueError("Missing required environment variables. Please check your .env file")
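
The check above reports that something is missing, but not which variable. A minimal sketch of a stricter variant follows; `require_env` is a hypothetical helper written for illustration, not part of `python-dotenv`, and it assumes empty strings should count as missing, as the `all()` check does.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or raise listing the missing ones."""
    values = {name: os.getenv(name) for name in names}
    # Unset and empty-string values are both treated as missing.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Called after `load_dotenv()` as `require_env("ASTRA_DB_ENDPOINT", "ASTRA_DB_TOKEN", "GROQ_API_TOKEN")`, it would return the three values keyed by name, or raise with the names that need to be added to the `.env` file.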

In [27]:
# Load every file in the directory as a Document
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print(f"Total documents: {len(documents)}")
print(f"First document, id: {documents[0].doc_id}")
print(f"First document, hash: {documents[0].hash}")
print(
    "First document, text"
    f" ({len(documents[0].text)} characters):\n{'='*20}\n{documents[0].text[:360]} ..."
)

Total documents: 1
First document, id: 9e2f719c-ce9e-4b32-836f-68c9ed89d4d3
First document, hash: 0c3c3f46cac874b495d944dfc4b920f6b68817dbbb1699ecc955d1fafb2bf87b
First document, text (75011 characters):
====================
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made ...


In [28]:
astra_db_store = AstraDBVectorStore(
    token=astra_db_token,
    api_endpoint=astra_api_endpoint,
    collection_name="test_table",
    embedding_dimension=768,  # must match the output size of nomic-embed-text
)
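
`embedding_dimension` has to equal the length of the vectors the embedding model produces; `nomic-embed-text` emits 768-dimensional vectors, which is why 768 is used here. Below is a small sketch of a guard that could run before indexing. `check_embedding_dimension` is a hypothetical helper, and it assumes only that the embedding object exposes a `get_text_embedding` method, as LlamaIndex embedding classes do.

```python
from typing import Callable, Sequence


def check_embedding_dimension(
    embed: Callable[[str], Sequence[float]], expected: int
) -> int:
    """Embed a probe string and verify the vector length matches `expected`."""
    vector = embed("dimension probe")
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, but the store expects {expected}"
        )
    return len(vector)
```

Once the embedding model has been created, the call would look like `check_embedding_dimension(embed_model.get_text_embedding, 768)`; it needs a running Ollama server with the model pulled.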

In [29]:
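# Embeddings are computed by a local Ollama server running nomic-embed-text
embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Route index storage to the Astra DB collection instead of the in-memory default
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

# Chunk the documents, embed each chunk, and write the vectors to Astra DB
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)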

In [30]:

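llm = Groq(model="deepseek-r1-distill-llama-70b", api_key=groq_api_token)

# Pass the LLM explicitly so the query engine does not fall back to the default one
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("Why did the author choose to work on AI?")

# R1-style models include their reasoning in a <think> block before the answer
print(response.response)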

<think>
Okay, so I need to figure out why the author chose to work on AI based on the given context. Let me read through the context again to pick up the relevant details.

The author mentions that when they were 18, they kept taking philosophy courses but found them boring. So they decided to switch to AI. That's one reason—philosophy wasn't holding their interest, and AI seemed more engaging.

Then, the author talks about two specific influences: a novel by Heinlein called "The Moon is a Harsh Mistress" and a PBS documentary featuring Terry Winograd using SHRDLU. The novel had an intelligent computer named Mike, which really captivated the author. The documentary showed SHRDLU in action, making the author think that AI was just a few years away from achieving something like Mike. This made the author excited and motivated to work on AI.

Additionally, the author was drawn to the potential of AI because it seemed like the future. They started teaching themselves AI by learning Lisp, w