In [1]:
import sys
import os 
import nest_asyncio

# Sanity check
print(sys.executable)
nest_asyncio.apply()

from dotenv import load_dotenv
load_dotenv() 

/Users/amorvan/Documents/code_dw/llm_collection/.venv/bin/python


True

In [2]:
import os
from pydantic import BaseModel, Field
from llama_index.core.workflow import (
    Workflow,
    step,
    Event,
    Context,
    StartEvent,
    StopEvent
)
from llama_index.llms.openai import OpenAI
from llama_index.core import (
    SimpleDirectoryReader,
    load_index_from_storage,
    VectorStoreIndex,
    StorageContext,
    get_response_synthesizer,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.retrievers.bm25 import BM25Retriever
import Stemmer


In [3]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O './paul_graham_essay.txt'

zsh:1: command not found: wget
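
The `wget` command above fails because `wget` is not installed on this machine, so the essay file is not downloaded by that cell. A minimal fallback sketch using only the Python standard library is shown below. The helper name `download_if_missing` and the constant `ESSAY_URL` are hypothetical names introduced here; the URL is the one from the cell above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same URL as in the wget command above.
ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)


def download_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless `dest` already exists (hypothetical helper)."""
    path = Path(dest)
    if not path.exists():
        # Create the parent directory first so the download cannot fail on it.
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(path))
    return path
```

Calling `download_if_missing(ESSAY_URL, "./paul_graham_essay.txt")` should place the file where the `SimpleDirectoryReader` cell below expects it. On macOS, `!curl -L <url> -o <file>` is another option, since `curl` ships with the system.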


In [4]:
llm = OpenAI(temperature=0.2, model="gpt-4o-mini")

## 1 - RAG

Retrieve relevant chunks of the essay with a BM25 (keyword-based) retriever, then answer questions from the retrieved chunks.

In [5]:

documents = SimpleDirectoryReader(
    input_files=["./paul_graham_essay.txt"],
).load_data()
splitter = SentenceSplitter(chunk_size=256)
nodes = splitter.get_nodes_from_documents(documents)
retriever_top_5 = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5,
    stemmer=Stemmer.Stemmer("english"),
    language="english",
)
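
To make the scores printed by the retriever less opaque, here is a toy sketch of the common Okapi BM25 formula. It is not the implementation `BM25Retriever` uses: it tokenizes on whitespace with no stemming, and the function name `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document (toy version, whitespace tokens)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and long documents are penalized (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A chunk that does not contain any query term scores 0, and a short chunk that repeats the term scores higher than a long chunk that mentions it once.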

In [6]:
rez = retriever_top_5.retrieve("computer")

print(rez[0])

Node ID: e03a1011-17cb-49a7-997c-ca34b209cf8a
Text: So I'm not surprised I can't remember any programs I wrote,
because they can't have done much. My clearest memory is of the moment
I learned it was possible for programs not to terminate, when one of
mine didn't. On a machine without time-sharing, this was a social as
well as a technical error, as the data center manager's expression
made clear....
Score:  1.289



In [7]:
print(rez[1].text)

It was meant to be a formal model of computation, an alternative to the Turing machine. If you want to write an interpreter for a language in itself, what's the minimum set of predefined operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an answer to that question. [19]

McCarthy didn't realize this Lisp could even be used to program computers till his grad student Steve Russell suggested it. Russell translated McCarthy's interpreter into IBM 704 machine language, and from that point Lisp started also to be a programming language in the ordinary sense. But its origins as a model of computation gave it a power and elegance that other languages couldn't match. It was this that attracted me in college, though I didn't understand why at the time.

McCarthy's 1960 Lisp did nothing more than interpret Lisp expressions. It was missing a lot of things you'd want in a programming language. So these had to be added, and when they were, they weren't define

In [8]:
# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever_top_5,
    response_synthesizer=response_synthesizer,
)
response = query_engine.query("Who is Paul Graham?")

In [9]:
response.response

'Paul Graham is a programmer and entrepreneur who worked intensively on a project called Bel, which involved writing code for an interpreter. He had to ban himself from writing essays during this time to focus on completing the project. Additionally, he is known for creating the Summer Founders Program, which attracted a significant number of applications from undergraduates and recent graduates.'

## 2 - Exercise

Combine the retriever with Workflows.

Create a workflow that:
- Searches for the best quote matching the user query
- Makes a rap about it
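
One possible way to approach the exercise is to first write the logic of each step as a plain function, then wrap each function in a `@step` method. The sketch below is library-agnostic and only a starting point. `ScoredChunk`, `pick_best_quote` and `build_rap_prompt` are hypothetical names; the only assumption about retrieved items is that they expose `.text` and `.score`, as the results printed in cells 6 and 7 do.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScoredChunk:
    """Stand-in for a retrieved node; only .text and .score are assumed."""
    text: str
    score: Optional[float]


def pick_best_quote(results: Sequence[ScoredChunk]) -> str:
    """Return the text of the highest-scoring result."""
    if not results:
        raise ValueError("retriever returned no results")
    # Treat a missing score as the lowest possible score.
    best = max(
        results,
        key=lambda r: r.score if r.score is not None else float("-inf"),
    )
    return best.text.strip()


def build_rap_prompt(query: str, quote: str) -> str:
    """Build the instruction that would be sent to the LLM."""
    return (
        f"Write a short rap about: {query}\n"
        "Base it on this quote from Paul Graham's essay:\n"
        f'"{quote}"'
    )
```

In the workflow, a first step could call `retriever_top_5.retrieve(...)` on the query carried by the `StartEvent` and pass the chosen quote along in a custom `Event`; a second step could send the prompt to `llm.complete(...)` and return `StopEvent(result=...)`.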




In [10]:

class ContextualGrahamRapWorkflow(Workflow):
    
    @step
    def do(self, ev: StartEvent) -> StopEvent:
        return StopEvent()



In [11]:
w = ContextualGrahamRapWorkflow()

r = await w.run()

## 3 - Exercise (if time permits)

Combine the workflow with a reranker.

Create a workflow that:
- Searches for the best quotes matching the user query
- Reranks them
- Makes a rap about the top-ranked quote
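
Reranking means taking the candidates returned by the retriever, scoring them again with a second scorer, and keeping only the top few. The sketch below shows that shape with a deliberately naive scorer (the fraction of query words found in the chunk) standing in for a real reranking model. `overlap_score` and `rerank` are hypothetical names, not part of any library.

```python
from typing import Callable, Sequence


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def rerank(
    query: str,
    texts: Sequence[str],
    scorer: Callable[[str, str], float] = overlap_score,
    top_n: int = 1,
) -> list[str]:
    """Re-score the candidates with `scorer` and keep the `top_n` best."""
    # sorted() is stable, so ties keep the retriever's original order.
    ranked = sorted(texts, key=lambda t: scorer(query, t), reverse=True)
    return ranked[:top_n]
```

In the workflow, this would sit in its own step between retrieval and rap generation, with `scorer` replaced by a call to an actual reranking model.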
