In [13]:
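# pypdf and python-dotenv are imported below, so install them too.
# This notebook uses the early llama-index API (GPTSimpleVectorIndex, save_to_disk),
# which newer releases have removed; -U may install an incompatible version.
%pip install -Uq llama-index openai langchain pypdf python-dotenv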

Note: you may need to restart the kernel to use updated packages.


## imports

In [1]:
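import logging
import sys

# send INFO-level logs (including llama_index token-usage reports) to the notebook
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# basicConfig already attached a stdout handler; this second one makes every
# message print twice (once with the "INFO:root:" prefix, once without)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import GPTSimpleVectorIndex, download_loader
from IPython.display import Markdown, display

from pathlib import Path
import warnings
warnings.filterwarnings('ignore')  # hide all Python warnings

from dotenv import load_dotenv

# load OPENAI_API_KEY from a local .env file into os.environ
load_dotenv()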

True
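`load_dotenv()` reads a `.env` file and copies its variables into `os.environ`, which is where the OpenAI client looks for `OPENAI_API_KEY`. The sketch below shows the idea, assuming a plain `KEY=VALUE` file; `parse_env` and `load_env` are hypothetical helpers, not python-dotenv's implementation, which also handles escapes, `export` prefixes and variable expansion. Like python-dotenv's default, it does not overwrite variables that are already set.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # drop optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(text: str, override: bool = False) -> Dict[str, str]:
    """Copy parsed values into os.environ; existing values win unless override=True."""
    values = parse_env(text)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```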

## data loader

In [16]:
# fetch the PDFReader loader from LlamaHub and parse the lecture PDF into Documents
PDFReader = download_loader("PDFReader")

loader = PDFReader()
documents = loader.load_data(file=Path('pdfs/lecture01-intro-2up.pdf'))
# print(documents)

## manual construction

The cell below rebuilds the PDFReader loader by hand with pypdf. source: https://github.com/emptycrown/llama-hub/blob/main/loader_hub/file/pdf/base.py

In [12]:
from typing import BinaryIO, List

from pypdf import PdfReader
from llama_index import Document


def parse_pdf(file: BinaryIO) -> List[Document]:
    """Extract the text of every page and return it as a single Document."""
    pdf = PdfReader(file)

    # extract the text of each page, in page order
    text_list = [page.extract_text() for page in pdf.pages]

    # join all pages into one string, separated by newlines
    text = "\n".join(text_list)

    return [Document(text)]


# open in binary mode; PdfReader accepts a file-like object
with open('pdfs/lecture01-intro-2up.pdf', 'rb') as file:
    manual_load = parse_pdf(file)
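Both loaders return the whole lecture as one Document. A vector index does not embed such a document in one piece; it first splits the text into chunks. The sketch below illustrates that step with a hypothetical `split_into_chunks` that counts words and overlaps neighbouring windows; llama_index's own splitter counts tokens, so treat this as an approximation of the idea rather than its implementation.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_size` words; neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks: List[str] = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.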

## creating index

In [3]:
from llama_index.langchain_helpers.chatgpt import ChatGPTLLMPredictor

# LLM predictor backed by OpenAI's ChatGPT model; it generates the answers at query time
llm_predictor = ChatGPTLLMPredictor()

In [4]:
# embed the document chunks; building a vector index makes no LLM calls (0 LLM tokens below)
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 1672 tokens
> [build_index_from_documents] Total embedding token usage: 1672 tokens
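The build log shows 0 LLM tokens and 1672 embedding tokens: building the index only embeds the text, and the chat model is not involved until a query arrives. The toy sketch below shows what a simple vector index does with those embeddings. `embed`, `cosine` and `SimpleVectorStore` are hypothetical names, and the word-count "embedding" is a stand-in assumed for illustration; the real index uses OpenAI embeddings.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for an embedding model: counts of lowercase words."""
    return {w: float(c) for w, c in Counter(re.findall(r"[a-z0-9]+", text.lower())).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimpleVectorStore:
    """Keeps (chunk, vector) pairs in memory and returns the chunks closest to a query."""

    def __init__(self, chunks: List[str]) -> None:
        # building only embeds the chunks; no LLM call happens here
        self.entries: List[Tuple[str, Dict[str, float]]] = [(c, embed(c)) for c in chunks]

    def top_k(self, query: str, k: int = 1) -> List[str]:
        # embed the query, then rank stored chunks by similarity to it
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```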


In [5]:
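# persist the index (text chunks and their embeddings) to JSON so it need not be re-embedded
index.save_to_disk('index.json')
# load it back from disk; the llm_predictor is passed again with each query below
index = GPTSimpleVectorIndex.load_from_disk('index.json')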
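Saving to disk means the embeddings computed at build time can be reused without paying for them again. A minimal sketch of that round trip, assuming the index is just a list of chunk texts with their vectors; `save_index`, `load_index` and the JSON layout are hypothetical and not llama_index's file format.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_index(entries: List[Dict], path: str) -> None:
    """Write chunk texts and their vectors to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"entries": entries}, f)


def load_index(path: str) -> List[Dict]:
    """Read back what save_index wrote; no embedding calls are needed."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["entries"]


# round-trip demo in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "index.json")
demo_entries = [{"text": "AI is the study of rational agents", "vector": {"ai": 1.0, "agents": 1.0}}]
save_index(demo_entries, demo_path)
restored = load_index(demo_path)
```

Only data is stored; an object such as the LLM predictor is presumably not serialized, which is why it is supplied again when querying.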

## query chatgpt

In [6]:
# set the logging level to DEBUG for more detailed output
response = index.query("Summarize this lecture in bullet points", llm_predictor=llm_predictor)

INFO:root:> [query] Total LLM token usage: 1822 tokens
> [query] Total LLM token usage: 1822 tokens
INFO:root:> [query] Total embedding token usage: 8 tokens
> [query] Total embedding token usage: 8 tokens
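The query log shows only 8 embedding tokens (the question itself) but 1822 LLM tokens. That gap suggests the retrieved lecture text is pasted into the prompt together with the question before the chat model is called. The sketch below shows that assembly step; `build_qa_prompt` and its template wording are assumptions modeled loosely on a context-then-question QA prompt, not the exact prompt llama_index sends.

```python
from typing import List


def build_qa_prompt(question: str, contexts: List[str]) -> str:
    """Assemble retrieved chunks and the user question into one prompt string."""
    context_str = "\n\n".join(contexts)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context_str}\n"
        "---------------------\n"
        "Given the context information, answer the question: "
        f"{question}\n"
    )
```

Because the context is part of the prompt, the LLM token count grows with the amount of retrieved text, not with the length of the question.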


In [7]:
display(Markdown(f"<b>{response}</b>"))

<b>- The lecture is about Artificial Intelligence (AI).
- It covers the definition of AI, general characteristics of intelligence, and different approaches to creating AI.
- The history of AI is discussed, including early successes, disappointments, and recent advances.
- Examples of AI applications are given, including games, mathematics, space exploration, autonomous driving, and scientific discovery.
- AI's potential impact on society is mentioned, including concerns about AI surpassing human intelligence (the "Singularity").
- The lecture provides a brief introduction to the study of AI.</b>

In [8]:
response = index.query("Test my knowledge on this material with 3 questions and give me answers", llm_predictor=llm_predictor)

INFO:root:> [query] Total LLM token usage: 1844 tokens
> [query] Total LLM token usage: 1844 tokens
INFO:root:> [query] Total embedding token usage: 13 tokens
> [query] Total embedding token usage: 13 tokens


In [9]:
display(Markdown(f"<b>{response}</b>"))

<b>1. What is AI?
AI is the art of creating machines that perform functions that require intelligence when performed by humans. It is the study of the computations that make it possible to perceive, reason, and act. 

2. What are the four general characteristics of intelligence?
The four general characteristics of intelligence are perception, action, reasoning, and learning. 

3. What is a rational agent according to the course?
A rational agent is one that acts so as to achieve the best outcome, given the available information. It is an entity that perceives and acts, and the course is about designing such agents.</b>

In [10]:
response = index.query("What does slide 14 say about what AI can do?", llm_predictor=llm_predictor)
display(Markdown(f"<b>{response}</b>"))

INFO:root:> [query] Total LLM token usage: 1770 tokens
> [query] Total LLM token usage: 1770 tokens
INFO:root:> [query] Total embedding token usage: 11 tokens
> [query] Total embedding token usage: 11 tokens


<b>Slide 14 says that AI has defeated human champions in various games such as chess, checkers, poker, and Go. It also mentions how AI has proven a mathematical conjecture, assisted with logistics planning during the Gulf War, and controlled the scheduling of operations for spacecraft and Mars Exploration Rovers.</b>