<a href="https://colab.research.google.com/github/dankaparaficz/tensorflow/blob/master/AMLD_2025_Chat_with__data_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chat with Data: Retrieval Augmented Generation (RAG) with LlamaIndex and OpenAI API**

## Overview

LlamaIndex (https://www.llamaindex.ai/) is a data framework that helps you:
- Create vector embeddings from documents
- Build efficient retrieval systems
- Connect your data with Large Language Models
- Generate contextualized responses

## **Part 1. Structured data**
We will build a simple RAG system to retrieve information about players at EURO 2024 using the LlamaIndex library.

**Data**: CSV file: euro2024_players.csv

This table presents information about football players participating in Euro 2024. It includes the following details for each player:

**Name**: Full name of the player

**Position**: The position they play on the field (e.g., goalkeeper, centre-back)

**Age**: Age of the player

**Club**: The club they currently play for

**Height**: Height of the player in centimeters

**Foot**: Their dominant foot (right or left)

**Caps**: The number of international matches played for their country

**Goals**: The number of goals scored in international matches

**Market Value**: Estimated market value of the player in Euros

**Country**: Country the player represents

# LlamaIndex
https://docs.llamaindex.ai/en/stable/

LlamaIndex, formerly known as GPT Index, is a data framework designed to streamline the development of applications that leverage Large Language Models (LLMs). It provides developers with tools and functionalities for integrating various data sources (e.g., documents, APIs, databases) with LLMs, enabling them to build more intelligent and context-aware applications.


## Chat Engine
### Concept:

A chat engine is a high-level interface for having a conversation with your data (multiple back-and-forth exchanges instead of a single question and answer). Think ChatGPT, but augmented with your knowledge base.

By keeping track of the conversation history, it can answer questions with past context in mind.

Configuring a Chat Engine
https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/usage_pattern/




# Setup

In [None]:
import sys
import os
import pandas as pd

Install the LlamaIndex library with its OpenAI integration.
The optional flag -q installs "quietly", without printing the details of the installation.

In [None]:
%pip install llama-index-llms-openai -q

In [None]:
!pip install llama-index -q

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

## Set your OpenAI API key

By default, we use the OpenAI gpt-3.5-turbo model for text generation and text-embedding-ada-002 for retrieval and embeddings. In order to use this, you must have an OPENAI_API_KEY set up as an environment variable. You can obtain an API key by logging into your OpenAI account and creating a new API key. https://platform.openai.com/api-keys

### Obtain an OpenAI API Key
Log in to Your Account:
Go to https://platform.openai.com/login and log in with your credentials.

Navigate to the API Section:
Once logged in, click on your profile icon in the top-right corner and select "API Keys" from the dropdown menu.

Generate a New API Key:
Click the "Create new secret key" button.
Copy the generated API key. You will not be able to view it again, so store it securely in a password manager or a safe document.

In [None]:
# Read the OpenAI API key without echoing it into the notebook output
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass("Enter OpenAI API key: ")




## Download Data from Kaggle
Dataset of all the players in the squads of the teams participating in UEFA EURO 2024.

Source and short description: https://www.kaggle.com/datasets/damirdizdarevic/uefa-euro-2024-players


In [None]:
# Create data directory
!mkdir -p 'data_euro/'

In [None]:
# Download the dataset archive from Kaggle into data_euro/
!curl -L -o data_euro/uefa-euro-2024-players.zip \
  https://www.kaggle.com/api/v1/datasets/download/damirdizdarevic/uefa-euro-2024-players

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 15446  100 15446    0     0  40535      0 --:--:-- --:--:-- --:--:-- 40535


In [None]:
#unzip data into data_euro folder
!unzip -o data_euro/uefa-euro-2024-players.zip -d data_euro


Archive:  data_euro/uefa-euro-2024-players.zip
  inflating: data_euro/euro2024_players.csv  


In [None]:
#remove zip file from data_euro folder
!rm data_euro/uefa-euro-2024-players.zip

In [None]:
#load data to dataframe to explore
data = pd.read_csv("./data_euro/euro2024_players.csv")

In [None]:
# Print a concise summary of the DataFrame 'data',
# including information about columns, data types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 623 entries, 0 to 622
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         623 non-null    object
 1   Position     623 non-null    object
 2   Age          623 non-null    int64 
 3   Club         623 non-null    object
 4   Height       623 non-null    int64 
 5   Foot         620 non-null    object
 6   Caps         623 non-null    int64 
 7   Goals        623 non-null    int64 
 8   MarketValue  623 non-null    int64 
 9   Country      623 non-null    object
dtypes: int64(5), object(5)
memory usage: 48.8+ KB


In [None]:
#first 5 players
data.head()

| | Name | Position | Age | Club | Height | Foot | Caps | Goals | MarketValue | Country |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marc-André ter Stegen | Goalkeeper | 32 | FC Barcelona | 187 | right | 40 | 0 | 28000000 | Germany |
| 1 | Manuel Neuer | Goalkeeper | 38 | Bayern Munich | 193 | right | 119 | 0 | 4000000 | Germany |
| 2 | Oliver Baumann | Goalkeeper | 34 | TSG 1899 Hoffenheim | 187 | right | 0 | 0 | 3000000 | Germany |
| 3 | Nico Schlotterbeck | Centre-Back | 24 | Borussia Dortmund | 191 | left | 12 | 0 | 40000000 | Germany |
| 4 | Jonathan Tah | Centre-Back | 28 | Bayer 04 Leverkusen | 195 | right | 25 | 0 | 30000000 | Germany |


In [None]:
#filter by country
data[data['Country'] == 'Spain']

| | Name | Position | Age | Club | Height | Foot | Caps | Goals | MarketValue | Country |
|---|---|---|---|---|---|---|---|---|---|---|
| 104 | David Raya | Goalkeeper | 28 | Arsenal FC | 183 | right | 5 | 0 | 35000000 | Spain |
| 105 | Unai Simón | Goalkeeper | 26 | Athletic Bilbao | 190 | right | 39 | 0 | 30000000 | Spain |
| 106 | Álex Remiro | Goalkeeper | 29 | Real Sociedad | 191 | right | 1 | 0 | 25000000 | Spain |
| 107 | Robin Le Normand | Centre-Back | 27 | Real Sociedad | 187 | right | 10 | 1 | 40000000 | Spain |
| 108 | Dani Vivian | Centre-Back | 24 | Athletic Bilbao | 184 | right | 2 | 0 | 25000000 | Spain |
| 109 | Aymeric Laporte | Centre-Back | 30 | Al-Nassr FC | 191 | left | 28 | 1 | 20000000 | Spain |
| 110 | Nacho Fernández | Centre-Back | 34 | Real Madrid | 180 | right | 24 | 1 | 3000000 | Spain |
| 111 | Alejandro Grimaldo | Left-Back | 28 | Bayer 04 Leverkusen | 171 | left | 3 | 0 | 45000000 | Spain |
| 112 | Marc Cucurella | Left-Back | 25 | Chelsea FC | 173 | left | 3 | 0 | 25000000 | Spain |
| 113 | Daniel Carvajal | Right-Back | 32 | Real Madrid | 173 | right | 43 | 0 | 12000000 | Spain |


In [None]:
#filter by club
data[data['Club'] == 'Real Madrid']

| | Name | Position | Age | Club | Height | Foot | Caps | Goals | MarketValue | Country |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Antonio Rüdiger | Centre-Back | 31 | Real Madrid | 190 | right | 69 | 3 | 25000000 | Germany |
| 15 | Toni Kroos | Central Midfield | 34 | Real Madrid | 183 | right | 109 | 17 | 10000000 | Germany |
| 110 | Nacho Fernández | Centre-Back | 34 | Real Madrid | 180 | right | 24 | 1 | 3000000 | Spain |
| 113 | Daniel Carvajal | Right-Back | 32 | Real Madrid | 173 | right | 43 | 0 | 12000000 | Spain |
| 129 | Joselu | Centre-Forward | 34 | Real Madrid | 191 | right | 10 | 5 | 5000000 | Spain |
| 145 | Luka Modric | Central Midfield | 38 | Real Madrid | 172 | right | 174 | 24 | 6000000 | Croatia |
| 250 | Jude Bellingham | Attacking Midfield | 20 | Real Madrid | 186 | right | 29 | 3 | 180000000 | England |
| 400 | Ferland Mendy | Left-Back | 29 | Real Madrid | 180 | left | 9 | 0 | 22000000 | France |
| 402 | Aurélien Tchouaméni | Defensive Midfield | 24 | Real Madrid | 188 | right | 31 | 3 | 100000000 | France |
| 404 | Eduardo Camavinga | Central Midfield | 21 | Real Madrid | 182 | left | 16 | 1 | 100000000 | France |


# Load Documents to Prepare for Indexing Using LlamaIndex

Load the documents (our CSV file) to build the VectorStoreIndex. The folder may contain several files in different formats; SimpleDirectoryReader loads every file in the folder.

In [None]:
# load data
documents = SimpleDirectoryReader("./data_euro").load_data()

In [None]:
print(documents)

[Document(id_='78f44214-c2b3-4647-8a21-e37f6e8a72c3', embedding=None, metadata={'file_path': '/content/data_euro/euro2024_players.csv', 'file_name': 'euro2024_players.csv', 'file_type': 'text/csv', 'file_size': 49009, 'creation_date': '2025-02-14', 'last_modified_date': '2024-06-08'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text="Marc-André ter Stegen, Goalkeeper, 32, FC Barcelona, 187, right, 40, 0, 28000000, Germany\nManuel Neuer, Goalkeeper, 38, Bayern Munich, 193, right, 119, 0, 4000000, Germany\nOliver Baumann, Goalkeeper, 34, TSG 1899 Hoffenheim, 187, right, 0, 0, 3000000, Germany\nNico Schlotterbeck, Centre-Back, 24, Borussia Dort

### Build index
This step creates a searchable vector index from the loaded documents:
 - Converts text into numerical vectors using embeddings
 - Enables efficient semantic search and retrieval
 - Stores document vectors in memory for quick access
 - Used by chat engine to find relevant context when answering questions
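
To make the idea of vector search concrete, here is a minimal toy sketch. It is not how `VectorStoreIndex` works internally: the "embedding" below is only a bag of word counts, whereas the real index calls a learned embedding model. The function names `embed`, `cosine`, and `search` are hypothetical, and the three text chunks are rows taken from the dataset.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts. A real index uses a learned embedding model."""
    words = text.lower().replace(",", " ").replace("?", " ").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Three rows from the dataset, each treated as one text chunk.
chunks = [
    "Manuel Neuer, Goalkeeper, 38, Bayern Munich, Germany",
    "Toni Kroos, Central Midfield, 34, Real Madrid, Germany",
    "David Raya, Goalkeeper, 28, Arsenal FC, Spain",
]
vectors = [embed(chunk) for chunk in chunks]  # the in-memory "index"

def search(query):
    """Return the stored chunk most similar to the query."""
    query_vec = embed(query)
    scores = [cosine(query_vec, vec) for vec in vectors]
    return chunks[scores.index(max(scores))]

best = search("Which club does Toni Kroos play for?")
print(best)
```

Word counts only match exact words, while learned embeddings are meant to also capture similarity in meaning. The overall flow is the same in both cases: embed the documents once, embed each query, and compare.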

In [None]:
index = VectorStoreIndex.from_documents(documents)

In [None]:
#setup LLM model
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

# Configuring a Chat Engine

Configuring a chat engine is very similar to configuring a query engine.

## High-Level API

You can build and configure a chat engine directly from an index in one line of code:


`chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)`

**Note**: you can access different chat engines by specifying `chat_mode` as a kwarg: `condense_question` corresponds to `CondenseQuestionChatEngine`, `react` to `ReActChatEngine`, and `context` to `ContextChatEngine`.

**Note**: While the high-level API optimizes for ease of use, it does NOT expose the full range of configurability.


## Available Chat Modes:

**context** - Retrieve nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.

**condense_question** - Look at the chat history and re-write the user message to be a query for the index. Return the response after reading the response from the query engine.

**condense_plus_context** - A combination of condense_question and context. Look at the chat history and re-write the user message to be a retrieval query for the index. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.

**simple** - A simple chat with the LLM directly, no query engine involved.

**best** - Turn the query engine into a tool, for use with a ReAct data agent or an OpenAI data agent, depending on what your LLM supports. OpenAI data agents require gpt-3.5-turbo or gpt-4 as they use the function calling API from OpenAI.

**react** - Same as best, but forces a ReAct data agent.

**openai** - Same as best, but forces an OpenAI data agent.



### Default Mode
Calling `as_chat_engine()` without a `chat_mode` uses the library default. According to the LlamaIndex documentation this is `best`, which turns the query engine into a tool for an agent, so the LLM decides when to query the indexed documents.


In [None]:
chat_engine = index.as_chat_engine()

In [None]:
# Get response
response = chat_engine.chat("Hello! What club played Toni Kroos during euro 2024?")

In [None]:
print(response)

Toni Kroos played for Real Madrid during Euro 2024.


### Context mode
Context mode retrieves relevant text from the index for every user message and inserts it into the system prompt, so the LLM answers with both the retrieved context and the conversation history in view. In this notebook its answers come back as complete sentences.

#### Key Features
- Generates complete, grammatically correct sentences
- Maintains conversational history
- Provides responses with better context
- Doesn't reformulate the question (unlike condense_question mode)



In [None]:
# Create chat engine with optimal settings
chat_engine = index.as_chat_engine(
    chat_mode="context",  # For complete, natural responses
    llm=llm,              # Directly set LLM in chat engine
    verbose=True,        # To see the retrieval process
    similarity_top_k=2   # Get top 2 most relevant chunks
)


### Understanding the Settings:
- `chat_mode="context"`: Provides complete, natural responses while maintaining context
- `verbose=True`: Shows the retrieval process for debugging
- `similarity_top_k=2`: Retrieves the 2 most relevant text chunks
- `llm=llm` : Specifies which Language Model to use
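
As a rough illustration of what `similarity_top_k=2` means, the sketch below keeps only the two highest-scoring chunks from a scored list. The helper `top_k` and the similarity scores are hypothetical, and each chunk is shown as a single row for brevity; in the real index a chunk is a larger piece of the CSV text.

```python
import heapq

def top_k(scored_chunks, k=2):
    """Return the k (score, text) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[0])

# Hypothetical similarity scores for a question about Real Madrid players.
scored = [
    (0.62, "Joselu, Centre-Forward, 34, Real Madrid"),
    (0.58, "Toni Kroos, Central Midfield, 34, Real Madrid"),
    (0.55, "Jude Bellingham, Attacking Midfield, 20, Real Madrid"),
    (0.21, "David Raya, Goalkeeper, 28, Arsenal FC"),
]

selected = top_k(scored, k=2)
for score, text in selected:
    print(f"{score:.2f}  {text}")
```

Only the selected chunks reach the LLM. A small `k` keeps the prompt short, but anything outside the top `k` is invisible to the model, which is worth keeping in mind for "list all ..." style questions.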

In [None]:
# Get response
response = chat_engine.chat("Hello! What club played Toni Kroos during euro 2024?")

In [None]:
print(response)

Toni Kroos played for Real Madrid during Euro 2024.


In [None]:
chat_engine = index.as_chat_engine(chat_mode="context", verbose=True)

The `verbose=True` parameter in LlamaIndex enables detailed logging and debugging output during the query process. When enabled, it shows step-by-step information about how LlamaIndex processes your query.

#### What It Shows
- Query processing steps
- Retrieved chunks of text
- Similarity scores
- Response generation process
- Internal prompts being used

### Condense_question
Condense question is a simple chat mode built on top of a query engine over your data.

For each chat interaction:

- first generate a standalone question from conversation context and last message, then
- query the query engine with the condensed question for a response.

This approach is simple, and works for questions directly related to the knowledge base. Since it always queries the knowledge base, it can have difficulty answering meta questions like "what did I ask you before?"
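
The two steps above can be sketched with plain Python callables. This is a simplified illustration, not the `CondenseQuestionChatEngine` implementation: `toy_condense` and `toy_query_engine` are hypothetical stand-ins for the LLM rewrite and for the query engine, and passing the first message through unchanged is an assumption of this sketch.

```python
def condense_question_chat(message, history, condense, query_engine):
    """One chat turn: rewrite the message as a standalone question, then query."""
    # Assumption: with no history there is nothing to condense.
    standalone = condense(history, message) if history else message
    answer = query_engine(standalone)
    history.append(("user", message))
    history.append(("assistant", answer))
    return standalone, answer

def toy_condense(history, message):
    """Stand-in for the LLM rewrite: attach the first user question as context."""
    first_question = history[0][1]
    return f"{message} (context: {first_question})"

def toy_query_engine(question):
    """Stand-in for the query engine over the index."""
    return f"[answer to: {question}]"

history = []
q1, a1 = condense_question_chat(
    "What club did Toni Kroos play for?", history, toy_condense, toy_query_engine
)
q2, a2 = condense_question_chat(
    "What is his age?", history, toy_condense, toy_query_engine
)
print(q2)
```

Because every turn ends in a call to the query engine, the answer can only come from the knowledge base. That is why meta questions about the conversation itself are hard for this mode.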

In [None]:
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)


### Chat with your data

In [None]:
response = chat_engine.chat("Hello! What club played Toni Kroos during euro 2024?")

Querying with: Hello! What club played Toni Kroos during euro 2024?


In [None]:
print(response)

Real Madrid


In [None]:
response = chat_engine.chat("What is his age?")

Querying with: How old is Toni Kroos, the player who played for Real Madrid during Euro 2024?


In [None]:
print(response)

34


In [None]:
response = chat_engine.chat("List all players from Real Madrid Club")

Querying with: Can you please list all players from the Real Madrid Club?


In [None]:
print(response)

Joselu, Centre-Forward, 34, Real Madrid, 191, right, 10, 5, 5000000, Spain


In [None]:
response = chat_engine.chat("From which club is David Raum?")

Querying with: Which club did David Raum play for?


In [None]:
print(response)

David Raum played for TSG Hoffenheim.



## Reset conversation state

In [None]:
chat_engine.reset()

## Chat Engine - Condense Plus Context Mode

This is a multi-step chat mode built on top of a retriever over your data.

For each chat interaction:

- First condense a conversation and latest user message to a standalone question
- Then build a context for the standalone question from a retriever,
- Then pass the context along with prompt and user message to LLM to generate a response.

This approach is simple, and works for questions directly related to the knowledge base and general interactions.



Since the context retrieved can take up a large amount of the available LLM context, let's ensure we configure a smaller limit to the chat history!
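
To illustrate what a token limit on chat history does, here is a minimal toy buffer that returns only the most recent messages that fit a budget. It is a sketch under simplifying assumptions, not the `ChatMemoryBuffer` implementation: `ToyMemoryBuffer` and `count_tokens` are hypothetical names, and tokens are approximated by whitespace-separated words rather than by a real tokenizer.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())

class ToyMemoryBuffer:
    """Keeps all messages but returns only the newest ones that fit the limit."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.messages = []

    def put(self, role, content):
        self.messages.append((role, content))

    def get(self):
        kept, total = [], 0
        # Walk backwards from the newest message and stop when the budget is used up.
        for role, content in reversed(self.messages):
            cost = count_tokens(content)
            if total + cost > self.token_limit:
                break
            kept.append((role, content))
            total += cost
        return list(reversed(kept))

memory_sketch = ToyMemoryBuffer(token_limit=5)
memory_sketch.put("user", "a b c")
memory_sketch.put("assistant", "d e")
memory_sketch.put("user", "f g h")
recent = memory_sketch.get()
print(recent)
```

With a limit of 5 "tokens", the oldest message no longer fits and is left out of what would be sent to the LLM, which leaves more room for the retrieved context.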

In [None]:
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about football players in Euro 2024 based on the provided data.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=False,
)
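
To make the role of the `{context_str}` placeholder concrete, here is a minimal sketch of how such a template could be filled with retrieved text before it is sent to the LLM. The helper `build_system_prompt` is hypothetical and the two retrieved chunks are rows from the dataset; the chat engine's actual prompt assembly may differ.

```python
CONTEXT_PROMPT = (
    "Here are the relevant documents for the context:\n"
    "{context_str}\n"
    "Instruction: Use the previous chat history, or the context above, "
    "to interact and help the user."
)

def build_system_prompt(retrieved_chunks, template=CONTEXT_PROMPT):
    """Join the retrieved chunks and substitute them into the template."""
    context_str = "\n\n".join(retrieved_chunks)
    return template.format(context_str=context_str)

retrieved = [
    "Toni Kroos, Central Midfield, 34, Real Madrid, 183, right, 109, 17, 10000000, Germany",
    "Joselu, Centre-Forward, 34, Real Madrid, 191, right, 10, 5, 5000000, Spain",
]
system_prompt = build_system_prompt(retrieved)
print(system_prompt)
```

The LLM can only draw on the rows that appear in this filled-in prompt, so a player whose row was not retrieved looks to the model as if it were missing from the data.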


### Chat with your data

In [None]:
response = chat_engine.chat("Hello! What club played Toni Kroos during euro 2024?")

In [None]:
print(response)

Toni Kroos played for Real Madrid during Euro 2024.


In [None]:
response = chat_engine.chat("What is his age?")

In [None]:
print(response)

Toni Kroos was 34 years old during Euro 2024.


In [None]:
response = chat_engine.chat("List all players from Real Madrid Club")

In [None]:
print(response)

Here are the players from Real Madrid Club in Euro 2024:

1. Joselu, Centre-Forward, 34 years old
2. Toni Kroos, Central Midfield, 34 years old


In [None]:
response = chat_engine.chat("From which club is David Raum?")

In [None]:
print(response)

David Raum is not listed in the provided data for Euro 2024 players. If you have any other players in mind or need information about the existing players, feel free to ask!



### Reset conversation state

In [None]:
chat_engine.reset()

In [None]:
response = chat_engine.chat("Hello! What do you know?")

In [None]:
print(response)

Hello! I know a lot of things. Is there anything specific you would like to talk about or learn more about?


### Context chat engine

In [None]:
chat_engine = index.as_chat_engine(chat_mode="context", llm=llm, verbose=True)

## Task 1
Try different chat engine modes: simple, best, react, openai. Chat with the data and compare the output. Which engine mode is best for this task? Don't forget to reset the chatbot after each conversation.

## Task 2
### Test your chats. Chat with your data by asking these questions:

**Factual Questions (Direct Retrieval):**

Which club does Manuel Neuer play for?

What is the age of the youngest player in the dataset?

How many international caps does Marc-André ter Stegen have?

Which player has scored the most goals in international matches?

What is the total market value of all players in the dataset?

**Comparative Questions (Simple Inference):**

Which goalkeeper has the higher market value, ter Stegen or Neuer?

Which player is taller, Schlotterbeck or Tah?

Who is older, Baumann or Neuer?

Which centre-back has more international caps?

**Complex Questions (Requires Reasoning):**

Based on the data, who would you consider the most valuable goalkeeper for Germany? Explain your reasoning.

Which centre-back might be a better fit for a team prioritizing defensive experience? Why?

If you were a club manager looking for a young, promising centre-back, who would you be interested in signing? Justify your choice.

**Open-ended Questions (Promotes Generation):**

What potential strengths and weaknesses could this group of players bring to the German national team at Euro 2024?

How might the combination of experience and youth in this dataset impact Germany's performance in the tournament?

**Bonus Question (Tests Out-of-Domain Knowledge):**

Which of these players were part of Germany's World Cup-winning squad in 2014? (This requires external knowledge, as it's not in the provided dataset)
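
Some of the questions above (youngest player, total market value, taller centre-back) have exact answers, so it can help to compute a ground truth to compare the chat engine against. The sketch below does this with the standard `csv` module on the five rows shown by `data.head()` earlier. It is only a small sample: on the full file the same checks could be run on the `data` DataFrame, for example with `data["Age"].min()`.

```python
import csv
import io

# The five rows shown by data.head(), used here as a small sample.
SAMPLE = """Name,Position,Age,Club,Height,Foot,Caps,Goals,MarketValue,Country
Marc-André ter Stegen,Goalkeeper,32,FC Barcelona,187,right,40,0,28000000,Germany
Manuel Neuer,Goalkeeper,38,Bayern Munich,193,right,119,0,4000000,Germany
Oliver Baumann,Goalkeeper,34,TSG 1899 Hoffenheim,187,right,0,0,3000000,Germany
Nico Schlotterbeck,Centre-Back,24,Borussia Dortmund,191,left,12,0,40000000,Germany
Jonathan Tah,Centre-Back,28,Bayer 04 Leverkusen,195,right,25,0,30000000,Germany
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Youngest player in the sample.
youngest = min(rows, key=lambda r: int(r["Age"]))

# Total market value of the sample, in euros.
total_value = sum(int(r["MarketValue"]) for r in rows)

# Taller of the centre-backs in the sample.
centre_backs = [r for r in rows if r["Position"] == "Centre-Back"]
tallest_cb = max(centre_backs, key=lambda r: int(r["Height"]))

print(youngest["Name"], total_value, tallest_cb["Name"])
```

Aggregations like these need every row, while the chat engine only sees the few chunks it retrieves, so disagreements on such questions are a useful signal about the limits of retrieval.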

# **Part 2. Unstructured data**
This part demonstrates how to use **LlamaIndex** to build a document-based retrieval system. We'll analyze a research paper using RAG (Retrieval-Augmented Generation).

We will proceed step by step:

1. Install necessary dependencies.
2. Load a PDF document (the GPT-4 technical report).
3. Build a vector-based index.
4. Perform document retrieval and query with OpenAI’s GPT model.
5. Stream query responses with page citations.



## What We'll Build

A complete document analysis system that can:
- Load and process PDF documents
- Create searchable vector indices
- Answer questions with relevant citations
- Stream responses in real-time
- Provide source verification

## Step-by-Step Process

### 1. Install Dependencies
We'll set up all necessary packages:
- llama-index: Core framework for document processing
- llama-index-llms-openai: OpenAI integration
- Other supporting libraries

### 2. Load PDF Document
We'll work with the GPT-4 technical report to:
- Create proper directory structure
- Download paper from arXiv
- Process PDF into analyzable text
- Verify document loading

### 3. Build Vector Index
Transform document into searchable format:
- Convert text to vector embeddings
- Create efficient index structure
- Optimize for quick retrieval

### 4. Document Retrieval & Querying
Implement intelligent querying:
- Connect with OpenAI's GPT model
- Set up query parameters
- Configure response formats
- Handle different types of questions

### 5. Stream Responses
Create interactive output system:
- Real-time response streaming
- Page number citations
- Source verification
- Relevance scoring

### **Step 1. Import the necessary components from LlamaIndex**

In [None]:
from llama_index.core import (
    SimpleDirectoryReader,    # For reading documents
    VectorStoreIndex,         # For creating vector embeddings
    download_loader,          # For loading different document types
    RAKEKeywordTableIndex,    # For keyword extraction
)

Configure the LLM: initialize the OpenAI LLM with a temperature of 0 for more deterministic responses.

In [None]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

### **Step 2: Document Preparation**


In [None]:
# Create data directory
!mkdir -p 'data/'

In [None]:
# Download research paper from arXiv  and save it in the data folder as chatGPT4_paper.pdf
!wget "https://arxiv.org/pdf/2303.08774.pdf" -O "data/chatGPT4_paper.pdf"

--2025-02-14 10:10:09--  https://arxiv.org/pdf/2303.08774.pdf
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.131.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://arxiv.org/pdf/2303.08774 [following]
--2025-02-14 10:10:09--  http://arxiv.org/pdf/2303.08774
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5245564 (5.0M) [application/pdf]
Saving to: ‘data/chatGPT4_paper.pdf’


2025-02-14 10:10:09 (86.4 MB/s) - ‘data/chatGPT4_paper.pdf’ saved [5245564/5245564]



### **Step 3. Load and Analyze Document**

We use SimpleDirectoryReader to load the document and VectorStoreIndex to build an index.

In [None]:
reader = SimpleDirectoryReader(input_files=["./data/chatGPT4_paper.pdf"])
data = reader.load_data()

Let's examine the loaded document by printing some basic information: the number of loaded documents and a preview of the abstract.

In [None]:
# Print basic document information
print(f"Number of documents loaded: {len(data)}")
print(f"Abstract text preview: {data[0].text[:1000]}...")


Number of documents loaded: 100
Abstract text preview: GPT-4 Technical Report
OpenAI∗
Abstract
We report the development of GPT-4, a large-scale, multimodal model which can
accept image and text inputs and produce text outputs. While less capable than
humans in many real-world scenarios, GPT-4 exhibits human-level performance
on various professional and academic benchmarks, including passing a simulated
bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-
based model pre-trained to predict the next token in a document. The post-training
alignment process results in improved performance on measures of factuality and
adherence to desired behavior. A core component of this project was developing
infrastructure and optimization methods that behave predictably across a wide
range of scales. This allowed us to accurately predict some aspects of GPT-4’s
performance based on models trained with no more than 1/1,000th the compute of
GPT-4.
1 Introduction
This technic

## **Step 4: Create Vector Index**
We create a vector-based index:

In [None]:
#build a vector-based index from our documents:
index = VectorStoreIndex.from_documents(data)

## **Step 5: Setup Query Engine**
We create a query engine that retrieves the most relevant information.

In [None]:
#create a query engine with streaming and top-k retrieval:
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

## **Step 6: Query the Document**
Let's ask some questions about the paper:


In [None]:
#streaming query response with page citation
response = query_engine.query(
    "How does GPT-4 perform on professional exams compared to GPT-3.5? Show statements in bullet form and show"
    " page reference after each statement."
)
response.print_response_stream()

- GPT-4 outperforms GPT-3.5 on most exams tested. (Page 6)
- GPT-4 exhibits human-level performance on the majority of professional and academic exams. (Page 6)
- GPT-4 passes a simulated version of the Uniform Bar Examination with a score in the top 10% of test takers. (Page 6)

### Analyze the Sources
Let's examine where this information came from:

In [None]:

for node in response.source_nodes:
    print("-----")
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")

-----
Text:	 AP Calculus BCAMC 12Codeforces RatingAP English LiteratureAMC 10Uniform Bar ExamAP English LanguageAP ChemistryGRE QuantitativeAP Physics 2USABO Semifinal 2020AP MacroeconomicsAP StatisticsLSATGRE WritingAP MicroeconomicsAP BiologyGRE VerbalAP World HistorySAT MathAP US HistoryAP US GovernmentAP PsychologyAP Art HistorySAT EBRWAP Environmental Science Exam 0% 20% 40% 60% 80% 100% Estimated percentile lower bound (among test takers) Exam results (ordered by GPT-3.5 performance) gpt-4 gpt-4 (no vision) gpt3.5 Figure 4. GPT performance on academic and professional exams. In each case, we simulate the conditions and scoring of the real exam. Exams are ordered from low to high based on GPT-3.5 performance. GPT-4 outperforms GPT-3.5 on most exams tested. To be conservative we report the lower end of the range of percentiles, but this creates some artifacts on the AP exams which have very wide scoring bins. For example although GPT-4 attains the highest possible score on AP Biolo
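
The loop above prints each source node's metadata and score. As a minimal sketch of how that information could be turned into compact citations, the helper below formats plain dictionaries that stand in for source nodes. The function `format_citations`, the scores, and the assumption that the PDF metadata carries `file_name` and `page_label` keys are all illustrative; check the printed metadata for the keys actually present.

```python
def format_citations(sources):
    """Format stand-in source records as 'file, page N (score X)' lines, best first."""
    ordered = sorted(sources, key=lambda s: s["score"], reverse=True)
    lines = []
    for src in ordered:
        name = src["metadata"].get("file_name", "unknown file")
        page = src["metadata"].get("page_label", "?")
        score = src["score"]
        lines.append(f"{name}, page {page} (score {score:.3f})")
    return lines

# Hypothetical records; with real results, build them from response.source_nodes.
sources = [
    {"metadata": {"file_name": "chatGPT4_paper.pdf", "page_label": "9"}, "score": 0.80},
    {"metadata": {"file_name": "chatGPT4_paper.pdf", "page_label": "6"}, "score": 0.85},
]

citations = format_citations(sources)
for line in citations:
    print(line)
```

Comparing these retrieved page labels with the page numbers the LLM writes in its answer is a simple way to check whether the citations in the generated text are grounded in the sources.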

In [None]:
response = query_engine.query(
    "Can GPT-4 process images as inputs, and if so, how? Show statements in bullet form and show"
    " page reference after each statement."
)
response.print_response_stream()

- GPT-4 can process images as inputs by describing the content of the image in detail, panel by panel. (Page reference: 9)
- It can analyze and identify elements in the image, such as objects and their interactions, to provide a description of what is depicted. (Page reference: 9)
- GPT-4 can recognize and comment on the unusual or humorous aspects of an image based on the visual content presented. (Page reference: 36)
- The model can interpret visual input to generate responses that capture the essence of the image and provide insights or explanations about the depicted scene. (Page reference: 36)

### Test your chats. Chat with your data by asking these questions:

**Factuality and Benchmark Performance**

- What score did GPT-4 achieve on the Uniform Bar Exam?
- How does GPT-4’s performance on MMLU compare to previous models?
- How does GPT-4’s performance on TruthfulQA change after RLHF fine-tuning?

**Hallucination and Accuracy**

- How does GPT-4's hallucination rate compare to GPT-3.5?
- What impact does reinforcement learning from human feedback (RLHF) have on GPT-4’s factual accuracy?

**Multimodal Capabilities**

- What types of images can GPT-4 interpret in its multimodal setting?
- Give an example of GPT-4 analyzing an image and producing a response.


**Safety and Bias Mitigation**

- What safety interventions were applied to GPT-4 during its development?
- How does GPT-4 handle requests for sensitive or restricted information?
- What methods did OpenAI use to test GPT-4’s safety before deployment?
- What are rule-based reward models (RBRMs), and how do they improve GPT-4’s safety?

