# Project 2: Building a Custom NBA Playoffs Chatbot <a class="jp-toc-ignore"></a>

## Project overview <a class="jp-toc-ignore"></a>
In this project, I've developed a custom chatbot focused on NBA playoffs information. By leveraging the Wikipedia API to gather comprehensive data about NBA playoffs history, statistics, and memorable moments, this chatbot serves as a specialized knowledge base for basketball enthusiasts, sports journalists, and casual fans looking for accurate playoff information.

The chatbot is built on the OpenAI API together with information gathered from Wikipedia. This lets us test how well OpenAI answers questions about the NBA playoffs on its own, and compare those results with the ones we get when Wikipedia information is added to a customized prompt.
## Project structure <a class="jp-toc-ignore"></a>

The current project is broken into the following parts:

- Configuring the OpenAI API
- Extracting data from Wikipedia
- Preparing the dataset
- Summarizing NBA Playoffs dataset
- Naive OpenAI NBA Playoffs Chatbot
- RAG Based OpenAI NBA Playoffs Chatbot

This NBA Playoffs Chatbot demonstrates how domain-specific knowledge can be integrated with large language models to create specialized AI assistants. By focusing exclusively on NBA playoffs data, the chatbot provides more detailed and accurate information than general-purpose AI systems when discussing this specific domain of basketball history.

The implementation shows how prompts augmented with a custom knowledge base can turn a language model into a custom knowledge retrieval system without relying on complex frameworks, offering insight into the fundamental mechanics of modern AI assistants.

# Configuring the OpenAI API

In [16]:
import os
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("PERSONAL_OPENAI_KEY")

In [17]:
from openai import OpenAI

client = OpenAI(
    api_key=api_key,
)

# Extracting data from Wikipedia
In this section we use the Wikipedia API to fetch data from the pages related to the NBA playoffs.
## Getting a list of all pages from a Wikipedia category

In [18]:
import wikipediaapi
import openai
import pandas as pd
import json
import os

In [19]:
def fetch_wikipedia_page(page_title):
    page = wiki_wiki.page(page_title)
    if page.exists():
        return page.text
    else:
        return None

In [20]:
wiki_wiki = wikipediaapi.Wikipedia('CustomChatbot (gabrielgoncalvesbr@gmail.com)', 'en')

In [21]:
if os.path.exists("nba_playoffs_wikipedia.json"):
    # Read the JSON file and load it into the dictionary
    with open("nba_playoffs_wikipedia.json", "r", encoding="utf-8") as json_file:
        dict_nba_playoffs_pages = json.load(json_file)
else:
    dict_nba_playoffs_pages = {}
    
    cat = wiki_wiki.page("Category:NBA playoffs")
    for title in cat.categorymembers:
        print(title)
        page_result = fetch_wikipedia_page(title)
        if page_result:
            dict_nba_playoffs_pages[title] = page_result

    with open("nba_playoffs_wikipedia.json", "w", encoding="utf-8") as json_file:
        json.dump(dict_nba_playoffs_pages, json_file, ensure_ascii=False, indent=4)
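
The cell above follows a load-or-build pattern: reuse the local JSON file when it exists, otherwise fetch the data and save it. A minimal sketch of that pattern as a reusable helper is shown below. The names `load_or_build` and `build_fn` are hypothetical and are not part of the notebook.

```python
import json
import os


def load_or_build(cache_path, build_fn):
    """Return cached JSON data if present; otherwise build it and cache it.

    `load_or_build` and `build_fn` are hypothetical names used for illustration.
    """
    if os.path.exists(cache_path):
        # Cache hit: read the previously saved dictionary
        with open(cache_path, "r", encoding="utf-8") as json_file:
            return json.load(json_file)

    # Cache miss: build the data once and save it for later runs
    data = build_fn()
    with open(cache_path, "w", encoding="utf-8") as json_file:
        json.dump(data, json_file, ensure_ascii=False, indent=4)
    return data
```

With a helper like this, the Wikipedia download would only run when `nba_playoffs_wikipedia.json` is missing, and the same function could wrap any other expensive step whose result is JSON-serializable.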


# Preparing the dataset

In [22]:
df_playoffs = pd.DataFrame.from_dict(
    dict_nba_playoffs_pages, 
    orient='index', 
    columns=['text']
).reset_index().rename(columns={'index': 'title'})
df_playoffs.head()

| | title | text |
|---|---|---|
| 0 | NBA playoffs | The NBA playoffs is the annual elimination tou... |
| 1 | 1950 NBA playoffs | The 1950 NBA playoffs was the postseason tourn... |
| 2 | 1951 NBA playoffs | The 1951 NBA playoffs was the postseason tourn... |
| 3 | 1952 NBA playoffs | The 1952 NBA playoffs was the postseason tourn... |
| 4 | 1953 NBA playoffs | The 1953 NBA playoffs was the postseason tourn... |


# Summarizing the NBA Playoffs datasets
In this section, we use OpenAI's GPT-4 model to summarize each Wikipedia page, since a full page can be larger than the context window available to the RAG approach that follows.

**WARNING**: Summarizing the NBA playoffs dataset through the OpenAI API will consume a large portion of your API credits. Run it carefully, and save the result locally as a JSON file to avoid reprocessing and extra expenses.
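
To get a feel for the cost before running anything, the number of requests can be estimated in advance. The sketch below assumes the scheme used by the summarization code in this section: one request per chunk, plus one final request per page that condenses the chunk summaries. `estimate_api_calls` is a hypothetical helper, and the token counts are made-up placeholders, not measurements from the dataset.

```python
import math


def estimate_api_calls(token_counts, max_tokens=4000):
    """Estimate chat-completion requests for chunked summarization.

    Each page needs one request per chunk plus one final request that
    condenses the chunk summaries. Hypothetical helper for illustration.
    """
    return sum(math.ceil(n / max_tokens) + 1 for n in token_counts)


# Hypothetical token counts for three pages (not measured from the dataset)
print(estimate_api_calls([3500, 9000, 12000]))  # 2 + 4 + 4 = 10
```

The real token count per page could be obtained with the same tokenizer used for chunking, so the estimate would line up with the actual number of chunks.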

In [23]:
import tiktoken
from tqdm.notebook import tqdm

def split_into_chunks(text, max_tokens=3000, model="gpt-4"):
    tokenizer = tiktoken.encoding_for_model(model)
    tokens = tokenizer.encode(text)
    chunks = [tokens[i:i+max_tokens] for i in range(0, len(tokens), max_tokens)]
    return [tokenizer.decode(chunk) for chunk in chunks]

def chat_with_openai(messages):
    try:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def summarize_large_text(text, max_tokens=4000, model="gpt-4"):
    # Split the page into token-bounded chunks and summarize each one
    chunks = split_into_chunks(text, max_tokens=max_tokens, model=model)
    summaries = []
    for chunk in tqdm(chunks):
        messages = [
            {"role": "system", "content": "You are a helpful assistant that summarizes text."},
            {"role": "user", "content": f"Summarize the following text keeping key statistics about players and games:\n{chunk}"}
        ]
        summary = chat_with_openai(messages)
        # chat_with_openai returns None on failure; skip it so the join below does not raise
        if summary:
            summaries.append(summary)
    return " ".join(summaries)

def final_summarization(summaries, model="gpt-4"):
    messages = [
        {"role": "system", "content": "You are a helpful assistant creating concise summaries."},
        {"role": "user", "content": f"Summarize the following text keeping key statistics about players and games:\n{summaries}"}
    ]
    return chat_with_openai(messages)
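
The functions above implement a two-pass summarization: each chunk is summarized on its own, then the joined chunk summaries are summarized once more. The sketch below shows the same flow without any API call. It is a simplification under stated assumptions: it splits on whitespace instead of tokens, and `split_words`, `summarize_in_two_passes`, and `first_three_words` are hypothetical names, with `first_three_words` standing in for the model.

```python
def split_words(text, max_words):
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_in_two_passes(text, summarize_fn, max_words=50):
    """Summarize each chunk, then summarize the joined chunk summaries."""
    partial = [summarize_fn(chunk) for chunk in split_words(text, max_words)]
    partial = [s for s in partial if s]  # drop failed (None or empty) summaries
    return summarize_fn(" ".join(partial))


def first_three_words(chunk):
    """Stand-in summarizer so the sketch runs without an API call."""
    return " ".join(chunk.split()[:3])


text = "one two three four five six seven eight nine ten"
print(summarize_in_two_passes(text, first_three_words, max_words=5))  # prints "one two three"
```

Passing the summarizer as a function keeps the chunking logic testable offline. In the notebook, the role of `summarize_fn` is played by the calls to `chat_with_openai`.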

In [24]:
if os.path.exists("nba_playoffs_wikipedia_summarized.json"):
    with open("nba_playoffs_wikipedia_summarized.json", "r", encoding="utf-8") as json_file:
        dict_nba_playoffs_summarized = json.load(json_file)
else:
    dict_nba_playoffs_summarized = {}
    for index, row in tqdm(df_playoffs.iterrows(), total=len(df_playoffs)):
        # Summarize each chunk of the page, then condense the chunk summaries
        summaries = summarize_large_text(row["text"])
        final_summary = final_summarization(summaries)
        dict_nba_playoffs_summarized[row["title"]] = final_summary
    # Save the summaries locally to avoid paying for the same requests again
    with open("nba_playoffs_wikipedia_summarized.json", "w", encoding="utf-8") as json_file:
        json.dump(dict_nba_playoffs_summarized, json_file, ensure_ascii=False, indent=4)

In [25]:
df_playoffs_summarized = pd.DataFrame.from_dict(
    dict_nba_playoffs_summarized, 
    orient='index', 
    columns=['text']
).reset_index().rename(columns={'index': 'title'})
df_playoffs_summarized.head()

| | title | text |
|---|---|---|
| 0 | NBA playoffs | The NBA playoffs is a yearly tournament to det... |
| 1 | 1950 NBA playoffs | The 1950 NBA playoffs marked the end of the Na... |
| 2 | 1951 NBA playoffs | The 1951 NBA playoffs concluded with the Roche... |
| 3 | 1952 NBA playoffs | The 1952 NBA playoffs concluded with the Minne... |
| 4 | 1953 NBA playoffs | The 1953 NBA playoffs ended with the Minneapol... |


# NBA Playoffs Chatbots
## Naive OpenAI NBA Playoffs Chatbot
In this section, we build a naive OpenAI chatbot that asks the user for a playoff year and a question about the NBA playoffs, then sends a request to OpenAI.
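
The Chat Completions API takes the conversation as a list of dictionaries, each with a `role` and a `content` field. As a minimal sketch of how the year and the question can be folded into that list, consider the hypothetical helper `build_messages` below. It is an illustration only and is not part of the notebook.

```python
def build_messages(year, info_type):
    """Build the chat message list for one question about a playoff year.

    `build_messages` is a hypothetical helper, not part of the notebook.
    """
    return [
        # The system message sets the assistant's overall behavior
        {"role": "system", "content": "You are a helpful assistant knowledgeable about NBA playoffs."},
        # The user message carries the actual request
        {"role": "user", "content": f"Generate a detailed paragraph about {info_type} in the {year} NBA playoffs."},
    ]


messages = build_messages("2020", "the finals matchup")
print(messages[1]["content"])
```

Keeping the prompt construction in a pure function like this makes it easy to inspect exactly what is sent before spending any API credits.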

In [26]:
from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()
# Alternative client configuration, kept for reference (disabled):
# client = OpenAI(
#     api_key=os.getenv("OPENAI_API_KEY"),
#     base_url="https://openai.vocareum.com/v1"
# )
def chat_with_openai(messages):
    """
    Sends a conversation to OpenAI and gets a response using the updated API.
    """
    try:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            
        )
        # Access the content of the response using dot notation
        return response.choices[0].message.content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def naive_OpenAI_chatbot():
    """
    Starts the chatbot interaction.
    """
    print("👋 Welcome to the NBA Playoffs Chatbot!")
    
    # Initialize conversation history
    conversation = [
        {"role": "system", "content": "You are a helpful assistant knowledgeable about NBA playoffs."}
    ]
    
    # Ask the first question
    year = input("Bot: Which NBA playoff year do you want to know about?\nYou: ")
    conversation.append({"role": "user", "content": f"I want to know about the NBA playoffs in {year}."})
    
    # Ask the second question
    info_type = input("Bot: What's the information you want to know about?\nYou: ")
    conversation.append({"role": "user", "content": f"I want to know about {info_type} in the {year} NBA playoffs."})
    
    # Generate a response
    print("Bot: Let me gather some information for you...\n")
    
    # Send the instruction as a user message; the "assistant" role is meant for model replies
    assistant_prompt = f"Generate a detailed paragraph about {info_type} in the {year} NBA playoffs."
    conversation.append({"role": "user", "content": assistant_prompt})
    
    response = chat_with_openai(conversation)
    
    if response:
        # Display the generated paragraph
        print(f"Bot: Here's what I found:\n{response}\n")
        
        # Confirm with the user
        is_correct = input("Bot: Is this what you were searching for? (yes/no)\nYou: ").strip().lower()
        
        if is_correct == 'yes':
            print("Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.")
        else:
            print("Bot: I'm sorry it wasn't what you were looking for. Could you clarify or ask something else?")
            naive_OpenAI_chatbot()
    else:
        print("Bot: I couldn't retrieve any information. Please try again later.")


## Asking questions to the Naive Chatbot
To test the Naive Chatbot, we are going to ask the following questions:

1. Question 1
    - Year: **2020**
    - Question: **What was the starting five for the two teams in the NBA finals, each player basic statistics?**

2. Question 2
    - Year: **2023**
    - Question: **Which team was the champion?**

### Question 1 (Naive Chatbot)

In [27]:
naive_OpenAI_chatbot()

👋 Welcome to the NBA Playoffs Chatbot!


Bot: Which NBA playoff year do you want to know about?
You:  2020
Bot: What's the information you want to know about?
You:  What was the starting five for the two teams in the NBA finals, each player basic statistics?


Bot: Let me gather some information for you...

Bot: Here's what I found:
The 2020 NBA Finals was played between the Los Angeles Lakers and the Miami Heat. 

For the Los Angeles Lakers, the starting five were LeBron James, Anthony Davis, Dwight Howard, Danny Green, and Kentavious Caldwell-Pope. Let's look at their basic statistics:

1. LeBron James: He played extraordinarily well, with averages of 29.8 points, 11.8 rebounds, and 8.5 assists per game in six games. 

2. Anthony Davis: He had a great playoff run with averages of 27.7 points, 9.7 rebounds, and 3.5 assists per game.

3. Dwight Howard: Howard was mainly a defender and rebounder, with averages of 1.8 points and 2.8 rebounds per game. 

4. Danny Green: Green averaged 8.0 points, 3.0 rebounds, and 1.0 assists per game. 

5. Kentavious Caldwell-Pope: He played a significant role on the team with averages of 12.8 points, 2.3 rebounds, and 2.8 assists per game. 

For the Miami Heat, the starting five were Jimmy Butler, Bam Adebayo

Bot: Is this what you were searching for? (yes/no)
You:  yes


Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.


### Question 2 (Naive Chatbot)

In [81]:
naive_OpenAI_chatbot()

👋 Welcome to the NBA Playoffs Chatbot!


Bot: Which NBA playoff year do you want to know about?
You:  2023
Bot: What's the information you want to know about?
You:  Which team was the champion?


Bot: Let me gather some information for you...

Bot: Here's what I found:
I'm sorry, but I am currently unable to provide information about future events such as the 2023 NBA playoffs, including the reigning champion, as they have not yet occurred. My programming only includes factual information up to the present day.



Bot: Is this what you were searching for? (yes/no)
You:  yes


Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.


## RAG Based OpenAI NBA Playoffs Chatbot
In this section we create a chatbot that uses the NBA playoffs dataset built earlier to augment the prompt, similar to a RAG approach.

Next, we ask the same questions we put to the Naive Chatbot and compare the results.
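
The idea can be sketched in two steps: look up the pages whose title contains the requested year, then place their text in the prompt ahead of the question. The example below is a simplified sketch that uses a plain dictionary instead of the DataFrame. `retrieve_context`, `build_augmented_prompt`, and the sample `pages` entries are hypothetical placeholders, not the notebook's actual data.

```python
def retrieve_context(query, pages):
    """Join the text of every page whose title contains `query`."""
    query = query.strip()
    if not query:
        return ""
    return "\n".join(text for title, text in pages.items() if query in title)


def build_augmented_prompt(year, question, pages):
    """Add retrieved context to the question, or fall back to the bare question."""
    context = retrieve_context(year, pages)
    if not context:
        return f"Answer the question for the {year} NBA playoffs: {question}"
    return (
        f"Use the following Wikipedia paragraph to answer the question "
        f"for the {year} NBA playoffs: {question}\n{context}"
    )


# Placeholder entries standing in for the summarized dataset
pages = {
    "2020 NBA playoffs": "Summary of the 2020 playoffs.",
    "2023 NBA playoffs": "Summary of the 2023 playoffs.",
}
print(build_augmented_prompt("2023", "Which team was the champion?", pages))
```

Matching on the title is a deliberately simple retrieval step. It works here because every page title starts with the year, but it would not generalize to questions that do not name a year.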

In [63]:
def get_data_from_nba_playoffs(
    query: str, dataset: pd.DataFrame = df_playoffs_summarized
) -> str:
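    # Plain substring match on the page title (regex=False avoids treating the query as a pattern)
    cat_text = dataset[
        dataset["title"].str.contains(str(query).strip(), regex=False)]["text"].str.cat(sep="\n")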
    return cat_text

def augmented_openai_chatbot():
    print("👋 Welcome to the Augmented NBA Playoffs Chatbot!")
    
    # Initialize conversation history
    conversation = [
        {"role": "system", "content": "You are a helpful assistant knowledgeable about NBA playoffs."}
    ]
    
    # Ask the first question
    year = input("Bot: Which NBA playoff year do you want to know about?\nYou: ")
    conversation.append({"role": "user", "content": f"I want to know about the NBA playoffs in {year}."})
    
    # Ask the second question
    info_type = input("Bot: What's the information you want to know about?\nYou: ")
    conversation.append({"role": "user", "content": f"I want to know about {info_type} in the {year} NBA playoffs."})
    
    # Generate a response
    print("Bot: Let me gather some information for you...\n")

    # Get Data from Wikipedia dataset
    str_filtered_dataset = get_data_from_nba_playoffs(year)
    assistant_prompt = f"Generate a detailed paragraph to answer the question for the {year} NBA playoffs: {info_type}\n"
    if str_filtered_dataset == '':
        print(f'Found no information about year "{year}" in NBA playoffs.')
        # No context found: fall back to the plain prompt
        augmented_prompt = assistant_prompt
    else:
        augmented_prompt = f'Use the following wikipedia paragraph to answer the question for the {year} NBA playoffs: {info_type}\n{str_filtered_dataset}'
    
    # Send the instruction as a user message; the "assistant" role is meant for model replies
    conversation.append({"role": "user", "content": augmented_prompt})
    print(augmented_prompt)
    response = chat_with_openai(conversation)
    
    if response:
        # Display the generated paragraph
        print(f"Bot: Here's what I found:\n{response}\n")
        
        # Confirm with the user
        is_correct = input("Bot: Is this what you were searching for? (yes/no)\nYou: ").strip().lower()
        
        if is_correct == 'yes':
            print("Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.")
        else:
            print("Bot: I'm sorry it wasn't what you were looking for. Could you clarify or ask something else?")
            augmented_openai_chatbot()
    else:
        print("Bot: I couldn't retrieve any information. Please try again later.")


### Question 1 (RAG based Chatbot)

In [78]:
augmented_openai_chatbot()

👋 Welcome to the Augmented NBA Playoffs Chatbot!


Bot: Which NBA playoff year do you want to know about?
You:  2020
Bot: What's the information you want to know about?
You:  What was the starting five for the two teams in the NBA finals, each player basic statistics?


Bot: Let me gather some information for you...

Use the following wikipedia paragraph to answer the question for the 2020 NBA playoffs: What was the starting five for the two teams in the NBA finals, each player basic statistics?
The 2020 NBA playoffs concluded the 2019-20 NBA season, postposed due to COVID-19 and reconvened in the NBA Bubble with the top 22 teams. Notable events included the Toronto Raptors losing in the semifinals, the absence of the San Antonio Spurs for the first time since 1997, and the return of the Los Angeles Lakers, Dallas Mavericks, and Toronto Raptors to the playoffs. Player records set included LeBron James' unprecedented 20+ points, 15+ rebounds, and 15+ assists in a game, and Donovan Mitchell and Jamal Murray's multiple 50–point games in a single series. The playoffs were temporarily halted due to a wildcat strike following the Jacob Blake incident but resumed with the Los Angeles Lakers winning their 17th NBA championship, tying with the Boston Celtics. 

Bot: Is this what you were searching for? (yes/no)
You:  yes


Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.


### Question 2 (RAG based Chatbot)

In [79]:
augmented_openai_chatbot()

👋 Welcome to the Augmented NBA Playoffs Chatbot!


Bot: Which NBA playoff year do you want to know about?
You:  2023
Bot: What's the information you want to know about?
You:  Which team was the champion?


Bot: Let me gather some information for you...

Use the following wikipedia paragraph to answer the question for the 2023 NBA playoffs: Which team was the champion?
The 2023 NBA playoffs were a historic season with Denver Nuggets winning their first NBA title and the Miami Heat reaching the finals for the first time since 2020. The Sacramento Kings also ended the longest postseason drought by making it to the postseason for the first time since 2006. For the first time since 2000-01 season, no team won at least 60 games in an 82-game regular season, and top scorers Luka Dončić and Damian Lillard did not reach playoffs, first time since 2004-05 season. Key players included Jimmy Butler, Devin Booker, Kevin Durant, and Ja Morant, with Nikola Jokić leading in points, rebounds, and assists. For the first round playoffs, Miami Heat, Boston Celtics, Philadelphia 76ers, and New York Knicks progressed from the Eastern Conference, while Denver Nuggets secured berths from the Western Conference.

Bot: Is this what you were searching for? (yes/no)
You:  yes


Bot: I'm glad I could help! Let me know if there's anything else you'd like to know.


# Final thoughts
Using the RAG approach to augment the prompt improved the chatbot's performance, because the training data of the OpenAI model used here only extends to 2021.
When asked "Which team was the champion in 2023?", the Naive Chatbot could not answer, while the RAG based Chatbot found the information in the summarized Wikipedia page and answered correctly.