# NBA Question Answering System

This notebook demonstrates how to use the NBA QA system to answer questions about NBA players, teams, and games.

## Overview

The NBA QA System uses a hybrid RAG-like approach:
1. **Question Analysis**: Extracts entities (players, teams, stats, dates)
2. **Data Retrieval**: Queries the NBA API for relevant data
3. **Context Generation**: Converts structured data to natural language
4. **QA Model**: Uses a Hugging Face QA model to extract answers
5. **Answer Formatting**: Returns structured answers with sources
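
The five stages above can be sketched end to end as follows. This is a toy illustration, not the `NBAQASystem` implementation: every function name is hypothetical, the in-memory `CAREER_PPG` table stands in for the NBA API, and a regular expression stands in for the Hugging Face QA model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the NBA API; the value is the one reported in Step 3.
CAREER_PPG = {"Kevin Durant": 27.3}


@dataclass
class PipelineResult:
    answer: str
    sources: list = field(default_factory=list)


def analyze(question: str) -> dict:
    """Stage 1: find a known player name and a stat abbreviation."""
    lowered = question.lower()
    player = next((p for p in CAREER_PPG if p.lower() in lowered), None)
    stat = "PPG" if "ppg" in lowered else None
    return {"player": player, "stat": stat}


def retrieve(entities: dict) -> dict:
    """Stage 2: look up structured data for the extracted entities."""
    player = entities["player"]
    if player is None or entities["stat"] is None:
        return {}
    return {"player": player, "ppg": CAREER_PPG[player]}


def to_context(data: dict) -> str:
    """Stage 3: turn the structured record into a sentence."""
    if not data:
        return ""
    return f"{data['player']} averages {data['ppg']} points per game in his career."


def extract_answer(context: str) -> str:
    """Stage 4: toy 'model' that pulls the first decimal number from the context."""
    match = re.search(r"\d+\.\d+", context)
    return match.group(0) if match else ""


def toy_answer(question: str) -> PipelineResult:
    """Stage 5: run the stages in order and attach the source record."""
    data = retrieve(analyze(question))
    answer = extract_answer(to_context(data))
    return PipelineResult(answer=answer, sources=[data] if data else [])


print(toy_answer("What is Kevin Durant's career PPG?").answer)
```

If stage 1 finds no player, every later stage receives empty input and the answer is empty, which is why entity extraction is worth inspecting first when an answer looks wrong.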

## Features

- Answer questions about player statistics
- Query game results and box scores
- Compare players
- Analyze play-by-play data
- Query all-time league leaders in various statistical categories
- Get formatted scores for the most recent team games
- Works with both pre-trained and fine-tuned models


## Step 0: Install Required Packages

Run this cell first to install the required packages if they're not already installed.


In [1]:
# Install required packages
# Pin urllib3 to v1.x to avoid OpenSSL compatibility warnings on macOS
%pip install transformers nba-api pandas numpy "urllib3<2.0" --no-warn-script-location

print("✅ Packages installed successfully!")


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
✅ Packages installed successfully!
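
One way to confirm that the packages are importable before moving on is to look them up with `importlib`. This is a minimal sketch; `missing_modules` is a hypothetical helper, and the list uses import names, which differ from the pip names in one case (`nba-api` is imported as `nba_api`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the packages installed above.
required = ["transformers", "nba_api", "pandas", "numpy", "urllib3"]
print("Missing:", missing_modules(required) or "none")
```

If anything is listed as missing after installation, restarting the kernel (as the pip note suggests) is a reasonable first step.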


## Step 1: Import the QA System

**Note**: If you get a `ModuleNotFoundError`, make sure you ran the installation cell above.

Import the `NBAQASystem` class from the `nba_qa_system` module.


In [2]:
import warnings
# Suppress urllib3 OpenSSL warnings (optional)
warnings.filterwarnings('ignore', category=UserWarning, module='urllib3')

from nba_qa_system import NBAQASystem

print("✅ NBA QA System imported successfully!")


✅ NBA QA System imported successfully!


## Step 2: Initialize the QA System

You can use either:
- **Pre-trained model**: `deepset/roberta-base-squad2` (general purpose)
- **Fine-tuned model**: `utils/model_nba_qa_finetuned` (NBA-optimized)

The fine-tuned model is recommended for better NBA-specific performance.


In [3]:
qa_system = NBAQASystem(model_name="utils/model_nba_qa_finetuned")

print("✅ QA System initialized!")
print(f"Model: {qa_system.qa_pipeline.model.config.name_or_path}")


Device set to use mps:0


✅ QA System initialized!
Model: utils/model_nba_qa_finetuned
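
If the fine-tuned model may not exist on a given machine, a small helper can fall back to the pre-trained checkpoint. This is a sketch under the assumption that the fine-tuned model is a local directory at the path used above; `choose_model_name` is a hypothetical helper, not part of the QA system.

```python
from pathlib import Path

PRETRAINED = "deepset/roberta-base-squad2"


def choose_model_name(finetuned_dir: str, fallback: str = PRETRAINED) -> str:
    """Return finetuned_dir if it exists as a local directory, else the fallback."""
    return finetuned_dir if Path(finetuned_dir).is_dir() else fallback


model_name = choose_model_name("utils/model_nba_qa_finetuned")
print(f"Using model: {model_name}")
# qa_system = NBAQASystem(model_name=model_name)
```

The relative path is resolved against the current working directory, so the result depends on where the notebook is launched from.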


## Step 3: Ask Questions About Player Statistics

The system can answer questions about player career statistics, recent games, and more.


In [4]:
# Example 1: Career statistics
question1 = "What is Kevin Durant' career PPG?"
result1 = qa_system.answer(question1)

print(f"Question: {question1}")
print(f"Answer: {result1.answer}")
print(f"Confidence: {result1.confidence:.2f}")
print(f"Sources: {len(result1.sources)} source(s)")
print("-" * 80)


Question: What is Kevin Durant' career PPG?
Answer: 27.3
Confidence: 1.00
Sources: 2 source(s)
--------------------------------------------------------------------------------
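
The result object exposes `answer`, `confidence`, and `sources`, so a caller can decide whether to trust an answer before showing it. The sketch below uses a hypothetical `ToyResult` stand-in with those three attributes; the 0.5 threshold is an arbitrary assumption, not a value recommended by the system.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResult:
    """Hypothetical stand-in with the three attributes used in the cell above."""
    answer: str
    confidence: float
    sources: list = field(default_factory=list)


def is_trustworthy(result, min_confidence: float = 0.5) -> bool:
    """Accept a result only if it is confident enough and cites at least one source."""
    return result.confidence >= min_confidence and len(result.sources) > 0


good = ToyResult(answer="27.3", confidence=1.00, sources=["source-a", "source-b"])
weak = ToyResult(answer="NBA data", confidence=0.06)
print(is_trustworthy(good), is_trustworthy(weak))
```

Requiring at least one source guards against confident-looking answers that were extracted from a context containing no retrieved data.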


## Step 4: View Detailed Analysis

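Use `answer_with_details()` to see the full pipeline, including entity extraction and context generation. In the run below, no player entity is extracted from the question, so no NBA data is retrieved and the answer comes back with a confidence of only 0.06.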


In [5]:
# Get detailed analysis
question = "What is Nikola Jokic's career PPG?"
details = qa_system.answer_with_details(question)

print("=" * 80)
print("DETAILED ANALYSIS")
print("=" * 80)
print(f"\nQuestion: {details['question']}")
print(f"\nQuestion Type: {details['analysis']['question_type']}")
print(f"Entities Extracted:")
for key, value in details['analysis']['entities'].items():
    if value:  # Only show non-empty entities
        print(f"  - {key}: {value}")
print(f"\nNBA Data Type: {details['nba_data_type']}")
print(f"\nContext Generated:")
print(details['context'][:300] + "..." if len(details['context']) > 300 else details['context'])
print(f"\nFinal Answer: {details['answer']}")
print(f"Confidence: {details['confidence']:.2f}")
print(f"\nSources: {len(details['sources'])} source(s)")
for source in details['sources']:
    print(f"  - {source}")


DETAILED ANALYSIS

Question: What is Nikola Jokic's career PPG?

Question Type: general
Entities Extracted:
  - stats: ['average']
  - temporal: {'career': True}

NBA Data Type: general

Context Generated:
No specific NBA data found to answer this question.

Final Answer: NBA data
Confidence: 0.06

Sources: 0 source(s)
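
The output above is a useful failure case: only `stats` and `temporal` entities were found, the data type fell back to `general`, and no sources were returned. A small checker can flag these symptoms automatically. This is a sketch; `diagnose` is a hypothetical helper, and the `players` entity key is an assumption, since the output only shows the non-empty keys.

```python
def diagnose(details: dict) -> list:
    """Return human-readable warnings for a details dict shaped like the one above."""
    found = []
    entities = details["analysis"]["entities"]
    if not entities.get("players"):  # 'players' key name is assumed
        found.append("no player entity extracted")
    if details["nba_data_type"] == "general":
        found.append("no specific NBA data type matched")
    if not details["sources"]:
        found.append("no sources retrieved")
    return found


# Values copied from the output above.
sample = {
    "analysis": {
        "question_type": "general",
        "entities": {"stats": ["average"], "temporal": {"career": True}},
    },
    "nba_data_type": "general",
    "context": "No specific NBA data found to answer this question.",
    "answer": "NBA data",
    "confidence": 0.06,
    "sources": [],
}
for warning in diagnose(sample):
    print("-", warning)
```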


## Step 5: Test Multiple Question Types

The system handles various question types: player stats, game results, comparisons, league leaders, and more.


In [12]:
# Test various question types
test_questions = [
    "What is LeBron James' career scoring average?",
    "What is Kevin Durant career RPG?",
    "What was the score of the last Lakers game?",
    "Who scored more points, Kobe Bryant or Stephen Curry?",
    "Who are the top 5 assists leaders of the league history?",
    "What is Stephen Curry career APG?",
    "Kobe Bryant scored 23 points in the 2010 NBA Finals Game 7"
]

print("Testing Multiple Questions")
print("=" * 80)

for i, question in enumerate(test_questions, 1):
    print(f"\n{i}. Question: {question}")
    result = qa_system.answer(question)
    # For league leader questions, the answer is formatted as a table
    if "top" in question.lower() and "leader" in question.lower():
        print(f"   Answer:\n{result.answer}")
    else:
        print(f"   Answer: {result.answer}")
    print(f"   Confidence: {result.confidence:.2f}")
    print("-" * 80)


Testing Multiple Questions

1. Question: What is LeBron James' career scoring average?
   Answer: 27.0
   Confidence: 0.92
--------------------------------------------------------------------------------

2. Question: What is Kevin Durant career RPG?
   Answer: 6.9
   Confidence: 1.00
--------------------------------------------------------------------------------

3. Question: What was the score of the last Lakers game?
   Answer: Los Angeles Lakers 112, Philadelphia 76ers 108
   Confidence: 0.95
--------------------------------------------------------------------------------

4. Question: Who scored more points, Kobe Bryant or Stephen Curry?
   Answer: Stephen Curry
   Confidence: 0.52
--------------------------------------------------------------------------------

5. Question: Who are the top 5 assists leaders of the league history?
   Answer:
Rank   Player Name          AST
--------------------------------------------------
1      John Stockton        15,806
2      Chris Paul     

## Step 6: Test League Leaders Questions

The system can answer questions about all-time league leaders in various statistical categories.


In [7]:
# Example 1: Top assists leaders
question1 = "Who are the top 5 assists leaders of the league history?"
result1 = qa_system.answer(question1)

print(f"Question: {question1}")
print("\nAnswer:")
print(result1.answer)
print(f"\nConfidence: {result1.confidence:.2f}")
print(f"Sources: {len(result1.sources)} source(s)")
print("-" * 80)


Question: Who are the top 5 assists leaders of the league history?

Answer:
Rank   Player Name          AST
--------------------------------------------------
1      John Stockton        15,806
2      Chris Paul           12,552
3      Jason Kidd           12,091
4      LeBron James         11,637
5      Steve Nash           10,335

Confidence: 0.95
Sources: 1 source(s)
--------------------------------------------------------------------------------
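
A table in the same layout as the answer above can be produced with fixed-width format specifiers. This is a sketch of one way to do it, not the system's own formatter; `format_leaders` is a hypothetical name, and the column widths were chosen to match the printed output.

```python
def format_leaders(rows, stat_label):
    """Render (player, total) pairs as a ranked fixed-width table."""
    lines = [f"{'Rank':<7}{'Player Name':<21}{stat_label}", "-" * 50]
    for rank, (player, total) in enumerate(rows, start=1):
        # ',' in the format spec adds thousands separators, e.g. 15806 -> 15,806.
        lines.append(f"{rank:<7}{player:<21}{total:,}")
    return "\n".join(lines)


# Totals copied from the output above.
assists = [
    ("John Stockton", 15806),
    ("Chris Paul", 12552),
    ("Jason Kidd", 12091),
    ("LeBron James", 11637),
    ("Steve Nash", 10335),
]
print(format_leaders(assists, "AST"))
```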


## Step 7: Test Last Team Game Questions

The system can retrieve and format the score of the most recent game for any team.


In [8]:
# Example: Last Lakers game
question = "What was the score of the last Lakers game?"
result = qa_system.answer(question)

print(f"Question: {question}")
print(f"\nAnswer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Sources: {len(result.sources)} source(s)")
for source in result.sources:
    print(f"  - {source}")
print("-" * 80)


Question: What was the score of the last Lakers game?

Answer: Los Angeles Lakers 112, Philadelphia 76ers 108
Confidence: 0.95
Sources: 1 source(s)
  - {'type': 'team_game', 'game_id': '0022500362', 'game_date': '2025-12-07', 'team': 'Los Angeles Lakers'}
--------------------------------------------------------------------------------
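
Each source is a plain dictionary, so it can be turned into a compact citation line for display. The sketch below assumes only that a source may carry a `type` key, as in the output above; `describe_source` is a hypothetical helper.

```python
def describe_source(source: dict) -> str:
    """Summarize a source dict; every key other than 'type' is listed as key=value."""
    kind = source.get("type", "unknown")
    details = ", ".join(f"{key}={value}" for key, value in source.items() if key != "type")
    return f"[{kind}] {details}" if details else f"[{kind}]"


# Source copied from the output above.
source = {
    "type": "team_game",
    "game_id": "0022500362",
    "game_date": "2025-12-07",
    "team": "Los Angeles Lakers",
}
print(describe_source(source))
```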


## Step 8: Compare Pre-trained vs Fine-tuned Models

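If you have both models available, compare their performance on the same questions. In the recorded run below, both models return the system's fallback message with 0.00 confidence because no NBA data was found for the test question, so the comparison is only informative for questions whose data retrieval succeeds.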


In [9]:
# Compare models (if fine-tuned model exists)
import os

test_question = "What is Nikola Jokic' career PPG?"

# Pre-trained model
qa_pretrained = NBAQASystem(model_name="deepset/roberta-base-squad2")
result_pretrained = qa_pretrained.answer(test_question)

print("MODEL COMPARISON")
print("=" * 80)
print(f"\nQuestion: {test_question}\n")
print(f"Pre-trained Model:")
print(f"  Answer: {result_pretrained.answer}")
print(f"  Confidence: {result_pretrained.confidence:.2f}")

# Fine-tuned model (if available)
if os.path.exists("../utils/model_nba_qa_finetuned"):
    qa_finetuned = NBAQASystem(model_name="utils/model_nba_qa_finetuned")
    result_finetuned = qa_finetuned.answer(test_question)
    
    print(f"\nFine-tuned Model:")
    print(f"  Answer: {result_finetuned.answer}")
    print(f"  Confidence: {result_finetuned.confidence:.2f}")
else:
    print("\nFine-tuned model not found. Run 2_finetune_nba_qa.ipynb to create it.")


Device set to use mps:0
Device set to use mps:0


MODEL COMPARISON

Question: What is Nikola Jokic' career PPG?

Pre-trained Model:
  Answer: I couldn't find relevant NBA data to answer this question. Please try asking about specific players, teams, or games.
  Confidence: 0.00

Fine-tuned Model:
  Answer: I couldn't find relevant NBA data to answer this question. Please try asking about specific players, teams, or games.
  Confidence: 0.00
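
A single question rarely separates two models, so it can help to run a list of questions through each one and collect the results in rows. The sketch below assumes only that each system has an `answer(question)` method returning an object with `answer` and `confidence`, as used throughout this notebook. `compare_models` and `StubSystem` are hypothetical, and the stub answers are placeholders, not real model output.

```python
from types import SimpleNamespace


class StubSystem:
    """Hypothetical stand-in for NBAQASystem that returns a fixed result."""

    def __init__(self, answer, confidence):
        self._result = SimpleNamespace(answer=answer, confidence=confidence)

    def answer(self, question):
        return self._result


def compare_models(systems: dict, questions: list) -> list:
    """Ask every question of every system and collect one row per pair."""
    rows = []
    for question in questions:
        for label, system in systems.items():
            result = system.answer(question)
            rows.append((question, label, result.answer, result.confidence))
    return rows


systems = {
    "pre-trained": StubSystem("placeholder A", 0.40),
    "fine-tuned": StubSystem("placeholder B", 0.80),
}
for question, label, answer, confidence in compare_models(systems, ["Example question?"]):
    print(f"{label:<12} {confidence:.2f}  {answer}")
```

With real systems, replace the stubs with the `qa_pretrained` and `qa_finetuned` objects from the cell above.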


## Step 9: Interactive Question Loop

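The cell below contains an interactive loop for asking multiple questions. It is commented out so that the notebook can run top to bottom without waiting for input; uncomment it to use it.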


In [10]:
# Interactive question loop (uncomment to use)
#print("NBA QA System - Interactive Mode")
#print("Type 'quit' to exit\n")
# 
#while True:
#    question = input("Ask a question about NBA: ")
#    if question.lower() in ['quit', 'exit', 'q']:
#        break
#     
#    result = qa_system.answer(question)
#    print(f"\nAnswer: {result.answer}")
#    print(f"Confidence: {result.confidence:.2f}\n")
#    print("-" * 80)

#print("Interactive mode is commented out. Uncomment the code above to enable it.")
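
The same loop can be written as a function whose input and output are injectable, which makes it possible to exercise it without typing. This is a sketch, with `run_loop` as a hypothetical name; `answer_fn` would be something like `lambda q: qa_system.answer(q).answer` in this notebook.

```python
def run_loop(answer_fn, read=input, write=print):
    """Answer questions until the user types quit, exit, or q; return the count answered."""
    answered = 0
    while True:
        question = read("Ask a question about NBA: ").strip()
        if question.lower() in {"quit", "exit", "q"}:
            return answered
        if not question:
            continue  # ignore empty input
        write(f"Answer: {answer_fn(question)}")
        answered += 1


# Scripted demo: feed canned inputs instead of typing them.
scripted = iter(["What was the score of the last Lakers game?", "", "quit"])
count = run_loop(lambda q: f"(stub answer to {q!r})", read=lambda prompt: next(scripted))
print(f"Answered {count} question(s)")
```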


## Summary

✅ **NBA QA System is ready to use!**

### Key Features

- **Entity Extraction**: Automatically identifies players, teams, stats, and dates
- **NBA API Integration**: Retrieves current data from the NBA API through the `nba-api` package
- **Context Generation**: Converts structured data to natural language
- **QA Model**: Uses an extractive question-answering model from Hugging Face (`deepset/roberta-base-squad2` or the fine-tuned NBA variant)
- **Source Citations**: Provides traceability to original data sources
- **League Leaders**: Query all-time leaders in points, assists, rebounds, blocks, turnovers, steals, and 3-pointers
- **Last Team Games**: Get formatted scores for the most recent games
- **Career Averages**: Accurate per-game averages including turnovers and blocks
- **Smart Player Matching**: Prioritizes full names to avoid ambiguity

### Usage Tips

1. **Be Specific**: Include player names, teams, or dates for better results
2. **Use Abbreviations**: The system understands PPG, RPG, APG, etc.
3. **Check Confidence**: Low confidence scores (such as the 0.06 in Step 4) may indicate an ambiguous question or that no matching data was retrieved
4. **View Details**: Use `answer_with_details()` to debug or understand the process
5. **League Leaders**: Ask "Who are the top N [stat] leaders?" for formatted lists
6. **Last Games**: Ask "What was the score of the last [Team] game?" for recent scores
7. **Full Names**: Use full player names (e.g., "Stephen Curry") for better accuracy
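
As an illustration of the abbreviation tip, the sketch below expands stat abbreviations into full phrases before a question is processed. The mapping and the `expand_abbreviations` helper are hypothetical; how the system itself handles PPG, RPG, and APG is not shown in this notebook.

```python
import re

# Hypothetical mapping; the system's own table may differ.
STAT_ABBREVIATIONS = {
    "PPG": "points per game",
    "RPG": "rebounds per game",
    "APG": "assists per game",
}

# Match any abbreviation as a whole word, ignoring case.
_PATTERN = re.compile(r"\b(" + "|".join(STAT_ABBREVIATIONS) + r")\b", re.IGNORECASE)


def expand_abbreviations(question: str) -> str:
    """Replace whole-word stat abbreviations with their full phrases."""
    return _PATTERN.sub(lambda m: STAT_ABBREVIATIONS[m.group(1).upper()], question)


print(expand_abbreviations("What is Stephen Curry career APG?"))
```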

### Supported Question Types

- **Player Statistics**: Career averages, season stats, recent games
- **Game Results**: Scores, box scores, specific games
- **Player Comparisons**: Compare stats between players
- **League Leaders**: Top N players in any statistical category (all-time)
- **Last Team Games**: Most recent game scores for any team
- **Historical Data**: Questions about past seasons and games
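
To make the question types above concrete, here is a keyword-based classifier sketch. It is not the system's analyzer: the rules and all labels except `general` (which appears in the Step 4 output) are assumptions, and the first rule mirrors the `"top"` and `"leader"` check used in Step 5.

```python
def classify_question(question: str) -> str:
    """Map a question to a coarse type using simple keyword rules."""
    q = question.lower()
    if "top" in q and "leader" in q:
        return "league_leaders"
    if "last" in q and "game" in q:
        return "last_team_game"
    if " or " in q or "compare" in q:
        return "player_comparison"
    if any(keyword in q for keyword in ("ppg", "rpg", "apg", "average")):
        return "player_stats"
    return "general"


for example in [
    "What is LeBron James' career scoring average?",
    "What was the score of the last Lakers game?",
    "Who scored more points, Kobe Bryant or Stephen Curry?",
    "Who are the top 5 assists leaders of the league history?",
]:
    print(f"{classify_question(example):<18} {example}")
```

Rule order matters here: more specific patterns are checked first so that a league-leader question mentioning a stat is not classified as a plain player-stat question.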

### Next Steps

- Try different question types
- Compare pre-trained vs fine-tuned models
- Integrate into your own applications
- Extend with additional NBA API endpoints
