# Custom Chatbot Project

I chose to use the 2024 World Surf League (WSL) Wikipedia page to supplement ChatGPT 3.5. While experimenting with ChatGPT 4.0, I noticed that even though its training data ends in 2023, it gave accurate answers because it could search the web for up-to-date information. (I verified this by asking it to explain how it arrived at its answer.) Although ChatGPT 4.0 is better than 3.5, there is still a legitimate use case for the older model: because it is cheaper per token, a custom RAG solution built on it could make business sense, depending on the scale of the application.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

Get the 2022 to 2024 WSL rankings and event schedules from the WSL website.

Event request format: `https://www.worldsurfleague.com/events?all=1&year=<year>`

Rankings request format: `https://www.worldsurfleague.com/athletes/tour/<wct|mct>?year=<year>`

In [55]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from itertools import chain

In [72]:
years = ['2022','2023','2024']

In [73]:
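def get_rankings_by_year(year):
    """Scrape the men's (mct) and women's (wct) tour rankings for one year."""
    tours = ['mct','wct']
    data = []
    for tour in tours:
        res = requests.get(f"https://www.worldsurfleague.com/athletes/tour/{tour}?year={year}", timeout=30)
        res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
        # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
        soup = BeautifulSoup(res.text, "html.parser")
        # Athlete links appear in ranking order, so the list position gives the rank
        rankings = soup.find_all('a', class_='athlete-name')
        # One list of row dicts per tour; the nested lists are flattened later with itertools.chain
        data.append([{"year": year, "tour": tour, "rank": rank + 1, "name": el.text} for rank, el in enumerate(rankings)])
    return data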

In [74]:
results = []
for year in years:
    data = get_rankings_by_year(year)
    results.append(data)

In [75]:
df = pd.DataFrame(chain(*chain(*results)))

In [76]:
df

| | year | tour | rank | name |
|---|---|---|---|---|
| 0 | 2022 | mct | 1 | Filipe Toledo |
| 1 | 2022 | mct | 2 | Italo Ferreira |
| 2 | 2022 | mct | 3 | Jack Robinson |
| 3 | 2022 | mct | 4 | Ethan Ewing |
| 4 | 2022 | mct | 5 | Kanoa Igarashi |
| ... | ... | ... | ... | ... |
| 174 | 2024 | wct | 14 | Isabella Nichols |
| 175 | 2024 | wct | 15 | India Robinson |
| 176 | 2024 | wct | 16 | Alyssa Spencer |
| 177 | 2024 | wct | 17 | Sophie McCulloch |


Next steps: Use ChatGPT to create natural-language descriptions of the rankings (e.g. "Filipe Toledo was ranked number 1 in 2022"). Compute embeddings for these descriptions and use them to select context for the prompt. Also include tour events, locations, and results.
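As a starting point for the descriptions, here is a minimal sketch that builds the sentences from a fixed template rather than asking ChatGPT to write them. The helper `describe_ranking` and the `TOUR_NAMES` mapping are hypothetical names introduced for illustration, and reading `mct` and `wct` as the men's and women's Championship Tour is an assumption based on the athletes listed in the output above.

```python
# Assumed meaning of the tour codes used in the rankings URL.
TOUR_NAMES = {
    "mct": "men's Championship Tour",
    "wct": "women's Championship Tour",
}


def describe_ranking(row):
    """Turn one rankings row (a dict or a pandas row) into a plain-English sentence."""
    tour = TOUR_NAMES.get(row["tour"], row["tour"])
    return f"{row['name']} was ranked number {row['rank']} on the WSL {tour} in {row['year']}."


# Two sample rows copied from the rankings output.
sample_rows = [
    {"year": "2022", "tour": "mct", "rank": 1, "name": "Filipe Toledo"},
    {"year": "2024", "tour": "wct", "rank": 14, "name": "Isabella Nichols"},
]
texts = [describe_ranking(row) for row in sample_rows]
for text in texts:
    print(text)
```

On the full dataframe, `df["text"] = df.apply(describe_ranking, axis=1)` should produce the required `"text"` column, since each row supports the same key lookups as the dicts used here.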

## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [27]:
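import os
import openai
# Read the key from the environment rather than hard-coding it in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY")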
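One way to compose the custom query is to prepend rows from the `"text"` column to the question as context. The sketch below is not the course's implementation: `build_custom_prompt`, `PROMPT_TEMPLATE`, and the `max_chars` parameter are hypothetical, the template wording is my own, and a character budget stands in for a proper token count. It also assumes the rows have already been sorted so the most relevant come first.

```python
PROMPT_TEMPLATE = """Answer the question based on the context below. If the question cannot be answered using the context, say "I don't know".

Context:

{context}

---

Question: {question}
Answer:"""


def build_custom_prompt(question, texts, max_chars=2000):
    """Pack context rows, in the order given, until the character budget is used up."""
    selected = []
    used = 0
    for text in texts:
        cost = len(text) + 1  # +1 for the newline that joins the rows
        if used + cost > max_chars:
            break
        selected.append(text)
        used += cost
    return PROMPT_TEMPLATE.format(context="\n".join(selected), question=question)


rows = ["A was ranked number 1.", "B was ranked number 2."]
# A budget of 25 characters only leaves room for the first row.
prompt = build_custom_prompt("Who was ranked number 1?", rows, max_chars=25)
print(prompt)
```

The resulting string can be passed as the `prompt` argument of a completion request in place of the bare question.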

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [77]:
from openai import OpenAI

In [78]:
client = OpenAI()

In [100]:
surf_prompt = """
Question: "Who was the number 1 ranked female surfer in 2024 in the WSL?"
Answer:
"""
# Keep the response in its own variable so the prompt string is not overwritten
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=surf_prompt,
    stream=False,
    max_tokens=150
)
print(response.choices[0].text)


As a language model AI, I don't have access to real-time data or predictions for the future. It is impossible for me to accurately answer this question.
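To show the basic and custom answers side by side for each question, a small helper along these lines may be useful. It is a sketch only: `compare_answers` and `fake_complete` are hypothetical names, and the stand-in completer exists so the example runs without an API call.

```python
def compare_answers(question, custom_prompt, complete):
    """Return the basic and custom answers to one question side by side.

    `complete` is any callable that maps a prompt string to answer text.
    """
    basic_prompt = f'\nQuestion: "{question}"\nAnswer:\n'
    return {
        "question": question,
        "basic": complete(basic_prompt).strip(),
        "custom": complete(custom_prompt).strip(),
    }


def fake_complete(prompt):
    # Offline stand-in for a model call: it only "answers" when the prompt carries context.
    return "Answered from context." if "Context:" in prompt else "I don't know."


result = compare_answers(
    "Who was the number 1 ranked female surfer in 2024 in the WSL?",
    "Context:\n\n(retrieved rows would go here)\n\nQuestion: ...\nAnswer:",
    fake_complete,
)
print("Basic: ", result["basic"])
print("Custom:", result["custom"])
```

With the real client, `complete` could be a thin wrapper such as `lambda p: client.completions.create(model="gpt-3.5-turbo-instruct", prompt=p, max_tokens=150).choices[0].text`, which mirrors the request used in this notebook.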


### Question 2