# TLoL-LLM - Initial LLM Experiments

## Overview

This notebook contains the initial experiments for generating text descriptions of observations (scenes), embedding them with OpenAI's `text-embedding-ada-002` model, and then testing how well those embeddings can be queried. OpenAI's hosted models are used for now because they most likely have the best zero-shot performance. This may change to a fine-tuned model in the future, but for the experimental stage it is easier to use a model with very good out-of-the-box performance. The embeddings have two intended uses:

1. For analysis: similar situations can be compared or queried later, which allows us to build a large database of League of Legends situations. This can be used to coach new players on what high-elo or pro players would have done in a similar situation, and it lets coaches analyse similar situations and get a more nuanced summary or description of the events that occurred.
2. For creating a game playing bot. If we cover a large enough number of situations, the game playing bot can either copy what was done before, or attempt to generalise from examples of similar situations.
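
Both uses rest on finding stored situations whose embeddings are close to a query embedding. The following is a minimal sketch of that lookup using cosine similarity. It is not the notebook's implementation: `top_k_similar` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for real embedding vectors.

```python
import numpy as np

def top_k_similar(query_vec, scene_vecs, k=2):
    """Return (indices, scores) of the k stored vectors closest to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(scene_vecs, dtype=float)
    # Normalise both sides so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()

# Toy stand-ins for scene embeddings (assumption: real vectors are much longer).
scene_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
indices, scores = top_k_similar([1.0, 0.0, 0.0], scene_vecs, k=2)
print(indices)
```

For a database of any real size, a dedicated vector index would likely replace the brute-force matrix product, but the ranking logic stays the same.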

## Dataset

The dataset for this notebook is Game 5 of the League of Legends Worlds 2022 Finals. The processed version of the *.rofl file is available on [Google Drive](https://drive.google.com/file/d/1kZchHUksTCOvpN_hJZ5iVvESF6Be5FPt/view?usp=sharing).

### Dataset Reliability

Some of the fields may be inaccurate, so it is a good idea to eyeball the data before relying on it.
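
A few automated checks can complement the manual inspection. The sketch below counts rows that break some basic invariants. The `sanity_report` helper and the specific checks are assumptions chosen for illustration; only the column names (`time`, `health`, `max_health`) come from the replay data.

```python
import pandas as pd

def sanity_report(df):
    """Count rows that violate a few basic invariants."""
    return {
        # Rows with at least one missing value.
        "missing_values": int(df.isna().any(axis=1).sum()),
        # Current health should never exceed maximum health.
        "health_above_max": int((df["health"] > df["max_health"]).sum()),
        # Game time should never be negative.
        "negative_time": int((df["time"] < 0).sum()),
    }

# Toy frame with one deliberately inconsistent row (health 700 > max_health 622).
toy = pd.DataFrame({
    "time": [2.0, 2.0, 2.5],
    "name": ["varus", "azir", "varus"],
    "health": [600.0, 700.0, 590.0],
    "max_health": [600.0, 622.0, 600.0],
})
report = sanity_report(toy)
print(report)
```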

## Load OpenAI API Key

In [29]:
import os
import openai
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
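
`os.getenv` returns `None` when the variable is missing, which only surfaces later as an authentication error. A small guard can fail earlier with a clearer message. This is a sketch: `require_env` is a hypothetical helper, and `EXAMPLE_API_KEY` is a placeholder variable set only for the demonstration.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value

# Placeholder value for demonstration only, not a real key.
os.environ["EXAMPLE_API_KEY"] = "sk-example"
print(require_env("EXAMPLE_API_KEY"))
```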

## Load the Replay File

In [24]:
import pandas as pd
import os
from pathlib import Path
from sqlite3 import connect

HOME   = Path(os.getcwd())
REPLAY = "ESPORTSTMNT02-3080905(old).db"
conn   = connect(HOME / REPLAY)

champs_df   = pd.read_sql('SELECT * FROM champs;',   conn)
missiles_df = pd.read_sql('SELECT * FROM missiles;', conn)
objects_df  = pd.read_sql('SELECT * FROM objects;',  conn)
game_df     = pd.read_sql('SELECT * FROM games;',  conn)
conn.close()
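
The same loading pattern can be written so that the connection is always closed and the table list is read from the database itself. The sketch below uses an in-memory SQLite database as a stand-in for the replay file. `load_tables` is a hypothetical helper, and the toy schema is an assumption that only mirrors two of the table names used above.

```python
import sqlite3
from contextlib import closing

import pandas as pd

def load_tables(conn, table_names):
    """Read each named table into a DataFrame, keyed by table name."""
    return {name: pd.read_sql(f"SELECT * FROM {name};", conn) for name in table_names}

# In-memory stand-in for the replay database (toy schema, not the real one).
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE champs (time REAL, name TEXT);")
    conn.execute("INSERT INTO champs VALUES (2.0, 'varus');")
    conn.execute("CREATE TABLE games (game_id INTEGER);")
    conn.execute("INSERT INTO games VALUES (3080905);")
    conn.commit()
    # Ask SQLite which tables exist instead of hard-coding them.
    present = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
        )
    ]
    tables = load_tables(conn, present)

print(present)
```

Table names are interpolated into the query string here, which is acceptable only because they come from `sqlite_master` rather than from user input.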

### Preprocess Replay File

#### Remove Duplicate Frames

In [26]:
champs_df = champs_df.drop_duplicates(subset=['time', 'name'], keep='first')
missiles_df = missiles_df.drop_duplicates(subset=['time', 'name'], keep='first')
objects_df = objects_df.drop_duplicates(subset=['time', 'name'], keep='first')
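
With `keep='first'`, later rows that share a `(time, name)` pair are discarded even when their other columns differ. Counting the duplicates before dropping them shows how much data this removes. The frame below is toy data, not taken from the replay.

```python
import pandas as pd

frames = pd.DataFrame({
    "time": [2.0, 2.0, 2.0, 2.5],
    "name": ["varus", "varus", "azir", "varus"],
    "health": [600.0, 599.0, 622.0, 600.0],
})

# Flag every row that repeats an earlier (time, name) pair.
is_dup = frames.duplicated(subset=["time", "name"], keep="first")
# Drop those rows; the first occurrence of each pair survives.
deduped = frames.drop_duplicates(subset=["time", "name"], keep="first")
print(int(is_dup.sum()), len(deduped))
```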

#### Unique Counts per Table

In [64]:
unique_champs   = champs_df['name'].nunique()
unique_missiles = missiles_df['name'].nunique()
unique_objects  = objects_df['name'].nunique()

In [65]:
unique_champs, unique_missiles, unique_objects

(10, 83, 355)

### Add Player to `champs_df`

In [66]:
# Despite their names, both dictionaries below are keyed by champion name.
player_to_champ = {
    "gwen":     "T1 Zeus",
    "viego":    "T1 Oner",
    "viktor":   "T1 Faker",
    "varus":    "T1 Gumayusi",
    "karma":    "T1 Keria",
    "aatrox":   "DRX Kingen",
    "hecarim":  "DRX Pyosik",
    "azir":     "DRX Zeka",
    "caitlyn":  "DRX Deft",
    "bard":     "DRX BeryL"
}

player_to_role = {
    "gwen":     "top",
    "viego":    "jungle",
    "viktor":   "mid",
    "varus":    "adc",
    "karma":    "support",
    "aatrox":   "top",
    "hecarim":  "jungle",
    "azir":     "mid",
    "caitlyn":  "adc",
    "bard":     "support"
}

In [67]:
champs_df["player"] = champs_df["name"].apply(lambda name: player_to_champ[name])

In [68]:
champs_df["role"] = champs_df["name"].apply(lambda name: player_to_role[name])
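
A dictionary lookup inside `apply` raises `KeyError` on any champion name missing from the mapping. As an alternative sketch, `Series.map` leaves unmapped names as `NaN`, which makes them easy to list. The `champ_to_player` dictionary here is a hypothetical two-entry subset, and `teemo` is included only to show an unmapped name.

```python
import pandas as pd

# Hypothetical subset of the champion-to-player mapping.
champ_to_player = {"varus": "T1 Gumayusi", "azir": "DRX Zeka"}

toy = pd.DataFrame({"name": ["varus", "azir", "varus", "teemo"]})
# Names absent from the dictionary become NaN instead of raising KeyError.
toy["player"] = toy["name"].map(champ_to_player)
# Collect the champion names that were not mapped.
unmapped = sorted(toy.loc[toy["player"].isna(), "name"].unique())
print(unmapped)
```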

In [70]:
champs_df[["player", "name", "role"]]

| | player | name | role |
|---|---|---|---|
| 0 | T1 Gumayusi | varus | adc |
| 1 | DRX Zeka | azir | mid |
| 2 | DRX Kingen | aatrox | top |
| 3 | DRX BeryL | bard | support |
| 4 | T1 Oner | viego | jungle |
| ... | ... | ... | ... |
| 170533 | DRX Kingen | aatrox | top |
| 170534 | DRX BeryL | bard | support |
| 170535 | T1 Keria | karma | support |
| 170536 | DRX Pyosik | hecarim | jungle |


In [74]:
champs_df.columns

Index(['game_id', 'time', 'obj_type', 'net_id', 'obj_id', 'name', 'health',
       'max_health', 'team', 'armour', 'mr', 'movement_speed', 'is_alive',
       'position_x', 'position_y', 'position_z', 'is_moving', 'targetable',
       'invulnerable', 'recallState', 'q_name', 'q_level', 'q_cd', 'w_name',
       'w_level', 'w_cd', 'e_name', 'e_level', 'e_cd', 'r_name', 'r_level',
       'r_cd', 'd_name', 'd_level', 'd_cd', 'd_summoner_spell_type', 'f_name',
       'f_level', 'f_cd', 'f_summoner_spell_type', 'crit', 'critMulti',
       'level', 'mana', 'max_mana', 'ability_haste', 'ap', 'lethality',
       'experience', 'mana_regen', 'health_regen', 'attack_range',
       'current_gold', 'total_gold', 'player', 'role'],
      dtype='object')
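
These columns are the raw material for the text descriptions this notebook aims to generate. As a rough sketch, one observation row could be rendered into a sentence like this. The `describe_champ` helper and its sentence template are hypothetical, not part of the notebook. The example values are copied from a first-minute row in this notebook's output.

```python
def describe_champ(row):
    """Turn one champion observation into a short English sentence."""
    # Convert the game time in seconds to minutes:seconds.
    minutes, seconds = divmod(int(row["time"]), 60)
    health_pct = 100.0 * row["health"] / row["max_health"]
    return (
        f"At {minutes}:{seconds:02d}, {row['player']} ({row['name']}, {row['role']}, "
        f"team {row['team']}) is at {health_pct:.0f}% health "
        f"with {row['total_gold']:.0f} total gold."
    )

example = {
    "time": 59.985725, "player": "DRX Pyosik", "name": "hecarim", "role": "jungle",
    "team": 200, "health": 601.378601, "max_health": 625.0, "total_gold": 500.0,
}
print(describe_champ(example))
```

A plain dictionary is used here, but a row obtained from `DataFrame.iterrows()` supports the same key-based access.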

## Test GPT Queries

### Test Starting ADC Positions Prompt

#### Get Positions

In [135]:
min_spawn_adc_positions_prompt = """i have a pd df variable (called champs_df) with these columns

Index(['game_id', 'time', 'obj_type', 'net_id', 'obj_id', 'name', 'health',
       'max_health', 'team', 'armour', 'mr', 'movement_speed', 'is_alive',
       'position_x', 'position_y', 'position_z', 'is_moving', 'targetable',
       'invulnerable', 'recallState', 'q_name', 'q_level', 'q_cd', 'w_name',
       'w_level', 'w_cd', 'e_name', 'e_level', 'e_cd', 'r_name', 'r_level',
       'r_cd', 'd_name', 'd_level', 'd_cd', 'd_summoner_spell_type', 'f_name',
       'f_level', 'f_cd', 'f_summoner_spell_type', 'crit', 'critMulti',
       'level', 'mana', 'max_mana', 'ability_haste', 'ap', 'lethality',
       'experience', 'mana_regen', 'health_regen', 'attack_range',
       'current_gold', 'total_gold', 'player', 'role'],
      dtype='object')

get me the position of the players ( one of the players has team=100, the other has team=200) with role "adc" at around minute 1:40 (give or take a second as the recorded times are extremely accurate). do not enumerate the players, just return them based on team index, return only the first instance"""
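
The column list in this prompt is pasted by hand, so it goes stale if the DataFrame changes. A sketch of building the same kind of prompt from the DataFrame itself follows. `build_prompt` is a hypothetical helper and its wording is an assumption, not the exact prompt used in this notebook.

```python
import pandas as pd

def build_prompt(df, df_name, question):
    """Describe a DataFrame's columns, then append the question."""
    columns = ", ".join(repr(c) for c in df.columns)
    return (
        f"I have a pandas DataFrame variable (called {df_name}) with these columns:\n\n"
        f"[{columns}]\n\n"
        f"{question}"
    )

# Empty toy frame: only the column names matter for the prompt.
toy = pd.DataFrame(columns=["time", "team", "role", "position_x"])
prompt = build_prompt(toy, "champs_df", 'Get the position of the players with role "adc".')
print(prompt)
```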

In [136]:
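def ask(prompt):
    """Send a single user prompt to gpt-3.5-turbo and return the raw API response."""
    # Uses the legacy ChatCompletion interface from openai releases before 1.0.
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )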

In [137]:
result = ask(min_spawn_adc_positions_prompt)

In [141]:
chatgpt_code = result["choices"][0]["message"]["content"]
print(chatgpt_code)

# Pull the Python source out of the reply text.
if "```" in chatgpt_code:
    # Take the contents of the first fenced block, then drop the
    # "python" language tag if the fence has one.
    code = chatgpt_code.split("```")[1]
    if code.startswith("python"):
        code = code[len("python"):]
elif chatgpt_code.startswith("python"):
    # Unfenced reply that still begins with a bare language tag.
    code = chatgpt_code[len("python"):]
else:
    # Otherwise assume the whole reply is code.
    code = chatgpt_code
code

Here's one way to do it using pandas:

```python
# Filter the DataFrame to contain only rows at around minute 1:40
time_mask = (champs_df['time'] >= 100) & (champs_df['time'] <= 101)

# Filter the DataFrame to contain only rows with role "adc"
role_mask = champs_df['role'] == 'adc'

# Filter the DataFrame to contain only rows for the two different teams
team_100_mask = champs_df['team'] == 100
team_200_mask = champs_df['team'] == 200

# Combine all the masks using the "&" and "|" operators
mask = time_mask & role_mask & (team_100_mask | team_200_mask)

# Select the position_x and position_y columns using the loc accessor
positions = champs_df.loc[mask, ['team', 'position_x', 'position_y']]

# Split the DataFrame into two DataFrames based on team index
team_100_positions = positions[positions['team'] == 100]
team_200_positions = positions[positions['team'] == 200]

# Return the first row from each DataFrame (if they exist)
adc_100_position = team_100_positions.iloc[0] if not team_100_po
```

'\n# Filter the DataFrame to contain only rows at around minute 1:40\ntime_mask = (champs_df[\'time\'] >= 100) & (champs_df[\'time\'] <= 101)\n\n# Filter the DataFrame to contain only rows with role "adc"\nrole_mask = champs_df[\'role\'] == \'adc\'\n\n# Filter the DataFrame to contain only rows for the two different teams\nteam_100_mask = champs_df[\'team\'] == 100\nteam_200_mask = champs_df[\'team\'] == 200\n\n# Combine all the masks using the "&" and "|" operators\nmask = time_mask & role_mask & (team_100_mask | team_200_mask)\n\n# Select the position_x and position_y columns using the loc accessor\npositions = champs_df.loc[mask, [\'team\', \'position_x\', \'position_y\']]\n\n# Split the DataFrame into two DataFrames based on team index\nteam_100_positions = positions[positions[\'team\'] == 100]\nteam_200_positions = positions[positions[\'team\'] == 200]\n\n# Return the first row from each DataFrame (if they exist)\nadc_100_position = team_100_positions.iloc[0] if not team_100_posit
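
Extracting the code from a reply can also be done with a regular expression, which handles fences with or without a language tag and replies that are cut off before the closing fence. This is a sketch: `extract_code` is a hypothetical helper, and the reply string is made up for the demonstration.

```python
import re

# Built programmatically so this example nests cleanly inside a markdown fence.
FENCE = "`" * 3

def extract_code(reply):
    """Return the body of the first fenced block, or the whole reply if none."""
    # Opening fence, optional language tag, newline, then lazily capture
    # everything up to the closing fence or the end of the text.
    pattern = re.compile(
        FENCE + r"[A-Za-z0-9_+-]*[ \t]*\n(.*?)(?:" + FENCE + r"|\Z)",
        re.DOTALL,
    )
    match = pattern.search(reply)
    return match.group(1) if match else reply

reply = "Here's one way:\n\n" + FENCE + "python\nx = 1 + 1\n" + FENCE + "\nDone."
print(extract_code(reply))
```

Unlike replacing every occurrence of the word `python`, this leaves the body of the generated code untouched.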

In [142]:
exec(code)

The position of the ADC on team 100 is (11234.8837890625, -8.445413589477539)
The position of the ADC on team 200 is (12160.3828125, 52.02464294433594)
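
Calling `exec(code)` directly runs the generated code in the notebook's global namespace, so it can overwrite existing variables. A sketch of a slightly more contained approach passes an explicit namespace and captures the printed output. `run_generated` is a hypothetical helper, and the generated snippet is a toy example. This is not a sandbox: the code still has full access to Python's builtins.

```python
import contextlib
import io

def run_generated(code, namespace):
    """Execute code in the given namespace and return whatever it printed."""
    buffer = io.StringIO()
    # Redirect print() output into the buffer while the code runs.
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()

# Toy stand-in for model-generated code.
generated = "total = sum(values)\nprint(f'total is {total}')"
# Only the names placed here are visible to the generated code.
namespace = {"values": [1, 2, 3]}
output = run_generated(generated, namespace)
print(output)
```

Variables created by the generated code end up in `namespace`, where they can be inspected without touching the notebook's own globals.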


## First Minute of Gameplay

### Filter Tables to the First Minute

In [71]:
start_time = 0.0
end_time   = 60.0

first_min_champs_df = champs_df[(champs_df['time'] >= start_time) & (champs_df['time'] <= end_time)]
first_min_missiles_df = missiles_df[(missiles_df['time'] >= start_time) & (missiles_df['time'] <= end_time)]
first_min_objects_df = objects_df[(objects_df['time'] >= start_time) & (objects_df['time'] <= end_time)]
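
The same time filter is repeated for each table, so it could be factored into a small function. This is a sketch on toy data: `time_window` is a hypothetical helper, and it relies on `Series.between`, which includes both endpoints by default.

```python
import pandas as pd

def time_window(df, start, end):
    """Rows whose `time` lies in [start, end], both ends inclusive."""
    return df[df["time"].between(start, end)]

toy = pd.DataFrame({"time": [0.0, 30.5, 60.0, 60.1], "name": ["a", "b", "c", "d"]})
print(time_window(toy, 0.0, 60.0)["name"].tolist())
```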

In [72]:
first_min_champs_df

| | game_id | time | obj_type | net_id | obj_id | name | health | max_health | team | armour | ... | ap | lethality | experience | mana_regen | health_regen | attack_range | current_gold | total_gold | player | role |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3080905 | 2.006888 | champs | 1073741856 | 2589 | varus | 600.000000 | 600.0 | 100 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.60 | 0.7 | 575.0 | 500.0 | 500.0 | T1 Gumayusi | adc |
| 1 | 3080905 | 2.006888 | champs | 1073741860 | 2608 | azir | 622.000000 | 622.0 | 200 | 0.0 | ... | 9.0 | 0.0 | 1.0 | 1.60 | 1.4 | 525.0 | 500.0 | 500.0 | DRX Zeka | mid |
| 2 | 3080905 | 2.006888 | champs | 1073741858 | 2594 | aatrox | 730.000000 | 730.0 | 200 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.00 | 1.8 | 225.0 | 50.0 | 500.0 | DRX Kingen | top |
| 3 | 3080905 | 2.006888 | champs | 1073741862 | 2615 | bard | 630.000000 | 630.0 | 200 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.20 | 1.1 | 500.0 | 500.0 | 500.0 | DRX BeryL | support |
| 4 | 3080905 | 2.006888 | champs | 1073741854 | 2583 | viego | 630.000000 | 630.0 | 100 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.00 | 1.4 | 200.0 | 150.0 | 500.0 | T1 Oner | jungle |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6135 | 3080905 | 59.985725 | champs | 1073741857 | 2844 | karma | 614.000000 | 614.0 | 100 | 0.0 | ... | 29.0 | 0.0 | 1.0 | 3.90 | 1.1 | 525.0 | 0.0 | 500.0 | T1 Keria | support |
| 6136 | 3080905 | 59.985725 | champs | 1073741859 | 2803 | hecarim | 601.378601 | 625.0 | 200 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.30 | 1.4 | 175.0 | 25.0 | 500.0 | DRX Pyosik | jungle |
| 6137 | 3080905 | 59.985725 | champs | 1073741861 | 2777 | caitlyn | 580.000000 | 580.0 | 200 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.48 | 0.7 | 650.0 | 0.0 | 500.0 | DRX Deft | adc |
| 6138 | 3080905 | 59.985725 | champs | 1073741853 | 2833 | gwen | 700.000000 | 700.0 | 100 | 0.0 | ... | 9.0 | 0.0 | 1.0 | 1.50 | 2.9 | 150.0 | 0.0 | 500.0 | T1 Zeus | top |


In [73]:
first_min_champs_df.columns

Index(['game_id', 'time', 'obj_type', 'net_id', 'obj_id', 'name', 'health',
       'max_health', 'team', 'armour', 'mr', 'movement_speed', 'is_alive',
       'position_x', 'position_y', 'position_z', 'is_moving', 'targetable',
       'invulnerable', 'recallState', 'q_name', 'q_level', 'q_cd', 'w_name',
       'w_level', 'w_cd', 'e_name', 'e_level', 'e_cd', 'r_name', 'r_level',
       'r_cd', 'd_name', 'd_level', 'd_cd', 'd_summoner_spell_type', 'f_name',
       'f_level', 'f_cd', 'f_summoner_spell_type', 'crit', 'critMulti',
       'level', 'mana', 'max_mana', 'ability_haste', 'ap', 'lethality',
       'experience', 'mana_regen', 'health_regen', 'attack_range',
       'current_gold', 'total_gold', 'player', 'role'],
      dtype='object')