# Using OpenAI to interact with the datasets
The code below follows [the approach described in the LangChain how-to guides](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html#using-local-language-models) to enable interaction with the dataset via natural-language text.

The results look promising; however, using OpenAI's APIs is quite expensive. To productionise something like this in the Streamlit app, we need to find a cheaper model to generate the SQL.

In [2]:
import os

home = os.path.expanduser("~/transfermarkt-datasets") 
os.chdir(home)

In [4]:
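# These import paths match the LangChain release used when this notebook was written.
# In later releases, SQLDatabaseChain lives in the separate langchain_experimental package.
from langchain.chat_models import ChatOpenAI
from langchain.chains import SQLDatabaseChain
from langchain.sql_database import SQLDatabase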

In [7]:
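# temperature=0 keeps the generated SQL as repeatable as possible across runs
llm = ChatOpenAI(max_retries=1, temperature=0, verbose=True, model="gpt-3.5-turbo")
# The duckdb:// URI relies on the duckdb-engine SQLAlchemy dialect being installed.
# include_tables limits the schema shown to the model to the tables listed here.
db = SQLDatabase.from_uri(
    "duckdb:///dbt/duck.db",
    include_tables=[
        "players",
        "games",
        "clubs",
        "appearances",
        "player_valuations",
        "game_events",
        "club_games"
    ]
)
# use_query_checker=True asks the LLM to review the generated SQL before it is executed
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True)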
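
For readers new to this kind of chain, the flow it automates can be sketched roughly as: show the model the schema, ask it for SQL, run the SQL, then ask the model to phrase the rows as an answer. The snippet below is only an illustration of that flow under stated assumptions, not LangChain's implementation: `fake_llm` and `answer_question` are hypothetical names, the canned responses stand in for a real model, `sqlite3` in memory stands in for the DuckDB file, and the rows are made up. The extra review pass that `use_query_checker=True` requests is left out.

```python
import sqlite3


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM; returns canned text for each prompt."""
    if prompt.startswith("Write a SQL query"):
        return (
            "SELECT name, market_value_in_eur FROM players "
            "WHERE market_value_in_eur IS NOT NULL "
            "ORDER BY market_value_in_eur DESC LIMIT 2"
        )
    # Second call: echo the rows back as a very plain "answer".
    return "Summary of rows: " + prompt.split("Rows: ", 1)[1]


def answer_question(conn, question, llm):
    # Step 1: collect the table definitions so the model knows the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    # Step 2: ask the model to translate the question into SQL.
    sql = llm(f"Write a SQL query for this schema:\n{schema}\nQuestion: {question}")
    # Step 3: run the generated SQL against the database.
    rows = conn.execute(sql).fetchall()
    # Step 4: ask the model to turn the raw rows into a readable answer.
    answer = llm(f"Question: {question}\nRows: {rows}")
    return sql, rows, answer


# Made-up data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, market_value_in_eur INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Player A", 300), ("Player B", None), ("Player C", 100), ("Player D", 200)],
)

sql, rows, answer = answer_question(
    conn, "Who are the top 2 players by market value?", fake_llm
)
print(sql)
print(rows)
print(answer)
```

Each question costs at least two model calls in this flow (one for the SQL, one for the final wording), which is part of why the API bill adds up.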



In [6]:
db_chain.run("Who are the top 10 players by market value? Exclude players with missing values")



> Entering new SQLDatabaseChain chain...
Who are the top 10 players by market value? Exclude players with missing values
SQLQuery: SELECT "player_id", "name", "market_value_in_eur"
FROM "players"
WHERE "market_value_in_eur" IS NOT NULL
ORDER BY "market_value_in_eur" DESC
LIMIT 10
SQLResult: [(342229, 'Kylian Mbappé', 180000000), (418560, 'Erling Haaland', 170000000), (371998, 'Vinicius Junior', 120000000), (581678, 'Jude Bellingham', 120000000), (433177, 'Bukayo Saka', 110000000), (406635, 'Phil Foden', 110000000), (580195, 'Jamal Musiala', 110000000), (401923, 'Victor Osimhen', 100000000), (369081, 'Federico Valverde', 100000000), (683840, 'Pedri', 100000000)]
Answer: The top 10 players by market value are Kylian Mbappé, Erling Haaland, Vinicius Junior, Jude Bellingham, Bukayo Saka, Phil Foden, Jamal Musiala, Victor Osimhen, Federico Valverde, and Pedri.
> Finished chain.


'The top 10 players by market value are Kylian Mbappé, Erling Haaland, Vinicius Junior, Jude Bellingham, Bukayo Saka, Phil Foden, Jamal Musiala, Victor Osimhen, Federico Valverde, and Pedri.'
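
Since the verbose output includes the SQL text the model produced, that query can be kept and rerun later without another API call. The sketch below illustrates the idea under assumptions: it uses an in-memory `sqlite3` database as a stand-in for the DuckDB file, the three rows are made up, and `GENERATED_SQL` and `top_players` are hypothetical names. Only the query text is taken from the output above.

```python
import sqlite3

# Query text copied from the chain's verbose output above, unchanged.
GENERATED_SQL = """
SELECT "player_id", "name", "market_value_in_eur"
FROM "players"
WHERE "market_value_in_eur" IS NOT NULL
ORDER BY "market_value_in_eur" DESC
LIMIT 10
"""

# Stand-in database with made-up rows (not real Transfermarkt data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (player_id INTEGER, name TEXT, market_value_in_eur INTEGER)"
)
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [(1, "Player A", 50), (2, "Player B", None), (3, "Player C", 75)],
)

# Rerun the saved query directly; no LLM is involved at this point.
top_players = conn.execute(GENERATED_SQL).fetchall()
print(top_players)
```

Against the real database, the same query text would presumably be executed through the DuckDB connection instead, so only new questions would need a model call.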

In [7]:
db_chain.run("Give me the list of players from FC Barcelona, together with the total number of games they played this season and the average minutes per game. Sort by total number of games played.")



> Entering new SQLDatabaseChain chain...
Give me the list of players from FC Barcelona, together with the total number of games they played this season and the average minutes per game. Sort by total number of games played.
SQLQuery: SELECT players.name, COUNT(appearances.game_id) AS total_games, AVG(appearances.minutes_played) AS avg_minutes_per_game
FROM players
JOIN appearances ON players.player_id = appearances.player_id
JOIN clubs ON appearances.player_current_club_id = clubs.club_id
WHERE clubs.name = 'FC Barcelona' AND appearances.date >= '2021-08-01'
GROUP BY players.name
ORDER BY total_games DESC
LIMIT 5
SQLResult: [('Marc-André ter Stegen', 90, 91.0), ('Gavi', 88, 68.67045454545455), ('Sergio Busquets', 87, 82.64367816091954), ('Robert Lewandowski', 83, 86.144578313253), ('Frenkie de Jong', 81, 74.77777777777777)]
Answer: The list of players from FC Barcelona, together with the total number of games they played this seaso

'The list of players from FC Barcelona, together with the total number of games they played this season and the average minutes per game, sorted by total number of games played, is:\n- Marc-André ter Stegen played 90 games with an average of 91 minutes per game.\n- Gavi played 88 games with an average of 68.67 minutes per game.\n- Sergio Busquets played 87 games with an average of 82.64 minutes per game.\n- Robert Lewandowski played 83 games with an average of 86.14 minutes per game.\n- Frenkie de Jong played 81 games with an average of 74.78 minutes per game.'