## Minutes estimation data

Here, we will obtain a dataframe of players and their minutes played. We will also have information about whether or not they were absent from the game (according to TransferMarkt).

First we need to set up a fresh database and update it:

1. Set up database with:

```
airsenal_setup_initial_db --clean --n_previous 7
```

2. Update database with:

```
airsenal_update_db
```

In [1]:
import pandas as pd
from airsenal.framework.prediction_utils import get_player_history_df, get_player
from airsenal.framework.utils import was_historic_absence
from airsenal.framework.schema import session, PlayerAttributes
from airsenal.framework.utils import (
    CURRENT_SEASON,
    NEXT_GAMEWEEK
)

  from .autonotebook import tqdm as notebook_tqdm


We use `get_player_history_df` which will get the player histories for the players that we have data for (by setting `all_players=True`).

In [14]:
history_df = get_player_history_df(position="all",
                                   all_players=True,
                                   fill_blank=False,
                                   dbsession=session)

Filling history dataframe for Granit Xhaka: 0/2104 done
Filling history dataframe for Mohamed Elneny: 1/2104 done
Filling history dataframe for Rob Holding: 2/2104 done
Filling history dataframe for Thomas Partey: 3/2104 done
Filling history dataframe for Martin Ødegaard: 4/2104 done
Filling history dataframe for Kieran Tierney: 5/2104 done
Filling history dataframe for Nicolas Pépé: 6/2104 done
Filling history dataframe for Benjamin White: 7/2104 done
Filling history dataframe for Eddie Nketiah: 8/2104 done
Filling history dataframe for Emile Smith Rowe: 9/2104 done
Filling history dataframe for Bukayo Saka: 10/2104 done
Filling history dataframe for Takehiro Tomiyasu: 11/2104 done
Filling history dataframe for Aaron Ramsdale: 12/2104 done
Filling history dataframe for Gabriel dos Santos Magalhães: 13/2104 done
Filling history dataframe for Nuno Varela Tavares: 14/2104 done
Filling history dataframe for Gabriel Martinelli Silva: 15/2104 done
Filling history dataframe for Pablo Marí Vi

Filling history dataframe for Mads Bidstrup: 141/2104 done
Filling history dataframe for Ellery Balcombe: 142/2104 done
Filling history dataframe for Aaron Hickey: 143/2104 done
Filling history dataframe for Keane Lewis-Potter: 144/2104 done
Filling history dataframe for Thomas Strakosha: 145/2104 done
Filling history dataframe for Ben Mee: 146/2104 done
Filling history dataframe for Fin Stevens: 147/2104 done
Filling history dataframe for Halil Dervişoğlu: 148/2104 done
Filling history dataframe for Mikkel Damsgaard: 149/2104 done
Filling history dataframe for Mathias Jorgensen: 150/2104 done
Filling history dataframe for Ryan Trevitt: 151/2104 done
Filling history dataframe for Matthew Cox: 152/2104 done
Filling history dataframe for Tristan Crama: 153/2104 done
Filling history dataframe for Yegor Yarmolyuk: 154/2104 done
Filling history dataframe for Kevin Schade: 155/2104 done
Filling history dataframe for Michael Olakigbe: 156/2104 done
Filling history dataframe for Adam Lallana: 

Filling history dataframe for Asmir Begović: 279/2104 done
Filling history dataframe for Salomón Rondón: 280/2104 done
Filling history dataframe for Seamus Coleman: 281/2104 done
Filling history dataframe for Andros Townsend: 282/2104 done
Filling history dataframe for Michael Keane: 283/2104 done
Filling history dataframe for Dele Alli: 284/2104 done
Filling history dataframe for Jordan Pickford: 285/2104 done
Filling history dataframe for Allan Marques Loureiro: 286/2104 done
Filling history dataframe for André Tavares Gomes: 287/2104 done
Filling history dataframe for Abdoulaye Doucouré: 288/2104 done
Filling history dataframe for Alex Iwobi: 289/2104 done
Filling history dataframe for Jean-Philippe Gbamin: 290/2104 done
Filling history dataframe for Yerry Mina: 291/2104 done
Filling history dataframe for Demarai Gray: 292/2104 done
Filling history dataframe for Tom Davies: 293/2104 done
Filling history dataframe for Dominic Calvert-Lewin: 294/2104 done
Filling history dataframe for

Filling history dataframe for Joe Gelhardt: 417/2104 done
Filling history dataframe for Leo Fuhr Hjelde: 418/2104 done
Filling history dataframe for Rasmus Kristensen: 419/2104 done
Filling history dataframe for Marc Roca Junqué: 420/2104 done
Filling history dataframe for Brenden Aaronson: 421/2104 done
Filling history dataframe for Darko Gyabi: 422/2104 done
Filling history dataframe for Tyler Adams: 423/2104 done
Filling history dataframe for Luis Sinisterra Lucumí: 424/2104 done
Filling history dataframe for Cody Drameh: 425/2104 done
Filling history dataframe for Archie Gray: 426/2104 done
Filling history dataframe for Joel Robles: 427/2104 done
Filling history dataframe for Wilfried Gnonto: 428/2104 done
Filling history dataframe for Mateo Joseph Fernández: 429/2104 done
Filling history dataframe for Sonny Perkins: 430/2104 done
Filling history dataframe for Maximilian Wöber: 431/2104 done
Filling history dataframe for Georginio Rutter: 432/2104 done
Filling history dataframe for

Filling history dataframe for Martin Dubravka: 550/2104 done
Filling history dataframe for Callum Wilson: 551/2104 done
Filling history dataframe for Kieran Trippier: 552/2104 done
Filling history dataframe for Dan Burn: 553/2104 done
Filling history dataframe for Ryan Fraser: 554/2104 done
Filling history dataframe for Jamaal Lascelles: 555/2104 done
Filling history dataframe for Dwight Gayle: 556/2104 done
Filling history dataframe for Paul Dummett: 557/2104 done
Filling history dataframe for Javier Manquillo Gaitán: 558/2104 done
Filling history dataframe for Emil Krafth: 559/2104 done
Filling history dataframe for Jacob Murphy: 560/2104 done
Filling history dataframe for Fabian Schär: 561/2104 done
Filling history dataframe for Matt Targett: 562/2104 done
Filling history dataframe for Allan Saint-Maximin: 563/2104 done
Filling history dataframe for Miguel Almirón Rejala: 564/2104 done
Filling history dataframe for Sean Longstaff: 565/2104 done
Filling history dataframe for Joelinto

Filling history dataframe for Joe Rodon: 688/2104 done
Filling history dataframe for Cristian Romero: 689/2104 done
Filling history dataframe for Yves Bissouma: 690/2104 done
Filling history dataframe for Emerson Leite de Souza Junior: 691/2104 done
Filling history dataframe for Dejan Kulusevski: 692/2104 done
Filling history dataframe for Dane Scarlett: 693/2104 done
Filling history dataframe for Ivan Perišić: 694/2104 done
Filling history dataframe for Harvey White: 695/2104 done
Filling history dataframe for Pape Matar Sarr: 696/2104 done
Filling history dataframe for Brandon Austin: 697/2104 done
Filling history dataframe for Alfie Devine: 698/2104 done
Filling history dataframe for Troy Parrott: 699/2104 done
Filling history dataframe for Richarlison de Andrade: 700/2104 done
Filling history dataframe for Clément Lenglet: 701/2104 done
Filling history dataframe for Djed Spence: 702/2104 done
Filling history dataframe for Bryan Gil Salvatierra: 703/2104 done
Filling history datafra

Filling history dataframe for Przemyslaw Placheta: 823/2104 done
Filling history dataframe for Ben Wilmot: 824/2104 done
Filling history dataframe for Domingos Quina: 825/2104 done
Filling history dataframe for Josh Benson: 826/2104 done
Filling history dataframe for Edinson Cavani: 827/2104 done
Filling history dataframe for Alireza Jahanbakhsh: 828/2104 done
Filling history dataframe for Sead Kolasinac: 829/2104 done
Filling history dataframe for Pierre Lees-Melou: 830/2104 done
Filling history dataframe for Andreas Söndergaard: 831/2104 done
Filling history dataframe for Folarin Balogun: 832/2104 done
Filling history dataframe for Moise Kean: 833/2104 done
Filling history dataframe for Charlie Whitaker: 834/2104 done
Filling history dataframe for Kwadwo Baah: 835/2104 done
Filling history dataframe for Jeremy Ngakia: 836/2104 done
Filling history dataframe for Ryan Giles: 837/2104 done
Filling history dataframe for Julian Jeanvier: 838/2104 done
Filling history dataframe for Joe Whi

Filling history dataframe for Lukas Rupp: 963/2104 done
Filling history dataframe for Anthony Mancini: 964/2104 done
Filling history dataframe for Nicolas Nkoulou: 965/2104 done
Filling history dataframe for Lewis Dobbin: 966/2104 done
Filling history dataframe for Samir Caetano de Souza Santos: 967/2104 done
Filling history dataframe for Jarosław Jach: 968/2104 done
Filling history dataframe for Reece Hannam: 969/2104 done
Filling history dataframe for Thomas McGill: 970/2104 done
Filling history dataframe for Sam Waller: 971/2104 done
Filling history dataframe for Conor Bradley: 972/2104 done
Filling history dataframe for Matthew Pollock: 973/2104 done
Filling history dataframe for Michal Karbownik: 974/2104 done
Filling history dataframe for Nohan Kenneh: 975/2104 done
Filling history dataframe for James Rodríguez: 976/2104 done
Filling history dataframe for Freddie Woodman: 977/2104 done
Filling history dataframe for Ashley Barnes: 978/2104 done
Filling history dataframe for Michae

Filling history dataframe for Dara O'Shea: 1101/2104 done
Filling history dataframe for Theo Corbeanu: 1102/2104 done
Filling history dataframe for Wes Morgan: 1103/2104 done
Filling history dataframe for Liam Hughes: 1104/2104 done
Filling history dataframe for Roberto Jimenez Gago: 1105/2104 done
Filling history dataframe for William Osula: 1106/2104 done
Filling history dataframe for Faustino Anjorin: 1107/2104 done
Filling history dataframe for Jordan Stevens: 1108/2104 done
Filling history dataframe for Teden Mengi: 1109/2104 done
Filling history dataframe for Scott Dann: 1110/2104 done
Filling history dataframe for Daniel Jebbison: 1111/2104 done
Filling history dataframe for Sam Field: 1112/2104 done
Filling history dataframe for Denis Odoi: 1113/2104 done
Filling history dataframe for Teddy Jenks: 1114/2104 done
Filling history dataframe for Oliver Norwood: 1115/2104 done
Filling history dataframe for William Fish: 1116/2104 done
Filling history dataframe for Barry Douglas: 111

Filling history dataframe for Robert Snodgrass: 1236/2104 done
Filling history dataframe for Thakgalo Leshabela: 1237/2104 done
Filling history dataframe for Sergio Romero: 1238/2104 done
Filling history dataframe for Nile John: 1239/2104 done
Filling history dataframe for Filip Benkovic: 1240/2104 done
Filling history dataframe for Sébastien Haller: 1241/2104 done
Filling history dataframe for Conor Townsend: 1242/2104 done
Filling history dataframe for Maxime Le Marchand: 1243/2104 done
Filling history dataframe for Georginio Wijnaldum: 1244/2104 done
Filling history dataframe for Ahmed El Mohamady: 1245/2104 done
Filling history dataframe for Robbie Brady: 1246/2104 done
Filling history dataframe for Oliver Burke: 1247/2104 done
Filling history dataframe for Branislav Ivanovic: 1248/2104 done
Filling history dataframe for DeAndre Yedlin: 1249/2104 done
Filling history dataframe for Grady Diangana: 1250/2104 done
Filling history dataframe for Reda Khadra: 1251/2104 done
Filling histo

Filling history dataframe for Tom Trybull: 1372/2104 done
Filling history dataframe for Maya Yoshida: 1373/2104 done
Filling history dataframe for Pedro Obiang: 1374/2104 done
Filling history dataframe for Birkir Bjarnason: 1375/2104 done
Filling history dataframe for Michel Vorm: 1376/2104 done
Filling history dataframe for Kieron Freeman: 1377/2104 done
Filling history dataframe for Zech Medley: 1378/2104 done
Filling history dataframe for George Hirst: 1379/2104 done
Filling history dataframe for Borja González Tomás: 1380/2104 done
Filling history dataframe for Morgan Schneiderlin: 1381/2104 done
Filling history dataframe for Daniel Drinkwater: 1382/2104 done
Filling history dataframe for Panagiotis Retsos: 1383/2104 done
Filling history dataframe for Steven Defour: 1384/2104 done
Filling history dataframe for Jetro Willems: 1385/2104 done
Filling history dataframe for José Heriberto Izquierdo Mena: 1386/2104 done
Filling history dataframe for Jacob Maddox: 1387/2104 done
Filling h

Filling history dataframe for Jaroslaw Jach: 1506/2104 done
Filling history dataframe for Mousa Dembélé: 1507/2104 done
Filling history dataframe for Jerome Sinclair: 1508/2104 done
Filling history dataframe for Omar Bogle: 1509/2104 done
Filling history dataframe for Fernando Llorente: 1510/2104 done
Filling history dataframe for Nathaniel Mendez-Laing: 1511/2104 done
Filling history dataframe for Davy Klaassen: 1512/2104 done
Filling history dataframe for Cuco Martina: 1513/2104 done
Filling history dataframe for Guido Carrillo: 1514/2104 done
Filling history dataframe for Stephan Lichtsteiner: 1515/2104 done
Filling history dataframe for Alex Pritchard: 1516/2104 done
Filling history dataframe for Manolo Gabbiadini: 1517/2104 done
Filling history dataframe for Demeaco Duhaney: 1518/2104 done
Filling history dataframe for Ragnar Klavan: 1519/2104 done
Filling history dataframe for Steve Mounie: 1520/2104 done
Filling history dataframe for Tomer Hemed: 1521/2104 done
Filling history d

Filling history dataframe for Sean Scannell: 1644/2104 done
Filling history dataframe for Wilfried Bony: 1645/2104 done
Filling history dataframe for Jesús Gámez Duarte: 1646/2104 done
Filling history dataframe for Aiden O'Neill: 1647/2104 done
Filling history dataframe for Dion Henry: 1648/2104 done
Filling history dataframe for James McClean: 1649/2104 done
Filling history dataframe for Ryan Shawcross: 1650/2104 done
Filling history dataframe for Conor Masterson: 1651/2104 done
Filling history dataframe for Saido Berahino: 1652/2104 done
Filling history dataframe for Michael Carrick: 1653/2104 done
Filling history dataframe for Renato Sanches: 1654/2104 done
Filling history dataframe for Nacer Chadli: 1655/2104 done
Filling history dataframe for Reece Burke: 1656/2104 done
Filling history dataframe for Ben Watson: 1657/2104 done
Filling history dataframe for Marc Muniesa: 1658/2104 done
Filling history dataframe for Paul Robinson: 1659/2104 done
Filling history dataframe for Thomas E

Filling history dataframe for Brice Dja Djédjé: 1782/2104 done
Filling history dataframe for Marten de Roon: 1783/2104 done
Filling history dataframe for Mateusz Hewelt: 1784/2104 done
Filling history dataframe for Jan Kirchhoff: 1785/2104 done
Filling history dataframe for Ron-Robert Zieler: 1786/2104 done
Filling history dataframe for Rickie Lambert: 1787/2104 done
Filling history dataframe for Will Keane: 1788/2104 done
Filling history dataframe for Dimitri Payet: 1789/2104 done
Filling history dataframe for Álvaro Negredo: 1790/2104 done
Filling history dataframe for Jack Rose: 1791/2104 done
Filling history dataframe for John Terry: 1792/2104 done
Filling history dataframe for Rene Gilmartin: 1793/2104 done
Filling history dataframe for Arouna Koné: 1794/2104 done
Filling history dataframe for Darron Gibson: 1795/2104 done
Filling history dataframe for Eunan O'Kane: 1796/2104 done
Filling history dataframe for Steve Mandanda: 1797/2104 done
Filling history dataframe for George Fri

Filling history dataframe for Juan Cuadrado: 1917/2104 done
Filling history dataframe for Daniel Ayala: 1918/2104 done
Filling history dataframe for Greg Luer: 1919/2104 done
Filling history dataframe for Julien de Sart: 1920/2104 done
Filling history dataframe for Mouez Hassen: 1921/2104 done
Filling history dataframe for Shaun MacDonald: 1922/2104 done
Filling history dataframe for Lukas Jutkiewicz: 1923/2104 done
Filling history dataframe for Carlos De Pena: 1924/2104 done
Filling history dataframe for Allan McGregor: 1925/2104 done
Filling history dataframe for Memphis Depay: 1926/2104 done
Filling history dataframe for Sébastien Pocognoli: 1927/2104 done
Filling history dataframe for Thibaud Verlinden: 1928/2104 done
Filling history dataframe for Raphael Spiegel: 1929/2104 done
Filling history dataframe for Michael Phillips: 1930/2104 done
Filling history dataframe for Bojan Krkic: 1931/2104 done
Filling history dataframe for Markus Henriksen: 1932/2104 done
Filling history datafr

Filling history dataframe for Song: 2085/2104 done
Filling history dataframe for Sterry: 2086/2104 done
Filling history dataframe for Steven Naismith: 2087/2104 done
Filling history dataframe for Steven Taylor: 2088/2104 done
Filling history dataframe for Sylvain Distin: 2089/2104 done
Filling history dataframe for Tettey: 2090/2104 done
Filling history dataframe for Thauvin: 2091/2104 done
Filling history dataframe for Tim Howard: 2092/2104 done
Filling history dataframe for Tioté: 2093/2104 done
Filling history dataframe for Toivonen: 2094/2104 done
Filling history dataframe for Tomlin: 2095/2104 done
Filling history dataframe for Tomori: 2096/2104 done
Filling history dataframe for Toner: 2097/2104 done
Filling history dataframe for Toney: 2098/2104 done
Filling history dataframe for Touré: 2099/2104 done
Filling history dataframe for Veretout: 2100/2104 done
Filling history dataframe for Ward: 2101/2104 done
Filling history dataframe for Whittaker: 2102/2104 done
Filling history da

We can see that we have information for each player in the games we have data for them. If there is an absence according to TransferMarkt, the reason will be found in `absence_reason` and more detail is given in `absence_detail`.

In [15]:
history_df

Unnamed: 0,player_id,player_name,match_id,date,season,gameweek,goals,assists,minutes,team_goals,absence_reason,absence_detail
0,1,Granit Xhaka,1,2022-08-05 19:00:00+00:00,2223,1,0,0,90,2,,
1,1,Granit Xhaka,12,2022-08-13 14:00:00+00:00,2223,2,1,1,90,4,,
2,1,Granit Xhaka,26,2022-08-20 16:30:00+00:00,2223,3,0,1,87,3,,
3,1,Granit Xhaka,37,2022-08-27 16:30:00+00:00,2223,4,0,0,90,2,,
4,1,Granit Xhaka,45,2022-08-31 18:30:00+00:00,2223,5,0,0,90,2,,
...,...,...,...,...,...,...,...,...,...,...,...,...
166620,1849,George Boyd,2332,2017-04-23 00:00:00,1617,34,0,0,61,0,,
166621,1849,George Boyd,2326,2017-04-29 00:00:00,1617,35,0,1,90,2,,
166622,1849,George Boyd,2312,2017-05-06 00:00:00,1617,36,0,0,82,2,,
166623,1849,George Boyd,2300,2017-05-13 00:00:00,1617,37,0,0,54,1,,


In [16]:
history_df[history_df["absence_reason"].notnull()]

Unnamed: 0,player_id,player_name,match_id,date,season,gameweek,goals,assists,minutes,team_goals,absence_reason,absence_detail
42,1,Granit Xhaka,423,2021-09-18 14:00:00+00:00,2122,5,0,0,0,1,suspension,Red card suspension
43,1,Granit Xhaka,439,2021-09-26 15:30:00+00:00,2122,6,0,0,81,3,suspension,Red card suspension
45,1,Granit Xhaka,460,2021-10-18 19:00:00+00:00,2122,8,0,0,0,2,injury,Medial Collateral Ligament Injury
46,1,Granit Xhaka,461,2021-10-22 19:00:00+00:00,2122,9,0,0,0,3,injury,Medial Collateral Ligament Injury
47,1,Granit Xhaka,471,2021-10-30 11:30:00+00:00,2122,10,0,0,0,2,injury,Medial Collateral Ligament Injury
...,...,...,...,...,...,...,...,...,...,...,...,...
166566,1889,Josh Clackstone,2335,2017-04-22 00:00:00,1617,34,0,0,0,2,Transfer,Transferred to Notts County
166567,1889,Josh Clackstone,2322,2017-04-29 00:00:00,1617,35,0,0,0,0,Transfer,Transferred to Notts County
166568,1889,Josh Clackstone,2313,2017-05-06 00:00:00,1617,36,0,0,0,0,Transfer,Transferred to Notts County
166569,1889,Josh Clackstone,2296,2017-05-14 00:00:00,1617,37,0,0,0,0,Transfer,Transferred to Notts County


Note that there are some that although are listed to have an absence reason that play more than 0 minutes. This may be because the injury isn't as severe as once thought, or maybe an error in data collection. The few below are due to name matching errors for "Ben Davies", since this player ID refers to the Ben Davies that played for Tottenham (https://www.transfermarkt.co.uk/ben-davies/profil/spieler/192765). But he's never played for Preston, but there is a different Ben Davies which has (https://www.transfermarkt.co.uk/ben-davies/profil/spieler/257097).

Also, there is an error for Lo Celso, as something went wrong between the 19-20 to 20-21 season.

We will just do a quick fix and remove these...

In [17]:
history_df[[reason=="Transfer" for reason in history_df["absence_reason"]] & history_df["minutes"]>0]

Unnamed: 0,player_id,player_name,match_id,date,season,gameweek,goals,assists,minutes,team_goals,absence_reason,absence_detail
50797,463,Ben Davies,738,2022-05-12 18:45:00+00:00,2122,36,0,0,81,3,Transfer,Transferred to Sheff Utd
50803,463,Ben Davies,797,2020-10-04 15:30:00+00:00,2021,4,0,1,17,6,Transfer,Transferred to Preston
50809,463,Ben Davies,855,2020-11-29 16:30:00+00:00,2021,10,0,0,1,0,Transfer,Transferred to Preston
102350,972,Giovani Lo Celso,1478,2020-07-09 17:00:00+00:00,1920,43,0,0,45,0,Transfer,Transferred to Real Betis
102351,972,Giovani Lo Celso,1488,2020-07-12 15:30:00+00:00,1920,44,0,0,83,2,Transfer,Transferred to Real Betis
102354,972,Giovani Lo Celso,1514,2020-07-26 15:00:00+00:00,1920,47,0,1,59,1,Transfer,Transferred to Real Betis


In [18]:
get_player(463).team("2122", 19)

'TOT'

In [19]:
index_to_remove = history_df[[reason=="Transfer" for reason in history_df["absence_reason"]] & history_df["minutes"]>0].index
history_df = history_df.drop(index_to_remove)

There are some players with more than 1 reason for being absent, but only one which has played despite that...

Again this is Ben Davies due to naming clash...

In [20]:
history_df[[isinstance(item, list) for item in history_df["absence_reason"]] & history_df["minutes"]>0]

Unnamed: 0,player_id,player_name,match_id,date,season,gameweek,goals,assists,minutes,team_goals,absence_reason,absence_detail
50806,463,Ben Davies,826,2020-11-01 19:15:00+00:00,2021,7,0,0,5,2,"[injury, Transfer]","[Hamstring Injury, Transferred to Preston]"


We drop this from the dataset as we know this is wrong. There might be other errors though...

In [21]:
history_df = history_df.drop(50806)
history_df = history_df.reset_index(drop=True)

Save this csv to data directory...

In [22]:
history_df.to_csv("../airsenal/data/minutes_estimation_challenge.csv")