# Teams
This dataset describes all the soccer teams in seven prominent soccer competitions (Italian, Spanish, German, French and English first divisions, World Cup 2018, European Cup 2016). It consists of the following fields:

- city: the city where the team is located. For national teams it is the capital of the country;
- name: the common name of the team;
- area: information about the geographic area associated with the team;
- wyId: the identifier of the team, assigned by Wyscout;
- officialName: the official name of the team (e.g., Juventus FC);
- type: the type of the team. It is "club" for teams in the competitions for clubs and "national" for the teams in international competitions.

# Events
This dataset describes all the events that occur during each match. Each event refers to a ball touch and contains the following information:

- eventId: the identifier of the event's type. Each eventId is associated with an event name (see next point);
- eventName: the name of the event's type. There are seven types of events: pass, foul, shot, duel, free kick, offside and touch;
- subEventId: the identifier of the subevent's type. Each subEventId is associated with a subevent name (see next point);
- subEventName: the name of the subevent's type. Each event type is associated with a different set of subevent types;
- tags: a list of event tags, each one describes additional information about the event (e.g., accurate). Each event type is associated with a different set of tags;
- eventSec: the time when the event occurs (in seconds since the beginning of the current half of the match);
- id: a unique identifier of the event;
- matchId: the identifier of the match the event refers to. The identifier refers to the field "wyId" in the match dataset;
- matchPeriod: the period of the match. It can be "1H" (first half of the match), "2H" (second half of the match), "E1" (first extra time), "E2" (second extra time) or "P" (penalties time);
- playerId: the identifier of the player who generated the event. The identifier refers to the field "wyId" in a player dataset;
- positions: the origin and destination positions associated with the event. Each position is a pair of coordinates (x, y). The x and y coordinates are always in the range [0, 100] and indicate the percentage of the field from the perspective of the attacking team. In particular, the value of the x coordinate indicates the event's nearness (in percentage) to the opponent's goal, while the value of the y coordinate indicates the event's nearness (in percentage) to the right side of the field;
- teamId: the identifier of the player's team. The identifier refers to the field "wyId" in the team dataset.
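
The percentage coordinates described above can be converted to metres once a pitch size is chosen. The sketch below is only an illustration: the 105 m x 68 m pitch and the helper name `position_to_metres` are assumptions, not part of the dataset.

```python
# Assumed pitch dimensions in metres; the dataset itself only stores percentages.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


def position_to_metres(position, length=PITCH_LENGTH_M, width=PITCH_WIDTH_M):
    """Convert one {'x': ..., 'y': ...} percentage position to (x, y) in metres."""
    return (position["x"] / 100.0 * length, position["y"] / 100.0 * width)


# One 'positions' value in the format described above: origin, then destination.
positions = [{"y": 89, "x": 66}, {"y": 100, "x": 78}]
start, end = (position_to_metres(p) for p in positions)
print(start, end)
```

Because the coordinates are always expressed from the attacking team's perspective, the same conversion applies to both teams without flipping the axes.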

# Matches
This dataset describes all the matches made available. Each match is a document consisting of the following fields:

- competitionId: the identifier of the competition to which the match belongs. It is an integer and refers to the field "wyId" of the competition document;
- date and dateutc: the former specifies date and time when the match starts in explicit format (e.g., May 20, 2018 at 8:45:00 PM GMT+2), the latter contains the same information but in the compact format YYYY-MM-DD hh:mm:ss;
- duration: the duration of the match. It can be "Regular" (matches of regular duration of 90 minutes + stoppage time), "ExtraTime" (matches with supplementary times, as it may happen for matches in continental or international competitions), or "Penalities" (matches which end at penalty kicks, as it may happen for continental or international competitions);
- gameweek: the week of the league, starting from the beginning of the league;
- label: contains the name of the two clubs and the result of the match (e.g., "Lazio - Internazionale, 2 - 3");
- roundId: indicates the match-day of the competition to which the match belongs. During a competition for soccer clubs, each participating club plays every other club twice, once at home and once away. The matches are organized in match-days: all the matches in match-day i are played before the matches in match-day i + 1, even though some matches can be brought forward or postponed to accommodate players and clubs participating in continental or intercontinental competitions. During a competition for national teams, "roundId" indicates the stage of the competition (eliminatory round, round of 16, quarter finals, semifinals, final);
- seasonId: indicates the season of the match;
- status: it can be "Played" (the match has officially finished), "Cancelled" (the match has been canceled for some reason), "Postponed" (the match has been postponed and no new date and time is available yet) or "Suspended" (the match has been suspended and no new date and time is available yet);
- venue: the stadium where the match was held (e.g., "Stadio Olimpico");
- winner: the identifier of the team which won the game, or 0 if the match ended with a draw;
- wyId: the identifier of the match, assigned by Wyscout;
- teamsData: contains several subfields describing each team playing the match, such as lineup, bench composition, list of substitutions, coach and scores:
  - hasFormation: it has value 0 if no formation (lineups and benches) is present, and 1 otherwise;
  - score: the number of goals scored by the team during the match (not counting penalties);
  - scoreET: the number of goals scored by the team during the match, including the extra time (not counting penalties);
  - scoreHT: the number of goals scored by the team during the first half of the match;
  - scoreP: the total number of goals scored by the team after the penalties;
  - side: the team side in the match (it can be "home" or "away");
  - teamId: the identifier of the team;
  - coachId: the identifier of the team's coach;
  - bench: the list of the team's players who started the match on the bench, with some basic statistics about their performance during the match (goals, own goals, cards);
  - lineup: the list of the team's players in the starting lineup, with some basic statistics about their performance during the match (goals, own goals, cards);
  - substitutions: the list of the team's substitutions during the match, describing the players involved and the minute of the substitution.
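
As a rough illustration of how the teamsData sub-document described above might be read, the sketch below pulls out the team identifier and score for each side. The helper name `summarize_teams_data` and the sample values are hypothetical; the sample is hand-written and only assumes the fields listed above, with `teamId` stored as an integer.

```python
def summarize_teams_data(teams_data):
    """Return {side: (teamId, score)} for one match's teamsData sub-document."""
    summary = {}
    for team in teams_data.values():
        summary[team["side"]] = (team["teamId"], team["score"])
    return summary


# Hand-written sample (an assumption), keyed by team ID strings.
teams_data = {
    "1623": {"teamId": 1623, "side": "home", "score": 1, "scoreHT": 0},
    "1619": {"teamId": 1619, "side": "away", "score": 1, "scoreHT": 1},
}
print(summarize_teams_data(teams_data))
```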

# Competitions
This dataset describes seven major soccer competitions (Italian, Spanish, German, French, English first divisions, World cup 2018, European cup 2016). Each competition is a document consisting of the following fields:

- area: it denotes the geographic area associated with the league as a sub-document, using the ISO 3166-1 specification (https://www.iso.org/iso-3166-country-codes.html);
- format: the format of the competition. All competitions for clubs have value "Domestic league". The competitions for national teams have value "International cup";
- name: the official name of the competition (e.g., Italian first division, Spanish first division, World Cup, etc.);
- type: the typology of the competition. It is "club" for the competitions for clubs and "international" for the competitions for national teams (World Cup 2018, European Cup 2016);
- wyId: the unique identifier of the competition, assigned by Wyscout.

# Points rules
A win earns three points, a loss earns zero points, and a draw earns one point.
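
The scoring rule above can be sketched as a small helper. The function name `match_points` is hypothetical; it assumes, as in the matches dataset, that `winner` is 0 for a draw and otherwise holds the winning team's identifier.

```python
def match_points(team_id, winner):
    """Points earned by team_id in one match, given the match's 'winner' field."""
    if winner == 0:
        return 1  # draw: one point for each team
    return 3 if winner == team_id else 0  # win: three points, loss: none


# Everton (1623) in a draw, a win and a loss.
print(match_points(1623, 0), match_points(1623, 1623), match_points(1623, 1633))
```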

In [62]:
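import pandas as pd
import numpy as np

# Load the English first-division matches and list the available columns.
df_matches_Eng = pd.read_json('matches/matches_England.json')
print(df_matches_Eng.columns)

# For every Everton match, print the label, the teamsData keys (team IDs) and the winner.
for i in range(len(df_matches_Eng)):
    if 'Everton' in df_matches_Eng['label'][i]:
        print(df_matches_Eng['label'][i], df_matches_Eng['teamsData'][i].keys(), df_matches_Eng['winner'][i])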

Index(['competitionId', 'date', 'dateutc', 'duration', 'gameweek', 'label',
       'referees', 'roundId', 'seasonId', 'status', 'teamsData', 'venue',
       'winner', 'wyId'],
      dtype='object')
West Ham United - Everton, 3 - 1 dict_keys(['1623', '1633']) 1633
Everton - Southampton, 1 - 1 dict_keys(['1623', '1619']) 0
Huddersfield Town - Everton, 0 - 2 dict_keys(['1623', '1673']) 1623
Everton - Newcastle United, 1 - 0 dict_keys(['1623', '1613']) 1623
Swansea City - Everton, 1 - 1 dict_keys(['1623', '10531']) 0
Everton - Liverpool, 0 - 0 dict_keys(['1623', '1612']) 0
Everton - Manchester City, 1 - 3 dict_keys(['1623', '1625']) 1625
Stoke City - Everton, 1 - 2 dict_keys(['1623', '1639']) 1623
Everton - Brighton & Hove Albion, 2 - 0 dict_keys(['1623', '1651']) 0
Burnley - Everton, 2 - 1 dict_keys(['1623', '1646']) 1646
Watford - Everton, 1 - 0 dict_keys(['1623', '1644']) 0
Everton - Crystal Palace, 3 - 1 dict_keys(['1623', '1628']) 1623
Arsenal - Everton, 5 - 1 dict_keys(['1609', '1623

In [77]:
# Use the team name in this table to look up each team's unique wyId; e.g., Everton is 1623
df_teams = pd.read_json('teams.json')
df_teams

Unnamed: 0,area,city,name,officialName,type,wyId
0,"{'name': 'England', 'id': '0', 'alpha3code': '...",Newcastle upon Tyne,Newcastle United,Newcastle United FC,club,1613
1,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Vigo,Celta de Vigo,Real Club Celta de Vigo,club,692
2,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Barcelona,Espanyol,Reial Club Deportiu Espanyol,club,691
3,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Vitoria-Gasteiz,Deportivo Alav\u00e9s,Deportivo Alav\u00e9s,club,696
4,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Valencia,Levante,Levante UD,club,695
5,"{'name': 'France', 'id': '250', 'alpha3code': ...",Troyes,Troyes,Esp\u00e9rance Sportive Troyes Aube Champagne,club,3795
6,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Getafe (Madrid),Getafe,Getafe Club de F\u00fatbol,club,698
7,"{'name': 'Germany', 'id': '276', 'alpha3code':...",M\u00f6nchengladbach,Borussia M'gladbach,Borussia VfL M\u00f6nchengladbach,club,2454
8,"{'name': 'England', 'id': '0', 'alpha3code': '...","Huddersfield, West Yorkshire",Huddersfield Town,Huddersfield Town FC,club,1673
9,"{'name': 'Spain', 'id': '724', 'alpha3code': '...",Bilbao,Athletic Club,Athletic Club Bilbao,club,678


In [101]:
# Start every team that appears in the match data at zero points.
teamId_dic = {}
for i in range(len(df_matches_Eng)):
    for teamId in df_matches_Eng['teamsData'][i].keys():
        if teamId not in teamId_dic:
            teamId_dic[teamId] = 0

# A draw (winner == 0) gives each team 1 point; otherwise the winner gets 3.
for i in range(len(df_matches_Eng)):
    if df_matches_Eng['winner'][i] == 0:
        team0 = list(df_matches_Eng['teamsData'][i].keys())[0]
        team1 = list(df_matches_Eng['teamsData'][i].keys())[1]
        teamId_dic[team0] += 1
        teamId_dic[team1] += 1
    else:
        # teamsData keys are strings, while 'winner' is an integer.
        teamId_dic[str(df_matches_Eng['winner'][i])] += 3

# Sort by points in descending order and print rank, team name and points.
team_ranking = sorted(teamId_dic.items(), key=lambda x: x[1], reverse=True)
for rank, (team_id, points) in enumerate(team_ranking, start=1):
    print(rank, df_teams[df_teams['wyId'] == int(team_id)]['name'].values[0], points)

1 Manchester City 100
2 Manchester United 81
3 Tottenham Hotspur 77
4 Liverpool 75
5 Chelsea 70
6 Arsenal 63
7 Burnley 54
8 Everton 48
9 Leicester City 47
10 AFC Bournemouth 44
11 Crystal Palace 44
12 Newcastle United 44
13 West Ham United 42
14 Brighton & Hove Albion 41
15 Watford 39
16 Huddersfield Town 37
17 Southampton 36
18 Swansea City 33
19 Stoke City 33
20 West Bromwich Albion 31


In [98]:
df_teams[df_teams['wyId']==1623]['name'].values[0]

'Everton'
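
When many identifiers need to be resolved, filtering the DataFrame once per lookup can be replaced by a dictionary built a single time. The following is a minimal sketch using a hand-written list of records instead of the real `teams.json`; the helper name `build_name_lookup` is hypothetical.

```python
def build_name_lookup(teams):
    """Map each team's wyId to its common name."""
    return {team["wyId"]: team["name"] for team in teams}


# Small sample with IDs and names that appear in the outputs above.
teams = [
    {"wyId": 1613, "name": "Newcastle United"},
    {"wyId": 1673, "name": "Huddersfield Town"},
    {"wyId": 1623, "name": "Everton"},
]
name_by_id = build_name_lookup(teams)
print(name_by_id[1623])

# With the DataFrame loaded earlier, an equivalent mapping could be built as:
# name_by_id = dict(zip(df_teams['wyId'], df_teams['name']))
```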

In [58]:
df_events_Eng = pd.read_json('events/events_England.json')
# All of Everton's events
df_events_Eng[df_events_Eng['teamId']==1623]

Unnamed: 0,eventId,eventName,eventSec,id,matchId,matchPeriod,playerId,positions,subEventId,subEventName,tags,teamId
6453,3,Free Kick,31.174681,178214455,2499723,1H,10131,"[{'y': 0, 'x': 0}, {'y': 89, 'x': 66}]",34,Goal kick,[],1623
6455,1,Duel,33.812965,178214456,2499723,1H,293687,"[{'y': 89, 'x': 66}, {'y': 100, 'x': 78}]",10,Air duel,"[{'id': 702}, {'id': 1801}]",1623
6458,8,Pass,46.323501,178214457,2499723,1H,7919,"[{'y': 97, 'x': 34}, {'y': 95, 'x': 59}]",82,Head pass,[{'id': 1801}],1623
6459,1,Duel,47.004714,178214458,2499723,1H,293687,"[{'y': 95, 'x': 59}, {'y': 94, 'x': 64}]",13,Ground loose ball duel,"[{'id': 701}, {'id': 1802}]",1623
6462,8,Pass,49.880983,178214459,2499723,1H,293687,"[{'y': 95, 'x': 55}, {'y': 93, 'x': 57}]",82,Head pass,[{'id': 1801}],1623
6464,1,Duel,50.109404,178216502,2499723,1H,0,"[{'y': 93, 'x': 57}, {'y': 89, 'x': 53}]",13,Ground loose ball duel,"[{'id': 703}, {'id': 1801}]",1623
6465,8,Pass,51.022546,178214460,2499723,1H,25706,"[{'y': 89, 'x': 53}, {'y': 91, 'x': 69}]",85,Simple pass,[{'id': 1801}],1623
6466,8,Pass,51.624604,178214461,2499723,1H,7944,"[{'y': 91, 'x': 69}, {'y': 100, 'x': 75}]",85,Simple pass,[{'id': 1802}],1623
6470,3,Free Kick,85.962263,178214473,2499723,1H,293687,"[{'y': 100, 'x': 65}, {'y': 84, 'x': 81}]",36,Throw in,[{'id': 1802}],1623
6474,1,Duel,96.945738,178214477,2499723,1H,77546,"[{'y': 56, 'x': 33}, {'y': 55, 'x': 42}]",10,Air duel,"[{'id': 703}, {'id': 1801}]",1623
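
A natural next step is to count how often each event type occurs for a team. The sketch below does this on a few hand-written rows in plain Python; the helper name `count_event_types` is hypothetical, and the row attributed to team 1609 is made up for illustration.

```python
from collections import Counter


def count_event_types(events, team_id):
    """Count events per eventName for the given team."""
    return Counter(e["eventName"] for e in events if e["teamId"] == team_id)


# Hand-written sample rows using only the eventName and teamId fields.
events = [
    {"eventName": "Free Kick", "teamId": 1623},
    {"eventName": "Duel", "teamId": 1623},
    {"eventName": "Pass", "teamId": 1623},
    {"eventName": "Duel", "teamId": 1623},
    {"eventName": "Pass", "teamId": 1609},  # made-up row for another team
]
counts = count_event_types(events, 1623)
print(counts.most_common())

# On the full DataFrame, a comparable result could be obtained with:
# df_events_Eng[df_events_Eng['teamId'] == 1623]['eventName'].value_counts()
```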
