# Course Project - Soccer data analysis (Due: .. Dec 2022)

In this project, your task is to answer some **questions** raised by a troublesome boss (Ryan?), based on the following soccer datasets.

(1) [Squad List](https://terrikon.com/en/football/teams/878/players)\
(2) [FIFA 22 Datasets](https://uofmacau-my.sharepoint.com/:x:/g/personal/ryanlhu_umac_mo/EX6GQ7B4ysdBpIvteM7cuEkBVCOmDkGmPJt0MZDt3DwcWA?e=em4eaB)\
(3) [StatsBomb](https://statsbomb.com/what-we-do/hub/free-data/)\
(4) [MPLSoccer](https://mplsoccer.readthedocs.io/en/latest/index.html)

## (1) Team Player List

You need to use a web-crawling technique to download the squad list of each team.

![image-2.png](attachment:image-2.png)
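When crawling, it is considerate to space out requests to the site. The block below is only a sketch of one way to do that and is not part of the original notebook: `Throttle` is a hypothetical helper that wraps any fetch function (for example `requests.get`) and sleeps so that consecutive calls are at least `min_interval` seconds apart. The `clock` and `sleep` parameters are assumptions added to make the helper easy to test.

```python
import time
from typing import Any, Callable, Optional


class Throttle:
    """Wrap a fetch function so consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, fetch: Callable[..., Any], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.fetch = fetch
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # None until the first call

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._last_call is not None:
            # Time still missing until `min_interval` has passed since the last call
            wait = self.min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        return self.fetch(*args, **kwargs)


# Usage (hypothetical):
#   polite_get = Throttle(requests.get, min_interval=2.0)
#   html = polite_get("https://terrikon.com/en/worldcup-2022", timeout=10).text
```

Wrapping the fetch function keeps the scraping code itself unchanged: every call simply goes through the throttled callable.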

## (2) FIFA 22 Datasets

The FIFA 22 dataset provides the statistics of each player in FIFA 22. 
- It is a plain CSV file, so it is easy to open.

In [1]:
import pandas as pd
df = pd.read_csv('players_22.csv')

print(df.iloc[0]) 

sofifa_id                                                      158023
player_url          https://sofifa.com/player/158023/lionel-messi/...
short_name                                                   L. Messi
long_name                              Lionel Andrés Messi Cuccittini
player_positions                                           RW, ST, CF
                                          ...                        
player_face_url     https://cdn.sofifa.net/players/158/023/22_120.png
club_logo_url                  https://cdn.sofifa.net/teams/73/60.png
club_flag_url                     https://cdn.sofifa.net/flags/fr.png
nation_logo_url              https://cdn.sofifa.net/teams/1369/60.png
nation_flag_url                   https://cdn.sofifa.net/flags/ar.png
Name: 0, Length: 110, dtype: object




# [Your tasks] Dealing With a Difficult Boss

### Tasks

1. [10%] Download the correct squad list of the 32 teams from terrikon.com (please be polite).

2. [30%] Join the squad list with the FIFA table. Rank the teams based on their total overall score (in terms of the players' `overall` score in the FIFA CSV).

3. [30%] Find the dream starting XI (of specific formations) of World Cup 2022 based on the `overall` score.
- '4-3-3': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW']
- '4-4-2': ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST']
- '4-2-3-1': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST']

4. [30%] Try to figure out the best formation of each team (you can find more formations on the Internet, e.g. https://www.redbull.com/us-en/soccer-formations). 

5. [Bonus, 10%] Write down your thoughts: referring to (3) and (4), how can you use these data and programming libraries for advanced soccer data analysis? If you provide some implementation based on (3) and (4), you will get more marks. [Gallery](https://mplsoccer.readthedocs.io/en/latest/gallery/index.html)

### Submissions

1. Every team is required to submit a Jupyter notebook.
2. In the notebook, please answer the questions using the following format.
    - What are your findings? (in text)
        - If you used a novel idea to reach a finding (e.g., Q4 and Q5), please describe it in the Jupyter notebook. 
    - How did you reach your findings? (as a Python program)
    - Your submitted code is used to **support** your findings. You may use advanced libraries to complete this project, e.g., Scikit-learn, Keras, TensorFlow, Seaborn, etc.

# Q1.
- We ... 
- Deprecated: terrikon has incomplete data

In [2]:
!pip install selenium
!pip install beautifulsoup4

Collecting selenium
  Downloading selenium-4.7.2-py3-none-any.whl (6.3 MB)
     ---------------------------------------- 6.3/6.3 MB 1.1 MB/s eta 0:00:00
Collecting trio-websocket~=0.9
  Downloading trio_websocket-0.9.2-py3-none-any.whl (16 kB)
Collecting trio~=0.17
  Downloading trio-0.22.0-py3-none-any.whl (384 kB)
     -------------------------------------- 384.9/384.9 kB 1.0 MB/s eta 0:00:00
Collecting outcome
  Downloading outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting exceptiongroup>=1.0.0rc9
  Downloading exceptiongroup-1.0.4-py3-none-any.whl (14 kB)
Collecting async-generator>=1.9
  Downloading async_generator-1.10-py3-none-any.whl (18 kB)
Collecting sortedcontainers
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting wsproto>=0.14
  Downloading wsproto-1.2.0-py3-none-any.whl (24 kB)
Installing collected packages: sortedcontainers, wsproto, outcome, exceptiongroup, async-generator, trio, trio-websocket, selenium
Successfully installed async-gener

In [3]:
# code here for Q1
import os
import sys
os.path.dirname(sys.executable)
from bs4 import BeautifulSoup
import json
import requests
# from selenium.webdriver.chrome.service import Service # if you use chrome

In [4]:
BASE_URL = "https://terrikon.com"

In [5]:
# get team players site id
def get_team_players_link():
    """Return {team name: relative link to the team's players page}, cached in 32teams.json."""
    # Read from the stored JSON if present, so the site is not fetched again
    if os.path.exists("32teams.json"):
        with open("32teams.json", "r", encoding="utf-8") as f:
            team_dct = json.load(f)
        return team_dct
    r = requests.get(f'{BASE_URL}/en/worldcup-2022', timeout=10)

    data = r.text

    # Name the parser explicitly; BeautifulSoup warns when it has to guess one
    soup = BeautifulSoup(data, "html.parser")
    teams = soup.select("td.team>a")
    team_dct = dict()
    for t in teams:
        team_dct[t.text.strip()] = f"{t.get('href')}/players"

    # Cache the links for the next run
    with open("32teams.json", "w", encoding="utf-8") as f:
        f.write(json.dumps(team_dct, indent=4))
    return team_dct

team_dct = get_team_players_link()
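The read-from-JSON-or-fetch pattern used in `get_team_players_link` recurs several times in this notebook. As a sketch only, it could be factored into one helper; `cached_json` and its `produce` parameter are hypothetical names, not part of the original code.

```python
import json
import os
from typing import Any, Callable


def cached_json(path: str, produce: Callable[[], Any]) -> Any:
    """Return the JSON stored at `path`; if it is missing, call `produce()` and store its result."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    data = produce()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)
    return data


# Usage (hypothetical): team_dct = cached_json("32teams.json", scrape_team_links)
# where scrape_team_links is whatever function performs the actual request.
```

With such a helper, the expensive scraping function only runs when the cache file does not exist yet.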

In [6]:
def get_players(team_link):
    """Scrape one team's squad table into a list of player dicts."""
    r = requests.get(f"{BASE_URL}{team_link}", timeout=10)

    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    players_row = soup.select("div.main > table.colored > tr")
    players = list()
    for row in players_row:
        row_td = row.select('td')
        if len(row_td) < 7:
            continue  # skip header or malformed rows without the expected cells
        payload = dict()
        payload["Number"] = row_td[0].text
        payload["Name"] = row_td[2].text
        payload["Role"] = row_td[3].text
        payload["Birth Date"] = row_td[4].text
        payload["Height"] = row_td[5].text
        payload["Weight"] = row_td[6].text
        players.append(payload)
    return players

In [7]:
import time

team_players = dict()
for team, link in team_dct.items():
    team_players[team] = get_players(link)
    time.sleep(1)  # pause between requests to be polite to the server
print(team_players)

{'Netherlands': [{'Number': '1', 'Name': 'Remko Pasveer', 'Role': 'Goalkeeper', 'Birth Date': '08.11.1983', 'Height': '188', 'Weight': '88'}, {'Number': '13', 'Name': 'Justin Bijlow', 'Role': 'Goalkeeper', 'Birth Date': '22.01.1998', 'Height': '188', 'Weight': '83'}, {'Number': '23', 'Name': 'Andries Noppert', 'Role': 'Goalkeeper', 'Birth Date': '07.04.1994', 'Height': '203', 'Weight': '94'}, {'Number': '2', 'Name': 'Jurrien Timber', 'Role': 'Full-back', 'Birth Date': '17.06.2001', 'Height': '179', 'Weight': '75'}, {'Number': '3', 'Name': 'Matthijs de Ligt', 'Role': 'Full-back', 'Birth Date': '12.08.1999', 'Height': '189', 'Weight': '89'}, {'Number': '4', 'Name': 'van Dijk', 'Role': 'Full-back', 'Birth Date': '08.07.1991', 'Height': '193', 'Weight': '80'}, {'Number': '5', 'Name': 'Nathan Ake', 'Role': 'Full-back', 'Birth Date': '18.02.1995', 'Height': '180', 'Weight': '73'}, {'Number': '6', 'Name': 'Stefan de Vrij', 'Role': 'Full-back', 'Birth Date': '05.02.1992', 'Height': '189', 'Wei

In [8]:
# list the number of players per country
# USA and Canada have no data on terrikon
for t, p in team_players.items():
    print(t, len(p))

Netherlands 26
Senegal 26
Ecuador 26
Qatar 26
England 26
USA 0
Iran 25
Wales 26
Poland 26
Argentina 26
Saudi Arabia 26
Mexico 26
France 26
Australia 27
Denmark 26
Tunisia 26
Spain 26
Japan 26
Costa Rica 26
Germany 26
Croatia 26
Morocco 26
Belgium 26
Canada 0
Brazil 26
Switzerland 26
Cameroon 26
Serbia 26
Portugal 26
Ghana 26
South Korea 26
Uruguay 26


In [9]:
# save players data as json
with open("wc2022_teams_players.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(team_players, indent=4))

# Q1 (Alternative) FIFA squad list

In [10]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service # Edge driver service; for Chrome, import it from selenium.webdriver.chrome.service
import time

In [11]:
s = Service(executable_path=r"C:\tools\msedgedriver.exe")

browser = webdriver.Edge(service=s)
# Load the World Cup 2022 main page for teams
browser.get("https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams")
time.sleep(1) # fixed 1-second wait for the React page to finish loading

# Cookie prompt: reject all cookies
browser.find_element(By.ID, "onetrust-reject-all-handler").click()
teams = browser.find_elements(By.XPATH, '//*[@id="root"]/main/div/section[2]/div/div/a')

In [12]:
"""
Construct each country team and its corresponding link
"""
import re
from pprint import pprint
team_links = dict()
for t in teams:
    team_links[re.sub(r'\([A-Za-z]+\)$', '', t.text.replace("\n", ""))] = t.get_attribute("href").replace("news", "squad")
    
with open('32teams_fifa.json', 'w', encoding="utf-8") as f:
    f.write(json.dumps(team_links, indent=4))

In [13]:
""" load from cached link json"""

with open('32teams_fifa.json', 'r', encoding="utf-8") as f:
    team_links = json.load(f)
    
print(f"No of teams : {len(team_links)}")
for t, l in team_links.items() :
      print(f" {t} : {l}")

No of teams : 32
 Argentina : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/argentina/squad
 Australia : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/australia/squad
 Belgium : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/belgium/squad
 Brazil : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/brazil/squad
 Cameroon : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/cameroon/squad
 Canada : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/canada/squad
 Costa Rica : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/costa-rica/squad
 Croatia : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/croatia/squad
 Denmark : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/denmark/squad
 Ecuador : https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/q

In [15]:
# browser = webdriver.Edge() # if you somehow closed the edge instance, uncomment this to reopen it.

browser.get("https://www.fifa.com/fifaplus/en/tournaments/mens/worldcup/qatar2022/teams/argentina/squad")
time.sleep(1) # wait 1 second for the React page to load (increase the number if it needs longer)
# The cookie banner was probably already dismissed earlier in this session, so this
# click can fail with NoSuchElementException, as in the output below.
browser.find_element(By.ID, "onetrust-reject-all-handler").click()

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="onetrust-reject-all-handler"]"}
  (Session info: MicrosoftEdge=108.0.1462.46)


In [4]:
team_players = dict()
for t, link in team_links.items():
    browser.get(link)
    time.sleep(5)  # give the squad page time to render
    players = browser.find_elements(By.XPATH, '//*[@id="root"]/main/div/section[3]/div/div[2]/div')
    payload = list()
    for p in players:
        sub_payload = dict()
        # Card text has three lines (first name, last name, role) or two (single name, role)
        player_info = p.text.split('\n')
        sub_payload["first_name"] = player_info[0]
        sub_payload["last_name"] = player_info[1] if len(player_info) == 3 else ""
        sub_payload["role"] = player_info[-1]
        payload.append(sub_payload)
    team_players[t] = payload

NameError: name 'team_links' is not defined
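The card parsing inside the loop above can be isolated into a small pure function, which makes it testable without a browser. This is a sketch under the same assumption the loop makes (three lines for first name, last name and role, or two lines for a single name and role). `parse_player_card` is a hypothetical name, and the role strings in the usage comment are illustrative only.

```python
from typing import Dict


def parse_player_card(text: str) -> Dict[str, str]:
    """Split the text of one squad card into first name, last name and role."""
    # Drop empty lines so stray whitespace does not shift the fields
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    if len(lines) < 2:
        raise ValueError(f"unexpected card text: {text!r}")
    return {
        "first_name": lines[0],
        "last_name": lines[1] if len(lines) == 3 else "",
        "role": lines[-1],
    }


# Usage (illustrative input): parse_player_card("Andre\nONANA\nGoalkeeper")
```

Raising on unexpected input makes layout changes on the page visible early instead of silently producing wrong names.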

In [5]:
""" Cache team players"""
with open("32teams_fifa_players.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(team_players, indent=4))

In [16]:
with open("32teams_fifa_players.json", "r", encoding="utf-8") as f:
    team_players = json.load(f)

print(f"No of teams : {len(team_players)}")
for t, p in team_players.items() :
      print(f" No of players of {t} : {len(p)}")

No of teams : 32
 No of players of Argentina : 27
 No of players of Australia : 27
 No of players of Belgium : 27
 No of players of Brazil : 27
 No of players of Cameroon : 27
 No of players of Canada : 27
 No of players of Costa Rica : 27
 No of players of Croatia : 27
 No of players of Denmark : 27
 No of players of Ecuador : 27
 No of players of England : 27
 No of players of France : 27
 No of players of Germany : 27
 No of players of Ghana : 27
 No of players of Iran : 26
 No of players of Japan : 27
 No of players of Korea Republic : 27
 No of players of Mexico : 27
 No of players of Morocco : 27
 No of players of Netherlands : 27
 No of players of Poland : 27
 No of players of Portugal : 27
 No of players of Qatar : 27
 No of players of Saudi Arabia : 27
 No of players of Senegal : 27
 No of players of Serbia : 27
 No of players of Spain : 27
 No of players of Switzerland : 27
 No of players of Tunisia : 27
 No of players of United States : 27
 No of players of Uruguay : 27
 No 

# Task 2 (partially done)(concat issue)

In [1]:
import json
import re
from IPython.display import display, HTML
from typing import List
import pandas as pd

In [2]:
with open("32teams_fifa_players.json", "r", encoding="utf-8") as f:
    team_players = json.load(f)
team_players.pop("Qatar",None)  # exclude Qatar from the analysis below
df = pd.read_csv('players_22.csv')



In [53]:
def clean(nation, df=df, team_players=team_players):
    """Return the FIFA rows that match the squad players of `nation`."""
    matched = []
    nationDf = df[df["nationality_name"] == nation].copy()
    # For these two teams, drop hyphens from the URL so hyphenated names in the slug can match
    if nation == "Korea Republic" or nation == 'Saudi Arabia':
        nationDf['player_url'] = nationDf['player_url'].str.replace('-', '')
    for i in team_players[nation]:
        # Two case-insensitive lookaheads: the text must contain the first name and
        # the words of the last name (in order). re.escape stops characters such as
        # "." in a name from being read as regex syntax.
        last_name = ".*".join(re.escape(part) for part in i["last_name"].split(" "))
        regex = "(?i)(?=.*" + re.escape(i["first_name"]) + ")(?=.*" + last_name + ")"
        # Try short_name first, then the URL slug, then long_name
        playerDf = nationDf[nationDf.short_name.str.contains(regex)]
        if playerDf.shape[0] == 0:
            playerDf = nationDf[nationDf.player_url.str.contains(regex)]
            if playerDf.shape[0] == 0:
                playerDf = nationDf[nationDf.long_name.str.contains(regex)]

        if playerDf.shape[0]:
            matched.append(playerDf)
    # Concatenate once at the end instead of growing a DataFrame inside the loop
    return pd.concat(matched) if matched else pd.DataFrame()
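Squad names and FIFA names differ in case and accents, for example `Gael ONDOUA` in the squad list versus `Gaël Bella Ondoua` in `long_name`. `clean` copes with this through its regex and the URL fallback. A complementary sketch, using only the standard library, is to normalize both sides before comparing. `normalize_name` and `name_matches` are hypothetical helpers, and whole-word matching is stricter than the substring regex used above.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lower-case a name and strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def name_matches(squad_first: str, squad_last: str, fifa_name: str) -> bool:
    """True if every word of the squad name occurs as a word in the FIFA name."""
    # Hyphens are treated as word separators on both sides
    target = set(normalize_name(fifa_name).replace("-", " ").split())
    wanted = normalize_name(f"{squad_first} {squad_last}").replace("-", " ").split()
    return bool(wanted) and all(word in target for word in wanted)
```

For instance, `name_matches("Gael", "ONDOUA", "Gaël Bella Ondoua")` holds, while a different player's name does not match the same row.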

In [50]:
pd.options.display.max_rows = 4000

clean("Cameroon")

Brady NGAPANDOUETNBU
Devis EPASSY
Andre ONANA
Jerome NGOM MBEKELI
Nicolas NKOULOU
Christopher WOOH
Olivier MBAIZO
Collins FAI
Jean-Charles CASTELLETTO
Enzo EBOSSE
Nouhou TOLO
Gael ONDOUA
Georges-Kevin NKOUDOU
Andre-Frank ZAMBO ANGUISSA
Samuel GOUET
Pierre KUNDE
Martin HONGLA
Olivier NTCHAM
Souaibou MAROU
Nicolas NGAMALEU
Jean-Pierre Nsame
Vincent ABOUBAKAR
Christian BASSOGOG
Karl TOKO EKAMBI
Eric Maxim CHOUPO-MOTING
Bryan MBEUMO
Rigobert Song Bahanag


Unnamed: 0,sofifa_id,player_url,short_name,long_name,player_positions,overall,potential,value_eur,wage_eur,age,...,lcb,cb,rcb,rb,gk,player_face_url,club_logo_url,club_flag_url,nation_logo_url,nation_flag_url
8540,255151,https://sofifa.com/player/255151/simon-ngapand...,S. Ngapandouetnbu,Simon Brady Ngapandouetnbu,GK,67,77,1800000.0,1000.0,18,...,18+2,18+2,18+2,16+2,66+2,https://cdn.sofifa.net/players/255/151/22_120.png,https://cdn.sofifa.net/teams/219/60.png,https://cdn.sofifa.net/flags/fr.png,,https://cdn.sofifa.net/flags/cm.png
7214,243235,https://sofifa.com/player/243235/olivier-mbaiz...,O. Mbaizo,Olivier Mbaissidara Mbaizo,RB,68,73,1700000.0,3000.0,23,...,63+2,63+2,63+2,66+2,16+2,https://cdn.sofifa.net/players/243/235/22_120.png,https://cdn.sofifa.net/teams/112134/60.png,https://cdn.sofifa.net/flags/us.png,,https://cdn.sofifa.net/flags/cm.png
5111,232183,https://sofifa.com/player/232183/collins-fai/2...,C. Fai,Collins Ngoran Suiru Fai,"RB, RWB, LB",70,70,1400000.0,9000.0,28,...,67+2,67+2,67+2,68+2,16+2,https://cdn.sofifa.net/players/232/183/22_120.png,https://cdn.sofifa.net/teams/232/60.png,https://cdn.sofifa.net/flags/be.png,,https://cdn.sofifa.net/flags/cm.png
2701,213868,https://sofifa.com/player/213868/jean-charles-...,J. Castelletto,Jean-Charles Castelletto,CB,73,75,3300000.0,19000.0,26,...,73+2,73+2,73+2,70+2,18+2,https://cdn.sofifa.net/players/213/868/22_120.png,https://cdn.sofifa.net/teams/71/60.png,https://cdn.sofifa.net/flags/fr.png,,https://cdn.sofifa.net/flags/cm.png
7104,237469,https://sofifa.com/player/237469/nouhou-tolo/2...,Nouhou,Nouhou Tolo,"CB, LB",68,74,1700000.0,3000.0,24,...,68+2,68+2,68+2,68+2,16+2,https://cdn.sofifa.net/players/237/469/22_120.png,https://cdn.sofifa.net/teams/111144/60.png,https://cdn.sofifa.net/flags/us.png,,https://cdn.sofifa.net/flags/cm.png
5314,245017,https://sofifa.com/player/245017/gael-ondoua/2...,G. Ondoua,Gaël Bella Ondoua,CDM,70,76,2200000.0,13000.0,25,...,67+2,67+2,67+2,67+2,15+2,https://cdn.sofifa.net/players/245/017/22_120.png,https://cdn.sofifa.net/teams/485/60.png,https://cdn.sofifa.net/flags/de.png,,https://cdn.sofifa.net/flags/cm.png
8273,241093,https://sofifa.com/player/241093/samuel-oum-go...,S. Oum Gouet,Samuel Yves Oum Gouet,"CDM, CM",67,74,1900000.0,4000.0,23,...,65+2,65+2,65+2,65+2,17+2,https://cdn.sofifa.net/players/241/093/22_120.png,https://cdn.sofifa.net/teams/110724/60.png,https://cdn.sofifa.net/flags/be.png,,https://cdn.sofifa.net/flags/cm.png
2997,240190,https://sofifa.com/player/240190/kunde-malong/...,K. Malong,Pierre Kunde Malong,"CM, CDM",73,76,4000000.0,1000.0,25,...,72+2,72+2,72+2,72+2,17+2,https://cdn.sofifa.net/players/240/190/22_120.png,https://cdn.sofifa.net/teams/280/60.png,https://cdn.sofifa.net/flags/gr.png,,https://cdn.sofifa.net/flags/cm.png
2961,236556,https://sofifa.com/player/236556/martin-hongla...,M. Hongla,Martin Hongla Yma,"CM, CDM, CB",73,81,6500000.0,15000.0,23,...,69+2,69+2,69+2,69+2,16+2,https://cdn.sofifa.net/players/236/556/22_120.png,https://cdn.sofifa.net/teams/206/60.png,https://cdn.sofifa.net/flags/it.png,,https://cdn.sofifa.net/flags/cm.png
2327,235996,https://sofifa.com/player/235996/nicolas-moumi...,N. Moumi Ngamaleu,Nicolas Brice Moumi Ngamaleu,"LM, RM",74,74,4600000.0,16000.0,26,...,49+2,49+2,49+2,55+2,17+2,https://cdn.sofifa.net/players/235/996/22_120.png,https://cdn.sofifa.net/teams/900/60.png,https://cdn.sofifa.net/flags/ch.png,,https://cdn.sofifa.net/flags/cm.png


In [4]:
temp = []
for nation in team_players:
    cleanedDf = clean(nation)
    # Note: this ranks by the mean `overall` of the matched players, not the sum
    average = cleanedDf['overall'].mean()
    temp.append([nation, average])
# Sort once after all teams are collected, highest average first
temp.sort(key=lambda x: x[1], reverse=True)
rank = [nation for nation, _ in temp]


In [5]:
print(rank)

['England', 'France', 'Germany', 'Argentina', 'Belgium', 'Spain', 'Netherlands', 'Uruguay', 'Portugal', 'Brazil', 'Serbia', 'Denmark', 'Croatia', 'Morocco', 'Mexico', 'Switzerland', 'Costa Rica', 'Poland', 'Senegal', 'Cameroon', 'United States', 'Japan', 'Ghana', 'Korea Republic', 'Wales', 'Iran', 'Canada', 'Ecuador', 'Tunisia', 'Australia', 'Saudi Arabia']
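The task statement speaks of the total `overall` score, whereas the cell above ranks by the mean. The two can order teams differently when the number of matched players varies between teams. A minimal sketch with a hypothetical `rank_teams` helper, taking a plain dict of ratings per team:

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_teams(team_overalls: Dict[str, List[int]], how: str = "mean") -> List[Tuple[str, float]]:
    """Rank teams by the mean or the total of their players' `overall` ratings, best first."""
    if how not in ("mean", "total"):
        raise ValueError("how must be 'mean' or 'total'")
    aggregate = mean if how == "mean" else sum
    # Teams without any matched player are left out rather than scored as zero
    scores = [(team, aggregate(vals)) for team, vals in team_overalls.items() if vals]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With made-up ratings `{"A": [90, 90], "B": [70, 70, 70]}`, team A leads by mean while team B leads by total, which is why the choice of aggregate is worth stating alongside the ranking.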


# Task 3

In [60]:
# Collect the matched players of every team, then sort by overall rating
all_players = pd.concat([clean(nation) for nation in team_players])
sorted_all_players = all_players.sort_values(by=['overall'], ascending=False)
# display(all_players)

In [61]:
'''
'4-3-3': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW']
'4-4-2': ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST']
'4-2-3-1': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST']
'''
formations = [
                ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW'],
                ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST'],
                ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST']
                ]

In [62]:
smaller_lst = []
for index, player in sorted_all_players.iterrows():
    a = [player["short_name"], player["player_positions"].split(", "), player["overall"]]
    smaller_lst.append(a)

In [63]:
best_score = 0
best_team = []

In [64]:
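def find_best_team(i: int, remain: List[str], team, score, players):
    """Backtracking search over `players` (best first) that fills the roles in `remain`.

    Updates the globals best_score / best_team when a complete, higher-scoring team is found.
    """
    global best_score
    global best_team
    if len(remain) == 0:
        if score > best_score:
            best_score = score
            best_team = team.copy()
        return
    if i >= len(players):
        return  # ran out of players before every role was filled

    for role in players[i][1]:
        flag = 0
        if role in remain:
            flag = 1
            team.append(players[i][0])
            score += players[i][2]
            remain.remove(role)
        # Recurse with this player either assigned to `role` or skipped
        find_best_team(i+1, remain, team, score, players)
        if flag:
            team.pop()
            remain.append(role)
            score -= players[i][2]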

In [65]:
temp3 = []
for formation in formations:
    temp1 = formation.copy()
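    # Keep only players with at least one position used by this formation
    lst = []
    for name, positions, overall in smaller_lst:
        usable = [pos for pos in positions if pos in formation]
        if usable:
            lst.append([name, usable, overall])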
    best_score = 0
    best_team = []
    
    
    def find_best_team(i: int, remain: List[str], team,score, players = lst):
        global best_score
        global best_team
        if (len(remain) == 0):
            if score > best_score:
                best_score = score
                best_team = team.copy()
    #             print(best_team)
            return

        for role in players[i][1]:
            flag = 0
            if role in remain:
                flag = 1
                team.append(players[i][0])
                score +=players[i][2] 
                remain.remove(role)
            find_best_team(i+1, remain, team, score)
            if flag:
                team.pop()
                remain.append(role)
                score -=players[i][2]
    
    find_best_team(0, temp1, [],0,lst)
    temp3.append([best_score,best_team])
temp3.sort(reverse = True)
print("The dream starting XI ")
print(f"Formation: {temp3[0][1]}")
print(f"Score: {temp3[0][0]}")

The dream starting XI 
Formation: ['L. Messi', 'R. Lewandowski', 'Neymar Jr', 'Cristiano Ronaldo', 'K. De Bruyne', 'M. Neuer', 'J. Kimmich', 'V. van Dijk', 'Casemiro', 'Rúben Dias', 'Jordi Alba']
Score: 988
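The search above explores role assignments by backtracking and tracks the result in global variables. As an alternative sketch (not the method used in this notebook), the block below fills the formation slot by slot, keeps its state local, and prunes branches that cannot beat the best line-up found so far. `best_xi` and the `(name, positions, overall)` tuple layout are illustrative assumptions. It also assumes player names are unique, so with real data `sofifa_id` would be a safer key.

```python
from typing import Dict, List, Set, Tuple

Player = Tuple[str, List[str], int]  # (name, eligible positions, overall)


def best_xi(players: List[Player], formation: List[str]) -> Tuple[int, Dict[int, str]]:
    """Return (total score, {slot index: player name}) for the best complete line-up.

    If some slot cannot be filled, the result is (-1, {}).
    """
    # Eligible candidates per slot, highest overall first
    candidates = [
        sorted((p for p in players if pos in p[1]), key=lambda p: p[2], reverse=True)
        for pos in formation
    ]
    # Optimistic bound: best candidate for every remaining slot, ignoring clashes
    suffix_bound = [0] * (len(formation) + 1)
    for s in range(len(formation) - 1, -1, -1):
        top = candidates[s][0][2] if candidates[s] else 0
        suffix_bound[s] = suffix_bound[s + 1] + top

    best_score = -1
    best_assign: Dict[int, str] = {}

    def search(slot: int, score: int, used: Set[str], assign: Dict[int, str]) -> None:
        nonlocal best_score, best_assign
        if slot == len(formation):
            if score > best_score:
                best_score, best_assign = score, dict(assign)
            return
        if score + suffix_bound[slot] <= best_score:
            return  # even the optimistic bound cannot beat the current best
        for name, _, overall in candidates[slot]:
            if name in used:
                continue
            used.add(name)
            assign[slot] = name
            search(slot + 1, score + overall, used, assign)
            used.remove(name)
            del assign[slot]

    search(0, 0, set(), {})
    return best_score, best_assign
```

Because the result maps slot indices to names, it also records which position each player fills, which a flat list of names does not show.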


# Task 4

In [66]:
formations = [
                ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW'],
                ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST'],
                ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST']
                ]
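Task 4 asks for the best formation of each team, so it helps to keep the formation name next to its slots and report that name with the score. The sketch below stores the three formations from the task list in a dict keyed by name. `best_formation` is a hypothetical helper, and `greedy_score` is only a stand-in scorer (best unused eligible player per slot) that any other line-up search could replace.

```python
from typing import Callable, Dict, List, Tuple

Player = Tuple[str, List[str], int]  # (name, positions, overall)

FORMATIONS: Dict[str, List[str]] = {
    '4-3-3': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW'],
    '4-4-2': ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST'],
    '4-2-3-1': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST'],
}


def greedy_score(players: List[Player], slots: List[str]) -> int:
    """Stand-in scorer: fill each slot with the best unused eligible player."""
    ranked = sorted(players, key=lambda p: p[2], reverse=True)
    used = set()
    total = 0
    for pos in slots:
        for name, positions, overall in ranked:
            if name not in used and pos in positions:
                used.add(name)
                total += overall
                break  # slot filled, move on to the next one
    return total


def best_formation(players: List[Player],
                   formations: Dict[str, List[str]] = FORMATIONS,
                   score_fn: Callable[[List[Player], List[str]], int] = greedy_score
                   ) -> Tuple[str, Dict[str, int]]:
    """Return the name of the highest-scoring formation and the score of every formation."""
    scores = {name: score_fn(players, slots) for name, slots in formations.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning all scores alongside the winner makes it easy to see how close the alternatives were for a given team.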

In [67]:
# Collect the matched players of every team, then sort by overall rating
all_players = pd.concat([clean(nation) for nation in team_players])
sorted_all_players = all_players.sort_values(by=['overall'], ascending=False)
# display(all_players)

In [68]:
for nation in team_players:
    sorted_nation_players=sorted_all_players[sorted_all_players["nationality_name"]== nation]
#     display(sorted_nation_player)
    smaller_lst = []
    for index, player in sorted_nation_players.iterrows():
        a = [player["short_name"], player["player_positions"].split(", "), player["overall"]]
        smaller_lst.append(a)

    if any(sorted_nation_players.player_positions.str.contains("GK")):
        print(nation)
        temp2 = []
        
        for formation in formations:
            temp1 = formation.copy()
            lst = []
            for i in smaller_lst:
                lst.append(i)
            best_score = 0
            best_team = []


            def find_best_team(i: int, remain: List[str], team,score, players = lst):
                global best_score
                global best_team
                if (len(remain) == 0 or i>= len(players)):
                    if score > best_score:
                        best_score = score
                        best_team = team.copy()
            #             print(best_team)
                    return

                for role in players[i][1]:
                    flag = 0
                    if role in remain:
                        flag = 1
                        team.append(players[i][0])
                        score +=players[i][2] 
                        remain.remove(role)
                    find_best_team(i+1, remain, team, score)
                    if flag:
                        team.pop()
                        remain.append(role)
                        score -=players[i][2]

            find_best_team(0, temp1, [],0,lst)
            temp2.append([best_score,best_team])
        temp2.sort(reverse = True)
        if len(temp2[0][1]) <11:
            print("*This formation has fewer than 11 players")
        print(f"Formation: {temp2[0][1]}")
        print(f"Score:{temp2[0][0]}")
        print("")
    else:
        print(f"{nation} has no goalkeeper.\n")
    


Argentina
Formation: ['L. Messi', 'P. Dybala', 'Á. Di María', 'A. Gómez', 'L. Martínez', 'M. Acuña', 'C. Romero', 'N. Otamendi', 'L. Paredes', 'G. Rulli', 'G. Montiel']
Score:923

Australia
Formation: ['A. Mooy', 'M. Ryan', 'J. Maclaren', 'M. Degenek', 'H. Souttar', 'M. Leckie', 'C. Devlin', 'A. Hrustić', 'A. Mabil', 'A. Behich', 'N. Atkinson']
Score:791

Belgium
Formation: ['K. De Bruyne', 'T. Courtois', 'R. Lukaku', 'E. Hazard', 'D. Mertens', 'Y. Tielemans', 'T. Alderweireld', 'J. Vertonghen', 'T. Castagne', 'J. Doku', 'A. Theate']
Score:910

Brazil
Formation: ['Neymar Jr', 'Ederson', 'Casemiro', 'Marquinhos', 'Fabinho', 'Gabriel Jesus', 'Alex Sandro', 'Richarlison', 'Raphinha', 'Danilo', 'Fred']
Score:934

Cameroon
Formation: ['K. Toko Ekambi', 'J. Nsame', 'N. Moumi Ngamaleu', 'M. Hongla', 'K. Malong', 'J. Castelletto', 'C. Bassogog', 'C. Fai', 'Nouhou', 'O. Mbaizo', 'S. Ngapandouetnbu']
Score:794

Canada
Formation: ['A. Davies', 'J. David', 'A. Hutchinson', 'M. Borjan', 'Stephen Eu