<div class="alert alert-info" role="alert">
  
**PREAMBLE** <br>
We require the following packages for our subsequent analysis

</div>

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# charts
import seaborn as sns 
import matplotlib.pyplot as plt
import squarify #TreeMap

# import graph objects as "go"
import plotly.graph_objs as go

# For exercise 4 and 5
import plotly.express as px

#for offline plotting using plotly
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

#for in-notebooks plot using plotly
init_notebook_mode(connected=True)

%matplotlib inline
from IPython.display import display
# ignore warning, uncomment the two lines below only if warnings are spurious
# import warnings
# warnings.filterwarnings("ignore")

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

<div class="alert alert-info" role="alert">
    
**DATASET** <br>
We will use data from the Transfermarkt website for all transfers between 2005 and 2019. This data was obtained by web scraping with the Beautiful Soup package, the details of which are available on our GitHub page.
</div>

In [2]:
Test_1 = pd.read_csv("transfer_data_Final_Master_2010_2011_Jul.csv")
Test_2 = pd.read_csv("transfer_data_Final_Master_2011_Aug_2012_Jun.csv")
Test_3 = pd.read_csv("transfer_data_Final_Master_2012_Jul_2015.csv")
Test_4 = pd.read_csv("transfer_data_Final_Master_2016_Aug_Dec.csv")
Test_5 = pd.read_csv("transfer_data_Final_Master_2016_Jan_Jul.csv")
Test_6 = pd.read_csv("transfer_data_Final_Master_2017.csv")
Test_7 = pd.read_csv("transfer_data_Final_Master_2018.csv")
Test_8 = pd.read_csv("transfer_data_Final_Master_2019.csv")
Test_9 = pd.read_csv("transfer_data_Final_Master_2005.csv")
Test_10 = pd.read_csv("transfer_data_Final_Master_2006.csv")
Test_11 = pd.read_csv("transfer_data_Final_Master_2008.csv")
Test_12 = pd.read_csv("transfer_data_Final_Master_2009.csv")
Test_13 = pd.read_csv("transfer_data_Final_Master_2007.csv")
Test_14 = pd.read_csv("transfer_data_Final_Master_2007_Apr.csv")
Test_EDA = pd.concat([Test_1, Test_2, Test_3,
                      Test_4, Test_5, Test_6, 
                      Test_7, Test_8, Test_9, 
                      Test_10, Test_11, Test_12, 
                      Test_13, Test_14], ignore_index=True)

<div class="alert alert-info" role="alert">

**DATA CLEANING** <br>
We need to clean the dataset, including club names, and standardize the numeric variables such as Market Valuation and Transfer Fee ('Transfer'). We also create a new column, known as Surplus/Deficit. We define each of the variables here: <br>
1. Date: The date on which the Player Transfer details were completed.
2. Name: Name of the Player for which the transfer took place.
3. Age: This represents the age of the player at the time of the transfer.
4. Position: This is the primary position at which the player plays.
5. Nationality: The players' nationality at the time of the transfer.
6. Club Left: The existing club of the player which he leaves on the Transfer Date. This club receives the Transfer Fee.
7. Club Joined: This is the new club for the player which he joins on the Transfer Date. This club pays the Transfer Fee.
8. Market Valuation: The value of the player at the time of the Transfer as determined by Transfermarkt.
9. Transfer: This represents the actual fee paid by the "Club Joined" to complete the transfer of the player.
10. Surplus/Deficit: The difference between Transfer Fee and Market Valuation (Transfer Fee minus Market Valuation). A positive difference is termed a Surplus and a negative difference a Deficit, both from the perspective of "Club Left".
11. Year: The year in which the Transfer took place. We create this variable to ease our Time-Series Analysis.
    
</div>
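
The fee conversion and the Surplus/Deficit definition above can be sketched on a few made-up rows. This is a minimal illustration, not the cleaning code used in this notebook: `parse_fee` is a hypothetical helper, and the raw formats (`8,00 mil. €`, `500 K €`, a decimal comma) are assumptions inferred from the replacement rules in the cleaning cell rather than confirmed against the raw files.

```python
def parse_fee(raw):
    """Convert a scraped fee string to euros (hypothetical helper)."""
    text = str(raw).strip()
    # Placeholders that the cleaning cell also maps to zero
    if text in {"?", "-", "", "Free transfer", "draft", "Draft"}:
        return 0.0
    text = text.replace("€", "").strip()
    multiplier = 1.0
    if text.endswith("mil."):
        multiplier, text = 1_000_000.0, text[: -len("mil.")]
    elif text.endswith("K"):
        multiplier, text = 1_000.0, text[:-1]
    # Assumed decimal comma, e.g. "8,00" means 8.00
    return float(text.strip().replace(",", ".")) * multiplier


# Made-up (Transfer, Market Valuation) pairs, for illustration only
rows = [("8,00 mil. €", "4,00 mil. €"), ("500 K €", "750 K €"), ("Free transfer", "300 K €")]

# Surplus/Deficit = Transfer Fee minus Market Valuation
surplus_deficit = [parse_fee(fee) - parse_fee(value) for fee, value in rows]
print(surplus_deficit)
```

An explicit multiplier avoids relying on the number of decimal places in the scraped string, which is what appending a fixed run of zeros implicitly does.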

In [3]:
# Cleaning the name for the column Club left
New = Test_EDA['Club Left'].str.replace('\n', '')
New = New.to_frame(name = 'Club_Left')

# Cleaning the name for the column Club Joined
New1 = Test_EDA['Club Joined'].str.replace('\n', '')
New1 = New1.to_frame(name = 'Club_Joined')

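# Cleaning Transfer data
# regex=False makes pandas treat '?' as a literal character rather than a regex quantifier
New2 = Test_EDA['Transfer'].str.replace('?', '0', regex=False)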
New2 = New2.to_frame(name = 'Transfer_Fee')

New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace("Free transfer","0")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace("-","0")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace(",","")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace("draft","0")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace("Draft","0")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace(" mil. €","0000")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace(" K €","000")
New2["Transfer_Fee"]=New2["Transfer_Fee"].str.replace(" €","")

# Cleaning Market Valuation data
New3 = Test_EDA['Market Valuation'].str.replace(',', '')
New3 = New3.to_frame(name = 'Market_Valuation')
New3["Market_Valuation"]=New3["Market_Valuation"].str.replace(" mil. €","0000")
New3["Market_Valuation"]=New3["Market_Valuation"].str.replace(" K €","000")

#Replacing with original data
Test_EDA["Club Left"]=New["Club_Left"]
Test_EDA["Club Joined"]=New1["Club_Joined"]
Test_EDA["Transfer"]=New2["Transfer_Fee"]
Test_EDA["Market Valuation"]=New3["Market_Valuation"]

# Converting the Date from object to a datetime format
Test_EDA['Date'] = pd.to_datetime(Test_EDA['Date'], errors='coerce')

# Removing retired players
Test_EDA.rename(columns={"Club Joined": "Club_Joined"}, inplace=True)
Test_EDA=Test_EDA[~Test_EDA.Club_Joined.str.contains("Retired")]
Test_EDA.rename(columns={"Club_Joined": "Club Joined"}, inplace=True)

Test_EDA.reset_index(inplace=True)

#Converting to float
Test_EDA["Transfer"]=Test_EDA['Transfer'].apply(lambda x:float(x))
Test_EDA["Market Valuation"]=Test_EDA["Market Valuation"].apply(lambda x:float(x))
Test_EDA["Age"]=Test_EDA['Age'].apply(lambda x:float(x))

#Adding surplus/deficit column
Test_EDA.rename(columns={"Market Valuation": "Market_Valuation"}, inplace=True)
Test_EDA["Surplus/Deficit"]=Test_EDA.Transfer-Test_EDA.Market_Valuation
Test_EDA.rename(columns={"Market_Valuation": "Market Valuation"}, inplace=True)

#Adding year column
Test_EDA["Year"]=pd.DatetimeIndex(Test_EDA["Date"]).year

Test_EDA

Unnamed: 0,index,Date,Name,Age,Position,Nationality,Club Left,League Left,Club Joined,League Joined,Market Valuation,Transfer,Surplus/Deficit,Year
0,0,2010-01-01,Douglas Costa,19.0,Right Winger,Brazil,Grêmio,Série A,Shakhtar D.,,4000000.0,8000000.0,4000000.0,2010
1,1,2010-01-01,Florent Sinama-Pongolle,25.0,Second Striker,France,Atlético Madrid,LaLiga,Sporting CP,,7000000.0,6500000.0,-500000.0,2010
2,2,2010-01-01,Alex Teixeira,19.0,Left Winger,Brazil,Vasco da Gama,Série A,Shakhtar D.,,3500000.0,6000000.0,2500000.0,2010
3,3,2010-01-01,Keisuke Honda,23.0,Attacking Midfield,Japan,VVV-Venlo,Eredivisie,CSKA Moscow,,3000000.0,6000000.0,3000000.0,2010
4,4,2010-01-01,Younès Kaboul,23.0,Centre-Back,France,Portsmouth,Premier League,Spurs,,5700000.0,5900000.0,200000.0,2010
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61835,63255,2007-09-25,Thomas Sowunmi,29.0,Centre-Forward,Hungary,Without Club,,Vasas FC,,300000.0,0.0,-300000.0,2007
61836,63256,2007-09-28,Onur Karali,25.0,Midfielder,Turkey,Osmaniyespor,Turkey\t\t,Altinova Bld,,200000.0,0.0,-200000.0,2007
61837,63257,2007-09-28,Enver Isik,22.0,Right Winger,Turkey,Kayserispor,Süper Lig,Göztepe,,250000.0,0.0,-250000.0,2007
61838,63258,2007-09-28,Mustafa Özzengi,20.0,Goalkeeper,Turkey,Galatasaray U21,Turkey\t\t,Bakirköyspor,,50000.0,0.0,-50000.0,2007


<div class="alert alert-danger" role="alert">

**Part 1: Analyzing the Biggest Spenders** <br><br>
Firstly, we aggregate the Transfer Fee spent by each club between 2005-2019 and take the top 15 clubs from that list. These clubs are the Top 15 Transfer Fee Spenders in the world of football for the period 2005-2019. <br>
Secondly, we create a DataFrame to analyze the yearly Transfer Fee spending pattern for the Top 15 Transfer Fee Spenders.
    
</div>
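
The two-step aggregation described above can be sketched on made-up data. The club names, fees and the column names `total_fee` and `players_bought` are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Made-up transfers, for illustration only
toy = pd.DataFrame({
    "Club Joined": ["A", "A", "B", "C", "C", "C"],
    "Transfer": [10.0, 5.0, 30.0, 1.0, 2.0, 3.0],
})

# One pass: total fee and number of signings per buying club
spend = toy.groupby("Club Joined").agg(
    total_fee=("Transfer", "sum"),
    players_bought=("Transfer", "count"),
)

# Keep the two biggest spenders (the notebook keeps 15)
top2 = spend.nlargest(2, "total_fee")
print(top2)
```

Named aggregation produces the final column names directly, so no placeholder column has to be counted and renamed afterwards.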

In [4]:
# Top 15 clubs by spending
Top_Spenders = Test_EDA.groupby("Club Joined").agg({"Transfer":"sum", 'Surplus/Deficit':'count'})
Top_Spenders = Top_Spenders.sort_values(by='Transfer', ascending = False)
Top_Spenders.rename(columns={"Surplus/Deficit": "No. of Players Bought"}, inplace=True)
Top_Spenders = Top_Spenders[:15]

Top_Spenders_Surplus = pd.merge(Top_Spenders.reset_index(),Test_EDA.groupby("Club Joined").agg({"Surplus/Deficit":"sum"}),
                    how='inner',
                    on = 'Club Joined')

Top_Spenders_EDA = pd.merge(Top_Spenders.reset_index(),Test_EDA.reset_index(),
                    how='inner',
                    on = 'Club Joined')
Trend_Top_Spenders = Top_Spenders_EDA.groupby(['Club Joined', 'Year']).agg({'Transfer_y':'sum'})
Trend_Top_Spenders.rename(columns={"Transfer_y": "Total Fee Spent"}, inplace=True)
Top_Spenders_Surplus

Unnamed: 0,Club Joined,Transfer,No. of Players Bought,Surplus/Deficit
0,Man City,1905340000.0,100,479365000.0
1,Real Madrid,1697900000.0,68,97240000.0
2,FC Barcelona,1630070000.0,55,249770000.0
3,Chelsea,1573880000.0,82,295255000.0
4,Man Utd,1407130000.0,53,408780000.0
5,Paris SG,1297100000.0,59,244100000.0
6,Juventus,1281790000.0,86,118165000.0
7,Liverpool,1222448000.0,75,332073000.0
8,Atlético Madrid,1114510000.0,68,139160000.0
9,Spurs,964220000.0,78,160620000.0


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the aggregate transfer fee spending by the Top 15 Transfer Fee Spenders for the period 2005-2019. At the same time, we check the surplus or deficit these clubs have incurred in making these transfers over the same period.

</div>

<div class="alert alert-success" role="alert">
    
**Inference**<br>
As expected, the biggest clubs in the world are the ones who spend the most on Transfer Fees to attract the best players in the world. The surprising fact is that it is these very clubs who are having to overpay for the players despite the power and the command that they hold in the football world. This brings to light a few important points to note in the football transfer market: <br>
1. It underscores the importance and bargaining power of world-class players and their agents in the Transfer Market.
2. It highlights how the clubs nurturing and selling the world-class players are able to negotiate favourable terms and conditions for themselves.
3. The player talent is the source of bargaining power in the football Transfer Market.
    
</div>

In [7]:
fig1a = px.bar(Top_Spenders_Surplus.reset_index(),
               x = "Club Joined",
               y = "Transfer",
               color = "Surplus/Deficit",
               color_continuous_scale = px.colors.sequential.Reds,
               hover_name = "Club Joined",
               title = 'Top 15 Clubs by Transfer Fee Expenditure (2005-2019)',
               template = 'plotly_dark')

fig1a.add_trace(
    go.Scatter(# line trace drawn against the secondary y-axis (go.Line is deprecated)
                mode = "lines",
                x = Top_Spenders_Surplus['Club Joined'],
                y = Top_Spenders_Surplus['No. of Players Bought'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Top_Spenders_Surplus['No. of Players Bought'],
                hovertemplate =
                "Club: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)

fig1a.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Club Name",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Transfer Fee Spent (2005-2019), in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig1a.update_layout(
    legend=dict(
        x=0.80,
        y=1
    )
)

fig1a.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)

# plot(fig1a ,filename = 'Top15Spenders.html')
fig1a.show()
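
The axis styling above is repeated almost verbatim for every figure in this notebook. One way to factor it out is sketched below. `axis_style` is a hypothetical helper, not part of plotly; it only builds a plain dictionary of standard axis layout properties that `update_layout` accepts.

```python
def axis_style(title_text, showline=True):
    """Return a layout dict for one axis (hypothetical helper, not part of plotly)."""
    return dict(
        title=dict(
            text=title_text,
            font=dict(family="Arial, monospace", size=14, color="white"),
        ),
        showline=showline,
        linecolor="white",
        showticklabels=True,
        tickfont=dict(family="Arial", size=10, color="white"),
        showgrid=False,
        zeroline=False,
    )


x_layout = axis_style("Club Name")
y_layout = axis_style("Total Transfer Fee Spent (2005-2019), in Euros", showline=False)
# Usage with an existing figure: fig1a.update_layout(xaxis=x_layout, yaxis=y_layout)
```

Keeping the style in one place means an axis title is the only thing that varies between figures, which makes a mislabelled axis easier to spot.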

<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the yearly transfer fee spending by the Top 15 Transfer Fee Spenders for the period 2005-2019.
</div>    

<div class="alert alert-success" role="alert">

**Inference**<br>
The spikes and dips in each club's spending trend coincide with important events at that club. These events are typically of the following nature: <br>
1. Changes in ownership or increased backing from investors.
2. Buying or selling of critical players.<br>

A few examples of such events are as follows:<br>
1. 2009: Real Madrid's transfer expenditure rose due to the transfer of Cristiano Ronaldo from Manchester United. At Manchester City, the Royal Family of Abu Dhabi became owners of the club in 2008. This led to a splurge in transfer fee spending in an effort to make Manchester City the best club in the world.
2. 2014: Manchester United spent a huge amount on rebuilding their squad as the club adjusted to life after the legendary Sir Alex Ferguson, who retired in 2013 after nearly 27 years as manager and was succeeded by David Moyes.
3. 2017: Paris Saint-Germain, backed by wealthy Qatari owners, made two of the most expensive transfers in football history when they signed Neymar Jr. and Kylian Mbappé.
4. 2019: Real Madrid again spent a huge sum of money on squad rebuilding after their star forward, Cristiano Ronaldo, had been transferred to Juventus in 2018.

</div>    

In [8]:
fig1b = px.line(Trend_Top_Spenders.reset_index(), 
                x = "Year", 
                y = "Total Fee Spent",
                color = "Club Joined",
                color_discrete_sequence = px.colors.cyclical.Edge,
                hover_name = "Club Joined", 
                title = 'Trend Analysis for Top 15 Clubs by Transfer Fee Expenditure',
                template = 'plotly_dark')

fig1b.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Transfer Fee Spent, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    )
)

fig1b.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
)

# plot(fig1b ,filename = 'TrendsTop15Spenders.html')
fig1b.show()

<div class="alert alert-danger" role="alert">

**Part 2: Analyzing the Biggest Over-Spenders** <br><br>
Over-Spenders are those who pay a Transfer Fee in excess of the Market Valuation of the player. <br>
Firstly, we aggregate the Surplus/Deficit incurred by each club between 2005-2019 and take the top 15 clubs from that list. These clubs are the Top 15 Over-Spenders in the world of football for the period 2005-2019. <br>
Secondly, we create a DataFrame to analyze the yearly over-spending patterns for the Top 15 Over-Spenders.
    
</div>    

In [10]:
# Top 15 clubs by surplus(over spending)

Top_OverSpenders=Test_EDA.groupby("Club Joined").agg({"Surplus/Deficit":"sum", 'Transfer':'count'})
Top_OverSpenders = Top_OverSpenders.sort_values(by='Surplus/Deficit', ascending = False)
Top_OverSpenders.rename(columns={"Transfer": "No. of Players Bought"}, inplace=True)
Top_OverSpenders = Top_OverSpenders[:15]

Top_OverSpenders_Spending = pd.merge(Top_OverSpenders.reset_index(),Test_EDA.groupby("Club Joined").agg({"Transfer":"sum"}),
                    how='inner',
                    on = 'Club Joined')

Top_OverSpenders_EDA = pd.merge(Top_OverSpenders.reset_index(),Test_EDA.reset_index(),
                    how='inner',
                    on = 'Club Joined')

Trends_Top_OverSpenders= Top_OverSpenders_EDA.groupby(['Club Joined', 'Year']).agg({'Surplus/Deficit_y':'sum'})


Trends_Top_OverSpenders
Top_OverSpenders_Spending

Unnamed: 0,Club Joined,Surplus/Deficit,No. of Players Bought,Transfer
0,Man City,479365000.0,100,1905340000.0
1,Man Utd,408780000.0,53,1407130000.0
2,Liverpool,332073000.0,75,1222448000.0
3,Chelsea,295255000.0,82,1573880000.0
4,FC Barcelona,249770000.0,55,1630070000.0
5,Paris SG,244100000.0,59,1297100000.0
6,Arsenal,201060000.0,57,948810000.0
7,Spurs,160620000.0,78,964220000.0
8,Aston Villa,155780000.0,78,541180000.0
9,Atlético Madrid,139160000.0,68,1114510000.0


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the aggregate transfer fee over-spending by the Top 15 Transfer Fee Over-Spenders for the period 2005-2019. At the same time, we check the total Transfer Fee these clubs paid while incurring this over-spending.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
We do not find any surprising result here, as the big clubs have over-spent the most. It is interesting to note that Manchester United have over-spent almost as much as Manchester City while buying only about half as many players (53 against 100). This suggests weakened bargaining power at the club since the departure of Sir Alex Ferguson. Furthermore, the club has changed managers several times over the last 5-6 years. This suggests that the lack of a stable manager or a well-defined player scouting process weakens a club's bargaining power.

</div>    
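
The comparison between the two Manchester clubs can be made concrete by dividing total over-spending by the number of players bought. This is a small worked example using the figures from the `Top_OverSpenders_Spending` output above; the per-player metric itself is an illustrative addition, not part of the original analysis.

```python
# (total Surplus/Deficit in euros, number of players bought), copied from the output table
overspend = {
    "Man City": (479365000.0, 100),
    "Man Utd": (408780000.0, 53),
}

# Average over-spending per signing
per_player = {club: total / bought for club, (total, bought) in overspend.items()}

for club, value in per_player.items():
    print(f"{club}: {value:,.0f} EUR over market value per signing")
```

On these figures, Manchester United paid roughly 7.7 million euros above market value per signing, against roughly 4.8 million for Manchester City.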

In [11]:
fig2a = px.bar(Top_OverSpenders_Spending.reset_index(),
               x = "Club Joined",
               y = "Surplus/Deficit",
               color = "Transfer",
               color_continuous_scale = px.colors.sequential.Reds,
               hover_name = "Club Joined",
               title = 'Top 15 Clubs by Transfer Fee Over-Spending (2005-2019)',
               template = 'plotly_dark')

fig2a.add_trace(
    go.Scatter(# line trace drawn against the secondary y-axis (go.Line is deprecated)
                mode = "lines",
                x = Top_OverSpenders_Spending['Club Joined'],
                y = Top_OverSpenders_Spending['No. of Players Bought'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Top_OverSpenders_Spending['No. of Players Bought'],
                hovertemplate =
                "Club: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)    

fig2a.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Club Name",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Overspending Amount, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig2a.update_layout(
    legend=dict(
        x=0.8,
        y=1
    )
)

fig2a.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)

# plot(fig2a, filename = 'Top15OverSpenders.html')
fig2a.show()

<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the yearly transfer fee over-spending by the Top 15 Transfer Fee Over-Spenders for the period 2005-2019.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
In general, we can observe that transfer fee over-spending has risen over the past decade. This may be attributable to the increased resources of clubs following investment by Middle Eastern owners and/or to significant changes at the club. For example:<br>
1. Manchester City and Paris Saint-Germain (PSG): Ownership changes increased the spending power of these clubs, which was duly exploited by players, agents and selling clubs.
2. Manchester United, Chelsea, Barcelona and Real Madrid: Changes of manager or the departure of important players.
</div>    

In [12]:
fig2b = px.line(Trends_Top_OverSpenders.reset_index(), 
                x="Year", 
                y="Surplus/Deficit_y",
                color = "Club Joined",
                color_discrete_sequence = px.colors.cyclical.Edge,
                hover_name = "Club Joined", 
                title = 'Trend Analysis for Top 15 Clubs by Transfer Fee Over-Spending',
                template = 'plotly_dark')

fig2b.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Amount Over-Spent, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    )
)

fig2b.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=True),
)

# plot(fig2b, filename = 'TrendsTop15OverSpenders.html')
fig2b.show()

<div class="alert alert-danger" role="alert">

**Part 3: Analyzing the Biggest Under Spenders** <br><br>
Under-Spenders are those who pay a Transfer Fee below that of the Market Valuation of the player. <br>
Firstly, we aggregate the Surplus/Deficit incurred by each club between 2005-2019 and take the bottom 15 clubs from that list. These clubs are the Top 15 Under-Spenders in the world of football for the period 2005-2019. <br>
Secondly, we create a DataFrame to analyze the yearly under-spending patterns for the Top 15 under-Spenders.
</div>
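
When a club-level table is merged back onto the transfer-level data, pandas disambiguates the shared column name with its default suffixes: `_x` for the left table and `_y` for the right. The sketch below, on made-up data, shows which column holds the per-transfer values and is therefore the one to sum per year. The variable names are illustrative assumptions.

```python
import pandas as pd

# Made-up transfers, for illustration only
transfers = pd.DataFrame({
    "Club Joined": ["A", "A", "A"],
    "Year": [2018, 2018, 2019],
    "Surplus/Deficit": [-1.0, -2.0, -4.0],
})

# Club-level total, analogous to the aggregated table built per club
club_totals = transfers.groupby("Club Joined", as_index=False).agg({"Surplus/Deficit": "sum"})

merged = pd.merge(club_totals, transfers, how="inner", on="Club Joined")
# merged["Surplus/Deficit_x"]: the club total, repeated on every merged row
# merged["Surplus/Deficit_y"]: the individual transfer values

# Yearly trend: sum the per-transfer column, not the repeated club total
yearly = merged.groupby(["Club Joined", "Year"])["Surplus/Deficit_y"].sum()
print(yearly)
```

Summing the `_x` column instead would multiply the club total by the number of transfers in each year, which distorts the trend.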

In [13]:
# Top 15 clubs by deficit (under spending)
Top_UnderSpenders=Test_EDA.groupby("Club Joined").agg({"Surplus/Deficit":"sum", 'Transfer':'count'})
Top_UnderSpenders = Top_UnderSpenders.sort_values(by='Surplus/Deficit', ascending = True)
Top_UnderSpenders.rename(columns={"Transfer": "No. of Players Bought"}, inplace=True)
Top_UnderSpenders = Top_UnderSpenders[:15]


Top_UnderSpenders_Spending = pd.merge(Top_UnderSpenders.reset_index(),Test_EDA.groupby("Club Joined").agg({"Transfer":"sum"}),
                    how='inner',
                    on = 'Club Joined')

Top_UnderSpenders_EDA = pd.merge(Top_UnderSpenders.reset_index(),Test_EDA.reset_index(),
                    how='inner',
                    on = 'Club Joined')
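# Sum the per-transfer values ("_y", from Test_EDA) rather than the club-level total ("_x"),
# which is repeated on every merged row. The column is renamed so that the plotting cell
# below, which reads "Surplus/Deficit_x", keeps working unchanged.
Trends_Top_UnderSpenders = (Top_UnderSpenders_EDA.groupby(['Club Joined', 'Year'])
                            .agg({'Surplus/Deficit_y':'sum'})
                            .rename(columns={'Surplus/Deficit_y': 'Surplus/Deficit_x'}))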


Trends_Top_UnderSpenders
Top_UnderSpenders_Spending

Unnamed: 0,Club Joined,Surplus/Deficit,No. of Players Bought,Transfer
0,Besiktas,-129775000.0,82,145750000.0
1,Galatasaray,-124379000.0,91,244371000.0
2,Trabzonspor,-116623000.0,103,110077000.0
3,Fenerbahce,-99833000.0,59,237567000.0
4,Flamengo,-77610000.0,41,98640000.0
5,Antalyaspor,-76733000.0,70,8917000.0
6,Konyaspor,-73990000.0,79,13285000.0
7,Ankaragücü,-71600000.0,98,3900000.0
8,QPR,-68070000.0,71,153480000.0
9,Hamburger SV,-67590000.0,55,220510000.0


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the aggregate transfer fee under-spending by the Top 15 Transfer Fee Under-Spenders for the period 2005-2019. At the same time, we check the total Transfer Fee these clubs paid while achieving this under-spending.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
We see some unanticipated results in the following graphs. Many Turkish and Middle Eastern clubs obtain highly favourable deals, as they often buy players with immense potential who have no agent or come from weaker clubs. This highlights the importance of strong connections with grassroots clubs in Transfer Market dealings.

</div>    

In [16]:
fig3a = px.bar(Top_UnderSpenders_Spending.reset_index(),
               x = "Club Joined",
               y = "Surplus/Deficit",
               color = "Transfer",
               color_continuous_scale = px.colors.sequential.Blues,
               hover_name = "Club Joined",
               title = 'Top 15 Clubs by Transfer Fee Under-Spending (2005-2019)',
               template = 'plotly_dark')
fig3a.add_trace(
    go.Scatter(# line trace drawn against the secondary y-axis (go.Line is deprecated)
                mode = "lines",
                x = Top_UnderSpenders_Spending['Club Joined'],
                y = Top_UnderSpenders_Spending['No. of Players Bought'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Top_UnderSpenders_Spending['No. of Players Bought'],
                hovertemplate =
                "Club: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)    

fig3a.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Club Name",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Underspending, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig3a.update_layout(
    legend=dict(
        x=0.80,
        y=1.1
    )
)

fig3a.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)


# plot(fig3a ,filename = 'Top15UnderSpenders.html')
fig3a.show()

<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the yearly transfer fee under-spending by the Top 15 Transfer Fee Under-Spenders for the period 2005-2019.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
There are no noticeable results in the following graph. The important information has been conveyed in the previous graph.

</div>    

In [17]:
fig3b = px.line(Trends_Top_UnderSpenders.reset_index(), 
                x="Year", 
                y="Surplus/Deficit_x",
                color = "Club Joined",
                color_discrete_sequence = px.colors.cyclical.Edge,
                hover_name = "Club Joined", 
                title = 'Trend Analysis for Top 15 Clubs by Transfer Fee Under-Spending',
                template = 'plotly_dark')

fig3b.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Amount Under-Spent, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    )
)

fig3b.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=True),
)

# plot(fig3b ,filename = 'TrendsTop15UnderSpenders.html')
fig3b.show()

<div class="alert alert-danger" role="alert">

**Part 4: Analyses for Player Position** <br><br>
We will analyze whether Player Position is a determinant of the nature of the transfers that take place. Here, we are interested in the quantum of transfers that took place and their cumulative value. We will also analyze the surplus and deficit that transfers for different positions generate.
</div>

In [18]:
# Positions ranked by total transfer fee, with the number of transfers per position
Top_Positions = Test_EDA.groupby("Position").agg({"Transfer":"sum", 'Surplus/Deficit':'count'})
Top_Positions= Top_Positions.sort_values(by='Transfer', ascending = False)
Top_Positions.rename(columns={'Surplus/Deficit':'No. of Transfers'}, inplace=True)

Top_Positions_New= pd.merge(Top_Positions.reset_index(),Test_EDA.groupby("Position").agg({"Surplus/Deficit":"sum"}),
                    how='inner',
                    on = 'Position')

Top_Positions_New.reset_index(inplace=True)
Top_Positions_New

Unnamed: 0,index,Position,Transfer,No. of Transfers,Surplus/Deficit
0,0,Centre-Forward,11921610000.0,12338,-1834332000.0
1,1,Centre-Back,7118170000.0,10061,-1093076000.0
2,2,Central Midfield,5674836000.0,5721,-839642200.0
3,3,Left Winger,4662268000.0,2765,45193000.0
4,4,Defensive Midfield,4155074000.0,5174,-1117551000.0
5,5,Attacking Midfield,4018408000.0,4641,-1192042000.0
6,6,Right Winger,3966138000.0,2887,50747500.0
7,7,Left-Back,2237758000.0,3606,-405672000.0
8,8,Right-Back,1994593000.0,3476,-614362000.0
9,9,Goalkeeper,1506380000.0,4984,-801610000.0
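
The same table can probably be built with a single `groupby` by using pandas named aggregation, which avoids the separate merge and rename. The sketch below runs on a small invented frame (`toy`), not on `Test_EDA`; the column names mirror the ones used above, and the values are placeholders.

```python
import pandas as pd

# Toy stand-in for Test_EDA; the rows below are invented for illustration.
toy = pd.DataFrame({
    "Position": ["Centre-Forward", "Centre-Forward", "Goalkeeper", "Goalkeeper", "Left Winger"],
    "Transfer": [5_000_000.0, 1_000_000.0, 500_000.0, 0.0, 2_000_000.0],
    "Surplus/Deficit": [-1_000_000.0, 250_000.0, -100_000.0, -50_000.0, 300_000.0],
})

# One groupby with named aggregation yields all three columns at once.
# The dict is unpacked because the output names contain spaces and slashes.
by_position = (
    toy.groupby("Position")
       .agg(**{
           "Transfer": ("Transfer", "sum"),
           "No. of Transfers": ("Surplus/Deficit", "count"),
           "Surplus/Deficit": ("Surplus/Deficit", "sum"),
       })
       .sort_values("Transfer", ascending=False)
       .reset_index()
)
print(by_position)
```

Because the result is built in one pass, it carries no extra `index` column of the kind visible in the output above.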


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the transfer market data for Player Position.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
As expected, the position of the player is instrumental in determining the Transfer Fee and the number of transfers that take place. Centre-Forward, Centre-Back and Central Midfield account for the highest total transfer fees and the most transfers, which suggests that these positions matter most to managers when forming their teams. Moreover, we observe that the total Transfer Fee paid for a position like Centre-Forward is lower than the total Market Valuation (a negative Surplus/Deficit). We may infer that this is due to the larger number of players playing in that position.

</div>    

In [19]:
fig5 = px.bar(Top_Positions_New.reset_index(),
               x = "Position",
               y = "Transfer",
               color = "Surplus/Deficit",
               color_continuous_scale = px.colors.sequential.RdBu,
               hover_name = "Position",
               title = 'Transfer Market Analysis based on Player Position',
               template = 'plotly_dark')

fig5.add_trace(
    go.Scatter(  # line trace for the number of transfers, drawn on the secondary y-axis
                x = Top_Positions_New.Position,
                y = Top_Positions_New['No. of Transfers'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Top_Positions_New['No. of Transfers'],
                hovertemplate =
                "Position: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)

fig5.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Position",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Total Transfer Fee Spent (2005-2019), in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    # Figure-wide default font (a layout-level setting, not part of yaxis2)
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig5.update_layout(
    legend=dict(
        x=0.80,
        y=1
    )
)

fig5.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)

# plot(fig5 ,filename = 'TopPosition.html')
fig5.show()
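
The hover-label and axis settings are repeated almost verbatim for every figure in this notebook. One possible way to cut the repetition is a small helper that returns plain dictionaries, which `update_layout` accepts in place of `go.layout` objects. This is only a sketch: `axis_style`, `HOVERLABEL` and `position_layout` are hypothetical names that do not appear elsewhere in the notebook, and the helper applies the grid and zero-line settings to every axis, which is a simplification.

```python
def axis_style(title, showline=True):
    """Return the axis settings that recur across the figures (hypothetical helper)."""
    return dict(
        title=dict(
            text=title,
            font=dict(family="Arial, monospace", size=14, color="white"),
        ),
        showline=showline,
        linecolor="white",
        showticklabels=True,
        tickfont=dict(family="Arial", size=10, color="white"),
        showgrid=False,
        zeroline=False,
    )


# Hover-label settings shared by the figures.
HOVERLABEL = dict(
    bgcolor="black",
    font=dict(family="Arial, monospace", size=14, color="white"),
    bordercolor="black",
)

# Layout fragment for the position chart, assembled from the shared pieces.
position_layout = dict(
    hoverlabel=HOVERLABEL,
    xaxis=axis_style("Position"),
    yaxis=axis_style("Total Transfer Fee Spent (2005-2019), in Euros", showline=False),
)

# Usage (assumes a Plotly figure such as fig5 is in scope):
# fig5.update_layout(**position_layout)
```

With such a helper, each figure would only need to state what differs, such as the axis titles.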

<div class="alert alert-danger" role="alert">

**Part 5: Time-Series Analyses for Transfermarkt Data** <br><br>
This is a summary of the Transfer data obtained from the Transfermarkt website.
</div>

In [20]:
# Year-wise average Transfer data trend
Transfer_Fee_Trend=Test_EDA.groupby("Year").agg({"Transfer":"mean", 'Surplus/Deficit':'count'})
Transfer_Fee_Trend.rename(columns={"Surplus/Deficit": "No. of Transfers"}, inplace=True)
# Year-wise trend for player valuation
Market_Valuation_Trend=Test_EDA.groupby("Year").agg({"Market Valuation":"mean"})
# Year-wise trend for surplus/deficit
Surplus_Trend=Test_EDA.groupby("Year").agg({"Surplus/Deficit":"mean"})
# Year-wise summary statistics for the transfer fee (computed for reference; not displayed)
Test_EDA.groupby("Year").agg({"Transfer": ['mean', 'min', 'max', 'var']})

Trend_Combined = pd.concat([Transfer_Fee_Trend, Market_Valuation_Trend, Surplus_Trend], axis=1, join='inner')
Trend_Combined.reset_index(inplace=True)
Trend_Combined

Unnamed: 0,Year,Transfer,No. of Transfers,Market Valuation,Surplus/Deficit
0,2005,994269.3,973,1544702.0,-550432.682425
1,2006,980151.4,1212,1504357.0,-525405.528053
2,2007,1253471.0,1430,1454310.0,-203123.076923
3,2008,1211738.0,1705,1433811.0,-223936.187683
4,2009,1073434.0,2128,1279467.0,-206032.424812
5,2010,682135.5,2580,994224.8,-312089.302326
6,2011,793435.7,3103,1092635.0,-299198.823719
7,2012,594395.3,4336,904752.8,-310560.401292
8,2013,659183.4,5209,860049.0,-200865.521213
9,2014,570995.4,5709,742609.5,-171533.893852


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the evolution of the Transfer Market over the last 15 years.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
We can immediately see that the discrepancy between Market Valuation and Transfer Fee has been shrinking gradually over the last 15 years. This may be due to the wider availability of data and the use of data analytics techniques in determining transfer fees, thereby leading to greater consistency between the market valuation and the transfer fee paid. Moreover, the number of transfers has also increased significantly, as China, the USA and Middle Eastern nations have been spending a lot of money to develop the football market in their countries.

</div>    
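
One way to put a number on the gap between fee and valuation is the ratio of the average transfer fee to the average market valuation per year. The sketch below uses only the ten rows shown in the `Trend_Combined` output above (2005 to 2014); the column name `fee_to_value` is a hypothetical choice, and the later years would need the full frame.

```python
import pandas as pd

# Yearly averages copied from the Trend_Combined output above (first ten rows only).
trend = pd.DataFrame({
    "Year": [2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014],
    "Transfer": [994269.3, 980151.4, 1253471.0, 1211738.0, 1073434.0,
                 682135.5, 793435.7, 594395.3, 659183.4, 570995.4],
    "Market Valuation": [1544702.0, 1504357.0, 1454310.0, 1433811.0, 1279467.0,
                         994224.8, 1092635.0, 904752.8, 860049.0, 742609.5],
}).set_index("Year")

# Ratio of average fee to average valuation; values closer to 1 mean a smaller gap.
trend["fee_to_value"] = trend["Transfer"] / trend["Market Valuation"]
print(trend["fee_to_value"].round(3))
```

A ratio has the advantage of being comparable across years even when the average amounts themselves fall, as they do here while the number of transfers grows.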

In [21]:
fig6 = px.bar(Trend_Combined.reset_index(),
               x = "Year",
               y = "Surplus/Deficit",
               color = "No. of Transfers",
               color_continuous_scale = px.colors.sequential.Blues,
               hover_name = "Year",
               title = 'Time-Series Analysis based on the Transfermarkt Data',
               template = 'plotly_dark')

fig6.add_trace(
    go.Scatter(  # line trace for the average transfer fee
                x = Trend_Combined['Year'],
                y = Trend_Combined.Transfer,
                name = "Transfer Fee",
                marker = dict(color = 'rgb(156,219,165)'),
                line = dict(width = 3),
                text = Trend_Combined['Transfer'],
                hovertemplate =
                "Year: %{x}<br>" +
                "Transfer Fee: %{y:,.0f} Euros")
)
fig6.add_trace(
    go.Scatter(  # line trace for the average market valuation
                x = Trend_Combined['Year'],
                y = Trend_Combined['Market Valuation'],
                name = "Market Valuation",
                marker = dict(color = 'rgb(231,109,84)'),
                line = dict(width = 3),
                text = Trend_Combined['Market Valuation'],
                hovertemplate =
                "Year: %{x}<br>" +
                "Market Valuation: %{y:,.0f} Euros")
)
fig6.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    annotations=[
            go.layout.Annotation(
                text='The Bars represent<br>Surplus/Deficit',
                align='left',
                showarrow=False,
                xref='paper',
                yref='paper',
                x=0.98,
                y=0.92,
                bordercolor='black',
                borderwidth=1)
    ],
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Year",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Average Amount (in EUR)",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    )
)

fig6.update_layout(
    legend=dict(
        x=0.8,
        y=1
    )
)

fig6.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
)

# plot(fig6 ,filename = 'TimeSeries.html')
fig6.show()

<div class="alert alert-danger" role="alert">

**Part 6: Analyses for Player Age** <br><br>
We will analyze whether Player Age is a determinant of the nature of the transfers that take place. Here, we are interested in the quantum of transfers that took place and their average value. We will also analyze the surplus and deficit that transfers for different age groups generate.

</div>

In [22]:
# Age-wise average transfer fee
Age1 = Test_EDA.groupby("Age").agg({"Transfer":"mean"})
Age1 = Age1.sort_values(by='Age', ascending = True)

# Age-wise no. of transfers and average surplus/deficit
Age2=Test_EDA.groupby("Age").agg({"Transfer":"count", "Surplus/Deficit":"mean"})
Age_Analysis = pd.merge(Age2, Age1, 
                     how = 'outer',
                     on = 'Age',
                    suffixes = ('_Count','_mean'))
Age_Analysis.reset_index(inplace=True)
Age_Analysis


Unnamed: 0,Age,Transfer_Count,Surplus/Deficit,Transfer_mean
0,14.0,1,-25000.0,0.0
1,15.0,5,-230000.0,0.0
2,16.0,48,813854.166667,1336771.0
3,17.0,193,500290.15544,785497.4
4,18.0,689,669534.107402,1362284.0
5,19.0,1572,491468.70229,1271838.0
6,20.0,2445,390209.734151,1139729.0
7,21.0,3159,285056.853435,1280965.0
8,22.0,3673,217211.543697,1292614.0
9,23.0,4230,165837.115839,1343344.0


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the transfer market data for Player Age.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
As expected, the age of the player is instrumental in determining the Transfer Fee and the number of transfers that take place. Two age groups stand out. Transfers pick up at ages 16-18, as clubs look to buy promising players and develop them within the club from a young age. The number of transfers then peaks between ages 21 and 25, where players have proved themselves on the big stage but have not yet fully matured in their careers. After 25, the number of transfers drops, as clubs look to retain mature and tested players and the players themselves are more settled where they are.

</div>    
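
Grouping ages into bands can make statements about age groups easier to check than reading them off single-year bars. The sketch below uses `pd.cut` on a small invented frame (`toy`), not on `Test_EDA`; the bin edges and labels are assumptions chosen to mirror the age groups discussed above.

```python
import pandas as pd

# Toy stand-in for Test_EDA; ages and fees are invented for illustration.
toy = pd.DataFrame({
    "Age": [16, 17, 18, 21, 22, 23, 24, 25, 27, 30, 33],
    "Transfer": [1.2e6, 0.0, 2.0e6, 1.5e6, 0.5e6, 3.0e6, 0.0, 1.0e6, 4.0e6, 0.8e6, 0.0],
})

# Bin edges are an assumption; each bin includes its right edge, e.g. (15, 18].
bins = [0, 15, 18, 20, 25, 100]
labels = ["<=15", "16-18", "19-20", "21-25", "26+"]
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Count and average fee per band; observed=False keeps empty bands in the result.
by_group = (
    toy.groupby("Age Group", observed=False)
       .agg(transfers=("Transfer", "count"), mean_fee=("Transfer", "mean"))
)
print(by_group)
```

Applied to the real data, a table like this would show directly how the transfer counts of the 16-18 and 21-25 bands compare.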

In [23]:
fig7 = px.bar(Age_Analysis.reset_index(),
               x = "Age",
               y = "Transfer_mean",
               color = "Surplus/Deficit",
               color_continuous_scale = px.colors.sequential.Blues,
               hover_name = "Age",
               title = 'Transfer Market Analysis based on Player Age',
               template = 'plotly_dark')

fig7.add_trace(
    go.Scatter(  # line trace for the number of transfers, drawn on the secondary y-axis
                x = Age_Analysis['Age'],
                y = Age_Analysis['Transfer_Count'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Age_Analysis['Transfer_Count'],
                hovertemplate =
                "Age: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)

fig7.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Age",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Average Transfer Fee, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    # Figure-wide default font (a layout-level setting, not part of yaxis2)
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig7.update_layout(
    legend=dict(
        x=0.80,
        y=1
    )
)

fig7.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)


# plot(fig7 ,filename = 'AgeAnalysis.html')
fig7.show()

<div class="alert alert-danger" role="alert">

**Part 7: Analyses for Player Nationality** <br><br>
We will analyze whether Player Nationality is a determinant of the nature of the transfers that take place. Here, we are interested in the quantum of transfers that took place and their average value. We will also analyze the surplus and deficit that transfers for different nationalities generate.
</div>

In [24]:
# number of transfers per nationality, most-transferred first
Nation_Transfers = Test_EDA.groupby("Nationality").agg({"Transfer":'count'})
Nation_Transfers = Nation_Transfers.sort_values(by='Transfer', ascending = False)

# average transfer fee per nationality, highest first
Nation_AvgFee = Test_EDA.groupby("Nationality").agg({"Transfer":'mean'})
Nation_AvgFee = Nation_AvgFee.sort_values(by='Transfer', ascending = False)

# average surplus/deficit per nationality, highest first
Nation_AvgSurplus=Test_EDA.groupby("Nationality").agg({"Surplus/Deficit":'mean'})
Nation_AvgSurplus = Nation_AvgSurplus.sort_values(by='Surplus/Deficit', ascending = False)

Nation_Analysis_Intermediate = pd.merge(Nation_Transfers, Nation_AvgFee, 
                     how = 'inner',
                     on = 'Nationality',
                    suffixes = ('_Count','_Mean'))

Nation_Analysis = pd.merge(Nation_Analysis_Intermediate, Nation_AvgSurplus, 
                     how = 'inner',
                     on = 'Nationality',
                    suffixes = ('_Count','_Mean'))

# Keep the 25 nationalities with the most transfers, then order them by average fee
Nation_Analysis_Top25 = Nation_Analysis[:25].reset_index()
Nation_Analysis_Top25 = Nation_Analysis_Top25.sort_values(by='Transfer_Mean', ascending = False)
Nation_Analysis_Top25

Unnamed: 0,Nationality,Transfer_Count,Transfer_Mean,Surplus/Deficit
24,Portugal,600,3163438.0,67521.666667
5,France,1569,2764709.0,-81655.194391
22,Netherlands,697,2523326.0,-206731.707317
7,Spain,1420,2457454.0,-154728.802817
4,Argentina,1819,1712362.0,-160037.658054
6,Germany,1469,1142332.0,-209370.319946
1,Brazil,5192,1140306.0,-143586.479199
3,England,3301,1016174.0,95732.505301
13,Uruguay,1079,757304.0,-206764.596849
23,Sweden,674,645762.6,-234704.747774


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the transfer market data for Player Nationality.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
As expected, the nationality of the player is instrumental in determining the Transfer Fee and the number of transfers that take place. A few nations have produced the best talent in the football world: Portugal, France, Netherlands, Spain, Argentina, Germany, Brazil and England. Players from these nations command the highest transfer fees on average. This may be because football has been developing in these nations for the last 100 years and they have good institutions in place to keep producing world-class players. <br>
Brazil, Italy and (surprisingly) Turkey take the top spots in terms of the number of players transferred, underscoring the immense talent pool prevalent in these countries.

</div>    
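
As the selection code above reads, the ranking by average fee is made only among the 25 nationalities with the most transfers, so a nationality with few but expensive transfers would not appear in the chart. The sketch below reproduces that two-step selection on a small invented frame (`toy`) with a cut-off of 3 instead of 25; all values are placeholders.

```python
import pandas as pd

# Toy stand-in for Test_EDA; nationalities and fees are invented for illustration.
toy = pd.DataFrame({
    "Nationality": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D"],
    "Transfer": [1.0, 2.0, 3.0, 10.0, 20.0, 0.5, 0.5, 1.0, 2.0, 100.0],
})

by_nation = toy.groupby("Nationality").agg(
    Transfer_Count=("Transfer", "count"),
    Transfer_Mean=("Transfer", "mean"),
)

# Step 1: keep the N nationalities with the most transfers (25 in the notebook, 3 here).
top_by_count = by_nation.nlargest(3, "Transfer_Count")
# Step 2: order that subset by average fee for plotting.
ranked = top_by_count.sort_values("Transfer_Mean", ascending=False)
print(ranked)
```

In this toy example, nationality `D` has the highest average fee but only one transfer, so it drops out in the first step.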

In [25]:
fig8 = px.bar(Nation_Analysis_Top25.reset_index(),
               x = "Nationality",
               y = "Transfer_Mean",
               color = "Surplus/Deficit",
               color_continuous_scale = px.colors.sequential.Blues,
               hover_name = "Nationality",
               title = 'Transfer Market Analysis based on Player Nationality',
               template = 'plotly_dark')
fig8.add_trace(
    go.Scatter(  # line trace for the number of transfers, drawn on the secondary y-axis
                x = Nation_Analysis_Top25['Nationality'],
                y = Nation_Analysis_Top25['Transfer_Count'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = Nation_Analysis_Top25['Transfer_Count'],
                hovertemplate =
                "Nationality: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)

fig8.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Nationality",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Average Transfer Fee, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    # Figure-wide default font (a layout-level setting, not part of yaxis2)
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig8.update_layout(
    legend=dict(
        x=0.8,
        y=1
    )
)

fig8.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)


# plot(fig8 ,filename = 'NationalityAnalysis.html')
fig8.show()

<div class="alert alert-danger" role="alert">

**Part 8: Analyses for Chinese Transfer Data** <br><br>
Given the activity in the Chinese market in recent years, we wanted to look at how the country is developing its players and at the dynamics of its transfer market at this early stage of development.
</div>

In [26]:
China=Test_EDA[Test_EDA.Nationality=='China']

China_Age1=China.groupby("Age").agg({"Transfer":"count", "Surplus/Deficit":"mean"})
China_Age2=China.groupby("Age").agg({"Transfer":"mean"})
China_Age = pd.merge(China_Age1, China_Age2, 
                     how = 'outer',
                     on = 'Age',
                    suffixes = ('_Count','_mean'))
China_Age.reset_index(inplace=True)
China_Age

Unnamed: 0,Age,Transfer_Count,Surplus/Deficit,Transfer_mean
0,17.0,1,1415000.0,1440000.0
1,18.0,4,-25000.0,0.0
2,19.0,9,38888.89,66666.67
3,20.0,32,510125.0,585125.0
4,21.0,33,661545.5,755484.8
5,22.0,40,454525.0,546400.0
6,23.0,49,1202184.0,1372592.0
7,24.0,58,805931.0,948172.4
8,25.0,45,1036378.0,1210267.0
9,26.0,49,1200776.0,1447918.0


<div class="alert alert-warning" role="alert">

**Method** <br>
In the following graph, we analyze the transfer market data for Chinese nationals specifically.
</div>

<div class="alert alert-success" role="alert">

**Inference**<br>
One aspect to take note of, which is not shown in the graph, is that Chinese players are transferred mainly among Chinese clubs. International transfers are rare. With this knowledge, we see that transfers occur when players are at the mature stages of their football careers (Age 23-28). This is because clubs are looking for players who can have an immediate impact on the team and do not want to spend too much money on their development. Interestingly, the average transfer value is high at the age of 17, although this is based on a single transfer in the data; richer clubs may be looking to buy potential star players at an early stage.

</div>    
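
Some of the per-age averages rest on very few transfers; the youngest age in the `China_Age` output above has a single transfer. A simple safeguard is to flag ages whose count falls below a minimum before reading much into their averages. The sketch below uses only the ten rows shown above; the threshold `MIN_TRANSFERS` and the column `low_sample` are assumptions, not part of the notebook.

```python
import pandas as pd

# Rows copied from the China_Age output above (first ten rows only).
china_age = pd.DataFrame({
    "Age": [17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0],
    "Transfer_Count": [1, 4, 9, 32, 33, 40, 49, 58, 45, 49],
    "Transfer_mean": [1440000.0, 0.0, 66666.67, 585125.0, 755484.8,
                      546400.0, 1372592.0, 948172.4, 1210267.0, 1447918.0],
})

MIN_TRANSFERS = 10  # assumed cut-off, not taken from the notebook

# Mark ages whose average fee is based on fewer than MIN_TRANSFERS transfers.
china_age["low_sample"] = china_age["Transfer_Count"] < MIN_TRANSFERS
print(china_age.loc[china_age["low_sample"], ["Age", "Transfer_Count", "Transfer_mean"]])
```

Flagged ages could then be greyed out or annotated in the chart instead of being dropped.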

In [27]:
fig9 = px.bar(China_Age.reset_index(),
               x = "Age",
               y = "Transfer_mean",
               color = "Surplus/Deficit",
               color_continuous_scale = px.colors.sequential.Blues,
               hover_name = "Age",
               title = 'Analysis for Chinese Players',
               template = 'plotly_dark')
fig9.add_trace(
    go.Scatter(  # line trace for the number of transfers, drawn on the secondary y-axis
                x = China_Age['Age'],
                y = China_Age['Transfer_Count'],
                name = "Number of Transfers",
                marker = dict(color = 'rgb(237,239,93)'),
                line = dict(width = 3),
                text = China_Age['Transfer_Count'],
                hovertemplate =
                "Age: %{x}<br>" +
                "Number of Transfers: %{y:,.0f}",
                yaxis='y2')
)

fig9.update_layout(
    hoverlabel=dict(
        bgcolor="black",
        font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            ),
        bordercolor = "black"
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text="Age",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = True,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
      )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text="Average Transfer Fee, in Euros",
            font=dict(
                family="Arial, monospace",
                size=14,
                color="white"
            )
        ),
        showline = False,
        linecolor = 'white',
        showticklabels = True,
        tickfont = dict(
            family = 'Arial',
            size = 10,
            color = 'white'
        )
    ),
    yaxis2=dict(
        title='',
        overlaying='y',
        side='right'
    ),
    # Figure-wide default font (a layout-level setting, not part of yaxis2)
    font=dict(
        family="Arial, monospace",
        size=10,
        color="white"
    )
)

fig9.update_layout(
    legend=dict(
        x=0.8,
        y=1
    )
)

fig9.update_layout(
    xaxis=dict(showgrid=False, zeroline=False),
    yaxis=dict(showgrid=False, zeroline=False),
    yaxis2=dict(showgrid=False, zeroline=False)
)


# plot(fig9 ,filename = 'ChinaAgeAnalysis.html')
fig9.show()

<div class="alert alert-info" role="alert">
  
**CONCLUSION** <br>
While this is just a stylized analysis, we are able to understand the varied underlying features of the Transfer Market including, but not limited to, bargaining power dynamics, the importance of certain attributes, and the benefits and curse of having wealthy owners. The aim is to develop further analyses based on this dataset to understand the dynamics of the Transfer Market in football.

</div>