# Abstract

In this project, we analyze football transfers. The data used to build our network is scraped from [`transfermarkt.com`](https://www.transfermarkt.com/), a website specialized in football. It records transfers between clubs all around the world, from the major leagues to less popular ones, and covers not only first-division leagues but also second and lower divisions. Because the website stores so much fine-grained data, our analysis only takes into account transfers from 1 January 2015 to 31 December 2016.

Our network is composed of football clubs. Each node represents a club that took part in at least one transfer during the two years of interest. A transfer between two clubs is represented as an edge.

The first step in this project is to analyze the differences between the three major types of transfers: free transfers, loans, and monetary transfers. Each type has its own specificities regarding the kind of clubs involved and the characteristics of the players. In a second phase, we look more deeply into the monetary transfers network and the way money flows in this market.

> **Tip**: For a better reading experience, we advise you, dear reader, to open this notebook with [nbviewer](https://nbviewer.jupyter.org/github/MGT-416/Team1FinalProject/blob/master/0.%20Project%20Report.ipynb).

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Abstract" data-toc-modified-id="Abstract-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Abstract</a></span></li><li><span><a href="#Introduction" data-toc-modified-id="Introduction-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Data-Acquisition-and-Preparation" data-toc-modified-id="Data-Acquisition-and-Preparation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Acquisition and Preparation</a></span></li><li><span><a href="#Analysis" data-toc-modified-id="Analysis-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Analysis</a></span><ul class="toc-item"><li><span><a href="#Centralities-analysis" data-toc-modified-id="Centralities-analysis-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Centralities analysis</a></span><ul class="toc-item"><li><span><a href="#Centralities-analysis---Club-Level" data-toc-modified-id="Centralities-analysis---Club-Level-4.1.1"><span class="toc-item-num">4.1.1&nbsp;&nbsp;</span>Centralities analysis - Club Level</a></span></li><li><span><a href="#Centralities-analysis---League-Level" data-toc-modified-id="Centralities-analysis---League-Level-4.1.2"><span class="toc-item-num">4.1.2&nbsp;&nbsp;</span>Centralities analysis - League Level</a></span></li></ul></li><li><span><a href="#Communities" data-toc-modified-id="Communities-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Communities</a></span></li><li><span><a href="#Transfers-distribution-by-age" data-toc-modified-id="Transfers-distribution-by-age-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Transfers distribution by age</a></span></li><li><span><a href="#Dividing-loans-and-transfers-into-4-age-groups" data-toc-modified-id="Dividing-loans-and-transfers-into-4-age-groups-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Dividing loans and transfers into 4 age groups</a></span><ul class="toc-item"><li><span><a 
href="#Agre-groups:" data-toc-modified-id="Agre-groups:-4.4.1"><span class="toc-item-num">4.4.1&nbsp;&nbsp;</span>Agre groups:</a></span></li></ul></li><li><span><a href="#Relation-between-club's-pagerank-and-existence-of-other-clubs-in-network¶" data-toc-modified-id="Relation-between-club's-pagerank-and-existence-of-other-clubs-in-network¶-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Relation between club's pagerank and existence of other clubs in network¶</a></span></li><li><span><a href="#Money-infection-analysis" data-toc-modified-id="Money-infection-analysis-4.6"><span class="toc-item-num">4.6&nbsp;&nbsp;</span>Money infection analysis</a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Conclusion</a></span></li><li><span><a href="#Appendix:-Project-Structure" data-toc-modified-id="Appendix:-Project-Structure-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Appendix: Project Structure</a></span><ul class="toc-item"><li><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Report" data-toc-modified-id="Report-6.0.0.1"><span class="toc-item-num">6.0.0.1&nbsp;&nbsp;</span>Report</a></span></li><li><span><a href="#Data-scraping-and-network-creation" data-toc-modified-id="Data-scraping-and-network-creation-6.0.0.2"><span class="toc-item-num">6.0.0.2&nbsp;&nbsp;</span>Data scraping and network creation</a></span></li><li><span><a href="#Analysis" data-toc-modified-id="Analysis-6.0.0.3"><span class="toc-item-num">6.0.0.3&nbsp;&nbsp;</span>Analysis</a></span></li></ul></li></ul></li></ul></li></ul></div>

# Introduction

Football is probably the most popular sport in the world. It is also one of the sports with the most money flowing around it. With all the financial stakes involved, the transfer market is key to a club's sporting and marketing success. In the football transfer market, each club can hire players. If a player already has a contract with another club, both clubs can reach a financial agreement for the player to leave one club, the selling club, and join a new team, the buying club. This is what we call a **monetary transfer** in this project. Another possibility for a club to sign a new player is to sign them for free, when the player has no current contract. This is referred to as a **free transfer**. The last type of transfer is the **loan**, where a player under contract with one club joins a new team for a predefined amount of time negotiated between the two clubs.

In this project, we model the football transfer market as a network. Each node represents a club and has one attribute: the league the club belongs to. A transfer between two clubs is represented as an edge. Like the nodes, each edge stores information about its transfer, such as player or club characteristics. When a player leaves club A and signs a new contract with club B, this is represented as a directed edge from club A to club B. Note that two players might leave club A and join club B: this is represented with two directed edges from the club A node to the club B node. Thus, this project deals with a **multi-directed network**.
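To make the multi-directed model concrete, here is a minimal sketch using NetworkX's `MultiDiGraph`. The club, league and player names, as well as the `player` and `age` edge attributes, are hypothetical and only serve as an illustration; the `competition` node attribute is the one used later in this notebook.

```python
import networkx as nx

# Hypothetical clubs, leagues and players, for illustration only.
G = nx.MultiDiGraph(name="toy transfers")
G.add_node("Club A", competition="League X")
G.add_node("Club B", competition="League Y")

# Two different players move from Club A to Club B:
# each transfer becomes its own directed edge (parallel edges).
G.add_edge("Club A", "Club B", player="player-1", age=21.5)
G.add_edge("Club A", "Club B", player="player-2", age=30.0)

print(G.number_of_edges("Club A", "Club B"))          # parallel edges A -> B
print(G.out_degree("Club A"), G.in_degree("Club B"))  # players sold / bought
```

A plain `DiGraph` would collapse the two transfers into a single edge, which is why a multigraph is needed to count every transfer.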

The questions we want to answer with this data are: Do monetary transfers have the same characteristics as loans and free transfers? What kind of clubs participate in each type of transfer? What are the differences between clubs making many monetary transfers and clubs making loans? What are the interactions between clubs of the same country, or of the same league division? We will also look at the players and try to build a player profile for each type of transfer. Finally, we will look exclusively at monetary transfers and try to understand how money flows between clubs: which are the *key* clubs in this network, and which clubs have the most effect on the amount of money involved in transfers?

To answer these questions, we mainly rely on centrality measures, such as degree centrality or PageRank centrality. Diffusion models are also a key part of our analysis of monetary transfers.

# Data Acquisition and Preparation

> The notebook with the code detailed in this section is **`1. Buid Data`** in our GitHub repository.

Our analysis is based on the data available on [Transfermarkt](https://www.transfermarkt.com/), one of the most complete open platforms for football data. This website is often used by journalists to estimate the monetary value of a player in the transfer market. For our project and the analysis we wish to do, the website's most important feature is its transfer history. Almost all transfers happening in the football world, from those involving many millions to those between two small clubs playing in a minor division in Eastern Europe, are recorded and categorized.

For example, if someone wants to know all transfers that were made on 12 August 2015, the website makes this information accessible at [https://www.transfermarkt.com/transfers/transfertagedetail/statistik/top/land_id_zu/0/land_id_ab/0/leihe//datum/2015-08-12/page/1](https://www.transfermarkt.com/transfers/transfertagedetail/statistik/top/land_id_zu/0/land_id_ab/0/leihe//datum/2015-08-12/page/1). Three things are worth noting about this page and its URL:

- Once the page is open, we can see that all transfers are organized into a table, with information about the player, the selling club, the buying club and the transfer's type, among other things. We can also notice that the player and the clubs each have a *hyperlink* attached, giving us the URL with all other information about a player or a club.
- The date appears just after **datum** in the URL, in the format *year-month-day*. To get the data for transfers in 2015 and 2016, we use this URL with the appropriate date.
- On a given day, there might have been a lot of transfers. The website only shows 25 transfers per HTML page, but we can iterate through all pages with the last component of the URL.

For each transfer, the website stores a lot of information, from the player's name to the selling club's director. Only a subset of those records is of interest for our project:
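The URL pattern described above can be sketched as follows. The template is copied from the example URL; the helper names `transfer_day_url` and `days_between` are hypothetical and are not taken from the project's scraping notebook.

```python
from datetime import date, timedelta

# Template copied from the example URL; only the date and the page vary.
BASE_URL = ("https://www.transfermarkt.com/transfers/transfertagedetail/"
            "statistik/top/land_id_zu/0/land_id_ab/0/leihe//datum/{day}/page/{page}")


def transfer_day_url(day, page=1):
    """Build the URL listing the transfers of `day` (a datetime.date) on a given page."""
    return BASE_URL.format(day=day.isoformat(), page=page)


def days_between(first, last):
    """Yield every date from `first` to `last`, both included."""
    current = first
    while current <= last:
        yield current
        current += timedelta(days=1)


all_days = list(days_between(date(2015, 1, 1), date(2016, 12, 31)))
print(len(all_days))
print(transfer_day_url(date(2015, 8, 12), page=1))
```

Since each page shows 25 transfers, a scraper would presumably increase `page` until a page comes back empty; that stopping rule is an assumption, not something stated by the website.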

- Player attributes:
    - **Player Name**: Name of the player
    - **Player Link**: *Transfermarkt* URL for the player's profile
    - **Player position**: Position of the player
    - **Age**: Age of the player at the time of the transfer


- Transfer money:
    - **Fee**: Monetary value, if any, of the transfer
    - **Market value**: Theoretical value of the player, computed by Transfermarkt.com

- Clubs
    - **From club**: Club/Team that the player leaves
    - **To club**: Club/Team that the player joins.
    - **From manager**: Manager of the club that the player leaves.
    - **To manager**: Manager of the club that the player joins.
    - **From manager link**: *Transfermarkt* URL for the manager of the club that the player leaves.
    - **To manager link**: *Transfermarkt* URL for the manager of the club that the player joins.
    
    
- Competitions
    - **From competition**: Competition/League in which the `from club` plays
    - **To competition**: Competition/League in which the `to club` plays

**Web scraping strategy**:
- `Transfermarkt.com` has a URL listing the transfers occurring on a specific date.
- The transfers happening on a specific day can be spread across multiple pages.
- For each transfer, a detailed version, containing the information we are interested in, is available through a link.
- We create one CSV file per day and merge all CSV files into one at the end, so that if an error occurs there is no need to restart from scratch.
- All transfers happening in **2015** and **2016** are retrieved.

Once retrieved, all *.csv* files are merged into a single one and fed to **Pandas** to build a DataFrame. The data looks like:
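The "one file per day, merge at the end" strategy can be sketched with the standard library alone. This is a minimal illustration under assumptions: the `<day>.csv` naming scheme and the helper names `pending_days` and `merge_daily_csvs` are hypothetical, and every daily file is assumed to share the same header.

```python
import csv
from pathlib import Path


def pending_days(days, folder):
    """Return the days that do not yet have a '<day>.csv' file in `folder`."""
    folder = Path(folder)
    return [day for day in days if not (folder / f"{day}.csv").exists()]


def merge_daily_csvs(folder, out_path):
    """Concatenate all daily CSV files of `folder` into `out_path`.

    The header is written once; the number of data rows is returned.
    `out_path` should live outside `folder` so it is never merged into itself.
    """
    files = sorted(Path(folder).glob("*.csv"))
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        header_written = False
        for path in files:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:  # empty file: nothing to copy
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

After a crash, calling `pending_days` with the full list of dates gives the days that still need to be scraped, so completed days are never downloaded twice.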

In [2]:
import pandas as pd

df = pd.read_csv("data/data.csv", index_col=0)
df.sample(5)

Unnamed: 0,Player Name,Player Link,Player position,From club,To club,From competition,To competition,From manager,From manager link,To manager,To manager link,Market value,Fee,Age,From manager agent,To manager agent,Player Agent
37094,marco-berardi,/marco-berardi/profil/spieler/240191,Central Midfield,Fiorentina,Tuttocuoio,Serie A,Prima Divisione - A,Paulo Sousa,/paulo-sousa/profil/trainer/8369,Luca Fiasconi,/luca-fiasconi/profil/trainer/48840,75 Th. €,Loan,20 years 03 months 24 days,EUROPE SPORTS GROUP,,
42945,adrian-colunga,/adrian-colunga/profil/spieler/57866,Centre-Forward,RCD Mallorca,Anorthosis,Segunda División,Division A,Fernando Vázquez,/fernando-vazquez/profil/trainer/16535,Antonio Puche,/antonio-puche/profil/trainer/9334,700 Th. €,Free transfer,31 years 09 months 14 days,Eugenio Botas ...,,Marco Kirdemir
3923,harry-beautyman,/harry-beautyman/profil/spieler/130752,Central Midfield,Welling,Peterborough,Conference National,League One,Jody Brown,/jody-brown/profil/trainer/37951,Darren Ferguson,/darren-ferguson/profil/trainer/4298,50 Th. €,?,22 years 09 months 30 days,,,
10796,gianmario-comi,/gianmario-comi/profil/spieler/130324,Centre-Forward,AC Milan,AS Livorno,Serie A,Serie B,Siniša Mihajlović,/sini-scaron-a-mihajlovi%C4%87/profil/trainer/...,Christian Panucci,/christian-panucci/profil/trainer/23364,750 Th. €,Loan,23 years 03 months 26 days,,,
615,juuso-aalto,/juuso-aalto/profil/spieler/224607,Left Wing,JIPPO,Ekenäs IF,Kakkonen,Ykkönen,Mark Dziadulewicz,/mark-dziadulewicz/profil/trainer/8042,Jens Mattfolk,/jens-mattfolk/profil/trainer/38295,-,Free transfer,24 years 04 months 15 days,,,


This data needs a few modifications before it can be used in our analyses:
- Create the **transfer type**: As stated above, there is more than one type of transfer in the football market. The website stores the type of a transfer, but always in the same column and in several different formats. Based on all possible entries for the **transfer fee** column in *Transfermarkt.com*, we create **four types** of transfers: **free**, **loan**, **swap** and monetary **transfer**. Note that the *swap* transfers are discarded from the analysis because there are so few of them.
- For monetary transfers and loans, some money might have been involved in the transfer. This information is also stored under the **transfer fee** column in *Transfermarkt.com*. The fee is extracted and stored in a new column of our data as an *integer*.
- During the web scraping phase, whitespace was appended to the **player position** data. It needs to be removed.
- The age of the player is stored as a string, in multiple formats: sometimes with the year, month and day (*21 years 09 months 04 days*), sometimes with only the years and months (*31 years and 01 months*). The age is converted into a single float variable, stored in a new column.
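The cleaning steps above can be sketched as follows. This is not the project's actual code: the function names are hypothetical, the `Th.` and `Mill.` unit labels are inferred from the sample rows, and the `"swap"` keyword and the comma decimal separator are assumptions about the raw entries.

```python
import re

# Assumed unit labels, inferred from sample values such as '75 Th. €'.
UNITS = {"Th.": 1_000, "Mill.": 1_000_000}


def age_to_years(text):
    """Convert '21 years 09 months 04 days' or '31 years and 01 months' to a float."""
    years = re.search(r"(\d+)\s+years?", text)
    if years is None:
        return None
    months = re.search(r"(\d+)\s+months?", text)
    days = re.search(r"(\d+)\s+days?", text)
    value = float(years.group(1))
    if months:
        value += int(months.group(1)) / 12
    if days:
        value += int(days.group(1)) / 365.25
    return value


def money_to_int(text):
    """Convert amounts such as '75 Th. €' to an integer number of euros, else None."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s+(Th\.|Mill\.)\s+€\s*", text)
    if match is None:
        return None
    amount = float(match.group(1).replace(",", "."))
    return int(round(amount * UNITS[match.group(2)]))


def transfer_type(fee_text):
    """Hypothetical mapping from a raw fee entry to one of the four transfer types."""
    text = fee_text.strip().lower()
    if text.startswith("loan"):
        return "loan"
    if text.startswith("free"):
        return "free"
    if "swap" in text:  # assumed keyword for swap deals
        return "swap"
    if money_to_int(fee_text) is not None:
        return "transfer"
    return None  # unknown entries such as '?' or '-'


print(age_to_years("31 years and 01 months"))
print(money_to_int("75 Th. €"), transfer_type("Free transfer"))
```

Entries that cannot be parsed are mapped to `None` rather than guessed, so they can be inspected or dropped explicitly afterwards.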

Finally, the data is ready to be used for creating networks. **Five networks** are created:
- All transfers
- Only monetary transfers
- Only loans
- Only free transfers
- Only swap transfers

All networks are **multi-directed graphs** (`MultiDiGraph`). Below is the output of NetworkX's *info* function:

> Name: loan<br/>
> Type: MultiDiGraph<br/>
> Number of nodes: 2664<br/>
> Number of edges: 10773<br/>
> Average in degree:   4.0439<br/>
> Average out degree:   4.0439<br/>

> Name: swap<br/>
> Type: MultiDiGraph<br/>
> Number of nodes: 76<br/>
> Number of edges: 87<br/>
> Average in degree:   1.1447<br/>
> Average out degree:   1.1447<br/>

> Name: transfer<br/>
> Type: MultiDiGraph<br/>
> Number of nodes: 1124<br/>
> Number of edges: 2913<br/>
> Average in degree:   2.5916<br/>
> Average out degree:   2.5916<br/>

> Name: free<br/>
> Type: MultiDiGraph<br/>
> Number of nodes: 3851<br/>
> Number of edges: 30994<br/>
> Average in degree:   8.0483<br/>
> Average out degree:   8.0483<br/>
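The average degrees reported above can be checked by hand: every directed edge adds one to the out-degree of the selling club and one to the in-degree of the buying club, so both averages equal the number of edges divided by the number of nodes. The short sketch below redoes this arithmetic with the node and edge counts copied from the output above; the helper name `average_degree` is ours.

```python
# (number of nodes, number of edges) copied from the NetworkX output above.
networks = {
    "loan": (2664, 10773),
    "swap": (76, 87),
    "transfer": (1124, 2913),
    "free": (3851, 30994),
}


def average_degree(n_nodes, n_edges):
    """Average in-degree (and out-degree) of a directed multigraph."""
    return n_edges / n_nodes


for name, (n_nodes, n_edges) in networks.items():
    print(f"{name:<10}{average_degree(n_nodes, n_edges):.4f}")
```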

# Analysis

In [2]:
import networkx as nx

In [12]:
G_monetary  = nx.read_gml("networks/transfers_transfer_network.gml")
G_loans     = nx.read_gml("networks/transfers_loan_network.gml")
G_free      = nx.read_gml("networks/transfers_free_network.gml")

## Centralities analysis

As a first look at the data, we compute some centrality measures and try to understand how the three types of transfers differ for each measure.

Please refer to the notebook **`Transfers vs Loan vs Free`** for the complete code. Below, only the findings are presented.

### Centralities analysis - Club Level

**In-degree**

The **in-degree** centrality deals with the number of new players that clubs acquire. In the **monetary transfers** network, the top-ranked clubs come from primary leagues and have centrality values close to one another. In the **free** and **loan** networks, the top-ranked clubs are less popular and come from lower-division leagues. This is a first step in confirming our assumption: popular clubs with money, which are fewer, take part in fewer transfers, and mostly in monetary ones.
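As an illustration of what the in-degree and out-degree centralities measure, the sketch below computes them by hand on a hypothetical list of transfers. It divides each count by the number of clubs minus one, which is the normalization NetworkX uses for its degree centralities; the function name and the toy data are ours.

```python
from collections import Counter

# Hypothetical transfers as (selling club, buying club) pairs.
transfers = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "D"), ("A", "D")]


def degree_centralities(edges):
    """Return (in-degree, out-degree) centralities, counting parallel edges."""
    clubs = {club for edge in edges for club in edge}
    scale = len(clubs) - 1  # normalization by n - 1 (needs at least two clubs)
    bought = Counter(buyer for _, buyer in edges)
    sold = Counter(seller for seller, _ in edges)
    in_centrality = {club: bought[club] / scale for club in clubs}
    out_centrality = {club: sold[club] / scale for club in clubs}
    return in_centrality, out_centrality


in_c, out_c = degree_centralities(transfers)
print(sorted(in_c.items(), key=lambda item: item[1], reverse=True))
print(sorted(out_c.items(), key=lambda item: item[1], reverse=True))
```

Because parallel edges are counted, a club involved in many transfers with the same partners can have a centrality above 1 in a multigraph.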

**Out-degree**

- We can see again that popular clubs are the ones *selling* the most players.
- The **free** ranking is a bit different compared to the *in-degree* version. It contains almost only clubs from primary divisions. One assumption could be that players in the final phase of their careers leave a mid-table club for free and return to the league of their home country. It will be interesting to compare those findings with the ones taking the players' age into account.
- The **loan** version contains clubs exclusively from primary divisions. This makes sense, as top clubs tend to buy young and promising players and directly "send" them to less stressful teams to gain experience. This ranking contains many Italian clubs, and the club leading it is **Juventus**, the previous winner of the Italian primary division. We can note that in the **in-degree** version, the *loan* ranking was also composed of many Italian clubs, but this time from lower divisions. One possible explanation is that clubs from the primary division "send" their young players to teams playing in the secondary and tertiary Italian divisions.

> **Interesting fact**: In the **free** ranking, the club **Parma** appears first, with a centrality value 60% higher than the second one. Why? Because this club had financial troubles and had to declare bankruptcy in 2015, so many players left the club for free.
    
> **Interesting fact #2**: As pointed out previously, there are many Italian clubs in the **loan** rankings, in both the in-degree and out-degree versions. This might be a cultural thing: in Italy, some clubs have a reputation for being **farm clubs**, taking young players on loan from bigger teams. Why mainly in Italy? Because Italian rules allow Italian clubs [to *co-own* players](http://www.bbc.com/sport/football/34125476).

**Closeness**

As before, top clubs make up the monetary transfers ranking. Those clubs are all European, mainly from England and Italy. It is interesting to note that Watford, the ranking leader, was managed in 2016/2017 by an Italian manager and is now managed by a Portuguese one. The closeness centrality values are pretty close to one another.

The loan ranking is interesting, as it contains some Portuguese clubs. Portuguese clubs are known for making transfers with South American players, mostly Brazilian ones, who wish to have a career in Europe. Why Portugal? Because of the cultural and linguistic proximity. Those players are then sold to higher-value European clubs, or go back home in case of failure. Thus, finding Portuguese clubs in this ranking is not surprising.

We can notice that in these centrality rankings, almost all clubs are from first-division leagues.


**Betweenness**

The betweenness rankings are a bit more difficult to make sense of. As before, we can notice that each of them has Italian clubs among the top ones.

**PageRank**

In the **transfer** version, weighted by the number of transfers, almost all clubs are from England, with primary and secondary divisions mixed. In the **free** ranking, however, there is no club from a first division, only *small* clubs. Interestingly, the first two clubs are both from Switzerland. We note the same behavior in the **loan** ranking, which is mainly composed of clubs from lower divisions. This supports our previous findings: clubs from the primary divisions are mainly important in monetary transfers, but much less so in the loan and free versions.

In the transfer version weighted by transfer fees, the most *famous* clubs are present. They are mainly clubs from England, with a lot of financial power, or clubs recently bought by new investors, like Paris SG and Valencia.
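To illustrate how the choice of weight changes a PageRank ranking, here is a minimal power-iteration sketch in plain Python. It is not the NetworkX implementation used in the analysis notebook; the function name, the damping factor of 0.85 and the toy transfers are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical monetary transfers: (selling club, buying club, fee).
transfers = [("A", "B", 1.0), ("A", "B", 1.0), ("A", "C", 20.0),
             ("B", "A", 3.0), ("C", "A", 3.0)]


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; weights of parallel edges are summed."""
    links = defaultdict(float)       # (seller, buyer) -> total weight
    out_weight = defaultdict(float)  # seller -> total outgoing weight
    nodes = set()
    for seller, buyer, weight in edges:
        nodes.update((seller, buyer))
        if weight > 0:
            links[(seller, buyer)] += weight
            out_weight[seller] += weight
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by clubs without outgoing weight is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for (seller, buyer), weight in links.items():
            new_rank[buyer] += damping * rank[seller] * weight / out_weight[seller]
        rank = new_rank
    return rank


by_count = weighted_pagerank([(s, b, 1.0) for s, b, _ in transfers])
by_fee = weighted_pagerank(transfers)
print(by_count)
print(by_fee)
```

In this toy example club B receives more players from A, while club C receives most of A's money, so B ranks above C when weighting by number of transfers and below C when weighting by fees.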

### Centralities analysis - League Level

We perform the same centrality analysis, this time at the league level: all clubs playing in the same league are merged into a single node.
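Merging clubs into leagues amounts to relabelling each transfer by the leagues of its two clubs. The sketch below shows one way to do this on hypothetical data; keeping transfers within the same league as self-loops is an assumption, and the function name is ours.

```python
from collections import Counter

# Hypothetical club -> league mapping and club-level transfers (seller, buyer).
club_league = {"A": "League X", "B": "League X", "C": "League Y", "D": "League Z"}
transfers = [("A", "C"), ("B", "C"), ("A", "B"), ("C", "D")]


def league_edges(edges, league_of):
    """Count transfers between leagues: (from league, to league) -> number."""
    return Counter((league_of[seller], league_of[buyer]) for seller, buyer in edges)


flows = league_edges(transfers, club_league)

# League-level in-degree: number of players arriving in each league.
in_degree = Counter()
for (_, to_league), count in flows.items():
    in_degree[to_league] += count

print(flows)
print(in_degree.most_common())
```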

**In-degree**

We first see that there is much more variation in the centrality values compared to the club version.

In the **transfer** ranking, the first two leagues are the primary leagues of **China** and **Turkey**. This is interesting, since no clubs from those leagues were present in the club-level ranking. It means that Chinese and Turkish clubs are pretty homogeneous in terms of transfers: no club makes more transfers than the others, and they all follow the same transfer strategy. Clubs from the **Jupiler Pro League**, in Belgium, show the same kind of behavior.

Although the leagues in this ranking are mainly primary divisions, two secondary divisions are present: the **Championship** is the English second division and one of the "richest" leagues in Europe, and **Serie B** is from Italy and confirms a fact already observed at club level: Italian clubs were highly active in monetary transfers in recent years.

> Note that the network doesn't take transfer value in consideration.

Among the top 3 leagues in the **loan** ranking, the primary and secondary divisions of **Spain** are present. Once again, there was no Spanish club in the same ranking at club level. Following the same reasoning as for Chinese and Turkish clubs in the monetary transfers, this means that Spanish clubs have a *football loan* culture, as in Italy. The difference is that in Italy only a subset of clubs participate in this type of transfer, whereas in Spain many more clubs appear to follow the same "loan strategy". Similar behavior can be observed with **Portuguese** clubs (*Primeira Liga* and *Segunda Liga*).

**Out-degree**

Comparing the in-degree and out-degree monetary transfers rankings, we notice that the Chinese and Turkish leagues do not appear in the second one: in 2015-2016, football players went to those countries more often than they left them. This is expected, as interest in football has recently increased in both countries. Another expected result is finding the Brazilian primary league in this out-degree ranking. The Brazilian league is in the top 10 for monetary transfers and in second position in the loan ranking. This must be directly linked to the Spanish and Portuguese clubs in the in-degree version of the ranking.

The German league (1. Bundesliga) is present in the top 5 of both versions of the monetary transfers ranking: German top clubs seem to have had less stable squads in recent years.

The free versions of the in-degree and out-degree centralities are the only ones to contain leagues from Eastern Europe, where less money is flowing. But the money flowing in a league is not the key differentiator, because the English league is present everywhere. Compared to the two other rankings, the free ones have a lot of overlap between the in-degree and out-degree versions.

**Closeness**

First, we can notice that the closeness centralities are all high, almost at 0.5.

The monetary ranking is consistent with the in-degree ranking, with the Turkish and Chinese leagues at the top. Once more, the Championship (England) is the only secondary division present in the monetary ranking, which clearly demonstrates the importance of this league in the transfer market. But this centrality measure also gives results that are more complicated to analyze: the Belgian and Swiss leagues appear in this ranking, quite surprisingly. Otherwise, the big five are always present: Germany, Italy, England, France and Spain.

In the free transfers ranking, we can directly notice that all centrality values are pretty close to one another. There is no league with a lot of money flowing in and out; the ranking is mainly composed of lower divisions. Having such close centrality values also means that there is no outstanding or central league: they are close from a network point of view.

The loan ranking also has close centrality values. We can notice, as before, the presence of the primary and secondary divisions of Spain and Portugal.

**Betweenness**

All centrality values are small, but we can see as before that the **top five** plus the Swiss and Belgian leagues are present in the **monetary transfers** ranking. Western Europe is really the central place for monetary transfers.

As for the *closeness centrality*, the rankings for **free** and **loan** transfers are very different from the ones of in/out-degree, with a mix of inferior division and some primary ones. We can note that the **Premier League** is present everywhere.

**PageRank**

The PageRank centrality for the transfer version is interesting, with the **Chinese** league ranked first and a centrality value well above that of the second one. This is due to the massive offensive of Chinese clubs in the transfer market over the last three years. More surprisingly, the Turkish and Belgian leagues appear in this top 5. Below them, the centralities are very close to one another, but we can see that a small Spanish division seems important. The free and loan rankings are mainly composed of lower divisions, once more supporting our assumption that popular and powerful clubs do business almost entirely through monetary transfers.

## Communities

In [16]:
# import the Louvain algorithm
import community as community
import operator

Take only a subset of the graph: All edges where one of the two nodes is part of the main european leagues.

In [17]:
leagues = ['Premier League', 'Championship', 'Serie A', 'Ligue 1', '1.Bundesliga', 'Primera División', 'Primeira Liga', ]

In [18]:
for G in [G_monetary, G_free, G_loans]:
    G_new = nx.MultiDiGraph()

    print("================================================", G.name,"================================================")
    print()

    # Keep only the edges where at least one club plays in a main European league
    # (G.nodes replaces the G.node accessor removed in recent NetworkX versions)
    for n1, n2 in G.edges():
        if G.nodes[n1]['competition'] in leagues or G.nodes[n2]['competition'] in leagues:
            G_new.add_edge(n1, n2)

    # Compute the best partition (Louvain works on undirected graphs)
    partition = community.best_partition(G_new.to_undirected())

    size = len(set(partition.values()))
    print('The number of communities: ', size)

    # For each community, count the clubs per competition
    nbrToPrint = 5

    for i in range(size):
        # Retrieve the clubs inside the community
        community_i = [node for node in partition.keys() if partition[node] == i]

        # Competition frequency dict
        competitions = {}
        for n in community_i:
            competition = G.nodes[n]['competition']
            competitions[competition] = competitions.get(competition, 0) + 1

        tot = sum(competitions.values())
        competitions = sorted(competitions.items(), key=operator.itemgetter(1), reverse=True)

        # Print the most frequent competitions with their share of the community
        print("Community", i)
        for en, c in enumerate(competitions[:nbrToPrint]):
            print("\t{}.   {:<35}{:<5}({:<7.2f}%)".format(en+1, c[0], c[1], 100*c[1]/tot))
        print()


The number of communities:  13
Community 0
	1.   1.Bundesliga                       17   (22.37  %)
	2.   2.Bundesliga                       13   (17.11  %)
	3.   Raiffeisen Super League            6    (7.89   %)
	4.   3.Liga                             6    (7.89   %)
	5.   Bundesliga                         3    (3.95   %)

Community 1
	1.   Ligue 1                            5    (13.16  %)
	2.   Campeonato Brasileiro Série A      4    (10.53  %)
	3.   Premier League                     3    (7.89   %)
	4.   Primera División                   3    (7.89   %)
	5.   Chinese Super League               3    (7.89   %)

Community 2
	1.   Championship                       23   (22.12  %)
	2.   Premier League                     12   (11.54  %)
	3.   League One                         12   (11.54  %)
	4.   League Two                         10   (9.62   %)
	5.   Eredivisie                         6    (5.77   %)

Community 3
	1.   Primera División                   13   (22.81  %)
	2.  

In the **loan transfers** and **free transfers** networks, the communities are clearly organized by country: primary, secondary and tertiary divisions of the same country always cluster together. There are also some more hybrid communities, like the ones composed of leagues from Portugal and Brazil, or from Spain and South American countries. Language plays a crucial role in these types of transfers. As with Portugal and Brazil or the Spanish-speaking countries, the French and Belgian primary divisions are clustered together.

> **Interesting fact**: In the free transfers network, the community containing French clubs also contains clubs from the **Qatar Stars League**. This league had not shown up in the analysis before.

The **monetary transfers** communities are also tied to this notion of countries, but less strongly. Within this network, there are some communities composed mainly of clubs from one country (Germany, England, France), but also communities composed of clubs from more diverse countries, like *Community 3* with clubs from Portugal, Spain, France, Brazil and China. One conclusion that can be drawn is that when it comes to loans and free transfers, clubs prefer to deal with clubs that are close to them in terms of country and language. This "restriction" is less obvious when money is involved. This observation was expected: the better a player is, the more his transfer will cost. Good players tend to join big European clubs, thus leaving their home countries in most cases.

## Transfers distribution by age

<img src="figures/Number of transfers by age.png">

<img src="figures/Average value of transfers by age.png">

<img src="figures/Number of loans by age.png">

The distribution of transfers by age is roughly normal, with a mean around 23-24 years. The distribution of loans by age is shifted toward younger ages relative to the distribution of transfers. The reason is that young players are usually loaned out by their home club and sold only at a more mature age, because the price of experienced players is higher. On the graph of average transfer value, we can see that the value of players older than 17 is more than twice that of players younger than 17. It also grows gradually with age, which is one of the most important reasons for loaning out young players and selling them only at a mature age.
The total value of transfers by age is also roughly normally distributed: the average transfer value of players between 20 and 30 is almost constant, while the number of transfers is itself roughly normally distributed.
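The quantities plotted in the figures above (number of transfers and average transfer value per age) can be computed along these lines. The data and the function name are hypothetical; ages are assumed to be the float values produced during data preparation and are truncated to whole years.

```python
from collections import defaultdict

# Hypothetical (age in years, fee in euros) pairs for monetary transfers.
transfers = [(19.4, 500_000), (23.1, 2_000_000), (23.8, 4_000_000), (31.2, 1_000_000)]


def stats_by_age(rows):
    """Return {age in whole years: (number of transfers, average fee)}."""
    fees = defaultdict(list)
    for age, fee in rows:
        fees[int(age)].append(fee)
    return {age: (len(values), sum(values) / len(values))
            for age, values in sorted(fees.items())}


for age, (count, average_fee) in stats_by_age(transfers).items():
    print(f"{age} years: {count} transfer(s), average fee {average_fee:,.0f} €")
```

Multiplying the count by the average fee gives the total value per age, which is why the total follows the shape of the count when the average is roughly constant.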

In [10]:
st = """LOANS AT THE CLUB LEVEL
----- Top-10 clubs with high value of in loans under 21 years -----
Paganese -- Prima Divisione - Girone C --- 17
Siena -- Prima Divisione - B --- 15
Pontedera -- Prima Divisione - B --- 15
SPAL -- Prima Divisione - B --- 12
Olbia -- Serie D - Girone G --- 12
Latina Calcio -- Serie B --- 11
Como -- Serie B --- 11
Ascoli -- Prima Divisione - B --- 11
Piacenza -- Serie D - Girone B --- 10
Mantova FC -- Prima Divisione - A --- 10

----- Top-10 clubs with high value of out loans under 21 years -----
Lazio -- Serie A --- 22
Udinese Calcio -- Serie A --- 18
Fiorentina -- Serie A --- 17
Entella U19 -- Campionato Primavera Girone A --- 16
Chelsea U23 -- U21 Premier League --- 13
Ascoli U19 -- Campionato Primavera Girone C --- 12
Tottenham U21 -- U21 Premier League --- 11
Novara -- Prima Divisione - A --- 11
Hellas Verona -- Serie A --- 11
Spezia Calcio -- Serie B --- 10


----- Top-10 clubs with high value of in loans between 22 and 24 years -----
Carpi -- Serie B --- 13
Teramo -- Prima Divisione - B --- 11
Pro Vercelli -- Serie B --- 11
Joinville -- Campeonato Brasileiro Série A --- 11
Hellas Verona -- Serie A --- 11
Vicenza -- Serie B --- 10
Olhanense -- Segunda Liga --- 10
Portsmouth FC -- League Two --- 9
Paraná -- Campeonato Brasileiro Série B --- 9
Hartlepool Utd. -- League Two --- 9


----- Top-10 clubs with high value of out loans between 22 and 24 years -----
Novara -- Prima Divisione - A --- 12
Independiente -- Serie A Segunda Etapa --- 12
Genoa -- Serie A --- 12
Benfica B -- Segunda Liga --- 11
Aston Villa -- Premier League --- 11
Swansea -- Premier League --- 10
Crystal Palace -- Premier League --- 10
Burnley -- Championship --- 10
HNK Rijeka -- 1.HNL --- 9
Genclerbirligi -- Süper Lig --- 9


----- Top-10 clubs with high value of in loans between 25 and 29 years -----
Dorados de Sin. -- Ascenso MX Clausura --- 13
EC Bahia -- Campeonato Brasileiro Série B --- 12
EC Vitória -- Campeonato Brasileiro Série B --- 10
Santa Cruz -- Campeonato Brasileiro Série B --- 9
Real Valladolid -- Segunda División --- 8
Ponte Preta -- Campeonato Brasileiro Série A --- 8
Leeds -- Championship --- 8
Carpi -- Serie B --- 8
Birmingham -- Championship --- 8
Atlas -- Liga MX Clausura --- 8


----- Top-10 clubs with high value of out loans between 25 and 29 years -----
Norwich -- Championship --- 11
Sunderland -- Premier League --- 10
SSC Napoli -- Serie A --- 10
Galatasaray -- Süper Lig --- 10
Burnley -- Championship --- 9
Udinese Calcio -- Serie A --- 8
Sagan Tosu -- J. League Division 1 – Second Stage --- 8
East Bengal -- I-League --- 8
Bengaluru FC -- I-League --- 8
Belgrano -- Primera División --- 8


----- Top-10 clubs with high value of in loans after 30 -----
Atlas -- Liga MX Clausura --- 5
Veracruz -- Liga MX Clausura --- 4
Wigan -- Championship --- 3
Necaxa -- Ascenso MX Clausura --- 3
Kerala Blasters -- Indian Super League --- 3
Fiorentina -- Serie A --- 3
Chesterfield FC -- League One --- 3
Woking FC -- Conference National --- 2
Watford -- Championship --- 2
TB Tvöroyri -- Effodeildin --- 2


----- Top-10 clubs with high value of out loans after 30 -----
Tigres -- Liga MX Clausura --- 6
GZ Evergrande -- Chinese Super League --- 4
Santos -- Liga MX Clausura --- 3
Rotherham -- Championship --- 3
Racing Club -- Primera División --- 3
Dorados de Sin. -- Ascenso MX Clausura --- 3
Timbers -- Major League Soccer --- 2
Sunderland -- Premier League --- 2
SSC Napoli -- Serie A --- 2
Querétaro FC -- Liga MX Clausura --- 2
"""
st1 = """LOANS AT THE LEAGUE LEVEL
----- Top-10 leagues with high value of in loans under 21 years -----
Primera División --- 31
Conference National --- 25
League One --- 24
Prima Divisione - B --- 23
Serie B --- 22
Serie D - Girone C --- 21
Prima Divisione - Girone C --- 21
League Two --- 21
USL Pro --- 19
Serie D - Girone E --- 19


----- Top-10 leagues with high value of out loans under 21 years -----
U21 Premier League --- 30
Premier League --- 24
Serie B --- 23
Championship --- 23
Serie A --- 20
Primera División --- 20
League One --- 19
Prima Divisione - A --- 18
Ligue 1 --- 18
Prima Divisione - B --- 17


----- Top-10 leagues with high value of in loans between 22 and 24 years -----
Primera División --- 41
USL Pro --- 27
League Two --- 24
League One --- 24
Serie B --- 23
Prima Divisione - Girone C --- 23
Premier Liga --- 23
Championship --- 23
Segunda Liga --- 22
Campeonato Brasileiro Série B --- 22


----- Top-10 leagues with high value of out loans between 22 and 24 years -----
Primera División --- 42
Premier League --- 25
Championship --- 25
Premier Liga --- 24
Serie B --- 21
Serie A --- 20
Major League Soccer --- 20
League One --- 20
Campeonato Brasileiro Série A --- 20
Segunda División --- 18


----- Top-10 leagues with high value of in loans between 25 and 29 years -----
Primera División --- 46
Premier Liga --- 25
Serie B --- 23
Premier League --- 23
League One --- 23
Championship --- 22
USL Pro --- 21
League Two --- 21
Serie A --- 20
Segunda División --- 20


----- Top-10 leagues with high value of out loans between 25 and 29 years -----
Primera División --- 41
Championship --- 25
Premier League --- 24
Premier Liga --- 23
League One --- 21
Serie A --- 20
Campeonato Brasileiro Série A --- 19
Serie B --- 18
Major League Soccer --- 18
Liga MX Clausura --- 17


----- Top-10 leagues with high value of in loans after 30 -----
Primera División --- 19
League One --- 18
Championship --- 16
Serie A --- 14
Serie B --- 13
League Two --- 13
Liga MX Clausura --- 12
Campeonato Brasileiro Série A --- 11
Ascenso MX Clausura --- 11
Conference National --- 10


----- Top-10 leagues with high value of out loans after 30 -----
Primera División --- 18
Liga MX Clausura --- 16
Championship --- 15
Serie A --- 14
League Two --- 14
Premier League --- 13
League One --- 13
Campeonato Brasileiro Série A --- 12
Serie B --- 10
Premier Liga --- 9

"""
st2 = """TRANSFERS AT THE CLUB LEVEL
----- Top-10 clubs with high value of in transfers under 21 years -----
VfB Stuttgart -- 1.Bundesliga --- 3
G. Bordeaux -- Ligue 1 --- 3
Everton -- Premier League --- 3
Trabzonspor -- Süper Lig --- 2
Sevilla Atl. -- Segunda División B - Grupo IV --- 2
Sampdoria U19 -- Campionato Primavera Girone A --- 2
RM Castilla -- Segunda División B - Grupo II --- 2
RB Salzburg -- Bundesliga --- 2
Olympique Lyon -- Ligue 1 --- 2
OGC Nice -- Ligue 1 --- 2


----- Top-10 clubs with high value of out transfers under 21 years -----
OFK Beograd -- SuperLiga --- 3
LOSC Lille -- Ligue 1 --- 3
FC Schalke 04 -- 1.Bundesliga --- 3
Empoli U19 -- Campionato Primavera Girone C --- 3
AS Trencin -- Corgon liga --- 3
Valenciennes FC -- Ligue 2 --- 2
Troyes -- Ligue 2 --- 2
São Paulo -- Campeonato Brasileiro Série A --- 2
Spezia Calcio -- Serie B --- 2
Sheffield Utd. -- League One --- 2


----- Top-10 clubs with high value of in transfers between 22 and 24 years -----
SC Freiburg -- 1.Bundesliga --- 5
Real Madrid -- Primera División --- 5
SM Caen -- Ligue 1 --- 4
FC Schalke 04 -- 1.Bundesliga --- 4
AZ Alkmaar -- Eredivisie --- 4
AFC Ajax -- Eredivisie --- 4
Red Star -- SuperLiga --- 3
Panathinaikos -- Super League --- 3
HNK Rijeka -- 1.HNL --- 3
FC Nantes -- Ligue 1 --- 3


----- Top-10 clubs with high value of out transfers between 22 and 24 years -----
Marseille -- Ligue 1 --- 6
RB Salzburg -- Bundesliga --- 5
Pescara -- Serie B --- 4
Monaco -- Ligue 1 --- 4
Cukaricki -- SuperLiga --- 4
Benfica -- Primeira Liga --- 4
Wigan -- Championship --- 3
US Palermo -- Serie A --- 3
Swindon Town -- League One --- 3
Sparta Praha -- Gambrinus Liga --- 3


----- Top-10 clubs with high value of in transfers between 25 and 29 years -----
Newcastle -- A-League --- 10
Burnley -- Championship --- 8
Real Betis -- Segunda División --- 6
Crystal Palace -- Premier League --- 6
Wolves -- Championship --- 5
Torino -- Serie A --- 5
TJ Teda -- Chinese Super League --- 5
SV Darmstadt 98 -- 2.Bundesliga --- 5
Middlesbrough -- Championship --- 5
Juventus -- Serie A --- 5


----- Top-10 clubs with high value of out transfers between 25 and 29 years -----
Newcastle -- A-League --- 7
LOSC Lille -- Ligue 1 --- 7
Blackburn -- Championship --- 7
Liverpool -- Premier League --- 6
1.FSV Mainz 05 -- 1.Bundesliga --- 6
RB Salzburg -- Bundesliga --- 5
Montpellier -- Ligue 1 --- 5
Independiente -- Serie A Segunda Etapa --- 5
Dinamo Moscow -- Premier Liga --- 5
Cardiff -- Championship --- 5


----- Top-10 clubs with high value of in transfers after 30 -----
SSC Napoli -- Serie A --- 2
Monterrey -- Liga MX Clausura --- 2
Lechia Gdansk -- Ekstraklasa --- 2
Everton -- Premier League --- 2
Estudiantes -- Primera División --- 2
Aston Villa -- Premier League --- 2
Zenit S-Pb -- Premier Liga --- 1
Zamalek -- Egyptian Premier League --- 1
Wolves -- Championship --- 1
Willem II -- Eredivisie --- 1


----- Top-10 clubs with high value of out transfers after 30 -----
Sunderland -- Premier League --- 2
Slask Wroclaw -- Ekstraklasa --- 2
Sassuolo -- Serie A --- 2
Sampdoria -- Serie A --- 2
Newcastle -- A-League --- 2
Inter -- Serie A --- 2
Galatasaray -- Süper Lig --- 2
GZ Evergrande -- Chinese Super League --- 2
Crystal Palace -- Premier League --- 2
West Ham -- Premier League --- 1
"""
st3 = """TRANSFERS AT THE LEAGUE LEVEL
----- Top-10 leagues with high value of in transfers under 21 years -----
1.Bundesliga --- 18
Serie A --- 16
Premier League --- 15
Primera División --- 11
Ligue 1 --- 10
Championship --- 10
Süper Lig --- 8
2.Bundesliga --- 8
Eredivisie --- 7
League One --- 6


----- Top-10 leagues with high value of out transfers under 21 years -----
Primera División --- 12
Ligue 1 --- 12
1.Bundesliga --- 11
Serie A --- 10
Serie B --- 8
Premier League --- 8
Ligue 2 --- 8
Championship --- 8
Campeonato Brasileiro Série A --- 8
SuperLiga --- 7


----- Top-10 leagues with high value of in transfers between 22 and 24 years -----
Primera División --- 24
Championship --- 20
Serie A --- 19
Premier League --- 19
1.Bundesliga --- 18
Ligue 1 --- 15
2.Bundesliga --- 14
Serie B --- 13
Chinese Super League --- 13
Liga MX Clausura --- 12


----- Top-10 leagues with high value of out transfers between 22 and 24 years -----
Primera División --- 29
1.Bundesliga --- 18
Premier League --- 17
Serie A --- 16
Ligue 1 --- 16
Eredivisie --- 16
Championship --- 15
Campeonato Brasileiro Série A --- 13
2.Bundesliga --- 12
Jupiler Pro League --- 11


----- Top-10 leagues with high value of in transfers between 25 and 29 years -----
Primera División --- 31
Championship --- 23
Serie A --- 19
Premier League --- 19
Chinese Super League --- 19
1.Bundesliga --- 19
Ligue 1 --- 17
Süper Lig --- 15
UAE Arabian Gulf League --- 14
Liga MX Clausura --- 14


----- Top-10 leagues with high value of out transfers between 25 and 29 years -----
Primera División --- 39
Serie A --- 20
Ligue 1 --- 20
Championship --- 20
1.Bundesliga --- 19
Premier League --- 18
Serie B --- 17
Campeonato Brasileiro Série A --- 17
Premier Liga --- 16
Eredivisie --- 15


----- Top-10 leagues with high value of in transfers after 30 -----
Serie A --- 14
Primera División --- 10
Premier League --- 10
Süper Lig --- 9
Chinese Super League --- 9
Championship --- 8
Campeonato Brasileiro Série A --- 8
Ligue 1 --- 7
Liga MX Clausura --- 7
UAE Arabian Gulf League --- 5


----- Top-10 leagues with high value of out transfers after 30 -----
Serie A --- 15
Premier League --- 14
Primera División --- 11
1.Bundesliga --- 10
Campeonato Brasileiro Série A --- 8
Championship --- 7
Süper Lig --- 6
Ligue 1 --- 6
Liga MX Clausura --- 6
Super League --- 5
"""

## Dividing loans and transfers into 4 age groups

Initially, we tried to find the best football academies. To identify them, we decided to count the transfers and loans of players in different age groups: under 21, between 22 and 24, between 25 and 29, and after 30. By finding the clubs with a high number of transfers of young players, we can identify this type of club.

This information can also reveal clubs that make a business of reselling players. Such clubs should have a high number of in-transfers of young players, a high number of out-loans, and a high number of out-transfers in the second age group (22-24 years).

### Age groups:
 - under 21
 - 22 - 24 years
 - 25 - 29 years
 - after 30
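
As a rough illustration of this grouping, the sketch below assigns an age to one of the four groups and counts moves per club. It is not the code used for the report. The record layout `(club left, club joined, age)`, the function names and the toy records are hypothetical. The list above does not say where ages 21 and 30 fall, so the boundaries used here are an assumption.

```python
from bisect import bisect_right
from collections import Counter

# Assumed boundaries: age 21 joins the first group and age 30 joins the last one.
AGE_BOUNDS = [22, 25, 30]
AGE_LABELS = ["under 21", "22 - 24", "25 - 29", "after 30"]


def age_group(age):
    """Return the label of the age group that contains `age`."""
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age)]


def top_clubs(moves, group, direction, n=10):
    """Count moves per club for one age group; direction is 'in' or 'out'."""
    index = 1 if direction == "in" else 0
    counts = Counter(move[index] for move in moves if age_group(move[2]) == group)
    return counts.most_common(n)


# Hypothetical records: (club the player leaves, club the player joins, player age)
moves = [
    ("Club A", "Club B", 19),
    ("Club A", "Club C", 20),
    ("Club D", "Club B", 23),
    ("Club B", "Club A", 27),
]
print(top_clubs(moves, "under 21", "out"))
print(top_clubs(moves, "22 - 24", "in"))
```

The same counting can be run at the league level by replacing the club names in the records with league names.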

In [11]:
print(st)

LOANS AT THE CLUB LEVEL
----- Top-10 clubs with high value of in loans under 21 years -----
Paganese -- Prima Divisione - Girone C --- 17
Siena -- Prima Divisione - B --- 15
Pontedera -- Prima Divisione - B --- 15
SPAL -- Prima Divisione - B --- 12
Olbia -- Serie D - Girone G --- 12
Latina Calcio -- Serie B --- 11
Como -- Serie B --- 11
Ascoli -- Prima Divisione - B --- 11
Piacenza -- Serie D - Girone B --- 10
Mantova FC -- Prima Divisione - A --- 10

----- Top-10 clubs with high value of out loans under 21 years -----
Lazio -- Serie A --- 22
Udinese Calcio -- Serie A --- 18
Fiorentina -- Serie A --- 17
Entella U19 -- Campionato Primavera Girone A --- 16
Chelsea U23 -- U21 Premier League --- 13
Ascoli U19 -- Campionato Primavera Girone C --- 12
Tottenham U21 -- U21 Premier League --- 11
Novara -- Prima Divisione - A --- 11
Hellas Verona -- Serie A --- 11
Spezia Calcio -- Serie B --- 10


----- Top-10 clubs with high value of in loans between 22 and 24 years -----
Carpi -- Serie B --- 1

In [12]:
print(st1)

LOANS AT THE LEAGUE LEVEL
----- Top-10 leagues with high value of in loans under 21 years -----
Primera División --- 31
Conference National --- 25
League One --- 24
Prima Divisione - B --- 23
Serie B --- 22
Serie D - Girone C --- 21
Prima Divisione - Girone C --- 21
League Two --- 21
USL Pro --- 19
Serie D - Girone E --- 19


----- Top-10 leagues with high value of out loans under 21 years -----
U21 Premier League --- 30
Premier League --- 24
Serie B --- 23
Championship --- 23
Serie A --- 20
Primera División --- 20
League One --- 19
Prima Divisione - A --- 18
Ligue 1 --- 18
Prima Divisione - B --- 17


----- Top-10 leagues with high value of in loans between 22 and 24 years -----
Primera División --- 41
USL Pro --- 27
League Two --- 24
League One --- 24
Serie B --- 23
Prima Divisione - Girone C --- 23
Premier Liga --- 23
Championship --- 23
Segunda Liga --- 22
Campeonato Brasileiro Série B --- 22


----- Top-10 leagues with high value of out loans between 22 and 24 years -----
Primera 

In [13]:
print(st2)

TRANSFERS AT THE CLUB LEVEL
----- Top-10 clubs with high value of in transfers under 21 years -----
VfB Stuttgart -- 1.Bundesliga --- 3
G. Bordeaux -- Ligue 1 --- 3
Everton -- Premier League --- 3
Trabzonspor -- Süper Lig --- 2
Sevilla Atl. -- Segunda División B - Grupo IV --- 2
Sampdoria U19 -- Campionato Primavera Girone A --- 2
RM Castilla -- Segunda División B - Grupo II --- 2
RB Salzburg -- Bundesliga --- 2
Olympique Lyon -- Ligue 1 --- 2
OGC Nice -- Ligue 1 --- 2


----- Top-10 clubs with high value of out transfers under 21 years -----
OFK Beograd -- SuperLiga --- 3
LOSC Lille -- Ligue 1 --- 3
FC Schalke 04 -- 1.Bundesliga --- 3
Empoli U19 -- Campionato Primavera Girone C --- 3
AS Trencin -- Corgon liga --- 3
Valenciennes FC -- Ligue 2 --- 2
Troyes -- Ligue 2 --- 2
São Paulo -- Campeonato Brasileiro Série A --- 2
Spezia Calcio -- Serie B --- 2
Sheffield Utd. -- League One --- 2


----- Top-10 clubs with high value of in transfers between 22 and 24 years -----
SC Freiburg -- 1.Bu

In [14]:
print(st3)

TRANSFERS AT THE LEAGUE LEVEL
----- Top-10 leagues with high value of in transfers under 21 years -----
1.Bundesliga --- 18
Serie A --- 16
Premier League --- 15
Primera División --- 11
Ligue 1 --- 10
Championship --- 10
Süper Lig --- 8
2.Bundesliga --- 8
Eredivisie --- 7
League One --- 6


----- Top-10 leagues with high value of out transfers under 21 years -----
Primera División --- 12
Ligue 1 --- 12
1.Bundesliga --- 11
Serie A --- 10
Serie B --- 8
Premier League --- 8
Ligue 2 --- 8
Championship --- 8
Campeonato Brasileiro Série A --- 8
SuperLiga --- 7


----- Top-10 leagues with high value of in transfers between 22 and 24 years -----
Primera División --- 24
Championship --- 20
Serie A --- 19
Premier League --- 19
1.Bundesliga --- 18
Ligue 1 --- 15
2.Bundesliga --- 14
Serie B --- 13
Chinese Super League --- 13
Liga MX Clausura --- 12


----- Top-10 leagues with high value of out transfers between 22 and 24 years -----
Primera División --- 29
1.Bundesliga --- 18
Premier League --- 17


From the data above we cannot draw conclusions about football academies or about the business of developing players and reselling them, because we only have data for 2 years (4 transfer windows). That is not enough to test these ideas, because developing a player takes at least 2-3 years.

However, the data does show which clubs and leagues usually buy or loan players of a specific age group.

__Loans and transfers at the club level__

The lists of transfers of young players do not contain top clubs, because those clubs usually have academies and give chances to their own players. Most clubs in these lists are from lower leagues.

Among the clubs selling players aged 22-24, we can highlight clubs that finished high in their championships but are not very rich (Marseille, RB Salzburg, Monaco, Benfica). After they achieved good results, the demand for their players increased dramatically.

The lists of transfers of players aged 25-29 contain clubs from China, clubs from lower leagues (Championship, Segunda División, 2. Bundesliga) and non-top clubs (Newcastle, Torino). Juventus is probably unexpected in this list, but the club is known for buying older but high-class players (Dani Alves, Khedira, Higuain).

In the lists of transfers of players older than 30, no club bought or sold more than 2 players. Most of these clubs are not top clubs.

Most clubs that take players under 24 on loan are from lower leagues, while the clubs that loan these players out are junior teams or, again, clubs from lower leagues. The same trends hold for players older than 24, except for junior teams, which have no players of that age.


__Loans and transfers at the league level__

Most leagues that loan players out are the top divisions of their countries, mostly countries with a strong football tradition. Their players are usually placed in the same country but at a lower level. This pattern is observed in all age groups.


The main buyers and sellers of players of all ages are the top-5 leagues: the first divisions of Germany, Italy, England, Spain and France. The main leagues for loans, however, are lower divisions and La Liga (we think this is because of regulations that allow loaning and buying players from South America, which is not allowed in England).

## Relation between club's pagerank and existence of other clubs in network

In [21]:
st4 = """--- Top-10 clubs with highest initial pagerank --- 

Atlético Madrid --- 0.02641
Juventus --- 0.0245
TSG Hoffenheim --- 0.02255
FC Porto --- 0.01896
Benfica --- 0.01736
LOSC Lille --- 0.01719
AS Roma --- 0.01708
Monaco --- 0.01633
Chelsea --- 0.01624
VfL Wolfsburg --- 0.01545"""

In [22]:
print(st4)

--- Top-10 clubs with highest initial pagerank --- 

Atlético Madrid --- 0.02641
Juventus --- 0.0245
TSG Hoffenheim --- 0.02255
FC Porto --- 0.01896
Benfica --- 0.01736
LOSC Lille --- 0.01719
AS Roma --- 0.01708
Monaco --- 0.01633
Chelsea --- 0.01624
VfL Wolfsburg --- 0.01545


In [4]:
import pandas as pd

# Print a bold title (ANSI escape codes) followed by a legend for the table columns
print('\033[1m' + 'The most positive influence' + '\033[0m', '\n', 'First column - club influenced by','\n',
      'Value = Δ pagerank/initial pagerank', '\n', 'Club - club created the influence')
df = pd.read_csv('figures/results_max.csv')
df[:20]

The most positive influence 
 First column - club influenced by 
 Value = Δ pagerank/initial pagerank 
 Club - club created the influence


| Unnamed: 0.1 | Unnamed: 0 | Value | Club |
|---|---|---|---|
| 0 | Birkirkara FC | 0.051335 | TSG Hoffenheim |
| 1 | Athletic Bilbao | 0.051335 | TSG Hoffenheim |
| 2 | QFC | 0.051335 | TSG Hoffenheim |
| 3 | Balzan FC | 0.051335 | TSG Hoffenheim |
| 4 | Sporting Gijón | 0.051335 | TSG Hoffenheim |
| 5 | Rayo Vallecano | 0.051335 | TSG Hoffenheim |
| 6 | Genclerbirligi | 0.051335 | TSG Hoffenheim |
| 7 | Sivasspor | 0.051335 | TSG Hoffenheim |
| 8 | K. Erciyesspor | 0.051335 | TSG Hoffenheim |
| 9 | Valletta | 0.051353 | TSG Hoffenheim |


In [5]:
# Same legend as above, this time for the most negative relative pagerank changes
print('\033[1m' + 'The most negative influence' + '\033[0m', '\n', 'First column - club influenced by','\n',
      'Value = Δ pagerank/initial pagerank', '\n', 'Club - club created the influence')
df1 = pd.read_csv('figures/results_min.csv')
df1[:20]

The most negative influence 
 First column - club influenced by 
 Value = Δ pagerank/initial pagerank 
 Club - club created the influence


| Unnamed: 0.1 | Unnamed: 0 | Value | Club |
|---|---|---|---|
| 0 | SC Bastia | -0.633748 | Montpellier |
| 1 | Marítimo | -0.615484 | FC Porto |
| 2 | AC Cesena | -0.599719 | Sassuolo |
| 3 | Rafaela | -0.599354 | CA Huracán |
| 4 | Leicester | -0.596961 | TSG Hoffenheim |
| 5 | Olimpo | -0.592347 | Newell's |
| 6 | FC Empoli | -0.585524 | SSC Napoli |
| 7 | San Lorenzo | -0.561383 | Sporting CP |
| 8 | Kasimpasa | -0.552947 | Galatasaray |
| 9 | Konyaspor | -0.549079 | Balikesirspor |


These two tables contain some very surprising results that are hard to explain. For example, the removal of TSG Hoffenheim has the highest positive influence on 11 clubs from different championships. The only explanation we can offer is the large number of transfers made by TSG Hoffenheim in the 15/16 and 16/17 seasons. Note, however, that TSG Hoffenheim ranks third among the clubs with the highest weighted pagerank, with 'transferValue' as the weight.

We can also find some really significant clubs in the 'most positive influence' ranking. The 'most negative influence' ranking makes more sense, because most pairs of clubs in its top 20 are from the same country, but still there are no very influential clubs on this list.
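
To make the quantity in the tables concrete, here is a minimal sketch of how "Δ pagerank / initial pagerank" can be computed when one club is removed. It assumes a plain unweighted PageRank with damping factor 0.85 computed by power iteration. The report does not state which variant was used, and the function names and the toy edges are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a list of (source, target) edges."""
    nodes = sorted({club for edge in edges for club in edge})
    n = len(nodes)
    targets = {club: [] for club in nodes}
    for source, target in edges:
        targets[source].append(target)
    rank = {club: 1.0 / n for club in nodes}
    for _ in range(iterations):
        # Rank held by clubs without outgoing edges is shared by all clubs
        dangling = sum(rank[club] for club in nodes if not targets[club])
        new_rank = {club: (1.0 - damping + damping * dangling) / n for club in nodes}
        for club in nodes:
            for target in targets[club]:
                new_rank[target] += damping * rank[club] / len(targets[club])
        rank = new_rank
    return rank


def removal_influence(edges, removed_club):
    """Relative pagerank change of every remaining club when one club is removed."""
    before = pagerank(edges)
    remaining_edges = [(s, t) for s, t in edges if removed_club not in (s, t)]
    after = pagerank(remaining_edges)
    return {club: (after[club] - before[club]) / before[club] for club in after}


# Hypothetical toy network, not taken from the transfer data
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
influence = removal_influence(toy_edges, "D")
for club, value in sorted(influence.items(), key=lambda item: item[1], reverse=True):
    print(club, round(value, 4))
```

Because pagerank values always sum to 1, removing a club redistributes its share among the remaining clubs, so the relative change can be positive even for clubs that never traded with the removed one.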

## Money infection analysis

>The code developed for this section can be seen in **money_infection.ipynb**

Here we are interested in understanding how money flows through the network.
In particular, since player prices seem to grow constantly, we consider that a "money infection" is spreading in the network. 

We try to model here what happens when a club sets a new monetary transfer record for a player, or pays a huge amount:

>If a club pays a huge amount of money for a player, then another club that sells a player to that club will set a higher price than with someone else, because they see that this one is able to spend huge quantities of money. Similarly, when a club sells a player for a huge amount of money, the club gets rich and can now buy other players for huge amounts of money. This is how we consider the "disease" spreading. 

For this analysis the network was transformed into a directed graph (DiGraph) instead of the multi-directed graph (MultiDiGraph) used previously. An edge now represents the total amount of money one club paid for players of the other club over the period 2015-2016.

<img src="figures/hist_nbr_DiGraph.png"> 

The histogram shows the number of edges that were merged when going from the MultiDiGraph to the DiGraph. From it we can conclude that over that period the effect of merging is relatively negligible: for the vast majority of club pairs, the transactions concern only one transfer. 
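
The merging step can be sketched in a few lines of plain Python. This is an illustration only, not the code from the project notebooks. The record layout `(source club, target club, fee)` and the toy values are hypothetical, and the edge direction is whatever convention the multi-directed graph uses.

```python
from collections import defaultdict


def collapse_transfers(transfers):
    """Merge parallel edges: sum the fees and count the transfers per ordered club pair."""
    total_value = defaultdict(float)
    n_merged = defaultdict(int)
    for source, target, fee in transfers:
        total_value[(source, target)] += fee
        n_merged[(source, target)] += 1
    return dict(total_value), dict(n_merged)


# Hypothetical records (source club, target club, fee), not real data
transfers = [
    ("Club A", "Club B", 10.0),
    ("Club A", "Club B", 2.5),
    ("Club B", "Club A", 4.0),
    ("Club C", "Club A", 1.0),
]
total_value, n_merged = collapse_transfers(transfers)
print(total_value)
print(n_merged)
```

The values of `n_merged` are what a histogram of merged edges would be built from: a count of 1 means the pair of clubs made a single transfer in that direction.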


In [4]:
import networkx as nx

DiTransfer = nx.read_gml("networks/single_directed_monetary_network.gml")

# Weighted degrees: sum of 'transferValue' over the incoming / outgoing edges of each club
in_deg = DiTransfer.in_degree(weight='transferValue')
out_deg = DiTransfer.out_degree(weight='transferValue')

# Infection coefficient of each club: 0.8 * weighted in-degree + 0.2 * weighted out-degree
Infection_coef = {}
for node in DiTransfer.nodes():
    Infection_coef[node] = 0.2*out_deg[node] + 0.8*in_deg[node]

# Rank the clubs by decreasing infection coefficient and keep the top 30
Infection_coef_sorted = [(Infection_coef[link_key], link_key) for link_key in Infection_coef.keys()]
Infection_coef_sorted.sort(reverse=True)
max30Infection_coef = [key for value, key in Infection_coef_sorted[:30]]
print('30 Highest Infection coef :\n')
for club in max30Infection_coef:
    print(club)

30 Highest Infection coef :

Man City
Man Utd
Juventus
Chelsea
Atlético Madrid
Inter
Liverpool
Valencia CF
Paris SG
Monaco
Newcastle
Spurs
FC Barcelona
AS Roma
VfL Wolfsburg
Bor. Dortmund
Bayern Munich 
Watford
Southampton
Aston Villa
SSC Napoli
Arsenal
Real Madrid
Villarreal CF
Leicester
Sevilla FC
AC Milan
Everton
GZ Evergrande
JS Suning


To set up the model, we first needed a measure of how likely a club is to get infected. The measure we developed takes into account the **volume** of money flowing in and out of a club, which reflects the payment habits of that club. The measure is the following: 

$$ \text{Infection coefficient} = 0.8\cdot \text{in_degree[money]}+0.2\cdot \text{out_degree[money]} $$

This formula is a weighted average of the in- and out-degrees of a particular node, meaning the following: 
"If a club spends a lot of money (*players in_degree*) and also receives a lot of money (*players out_degree*), then it is more likely that this club will be infected." 

The weight is stronger for the amount of money a club is able to spend, because we believe it is the main component of the assumed infection. In simplified form, we believe the behavior to be something like:

>"Oh, you are able to spend that much money for that guy? My player has similar performances, so I will also sell him to you at an exorbitant price".

<img src="figures/hist_infection_coef.png">

The histogram of the infection coefficients clearly shows the uneven distribution, with a long tail of high-spending clubs. 

The first model was based on the following: 

- Initialize the infection with a club that has a high infection coefficient.
- Assume that each club follows its buying habits of the years 2015-2016.
- Give each edge a probability proportional to the infection coefficient of the neighbouring club.
- An infected club can infect more than one club.
- When a club has no predecessors in the graph (no buying habit observed), it buys with equal probability from any club.

The infection was run for 200 time steps. At each time step an infected club has a certain probability of buying a player. When a club buys a player, it buys from one of its neighbours, chosen at random with a probability proportional to the neighbour's infection coefficient. 
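
The model described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the project notebook. The buying probability `p_buy`, the `suppliers` mapping (the clubs each club bought from in 2015-2016), the function name and the toy values are all hypothetical, and at least two clubs are assumed.

```python
import random


def simulate_infection(suppliers, coef, seed_club, steps=200, p_buy=0.1, rng=None):
    """Run one infection; return the infected clubs and their number after each step."""
    rng = rng or random.Random()
    clubs = sorted(coef)
    infected = {seed_club}
    history = [1]
    for _ in range(steps):
        newly_infected = set()
        for club in sorted(infected):
            if rng.random() >= p_buy:
                continue  # this club does not buy a player at this time step
            partners = suppliers.get(club)
            weights = [coef[p] for p in partners] if partners else []
            if partners and sum(weights) > 0:
                # Observed buying habit: pick a seller proportionally to its coefficient
                seller = rng.choices(partners, weights=weights, k=1)[0]
            elif partners:
                seller = rng.choice(partners)
            else:
                # No buying habit observed: any other club, with equal probability
                seller = rng.choice([c for c in clubs if c != club])
            newly_infected.add(seller)
        infected |= newly_infected
        history.append(len(infected))
    return infected, history


# Hypothetical infection coefficients and buying habits, not real data
coef = {"A": 100.0, "B": 60.0, "C": 10.0, "D": 0.0}
suppliers = {"A": ["B", "C"], "B": ["C"], "C": []}
infected, history = simulate_infection(suppliers, coef, "A", rng=random.Random(42))
print(sorted(infected), history[-1])
```

Passing a seeded `random.Random` makes a single run reproducible, which helps when comparing different initial clubs.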

The results can be observed in the following table:

 ><img src="figures/Statistics.png">
 
For each initial condition the analysis is repeated several times to average out the random effect of the disease spread. The mean and standard deviation of the epidemic can be observed in the final plot. 
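
The averaging over repeated runs can be illustrated as below. `toy_run` is a deliberately simple stand-in for one stochastic infection run, not the model used in the report. Its parameters and the number of repetitions are hypothetical.

```python
import random
import statistics


def toy_run(rng, steps=200, p_spread=0.02, population=50):
    """Stand-in for one stochastic run; returns the final number of infected clubs."""
    infected = 1
    for _ in range(steps):
        # Each infected club independently infects one more club with probability p_spread
        new = sum(1 for _ in range(infected) if rng.random() < p_spread)
        infected = min(population, infected + new)
    return infected


def summarize(run, n_runs=50, seed=0):
    """Repeat a stochastic run and return the mean and standard deviation of its result."""
    rng = random.Random(seed)
    sizes = [run(rng) for _ in range(n_runs)]
    return statistics.mean(sizes), statistics.stdev(sizes)


mean_size, std_size = summarize(toy_run)
print(mean_size, std_size)
```

A standard deviation that is large compared with the differences between the means of two initial conditions indicates that those conditions cannot be told apart from the simulation alone.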

The clubs that spread the disease the most are Paris SG, Atlético Madrid and Man Utd. These are all big clubs that make huge transactions. Overall, however, no strong tendencies can be observed in this data because of the huge standard deviation, so there is no statistical evidence. 

We conclude that the model does not fit reality properly and should be modified further to limit the spreading based on economic power.

# Conclusion

Several conclusions can be drawn from this analysis: the loan and transfer networks have, in general, different properties, and the age distributions of the players differ as well. The most influential leagues and clubs also differ between these networks. Most of the results were not surprising from a football point of view, but the analysis also produced some unexpected results, for example in the analysis of how a club's pagerank depends on the existence of other clubs. 
The infection model does not reflect reality properly and could be improved further. 

We also had questions that we could not answer, such as which football academies are the best. This is a long-term question that should be analyzed with data covering several years, and 2 years are not enough for that. 

In the end, some patterns reflecting real football situations could be observed in the network. The network allows interesting behaviors to be analyzed, but a bigger analysis is needed, on a larger amount of data and with a time dimension taken into account in the network. A deeper analysis could shed light on hidden behaviors in the football market, and also on the potential corruption present in sports. 

# Appendix: Project Structure

#### Report

- "0. Project Report.ipynb"

#### Data scraping and network creation
- "1. Build Data.ipynb"
- "2. Managers history.ipynb"

#### Analysis
- "3. Transfers vs Loan vs Free.ipynb"
   
   - _Centralities study (section 4.1)_
   
- "4. Agents to agents.ipynb"
    
   - _Creating the networks for agents and small analysis_
   
- "5. Age analysis and Pagerank influence analysis.ipynb"

   - _Age distribution, age groups and Pagerank influence analysis_
   
- "6. Money infection analysis.ipynb"

   - _Study of money flows in the graph_