Here we multiply the binarized data matrix by its transpose to obtain the similarity between every pair of players.

To evaluate the result, we will check whether the positions of the recommended players match the position of the player sought.

In [1]:
# Importing the libraries
import pandas as pd

In [2]:
# Importing the data
final_data = pd.read_csv('../data/processed/binarized_data.csv', index_col=0)
players_data = pd.read_csv('../data/interim/all_players_data_withoutDuplicates.csv', index_col=0)

Now that we have the data the way we want it, let's calculate the similarity between players by multiplying our dataframe by its transpose. Since the data is binarized, each entry of the result counts the features that are set to 1 for both players.

In [3]:
# Calculating the similarity
similarity_df = final_data.dot(final_data.T)
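
To make the meaning of this product concrete, here is a minimal sketch on a toy dataframe. The player ids (`p1`, `p2`, `p3`) and feature names (`f1` to `f4`) are hypothetical and are not part of the real dataset.

```python
import pandas as pd

# Toy binarized data: rows are hypothetical player ids, columns are hypothetical features
toy = pd.DataFrame(
    [[1, 0, 1, 1],
     [1, 1, 1, 0],
     [0, 0, 0, 1]],
    index=["p1", "p2", "p3"],
    columns=["f1", "f2", "f3", "f4"],
)

# Cell (i, j) counts the features that are 1 for both player i and player j
toy_similarity = toy.dot(toy.T)
print(toy_similarity)
```

In this toy case `p1` and `p2` share two features (`f1` and `f3`), so their entry is 2. The diagonal holds each player's own count of 1s, which is never smaller than any other value in the same row, although another player can tie with it.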

In [4]:
# Saving the data
similarity_df.to_csv("../similarity_results/similarity_df.csv")

In [5]:
def getsimilarplayers(df_similarity, player_id, qtd_recs):
    """Returns the ids of the players most similar to a given player.

    Args:
        df_similarity (dataframe): Dataset containing similarity already calculated (use DataFrame.dot())
        player_id (str): player id to fetch recommendations
        qtd_recs (int): number of recommendations desired

    Returns:
        recs: list with the ids of the most similar players
    """
    # Dropping the player itself (it may not come first when there are ties)
    # and sorting the remaining players by similarity
    recs = df_similarity.loc[player_id].drop(player_id).sort_values(ascending=False).index.to_list()

    return recs[:qtd_recs]

def getplayersname(ids, df_players):
    """Fetches the name, position and team of the players in a list of ids.

    Args:
        ids (list): list of ids to search
        df_players (dataframe): dataset containing player data

    Returns:
        players_info: dataset with the player's name, position and team
    """
    # Passing an index and copying the dataset
    all_players = df_players.set_index('id').copy()

    # Searching for data of interest
    players_info = all_players.loc[ids, ['Player', 'Pos', 'Squad']]

    return players_info

def getsimilaritylvl(binarized_data, id_player, recommendation_ids):
    """Calculates the percentage of features on which each recommendation matches the player.

    Args:
        binarized_data (dataframe): dataset containing binarized data
        id_player (str): player id used to get recommendations
        recommendation_ids (list): list of ids recommended

    Returns:
        similarity_list: list with the similarity percentage of each recommendation
    """
    # Creating an empty list
    similarity_list = []

    # Calculating the similarity of each recommendation
    for rec_id in recommendation_ids:

        # Counting the columns with equal values and dividing by the total number of columns
        points_in_common = (binarized_data.loc[id_player] == binarized_data.loc[rec_id]).sum()
        len_data = len(binarized_data.loc[rec_id])

        # Rounding the values and adding them to the list
        similarity_percent = round((points_in_common/len_data)*100, 2)
        similarity_list.append(similarity_percent)

    return similarity_list
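
Note that the percentage computed above is not the same measure as the dot product used to rank the players: the dot product counts only the features that are 1 for both players, while the percentage also counts the features that are 0 for both. The sketch below illustrates the difference on two hand-made vectors (the values and the labels `a` to `e` are hypothetical).

```python
import pandas as pd

# Two hypothetical binarized players described by five features
player_a = pd.Series([1, 0, 0, 0, 1], index=list("abcde"))
player_b = pd.Series([1, 0, 0, 0, 0], index=list("abcde"))

# What the dot product counts: features equal to 1 for both players
shared_ones = int((player_a * player_b).sum())

# What the percentage counts: features with the same value, 0 or 1
matching_pct = round(float((player_a == player_b).mean()) * 100, 2)

print(shared_ones, matching_pct)
```

Here only one feature is shared as a 1, yet 80% of the columns match because of the common zeros. With sparse binarized data this may help explain why the percentages are high and close to each other, and why the order given by the two measures can differ.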

In [6]:
print("Looking for similar players........")
players = getsimilarplayers(similarity_df, '72d0e1b6', 10)
print("Looking for player information..........")
data_recs = getplayersname(players, players_data)
print("Calculating the similarity..........\n")
data_recs['Similarity_prcnt'] = getsimilaritylvl(final_data, '72d0e1b6', players)
print("Recommendations: \n\n")
print(data_recs)


Looking for similar players........
Looking for player information..........
Calculating the similarity..........

Recommendations: 


                      Player   Pos            Squad  Similarity_prcnt
id                                                                   
892d5bb1        Riyad Mahrez    FW  Manchester City             89.98
eb2fe5b6         Luis Muriel  FWMF         Atalanta             89.58
74596f1a      Bart Ramselaar  MFFW          Utrecht             89.58
1880614f        Cengiz Ünder  FWMF        Marseille             89.18
d3d89d8d        David Terans    MF   Atl Paranaense             88.78
70cf63ca          Rafa Silva  MFFW          Benfica             88.38
bc6bd723       Alassane Pléa  MFFW       M'Gladbach             88.38
24ce161c             Luciano    FW        São Paulo             88.38
81255c03          João Pedro    FW         Cagliari             88.38
8d78e732  Robert Lewandowski    FW    Bayern Munich             88.38


We obtained relevant recommendations, considering that the statistics of soccer players vary according to their position. I believe the chosen technique helped here, because it summarizes all the data in a single similarity index. That suits this case well: if I had used only one feature, such as goals, defensive players would be less relevant in the recommendations, whereas this technique lets me use other features such as interceptions and passes.
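
The position check mentioned at the beginning could be sketched as follows. This is only an illustration under two assumptions: that the `Pos` column concatenates two-letter codes (as the values `FW`, `FWMF` and `MFFW` in the output suggest), and that a recommendation "matches" when it shares at least one code with the player sought. The helpers `split_positions` and `position_match_rate` are hypothetical names, and the toy data is not taken from the real results.

```python
import pandas as pd

def split_positions(pos):
    """Splits a position string such as 'FWMF' into the set {'FW', 'MF'}."""
    return {pos[i:i + 2] for i in range(0, len(pos), 2)}

def position_match_rate(query_pos, recs):
    """Share of recommendations with at least one position code in common with the query."""
    query_codes = split_positions(query_pos)
    matches = recs["Pos"].apply(lambda p: bool(query_codes & split_positions(p)))
    return matches.mean()

# Small hand-made example (assumption: not real recommendation data)
toy_recs = pd.DataFrame({"Pos": ["FW", "FWMF", "MF", "DF"]}, index=["r1", "r2", "r3", "r4"])
rate = position_match_rate("FW", toy_recs)
print(rate)
```

With the real data, `recs` would be the `data_recs` dataframe and `query_pos` the `Pos` value of the player sought in `players_data`.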