# FIFA 19 Player Undervaluation Analysis

* **Student:** Shola Lajuwomi
* **Class:** AI

## Project Description

Develop a Jupyter Notebook (for Kaggle/GitHub) that performs unsupervised learning (DBSCAN) on the FIFA 19 player dataset to identify potentially undervalued player archetypes within specific positions (e.g., Strikers). The analysis focuses on clustering players based on position-specific composite skill metrics versus their market value ('Value'). The notebook will follow the structure required for the AI class assignment, including introduction, EDA, preprocessing, model training/tuning, and conclusion.

## Target Audience

-   AI Course Instructor (Mr. Dole) for assignment evaluation.
-   CS students interested in a practical unsupervised learning example.
-   Football analysts/enthusiasts interested in player valuation methods.

## Data Source

The analysis uses the "FIFA 19 Complete Player Dataset" available on Kaggle:
[https://www.kaggle.com/datasets/javagarm/fifa-19-complete-player-dataset](https://www.kaggle.com/datasets/javagarm/fifa-19-complete-player-dataset)

## 1. Introduction

**Problem:** Identifying potentially undervalued player archetypes in the FIFA 19 dataset using unsupervised learning. Standard valuation methods might overlook players who offer high skill relative to their market price.

**Goal:** To cluster players within selected positions (starting with Strikers 'ST') based on a composite skill score versus their market value ('Value_EUR'). The aim is to use DBSCAN to identify distinct groups, particularly focusing on clusters representing high-skill, low-value players ("undervalued archetypes").
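One way to picture the composite skill score is as a weighted average of position-relevant attributes. The sketch below is illustrative only: the attribute columns chosen (`Finishing`, `Positioning`, `ShotPower`, `Composure`), the weights, the `ST_Skill` column name, and the toy rows are all assumptions, not the weights or data used in this analysis.

```python
import pandas as pd

# Hypothetical striker weights: both the attribute choice and the numbers
# are illustrative assumptions, not the notebook's actual definition.
ST_WEIGHTS = {"Finishing": 0.4, "Positioning": 0.3, "ShotPower": 0.2, "Composure": 0.1}


def composite_skill(df, weights):
    """Return a weighted average of the given attribute columns.

    Weights are normalised so they sum to 1 before being applied.
    """
    w = pd.Series(weights, dtype=float)
    w = w / w.sum()
    # Multiply each attribute column by its weight, then sum across columns.
    return df[w.index].mul(w, axis=1).sum(axis=1)


# Toy rows (made up) standing in for two strikers.
toy = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Finishing": [90, 60],
    "Positioning": [80, 70],
    "ShotPower": [70, 80],
    "Composure": [60, 90],
    "Value_EUR": [5_000_000, 20_000_000],
})
toy["ST_Skill"] = composite_skill(toy, ST_WEIGHTS)
print(toy[["Name", "ST_Skill", "Value_EUR"]])
```

In this toy example Player A scores about 80 and Player B about 70, so Player A combines the higher score with the lower value, which is the kind of pairing the clustering step is meant to surface.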

**Data Source:** The analysis utilizes the "FIFA 19 Complete Player Dataset" sourced from Kaggle.
* Dataset Link: [https://www.kaggle.com/datasets/stefanoleone992/fifa-19-complete-player-dataset](https://www.kaggle.com/datasets/stefanoleone992/fifa-19-complete-player-dataset)
*(Note: This is the URL given in the technical specification. It differs from the URL listed under "Data Source" in the README above. This analysis uses the specification URL.)*

**Methodology Outline:**
1.  **Data Loading & Initial Exploration:** Load the dataset and perform preliminary checks.
2.  **Data Preprocessing:** Clean the data, handle missing values, convert data types (e.g., height, weight, currency).
3.  **Feature Engineering:** Define and calculate a composite skill score for the target position(s).
4.  **Exploratory Data Analysis (EDA):** Visualize distributions and relationships in the cleaned data and engineered features.
5.  **Feature Scaling:** Scale the selected features (skill score, value) for clustering.
6.  **Unsupervised Learning (DBSCAN):** Apply DBSCAN to the scaled data, including hyperparameter tuning.
7.  **Results Analysis:** Visualize clusters, identify undervalued groups, and examine sample players.
8.  **Conclusion:** Summarize findings, evaluate the model, discuss limitations, and suggest future work.
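Steps 5 and 6 of the outline above can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the notebook's tuned model: the data are randomly generated, the log transform of value is an assumption made because market values tend to be heavily skewed, and taking `eps` from a percentile of the k-distance curve is a simple stand-in for inspecting the curve's elbow by eye.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for (composite skill, market value); not real FIFA 19 data.
skill = rng.normal(70, 6, size=300)
value_eur = np.exp(rng.normal(14, 1, size=300))
# Assumption: cluster on log10(value) to reduce skew.
X = np.column_stack([skill, np.log10(value_eur)])

# Step 5: scale both features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Step 6a: k-distance heuristic for eps. When the training points are passed
# to kneighbors(), each point's nearest neighbour is itself, so the last
# column is the distance to its (min_samples - 1)-th other neighbour.
min_samples = 5
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_dist = np.sort(distances[:, -1])
eps = float(np.percentile(k_dist, 90))  # assumed cut-off, replaces visual elbow

# Step 6b: fit DBSCAN; the label -1 marks noise points.
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print(f"eps={eps:.3f}, clusters={n_clusters}, noise points={n_noise}")
```

Scaling matters here because DBSCAN relies on Euclidean distance: without it, the feature with the larger numeric range would dominate the neighbourhood calculation.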

In [2]:
#This cell imports necessary libraries and sets up the plotting environment.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
# Ensure plots are displayed inline in the notebook
%matplotlib inline
# Set a visually appealing style for the plots
sns.set_style('whitegrid')

print("Libraries imported successfully.")


## 2. Data Loading and Initial Exploration

This section loads the dataset from the specified CSV file into a pandas DataFrame and performs initial checks to understand its structure, data types, and basic statistics.

In [5]:

# Define the path to the dataset file
# Assumes the CSV file is in the same directory as the notebook, or in a './data/' subdirectory.
# Update this path if your file is located elsewhere.
data_path = 'FIFA_19_COMPLETE_PLAYER_DATASET.csv'
# data_path = './data/FIFA_19_COMPLETE_PLAYER_DATASET.csv' # Alternative if in a 'data' subfolder

# Attempt to load the dataset
try:
    fifa_df = pd.read_csv(data_path, encoding='Windows-1252')
    print(f"Dataset loaded successfully from '{data_path}'.")
except FileNotFoundError:
    print(f"Error: The file '{data_path}' was not found.")
    print("Please ensure the 'FIFA_19_COMPLETE_PLAYER_DATASET.csv' file is in the correct directory.")
    # Depending on the environment, you might want to stop execution here
    # For example, in a script: import sys; sys.exit()
    # In a notebook, this message serves as a clear warning.
    fifa_df = None # Set to None if loading failed

# Display the first few rows if the dataframe loaded successfully
if fifa_df is not None:
    print("Displaying the first 5 rows of the dataset:")
    # display() renders the DataFrame as a formatted table in Jupyter/Colab
    display(fifa_df.head())
else:
    print("Cannot display head because the dataset failed to load.")

Dataset loaded successfully from 'FIFA_19_COMPLETE_PLAYER_DATASET.csv'.
Displaying the first 5 rows of the dataset:


| Unnamed: 0.1 | Unnamed: 0 | ID | Name | Age | Photo | Nationality | Flag | Overall | Potential | Club | ... | Composure | Marking | StandingTackle | SlidingTackle | GKDiving | GKHandling | GKKicking | GKPositioning | GKReflexes | Release Clause |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 158023 | L. Messi | 31.0 | https://cdn.sofifa.org/players/4/19/158023.png | Argentina | https://cdn.sofifa.org/flags/52.png | 94.0 | 94 | FC Barcelona | ... | 96.0 | 33.0 | 28.0 | 26.0 | 6.0 | 11.0 | 15.0 | 14.0 | 8.0 | €226.5M |
| 1 | 1 | 20801 | Cristiano Ronaldo | 33.0 | https://cdn.sofifa.org/players/4/19/20801.png | Portugal | https://cdn.sofifa.org/flags/38.png | 94.0 | 94 | Juventus | ... | 95.0 | 28.0 | 31.0 | 23.0 | 7.0 | 11.0 | 15.0 | 14.0 | 11.0 | €127.1M |
| 2 | 2 | 190871 | Neymar Jr | 26.0 | https://cdn.sofifa.org/players/4/19/190871.png | Brazil | https://cdn.sofifa.org/flags/54.png | 92.0 | 93 | Paris Saint-Germain | ... | 94.0 | 27.0 | 24.0 | 33.0 | 9.0 | 9.0 | 15.0 | 15.0 | 11.0 | €228.1M |
| 3 | 3 | 193080 | De Gea | 27.0 | https://cdn.sofifa.org/players/4/19/193080.png | Spain | https://cdn.sofifa.org/flags/45.png | 91.0 | 93 | Manchester United | ... | 68.0 | 15.0 | 21.0 | 13.0 | 90.0 | 85.0 | 87.0 | 88.0 | 94.0 | €138.6M |
| 4 | 4 | 192985 | K. De Bruyne | 27.0 | https://cdn.sofifa.org/players/4/19/192985.png | Belgium | https://cdn.sofifa.org/flags/7.png | 91.0 | 92 | Manchester City | ... | 88.0 | 68.0 | 58.0 | 51.0 | 15.0 | 13.0 | 5.0 | 10.0 | 13.0 | €196.4M |
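
The `Release Clause` column in the output above holds currency strings such as `€226.5M`, which must become numbers before any clustering on monetary value. Below is a minimal sketch of such a conversion. The helper name `parse_eur` is hypothetical, and support for a `K` (thousands) suffix is an assumption, since only `M` values appear in the rows shown.

```python
import re

# Suffix multipliers; "K" is assumed, only "M" is visible in the sample rows.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_eur(text):
    """Convert a string like '€226.5M' to a float amount in euros.

    Returns None for non-strings (e.g. NaN) and for unrecognised formats.
    """
    if not isinstance(text, str):
        return None
    match = re.fullmatch(r"€?\s*(\d+(?:\.\d+)?)\s*([KM]?)", text.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    return float(number) * _MULTIPLIERS.get(suffix, 1)


for raw in ["€226.5M", "€500K", "€0", None]:
    print(repr(raw), "->", parse_eur(raw))
```

With pandas, the helper could be applied column-wise, for example `fifa_df['Release Clause'].map(parse_eur)`. Returning `None` for unparseable entries keeps missing values visible for the later missing-value handling step instead of silently turning them into zeros.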
