# Notebook 1 — Data Acquisition and Setup


### Background & Goal(s)

**Description:** The FC 26 player CSV file contains every base (no special cards), actively playing soccer player included in EA Sports' FC 26 video game.

**Acquisition:** The dataset is available on Kaggle.com under the name "FC 26 (FIFA 26) Player Data." It was "curated by web scraping sofifa.com" by Rovnev.

**Dataset size & description:** Each of the 18,405 observations (rows) in the dataset is a player, and each of the 110 columns is an attribute (name, country, overall, wages in euros, etc.). To use this dataset, I downloaded the 10.58 MB CSV file and stored it in my Design Final Project directory (the working directory).

**Goal(s):** I'll use this dataset to explore how a player's overall rating, potential rating, and value (in euros) relate to their age, position, in-game capabilities (pace, shooting, passing, etc.), and nationality. In doing so, **I aim to find which in-game attributes influence a player's value the most and whether a player's potential, age, and nationality correlate with their value**. The following columns are needed:

* Player Basics

    * player_id 

    * short_name

    * age

    * nationality_name

    * player_positions

* Overall Ratings

    * overall

    * potential

* Financial

    * value_eur

* Core In-Game Attributes

    * pace

    * shooting

    * passing

    * dribbling

    * defending

    * physic
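The column list above could be turned into a small selection step once the CSV is loaded. The sketch below is only one way to do it: `REQUIRED_COLUMNS` and `select_required` are hypothetical names introduced here, and the toy frame stands in for the real data.

```python
import pandas as pd

# Column names copied from the list above.
REQUIRED_COLUMNS = [
    "player_id", "short_name", "age", "nationality_name", "player_positions",
    "overall", "potential",
    "value_eur",
    "pace", "shooting", "passing", "dribbling", "defending", "physic",
]


def select_required(df: pd.DataFrame, columns=REQUIRED_COLUMNS) -> pd.DataFrame:
    """Return a copy of df restricted to `columns`; raise if any are absent."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[list(columns)].copy()


# Toy frame with placeholder values, standing in for the real CSV (assumption).
toy = pd.DataFrame({col: [0] for col in REQUIRED_COLUMNS})
toy["wage_eur"] = [0]  # an extra column that the selection should drop
subset = select_required(toy)
print(subset.shape)
```

Raising on a missing name, rather than silently skipping it, surfaces spelling mismatches between this list and the CSV header early.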

## **Guiding Question to be Answered:** What makes a player expensive?

### Original Dataset

#### Reading in the Dataset

In [1]:
import pandas as pd

df = pd.read_csv("FC26_20250921.csv")
df.head()


Unnamed: 0,player_id,player_url,fifa_version,fifa_update,fifa_update_date,short_name,long_name,player_positions,overall,potential,...,cdm,rdm,rwb,lb,lcb,cb,rcb,rb,gk,player_face_url
0,252371,/player/252371/jude-bellingham/260004/,26,4,2025-09-19,J. Bellingham,Jude Victor William Bellingham,"CAM, CM",90,94,...,85+3,85+3,83+3,82+3,81+3,81+3,81+3,82+3,18+3,https://cdn.sofifa.net/players/252/371/26_120.png
1,239053,/player/239053/federico-valverde/260004/,26,4,2025-09-19,F. Valverde,Federico Santiago Valverde Dipetta,"CM, CDM, RB",89,90,...,87+3,87+3,86+3,86+3,83+3,83+3,83+3,86+3,18+3,https://cdn.sofifa.net/players/239/053/26_120.png
2,212622,/player/212622/joshua-kimmich/260004/,26,4,2025-09-19,J. Kimmich,Joshua Walter Kimmich,"CDM, RB, CM",89,89,...,87+2,87+2,86+3,85+3,82+3,82+3,82+3,85+3,21+3,https://cdn.sofifa.net/players/212/622/26_120.png
3,235212,/player/235212/achraf-hakimi/260004/,26,4,2025-09-19,A. Hakimi,Achraf Hakimi Mouhأشرف حكيمي,"RB, RM",89,90,...,83+3,83+3,86+3,86+3,81+3,81+3,81+3,86+3,17+3,https://cdn.sofifa.net/players/235/212/26_120.png
4,224232,/player/224232/nicolo-barella/260004/,26,4,2025-09-19,N. Barella,Nicolò Barella,CM,87,87,...,85+2,85+2,84+3,83+3,80+3,80+3,80+3,83+3,19+3,https://cdn.sofifa.net/players/224/232/26_120.png


#### Dataset Information

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18405 entries, 0 to 18404
Columns: 110 entries, player_id to player_face_url
dtypes: float64(16), int64(48), object(46)
memory usage: 15.4+ MB


#### Dataset Description

In [3]:
df.describe()

Unnamed: 0,player_id,fifa_version,fifa_update,overall,potential,value_eur,wage_eur,age,height_cm,weight_kg,...,mentality_composure,defending_marking_awareness,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes,goalkeeping_speed
count,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,...,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,18405.0,2062.0
mean,207378.375822,26.0,4.0,65.766965,71.165173,2931633.0,10096.05542,25.222548,182.001358,75.141755,...,57.747786,46.253572,48.435914,46.248737,16.285683,16.082749,15.998642,16.139147,16.343928,34.489816
std,73761.964821,0.0,0.0,6.980628,6.403862,7947787.0,20082.981891,4.773553,6.891484,6.891777,...,12.257447,20.723894,21.038765,20.575523,17.589875,17.002686,16.770157,17.162219,17.859712,10.552502
min,19541.0,26.0,4.0,47.0,49.0,0.0,0.0,16.0,155.0,47.0,...,15.0,5.0,7.0,6.0,2.0,2.0,2.0,2.0,2.0,15.0
25%,195859.0,26.0,4.0,61.0,67.0,475000.0,1000.0,21.0,177.0,70.0,...,50.0,28.0,29.0,26.0,8.0,8.0,8.0,8.0,8.0,26.0
50%,237681.0,26.0,4.0,66.0,71.0,1000000.0,4000.0,25.0,182.0,75.0,...,59.0,52.0,56.0,53.0,11.0,11.0,11.0,11.0,11.0,33.0
75%,260601.0,26.0,4.0,70.0,75.0,2100000.0,10000.0,29.0,187.0,80.0,...,66.0,63.0,65.0,63.0,14.0,14.0,14.0,14.0,14.0,43.0
max,280142.0,26.0,4.0,91.0,95.0,174500000.0,610000.0,44.0,210.0,105.0,...,93.0,91.0,91.0,89.0,90.0,90.0,91.0,90.0,90.0,65.0
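In the summary above, `value_eur` has a minimum of 0, a mean (about 2.93 million) well above its median (1.0 million), and a maximum of 174.5 million, which suggests a strongly right-skewed column. A minimal sketch of how that could be checked and softened with a log transform follows. `value_profile` is a hypothetical helper, and the five toy values are borrowed from the min, quartile, and max rows above purely for illustration.

```python
import numpy as np
import pandas as pd


def value_profile(df: pd.DataFrame, col: str = "value_eur") -> dict:
    """Summarise zero counts and skew for a monetary column."""
    values = df[col]
    return {
        "n_zero": int((values == 0).sum()),
        "mean": float(values.mean()),
        "median": float(values.median()),
        "skew": float(values.skew()),
    }


# Five illustrative values (min, 25%, 50%, 75%, max from the summary above).
toy = pd.DataFrame({"value_eur": [0, 475_000, 1_000_000, 2_100_000, 174_500_000]})
profile = value_profile(toy)

# log1p keeps zero-valued rows usable, since log1p(0) == 0.
toy["log_value_eur"] = np.log1p(toy["value_eur"])
print(profile["n_zero"], profile["mean"] > profile["median"])
```

Whether rows with a value of 0 should be kept, dropped, or treated as missing is a separate decision that this sketch does not make.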


### Dataset Features (COLS Table)

| Column Name          | Type   | Description                                                                      |
| -------------------- | ------ | -------------------------------------------------------------------------------- |
| **player_id**        | int    | Unique identifier for each player in the dataset.                                |
| **short_name**       | string | Common display name of the player (used in-game).                                |
| **age**              | int    | Age of the player at the time of the FC26 dataset update.                        |
| **nationality_name** | string | Country the player represents.                                                   |
| **player_positions** | string | Primary and secondary positions the player can play (e.g., “ST”, “CM”, “CB, RM”). |
| **overall**          | int    | Player’s current overall rating based on all in-game attributes.                 |
| **potential**        | int    | Maximum projected future rating of the player.                                   |
| **value_eur**        | int    | Player’s estimated market value in euros (€).                                    |
| **wage_eur**         | int    | Player’s weekly wage in euros (€).                                               |
| **pace**             | int    | Composite attribute measuring acceleration and sprint speed.                     |
| **shooting**         | int    | Player’s ability to finish, take long shots, and score.                          |
| **passing**          | int    | Accuracy and quality of short, long, and through passes.                         |
| **dribbling**        | int    | Player’s ball control, agility, and dribbling skill.                             |
| **defending**        | int    | Tackling, interception, and overall defensive ability.                           |
| **physic**           | int    | Physical strength, stamina, aggression, and jumping.                             |
