# Analysis and Implementation of Association Rules in Fantasy Premier League (FPL)

This notebook applies the Apriori algorithm to the ``merged_gw_processed.csv`` dataset, containing detailed player performance data for the 2024/2025 Fantasy Premier League (FPL) season. The goal is to identify relevant patterns in player behavior and performance across Gameweeks, creating useful association rules for descriptive analysis and potential decision-making support.

Fantasy Premier League is a game where millions of users build football teams based on real Premier League players. The athletes' actual performance translates into points used to compete in private or global leagues. Thus, understanding relationships between performances, match conditions, and specific events can help derive useful knowledge about trends and synergies within player performance.

This notebook assumes that the dataset has already been cleaned and discretized in the EDA phase, making it suitable for applying frequent pattern mining algorithms.

## Business Objectives

The application of association rules in this context aims to achieve the following:

### Identify frequent patterns in player performance

Examples:
- Players who score goals also tend to receive bonus points.
- Cheap players with high "creativity_tier" tend to score above-average points.

### Discover attribute combinations that occur together
Such as:
- "starts=1" + "value_tier=cheap" + "points_tier=high".

### Generate association rules that allow interpreting links between game events
Such as:
- Players who provide assists also tend to have higher "influence_tier".

### Create insights for descriptive and exploratory analyses
Allowing for:
- Better characterization of player types.
- Support for future predictive analyses.
- Identification of recurring patterns throughout the season.

This notebook focuses on the extraction and evaluation of these rules using the **Apriori** algorithm.

## Loading the Processed Dataset (Python)

In [8]:
import pandas as pd

df = pd.read_csv("../data/processed/merged_gw_processed.csv")
df.head()

Unnamed: 0,assists,bonus,clean_sheets,goals_conceded,goals_scored,own_goals,penalties_missed,penalties_saved,red_cards,starts,...,xP_Medium,bps_High,bps_Low,bps_Medium,Points_High,Points_Low,Points_Medium,Value_Budget,Value_Mid,Value_Premium
0,0,0,0,1,0,0,0,0,0,1,...,1,0,0,1,0,0,1,0,1,0
1,0,0,0,1,0,0,0,0,0,1,...,0,0,1,0,0,1,0,0,0,1
2,0,0,0,1,0,0,0,0,0,1,...,1,1,0,0,0,0,1,1,0,0
3,0,0,0,1,0,0,0,0,0,1,...,0,1,0,0,0,0,1,0,0,1
4,0,0,0,1,0,0,0,0,0,1,...,0,0,0,1,0,1,0,0,1,0


## Final Dataset Structure for Apriori

Before applying Apriori, it is necessary to confirm that the dataset is in a suitable format:

- All categorical columns are discretized.
- There are no continuous numeric columns.
- There are no null values.
- The features represent binary attributes or discrete categories.
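
The checklist above can be turned into a small automated check. The sketch below is illustrative only: `check_binary_frame` is a hypothetical helper, the toy frame stands in for the processed dataset, and treating any positive count as "event happened" is an assumption that may not suit every column.

```python
import pandas as pd

def check_binary_frame(frame):
    """Report nulls, non-numeric columns and columns with values outside {0, 1}."""
    non_numeric = [c for c in frame.columns
                   if not pd.api.types.is_numeric_dtype(frame[c])]
    non_binary = [c for c in frame.columns
                  if c not in non_numeric
                  and not set(frame[c].dropna().unique()) <= {0, 1}]
    return {
        "null_cells": int(frame.isna().sum().sum()),
        "non_numeric_columns": non_numeric,
        "non_binary_columns": non_binary,
    }

# Toy frame standing in for the processed dataset (values are illustrative only)
toy = pd.DataFrame({
    "starts": [1, 0, 1],
    "goals_scored": [0, 2, 1],   # a raw count that has not been binarised
    "Value_Budget": [1, 0, 0],
})
report = check_binary_frame(toy)
print(report)

# One possible fix (an assumption): any positive count becomes True
toy_bool = toy.gt(0)
print(toy_bool.dtypes)
```

A boolean frame is also the input type that `mlxtend` documents for `apriori`, so a cast like this may remove the need to silence warnings later in the notebook.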

The final structure of the dataset is presented below.

In [9]:
df.info() 
df.describe(include='all')

# Count the values above 1 in each non-object column
for column in df.columns:
    if df[column].dtype != 'object':  # Check if the column is not of object type
        count_above_one = (df[column] > 1).sum()
        print(f"Column '{column}' has {count_above_one} values above 1.")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6347 entries, 0 to 6346
Data columns (total 70 columns):
 #   Column                             Non-Null Count  Dtype
---  ------                             --------------  -----
 0   assists                            6347 non-null   int64
 1   bonus                              6347 non-null   int64
 2   clean_sheets                       6347 non-null   int64
 3   goals_conceded                     6347 non-null   int64
 4   goals_scored                       6347 non-null   int64
 5   own_goals                          6347 non-null   int64
 6   penalties_missed                   6347 non-null   int64
 7   penalties_saved                    6347 non-null   int64
 8   red_cards                          6347 non-null   int64
 9   starts                             6347 non-null   int64
 10  was_home                           6347 non-null   int64
 11  yellow_cards                       6347 non-null   int64
 12  pos_DEF             

## Apriori Algorithm

Apriori is a classic frequent pattern mining algorithm. It operates in two phases:

### 1. Frequent Itemset Generation
Identifies combinations of attributes that occur together above a minimum support threshold.
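
To make the level-wise idea concrete, here is a minimal pure-Python sketch of frequent itemset generation. It is not how `mlxtend` implements the search; `frequent_itemsets` and the toy transactions are hypothetical and exist only for illustration. The key step is the pruning: a candidate of size k is only counted if all of its subsets of size k-1 were frequent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_len=3):
    """Level-wise (Apriori-style) search; returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current and k <= max_len:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result

# Toy transactions (illustrative only)
toy = [
    {"starts", "goals_scored", "bonus"},
    {"starts", "goals_scored", "bonus"},
    {"starts", "assists"},
    {"starts"},
    {"was_home"},
]
found = frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy data, `assists` and `was_home` fall below the threshold at level 1, so no larger itemset containing them is ever counted.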

### 2. Association Rule Generation
From the frequent itemsets, it creates rules of the type:

A → B

which represent:

> whenever set A occurs, set B also tends to occur.

The metrics evaluated will be:
- **Support** → joint frequency
- **Confidence** → conditional probability
- **Lift** → strength of the association compared to random chance
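
The three metrics can be computed directly from a boolean table, which helps when checking individual rules by hand. The following is a small sketch under stated assumptions: `rule_metrics` is a hypothetical helper, the toy data is invented, and the antecedent is assumed to occur at least once (otherwise the confidence is undefined).

```python
import pandas as pd

def rule_metrics(frame, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent on a boolean frame."""
    has_a = frame[list(antecedent)].all(axis=1)   # rows containing all antecedent items
    has_b = frame[list(consequent)].all(axis=1)   # rows containing all consequent items
    support = (has_a & has_b).mean()              # P(A and B)
    confidence = support / has_a.mean()           # P(B | A)
    lift = confidence / has_b.mean()              # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}

# Toy data: 10 appearances, 2 with a goal, 3 with bonus points (illustrative only)
toy = pd.DataFrame({
    "goals_scored": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "bonus":        [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}).astype(bool)

m = rule_metrics(toy, ["bonus"], ["goals_scored"])
print({name: round(float(value), 3) for name, value in m.items()})
```

A lift of 1 means the antecedent and consequent co-occur as often as independence would predict; values above 1 indicate a positive association.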

This notebook implements Apriori using the ``mlxtend`` library.


## First Iteration of the Apriori Algorithm

In this section, the notebook applies the Apriori algorithm for the first time to the transformed and prepared dataset. The goal of this step is to identify **simple frequent patterns**, of size up to 3 items, to understand which attribute combinations occur most frequently among players in the Fantasy Premier League.

This first iteration serves as an exploratory phase, allowing us to:
- Evaluate the density of frequent combinations.
- Verify if the discretization levels and one-hot encoding are producing coherent patterns.
- Obtain initial itemsets that will serve as a basis for rule generation (in the next phase).

The following parameters were used:
- `min_support = 0.03`: only combinations present in at least 3% of the records.
- `max_len = 3`: intentional limitation to capture only simple combinations.
- `use_colnames = True`: to ensure result readability.

After obtaining all itemsets, those containing **exactly 2 or 3 items** were filtered, as they are considered the most relevant for pattern analysis and potential rules.

In [None]:
!pip install mlxtend

In [11]:
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings("ignore")

# Applying Apriori – first iteration
frequent_items = apriori(df, min_support=0.03, max_len=3, use_colnames=True)

num_itemsets = len(frequent_items)
print(f'Number of itemsets: {num_itemsets}')

itemsets_2_3 = frequent_items[
    frequent_items['itemsets'].apply(lambda x: 2 <= len(x) <= 3)
]

itemsets_2_3

Number of itemsets: 7111


Unnamed: 0,support,itemsets
64,0.050733,"(goals_conceded, assists)"
65,0.067591,"(starts, assists)"
66,0.040334,"(was_home, assists)"
67,0.051520,"(pos_MID, assists)"
68,0.078620,"(save_null, assists)"
...,...,...
7106,0.049315,"(Value_Mid, Points_Low, bps_Medium)"
7107,0.037025,"(Value_Premium, Points_Low, bps_Medium)"
7108,0.054829,"(Value_Budget, Points_Medium, bps_Medium)"
7109,0.053726,"(Value_Mid, Points_Medium, bps_Medium)"


### Evaluation of Frequent Patterns (First Iteration)

The first execution of the Apriori algorithm generated a total of **7111 itemsets**, a significant part of which corresponds to combinations with 2 or 3 attributes, exactly the goal for this exploratory phase.

#### General Interpretation of Results

In general, the patterns obtained present:
- **Contextual coherence** (e.g., ``starts`` frequently co-occurs with attacking outcomes such as ``assists``).
- **Positional coherence** (e.g., ``pos_MID`` appears in itemsets with ``assists``, in line with expected relationships).
- **Match-context coherence** (e.g., ``was_home`` appears combined with ``assists``; support alone does not demonstrate a home advantage, which requires confidence or lift).
- **Patterns derived from discretization** (e.g., ``Value_Mid``, ``bps_Medium``, ``Points_Medium`` appear frequently combined, characterizing player profiles).

The itemsets derived from the discretized columns are particularly interesting, being able to **reveal profiles of players with similar value, similar BPS contribution, and similar scoring levels**, which is extremely useful for generating association rules.

#### Examples of Relevant Patterns That Emerged

- `(starts, assists)` (support ≈ 0.068) indicates that assists are mostly recorded by players who started the match.
- `(was_home, assists)` (support ≈ 0.040) shows that home assists are frequent enough to pass the threshold; whether assists are more likely at home can only be judged once lift is computed.
- `(pos_MID, assists)` (support ≈ 0.052) is consistent with midfielders being major providers of assists.
- Itemsets with tiers, such as `(Value_Premium, bps_Medium, Points_Medium)`, show that expensive players do not always deliver high returns, a surprising and potentially valuable result.

Overall, the patterns are coherent, informative, and sufficient to proceed to association rules.

## Second Iteration of the Apriori Algorithm

After the first iteration of the Apriori algorithm, it was verified that:

- More than **7000 itemsets** (with combinations of 2 to 3 items) were obtained.
- Many associations were too obvious or trivial, such as:
  - (pos_MID, assists)
  - (bps_Medium, Value_Mid)
  - (Value_Budget, Points_Low)

Although useful for understanding the data structure, these itemsets did not yet capture truly strong or surprising patterns that could generate deep insights into player behavior in FPL.

Thus, in this second iteration, we proceed with strategic adjustments:

- Increase minimum support to focus on more consistent associations.
- Extract association rules (confidence, lift), absent in the first phase.
- Filter for stronger rules (lift ≥ 1.1).
- Keep only rules with 2 to 3 items for analytical coherence.

The goal is to refine the discovered patterns and bring the analysis closer to relationships that are truly relevant for decision-making.

### Implementation: Second Iteration (Apriori Parameter Tuning)

#### 1. Increasing min_support (0.03 → 0.05)
The goal is to eliminate patterns that are too rare and concentrate attention on behaviors that manifest consistently across Gameweeks. This reduces noise and makes the rules more robust.

#### 2. Extraction of Association Rules
In this phase, we introduce:
- **Support**
- **Confidence**
- **Lift**

The first iteration only extracted itemsets, not rules.

The parameter ``metric="lift"`` + ``min_threshold=1.1`` allows identifying statistically more useful relationships.
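
Conceptually, rule extraction enumerates every way of splitting a frequent itemset into an antecedent and a consequent, then keeps the splits whose metric reaches the threshold. The sketch below illustrates that idea in pure Python; it is not `mlxtend`'s implementation, and `rules_from_itemsets` and the support values are hypothetical.

```python
from itertools import combinations

def rules_from_itemsets(supports, min_lift=1.1):
    """Enumerate antecedent -> consequent splits of each frequent itemset.

    `supports` maps frozenset -> support and must contain every subset of
    every itemset, which a table of frequent itemsets guarantees.
    """
    rules = []
    for itemset, joint in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                confidence = joint / supports[antecedent]
                lift = confidence / supports[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

# Hypothetical supports, chosen only to make the arithmetic easy to follow
supports = {
    frozenset({"goals_scored"}): 0.10,
    frozenset({"bonus"}): 0.10,
    frozenset({"starts"}): 0.80,
    frozenset({"goals_scored", "bonus"}): 0.07,
    frozenset({"goals_scored", "starts"}): 0.08,
}
kept = rules_from_itemsets(supports)
for antecedent, consequent, confidence, lift in kept:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2), round(lift, 2))
```

Lift is symmetric, so both directions of a pair share the same value and pass or fail the threshold together; only confidence depends on the direction. Here the `goals_scored`/`starts` pair has lift 1 and is dropped in both directions.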

#### 3. Filtering for 2 to 3 Item Rules
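Maintains coherence with the previous analysis. Every rule already contains at least two items (one antecedent and one consequent), and ``max_len=3`` caps itemsets at three items, so this filter mainly makes the size constraint explicit. It guards against:
- overly complex rules (4+ items), should ``max_len`` be raised later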

The goal is to focus on the most interpretable and relevant associations.

In [12]:
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings("ignore")

# Second iteration of Apriori — higher support and rule extraction
frequent_items_v2 = apriori(
    df,
    min_support=0.05,     # previously 0.03 — stricter in this phase
    max_len=3,
    use_colnames=True
)

print(f"Number of itemsets after second iteration: {len(frequent_items_v2)}")

# Extraction of association rules
rules_v2 = association_rules(
    frequent_items_v2,
    metric="lift",
    min_threshold=1.1     # focus on stronger relationships
)

# Filter rules with a total of 2 or 3 items
rules_v2["num_items"] = rules_v2["antecedents"].apply(len) + rules_v2["consequents"].apply(len)
rules_v2_filtered = rules_v2[rules_v2["num_items"].between(2, 3)]

# Show top rules by lift
rules_v2_filtered.sort_values("lift", ascending=False).head(20)

Number of itemsets after second iteration: 4840


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski,num_items
838,"(bonus, threat_(8.0, 120.0])",(goals_scored),0.074996,0.086813,0.061604,0.821429,9.462082,1.0,0.055093,5.113849,0.966823,0.61478,0.804453,0.765524,3
843,(goals_scored),"(bonus, threat_(8.0, 120.0])",0.086813,0.074996,0.061604,0.709619,9.462082,1.0,0.055093,3.185482,0.979334,0.61478,0.686076,0.765524,3
844,"(bonus, expected_goal_involvements_High)",(goals_scored),0.074523,0.086813,0.057665,0.773784,8.913266,1.0,0.051195,4.0368,0.959298,0.556231,0.752279,0.719016,3
849,(goals_scored),"(bonus, expected_goal_involvements_High)",0.086813,0.074523,0.057665,0.664247,8.913266,1.0,0.051195,2.75642,0.972208,0.556231,0.637211,0.719016,3
826,"(bonus, ict_index_High)",(goals_scored),0.088546,0.086813,0.062392,0.704626,8.11663,1.0,0.054705,3.091634,0.961975,0.552301,0.676546,0.71166,3
831,(goals_scored),"(bonus, ict_index_High)",0.086813,0.088546,0.062392,0.718693,8.11663,1.0,0.054705,3.240073,0.960149,0.552301,0.691365,0.71166,3
812,"(bonus, save_null)",(goals_scored),0.096108,0.086813,0.062392,0.64918,7.477945,1.0,0.054048,2.603011,0.958382,0.517647,0.615829,0.683937,3
815,(goals_scored),"(bonus, save_null)",0.086813,0.096108,0.062392,0.718693,7.477945,1.0,0.054048,3.213189,0.948626,0.517647,0.688783,0.683937,3
850,"(bonus, xP_High)",(goals_scored),0.083819,0.086813,0.054199,0.646617,7.448412,1.0,0.046922,2.584126,0.944948,0.465494,0.613022,0.635468,3
855,(goals_scored),"(bonus, xP_High)",0.086813,0.083819,0.054199,0.624319,7.448412,1.0,0.046922,2.438723,0.948046,0.465494,0.589949,0.635468,3


### Interpretation of Second Iteration Results

The analysis revealed a consistent set of extremely strong rules, especially involving:

### Most frequent pattern:

**Players who scored goals → have high threat and receive bonus points**

Examples:
- `(goals_scored) → (threat_(8.0,120.0], bonus)`
  - conf = 0.71
  - lift = 9.46

- `(threat_(8.0,120.0], bonus) → goals_scored`
  - conf = 0.82
  - lift = 9.46

This is a very strong association:
> Rows with high threat + bonus are about **9.5 times more likely** to include a goal than an arbitrary row (lift 9.46).

### Another strong pattern:

**Goals Scored → ICT High + Bonus**

- lift ≈ 8.1  
- Confidence ≈ 0.70 to 0.72, depending on the direction of the rule

This pattern confirms coherence in the dataset:
> Players who score goals usually generate high impact (ICT High) and accumulate bonus points.

### Extracted global offensive pattern:

Players with:
- *Threat high*
- *Influence high*
- *Expected goal involvements high*
- *xP high*

→ **Have a much higher than average probability of scoring goals and obtaining bonus points.**

### Interesting detail:

Inverse rules also appear strong:
- `(bonus, xP_High) → goals_scored`  
- `(goals_scored, xP_High) → bonus`

Showing a circular reinforcement relationship between:
- expected points (xP)
- actual performance (goals)
- additional scoring (bonus)

### General assessment:

The rules reveal **internal coherence** and strong associations, with lifts between roughly **7 and 9.5** among the top rules, which is very high.

These results suggest that:
- The discretization produced meaningful tiers
- The dataset contains structure
- The advanced metrics are helping to isolate real patterns

## Third Iteration: Final Rule Optimization and Extraction of Most Relevant Patterns

#### Objective
Refine the set of rules even further, focusing exclusively on:
- **Short** rules (2–3 items),
- **Highly reliable** rules (confidence ≥ 0.75),
- **Statistically strong** rules (lift ≥ 1.2)
- Removing redundancy and overlap.

#### 1. Advanced Rule Filtering (Rule Size, Confidence, Lift)

In [13]:
filtered_rules = rules_v2[
    (rules_v2['antecedents'].apply(len).between(1,2)) &
    (rules_v2['consequents'].apply(len).between(1,2)) &
    (rules_v2['confidence'] >= 0.75) &
    (rules_v2['lift'] >= 1.2)
].sort_values(by='lift', ascending=False)

filtered_rules.head(20)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski,num_items
838,"(bonus, threat_(8.0, 120.0])",(goals_scored),0.074996,0.086813,0.061604,0.821429,9.462082,1.0,0.055093,5.113849,0.966823,0.61478,0.804453,0.765524,3
844,"(bonus, expected_goal_involvements_High)",(goals_scored),0.074523,0.086813,0.057665,0.773784,8.913266,1.0,0.051195,4.0368,0.959298,0.556231,0.752279,0.719016,3
852,"(xP_High, goals_scored)",(bonus),0.070585,0.103513,0.054199,0.767857,7.417944,1.0,0.046892,3.861788,0.930899,0.452037,0.741053,0.645725,3
822,"(played_60+, goals_scored)",(bonus),0.073736,0.103513,0.055459,0.752137,7.266076,1.0,0.047827,3.616859,0.931024,0.455369,0.723517,0.643953,3
3613,(goals_scored),"(threat_(8.0, 120.0], Points_High)",0.086813,0.147944,0.085237,0.981851,6.636645,1.0,0.072394,46.948291,0.930063,0.570074,0.9787,0.778998,3
3559,(goals_scored),"(threat_(8.0, 120.0], influence_High)",0.086813,0.148889,0.085237,0.981851,6.594507,1.0,0.072312,46.896203,0.929008,0.566492,0.978676,0.777169,3
3607,(goals_scored),"(threat_(8.0, 120.0], bps_High)",0.086813,0.154404,0.083977,0.967332,6.264956,1.0,0.070572,25.884644,0.920273,0.534068,0.961367,0.755605,3
3565,(goals_scored),"(influence_High, expected_goal_involvements_High)",0.086813,0.155507,0.078462,0.903811,5.812047,1.0,0.064962,8.779545,0.906652,0.478846,0.886099,0.704185,3
3637,(goals_scored),"(expected_goal_involvements_High, Points_High)",0.086813,0.155664,0.078462,0.903811,5.806164,1.0,0.064949,8.777907,0.906462,0.478386,0.886078,0.70393,3
743,(assists),"(creativity_High, Points_High)",0.079092,0.139594,0.062707,0.792829,5.679553,1.0,0.051666,4.153116,0.894693,0.40202,0.759217,0.621019,3


#### 2. Removal of Redundant Rules

In [14]:
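def remove_redundancy(rules_df):
    """Keep the first occurrence of each (antecedents, consequents) pair."""
    unique_rules = []
    seen = set()

    for _, row in rules_df.iterrows():
        # frozensets are hashable, so the pair can be stored in a set
        rule = (frozenset(row['antecedents']), frozenset(row['consequents']))
        if rule not in seen:
            seen.add(rule)
            unique_rules.append(row)

    # Rebuild a DataFrame from the retained rows (original index labels are kept)
    return pd.DataFrame(unique_rules)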

clean_rules = remove_redundancy(filtered_rules)
clean_rules.head(20)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski,num_items
838,"(bonus, threat_(8.0, 120.0])",(goals_scored),0.074996,0.086813,0.061604,0.821429,9.462082,1.0,0.055093,5.113849,0.966823,0.61478,0.804453,0.765524,3
844,"(bonus, expected_goal_involvements_High)",(goals_scored),0.074523,0.086813,0.057665,0.773784,8.913266,1.0,0.051195,4.0368,0.959298,0.556231,0.752279,0.719016,3
852,"(xP_High, goals_scored)",(bonus),0.070585,0.103513,0.054199,0.767857,7.417944,1.0,0.046892,3.861788,0.930899,0.452037,0.741053,0.645725,3
822,"(played_60+, goals_scored)",(bonus),0.073736,0.103513,0.055459,0.752137,7.266076,1.0,0.047827,3.616859,0.931024,0.455369,0.723517,0.643953,3
3613,(goals_scored),"(threat_(8.0, 120.0], Points_High)",0.086813,0.147944,0.085237,0.981851,6.636645,1.0,0.072394,46.948291,0.930063,0.570074,0.9787,0.778998,3
3559,(goals_scored),"(threat_(8.0, 120.0], influence_High)",0.086813,0.148889,0.085237,0.981851,6.594507,1.0,0.072312,46.896203,0.929008,0.566492,0.978676,0.777169,3
3607,(goals_scored),"(threat_(8.0, 120.0], bps_High)",0.086813,0.154404,0.083977,0.967332,6.264956,1.0,0.070572,25.884644,0.920273,0.534068,0.961367,0.755605,3
3565,(goals_scored),"(influence_High, expected_goal_involvements_High)",0.086813,0.155507,0.078462,0.903811,5.812047,1.0,0.064962,8.779545,0.906652,0.478846,0.886099,0.704185,3
3637,(goals_scored),"(expected_goal_involvements_High, Points_High)",0.086813,0.155664,0.078462,0.903811,5.806164,1.0,0.064949,8.777907,0.906462,0.478386,0.886078,0.70393,3
743,(assists),"(creativity_High, Points_High)",0.079092,0.139594,0.062707,0.792829,5.679553,1.0,0.051666,4.153116,0.894693,0.40202,0.759217,0.621019,3
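
The table above is identical to the filtered one because a rule table built from frequent itemsets contains each (antecedents, consequents) pair only once, so exact-pair deduplication has nothing to remove. A stricter notion of redundancy is to keep a single rule per underlying itemset, for example the direction with the highest confidence. The sketch below shows one way to do that; `keep_best_per_itemset` is a hypothetical helper, and the choice of confidence as the tie-breaker is an assumption.

```python
import pandas as pd

def keep_best_per_itemset(rules_df, by="confidence"):
    """Keep one rule per underlying itemset (antecedents | consequents)."""
    keyed = rules_df.assign(
        _itemset=[frozenset(a) | frozenset(c)
                  for a, c in zip(rules_df["antecedents"], rules_df["consequents"])]
    )
    # Sort so the preferred direction comes first, then drop the other directions
    best = keyed.sort_values(by, ascending=False).drop_duplicates("_itemset")
    return best.drop(columns="_itemset")

# Two rows copied from the second-iteration output: same three items, both directions
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"bonus", "threat_(8.0, 120.0]"}), frozenset({"goals_scored"})],
    "consequents": [frozenset({"goals_scored"}), frozenset({"bonus", "threat_(8.0, 120.0]"})],
    "confidence": [0.821429, 0.709619],
    "lift": [9.462082, 9.462082],
})
deduped = keep_best_per_itemset(toy_rules)
print(deduped[["antecedents", "consequents", "confidence"]])
```

Applied to the unfiltered rules, this would collapse mirrored pairs such as rules 838 and 843 into one row. After the confidence ≥ 0.75 filter most mirrored rules are already gone, so the effect on the final table may be small.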


#### 3. Final Ranking of Rules

We select the most useful rules based on lift × confidence.

In [15]:
clean_rules['score'] = clean_rules['lift'] * clean_rules['confidence']
top_rules = clean_rules.sort_values(by='score', ascending=False)

top_rules.head(15)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski,num_items,score
838,"(bonus, threat_(8.0, 120.0])",(goals_scored),0.074996,0.086813,0.061604,0.821429,9.462082,1.0,0.055093,5.113849,0.966823,0.61478,0.804453,0.765524,3,7.772424
844,"(bonus, expected_goal_involvements_High)",(goals_scored),0.074523,0.086813,0.057665,0.773784,8.913266,1.0,0.051195,4.0368,0.959298,0.556231,0.752279,0.719016,3,6.896945
3613,(goals_scored),"(threat_(8.0, 120.0], Points_High)",0.086813,0.147944,0.085237,0.981851,6.636645,1.0,0.072394,46.948291,0.930063,0.570074,0.9787,0.778998,3,6.516197
3559,(goals_scored),"(threat_(8.0, 120.0], influence_High)",0.086813,0.148889,0.085237,0.981851,6.594507,1.0,0.072312,46.896203,0.929008,0.566492,0.978676,0.777169,3,6.474825
3607,(goals_scored),"(threat_(8.0, 120.0], bps_High)",0.086813,0.154404,0.083977,0.967332,6.264956,1.0,0.070572,25.884644,0.920273,0.534068,0.961367,0.755605,3,6.060293
852,"(xP_High, goals_scored)",(bonus),0.070585,0.103513,0.054199,0.767857,7.417944,1.0,0.046892,3.861788,0.930899,0.452037,0.741053,0.645725,3,5.695921
822,"(played_60+, goals_scored)",(bonus),0.073736,0.103513,0.055459,0.752137,7.266076,1.0,0.047827,3.616859,0.931024,0.455369,0.723517,0.643953,3,5.465083
3547,(goals_scored),"(ict_index_High, Points_High)",0.086813,0.18749,0.086813,1.0,5.333613,1.0,0.070536,inf,0.889752,0.463025,1.0,0.731513,3,5.333613
3565,(goals_scored),"(influence_High, expected_goal_involvements_High)",0.086813,0.155507,0.078462,0.903811,5.812047,1.0,0.064962,8.779545,0.906652,0.478846,0.886099,0.704185,3,5.252993
6819,(pos_GK),"(creativity_Low, played_60+)",0.066803,0.171262,0.063337,0.948113,5.536039,1.0,0.051896,15.972041,0.87802,0.362489,0.937391,0.658969,3,5.248792
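
Because lift equals confidence divided by the consequent's support, the score lift × confidence can be rewritten as confidence² / consequent support. The short check below reproduces two scores from the table above and shows a side effect of this ranking: it favors rules with rare consequents. The helper name `score` is hypothetical.

```python
def score(confidence, consequent_support):
    """lift * confidence, using lift = confidence / consequent support."""
    lift = confidence / consequent_support
    return lift * confidence

# Values copied from the ranking table above
rule_838 = score(0.821429, 0.086813)   # (bonus, threat) -> goals_scored
rule_3547 = score(1.0, 0.18749)        # goals_scored -> (ict_index_High, Points_High)
print(round(rule_838, 2), round(rule_3547, 2))
```

Rule 3547 holds in every row where its antecedent occurs (confidence 1.0), yet it ranks below rule 838 because its consequent is roughly twice as common. Whether that ordering is desirable depends on the analysis goal.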


#### 4. Analysis of Discovered Patterns

##### 1. Expected Patterns

Some identified rules reflect relationships already known in the context of FPL and real football, serving as a validation that the algorithm is correctly capturing game dynamics.

**Pattern 1 — ``(bonus, threat high) → goals_scored``**

It is natural that players with high threat account for a large share of attacking volume (mainly shots and other goal attempts). Combined with bonus points, which in FPL often favor players who score goals, this makes the rule predictable.

Why it is expected:

- Players with high threat tend to shoot a lot.

- Scoring goals strongly drives BPS points → thus, receiving bonus points is common.

- Lift ~9 confirms a very strong association, but within expectations.

**Pattern 2 — ``(played 60+, goals_scored) → bonus``**

Bonus points in FPL generally favor:

- players who score goals,

- players who play many minutes.

Why it is expected:

- The BPS system benefits goals.

- Playing >60 minutes increases impact on BPS and reduces competition.

- A player who scores and plays a lot almost always receives bonus points.

**Pattern 3 — ``goals_scored → (bps_High, ict_index_High, influence_High)``**

The ICT Index and BPS both reward attacking output heavily.

Why it is expected: Scoring goals automatically increases:

- Threat,

- Influence,

- ICT,

- BPS → many of these items appear as consequents.

##### 2. Interesting and Unexpected Patterns

Some rules are surprising and reveal less obvious trends, which can be useful for insights or strategic recommendations.

**⭐ Pattern 1 — ``pos_GK → (played_60+, creativity_Low)``**

Although it seems logical that GKs have low creativity, what is surprising is the strength of the rule, with confidence of about 0.95 and lift above 5.

Why it is unexpected:

- The algorithm captured goalkeepers as an extremely homogeneous cluster.

- Low creativity is expected, but a confidence of about 0.95 means the rule is close to deterministic in the dataset, and the lift above 5 means the consequent is over five times more common among goalkeepers than in the dataset overall.

- Shows that attack metrics can be almost useless for GKs and could be eliminated in certain future analyses.
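
When judging how "strong" a lift is, it helps to know its ceiling: since confidence cannot exceed 1, lift cannot exceed 1 / consequent support. The sketch below applies this to the goalkeeper rule using the values reported in the ranking table; `max_lift` is a hypothetical helper.

```python
def max_lift(consequent_support):
    """Upper bound on lift: confidence / consequent support with confidence = 1."""
    return 1.0 / consequent_support

# Values copied from the ranking table: pos_GK -> (creativity_Low, played_60+)
confidence, consequent_support, lift = 0.948113, 0.171262, 5.536039
ceiling = max_lift(consequent_support)
print(round(ceiling, 2), round(lift / ceiling, 3))
```

The observed lift sits at about 95% of its theoretical maximum, and that ratio is simply the confidence. For rules with different consequents, comparing lift to its ceiling can be more informative than comparing raw lift values.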

**⭐ Pattern 2 — ``assists → (creativity_High, Points_High)``**

This rule is intuitive, but the strength of the relationship is surprisingly high (lift ~5.68).

Why it is unexpected:

- One would expect creativity→assists, but not the reverse being so strong.

- This suggests that players who have provided assists consistently have more complete performances (not just an isolated play).

- May reflect playing styles of certain positions (e.g., attacking full-backs, creative midfielders).

**⭐ Pattern 3 — ``goals_scored → (expected_goal_involvements_High, Points_High)``**

A high XGI among scorers is normal, but the confidence of about 0.90 (lift ≈ 5.8) shows that roughly nine in ten goal-scoring appearances fall in both the highest XGI tier and the highest points tier.

Why it is unexpected:

- XGI is usually read as a predictor of goals, but here the reverse holds: scoring a goal strongly implies that the player was in the high XGI tier.

- Indicates there are few "outliers" who score goals with low XGI (e.g., isolated finishes).

**⭐ Pattern 4 — ``(bonus, xP_High) → goals_scored``**

xP ("expected points") does not depend directly on goals in a given game; it is cumulative and includes probability.

Why it is unexpected:

- The algorithm shows that players with good xP histories tend to score when they also receive bonus points.

- In other words, players in good accumulated form tend to have predictable performance peaks.

- This suggests consistency → useful for forecasting.

**⭐ Pattern 5 — ``pos_GK → (starts + expected_goal_involvements_Low)``**

Here the combination is relevant: GKs almost always start games and rarely have high XGI. Both facts were expected, but the lift of ~4.9 shows how strongly this combination is concentrated in the position.

Why it is unexpected:

- Shows that GKs are a cluster totally separated from the rest of the players in offensive metrics.

- Useful insight for segmentation: GKs can be analyzed in separate pipelines in the future.

## Conclusion

In this notebook, a complete process of **association rule mining** applied to the **Fantasy Premier League (FPL) – 2024/2025 season** dataset was developed, with the aim of identifying relevant patterns between player performance metrics and their game results.

The work began with the **preparation and understanding of the dataset**, ensuring a suitable structure for the application of the Apriori algorithm, namely through the transformation of variables into a binary format (one-hot encoding) and the careful definition of items to analyze. The **EDA** stage allowed understanding the distribution of variables and grounding the decisions taken in the preprocessing phase.

Subsequently, a **first iteration of the Apriori algorithm** was applied, with relatively permissive parameters, allowing exploration of the frequent pattern space and validation of the coherence of the obtained results. This iteration revealed a high number of itemsets, confirming the complexity and richness of the dataset, but also highlighting the need to refine parameters to reduce noise.

In the **second and third iterations**, parameters were adjusted (minimum support, itemset size, and filtering by metrics like lift and confidence), allowing the analysis to focus on stronger, more interpretable patterns with greater practical relevance. The generation and analysis of association rules showed that the algorithm was able to identify both **expected patterns**, which validate the method's functioning (e.g., relationship between goals, bonus, BPS, and offensive metrics), and **interesting and less obvious patterns**, namely related to form metrics, expected points, and segmentation by position.

The results obtained demonstrate that:

- Metrics like **goals_scored**, **bonus**, **BPS**, **ICT**, **xGI**, and **xP** are strongly interconnected, correctly reflecting FPL's scoring logic.

- There are **well-defined behavior clusters**, especially for specific positions like goalkeepers, who present patterns clearly distinct from other players.

- Some form and expectation metrics reveal potential to support strategic decisions, such as player selection or future performance prediction.

In summary, this work shows that **association rule mining** is a valid and effective approach to extract interpretable knowledge from complex data in the context of FPL. Although not aimed at direct prediction, the method revealed useful patterns that can serve as a basis for **recommender systems**, decision-making support, or future integration with predictive models.

This notebook thus fulfills the proposed objective of **discovering**, **analyzing**, **and interpreting relevant patterns** in FPL player performance, clearly documenting the entire process and the obtained results.