# ASSOCIATION PATTERNS
This notebook extracts interesting association rules from the clusters found earlier.
1. Use algorithms such as Apriori or FP-Growth to find associations between variables, in particular those relating to work-family balance and its impact on well-being.
2. Interpret the most relevant rules through their support, confidence and lift, and connect them to the clusters found.
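The three metrics named in step 2 can be computed by hand. The sketch below is a minimal illustration on made-up transactions that use the notebook's `column=value` item encoding. The helper names `support`, `confidence` and `lift` are illustrative and are not part of efficient-apriori.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(itemset: Transaction, transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs: Transaction, rhs: Transaction, transactions: Sequence[Transaction]) -> float:
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def lift(lhs: Transaction, rhs: Transaction, transactions: Sequence[Transaction]) -> float:
    """confidence(lhs -> rhs) / support(rhs); 1.0 means lhs and rhs look independent."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Made-up transactions (hypothetical data) in the 'column=value' encoding.
transactions = [
    frozenset({"InterruptChildren=1.0", "AlcoholMedicationDrug=0.0"}),
    frozenset({"InterruptChildren=1.0", "AlcoholMedicationDrug=0.0"}),
    frozenset({"InterruptChildren=0.0", "AlcoholMedicationDrug=0.0"}),
    frozenset({"InterruptChildren=1.0", "AlcoholMedicationDrug=1.0"}),
]
lhs = frozenset({"InterruptChildren=1.0"})
rhs = frozenset({"AlcoholMedicationDrug=0.0"})
print(support(lhs | rhs, transactions))    # 0.5
print(confidence(lhs, rhs, transactions))  # about 0.667 (2 of the 3 lhs rows)
print(lift(lhs, rhs, transactions))        # about 0.889 (< 1: slightly negative association)
```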

In [108]:
# pip install efficient-apriori

In [109]:
import numpy as np
import pandas as pd
from efficient_apriori import apriori

In [110]:
UMAP_cluster1 = pd.read_csv("./Clustering_results/UMAP_cluster1.csv")
UMAP_cluster2 = pd.read_csv("./Clustering_results/UMAP_cluster2.csv")

This function transforms each row of the input DataFrame into a transaction so that the Apriori algorithm can be run on it.

In [111]:
def get_transaction_from_df(cluster):
    """Turn every row of `cluster` into a transaction of 'column=value' items."""

    # Get the column names as strings
    columns = list(cluster.columns)

    transactions = []  # Prepare a list of transactions

    # Loop over every row in the DataFrame to collect transactions
    for i in range(len(cluster)):

        # Encode values as e.g., 'col1=2.0' and 'col2=VASCO'.
        # We include column names to differentiate between values.
        # Note: when the DataFrame mixes int and float columns, .iloc[i, :]
        # returns a float row, so integer codes are encoded as e.g. 'col=0.0'.
        values = list(str(t).strip() for t in cluster.iloc[i, :].values)
        transaction = tuple(f"{column}={value}" for column, value in zip(columns, values))

        # Add transaction to transactions
        transactions.append(transaction)

    return transactions
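The same `column=value` encoding can be sketched without pandas, assuming each row arrives as a mapping from column name to raw string value (for example from `csv.DictReader`). The helper `rows_to_transactions` and the two-row sample are hypothetical and only illustrate the idea behind `get_transaction_from_df`.

```python
import csv
import io
from typing import Iterable, List, Mapping, Tuple


def rows_to_transactions(rows: Iterable[Mapping[str, str]]) -> List[Tuple[str, ...]]:
    """Encode each row as a tuple of 'column=value' items, stripping stray whitespace."""
    return [tuple(f"{col}={str(val).strip()}" for col, val in row.items()) for row in rows]


# Hypothetical two-row CSV standing in for a cluster file (note the stray space).
sample = io.StringIO("Gender,DependentPersons\n2.0,0.0\n1.0, 1.0\n")
transactions = rows_to_transactions(csv.DictReader(sample))
print(transactions)
# [('Gender=2.0', 'DependentPersons=0.0'), ('Gender=1.0', 'DependentPersons=1.0')]
```

Prefixing the column name is what keeps `DependentPersons=0.0` and `ChildMissExam=0.0` distinct items; without it every `0.0` would collapse into a single item and the rules would be meaningless.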

## UMAP

### Cluster1

In [112]:
UMAP_cluster1.mode()

| Unnamed: 0 | Gender | AutonomousCommunity | SchoolOwnership | KindOfPlace | LivingUnit | DependentPersons | WorkConfinement | WorkConfinementsSecondAdult | ChildEarlyEducation1 | ChildEarlyEducation2 | ... | SpaceGym | SpaceOther | SpaceNone | SpaceStreet | SpacePlots | SpaceParks | SpacePublic | SpaceSurroundingOther | SpaceSurroundingNone | ActivitiesOutside |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 9.0 | 2.0 | 2.0 | 2 | 0 | 1 | 1.0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3.0 |


In [113]:
UMAP_cluster1_transactions = get_transaction_from_df(UMAP_cluster1)

In [114]:
itemsets_cl1, rules_cl1 = apriori(UMAP_cluster1_transactions, min_support=0.93, min_confidence=0.90, max_length=5, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 284 candidate itemsets of length 1.
  Found 12 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 66 candidate itemsets of length 2.
  Found 30 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 39 candidate itemsets of length 3.
  Found 31 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 20 candidate itemsets of length 4.
  Found 17 large itemsets of length 4.
 Counting itemsets of length 5.
  Found 4 candidate itemsets of length 5.
  Found 4 large itemsets of length 5.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
 Generating rules of size 4.
 Generating rules of size 5.
Rule generation terminated.
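The log above shows Apriori's level-wise loop: for each length k it builds "candidate itemsets" from the frequent itemsets of length k-1 and keeps as "large itemsets" only those whose support reaches `min_support` (here 0.93, i.e. present in at least 93% of the cluster's transactions). Because `max_length=5`, the search stops at length 5 even though all 4 length-5 candidates were large. The following is a simplified from-scratch sketch of that loop, not efficient-apriori's actual implementation; `apriori_itemsets` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

Itemset = FrozenSet[str]


def apriori_itemsets(transactions: Sequence[Itemset], min_support: float,
                     max_length: int) -> Dict[int, Dict[Itemset, int]]:
    """Simplified level-wise Apriori returning {length: {itemset: count}}."""
    min_count = min_support * len(transactions)

    # Length 1: every distinct item is a candidate.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_count}

    result: Dict[int, Dict[Itemset, int]] = {}
    k = 1
    while large:
        result[k] = large
        if k == max_length:
            break
        k += 1
        # Join: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(large, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in large for s in combinations(c, k - 1))}
        # Count: one pass over the transactions for the surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        large = {s: c for s, c in counts.items() if c >= min_count}
    return result


# Toy data: four transactions over items a, b, c.
toy = [frozenset(t) for t in ("abc", "ab", "ac", "abc")]
for length, found in apriori_itemsets(toy, min_support=0.5, max_length=5).items():
    print(length, sorted("".join(sorted(s)) for s in found))
# 1 ['a', 'b', 'c']
# 2 ['ab', 'ac', 'bc']
# 3 ['abc']
```

The prune step relies on anti-monotonicity: an itemset can never be more frequent than any of its subsets, so a candidate with an infrequent subset can be discarded without counting it.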



In [115]:
itemsets_cl1[5]

{('ChildBaccaleaurate=0.0',
  'ChildMissExam=0.0',
  'ChildNoAnswers=0.0',
  'ChildNoVariation=0.0',
  'ChildVocationalTraining=0.0'): 3691,
 ('ChildBaccaleaurate=0.0',
  'ChildMissExam=0.0',
  'ChildNoAnswers=0.0',
  'ChildVocationalTraining=0.0',
  'InterruptChildren=1.0'): 3703,
 ('ChildBaccaleaurate=0.0',
  'ChildNoAnswers=0.0',
  'ChildNoVariation=0.0',
  'ChildVocationalTraining=0.0',
  'InterruptChildren=1.0'): 3745,
 ('ChildMissExam=0.0',
  'ChildNoAnswers=0.0',
  'ChildNoVariation=0.0',
  'ChildVocationalTraining=0.0',
  'InterruptChildren=1.0'): 3686}
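The itemsets returned by `apriori` are nested as `{length: {itemset: count}}`, and the counts above are absolute. Dividing by the number of transactions turns them into support values that can be compared across clusters of different sizes. A minimal sketch follows; `itemsets_to_support` is a hypothetical helper, and in the notebook it would be called as `itemsets_to_support(itemsets_cl1, len(UMAP_cluster1_transactions))`.

```python
from typing import Dict, List, Tuple


def itemsets_to_support(itemsets: Dict[int, Dict[Tuple[str, ...], int]],
                        num_transactions: int) -> List[Tuple[Tuple[str, ...], float]]:
    """Flatten {length: {itemset: count}} into (itemset, support) pairs, highest support first."""
    rows = [(itemset, count / num_transactions)
            for by_length in itemsets.values()
            for itemset, count in by_length.items()]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


# Toy input with made-up counts over 10 transactions.
toy_itemsets = {1: {("a",): 9, ("b",): 10}, 2: {("a", "b"): 8}}
print(itemsets_to_support(toy_itemsets, 10))
# [(('b',), 1.0), (('a',), 0.9), (('a', 'b'), 0.8)]
```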

In [116]:
rules_cl1

[{ChildBaccaleaurate=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildBaccaleaurate=0.0},
 {ChildMissExam=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildMissExam=0.0},
 {ChildNoAnswers=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildNoAnswers=0.0},
 {ChildNoVariation=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildNoVariation=0.0},
 {ChildVocationalTraining=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildVocationalTraining=0.0},
 {InterruptChildren=1.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {InterruptChildren=1.0},
 {ChildMissExam=0.0} -> {ChildBaccaleaurate=0.0},
 {ChildBaccaleaurate=0.0} -> {ChildMissExam=0.0},
 {ChildNoAnswers=0.0} -> {ChildBaccaleaurate=0.0},
 {ChildBaccaleaurate=0.0} -> {ChildNoAnswers=0.0},
 {ChildNoVariation=0.0} -> {ChildBaccaleaurate=0.0},
 {ChildBaccaleaurate=0.0} -> {ChildNoVariation=0.0},
 {ChildVoc
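When reading these rules, note that the thresholds themselves cap how large lift can be. Every rule X ⇒ Y is built from a frequent itemset X ∪ Y, so Y is frequent as well (supp(Y) ≥ supp(X ∪ Y) ≥ s_min, where s_min is `min_support`), and confidence never exceeds 1:

$$
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} \le \frac{1}{\mathrm{supp}(Y)} \le \frac{1}{s_{\min}}
$$

With s_min = 0.93 for cluster 1 this gives lift ≤ 1/0.93 ≈ 1.075 (≈ 1.111 for cluster 2, mined with s_min = 0.90). Rules such as {ChildBaccaleaurate=0.0} -> {AlcoholMedicationDrug=0.0} therefore mostly reflect that both items are near-universal in the cluster rather than a strong dependence between them; lower support thresholds, or filtering on lift, would be needed to surface more informative associations.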

### Cluster2

In [117]:
UMAP_cluster2.mode()

| Unnamed: 0 | Gender | AutonomousCommunity | SchoolOwnership | KindOfPlace | LivingUnit | DependentPersons | WorkConfinement | WorkConfinementsSecondAdult | ChildEarlyEducation1 | ChildEarlyEducation2 | ... | SpaceGym | SpaceOther | SpaceNone | SpaceStreet | SpacePlots | SpaceParks | SpacePublic | SpaceSurroundingOther | SpaceSurroundingNone | ActivitiesOutside |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 9.0 | 2.0 | 2.0 | 2 | 0 | 0 | 1.0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3.0 |


In [118]:
UMAP_cluster2_transactions = get_transaction_from_df(UMAP_cluster2)

In [119]:
itemsets_cl2, rules_cl2 = apriori(UMAP_cluster2_transactions, min_support=0.90, min_confidence=0.90, max_length=6, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 281 candidate itemsets of length 1.
  Found 21 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 210 candidate itemsets of length 2.
  Found 92 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 228 candidate itemsets of length 3.
  Found 167 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 157 candidate itemsets of length 4.
  Found 127 large itemsets of length 4.
 Counting itemsets of length 5.
  Found 40 candidate itemsets of length 5.
  Found 36 large itemsets of length 5.
 Counting itemsets of length 6.
  Found 2 candidate itemsets of length 6.
  Found 2 large itemsets of length 6.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
 Generating rules of size 4.
 Generating rules of size 5.
 Generating rules of size 6.
Rule generation terminated.



In [123]:
itemsets_cl2[1]

{('Gender=2.0',): 1408,
 ('DependentPersons=0.0',): 1411,
 ('ChildSecondaryEducation=0.0',): 1360,
 ('ChildBaccaleaurate=0.0',): 1479,
 ('ChildVocationalTraining=0.0',): 1491,
 ('ChildrenSpecialNeeds=0.0',): 1423,
 ('InterruptChildren=0.0',): 1437,
 ('InterruptChildrenFrequency=0.0',): 1390,
 ('LossFamilyMember=0.0',): 1416,
 ('AlcoholMedicationDrug=0.0',): 1462,
 ('NegativeImpactOther=0.0',): 1422,
 ('ChildFake=0.0',): 1377,
 ('ChildIntrospection=0.0',): 1413,
 ('ChildNoVariation=0.0',): 1469,
 ('ChildNoAnswers=0.0',): 1497,
 ('ChildMissContents=0.0',): 1350,
 ('ChildMissExam=0.0',): 1459,
 ('ChallengeNeeds=0.0',): 1415,
 ('ChallengeEvaluation=0.0',): 1363,
 ('ChallengeVulnerable=0.0',): 1384,
 ('SpaceNone=0.0',): 1352}
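To connect the mined patterns back to the clusters, one simple step is to list the items that are frequent in one cluster but not in the other. The sketch below uses a small subset of the items shown in the outputs above (counts omitted); `contrast_frequent_items` is a hypothetical helper, and in the notebook it could be fed `itemsets_cl1[1].keys()` and `itemsets_cl2[1].keys()`. Because cluster 1 was mined with `min_support=0.93` and cluster 2 with `0.90`, an item missing from one list may simply fall between the two thresholds, so rerunning both with the same threshold gives a fairer comparison.

```python
from typing import Iterable, List, Tuple


def contrast_frequent_items(itemsets_a: Iterable[Tuple[str, ...]],
                            itemsets_b: Iterable[Tuple[str, ...]]) -> Tuple[List[str], List[str]]:
    """Return (items only frequent in A, items only frequent in B), each sorted."""
    a = {item for itemset in itemsets_a for item in itemset}
    b = {item for itemset in itemsets_b for item in itemset}
    return sorted(a - b), sorted(b - a)


# Subset of frequent items taken from the cluster 1 and cluster 2 outputs above.
cluster1_items = [("AlcoholMedicationDrug=0.0",), ("ChildMissExam=0.0",), ("InterruptChildren=1.0",)]
cluster2_items = [("AlcoholMedicationDrug=0.0",), ("ChildMissExam=0.0",), ("InterruptChildren=0.0",)]
print(contrast_frequent_items(cluster1_items, cluster2_items))
# (['InterruptChildren=1.0'], ['InterruptChildren=0.0'])
```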

In [121]:
rules_cl2

[{ChallengeNeeds=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChallengeNeeds=0.0},
 {ChallengeVulnerable=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChallengeVulnerable=0.0},
 {ChildBaccaleaurate=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildBaccaleaurate=0.0},
 {ChildIntrospection=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildIntrospection=0.0},
 {ChildMissExam=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildMissExam=0.0},
 {ChildNoAnswers=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildNoAnswers=0.0},
 {ChildNoVariation=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildNoVariation=0.0},
 {ChildVocationalTraining=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -> {ChildVocationalTraining=0.0},
 {ChildrenSpecialNeeds=0.0} -> {AlcoholMedicationDrug=0.0},
 {AlcoholMedicationDrug=0.0} -
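Both rule lists are largely made of mirrored pairs (A -> B followed by B -> A). For single-item rules both directions share the same support and lift and differ only in confidence, so grouping rules by their full itemset roughly halves the list to inspect. A minimal sketch, assuming each rule is reduced to a `(lhs, rhs, confidence)` tuple; with efficient-apriori this could be built as `(r.lhs, r.rhs, r.confidence) for r in rules_cl2`. The helpers `collapse_symmetric` and `strongest_direction` and the confidence values in the example are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

RuleTuple = Tuple[Tuple[str, ...], Tuple[str, ...], float]  # (lhs, rhs, confidence)


def collapse_symmetric(rules: Iterable[RuleTuple]) -> Dict[FrozenSet[str], List[RuleTuple]]:
    """Group rules by the itemset lhs | rhs so that A -> B and B -> A land together."""
    groups: Dict[FrozenSet[str], List[RuleTuple]] = {}
    for lhs, rhs, conf in rules:
        groups.setdefault(frozenset(lhs) | frozenset(rhs), []).append((lhs, rhs, conf))
    return groups


def strongest_direction(groups: Dict[FrozenSet[str], List[RuleTuple]]) -> Dict[FrozenSet[str], RuleTuple]:
    """Keep only the highest-confidence rule within each group."""
    return {items: max(rules, key=lambda r: r[2]) for items, rules in groups.items()}


# Hypothetical rules with made-up confidences.
rules = [(("A",), ("B",), 0.95), (("B",), ("A",), 0.97), (("C",), ("A",), 0.92)]
for items, rule in strongest_direction(collapse_symmetric(rules)).items():
    print(sorted(items), rule)
```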