# ASSOCIATION PATTERNS
This notebook finds and extracts interesting association rules from the clusters obtained earlier.
1. Use an algorithm such as Apriori or FP-Growth to find associations between variables, in particular those relating to work-family balance and its impact on well-being.
2. Interpret the most relevant rules using metrics such as support, confidence and lift, and connect these rules to the clusters.
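
As a reminder of what the metrics in step 2 measure, here is a minimal pure-Python sketch that computes support, confidence and lift for a single rule. The helper name `rule_metrics` and the four toy transactions are illustrative assumptions; they are not part of the notebook's data or of the `efficient_apriori` API.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    lhs, rhs = set(antecedent), set(consequent)
    n = len(transactions)
    # Count the transactions containing each side, and both sides together
    n_lhs = sum(1 for t in transactions if lhs.issubset(t))
    n_rhs = sum(1 for t in transactions if rhs.issubset(t))
    n_both = sum(1 for t in transactions if (lhs | rhs).issubset(t))
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("both sides of the rule must occur at least once")
    support = n_both / n              # P(X and Y)
    confidence = n_both / n_lhs       # P(Y | X)
    lift = confidence / (n_rhs / n)   # P(Y | X) / P(Y)
    return support, confidence, lift


# Toy data (hypothetical), encoded in the same 'column=value' style used below
toy_transactions = [
    ("DomesticHelp=0.0", "HouseworkMore=2.0"),
    ("DomesticHelp=0.0", "HouseworkMore=2.0"),
    ("DomesticHelp=0.0", "HouseworkMore=1.0"),
    ("DomesticHelp=1.0", "HouseworkMore=2.0"),
]

support, confidence, lift = rule_metrics(
    toy_transactions, ["DomesticHelp=0.0"], ["HouseworkMore=2.0"]
)
print(f"supp: {support:.3f}, conf: {confidence:.3f}, lift: {lift:.3f}")
```

A lift close to 1 means the consequent is about as frequent with the antecedent as without it, which is worth keeping in mind when reading rules with very high confidence.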

In [255]:
# pip install efficient-apriori

In [256]:
import numpy as np
import pandas as pd
from efficient_apriori import apriori

In [257]:
UMAP_cluster1 = pd.read_csv("./Clustering_results/UMAP_cluster1.csv")
UMAP_cluster2 = pd.read_csv("./Clustering_results/UMAP_cluster2.csv")

This function transforms each row of the input DataFrame into a transaction, so that the Apriori algorithm can be run on it.

In [258]:
def get_transaction_from_df(cluster):

    # Get the column names as strings
    columns = list(cluster.columns)

    transactions = []  # Prepare a list of transactions

    # Loop over every row in the DataFrame to collect transactions
    for i in range(len(cluster)):

        # Encode values as e.g., 'col1=2.0' and 'col2=VASCO'.
        # We include column names to differentiate between values
        values = [str(t).strip() for t in cluster.iloc[i, :].values]
        transaction = tuple(f"{column}={value}" for column, value in zip(columns, values))

        # Add transaction to transactions
        transactions.append(transaction)

    return transactions
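
To make the encoding concrete, the sketch below applies the same `column=value` formatting to two hand-written rows, without pandas. The helper `encode_row` and the sample values are hypothetical and only mirror the logic of `get_transaction_from_df`.

```python
def encode_row(columns, row):
    """Encode one row as a tuple of 'column=value' items (same scheme as above)."""
    return tuple(f"{column}={str(value).strip()}" for column, value in zip(columns, row))


# Hypothetical sample: three columns, two rows
columns = ["Gender", "LivingUnit", "HouseworkMore"]
rows = [(2.0, 2.0, 2.0), (1.0, 2.0, 1.0)]

sample_transactions = [encode_row(columns, row) for row in rows]
print(sample_transactions[0])
```

Prefixing each value with its column name keeps `Gender=2.0` and `LivingUnit=2.0` apart as distinct items, even though both columns hold the value `2.0`.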

## UMAP

### Cluster1

In [259]:
UMAP_cluster1.mode()

Unnamed: 0,Gender,AutonomousCommunity,SchoolOwnership,KindOfPlace,LivingUnit,DependentPersons,WorkConfinement,WorkConfinementsSecondAdult,ChildEarlyEducation1,ChildEarlyEducation2,...,SpaceGym,SpaceOther,SpaceNone,SpaceStreet,SpacePlots,SpaceParks,SpacePublic,SpaceSurroundingOther,SpaceSurroundingNone,ActivitiesOutside
0,2.0,9.0,2.0,2.0,2,0,1,1.0,0,0,...,1,0,0,0,0,0,1,0,0,3.0


In [260]:
UMAP_cluster1_transactions = get_transaction_from_df(UMAP_cluster1)

#### Not interesting patterns
First, remove every item whose support is 0.90 or higher: an item that appears in almost every transaction does not produce interesting patterns.

In [261]:
itemsets_cl1, rules_cl1 = apriori(UMAP_cluster1_transactions, min_support=0.90, min_confidence=0.90, max_length=1, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 284 candidate itemsets of length 1.
  Found 17 large itemsets of length 1.
Itemset generation terminated.

Generating rules from itemsets.
Rule generation terminated.



In [262]:
drop_column_list = [i[0] for i in itemsets_cl1[1].keys()]
drop_column_list.remove("InterruptChildren=1.0")

In [263]:
drop_column_list

['Gender=2.0',
 'DependentPersons=0.0',
 'ChildBaccaleaurate=0.0',
 'ChildVocationalTraining=0.0',
 'ChildrenSpecialNeeds=0.0',
 'LossFamilyMember=0.0',
 'AlcoholMedicationDrug=0.0',
 'NegativeImpactOther=0.0',
 'ChildFake=0.0',
 'ChildIntrospection=0.0',
 'ChildNoVariation=0.0',
 'ChildNoAnswers=0.0',
 'ChildMissExam=0.0',
 'ChallengeNeeds=0.0',
 'SpaceSurroundingOther=0.0',
 'ChildSecondaryEducation=0.0']

In [264]:
temp_transactions = []


for trans in UMAP_cluster1_transactions:
    
    # Remove all the high-support items from the transaction
    trans = tuple(i for i in trans if i not in drop_column_list)
    temp_transactions.append(trans)

UMAP_cluster1_transactions = temp_transactions
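
The same filtering can also be sketched without calling `apriori`, by counting single items directly. This is an alternative illustration, not the notebook's method: the function `drop_frequent_items`, its `keep` parameter and the toy transactions are assumptions introduced here. It uses a set for the lookup, which is faster than the list used above when there are many transactions.

```python
from collections import Counter


def drop_frequent_items(transactions, max_support=0.90, keep=()):
    """Remove items whose support is >= max_support, except those listed in `keep`."""
    n = len(transactions)
    # Count each item at most once per transaction
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c / n >= max_support}
    frequent -= set(keep)
    filtered = [tuple(i for i in t if i not in frequent) for t in transactions]
    return filtered, frequent


# Toy data (hypothetical): two items appear in every transaction
toy = [
    ("Gender=2.0", "InterruptChildren=1.0", "Sadness=0.0"),
    ("Gender=2.0", "InterruptChildren=1.0", "Sadness=1.0"),
    ("Gender=2.0", "InterruptChildren=1.0", "Sadness=0.0"),
    ("Gender=2.0", "InterruptChildren=1.0", "Sadness=1.0"),
]

filtered, dropped = drop_frequent_items(toy, keep=("InterruptChildren=1.0",))
print(dropped)
print(filtered[0])
```

The `keep` argument plays the role of the `drop_column_list.remove("InterruptChildren=1.0")` call: a frequent item can be kept on purpose when it is relevant to the analysis.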

#### Interesting patterns

In [265]:
itemsets_cl1, rules_cl1 = apriori(UMAP_cluster1_transactions, min_support=0.70, min_confidence=0.90, max_length=5, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 268 candidate itemsets of length 1.
  Found 45 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 990 candidate itemsets of length 2.
  Found 245 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 928 candidate itemsets of length 3.
  Found 207 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 43 candidate itemsets of length 4.
  Found 14 large itemsets of length 4.
 Counting itemsets of length 5.
  Found 0 candidate itemsets of length 5.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
 Generating rules of size 4.
Rule generation terminated.



In [266]:
itemsets_cl1[1]

{('LivingUnit=2.0',): 3374,
 ('WorkConfinement=1.0',): 3144,
 ('WorkConfinementsSecondAdult=1.0',): 3173,
 ('HouseworkMore=2.0',): 3502,
 ('Reconciling=2.0',): 3137,
 ('InterruptChildren=1.0',): 3884,
 ('BondingNeighbours=0.0',): 3299,
 ('TastesAndAbilities=0.0',): 2858,
 ('Responsability=0.0',): 2910,
 ('Understanding=0.0',): 3078,
 ('NegativeImpact=2.0',): 3321,
 ('Loneliness=0.0',): 3262,
 ('Sadness=0.0',): 2768,
 ('FinancialLoss=0.0',): 3171,
 ('Arguments=0.0',): 2862,
 ('ChildScreens=1.0',): 2925,
 ('ChildDiet=0.0',): 3359,
 ('ChildFear=0.0',): 3210,
 ('ChildIFriends=1.0',): 3163,
 ('ChildMissExercise=1.0',): 3062,
 ('ChildMissFriends=1.0',): 3504,
 ('ChildMissContents=0.0',): 3530,
 ('ChallengeContent=0.0',): 3165,
 ('ChallengeMonitoring=0.0',): 3222,
 ('ChallengeEvaluation=0.0',): 3434,
 ('ChallengeCommunication=0.0',): 2790,
 ('ChallengeOnline=0.0',): 2808,
 ('ChallengeEquipment=0.0',): 3403,
 ('ChallengeVulnerable=0.0',): 3513,
 ('PriorityAutonomy=3.0',): 2886,
 ('FamiliesColl

In [267]:
rules_cl1

[{Arguments=0.0} -> {InterruptChildren=1.0},
 {Bedtime=1.0} -> {HouseworkMore=2.0},
 {Bedtime=1.0} -> {InterruptChildren=1.0},
 {BondingNeighbours=0.0} -> {InterruptChildren=1.0},
 {ChallengeContent=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeContent=0.0} -> {ChallengeEvaluation=0.0},
 {ChallengeMonitoring=0.0} -> {ChallengeContent=0.0},
 {ChallengeContent=0.0} -> {ChallengeMonitoring=0.0},
 {ChallengeContent=0.0} -> {ChallengeVulnerable=0.0},
 {ChallengeContent=0.0} -> {InterruptChildren=1.0},
 {ChallengeEvaluation=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeEquipment=0.0} -> {ChallengeEvaluation=0.0},
 {ChallengeMonitoring=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeEquipment=0.0} -> {ChallengeVulnerable=0.0},
 {ChallengeEquipment=0.0} -> {ChildMissContents=0.0},
 {ChallengeEquipment=0.0} -> {InterruptChildren=1.0},
 {ChallengeMonitoring=0.0} -> {ChallengeEvaluation=0.0},
 {ChallengeVulnerable=0.0} -> {ChallengeEvaluation=0.0},
 {ChallengeEvaluation=0.0} -> {ChallengeVulnerable=0.

#### Conclusion

There are rules like:

- 40: {DomesticHelp=0.0} -> {HouseworkMore=2.0} (conf: 0.906, supp: 0.730, lift: 1.020, conv: 1.186)  
-> Even though there is more housework than before, there is no domestic help; this could mean that parents are busier caring for their children.

- 45: {Reconciling=2.0} -> {HouseworkMore=2.0} (conf: 0.915, supp: 0.728, lift: 1.030, conv: 1.314)
-> As the domestic workload increases, so does reconciliation.

- 51: {PositiveImpact=2.0} -> {InterruptChildren=1.0} (conf: 0.986, supp: 0.702, lift: 1.001, conv: 1.067)
-> The main positive impact of Covid-19 could be the reconnection of families with their children.

- 276: {FamiliesCollaboration=3.0, PrioritySocialisation=3.0} -> {InterruptChildren=1.0} (conf: 0.988, supp: 0.705, lift: 1.002, conv: 1.162)
-> Families are worried about their children's return to their old social life. They think that family collaboration would be the best path to a quick and safe return to school (for the children, of course).
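
One way to read the lift values quoted above is to recover the overall support of each consequent, since lift = confidence / support(consequent). The sketch below does this arithmetic using only the `conf` and `lift` figures listed in this conclusion; the dictionary layout and variable names are illustrative.

```python
# (confidence, lift) as quoted in the conclusion above, keyed by rule number
quoted_rules = {
    40: (0.906, 1.020),   # {DomesticHelp=0.0} -> {HouseworkMore=2.0}
    45: (0.915, 1.030),   # {Reconciling=2.0} -> {HouseworkMore=2.0}
    51: (0.986, 1.001),   # {PositiveImpact=2.0} -> {InterruptChildren=1.0}
    276: (0.988, 1.002),  # {FamiliesCollaboration=3.0, PrioritySocialisation=3.0} -> {InterruptChildren=1.0}
}

implied_support = {}
for rule_id, (conf, lift) in quoted_rules.items():
    # lift = conf / supp(consequent)  =>  supp(consequent) = conf / lift
    implied_support[rule_id] = conf / lift
    print(f"rule {rule_id}: implied consequent support = {implied_support[rule_id]:.3f}")
```

Under this reading, `HouseworkMore=2.0` occurs in roughly 89% of the cluster's transactions and `InterruptChildren=1.0` in roughly 98.5%, so the high confidence of these rules largely reflects how common the consequents already are.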

### Cluster2

In [167]:
UMAP_cluster2.mode()

Unnamed: 0,Gender,AutonomousCommunity,SchoolOwnership,KindOfPlace,LivingUnit,DependentPersons,WorkConfinement,WorkConfinementsSecondAdult,ChildEarlyEducation1,ChildEarlyEducation2,...,SpaceGym,SpaceOther,SpaceNone,SpaceStreet,SpacePlots,SpaceParks,SpacePublic,SpaceSurroundingOther,SpaceSurroundingNone,ActivitiesOutside
0,2.0,9.0,2.0,2.0,2,0,0,1.0,0,0,...,1,0,0,0,0,0,1,0,0,3.0


In [172]:
UMAP_cluster2_transactions = get_transaction_from_df(UMAP_cluster2)

#### Not interesting patterns
First, remove every item whose support is 0.90 or higher: an item that appears in almost every transaction does not produce interesting patterns.

In [173]:
itemsets_cl2, rules_cl2 = apriori(UMAP_cluster2_transactions, min_support=0.90, min_confidence=0.90, max_length=6, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 281 candidate itemsets of length 1.
  Found 21 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 210 candidate itemsets of length 2.
  Found 92 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 228 candidate itemsets of length 3.
  Found 167 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 157 candidate itemsets of length 4.
  Found 127 large itemsets of length 4.
 Counting itemsets of length 5.
  Found 40 candidate itemsets of length 5.
  Found 36 large itemsets of length 5.
 Counting itemsets of length 6.
  Found 2 candidate itemsets of length 6.
  Found 2 large itemsets of length 6.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
 Generating rules of size 4.
 Generating rules of size 5.
 Generating rules of size 6.
Rule generation terminated.



In [174]:
drop_column_list = [i[0] for i in itemsets_cl2[1].keys()]

In [175]:
drop_column_list

['Gender=2.0',
 'DependentPersons=0.0',
 'ChildSecondaryEducation=0.0',
 'ChildBaccaleaurate=0.0',
 'ChildVocationalTraining=0.0',
 'ChildrenSpecialNeeds=0.0',
 'InterruptChildren=0.0',
 'InterruptChildrenFrequency=0.0',
 'LossFamilyMember=0.0',
 'AlcoholMedicationDrug=0.0',
 'NegativeImpactOther=0.0',
 'ChildFake=0.0',
 'ChildIntrospection=0.0',
 'ChildNoVariation=0.0',
 'ChildNoAnswers=0.0',
 'ChildMissContents=0.0',
 'ChildMissExam=0.0',
 'ChallengeNeeds=0.0',
 'ChallengeEvaluation=0.0',
 'ChallengeVulnerable=0.0',
 'SpaceNone=0.0']

In [176]:
temp_transactions = []


for trans in UMAP_cluster2_transactions:
    
    # Remove all the high-support items from the transaction
    trans = tuple(i for i in trans if i not in drop_column_list)
    temp_transactions.append(trans)

UMAP_cluster2_transactions = temp_transactions

#### Interesting patterns

In [252]:
itemsets_cl2, rules_cl2 = apriori(UMAP_cluster2_transactions, min_support=0.60, min_confidence=0.90, max_length=6, verbosity=1)

Generating itemsets.
 Counting itemsets of length 1.
  Found 260 candidate itemsets of length 1.
  Found 50 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 1225 candidate itemsets of length 2.
  Found 565 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 4734 candidate itemsets of length 3.
  Found 1068 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 2107 candidate itemsets of length 4.
  Found 147 large itemsets of length 4.
 Counting itemsets of length 5.
  Found 51 candidate itemsets of length 5.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
 Generating rules of size 4.
Rule generation terminated.



In [253]:
itemsets_cl2[1]

{('SchoolOwnership=2.0',): 1009,
 ('LivingUnit=2.0',): 1339,
 ('WorkConfinementsSecondAdult=1.0',): 1078,
 ('ChildPrimaryEducation=0.0',): 926,
 ('HouseworkMore=2.0',): 1198,
 ('DomesticHelp=0.0',): 1203,
 ('BondingNeighbours=0.0',): 1311,
 ('TastesAndAbilities=0.0',): 1168,
 ('Responsability=0.0',): 1157,
 ('Understanding=0.0',): 1213,
 ('NegativeImpact=2.0',): 1151,
 ('Loneliness=0.0',): 1286,
 ('Sadness=0.0',): 1092,
 ('FinancialLoss=0.0',): 1123,
 ('Arguments=0.0',): 1180,
 ('ChildSleep=0.0',): 1033,
 ('ChildDiet=0.0',): 1293,
 ('ChildFear=0.0',): 1222,
 ('ChildFrustation=0.0',): 1196,
 ('ChildIFriends=1.0',): 1198,
 ('ChildMissExperience=0.0',): 959,
 ('ParentsCommunicationReturn=2.0',): 938,
 ('ChallengeContent=0.0',): 1288,
 ('ChallengeMonitoring=0.0',): 1290,
 ('ChallengeEmotionalSupport=0.0',): 1131,
 ('ChallengeCommunication=0.0',): 1117,
 ('ChallengeOnline=0.0',): 1163,
 ('ChallengeEquipment=0.0',): 1327,
 ('SpaceBarracks=0.0',): 1216,
 ('SpaceStreet=0.0',): 1268,
 ('SpacePl

In [254]:
rules_cl2

[{Arguments=0.0} -> {SpaceSurroundingOther=0.0},
 {ChallengeCommunication=0.0} -> {ChallengeContent=0.0},
 {ChallengeCommunication=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeCommunication=0.0} -> {ChallengeMonitoring=0.0},
 {ChallengeCommunication=0.0} -> {LivingUnit=2.0},
 {ChallengeCommunication=0.0} -> {SpaceSurroundingOther=0.0},
 {ChallengeEmotionalSupport=0.0} -> {ChallengeContent=0.0},
 {ChallengeContent=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeMonitoring=0.0} -> {ChallengeContent=0.0},
 {ChallengeContent=0.0} -> {ChallengeMonitoring=0.0},
 {ChallengeOnline=0.0} -> {ChallengeContent=0.0},
 {ChallengeContent=0.0} -> {SpaceSurroundingOther=0.0},
 {ChallengeEmotionalSupport=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeEmotionalSupport=0.0} -> {ChallengeMonitoring=0.0},
 {ChallengeEmotionalSupport=0.0} -> {LivingUnit=2.0},
 {ChallengeEmotionalSupport=0.0} -> {SpaceSurroundingOther=0.0},
 {ChallengeMonitoring=0.0} -> {ChallengeEquipment=0.0},
 {ChallengeOnline=0.0} -> {Challeng

#### Conclusion

There are rules like:

- 30: {ChildScreens=1.0} -> {ChildMissFriends=1.0} (conf: 0.914, supp: 0.630, lift: 1.022, conv: 1.225)  
-> This makes me think that even if children spend more hours in front of a screen (and so have the opportunity to see and talk to their friends, or to play and fill their free time), they still feel the need to **socialize face to face**.

- 48: {PositiveImpact=2.0} -> {LivingUnit=2.0} (conf: 0.903, supp: 0.632, lift: 1.011, conv: 1.099)  
-> This rule suggests that when the positive-impact question is answered affirmatively, the family usually consists of 2 adults with children.

- 58: {PriorityEmotional=3.0} -> {SpaceSurroundingOther=0.0} (conf: 0.904, supp: 0.720, lift: 1.005, conv: 1.046)  
-> This rule suggests that perhaps in families where children don't have the opportunity to be outdoors, more attention and priority is given to their emotions.

- 476: {LivingUnit=2.0, NegativeImpact=2.0} -> {ChildMissFriends=1.0} (conf: 0.906, supp: 0.625, lift: 1.013, conv: 1.124)  
-> This rule suggests that in families with 2 adults, the most problematic consequence of Covid-19 for children, compared with other negative impacts, is **missing their friends**.
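
The `conv` figure in the rules above is the conviction, commonly defined as (1 - support(consequent)) / (1 - confidence). As a rough consistency check, the sketch below recomputes it from the quoted `conf` and `lift` values only. The helper `conviction_from` is a hypothetical name, and small differences from the quoted values are expected because the inputs are rounded to three decimals.

```python
def conviction_from(conf, lift):
    """Recompute conviction from confidence and lift (assumes conf < 1)."""
    consequent_support = conf / lift  # since lift = conf / supp(consequent)
    return (1 - consequent_support) / (1 - conf)


# (confidence, lift, quoted conviction) from the Cluster2 conclusion above
quoted = {
    30: (0.914, 1.022, 1.225),
    48: (0.903, 1.011, 1.099),
    58: (0.904, 1.005, 1.046),
    476: (0.906, 1.013, 1.124),
}

recomputed = {}
for rule_id, (conf, lift, conv) in quoted.items():
    recomputed[rule_id] = conviction_from(conf, lift)
    print(f"rule {rule_id}: quoted conv = {conv:.3f}, recomputed = {recomputed[rule_id]:.3f}")
```

A conviction only slightly above 1, as in all four rules, indicates that the antecedent reduces the chance of the consequent being absent by a small margin.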

