<a href="https://colab.research.google.com/github/beatrizjafelice/Association-Rules-for-Artists/blob/master/Association_Rules_for_Artists_EN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment
1. Create a collaborative database where each student lists their favorite bands on the same line, separated by commas (minimum of 2 and maximum of 5 bands). For consistency, the names should be in uppercase and use underscores (_) instead of spaces.

2. Extract association rules from the compiled database.

3. Identify the rules relevant to you that suggest bands you are unfamiliar with. Listen to a song from each of these bands and note your impressions :)

Extracting the data:

In [None]:
import pandas as pd

df = pd.read_csv('https://drive.google.com/u/3/uc?id=1aDEAX-LqUtdLl10n5SrjqF-dMAmDBmTE&export=download', names=['bandas'])
df.head()

Unnamed: 0,bandas
0,"MAMONAS_ASSASSINAS, QUEEN, RACIONAIS_MCS, EXAL..."
1,"COLDPLAY, THE_KILLERS, U2, ARCTIC_MONKEYS, ONE..."
2,"DREAM_THEATER, AVENGED_SEVENFOLD, OFICINA_G3, ..."
3,"METALLICA, TERNO_REI, PARAMORE, ZIMBRA"
4,"IMAGINE_DRAGONS, GUNS_AND_ROSES, COLDPLAY, LUM..."


In association rule analysis, the rows of the dataset are referred to as "transactions." Each transaction consists of one or more items. In this context, each transaction is one student's list of favorite artists, and each item is a specific artist.

Separating the items into lists:

In [None]:
data = list(df['bandas'].apply(lambda x: x.split(',')))
data

[['MAMONAS_ASSASSINAS', ' QUEEN', ' RACIONAIS_MCS', ' EXALTASSAMBA'],
 ['COLDPLAY',
  ' THE_KILLERS',
  ' U2',
  ' ARCTIC_MONKEYS',
  ' ONE_REPUBLIC',
  ' IMAGINE_DRAGONS'],
 ['DREAM_THEATER', ' AVENGED_SEVENFOLD', ' OFICINA_G3', ' ROSA_DE_SARON'],
 ['METALLICA', ' TERNO_REI', ' PARAMORE', ' ZIMBRA'],
 ['IMAGINE_DRAGONS', ' GUNS_AND_ROSES', ' COLDPLAY', ' LUMINNIERS'],
 ['RACIONAIS_MCS', ' DJONGA', ' JEAN_TASSY', ' PURO_SUCO'],
 ['JOURNEY',
  ' SUPERTRAMP',
  ' ABBA',
  ' EARTH_WIND_AND_FIRE',
  ' TEARS_FOR_FEARS'],
 ['QUEEN', ' BLONDIE', ' MUSE', ' PINK_FLOYD', ' IMAGINE_DRAGONS'],
 ['CHARLIE_BROWN_JR',
  ' MENOS_E_MAIS',
  ' EXALTASSAMBA',
  ' RACIONAIS_MCS',
  ' NATIRUTS',
  ' ONZE:20',
  ' FALAMANSA']]

Removing the leading spaces left over from the split, so that the same band (for example ' QUEEN' and 'QUEEN') is not counted as two different items:

In [None]:
# Strip surrounding whitespace from every item name
data = [[name.strip() for name in row] for row in data]
data

[['MAMONAS_ASSASSINAS', 'QUEEN', 'RACIONAIS_MCS', 'EXALTASSAMBA'],
 ['COLDPLAY',
  'THE_KILLERS',
  'U2',
  'ARCTIC_MONKEYS',
  'ONE_REPUBLIC',
  'IMAGINE_DRAGONS'],
 ['DREAM_THEATER', 'AVENGED_SEVENFOLD', 'OFICINA_G3', 'ROSA_DE_SARON'],
 ['METALLICA', 'TERNO_REI', 'PARAMORE', 'ZIMBRA'],
 ['IMAGINE_DRAGONS', 'GUNS_AND_ROSES', 'COLDPLAY', 'LUMINNIERS'],
 ['RACIONAIS_MCS', 'DJONGA', 'JEAN_TASSY', 'PURO_SUCO'],
 ['JOURNEY', 'SUPERTRAMP', 'ABBA', 'EARTH_WIND_AND_FIRE', 'TEARS_FOR_FEARS'],
 ['QUEEN', 'BLONDIE', 'MUSE', 'PINK_FLOYD', 'IMAGINE_DRAGONS'],
 ['CHARLIE_BROWN_JR',
  'MENOS_E_MAIS',
  'EXALTASSAMBA',
  'RACIONAIS_MCS',
  'NATIRUTS',
  'ONZE:20',
  'FALAMANSA']]
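
To see why the stripping step matters, here is a minimal sketch using two of the rows shown above (the variable names are illustrative, not part of the original notebook). It counts the distinct items before and after cleaning, and does the split and strip in a single comprehension:

```python
# Two raw rows, as they would look before splitting
raw_rows = [
    "MAMONAS_ASSASSINAS, QUEEN, RACIONAIS_MCS, EXALTASSAMBA",
    "QUEEN, BLONDIE, MUSE, PINK_FLOYD, IMAGINE_DRAGONS",
]

# Splitting only: ' QUEEN' (with a leading space) and 'QUEEN' are different strings
unstripped = {item for row in raw_rows for item in row.split(",")}

# Splitting and stripping in one pass
cleaned = [[item.strip() for item in row.split(",")] for row in raw_rows]
distinct = {item for row in cleaned for item in row}

print(len(unstripped), len(distinct))  # 9 distinct strings before, 8 after
```

Without the strip, QUEEN would be split across two columns in the encoded table and its support would be underestimated.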

Now, the dataset is encoded as a table where each row represents a transaction and each column represents an item. The values True or False indicate whether the item in the column is present in that transaction.

In [None]:
from mlxtend.preprocessing import TransactionEncoder

t_enc = TransactionEncoder()
t_enc_data = t_enc.fit(data).transform(data)
t_enc_data

array([[False, False, False, False, False, False, False, False, False,
         True, False, False, False, False, False, False,  True, False,
        False, False, False, False, False, False, False, False, False,
         True,  True, False, False, False, False, False, False, False],
       [False,  True, False, False, False,  True, False, False, False,
        False, False, False,  True, False, False, False, False, False,
        False, False, False, False,  True, False, False, False, False,
        False, False, False, False, False, False,  True,  True, False],
       [False, False,  True, False, False, False, False,  True, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False,  True, False, False, False, False, False,
        False, False,  True, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
   

Converting the array into a dataframe for improved visualization:

In [None]:
df_encoded = pd.DataFrame(t_enc_data, columns=t_enc.columns_)
df_encoded

Unnamed: 0,ABBA,ARCTIC_MONKEYS,AVENGED_SEVENFOLD,BLONDIE,CHARLIE_BROWN_JR,COLDPLAY,DJONGA,DREAM_THEATER,EARTH_WIND_AND_FIRE,EXALTASSAMBA,...,PURO_SUCO,QUEEN,RACIONAIS_MCS,ROSA_DE_SARON,SUPERTRAMP,TEARS_FOR_FEARS,TERNO_REI,THE_KILLERS,U2,ZIMBRA
0,False,False,False,False,False,False,False,False,False,True,...,False,True,True,False,False,False,False,False,False,False
1,False,True,False,False,False,True,False,False,False,False,...,False,False,False,False,False,False,False,True,True,False
2,False,False,True,False,False,False,False,True,False,False,...,False,False,False,True,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,True
4,False,False,False,False,False,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,True,False,False,False,...,True,False,True,False,False,False,False,False,False,False
6,True,False,False,False,False,False,False,False,True,False,...,False,False,False,False,True,True,False,False,False,False
7,False,False,False,True,False,False,False,False,False,False,...,False,True,False,False,False,False,False,False,False,False
8,False,False,False,False,True,False,False,False,False,True,...,False,False,True,False,False,False,False,False,False,False
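
For intuition, the encoding step can be reproduced without mlxtend. The following is a minimal sketch, assuming the columns are simply the sorted distinct item names (which is consistent with the alphabetical column order in the output above). It uses only two of the transactions, and the variable names are illustrative:

```python
# Two transactions taken from the cleaned data above
transactions = [
    ["METALLICA", "TERNO_REI", "PARAMORE", "ZIMBRA"],
    ["RACIONAIS_MCS", "DJONGA", "JEAN_TASSY", "PURO_SUCO"],
]

# One column per distinct item, in alphabetical order
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the transaction contains the column's item
encoded = [[item in set(t) for item in columns] for t in transactions]

for row in encoded:
    print(row)
```

Each row has exactly as many True values as the student listed bands, which is a quick sanity check on the real encoded table as well.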


Next, we extract the frequent itemsets, i.e. the sets of one or more items that appear together in at least a given share of the transactions. The *min_support* parameter sets that threshold: the minimum proportion of transactions that must contain the itemset for it to be considered frequent. With 9 transactions and `min_support=0.2`, an itemset must appear in at least 2 lists.

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules

# Note: in large databases, increase the min_support parameter
# (minimum proportion of transactions in which the itemset appears)
freq_itemsets = apriori(df_encoded, min_support=0.2, use_colnames=True)
freq_itemsets

Unnamed: 0,support,itemsets
0,0.222222,(COLDPLAY)
1,0.222222,(EXALTASSAMBA)
2,0.333333,(IMAGINE_DRAGONS)
3,0.222222,(QUEEN)
4,0.333333,(RACIONAIS_MCS)
5,0.222222,"(IMAGINE_DRAGONS, COLDPLAY)"
6,0.222222,"(EXALTASSAMBA, RACIONAIS_MCS)"
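
The support values above can be checked by plain counting. The sketch below is a brute-force enumeration of 1- and 2-item sets, not the Apriori algorithm itself (Apriori prunes candidates instead of enumerating them all), and it assumes that itemsets of size 3 or more can be ignored, which holds for this dataset:

```python
from collections import Counter
from itertools import combinations

# The nine cleaned transactions from the notebook
data = [
    ["MAMONAS_ASSASSINAS", "QUEEN", "RACIONAIS_MCS", "EXALTASSAMBA"],
    ["COLDPLAY", "THE_KILLERS", "U2", "ARCTIC_MONKEYS", "ONE_REPUBLIC", "IMAGINE_DRAGONS"],
    ["DREAM_THEATER", "AVENGED_SEVENFOLD", "OFICINA_G3", "ROSA_DE_SARON"],
    ["METALLICA", "TERNO_REI", "PARAMORE", "ZIMBRA"],
    ["IMAGINE_DRAGONS", "GUNS_AND_ROSES", "COLDPLAY", "LUMINNIERS"],
    ["RACIONAIS_MCS", "DJONGA", "JEAN_TASSY", "PURO_SUCO"],
    ["JOURNEY", "SUPERTRAMP", "ABBA", "EARTH_WIND_AND_FIRE", "TEARS_FOR_FEARS"],
    ["QUEEN", "BLONDIE", "MUSE", "PINK_FLOYD", "IMAGINE_DRAGONS"],
    ["CHARLIE_BROWN_JR", "MENOS_E_MAIS", "EXALTASSAMBA", "RACIONAIS_MCS",
     "NATIRUTS", "ONZE:20", "FALAMANSA"],
]

min_support = 0.2
n = len(data)

# Count in how many transactions each 1- and 2-item set occurs
counts = Counter()
for transaction in data:
    items = sorted(set(transaction))  # sorted so each pair has one canonical order
    for size in (1, 2):
        counts.update(combinations(items, size))

# Keep only the itemsets whose support reaches the threshold
frequent = {
    itemset: count / n
    for itemset, count in counts.items()
    if count / n >= min_support
}

for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(f"{support:.6f}  {itemset}")
```

This yields the same seven itemsets as the `apriori` call: five single artists and the two pairs.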


Extracting the association rules:

In [None]:
rules = association_rules(freq_itemsets, metric='confidence', min_threshold=0.6)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(IMAGINE_DRAGONS),(COLDPLAY),0.333333,0.222222,0.222222,0.666667,3.0,0.148148,2.333333,1.0
1,(COLDPLAY),(IMAGINE_DRAGONS),0.222222,0.333333,0.222222,1.0,3.0,0.148148,inf,0.857143
2,(EXALTASSAMBA),(RACIONAIS_MCS),0.222222,0.333333,0.222222,1.0,3.0,0.148148,inf,0.857143
3,(RACIONAIS_MCS),(EXALTASSAMBA),0.333333,0.222222,0.222222,0.666667,3.0,0.148148,2.333333,1.0


The association rule table summarizes the relationships between artists in terms of how often they are listed together as favorites. Each column is interpreted as follows:

1. **Antecedents**: The itemset on the left-hand side of the rule (the "if" part).
2. **Consequents**: The itemset on the right-hand side of the rule (the "then" part).
3. **Antecedent support**: The proportion of transactions (here, students' lists) that include the antecedent.
4. **Consequent support**: The proportion of transactions that include the consequent.
5. **Support**: The proportion of transactions that include both the antecedent and the consequent.
6. **Confidence**: The proportion of transactions containing the antecedent that also contain the consequent, i.e. the support divided by the antecedent support. It estimates the conditional probability of the consequent given the antecedent.
7. **Lift**: The ratio between the confidence of the rule and the support of the consequent. It measures how much more often the antecedent and consequent appear together than would be expected if they were independent. A lift greater than 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association.
8. **Leverage**: The difference between the observed support and the support expected if the antecedent and consequent were independent (antecedent support times consequent support). A value of 0 indicates independence, positive values indicate a positive association, and negative values indicate a negative association. Its absolute value can never exceed 0.25, so values should be judged against that bound rather than against 1.
9. **Conviction**: The ratio between the expected frequency of the antecedent without the consequent (assuming independence) and the observed frequency of the antecedent without the consequent. A value of 1 indicates independence, and higher values indicate a stronger association. It is infinite when the confidence is 1, because the rule never fails in the data.
10. **Zhang's metric**: Analyzes how the presence and absence of the antecedent impact the presence of the consequent. It ranges from -1 to 1: positive values indicate association, negative values indicate dissociation, and 0 indicates independence.
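
The definitions above can be turned into a few lines of arithmetic. The sketch below uses the standard textbook formulas, written from the definitions rather than copied from mlxtend's source, and the helper name `rule_metrics` is hypothetical. It takes the three supports of a rule and returns the derived metrics:

```python
import math

def rule_metrics(supp_a, supp_c, supp_ac):
    """Derive rule metrics from the antecedent, consequent and joint supports."""
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    leverage = supp_ac - supp_a * supp_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    # Zhang's metric: leverage normalised by its largest attainable magnitude
    denominator = max(supp_ac * (1 - supp_a), supp_a * (supp_c - supp_ac))
    zhang = leverage / denominator if denominator else 0.0
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhang": zhang,
    }

# (IMAGINE_DRAGONS) -> (COLDPLAY): supports 3/9, 2/9 and 2/9
forward = rule_metrics(3 / 9, 2 / 9, 2 / 9)
# (COLDPLAY) -> (IMAGINE_DRAGONS): the same pair with the roles swapped
backward = rule_metrics(2 / 9, 3 / 9, 2 / 9)

for name, metrics in (("forward", forward), ("backward", backward)):
    print(name, {key: round(value, 6) for key, value in metrics.items()})
```

Support, lift and leverage are symmetric in the two itemsets, while confidence, conviction and Zhang's metric depend on the direction of the rule, which is why the two rules for each pair differ only in those three columns.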

### Detailed Interpretation of the Rules

1. **Rule 1: (COLDPLAY) -> (IMAGINE_DRAGONS)** (row 1 of the rules table)
   - **Antecedent support**: 0.222222 (22.22% of the transactions include COLDPLAY)
   - **Consequent support**: 0.333333 (33.33% of the transactions include IMAGINE_DRAGONS)
   - **Support**: 0.222222 (22.22% of the transactions include both COLDPLAY and IMAGINE_DRAGONS)
   - **Confidence**: 1.000000 (100% of the transactions that include COLDPLAY also include IMAGINE_DRAGONS)
   - **Lift**: 3.0 (students who list COLDPLAY are 3 times as likely to list IMAGINE_DRAGONS as the average student in the dataset)
   - **Leverage**: 0.148148 (Positive value indicates a positive association)
   - **Conviction**: inf (Infinite, suggesting a very strong implication since COLDPLAY always leads to IMAGINE_DRAGONS)
   - **Zhang's metric**: 0.857143 (High value indicating a strong association)

2. **Rule 2: (IMAGINE_DRAGONS) -> (COLDPLAY)** (row 0 of the rules table)
   - **Antecedent support**: 0.333333 (33.33% of the transactions include IMAGINE_DRAGONS)
   - **Consequent support**: 0.222222 (22.22% of the transactions include COLDPLAY)
   - **Support**: 0.222222 (22.22% of the transactions include both IMAGINE_DRAGONS and COLDPLAY)
   - **Confidence**: 0.666667 (66.67% of the transactions that include IMAGINE_DRAGONS also include COLDPLAY)
   - **Lift**: 3.0 (students who list IMAGINE_DRAGONS are 3 times as likely to list COLDPLAY as the average student in the dataset)
   - **Leverage**: 0.148148 (Positive value indicates a positive association)
   - **Conviction**: 2.333333 (Indicates that the rule IMAGINE_DRAGONS -> COLDPLAY is quite strong, but not as strong as the converse rule)
   - **Zhang's metric**: 1.000000 (High value indicating a strong association)

3. **Rule 3: (EXALTASSAMBA) -> (RACIONAIS_MCS)**
   - **Antecedent support**: 0.222222 (22.22% of the transactions include EXALTASSAMBA)
   - **Consequent support**: 0.333333 (33.33% of the transactions include RACIONAIS_MCS)
   - **Support**: 0.222222 (22.22% of the transactions include both EXALTASSAMBA and RACIONAIS_MCS)
   - **Confidence**: 1.000000 (100% of the transactions that include EXALTASSAMBA also include RACIONAIS_MCS)
   - **Lift**: 3.0 (students who list EXALTASSAMBA are 3 times as likely to list RACIONAIS_MCS as the average student in the dataset)
   - **Leverage**: 0.148148 (Positive value indicates a positive association)
   - **Conviction**: inf (Infinite, suggesting a very strong implication since EXALTASSAMBA always leads to RACIONAIS_MCS)
   - **Zhang's metric**: 0.857143 (High value indicating a strong association)

4. **Rule 4: (RACIONAIS_MCS) -> (EXALTASSAMBA)**
   - **Antecedent support**: 0.333333 (33.33% of the transactions include RACIONAIS_MCS)
   - **Consequent support**: 0.222222 (22.22% of the transactions include EXALTASSAMBA)
   - **Support**: 0.222222 (22.22% of the transactions include both RACIONAIS_MCS and EXALTASSAMBA)
   - **Confidence**: 0.666667 (66.67% of the transactions that include RACIONAIS_MCS also include EXALTASSAMBA)
   - **Lift**: 3.0 (students who list RACIONAIS_MCS are 3 times as likely to list EXALTASSAMBA as the average student in the dataset)
   - **Leverage**: 0.148148 (Positive value indicates a positive association)
   - **Conviction**: 2.333333 (Indicates that the rule RACIONAIS_MCS -> EXALTASSAMBA is quite strong, but not as strong as the converse rule)
   - **Zhang's metric**: 1.000000 (High value indicating a strong association)

### Summary

- The pairs (COLDPLAY, IMAGINE_DRAGONS) and (EXALTASSAMBA, RACIONAIS_MCS) show strong associations within this dataset, indicated by high confidence (0.67 to 1.0), a lift of 3.0, and positive leverage (0.148).
- The rules where the antecedent perfectly predicts the consequent (confidence of 1.0) have infinite conviction: in this dataset, the consequent is present whenever the antecedent is present.
- The lift of 3.0 for every rule means that the two artists in each pair are listed together three times as often as would be expected if they were chosen independently.
- Zhang's metric values between 0.857143 and 1.0, at or near the maximum of 1, further confirm the strength of these associations.

**Note:** Given the small size of the dataset (9 transactions), each rule above rests on only 2 lists. The rules describe this sample only and should not be read as a general trend.
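
Returning to step 3 of the assignment, the rules relevant to a given student are those whose antecedent is contained in that student's list and whose consequent includes at least one band the student did not list. The sketch below is one possible way to do this. The helper `suggest_bands` and the example list are hypothetical, and the rules are copied by hand from the table above as (antecedent, consequent, confidence) tuples:

```python
# Rules copied from the rules table: (antecedents, consequents, confidence)
rules = [
    (frozenset({"IMAGINE_DRAGONS"}), frozenset({"COLDPLAY"}), 0.666667),
    (frozenset({"COLDPLAY"}), frozenset({"IMAGINE_DRAGONS"}), 1.0),
    (frozenset({"EXALTASSAMBA"}), frozenset({"RACIONAIS_MCS"}), 1.0),
    (frozenset({"RACIONAIS_MCS"}), frozenset({"EXALTASSAMBA"}), 0.666667),
]

def suggest_bands(my_bands, rules):
    """Return (band, confidence) pairs for bands suggested by applicable rules."""
    my_bands = set(my_bands)
    suggestions = {}
    for antecedent, consequent, confidence in rules:
        # A rule applies only if the student listed every antecedent item
        if antecedent <= my_bands:
            # Keep only bands the student did not already list
            for band in consequent - my_bands:
                suggestions[band] = max(confidence, suggestions.get(band, 0.0))
    # Highest-confidence suggestions first
    return sorted(suggestions.items(), key=lambda kv: -kv[1])

# Hypothetical student who listed QUEEN and EXALTASSAMBA
print(suggest_bands(["QUEEN", "EXALTASSAMBA"], rules))  # [('RACIONAIS_MCS', 1.0)]
```

With the `rules` DataFrame from the notebook, the same tuples could be read from its `antecedents`, `consequents` and `confidence` columns instead of being typed by hand.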