# Assignment 3: Association Rule Mining

Instructions:
Mine interesting association rules from this data between movies rated highly (>= 4) by a user. These rules will indicate relationships of the form X -> Y where if a user rates X highly, they're also likely to rank Y highly.
</br>

Also, mine relationships between different genres of movies where a user is likely to rank a movie of a certain genre highly (>=4) if they also rank a different movie of a different genre highly

</br>
Restrict your rules to one item on the left and one item on the right
</br>

Select an algorithm to use (Apriori/ FP tree etc) and appropriate interestingness measures along with their thresholds. 
</br>

Submit an ipynb notebook and a report that describes and justifies your choice of algorithm, interestingness measures, and thresholds. Also describe the top 20 relationships you mine and any interesting relationships you mine.
</br>

My report is within this notebook, in the Markdown blocks.

In [114]:
# import all needed libraries
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
# OnehotTransactions was dropped from this list: it is unused below and is
# not available in recent mlxtend releases
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori

In [2]:
#load in datasets
df1 = pd.read_csv(r"Movies.tsv", delimiter = "\t")
df2 = pd.read_csv(r"Ratings.tsv", delimiter = "\t")

In [3]:
df1.head(3)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance


In [4]:
df2.head(3)

Unnamed: 0,userId,movieId,rating
0,1,2,3.5
1,1,29,3.5
2,1,32,3.5


In [5]:
df2.shape

(9999, 3)

In [6]:
df = pd.merge(df1,df2,on='movieId')

In [7]:
df.isna().sum()

movieId    0
title      0
genres     0
userId     0
rating     0
dtype: int64

In [8]:
# reduce the size of the data: we only want ratings >= 4
# (.copy() avoids a SettingWithCopyWarning when a column is added later)
df = df[df['rating'] >= 4].copy()

In [99]:
df.describe()

Unnamed: 0,movieId,userId,rating
count,5628.0,5628.0,5628.0
mean,7110.716063,44.171286,4.40334
std,16968.135995,25.702806,0.455973
min,1.0,1.0,4.0
25%,648.0,22.0,4.0
50%,1625.0,50.0,4.0
75%,3595.25,61.0,5.0
max,118696.0,91.0,5.0


In [9]:
df.head(3)

Unnamed: 0,movieId,title,genres,userId,rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3,4.0
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6,5.0
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8,4.0


In [10]:
#split the genres into lists
df['genrelist'] = df['genres'].str.split('|')

In [11]:
df.head()

Unnamed: 0,movieId,title,genres,userId,rating,genrelist
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3,4.0,"[Adventure, Animation, Children, Comedy, Fantasy]"
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6,5.0,"[Adventure, Animation, Children, Comedy, Fantasy]"
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8,4.0,"[Adventure, Animation, Children, Comedy, Fantasy]"
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10,4.0,"[Adventure, Animation, Children, Comedy, Fantasy]"
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11,4.5,"[Adventure, Animation, Children, Comedy, Fantasy]"


### Mine interesting association rules from this data between movies rated highly (>= 4) by a user. These rules will indicate relationships of the form X -> Y where if a user rates X highly, they're also likely to rank Y highly.

In [113]:
'''create a pivot table indexed by userId
where the columns are the movie titles
and the values are the ratings the user gave;
this seemed the easiest way to build the table
needed for the boolean matrix later'''

usermoviesratings= df.pivot_table(index='userId', columns='title', values='rating', fill_value = 0)
usermoviesratings

title,'night Mother (1986),*batteries not included (1987),...And Justice for All (1979),10 Things I Hate About You (1999),"10,000 BC (2008)",100 Girls (2000),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),127 Hours (2010),13 Ghosts (1960),...,X2: X-Men United (2003),Yellow Submarine (1968),Yojimbo (1961),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),Young Sherlock Holmes (1985),Zoolander (2001),eXistenZ (1999),¡Three Amigos! (1986)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,4.0,0,0.0,4,0,0,0,0.0,0,0
2,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
3,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,5,0,0,0,0.0,0,0
4,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
5,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
6,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
7,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
8,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
9,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0
10,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0,...,0.0,0,0.0,0,0,0,0,0.0,0,0


In [13]:
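def booleanize(x):
    '''if a rating exists (the table only holds ratings >= 4), turn it into
    a 1, otherwise a 0, which creates the boolean matrix'''
    # a single expression covers every value, so nothing falls through to None
    return 1 if x >= 1 else 0

usermoviesratings = usermoviesratings.applymap(booleanize)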
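
The pivot and booleanize steps above can be sketched on a toy table. This is only an illustration: the `toy` frame and its values are made up, and comparing the pivot against 0 is an alternative to `booleanize`, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical ratings, already filtered to rating >= 4
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["Alien (1979)", "Die Hard (1988)", "Alien (1979)",
              "Alien (1979)", "Die Hard (1988)"],
    "rating": [4.0, 5.0, 4.5, 4.0, 4.0],
})

# One row per user, one column per movie, 0 where the user has no high rating
pivot = toy.pivot_table(index="userId", columns="title",
                        values="rating", fill_value=0)

# True where a high rating exists; a vectorized alternative to applymap
liked = pivot > 0
print(liked)
```

A boolean frame like `liked` has the same shape as the 0/1 matrix built above, with one transaction per user.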

In [14]:
usermoviesratings.head(3)

title,'night Mother (1986),*batteries not included (1987),...And Justice for All (1979),10 Things I Hate About You (1999),"10,000 BC (2008)",100 Girls (2000),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),127 Hours (2010),13 Ghosts (1960),...,X2: X-Men United (2003),Yellow Submarine (1968),Yojimbo (1961),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),Young Sherlock Holmes (1985),Zoolander (2001),eXistenZ (1999),¡Three Amigos! (1986)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0,0,0,0,0,0,0,0,0,0,...,1,0,0,1,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [15]:
frequent_itemsets = apriori(usermoviesratings, min_support=0.1, use_colnames=True)
frequent_itemsets.head(3)

Unnamed: 0,support,itemsets
0,0.131868,(2001: A Space Odyssey (1968))
1,0.131868,(Aladdin (1992))
2,0.120879,(Alien (1979))


## Check the Association Rules based on Confidence for Movies

In [105]:
# min .9 Confidence
associationrules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.9)
# create 2 new columns counting the number of antecedents and consequents
# we only want one on each side
# first create a count for each
associationrules["antecedent_length"] = associationrules["antecedents"].apply(lambda x: len(x))
associationrules["consequents_length"] = associationrules["consequents"].apply(lambda x: len(x))
#then filter the rules to only have one antecedent and consequent
associationrules = associationrules.loc[associationrules["antecedent_length"] == 1]
associationrules = associationrules.loc[associationrules["consequents_length"] == 1]
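
The one-antecedent, one-consequent filter is repeated in several cells, so it could be wrapped in a small helper. The sketch below is an assumption about how that might look: `one_to_one` and `toy_rules` are hypothetical names, and the toy frame only mimics the `antecedents` and `consequents` columns (frozensets) that `association_rules` returns.

```python
import pandas as pd

def one_to_one(rules):
    """Keep only rules with exactly one antecedent and one consequent."""
    single_left = rules["antecedents"].apply(len) == 1
    single_right = rules["consequents"].apply(len) == 1
    return rules.loc[single_left & single_right].reset_index(drop=True)

# Hypothetical rules table with the same column layout as mlxtend's output
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"A", "B"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"C"}), frozenset({"A", "B"})],
})

filtered = one_to_one(toy_rules)
print(filtered)
```

mlxtend's `apriori` also accepts a `max_len` argument, so passing `max_len=2` should keep itemsets of three or more items from being generated in the first place.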

In [18]:
associationrules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_length,consequents_length
0,(Alien (1979)),(Star Wars: Episode IV - A New Hope (1977)),0.120879,0.307692,0.10989,0.909091,2.954545,0.072697,7.615385,1,1
1,(Alien (1979)),(Star Wars: Episode V - The Empire Strikes Bac...,0.120879,0.274725,0.10989,0.909091,3.309091,0.076682,7.978022,1,1
2,(Outbreak (1995)),(Dances with Wolves (1990)),0.120879,0.252747,0.10989,0.909091,3.596838,0.079338,8.21978,1,1
3,(Die Hard (1988)),"(Shawshank Redemption, The (1994))",0.131868,0.417582,0.120879,0.916667,2.195175,0.065813,6.989011,1,1
4,"(Fish Called Wanda, A (1988))","(Shawshank Redemption, The (1994))",0.120879,0.417582,0.10989,0.909091,2.177033,0.059413,6.406593,1,1
5,(Mrs. Doubtfire (1993)),(Forrest Gump (1994)),0.131868,0.395604,0.120879,0.916667,2.31713,0.068712,7.252747,1,1
6,(Outbreak (1995)),(Forrest Gump (1994)),0.120879,0.395604,0.10989,0.909091,2.29798,0.06207,6.648352,1,1
7,"(Godfather: Part II, The (1974))","(Godfather, The (1972))",0.153846,0.230769,0.142857,0.928571,4.02381,0.107354,10.769231,1,1
8,"(Lord of the Rings: The Two Towers, The (2002))","(Lord of the Rings: The Return of the King, Th...",0.131868,0.131868,0.120879,0.916667,6.951389,0.10349,10.417582,1,1
9,"(Lord of the Rings: The Return of the King, Th...","(Lord of the Rings: The Two Towers, The (2002))",0.131868,0.131868,0.120879,0.916667,6.951389,0.10349,10.417582,1,1


The confidence levels are high: every rule shown has a confidence above 0.9, and I'm surprised that several of them do not pair movies from the same series.
I would have expected rules between a movie and its sequel or prequel, and those do appear for The Godfather and The Lord of the Rings.
I think most of these rules make sense, which is reassuring.
I am really surprised that I have seen all of these movies.
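
As a sanity check on how the columns relate, the metrics for one row can be recomputed by hand. The sketch below copies the three support values from the Alien -> Star Wars: Episode IV row above and applies the standard definitions of confidence, lift, and conviction; the variable names are only for illustration.

```python
# Supports copied from the Alien (1979) -> Star Wars: Episode IV row
support_xy = 0.10989     # users who rated both movies highly
support_x = 0.120879     # antecedent support (Alien)
support_y = 0.307692     # consequent support (Star Wars: Episode IV)

# confidence(X -> Y) = support(X and Y) / support(X)
confidence = support_xy / support_x

# lift(X -> Y) = confidence(X -> Y) / support(Y)
lift = confidence / support_y

# conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))
conviction = (1 - support_y) / (1 - confidence)

print(round(confidence, 3), round(lift, 3), round(conviction, 3))
```

Up to rounding, these should agree with the confidence, lift, and conviction columns of that row.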



## Check the Association Rules based on Lift for Movies

In [109]:
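# min lift of 0.9
# frequent_itemsets is reused for the genre itemsets further down, so this
# cell returns movie rules only if it runs before that apriori call
associationrules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.9)
# create 2 new columns counting the number of antecedents and consequents
# we only want one on each side
# first create a count for each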
associationrules["antecedent_length"] = associationrules["antecedents"].apply(lambda x: len(x))
associationrules["consequents_length"] = associationrules["consequents"].apply(lambda x: len(x))
#then filter the rules to only have one antecedent and consequent
associationrules = associationrules.loc[associationrules["antecedent_length"] == 1]
associationrules = associationrules.loc[associationrules["consequents_length"] == 1]

In [110]:
associationrules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_length,consequents_length
0,(Action),(Adventure),0.287491,0.233653,0.131663,0.457973,1.960054,0.06449,1.413853,1,1
1,(Adventure),(Action),0.233653,0.287491,0.131663,0.563498,1.960054,0.06449,1.632316,1,1
2,(Sci-Fi),(Action),0.167555,0.287491,0.101457,0.605514,2.106202,0.053286,1.806172,1,1
3,(Action),(Sci-Fi),0.287491,0.167555,0.101457,0.352905,2.106202,0.053286,1.286434,1,1
4,(Thriller),(Action),0.271322,0.287491,0.127043,0.468238,1.628706,0.049041,1.339903,1,1
5,(Action),(Thriller),0.287491,0.271322,0.127043,0.441904,1.628706,0.049041,1.305649,1,1
6,(Romance),(Comedy),0.183724,0.32978,0.103056,0.560928,1.700919,0.042468,1.526449,1,1
7,(Comedy),(Romance),0.32978,0.183724,0.103056,0.3125,1.700919,0.042468,1.18731,1,1
8,(Romance),(Drama),0.183724,0.483653,0.106965,0.582205,1.203766,0.018106,1.235886,1,1
9,(Drama),(Romance),0.483653,0.183724,0.106965,0.221161,1.203766,0.018106,1.048067,1,1


The lift is symmetric: for the genre rules in the table above, every A -> B has a matching B -> A with the same lift. I use 1 as the reference point because a lift of 1 means that A and B are independent. Since all of these rules have a lift greater than 1, the antecedent has a positive effect on the occurrence of the consequent.
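
The symmetry can be checked directly from the supports. This sketch uses the Action and Sci-Fi values copied from the table above; it shows that the two directions share one lift while their confidences differ.

```python
# Supports copied from the Action / Sci-Fi rows of the table above
support_action = 0.287491
support_scifi = 0.167555
support_both = 0.101457

# Confidence depends on which side is the antecedent
conf_action_to_scifi = support_both / support_action
conf_scifi_to_action = support_both / support_scifi

# Lift divides by the other side's support, so both directions reduce to
# support_both / (support_action * support_scifi)
lift_action_to_scifi = conf_action_to_scifi / support_scifi
lift_scifi_to_action = conf_scifi_to_action / support_action

print(conf_action_to_scifi, conf_scifi_to_action)
print(lift_action_to_scifi, lift_scifi_to_action)
```

This is why ranking by lift alone cannot say which direction of a rule is stronger; confidence is needed for that.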

### Also, mine relationships between different genres of movies where a user is likely to rank a movie of a certain genre highly (>=4) if they also rank a different movie of a different genre highly

### Restrict your rules to one item on the left and one item on the right

### Select an algorithm to use (Apriori/ FP tree etc) and appropriate interestingness measures along with their thresholds.

In [49]:
'''get the list of genre lists, one per user rating in the
original dataframe'''
genrelists = list(df['genrelist'])
genrelists

[['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
 ['Adven

In [53]:
'''we get the list of the genres '''
columns= ['Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'IMAX', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

In [100]:
'''now we create a list of lists of boolean values marking whether
each genre in columns appears in that rating's genre list'''
usergenreratings = []
for genrelist in genrelists:
    booleanlist = [genre in genrelist for genre in columns]
    usergenreratings.append(booleanlist)

usergenreratings

[[False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False],
 [False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False],
 [False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False],
 [False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False],
 [False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False],
 [False,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False,
  False]

In [85]:
#convert the Boolean list of lists into a dataframe
from pandas import DataFrame
genredataframe = DataFrame.from_records(usergenreratings, columns = columns)
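
The loop and `DataFrame.from_records` steps above could likely be collapsed into one call with pandas' `Series.str.get_dummies`. This is a sketch on a made-up `toy` frame, not the code used in this notebook; note that it derives the columns from the data (sorted alphabetically) instead of from the hand-written `columns` list.

```python
import pandas as pd

# Hypothetical rows in the same pipe-separated format as the genres column
toy = pd.DataFrame({
    "genres": ["Adventure|Children|Fantasy", "Comedy|Romance", "Comedy"],
})

# One 0/1 indicator column per genre found in the data, converted to booleans
genre_flags = toy["genres"].str.get_dummies(sep="|").astype(bool)
print(genre_flags)
```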

In [86]:
#look at head to make sure that the dataframe looks okay
genredataframe.head(3)

Unnamed: 0,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False
1,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False
2,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False


In [87]:
#add userID to the genre boolean matrix
genredataframe= genredataframe.set_index(df.userId)

In [102]:
genredataframe.head(3)

Unnamed: 0_level_0,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
3,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False
6,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False
8,False,True,True,True,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False


In [90]:
genredataframe.shape

(5628, 19)
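
Each of the 5628 rows here is one rating, so itemsets mined from this table describe genres that appear together on a single rated movie. If one transaction per user is wanted instead, the rows could be collapsed by `userId`. The sketch below is a hypothetical variant on a made-up `toy` frame, not what the following cells do.

```python
import pandas as pd

# Hypothetical rating-level flags indexed by userId, like genredataframe
toy = pd.DataFrame(
    {
        "Action": [True, False, False],
        "Comedy": [False, True, False],
        "Drama": [False, False, True],
    },
    index=pd.Index([3, 3, 6], name="userId"),
)

# A genre is True for a user if any of their highly rated movies carries it
per_user = toy.groupby(level="userId").any()
print(per_user)
```

A table like `per_user` would then take the place of the rating-level frame in the `apriori` call, probably with a different `min_support`, since each user covers many genres.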

In [93]:
# look at the number of ratings tagged with each genre to see how balanced the genre data is
genredataframe.sum()

Action         1618
Adventure      1315
Animation       331
Children        426
Comedy         1856
Crime           941
Documentary      68
Drama          2722
Fantasy         614
Film-Noir        57
Horror          396
IMAX            156
Musical         181
Mystery         418
Romance        1034
Sci-Fi          943
Thriller       1527
War             411
Western         145
dtype: int64

In [94]:
frequent_itemsets = apriori(genredataframe, min_support=0.1, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.287491,(Action)
1,0.233653,(Adventure)
2,0.32978,(Comedy)
3,0.1672,(Crime)
4,0.483653,(Drama)
5,0.109097,(Fantasy)
6,0.183724,(Romance)
7,0.167555,(Sci-Fi)
8,0.271322,(Thriller)
9,0.131663,"(Action, Adventure)"


## Check the Association Rules based on Confidence for Genres

In [104]:
# min .6 Confidence
associationrules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)

associationrules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Sci-Fi),(Action),0.167555,0.287491,0.101457,0.605514,2.106202,0.053286,1.806172


In [106]:
# min .5 Confidence
associationrules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
# we only want one on each side
#first create a count for each
associationrules["antecedent_length"] = associationrules["antecedents"].apply(lambda x: len(x))
associationrules["consequents_length"] = associationrules["consequents"].apply(lambda x: len(x))
#then filter the rules to only have one antecedent and consequent
associationrules = associationrules.loc[associationrules["antecedent_length"] == 1]
associationrules = associationrules.loc[associationrules["consequents_length"] == 1]
associationrules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_length,consequents_length
0,(Adventure),(Action),0.233653,0.287491,0.131663,0.563498,1.960054,0.06449,1.632316,1,1
1,(Sci-Fi),(Action),0.167555,0.287491,0.101457,0.605514,2.106202,0.053286,1.806172,1,1
2,(Romance),(Comedy),0.183724,0.32978,0.103056,0.560928,1.700919,0.042468,1.526449,1,1
3,(Romance),(Drama),0.183724,0.483653,0.106965,0.582205,1.203766,0.018106,1.235886,1,1


It is interesting that the confidences are fairly low; I expected them to be higher. I think this is due to the item set being so small: there are only 19 genres, whereas the previous mining covered a much larger set of movie titles drawn from the 5628 high ratings.

## Check the Association Rules based on Lift for Genres

In [107]:
associationrules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
associationrules["antecedent_length"] = associationrules["antecedents"].apply(lambda x: len(x))
associationrules["consequents_length"] = associationrules["consequents"].apply(lambda x: len(x))
#then filter the rules to only have one antecedent and consequent
associationrules = associationrules.loc[associationrules["antecedent_length"] == 1]
associationrules = associationrules.loc[associationrules["consequents_length"] == 1]
associationrules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_length,consequents_length
0,(Action),(Adventure),0.287491,0.233653,0.131663,0.457973,1.960054,0.06449,1.413853,1,1
1,(Adventure),(Action),0.233653,0.287491,0.131663,0.563498,1.960054,0.06449,1.632316,1,1
2,(Sci-Fi),(Action),0.167555,0.287491,0.101457,0.605514,2.106202,0.053286,1.806172,1,1
3,(Action),(Sci-Fi),0.287491,0.167555,0.101457,0.352905,2.106202,0.053286,1.286434,1,1
4,(Thriller),(Action),0.271322,0.287491,0.127043,0.468238,1.628706,0.049041,1.339903,1,1
5,(Action),(Thriller),0.287491,0.271322,0.127043,0.441904,1.628706,0.049041,1.305649,1,1
6,(Romance),(Comedy),0.183724,0.32978,0.103056,0.560928,1.700919,0.042468,1.526449,1,1
7,(Comedy),(Romance),0.32978,0.183724,0.103056,0.3125,1.700919,0.042468,1.18731,1,1
8,(Romance),(Drama),0.183724,0.483653,0.106965,0.582205,1.203766,0.018106,1.235886,1,1
9,(Drama),(Romance),0.483653,0.183724,0.106965,0.221161,1.203766,0.018106,1.048067,1,1
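
To pick out the strongest relationships for the report, a rules table can be sorted before taking the head. This is a minimal sketch, assuming a frame with the same `lift` and `confidence` columns as above; the `rules` frame is a hypothetical stand-in with three rows copied from the genre table.

```python
import pandas as pd

# Hypothetical stand-in for a rules table (values copied from the table above)
rules = pd.DataFrame({
    "antecedents": [frozenset({"Romance"}), frozenset({"Sci-Fi"}), frozenset({"Adventure"})],
    "consequents": [frozenset({"Drama"}), frozenset({"Action"}), frozenset({"Action"})],
    "confidence": [0.582205, 0.605514, 0.563498],
    "lift": [1.203766, 2.106202, 1.960054],
})

# Strongest rules first: sort by lift, break ties with confidence, keep 20
top_rules = (
    rules.sort_values(["lift", "confidence"], ascending=False)
         .head(20)
         .reset_index(drop=True)
)
print(top_rules)
```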


I chose a threshold of 1 because of the rule "If A and C are independent, the Lift score will be exactly 1." Since all of these values are greater than 1, each antecedent has a positive effect on the occurrence of its consequent.
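
The independence case can be illustrated with raw counts. The numbers below are made up for the example, and `lift_from_counts` is a hypothetical helper, not an mlxtend function.

```python
def lift_from_counts(n_total, n_a, n_c, n_both):
    """Lift of A -> C computed from raw transaction counts."""
    support_a = n_a / n_total
    support_c = n_c / n_total
    support_both = n_both / n_total
    return support_both / (support_a * support_c)

# Hypothetical data: 100 transactions, A appears in 50 and C in 40.
# Independence predicts 0.5 * 0.4 * 100 = 20 transactions with both.
independent = lift_from_counts(100, 50, 40, 20)  # together as often as chance
positive = lift_from_counts(100, 50, 40, 30)     # together more often than chance
negative = lift_from_counts(100, 50, 40, 10)     # together less often than chance

print(independent, positive, negative)
```

Values above 1, like the genre rules here, mean the two items occur together more often than independence would predict.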