# Miner notebook

Purpose: Extract rules from the data using the apriori algorithm

In [1]:
from apyori import apriori

Make some sample transactions to get to know the function

In [2]:
transactions = [
    ['beer', 'diapers'],
    ['diapers', 'milk'],
    ['beer', 'nuts', 'diapers'],    
    ['milk', 'nuts']
]

In [3]:
rec = apriori(transactions, min_confidence=0.8, min_support=0.5)
l = list(rec)

In [4]:
print(l)

[RelationRecord(items=frozenset({'diapers', 'beer'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset({'beer'}), items_add=frozenset({'diapers'}), confidence=1.0, lift=1.3333333333333333)])]


This output is hard to read; a small pretty-printer would help.

In [5]:
def pprint_records(rec_list):
    # Print each frequent itemset with its support, followed by its rules (body -> head)
    for row in rec_list:
        print(f'Items:  {set(row.items)}')
        print(f'Support: {row.support:.4f}')
        print('BODY -> HEAD[Confidence, Lift]\n')
        # 'stat' rather than 'os' to avoid shadowing the os module name
        for stat in row.ordered_statistics:
            print(f'\t{set(stat.items_base)}  ->  {set(stat.items_add)}[{stat.confidence:.2f}, {stat.lift:.2f}]\n\n')

In [6]:
pprint_records(l)

Items:  {'diapers', 'beer'}
Support: 0.5000
BODY -> HEAD[Confidence, Lift]

	{'beer'}  ->  {'diapers'}[1.00, 1.33]




## Import some real data

In [7]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [8]:
all_raw = pd.read_csv('cleansed_data//all_consumption_metadata.csv', parse_dates=True, index_col=0,
                     dtype={'loc_id':'str', 'consumption_kvah':'float32', 'temperature':'float32',
                           'el_price':'float32', 'oil_price':'float32'})
all_raw.head()

| time | loc_id | consumption_kvah | temperature | el_price | oil_price |
|---|---|---|---|---|---|
| 2018-01-01 00:00:00 | 0 | 27.0 | 5.5 | 26.33 | 66.730003 |
| 2018-01-01 01:00:00 | 0 | 27.5 | 5.0 | 26.43 | 66.730003 |
| 2018-01-01 02:00:00 | 0 | 27.0 | 4.8 | 26.1 | 66.730003 |
| 2018-01-01 03:00:00 | 0 | 23.0 | 4.9 | 24.700001 | 66.730003 |
| 2018-01-01 04:00:00 | 0 | 23.0 | 3.7 | 24.74 | 66.730003 |


In [9]:
all_df = all_raw.copy()

Separate all locations

In [10]:
loc_ids = all_df['loc_id'].unique()

d = {}

for loc in loc_ids: 
    d[loc] = all_df[all_df['loc_id']==loc]
    print(f'Length of {loc}: {len(d[loc])}')

Length of 0: 18385
Length of 1: 18385
Length of 3: 18385
Length of 4: 18385
Length of 6: 18385
Length of 10: 18385
Length of 11: 18385
Length of 12: 18385
Length of 16: 18385
Length of 17: 18385
Length of 18: 18385
Length of 19: 18385
Length of 7: 18344
Length of 8: 16111
Length of 14: 15299
Length of 2: 15134
Length of 5: 14001
Length of 15: 12493
Length of 13: 6090
Length of 9: 11783
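The loop above filters the whole frame once per location. A minimal sketch of the same split done in a single pass with `DataFrame.groupby`, on a small made-up frame standing in for `all_df` (the data is hypothetical). One difference: `groupby` sorts the keys, so the locations come out in sorted order rather than in `unique()` order.

```python
import pandas as pd

# Toy frame standing in for all_df: hourly rows tagged with a location id (hypothetical data)
idx = pd.date_range("2018-01-01", periods=6, freq="h")
all_df = pd.DataFrame(
    {"loc_id": ["0", "1", "0", "1", "0", "13"],
     "consumption_kvah": [27.0, 10.0, 27.5, 11.0, 27.0, 3.0]},
    index=idx,
)

# One pass over the data: groupby yields (key, sub-frame) pairs, sorted by key by default
d = {loc: group for loc, group in all_df.groupby("loc_id")}

for loc, group in d.items():
    print(f"Length of {loc}: {len(group)}")
```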


Location 13, the shortest series at about 6,000 rows, is probably a nice sample dataset.

In [None]:
sample = d['13']

In [None]:
fig, ax = plt.subplots(1,1,figsize=(20,5))
ax.plot(sample['consumption_kvah'][:1000])
plt.show()

Might be, might not be. Given the distinct, regular peaks, this time series is probably not representative of private households, but rather of some commercial customer.

It should do for prototyping anyway.

**The validity** of the apriori results depends heavily on how the numerical values are categorized. We'll probably have to try a few approaches.

#### **Test 1**: Bucket all numerical values into percentile bands (mean ± 1, 2 and 3 standard deviations)

In [None]:
loc = '13'
sample = d[loc]
sample = sample.drop(['loc_id'], axis=1)

In [11]:
def create_percentile_mask_dict(df, means=None, stds=None):
    
    df_copy = df.copy()
    cols = df_copy.columns
    
    # Use either a supplied means and stds list or extract from the given data
    if means is None: means = [np.mean(df_copy[c]) for c in cols]
    if stds is None: stds = [np.std(df_copy[c]) for c in cols]
        
    masks = {}
    masks['m3std'] = [df_copy[c]<(m-3*s) for c, m, s in zip(cols, means, stds)]
    masks['m2std'] = [(df_copy[c]<(m-2*s)) & ~m3std for c, m, s, m3std in zip(cols, means, stds, masks['m3std'])]
    masks['m1std'] = [(df_copy[c]<(m-s)) & ~(m3std|m2std) for c, m, s, m3std, m2std in zip(cols, means, stds, masks['m3std'], masks['m2std'])]

    masks['m0std'] = [(df_copy[c]<=(m+s)) & ~(m3std|m2std|m1std) for c, m, s, m3std, m2std, m1std 
                 in zip(cols, means, stds, masks['m3std'], masks['m2std'], masks['m1std'])]

    masks['p3std'] = [df_copy[c]>(m+3*s) for c, m, s in zip(cols, means, stds)]
    masks['p2std'] = [(df_copy[c]>(m+2*s)) & ~p3std for c, m, s, p3std in zip(cols, means, stds, masks['p3std'])]
    masks['p1std'] = [(df_copy[c]>(m+s)) & ~(p3std|p2std) for c, m, s, p3std, p2std in zip(cols, means, stds, masks['p3std'], masks['p2std'])]
    
    return masks.copy()
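The seven bands built by `create_percentile_mask_dict` can also be expressed with a single `pd.cut` call. A sketch, assuming the same mean/std conventions (`ddof=0`, like `np.std`); `std_band_labels` and `BAND_LABELS` are hypothetical names. One caveat: `right=False` makes every bin half-open `[a, b)`, so a value lying exactly on an upper edge (e.g. exactly mean + 1 std) lands one band higher than in the masks above.

```python
import numpy as np
import pandas as pd

# Band names reused from create_percentile_mask_dict
BAND_LABELS = ["m3std", "m2std", "m1std", "m0std", "p1std", "p2std", "p3std"]

def std_band_labels(s, mean=None, std=None):
    """Label every value of `s` with its standard-deviation band around the mean."""
    m = s.mean() if mean is None else mean
    sd = s.std(ddof=0) if std is None else std  # ddof=0 matches np.std
    edges = [-np.inf, m - 3 * sd, m - 2 * sd, m - sd, m + sd, m + 2 * sd, m + 3 * sd, np.inf]
    # right=False -> half-open bins [a, b)
    return pd.cut(s, bins=edges, labels=BAND_LABELS, right=False)

s = pd.Series([-3.5, -2.5, -1.5, 0.0, 1.5, 2.5, 3.5], name="el_price")
bands = std_band_labels(s, mean=0.0, std=1.0)
print(bands.astype(str).tolist())

# One boolean column per band, comparable to the masks above
onehot = pd.get_dummies(bands)
```

`pd.get_dummies` turns the labels back into one indicator column per band, which is the shape the transaction builder works with.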

In [12]:
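def create_transactions_array(d, n):
    # Build one transaction (list of item names) per row: an item is added whenever
    # that row's mask is True, named '<mask label>_<column name>'
    transactions = []
    for i in range(n):
        row = []
        for name, m in d.items():
            for entry in m:
                # .iloc: positional lookup, independent of the mask's index type
                if entry.iloc[i]: row.append(name+'_'+entry.name)
        transactions.append(row)
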
    return transactions
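With 330,000 rows later on, looping row by row in Python gets slow. A sketch of an equivalent, column-wise way to build the same transactions, assuming every mask in the dict has the same length and row order (`masks_to_transactions` is a hypothetical helper, not part of apyori).

```python
import pandas as pd

def masks_to_transactions(mask_dict):
    """Turn {label: [bool Series, ...]} into a list of transactions (lists of item names).

    Items are named '<label>_<column name>', as in create_transactions_array.
    Assumes every Series has the same length and row order.
    """
    columns = {}
    for label, series_list in mask_dict.items():
        for s in series_list:
            # reset_index so masks with different index types line up by position
            columns[f"{label}_{s.name}"] = s.reset_index(drop=True)
    onehot = pd.DataFrame(columns).astype(bool)
    names = onehot.columns.to_numpy()
    # Boolean indexing picks, for each row, the item names whose flag is True
    return [names[row].tolist() for row in onehot.to_numpy()]

mask_dict = {
    "m1std": [pd.Series([True, False, False], name="price")],
    "p1std": [pd.Series([False, False, True], name="price")],
}
print(masks_to_transactions(mask_dict))
```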

In [None]:
masks = create_percentile_mask_dict(sample)
n = len(masks['m0std'][0])
    
transactions = create_transactions_array(masks, n)

In [None]:
n

The sample is only about 6,000 rows, so we should choose the support threshold carefully.

 * A threshold that is too low might find spurious rules: some relationships happen by chance in a few rows.
 * Rules for rare occurrences are not necessarily as valuable as strong everyday rules.
 * A threshold that is too high might overlook interesting relationships.

One commonly used rule of thumb is demanding 50 occurrences per 10,000 rows, i.e. min_support=0.005.

Given the size of our dataset, at about 6,000 rows, we'll start out with a min_support of 1%, i.e. roughly 60 rows behind every itemset. This gives us rules with quite solid grounding.

Playing with confidence and lift to filter out the strongest rules.
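For reference, the three measures can be computed by hand. A minimal sketch on the toy transactions from the top of the notebook; it reproduces the apyori output there (support 0.5, confidence 1.0, lift 1.33 for `beer -> diapers`). `rows_needed` is a hypothetical helper that converts a fractional `min_support` into a row count, assuming apyori keeps itemsets whose support is at least `min_support`.

```python
import math

transactions = [
    ['beer', 'diapers'],
    ['diapers', 'milk'],
    ['beer', 'nuts', 'diapers'],
    ['milk', 'nuts'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(body, head, transactions):
    """P(head | body) = support(body and head) / support(body)."""
    return support(set(body) | set(head), transactions) / support(body, transactions)

def lift(body, head, transactions):
    """confidence(body -> head) / support(head); 1.0 means body and head look independent."""
    return confidence(body, head, transactions) / support(head, transactions)

def rows_needed(min_support, n_rows):
    """Smallest row count that reaches a fractional support threshold."""
    return math.ceil(min_support * n_rows)

print(support({'beer', 'diapers'}, transactions))
print(confidence({'beer'}, {'diapers'}, transactions))
print(round(lift({'beer'}, {'diapers'}, transactions), 4))
print(rows_needed(0.01, 6090))
```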

In [None]:
min_support = 0.01

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.7, min_lift=10)
pprint_records(list(rec))

This is our most solid rule with the support threshold at 1%: when the oil price lies in the $(0.2, 2.1)$ percentile band, the el price most likely does too.

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.6, min_lift=2)
pprint_records(list(rec))

**Surprising remark**: When the temperature lies in the $(97.7, 99.8)$ percentile band, consumption is typically moderately high.

This could say something about this particular customer, and it's likely that this rule won't hold water for the other locations in general.

### Run percentile mining on `Oilspot_prices` and `Elspot_prices`

Since we only have daily oilspot prices but hourly elspot prices, it's probably fair to resample the elspot prices to daily means.
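The daily alignment can be sketched on synthetic data: hourly el prices averaged per calendar day with `resample('D').mean()`, then joined with daily oil prices on the shared `DatetimeIndex`. The frames here are made-up stand-ins with a single `price` column (an assumption about the CSV layout); joining on the index avoids the helper `day` column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: hourly el prices and daily oil prices, both with a 'price' column
hours = pd.date_range("2018-01-01", periods=72, freq="h")
elspot_raw = pd.DataFrame({"price": np.arange(72, dtype="float32")}, index=hours)
days = pd.date_range("2018-01-01", periods=3, freq="D")
oilspot = pd.DataFrame({"price": [66.7, 67.1, 66.9]}, index=days)

# Average the 24 hourly el prices of each calendar day
elspot_daily = elspot_raw.resample("D").mean()

# Align on the shared daily index; the suffixes tell the two 'price' columns apart
oil_el = oilspot.join(elspot_daily, how="inner", lsuffix="_oil", rsuffix="_el")
print(oil_el)
```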

In [None]:
oilspot = pd.read_csv('cleansed_data//Oilspot_prices.csv', index_col=0, parse_dates=True, dtype='float32')
oilspot['day'] = oilspot.index.date
oilspot.head(3)

Resampling to compare day by day since we only have daily oil prices.

In [None]:
elspot_raw = pd.read_csv('cleansed_data//Elspot_prices.csv', index_col=0, parse_dates=True, dtype='float32')

elspot = elspot_raw.resample(rule='D').mean()  # daily mean of the hourly prices
elspot['day'] = elspot.index.date
elspot.head(3)

In [None]:
oil_el = oilspot.merge(elspot, how='inner', left_on='day', right_on='day', suffixes=('_oil', '_el'))

In [None]:
oil_el_days = oil_el['day']
oil_el = oil_el.drop('day', axis=1)
oil_el.head(3)

First out: use the characteristics (mean and standard deviation) of each series from before the merge. For oil prices at least, we have one more year of data.

In [None]:
means = [np.mean(df['price']) for df in [oilspot, elspot]]
stds = [np.std(df['price']) for df in [oilspot, elspot]]

In [None]:
perc_masks = create_percentile_mask_dict(oil_el, means=means, stds=stds)

And we do have a timestamp available, so why not extract some rules from it as well?
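A sketch of calendar items built with pandas' own `dt.day_name()` and `dt.month_name()`, which take care of the 1 to 12 month numbering. `date_items` is a hypothetical helper; its item names (`'Monday'`, `'January'`) differ from the `'<day>_weekday'` names produced by `create_date_mask_dict`.

```python
import datetime
import pandas as pd

def date_items(dates, weekday=True, month=False):
    """Return one list of calendar items per date, e.g. ['Monday', 'January']."""
    ts = pd.to_datetime(pd.Series(list(dates)))
    parts = []
    if weekday:
        parts.append(ts.dt.day_name())
    if month:
        parts.append(ts.dt.month_name())  # maps months 1..12 to 'January'..'December'
    if not parts:
        return [[] for _ in range(len(ts))]
    return [list(items) for items in zip(*parts)]

days = [datetime.date(2018, 1, 1), datetime.date(2018, 1, 6)]
print(date_items(days, weekday=True, month=True))
```

Assuming the dates and the transactions are in the same row order, each list can be appended to its transaction, e.g. `[t + extra for t, extra in zip(transactions, date_items(oil_el_days))]`.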

In [None]:
def create_date_mask_dict(date_ts, weekday=False, month=False):
    
    df = date_ts.copy().to_frame()
    
    date_col = df.columns[0]
    
    masks = {}
    
    if weekday:
        df['weekday'] = [d.weekday() for d in df[date_col]]
        
        weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

        for i, day in enumerate(weekdays):
            masks[day] = [df['weekday']==i]
            
    if month:
        df['month'] = [d.month for d in df[date_col]]
        
        months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 
                  'August', 'September', 'October', 'November', 'December']
        
        # d.month runs 1-12, so start counting at 1 (January == 1)
        for i, m in enumerate(months, start=1):
            masks[m] = [df['month']==i]
        
    return masks.copy()

In [None]:
date_masks = create_date_mask_dict(oil_el_days, weekday=True, month=False)

In [None]:
all_masks = date_masks.copy()
all_masks.update(perc_masks)

In [None]:
n = len(date_masks['Monday'][0])
n

In [None]:
transactions = create_transactions_array(all_masks, n)

Once again, we need to be smart about the length of the data. It's quite short (about 800 rows), so demanding 5% support is probably not a bad idea. That is, at least 40 rows must contain our frequent itemset.

In [None]:
min_support = 0.05

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.8, min_lift=1.1)
pprint_records(list(rec))

Few strong connections here. A lift above 1 means the body and head occur together more often than they would if they were independent, so the rule carries some information. What we seem to get from this is that typical oil prices usually imply typical el prices. But the buckets are defined to be widest around the typical values anyway, and the low lift shows that typical values are prevalent regardless.
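The point about wide central buckets can be checked with a quick simulation on synthetic data (not the price data): two independent normal series, both bucketed into the "within one std" band. Confidence comes out near the size of that band, about 0.68, while lift stays near 1, which is exactly the "typical implies typical" pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Two independent standard-normal series standing in for oil and el prices (synthetic)
oil = rng.standard_normal(n)
el = rng.standard_normal(n)

# The 'm0std' bucket: within one standard deviation of the mean
oil_typical = np.abs(oil - oil.mean()) <= oil.std()
el_typical = np.abs(el - el.mean()) <= el.std()

support_both = np.mean(oil_typical & el_typical)
confidence = support_both / np.mean(oil_typical)
lift = confidence / np.mean(el_typical)
print(f"confidence={confidence:.2f}, lift={lift:.2f}")
```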

Might dare to lower support a bit. 2.5% support means 20 rows must contain the itemset.

In [None]:
min_support = 0.025

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.8, min_lift=1.2)
pprint_records(list(rec))

In [None]:
n*0.0266

Cool! So, 22 rows strongly suggest that a very strong drop in oil prices drew el prices down as well!

This is the strongest rule by far for this support. No other rule is even above lift 1.2.

This is somewhat interesting given the fact that 98% of Norway's electricity production is renewable, but we're of course not detached from trade either.

Any difference if we use the stds and means from the merged data?

In [None]:
perc_masks = create_percentile_mask_dict(oil_el)
all_masks = date_masks.copy()
all_masks.update(perc_masks)
transactions = create_transactions_array(all_masks, n)

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.8, min_lift=1.2)
pprint_records(list(rec))

Nope. We'll use the characteristics of the merged data from now on, to have a fair comparison.

### How 'bout weather and el prices?

In [None]:
w = pd.read_csv('cleansed_data//Weather.csv', index_col=[0,1], 
                dtype={'temperature':'float32', 'weather_station':'str'}, parse_dates=True)
w.head(3)

Let's make one complete temperature series from the median measurement across the weather stations in Agder.

In [None]:
first_date = min(w.index.get_level_values(0))
last_date = max(w.index.get_level_values(0))
median_temps = pd.DataFrame(index=pd.date_range(start=first_date, end=last_date, freq='H'), columns=['temperature'])

In [None]:
median_temps['temperature'] = [np.nanmedian(w.loc[idx]) for idx in median_temps.index]
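Looking up every hour with `w.loc[idx]` is slow and raises a `KeyError` for hours with no measurements at all. A sketch of a vectorized alternative, assuming, as the cell above does, that level 0 of the index holds the timestamps; the small frame is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical weather frame: MultiIndex (time, weather_station) -> temperature
times = pd.date_range("2018-01-01", periods=3, freq="h")
idx = pd.MultiIndex.from_product([times, ["A", "B", "C"]], names=["time", "weather_station"])
w = pd.DataFrame(
    {"temperature": [1.0, 2.0, np.nan, 3.0, 5.0, 4.0, np.nan, np.nan, np.nan]},
    index=idx,
)

# Median over stations for every timestamp in one groupby; NaNs are skipped
median_by_time = w.groupby(level=0)["temperature"].median()

# Reindex onto a complete hourly range so missing hours become NaN instead of raising
full_range = pd.date_range(times.min(), times.max(), freq="h")
median_temps = median_by_time.reindex(full_range).to_frame("temperature")
print(median_temps)
```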

In [None]:
median_temps.head()

In [None]:
median_temps['day'] = median_temps.index.date

In [None]:
temp_el = median_temps.merge(elspot_raw, how='inner', left_index=True, right_index=True)

In [None]:
temp_el_days = temp_el['day']
temp_el = temp_el.drop('day', axis=1)

In [None]:
temp_el.head()

In [None]:
perc_masks = create_percentile_mask_dict(temp_el)
date_masks = create_date_mask_dict(temp_el_days, weekday=True, month=True)

all_masks = date_masks.copy()
all_masks.update(perc_masks)

n = len(perc_masks['m0std'][0])
transactions = create_transactions_array(all_masks, n)

In [None]:
n

About 19,000 rows. First we demand 0.5% support, i.e. ~95 rows must back each itemset.

In [None]:
min_support = 0.005

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.8, min_lift=10)
pprint_records(list(rec))

Some odd rules here. You're much more likely to be in August if the temperature is really high and the el prices are moderately high. Not that surprising. Also, if you happen to find yourself on a Saturday when prices are really low, you're much more likely than at other times to find that the month is March and the temperature is moderate!

Skipping weekdays and months and going down to 0.25% support, i.e. about 50 rows.

In [None]:
transactions = create_transactions_array(perc_masks, n)

In [None]:
min_support = 0.0025

In [None]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.6, min_lift=1.4)
pprint_records(list(rec))

Based on some ~60 rows with both normal temperature and really low el prices, it seems that when prices are really low, your best bet is that the temperature is normal. Furthermore, with the support of ~100 rows, it seems that temperatures during high prices are moderately low.

This does not imply that low temperatures drive prices up, though. It may well be the case that prices are mostly normal during colder times, but that when high prices occur, it's usually moderately cold.
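Confidence is directional while lift is not, which is why both readings above can hold at once. A worked example with hypothetical hourly counts (not taken from the data):

```python
# Hypothetical hourly counts, chosen only to illustrate the asymmetry
n_total = 1000
n_cold = 400          # hours with moderately low temperature
n_high = 50           # hours with high el prices
n_cold_and_high = 40  # hours that are both

conf_high_to_cold = n_cold_and_high / n_high   # P(cold | high price)
conf_cold_to_high = n_cold_and_high / n_cold   # P(high price | cold)
lift = conf_high_to_cold / (n_cold / n_total)  # same value in either direction

print(conf_high_to_cold, conf_cold_to_high, lift)
```

Here P(cold | high price) is 0.8 while P(high price | cold) is only 0.1, yet the lift is 2.0 in both directions.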

Summary thus far:
 * Strong indication that extremely low oil prices drive down el prices substantially.
 * In times of very high el prices, the weather is much more likely to be cold than at other times.

## Consumption and el prices?

Using consumption and el prices at hourly frequency. All locations should agree on what counts as a low, medium and high el price, so the el price mean and std are computed once, over the common period.

In [None]:
first_date = min(all_raw.index)
last_date = max(all_raw.index)

el_price_subset = elspot_raw.loc[pd.date_range(start=first_date, end=last_date, freq='H')]

el_subset_mean = np.mean(el_price_subset['price'])
el_subset_std = np.std(el_price_subset['price'])

In [None]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy()
        
    df_copy = df_copy.drop(['loc_id', 'oil_price', 'temperature'], axis=1)
    
    means = [np.mean(df_copy['consumption_kvah']), el_subset_mean]
    stds = [np.std(df_copy['consumption_kvah']), el_subset_std]
    
    perc_masks = create_percentile_mask_dict(df_copy, means=means, stds=stds)
    
    n = len(perc_masks['m0std'][0])
    trans = create_transactions_array(perc_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [None]:
n = len(transactions_wo_loc_id)
n

That's 330,000 transactions! Apriori gets expensive with very low thresholds, since far more candidate itemsets survive each pass, but we'll give this a shot.

Let's start out moderately easy. 1% support means at least 3,300 rows must back an itemset.

Loc ids will probably not enter the equation until we lower the support threshold to find more specific rules. For now we'll keep them in a separate set of transactions so they stay out of our results. Any general rules?

In [None]:
min_support = 0.01

This high support threshold yields itemsets that are valid for (supported by) many rows. They might not yield extraordinary value since these occurrences are typical anyway. Thus the lift is low.

In [None]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.05)
pprint_records(list(rec))

Too general. All we have here is basically that what we usually see is the usual. 

Do we see any strong patterns for the different locations?

In [None]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Location 4 is the customer for which moderate el prices increase the probability of moderate consumption.

In [None]:
min_support = 0.001

In [None]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.5)
pprint_records(list(rec))

In [None]:
0.0010 * n

0.1% support means we need at least 330 rows to back the itemset. If you observe moderate prices and low consumption, you have a tremendous increase in probability that the location at which this happens is 14. This could also mean that location 14 is one of the few which has data in this period, though.

In [None]:
min_support = 0.0005
min_support * n

In [None]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.15)
pprint_records(list(rec))

In [None]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.4)
pprint_records(list(rec))

At loc 15, when prices are very low, you're very likely to find moderate consumption, and the lift indicates that this happens more often than at other times.

In [None]:
m, s = np.mean(d['15']['consumption_kvah']), np.std(d['15']['consumption_kvah'])
print(f'Normal consumption for 15 is between {m-s:.0f} and {m+s:.0f}')
print(f'Low prices are between {el_subset_mean-2*el_subset_std:.0f} and {el_subset_mean-3*el_subset_std:.0f}')

In [None]:
mask_1 = d['15']['consumption_kvah'] > 6
mask_2 = d['15']['consumption_kvah'] < 13
mask_3 = mask_1 & mask_2

mask_4 = el_price_subset['price'] < 21
mask_5 = el_price_subset['price'] > 12
mask_6 = mask_4 & mask_5

mask_7 = mask_6 & mask_3

In [None]:
start = d['15'].index[0]
end = d['15'].index[-1]

In [None]:
period = pd.date_range(start=start, end=end, freq='H')

In [None]:
fig, ax = plt.subplots(2,1,figsize=(20,10))
ax[0].scatter(period, d['15'].loc[period]['consumption_kvah'], label='consumption')
ax[0].scatter(period, pd.DataFrame(index=period, data=d['15'][mask_3])['consumption_kvah'], c='r', label='consumption within one std')
ax[0].scatter(period, pd.DataFrame(index=period, data=d['15'][mask_7])['consumption_kvah'], c='g', label='consumption within one std when prices are between two and three stds below')
ax[0].set_title('Consumption timeseries for loc id 15')

ax[1].set_title('El prices')
ax[1].scatter(period, el_price_subset.loc[period], label='prices')
ax[1].scatter(period, pd.DataFrame(index=period, data=el_price_subset[mask_6])['price'], c='r', label='prices between two and three stds below')

ax[0].legend()
ax[1].legend()

plt.show()

This customer does indeed have some fluctuations in usage, though not many, and usage is at normal levels when prices are low. However, they also have high consumption when prices are high.

It could be useful to base the numbers on weekly or daily averages instead of overall averages and deviations.
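A sketch of that idea: z-score each value against a trailing window (one week here, an untuned assumption) and reuse the same seven band names. `rolling_std_bands` is a hypothetical helper; it needs a sorted `DatetimeIndex`, and the test series below is synthetic.

```python
import numpy as np
import pandas as pd

BAND_LABELS = ["m3std", "m2std", "m1std", "m0std", "p1std", "p2std", "p3std"]

def rolling_std_bands(s, window="7D"):
    """Band each value by its distance from a trailing-window mean, in trailing-window stds."""
    roll = s.rolling(window)  # time-based window: (t - window, t]
    z = (s - roll.mean()) / roll.std(ddof=0)
    edges = [-np.inf, -3, -2, -1, 1, 2, 3, np.inf]
    return pd.cut(z, bins=edges, labels=BAND_LABELS, right=False)

# Synthetic hourly consumption alternating 10/12, with one spike at the end
hours = pd.date_range("2018-01-01", periods=201, freq="h")
values = np.where(np.arange(201) % 2 == 0, 10.0, 12.0)
values[-1] = 30.0
s = pd.Series(values, index=hours, name="consumption_kvah")

bands = rolling_std_bands(s)
print(bands.iloc[-1])
```

The first value has a one-point window (std 0), so it gets no band; in real data the first week or so would need the same care.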

In [None]:
m, s = np.mean(d['14']['consumption_kvah']), np.std(d['14']['consumption_kvah'])

print(f'Minus 2 stds consumption for 14 is between {m-2*s:.0f} and {max(0,m-3*s):.0f}')
print(f'Normal prices are between {el_subset_mean-el_subset_std:.0f} and {el_subset_mean+el_subset_std:.0f}')

In [None]:
mask_1 = d['14']['consumption_kvah'] > 0
mask_2 = d['14']['consumption_kvah'] < 4
mask_3 = mask_1 & mask_2

mask_4 = el_price_subset['price'] < 50
mask_5 = el_price_subset['price'] > 31
mask_6 = mask_4 & mask_5

mask_7 = mask_6 & mask_3

start = d['14'].index[0]
end = d['14'].index[-1]

period = pd.date_range(start=start, end=end, freq='H')


fig, ax = plt.subplots(2,1,figsize=(20,10))
ax[0].scatter(period, d['14'].loc[period]['consumption_kvah'], label='consumption')
ax[0].scatter(period, pd.DataFrame(index=period, data=d['14'][mask_3])['consumption_kvah'], c='r', label='consumption between minus two and three stds')
ax[0].scatter(period, pd.DataFrame(index=period, data=d['14'][mask_7])['consumption_kvah'], c='g', label='consumption between minus two and three stds when prices are normal')
ax[0].set_title('Consumption timeseries for loc id 14')

ax[1].set_title('El prices')
ax[1].scatter(period, el_price_subset.loc[period], label='prices')
ax[1].scatter(period, pd.DataFrame(index=period, data=el_price_subset[mask_6])['price'], c='r', label='normal prices')

ax[0].legend()
ax[1].legend()

plt.show()

What we see here is that loc 14's time series has missing values, or simply downtime on its transformer. Nothing worth mentioning.
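A quick way to separate gaps from genuinely low readings is to reindex onto a complete hourly range and count what is missing. A sketch; `gap_report` is a hypothetical helper, and the hourly `freq` is an assumption based on the data above.

```python
import pandas as pd

def gap_report(df, col="consumption_kvah", freq="h"):
    """Count expected, missing and zero-valued rows for one location's series."""
    full = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    s = df[col].reindex(full)  # timestamps absent from df become NaN
    return {
        "expected_rows": len(full),
        "missing_rows": int(s.isna().sum()),
        "zero_rows": int((s == 0).sum()),
    }

idx = pd.date_range("2018-01-01", periods=5, freq="h")
df = pd.DataFrame({"consumption_kvah": [3.0, 0.0, 2.0, 4.0]}, index=idx[[0, 1, 3, 4]])
report = gap_report(df)
print(report)
```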

## Desperately need other categories!

Compare prices and consumption to last week's value at this time.

How to do this:
 * Decide how fine a resolution (n buckets) you want.
 * For each column, define n equal-width buckets between the column's min and max value.
 * Compare this week's bucket with last week's. (Last week I was in bucket 3, this week in bucket 5) -> The category is +2
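The steps above can be sketched for a single series with `np.linspace` and `np.digitize`. This is an illustration under stated assumptions, not a drop-in replacement for `create_shift_diff_mask_dict`: a value lying exactly on a bucket edge goes to the upper bucket here, while that function's strict `>` puts it in the lower one. `bucket_shift_diff` and `diff_label` are hypothetical names.

```python
import numpy as np
import pandas as pd

def bucket_shift_diff(s, n_buckets=10, shift=1):
    """Change in equal-width bucket index relative to `shift` steps earlier."""
    edges = np.linspace(s.min(), s.max(), n_buckets + 1)
    valid = s.notna()
    buckets = pd.Series(np.nan, index=s.index)
    # digitize against the inner edges yields bucket indices 0 .. n_buckets-1
    buckets.loc[valid] = np.digitize(s[valid].to_numpy(), edges[1:-1])
    diff = buckets - buckets.shift(shift)
    return diff.dropna().astype(int)

def diff_label(k, n_buckets):
    """Name a bucket change as a percentage of the column range, e.g. 'm20' or 'p10'."""
    return f"m{100 * -k / n_buckets:.0f}" if k < 0 else f"p{100 * k / n_buckets:.0f}"

s = pd.Series([0.0, 5.0, 10.0, 2.0])
diffs = bucket_shift_diff(s, n_buckets=10, shift=1)
print(diffs.tolist(), [diff_label(k, 10) for k in diffs])
```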

In [176]:
np.arange(0.5,11, (11-0.5)/5)

array([0.5, 2.6, 4.7, 6.8, 8.9])

In [336]:
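def create_shift_diff_mask_dict(df, n_buckets=10, shift=24*7, label='lw'):
    # For every column: place each value in one of n_buckets equal-width buckets
    # between the column's min and max, then take the bucket difference against the
    # value `shift` rows earlier. Masks are keyed by that difference as a percentage
    # of the column range ('m20' = two buckets down when n_buckets=10).
    # Note: `label` is currently unused.

    df_copy = df.copy()
    cols = df_copy.columns

    masks = {}

    # Keep only rows that have a value `shift` steps back
    shifted = df_copy.shift(shift)
    shifted = shifted.dropna()
    df_copy = df_copy.loc[shifted.index]

    diffs = pd.DataFrame(index=df_copy.index)

    for c in cols:
        maxim = max(df_copy[c])
        minim = min(df_copy[c])

        # Lower edges of the equal-width buckets
        arange = np.arange(minim, maxim, (maxim-minim)/n_buckets)

        old = pd.DataFrame(index=shifted.index, columns=['bucket'], data=0)
        new = pd.DataFrame(index=df_copy.index, columns=['bucket'], data=0)

        # Each value ends up in the highest bucket whose lower edge it exceeds
        for i, r in enumerate(arange):
            old[shifted[c]>r] = i
            new[df_copy[c]>r] = i

        diffs[c] = new['bucket'] - old['bucket']

    for k in range(-n_buckets+1, 0):
        masks[f'm{(100/n_buckets * k*(-1)):.0f}'] = [diffs[c]==k for c in cols]

    for k in range(0, n_buckets):
        masks[f'p{(100/n_buckets * k):.0f}'] = [diffs[c]==k for c in cols]

    return masks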

In [356]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [357]:
n = len(transactions_wo_loc_id)

In [358]:
min_support = 0.005
min_support*n

9.785

In [363]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.7, min_lift=1.2)
pprint_records(list(rec))

Items:  {'m25_temperature', 'p5_el_price'}
Support: 0.0066
BODY -> HEAD[Confidence, Lift]

	{'m25_temperature'}  ->  {'p5_el_price'}[0.72, 3.34]


Items:  {'m15_temperature', 'p5_el_price', 'p15_consumption_kvah'}
Support: 0.0077
BODY -> HEAD[Confidence, Lift]

	{'m15_temperature', 'p15_consumption_kvah'}  ->  {'p5_el_price'}[0.79, 3.65]




In [374]:
min_support = 0.0075
min_support*n

14.6775

In [375]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.7, min_lift=1.15)
pprint_records(list(rec))

Items:  {'m15_temperature', 'p5_el_price', 'p15_consumption_kvah'}
Support: 0.0077
BODY -> HEAD[Confidence, Lift]

	{'m15_temperature', 'p15_consumption_kvah'}  ->  {'p5_el_price'}[0.79, 3.65]




In [383]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [384]:
n = len(transactions_wo_loc_id)
min_support = 0.005
min_support*n

9.785

In [390]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'p0_temperature'}
Support: 0.2253
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.61, 1.26]




In [392]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=15, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [393]:
n = len(transactions_wo_loc_id)
min_support = 0.005
min_support*n

9.785

In [397]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.15)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'loc_0', 'p0_temperature'}
Support: 0.0082
BODY -> HEAD[Confidence, Lift]

	{'loc_0', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.67, 1.87]


Items:  {'p0_consumption_kvah', 'loc_1', 'p0_temperature'}
Support: 0.0118
BODY -> HEAD[Confidence, Lift]

	{'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.68, 1.89]


Items:  {'m7_temperature', 'p7_consumption_kvah', 'loc_10'}
Support: 0.0056
BODY -> HEAD[Confidence, Lift]

	{'p7_consumption_kvah', 'loc_10'}  ->  {'m7_temperature'}[0.61, 2.62]


Items:  {'loc_2', 'p7_consumption_kvah', 'm13_temperature'}
Support: 0.0051
BODY -> HEAD[Confidence, Lift]

	{'loc_2', 'm13_temperature'}  ->  {'p7_consumption_kvah'}[0.67, 3.29]


Items:  {'loc_4', 'p0_consumption_kvah', 'p0_temperature'}
Support: 0.0061
BODY -> HEAD[Confidence, Lift]

	{'loc_4', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.67, 1.87]


Items:  {'loc_4', 'p0_consumption_kvah', 'p7_temperature'}
Support: 0.0056
BODY -> HEAD[Confidence, L

In [412]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [413]:
n = len(transactions_wo_loc_id)
min_support = 0.005
min_support*n

9.785

In [414]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.7, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'loc_1', 'p0_temperature'}
Support: 0.0169
BODY -> HEAD[Confidence, Lift]

	{'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.73, 1.52]


Items:  {'loc_17', 'p0_consumption_kvah', 'p0_temperature'}
Support: 0.0153
BODY -> HEAD[Confidence, Lift]

	{'loc_17', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.81, 1.68]


Items:  {'p0_consumption_kvah', 'p0_temperature', 'loc_6'}
Support: 0.0148
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature', 'loc_6'}  ->  {'p0_consumption_kvah'}[0.71, 1.46]


Items:  {'p0_consumption_kvah', 'p0_temperature', 'loc_8'}
Support: 0.0118
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature', 'loc_8'}  ->  {'p0_consumption_kvah'}[0.74, 1.54]


Items:  {'p0_consumption_kvah', 'loc_9', 'p0_temperature'}
Support: 0.0082
BODY -> HEAD[Confidence, Lift]

	{'loc_9', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.73, 1.51]




In [416]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'p0_temperature'}
Support: 0.2253
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.61, 1.26]




In [417]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=2)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [418]:
n = len(transactions_wo_loc_id)
min_support = 0.005
min_support*n

9.685

In [422]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'loc_1', 'p0_temperature'}
Support: 0.0119
BODY -> HEAD[Confidence, Lift]

	{'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.66, 1.72]


Items:  {'p0_consumption_kvah', 'loc_15', 'p0_temperature'}
Support: 0.0057
BODY -> HEAD[Confidence, Lift]

	{'loc_15', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.65, 1.70]


Items:  {'loc_17', 'p0_consumption_kvah', 'p0_temperature'}
Support: 0.0088
BODY -> HEAD[Confidence, Lift]

	{'loc_17', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.63, 1.65]


Items:  {'p0_consumption_kvah', 'loc_18', 'p0_temperature'}
Support: 0.0103
BODY -> HEAD[Confidence, Lift]

	{'loc_18', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.65, 1.69]


Items:  {'p0_consumption_kvah', 'loc_19', 'p0_temperature'}
Support: 0.0114
BODY -> HEAD[Confidence, Lift]

	{'p0_consumption_kvah', 'loc_19'}  ->  {'p0_temperature'}[0.65, 2.28]


Items:  {'loc_2', 'p0_consumption_kvah', 'p10_temperature'}
Support: 0.0077
BODY -> HEAD[Confiden

Resample per day and compare with yesterday.

In [436]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='D').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [437]:
n = len(transactions_wo_loc_id)
min_support = 0.005
min_support*n

68.74

In [441]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'loc_4', 'p0_consumption_kvah', 'p0_temperature'}
Support: 0.0115
BODY -> HEAD[Confidence, Lift]

	{'loc_4', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.61, 1.63]




In [443]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'temperature'], axis=1).resample(rule='D').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=7)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [446]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

In [466]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price'], axis=1).resample(rule='M').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [471]:
n = len(transactions_wo_loc_id)
min_support = 0.02
min_support*n

9.0

In [476]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.9, min_lift=2)
pprint_records(list(rec))

Items:  {'m10_el_price', 'm40_consumption_kvah'}
Support: 0.0311
BODY -> HEAD[Confidence, Lift]

	{'m40_consumption_kvah'}  ->  {'m10_el_price'}[1.00, 2.81]


Items:  {'p40_temperature', 'm10_el_price'}
Support: 0.0311
BODY -> HEAD[Confidence, Lift]

	{'p40_temperature'}  ->  {'m10_el_price'}[1.00, 2.81]


Items:  {'m40_el_price', 'p10_temperature'}
Support: 0.0400
BODY -> HEAD[Confidence, Lift]

	{'m40_el_price'}  ->  {'p10_temperature'}[0.95, 3.84]




In [477]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.9, min_lift=2)
pprint_records(list(rec))

Items:  {'m10_el_price', 'm40_consumption_kvah'}
Support: 0.0311
BODY -> HEAD[Confidence, Lift]

	{'m40_consumption_kvah'}  ->  {'m10_el_price'}[1.00, 2.81]


Items:  {'p40_temperature', 'm10_el_price'}
Support: 0.0311
BODY -> HEAD[Confidence, Lift]

	{'p40_temperature'}  ->  {'m10_el_price'}[1.00, 2.81]


Items:  {'m40_el_price', 'p10_temperature'}
Support: 0.0400
BODY -> HEAD[Confidence, Lift]

	{'m40_el_price'}  ->  {'p10_temperature'}[0.95, 3.84]




In [478]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'temperature'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [493]:
n = len(transactions_wo_loc_id)
min_support = 0.01
min_support*n

19.57

In [494]:
n

1957

In [496]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.95, min_lift=1.5)
pprint_records(list(rec))

In [526]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [527]:
n = len(transactions_with_loc_id)
n

1957

In [528]:
min_support = 0.01
min_support*n

19.57

In [531]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.5)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'loc_1', 'p0_temperature'}
Support: 0.0169
BODY -> HEAD[Confidence, Lift]

	{'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.73, 1.52]


Items:  {'loc_17', 'p0_consumption_kvah', 'p0_temperature'}
Support: 0.0153
BODY -> HEAD[Confidence, Lift]

	{'loc_17', 'p0_consumption_kvah'}  ->  {'p0_temperature'}[0.60, 1.62]


	{'loc_17', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.81, 1.68]


Items:  {'p0_consumption_kvah', 'p0_temperature', 'loc_8'}
Support: 0.0118
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature', 'loc_8'}  ->  {'p0_consumption_kvah'}[0.74, 1.54]


Items:  {'p0_consumption_kvah', 'p0_el_price', 'loc_1', 'p0_temperature'}
Support: 0.0123
BODY -> HEAD[Confidence, Lift]

	{'p0_consumption_kvah', 'p0_el_price', 'loc_1'}  ->  {'p0_temperature'}[0.69, 1.85]


	{'p0_el_price', 'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.92, 1.91]


Items:  {'p0_consumption_kvah', 'p0_temperature', 'p0_el_price', 'loc_6'}
Support: 0.0102
BO
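When the transactions carry location tags, the printed list can mix location-specific rules with rules that hold across all sites. The sketch below keeps only the records that mention a location and ranks them by their strongest lift. It relies only on the `items` and `ordered_statistics` fields visible in the `RelationRecord` printout at the top of the notebook. The namedtuples are stand-ins with those field names, the record values are copied from outputs in this notebook, and `location_records` is a hypothetical helper, not part of apyori.

```python
from collections import namedtuple
from typing import Iterable, List

# Stand-ins mirroring the field names of apyori's printed records
RelationRecord = namedtuple('RelationRecord', ['items', 'support', 'ordered_statistics'])
OrderedStatistic = namedtuple('OrderedStatistic', ['items_base', 'items_add', 'confidence', 'lift'])


def location_records(records: Iterable, prefix: str = 'loc_') -> List:
    """Keep records whose itemset contains a location tag, strongest lift first."""
    tagged = [r for r in records if any(item.startswith(prefix) for item in r.items)]
    # Rank by the best lift among each record's rules
    return sorted(
        tagged,
        key=lambda r: max((s.lift for s in r.ordered_statistics), default=0.0),
        reverse=True,
    )


records = [
    RelationRecord(frozenset({'p0_consumption_kvah', 'p0_temperature'}), 0.2253,
                   [OrderedStatistic(frozenset({'p0_temperature'}), frozenset({'p0_consumption_kvah'}), 0.61, 1.26)]),
    RelationRecord(frozenset({'p0_consumption_kvah', 'loc_1', 'p0_temperature'}), 0.0169,
                   [OrderedStatistic(frozenset({'loc_1', 'p0_temperature'}), frozenset({'p0_consumption_kvah'}), 0.73, 1.52)]),
    RelationRecord(frozenset({'loc_17', 'p0_consumption_kvah', 'p0_temperature'}), 0.0153,
                   [OrderedStatistic(frozenset({'loc_17', 'p0_consumption_kvah'}), frozenset({'p0_temperature'}), 0.60, 1.62),
                    OrderedStatistic(frozenset({'loc_17', 'p0_temperature'}), frozenset({'p0_consumption_kvah'}), 0.81, 1.68)]),
]
for r in location_records(records):
    print(sorted(r.items), r.support)
```

Because `pprint_records` only needs a list, the result can be passed to it directly, for example `pprint_records(location_records(list(rec)))`.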

In [534]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'p0_temperature'}
Support: 0.2253
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.61, 1.26]


Items:  {'p0_consumption_kvah', 'p20_el_price'}
Support: 0.0215
BODY -> HEAD[Confidence, Lift]

	{'p20_el_price'}  ->  {'p0_consumption_kvah'}[0.63, 1.30]


Items:  {'p0_el_price', 'p0_temperature'}
Support: 0.2259
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_el_price'}[0.61, 1.24]


Items:  {'p0_consumption_kvah', 'p0_el_price', 'p0_temperature'}
Support: 0.1415
BODY -> HEAD[Confidence, Lift]

	{'p0_consumption_kvah', 'p0_temperature'}  ->  {'p0_el_price'}[0.63, 1.28]


	{'p0_el_price', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.63, 1.30]


Items:  {'p0_el_price', 'p10_consumption_kvah', 'p0_temperature'}
Support: 0.0404
BODY -> HEAD[Confidence, Lift]

	{'p10_consumption_kvah', 'p0_temperature'}  ->  {'p0_el_price'}[0.61, 1.24]




In [541]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [542]:
min_support = 0.01
min_support*len(transactions_wo_loc_id)

19.57

In [544]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.15)
pprint_records(list(rec))

In [551]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'consumption_kvah', 'oil_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=20, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [558]:
min_support = 0.005
min_support*len(transactions_wo_loc_id)

9.785

In [561]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.3)
pprint_records(list(rec))

Items:  {'m10_temperature', 'p25_el_price'}
Support: 0.0061
BODY -> HEAD[Confidence, Lift]

	{'p25_el_price'}  ->  {'m10_temperature'}[0.63, 4.43]


Items:  {'m25_temperature', 'p5_el_price'}
Support: 0.0066
BODY -> HEAD[Confidence, Lift]

	{'m25_temperature'}  ->  {'p5_el_price'}[0.72, 3.34]




In [562]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'consumption_kvah', 'oil_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [571]:
min_support = 0.005
min_support*len(transactions_wo_loc_id)

9.785

In [572]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'p30_el_price', 'm10_temperature'}
Support: 0.0082
BODY -> HEAD[Confidence, Lift]

	{'p30_el_price'}  ->  {'m10_temperature'}[0.84, 3.23]


Items:  {'m30_el_price', 'p0_temperature'}
Support: 0.0082
BODY -> HEAD[Confidence, Lift]

	{'m30_el_price'}  ->  {'p0_temperature'}[0.89, 2.40]


Items:  {'m30_temperature', 'p0_el_price'}
Support: 0.0051
BODY -> HEAD[Confidence, Lift]

	{'m30_temperature'}  ->  {'p0_el_price'}[0.83, 1.69]


Items:  {'p0_el_price', 'p0_temperature'}
Support: 0.2259
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_el_price'}[0.61, 1.24]




In [573]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'consumption_kvah', 'temperature'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [574]:
min_support = 0.005
min_support*len(transactions_wo_loc_id)

9.785

In [575]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.2)
pprint_records(list(rec))

Items:  {'m30_el_price', 'p10_oil_price'}
Support: 0.0092
BODY -> HEAD[Confidence, Lift]

	{'m30_el_price'}  ->  {'p10_oil_price'}[1.00, 4.25]


Items:  {'p0_el_price', 'p20_oil_price'}
Support: 0.0169
BODY -> HEAD[Confidence, Lift]

	{'p20_oil_price'}  ->  {'p0_el_price'}[0.94, 1.92]


Items:  {'p0_oil_price', 'p30_el_price'}
Support: 0.0092
BODY -> HEAD[Confidence, Lift]

	{'p30_el_price'}  ->  {'p0_oil_price'}[0.95, 1.98]


Items:  {'p20_el_price', 'p10_oil_price'}
Support: 0.0240
BODY -> HEAD[Confidence, Lift]

	{'p20_el_price'}  ->  {'p10_oil_price'}[0.70, 2.98]




In [576]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'el_price', 'temperature'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [585]:
min_support = 0.005
min_support*len(transactions_wo_loc_id)

9.785

In [583]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

In [584]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'temperature'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [586]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m30_el_price', 'p0_consumption_kvah'}
Support: 0.0077
BODY -> HEAD[Confidence, Lift]

	{'m30_el_price'}  ->  {'p0_consumption_kvah'}[0.83, 1.73]


Items:  {'p0_consumption_kvah', 'p20_el_price'}
Support: 0.0215
BODY -> HEAD[Confidence, Lift]

	{'p20_el_price'}  ->  {'p0_consumption_kvah'}[0.63, 1.30]




In [591]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'consumption_kvah'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=15, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [592]:
rec = apriori(transactions_wo_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m27_el_price', 'p0_temperature'}
Support: 0.0056
BODY -> HEAD[Confidence, Lift]

	{'m27_el_price'}  ->  {'p0_temperature'}[0.61, 2.32]


Items:  {'p7_el_price', 'm27_temperature'}
Support: 0.0082
BODY -> HEAD[Confidence, Lift]

	{'m27_temperature'}  ->  {'p7_el_price'}[0.84, 5.20]


Items:  {'m7_temperature', 'p27_el_price'}
Support: 0.0061
BODY -> HEAD[Confidence, Lift]

	{'p27_el_price'}  ->  {'m7_temperature'}[0.63, 2.71]




In [596]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='W').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [597]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'p0_consumption_kvah', 'p0_temperature'}
Support: 0.2253
BODY -> HEAD[Confidence, Lift]

	{'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.61, 1.26]


Items:  {'p0_consumption_kvah', 'm10_temperature', 'loc_0'}
Support: 0.0107
BODY -> HEAD[Confidence, Lift]

	{'m10_temperature', 'loc_0'}  ->  {'p0_consumption_kvah'}[0.62, 1.28]


Items:  {'p0_consumption_kvah', 'loc_0', 'p0_temperature'}
Support: 0.0123
BODY -> HEAD[Confidence, Lift]

	{'loc_0', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.67, 1.38]


Items:  {'p0_consumption_kvah', 'loc_1', 'p0_temperature'}
Support: 0.0169
BODY -> HEAD[Confidence, Lift]

	{'loc_1', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.73, 1.52]


Items:  {'p0_consumption_kvah', 'loc_15', 'p0_temperature'}
Support: 0.0077
BODY -> HEAD[Confidence, Lift]

	{'loc_15', 'p0_temperature'}  ->  {'p0_consumption_kvah'}[0.65, 1.35]


Items:  {'p0_consumption_kvah', 'm10_temperature', 'loc_16'}
Support: 0.0087
BODY -> HEAD[Confidence, Lift]

	{'m10_tem

In [626]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price', 'el_price'], axis=1).resample(rule='M').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

Good: monthly means of consumption and temperature only (both prices dropped).

In [627]:
min_support = 0.05
min_support*len(transactions_wo_loc_id)

22.5

In [628]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m20_temperature', 'p20_consumption_kvah'}
Support: 0.0711
BODY -> HEAD[Confidence, Lift]

	{'p20_consumption_kvah'}  ->  {'m20_temperature'}[0.67, 3.06]


Items:  {'m20_temperature', 'p30_consumption_kvah'}
Support: 0.0533
BODY -> HEAD[Confidence, Lift]

	{'p30_consumption_kvah'}  ->  {'m20_temperature'}[0.65, 2.98]




In [629]:
transactions_with_loc_id = []
transactions_wo_loc_id = []

for loc, df in d.items():
    df_copy = df.copy().drop(['loc_id', 'oil_price'], axis=1).resample(rule='M').mean()
    
    diff_masks = create_shift_diff_mask_dict(df_copy, n_buckets=10, shift=1)
    
    n = len(diff_masks['p0'][0])
    
    trans = create_transactions_array(diff_masks, n)
    
    trans_loc = [t+[f'loc_{loc}'] for t in trans]

    transactions_with_loc_id += trans_loc
    transactions_wo_loc_id += trans

In [630]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m20_temperature', 'p20_consumption_kvah'}
Support: 0.0711
BODY -> HEAD[Confidence, Lift]

	{'p20_consumption_kvah'}  ->  {'m20_temperature'}[0.67, 3.06]


Items:  {'m20_temperature', 'p30_consumption_kvah'}
Support: 0.0533
BODY -> HEAD[Confidence, Lift]

	{'p30_consumption_kvah'}  ->  {'m20_temperature'}[0.65, 2.98]




In [635]:
min_support = 0.02
min_support*len(transactions_wo_loc_id)

9.0

In [636]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.6, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m25_el_price', 'p20_temperature'}
Support: 0.0244
BODY -> HEAD[Confidence, Lift]

	{'p20_temperature'}  ->  {'m25_el_price'}[0.65, 14.56]
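A lift of 14.56 looks striking, but it mostly says the head is rare. Since confidence = support(rule) / support(body) and lift = confidence / support(head), the printed numbers are enough to recover both marginals. This quick sketch uses the rounded values shown, so the results are approximate. It takes n = 450 transactions, as implied by the `min_support*len(...)` cells for this monthly run (22.5 / 0.05). `marginals` is an illustrative helper name.

```python
def marginals(rule_support: float, conf: float, rule_lift: float):
    """Recover support(body) and support(head) from a printed rule.

    Uses confidence = support(rule) / support(body)
    and  lift = confidence / support(head).
    """
    body_support = rule_support / conf
    head_support = conf / rule_lift
    return body_support, head_support


n = 450  # transactions in the monthly run
# {'p20_temperature'} -> {'m25_el_price'}: support 0.0244, confidence 0.65, lift 14.56
body, head = marginals(0.0244, 0.65, 14.56)
print(f'body ~ {body:.4f} (~{body * n:.0f} transactions)')
print(f'head ~ {head:.4f} (~{head * n:.0f} transactions)')
print(f'rule ~ {0.0244 * n:.0f} transactions')
```

So the rule covers about 11 of the roughly 17 location-months that contain the body, while the head occurs in only about 20 of 450. The high lift therefore rests on small counts.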




In [637]:
min_support = 0.01
min_support*len(transactions_wo_loc_id)

4.5

In [640]:
rec = apriori(transactions_with_loc_id, min_support=min_support, min_confidence=0.8, min_lift=1.1)
pprint_records(list(rec))

Items:  {'m15_el_price', 'p40_temperature'}
Support: 0.0178
BODY -> HEAD[Confidence, Lift]

	{'p40_temperature'}  ->  {'m15_el_price'}[0.89, 4.49]


Items:  {'m15_el_price', 'm15_consumption_kvah', 'p5_temperature'}
Support: 0.0111
BODY -> HEAD[Confidence, Lift]

	{'m15_el_price', 'm15_consumption_kvah'}  ->  {'p5_temperature'}[0.83, 8.93]


Items:  {'m15_consumption_kvah', 'm20_el_price', 'p15_temperature'}
Support: 0.0133
BODY -> HEAD[Confidence, Lift]

	{'m15_consumption_kvah', 'm20_el_price'}  ->  {'p15_temperature'}[0.86, 7.56]


Items:  {'p10_temperature', 'm20_el_price', 'm20_consumption_kvah'}
Support: 0.0111
BODY -> HEAD[Confidence, Lift]

	{'m20_el_price', 'm20_consumption_kvah'}  ->  {'p10_temperature'}[0.83, 6.25]


Items:  {'p10_el_price', 'p5_temperature', 'm5_consumption_kvah'}
Support: 0.0111
BODY -> HEAD[Confidence, Lift]

	{'p10_el_price', 'p5_temperature'}  ->  {'m5_consumption_kvah'}[1.00, 10.98]




In [608]:
k_support = 100
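`apriori` takes `min_support` as a fraction. pyfpgrowth, used in the next cell, takes its support threshold as an absolute transaction count (here `k_support = 100`) and works with raw counts. On small inputs, a brute-force enumeration is a handy cross-check for either miner. The sketch below uses the same count-based threshold and the usual confidence definition, count(itemset) / count(antecedent). It is a naive exhaustive search, not FP-growth, and is only practical for short transactions. The helper names are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Itemset = Tuple[str, ...]


def brute_force_itemsets(transactions: List[List[str]], min_count: int) -> Dict[Itemset, int]:
    """Count every sub-itemset of every transaction; keep those seen >= min_count times.

    Exponential in transaction length, so small data only.
    """
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple key
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {k: v for k, v in counts.items() if v >= min_count}


def brute_force_rules(patterns: Dict[Itemset, int], min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """All (antecedent, consequent, confidence) with confidence >= min_confidence."""
    out = []
    for itemset, count in patterns.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Subsets of a frequent itemset are frequent too, so the key exists
                conf = count / patterns[antecedent]
                if conf >= min_confidence:
                    out.append((antecedent, consequent, conf))
    return out


toy_transactions = [
    ['beer', 'diapers'],
    ['diapers', 'milk'],
    ['beer', 'nuts', 'diapers'],
    ['milk', 'nuts'],
]
toy_patterns = brute_force_itemsets(toy_transactions, min_count=2)  # 2 of 4 = support 0.5
for a, c, conf in brute_force_rules(toy_patterns, 0.8):
    print(a, '->', c, conf)
```

With a count of 2 out of 4 transactions (support 0.5) and confidence 0.8, this yields the same single `beer -> diapers` rule that `apriori` found at the top of the notebook.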

In [None]:
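import pyfpgrowth

# The support threshold is an absolute transaction count; 0.9 is the minimum confidence
patterns = pyfpgrowth.find_frequent_patterns(transactions, k_support)
rules = pyfpgrowth.generate_association_rules(patterns, 0.9)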

In [89]:
min_support = 0.005
min_support*n

61.625

In [95]:
rec = apriori(transactions, min_support=min_support, min_confidence=0.7, min_lift=1.5)
pprint_records(list(rec))

Items:  {'lw_p0_el_price', 'lw_p40_consumption_kvah', 'lw_m40_temperature'}
Support: 0.0062
BODY -> HEAD[Confidence, Lift]

	{'lw_p40_consumption_kvah', 'lw_m40_temperature'}  ->  {'lw_p0_el_price'}[0.71, 1.97]


