## TEMPORAL SET PREDICTION USING GRAPH BASED APPROACHES.

This is the accompanying notebook to the Medium Blog entitled ["'Don't forget the milk again!': Predicting temporal shopping sets using Graph Neural Networks"](https://medium.com/@wwwidonja/dont-forget-the-milk-again-adc8924fdbe1). It was prepared as part of the Stanford CS224W course project @UL FRI; 2021/22 by Sara Bizjak, Maruša Oražem and Vid Stropnik. While this notebook is meant to be self-sufficient, it will be best experienced by concurrently reading the accompanying blog, linked above. Here, a more robust and theorethical overview of the model and problem at hand will be given, while the blog contextualizes the intent of the model more thoroughly and also gives a good introduction into the theory used here.

In [1]:
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv
import os
import pandas as pd

In [3]:
all_unique_items = [i for i in list(pd.read_csv(os.path.join('..\\data\\', 'transaction_data_smaller.csv')).PRODUCT_ID.unique())]
num_all_unique_items = len(all_unique_items)
reverse_uid = {str(item_code) : idx for idx, item_code in enumerate(all_unique_items)}

We are going to construct a convolutions on dynamic graphs. 
Input for this module is a sequence of dynamic graphs $\mathbb{G}_i = \{\mathcal{G}_i^1,...\mathcal{G}_i^T\}$, where graph $\mathcal{G}_i^t \in \mathbb{G}_i$ has a sequene of elements represented as $\{e_{i,j}^t \in \mathbb{R}^F, \forall v_{i,j} \in \mathcal{V}_i\}$. (F is the dimention of element representation equal to `in_features` and *i* is the considered household).

For each graph $\mathcal{G}_i$ the output of this modelue is a new sequence representation, which we will denote as  $\{c_{i,j}^t \in \mathbb{R}^{F'}, \forall v_{i,j} \in \mathcal{V}_i\}$. (F' is the new dimension equal to `out_features`).

To reduce the parameter scale and also make our method flexible to deal with sequences with variable lengths, a parameter sharing strategy is adopted. The weighted convolutions are implemented by propagating information of elements in each dynamic graphs as follows. For graph $\mathcal{G}_i$
$$c_{i,j}^{t,l+1} = \sigma\left( b^l + \sum_{k \in N_{i,j}^t \cup \{j\}}   A_i^t[j,k] \cdot \left( W^t c_{i,k}^{t,l} \right) \right),$$ where $A_i^t[j,k]$ represents the item in j-th row and k-th column of matrix $A_i^t$, which is the edge weight of $v_{i,j}$ and $v_{i,k}$ in graph $\mathcal{G}_i^t$.

We are going to override the `nn.Module` for constructing our convolutional layer.

**Convolutional layer**
For the convolutions, we're going to use the `GCNConv` layer from the PyG library. The convolutions are realized as follows:

$$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta}$$, where $\mathbf{\hat{A}} = \mathbf{A + I}$ is the adjacency matrix of a graph with inserted self-loops, and $\mathbf{\hat{D}}$ is its diagonal degree matrix.

PyG makes the use of convolutions simple by simpy asking us to input the node feature tensor of shape `[num_of_nodes, num_of_features]` and its Sparse transposed adjecency matrix `adj_t`, which takes into account the weights in our graphs.

Here are some other terms needed to understand the following code:


`nn.ModuleList()` - Holds submodules in a list. <br>
`nn.ReLU()` - Applies the rectified linear unit function element-wise: ReLU(x) = max(0,x) <br>
`nn.BatchNorm1d` - Applies Batch Normalization over a 2D or 3D input. $y=\frac{x-E[x]}{\sqrt{var[x]+\epsilon}} \cdot \gamma + \beta$, The mean and standard-deviation are calculated per-dimension over the mini-batches and \gammaγ and \betaβ are learnable parameter vectors of size C (where C is the input size).

In [4]:
class weighted_GCN(nn.Module):
    def __init__(self, in_features, hidden_sizes, out_features):
        '''
        :param in_features: int, number of input features
        :param hidden_sizes: List[int], list of integers of hidden sizes
        :param out_features: int, number of output features
        '''
        super(weighted_GCN, self).__init__()
        # we are going to use 3 layers, first graph conv we wrote before, ReLu function and normalization
        gcns, relus, bns = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        
        # layers for hidden_size
        input_size = in_features
        for hidden_size in hidden_sizes:
            # go through all the layers and call all three functions
            gcns.append(GCNConv(in_channels=input_size, 
                            out_channels=hidden_size,
                            improved=False,
                            cached=False,
                            add_self_loops=False,
                            normalize=False,
                            bias=False)) 
            relus.append(nn.ReLU())
            bns.append(nn.BatchNorm1d(hidden_size))
            input_size = hidden_size # next layer start size will be output from one layer before
        
        # output layer
        gcns.append(GCNConv(in_channels=hidden_sizes[-1], 
                            out_channels=out_features,
                            improved=False,
                            cached=False,
                            add_self_loops=False,
                            normalize=False,
                            bias=False
                            )
                   )
        relus.append(nn.ReLU())
        bns.append(nn.BatchNorm1d(out_features))
        self.gcns, self.relus, self.bns = gcns, relus, bns

    def forward(self, x, adj_t):
        """
        :param graph: dgl.DGLGraph
        :param node_features: torch.Tensor shape (n_1+n_2+..., n_features)
               edges_weight: torch.Tensor shape (T, n_1^2+n_2^2+...)
        :return:
        """
        h = x
        for gcn, relu, bn in zip(self.gcns, self.relus, self.bns):
            
            #run the Convolutional layer
            h = gcn(h, adj_t)
            #run the batch norm
            h = bn(h.transpose(1, -1)).transpose(1, -1)
            #run the ReLu
            h = relu(h)
        return h

In [5]:
import os
from os import path
import os.path as osp
import networkx as nx
import torch_geometric
from torch_geometric.data import Data
from torch_geometric.utils import erdos_renyi_graph, to_networkx, from_networkx
import torch_geometric.transforms as T
import torch_sparse
from torch_geometric.data import InMemoryDataset, download_url
from tqdm import tqdm
import pickle

f1 = 32  ## F'
hidden_dims = [32, 32]

shopping_per_hh = {}

#This is just a test -- we're only constructing the graphs for houshold id 22 as a proof of concept!
print('Creating graphs from files')
if not path.exists(path.join("..\\data\\pickles", f"shopping_per_hh_F1_{f1}_hid_{hidden_dims}.pkl.gz")):
    for filename in tqdm(os.listdir("../data/Test-Graphs/content/Graphs/")):
        splits = filename.split('_')
        hh_id = splits[0]
        if hh_id not in shopping_per_hh: shopping_per_hh[hh_id] = []


        ## we construct a NX graph and cast it to pytorch.data.Data
        G = nx.Graph(nx.read_pajek(os.path.join("../data/Test-Graphs/content/Graphs/",filename)))
        data = from_networkx(G)
        
        articles_in_basket = [i for i in list(G.nodes()) if i not in list(nx.isolates(G))] # vsi izdelki ki so v kosarici
        
        gt = torch.zeros(num_all_unique_items)
        indices_in_E = [reverse_uid[i] for i in articles_in_basket]
        gt[indices_in_E] = 1
        gt = gt.to_sparse()

        ## Then, we override the data.x in data to get the desired format of the dimensions.
        ## We're just using a vector of ones here. We can chamge this in the long run to get more expressivness.
        x = torch.ones(G.number_of_nodes(), 1)
        data.x = x
        data.id = {i:code for (i, code) in zip ([i for i in range(G.number_of_nodes())], list(G.nodes()))}
        data.y = gt

        shopping_per_hh[hh_id].append(data)
    
    with open(path.join("..\\data\\pickles", f"shopping_per_hh_F1_{f1}_hid_{hidden_dims}.pkl.gz"), "wb") as f:
        pickle.dump(shopping_per_hh, f)

else:
    with open(path.join("..\\data\\pickles", f"shopping_per_hh_F1_{f1}_hid_{hidden_dims}.pkl.gz"), "rb") as f:
        shopping_per_hh = pickle.load(f)

Creating graphs from files


In [6]:
final_tensors = {}
print('Running models and converting them to tensors')
if not path.exists(path.join("..\\data\\pickles", f"final_tensors_F1_{f1}_hid_{hidden_dims}.pkl.gz")): 
    for hh in tqdm(list(shopping_per_hh.keys())):
        
        """ Check why we need this try/catch block -- if it causes problems, come here and try to fix it."""
        
        try:   
            in_dims = shopping_per_hh[hh][0].num_features
            model = weighted_GCN(in_dims, 
                                 hidden_dims, 
                                 f1)

            embeddings_at_t = []
            ## iterate over all graphs for a givn household
            for i in range(len(shopping_per_hh[hh])):
                graph = shopping_per_hh[hh][i]
                o = model(graph.x,graph.edge_index)
                embeddings_at_t.append(o)

            ## initialize a dictionary of lists for each item purchased by this household at a shop
            item_embeddings = {j : [] for j in range(len(embeddings_at_t[0]))}
            for t in range(len(embeddings_at_t)):
                for j in range(len(embeddings_at_t[t])):
                    ## and add the embeddings for each item to its corresponding temporal index t in the newly created list
                    item_embeddings[j].append(embeddings_at_t[t][j].tolist())

            ## convert the final 3D array to a tensor and save it to the dictionary for further use.
            final_tensors[hh] = torch.tensor(list(item_embeddings.values()))
        except ValueError: continue
    with open(path.join("..\\data\\pickles", f"final_tensors_F1_{f1}_hid_{hidden_dims}.pkl.gz"), "wb") as f:
        pickle.dump(final_tensors, f)
else:
    with open(path.join("..\\data\\pickles", f"final_tensors_F1_{f1}_hid_{hidden_dims}.pkl.gz"), "rb") as f:
        final_tensors = pickle.load(f)

Running models and converting them to tensors


## Masked Self Attention

In [7]:
import numpy as np
class masked_self_attention_origi(nn.Module):

    def __init__(self, input_dim, output_dim, n_heads=4):
        super(masked_self_attention_origi, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim

        self.n_heads = n_heads

        self.per_head_dim = output_dim // n_heads
        # inicialization of the weights as described above in the text
        self.Wq = nn.Linear(input_dim, n_heads * self.per_head_dim, bias=False)
        self.Wk = nn.Linear(input_dim, n_heads * self.per_head_dim, bias=False)
        self.Wv = nn.Linear(input_dim, n_heads * self.per_head_dim, bias=False)

    def forward(self, input_tensor):

        """
        Args:
            input_tensor: tensor, shape (nodes_num, T_max, features_num)
        Returns:
            output: tensor, shape (nodes_num, T_max, output_dim = features_num)
        """
        
        seq_length = input_tensor.shape[1]
        # tensor, shape (nodes_num, T_max, n_heads * dim_per_head)
        Q = self.Wq(input_tensor)
        K = self.Wk(input_tensor)
        V = self.Wv(input_tensor)
        
        """
        TODO: Figure out these transposes/reshapes/permutes (and explain/make them prettier if possible)
        """
        
        # multi_head attention
        # Q, tensor, shape (nodes_num, n_heads, T_max, dim_per_head)
        Q = Q.reshape(input_tensor.shape[0], input_tensor.shape[1], self.n_heads, self.per_head_dim).transpose(1, 2)
        # K after transpose, tensor, shape (nodes_num, n_heads, dim_per_head, T_max)
        K = K.reshape(input_tensor.shape[0], input_tensor.shape[1], self.n_heads, self.per_head_dim).permute(0, 2, 3, 1)
        # V, tensor, shape (nodes_num, n_heads, T_max, dim_per_head)
        V = V.reshape(input_tensor.shape[0], input_tensor.shape[1], self.n_heads, self.per_head_dim).transpose(1, 2)

        # scaled attention_score, tensor, shape (nodes_num, n_heads, T_max, T_max)
        attention_score = Q.matmul(K) / np.sqrt(self.per_head_dim)

        # attention_mask, tensor, shape -> (T_max, T_max)  -inf in the top and right
        attention_mask = torch.zeros(seq_length, seq_length).masked_fill(
            torch.tril(torch.ones(seq_length, seq_length)) == 0, -np.inf)

        
        
        # attention_mask will be broadcast to (nodes_num, n_heads, T_max, T_max)
        attention_score = attention_score + attention_mask
        
        
        # (nodes_num, n_heads, T_max, T_max)
        attention_score = torch.softmax(attention_score, dim=-1)

        # multi_result, tensor, shape (nodes_num, n_heads, T_max, dim_per_head)
        multi_head_result = attention_score.matmul(V)
        # multi_result, tensor, shape (nodes_num, T_max, n_heads * dim_per_head = output_dim)
        # concat multi-head attention results
        output = multi_head_result.transpose(1, 2).reshape(input_tensor.shape[0],
                                                           seq_length, self.n_heads * self.per_head_dim)
        
        return output

In [8]:
class aggregate_nodes_temporal_feature_origi(nn.Module):

    def __init__(self, item_embed_dim):
        
        """
        :param item_embed_dim: the dimension of input features
        """
        
        super(aggregate_nodes_temporal_feature_origi, self).__init__()

        self.Wq = nn.Linear(item_embed_dim, 1, bias=False)

    def forward(self, Z):
        ### Equation 4 in the paper
        
        
        """
        TODO: CHECK IF THESE TRANSPOSES ARE OK
        
        """
        output = self.Wq(Z).transpose(1,2).matmul(Z).transpose(1,2)
        return output

Get the self attention masked tensors for every household and pickle them.

In [9]:
f2 = 32 ## F''
"""
###TODO - check if masked_self_attention_origi is working OK, since it only returns a tensor of dim=f2 if f2%4==0
PS: It's not. Must fix this!!!!!
"""

attention_tensors = {}
if not path.exists(path.join("..\\data\\pickles", f"attention_tensors_F1_{f1}_F2_{f2}.pkl.gz")): 
    for hh in tqdm(final_tensors.keys()):
        tens = final_tensors[hh]
        tens = tens
        model1 = masked_self_attention_origi(input_dim=f1, output_dim=f2)
        model1 = model1
        o = model1(tens)
        model_2 = aggregate_nodes_temporal_feature_origi(item_embed_dim=f2)
        o2 = model_2(o)
        attention_tensors[hh] = o2[:,:,0]
    with open(path.join("..\\data\\pickles", f"attention_tensors_F1_{f1}_F2_{f2}.pkl.gz"), "wb") as f:
        pickle.dump(attention_tensors, f)
else:
    with open(path.join("..\\data\\pickles", f"attention_tensors_F1_{f1}_F2_{f2}.pkl.gz"), "rb") as f:
        attention_tensors = pickle.load(f)
        


## Gated Information Fusing

In [10]:
class global_gated_update(nn.Module):
    ### num_all_unique_items, f0
    def __init__(self, items_total, f0, item_dict):
        super(global_gated_update, self).__init__()
        
        self.num_items_total = items_total
        self.embedding_dim = f0
        self.E = torch.randn((self.num_items_total, self.embedding_dim))
        self.gamma = nn.Parameter(torch.rand(self.num_items_total, 1), requires_grad=True)
        self.item_dict = {int(i):item_dict[i] for i in item_dict}
        
    def forward(self, ids, Z, f2):
        num_nodes = len(ids)
        beta = torch.zeros(self.num_items_total, 1)
        ### masking
        nodes_in_graph = ids
        rows_in_E = [self.item_dict[code.item()] for code in nodes_in_graph]

        beta[rows_in_E] = 1
        ### update
        E_clone = self.E.clone()
        ei_update = (1 - beta * self.gamma) * E_clone
        #embed[output_nodes, :] = embed[output_nodes, :] + self.gamma[output_nodes] * output_node_features
        #print(self.gamma[rows_in_E] * Z)
        ei_update[rows_in_E, :] = ei_update[rows_in_E, :] + self.gamma[rows_in_E] * Z        
        return ei_update       

## This might not work if the initial graphs don't have identical orderings (.ids) 
> <font color="red">Yes it does, everything is OK :)</font>

In [11]:
f0 = f2
model_fuse = global_gated_update(num_all_unique_items, f0, {i : reverse_uid[i] for i in reverse_uid.keys()})

In [12]:
E_updates = {}
if not path.exists(path.join("..\\data\\pickles", f"E_updates_F1_{f1}_F2_{f2}.pkl.gz")): 
    for hh in tqdm(attention_tensors.keys()):
        Z = attention_tensors[hh]
        ids = list(shopping_per_hh[hh][0].id.values())
        E_update = model_fuse(ids, Z, f2)
        E_updates[hh] = E_update
    with open(path.join("..\\data\\pickles", f"E_updates_F1_{f1}_F2_{f2}.pkl.gz"), "wb") as f:
            pickle.dump(E_updates, f)
else:
    with open(path.join("..\\data\\pickles", f"E_updates_F1_{f1}_F2_{f2}.pkl.gz"), "rb") as f:
        E_updates = pickle.load(f)

  0%|          | 0/1590 [00:00<?, ?it/s]


AttributeError: 'str' object has no attribute 'item'

In [13]:
class temporal_set_prediction(nn.Module):
    def __init__(self, items_total, item_embedding_dim, reverse_uid):
        """
        :param items_total: int
        :param item_embedding_dim: int
        :param n_heads: int
        :param attention_aggregate: sre
        """
        super(temporal_set_prediction, self).__init__()

        ### To je njegov f0
        self.item_embedding_dim = item_embedding_dim
        
        self.reverse_uid = reverse_uid
        ## to je njegov num_all_unique_items
        self.items_total = items_total
        
        
        self.our_gcn = weighted_GCN(1, [self.item_embedding_dim, self.item_embedding_dim], self.item_embedding_dim)
        
        """
        self.stacked_gcn = stacked_weighted_GCN_blocks([weighted_GCN(item_embedding_dim,
                                                                     [item_embedding_dim],
                                                                     item_embedding_dim)])
        """

        self.masked_self_attention = masked_self_attention_origi(input_dim=self.item_embedding_dim,
                                                           output_dim=self.item_embedding_dim)

        self.aggregate_nodes_temporal_feature = aggregate_nodes_temporal_feature_origi(self.item_embedding_dim)


        
        #
        #(num_all_unique_items, f0, reverse_uid
        #
        self.global_gated_update = global_gated_update(items_total=self.items_total,
                                                       f0=self.item_embedding_dim,
                                                       item_dict=self.reverse_uid)

        self.fc_output = nn.Sequential(nn.Linear(self.item_embedding_dim, 1, bias=True),
                                       nn.Sigmoid())


    
    def forward(self, graph_list_for_hh, hh_ids):
        embeddings_at_t = []
        for graph in graph_list_for_hh:
            o = self.our_gcn(graph.x,graph.edge_index)
            embeddings_at_t.append(o)
        item_embeddings = {j : [] for j in range(len(embeddings_at_t[0]))}
        for t in range(len(embeddings_at_t)):
            for j in range(len(embeddings_at_t[t])):
                ## and add the embeddings for each item to its corresponding temporal index t in the newly created list
                item_embeddings[j].append(embeddings_at_t[t][j].tolist())

        ## convert the final 3D array to a tensor and save it to the dictionary for further use.
        h = torch.tensor(list(item_embeddings.values()))

        h = self.masked_self_attention(h)
        h = self.aggregate_nodes_temporal_feature(h)
        h = h[:,:,0]
        #ids = torch.tensor([i[0] for i in list(graph_list_for_hh[0].id.values())])
        h = self.global_gated_update(hh_ids, h, self.item_embedding_dim)
        out = self.fc_output(h).squeeze(dim=-1)
        return out

In [14]:
def train(model, ids, train_list, test_data, optimizer, loss_fn):
    # TODO: Implement a function that trains the model by 
    # using the given optimizer and loss_fn.
    model.train()

    ############# Your code here ############
    ## Note:
    ## 1. Zero grad the optimizer
    optimizer.zero_grad()
    ## 2. Feed the data into the model
    out = model(train_list, ids)
    ## 3., 4. Slice the model output and label by train_idx & feed them to loss
    
    ## this was used with a train/test split
    #loss = loss_fn(out[train_idx], data.y[train_idx][:,0])
    #print(test_data)
    loss = loss_fn(out, test_data.y.to_dense())
    #########################################

    loss.backward()
    optimizer.step()

    return loss.item()

In [15]:
"""
def evaluate(model, loader, loss_fn, evaluator, typ='Validation'):
    model.eval()
    y_true = []
    y_pred = []

    for batch in tqdm(loader, desc=f'Evaluation on {typ} set'):
        with torch.no_grad():
            train_list = batch[:-1]
            ids = torch.tensor([int(i[0]) for i in list(train_list[0].id.values())])
            pred = model(train_list, ids)
        y_true.append(list(batch[-1].y.to_dense()))
        y_pred.append(list(pred))

    y_true = torch.tensor(y_true).type(torch.LongTensor)
    y_pred = torch.tensor(y_pred)
    print()
    
    return {'Recall@K' : evaluator(y_pred, y_true)}
"""
def MakeEvaluatorRatK(K):

    macro_K = K
    def RecallAtK(pred, truth):
        p_idx = torch.argsort(pred, descending=True)[:macro_K]
        t_idx = truth.nonzero()
        num_of_elements = sum(el in p_idx for el in t_idx)
        RatK = num_of_elements / len(t_idx)
        return RatK

    return RecallAtK




def evaluate2(model, ids, train_list, test_data, evaluator):
    model.eval()
    with torch.no_grad():
        pred = model(train_list, ids)

    truth = test_data.y.to_dense()
    RatK = evaluator(pred, truth)

    return RatK

def eval_loss(model, ids, train_list, test_data, loss_fn):
    model.eval()
    out = model(train_list, ids)
    loss = loss_fn(out, test_data.y.to_dense())
    return loss.item()


In [16]:
from torchmetrics import Recall

ev = Recall(mdmc_average = 'global', average='samples', top_k=2)
yt = torch.tensor([[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
yp = torch.tensor([[0.2, 0.3, 0.4, 0.1], [1, 0.2, 0, 0], [0.8, 1, 0.2, 0]])
yt.type()

'torch.LongTensor'

In [17]:
model_solo = temporal_set_prediction(num_all_unique_items, f0, reverse_uid)
#model.reset_parameters()
optimizer_solo = torch.optim.Adam(model_solo.parameters(), lr=0.005, weight_decay=1e-5)
## todo
loss_fn = nn.BCELoss()
epoch_num = 30
"""
for epoch in range(1, 1 + epoch_num):
    loss = train(model_solo, shopping_per_hh["1000"][:-1], shopping_per_hh["1000"][-1], optimizer_solo, loss_fn)
    if epoch%5 == 0:
        print(f'epoch : {epoch}/{epoch_num}, loss: ', loss)
"""


'\nfor epoch in range(1, 1 + epoch_num):\n    loss = train(model_solo, shopping_per_hh["1000"][:-1], shopping_per_hh["1000"][-1], optimizer_solo, loss_fn)\n    if epoch%5 == 0:\n        print(f\'epoch : {epoch}/{epoch_num}, loss: \', loss)\n'

### THIS IS THE MAIN CELL

In [18]:
from torch_geometric.loader import DataLoader
from sklearn.model_selection import train_test_split, TimeSeriesSplit
import copy
from torchmetrics import Recall
import statistics as st
f0 = 32


model = temporal_set_prediction(num_all_unique_items, f0, reverse_uid)

## make the dictionary into a list that we'll be train-test splitting over
list_of_dats = [shopping_per_hh[hh] for hh in shopping_per_hh]

### filter the list so that we only have households with at least 5 observations
at_least_5 = [i for i in list_of_dats if len(i)>=5]

## get the (joint) training data and the test data.
training_data, test_idx = train_test_split(
    at_least_5, test_size=0.2, random_state=42, shuffle=False)

## Put the test data in its data loader.
test_loader = DataLoader(test_idx)

### initialize the loss, evaluators, optimizer and rolling window splitter
### that we're gonna us during training.
loss_fn = nn.BCELoss()
KfortopK = 10
#evaluator = Recall(mdmc_average = 'global', average='samples', top_k=KfortopK)

#evaluator2 = Recall(mdmc_average = 'global', average='samples', top_k=KfortopK)
rolling_window_splitter = TimeSeriesSplit(test_size=1, gap=0, n_splits = 4)


evaluator3 = MakeEvaluatorRatK(KfortopK)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, weight_decay=1e-5)


## specify the number of epochs to train for
n_epochs = 50

best_model = None
best_valid_loss= 0
best_valid_epoch = 0

###initialize the model


for epoch in range(n_epochs):
    num_skipped = 0
    ## On each iteration, we change which data we use for validation.
    ## Then, wee put them into loaders
    train_idx, valid_idx= train_test_split(
        training_data, test_size=0.15, random_state=200, shuffle=True)




    tl = DataLoader(train_idx)
    vl = DataLoader(valid_idx)
    loss = 0


    for batch in tqdm(tl, desc='Training Households'):
        # we get 4 splits from our household (we learn from each household multiple times).
        ids = torch.tensor([int(i[0]) for i in list(batch[-1].id.values())])
        loss += train(model, ids, batch[:-1], batch[-1], optimizer, loss_fn)
        for learn_from, target_in_list in rolling_window_splitter.split(batch):

                target = batch[target_in_list[0]]
                lrnd = [batch[i] for i in learn_from]
                loss += train(model, ids, lrnd, target, optimizer, loss_fn)




    #valid_results = evaluate(model, vl, loss_fn, evaluator)
    #test_results = evaluate(model, test_loader, loss_fn, evaluator, typ='Testing')

    valid_loss = 0
    test_loss = 0
    for batch in tqdm(vl, desc='Calculating Validation R@K'):
        # we get 4 splits from our household (we learn from each household multiple times).
        #ids = torch.tensor([int(i[0]) for i in list(batch[-1].id.values())])
        #valid_results.append(evaluate2(model, ids, batch[:-1], batch[-1], evaluator3))
        ids = torch.tensor([int(i[0]) for i in list(batch[-1].id.values())])
        valid_loss += eval_loss(model, ids, batch[:-1], batch[-1], loss_fn)



    for batch in tqdm(test_loader, desc='Calculating Test R@K'):        # we get 4 splits from our household (we learn from each household multiple times).
        ids = torch.tensor([int(i[0]) for i in list(batch[-1].id.values())])
        test_loss += eval_loss(model, ids, batch[:-1], batch[-1], loss_fn)


    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_model = copy.deepcopy(model)
        best_valid_epoch = epoch

    # todo - this loss isn't the actual loss.
    print(f'Epoch: {epoch:02d}, '
      f'Train: {loss:.4f},   '
      f'Valid: {valid_loss:.2f}'
      f'Test: {test_loss:.2f} '
          f' num_skipped: {num_skipped}')

    if epoch-best_valid_epoch > 3:
        print('Early Stopping.')
        break

Training Households: 100%|██████████| 537/537 [01:52<00:00,  4.76it/s]
Calculating Validation R@K: 100%|██████████| 95/95 [00:06<00:00, 13.77it/s]
Calculating Test R@K: 100%|██████████| 159/159 [00:09<00:00, 16.58it/s]


Epoch: 00, Train: 1188.5875,   Valid: 26.33Test: 43.31  num_skipped: 0


Training Households: 100%|██████████| 537/537 [01:49<00:00,  4.90it/s]
Calculating Validation R@K: 100%|██████████| 95/95 [00:06<00:00, 13.98it/s]
Calculating Test R@K: 100%|██████████| 159/159 [00:09<00:00, 16.76it/s]


Epoch: 01, Train: 477.9441,   Valid: 15.41Test: 24.37  num_skipped: 0


Training Households: 100%|██████████| 537/537 [01:49<00:00,  4.90it/s]
Calculating Validation R@K: 100%|██████████| 95/95 [00:06<00:00, 13.72it/s]
Calculating Test R@K: 100%|██████████| 159/159 [00:09<00:00, 16.15it/s]


Epoch: 02, Train: 203.7012,   Valid: 5.73Test: 9.08  num_skipped: 0


Training Households: 100%|██████████| 537/537 [01:53<00:00,  4.75it/s]
Calculating Validation R@K: 100%|██████████| 95/95 [00:07<00:00, 12.79it/s]
Calculating Test R@K: 100%|██████████| 159/159 [00:10<00:00, 15.48it/s]


Epoch: 03, Train: 90.2458,   Valid: 2.13Test: 3.50  num_skipped: 0


Training Households: 100%|██████████| 537/537 [01:58<00:00,  4.52it/s]
Calculating Validation R@K: 100%|██████████| 95/95 [00:07<00:00, 12.90it/s]
Calculating Test R@K: 100%|██████████| 159/159 [00:10<00:00, 15.78it/s]

Epoch: 04, Train: 42.2776,   Valid: 1.07Test: 1.71  num_skipped: 0
Early Stopping.





In [5]:
## TODO: This still isn't used.
## This class was used before for prepping our representations for the attention based temporal learning modulee. It's not used anywhere at the moment.
class stacked_weighted_GCN_blocks(nn.ModuleList):
    def __init__(self, *args, **kwargs):
        super(stacked_weighted_GCN_blocks, self).__init__(*args, **kwargs)

    def forward(self, *input):
        nodes_feature, edge_weights = input
        h = nodes_feature
        for module in self:
            h = module(h, edge_weights)
        return h