<a href="https://colab.research.google.com/github/JasaZnidar/Predvidenje-zmagovalca-vaterpolo/blob/heterogen-classification-crashedWhenRunningMetapath2Vec/Diplomska_naloga.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This model is built from real match data.

The data is stored in a heterogeneous graph.
A node is a team and its feature is the average team statistic.

An edge represents a match. For each match there are 2 edges. Each edge stores the result from the point of view of its originating node's team: if the number is negative, that team lost. The absolute value is the goal difference. This value is also used as the edge weight during learning.
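
As a minimal sketch of this encoding (the helper `match_edges` and the sample team names are hypothetical and do not appear in the notebook), one match can be turned into two directed edges that carry the signed goal difference:

```python
def match_edges(home: str, away: str, home_goals: int, away_goals: int) -> list[tuple[str, str, int]]:
    """Return the two directed edges (source, target, signed goal difference) for one match."""
    diff = home_goals - away_goals
    return [
        (home, away, diff),    # home team's point of view
        (away, home, -diff),   # away team's point of view
    ]


edges = match_edges("Team A", "Team B", 12, 9)
print(edges)
```

The sign tells who won from the source team's side, and the absolute value is the margin that can serve as the edge weight.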

To predict the result, we use a GCN to evaluate all teams, concatenate these evaluations in accordance with the edge we are trying to predict (the originating node's team comes first), and then apply a linear regression to get a single value, which is the result.

We also filter out matches that result in a tie.

# Setup

## Download whl-s and requirements

In [1]:
import urllib.request
import sys

urllib.request.urlretrieve("https://raw.githubusercontent.com/JasaZnidar/Predvidenje-zmagovalca-vaterpolo/refs/heads/main/requirements.txt", "requirements.txt")

('requirements.txt', <http.client.HTTPMessage at 0x7d20832fefd0>)

## pip

In [2]:
!sudo apt-get install libcairo2-dev pkg-config python3-dev
!pip install --upgrade pip
!pip install --force-reinstall --no-cache-dir pycairo

!pip install -r requirements.txt

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-dev is already the newest version (3.10.6-1~22.04.1).
python3-dev set to manually installed.
The following packages were automatically installed and are no longer required:
  libbz2-dev libpkgconf3 libreadline-dev
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libblkid-dev libcairo-script-interpreter2 libffi-dev libglib2.0-dev
  libglib2.0-dev-bin libice-dev liblzo2-2 libmount-dev libpixman-1-dev
  libselinux1-dev libsepol-dev libsm-dev libxcb-render0-dev libxcb-shm0-dev
Suggested packages:
  libcairo2-doc libgirepository1.0-dev libglib2.0-doc libgdk-pixbuf2.0-bin
  | libgdk-pixbuf2.0-dev libxml2-utils libice-doc libsm-doc
The following packages will be REMOVED:
  pkgconf r-base-dev
The following NEW packages will be installed:
  libblkid-dev libcairo-script-interpreter2 libcairo2-dev libffi-dev
  libglib2.0-dev libglib2.0-dev-bin li

# Imports

In [3]:
import json
import copy
import networkx as nx
import torch
import torch.nn.functional as F
import torch_geometric
from torch_geometric.utils.convert import from_networkx
from torch_geometric import nn, sampler
from torch_geometric.data import HeteroData, Data
from torch_geometric import transforms as T
from torch_geometric import loader
from torcheval import metrics as M
from torcheval.metrics import R2Score, MeanSquaredError
import tqdm
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import requests
from zipfile import ZipFile
from math import log
from io import BytesIO
%matplotlib inline

# Data definition

In [4]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

win_value = 1.0
loss_value = -1.0
tie_value = (win_value + loss_value)/2
tie_accuracy = 0.1

data_file = "test"
train_rate = 0.7
val_rate = 1.0 - train_rate

# goals, shots, assists, blocks, saves, exclusion, penalty foul, suspension, brutality, sprint won, sprints
used_features = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# birth, hand, height, position, weight
used_player_stats = [1, 1, 1, 1, 1]

## Get JSON data from github

In [5]:
with requests.get(f"https://github.com/JasaZnidar/totalwaterpolo-web-scraper/raw/master/{data_file}.zip", ) as r:
  ZipFile(BytesIO(r.content), "r").extractall()
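
The same extract-from-memory pattern can be tried without network access. The sketch below builds a tiny archive in memory to stand in for the downloaded bytes; the file name `test.json` and its contents are placeholders, not the real scraped data.

```python
import json
from io import BytesIO
from zipfile import ZipFile

# Build a small archive in memory; this plays the role of the HTTP response body.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("test.json", '{"matches": {}, "players": {}}')
payload = buffer.getvalue()

# Read it back the same way the notebook reads r.content.
with ZipFile(BytesIO(payload), "r") as archive:
    names = archive.namelist()
    loaded = json.loads(archive.read("test.json").decode("utf-8"))

print(names, loaded)
```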

## Extract data to json object

In [6]:
with open(f"/content/{data_file}.json") as f:
    scraped_data = json.load(f)

## Supporting functions

### Filter features

In [7]:
def filterFeatures(features: list[float]) -> list[float]:
  assert len(features) == len(used_features)

  return [features[i] for i in range(len(features)) if used_features[i]]
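
A short usage sketch of the mask-based filter follows. It assumes a hypothetical 4-element mask instead of the notebook's 11 features, and restates the function with `zip` so the block runs on its own.

```python
used_features = [1, 0, 1, 0]  # hypothetical mask, shorter than the notebook's 11 features


def filterFeatures(features: list[float]) -> list[float]:
    assert len(features) == len(used_features)
    # keep only the positions whose mask entry is truthy
    return [value for value, used in zip(features, used_features) if used]


filtered = filterFeatures([5.0, 7.0, 2.0, 1.0])
print(filtered)
```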

### Event data to vector data

In [8]:
def eventToVector(event: dict) -> tuple[list[float], list[float]]:
  player1Statistic = [0.0]*11
  player2Statistic = [0.0]*11

  if "goal scored" in event['action']:
    # goals
    player1Statistic[0] += 1
    # shots
    player1Statistic[1] += 1
    # assists
    player2Statistic[2] += 1

  elif "exclusion" in event['action']:
    # exclusion
    player1Statistic[5] += 1

  elif "penalty foul" in event['action']:
    # penalty
    player1Statistic[6] += 1

  elif "shot missed" in event['action']:
    # shots
    player1Statistic[1] += 1

  elif "shot saved" in event['action']:
    # shots
    player1Statistic[1] += 1
    # saves
    player2Statistic[4] += 1

  elif "shot blocked" in event['action']:
    # shots
    player1Statistic[1] += 1
    # blocks
    player2Statistic[3] += 1

  elif "suspention" in event['action']:
    # suspensions
    player1Statistic[7] += 1

  elif "brutality" in event['action']:
    # brutalities
    player1Statistic[8] += 1

  elif "sprint won" in event['action']:
    # sprint won
    player1Statistic[9] += 1
    # sprint
    player1Statistic[10] += 1
    # sprint
    player2Statistic[10] += 1

  return player1Statistic, player2Statistic

### Result classification

In [9]:
def resultClass(result: float) -> float:
  if result > tie_value + win_value*tie_accuracy:
    return win_value
  elif result < tie_value + loss_value*tie_accuracy:
    return loss_value
  else:
    return tie_value
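
To make the thresholds concrete, the sketch below repeats the constants from the data definition cell and classifies a few sample predictions (the sample values are made up). With these constants, anything above 0.1 counts as a win, anything below -0.1 as a loss, and the band in between as a tie.

```python
win_value = 1.0
loss_value = -1.0
tie_value = (win_value + loss_value) / 2
tie_accuracy = 0.1


def resultClass(result: float) -> float:
    if result > tie_value + win_value * tie_accuracy:
        return win_value
    elif result < tie_value + loss_value * tie_accuracy:
        return loss_value
    else:
        return tie_value


classes = [resultClass(r) for r in (0.8, 0.05, -0.05, -0.3)]
print(classes)
```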

### Is second player ally

In [10]:
def isAlly(event: dict) -> bool:
  if 'goal scored' in event['action']:
    return True
  return False

### Update list

In [11]:
def Update(original: list, update: list) -> list:
  assert len(original) == len(update)

  return [original[x] + update[x] for x in range(len(original))]
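
As an illustration of how this helper accumulates statistics, the sketch below folds three made-up event vectors into a running total. The 3-element vectors are an assumption for brevity; the notebook uses 11 features.

```python
def Update(original: list, update: list) -> list:
    assert len(original) == len(update)
    # element-wise sum of two equally long lists
    return [a + b for a, b in zip(original, update)]


totals = [0.0, 0.0, 0.0]
for event_vector in ([1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    totals = Update(totals, event_vector)

print(totals)
```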

## Gather match ids

In [12]:
match_ids = [match_id for match_id in scraped_data['matches']]

## Data generation

### Player data gathering

In [13]:
player_data = {}
player_ids = [player_id for player_id in scraped_data['players']]

for player_id in player_ids:
  # birth, hand, height, position, weight
  player_data[player_id] = [0.0]*5

  player_data[player_id][0] = scraped_data['players'][player_id]['birth']
  player_data[player_id][1] = 1 if scraped_data['players'][player_id]['hand'] == 'R' else -1 if scraped_data['players'][player_id]['hand'] == 'L' else 0
  player_data[player_id][2] = scraped_data['players'][player_id]['height'] if scraped_data['players'][player_id]['height'] else 0
  player_data[player_id][4] = scraped_data['players'][player_id]['weight'] if scraped_data['players'][player_id]['weight'] else 0
  match scraped_data['players'][player_id]['position']:
      case '':
        player_data[player_id][3] = 0
      case 'Goalkeeper':
        player_data[player_id][3] = 1
      case 'Driver':
        player_data[player_id][3] = 2
      case 'Left Driver':
        player_data[player_id][3] = 3
      case 'Right Driver':
        player_data[player_id][3] = 4
      case 'Central Defender':
        player_data[player_id][3] = 5
      case 'Left Winger':
        player_data[player_id][3] = 6
      case 'Right Winger':
        player_data[player_id][3] = 7
      case 'Center Forward':
        player_data[player_id][3] = 8
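
The same position encoding can also be written as a lookup table. This is only an alternative sketch: the names `POSITION_CODES` and `encode_position` are hypothetical, while the codes mirror the `match` statement above, including the default of 0 for labels that are not listed.

```python
POSITION_CODES = {
    '': 0,
    'Goalkeeper': 1,
    'Driver': 2,
    'Left Driver': 3,
    'Right Driver': 4,
    'Central Defender': 5,
    'Left Winger': 6,
    'Right Winger': 7,
    'Center Forward': 8,
}


def encode_position(position: str) -> int:
    # labels that are not in the table keep the default value 0
    return POSITION_CODES.get(position, 0)


codes = [encode_position(p) for p in ('Goalkeeper', 'Center Forward', 'Unknown label')]
print(codes)
```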

### Team data gathering

In [14]:
match_list = match_ids
results = {}
team_data = {}

# match_id: ['team A', [...], {...}, 'team B', [...], {...}, 4]
#   0: home team name
#   1: home player lineup (the index dictates the player's hat number)
#   2: home statistics
#   3: away team name
#   4: away player lineup (the index dictates the player's hat number)
#   5: away statistics
#   6: result (for the home team)
match_data = {}

for match_id in match_list:
  #=============================================================================
  # Collect match data
  #=============================================================================
  lineups = scraped_data['matches'][match_id]['lineup']
  match_data[match_id] = [
      scraped_data['matches'][match_id]['name']['home'],
      [lineups['home'][str(i + 1)]['id'] if str(i + 1) in lineups['home'] else 0 for i in range(15)],
      {},
      scraped_data['matches'][match_id]['name']['away'],
      [lineups['away'][str(i + 1)]['id'] if str(i + 1) in lineups['away'] else 0 for i in range(15)],
      {},
      scraped_data['matches'][match_id]['result']['home'] - scraped_data['matches'][match_id]['result']['away']
  ]
  """results[match_id] = scraped_data['matches'][match_id]['result']['home'] - scraped_data['matches'][match_id]['result']['away']
  if results[match_id] == 0:
    del results[match_id]
    continue"""

  #=============================================================================
  # Prepare statistics dictionary for a teams players
  #=============================================================================
  statistics = {
      'home': {int(x): [0.0]*11 for x in scraped_data['matches'][match_id]['lineup']['home']},
      'away': {int(x): [0.0]*11 for x in scraped_data['matches'][match_id]['lineup']['away']}
  }

  if len(statistics['home']) < 7 or len(statistics['away']) < 7:
    #del results[match_id]
    del match_data[match_id]
    continue

  #=============================================================================
  # Add missing teams in team_data
  #=============================================================================
  """for team in ['home', 'away']:
    if not scraped_data['matches'][match_id]['name'][team] in team_data:
      team_data[scraped_data['matches'][match_id]['name'][team]] = [[0.0]*11, 0]"""

  #=============================================================================
  # Loop through all events and update player statistics
  #=============================================================================
  for event in scraped_data['matches'][match_id]['plays']:
    num_1 = event['player_1']
    num_2 = event['player_2']
    primary_team = event['team']
    secondary_team = primary_team if isAlly(event) else 'away' if primary_team == 'home' else 'home'

    # no player was recorded for this event
    if num_1 == 0:
      continue

    # no secondary player was recorded for this event
    elif num_2 == 0:
      data_1, _ = eventToVector(event)
      statistics[primary_team][num_1] = Update(statistics[primary_team][num_1], data_1)

    # there are 2 players recorded for this event
    else:
      data_1, data_2 = eventToVector(event)
      statistics[primary_team][num_1] = Update(statistics[primary_team][num_1], data_1)
      statistics[secondary_team][num_2] = Update(statistics[secondary_team][num_2], data_2)

  #=============================================================================
  # Save team statistics
  #=============================================================================
  match_data[match_id][2] = statistics['home']
  match_data[match_id][5] = statistics['away']


print(match_data)

{'100': ['Spain', ['1141', '1142', '1143', '1144', '1145', '1146', '1147', '1148', '1149', '1150', '1151', '1152', '1153', 0, 0], {1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 10: [1.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 11: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 12: [2.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 13: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 3: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 4: [5.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 5: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 6: [2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 7: [4.0, 6.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 8: [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 9: [4.0, 5.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]}, 'Israel', ['1219', '1220', '1221', '1222', '1223', '1224', '1225', '1226', '1227', '

### Create data matrices

Create player matrix

In [15]:
players_list = set()
teams_list = set()
for match_id in match_data:
  teams_list.add(match_data[match_id][0])
  teams_list.add(match_data[match_id][3])
  for home_player in match_data[match_id][1]:
    players_list.add(int(home_player))
  for away_player in match_data[match_id][4]:
    players_list.add(int(away_player))

teams_list = list(teams_list)
players_list = list(players_list)

player_matrix = torch.zeros((len(players_list), sum(used_player_stats)), dtype=torch.float32)
playerId_to_index = {str(players_list[i]): i for i in range(len(players_list))}

for player_id in player_data:
  if player_id not in playerId_to_index:
    continue

  # the target column advances only for stats that are kept
  col = 0
  for src, stat in enumerate(used_player_stats):
    if stat:
      player_matrix[playerId_to_index[player_id], col] = player_data[player_id][src]
      col += 1

Create playerInMatch and edge matrices

In [16]:
# set starting player in match values and connect them to their games
playerInMatch_matrix = torch.zeros((len(players_list), sum(used_features)), dtype=torch.float32)
playerInMatch_player_matrix = torch.Tensor([[x for x in range(len(players_list))]]*2).T.long()   # playerInMatch -> player

# set starting team values
team_nameToIndex = {teams_list[i]: i for i in range(len(teams_list))}
teamInMatch_num = 0
teamInMatch_team_matrix = torch.zeros((0, 2), dtype=torch.long)  # teamInMatch -> team
teamInMatch_playerInMatch_matrix = torch.zeros((0, 2), dtype=torch.long)   # teamInMatch -> playerInMatch
teamInMatch_playerInMatch_attr = torch.zeros((1, 0), dtype=torch.float32)
teamInMatch_teamInMatch_matrix = torch.zeros((0, 2), dtype=torch.long)   # teamInMatch -> teamInMatch
teamInMatch_teamInMatch_attr = torch.zeros((1, 0), dtype=torch.float32)
teamInMatch_teamInMatch_weight = torch.zeros((0,), dtype=torch.float32)

# sort match
match_list.sort(key=lambda x: int(x))

Fill up playerInMatch matrix and edges

In [17]:
# loop through matches and update data in playerInMatch matrix
for match_id in match_list:
  if match_id not in match_data:
    continue

  # add teamInMatch
  teamInMatch_num += 2

  # link teamInMatch to team
  teamInMatch_team_matrix = torch.cat((teamInMatch_team_matrix, torch.Tensor([[teamInMatch_num - 2, team_nameToIndex[match_data[match_id][0]]]])), dim=0)  # home
  teamInMatch_team_matrix = torch.cat((teamInMatch_team_matrix, torch.Tensor([[teamInMatch_num - 1, team_nameToIndex[match_data[match_id][3]]]])), dim=0)  # away

  # link teamInMatch to teamInMatch
  # home
  teamInMatch_teamInMatch_matrix = torch.cat((teamInMatch_teamInMatch_matrix, torch.Tensor([[teamInMatch_num - 2, teamInMatch_num - 1]])), dim=0)
  teamInMatch_teamInMatch_attr = torch.cat((teamInMatch_teamInMatch_attr, torch.Tensor([[1.0 if match_data[match_id][6] > 0 else -1.0 if match_data[match_id][6] < 0 else 0.0]])), dim=1)
  teamInMatch_teamInMatch_weight = torch.cat((teamInMatch_teamInMatch_weight, torch.Tensor([match_data[match_id][6]])))
  # away
  teamInMatch_teamInMatch_matrix = torch.cat((teamInMatch_teamInMatch_matrix, torch.Tensor([[teamInMatch_num - 1, teamInMatch_num - 2]])), dim=0)
  teamInMatch_teamInMatch_attr = torch.cat((teamInMatch_teamInMatch_attr, torch.Tensor([[-1.0 if match_data[match_id][6] > 0 else 1.0 if match_data[match_id][6] < 0 else 0.0]])), dim=1)
  teamInMatch_teamInMatch_weight = torch.cat((teamInMatch_teamInMatch_weight, torch.Tensor([-1*match_data[match_id][6]])))

  # (match_data index for lineup, teamInMatch offset for team)
  for data_index, teamInMatch_index in [(1, -2), (4, -1)]:
    for player_num in range(len(match_data[match_id][data_index])):
      player_id = match_data[match_id][data_index][player_num]
      if player_id == 0:
        continue

      # player id as an index
      id = int(player_id) - 1

      # find last instance of player connection in order to find latest data in playerInMatch_matrix
      # (link rows are appended in step with playerInMatch rows, so the row index is the playerInMatch index)
      last_playerInMatch_index = (playerInMatch_player_matrix[:, 1] == id).nonzero(as_tuple=False)[-1].item()
      updated_playerInMatch_data = torch.Tensor(Update(playerInMatch_matrix[last_playerInMatch_index, :].tolist(), match_data[match_id][data_index + 1][player_num+1])).reshape((1, -1))

      # add new data to matrix
      playerInMatch_matrix = torch.cat((playerInMatch_matrix, updated_playerInMatch_data), dim=0)

      # link playerInMatch to player (column 0: playerInMatch index, column 1: player index)
      playerInMatch_player_matrix = torch.cat((playerInMatch_player_matrix, torch.Tensor([[playerInMatch_matrix.shape[0] - 1, id]])), dim=0)

      # link teamInMatch to playerInMatch
      teamInMatch_playerInMatch_matrix = torch.cat((teamInMatch_playerInMatch_matrix, torch.Tensor([[teamInMatch_num + teamInMatch_index, playerInMatch_matrix.shape[0] - 1]])), dim=0)

      # add attribute (0 - player, 1 - goalkeeper); player_num is 0-based, goalkeepers wear hats 1 and 13
      teamInMatch_playerInMatch_attr = torch.cat((teamInMatch_playerInMatch_attr, torch.Tensor([[1.0 if (player_num + 1) in [1, 13] else 0]])), dim=1)
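
The idea behind the playerInMatch rows is a running total: every appearance adds a new row equal to the player's previous row plus that match's statistics. The pure-Python sketch below illustrates this under simplifying assumptions. The function `build_snapshots`, the `latest` lookup list, and the toy data are all hypothetical, and plain lists stand in for tensors.

```python
def build_snapshots(num_players: int, num_features: int,
                    matches: list[dict[int, list[float]]]) -> tuple[list[list[float]], list[tuple[int, int]]]:
    # one all-zero starting row per player; row i initially belongs to player i
    rows = [[0.0] * num_features for _ in range(num_players)]
    links = [(i, i) for i in range(num_players)]   # (snapshot row, player)
    latest = list(range(num_players))              # latest snapshot row of each player

    for match in matches:
        for player, stats in match.items():
            previous = rows[latest[player]]
            rows.append([a + b for a, b in zip(previous, stats)])
            latest[player] = len(rows) - 1
            links.append((latest[player], player))
    return rows, links


rows, links = build_snapshots(2, 2, [{0: [1.0, 2.0]}, {0: [0.0, 1.0], 1: [3.0, 0.0]}])
print(rows)
print(links)
```

Keeping the latest row index per player in a list avoids searching the link matrix on every appearance, and collecting rows in Python lists before building one tensor avoids repeated `torch.cat` calls.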

### Create training and validation masks

In [18]:
num_edges = len(teamInMatch_teamInMatch_weight)

# the first train_samples edges are used for training; the count is kept even
# so that both directed edges of a match land in the same split
train_samples = round(num_edges * train_rate)
if train_samples%2 != 0:
  train_samples += 1

train_mask = torch.zeros(num_edges, dtype=torch.bool)
train_mask[:train_samples] = True

val_mask = ~train_mask
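
Because every match contributes two consecutive edges, an even training count keeps both directions of a match in the same split. The sketch below checks this on a toy example with plain Python lists; the helper `paired_split` is hypothetical, and it assumes the edges are stored pairwise in match order, as in the loop that builds them.

```python
def paired_split(num_edges: int, train_rate: float) -> tuple[list[bool], list[bool]]:
    train_samples = round(num_edges * train_rate)
    if train_samples % 2 != 0:
        train_samples += 1                 # never cut a match's edge pair in half
    train_samples = min(train_samples, num_edges)
    train = [i < train_samples for i in range(num_edges)]
    val = [not flag for flag in train]
    return train, val


train, val = paired_split(10, 0.7)
print(sum(train), sum(val))
```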

# Learning

In [19]:
epochs = 10_000

lr = 0.01
dropout_p = 0.0
crit = torch.nn.MSELoss()

## Data

In [20]:
data = HeteroData()

data['player'].x = player_matrix
data['playerInMatch'].x = playerInMatch_matrix
data['teamInMatch'].num_nodes = teamInMatch_num
data['team'].num_nodes = len(teams_list)

data['playerInMatch', 'playerInstance', 'player'].edge_index = playerInMatch_player_matrix.T.type(torch.long)
data['player', 'playerInstance', 'playerInMatch'].edge_index = torch.flip(playerInMatch_player_matrix, dims=[0]).T.type(torch.long)

data['teamInMatch', 'played', 'playerInMatch'].edge_index = teamInMatch_playerInMatch_matrix.T.type(torch.long)
data['teamInMatch', 'played', 'playerInMatch'].edge_attr = teamInMatch_playerInMatch_attr
data['playerInMatch', 'played', 'teamInMatch'].edge_index = torch.flip(teamInMatch_playerInMatch_matrix, dims=[0]).T.type(torch.long)
data['playerInMatch', 'played', 'teamInMatch'].edge_attr = teamInMatch_playerInMatch_attr

data['teamInMatch', 'teamInstance', 'team'].edge_index = teamInMatch_team_matrix.T.type(torch.long)
data['team', 'teamInstance', 'teamInMatch'].edge_index = torch.flip(teamInMatch_team_matrix, dims=[0]).T.type(torch.long)

data['teamInMatch', 'result', 'teamInMatch'].edge_index = teamInMatch_teamInMatch_matrix.T.type(torch.long)
data['teamInMatch', 'result', 'teamInMatch'].edge_attr = teamInMatch_teamInMatch_attr
data['teamInMatch', 'result', 'teamInMatch'].weight = teamInMatch_teamInMatch_weight

data.to(device)

HeteroData(
  player={ x=[9573, 5] },
  playerInMatch={ x=[119664, 11] },
  teamInMatch={ num_nodes=8768 },
  team={ num_nodes=497 },
  (playerInMatch, playerInstance, player)={ edge_index=[2, 119664] },
  (player, playerInstance, playerInMatch)={ edge_index=[2, 119664] },
  (teamInMatch, played, playerInMatch)={
    edge_index=[2, 110091],
    edge_attr=[1, 110091],
  },
  (playerInMatch, played, teamInMatch)={
    edge_index=[2, 110091],
    edge_attr=[1, 110091],
  },
  (teamInMatch, teamInstance, team)={ edge_index=[2, 8768] },
  (team, teamInstance, teamInMatch)={ edge_index=[2, 8768] },
  (teamInMatch, result, teamInMatch)={
    edge_index=[2, 8768],
    edge_attr=[1, 8768],
    weight=[8768],
  }
)

In [21]:
edgeIndexDict = {
    edge : data[edge].edge_index for edge in data.metadata()[1]
}

# modify edgeIndex for result to include only train data
edgeIndexDict['teamInMatch', 'result', 'teamInMatch'] = data['teamInMatch', 'result', 'teamInMatch'].edge_index[:, train_mask]

embeddingDim = 16
metaPath = [
    ('teamInMatch', 'played', 'playerInMatch'),
    ('playerInMatch', 'playerInstance', 'player'),
    #('player', 'playerInstance', 'playerInMatch'),
    #('playerInMatch', 'played', 'teamInMatch'),
    #('teamInMatch', 'result', 'teamInMatch')
]
numNodesDict = {
    node_type: data[node_type].num_nodes if hasattr(data[node_type], 'num_nodes') else data[node_type].x.shape[0] for node_type in data.metadata()[0]
}

In [None]:
dataVectors = torch_geometric.nn.MetaPath2Vec(
    edge_index_dict=edgeIndexDict,
    embedding_dim=embeddingDim,
    metapath=metaPath,
    walks_per_node=9,
    walk_length=3,
    context_size=4,
    num_nodes_dict=numNodesDict
).to(device)

In [None]:
def plotLoss(loss_data: list[float]):
  plt.plot(range(len(loss_data)), loss_data)
  plt.show()

## Model

In [None]:
class model(torch.nn.Module):
  def __init__(self, channels: list[int], device: str='cpu', dropout: float=0.0):
    assert len(channels) >= 2

    super().__init__()

    self.device = device
    self.dropout = dropout

    self.gcn = torch.nn.ModuleList()
    for i in range(len(channels) - 1):
      self.gcn.append(nn.GCNConv(channels[i], channels[i+1], add_self_loops=False))
    self.gcn.to(self.device)

    self.lin = torch.nn.Linear(channels[-1]*2, 1).to(self.device)

  def forward(self, x: torch.Tensor, edge_index: torch.Tensor, edge_weight: torch.Tensor) -> torch.Tensor:
    # move data to device
    x = x.to(self.device)
    edge_index = edge_index.to(self.device)
    edge_weight = edge_weight.to(self.device)

    # calculate values for each team
    for module in self.gcn:
      x = module(x, edge_index, edge_weight)
      x = x.relu()
      x = F.dropout(x, p=self.dropout, training=self.training)

    # create match vectors by combining teams that played
    x_1st = x[edge_index[0]]
    x_2nd = x[edge_index[1]]
    lin_input = torch.cat((x_1st, x_2nd), dim=1)

    # calculate result from match vector
    x = self.lin(lin_input)

    return x.T
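
The step that builds match vectors can be seen in isolation on a toy graph. The sketch below uses made-up node features and two edges; it only demonstrates the indexing and concatenation, not the GCN layers.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 nodes, 2 features each
edge_index = torch.tensor([[0, 2], [1, 0]])               # edges 0 -> 1 and 2 -> 0

src = x[edge_index[0]]                  # features of each edge's originating node
dst = x[edge_index[1]]                  # features of each edge's target node
pairs = torch.cat((src, dst), dim=1)    # one row per edge, originating node first

print(pairs)
```

Each row of `pairs` has twice the node feature width, which is why the linear layer takes `channels[-1]*2` inputs.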

## Learn

In [None]:
N = embeddingDim
gcn = model([N, N*4, N*2], device=device, dropout=dropout_p)
gcn.to(device)
optimizer = torch.optim.Adam(gcn.parameters(), lr=lr)

loss_values = []

for epoch in range(epochs):
  # prepare model for training
  gcn.train()
  optimizer.zero_grad()

  # prepare train data
  train_mask = torch.reshape(data.train_mask, (-1,))
  x = data.x
  train_edge_index = data.edge_index[:, train_mask]
  train_edge_weight = data.edge_weight[train_mask]

  # train
  pred = gcn(x, train_edge_index, train_edge_weight).to(device)

  # prepare ground truth for the training edges
  ground_truth = data.edge_attr[:, train_mask].to(device)
  ground_truth.requires_grad = True

  # calculate loss
  loss = crit(pred, ground_truth)
  loss_values.append(loss.item())
  loss.backward()
  optimizer.step()

# draw loss
plotLoss(loss_values)

## Evaluate

In [None]:
gcn.eval()

val_mask = torch.reshape(data.val_mask, (-1,))
x = data.x
val_edge_index = data.edge_index[:, val_mask]
val_edge_weight = data.edge_weight[val_mask]

pred = gcn(x, val_edge_index, val_edge_weight).to(device).T
class_pred = torch.zeros(pred.size())
for i in range(len(pred)):
  class_pred[i] = (resultClass(pred[i]) + 1)/2

ground_truth = data.edge_attr[:, val_mask].to(device).T
ground_truth.requires_grad = True
class_ground_truth = (ground_truth + 1)/2

R2 = M.R2Score()
R2.to(device)
R2.update(pred, ground_truth)
print(f"R2: {R2.compute()}")

MSE = M.MeanSquaredError()
MSE.to(device)
MSE.update(pred, ground_truth)
print(f"MSE: {MSE.compute()}")

BiAcc = M.BinaryAccuracy()
BiAcc.to(device)
BiAcc.update(torch.reshape(class_pred, (-1,)), torch.reshape(class_ground_truth, (-1,)))
print(f"BiAcc: {BiAcc.compute()}")

# Repeated learning for data gathering

In [None]:
feature_list = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]]
repeats = 100

csv = ""

for f in range(len(feature_list)):
  #=============================================================================
  # apply used_feature
  #=============================================================================
  used_features = feature_list[f]

  #=============================================================================
  # update team matrix
  #=============================================================================
  team_matrix = [filterFeatures(team_statistics[team_IndexToId[i]]) for i in team_IndexToId]

  #=============================================================================
  # update data
  #=============================================================================
  data.x = torch.Tensor(
      team_matrix
  )

  #=============================================================================
  # create ground truth
  #=============================================================================
  ground_truth = data.edge_attr[:, val_mask].to(device).T
  ground_truth.requires_grad = True
  class_ground_truth = (ground_truth + 1)/2

  #=============================================================================
  # repeat `repeats` times
  #=============================================================================
  for r in range(repeats):
    #===========================================================================
    # reset crit (not sure if it does anything)
    #===========================================================================
    crit = torch.nn.MSELoss()

    #===========================================================================
    # learn
    #===========================================================================
    N = sum(used_features)
    gcn = model([N, N*4, N*2], device=device, dropout=dropout_p)
    gcn.to(device)
    optimizer = torch.optim.Adam(gcn.parameters(), lr=lr)

    loss_values = []

    for epoch in range(epochs):
      # prepare model for training
      gcn.train()
      optimizer.zero_grad()

      # prepare train data
      train_mask = torch.reshape(data.train_mask, (-1,))
      x = data.x
      train_edge_index = data.edge_index[:, train_mask]
      train_edge_weight = data.edge_weight[train_mask]

      # train
      pred = gcn(x, train_edge_index, train_edge_weight).to(device)

      # prepare ground truth for the training edges
      ground_truth = data.edge_attr[:, train_mask].to(device)
      ground_truth.requires_grad = True

      # calculate loss
      loss = crit(pred, ground_truth)
      loss_values.append(loss.item())
      loss.backward()
      optimizer.step()

    #===========================================================================
    # evaluate
    #===========================================================================
    gcn.eval()

    val_mask = torch.reshape(data.val_mask, (-1,))
    x = data.x
    val_edge_index = data.edge_index[:, val_mask]
    val_edge_weight = data.edge_weight[val_mask]

    pred = gcn(x, val_edge_index, val_edge_weight).to(device).T
    class_pred = torch.zeros(pred.size())
    for i in range(len(pred)):
      class_pred[i] = (resultClass(pred[i]) + 1)/2

    BiAcc = M.BinaryAccuracy()
    BiAcc.to(device)
    BiAcc.update(torch.reshape(class_pred, (-1,)), torch.reshape(class_ground_truth, (-1,)))
    item = f"{BiAcc.compute().item()}".split(".")
    csv += f";{item[0]},{item[1]}"
  print(csv)
  csv = ""
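
Splitting the string form of a float on `"."` relies on the value being printed with a decimal point, which is not the case for values shown in scientific notation such as `1e-05`. A fixed-precision format is one possible alternative. The helper `decimal_comma` and the sample values below are hypothetical.

```python
def decimal_comma(value: float, digits: int = 4) -> str:
    # fixed-point formatting always contains exactly one ".", which becomes the decimal comma
    return f"{value:.{digits}f}".replace(".", ",")


row = ";".join(decimal_comma(v) for v in (0.5, 0.8125, 1e-05))
print(row)
```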