# Graph outlier detection

We'll start by giving [pygod](https://pypi.org/project/pygod/) a shot, as it [implements many different algorithms](https://pypi.org/project/pygod/#:~:text=Implemented%20Algorithms) for graph outlier detection (anomaly detection).

Prerequisites:

- Download data using the link shared through Slack
- Place the data in the `/data` folder as `/data/fraud_detection_data.feather`
- Create the conda environment by running:

> `conda env create -f environment.yml -n azd_madoff && conda activate azd_madoff && pip install -r requirements.txt`

Then you should be able to use the `azd_madoff` kernel in this notebook.

In [None]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd

import torch
from torch_geometric.data import Data

from data_preparation import load_fraud_data, extract_node_data, extract_edge_data


### Data loading

In [None]:
df_fraud_data = load_fraud_data()
df_fraud_data.head(3)

In [None]:
df_fraud_data["link_type"].value_counts(dropna=False)

### Extract nodes and edges

Unfortunately, the data is not yet in the right format, so some data munging is needed.
The goal is to extract all nodes and edges separately and to make sure each edge references the correct node indices.
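
The helpers `extract_node_data` and `extract_edge_data` live in `data_preparation` and are not shown here. Purely as an illustration of the idea, the sketch below assumes a hypothetical raw table with one row per link and columns `from_id`, `to_id` and `link_type` (not the real schema). It gives every distinct node ID a contiguous integer index with `pandas.factorize` and rewrites the edges in terms of those indices; `build_nodes_and_edges` is a hypothetical name.

```python
import pandas as pd


def build_nodes_and_edges(df_links: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical sketch: map raw node IDs to contiguous indices 0..N-1."""
    # Every ID appearing on either end of a link is a node.
    all_ids = pd.concat([df_links["from_id"], df_links["to_id"]], ignore_index=True)
    # factorize gives each distinct ID an integer code, in order of first appearance.
    codes, uniques = pd.factorize(all_ids)
    df_nodes = pd.DataFrame({"node_id": uniques, "node_index": range(len(uniques))})

    n_links = len(df_links)
    df_edges = pd.DataFrame({
        # The first n_links codes come from from_id, the rest from to_id.
        "node_from": codes[:n_links],
        "node_to": codes[n_links:],
        # Encode the categorical link type as integer codes too.
        "link_type_int": pd.factorize(df_links["link_type"])[0],
    })
    return df_nodes, df_edges


# Toy example with made-up account IDs and link types.
df_links_demo = pd.DataFrame({
    "from_id": ["acct_a", "acct_b", "acct_a"],
    "to_id": ["acct_b", "acct_c", "acct_c"],
    "link_type": ["transfer", "shared_phone", "transfer"],
})
df_nodes_demo, df_edges_demo = build_nodes_and_edges(df_links_demo)
```

Because both ends of every link are factorized in one pass, the same ID always receives the same index, which is exactly the property the edge list needs.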

In [None]:
df_nodes = extract_node_data(df_fraud_data)

df_nodes.head(3)

In [None]:
df_edges = extract_edge_data(df_fraud_data, df_nodes)

df_edges.head(3)

### Create a `torch_geometric` `Data` object

See https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch-geometric-data-data

In [None]:
# x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
x = df_nodes[["node_type_int"]].to_numpy()
x
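
`node_type_int` enters the model as a single integer feature, so a GNN would treat node types as ordered magnitudes. A common alternative, offered here as a suggestion rather than what the notebook currently does, is to one-hot encode the type into one float column per type. A minimal NumPy sketch (the helper name `one_hot` is hypothetical):

```python
from typing import Optional

import numpy as np


def one_hot(codes, num_classes: Optional[int] = None) -> np.ndarray:
    """One-hot encode integer codes into a float32 matrix of shape [len(codes), num_classes]."""
    codes = np.asarray(codes, dtype=np.int64).ravel()  # accepts [N] or [N, 1] input
    if num_classes is None:
        num_classes = int(codes.max()) + 1
    out = np.zeros((codes.size, num_classes), dtype=np.float32)
    # Put a single 1.0 in each row, in the column given by that row's code.
    out[np.arange(codes.size), codes] = 1.0
    return out


# Three nodes of types 0, 2 and 1, shaped like df_nodes[["node_type_int"]].to_numpy().
x_demo = one_hot(np.array([[0], [2], [1]]))
```

`torch.nn.functional.one_hot` does the same on a tensor, but it returns integers, so its output would still need `.float()` before going into a GNN.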

In [None]:
# edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_index = df_edges[["node_from", "node_to"]].values.T
edge_index
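
PyG reads each column of `edge_index` as a directed edge `node_from -> node_to`, so messages only flow in that direction. If the fraud links are meant to be undirected (an assumption; the notebook does not say), the reverse edges have to be added. PyG provides `torch_geometric.utils.to_undirected` for this; the NumPy sketch below, with a hypothetical `make_undirected` helper, shows the same idea on a plain array.

```python
import numpy as np


def make_undirected(edge_index: np.ndarray) -> np.ndarray:
    """Add reverse edges to a [2, num_edges] COO array and drop duplicate pairs."""
    # Stack the original edges with their reversed copies: shape [2, 2 * num_edges].
    both = np.concatenate([edge_index, edge_index[::-1]], axis=1)
    # np.unique over columns removes repeated (from, to) pairs and sorts them.
    return np.unique(both, axis=1)


ei_demo = make_undirected(np.array([[0, 1], [1, 2]]))
```

This sketch discards any alignment with `edge_attr`; `to_undirected` accepts an `edge_attr` argument so the attributes stay matched to their edges.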

In [None]:
# edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
edge_attr = df_edges[["link_type_int"]].to_numpy()
edge_attr

In [None]:
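# Create the data object describing a homogeneous graph.
# GNN layers expect float node features; edge_index must be int64 (LongTensor).
data = Data(x=torch.tensor(x, dtype=torch.float), edge_index=torch.tensor(edge_index, dtype=torch.long),
            edge_attr=torch.tensor(edge_attr))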

data

In [None]:
# Validates the correctness of the data.
data.validate()
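
`validate()` checks the consistency of the `Data` object. To also get a quick summary of common munging mistakes, a hypothetical helper like `check_edge_index` below can be run on the raw `edge_index` array: it rejects a wrong shape or out-of-range indices and counts self-loops, duplicate edges and isolated nodes.

```python
import numpy as np


def check_edge_index(edge_index: np.ndarray, num_nodes: int) -> dict:
    """Hypothetical sanity checks for a [2, num_edges] COO edge array."""
    edge_index = np.asarray(edge_index)
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError(f"expected shape [2, num_edges], got {edge_index.shape}")
    if edge_index.size and (edge_index.min() < 0 or edge_index.max() >= num_nodes):
        raise ValueError("edge_index refers to a node index outside [0, num_nodes)")
    return {
        "num_edges": edge_index.shape[1],
        # Edges whose source and target are the same node.
        "self_loops": int((edge_index[0] == edge_index[1]).sum()),
        # Repeated (from, to) pairs.
        "duplicates": edge_index.shape[1] - np.unique(edge_index, axis=1).shape[1],
        # Nodes that never appear in any edge.
        "isolated_nodes": num_nodes - np.unique(edge_index).size,
    }


stats_demo = check_edge_index(np.array([[0, 1, 1, 2], [1, 2, 2, 2]]), num_nodes=4)
```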

### PyGOD Detector Example

See: https://docs.pygod.org/en/latest/tutorials/1_intro.html#sphx-glr-tutorials-1-intro-py

In [None]:
from pygod.detector import DOMINANT

detector = DOMINANT(hid_dim=64, num_layers=4, epoch=100)
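
For intuition about what DOMINANT scores: as described in the original DOMINANT paper, a GCN encoder produces node embeddings `Z`, an attribute decoder reconstructs the features `X`, and a structure decoder reconstructs the adjacency matrix as `sigmoid(Z @ Z.T)`. Each node's outlier score mixes its attribute and structure reconstruction errors through a balance parameter (called `alpha` here; PyGOD's `DOMINANT` exposes a similar `weight` argument). The NumPy sketch below implements only that scoring step on given arrays; it is not PyGOD's code, and `dominant_style_scores` is a hypothetical name.

```python
import numpy as np


def dominant_style_scores(X, X_hat, A, Z, alpha=0.5):
    """Sketch of a DOMINANT-style outlier score from reconstructions.

    X, X_hat: [N, F] node attributes and their reconstruction.
    A: [N, N] dense adjacency matrix; Z: [N, D] node embeddings.
    """
    # Structure decoder: reconstructed adjacency sigmoid(Z @ Z.T), a dense N x N matrix.
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    # Per-node L2 reconstruction errors (row norms).
    attr_err = np.linalg.norm(X - X_hat, axis=1)
    struct_err = np.linalg.norm(A - A_hat, axis=1)
    # Higher score means more anomalous.
    return alpha * attr_err + (1 - alpha) * struct_err


# Toy graph of 3 nodes where node 2's attributes are reconstructed badly.
X_demo = np.zeros((3, 2))
X_hat_demo = np.zeros((3, 2))
X_hat_demo[2] = [3.0, 4.0]
scores_demo = dominant_style_scores(X_demo, X_hat_demo, np.zeros((3, 3)), np.zeros((3, 4)))
```

Because `A` and `A_hat` are N x N, memory grows quadratically with the number of nodes, which matters for the allocation error in the next cell.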

In [None]:
# fit() on the full graph fails with: DefaultCPUAllocator: can't allocate memory: you tried to allocate 335076584164 bytes
# detector.fit(data)
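
The size in that error is exactly 4 × 289,429² bytes, which is what one dense float32 matrix over 289,429 nodes would take. Reading it as the N x N adjacency matrix is an assumption, but it fits DOMINANT's dense structure reconstruction. The back-of-the-envelope helpers below (hypothetical names) make the quadratic scaling explicit.

```python
import math


def dense_matrix_bytes(num_nodes: int, bytes_per_value: int = 4) -> int:
    """Memory for one dense num_nodes x num_nodes matrix (float32 by default)."""
    return num_nodes * num_nodes * bytes_per_value


def max_nodes_for_budget(budget_bytes: int, bytes_per_value: int = 4) -> int:
    """Largest N whose dense N x N matrix fits in budget_bytes."""
    return math.isqrt(budget_bytes // bytes_per_value)


print(dense_matrix_bytes(289_429))  # -> 335076584164
```

With 16 GiB available for that single matrix, `max_nodes_for_budget(16 * 1024**3)` gives 65,536 nodes, so a dense-adjacency detector would likely only fit on a subsampled or partitioned version of this graph.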