# Impersonation Scam Prediction
## Members
1) Heng Jing Han (1003590)

2) Jeremy Chew (1003301)

3) Lim Yang Zhi (1003458)

4) Tiang Pei Yuan (1003323)

5) Yu Wenling (1003793)

## Introduction
Impersonation scams are online scams in which a malicious user impersonates another user on a site such as social media in order to target that user's family and friends. In this project, we aim to predict the potential victims of such a scam, assuming that the impersonator is already known.

In this notebook, the user supplies their own input: which nodes are the scammers, and which saved model to use.

## Google Colab initialisation
Mount the drive and navigate to the correct directory.

In [None]:
from google.colab import drive
import os

drive.mount('/content/drive')
os.chdir("/content/drive/My Drive/DL BIG/final-code")  # absolute path, so the cell can be re-run safely

Mounted at /content/drive


In [None]:
!ls

data		  graph.py  network.html  requirements.txt
deployment.ipynb  models    __pycache__   training.ipynb


## Import libraries
The full list of requirements can be found in `requirements.txt`, and can be installed through `pip` using `pip install -r requirements.txt`.

In [None]:
!pip install -r requirements.txt --quiet


In [1]:
# required libraries
import networkx as nx
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

import random

# project-specific functions
import graph

## Build graph object
Read the graph structure from `data/`. We also set the appropriate device (GPU or CPU).

In [2]:
# read data and build graph object
G, _ = graph.read_data('data/3980_edited.edges', 'data/3980_edited.feat')

# check if gpu is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


## Get user input
There are 44 nodes in this dataset, and the user decides which of them are labelled as scammers. Hold Ctrl (Cmd on macOS) and click to select multiple nodes. Exactly two scammers must be selected.

In [3]:
import ipywidgets as widgets

scammer_select = widgets.SelectMultiple(
    options=list(G.nodes),
    description='Scammers',
    rows=10,
    value=(3985, 3993),
    disabled=False
)

display(scammer_select)

SelectMultiple(description='Scammers', index=(3, 9), options=(3981, 3982, 3983, 3985, 3986, 3988, 3989, 3991, …
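The feature cell further down uses the first two selected nodes as scammer 1 and scammer 2, so it is worth checking the selection before running it. A minimal sketch of an up-front check; `validate_scammers` is a hypothetical helper, not part of `graph.py`, and plain lists stand in for the widget value and `G.nodes`:

```python
def validate_scammers(selection, nodes):
    """Return the selected scammers as a tuple, or raise ValueError.

    Hypothetical helper: checks that exactly two nodes were selected
    and that both of them exist in the graph.
    """
    scammers = tuple(selection)
    if len(scammers) != 2:
        raise ValueError(f'Select exactly 2 scammers, got {len(scammers)}.')
    missing = [s for s in scammers if s not in nodes]
    if missing:
        raise ValueError(f'Unknown node(s): {missing}')
    return scammers


# plain Python values standing in for scammer_select.value and G.nodes
nodes = [3981, 3982, 3985, 3993]
print(validate_scammers((3985, 3993), nodes))
```

In the notebook this would be called as `validate_scammers(scammer_select.value, G.nodes)`, since `G.nodes` supports `in` lookups.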

The user can also enter the path to the saved model, if it differs from the default.

In [4]:
model_select = widgets.Text(
    value='models/model_final.pt',
    description='Model name'
)

display(model_select)

Text(value='models/model_final.pt', description='Model name')
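The model path is relative to the working directory set by `os.chdir` at the top of the notebook, so a mistyped path or a different working directory only shows up once the model is loaded. A minimal sketch of an early check; `check_model_path` is a hypothetical helper:

```python
import os


def check_model_path(path):
    """Return path if it points to an existing file, else raise FileNotFoundError.

    Hypothetical helper: the error message includes the current working
    directory, since a relative path is resolved against it.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f'No saved model at {path!r} (working directory: {os.getcwd()!r})')
    return path
```

For example, `check_model_path(model_select.value)` can be run right after choosing the model name, before the features are built.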

Run the following cell after making your selections to confirm them.

In [5]:
# print out the user features as selected above
print('Scammers: ', scammer_select.value)
print('Model name: ', model_select.value)

Scammers:  (3985, 3993)
Model name:  models/model_final.pt


## Generate input features
For each node, the model takes four input features: whether the node is a scammer, its hop distance to each of the two scammers, and feature 19, the anonymised gender.

In [6]:
scammers = scammer_select.value
assert len(scammers) == 2, 'Exactly two scammers must be selected.'

# hop distances between every pair of nodes; unreachable pairs are absent,
# so this assumes every node can be reached from both scammers
distances = dict(nx.all_pairs_shortest_path_length(G))

X = []
for node in G.nodes:
    in_feat = []
    # scammer label
    in_feat.append(1 if node in scammers else 0)
    # hop distance to scammer 1
    in_feat.append(distances[scammers[0]][node])
    # hop distance to scammer 2
    in_feat.append(distances[scammers[1]][node])
    # feature 19 (anonymised gender)
    in_feat.append(G.nodes[node][19])
    X.append(in_feat)
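On an unweighted graph, the hop distance is exactly what a breadth-first search computes, and only the distances from the two scammers are actually used. The sketch below rebuilds the same four-column rows on a made-up four-node path graph with made-up gender values; `hop_distances` and `build_features` are hypothetical names, and, like the cell above, it raises `KeyError` for a node that cannot be reached from a scammer:

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_features(adj, gender, scammers):
    """One row per node: [is_scammer, hops to scammer 1, hops to scammer 2, gender]."""
    d1 = hop_distances(adj, scammers[0])
    d2 = hop_distances(adj, scammers[1])
    return [[int(n in scammers), d1[n], d2[n], gender[n]] for n in adj]


# made-up path graph 0 - 1 - 2 - 3 with made-up gender values
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
gender = {0: 0, 1: 1, 2: 0, 3: 1}
for row in build_features(adj, gender, (0, 3)):
    print(row)
```

Running one search per scammer costs two BFS passes instead of the all-pairs computation; this makes no practical difference for 44 nodes but scales better to larger graphs.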

## Load the model
Create the model and load the weights saved by `training.ipynb`. The default path is `models/model_final.pt`.

In [7]:
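# load the adjacency matrix as a dense tensor on the chosen device
A = nx.adjacency_matrix(G).todense()
A = torch.Tensor(A).to(device)

# get the input feature size (4 features per node)
input_size = len(X[0])

# build the model and load the saved weights;
# map_location lets weights saved on a GPU load on a CPU-only machine
model = graph.Net(A, input_size, 10, 2, device)
model.load_state_dict(torch.load(model_select.value, map_location=device))
model.eval()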

Net(
  (conv1): GCNKipf_Layer()
  (conv2): GCNKipf_Layer()
)
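The printed summary shows two `GCNKipf_Layer` layers. Their code lives in `graph.py` and is not reproduced here; assuming they follow the graph convolution rule of Kipf and Welling, layer $l$ maps the node representations $H^{(l)}$ (with $H^{(0)} = X$) using the adjacency matrix $A$ loaded above and a learned weight matrix $W^{(l)}$:

$$
\hat{A} = A + I, \qquad \hat{D}_{ii} = \sum_j \hat{A}_{ij}, \qquad
H^{(l+1)} = \sigma\!\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right)
$$

Here $\sigma$ is an elementwise nonlinearity such as ReLU. Under this assumption, stacking two layers means each node's two output scores depend on its neighbourhood up to two hops away, which complements the explicit hop-distance features.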

## Run the model
Run the trained model on the user's inputs and display the predictions as a coloured graph.

In [8]:
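# run the inputs through the model; gradients are not needed for inference
X = torch.Tensor(X).to(device)
with torch.no_grad():
    # class 1 = predicted victim, class 0 = not a victim
    out = torch.argmax(model(X), dim=1)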

# display the raw predictions
print(out.cpu().tolist())

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
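The raw predictions are positional: entry `i` is the label for the `i`-th node of `G.nodes`, the same order used to build `X`. A small sketch for turning them back into node ids; `predicted_victims` is a hypothetical helper, and the node ids and labels in the example are made up:

```python
def predicted_victims(nodes, preds, scammers):
    """Map positional 0/1 predictions back to node ids of predicted victims.

    Hypothetical helper: preds[i] is the label for nodes[i] (1 = victim).
    Scammers are skipped, since they are inputs rather than predictions.
    """
    return [n for n, p in zip(nodes, preds) if p == 1 and n not in scammers]


print(predicted_victims([3981, 3982, 3985, 3993], [0, 1, 0, 1], (3985, 3993)))
```

In the notebook this would be `predicted_victims(list(G.nodes), out.cpu().tolist(), scammer_select.value)`.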


In [9]:
from pyvis import network as net
from dimcli.utils.networkviz import *

# display the results as a coloured graph
# predicted to be a victim      : yellow
# predicted not to be a victim  : green
# scammer                       : black

yellow = '#FFD700'
green = '#51F59D'
black = '#000000'

# create a visualisation object
g = NetworkViz(notebook=True)

# convert the networkx graph to one supported by pyvis
for node in G.nodes:
    g.add_node(node, size=8)

for (f_node, t_node) in G.edges:
    g.add_edge(f_node, t_node)

# move the predictions to the CPU once, outside the loop
preds = out.cpu().tolist()

# colour the nodes; g.nodes lists them in the order they were added,
# which matches the order of G.nodes and of the model's outputs
for i, node in enumerate(G.nodes):
    if node in scammer_select.value:
        g.nodes[i]['color'] = black
    elif preds[i] == 1:
        # predicted victim
        g.nodes[i]['color'] = yellow
    else:
        g.nodes[i]['color'] = green

# show the graph
g.show("network.html")
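The colouring rule can also be factored into a pure function, which makes it testable without pyvis. A minimal sketch; `node_colour` is a hypothetical name, and the hex values are the ones used in the cell above (`#51F59D` is a light green):

```python
YELLOW = '#FFD700'  # predicted victim
GREEN = '#51F59D'   # predicted not to be a victim
BLACK = '#000000'   # scammer


def node_colour(is_scammer, pred):
    """Colour for one node: scammers take precedence over the 0/1 prediction."""
    if is_scammer:
        return BLACK
    return YELLOW if pred == 1 else GREEN


print(node_colour(True, 1), node_colour(False, 1), node_colour(False, 0))
```

Inside the colouring loop, each node would then be coloured with `g.nodes[i]['color'] = node_colour(node in scammer_select.value, out[i].item())`.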