![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/tigerlily_logo.jpg)

# What do we achieve by using Tigerlily? Why do we care?

![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/pair_scoring.jpg)

# 1. Imports

## 1.1. Tigerlily specific imports

In [2]:
from tigerlily.dataset import ExampleDataset
from tigerlily.embedding import EmbeddingMachine
from tigerlily.pagerank import PersonalizedPageRankMachine
from tigerlily.operator import hadamard_operator, concatenation_operator

## 1.2. General data manipulation and machine learning imports

In [3]:
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 2. DrugBank DDI and BioSNAP Loading

In [4]:
dataset = ExampleDataset()

In [5]:
edges = dataset.read_edges()
target = dataset.read_target()

print(edges.shape)
print(target.shape)

(816683, 4)
(187850, 3)


# 3. PageRank Computation with TigerGraph



![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/pair_scoring_A.jpg)

## 3.1. Establishing a connection and installing the Personalized PageRank query

In [None]:
machine = PersonalizedPageRankMachine(host="https://tigerlily.i.tgcloud.io",
                                      graphname="tester",
                                      secret="",
                                      password="")

In [None]:
machine.connect()

In [None]:
machine.install_query()

## 3.2. Defining a graph and computing Personalized PageRank for the drug nodes

![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/pair_scoring_B.jpg)

In [None]:
machine.upload_graph(new_graph=True, edges=edges)

In [None]:
drug_node_ids = machine.connection.getVertices("drug")

In [None]:
pagerank_scores = machine.get_personalized_pagerank(drug_node_ids)
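
To make the computation in this step more concrete, here is a minimal NumPy sketch of Personalized PageRank by power iteration on a toy graph. It is not the query that Tigerlily installs on TigerGraph; the function name, the restart probability `alpha`, the iteration count, and the dangling-node handling are all assumptions chosen for illustration.

```python
import numpy as np


def personalized_pagerank(adjacency, source, alpha=0.15, n_iter=100):
    """Illustrative power iteration for Personalized PageRank (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n_nodes = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    # Row-normalise the adjacency matrix; rows without out-edges stay all zero.
    transition = np.divide(adjacency, out_degree,
                           out=np.zeros_like(adjacency),
                           where=out_degree > 0)
    # The restart vector puts all probability mass on the source node.
    restart = np.zeros(n_nodes)
    restart[source] = 1.0
    scores = restart.copy()
    for _ in range(n_iter):
        # Mass sitting on dangling nodes is returned to the source (an assumption).
        dangling_mass = scores[out_degree.ravel() == 0].sum()
        walk = scores @ transition + dangling_mass * restart
        scores = alpha * restart + (1.0 - alpha) * walk
    return scores


# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3.
toy_adjacency = np.array([[0, 1, 1, 0],
                          [1, 0, 1, 0],
                          [1, 1, 0, 1],
                          [0, 0, 1, 0]])
toy_scores = personalized_pagerank(toy_adjacency, source=0)
print(toy_scores)
```

The scores form a probability distribution over nodes, with mass concentrated around the source. Repeating this for every drug node yields one score vector per drug, which is the kind of input the embedding step consumes.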

# 4. Embedding learning from Personalized PageRank scores

![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/pair_scoring_C.jpg)

In [23]:
pagerank_scores = dataset.read_pagerank()
print(pagerank_scores.shape)

(54110, 3)


In [24]:
embedding_machine = EmbeddingMachine(42, 32, 100)

embedding = embedding_machine.fit(pagerank_scores)

# 5. Classifier Training and Inference


![title](https://github.com/benedekrozemberczki/datasets/raw/master/images/pair_scoring_D.jpg)

In [25]:
drug_pair_features = embedding_machine.create_features(target, concatenation_operator)
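
The two operators imported in section 1.1 are named after standard ways of turning a pair of node embeddings into one feature vector. The sketch below shows those standard definitions on toy arrays. Tigerlily's own implementations are not shown in this notebook, so the helper names `toy_hadamard` and `toy_concatenation` are hypothetical stand-ins rather than the library's code.

```python
import numpy as np


def toy_hadamard(left, right):
    """Element-wise product: output keeps the embedding dimension."""
    return left * right


def toy_concatenation(left, right):
    """Side-by-side stacking: output has twice the embedding dimension."""
    return np.concatenate([left, right], axis=1)


# Two drug pairs, each drug represented by a 2-dimensional toy embedding.
left = np.array([[1.0, 2.0], [3.0, 4.0]])
right = np.array([[5.0, 6.0], [7.0, 8.0]])

hadamard_features = toy_hadamard(left, right)
concatenated_features = toy_concatenation(left, right)

print(hadamard_features.shape)       # (2, 2)
print(concatenated_features.shape)   # (2, 4)
```

The element-wise product is symmetric in the order of the two drugs, while concatenation is not, so the choice matters when the order of a drug pair carries no meaning.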

In [29]:
model = LGBMClassifier(learning_rate=0.01, n_estimators=100)

X_train, X_test, y_train, y_test = train_test_split(drug_pair_features,
                                                    target,
                                                    train_size=0.8,
                                                    random_state=42)

model.fit(X_train, y_train["label"])

LGBMClassifier(learning_rate=0.01)

In [30]:
# predict_proba returns one column per class; column 1 holds the positive-class probability.
predicted_proba = model.predict_proba(X_test)

In [31]:
auroc_score_value = roc_auc_score(y_test["label"], predicted_proba[:, 1])
print(f'AUROC score: {auroc_score_value:.4f}')

AUROC score: 0.9664
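
As an aid to reading this number: AUROC equals the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The pure-Python sketch below computes it directly from that definition on four made-up scores. The helper `pairwise_auroc` is hypothetical and quadratic in the number of samples, so it is meant for illustration only, not as a replacement for `roc_auc_score`.

```python
def pairwise_auroc(labels, scores):
    """AUROC from its ranking definition (hypothetical helper, O(n^2))."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0   # positive ranked above negative
            elif pos_score == neg_score:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = pairwise_auroc(toy_labels, toy_scores)
print(toy_auroc)  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, which gives 0.75.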


# 6. Ideas and Readings

## 6.1. Ideas

## 6.2. Readings