# Introduction

This notebook shows how to use a `GraphCreator` instance in a recommendation pipeline to produce the top recommendations for an entry article and display the predicted order (before/after) of each one.

In [9]:
%load_ext autoreload
%autoreload 1

import sys
sys.path.append('../utils/')

import pickle
import numpy as np
import pandas as pd

from GraphAPI import GraphCreator
from RecommenderPipeline import Recommender
from save_to_mlab import save_dict_to_mlab

from sklearn.preprocessing import normalize, StandardScaler, Normalizer, RobustScaler, MinMaxScaler, MaxAbsScaler


%aimport GraphAPI
%aimport RecommenderPipeline
%aimport save_to_mlab



# Load in Models

To make before/after recommendations, the pipeline needs a trained classifier passed to it.

The models below were all trained on human-labeled data, each with slightly different parameters.

In [10]:
# Each file holds one pickled, already-trained classifier.
with open("../models/rf_classifier_v2_normalized.pkl", "rb") as f:
    rf_v2_classifier = pickle.load(f)

with open("../models/rf_classifier_v3_normalized_714.pkl", "rb") as f:
    rf_v3_classifier = pickle.load(f)

with open("../models/rf_classifier_v4_732.pkl", "rb") as f:
    rf_v4_classifier = pickle.load(f)

with open("../models/xg_model_semisupervised_v2.pkl", "rb") as f:
    xg_classifier = pickle.load(f)
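
The four `with` blocks above repeat one pattern, so a loop over a name-to-path mapping is a possible alternative. The sketch below is only an illustration: `load_models` and the dictionary keys are hypothetical names, not part of this project's utilities. Note that `pickle.load` can execute arbitrary code, so only unpickle files from a trusted source.

```python
import pickle
from pathlib import Path


def load_models(paths):
    """Unpickle each file in `paths` (a name -> path mapping) into a dict."""
    models = {}
    for name, path in paths.items():
        with Path(path).open("rb") as fh:
            models[name] = pickle.load(fh)
    return models


# Paths taken from the cell above; the keys are illustrative labels only.
MODEL_PATHS = {
    "rf_v2": "../models/rf_classifier_v2_normalized.pkl",
    "rf_v3": "../models/rf_classifier_v3_normalized_714.pkl",
    "rf_v4": "../models/rf_classifier_v4_732.pkl",
    "xg": "../models/xg_model_semisupervised_v2.pkl",
}

# Uncomment when the model files are present:
# models = load_models(MODEL_PATHS)
# rf_v2_classifier = models["rf_v2"]
```

Keeping the models in one dictionary would also make it simple to loop over them and compare their predictions.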

# Initialize `GraphCreator` Instance

After initialization, pass the `GraphCreator` instance as an argument to a new `Recommender` instance.

In [16]:
gc = GraphCreator("https://en.wikipedia.org/wiki/Decision_tree", include_see_also=False, max_recursive_requests=50)
print("Layer 1 nodes:", len(gc.next_links))
rec = Recommender(gc, threads=50, chunk_size=1)

Layer 1 nodes: 259


# Fit the Recommender 

In [17]:
rec.fit(scaler=Normalizer)
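
`Normalizer` differs from the other scalers imported at the top of this notebook: by default it rescales each row (sample) to unit L2 norm, rather than rescaling each column (feature). The sketch below reproduces that default behavior in plain NumPy on made-up numbers. How `Recommender.fit` applies the scaler internally is not shown in this notebook, so treat this only as an illustration of the transformation itself; `l2_normalize_rows` is a hypothetical helper.

```python
import numpy as np


def l2_normalize_rows(X):
    """Scale each row of X to unit Euclidean length (all-zero rows stay zero)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing an all-zero row by zero
    return X / norms


# Made-up feature rows; in the second row one large value dominates.
X = np.array([[3.0, 4.0], [100.0, 0.5]])
X_scaled = l2_normalize_rows(X)

print(X_scaled)
print(np.linalg.norm(X_scaled, axis=1))  # each row now has length 1
```

Because every row is divided by its own norm, a row containing one large raw value ends up with very small values in its other columns.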

# Make Predictions
Pass in a trained model to make predictions on the fitted data.

In [18]:
rec.predict(rf_v2_classifier)
# Alternative: rec.predict(xg_classifier)

# Format the Results
`format_results` takes a decision threshold and returns a dictionary containing the entry node, the threshold, and the predictions for the top articles.

In [19]:
formatted_results = rec.format_results(0.5)
formatted_results['classes'] = list(rec.classes)
formatted_results

{'entry': 'Decision tree',
 'decision_threshold': 0.5,
 'predictions': [{'node': 'Decision analysis',
   'similarity_rank': 1.8543678464820745,
   'degree': 0.8122156245681585,
   'category_matches_with_source': 0.006825341382925702,
   'in_edges': 0.47777389680479915,
   'out_edges': 0.3344417277633594,
   'shared_neighbors_with_entry_score': 0.0004432039859042663,
   'centrality': 5.023467846696596e-05,
   'page_rank': 7.893483424028635e-07,
   'adjusted_reciprocity': 5.7180912259476224e-05,
   'shortest_path_length_from_entry': 0.006825341382925702,
   'shortest_path_length_to_entry': 0.006825341382925702,
   'jaccard_similarity': 0.000264776174337635,
   'primary_link': 0.006825341382925702,
   'label_proba': [0.4372199310000259, 0.5627800689999742],
   'position': 'before'},
  {'node': 'Decision tree learning',
   'similarity_rank': 1.6345069275325246,
   'degree': 0.8161628169144219,
   'category_matches_with_source': 0.0024883012710805548,
   'in_edges': 0.4279878186258554,
   '

In [46]:
save_dict_to_mlab(formatted_results)
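
The `predictions` list can also be walked directly, without pandas. As a minimal sketch, the code below groups node titles by their `position` label. `sample_results` is a hand-built stand-in with the same shape as the dictionary above, using three entries copied from the DataFrame output in this notebook and trimmed to two fields; `group_by_position` is a hypothetical helper, not part of `RecommenderPipeline`.

```python
from collections import defaultdict

# Hand-built stand-in for the dictionary returned by format_results.
sample_results = {
    "entry": "Decision tree",
    "predictions": [
        {"node": "Decision analysis", "position": "before"},
        {"node": "Decision tree learning", "position": "after"},
        {"node": "Decision table", "position": "before"},
    ],
}


def group_by_position(results):
    """Map each position label to its node titles, preserving list order."""
    grouped = defaultdict(list)
    for prediction in results["predictions"]:
        grouped[prediction["position"]].append(prediction["node"])
    return dict(grouped)


grouped_nodes = group_by_position(sample_results)
print("before:", grouped_nodes.get("before", []))
print("after:", grouped_nodes.get("after", []))
```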

# Optional: Format as DataFrame for Easy Viewing

In [15]:
formatted_results = rec.format_results(0.47)

recommendations = pd.DataFrame(formatted_results['predictions'])
print(recommendations.position.value_counts())
print("Decision Threshold:", round(formatted_results['decision_threshold'], 2))
recommendations[['node', 'position', 'label_proba']]

before    52
after     47
Name: position, dtype: int64
Decision Threshold: 0.47


| | node | position | label_proba |
|---|---|---|---|
| 0 | Decision analysis | before | [0.4372199310000259, 0.5627800689999742] |
| 1 | Decision tree learning | after | [0.518875472249066, 0.4811245277509338] |
| 2 | Influence diagram | after | [0.5109180442839001, 0.4890819557160998] |
| 3 | Behavior tree (artificial intelligence, roboti... | after | [0.5601010085015717, 0.43989899149842815] |
| 4 | Information gain in decision trees | after | [0.5811665675523409, 0.41883343244765897] |
| 5 | ID3 algorithm | after | [0.5943469096282273, 0.40565309037177244] |
| 6 | Decision table | before | [0.40920095728596345, 0.5907990427140366] |
| 7 | Decision tree model | after | [0.5152046099237989, 0.4847953900762011] |
| 8 | Random forest | after | [0.48755540175801315, 0.5124445982419868] |
| 9 | Tree (graph theory) | before | [0.4007070533590098, 0.5992929466409901] |
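
The ten rows shown are consistent with a simple rule: a node is labeled `before` when the first entry of `label_proba` falls below the decision threshold, and `after` otherwise. The sketch below checks that reading against the displayed rows, with probabilities rounded to four decimals. `assign_position` is a hypothetical helper inferred from this output alone; it is not the actual logic inside `Recommender.format_results`, which this notebook does not show.

```python
def assign_position(label_proba, threshold):
    """Inferred rule: 'before' if label_proba[0] < threshold, else 'after'."""
    return "before" if label_proba[0] < threshold else "after"


# (node, displayed position, rounded label_proba) from the output above.
displayed_rows = [
    ("Decision analysis", "before", [0.4372, 0.5628]),
    ("Decision tree learning", "after", [0.5189, 0.4811]),
    ("Influence diagram", "after", [0.5109, 0.4891]),
    ("Behavior tree (artificial intelligence, roboti...", "after", [0.5601, 0.4399]),
    ("Information gain in decision trees", "after", [0.5812, 0.4188]),
    ("ID3 algorithm", "after", [0.5943, 0.4057]),
    ("Decision table", "before", [0.4092, 0.5908]),
    ("Decision tree model", "after", [0.5152, 0.4848]),
    ("Random forest", "after", [0.4876, 0.5124]),
    ("Tree (graph theory)", "before", [0.4007, 0.5993]),
]

THRESHOLD = 0.47  # the value passed to format_results in the cell above

# Collect any row where the inferred rule disagrees with the displayed label.
mismatches = [
    node
    for node, position, proba in displayed_rows
    if assign_position(proba, THRESHOLD) != position
]
print("Rows that disagree with the inferred rule:", mismatches)
```

Random forest is the informative row: its second probability is the larger of the two, yet it is labeled `after`, which a plain argmax over `label_proba` would not produce.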
