<a href="https://colab.research.google.com/github/mneedham/data-science-training/blob/master/03_Recommendations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recommendations: Part 2

In the second part of our recommendations notebook, we're going to use the PageRank algorithm to recommend articles to an author. Let's import our libraries, in case we don't already have them loaded from the previous notebooks:

In [3]:
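from py2neo import Graph
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.set_option('display.max_colwidth', 100)

# Connect to Neo4j. Assumes a local Neo4j 3.5 server with the Graph Algorithms
# plugin installed; the URI and credentials are placeholders, adjust them to your setup.
graph = Graph("bolt://localhost:7687", auth=("neo4j", "neo"))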

## PageRank

We're going to use the PageRank algorithm, so let's first get up to speed on how it works.

PageRank is an algorithm that measures the transitive influence or connectivity of nodes. It can be computed either by iteratively distributing each node’s rank (originally based on degree) over its neighbours, or by randomly traversing the graph and counting how often each node is hit during these walks.
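To make the two approaches concrete, here is a minimal pure-Python sketch on a small hypothetical citation graph. The node names and the `pagerank` / `random_walk_pagerank` helpers are illustrative only, not part of py2neo or Neo4j. It assumes the common normalized formulation, in which scores sum to 1 and a node with no outgoing links spreads its rank evenly over every node; the Neo4j procedure used later may scale its scores differently.

```python
import random


def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping node -> list of out-neighbours.

    Every target must also be a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Each node receives a small "random jump" share of the total rank.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = graph[u]
            if targets:
                # Distribute u's rank evenly over the nodes it links to.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node (no out-links): spread its rank over every node.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


def random_walk_pagerank(graph, damping=0.85, steps=200_000, seed=42):
    """Estimate PageRank by counting how often a random surfer visits each node."""
    rng = random.Random(seed)
    nodes = list(graph)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        targets = graph[current]
        if targets and rng.random() < damping:
            current = rng.choice(targets)  # follow a random out-link
        else:
            current = rng.choice(nodes)    # jump to a random node
    return {v: count / steps for v, count in visits.items()}


# Hypothetical citation graph: article -> articles it cites.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

iterative = pagerank(citations)
sampled = random_walk_pagerank(citations)
for node in citations:
    print(f"{node}: iterative={iterative[node]:.3f} random walk={sampled[node]:.3f}")
```

The iterative scores and the random-walk visit frequencies should agree to within sampling noise, with D, which every other article leads to through citations, ranked highest.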

## Full Text Search

The following code creates a full text search index named 'articles' on the 'title' and 'abstract' properties of all nodes that have the label 'Article'.

In [None]:
query = """
CALL db.index.fulltext.createNodeIndex('articles', ['Article'], ['title', 'abstract'])
"""
graph.run(query).to_data_frame()
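Once the index exists, it can be queried directly with the db.index.fulltext.queryNodes procedure, which yields each matching node together with a relevance score. Below is a minimal sketch of a helper around it, assuming the 'articles' index created above and a py2neo Graph (only its run() method is used, so any object with a compatible run() works). The `search_articles` name is hypothetical.

```python
SEARCH_QUERY = """
CALL db.index.fulltext.queryNodes('articles', $searchTerm)
YIELD node, score
RETURN node.title AS article, score
ORDER BY score DESC
LIMIT $limit
"""


def search_articles(graph, search_term, limit=10):
    """Return the best full-text matches for `search_term` as a DataFrame.

    `graph` is expected to be a py2neo Graph; only its run() method is used.
    """
    params = {"searchTerm": search_term, "limit": limit}
    return graph.run(SEARCH_QUERY, params).to_data_frame()


# Usage (needs a running Neo4j server and the 'articles' index):
# search_articles(graph, "open source", limit=5)
```

The search string is interpreted with Lucene query syntax, so an unquoted term such as open source is likely to match articles containing either word; wrapping it in double quotes asks for the exact phrase instead.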

## Article recommendation

When an author is searching for articles to read, they want the results to take their own research into account. Two authors using the same search term would expect to see different results depending on their area of research.

We're going to use the Full Text Search functionality added in Neo4j 3.5 to help us with the search part of the problem.

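The following query looks up the articles written by the author, then runs PageRank over the articles that match the search term, connected by their 'CITED' relationships. Passing the author's own articles as 'sourceNodes' biases the scores towards articles that are connected, through citations, to their own work (personalized PageRank). Finally, it drops articles the author wrote and returns the ten highest-scoring ones, along with up to five authors for each.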

In [37]:
query = """
MATCH (a:Article)-[:AUTHOR]->(author:Author)
WHERE author.name=$authorName
WITH author, collect(a) as articles
CALL algo.pageRank.stream(
  'CALL db.index.fulltext.queryNodes("articles", $searchTerm)
   YIELD node, score
   RETURN id(node) as id',
  'MATCH (a1:Article)-[:CITED]->(a2:Article) 
   RETURN id(a1) as source,id(a2) as target', 
  {sourceNodes: articles,graph:'cypher', params: {searchTerm: $searchTerm}})
YIELD nodeId, score
WITH algo.getNodeById(nodeId) AS n, score, author
// Keep author in scope so we can exclude articles they wrote themselves
WHERE NOT exists((author)<-[:AUTHOR]-(n))
RETURN n.title AS article, score,
       [(n)-[:AUTHOR]->(articleAuthor) | articleAuthor.name][..5] AS authors
ORDER BY score DESC
LIMIT 10
"""

params = {"authorName": "Tao Xie", "searchTerm": "open source"}
graph.run(query, params).to_data_frame()

| | article | authors | score |
|---|---|---|---|
| 0 | Using structural context to recommend source code examples | [Reid Holmes, Gail C. Murphy] | 0.68 |
| 1 | Static detection of cross-site scripting vulnerabilities | [Gary Wassermann, Zhendong Su] | 0.382 |
| 2 | Evolutionary testing of classes | [Paolo Tonella] | 0.314 |
| 3 | Clone detection using abstract syntax trees | [Lorraine Bier, Marcelo M. SantAnna, Leonardo Mendonça de Moura, Andrew Yahin, Ira D. Baxter] | 0.308 |
| 4 | Bandera: extracting finite-state models from Java source code | [Hongjun Zheng, Robby, Corina S. Pasareanu, Shawn Laubach, John Hatcliff] | 0.299 |
| 5 | Concern graphs: finding and describing concerns using structural program dependencies | [Martin P. Robillard, Gail C. Murphy] | 0.276 |
| 6 | The source code control system | [Marc J. Rochkind] | 0.256 |
| 7 | Detecting object usage anomalies | [Christian Lindig, Andrzej Wasylkowski, Andreas Zeller] | 0.246 |
| 8 | Generalized symbolic execution for model checking and testing | [Sarfraz Khurshid, Corina S. Păsăreanu, Willem Visser] | 0.246 |
| 9 | Hipikat: recommending pertinent software development artifacts | [Davor Cubranic, Gail C. Murphy] | 0.226 |


In [36]:
params = {"authorName": "Margus Veanes", "searchTerm": "open source"}
graph.run(query, params).to_data_frame()

| | article | authors | score |
|---|---|---|---|
| 0 | The source code control system | [Marc J. Rochkind] | 22.13 |
| 1 | Program Improvement by Source-to-Source Transformation | [David B. Loveman] | 16.746 |
| 2 | Make — a program for maintaining computer programs | [Stuart I. Feldman] | 16.325 |
| 3 | Two case studies of open source software development: Apache and Mozilla | [Audris Mockus, Roy Fielding, James D. Herbsleb] | 15.716 |
| 4 | Improving and refining programs by program manipulation | [Dennis F. Kibler, James Milne Neighbors, Thomas A. Standish] | 15.708 |
| 5 | Equivariant adaptive source separation | [Beate Hvam Laheld, J.-F. Cardoso] | 15.261 |
| 6 | StackGuard: automatic adaptive detection and prevention of buffer-overflow attacks | [Qian Zhang, Perry Wagle, Aaron Grier, Steve Beattie, P. Bakke] | 10.425 |
| 7 | Clone detection using abstract syntax trees | [Lorraine Bier, Marcelo M. SantAnna, Leonardo Mendonça de Moura, Andrew Yahin, Ira D. Baxter] | 10.25 |
| 8 | A New Learning Algorithm for Blind Signal Separation | [Shun-ichi Amari, Andrzej Cichocki, Howard Hua Yang] | 9.989 |
| 9 | Building diverse computer systems | [Stephanie Forrest, Anil Somayaji, David H. Ackley] | 9.729 |
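The recommendation query passes the author's own articles to algo.pageRank.stream as 'sourceNodes', which turns it into a personalized PageRank: scores are biased towards articles reachable through citations from the author's work. The sketch below mimics that pipeline in pure Python on a hypothetical citation graph, assuming the common formulation in which random jumps land only on the source nodes. It illustrates the idea and is not the Graph Algorithms library's implementation; its scores sum to 1, so they are not comparable with the scores in the tables above. The `personalized_pagerank` and `recommend` helpers and all node names are hypothetical.

```python
def personalized_pagerank(graph, sources, damping=0.85, iterations=100):
    """Personalized PageRank over a dict mapping article -> list of cited articles.

    Random jumps, and the rank held by articles that cite nothing, return
    to the source nodes instead of being spread over the whole graph.
    """
    sources = [s for s in graph if s in sources]
    if not sources:
        raise ValueError("at least one source node must be in the graph")
    nodes = list(graph)
    jump = 1.0 / len(sources)
    rank = {v: (jump if v in sources else 0.0) for v in nodes}
    for _ in range(iterations):
        new_rank = dict.fromkeys(nodes, 0.0)
        # The random-jump share of the rank goes back to the sources only.
        for s in sources:
            new_rank[s] += (1.0 - damping) * jump
        for u in nodes:
            targets = graph[u]
            if targets:
                # Distribute u's rank evenly over the articles it cites.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Article with no citations: send its rank back to the sources.
                for s in sources:
                    new_rank[s] += damping * rank[u] * jump
        rank = new_rank
    return rank


def recommend(graph, own_articles, top_n=10):
    """Rank articles the author did not write, mirroring the query's
    NOT exists(...) filter followed by ORDER BY score DESC LIMIT."""
    scores = personalized_pagerank(graph, own_articles)
    candidates = [(article, score) for article, score in scores.items()
                  if article not in own_articles]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]


# Hypothetical full-text search hits: article -> articles it cites.
citations = {
    "mine-1": ["x", "y"],
    "mine-2": ["y"],
    "x": ["z"],
    "y": ["z"],
    "z": [],
    "unrelated": ["w"],
    "w": [],
}

for article, score in recommend(citations, {"mine-1", "mine-2"}, top_n=3):
    print(f"{article}: {score:.3f}")
```

Articles that can only be reached from outside the author's work, such as unrelated and w, end up with a score of zero, which is the effect the 'sourceNodes' bias is meant to have.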
