# Recommendations: Part 1

In this notebook you will learn how to make recommendations using Neo4j. 

Run the cell below to import the libraries (remember to uncheck "Reset all runtimes before running" first):

In [1]:
from py2neo import Graph
import pandas as pd

import matplotlib 
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.set_option('display.max_colwidth', 100)

Next, create a connection to your Neo4j Sandbox, just as you did previously when you set up your environment. 

<div align="left">
    <img src="https://github.com/neo4j-contrib/training-v2/blob/master/Courses/DataScience/notebooks/images/sandbox-citations.png?raw=1" alt="Citation Sandbox"/>
</div>

Update the cell below to use the IP Address, Bolt Port, and Password, as you did previously.

In [2]:
# Change the line of code below to use the IP Address, Bolt Port, and Password of your Sandbox.
# graph = Graph("bolt://<IP Address>:<Bolt Port>", auth=("neo4j", "<Password>")) 
 
#graph = Graph("bolt://52.3.242.176:33698", auth=("neo4j", "equivalent-listing-parts"))
graph = Graph("bolt://localhost:7687", auth=("neo4j", "graphdb"))

## Finding popular authors

Since we're going to make collaborator suggestions, let's first find the authors who have written the most articles so that we have some data to work with.

In [3]:
popular_authors_query = """
MATCH (author:Author)
RETURN author.name, size((author)<-[:AUTHOR]-()) AS articlesPublished
ORDER BY articlesPublished DESC
LIMIT 10
"""

graph.run(popular_authors_query).to_data_frame()

Unnamed: 0,author.name,articlesPublished
0,Peter G. Neumann,89
1,Peter J. Denning,80
2,Moshe Y. Vardi,72
3,Pamela Samuelson,71
4,Bart Preneel,65
5,Vinton G. Cerf,56
6,Barry W. Boehm,53
7,Mark Guzdial,49
8,Edwin R. Hancock,47
9,Josef Kittler,46


Pick one of these authors...

In [4]:
author_name = "Peter G. Neumann"

Retrieve the articles they've published and how many citations they've received:

In [5]:
author_articles_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)
RETURN article.title AS article, article.year AS year, size((article)<-[:CITED]-()) AS citations
ORDER BY citations DESC
LIMIT 20
"""

graph.run(author_articles_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,article,year,citations
0,"The foresight saga, redux",2012,2
1,Security by obscurity,2003,2
2,Risks of automation: a cautionary total-system perspective of our cyberfuture,2016,1
3,Crypto policy perspectives,1994,1
4,Risks of National Identity Cards,2001,1
5,"Computers, ethics, and values",1991,1
6,Are dependable systems feasible,1993,1
7,Information system security redux,2003,1
8,The foresight saga,2006,1
9,Robust open-source software,1999,1


Find the author's collaborators:

In [6]:
collaborations_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor)
RETURN coauthor.name AS coauthor, count(*) AS collaborations
ORDER BY collaborations DESC
LIMIT 10
"""

graph.run(collaborations_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,coauthor,collaborations
0,Lauren Weinstein,3
1,Whitfield Diffie,3
2,Susan Landau,3
3,Steven Michael Bellovin,2
4,Matt Blaze,2
5,Rebecca T. Mercuri,2
6,Alfred Z. Spector,1
7,Seymour E. Goodman,1
8,David Lorge Parnas,1
9,Douglas Miller,1


How would you suggest some future collaborators for this author? One way is by looking at the collaborators of their collaborators!

In [7]:
collaborations_query = """
MATCH (author:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor),
      (coauthor)<-[:AUTHOR]-()-[:AUTHOR]->(coc)
WHERE NOT (coc)<-[:AUTHOR]-()-[:AUTHOR]->(author) AND coc <> author
RETURN coc.name AS coauthor, count(*) AS collaborations
ORDER BY collaborations DESC
LIMIT 10
"""

graph.run(collaborations_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,coauthor,collaborations
0,John Ioannidis,10
1,Scott Bradner,9
2,Angelos D. Keromytis,8
3,John Kelsey,7
4,Virgil D. Gligor,5
5,David Wagner,4
6,Peter Wolcott,4
7,Ran Canetti,4
8,Gerald Jay Sussman,4
9,David K. Gifford,4


Each of these people has collaborated with someone Peter has worked with before, so that shared collaborator might be able to make an introduction.
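
To make the collaborators-of-collaborators logic easier to follow, here is a minimal pure-Python sketch of the same idea on a tiny in-memory dataset. It does not touch Neo4j: the `articles` dictionary, the author names, and the `suggest_collaborators` helper are all hypothetical and exist only for illustration. Like the Cypher query, it counts one hit per path from the author through a shared article, to a coauthor, through one of the coauthor's other articles, to a candidate, and it skips anyone the author has already worked with.

```python
from collections import Counter

# Hypothetical toy data: article id -> names of its authors.
articles = {
    "a1": ["Ann", "Bob"],
    "a2": ["Ann", "Cid"],
    "a3": ["Bob", "Dee"],
    "a4": ["Bob", "Dee"],
    "a5": ["Cid", "Eve"],
    "a6": ["Bob", "Cid"],
}


def suggest_collaborators(articles, target, limit=10):
    """Rank collaborators-of-collaborators that `target` has not worked with."""
    # Index the articles written by each author.
    by_author = {}
    for article_id, authors in articles.items():
        for name in authors:
            by_author.setdefault(name, set()).add(article_id)

    # Everyone who already shares at least one article with the target.
    direct = {
        name
        for article_id in by_author.get(target, ())
        for name in articles[article_id]
        if name != target
    }

    counts = Counter()
    for article_id in by_author.get(target, ()):
        for coauthor in articles[article_id]:
            if coauthor == target:
                continue
            # Walk the coauthor's other articles to reach candidates.
            for other_id in by_author[coauthor] - {article_id}:
                for coc in articles[other_id]:
                    if coc != coauthor and coc != target and coc not in direct:
                        counts[coc] += 1
    return counts.most_common(limit)


print(suggest_collaborators(articles, "Ann"))
# → [('Dee', 2), ('Eve', 1)]
```

In this toy data, Bob and Cid reach each other through article `a6`, but both are filtered out because Ann has already collaborated with them.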

## Exercise

1. Can you find the top 20 suggested collaborators for 'Brian Fitzgerald' instead of 'Peter G. Neumann'?
2. How many of these potential collaborators have collaborated with Brian's collaborators more than 3 times?

Keep the results of this exercise handy as they may be useful for the quiz at the end of this module.

In [8]:
author_name = "Brian Fitzgerald"

Retrieve the articles they've published and how many citations they've received:

In [9]:
author_articles_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)
RETURN article.title AS article, article.year AS year, size((article)<-[:CITED]-()) AS citations
ORDER BY citations DESC
LIMIT 20
"""

graph.run(author_articles_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,article,year,citations
0,Continuous software engineering and beyond: trends and challenges,2014,3
1,Grounded theory in software engineering research: a critical review and guidelines,2016,2
2,"Two's company, three's a crowd: a case study of crowdsourcing software development",2014,2
3,Software development method tailoring at Motorola,2003,2
4,Scaling agile methods to regulated environments: an industry case study,2013,2
5,Global software development: where are the benefits?,2009,1
6,Evidence-based decision making in lean software project management,2014,1
7,"Collaboration, conflict and control: the 4th workshop on open source software engineering",2004,0
8,First International Workshop on Emerging Trends in FLOSS Research and Development,2007,0
9,Experiences from Representing Software Architecture in a Large Industrial Project Using Model Dr...,2007,0


Find the author's collaborators:

In [10]:
collaborations_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor)
RETURN coauthor.name AS coauthor, count(*) AS collaborations
ORDER BY collaborations DESC
LIMIT 10
"""

graph.run(collaborations_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,coauthor,collaborations
0,Klaas-Jan Stol,6
1,Joseph Feller,5
2,Scott A. Hissam,4
3,Karim R. Lakhani,3
4,Walt Scacchi,2
5,Donal O'Brien,1
6,Björn Lundell,1
7,Martin Krafft,1
8,Brian Lings,1
9,Andrea Capiluppi,1


How would you suggest some future collaborators for this author? One way is by looking at the collaborators of their collaborators!

In [14]:
collaborations_query = """
MATCH (author:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor),
      (coauthor)<-[:AUTHOR]-()-[:AUTHOR]->(coc)
WHERE NOT (coc)<-[:AUTHOR]-()-[:AUTHOR]->(author) AND coc <> author
WITH coc.name AS coauthor, count(*) AS collaborations
WHERE collaborations > 3
RETURN coauthor, collaborations
ORDER BY collaborations DESC
LIMIT 20
"""

graph.run(collaborations_query,  {"authorName": author_name}).to_data_frame()

Unnamed: 0,coauthor,collaborations
0,Holger Giese,5
1,Robert C. Seacord,4
2,Judith A. Stafford,4
3,Gabriel A. Moreno,4
4,Kurt C. Wallnau,4
5,Grace A. Lewis,4
6,Chris Jensen,4
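
The second exercise question can also be answered on the client side by filtering the DataFrame that `to_data_frame()` returns, rather than adding `WITH ... WHERE` to the Cypher query. The sketch below uses a small hand-made DataFrame with placeholder names and counts (hypothetical values, not results from the graph) so that it runs without a database connection.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by
# graph.run(collaborations_query, {...}).to_data_frame().
suggestions = pd.DataFrame({
    "coauthor": ["Author A", "Author B", "Author C", "Author D"],
    "collaborations": [5, 4, 3, 1],
})

# Keep only candidates with more than 3 collaborations.
frequent = suggestions[suggestions["collaborations"] > 3]
n_frequent = len(frequent)

print(n_frequent)
# → 2
```

One caveat with this approach: if the query applies `LIMIT` before the rows reach pandas, candidates beyond the limit are never seen, so filtering in Cypher before `LIMIT` is the safer choice when the result set may be large.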
