## ANLT-243 - NoSQL DATABASES
## ASSIGNMENT #4 - ANALYSIS of HIGGS BOSON PARTICLE TWEETS - Neo4j

__by Arda Ugur__<br>
<a_ugur1@u.pacific.edu>
<br><br/>
__University of the Pacific, School of Engineering and Computer Science<br />
Data Science Graduate Program__<br><br>
__September 2018__

![image.png](attachment:image.png)

## Problem Definition:

The Higgs dataset was built by monitoring the spreading processes on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive __Higgs Boson__ on 4th July 2012.
<br><br>
Here are the summary statistics of the Twitter data as shown on the SNAP website (http://snap.stanford.edu/data/higgs-twitter.html):
<br><br>
__Social Network statistics:__<br>
<br>
Nodes    456626<br>
Edges    14855842<br>
<br>
__Retweet Network statistics:__<br>
<br>
Nodes    256491<br>
Edges    328132<br>
<br>
__Reply Network statistics:__<br>
<br>
Nodes    38918<br>
Edges    32523<br>
<br>
__Mention Network statistics:__<br>
<br>
Nodes    116408<br>
Edges    150818<br>
<br><br>
The Higgs Boson Twitter data is published in a __Neo4j__ database deployed with the following configuration:<br>
<br>
__host = 'ec2-54-67-15-68.us-west-1.compute.amazonaws.com'__<br>
__port = '9232'__<br>

The nodes use the label __"User"__

The four relationship edges use the following _Labels_ (relationship types, in Neo4j terminology):

Followers Social Network edge label  __"Follows"__

Retweet Network edge label __"Retweets"__

Reply Network edge label __"Replys"__

Mention Network edge label __"Mentions"__
<br><br>

Based on the information given above, answer the following questions using __Neo4j graph queries__. For the first three questions, run the provided queries as Neo4j transactions and print the results. For the remaining questions, build your own queries and print the answers.

1. How many total nodes with Label :User are there in the database?

    match (:User) return count(*) as User_Count<br>
<br>
2. How many Social Network relationships are there in the database?

    match ()-[follows:Follows]->() return count(follows)<br>
<br>
3. How many Social Network followers does user 89805 have?

    match (follower)-[:Follows]->(:User{user:89805}) return count(follower)<br>
<br>
4. How many total times did users in this network retweet?<br>
<br>
5. How many times did users in this network reply to other users' tweets?<br>
<br>
6. How many times did users in this network mention other users in their tweets?<br>
<br>
7. How many users follow user 89805?<br>
<br>
8. How many users does user 89805 follow?<br>
<br>
9. Did user 14907 ever retweet user 89805?<br>
<br>
10. Did user 89805 ever retweet user 14907?<br>
<br>
11. Find out the top 5 users with the highest number of followers?<br>
<br>
12. What is the total count of followers of followers of user 89805?<br>


__References:__

Here is a good Neo4j reference on the __Cypher__ language. You can use it to learn simple and advanced queries and operations on Neo4j.

https://neo4j.com/docs/cypher-refcard/current/

## Solution

### 0. Configuration

In [1]:
# Graph DB Config
#
from neo4j.v1 import GraphDatabase

In [2]:
# Remote Database Connection
# 
host = 'ec2-54-67-15-68.us-west-1.compute.amazonaws.com'
port = '9232'

In [3]:
# Remote Database Access
#
uri = "bolt://" + host + ':' + port
#
driver = GraphDatabase.driver(uri, auth=("neo4j", "password"), encrypted=False)
session = driver.session()
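
The connection step above can be wrapped in two small helpers. This is only a sketch: `bolt_uri` and `open_driver` are hypothetical names, and the fallback import assumes that newer releases of the official Python driver expose `GraphDatabase` at the top level of the `neo4j` package while older 1.x releases keep it under `neo4j.v1`.

```python
def bolt_uri(host, port):
    """Build a Bolt URI such as bolt://example.org:7687 (hypothetical helper)."""
    return "bolt://{}:{}".format(host, port)


def open_driver(host, port, user, password):
    """Create a driver object; the import is lazy so bolt_uri works without the package."""
    try:
        from neo4j import GraphDatabase      # assumed layout of newer driver releases
    except ImportError:
        from neo4j.v1 import GraphDatabase   # layout used in this notebook
    return GraphDatabase.driver(bolt_uri(host, port),
                                auth=(user, password),
                                encrypted=False)


# Only the URI is built here, so this runs without a database connection.
uri = bolt_uri('ec2-54-67-15-68.us-west-1.compute.amazonaws.com', '9232')
print(uri)
```

When the work is finished, calling `session.close()` and then `driver.close()` releases the connection.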

### 1. How many total nodes with Label :User are there in the database?

In [4]:
# Question 1:
# How many total nodes with Label :User are there in the database?
#
# We are using the provided code
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match(:User) return count(*) as User_Count'):
        print(i)

<Record User_Count=456626>
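
Each query in this notebook returns a single row, so the loop can be replaced by reading that one record directly. The sketch below assumes the driver's `run(...)` result offers `single()` and that a record can be indexed by position. `fetch_scalar`, `FakeSession` and `FakeResult` are hypothetical names, and the fake classes are stand-ins that only let the helper run without a live database.

```python
def fetch_scalar(runner, query, **params):
    """Run a query expected to return one row and give back its first column."""
    record = runner.run(query, **params).single()
    return record[0]


class FakeResult:
    """Stand-in for a driver result holding exactly one row."""
    def __init__(self, row):
        self._row = row

    def single(self):
        return self._row


class FakeSession:
    """Stand-in for a session; always answers with the count printed above."""
    def run(self, query, **params):
        return FakeResult((456626,))


user_count = fetch_scalar(FakeSession(), 'match (:User) return count(*) as User_Count')
print(user_count)
```

With a real connection, the session or transaction object would be passed in place of `FakeSession()`.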


### 2. How many Social Network relationships are there in the database?

In [5]:
# Question 2:
# How many Social Network relationships are there in the database?
#
# We are using the provided code
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ()-[follows:Follows]->() return count(follows)'):
        print(i)

<Record count(follows)=14855842>


### 3. How many Social Network followers does user 89805 have?

In [6]:
# Question 3: 
# How many Social Network followers does user 89805 have?
#
# We are using the provided code
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match (follower)-[:Follows]->(:User{user:89805}) return count(follower)'):
        print(i)

<Record count(follower)=156>


### 4. How many total times did users in this network retweet?

In [7]:
# Question 4:
# How many total times did users in this network retweet?
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ()-[retweets:Retweets]->() return count(retweets)'):
        print(i)

<Record count(retweets)=328132>


### 5. How many times did users in this network reply to other users' tweets?

In [8]:
# Question 5:
# How many times did users in this network reply to other users' tweets?
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ()-[replys:Replys]->() return count(replys)'):
        print(i)

<Record count(replys)=32523>


### 6. How many times did users in this network mention other users in their tweets?

In [9]:
# Question 6:
# How many times did users in this network mention other users in their tweets?
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ()-[mentions:Mentions]->() return count(mentions)'):
        print(i)

<Record count(mentions)=150818>
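
Questions 2, 4, 5 and 6 differ only in the relationship type, so the query text can be generated. Cypher parameters stand in for values rather than relationship types (at least in Neo4j 3.x, as far as I know), so this sketch uses string formatting guarded by a whitelist. `RELATIONSHIP_TYPES` and `count_query` are hypothetical names.

```python
# The four edge labels listed in the problem definition.
RELATIONSHIP_TYPES = ("Follows", "Retweets", "Replys", "Mentions")


def count_query(rel_type):
    """Return a Cypher query that counts relationships of one known type."""
    if rel_type not in RELATIONSHIP_TYPES:
        # Refuse anything outside the whitelist instead of formatting it into Cypher.
        raise ValueError("unknown relationship type: {!r}".format(rel_type))
    return "match ()-[r:{}]->() return count(r) as total".format(rel_type)


for rel_type in RELATIONSHIP_TYPES:
    print(count_query(rel_type))
```

Each generated string can be passed to `higgsboson.run(...)` in the same way as the hand-written queries.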


### 7. How many users follow user 89805?

In [10]:
# Question 7:
# How many users follow user 89805?
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match (follower)-[:Follows]->(:User{user:89805}) return count(follower)'):
        print(i)

<Record count(follower)=156>


### 8. How many users does user 89805 follow?

In [11]:
# Question 8:
# How many users does user 89805 follow?
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match (:User{user:89805})-[:Follows]->(user:User) return count(user)'):
        print(i)

<Record count(user)=171>
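
Questions 7 and 8 differ only in the direction of the arrow: incoming `Follows` edges are followers, outgoing ones are accounts being followed. A toy illustration in plain Python follows; the edge list is made up and is not taken from the Higgs data.

```python
# Hypothetical toy data: each pair is (follower, followed).
follows = [(1, 2), (3, 2), (2, 4), (2, 5), (2, 1)]


def follower_count(edges, user):
    """Count edges pointing at `user`, like (x)-[:Follows]->(user)."""
    return sum(1 for src, dst in edges if dst == user)


def following_count(edges, user):
    """Count edges leaving `user`, like (user)-[:Follows]->(x)."""
    return sum(1 for src, dst in edges if src == user)


print(follower_count(follows, 2))   # users 1 and 3 follow user 2
print(following_count(follows, 2))  # user 2 follows users 4, 5 and 1
```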


### 9. Did user 14907 ever retweet user 89805?

In [12]:
# Question 9:
# Did user 14907 ever retweet user 89805?
#
# Let us call this quantity n_x_tweets
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ({user:14907})-[n_x_tweets:Retweets]->({user:89805}) return count(n_x_tweets)'):
        print(i)

<Record count(n_x_tweets)=0>


### 10. Did user 89805 ever retweet user 14907?

In [13]:
# Question 10:
# Did user 89805 ever retweet user 14907?
#
# Let us call this quantity n_x_tweets
# This is the reverse version of Question 9 - so we can use the same code
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ({user:89805})-[n_x_tweets:Retweets]->({user:14907}) return count(n_x_tweets)'):
        print(i)

<Record count(n_x_tweets)=1>
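
Questions 9 and 10 are the same query with the two user ids swapped, and both ask for a yes/no answer. One possible sketch passes the ids as query parameters and lets Cypher return a boolean. It assumes a server version that accepts the `$name` parameter syntax, and `retweet_check` is a hypothetical helper name.

```python
def retweet_check(src, dst):
    """Build a query and parameters asking whether `src` ever retweeted `dst`."""
    query = ("match (:User {user: $src})-[r:Retweets]->(:User {user: $dst}) "
             "return count(r) > 0 as retweeted")
    return query, {"src": src, "dst": dst}


# Question 9 and Question 10 use the same text with swapped parameters.
q9_query, q9_params = retweet_check(14907, 89805)
q10_query, q10_params = retweet_check(89805, 14907)
print(q9_query)
print(q9_params)
print(q10_params)
```

With a live session, `session.run(q9_query, **q9_params).single()["retweeted"]` would give the answer as `True` or `False`. Adding the `:User` label to both nodes is a small change from the queries above, made so that the match can use an index on `User` if one exists.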


### 11. Find out the top 5 users with the highest number of followers?

In [14]:
# Question 11:
# Find out the top 5 users with the highest number of followers?
#
# We should be able to sort the number of followers per user in descending order
# and then we can limit our output to only 5
#
# To avoid confusion, let us consider this as x-y pairs 
# x being followers and y being users
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match ()-[x:Follows]->(y:User) return(y), count(x) order by count(x) desc limit 5'):
        print(i)

<Record y=<Node id=1502 labels={'User'} properties={'user': 1503}> count(x)=51386>
<Record y=<Node id=205 labels={'User'} properties={'user': 206}> count(x)=48414>
<Record y=<Node id=87 labels={'User'} properties={'user': 88}> count(x)=45221>
<Record y=<Node id=137 labels={'User'} properties={'user': 138}> count(x)=44188>
<Record y=<Node id=1061 labels={'User'} properties={'user': 1062}> count(x)=40120>
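
The query groups `Follows` edges by their target node, counts each group, sorts in descending order and keeps the first five rows. The same steps on a made-up edge list in plain Python are sketched here; `top_followed` is a hypothetical name and the data is not from the Higgs network.

```python
from collections import Counter

# Hypothetical toy data: each pair is (follower, followed).
follows = [(1, 9), (2, 9), (3, 9), (1, 8), (2, 8), (4, 7)]


def top_followed(edges, k):
    """Return the k most-followed users as (user, follower_count) pairs."""
    counts = Counter(dst for _, dst in edges)   # group by target and count
    return counts.most_common(k)                # sort descending and limit


print(top_followed(follows, 2))
```

For more compact output from the database itself, a variant such as `match ()-[x:Follows]->(y:User) return y.user as user, count(x) as followers order by followers desc limit 5` returns the user id and the count instead of the whole node.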


### 12. What is the total count of followers of followers of user 89805?

In [15]:
# Question 12:
# What is the total count of followers of followers of user 89805?
#
# The query below returns all users who follow 89805
# There should be 156 of them
#
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match (follower)-[:Follows]->(:User{user:89805}) return follower'):
        print(i)

<Record follower=<Node id=371756 labels={'User'} properties={'user': 371757}>>
<Record follower=<Node id=113222 labels={'User'} properties={'user': 113223}>>
<Record follower=<Node id=51402 labels={'User'} properties={'user': 51403}>>
<Record follower=<Node id=11904 labels={'User'} properties={'user': 11905}>>
<Record follower=<Node id=444473 labels={'User'} properties={'user': 444474}>>
<Record follower=<Node id=113800 labels={'User'} properties={'user': 113801}>>
<Record follower=<Node id=93158 labels={'User'} properties={'user': 93159}>>
<Record follower=<Node id=86367 labels={'User'} properties={'user': 86368}>>
<Record follower=<Node id=129522 labels={'User'} properties={'user': 129523}>>
<Record follower=<Node id=225225 labels={'User'} properties={'user': 225226}>>
<Record follower=<Node id=177655 labels={'User'} properties={'user': 177656}>>
<Record follower=<Node id=115059 labels={'User'} properties={'user': 115060}>>
<Record follower=<Node id=38805 labels={'User'} properties={

In [16]:
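# Now let us find the total number of followers of the followers of 89805
#
# Note: the first relationship in the pattern below has no arrow, so it matches
# Follows edges in either direction, and count() counts matched paths, not distinct users
#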
with session.begin_transaction() as higgsboson:
    for i in higgsboson.run('match (who_follows_89805_followers)-[:Follows]-()-[:Follows]->({user:89805}) return count(who_follows_89805_followers)'):
        print(i)

<Record count(who_follows_89805_followers)=172520>
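
"Followers of followers" can be read in two ways: the number of two-hop paths, or the number of distinct users at the end of those paths. The toy example below shows that the two can differ, following only directed edges. The edge list is made up, and `followers_of` and `two_hop_followers` are hypothetical names.

```python
# Hypothetical toy data: each pair is (follower, followed).
follows = [(1, 100), (2, 100), (10, 1), (11, 1), (10, 2), (12, 2)]


def followers_of(edges, user):
    """Users with a directed Follows edge pointing at `user`."""
    return [src for src, dst in edges if dst == user]


def two_hop_followers(edges, user):
    """One entry per path (x)-[:Follows]->(f)-[:Follows]->(user)."""
    return [x for f in followers_of(edges, user) for x in followers_of(edges, f)]


paths = two_hop_followers(follows, 100)
print(len(paths))        # number of two-hop paths
print(len(set(paths)))   # number of distinct users reached
```

In Cypher terms, a fully directed pattern would be `(x)-[:Follows]->()-[:Follows]->({user:89805})`, and `count(distinct x)` would count each user once. Neither variant was run against the database here, so no result is quoted for them.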
