# DSI Social Network

Author: Ananya Nimbalkar

This notebook contains all code for the necessary node & edge creation, as well as the queries for mini project 3.

## Install & Load Libraries

In [1]:
!pip install neo4j
import neo4j
import pandas as pd






## Wipe Database
Wipe the database so that node and edge creation starts from scratch.

In [2]:
def connect_db():
    # auth is (username, password), matching the credentials set in the .yml file
    driver = neo4j.GraphDatabase.driver(uri="neo4j://localhost:7687", auth=("neo4j","password"))
    session = driver.session(database="neo4j") # Like `USE mydb;` in SQL
    return session
    
def wipe_out_db(session):
    # wipe out database by deleting all nodes and relationships
    
    # MATCH plays the role of SELECT in SQL; first delete every relationship along with its start node
    query = "match (node)-[relationship]->() delete node, relationship"
    session.run(query)
    
    # then delete the remaining nodes, which no longer have any relationships
    query = "match (node) delete node"
    session.run(query)

session = connect_db()
wipe_out_db(session) 

## Data Insertion
Insert a variety of data for the DSI social network (Nodes: Users, Posts, Groups, Events, Badges; Edges: FRIEND, CREATED, MEMBER_OF, ATTENDING, AWARDED).

- All users have a major of Data Science as this is a DSI social network
- Users' emails follow firstname.lastname@vanderbilt.edu format
- Interests align with data science topics or industry preferences (e.g., healthcare)

In [None]:
# Define the Cypher query for data insertion
dsi_social_network_query = '''
// Users
CREATE (Alice:User {user_id: 1, name: 'Alice Smith', email: 'alice.smith@vanderbilt.edu', join_date: '2018-06-15', major: 'Data Science', interests: ['Machine Learning', 'Healthcare']}),
       (Bob:User {user_id: 2, name: 'Bob Jones', email: 'bob.jones@vanderbilt.edu', join_date: '2019-04-20', major: 'Data Science', interests: ['AI Ethics', 'Data Engineering','Natural Language Processing']}),
       (Charlie:User {user_id: 3, name: 'Charlie Brown', email: 'charlie.brown@vanderbilt.edu', join_date: '2020-09-05', major: 'Data Science', interests: ['Natural Language Processing', 'Finance']}),
       (Diana:User {user_id: 4, name: 'Diana Prince', email: 'diana.prince@vanderbilt.edu', join_date: '2021-11-11', major: 'Data Science', interests: ['Computer Vision', 'Robotics']}),
       (Evan:User {user_id: 5, name: 'Evan Lee', email: 'evan.lee@vanderbilt.edu', join_date: '2022-03-15', major: 'Data Science', interests: ['Data Visualization', 'Statistics']}),
       (Fay:User {user_id: 6, name: 'Fay Wong', email: 'fay.wong@vanderbilt.edu', join_date: '2018-07-21', major: 'Data Science', interests: ['Deep Learning', 'Natural Language Processing']}),
       (Grace:User {user_id: 7, name: 'Grace Kim', email: 'grace.kim@vanderbilt.edu', join_date: '2019-12-10', major: 'Data Science', interests: ['Data Privacy', 'AI Ethics']}),
       (Henry:User {user_id: 8, name: 'Henry Ford', email: 'henry.ford@vanderbilt.edu', join_date: '2023-01-22', major: 'Data Science', interests: ['Bioinformatics', 'Genomics']}),
       (Ivy:User {user_id: 9, name: 'Ivy Nguyen', email: 'ivy.nguyen@vanderbilt.edu', join_date: '2018-10-25', major: 'Data Science', interests: ['Quantum Computing', 'Optimization']}),
       (Jake:User {user_id: 10, name: 'Jake Miller', email: 'jake.miller@vanderbilt.edu', join_date: '2020-08-14', major: 'Data Science', interests: ['Data Engineering', 'Big Data']}),
       (Laura:User {user_id: 11, name: 'Laura Chen', email: 'laura.chen@vanderbilt.edu', join_date: '2021-04-17', major: 'Data Science', interests: ['Cybersecurity', 'Data Privacy']}),
       (Matt:User {user_id: 12, name: 'Matt Green', email: 'matt.green@vanderbilt.edu', join_date: '2020-06-30', major: 'Data Science', interests: ['Computer Vision', 'Healthcare', 'Big Data']})

// Posts
CREATE (Post1:Post {post_id: 1, content: 'Excited to start my journey in data science!', created_date: '2021-07-01', topic: 'Career Advice', likes_count: 10, comments_count: 2, visibility: 'public'}),
       (Post2:Post {post_id: 2, content: 'Anyone interested in a project on NLP?', created_date: '2022-01-15', topic: 'Project Sharing', likes_count: 15, comments_count: 5, visibility: 'friends-only'}),
       (Post3:Post {post_id: 3, content: 'Machine Learning in Finance is fascinating!', created_date: '2021-11-20', topic: 'Finance', likes_count: 20, comments_count: 3, visibility: 'public'}),
       (Post4:Post {post_id: 4, content: 'Exploring AI Ethics and its impact.', created_date: '2020-08-25', topic: 'AI Ethics', likes_count: 8, comments_count: 1, visibility: 'public'}),
       (Post5:Post {post_id: 5, content: 'Tips on Data Engineering best practices?', created_date: '2023-02-01', topic: 'Data Engineering', likes_count: 5, comments_count: 0, visibility: 'public'}),
       (Post6:Post {post_id: 6, content: 'Looking for a team to join for a statistics project.', created_date: '2022-09-05', topic: 'Project Sharing', likes_count: 12, comments_count: 3, visibility: 'friends-only'}),
       (Post7:Post {post_id: 7, content: 'The future of Quantum Computing in AI.', created_date: '2019-04-10', topic: 'Quantum Computing', likes_count: 22, comments_count: 4, visibility: 'public'}),
       (Post8:Post {post_id: 8, content: 'Computer Vision techniques I recently learned.', created_date: '2021-05-18', topic: 'Computer Vision', likes_count: 18, comments_count: 2, visibility: 'public'}),
       (Post9:Post {post_id: 9, content: 'Applications of Bioinformatics in disease research.', created_date: '2023-03-10', topic: 'Bioinformatics', likes_count: 30, comments_count: 6, visibility: 'public'}),
       (Post10:Post {post_id: 10, content: 'Big Data challenges and how to solve them.', created_date: '2019-12-05', topic: 'Big Data', likes_count: 17, comments_count: 1, visibility: 'public'}),
       (Post11:Post {post_id: 11, content: 'Navigating the world of Cybersecurity', created_date: '2020-03-22', topic: 'Cybersecurity', likes_count: 25, comments_count: 5, visibility: 'public'}),
       (Post12:Post {post_id: 12, content: 'Data Privacy issues in the modern age', created_date: '2021-09-01', topic: 'Data Privacy', likes_count: 10, comments_count: 2, visibility: 'public'})

// Groups
CREATE (Group1:Group {group_id: 1, name: 'Healthcare AI', description: 'For those interested in AI applications in healthcare.', created_date: '2021-06-10', topic: 'Healthcare'}),
       (Group2:Group {group_id: 2, name: 'NLP Enthusiasts', description: 'A group for people interested in NLP projects.', created_date: '2022-06-15', topic: 'Natural Language Processing'}),
       (Group3:Group {group_id: 3, name: 'AI Ethics & Society', description: 'Discuss the ethical implications of AI.', created_date: '2019-03-25', topic: 'AI Ethics'}),
       (Group4:Group {group_id: 4, name: 'Quantum Computing Group', description: 'For those interested in quantum applications in AI.', created_date: '2018-09-30', topic: 'Quantum Computing'}),
       (Group5:Group {group_id: 5, name: 'Computer Vision Innovators', description: 'Exploring computer vision and its applications.', created_date: '2020-10-20', topic: 'Computer Vision'})

// Events
CREATE (Event1:Event {event_id: 1, name: 'DSI Workshop on Machine Learning', location: 'Online', event_date: '2023-08-01', topic: 'Machine Learning'}),
       (Event2:Event {event_id: 2, name: 'NLP Networking Event', location: 'Campus Hall A', event_date: '2023-05-15', topic: 'Natural Language Processing'}),
       (Event3:Event {event_id: 3, name: 'Ethics in AI Panel', location: 'Main Auditorium', event_date: '2022-11-12', topic: 'AI Ethics'}),
       (Event4:Event {event_id: 4, name: 'Quantum Computing Conference', location: 'Research Lab', event_date: '2023-02-20', topic: 'Quantum Computing'}),
       (Event5:Event {event_id: 5, name: 'Data Engineering Meetup', location: 'Tech Building Room 202', event_date: '2023-04-05', topic: 'Data Engineering'})

// Badges
CREATE (Badge1:Badge {badge_id: 1, name: 'Top Contributor', description: 'Awarded for high activity on the platform', awarded_date: '2023-01-05'}),
       (Badge2:Badge {badge_id: 2, name: 'Data Science Mentor', description: 'Awarded for mentoring others', awarded_date: '2023-02-15'}),
       (Badge3:Badge {badge_id: 3, name: 'AI Ethics Expert', description: 'Recognized for expertise in AI ethics', awarded_date: '2023-03-20'})

// Relationships
CREATE (Alice)-[:FRIEND {since: '2019-05-10'}]->(Bob),
       (Bob)-[:FRIEND {since: '2020-10-15'}]->(Charlie),
       (Charlie)-[:FRIEND {since: '2021-11-11'}]->(Diana),
       (Matt)-[:FRIEND {since: '2023-08-23'}]->(Charlie),
       (Ivy)-[:FRIEND {since: '2023-03-14'}]->(Henry),
       (Henry)-[:FRIEND {since: '2024-11-05'}]->(Laura),
       (Alice)-[:CREATED]->(Post1),
       (Bob)-[:CREATED]->(Post2),
       (Charlie)-[:CREATED]->(Post3),
       (Bob)-[:CREATED]->(Post4),
       (Bob)-[:CREATED]->(Post5),
       (Fay)-[:CREATED]->(Post6),
       (Ivy)-[:CREATED]->(Post7),
       (Matt)-[:CREATED]->(Post8),
       (Henry)-[:CREATED]->(Post9),
       (Matt)-[:CREATED]->(Post10),
       (Laura)-[:CREATED]->(Post11),
       (Bob)-[:CREATED]->(Post12),
       (Alice)-[:MEMBER_OF {join_date: '2021-06-11'}]->(Group1),
       (Bob)-[:MEMBER_OF {join_date: '2022-06-15'}]->(Group2),
       (Charlie)-[:MEMBER_OF {join_date: '2023-02-05'}]->(Group2),
       (Fay)-[:MEMBER_OF {join_date: '2022-10-12'}]->(Group2),
       (Grace)-[:MEMBER_OF {join_date: '2023-10-12'}]->(Group3),
       (Bob)-[:MEMBER_OF {join_date: '2023-10-12'}]->(Group3),
       (Ivy)-[:MEMBER_OF {join_date: '2022-10-12'}]->(Group4),
       (Bob)-[:AWARDED]->(Badge1),
       (Evan)-[:AWARDED {awarded_date: '2023-02-01'}]->(Badge2),
       (Grace)-[:AWARDED {awarded_date: '2024-10-31'}]->(Badge2),
       (Grace)-[:AWARDED {awarded_date: '2023-03-01'}]->(Badge3),
       (Charlie)-[:ATTENDING {rsvp_date: '2023-07-20'}]->(Event1),
       (Alice)-[:ATTENDING {rsvp_date: '2023-07-31'}]->(Event1),
       (Matt)-[:ATTENDING {rsvp_date: '2023-07-05'}]->(Event1),
       (Fay)-[:ATTENDING {rsvp_date: '2023-05-14'}]->(Event2),
       (Charlie)-[:ATTENDING {rsvp_date: '2023-05-01'}]->(Event2),
       (Grace)-[:ATTENDING {rsvp_date: '2022-11-01'}]->(Event3),
       (Jake)-[:ATTENDING {rsvp_date: '2023-04-01'}]->(Event5),
       (Matt)-[:ATTENDING {rsvp_date: '2023-03-30'}]->(Event5)
'''

# Execute the query in the Neo4j session
session.run(dsi_social_network_query)


<neo4j._sync.work.result.Result at 0x1f6e586b820>

## Data Retrieval & Querying
`user_id` is used in the queries below to ensure uniqueness and avoid ambiguity, since a future user could join with the same name as an existing one.

### 1. Retrieve a property of a specific User

In [4]:
# Swap u.email for any other User property as needed
query = '''
MATCH (u:User {user_id: 1})
RETURN u.name AS name, u.email AS email
'''
result = session.run(query)
user_property_df = pd.DataFrame([record.data() for record in result])
print(user_property_df)

          name                       email
0  Alice Smith  alice.smith@vanderbilt.edu
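
The property lookup above could be wrapped in a small helper so that the `user_id` is passed as a Cypher parameter instead of being written into the query text. The sketch below is one possible shape, not part of the original notebook: `get_user_property` and `ALLOWED_PROPERTIES` are hypothetical names, and `session` is assumed to be the object returned by `connect_db()`.

```python
# Hypothetical helper (not in the original notebook): fetch one property
# of a User by user_id, passing the id as a Cypher query parameter.
ALLOWED_PROPERTIES = {"name", "email", "join_date", "major", "interests"}


def get_user_property(session, user_id, prop):
    """Return the requested property for every User matching user_id."""
    # In the `u.<property>` form the property name is part of the query text,
    # so it is checked against a whitelist before being interpolated.
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"Unknown User property: {prop}")
    query = f"MATCH (u:User {{user_id: $user_id}}) RETURN u.{prop} AS value"
    # Keyword arguments to session.run are sent as query parameters
    result = session.run(query, user_id=user_id)
    return [record["value"] for record in result]
```

With the notebook's session, `get_user_property(session, 1, "email")` would be expected to return Alice's email in a one-element list.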


### 2. Find all Posts created by a specific User

In [5]:
query = '''
MATCH (u:User {user_id: 2})-[:CREATED]->(p:Post)
RETURN p.post_id AS post_id, p.content AS content, p.created_date AS created_date, p.topic AS topic, p.likes_count AS total_likes, p.comments_count AS total_comments, p.visibility AS visibility
'''
result = session.run(query)
user_posts_df = pd.DataFrame([record.data() for record in result])
print(user_posts_df)

   post_id                                   content created_date  \
0        2    Anyone interested in a project on NLP?   2022-01-15   
1        5  Tips on Data Engineering best practices?   2023-02-01   
2       12     Data Privacy issues in the modern age   2021-09-01   
3        4       Exploring AI Ethics and its impact.   2020-08-25   

              topic  total_likes  total_comments    visibility  
0   Project Sharing           15               5  friends-only  
1  Data Engineering            5               0        public  
2      Data Privacy           10               2        public  
3         AI Ethics            8               1        public  


### 3. Find all Users who posted a specific topic of Post

In [6]:
query = '''
MATCH (u:User)-[:CREATED]->(p:Post {topic: 'Project Sharing'})
RETURN u.user_id AS user_id, u.name AS name
'''
result = session.run(query)
users_by_topic_df = pd.DataFrame([record.data() for record in result])
print(users_by_topic_df)

   user_id       name
0        2  Bob Jones
1        6   Fay Wong


### 4. Find common interests between two specific Users

In [7]:
query = '''
MATCH (u1:User {user_id: 1}), (u2:User {user_id: 12})
RETURN [interest IN u1.interests WHERE interest IN u2.interests] AS common_interests
'''
result = session.run(query)
common_interests_df = pd.DataFrame([record.data() for record in result])
print(common_interests_df)

  common_interests
0     [Healthcare]
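
For readers less familiar with Cypher list comprehensions, the same filtering can be sketched in plain Python. The two interest lists are copied from the User nodes for Alice (`user_id` 1) and Matt (`user_id` 12); the function name `common_interests` is only illustrative.

```python
# Interests copied from the User nodes for Alice (user_id 1) and Matt (user_id 12)
alice_interests = ["Machine Learning", "Healthcare"]
matt_interests = ["Computer Vision", "Healthcare", "Big Data"]


def common_interests(first, second):
    """Keep the items of `first` that also appear in `second`, preserving order."""
    second_set = set(second)
    return [interest for interest in first if interest in second_set]


print(common_interests(alice_interests, matt_interests))  # prints ['Healthcare']
```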


### 5. Retrieve top 3 Users who created most Posts

In [8]:
query = '''
MATCH (u:User)-[:CREATED]->(p:Post)
RETURN u.user_id AS user_id, u.name AS name, COUNT(p) AS post_count
ORDER BY post_count DESC
LIMIT 3
'''
result = session.run(query)
top_post_creators_df = pd.DataFrame([record.data() for record in result])
print(top_post_creators_df)

   user_id           name  post_count
0        2      Bob Jones           4
1       12     Matt Green           2
2        3  Charlie Brown           1
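
Note that third place is a tie: six users each created exactly one post, so which of them appears after `LIMIT 3` is not guaranteed without a secondary sort key (in Cypher, for example, `ORDER BY post_count DESC, user_id`). The sketch below reproduces the count in plain Python with an explicit tie-break on `user_id`. That tie-break rule is an assumption for illustration, not something the notebook specifies, which is why the third entry differs from the output above.

```python
from collections import Counter

# (user_id, post_id) pairs copied from the CREATED relationships in the insertion cell
created = [(1, 1), (2, 2), (3, 3), (2, 4), (2, 5), (6, 6),
           (9, 7), (12, 8), (8, 9), (12, 10), (11, 11), (2, 12)]

# Count how many posts each user created
post_counts = Counter(user_id for user_id, _ in created)

# Sort by post count (descending), then by user_id (ascending) to break ties
top_three = sorted(post_counts.items(), key=lambda item: (-item[1], item[0]))[:3]
print(top_three)  # prints [(2, 4), (12, 2), (1, 1)]
```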


### 6. Retrieve Users who haven’t created any Posts

In [9]:
query = '''
MATCH (u:User)
WHERE NOT (u)-[:CREATED]->(:Post)
RETURN u.user_id AS user_id, u.name AS name
'''
result = session.run(query)
users_without_posts_df = pd.DataFrame([record.data() for record in result])
print(users_without_posts_df)

   user_id          name
0        4  Diana Prince
1        5      Evan Lee
2        7     Grace Kim
3       10   Jake Miller


### 7. Given two Users, identify if they are indirectly connected through a chain of friends and, if so, return the connecting path

In [10]:
query = '''
MATCH path = (u1:User {user_id: 1})-[:FRIEND*1..5]-(u2:User {user_id: 3})
RETURN path
LIMIT 1
'''
result = session.run(query)
friendship_path_df = pd.DataFrame([record.data() for record in result])
print(friendship_path_df)

                                                path
0  [{'major': 'Data Science', 'user_id': 1, 'join...


Here is another way to query the chain of friends that prints the path in a readable format:

In [11]:
query = '''
MATCH path = (u1:User {user_id: 1})-[:FRIEND*1..5]-(u2:User {user_id: 3})
RETURN path
LIMIT 1
'''
result = session.run(query)

# Extract nodes and relationships
output = []  # Initialized up front so the print below works even if no path is found
for record in result:
    path = record["path"]  # Get the path object
    nodes = path.nodes      # Get the nodes along the path
    relationships = path.relationships  # Get the relationships along the path

    # Build a human-readable output, one entry per FRIEND hop
    for i in range(len(relationships)):
        start_node = nodes[i]
        end_node = nodes[i + 1]

        # Append to output
        output.append(
            f"{start_node['name']} is friends with {end_node['name']}"
        )

# Print the output
print(" -> ".join(output) if output else "No friendship path found")

Alice Smith is friends with Bob Jones -> Bob Jones is friends with Charlie Brown
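
To make the traversal behind the variable-length `FRIEND*1..5` pattern more concrete, here is a minimal in-memory sketch using breadth-first search. It is not how Neo4j evaluates the query. It only illustrates the idea of following FRIEND edges in either direction up to a hop limit. The edge list is copied from the insertion cell, names stand in for `user_id` purely for readability, and `shortest_friend_path` is a hypothetical helper.

```python
from collections import deque

# FRIEND relationships copied from the data insertion cell (names used for readability)
friend_edges = [
    ("Alice Smith", "Bob Jones"),
    ("Bob Jones", "Charlie Brown"),
    ("Charlie Brown", "Diana Prince"),
    ("Matt Green", "Charlie Brown"),
    ("Ivy Nguyen", "Henry Ford"),
    ("Henry Ford", "Laura Chen"),
]


def build_adjacency(edges):
    """Treat each FRIEND edge as undirected, like the `-[:FRIEND]-` pattern."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_friend_path(edges, start, goal, max_hops=5):
    """Breadth-first search; returns the shortest path as a list of names, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # Do not extend paths beyond the hop limit
        for neighbor in sorted(adjacency.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_friend_path(friend_edges, "Alice Smith", "Charlie Brown"))
```

For users in separate friend clusters, such as Alice and Ivy, the function returns `None`, which corresponds to the Cypher query returning no rows.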


### 8. Write a query to identify orphaned Users

In [12]:
query = '''
MATCH (u:User)
WHERE NOT (u)-[:FRIEND]-()
RETURN u.user_id AS user_id, u.name AS name
'''
result = session.run(query)
orphaned_users_df = pd.DataFrame([record.data() for record in result])
print(orphaned_users_df)

   user_id         name
0        5     Evan Lee
1        6     Fay Wong
2        7    Grace Kim
3       10  Jake Miller


Identifying orphaned users (here, users with no FRIEND relationships) is useful for improving user engagement and retention. Because these users lack connections, they may feel isolated and be less likely to interact on the platform. Once they are identified, targeted strategies can increase their engagement:

- Connection Recommendations: Suggest relevant users, posts, or groups based on shared interests to help them build connections, integrating them into the community.
- Support & Onboarding: Offer personalized mentorship programs that help these users make their first connections.
- Event Invitations: Event organizers can send invites to these users so they can come and converse with others, fostering a sense of belonging and active participation.

Together, these strategies would help create a more welcoming and engaging environment, boosting overall user satisfaction and retention.
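
As a rough illustration of the connection-recommendation idea, the sketch below ranks other users by the number of interests they share with each orphaned user. The interests are copied from the User nodes and the orphaned list from the query output above. The scoring rule (count of shared interests, ties broken alphabetically) and the name `recommend_connections` are assumptions for illustration, not part of the notebook.

```python
# Interests copied from the User nodes in the data insertion cell
interests = {
    "Alice Smith": {"Machine Learning", "Healthcare"},
    "Bob Jones": {"AI Ethics", "Data Engineering", "Natural Language Processing"},
    "Charlie Brown": {"Natural Language Processing", "Finance"},
    "Diana Prince": {"Computer Vision", "Robotics"},
    "Evan Lee": {"Data Visualization", "Statistics"},
    "Fay Wong": {"Deep Learning", "Natural Language Processing"},
    "Grace Kim": {"Data Privacy", "AI Ethics"},
    "Henry Ford": {"Bioinformatics", "Genomics"},
    "Ivy Nguyen": {"Quantum Computing", "Optimization"},
    "Jake Miller": {"Data Engineering", "Big Data"},
    "Laura Chen": {"Cybersecurity", "Data Privacy"},
    "Matt Green": {"Computer Vision", "Healthcare", "Big Data"},
}

# Orphaned users as returned by the query in this section
orphaned = ["Evan Lee", "Fay Wong", "Grace Kim", "Jake Miller"]


def recommend_connections(user, interests, limit=3):
    """Rank other users by shared-interest count (hypothetical scoring rule)."""
    scored = []
    for other, other_interests in interests.items():
        if other == user:
            continue
        shared = interests[user] & other_interests
        if shared:
            scored.append((other, sorted(shared)))
    # Most shared interests first; alphabetical order breaks ties
    scored.sort(key=lambda pair: (-len(pair[1]), pair[0]))
    return scored[:limit]


for user in orphaned:
    print(user, "->", recommend_connections(user, interests))
```

On this dataset Evan Lee gets no suggestions because nobody shares his interests, which hints that interest-based recommendations alone may not reach every orphaned user and that the event and mentorship strategies remain relevant.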

### Additional: Retrieve all Badges Awarded to each User

In [13]:
query = '''
MATCH (u:User)-[a:AWARDED]->(b:Badge)
RETURN u.name AS user_name, b.name AS badge_name, b.description AS badge_description, a.awarded_date AS awarded_date
ORDER BY u.name, awarded_date
'''
result = session.run(query)
awarded_badges_df = pd.DataFrame([record.data() for record in result])
print(awarded_badges_df)


   user_name           badge_name                          badge_description  \
0  Bob Jones      Top Contributor  Awarded for high activity on the platform   
1   Evan Lee  Data Science Mentor               Awarded for mentoring others   
2  Grace Kim     AI Ethics Expert      Recognized for expertise in AI ethics   
3  Grace Kim  Data Science Mentor               Awarded for mentoring others   

  awarded_date  
0         None  
1   2023-02-01  
2   2023-03-01  
3   2024-10-31  
