In [5]:
%load_ext sql

In [7]:
from dotenv import load_dotenv
import os

load_dotenv()

user = os.getenv("MYSQL_USER")
password = os.getenv("MYSQL_PASSWORD")
host = os.getenv("MYSQL_HOST")
port = os.getenv("MYSQL_PORT")
db = os.getenv("MYSQL_DB")

connection_str = f"mysql+pymysql://{user}:{password}@{host}:{port}/{db}"

%sql $connection_str
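If the password or user name contains characters such as `@`, `:` or `/`, interpolating it directly into the URL can yield a string that is parsed incorrectly. A minimal sketch of percent-encoding the credentials first, assuming the same `MYSQL_*` values as above; `build_mysql_url` is a hypothetical helper name and the credentials shown are made up:

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, db):
    """Build a mysql+pymysql URL with percent-encoded credentials.

    Hypothetical helper, not part of the original notebook.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )


# Made-up credentials, used only to show the effect of the encoding.
example_url = build_mysql_url("root", "p@ss:word/1", "localhost", "3306", "twitterdb")
print(example_url)
```

The resulting string can be passed to `%sql` in the same way as `connection_str`.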

## 1. Scenario Q1
### Analysis of most active users
You are a Data Analyst at Twitter tasked with analysing user engagement. You
need to find out which users have the highest engagement on the platform. For this
analysis, consider a user to be engaged if they have sent messages, liked tweets,
been mentioned, received notifications, and retweeted tweets.

In [20]:
%%sql
SELECT
    u.UserID, u.Username, u.FullName,
    COUNT(m.SenderID) AS total_messages_sent,
    COUNT(l.LikeID) AS total_liked_tweets,
    COUNT(men.MentionedUserID) AS total_mentions,
    COUNT(n.NotificationID) AS total_notifications_received,
    COUNT(r.RetweetID) AS total_retweets,
    (
        COUNT(m.SenderID) +
        COUNT(l.LikeID) +
        COUNT(men.MentionedUserID) +
        COUNT(n.NotificationID) +
        COUNT(r.RetweetID)
    ) AS total_engagements
FROM
    users u
INNER JOIN directmessages m
ON u.UserID = m.SenderID
INNER JOIN likes l
ON u.UserID = l.LikeID
INNER JOIN mentions men
ON u.UserID = men.MentionedUserID
INNER JOIN notifications n
ON u.UserID = n.NotificationID
INNER JOIN retweets r
ON u.UserID = r.RetweetID
GROUP BY
    (u.UserID)
ORDER BY
    total_messages_sent DESC, total_liked_tweets DESC,
    total_mentions DESC, total_notifications_received DESC,
    total_retweets DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
5 rows affected.


UserID,Username,FullName,total_messages_sent,total_liked_tweets,total_mentions,total_notifications_received,total_retweets,total_engagements
397.0,joseph47,Douglas Nichols,15,15,15,15,15,75
258.0,christopher66,James Gomez,9,9,9,9,9,45
461.0,gkelly,Jeffrey Meadows,8,8,8,8,8,40
268.0,rachelflores,Michelle Myers MD,8,8,8,8,8,40
300.0,michael20,Penny Curtis,8,8,8,8,8,40
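Every count column in the result above is identical for a given user. That is the pattern plain `COUNT` produces when several one-to-many joins are chained: each message row is paired with every like row, and so on, so all columns report the size of the combined row set. The query also joins `likes`, `notifications` and `retweets` on their row identifiers (`LikeID`, `NotificationID`, `RetweetID`) rather than on a user column such as the `UserID` used in later scenarios. A minimal sketch of the fan-out effect and of `COUNT(DISTINCT ...)` as one way around it, using an in-memory SQLite database with made-up rows (not the `twitterdb` data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE directmessages (MessageID INTEGER PRIMARY KEY, SenderID INTEGER);
    CREATE TABLE likes (LikeID INTEGER PRIMARY KEY, UserID INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    -- alice sent 3 messages and liked 2 tweets
    INSERT INTO directmessages VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO likes VALUES (20, 1), (21, 1);
""")

# Chained joins pair every message row with every like row (3 x 2 = 6 rows),
# so plain COUNT reports 6 in both columns.
naive = conn.execute("""
    SELECT COUNT(m.MessageID), COUNT(l.LikeID)
    FROM users u
    JOIN directmessages m ON u.UserID = m.SenderID
    JOIN likes l ON u.UserID = l.UserID
    GROUP BY u.UserID
""").fetchone()

# Joining on the user column and counting distinct row ids undoes the duplication.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT m.MessageID), COUNT(DISTINCT l.LikeID)
    FROM users u
    LEFT JOIN directmessages m ON u.UserID = m.SenderID
    LEFT JOIN likes l ON u.UserID = l.UserID
    GROUP BY u.UserID
""").fetchone()

print(naive)     # (6, 6)
print(distinct)  # (3, 2)
```

Scenario 7 below follows the second pattern (`LEFT JOIN` plus `COUNT(DISTINCT ...)`), which is why its columns differ from one another.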


## 2. Scenario Q2
### Find Users with Most Followers
Retrieve the top 5 users with the most followers, including their UserID, Username,
FullName, and FollowerCount.

In [55]:
%%sql

SELECT
    u.UserID, u.Username, u.FullName,
    COUNT(f.FollowedUserID) AS FollowerCount
FROM
    users u
INNER JOIN followers f
ON u.UserID = f.FollowerUserID
GROUP BY
    u.UserID
ORDER BY
    FollowerCount DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
5 rows affected.


UserID,Username,FullName,FollowerCount
363.0,evansimmons,Brandon Mosley,4
153.0,kanerussell,Kristen Wade,4
335.0,trivas,Stacey Brown MD,4
95.0,ryanadams,Christopher Williams,4
456.0,fclark,Melinda Huynh,4
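The query above joins `users` to `followers` on `FollowerUserID`. If that column holds the user who does the following (as its name and the use of `FollowedUserID` in Scenarios 7, 10 and 14 suggest), the count is the number of accounts each user follows rather than the number of followers they have. A minimal sketch of the difference, using an in-memory SQLite database with made-up rows; the column meanings are an assumption inferred from the names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE followers (
        FollowerID INTEGER PRIMARY KEY,
        FollowerUserID INTEGER,   -- assumed: the user who follows
        FollowedUserID INTEGER    -- assumed: the user being followed
    );
    -- users 2, 3 and 4 all follow user 1; user 1 follows nobody
    INSERT INTO followers (FollowerUserID, FollowedUserID)
    VALUES (2, 1), (3, 1), (4, 1);
""")

# Grouping by the followed user counts followers.
follower_counts = dict(conn.execute("""
    SELECT FollowedUserID, COUNT(*) FROM followers GROUP BY FollowedUserID
"""))

# Grouping by the following user counts accounts followed.
following_counts = dict(conn.execute("""
    SELECT FollowerUserID, COUNT(*) FROM followers GROUP BY FollowerUserID
"""))

print(follower_counts)   # {1: 3}
print(following_counts)  # {2: 1, 3: 1, 4: 1}
```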


## 3. Scenario Q3
### Identify Users with Unread Notifications
Identify all users who have unread notifications. Display their UserID, Username,
FullName, and the count of UnreadNotifications.

In [30]:
%%sql

SELECT
    u.UserID, u.Username, u.FullName,
    COUNT(n.notificationID) AS total_unread_notifications
FROM
    users u
INNER JOIN notifications n
ON u.UserID = n.UserID
WHERE
    n.IsRead = 0
GROUP BY
    u.UserID
ORDER BY
    total_unread_notifications DESC;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
189 rows affected.


UserID,Username,FullName,total_unread_notifications
80.0,tayloralexis,Lisa Davis,3
353.0,hickmancarrie,Tina Lopez,3
369.0,rlynch,Tyler Wilkerson,3
8.0,katherinerogers,Stacy Chan,3
317.0,cmiller,Sarah Henderson,3
426.0,lwilliams,Savannah Noble,3
466.0,tmccarty,Laura Nguyen,3
424.0,elopez,Duane Everett,3
40.0,thomasjeffery,Bradley Berg,2
408.0,wdavidson,Amber Newman,2


## 4. Scenario Q4
### Identify Most Liked Tweets
Find the top 5 tweets that have received the most likes, including their TweetID and
LikeCount. Also, retrieve the UserID, Username, and FullName of the users who
posted these tweets.

In [33]:
%%sql

SELECT
    u.UserID, u.Username, u.FullName,
    t.TweetID, t.TweetText,
    COUNT(l.LikeID) AS LikeCount
FROM users u
INNER JOIN tweets t
ON u.UserID = t.UserID
INNER JOIN likes l
ON t.TweetID = l.TweetID
GROUP BY
    u.UserID, t.TweetID
ORDER BY
    LikeCount DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
5 rows affected.


UserID,Username,FullName,TweetID,TweetText,LikeCount
305.0,jason06,Dennis Davis,245.0,Medical executive great fear much term rate establish. Soon air view Mrs attack room feeling involve. Decision character theory memory yes. Tough win walk really discuss. Choose nor exactly paper. House student play win popular member need floor.,5
100.0,rmartin,Jessica Carpenter,73.0,Everyone no garden evidence contain. Safe up include skin ahead cell cultural. Tonight anything through skin dog near. Morning staff into keep third responsibility. Campaign heart radio others much. Seek choose yet grow summer so certainly. Campaign agent form enough black.,4
372.0,nortiz,James Hernandez,266.0,New improve relationship all participant herself popular. Detail only strong color. Power and life hour her especially couple program. Claim product edge. Soon according generation we sit. Someone cell indicate because whose politics. Alone film successful fine prove these.,4
44.0,mcmahonanna,Morgan Jackson,217.0,Republican white stop catch senior. What perform more budget rest produce. Back shake government card. Me must strong help health rate. Look name season material nothing. Result never never spring still size pass human. Forward nature sing purpose.,4
491.0,johnsonmonica,Cheyenne Aguirre,180.0,Traditional spring operation community could our. Real approach many treatment. Heart budget program particular half send operation window. Then phone character shake.,4


## 5. Scenario Q5
### Determine User Growth Over Time
Analyse the growth of users joining the platform over time. Retrieve the count of users
who joined in each month of each year.

In [39]:
%%sql
SELECT
    MONTH(JoinDate) AS month_join,
    YEAR(JoinDate) AS year_join,
    COUNT(UserID) AS number_of_users
FROM
    users
GROUP BY
    year_join, month_join
ORDER BY
    year_join ASC, month_join ASC;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
61 rows affected.


month_join,year_join,number_of_users
9,2018,2
10,2018,7
11,2018,9
12,2018,7
1,2019,12
2,2019,4
3,2019,14
4,2019,11
5,2019,6
6,2019,5
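Monthly sign-up counts are easier to read as growth once they are accumulated into a running total. A small worked example using only the ten rows shown above (the remaining months of the 61-row result are not included):

```python
from itertools import accumulate

# (month, year, new_users) rows copied from the output above (first 10 months only).
monthly = [
    (9, 2018, 2), (10, 2018, 7), (11, 2018, 9), (12, 2018, 7),
    (1, 2019, 12), (2, 2019, 4), (3, 2019, 14), (4, 2019, 11),
    (5, 2019, 6), (6, 2019, 5),
]

# Running total of users after each month.
running_total = list(accumulate(n for _, _, n in monthly))

for (month, year, n), total in zip(monthly, running_total):
    print(f"{year}-{month:02d}: +{n:2d} -> {total}")
```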


## 6. Scenario Q6
### Analyse Notification Engagement
Evaluate how users are engaging with notifications. Find the number of notifications
sent to users, the number of notifications read, and the read rate (percentage of
notifications read) for each NotificationType.


In [52]:
%%sql
SELECT
    NotificationType,
    COUNT(NotificationID) AS total_notifications_sent,
    SUM(CASE WHEN IsRead = 1 THEN 1 ELSE 0 END) AS read_notifications,
    CONCAT(ROUND((SUM(CASE WHEN IsRead = 1 THEN 1 ELSE 0 END)/COUNT(NotificationID)) * 100, 2), '%') AS read_percentage
FROM
    notifications
GROUP BY
    NotificationType
ORDER BY
    total_notifications_sent DESC, read_notifications DESC, read_percentage DESC;


 * mysql+pymysql://root:***@localhost:3306/twitterdb
3 rows affected.


NotificationType,total_notifications_sent,read_notifications,read_percentage
Follow,176,85,48.30%
Like,167,91,54.49%
Comment,157,82,52.23%
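The read rate is `read / sent * 100`, rounded to two decimals. Because the query wraps the value in `CONCAT(..., '%')`, `read_percentage` is a string, so using it as a sort key compares characters rather than magnitudes. That does not affect the result above, where the first sort key already separates the three rows. A short check of the arithmetic and of the string-sorting effect (the three percentages in the second part are made up):

```python
# (type, sent, read) copied from the result above.
rows = [("Follow", 176, 85), ("Like", 167, 91), ("Comment", 157, 82)]

# Read rate in percent, rounded to two decimals as in the query.
rates = {name: round(read / sent * 100, 2) for name, sent, read in rows}
print(rates)

# Sorting formatted strings compares characters, not numeric values.
# These three values are made up to show the effect.
as_text = sorted(["9.50%", "48.30%", "100.00%"], reverse=True)
as_number = sorted([9.50, 48.30, 100.00], reverse=True)
print(as_text)    # ['9.50%', '48.30%', '100.00%']
print(as_number)  # [100.0, 48.3, 9.5]
```

Sorting on the numeric expression and formatting only in the select list would avoid the issue.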


## 7. Scenario Q7
### Identify Influential Users and Their Impact on User Engagement
Twitter is interested in identifying influential users on its platform to understand their
impact on user engagement. Influential users are characterized by a high number of
followers and frequent interactions (e.g., messages, likes, mentions, retweets). By
identifying these users, Twitter can analyze their content, engagement patterns, and
potentially leverage their influence for marketing and promotional activities.
Specific Tasks:
#### a. Identify Influential Users
Find the top 5 users with the most followers and list their
UserID, Username, FullName, and the number of followers.
#### b. Analyze User Engagement
For each of the identified influential users, calculate
their engagement on the platform by counting the number of messages sent,
tweets liked, mentions received, notifications received, and tweets retweeted.

In [67]:
%%sql

WITH TopFollowers AS (
    SELECT 
        f.FollowedUserID AS UserID,
        COUNT(f.FollowerID) AS NumberOfFollowers
    FROM followers f
    GROUP BY f.FollowedUserID
    ORDER BY COUNT(f.FollowerID) DESC
    LIMIT 5
)
SELECT
    u.UserID,
    u.Username,
    u.FullName,
    tf.NumberOfFollowers,
    COUNT(DISTINCT m.MessageID) AS NumberOfMessagesSent,
    COUNT(DISTINCT l.LikeID) AS TweetsLiked,
    COUNT(DISTINCT men.MentionID) AS MentionsReceived,
    COUNT(DISTINCT n.NotificationID) AS NotificationsReceived,
    COUNT(DISTINCT r.RetweetID) AS TweetsRetweeted
FROM TopFollowers tf
JOIN users u ON u.UserID = tf.UserID
LEFT JOIN directmessages m ON u.UserID = m.SenderID
LEFT JOIN likes l ON u.UserID = l.UserID
LEFT JOIN mentions men ON u.UserID = men.UserID
LEFT JOIN notifications n ON u.UserID = n.UserID
LEFT JOIN retweets r ON u.UserID = r.UserID
GROUP BY u.UserID, u.Username, u.FullName, tf.NumberOfFollowers
ORDER BY tf.NumberOfFollowers DESC;


 * mysql+pymysql://root:***@localhost:3306/twitterdb
5 rows affected.


UserID,Username,FullName,NumberOfFollowers,NumberOfMessagesSent,TweetsLiked,MentionsReceived,NotificationsReceived,TweetsRetweeted
220.0,crystal20,Julie Porter,5,1,1,0,1,5
2.0,joseph66,Sydney Taylor,4,0,0,1,0,2
27.0,vpierce,Anne Lopez,4,2,0,1,0,0
31.0,cgardner,Jessica Hansen,4,3,3,2,3,0
59.0,laurasparks,Jennifer Smith,4,0,0,1,0,1
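Four of the five users above are tied at 4 followers, and `LIMIT 5` cuts such a tie at an arbitrary row if more users share that count. A minimal sketch of keeping every user whose count reaches the N-th highest distinct value, using an in-memory SQLite database with made-up rows (N = 2 here to keep the data small):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE followers (FollowerUserID INTEGER, FollowedUserID INTEGER);
    -- user 1 has 3 followers; users 2, 3 and 4 are tied with 2 each
    INSERT INTO followers VALUES
        (5, 1), (6, 1), (7, 1),
        (5, 2), (6, 2),
        (5, 3), (6, 3),
        (5, 4), (6, 4);
""")

# Keep every user whose follower count is at least the 2nd-highest distinct
# count, instead of cutting the tie at an arbitrary row with LIMIT 2.
top_with_ties = conn.execute("""
    WITH counts AS (
        SELECT FollowedUserID AS UserID, COUNT(*) AS n
        FROM followers
        GROUP BY FollowedUserID
    )
    SELECT UserID, n
    FROM counts
    WHERE n >= (
        SELECT MIN(n)
        FROM (SELECT DISTINCT n FROM counts ORDER BY n DESC LIMIT 2)
    )
    ORDER BY n DESC, UserID
""").fetchall()

print(top_with_ties)  # [(1, 3), (2, 2), (3, 2), (4, 2)]
```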


## 8. Scenario Q8
### Track User Retention and Platform Engagement Over Time
Twitter aims to enhance user experience and engagement on its platform to ensure
user retention. The company wants to analyse the activities of users who joined the
platform within the last year and assess their engagement levels to identify patterns
and areas for improvement.
Specific Tasks:
#### a. User Retention Analysis
Determine the number of users who joined in the last year
and are still active on the platform, based on their engagement in various activities
(messages, likes, mentions, retweets) in the last month.
#### b. Engagement Analysis
For the retained users, analyse the frequency of their
engagement in different activities on the platform.

In [86]:
%%sql
# a. User Retention Analysis: Determine the number of users who joined in the last year

CREATE VIEW RecentActiveUsers AS (
    SELECT
        u.UserID, u.Username, u.FullName, u.JoinDate,
        COUNT(m.MessageID) AS MessagesSentLastMonth 
    FROM 
        users u
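    -- The Timestamp filter below discards users without a recent message,
    -- so an INNER JOIN states the intent directly.
    INNER JOIN
        directmessages m
    ON u.UserID = m.SenderID
    WHERE
        u.JoinDate >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR) AND
        (m.`Timestamp` >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH))
    GROUP BY
        u.UserID, u.Username, u.FullName, u.JoinDate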
    ORDER BY
        MessagesSentLastMonth DESC
)

 * mysql+pymysql://root:***@localhost:3306/twitterdb
0 rows affected.


[]

In [87]:
%%sql
# b. Engagement analysis: 

SELECT
    rau.UserID, rau.Username, rau.FullName, rau.JoinDate, rau.MessagesSentLastMonth,
    -- DISTINCT keeps the chained one-to-many joins from inflating each count
    COUNT(DISTINCT l.LikeID) AS likeMade,
    COUNT(DISTINCT men.MentionID) AS MentionsMade,
    COUNT(DISTINCT n.NotificationID) AS NotificationMade,
    COUNT(DISTINCT r.RetweetID) AS RetweetMade,
    COUNT(DISTINCT t.TweetID) AS TweetMade
FROM
    RecentActiveUsers rau
LEFT JOIN likes l
ON rau.UserID = l.UserID
LEFT JOIN mentions men
ON rau.UserID = men.MentionedUserID
LEFT JOIN notifications n
ON rau.UserID = n.UserID
LEFT JOIN retweets r
ON rau.UserID = r.UserID
LEFT JOIN tweets t
ON rau.UserID = t.UserID
GROUP BY
    rau.UserID, rau.Username, rau.FullName, rau.JoinDate, rau.MessagesSentLastMonth
ORDER BY
    rau.JoinDate ASC;


 * mysql+pymysql://root:***@localhost:3306/twitterdb
0 rows affected.


UserID,Username,FullName,JoinDate,MessagesSentLastMonth,likeMade,MentionsMade,NotificationMade,RetweetMade,TweetMade
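The view only keeps users who joined within the last year and also sent a message within the last month, both measured from the day the query runs, so an empty result is expected whenever the data is older than those windows. Scenario 11 likewise finds no recent messages or tweets for any of the 500 users it returns. A related detail: a `WHERE` condition on a column of the `LEFT JOIN`ed table removes unmatched users entirely, whereas the same condition in the `ON` clause keeps them with a count of 0. A minimal sketch using an in-memory SQLite database with made-up rows and a hypothetical fixed cutoff date in place of `CURDATE()`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE directmessages (
        MessageID INTEGER PRIMARY KEY, SenderID INTEGER, SentOn TEXT
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    -- alice has one recent message, bob only an old one
    INSERT INTO directmessages VALUES
        (10, 1, '2024-06-20'), (11, 2, '2023-01-05');
""")
cutoff = "2024-06-01"  # hypothetical start of the "last month" window

# Filter in WHERE: bob's only row fails the date test, so bob disappears.
where_rows = conn.execute("""
    SELECT u.Username, COUNT(m.MessageID)
    FROM users u
    LEFT JOIN directmessages m ON u.UserID = m.SenderID
    WHERE m.SentOn >= ?
    GROUP BY u.UserID
    ORDER BY u.UserID
""", (cutoff,)).fetchall()

# Filter in ON: bob is kept, with a count of 0.
on_rows = conn.execute("""
    SELECT u.Username, COUNT(m.MessageID)
    FROM users u
    LEFT JOIN directmessages m
        ON u.UserID = m.SenderID AND m.SentOn >= ?
    GROUP BY u.UserID
    ORDER BY u.UserID
""", (cutoff,)).fetchall()

print(where_rows)  # [('alice', 1)]
print(on_rows)     # [('alice', 1), ('bob', 0)]
```

Which form is appropriate depends on whether inactive users should be dropped (retention) or reported with zero activity (engagement frequency).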


## 9. Scenario Q9
### Analyse the Most Common Type of User Notifications
Twitter wants to understand which type of notifications users receive most frequently
to optimize user engagement and notification relevance. By analysing the most
common notification types, Twitter can tailor its notification system to enhance user
experience.
The task is to identify the most common notification type received by users. Display
the NotificationType and the count of notifications for each type, ordered by the count
in descending order.

In [111]:
%%sql
SELECT
    notificationtype,
    COUNT(DISTINCT notificationID) AS NumberOfNotification
FROM
    notifications
GROUP BY
    notificationtype
ORDER BY
    NumberOfNotification DESC;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
3 rows affected.


notificationtype,NumberOfNotification
Follow,176
Like,167
Comment,157


## 10. Scenario Q10
### Discover Users with High Engagement but Low Follower Count
Twitter is interested in identifying users who are highly engaged on the platform but
have relatively few followers. Recognizing and promoting such users could potentially
diversify the content on the platform and improve overall user engagement.
Specific Task:
Find the top 5 users who have sent the most messages but have fewer than 100
followers. Display their UserID, Username, FullName, MessageCount, and
FollowerCount.

In [94]:
%%sql
SELECT
    u.UserID,
    u.Username,
    u.FullName,
    COUNT(DISTINCT m.MessageID) AS MessageCount,
    COUNT(DISTINCT f.FollowerID) AS FollowerCount
FROM
    users u
LEFT JOIN directmessages m
ON u.UserID = m.SenderID
LEFT JOIN followers f
ON u.UserID = f.FollowedUserID
GROUP BY
    u.UserID, u.Username, u.FullName
HAVING
    FollowerCount < 100
ORDER BY
    MessageCount DESC
LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/twitterdb
5 rows affected.


UserID,Username,FullName,MessageCount,FollowerCount
148.0,carolyn81,Tracy Mccall,4,2
210.0,burnsphillip,Ethan Reyes,4,0
303.0,camposjulia,Felicia Moody,4,1
381.0,jamesrichards,Mr. Larry Oliver,4,1
413.0,huffmanmegan,Jeremy Holt,4,0


## 11. Scenario Q11
### Identify Users Who Have Not Engaged Recently
Twitter aims to maintain high user engagement levels on its platform. Identifying users
who have not engaged in any activities recently can help Twitter develop targeted
strategies to re-engage these users and improve overall platform activity.
Specific Task:
List the UserID, Username, and FullName of users who have not sent messages or
posted tweets in the last 30 days.

In [97]:
%%sql
SELECT
    u.UserID, u.Username, u.FullName
FROM
    users u
WHERE u.UserID NOT IN
    (
        SELECT dm.SenderID
        FROM directmessages dm
        WHERE dm.`Timestamp` >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)

        UNION

        SELECT t.UserID
        FROM tweets t
        WHERE t.`Timestamp` >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    );

 * mysql+pymysql://root:***@localhost:3306/twitterdb
500 rows affected.


UserID,Username,FullName
1.0,hernandezelizabeth,Gary Murray
2.0,joseph66,Sydney Taylor
3.0,charlesjohnson,Maria Mckay
4.0,tinakelly,Laura Taylor
5.0,hubbardkelly,Joshua Jones
6.0,fmckee,Michael Tran
7.0,jasmineberry,Mark James
8.0,katherinerogers,Stacy Chan
9.0,jordansusan,William Edwards
10.0,davisgina,Teresa Carr


In [100]:
%%sql

SELECT
    u.userID, u.username, u.Fullname
FROM
    users u
WHERE NOT EXISTS
    (
        SELECT dm.SenderID
        FROM directmessages dm
        WHERE dm.SenderID = u.UserID AND
        dm.`Timestamp` >= NOW() - INTERVAL 30 DAY AND
        dm.`Timestamp` <= NOW()
    )
AND NOT EXISTS
    (
        SELECT t.UserID
        FROM tweets t
        WHERE t.userID = u.UserID AND
        t.`Timestamp` >= NOW() - INTERVAL 30 DAY AND
        t.`Timestamp` <= NOW()
    );

 * mysql+pymysql://root:***@localhost:3306/twitterdb
500 rows affected.


userID,username,Fullname
1.0,hernandezelizabeth,Gary Murray
2.0,joseph66,Sydney Taylor
3.0,charlesjohnson,Maria Mckay
4.0,tinakelly,Laura Taylor
5.0,hubbardkelly,Joshua Jones
6.0,fmckee,Michael Tran
7.0,jasmineberry,Mark James
8.0,katherinerogers,Stacy Chan
9.0,jordansusan,William Edwards
10.0,davisgina,Teresa Carr
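Both variants return the same 500 users here, but they are not interchangeable in general: if the subquery behind `NOT IN` yields even one `NULL`, the predicate is never true and no rows come back, while `NOT EXISTS` is unaffected. Whether `SenderID` or `UserID` can be `NULL` in `twitterdb` is not shown, so the rows below are made up. A minimal sketch using an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE tweets (TweetID INTEGER PRIMARY KEY, UserID INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    -- one tweet by alice and one tweet whose author is NULL
    INSERT INTO tweets VALUES (100, 1), (101, NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so bob is filtered out too.
not_in = conn.execute("""
    SELECT Username FROM users
    WHERE UserID NOT IN (SELECT UserID FROM tweets)
""").fetchall()

# NOT EXISTS: the NULL row never matches bob, so bob is returned.
not_exists = conn.execute("""
    SELECT Username FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM tweets t WHERE t.UserID = u.UserID)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('bob',)]
```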


## 12. Scenario Q12
### Find Most Popular Tweets Based on User Interaction
Twitter wants to identify and promote the most popular tweets on its platform. A tweet’s
popularity can be assessed by the number of likes, retweets, and mentions it receives. 
Highlighting popular tweets can enhance user engagement and attract more users to
the platform.
The task is to identify the top 3 tweets that have received the most combined user
interactions (likes, retweets, mentions). Display the TweetID, UserID (of the user who
posted the tweet), Username, FullName, and the total count of interactions.

In [110]:
%%sql
SELECT
    u.UserID, u.Username, u.FullName,
    t.TweetID,
    COUNT(DISTINCT l.LikeID) AS LikeCount,
    COUNT(DISTINCT rt.RetweetID) AS RetweetCount,
    COUNT(DISTINCT men.MentionID) AS MentionCount,
    (
        COUNT(DISTINCT l.LikeID) +
        COUNT(DISTINCT rt.RetweetID) +
        COUNT(DISTINCT men.MentionID)
    ) AS TotalCountInteractions
FROM
    users u
LEFT JOIN tweets t
ON u.UserID = t.UserID
LEFT JOIN likes l
ON t.TweetID = l.TweetID
LEFT JOIN retweets rt
ON t.TweetID = rt.OriginalTweetID
LEFT JOIN mentions men
ON t.TweetID = men.TweetID
GROUP BY
    u.UserID, u.Username, u.FullName, t.TweetID
ORDER BY
    TotalCountInteractions DESC
LIMIT 3;

 * mysql+pymysql://root:***@localhost:3306/twitterdb
3 rows affected.


UserID,Username,FullName,TweetID,LikeCount,RetweetCount,MentionCount,TotalCountInteractions
305.0,jason06,Dennis Davis,245.0,5,2,4,11
83.0,jeremysandoval,Edward Summers,426.0,2,4,3,9
44.0,mcmahonanna,Morgan Jackson,217.0,4,3,1,8


## 13. Scenario Q13
### Find Users Who Have Never Been Mentioned
Twitter is interested in increasing interactions among users. The company wants to
identify users who have never been mentioned by others so that it can encourage more
interactions involving these users.
The task is to retrieve a list of UserID, Username, and FullName of users who have
never been mentioned in any tweets.

In [113]:
%%sql
SELECT
    u.UserID, u.Username, u.FullName
FROM
    users u
WHERE NOT EXISTS
    (
        SELECT
            men.MentionedUserID
        FROM
            mentions men
        WHERE
            men.MentionedUserID = u.UserID
    );

 * mysql+pymysql://root:***@localhost:3306/twitterdb
180 rows affected.


UserID,Username,FullName
2.0,joseph66,Sydney Taylor
5.0,hubbardkelly,Joshua Jones
6.0,fmckee,Michael Tran
7.0,jasmineberry,Mark James
8.0,katherinerogers,Stacy Chan
11.0,tashley,Nicole Davis
12.0,kelli25,Scott White
16.0,david76,Brandi Quinn
29.0,edwardhooper,David Franco
30.0,robertscarrie,Mary Elliott


## 14. Scenario Q14
### Analyse the Average Number of Followers per User
Twitter wants to analyse the average number of followers that users on the platform
have. This will help Twitter understand the distribution of followers and identify whether
the platform has a balanced user interaction or is dominated by a few users with a
large number of followers. The task is to calculate the average number of followers
that users on the platform have.

In [115]:
%%sql

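SELECT
    ROUND(AVG(FollowersCount), 2) AS AvgFollowersPerUser
FROM (
    -- Users with no followers have no rows in followers, so the average
    -- covers only users who have at least one follower.
    SELECT
        FollowedUserID AS UserID,
        COUNT(FollowerUserID) AS FollowersCount
    FROM followers
    GROUP BY FollowedUserID
) AS UserFollowers;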


 * mysql+pymysql://root:***@localhost:3306/twitterdb
1 rows affected.


AvgFollowersPerUser
1.59
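The subquery groups the `followers` table only, so users who appear in no `FollowedUserID` row (Scenario 10 lists some with a `FollowerCount` of 0) do not contribute to the average. Starting from `users` with a `LEFT JOIN` includes them as zeros. A minimal sketch of the two averages, using an in-memory SQLite database with made-up rows rather than the `twitterdb` data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE followers (FollowerUserID INTEGER, FollowedUserID INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4);
    -- user 1 has 3 followers, user 2 has 1, users 3 and 4 have none
    INSERT INTO followers VALUES (2, 1), (3, 1), (4, 1), (1, 2);
""")

# Average over users who have at least one follower: (3 + 1) / 2.
avg_followed_only = conn.execute("""
    SELECT AVG(c) FROM (
        SELECT COUNT(*) AS c FROM followers GROUP BY FollowedUserID
    )
""").fetchone()[0]

# Average over all users, counting missing followers as 0: (3 + 1 + 0 + 0) / 4.
avg_all_users = conn.execute("""
    SELECT AVG(c) FROM (
        SELECT COUNT(f.FollowerUserID) AS c
        FROM users u
        LEFT JOIN followers f ON u.UserID = f.FollowedUserID
        GROUP BY u.UserID
    )
""").fetchone()[0]

print(avg_followed_only)  # 2.0
print(avg_all_users)      # 1.0
```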
