Popularity Percentage

Find the popularity percentage for each user on Meta/Facebook. The popularity percentage is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by multiplying by 100.
Output each user along with their popularity percentage. Order records in ascending order by user id.
The 'user1' and 'user2' columns hold pairs of friends.

In [1]:
import pandas as pd
import numpy as np

In [4]:
facebook_friends = pd.read_csv("../CSV/facebook_friends.csv")
facebook_friends = facebook_friends.iloc[:, :2]
facebook_friends

Unnamed: 0,user1,user2
0,2,1
1,1,3
2,4,1
3,1,5
4,1,6
5,2,6
6,7,2
7,8,3
8,3,9


In [6]:
concat = np.concatenate([facebook_friends.user1.values,facebook_friends.user2.values])
concat

array([2, 1, 4, 1, 1, 2, 7, 8, 3, 1, 3, 1, 5, 6, 6, 2, 3, 9])

In [7]:
concat_uniq = np.unique(concat)
concat_uniq

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]:
concat_len = len(concat_uniq)
concat_len

9

In [5]:
concatvalues =len(np.unique(np.concatenate([facebook_friends.user1.values,facebook_friends.user2.values])))
concatvalues

9

In [9]:
revert = facebook_friends.rename(columns= {'user1':'user2','user2':'user1'})
revert

Unnamed: 0,user2,user1
0,2,1
1,1,3
2,4,1
3,1,5
4,1,6
5,2,6
6,7,2
7,8,3
8,3,9


In [10]:
final = pd.concat([facebook_friends, revert],sort = False).drop_duplicates()
final

Unnamed: 0,user1,user2
0,2,1
1,1,3
2,4,1
3,1,5
4,1,6
5,2,6
6,7,2
7,8,3
8,3,9
0,1,2


In [11]:
result = final.groupby('user1').size().to_frame('count').reset_index()
result

Unnamed: 0,user1,count
0,1,5
1,2,3
2,3,3
3,4,1
4,5,1
5,6,2
6,7,1
7,8,1
8,9,1


In [12]:
result['popularity_percent'] = 100*(result['count'] /concatvalues)
result

Unnamed: 0,user1,count,popularity_percent
0,1,5,55.555556
1,2,3,33.333333
2,3,3,33.333333
3,4,1,11.111111
4,5,1,11.111111
5,6,2,22.222222
6,7,1,11.111111
7,8,1,11.111111
8,9,1,11.111111


In [13]:
result = result[['user1', 'popularity_percent']]
result

Unnamed: 0,user1,popularity_percent
0,1,55.555556
1,2,33.333333
2,3,33.333333
3,4,11.111111
4,5,11.111111
5,6,22.222222
6,7,11.111111
7,8,11.111111
8,9,11.111111


Solution Walkthrough
In this problem, we are given a dataset representing pairs of friends on a social media platform. We need to find the popularity percentage for each user, which is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by multiplying by 100. We will use the Pandas library in Python to manipulate and analyze the data.

Understanding The Data
The given data consists of a table with two columns: 'user1' and 'user2'. Each row represents one pair of friends on the social media platform, and the two columns contain the user IDs of those friends. A friendship is mutual, so a given user can appear in either column.
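
To experiment without the CSV file, the sample rows shown in the notebook output can be typed in directly. The sketch below is only an illustration (the names 'ids_in_user1', 'ids_in_user2' and 'all_ids' are made up for this example); it shows that neither column alone lists every user, which is why both columns have to be combined later.

```python
import pandas as pd

# Sample friendship pairs copied from the notebook output above.
facebook_friends = pd.DataFrame(
    {
        "user1": [2, 1, 4, 1, 1, 2, 7, 8, 3],
        "user2": [1, 3, 1, 5, 6, 6, 2, 3, 9],
    }
)

# Collect the distinct IDs seen in each column as plain Python ints.
ids_in_user1 = set(facebook_friends["user1"].tolist())
ids_in_user2 = set(facebook_friends["user2"].tolist())

# The platform's user base is the union of both columns.
all_ids = ids_in_user1 | ids_in_user2

print(sorted(ids_in_user1))  # IDs that appear on the left of a pair
print(sorted(ids_in_user2))  # IDs that appear on the right of a pair
print(len(all_ids))          # total number of distinct users
```

Users 5, 6 and 9 only ever appear in 'user2', while users 4, 7 and 8 only appear in 'user1'.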

The Problem Statement
Our task is to calculate, for every user, popularity percentage = (number of friends / total number of users) × 100, and to return the results ordered by user ID. A friendship counts for both users in the pair, regardless of which column each of them appears in.
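
As a quick sanity check of the definition, the calculation for a single user can be done by hand. This is a minimal sketch in plain Python using the sample pairs from the notebook; the variable names are illustrative and are not part of the solution.

```python
# Sample friendship pairs from the notebook output, as (user1, user2) tuples.
pairs = [(2, 1), (1, 3), (4, 1), (1, 5), (1, 6), (2, 6), (7, 2), (8, 3), (3, 9)]

# Every ID that appears in either position counts as a platform user.
all_users = {user for pair in pairs for user in pair}

# Friends of user 1: the other member of every pair that contains 1.
friends_of_1 = {b if a == 1 else a for a, b in pairs if 1 in (a, b)}

# Popularity = friends / total users * 100.
popularity_1 = 100 * len(friends_of_1) / len(all_users)
print(round(popularity_1, 2))
```

User 1 has five friends out of nine users, which matches the 55.555556 shown for user 1 in the notebook output.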

Breaking Down The Code
Let's break down the code snippet step by step:

We start by importing the required libraries: pandas and numpy.
Next, we calculate the total number of unique users on the platform by concatenating the 'user1' and 'user2' columns from the dataframe and finding the length of the unique values. We store this value in the variable 'concatvalues'.
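Then, we create a new dataframe called 'revert' by renaming the 'user1' column to 'user2' and the 'user2' column to 'user1'. This mirrors every pair, so a friendship recorded as (2, 1) is also available as (1, 2).
We use the 'concat' function from pandas to stack the 'facebook_friends' dataframe and the 'revert' dataframe vertically. Because 'concat' aligns on column names, the mirrored rows land in the correct columns. The 'sort=False' argument only tells pandas not to sort the column labels; it has no effect on row order. We then use the 'drop_duplicates' function to remove any pair that would otherwise be counted twice, for example if the source already listed a friendship in both directions. We store the result in the variable 'final', in which every user appears in 'user1' once per friend.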
Next, we group the 'final' dataframe by the 'user1' column and calculate the size of each group using the 'size' function. We convert the result to a dataframe with the column name 'count'. We use the 'reset_index' function to reset the index of the dataframe and store the result in the variable 'result'.
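Finally, we calculate the popularity percentage for each user by dividing the 'count' column in the 'result' dataframe by 'concatvalues', then multiplying by 100, and store it in the new column 'popularity_percent'. We then keep only the 'user1' and 'popularity_percent' columns. No explicit sort is needed, because 'groupby' sorts the group keys in ascending order by default.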
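
The mirroring step is the least obvious part of the steps above, so here it is in isolation. The two-row frame is hypothetical and exists only to keep the output small; the calls themselves are the same ones the notebook uses, apart from 'ignore_index=True', which is added here to get a clean row index.

```python
import pandas as pd

# Hypothetical two-row frame used only to illustrate the mirroring step.
pairs = pd.DataFrame({"user1": [2, 1], "user2": [1, 3]})

# Swapping the column names flips each pair: (2, 1) becomes (1, 2).
mirrored = pairs.rename(columns={"user1": "user2", "user2": "user1"})

# pd.concat aligns on column names, not positions, so the mirrored rows
# land in the right columns even though their column order is reversed.
both_directions = pd.concat([pairs, mirrored], ignore_index=True).drop_duplicates()

# Each user now appears in 'user1' once per friend.
friend_counts = both_directions.groupby("user1").size()
print(both_directions)
print(friend_counts.to_dict())
```

Without the mirrored copy, user 3 would not appear in 'user1' at all and would be missing from the grouped result.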
Bringing It All Together
The complete code snippet to solve the problem is as follows:

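import pandas as pd
import numpy as np

# Load the friendship pairs and keep the first two columns, as in the notebook
facebook_friends = pd.read_csv("../CSV/facebook_friends.csv")
facebook_friends = facebook_friends.iloc[:, :2]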

# Calculate the total number of unique users on the platform
concatvalues = len(
    np.unique(
        np.concatenate(
            [
                facebook_friends.user1.values,
                facebook_friends.user2.values,
            ]
        )
    )
)

# Rename columns and create a new dataframe
revert = facebook_friends.rename(
    columns={"user1": "user2", "user2": "user1"}
)

# Concatenate dataframes and remove duplicates
final = pd.concat(
    [facebook_friends, revert], sort=False
).drop_duplicates()

# Group by user1 and calculate the count
result = final.groupby("user1").size().to_frame("count").reset_index()

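# Calculate popularity percentage
result["popularity_percent"] = 100 * (result["count"] / concatvalues)

# Keep only the requested output columns (groupby already sorted by user id)
result = result[["user1", "popularity_percent"]]
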
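
For reuse, the same idea can be wrapped in a function. The version below is a sketch, not the notebook's code: the function name 'popularity_percentage' is made up, and it counts distinct friends with 'nunique' and derives the user total from the mirrored frame instead of using numpy. It assumes the input has 'user1' and 'user2' columns.

```python
import pandas as pd


def popularity_percentage(friends: pd.DataFrame) -> pd.DataFrame:
    """Return one row per user with their popularity percentage."""
    pairs = friends[["user1", "user2"]]

    # Mirror every pair so each friendship is listed from both sides.
    mirrored = pairs.rename(columns={"user1": "user2", "user2": "user1"})
    both = pd.concat([pairs, mirrored], ignore_index=True).drop_duplicates()

    # After mirroring, every user appears at least once in 'user1'.
    total_users = both["user1"].nunique()

    # Distinct friends per user; groupby sorts the user IDs ascending.
    friend_counts = both.groupby("user1")["user2"].nunique()

    return (
        (100 * friend_counts / total_users)
        .rename("popularity_percent")
        .reset_index()
    )


# Usage with the sample pairs from the notebook output.
sample = pd.DataFrame(
    {
        "user1": [2, 1, 4, 1, 1, 2, 7, 8, 3],
        "user2": [1, 3, 1, 5, 6, 6, 2, 3, 9],
    }
)
result = popularity_percentage(sample)
print(result)
```

On the sample data this should give the same percentages as the notebook output.
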
Conclusion
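The code calculates the popularity percentage for each user on the social media platform. On the sample data, user 1 has 5 friends out of 9 users, which gives 55.56%. The key step is mirroring the friend pairs so that every friendship is counted for both users; after that, a pandas 'groupby' and a single division produce the result.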