Flags per Video

For each video, find how many unique users flagged it. A unique user can be identified using the combination of their first name and last name. Do not consider rows in which there is no flag ID.

In [1]:
import pandas as pd
import numpy as np

In [8]:
user_flags = pd.read_excel("../CSV/user_flags.xlsx", skiprows=1)
user_flags.head()

  user_firstname user_lastname     video_id flag_id
0        Richard        Hasson  y6120QOlsfU  0cazx3
1           Mark           May  Ct6BUPvE2sM  1cn76u
2           Gina        Korman  dQw4w9WgXcQ  1i43zk
3           Mark           May  Ct6BUPvE2sM  1n0vef
4           Mark           May  jNQXAC9IVRw  1sv6ib


In [9]:
result = user_flags[user_flags["flag_id"].notnull()].copy()
result.head()

  user_firstname user_lastname     video_id flag_id
0        Richard        Hasson  y6120QOlsfU  0cazx3
1           Mark           May  Ct6BUPvE2sM  1cn76u
2           Gina        Korman  dQw4w9WgXcQ  1i43zk
3           Mark           May  Ct6BUPvE2sM  1n0vef
4           Mark           May  jNQXAC9IVRw  1sv6ib


In [10]:
result["username"] = result["user_firstname"].astype(str) + " " + result["user_lastname"].astype(str)
result.head()

  user_firstname user_lastname     video_id flag_id        username
0        Richard        Hasson  y6120QOlsfU  0cazx3  Richard Hasson
1           Mark           May  Ct6BUPvE2sM  1cn76u        Mark May
2           Gina        Korman  dQw4w9WgXcQ  1i43zk     Gina Korman
3           Mark           May  Ct6BUPvE2sM  1n0vef        Mark May
4           Mark           May  jNQXAC9IVRw  1sv6ib        Mark May


In [11]:
result = result.groupby(by="video_id")["username"].nunique().reset_index()
result.head()

      video_id username
0  5qap5aO4i9A        2
1  Ct6BUPvE2sM        2
2  dQw4w9WgXcQ        5
3  jNQXAC9IVRw        3
4  y6120QOlsfU        5


In [12]:
result = result.rename(columns={"username": "num_unique_users"})
result

      video_id num_unique_users
0  5qap5aO4i9A                2
1  Ct6BUPvE2sM                2
2  dQw4w9WgXcQ                5
3  jNQXAC9IVRw                3
4  y6120QOlsfU                5


Solution Walkthrough
In this problem, we are given a dataset containing information about flagged videos and the users who flagged them. We need to find the number of unique users who flagged each video, considering only the rows where there is a flag ID.

To solve this problem, we use the pandas library in Python. We read the dataset into a DataFrame, drop the rows that have no flag ID, build a username from each user's first and last name, and count the distinct usernames for each video.

Now let's go through each step of the solution in detail.

Understanding The Data
The dataset has four columns, and the solution uses all of them:

video_id: ID of the flagged video
flag_id: ID of the flag
user_firstname: First name of the user who flagged the video
user_lastname: Last name of the user who flagged the video

The Problem Statement
We need to find the number of unique users who flagged each video, considering only the rows where there is a flag ID. A unique user can be identified by combining their first name and last name.
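The word "unique" matters because one user can flag the same video more than once. In the rows printed by user_flags.head() in the notebook above, Mark May flagged Ct6BUPvE2sM twice (flag IDs 1cn76u and 1n0vef). The sketch below rebuilds only those five rows by hand, so it is a sample and not the full dataset, and compares a plain row count with a distinct-user count:

```python
import pandas as pd

# The five rows shown by user_flags.head(); a sample, not the whole dataset
sample = pd.DataFrame(
    {
        "user_firstname": ["Richard", "Mark", "Gina", "Mark", "Mark"],
        "user_lastname": ["Hasson", "May", "Korman", "May", "May"],
        "video_id": ["y6120QOlsfU", "Ct6BUPvE2sM", "dQw4w9WgXcQ", "Ct6BUPvE2sM", "jNQXAC9IVRw"],
        "flag_id": ["0cazx3", "1cn76u", "1i43zk", "1n0vef", "1sv6ib"],
    }
)
sample["username"] = sample["user_firstname"] + " " + sample["user_lastname"]

# Number of flags per video: every row counts
flags_per_video = sample.groupby("video_id").size()
# Number of distinct users per video: repeat flags by the same user count once
users_per_video = sample.groupby("video_id")["username"].nunique()

print(flags_per_video["Ct6BUPvE2sM"])  # 2
print(users_per_video["Ct6BUPvE2sM"])  # 1
```

Counting rows would report two flaggers for Ct6BUPvE2sM in this sample, while nunique() correctly reports one.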

Breaking Down The Code
Let's break down the code and understand each part:

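result = user_flags[user_flags["flag_id"].notnull()].copy()
In this line, we filter the DataFrame user_flags to keep only the rows where the flag_id column is not null, which removes the rows that have no flag ID. The trailing .copy() makes result an independent DataFrame, so adding a column to it in the next step does not trigger pandas' SettingWithCopyWarning.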
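To see what the filter does in isolation, here is a small sketch on made-up rows (the names and IDs are hypothetical, not taken from the dataset) where one row has no flag ID. It also shows that DataFrame.dropna(subset=[...]) is an equivalent way to express the same filter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; the second one has no flag ID
toy = pd.DataFrame(
    {
        "user_firstname": ["Ann", "Bob", "Cy"],
        "user_lastname": ["Lee", "Fox", "Ray"],
        "video_id": ["v1", "v1", "v2"],
        "flag_id": ["f1", np.nan, "f3"],
    }
)

# notnull() returns a boolean mask; indexing with it keeps only flagged rows
mask = toy["flag_id"].notnull()
flagged = toy[mask].copy()  # .copy() gives an independent DataFrame

# dropna with subset= expresses the same filter
flagged_alt = toy.dropna(subset=["flag_id"])

print(mask.tolist())                # [True, False, True]
print(len(flagged))                 # 2
print(flagged.equals(flagged_alt))  # True
```

Note that the surviving rows keep their original index labels (0 and 2 here); that is harmless for the groupby step that follows.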

result["username"] = (
    result["user_firstname"].astype(str)
    + " "
    + result["user_lastname"].astype(str)
)
Here, we are creating a new column called "username" by combining the first name and last name columns. We convert the columns to strings using astype(str) and concatenate them using the "+" operator.
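Combining both names matters because different users can share a first name. A minimal sketch on hypothetical rows (two different users named Ann who flag the same video) compares counting by first name alone with counting by the combined username:

```python
import pandas as pd

# Hypothetical rows: two different users who share the first name "Ann"
toy = pd.DataFrame(
    {
        "user_firstname": ["Ann", "Ann", "Bob"],
        "user_lastname": ["Lee", "Ray", "Fox"],
        "video_id": ["v1", "v1", "v1"],
    }
)

# Build the same "first last" key used in the walkthrough
toy["username"] = (
    toy["user_firstname"].astype(str) + " " + toy["user_lastname"].astype(str)
)

print(toy["username"].tolist())         # ['Ann Lee', 'Ann Ray', 'Bob Fox']
print(toy["user_firstname"].nunique())  # 2: first name alone merges the two Anns
print(toy["username"].nunique())        # 3
```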

result = (
    result.groupby(by="video_id")["username"].nunique().reset_index()
)
In this line, we group the filtered DataFrame by the video_id column and call nunique() on the username column to count the distinct usernames in each group. Because a single column is selected before nunique() is called, the result is a Series indexed by video_id. reset_index() then turns that index back into a regular column, producing a DataFrame with the columns video_id and username.
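The change of shape around reset_index() can be checked directly. A short sketch on hypothetical rows that already carry a username column:

```python
import pandas as pd

# Hypothetical flagged rows that already carry a username column
toy = pd.DataFrame(
    {
        "video_id": ["v2", "v1", "v1", "v1"],
        "username": ["Ann Lee", "Ann Lee", "Bob Fox", "Ann Lee"],
    }
)

# Before reset_index(): a Series indexed by video_id (groups are sorted)
counts = toy.groupby(by="video_id")["username"].nunique()
print(type(counts).__name__)     # Series
print(counts["v1"], counts["v2"])  # 2 1

# After reset_index(): a DataFrame with video_id as an ordinary column
table = counts.reset_index()
print(type(table).__name__)      # DataFrame
print(list(table.columns))       # ['video_id', 'username']
```

The value column keeps the name "username" because the Series inherits the selected column's name, which is why the next step renames it.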

result = result.rename(columns={"username": "num_unique_users"})
Finally, we rename the column "username" to "num_unique_users" to make it more descriptive.
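As a design alternative (not the original solution), Series.reset_index() accepts a name argument for the value column, which folds the rename into the same chain. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical flagged rows with a username column already built
toy = pd.DataFrame(
    {
        "video_id": ["v1", "v1", "v2"],
        "username": ["Ann Lee", "Bob Fox", "Ann Lee"],
    }
)

# Series.reset_index(name=...) names the value column directly,
# so a separate rename() step is not needed
result = (
    toy.groupby(by="video_id")["username"]
    .nunique()
    .reset_index(name="num_unique_users")
)
print(result)
```

Both forms produce the same columns; the separate rename() in the walkthrough is arguably easier to read step by step.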

Bringing It All Together
Putting all the code together, we have:

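import pandas as pd

user_flags = pd.read_excel("../CSV/user_flags.xlsx", skiprows=1)

# Keep only rows that have a flag ID
result = user_flags[user_flags["flag_id"].notnull()].copy()
# Identify each user by "first name + last name"
result["username"] = (
    result["user_firstname"].astype(str)
    + " "
    + result["user_lastname"].astype(str)
)
# Count distinct users per video
result = (
    result.groupby(by="video_id")["username"].nunique().reset_index()
)
result = result.rename(columns={"username": "num_unique_users"})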
This code will give us a DataFrame named "result" which contains the video IDs and the number of unique users who flagged each video.
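Building a string key is not the only way to count distinct (first name, last name) pairs. One reason to consider an alternative: joining names with a space can, in principle, make two different users collide (first name "Ann Marie" with last name "Lee", and first name "Ann" with last name "Marie Lee", both become "Ann Marie Lee"), whereas comparing the two columns separately cannot. The sketch below, which is an alternative and not the original solution, drops duplicate (video, first name, last name) rows and counts what remains per video, and runs the walkthrough's approach on the same hypothetical data for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Ann Lee flags v1 twice, and Cy Ray's row on v2 has no flag ID
user_flags = pd.DataFrame(
    {
        "user_firstname": ["Ann", "Ann", "Bob", "Cy", "Ann"],
        "user_lastname": ["Lee", "Lee", "Fox", "Ray", "Lee"],
        "video_id": ["v1", "v1", "v1", "v2", "v2"],
        "flag_id": ["f1", "f2", "f3", np.nan, "f5"],
    }
)

# Drop rows without a flag ID, as in the walkthrough
flagged = user_flags[user_flags["flag_id"].notnull()]

# Alternative: keep one row per (video, first name, last name) combination,
# then count the remaining rows per video; no username column is built
alt = (
    flagged.drop_duplicates(subset=["video_id", "user_firstname", "user_lastname"])
    .groupby("video_id")
    .size()
    .reset_index(name="num_unique_users")
)

# The walkthrough's string-key approach on the same rows, for comparison
with_names = flagged.assign(
    username=flagged["user_firstname"] + " " + flagged["user_lastname"]
)
orig = (
    with_names.groupby("video_id")["username"]
    .nunique()
    .reset_index(name="num_unique_users")
)

print(alt)
print(alt["num_unique_users"].tolist() == orig["num_unique_users"].tolist())  # True
```

On this data both approaches give two users for v1 and one for v2; the row without a flag ID is excluded before counting, so Cy Ray does not count toward v2.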

Conclusion
In this problem, we used the pandas library to find the number of unique users who flagged each video, considering only the rows that have a flag ID. Filtering out the rows without a flag ID, combining first and last names into a single username, and counting the distinct usernames per video with groupby() and nunique() gives the desired result.