Unique Users Per Client Per Month

Write a query that returns the number of unique users per client per month.

In [4]:
import pandas as pd
import datetime as dt

In [5]:
fact_events = pd.read_csv("CSV/fact_events.csv")
fact_events.head()

Unnamed: 0,id,time_id,user_id,customer_id,client_id,event_type,event_id
0,1,2020-02-28,3668-QPYBK,Sendit,desktop,message sent,3
1,2,2020-02-28,7892-POOKP,Connectix,mobile,file received,2
2,3,2020-04-03,9763-GRSKD,Zoomit,desktop,video call received,7
3,4,2020-04-02,9763-GRSKD,Connectix,desktop,video call received,7
4,5,2020-02-06,9237-HQITU,Sendit,desktop,video call received,7


In [8]:
fact_events['time_id'] = pd.to_datetime(fact_events['time_id'])
fact_events.head(3)

Unnamed: 0,id,time_id,user_id,customer_id,client_id,event_type,event_id
0,1,2020-02-28,3668-QPYBK,Sendit,desktop,message sent,3
1,2,2020-02-28,7892-POOKP,Connectix,mobile,file received,2
2,3,2020-04-03,9763-GRSKD,Zoomit,desktop,video call received,7


In [10]:
result = fact_events.groupby([fact_events['client_id'],
                              fact_events['time_id'].dt.month])['user_id'].nunique().reset_index()

In [11]:
result

Unnamed: 0,client_id,time_id,user_id
0,desktop,2,13
1,desktop,3,16
2,desktop,4,11
3,mobile,2,9
4,mobile,3,14
5,mobile,4,9


Solution Walkthrough
This walkthrough explains the pandas code above, which returns the number of unique users per client per month.

Understanding The Data
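The code works on a pandas DataFrame named fact_events, loaded from CSV/fact_events.csv. Only three of its columns are needed: client_id, time_id, and user_id. The time_id column is read from the CSV as text, so it is converted with pd.to_datetime before the grouping step, which relies on the .dt accessor.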

The Problem Statement
The problem is to calculate the number of unique users (user_id) for each combination of client (client_id) and month (the month of time_id). In other words, group the data by client_id and by the month of time_id, then count the distinct user_id values in each group.

Breaking Down The Code
Let's break down the code step by step:

import pandas as pd
This line imports the pandas library under the alias pd. pandas provides the DataFrame type and the grouping operations used below.

result = (
    fact_events.groupby(
        [fact_events["client_id"], fact_events["time_id"].dt.month]
    )["user_id"]
    .nunique()
    .reset_index()
)
fact_events['time_id'].dt.month extracts the month component from the time_id column. The dt accessor is used to access the datetime properties of the column. This will give us the month value for each row.
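As a small illustration of what the dt accessor returns, the sketch below uses hypothetical dates (not taken from fact_events.csv). It also shows one caveat: dt.month keeps only the month number, so the same month in two different years maps to the same value. If the data spanned more than one year, dt.to_period("M") is one possible alternative that keeps the year as well.

```python
import pandas as pd

# Hypothetical sample dates, not taken from fact_events.csv.
dates = pd.Series(pd.to_datetime(["2020-02-28", "2020-04-03", "2021-02-10"]))

# .dt.month keeps only the month number and drops the year.
months = dates.dt.month
print(months.tolist())

# .dt.to_period("M") keeps year and month together.
year_months = dates.dt.to_period("M").astype(str)
print(year_months.tolist())
```

Here the first and third dates share month number 2 but fall into different year-month periods.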

[fact_events['client_id'], fact_events['time_id'].dt.month] is the list of grouping keys. Each key is a Series aligned with the rows of fact_events, so the data is grouped by client_id and by the month of time_id together.

['user_id'] specifies the column to perform the calculation on. We want to calculate the number of unique users.

.nunique() is a pandas method that counts the distinct values of user_id within each group. Missing values are not counted by default.
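The difference between counting rows and counting distinct values can be seen on a tiny, hypothetical Series (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical user IDs: "A" appears twice and one entry is missing.
user_ids = pd.Series(["A", "A", "B", None])

non_null_rows = user_ids.count()                 # counts every non-missing entry
distinct_users = user_ids.nunique()              # counts distinct non-missing values
distinct_with_na = user_ids.nunique(dropna=False)  # treats the missing value as its own value

print(non_null_rows, distinct_users, distinct_with_na)
```

For the unique-users question, nunique() is the appropriate choice because a user who triggers several events in a month should be counted once.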

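.reset_index() turns the result back into a flat DataFrame. After nunique(), the result is a Series whose index holds the two grouping keys (client_id and time_id); reset_index() moves those index levels into ordinary columns and replaces them with a default integer index.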
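To make the effect of reset_index() concrete, here is a minimal sketch on a hypothetical three-row DataFrame (toy_events is an illustrative name, not part of the original notebook). It inspects the grouped result before and after the call.

```python
import pandas as pd

# Hypothetical toy data with the same three columns the solution uses.
toy_events = pd.DataFrame(
    {
        "client_id": ["desktop", "desktop", "mobile"],
        "time_id": pd.to_datetime(["2020-02-01", "2020-02-15", "2020-03-01"]),
        "user_id": ["A", "B", "A"],
    }
)

# Before reset_index(): a Series whose index levels are the grouping keys.
grouped = toy_events.groupby(
    [toy_events["client_id"], toy_events["time_id"].dt.month]
)["user_id"].nunique()
print(list(grouped.index.names))

# After reset_index(): a DataFrame with the keys as regular columns.
flat = grouped.reset_index()
print(flat.columns.tolist())
```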

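The result of this code is a DataFrame with three columns: client_id, time_id (which now holds the month number), and user_id (which now holds the number of unique users). The columns keep their original names, as the output of In [11] shows.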
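If more descriptive column names are wanted in the output, one option is to name the month key and the count explicitly. The sketch below is a variant on hypothetical toy data, not the notebook's own code; the names month and unique_users are illustrative choices.

```python
import pandas as pd

# Hypothetical toy data, not taken from fact_events.csv.
toy_events = pd.DataFrame(
    {
        "client_id": ["desktop", "desktop", "desktop", "mobile"],
        "time_id": pd.to_datetime(
            ["2020-02-01", "2020-02-15", "2020-03-01", "2020-02-03"]
        ),
        "user_id": ["A", "B", "A", "A"],
    }
)

# Rename the month key before grouping and name the count column on reset.
month = toy_events["time_id"].dt.month.rename("month")
named_result = (
    toy_events.groupby(["client_id", month])["user_id"]
    .nunique()
    .reset_index(name="unique_users")
)
print(named_result)
```

The counts are the same as with the original approach; only the column labels differ.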

Bringing It All Together
To summarize, the code snippet uses the pandas library to group the fact_events DataFrame by client_id and the month of time_id. Then, it calculates the number of unique users for each group and stores the result in the result DataFrame.

Conclusion
The code snippet calculates the number of unique users per client per month with a single groupby and nunique call in pandas.