Apple Product Counts

Find the number of Apple product users and the number of total users with a device and group the counts by language. Assume Apple products are only MacBook-Pro, iPhone 5s, and iPad-air. Output the language along with the total number of Apple users and users with any device. Order your results based on the number of total users in descending order.

In [9]:
import pandas as pd
import numpy as np

In [21]:
playbook_events = pd.read_csv("../CSV/playbook_events.csv")
playbook_events = playbook_events.iloc[:, :6]
playbook_events.head()

Unnamed: 0,user_id,occurred_at,event_type,event_name,location,device
0,6991,2014-06-09 18:26:54,engagement,home_page,United States,iphone 5
1,18851,2014-08-29 13:18:38,signup_flow,enter_info,Russia,asus chromebook
2,14998,2014-07-01 12:47:56,engagement,login,France,hp pavilion desktop
3,8186,2014-05-23 10:44:16,engagement,home_page,Italy,macbook pro
4,9626,2014-07-31 17:15:14,engagement,login,Russia,nexus 7


In [22]:
playbook_users = pd.read_excel("../CSV/playbook_users.xlsx", header=1)
playbook_users.head()

Unnamed: 0,user_id,created_at,company_id,language,activated_at,state
0,11,2013-01-01 04:41:13,1,german,2013-01-01,active
1,52,2013-01-05 15:30:45,2866,spanish,2013-01-05,active
2,108,2013-01-10 11:04:58,1848,spanish,2013-01-10,active
3,167,2013-01-16 20:40:24,6709,arabic,2013-01-16,active
4,175,2013-01-16 11:22:22,4797,russian,2013-01-16,active


In [23]:
merged = pd.merge(playbook_events, playbook_users, on="user_id")
merged.head()

Unnamed: 0,user_id,occurred_at,event_type,event_name,location,device,created_at,company_id,language,activated_at,state
0,6991,2014-06-09 18:26:54,engagement,home_page,United States,iphone 5,2014-01-01 18:21:35,4073,english,2014-01-01,active
1,18851,2014-08-29 13:18:38,signup_flow,enter_info,Russia,asus chromebook,2014-08-29 13:17:38,11617,russian,2014-08-29,active
2,14998,2014-07-01 12:47:56,engagement,login,France,hp pavilion desktop,2014-07-01 12:46:20,373,english,2014-07-01,active
3,8186,2014-05-23 10:44:16,engagement,home_page,Italy,macbook pro,2014-02-05 07:31:44,10826,italian,2014-02-05,active
4,9626,2014-07-31 17:15:14,engagement,login,Russia,nexus 7,2014-03-14 11:05:15,148,russian,2014-03-14,active


In [24]:
mac_device = ["macbook pro", "iphone 5s", "ipad air"]

In [25]:
df = (
    merged[merged["device"].isin(mac_device)]
    .groupby("language")["user_id"]
    .nunique()
    .to_frame("n_apple_users")
)

df

language,n_apple_users
chinese,1
english,11
german,1
italian,1
japanese,2
portugese,1
spanish,3


In [26]:
result = (
    merged.groupby(["language"])["user_id"]
    .nunique()
    .rename("n_total_users")
    .reset_index()
)

result

Unnamed: 0,language,n_total_users
0,arabic,2
1,chinese,4
2,english,45
3,french,5
4,german,3
5,indian,2
6,italian,1
7,japanese,6
8,portugese,3
9,russian,5


In [31]:
result = (
    result.merge(df, how="outer", left_on="language", right_on="language")
    .fillna(0)
    .sort_values("n_total_users", ascending=False)[
        ["language", "n_apple_users", "n_total_users"]
    ]
)
result

Unnamed: 0,language,n_apple_users,n_total_users
2,english,11.0,45
10,spanish,3.0,9
7,japanese,2.0,6
3,french,0.0,5
9,russian,0.0,5
1,chinese,1.0,4
4,german,1.0,3
8,portugese,1.0,3
0,arabic,0.0,2
5,indian,0.0,2


Solution Walkthrough
This walkthrough explains the solution using the code above. It covers the data, the problem statement, each step of the code, and how the steps combine to produce the output.

Understanding The Data
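The solution uses two tables, playbook_events and playbook_users. The playbook_events table records events: the user_id, when the event occurred, the event type and name, the location, and the device used. The playbook_users table describes the users: the user_id, when the account was created, the company_id, the language, the activation date, and the state. The device comes from playbook_events and the language from playbook_users, so the two tables have to be joined on user_id before anything can be counted.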

The Problem Statement
The problem asks for two counts per language: the number of users with an Apple product (MacBook Pro, iPhone 5s, or iPad Air) and the number of users with any device. The output should include the language, the number of Apple users, and the number of total users, ordered by the total number of users in descending order.

Breaking Down The Code
Let's break down the code step by step to understand the implementation.

import pandas as pd
import numpy as np

merged = pd.merge(playbook_events, playbook_users, on="user_id")
The first step is to merge the playbook_events and playbook_users tables on their common user_id column. pd.merge performs an inner join by default, so every event row gains the columns of its user, and events whose user_id has no match in playbook_users are dropped.
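To see what the merge keeps and drops, here is a minimal sketch on two toy tables. The tables events and users and their values are made up for illustration and are not the real playbook data.

```python
import pandas as pd

# Hypothetical toy tables, not the real playbook data.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "device": ["macbook pro", "iphone 5s", "nexus 7", "ipad air"],
})
users = pd.DataFrame({
    "user_id": [1, 2, 4],
    "language": ["english", "french", "german"],
})

# pd.merge defaults to how="inner": only user_ids present in both tables survive.
toy_merged = pd.merge(events, users, on="user_id")
print(toy_merged)
```

User 3 has an event but no user record, and user 4 has a user record but no events, so neither appears in toy_merged. User 1 appears twice, once per event.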

mac_device = ["macbook pro", "iphone 5s", "ipad air"]
df = (
    merged[merged["device"].isin(mac_device)]
    .groupby("language")["user_id"]
    .nunique()
    .to_frame("n_apple_users")
)
Next, the list mac_device is defined with the names of the three Apple devices. The names are written in lowercase with spaces so that they match the values in the device column exactly. The merged DataFrame is filtered to the rows whose device is in that list, grouped by language, and the number of unique user_id values is counted for each group. Using nunique rather than a row count means a user with several Apple events is counted once. to_frame turns the resulting Series into a one-column DataFrame named n_apple_users, indexed by language.
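The difference between counting rows and counting unique users is easy to check on a small example. This is a sketch on a made-up frame; toy and apple_devices are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical merged rows: user 1 has two Apple events.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "device": ["macbook pro", "macbook pro", "nexus 7", "ipad air", "iphone 5s"],
    "language": ["english", "english", "english", "french", "english"],
})
apple_devices = ["macbook pro", "iphone 5s", "ipad air"]

# isin matches exact strings, so the list must use the same spelling as the data.
apple_rows = toy[toy["device"].isin(apple_devices)]

apple_counts = (
    apple_rows.groupby("language")["user_id"]
    .nunique()  # user 1 is counted once despite two matching rows
    .to_frame("n_apple_users")
)
print(apple_counts)
```

Here english gets 2 (users 1 and 4) and french gets 1 (user 3). A plain row count would have reported 3 for english.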

result = (
    merged.groupby(["language"])["user_id"]
    .nunique()
    .rename("n_total_users")
    .reset_index()
)
The result DataFrame is built the same way, but without the device filter: the merged DataFrame is grouped by language and the number of unique user_id values is counted for each group. At this point the counts are a Series named user_id, so rename changes the Series name to n_total_users. Finally, reset_index turns the language index into a regular column, which gives a DataFrame with the columns language and n_total_users.
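The rename and reset_index steps can be inspected one at a time. The following sketch uses a made-up frame named toy; only the pandas calls mirror the notebook.

```python
import pandas as pd

# Hypothetical merged rows.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "language": ["english", "english", "english", "french"],
})

# groupby(...)[col].nunique() returns a Series named after the counted column.
counts = toy.groupby("language")["user_id"].nunique()
print(counts.name)  # the Series is still called "user_id" here

# Passing a scalar to Series.rename changes the Series name, not the index labels.
total = counts.rename("n_total_users").reset_index()
print(list(total.columns))
```

After reset_index, language is an ordinary column, which makes the later merge and column selection straightforward.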

result.merge(
    df, how="outer", left_on="language", right_on="language"
).fillna(0).sort_values("n_total_users", ascending=False)[
    ["language", "n_apple_users", "n_total_users"]
]
In the final step, the result DataFrame is merged with the df DataFrame using an outer join on language, so languages with no Apple users are kept rather than dropped. Those languages have no matching row in df, which leaves NaN in n_apple_users, and fillna(0) replaces it with 0. Because the column held NaN after the merge, it is stored as a float, which is why the output shows 11.0 rather than 11. The merged DataFrame is then sorted by n_total_users in descending order, and only the columns language, n_apple_users, and n_total_users are selected for the output.
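If whole-number counts are preferred in the output, one option is to cast n_apple_users back to an integer after filling the gaps. This is a sketch on made-up counts; the astype step is an addition and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical per-language counts.
total = pd.DataFrame({
    "language": ["english", "french", "spanish"],
    "n_total_users": [45, 5, 9],
})
apple = pd.DataFrame({
    "language": ["english", "spanish"],
    "n_apple_users": [11, 3],
})

combined = (
    total.merge(apple, how="outer", on="language")
    .fillna(0)                        # french has no Apple users -> NaN -> 0
    .astype({"n_apple_users": int})   # assumption: integer counts are wanted
    .sort_values("n_total_users", ascending=False)[
        ["language", "n_apple_users", "n_total_users"]
    ]
)
print(combined)
```

Since every language with Apple users also appears in the total counts, a left join from total would return the same rows here; the outer join is simply the more defensive choice.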

Bringing It All Together
The code starts by merging the playbook_events and playbook_users tables based on the user_id column. Then, it filters the merged DataFrame to include only rows with Apple devices and calculates the number of unique users for each language. It also calculates the total number of unique users for each language. Finally, it merges the two results, fills any missing values with 0, sorts the DataFrame by total users in descending order, and selects the desired columns for the output.
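As an alternative to building two tables and merging them, both counts can be computed in a single groupby. This is a different technique from the notebook's, sketched under the assumption that the inputs have the same user_id, device, and language columns; the function name count_users_by_language and the helper column apple_user_id are hypothetical.

```python
import pandas as pd


def count_users_by_language(events, users, apple_devices):
    """Return Apple and total unique-user counts per language (illustrative sketch)."""
    merged = events.merge(users, on="user_id")
    # Keep user_id only on Apple rows; other rows become NaN,
    # which nunique ignores by default.
    is_apple = merged["device"].isin(apple_devices)
    merged = merged.assign(apple_user_id=merged["user_id"].where(is_apple))
    return (
        merged.groupby("language")
        .agg(
            n_apple_users=("apple_user_id", "nunique"),
            n_total_users=("user_id", "nunique"),
        )
        .reset_index()
        .sort_values("n_total_users", ascending=False)
    )


# Hypothetical toy inputs.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "device": ["macbook pro", "nexus 7", "nexus 7", "ipad air", "hp pavilion desktop"],
})
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "language": ["english", "english", "french", "german"],
})
out = count_users_by_language(events, users, ["macbook pro", "iphone 5s", "ipad air"])
print(out)
```

Because no NaN is introduced into the final counts, both columns stay integers and no fillna step is needed.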

Conclusion
The provided code solves the given problem by merging and manipulating the given data using pandas operations. It outputs the language along with the total number of Apple users and users with any device, ordered by the number of total users in descending order.