# User Purchase Platform

## Table Schema: Spending

| Column Name | Type    |
|-------------|---------|
| user_id     | int     |
| spend_date  | date    |
| platform    | str     | 
| amount      | int     |

- **Description**:  
  The `Spending` table logs the history of user spending on an online shopping website that offers both desktop and mobile applications.

- **Primary Key**:  
  The combination of `(user_id, spend_date, platform)` ensures each record is unique.

- **Platform Values**:  
  The `platform` column is a string with two possible values: `'desktop'` and `'mobile'`.
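
The two constraints above (a unique key and a closed set of platform values) can be checked before running any solution. The following is a minimal sketch; `validate_spending` is a hypothetical helper name and is not part of the original problem.

```python
import pandas as pd

# Hypothetical helper (not from the original notebook): checks the
# constraints stated in the schema notes and raises on violation.
def validate_spending(spending: pd.DataFrame) -> None:
    key = ['user_id', 'spend_date', 'platform']
    # The primary key means no (user_id, spend_date, platform) triple repeats.
    if spending.duplicated(subset=key).any():
        raise ValueError('duplicate (user_id, spend_date, platform) rows')
    # Only the two documented platform values are allowed.
    unexpected = set(spending['platform'].unique()) - {'desktop', 'mobile'}
    if unexpected:
        raise ValueError(f'unexpected platform values: {sorted(unexpected)}')

sample = pd.DataFrame(
    [[1, '2019-07-01', 'mobile', 100], [1, '2019-07-01', 'desktop', 100]],
    columns=['user_id', 'spend_date', 'platform', 'amount'],
)
validate_spending(sample)  # valid data: returns without raising
```

Both solutions later in this document quietly rely on the key being unique, so a check like this may be worth running on real data.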

## Problem Statement

For each date, write a solution to find the **total number of users** and the **total amount spent** by users who purchased using:

- **Mobile only**
- **Desktop only**
- **Both mobile and desktop together**

**Return the result table in any order.**

### Example

#### Input

**Spending Table:**

| user_id | spend_date | platform | amount |
|---------|------------|----------|--------|
| 1       | 2019-07-01 | mobile   | 100    |
| 1       | 2019-07-01 | desktop  | 100    |
| 2       | 2019-07-01 | mobile   | 100    |
| 2       | 2019-07-02 | mobile   | 100    |
| 3       | 2019-07-01 | desktop  | 100    |
| 3       | 2019-07-02 | desktop  | 100    |

#### Output

| spend_date | platform | total_amount | total_users |
|------------|----------|--------------|-------------|
| 2019-07-01 | desktop  | 100          | 1           |
| 2019-07-01 | mobile   | 100          | 1           |
| 2019-07-01 | both     | 200          | 1           |
| 2019-07-02 | desktop  | 100          | 1           |
| 2019-07-02 | mobile   | 100          | 1           |
| 2019-07-02 | both     | 0            | 0           |
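
Because the result may be returned in any order, a check against the expected output above should ignore row order. A minimal sketch of such a comparison follows; `same_rows` and `COLUMNS` are hypothetical names, and the helper assumes there are no missing values in the result.

```python
import pandas as pd

COLUMNS = ['spend_date', 'platform', 'total_amount', 'total_users']

def same_rows(actual: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """Compare two result tables while ignoring row order and index."""
    def normalise(df: pd.DataFrame) -> list:
        out = df[COLUMNS].copy()
        # Normalise types so string dates and categorical platforms compare equal.
        out['spend_date'] = pd.to_datetime(out['spend_date'])
        out['platform'] = out['platform'].astype(str)
        return sorted(out.itertuples(index=False, name=None))
    return bool(normalise(actual) == normalise(expected))

# Expected output from the example above.
expected = pd.DataFrame(
    [
        ['2019-07-01', 'desktop', 100, 1],
        ['2019-07-01', 'mobile', 100, 1],
        ['2019-07-01', 'both', 200, 1],
        ['2019-07-02', 'desktop', 100, 1],
        ['2019-07-02', 'mobile', 100, 1],
        ['2019-07-02', 'both', 0, 0],
    ],
    columns=COLUMNS,
)

shuffled = expected.iloc[::-1]             # same rows, reversed order
matches = same_rows(shuffled, expected)    # True
```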

#### Explanation

- **On 2019-07-01**:
  - **User 1** purchased using **both** desktop and mobile.
  - **User 2** purchased using **mobile only**.
  - **User 3** purchased using **desktop only**.

- **On 2019-07-02**:
  - **User 2** purchased using **mobile only**.
  - **User 3** purchased using **desktop only**.
  - No users purchased using **both** platforms.
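
The explanation above can be turned into a small reference implementation in plain Python, which is handy for cross-checking a pandas solution. This is an illustrative sketch only; `spending_summary` is a hypothetical name, and it assumes each `(user_id, spend_date, platform)` triple appears at most once.

```python
from collections import defaultdict

def spending_summary(rows):
    """rows: iterable of (user_id, spend_date, platform, amount) tuples."""
    # Collect the platforms used and the amount spent per (date, user).
    per_user = defaultdict(lambda: [set(), 0])
    for user_id, spend_date, platform, amount in rows:
        entry = per_user[(spend_date, user_id)]
        entry[0].add(platform)
        entry[1] += amount

    # Start every (date, category) pair at zero so empty categories still appear.
    dates = sorted({spend_date for spend_date, _ in per_user})
    result = {(d, c): [0, 0] for d in dates for c in ('desktop', 'mobile', 'both')}
    for (spend_date, _), (platforms, amount) in per_user.items():
        category = 'both' if len(platforms) == 2 else next(iter(platforms))
        result[(spend_date, category)][0] += amount  # total_amount
        result[(spend_date, category)][1] += 1       # total_users
    return {key: tuple(value) for key, value in result.items()}

rows = [
    (1, '2019-07-01', 'mobile', 100),
    (1, '2019-07-01', 'desktop', 100),
    (2, '2019-07-01', 'mobile', 100),
    (2, '2019-07-02', 'mobile', 100),
    (3, '2019-07-01', 'desktop', 100),
    (3, '2019-07-02', 'desktop', 100),
]
summary = spending_summary(rows)
# summary[('2019-07-01', 'both')] is (200, 1); summary[('2019-07-02', 'both')] is (0, 0)
```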


In [1]:
import pandas as pd

# Sample Data
data = [
    [1, '2019-07-01', 'mobile', 100],
    [1, '2019-07-01', 'desktop', 100],
    [2, '2019-07-01', 'mobile', 100],
    [2, '2019-07-02', 'mobile', 100],
    [3, '2019-07-01', 'desktop', 100],
    [3, '2019-07-02', 'desktop', 100]
]

# Create DataFrame
spending = pd.DataFrame(
    data,
    columns=['user_id', 'spend_date', 'platform', 'amount']
).astype({
    'user_id': 'Int64',
    'spend_date': 'datetime64[ns]',
    'platform': 'object',
    'amount': 'Int64'
})

display(spending)

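|   | user_id | spend_date | platform | amount |
|---|---------|------------|----------|--------|
| 0 | 1       | 2019-07-01 | mobile   | 100    |
| 1 | 1       | 2019-07-01 | desktop  | 100    |
| 2 | 2       | 2019-07-01 | mobile   | 100    |
| 3 | 2       | 2019-07-02 | mobile   | 100    |
| 4 | 3       | 2019-07-01 | desktop  | 100    |
| 5 | 3       | 2019-07-02 | desktop  | 100    |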


## Solution 01

We can solve this problem using **Pandas** by following these steps:

**Step 1: Group by spend_date and user_id to aggregate platforms and amounts**

In [2]:
df = spending.groupby(['spend_date', 'user_id']).agg({
    'platform': lambda x: set(x),
    'amount': 'sum'
}).reset_index()

display(df)

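|   | spend_date | user_id | platform          | amount |
|---|------------|---------|-------------------|--------|
| 0 | 2019-07-01 | 1       | {desktop, mobile} | 200    |
| 1 | 2019-07-01 | 2       | {mobile}          | 100    |
| 2 | 2019-07-01 | 3       | {desktop}         | 100    |
| 3 | 2019-07-02 | 2       | {mobile}          | 100    |
| 4 | 2019-07-02 | 3       | {desktop}         | 100    |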


**Step 2: Categorize each user based on the platforms used**

In [3]:
def categorize_platform(platforms):
    if platforms == {'mobile'}:
        return 'mobile'
    elif platforms == {'desktop'}:
        return 'desktop'
    elif platforms == {'mobile', 'desktop'}:
        return 'both'
    else:
        return 'other'  # For any unexpected cases

df['platform'] = df['platform'].apply(categorize_platform)

display(df)

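|   | spend_date | user_id | platform | amount |
|---|------------|---------|----------|--------|
| 0 | 2019-07-01 | 1       | both     | 200    |
| 1 | 2019-07-01 | 2       | mobile   | 100    |
| 2 | 2019-07-01 | 3       | desktop  | 100    |
| 3 | 2019-07-02 | 2       | mobile   | 100    |
| 4 | 2019-07-02 | 3       | desktop  | 100    |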


**Step 3: Aggregate total_amount and total_users for each spend_date and platform**

In [4]:
df = df.groupby(['spend_date', 'platform']).agg(
    total_amount=('amount', 'sum'),
    total_users=('user_id', 'nunique')
).reset_index()

display(df)

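|   | spend_date | platform | total_amount | total_users |
|---|------------|----------|--------------|-------------|
| 0 | 2019-07-01 | both     | 200          | 1           |
| 1 | 2019-07-01 | desktop  | 100          | 1           |
| 2 | 2019-07-01 | mobile   | 100          | 1           |
| 3 | 2019-07-02 | desktop  | 100          | 1           |
| 4 | 2019-07-02 | mobile   | 100          | 1           |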


**Step 4: Ensure all combinations of spend_date and platforms are present**

In [5]:
# Define all possible categories
categories = ['mobile', 'desktop', 'both']

# Create a data frame with all combinations of spend_date and platform
all_combinations = pd.MultiIndex.from_product(
    [spending['spend_date'].unique(), categories],
    names=['spend_date', 'platform']
).to_frame(index=False)

display(all_combinations)

|   | spend_date | platform |
|---|------------|----------|
| 0 | 2019-07-01 | mobile   |
| 1 | 2019-07-01 | desktop  |
| 2 | 2019-07-01 | both     |
| 3 | 2019-07-02 | mobile   |
| 4 | 2019-07-02 | desktop  |
| 5 | 2019-07-02 | both     |


**Step 5: Merge with the df to include missing combinations with default values**

In [6]:
df = all_combinations.merge(
    df,
    on=['spend_date', 'platform'],
    how='left'
).fillna({'total_amount': 0, 'total_users': 0})

# Convert total_amount and total_users to integer type
df['total_amount'] = df['total_amount'].astype(int)
df['total_users'] = df['total_users'].astype(int)

# Optional: Sort for better readability
df = df.sort_values(by=['spend_date', 'platform']).reset_index(drop=True)

# Display the final output
display(df)

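|   | spend_date | platform | total_amount | total_users |
|---|------------|----------|--------------|-------------|
| 0 | 2019-07-01 | both     | 200          | 1           |
| 1 | 2019-07-01 | desktop  | 100          | 1           |
| 2 | 2019-07-01 | mobile   | 100          | 1           |
| 3 | 2019-07-02 | both     | 0            | 0           |
| 4 | 2019-07-02 | desktop  | 100          | 1           |
| 5 | 2019-07-02 | mobile   | 100          | 1           |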
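
Steps 4 and 5 could also be folded into a single `reindex` call, which fills missing combinations directly and keeps the integer dtype. The sketch below is one possible variant, not the notebook's method; `step3`, `full_index`, and `filled` are illustrative names, and `step3` is a hand-built stand-in for the Step 3 result.

```python
import pandas as pd

# Stand-in for the Step 3 result (same values as the Step 3 output).
step3 = pd.DataFrame({
    'spend_date': pd.to_datetime(['2019-07-01'] * 3 + ['2019-07-02'] * 2),
    'platform': ['both', 'desktop', 'mobile', 'desktop', 'mobile'],
    'total_amount': [200, 100, 100, 100, 100],
    'total_users': [1, 1, 1, 1, 1],
})

# Every (spend_date, platform) pair that must appear in the final result.
full_index = pd.MultiIndex.from_product(
    [step3['spend_date'].unique(), ['mobile', 'desktop', 'both']],
    names=['spend_date', 'platform'],
)

# reindex adds the missing pairs; fill_value=0 avoids NaN and a float upcast.
filled = (
    step3.set_index(['spend_date', 'platform'])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
```

Since every date with any spending produces at least one category row in Step 3, taking the dates from `step3` here is equivalent to taking them from `spending`.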


## Solution 02

A more compact **Pandas** solution relies on string concatenation and a categorical `platform` column:

**Step 1: Sum per spend_date and user_id. Summing the string platform column concatenates the platform names.**

In [7]:
df = spending.groupby(["spend_date", "user_id"], as_index=False).sum()

display(df)

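|   | spend_date | user_id | platform      | amount |
|---|------------|---------|---------------|--------|
| 0 | 2019-07-01 | 1       | mobiledesktop | 200    |
| 1 | 2019-07-01 | 2       | mobile        | 100    |
| 2 | 2019-07-01 | 3       | desktop       | 100    |
| 3 | 2019-07-02 | 2       | mobile        | 100    |
| 4 | 2019-07-02 | 3       | desktop       | 100    |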


**Step 2: Replace any concatenated value ("desktopmobile" or "mobiledesktop") with "both".**

In [8]:
df.loc[(df["platform"] != "mobile") & (df["platform"] != "desktop"), "platform"] = "both"

display(df)

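|   | spend_date | user_id | platform | amount |
|---|------------|---------|----------|--------|
| 0 | 2019-07-01 | 1       | both     | 200    |
| 1 | 2019-07-01 | 2       | mobile   | 100    |
| 2 | 2019-07-01 | 3       | desktop  | 100    |
| 3 | 2019-07-02 | 2       | mobile   | 100    |
| 4 | 2019-07-02 | 3       | desktop  | 100    |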
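
The concatenation trick in Steps 1 and 2 depends on how `sum` treats string columns, and the intermediate value ("mobiledesktop" or "desktopmobile") depends on row order. One possible alternative, sketched below under the assumption that `platform` only holds `'mobile'` or `'desktop'`, labels each user directly with a named aggregation; `per_user` is an illustrative name.

```python
import pandas as pd

spending = pd.DataFrame(
    [
        [1, '2019-07-01', 'mobile', 100],
        [1, '2019-07-01', 'desktop', 100],
        [2, '2019-07-01', 'mobile', 100],
        [2, '2019-07-02', 'mobile', 100],
        [3, '2019-07-01', 'desktop', 100],
        [3, '2019-07-02', 'desktop', 100],
    ],
    columns=['user_id', 'spend_date', 'platform', 'amount'],
)

# One row per (spend_date, user_id): label the user as 'both' when more than
# one distinct platform was used that day, otherwise keep the single platform.
per_user = spending.groupby(['spend_date', 'user_id'], as_index=False).agg(
    platform=('platform', lambda s: 'both' if s.nunique() > 1 else s.iloc[0]),
    amount=('amount', 'sum'),
)
```

The resulting `per_user` frame has the same shape as the Step 2 output, so the remaining steps apply unchanged.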


**Step 3: Convert platform to a categorical with all three categories so that empty groups appear with 0 totals.**

In [9]:
df["platform"] = df["platform"].astype("category").cat.set_categories(["desktop", "mobile", "both"])

display(df)

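|   | spend_date | user_id | platform | amount |
|---|------------|---------|----------|--------|
| 0 | 2019-07-01 | 1       | both     | 200    |
| 1 | 2019-07-01 | 2       | mobile   | 100    |
| 2 | 2019-07-01 | 3       | desktop  | 100    |
| 3 | 2019-07-02 | 2       | mobile   | 100    |
| 4 | 2019-07-02 | 3       | desktop  | 100    |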
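
The displayed values do not change in this step; only the dtype does. To illustrate why that matters for the final aggregation, here is a small standalone sketch (with made-up variable names) comparing `observed=False` and `observed=True` on a categorical column that has an unused category.

```python
import pandas as pd

demo = pd.DataFrame({
    # 'both' is a declared category that never occurs in the data.
    'platform': pd.Categorical(
        ['mobile', 'desktop'], categories=['desktop', 'mobile', 'both']
    ),
    'amount': [100, 100],
})

# observed=False keeps every declared category, including the empty one.
all_categories = demo.groupby('platform', observed=False)['amount'].sum()

# observed=True keeps only the categories that actually occur.
seen_only = demo.groupby('platform', observed=True)['amount'].sum()
```

Here `all_categories` contains a `both` entry with a sum of 0, while `seen_only` has no `both` entry at all.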


**Step 4: Aggregate by spend_date and platform. Passing observed=False keeps categories with no rows.**

In [10]:
df = (
    df.groupby(["spend_date", "platform"], observed=False)
    .agg(total_amount=("amount", "sum"), total_users=("user_id", "count"))
    .reset_index()
)

display(df)

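|   | spend_date | platform | total_amount | total_users |
|---|------------|----------|--------------|-------------|
| 0 | 2019-07-01 | desktop  | 100          | 1           |
| 1 | 2019-07-01 | mobile   | 100          | 1           |
| 2 | 2019-07-01 | both     | 200          | 1           |
| 3 | 2019-07-02 | desktop  | 100          | 1           |
| 4 | 2019-07-02 | mobile   | 100          | 1           |
| 5 | 2019-07-02 | both     | 0            | 0           |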
