## Problem: Find user activity for the past 30 days I

### Table: Activity

| Column Name   | Type    |
|---------------|---------|
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |

This table may have duplicate rows.
The activity_type column is an ENUM (category) with the values ('open_session', 'end_session', 'scroll_down', 'send_message').
The table shows the user activities for a social media website.
Note that each session belongs to exactly one user.
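Because the table can contain duplicate rows, counting rows per day would overcount users. A minimal sketch on a hypothetical toy frame (`dup` is an illustrative name, not part of the problem) contrasts `count` with `nunique`:

```python
import pandas as pd

# Hypothetical toy data: user 1 has two identical rows on the same day
dup = pd.DataFrame({
    'user_id': [1, 1, 2],
    'activity_date': pd.to_datetime(['2019-07-20'] * 3),
})

rows_per_day = dup.groupby('activity_date')['user_id'].count()     # counts rows
users_per_day = dup.groupby('activity_date')['user_id'].nunique()  # counts distinct users

print(rows_per_day.iloc[0], users_per_day.iloc[0])  # 3 2
```

Only the distinct-user count matches the definition of "active users".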
 

Write a solution to find the daily active user count for a period of 30 days ending 2019-07-27 inclusive. A user was active on a given day if they made at least one activity on that day.
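The phrase "30 days ending 2019-07-27 inclusive" means the window starts 29 days before the end date, not 30. A quick sanity check with `pd.date_range`, which includes both endpoints at its default daily frequency:

```python
import pandas as pd

end = pd.Timestamp('2019-07-27')
start = end - pd.Timedelta(days=29)  # subtract 29, not 30, so the window holds 30 days

window = pd.date_range(start, end)   # daily frequency, both endpoints included
print(start.date(), len(window))     # 2019-06-28 30
```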

Return the result table in any order.

The result format is in the following example.

 

### Example 1:

Input: 
Activity table:

| user_id | session_id | activity_date | activity_type |
|---------|------------|---------------|---------------|
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |

Output: 

| day        | active_users |
|------------|--------------| 
| 2019-07-20 | 2            |
| 2019-07-21 | 2            |

Explanation: Note that we do not care about days with zero active users.
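Days with zero active users drop out on their own: `groupby` on a non-categorical key only produces groups that occur in the data. The sketch below uses hypothetical toy rows; the `reindex` step is shown only for contrast and is *not* what the problem asks for.

```python
import pandas as pd

# Hypothetical toy data: activity on 2019-07-20 and 2019-07-22, nothing on 2019-07-21
acts = pd.DataFrame({
    'user_id': [1, 2, 3],
    'activity_date': pd.to_datetime(['2019-07-20', '2019-07-20', '2019-07-22']),
})

per_day = acts.groupby('activity_date')['user_id'].nunique()
print(len(per_day))  # 2: the empty day 2019-07-21 is simply absent

# For contrast only: filling the missing day with 0 (the problem does NOT want this)
full = per_day.reindex(pd.date_range('2019-07-20', '2019-07-22'), fill_value=0)
print(full.tolist())  # [2, 0, 1]
```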

```python
import pandas as pd

data = [[1, 1, '2019-07-20', 'open_session'], [1, 1, '2019-07-20', 'scroll_down'], [1, 1, '2019-07-20', 'end_session'],
        [2, 4, '2019-07-20', 'open_session'], [2, 4, '2019-07-21', 'send_message'], [2, 4, '2019-07-21', 'end_session'],
        [3, 2, '2019-07-21', 'open_session'], [3, 2, '2019-07-21', 'send_message'], [3, 2, '2019-07-21', 'end_session'],
        [4, 3, '2019-06-25', 'open_session'], [4, 3, '2019-06-25', 'end_session']]
activity = pd.DataFrame(data, columns=['user_id', 'session_id', 'activity_date', 'activity_type']).astype(
    {'user_id': 'Int64', 'session_id': 'Int64', 'activity_date': 'datetime64[ns]', 'activity_type': 'object'})
```

```python
end_date = pd.to_datetime('2019-07-27')
start_date = end_date - pd.Timedelta(days=29)  # 29 days back gives a 30-day window including both ends
end_date, start_date
```

```
(Timestamp('2019-07-27 00:00:00'), Timestamp('2019-06-28 00:00:00'))
```

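```python
# Keep only rows whose date falls inside the window (both ends inclusive)
filtered_activity = activity[(activity['activity_date'] >= start_date) & (activity['activity_date'] <= end_date)]
filtered_activity
```

|   | user_id | session_id | activity_date | activity_type |
|---|---------|------------|---------------|---------------|
| 0 | 1       | 1          | 2019-07-20    | open_session  |
| 1 | 1       | 1          | 2019-07-20    | scroll_down   |
| 2 | 1       | 1          | 2019-07-20    | end_session   |
| 3 | 2       | 4          | 2019-07-20    | open_session  |
| 4 | 2       | 4          | 2019-07-21    | send_message  |
| 5 | 2       | 4          | 2019-07-21    | end_session   |
| 6 | 3       | 2          | 2019-07-21    | open_session  |
| 7 | 3       | 2          | 2019-07-21    | send_message  |
| 8 | 3       | 2          | 2019-07-21    | end_session   |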
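
The two comparisons used in the filter are equivalent to `Series.between`, which treats both bounds as inclusive by default. A minimal check on hypothetical dates placed just outside and exactly on each edge of the window:

```python
import pandas as pd

start, end = pd.Timestamp('2019-06-28'), pd.Timestamp('2019-07-27')

# Hypothetical dates: one day outside, on, and inside each window edge
df = pd.DataFrame({'activity_date': pd.to_datetime(
    ['2019-06-27', '2019-06-28', '2019-07-20', '2019-07-27', '2019-07-28'])})

mask_cmp = (df['activity_date'] >= start) & (df['activity_date'] <= end)
mask_between = df['activity_date'].between(start, end)  # both bounds inclusive by default

print(mask_between.tolist())  # [False, True, True, True, False]
```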


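```python
# Count distinct users per day; nunique ignores duplicate rows for the same user
daily_active = (filtered_activity
                .groupby('activity_date')['user_id']
                .nunique()
                .reset_index()
                .rename(columns={'activity_date': 'day', 'user_id': 'active_users'}))
daily_active
```

|   | day        | active_users |
|---|------------|--------------|
| 0 | 2019-07-20 | 2            |
| 1 | 2019-07-21 | 2            |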
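
An alternative to `nunique().reset_index().rename(...)` is named aggregation in `groupby(...).agg`, which names the `active_users` column directly. This sketch renames the date column first and rebuilds a few of the example rows so it runs on its own; it is one option, not a change the problem requires.

```python
import pandas as pd

# A few rows from the example, re-created so this block is self-contained
rows = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'activity_date': pd.to_datetime(
        ['2019-07-20', '2019-07-20', '2019-07-20', '2019-07-21', '2019-07-21']),
})

daily = (rows
         .rename(columns={'activity_date': 'day'})
         .groupby('day', as_index=False)
         .agg(active_users=('user_id', 'nunique')))  # named aggregation: new_name=(column, func)
print(daily)
```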


```python
import pandas as pd

def user_activity(activity: pd.DataFrame) -> pd.DataFrame:
    # 30-day window ending 2019-07-27, both ends inclusive: 2019-06-28 .. 2019-07-27
    end_date = pd.to_datetime('2019-07-27')
    start_date = end_date - pd.Timedelta(days=29)
    filtered_activity = activity[(activity['activity_date'] >= start_date) & (activity['activity_date'] <= end_date)]
    # One row per day that has activity; nunique counts each user once per day
    daily_active = (
        filtered_activity
        .groupby('activity_date')['user_id']
        .nunique()
        .reset_index()
        .rename(columns={'activity_date': 'day', 'user_id': 'active_users'})
    )
    return daily_active
```

```python
user_activity(activity)
```

|   | day        | active_users |
|---|------------|--------------|
| 0 | 2019-07-20 | 2            |
| 1 | 2019-07-21 | 2            |
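
The example input never touches the window edges, so a small boundary check is useful. This sketch restates the solution function (same logic as above) and feeds it hypothetical rows one day outside and exactly on each edge, plus a duplicate row:

```python
import pandas as pd

def user_activity(activity: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the solution: 30-day window ending 2019-07-27, both ends inclusive
    end_date = pd.to_datetime('2019-07-27')
    start_date = end_date - pd.Timedelta(days=29)
    in_window = (activity['activity_date'] >= start_date) & (activity['activity_date'] <= end_date)
    return (activity[in_window]
            .groupby('activity_date')['user_id']
            .nunique()
            .reset_index()
            .rename(columns={'activity_date': 'day', 'user_id': 'active_users'}))

# Hypothetical boundary rows: one day outside and exactly on each edge,
# plus a duplicate row for user 3 to confirm it is counted once
edge_cases = pd.DataFrame({
    'user_id': [1, 2, 3, 3, 4],
    'activity_date': pd.to_datetime(
        ['2019-06-27', '2019-06-28', '2019-07-27', '2019-07-27', '2019-07-28']),
})

result = user_activity(edge_cases)
print(result)
```

Only 2019-06-28 and 2019-07-27 should appear, each with one active user; 2019-06-27 and 2019-07-28 fall outside the window, and user 3's duplicate row is counted once.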
