# mBrain: Feature calculation & regression models

This notebook calculates mobile-usage features for stress level prediction.
The features are taken from the literature and may partly explain a person's stress level (1-10).

In the first part of the notebook, the data files are loaded and some information about them is displayed. The measured stress levels
are then linked to each person's mobile usage via the `panelkit_id`.

Next, the features are calculated and combined into one dataframe, and their correlations
with the target variable are examined.

Finally, several regression models are fitted and their performance is compared with that of a dummy model. These regression
models might be able to predict a person's stress level from their mobile usage.

Importing the necessary libraries and classes:

In [2]:
from mobiledna.core.appevents import Appevents

import pandas as pd

## 1. Open the needed files

In [3]:
# file with the mapping of the mobileDNA-id and the panelkit-id
df_mapping = pd.read_csv("../data/data_nervosity/mobdna_mapping_w2w3.csv", sep=';')
df_mapping = df_mapping[['panelkit_id', 'mobdna_id','wave','campagne']].rename(columns={'mobdna_id': 'id'})

# files needed for the feature calculation
ae_w2 = Appevents.load_data("../data/data_nervosity/wave_2/210310_nervocity_appevents_panelkitid.parquet")
ae_w3 = Appevents.load_data("../data/data_nervosity/wave_3_2/210803_nervocity_appevents.parquet")

ae_w2.add_category(scrape=False).add_time_of_day()
ae_w3.add_category(scrape=False).add_time_of_day()

# add a column with the day for each appEvent
ae_w2.__data__['day'] = ae_w2.__data__['startTime'].dt.date
ae_w3.__data__['day'] = ae_w3.__data__['startTime'].dt.date

ae_w2.__data__ = ae_w2.__data__.rename(columns={'panelkitid': 'panelkit_id'})
df_ae_all_waves = pd.concat([ae_w2.__data__, ae_w3.__data__])
ae_all_waves = Appevents(df_ae_all_waves, add_categories=False, strip=False)

ae_all_waves.__data__['day'] = pd.to_datetime(ae_all_waves.__data__['day'])
print(ae_all_waves.__data__.head())

2021-10-28 11:41:58 - Recognized file type as <parquet>.
2021-10-28 11:41:58 - 'load' took 0.203 seconds to complete.
2021-10-28 11:41:58 - Recognized file type as <parquet>.
2021-10-28 11:41:59 - 'load' took 0.421 seconds to complete.


Adding category: 100%|██████████| 407970/407970 [00:00<00:00, 1518932.35it/s]
Adding tod <startTime>: 100%|██████████| 407970/407970 [00:00<00:00, 1999236.13it/s]
Adding category: 100%|██████████| 1489208/1489208 [00:00<00:00, 1633635.62it/s]
Adding tod <startTime>: 100%|██████████| 1489208/1489208 [00:00<00:00, 2208527.67it/s]


                                           id    model     session  \
1320339  001b6dc4-9c95-4c7c-a21a-56b5deca6689  ELE-L29  1622720128   
1320340  001b6dc4-9c95-4c7c-a21a-56b5deca6689  ELE-L29  1622720128   
1320293  001b6dc4-9c95-4c7c-a21a-56b5deca6689  ELE-L29  1622720167   
1320310  001b6dc4-9c95-4c7c-a21a-56b5deca6689  ELE-L29  1622720167   
1320294  001b6dc4-9c95-4c7c-a21a-56b5deca6689  ELE-L29  1622720167   

                      startTime                 endTime  notification  \
1320339 2021-06-03 13:35:58.071 2021-06-03 13:36:02.945         False   
1320340 2021-06-03 13:36:05.643 2021-06-03 13:36:22.971         False   
1320293 2021-06-03 13:36:25.026 2021-06-03 13:36:28.287         False   
1320310 2021-06-03 13:36:30.215 2021-06-03 13:36:31.701         False   
1320294 2021-06-03 13:36:38.057 2021-06-03 13:36:38.927         False   

         notificationId                        application  battery  \
1320339               0        com.huawei.android.launcher       63  

Filter the app events by date, keeping only the days listed in the dates file:

In [4]:
# date ranges to filter on
df_dates = pd.read_csv("/Users/simonperneel/Documents/Imec-mict/mobiledna_py/mobiledna/data/data_nervosity/dates_nervo_allwaves.csv", sep=';')
df_dates['date'] = pd.to_datetime(df_dates['date'])

print(df_dates.head())


        date studyday  wave  campagne        day weekdeel
0 2021-02-22     day1     2         1    Maandag  Weekdag
1 2021-02-23     day2     2         1    Dinsdag  Weekdag
2 2021-02-24     day3     2         1   Woensdag  Weekdag
3 2021-02-25     day4     2         1  Donderdag  Weekdag
4 2021-02-26     day5     2         1    Vrijdag  Weekdag
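The date filter applied in the next cell can be sketched on toy data. This is a minimal illustration with made-up events and dates, not the notebook's data; the `weekday` column is a hypothetical addition used only to show a later column assignment. Taking an explicit `.copy()` of the filtered rows is one common way to avoid pandas' `SettingWithCopyWarning` when columns are added afterwards.

```python
import pandas as pd

# Toy app events: one row per event (made-up values)
events = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "startTime": pd.to_datetime([
        "2021-02-21 23:50:00", "2021-02-22 08:15:00",
        "2021-02-23 12:00:00", "2021-03-01 09:30:00",
    ]),
})
# Calendar day of each event as a midnight timestamp (stays datetime64)
events["day"] = events["startTime"].dt.normalize()

# Days to keep (in the notebook these come from the dates CSV)
study_days = pd.Series(pd.to_datetime(["2021-02-22", "2021-02-23"]))

# Boolean mask plus explicit copy: later assignments then modify an
# independent frame instead of a slice of `events`
filtered = events[events["day"].isin(study_days)].copy()
filtered["weekday"] = filtered["day"].dt.day_name()  # hypothetical extra column

print(filtered)
```

Only the two events that fall on listed days survive, and `events` itself is left unchanged.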


In [5]:
# take an explicit copy so that later column assignments do not act on a slice
ae_all_waves_filtered = ae_all_waves.__data__[ae_all_waves.__data__['day'].isin(df_dates['date'])].copy()
ae_all_waves = Appevents(ae_all_waves_filtered, add_categories=False, strip=False)
print('Modified appEvents data file: ')
print(ae_all_waves_filtered.head())



Modified appEvents data file: 
                                          id     model     session  \
911903  0079ff14-3579-4e27-b1e5-a838bc23b295  SM-A715F  1619517678   
914811  0079ff14-3579-4e27-b1e5-a838bc23b295  SM-A715F  1619517678   
920379  0079ff14-3579-4e27-b1e5-a838bc23b295  SM-A715F  1619517678   
914812  0079ff14-3579-4e27-b1e5-a838bc23b295  SM-A715F  1619517678   
910207  0079ff14-3579-4e27-b1e5-a838bc23b295  SM-A715F  1619517678   

                     startTime                 endTime  notification  \
911903 2021-04-27 12:01:48.892 2021-04-27 12:02:12.592         False   
914811 2021-04-27 12:02:14.034 2021-04-27 12:02:18.299         False   
920379 2021-04-27 12:02:18.362 2021-04-27 12:02:20.293         False   
914812 2021-04-27 12:02:22.061 2021-04-27 12:02:49.931         False   
910207 2021-04-27 12:02:51.686 2021-04-27 12:02:53.050         False   

        notificationId          application  battery  latitude  ...  \
911903               0  be.imec.apt.stressy 



## 3. Feature calculation
All features from the literature are listed [here](./Constructlijst_features.xlsx).
### Stress Features
#### General screen time

In [6]:
general_screen_time = (ae_all_waves.get_daily_duration(series_unit='day') / 60)
print(general_screen_time.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     54.487650
                                      2021-04-28    134.824350
                                      2021-04-29    138.101383
                                      2021-04-30    172.808150
                                      2021-05-01    121.137983
Name: daily_durations, dtype: float64


#### Smartphone use frequency

In [7]:
smartphone_use_freq = ae_all_waves.get_daily_events(series_unit='day')
print(smartphone_use_freq.head())


id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     88.0
                                      2021-04-28    165.0
                                      2021-04-29    165.0
                                      2021-04-30    150.0
                                      2021-05-01    160.0
Name: daily_events, dtype: float64
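To make the two features above concrete, here is a plain pandas sketch on toy data. It assumes that the daily duration is the sum of event durations per participant and day (converted to minutes) and that the daily frequency is the number of app events per participant and day. The actual `get_daily_duration` and `get_daily_events` implementations in mobiledna may differ in details.

```python
import pandas as pd

# Toy app events (made-up values)
events = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "startTime": pd.to_datetime([
        "2021-04-27 12:00:00", "2021-04-27 12:05:00",
        "2021-04-28 09:00:00", "2021-04-27 18:00:00",
    ]),
    "endTime": pd.to_datetime([
        "2021-04-27 12:02:00", "2021-04-27 12:06:30",
        "2021-04-28 09:10:00", "2021-04-27 18:00:30",
    ]),
})
events["duration"] = (events["endTime"] - events["startTime"]).dt.total_seconds()
events["day"] = events["startTime"].dt.normalize()

# Screen time: seconds summed per participant and day, expressed in minutes
daily_minutes = events.groupby(["id", "day"])["duration"].sum() / 60

# Use frequency: number of app events per participant and day
daily_events = events.groupby(["id", "day"]).size()

print(daily_minutes)
print(daily_events)
```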


#### Duration MIM applications

In [8]:
duration_MIM_applications = (ae_all_waves.get_daily_duration(category='chat', series_unit='day') / 60)
print(duration_MIM_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-28    13.189567
                                      2021-04-29    23.161533
                                      2021-04-30    13.919633
                                      2021-05-01    29.551150
                                      2021-05-02     8.238317
Name: daily_durations_chat, dtype: float64


#### Frequency MIM applications

In [9]:
freq_MIM_applications = ae_all_waves.get_daily_events(category='chat', series_unit='day')
print(freq_MIM_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-28    12.0
                                      2021-04-29    22.0
                                      2021-04-30    22.0
                                      2021-05-01    35.0
                                      2021-05-02    15.0
Name: daily_events_chat, dtype: float64


#### (Average) daily use of MIM applications during work hours

In [10]:
# TODO change morning & noon to work hours (8-16 => 9-17)
daily_use_work_hours = (ae_all_waves.get_daily_duration(time_of_day=['morning', 'noon'], category='chat', series_unit='day') / 60)
print(daily_use_work_hours.head())

avg_daily_use_work_hours = (ae_all_waves.get_daily_duration(time_of_day=['morning', 'noon'], category='chat') / 60)
print(avg_daily_use_work_hours.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-28     7.222383
                                      2021-04-29     9.728833
                                      2021-04-30     5.574017
                                      2021-05-01    15.555833
                                      2021-05-02     4.564967
Name: daily_durations_chat_['morning', 'noon'], dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295     6.244199
01617409-e832-4bd9-b139-30189de7e827     5.028610
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    11.704453
037e80a0-ad12-4707-adf2-a77351a4456c     3.222398
03c0f04d-6eb5-4dc5-bcfd-5642a6240681     8.393506
Name: daily_durations_chat_['morning', 'noon'], dtype: float64
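The TODO in the cell above mentions replacing the 'morning' and 'noon' buckets with actual work hours. One possible way to do that directly in pandas is sketched below on toy data. It assumes work hours of 09:00 to 17:00 and ignores weekends and category filtering; both the data and that window are illustrative assumptions.

```python
import pandas as pd

# Toy app events with a duration in seconds (made-up values)
events = pd.DataFrame({
    "id": ["a", "a", "a", "a"],
    "startTime": pd.to_datetime([
        "2021-04-28 08:59:00", "2021-04-28 09:00:00",
        "2021-04-28 16:59:00", "2021-04-28 17:00:00",
    ]),
    "duration": [60.0, 120.0, 30.0, 45.0],
})
events["day"] = events["startTime"].dt.normalize()

# Hours 9..16 inclusive correspond to start times in [09:00, 17:00)
work = events[events["startTime"].dt.hour.between(9, 16)]

# Minutes of use during work hours per participant and day
work_minutes = work.groupby(["id", "day"])["duration"].sum() / 60
print(work_minutes)
```

Events are assigned by their start time only, so an event that starts at 16:59 and runs past 17:00 is counted in full.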


#### (Average) amount of social media notifications
First define the mapping that merges the fine-grained categories into broader ones, such as a single 'Social' category:

In [11]:
unknown_categories = {
    "banking": ["com.coinbase.pro", "com.kraken.trade", "com.kraken.invest.app"],
    "medical": [
        "be.imec.apt.stressy",
        "be.imec.apt.ichange.chillplusclient",
        "be.ilabt.contextaware.empatica",
        "be.ilabt.contextaware.mbrain",
        "be.sciensano.coronalert",
        "com.j_ware.polarsensorlogger",
        "com.urbandroid.sleep",
        "heartzones.com.heartzonestraining",
        "com.empatica.e4realtime",
    ],
    "calling": ["com.oneplus.dialer"],
    "calendar": ["com.komorebi.SimpleCalendar"],
    "productivity": ["partl.workinghours"],
}

category_map = {"medical": "Health","chat": "Social","email": "Productivity","system": "none", "unknown": "none",
                "social": "Social","tools": "Productivity","browser": "Web","productivity": "Productivity",
                "photography": "none","business": "Productivity","music&audio": "Entertainment","clock": "none",
                "banking": "Finance","lifestyle": "none","health&fitness": "Health","news&magazines": "News",
                "gaming": "Entertainment","calling": "Calling","calendar": "Productivity","video": "Entertainment",
                "maps&navigation": "Navigation","food & drink": "none","finance": "Finance","communication": "Social",
                "ecommerce": "Shopping","retail": "Shopping","weather": "none","sports": "none","smartconnectivity": "none",
                "card": "Entertainment","travel & local": "none","education": "Productivity","entertainment": "Entertainment",
                "music & audio": "Entertainment","books & reference": "none","shopping": "Shopping","mobility": "Navigation",
                "news & magazines": "News","puzzle": "Entertainment",}
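
How such a mapping behaves when it is applied with `dict.get(x, x)`, as the notebook does a little further on, can be checked on a small excerpt. The excerpt below reuses three entries of `category_map`; the category `"dating"` is a hypothetical value that is not in the mapping.

```python
# Excerpt of the mapping (same key/value pairs as in the notebook)
category_map = {"chat": "Social", "social": "Social", "browser": "Web"}

# "dating" is a hypothetical category without an entry in the mapping
raw = ["chat", "social", "browser", "dating"]

# get(c, c) falls back to the original label when no mapping exists
mapped = [category_map.get(c, c) for c in raw]
print(mapped)

# Applying the mapping a second time changes nothing here, because none of
# the target labels is itself a key of the mapping
remapped = [category_map.get(c, c) for c in mapped]
print(remapped == mapped)
```

Unmapped categories therefore pass through unchanged instead of becoming missing values.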


#### (Average) daily use of social media applications
Apply the mapping so that the various social categories are merged into one 'Social' category:

In [12]:
i = ae_all_waves.__data__['category'].nunique()
ae_all_waves.__data__['category'] = ae_all_waves.__data__['category'].apply(lambda x: category_map.get(x,x))
j = ae_all_waves.__data__['category'].nunique()



In [13]:
daily_social_applications = (ae_all_waves.get_daily_duration(category='Social', series_unit='day') / 60)
print(daily_social_applications.head())

avg_daily_social_applications = (ae_all_waves.get_daily_duration(category='Social') / 60)
print(avg_daily_social_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     3.530633
                                      2021-04-28    13.709833
                                      2021-04-29    23.256950
                                      2021-04-30    15.288750
                                      2021-05-01    30.889183
Name: daily_durations_social, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    12.989656
01617409-e832-4bd9-b139-30189de7e827    37.857345
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    60.493867
037e80a0-ad12-4707-adf2-a77351a4456c     7.600031
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    24.054024
Name: daily_durations_social, dtype: float64


#### (Average) daily amount of social media app events

In [14]:
freq_social_applications = ae_all_waves.get_daily_events(category='Social', series_unit='day')
print(freq_social_applications.head())

avg_freq_social_applications = ae_all_waves.get_daily_events(category='Social')
print(avg_freq_social_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     1.0
                                      2021-04-28    13.0
                                      2021-04-29    23.0
                                      2021-04-30    26.0
                                      2021-05-01    39.0
Name: daily_events_social, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    18.666667
01617409-e832-4bd9-b139-30189de7e827    28.000000
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    41.040000
037e80a0-ad12-4707-adf2-a77351a4456c    20.666667
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    42.333333
Name: daily_events_social, dtype: float64


#### (Average) daily use during evening time

In [15]:
daily_use_evening = (ae_all_waves.get_daily_duration(time_of_day='eve', series_unit='day') / 60)
print(daily_use_evening.head())

avg_daily_use_evening = (ae_all_waves.get_daily_duration(time_of_day='eve') / 60)
print(avg_daily_use_evening.head())


id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    28.904233
                                      2021-04-28    26.179000
                                      2021-04-29    36.047800
                                      2021-04-30    36.440250
                                      2021-05-01    30.913750
Name: daily_durations_eve, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    25.677292
01617409-e832-4bd9-b139-30189de7e827    26.297925
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    37.465267
037e80a0-ad12-4707-adf2-a77351a4456c    14.356721
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    24.373490
Name: daily_durations_eve, dtype: float64


#### (Average) daily use during nighttime

In [16]:
daily_use_night = (ae_all_waves.get_daily_duration(time_of_day='night', series_unit='day') / 60)
print(daily_use_night.head())

avg_daily_use_night = (ae_all_waves.get_daily_duration(time_of_day='night') / 60)
print(avg_daily_use_night.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     2.847150
                                      2021-04-28    16.157833
                                      2021-04-29    10.564817
                                      2021-04-30     3.839017
                                      2021-05-01    12.605483
Name: daily_durations_night, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295     9.528381
01617409-e832-4bd9-b139-30189de7e827    20.365642
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    17.106953
037e80a0-ad12-4707-adf2-a77351a4456c     6.367640
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    13.736192
Name: daily_durations_night, dtype: float64


#### (Average) daily amount of app events during evening/night time

In [17]:
freq_evening_use = (ae_all_waves.get_daily_events(time_of_day='eve', series_unit='day'))
print(freq_evening_use.head())

avg_freq_evening_use = (ae_all_waves.get_daily_events(time_of_day='eve'))
print(avg_freq_evening_use.head())

freq_night_use = (ae_all_waves.get_daily_events(time_of_day='night', series_unit='day'))
print(freq_night_use.head())

avg_freq_night_use = (ae_all_waves.get_daily_events(time_of_day='night'))
print(avg_freq_night_use.head())


id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    45.0
                                      2021-04-28    44.0
                                      2021-04-29    42.0
                                      2021-04-30    46.0
                                      2021-05-01    48.0
Name: daily_events_eve, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    39.933333
01617409-e832-4bd9-b139-30189de7e827    37.687500
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    36.720000
037e80a0-ad12-4707-adf2-a77351a4456c    37.375000
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    44.571429
Name: daily_events_eve, dtype: float64
id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     4.0
                                      2021-04-28    23.0
                                      2021-04-29    14.0
                                      2021-04-30     6.0
                                      2021-05-01    21.0
N

### Depression features
###### General screen time
###### Smartphone use frequency
###### Screen unlocks (=checking behaviour)
###### Average daily social smartphone use/ appevents/ notifications
###### Average smartphone appevents/ use during evening/night hours
→ All done above
#### (Average) time between sessions started on notification

In [18]:
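def calc_time_between_notification_sessions(df: pd.DataFrame, avg: bool = False) -> pd.Series:
    """
    Calculates the minutes between consecutive sessions that started on a notification.
    The gap runs from the end of a session's first app event to the start of the
    next notification session of the same participant on the same day.

    :param df: the appevents DataFrame
    :param avg: if True, return the mean per participant instead of per participant and day
    """
    # first app event of every session; keep only sessions that started on a notification
    session_firsts = df.groupby(["id", "session"]).head(1)
    session_firsts_notif = session_firsts[session_firsts["notification"] == True]
    # start time of the next notification session of the same participant and day
    session_firsts_notif = session_firsts_notif.assign(
        start_shift=session_firsts_notif.groupby(["id", "startDate"])["startTime"].shift(-1))
    session_firsts_notif = session_firsts_notif.assign(
        duration_shift=(session_firsts_notif["start_shift"] - session_firsts_notif["endTime"]).dt.total_seconds())

    mean_shift_pd = (session_firsts_notif.groupby(["id", "startDate"])["duration_shift"].mean() / 60)
    mean_shift = mean_shift_pd.groupby("id").mean()

    if avg:
        return mean_shift.rename("mins_between_notif_sessions")
    return mean_shift_pd.rename("mins_between_notif_sessions")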


In [19]:
time_between_notif_sessions = calc_time_between_notification_sessions(ae_all_waves.__data__) # minutes
print(time_between_notif_sessions.head())

avg_time_between_notif_sessions = calc_time_between_notification_sessions(ae_all_waves.__data__, avg=True) # minutes
print(avg_time_between_notif_sessions.head())

id                                    startDate 
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     76.623242
                                      2021-04-28    132.530831
                                      2021-04-29     64.333117
                                      2021-04-30     91.133078
                                      2021-05-01     78.702731
Name: mins_between_notif_sessions, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    156.689824
01617409-e832-4bd9-b139-30189de7e827    138.011087
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    134.676165
037e80a0-ad12-4707-adf2-a77351a4456c    106.222736
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    116.813850
Name: mins_between_notif_sessions, dtype: float64
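
The logic of this feature can be traced on a single toy day. The sketch below follows the same steps as the function above (first event per session, keep notification sessions, shift the start times within the day) but on made-up data, and it sorts by time first, which the function assumes has already been done.

```python
import pandas as pd

# One participant, four sessions on one day (made-up values)
events = pd.DataFrame({
    "id": ["a"] * 5,
    "session": [1, 1, 2, 3, 4],
    "notification": [True, False, False, True, True],
    "startTime": pd.to_datetime([
        "2021-04-27 08:00:00", "2021-04-27 08:01:00", "2021-04-27 09:00:00",
        "2021-04-27 10:00:00", "2021-04-27 12:00:00",
    ]),
    "endTime": pd.to_datetime([
        "2021-04-27 08:01:00", "2021-04-27 08:03:00", "2021-04-27 09:05:00",
        "2021-04-27 10:02:00", "2021-04-27 12:05:00",
    ]),
})
events = events.sort_values(["id", "startTime"])
events["startDate"] = events["startTime"].dt.normalize()

# First event of each session, restricted to sessions opened by a notification
firsts = events.groupby(["id", "session"]).head(1)
notif = firsts[firsts["notification"]].copy()

# Start of the next notification session on the same day (NaT for the last one)
notif["next_start"] = notif.groupby(["id", "startDate"])["startTime"].shift(-1)
notif["gap_min"] = (notif["next_start"] - notif["endTime"]).dt.total_seconds() / 60

# Mean gap per participant and day; the NaN of the last session is skipped
daily_gap = notif.groupby(["id", "startDate"])["gap_min"].mean()
print(daily_gap)
```

Session 2 is ignored because it did not start on a notification, so the two gaps are 08:01 to 10:00 (119 minutes) and 10:02 to 12:00 (118 minutes).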


#### Variability smartphone use during week

In [20]:
def calc_weekly_use_variability(df: pd.DataFrame, duration: bool = False) -> pd.Series:
    if duration:
        name = "duration"
        variability = df.groupby(["id", pd.Grouper(key="startDate", freq="W")])["duration"].sum().groupby("id").std()
    else:
        name = "appevents"
        variability = df.groupby(["id", pd.Grouper(key="startDate", freq="W")])["application"].count().groupby("id").std()

    return variability.rename(f"weekly_variability_{name}")

In [21]:
weekly_use_variability = calc_weekly_use_variability(ae_all_waves.__data__, duration=True)
print(weekly_use_variability.head())

id
0079ff14-3579-4e27-b1e5-a838bc23b295    20593.837046
01617409-e832-4bd9-b139-30189de7e827    20482.556924
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    30855.065644
037e80a0-ad12-4707-adf2-a77351a4456c    10345.634541
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    28985.049991
Name: weekly_variability_duration, dtype: float64
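
The weekly grouping used for this feature can be illustrated on toy data. The sketch assumes, as in the function above, that `duration` is expressed in seconds and that `pd.Grouper(freq="W")` bins the dates into weeks ending on Sunday (the pandas default for `"W"`). The values are made up.

```python
import pandas as pd

# One participant with usage spread over three calendar weeks (made-up values)
events = pd.DataFrame({
    "id": ["a", "a", "a", "a"],
    "startDate": pd.to_datetime(["2021-04-26", "2021-04-28", "2021-05-03", "2021-05-10"]),
    "duration": [100.0, 200.0, 500.0, 700.0],  # seconds
})

# Total duration per participant and week (weeks end on Sunday)
weekly_total = events.groupby(["id", pd.Grouper(key="startDate", freq="W")])["duration"].sum()

# Variability: sample standard deviation of the weekly totals per participant
variability = weekly_total.groupby("id").std()

print(weekly_total)
print(variability)
```

With weekly totals of 300, 500 and 700 seconds the standard deviation is 200 seconds. Note that the first and last week of a study period are usually incomplete, which can inflate this measure.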


#### (Average) daily use/events/notifications
##### → non-social (process-related) apps

In [22]:
social_cat = ["Social", "Calling"]
all_cat = ae_all_waves.__data__.category.unique().tolist()
non_social_cat = list(set(all_cat) - set(social_cat))

daily_non_social_applications = (ae_all_waves.get_daily_duration(category=non_social_cat, series_unit='day') / 60).rename('daily_durations_non_social')
print(daily_non_social_applications.head())
avg_daily_non_social_applications = (ae_all_waves.get_daily_duration(category=non_social_cat) / 60).rename('avg_daily_durations_non_social')
print(avg_daily_non_social_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     50.957017
                                      2021-04-28    121.114517
                                      2021-04-29    114.844433
                                      2021-04-30    157.519400
                                      2021-05-01     90.248800
Name: daily_durations_non_social, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    108.309176
01617409-e832-4bd9-b139-30189de7e827     51.970424
028f0ac3-1720-487e-b5eb-3ee02d0edfe7     77.077691
037e80a0-ad12-4707-adf2-a77351a4456c     30.278039
03c0f04d-6eb5-4dc5-bcfd-5642a6240681     57.734578
Name: avg_daily_durations_non_social, dtype: float64
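
The non-social category list is built with a set difference. A small sketch with a made-up category list shows the idea; wrapping the result in `sorted` is an optional addition (not in the notebook) that makes the order reproducible, since the iteration order of a set of strings can differ between runs.

```python
# Made-up list of categories present in the data
all_cat = ["Social", "Web", "Calling", "News", "none", "Productivity"]
social_cat = ["Social", "Calling"]

# Everything that is not social; sorted for a stable, reproducible order
non_social_cat = sorted(set(all_cat) - set(social_cat))
print(non_social_cat)
```

With this construction the catch-all category `"none"` also counts as non-social.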


In [23]:
freq_non_social_applications = ae_all_waves.get_daily_events(category=non_social_cat, series_unit='day').rename('daily_events_non_social')
print(freq_non_social_applications.head())
avg_freq_non_social_applications = ae_all_waves.get_daily_events(category=non_social_cat).rename('avg_daily_events_non_social')
print(avg_freq_non_social_applications.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     87.0
                                      2021-04-28    152.0
                                      2021-04-29    142.0
                                      2021-04-30    124.0
                                      2021-05-01    121.0
Name: daily_events_non_social, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    135.333333
01617409-e832-4bd9-b139-30189de7e827     99.117647
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    112.200000
037e80a0-ad12-4707-adf2-a77351a4456c     80.444444
03c0f04d-6eb5-4dc5-bcfd-5642a6240681     96.000000
Name: avg_daily_events_non_social, dtype: float64


##### → browser applications

In [24]:
browser_use = (ae_all_waves.get_daily_duration(category='Web', series_unit='day') / 60)
print(browser_use.head())

avg_browser_use = (ae_all_waves.get_daily_duration(category='Web') / 60)
print(avg_browser_use.head())


freq_browser_use = ae_all_waves.get_daily_events(category='Web', series_unit='day')
print(freq_browser_use.head())

avg_freq_browser_use = ae_all_waves.get_daily_events(category='Web')
print(avg_freq_browser_use.head())


id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    11.147367
                                      2021-04-28    26.382517
                                      2021-04-29    27.494983
                                      2021-04-30    16.093867
                                      2021-05-01    20.620367
Name: daily_durations_web, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    20.696959
01617409-e832-4bd9-b139-30189de7e827    11.913770
028f0ac3-1720-487e-b5eb-3ee02d0edfe7     4.997208
037e80a0-ad12-4707-adf2-a77351a4456c     9.535759
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    11.674350
Name: daily_durations_web, dtype: float64
id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    18.0
                                      2021-04-28    25.0
                                      2021-04-29    31.0
                                      2021-04-30    18.0
                           

##### → news applications

In [25]:
news_use = (ae_all_waves.get_daily_duration(category='News', series_unit='day') / 60)
print(news_use.head())

avg_news_use = (ae_all_waves.get_daily_duration(category='News') / 60)
print(avg_news_use.head())

freq_news_use = ae_all_waves.get_daily_events(category='News', series_unit='day')
print(freq_news_use.head())

avg_freq_news_use = ae_all_waves.get_daily_events(category='News')
print(avg_freq_news_use.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    14.531133
                                      2021-04-28    35.325367
                                      2021-04-29    48.477800
                                      2021-04-30    17.674983
                                      2021-05-01    25.763917
Name: daily_durations_news, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    34.053221
01617409-e832-4bd9-b139-30189de7e827     0.871387
028f0ac3-1720-487e-b5eb-3ee02d0edfe7     3.519170
037e80a0-ad12-4707-adf2-a77351a4456c     1.325008
03c0f04d-6eb5-4dc5-bcfd-5642a6240681     1.523800
Name: daily_durations_news, dtype: float64
id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27     9.0
                                      2021-04-28    28.0
                                      2021-04-29    23.0
                                      2021-04-30    10.0
                         

##### → Instagram

In [26]:
daily_instagram_use = (ae_all_waves.get_daily_duration(application="com.instagram.android", series_unit='day') / 60).rename('daily_durations_instagram')
print(daily_instagram_use.head())

avg_daily_instagram_use = (ae_all_waves.get_daily_duration(application="com.instagram.android") / 60).rename('avg_daily_durations_instagram')
print(avg_daily_instagram_use.head())

freq_instagram_use = ae_all_waves.get_daily_events(application="com.instagram.android", series_unit='day').rename('daily_events_instagram')
print(freq_instagram_use.head())

avg_freq_instagram_use = ae_all_waves.get_daily_events(application="com.instagram.android").rename('avg_daily_events_instagram')
print(avg_freq_instagram_use.head())

id                                    day       
01617409-e832-4bd9-b139-30189de7e827  2021-04-25    16.523050
                                      2021-04-26     7.634317
                                      2021-04-27    28.343483
                                      2021-04-28    20.830483
                                      2021-04-29    15.832467
Name: daily_durations_instagram, dtype: float64
id
01617409-e832-4bd9-b139-30189de7e827    16.484651
028f0ac3-1720-487e-b5eb-3ee02d0edfe7     0.706203
03ce19e0-e1e9-4051-a1ac-afaac0557806    17.611857
0405c848-f9bf-4a27-9fc6-62b624287609     7.701448
058d96e2-285f-42d3-a0f2-c9662da24814     0.399042
Name: avg_daily_durations_instagram, dtype: float64
id                                    day       
01617409-e832-4bd9-b139-30189de7e827  2021-04-25     4.0
                                      2021-04-26     2.0
                                      2021-04-27     7.0
                                      2021-04-28     8.0
           

### Headache features
###### Daily screen time
→ Already done

##### (Average) Daily call duration/frequency

In [27]:
daily_call_duration = (ae_all_waves.get_daily_duration(category='Calling', series_unit='day') /60)
print(daily_call_duration.head())

avg_daily_call_duration = (ae_all_waves.get_daily_duration(category='Calling') /60)
print(avg_daily_call_duration.head())

freq_daily_call = ae_all_waves.get_daily_events(category='Calling', series_unit='day')
print(freq_daily_call.head())

avg_freq_daily_call = ae_all_waves.get_daily_events(category='Calling')
print(avg_freq_daily_call.head())

id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-05-03     1.438583
                                      2021-05-06     0.144367
                                      2021-05-08     0.113950
01617409-e832-4bd9-b139-30189de7e827  2021-04-26    10.198383
                                      2021-04-27     5.810800
Name: daily_durations_calling, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295     0.565633
01617409-e832-4bd9-b139-30189de7e827     8.321474
028f0ac3-1720-487e-b5eb-3ee02d0edfe7     0.355383
037e80a0-ad12-4707-adf2-a77351a4456c     1.987639
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    16.871298
Name: daily_durations_calling, dtype: float64
id                                    day       
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-05-03     4.0
                                      2021-05-06     1.0
                                      2021-05-08     1.0
01617409-e832-4bd9-b139-30189de7e827  2021-04-26    62.0
                   

### Activity features
##### Average daily number of (unique) used apps

In [28]:
daily_unique_apps = ae_all_waves.__data__.groupby(["id", "startDate"])["application"].nunique().rename('daily_unique_apps')
print(daily_unique_apps.head())

avg_daily_unique_apps = (ae_all_waves.__data__.groupby(["id", "startDate"])["application"].nunique().groupby("id")
                         .mean()).rename('avg_daily_unique_apps')
print(avg_daily_unique_apps.head())

id                                    startDate 
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    16
                                      2021-04-28    21
                                      2021-04-29    23
                                      2021-04-30    30
                                      2021-05-01    24
Name: daily_unique_apps, dtype: int64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    21.733333
01617409-e832-4bd9-b139-30189de7e827    27.352941
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    25.480000
037e80a0-ad12-4707-adf2-a77351a4456c    14.777778
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    21.222222
Name: avg_daily_unique_apps, dtype: float64
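
The two-step aggregation above (unique apps per day, then the mean over days) is easy to verify on toy data. The app names below are made up.

```python
import pandas as pd

# Toy app events (made-up app names)
events = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b"],
    "startDate": pd.to_datetime([
        "2021-04-27", "2021-04-27", "2021-04-27", "2021-04-28", "2021-04-27",
    ]),
    "application": ["app.x", "app.y", "app.x", "app.x", "app.z"],
})

# Step 1: number of distinct apps per participant and day
daily_unique = events.groupby(["id", "startDate"])["application"].nunique()

# Step 2: average of those daily counts per participant
avg_unique = daily_unique.groupby("id").mean()

print(daily_unique)
print(avg_unique)
```

The average is taken over days with at least one app event, so days without any logged use do not lower it.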


##### (Average) daily duration/ frequency of app use
→ Already done

##### Increase/decrease in battery status

In [29]:
def calc_battery_status(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculates battery status variables per participant:
    - daily average battery level
    - std dev of the daily average battery level
    - daily average charge %
    - daily average discharge %

    Only the charge and discharge variables are returned for now.

    :param df: the appevents DataFrame
    :return: results DataFrame with the charge and discharge variables per participant
    """
    # sort first, so that shift(-1) picks the next event in time within each day
    df = df.sort_values(['id', 'startTime'])
    df = df.assign(battery_shift=df.groupby(['id', 'startDate'])['battery'].shift(-1))
    df = df.assign(battery_change=df['battery_shift'] - df['battery'])

    # computed, but not yet part of the result
    battery_avg = (df.groupby(["id", "startDate"])["battery"].mean().groupby('id').mean()).rename('avg_daily_battery')
    battery_std = (df.groupby(["id", "startDate"])["battery"].mean().groupby('id').std()).rename("battery_std")

    battery_discharge = (df[df["battery_change"] < 0].groupby(['id', 'startDate'])['battery_change'].sum()
                         .abs().groupby('id').mean()).rename("battery_daily_discharge")
    battery_charge = (df[df["battery_change"] > 0].groupby(["id", "startDate"])["battery_change"].sum()
                      .abs().groupby("id").mean()).rename("battery_daily_charge")

    res = pd.concat([
        battery_charge,
        battery_discharge
    ], axis=1)

    return res

In [30]:
battery_status = calc_battery_status(ae_all_waves.__data__)
print(battery_status.head())
# TODO not completely right, averaging app events is not the best way
# TODO per day

                                      battery_daily_charge  \
id                                                           
0079ff14-3579-4e27-b1e5-a838bc23b295             76.142857   
01617409-e832-4bd9-b139-30189de7e827             86.714286   
028f0ac3-1720-487e-b5eb-3ee02d0edfe7             68.600000   
037e80a0-ad12-4707-adf2-a77351a4456c             75.625000   
03c0f04d-6eb5-4dc5-bcfd-5642a6240681             42.125000   

                                      battery_daily_discharge  
id                                                             
0079ff14-3579-4e27-b1e5-a838bc23b295                60.066667  
01617409-e832-4bd9-b139-30189de7e827                69.411765  
028f0ac3-1720-487e-b5eb-3ee02d0edfe7                65.960000  
037e80a0-ad12-4707-adf2-a77351a4456c                50.666667  
03c0f04d-6eb5-4dc5-bcfd-5642a6240681                41.777778  
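
The TODO above asks for a per-day variant. A minimal sketch on toy data, assuming the same `id`, `startDate`, `startTime` and `battery` columns as the appevents DataFrame. The helper name `daily_battery_change` and the sample values are hypothetical and not part of mobileDNA. It sorts the events first, takes the difference between consecutive battery readings within a day, and sums the rises and drops separately.

```python
import pandas as pd


def daily_battery_change(df: pd.DataFrame) -> pd.DataFrame:
    """Summed battery rises and drops (percentage points) per (id, startDate)."""
    df = df.sort_values(["id", "startTime"])
    # difference with the previous reading on the same day (NaN for the first reading)
    change = df.groupby(["id", "startDate"])["battery"].diff()
    df = df.assign(
        battery_daily_charge=change.clip(lower=0),           # keep rises only
        battery_daily_discharge=change.clip(upper=0).abs(),  # keep drops only, as positive numbers
    )
    return df.groupby(["id", "startDate"])[["battery_daily_charge", "battery_daily_discharge"]].sum()


# hypothetical toy data: two participants, three (id, day) combinations
toy = pd.DataFrame({
    "id": ["a"] * 4 + ["b"] * 2,
    "startDate": ["2021-05-01"] * 3 + ["2021-05-02"] + ["2021-05-01"] * 2,
    "startTime": pd.to_datetime([
        "2021-05-01 08:00", "2021-05-01 12:00", "2021-05-01 20:00",
        "2021-05-02 09:00", "2021-05-01 10:00", "2021-05-01 11:00",
    ]),
    "battery": [80, 60, 90, 50, 40, 35],
})
per_day = daily_battery_change(toy)
print(per_day)
```

A day with a single reading has no observable change and ends up with 0 for both columns. Whether that should count as 0 or as missing is a modelling choice that this sketch does not settle.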


##### Average daily time between consecutive phone use sessions

In [31]:
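def calc_time_between_consecutive_sessions(df: pd.DataFrame, avg=False):
    """
    Calculates the time (in minutes) between consecutive phone use sessions on the same day.
    The gap runs from the end of a session's first app event to the start of the next session.

    :param df: the appevents DataFrame (assumed to be sorted by startTime within each id)
    :param avg: if True, return the average over days per participant, else the value per day
    :return: Series 'mins_between_sessions'
    """
    # keep the first app event of every session
    session_firsts = df.groupby(["id", "session"]).head(1)

    # start time of the next session on the same day
    session_firsts = session_firsts.assign(start_shift=session_firsts.groupby(["id", "startDate"])["startTime"].shift(-1))
    session_firsts = session_firsts.assign(duration_shift=(session_firsts["start_shift"] - session_firsts["endTime"]).dt.total_seconds())

    mean_shift_pd = session_firsts.groupby(["id", "startDate"])["duration_shift"].mean() / 60  # pd = per day
    mean_shift = mean_shift_pd.groupby("id").mean()  # avg by user

    if avg:
        return mean_shift.rename("mins_between_sessions")
    return mean_shift_pd.rename("mins_between_sessions")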

In [32]:
daily_time_between_sessions = calc_time_between_consecutive_sessions(ae_all_waves.__data__)
print(daily_time_between_sessions.head())
avg_daily_time_between_sessions = calc_time_between_consecutive_sessions(ae_all_waves.__data__, avg=True)
print(avg_daily_time_between_sessions.head())

id                                    startDate 
0079ff14-3579-4e27-b1e5-a838bc23b295  2021-04-27    30.254123
                                      2021-04-28    25.472930
                                      2021-04-29    25.215444
                                      2021-04-30    24.486846
                                      2021-05-01    28.959849
Name: mins_between_sessions, dtype: float64
id
0079ff14-3579-4e27-b1e5-a838bc23b295    28.356214
01617409-e832-4bd9-b139-30189de7e827    23.584685
028f0ac3-1720-487e-b5eb-3ee02d0edfe7    20.065113
037e80a0-ad12-4707-adf2-a77351a4456c    22.828693
03c0f04d-6eb5-4dc5-bcfd-5642a6240681    13.546520
Name: mins_between_sessions, dtype: float64
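
Note that the function above measures each gap from the `endTime` of the first app event in a session. A possible alternative, sketched here on toy data, measures it from the end of the whole session instead. The helper name `session_gaps_minutes` and the sample values are hypothetical; the column names follow the appevents DataFrame.

```python
import pandas as pd


def session_gaps_minutes(df: pd.DataFrame) -> pd.Series:
    """Mean gap in minutes per (id, startDate) between a session's end and the next session's start."""
    df = df.sort_values(["id", "startTime"])
    # collapse the app events to one row per session
    sessions = (
        df.groupby(["id", "session"])
        .agg(start=("startTime", "min"), end=("endTime", "max"), startDate=("startDate", "first"))
        .reset_index()
        .sort_values(["id", "start"])
    )
    # start of the next session on the same day (NaT for the last session of a day)
    next_start = sessions.groupby(["id", "startDate"])["start"].shift(-1)
    gap = (next_start - sessions["end"]).dt.total_seconds() / 60
    return gap.groupby([sessions["id"], sessions["startDate"]]).mean().rename("mins_between_sessions")


# hypothetical toy data: one participant, three sessions on one day
toy = pd.DataFrame({
    "id": ["a"] * 4,
    "session": [1, 1, 2, 3],
    "startDate": ["2021-05-01"] * 4,
    "startTime": pd.to_datetime(["2021-05-01 08:00", "2021-05-01 08:05",
                                 "2021-05-01 08:40", "2021-05-01 09:45"]),
    "endTime": pd.to_datetime(["2021-05-01 08:05", "2021-05-01 08:10",
                               "2021-05-01 08:45", "2021-05-01 09:50"]),
})
gaps = session_gaps_minutes(toy)
print(gaps)
```

In the toy data the gaps are 30 and 60 minutes, so the daily mean is 45 minutes. Which definition fits best depends on how a session is delimited in the data.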


## 4. Merge all features in one dataframe
### Daily counted features
These are the features that are related to mobile use ***per person, per day***

In [33]:
# merge all dataframes with 'day' index
temp1 = (
    pd.merge(general_screen_time, smartphone_use_freq, on=['id', 'day'])
    .merge(duration_MIM_applications, on=['id', 'day'])
    .merge(freq_MIM_applications, on=['id', 'day'])
    .merge(daily_use_work_hours, on=['id', 'day'])
    .merge(daily_social_applications, on=['id', 'day'])
    .merge(freq_social_applications, on=['id', 'day'])
    .merge(daily_use_evening, on=['id', 'day'])
    .merge(daily_use_night, on=['id', 'day'])
    .merge(freq_evening_use, on=['id', 'day'])
    .merge(freq_night_use, on=['id', 'day'])
    .merge(daily_non_social_applications, on=['id', 'day'])
    .merge(freq_non_social_applications, on=['id', 'day'])
    .merge(browser_use, on=['id', 'day'])
    .merge(freq_browser_use, on=['id', 'day'])
    .merge(news_use, on=['id', 'day'])
    .merge(freq_news_use, on=['id', 'day'])
    .merge(daily_instagram_use, on=['id', 'day'])
    .merge(freq_instagram_use, on=['id', 'day'])
    .merge(daily_call_duration, on=['id', 'day'])
    .merge(freq_daily_call, on=['id', 'day'])
)

# merge all dataframes with 'startDate' index
temp2 = pd.merge(daily_unique_apps, time_between_notif_sessions, on=['id', 'startDate'])

# convert the date columns to the same type and give them the same name
temp1 = temp1.reset_index().astype({'day': 'datetime64[ns]'}).set_index(['id','day'])
temp2 = temp2.reset_index().rename(columns={'startDate': 'day'}).set_index(['id', 'day'])

# merge into one dataframe
res = (pd.merge(temp1, temp2, on=['id', 'day']))
print(res.head())

                                                 daily_durations  \
id                                   day                           
01617409-e832-4bd9-b139-30189de7e827 2021-04-26        68.323350   
                                     2021-04-27       161.490367   
                                     2021-04-29        58.077383   
                                     2021-04-30        81.203833   
                                     2021-05-01       114.517833   

                                                 daily_events  \
id                                   day                        
01617409-e832-4bd9-b139-30189de7e827 2021-04-26         160.0   
                                     2021-04-27         165.0   
                                     2021-04-29         158.0   
                                     2021-04-30         108.0   
                                     2021-05-01         147.0   

                                                 daily_durations_ch

In [34]:
res = res.reset_index().merge(df_mapping, on='id')
print(res.head())
res.to_csv("../data/filtered_appevents_features.csv", index=False, sep=';', decimal=',')

                                     id        day  daily_durations  \
0  0405c848-f9bf-4a27-9fc6-62b624287609 2021-04-20       156.062400   
1  0405c848-f9bf-4a27-9fc6-62b624287609 2021-04-22       183.103183   
2  0405c848-f9bf-4a27-9fc6-62b624287609 2021-04-23       227.671467   
3  0405c848-f9bf-4a27-9fc6-62b624287609 2021-04-27       224.706800   
4  0405c848-f9bf-4a27-9fc6-62b624287609 2021-04-28       165.493050   

   daily_events  daily_durations_chat  daily_events_chat  \
0         194.0             42.373883               39.0   
1         202.0             20.075767               34.0   
2         255.0             22.858300               40.0   
3         203.0             19.491467               17.0   
4         233.0             14.512483               24.0   

   daily_durations_chat_['morning', 'noon']  daily_durations_social  \
0                                 13.368733               92.213700   
1                                  2.191267               78.761117   
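
`merge` defaults to `how='inner'`, so the chained merges above keep only the (`id`, `day`) pairs that occur in every feature. A minimal sketch on hypothetical toy Series that shows the effect, with `functools.reduce` as one compact way to write such a chain. `merge_all` is an illustrative helper, not part of the notebook, and it assumes that the keys in `on` are index levels of every input.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, on, how="inner"):
    """Merge a list of named Series/DataFrames whose index levels include the keys in `on`."""
    flat = [f.reset_index() for f in frames]  # move the index levels to columns
    merged = reduce(lambda left, right: left.merge(right, on=on, how=how), flat)
    return merged.set_index(on)


# hypothetical toy features: the second one misses ('u1', '2021-05-02')
idx_a = pd.MultiIndex.from_tuples(
    [("u1", "2021-05-01"), ("u1", "2021-05-02"), ("u2", "2021-05-01")], names=["id", "day"])
idx_b = pd.MultiIndex.from_tuples(
    [("u1", "2021-05-01"), ("u2", "2021-05-01")], names=["id", "day"])
screen_time = pd.Series([120.0, 95.0, 60.0], index=idx_a, name="daily_durations")
call_time = pd.Series([4.0, 1.0], index=idx_b, name="daily_durations_calling")

inner = merge_all([screen_time, call_time], on=["id", "day"])
outer = merge_all([screen_time, call_time], on=["id", "day"], how="outer")
print(len(inner), len(outer))
```

With `how="outer"` the day without a call is kept and its call duration is NaN. Such gaps would still have to be handled (for example filled with 0 for durations and counts) before fitting a model.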

### Averaged features
These are the features that are related to the ***average daily*** mobile usage of a person

In [35]:
avg_feat = (
    pd.merge(avg_daily_use_work_hours, avg_daily_social_applications, on=['id'])
    .merge(avg_freq_social_applications, on=['id'])
    .merge(avg_daily_use_evening, on=['id'])
    .merge(avg_daily_use_night, on=['id'])
    .merge(avg_freq_evening_use, on=['id'])
    .merge(avg_freq_night_use, on=['id'])
    .merge(avg_time_between_notif_sessions, on=['id'])
    .merge(weekly_use_variability, on=['id'])
    .merge(avg_daily_non_social_applications, on=['id'])
    .merge(avg_freq_non_social_applications, on=['id'])
    .merge(avg_browser_use, on=['id'])
    .merge(avg_freq_browser_use, on=['id'])
    .merge(avg_news_use, on=['id'])
    .merge(avg_freq_news_use, on=['id'])
    .merge(avg_daily_instagram_use, on=['id'])
    .merge(avg_freq_instagram_use, on=['id'])
    .merge(avg_daily_call_duration, on=['id'])
    .merge(avg_freq_daily_call, on=['id'])
    .merge(avg_daily_unique_apps, on=['id'])
    .merge(avg_daily_time_between_sessions, on=['id'])
    .groupby('id').first()
)

print(avg_feat.head())

                                      daily_durations_chat_['morning', 'noon']  \
id                                                                               
01617409-e832-4bd9-b139-30189de7e827                                  5.028610   
028f0ac3-1720-487e-b5eb-3ee02d0edfe7                                 11.704453   
0405c848-f9bf-4a27-9fc6-62b624287609                                  7.918373   
0c183eb9-04eb-4337-88c9-cc9a0f7c7f7c                                  3.892389   
0cde90dd-f657-48ca-92a5-d6569580cedb                                  2.316216   

                                      daily_durations_social  \
id                                                             
01617409-e832-4bd9-b139-30189de7e827               37.857345   
028f0ac3-1720-487e-b5eb-3ee02d0edfe7               60.493867   
0405c848-f9bf-4a27-9fc6-62b624287609              111.902298   
0c183eb9-04eb-4337-88c9-cc9a0f7c7f7c              112.561012   
0cde90dd-f657-48ca-92a5-d6569580cedb     