

# **Will you skip this music track or not?**


The public part of the dataset consists of roughly 130 million listening sessions with associated user interactions on the Spotify service. 

The task is to predict whether individual tracks encountered in a listening session will be skipped by a particular user. In order to do this, complete information about the first half of a user’s listening session is provided, while the prediction is to be carried out on the second half. Participants have access to metadata, as well as acoustic descriptors, for all the tracks encountered in listening sessions.

https://www.aicrowd.com/challenges/spotify-sequential-skip-prediction-challenge

Brost, B., Mehrotra, R., & Jehan, T. (2019, May). The music streaming sessions dataset. In The World Wide Web Conference (pp. 2594-2600).



Because the entire dataset is too large for experimenting with data manipulation, Spotify provides a mini dataset for this purpose.

In this script, we wrangle the data to inspect its quality and engineer features for machine learning modeling.


# Mount Google drive to Colab

In [1]:
# # For Colab only
#from google.colab import drive
#drive.mount('/content/drive')
#%cd /content/drive/MyDrive/Capstone_SpotifyStreaming/notebooks

#!pip install featuretools==0.4.0
#!pip install -U featuretools
#!pip install featuretools

# # check the installed packages
# pip list -v 

In [2]:
import numpy as np
import pandas as pd
import featuretools as ft
import time
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# Load the data and perform some data cleaning/re-coding as described in 2_mini_EDA

In [3]:
# load the track information (mini version)

tf_df = pd.read_csv('../data/raw/data/track_features/tf_mini.csv')
log_df = pd.read_csv('../data/raw/data/training_set/log_mini.csv')

In [4]:
# perform some data cleaning/re-coding as described in 2_mini_EDA

tf_df_dummy = pd.get_dummies(tf_df, columns=['key','time_signature','mode'])
log_df_dummy = pd.get_dummies(log_df.drop(columns = ['session_length',  'hist_user_behavior_reason_end', 'hist_user_behavior_n_seekfwd','hist_user_behavior_n_seekback']), columns=['hist_user_behavior_reason_start', 'context_type'])
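
To make the re-coding step concrete, here is a minimal sketch of what `pd.get_dummies` does to a categorical column. The toy table and its values are made up for illustration and are not taken from `tf_mini.csv`.

```python
import pandas as pd

# Toy stand-in for the track table; the values are hypothetical.
toy = pd.DataFrame({
    "track_id": ["t_a", "t_b", "t_c"],
    "mode": ["major", "minor", "major"],
})

# Each listed column is replaced by one indicator column per observed category,
# named "<column>_<category>".
toy_dummy = pd.get_dummies(toy, columns=["mode"])
print(toy_dummy.columns.tolist())
```

Columns that are not listed in `columns` (here `track_id`) are passed through unchanged.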


In [5]:
tf_df_dummy.head().T

Unnamed: 0,0,1,2,3,4
track_id,t_a540e552-16d4-42f8-a185-232bd650ea7d,t_67965da0-132b-4b1e-8a69-0ef99b32287c,t_0614ecd3-a7d5-40a1-816e-156d5872a467,t_070a63a0-744a-434e-9913-a97b02926a29,t_d6990e17-9c31-4b01-8559-47d9ce476df1
duration,109.706673,187.693329,160.839996,175.399994,369.600006
release_year,1950,1950,1951,1951,1951
us_popularity_estimate,99.975414,99.96943,99.602549,99.665018,99.991764
acousticness,0.45804,0.916272,0.812884,0.396854,0.728831
beat_strength,0.519497,0.419223,0.42589,0.400934,0.371328
bounciness,0.504949,0.54553,0.50828,0.35999,0.335115
danceability,0.399767,0.491235,0.491625,0.552227,0.483044
dyn_range_mean,7.51188,9.098376,8.36867,5.967346,5.802681
energy,0.817709,0.154258,0.358813,0.514585,0.721442


In [6]:
log_df_dummy.head().T

Unnamed: 0,0,1,2,3,4
session_id,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e
session_position,1,2,3,4,5
track_id_clean,t_0479f24c-27d2-46d6-a00c-7ec928f2b539,t_9099cd7b-c238-47b7-9381-f23f2c1d1043,t_fc5df5ba-5396-49a7-8b29-35d0d28249e0,t_23cff8d6-d874-4b20-83dc-94e450e8aa20,t_64f3743c-f624-46bb-a579-0f3f9a07a123
skip_1,False,False,False,False,False
skip_2,False,False,False,False,False
skip_3,False,False,False,False,False
not_skipped,True,True,True,True,True
context_switch,0,0,0,0,0
no_pause_before_play,0,1,1,1,1
short_pause_before_play,0,0,0,0,0


# add skipping information as new columns

In [7]:
session_id = log_df_dummy['session_id'].unique()
print('number of sessions in this mini dataset:',len(session_id))


number of sessions in this mini dataset: 10000


In [8]:
# # the function of integrating the skipping labels into one column
# def skip_label(df):
#     skip = (df['not_skipped']==False).astype(int)*4 # no skip: 0, ultra-late skip: 4
#     # The assignments must run in this order: if skip_1 is True, skip_2 and skip_3 are True too.
#     skip[df['skip_3']==True] = 3 # late skip
#     skip[df['skip_2']==True] = 2 # mid skip
#     skip[df['skip_1']==True] = 1 # early skip
#     return skip

# log_df_dummy['skip_label'] = skip_label(log_df_dummy)
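
The commented-out idea above (one ordinal label instead of four flags) could also be written with `np.select`, which picks the first matching condition. This is only a sketch on a toy frame; the function name `skip_label_select` and the toy rows are my own, and the label is not used in the rest of this notebook.

```python
import numpy as np
import pandas as pd

def skip_label_select(df):
    """Collapse the four skip flags into one label: 0 = not skipped, 1 = earliest skip, 4 = latest skip."""
    conditions = [
        df["skip_1"].to_numpy(),          # checked first, so it wins over skip_2 / skip_3
        df["skip_2"].to_numpy(),
        df["skip_3"].to_numpy(),
        (~df["not_skipped"]).to_numpy(),  # skipped, but after the skip_3 threshold
    ]
    return pd.Series(np.select(conditions, [1, 2, 3, 4], default=0), index=df.index)

# Hypothetical rows covering each case.
toy = pd.DataFrame({
    "skip_1":      [False, False, True,  False, False],
    "skip_2":      [False, False, True,  True,  False],
    "skip_3":      [False, True,  True,  True,  False],
    "not_skipped": [True,  False, False, False, False],
})
labels = skip_label_select(toy)
print(labels.tolist())
```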


In [9]:
# make columns that hold the session ID only for rows with the given skip_2 outcome (string * False gives an empty string)
# log_df_dummy['session_id_skip_label'] = log_df_dummy['session_id'] + '_skip_' + log_df_dummy['skip_label'].astype(str)
log_df_dummy['session_id_skip_2_False'] = log_df_dummy['session_id'] * (log_df_dummy['skip_2'] == False)
log_df_dummy['session_id_skip_2_True'] = log_df_dummy['session_id'] * (log_df_dummy['skip_2'] == True)
# log_df_dummy['session_id_not_skipped_True'] = log_df_dummy['session_id'] * (log_df_dummy['not_skipped'] == True)
# log_df_dummy['session_id_not_skipped_False'] = log_df_dummy['session_id'] * (log_df_dummy['not_skipped'] == False)

log_df_dummy.head(10).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
session_id,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e
session_position,1,2,3,4,5,6,7,8,9,10
track_id_clean,t_0479f24c-27d2-46d6-a00c-7ec928f2b539,t_9099cd7b-c238-47b7-9381-f23f2c1d1043,t_fc5df5ba-5396-49a7-8b29-35d0d28249e0,t_23cff8d6-d874-4b20-83dc-94e450e8aa20,t_64f3743c-f624-46bb-a579-0f3f9a07a123,t_c815228b-3212-4f9e-9d4f-9cb19b248184,t_e23c19f5-4c32-4557-aa44-81372c2e3705,t_0be6eced-f56f-48bd-8086-f2e0b760fdee,t_f3ecbd3b-9e8e-4557-b8e0-39cfcd7e65dd,t_2af4dfa0-7df3-4b7e-b7ab-353ba48237f9
skip_1,False,False,False,False,False,False,True,True,False,True
skip_2,False,False,False,False,False,False,True,True,True,True
skip_3,False,False,False,False,False,True,True,True,True,True
not_skipped,True,True,True,True,True,False,False,False,False,False
context_switch,0,0,0,0,0,0,0,0,0,0
no_pause_before_play,0,1,1,1,1,1,1,1,1,1
short_pause_before_play,0,0,0,0,0,0,0,0,0,0
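
The two `session_id_skip_2_*` columns are built by multiplying a string column by a boolean, which relies on Python string repetition (`'abc' * True` is `'abc'`, `'abc' * False` is `''`). I assume this only works while `session_id` is stored as plain Python strings. A sketch of a more explicit alternative with `Series.where`, on a toy frame with made-up IDs:

```python
import pandas as pd

toy = pd.DataFrame({
    "session_id": ["s1", "s1", "s2"],   # hypothetical session IDs
    "skip_2": [False, True, True],
})

# Keep the session ID where the condition holds, otherwise use an empty string.
toy["session_id_skip_2_False"] = toy["session_id"].where(~toy["skip_2"], "")
toy["session_id_skip_2_True"] = toy["session_id"].where(toy["skip_2"], "")
print(toy)
```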


# Calculate the feature distance/similarity between adjacent tracks

The track information dataframe contains acoustic descriptors and each track's scores on 8 acoustic features (see: https://benanne.github.io/2014/08/05/spotify-cnns.html). Therefore, within each session, I would like to calculate the ***distance*** or ***similarity*** between each track and the track played immediately before it.

In [10]:
# extract the acoustic features of each track
df = log_df_dummy.merge(tf_df_dummy[['track_id','acousticness','beat_strength','danceability',
                                     'dyn_range_mean', 'energy', 'flatness','instrumentalness', 'liveness', 
                                     'loudness', 'mechanism', 'organism','speechiness','valence',
                                     'acoustic_vector_0','acoustic_vector_1', 'acoustic_vector_2', 'acoustic_vector_3',
                                     'acoustic_vector_4', 'acoustic_vector_5', 'acoustic_vector_6','acoustic_vector_7']], 
                        left_on = 'track_id_clean', 
                        right_on = 'track_id')
df.sort_values(by = ['session_id', 'session_position'],inplace = True)
df.head().T

Unnamed: 0,0,45,50,327,353
session_id,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e,0_00006f66-33e5-4de7-a324-2d18e439fc1e
session_position,1,2,3,4,5
track_id_clean,t_0479f24c-27d2-46d6-a00c-7ec928f2b539,t_9099cd7b-c238-47b7-9381-f23f2c1d1043,t_fc5df5ba-5396-49a7-8b29-35d0d28249e0,t_23cff8d6-d874-4b20-83dc-94e450e8aa20,t_64f3743c-f624-46bb-a579-0f3f9a07a123
skip_1,False,False,False,False,False
skip_2,False,False,False,False,False
skip_3,False,False,False,False,False
not_skipped,True,True,True,True,True
context_switch,0,0,0,0,0
no_pause_before_play,0,1,1,1,1
short_pause_before_play,0,0,0,0,0


In [11]:
temp_data = df.loc[df['session_id'] == session_id[0]]
temp_data

Unnamed: 0,session_id,session_position,track_id_clean,skip_1,skip_2,skip_3,not_skipped,context_switch,no_pause_before_play,short_pause_before_play,...,speechiness,valence,acoustic_vector_0,acoustic_vector_1,acoustic_vector_2,acoustic_vector_3,acoustic_vector_4,acoustic_vector_5,acoustic_vector_6,acoustic_vector_7
0,0_00006f66-33e5-4de7-a324-2d18e439fc1e,1,t_0479f24c-27d2-46d6-a00c-7ec928f2b539,False,False,False,True,0,0,0,...,0.069717,0.152255,-0.815775,0.386409,0.23016,0.028028,-0.333373,0.015452,-0.35359,0.205826
45,0_00006f66-33e5-4de7-a324-2d18e439fc1e,2,t_9099cd7b-c238-47b7-9381-f23f2c1d1043,False,False,False,True,0,1,0,...,0.061158,0.337152,-0.713646,0.363718,0.310315,-0.042222,-0.383164,0.066357,-0.365308,0.15792
50,0_00006f66-33e5-4de7-a324-2d18e439fc1e,3,t_fc5df5ba-5396-49a7-8b29-35d0d28249e0,False,False,False,True,0,1,0,...,0.045354,0.373862,-0.742541,0.375599,0.25266,-0.049007,-0.299745,0.063341,-0.486689,0.181604
327,0_00006f66-33e5-4de7-a324-2d18e439fc1e,4,t_23cff8d6-d874-4b20-83dc-94e450e8aa20,False,False,False,True,0,1,0,...,0.229936,0.64942,-0.705116,0.317562,0.289141,-0.03892,-0.393358,0.092719,-0.364418,0.285603
353,0_00006f66-33e5-4de7-a324-2d18e439fc1e,5,t_64f3743c-f624-46bb-a579-0f3f9a07a123,False,False,False,True,0,1,0,...,0.24098,0.652921,-0.868489,0.33128,0.210478,0.08474,-0.333287,-0.025706,-0.51035,0.182315
475,0_00006f66-33e5-4de7-a324-2d18e439fc1e,6,t_c815228b-3212-4f9e-9d4f-9cb19b248184,False,False,True,False,0,1,0,...,0.133586,0.661081,-0.817504,0.283297,0.387589,0.279636,-0.280334,0.117993,0.106159,0.311233
537,0_00006f66-33e5-4de7-a324-2d18e439fc1e,7,t_e23c19f5-4c32-4557-aa44-81372c2e3705,True,True,True,False,0,1,0,...,0.409848,0.10942,-0.748412,0.321976,0.237488,0.00348,-0.315287,0.032431,-0.464694,0.200836
540,0_00006f66-33e5-4de7-a324-2d18e439fc1e,8,t_0be6eced-f56f-48bd-8086-f2e0b760fdee,True,True,True,False,0,1,0,...,0.103687,0.389913,-0.921928,0.35974,0.293674,0.115302,-0.274987,0.043193,-0.444351,0.211909
541,0_00006f66-33e5-4de7-a324-2d18e439fc1e,9,t_f3ecbd3b-9e8e-4557-b8e0-39cfcd7e65dd,False,True,True,False,0,1,0,...,0.049853,0.338321,-0.744412,0.3087,0.230126,0.066493,-0.242549,0.02537,-0.40321,0.15935
601,0_00006f66-33e5-4de7-a324-2d18e439fc1e,10,t_2af4dfa0-7df3-4b7e-b7ab-353ba48237f9,True,True,True,False,0,1,0,...,0.154609,0.257672,-0.647221,0.316101,0.251329,-0.041532,-0.252359,0.059971,-0.313696,0.126421


In [12]:
def cal_dist(x):
    from scipy.spatial.distance import cdist
    Y_euc = cdist(x, x, 'euclidean')
    Y_cos = cdist(x, x, 'cosine')
    Y_man = cdist(x, x, 'cityblock')
    # The first track of each session has no preceding track, so its distance is set to 0
    euc_dist = [0]
    cos_dist = [0]
    man_dist = [0]
    for n in range(1,len(x)):
        euc_dist.append(Y_euc[n,n-1])
        cos_dist.append(Y_cos[n,n-1])
        man_dist.append(Y_man[n,n-1])
    return euc_dist, cos_dist, man_dist
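
`cal_dist` builds the full pairwise matrices with `cdist` but only reads the entries just below the diagonal. A sketch of computing only the adjacent-row distances with NumPy follows. The name `adjacent_dist` and the toy matrix are my own, and like the function above it pads the first row with 0.

```python
import numpy as np

def adjacent_dist(x):
    """Euclidean, cosine and Manhattan distance between each row and the row before it."""
    x = np.asarray(x, dtype=float)
    prev, curr = x[:-1], x[1:]
    delta = curr - prev
    euc = np.sqrt((delta ** 2).sum(axis=1))
    man = np.abs(delta).sum(axis=1)
    # Cosine distance = 1 - cosine similarity; an all-zero row would give NaN here.
    cos = 1.0 - (prev * curr).sum(axis=1) / (
        np.linalg.norm(prev, axis=1) * np.linalg.norm(curr, axis=1)
    )
    # The first row has no predecessor, so pad with 0.
    pad = lambda d: np.concatenate([[0.0], d])
    return pad(euc), pad(cos), pad(man)

toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
euc, cos, man = adjacent_dist(toy)
print(euc, cos, man)
```

This avoids the quadratic pairwise matrix, which should matter little for short sessions but keeps the intent (previous track only) explicit.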


In [13]:
# calculate the distance/similarity within each session
# the acoustic features probably need to be scaled before they are combined into a single distance
from sklearn.preprocessing import StandardScaler

# the last 20% of the rows of each session will be used as the test set, so the scaler must not be fitted on them
train_perc = 0.8

# sel_col_names = ['skip_label','acousticness','beat_strength','danceability',
#                         'dyn_range_mean', 'energy', 'flatness','instrumentalness', 'liveness', 
#                         'loudness', 'mechanism', 'organism','speechiness','valence',
#                         'acoustic_vector_0','acoustic_vector_1', 'acoustic_vector_2', 'acoustic_vector_3',
#                         'acoustic_vector_4', 'acoustic_vector_5', 'acoustic_vector_6','acoustic_vector_7']
sel_col_names = ['acousticness','beat_strength','danceability',
                        'dyn_range_mean', 'energy', 'flatness','instrumentalness', 'liveness', 
                        'loudness', 'mechanism', 'organism','speechiness','valence',
                        'acoustic_vector_0','acoustic_vector_1', 'acoustic_vector_2', 'acoustic_vector_3',
                        'acoustic_vector_4', 'acoustic_vector_5', 'acoustic_vector_6','acoustic_vector_7']

start_time = time.time()

for s_id in session_id:
    temp_data = []
    temp_mat = []
    temp_data = df.loc[df['session_id'] == s_id, sel_col_names]
#     temp_mat = temp_data.drop(columns = ['skip_label']).copy()
    temp_mat = temp_data.copy()
    scaler = StandardScaler()
    scaler.fit(temp_mat[0:round(len(temp_mat)*train_perc)])
    temp_mat_scaled = scaler.transform(temp_mat)
#     temp_mat_scaled_skip0 = temp_mat_scaled[temp_data['skip_label']==0]
#     temp_mat_scaled_skip1 = temp_mat_scaled[temp_data['skip_label']==1]
#     temp_mat_scaled_skipR = temp_mat_scaled[temp_data['skip_label']>1]
    
    
    euc_dist_all, cos_dist_all, man_dist_all = cal_dist(temp_mat_scaled)
    df.loc[temp_data.index, 'euc_dist_all'] = euc_dist_all
    df.loc[temp_data.index, 'cos_dist_all'] = cos_dist_all
    df.loc[temp_data.index, 'man_dist_all'] = man_dist_all
    
#     euc_dist_skip0, cos_dist_skip0, man_dist_skip0 = cal_dist(temp_mat_scaled_skip0)
#     loc_index0 = temp_data.index[temp_data['skip_label']==0].tolist()
#     df.loc[loc_index0, 'euc_dist_skip0'] = euc_dist_skip0
#     df.loc[loc_index0, 'cos_dist_skip0'] = cos_dist_skip0
#     df.loc[loc_index0, 'man_dist_skip0'] = man_dist_skip0
    
#     euc_dist_skip1, cos_dist_skip1, man_dist_skip1 = cal_dist(temp_mat_scaled_skip1)
#     loc_index1 = temp_data.index[temp_data['skip_label']==1].tolist()
#     df.loc[loc_index1, 'euc_dist_skip1'] = euc_dist_skip1
#     df.loc[loc_index1, 'cos_dist_skip1'] = cos_dist_skip1
#     df.loc[loc_index1, 'man_dist_skip1'] = man_dist_skip1
    
#     euc_dist_skipR, cos_dist_skipR, man_dist_skipR = cal_dist(temp_mat_scaled_skipR)
#     loc_indexR = temp_data.index[temp_data['skip_label']>1].tolist()
#     df.loc[loc_indexR, 'euc_dist_skipR'] = euc_dist_skipR
#     df.loc[loc_indexR, 'cos_dist_skipR'] = cos_dist_skipR
#     df.loc[loc_indexR, 'man_dist_skipR'] = man_dist_skipR

print('***It takes ',(time.time() - start_time)/60, ' minutes.***')


***It takes  1.7423949201901754  minutes.***
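
The scaling step in the loop fits the scaler on the first 80% of a session and then transforms all rows. Below is a NumPy-only sketch of that idea on a toy column. As far as I know, `StandardScaler` uses the population standard deviation (`ddof=0`) and leaves constant columns unscaled, and the sketch assumes the same. The function name is hypothetical.

```python
import numpy as np

def scale_with_prefix(x, train_perc=0.8):
    """Standardize all rows using mean/std estimated from the first train_perc of rows only."""
    x = np.asarray(x, dtype=float)
    n_fit = round(len(x) * train_perc)
    mean = x[:n_fit].mean(axis=0)
    std = x[:n_fit].std(axis=0)      # population std (ddof=0)
    std[std == 0] = 1.0              # avoid dividing by zero for constant columns
    return (x - mean) / std

# Toy session of 5 rows: the last row is held out from the fit.
toy = [[0.0], [2.0], [4.0], [6.0], [100.0]]
scaled = scale_with_prefix(toy)
print(scaled.ravel())
```

The held-out row can land far outside the usual range, which is expected because it did not contribute to the mean and standard deviation.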


In [14]:
log_df_dummy2 = df.drop(columns = ['track_id','acousticness','beat_strength','danceability',
                                     'dyn_range_mean', 'energy', 'flatness','instrumentalness', 'liveness', 
                                     'loudness', 'mechanism', 'organism','speechiness','valence',
                                     'acoustic_vector_0','acoustic_vector_1', 'acoustic_vector_2', 'acoustic_vector_3',
                                     'acoustic_vector_4', 'acoustic_vector_5', 'acoustic_vector_6','acoustic_vector_7'])
log_df_dummy2['session_id_skip_2_False'] = log_df_dummy2['session_id_skip_2_False'].astype('category')
log_df_dummy2['session_id_skip_2_True'] = log_df_dummy2['session_id_skip_2_True'].astype('category')

In [15]:
# log_df_dummy2 = df.drop(columns = ['skip_label','track_id','acousticness','beat_strength','danceability',
#                                      'dyn_range_mean', 'energy', 'flatness','instrumentalness', 'liveness', 
#                                      'loudness', 'mechanism', 'organism','speechiness','valence',
#                                      'acoustic_vector_0','acoustic_vector_1', 'acoustic_vector_2', 'acoustic_vector_3',
#                                      'acoustic_vector_4', 'acoustic_vector_5', 'acoustic_vector_6','acoustic_vector_7'])
# log_df_dummy2['session_id_skip_label'] = log_df_dummy2['session_id_skip_label'].astype('category')
# log_df_dummy2['session_id_skip_1_False'] = log_df_dummy2['session_id_skip_1_False'].astype('category')
# log_df_dummy2['session_id_skip_1_True'] = log_df_dummy2['session_id_skip_1_True'].astype('category')
# log_df_dummy2['session_id_not_skipped_True'] = log_df_dummy2['session_id_not_skipped_True'].astype('category')
# log_df_dummy2['session_id_not_skipped_False'] = log_df_dummy2['session_id_not_skipped_False'].astype('category')

# **Use Featuretools for automatic feature engineering**

In [16]:
# First, initialize an EntitySet with a name
es = ft.EntitySet(id="spotify_data")

In [17]:
from woodwork.logical_types import Categorical, PostalCode

es = es.add_dataframe(
    dataframe_name="tf",
    dataframe=tf_df_dummy,
    index="track_id",
)

es

Entityset: spotify_data
  DataFrames:
    tf [Rows: 50704, Columns: 46]
  Relationships:
    No relationships

In [18]:
es['tf'].ww.schema

Unnamed: 0_level_0,Logical Type,Semantic Tag(s)
Column,Unnamed: 1_level_1,Unnamed: 2_level_1
track_id,Unknown,['index']
duration,Double,['numeric']
release_year,Integer,['numeric']
us_popularity_estimate,Double,['numeric']
acousticness,Double,['numeric']
beat_strength,Double,['numeric']
bounciness,Double,['numeric']
danceability,Double,['numeric']
dyn_range_mean,Double,['numeric']
energy,Double,['numeric']


In [19]:
# add dataframe
# the 'session_position' contains order information within each session

es = es.add_dataframe(
    dataframe_name="log", dataframe=log_df_dummy2, make_index = True, index="event_id", time_index="session_position",
)

es

Entityset: spotify_data
  DataFrames:
    tf [Rows: 50704, Columns: 46]
    log [Rows: 167880, Columns: 36]
  Relationships:
    No relationships

In [20]:
es['log'].ww.schema

Unnamed: 0_level_0,Logical Type,Semantic Tag(s)
Column,Unnamed: 1_level_1,Unnamed: 2_level_1
event_id,Integer,['index']
session_id,Categorical,['category']
session_position,Integer,"['numeric', 'time_index']"
track_id_clean,Unknown,[]
skip_1,Boolean,[]
skip_2,Boolean,[]
skip_3,Boolean,[]
not_skipped,Boolean,[]
context_switch,Integer,['numeric']
no_pause_before_play,Integer,['numeric']


In [21]:
# When two DataFrames have a one-to-many relationship, we call the “one” DataFrame, the “parent DataFrame”. A relationship between a parent and child is defined like this:
# (parent_dataframe, parent_column, child_dataframe, child_column)
es = es.add_relationship("tf", "track_id", "log", "track_id_clean")
es

Entityset: spotify_data
  DataFrames:
    tf [Rows: 50704, Columns: 46]
    log [Rows: 167880, Columns: 36]
  Relationships:
    log.track_id_clean -> tf.track_id

In [22]:
# turn on "features_only = True" to experiment with the function without computing the feature_matrix

# primi_parameters = {
#     "include_groupby_columns":{"log": ["session_id","session_id_skip_1_False","session_id_skip_1_True","session_id_not_skipped_True","session_id_not_skipped_False"]},
#     "ignore_groupby_dataframes": ["tf"],
#     "ignore_columns": {"log":["context_type_catalog","context_type_charts","context_type_editorial_playlist","context_type_personalized_playlist","context_type_radio","context_type_user_collection","hour_of_day","date"]}
#                    }


primi_parameters = {
    "include_groupby_columns":{"log": ["session_id","session_id_skip_2_False","session_id_skip_2_True"]},
    "ignore_groupby_dataframes": ["tf"],
    "ignore_columns": {"log":["context_type_catalog","context_type_charts","context_type_editorial_playlist","context_type_personalized_playlist","context_type_radio","context_type_user_collection","hour_of_day","date","session_position"]}
                   }
primi_parameters

{'include_groupby_columns': {'log': ['session_id',
   'session_id_skip_2_False',
   'session_id_skip_2_True']},
 'ignore_groupby_dataframes': ['tf'],
 'ignore_columns': {'log': ['context_type_catalog',
   'context_type_charts',
   'context_type_editorial_playlist',
   'context_type_personalized_playlist',
   'context_type_radio',
   'context_type_user_collection',
   'hour_of_day',
   'date',
   'session_position']}}

In [23]:
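# a few categorical acoustic features (the one-hot key and time-signature columns) do not need to be passed to the Featuretools primitives

# note: dict.copy() is shallow, so the nested 'ignore_columns' dict is shared with primi_parameters
# and the 'tf' entry added below also appears there; use copy.deepcopy() if the two should differ
primi_parameters_ignoreCategoricalAcoustic = primi_parameters.copy()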

key_cols = [col for col in tf_df_dummy.columns if 'key_' in col]
time_cols = [col for col in tf_df_dummy.columns if 'time_signature_' in col]

primi_parameters_ignoreCategoricalAcoustic['ignore_columns']['tf'] = key_cols+time_cols
primi_parameters_ignoreCategoricalAcoustic

{'include_groupby_columns': {'log': ['session_id',
   'session_id_skip_2_False',
   'session_id_skip_2_True']},
 'ignore_groupby_dataframes': ['tf'],
 'ignore_columns': {'log': ['context_type_catalog',
   'context_type_charts',
   'context_type_editorial_playlist',
   'context_type_personalized_playlist',
   'context_type_radio',
   'context_type_user_collection',
   'hour_of_day',
   'date',
   'session_position'],
  'tf': ['key_0',
   'key_1',
   'key_2',
   'key_3',
   'key_4',
   'key_5',
   'key_6',
   'key_7',
   'key_8',
   'key_9',
   'key_10',
   'key_11',
   'time_signature_0',
   'time_signature_1',
   'time_signature_3',
   'time_signature_4',
   'time_signature_5']}}
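
One thing to keep in mind about `primi_parameters.copy()` in the cell above: `dict.copy()` is a shallow copy, so nested dicts are shared between the copy and the original. A small sketch with toy dictionaries (not the real options) shows the difference from `copy.deepcopy`:

```python
import copy

# Shallow copy: the nested dict is the same object in both dictionaries.
base = {"ignore_columns": {"log": ["date"]}}
shallow = base.copy()
shallow["ignore_columns"]["tf"] = ["key_0"]
shared = "tf" in base["ignore_columns"]

# Deep copy: nested containers are duplicated, so the original stays untouched.
base2 = {"ignore_columns": {"log": ["date"]}}
deep = copy.deepcopy(base2)
deep["ignore_columns"]["tf"] = ["key_0"]
independent = "tf" not in base2["ignore_columns"]

print(shared, independent)
```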

In [24]:
feature_defs_test = ft.dfs(entityset=es,
                        target_dataframe_name="log",
                        groupby_trans_primitives=["Diff","CumSum", "CumMean", "CumMin", "CumMax"],
                        agg_primitives=[],
                        trans_primitives=[],
                        primitive_options={"diff": primi_parameters,
                                           "cum_sum": primi_parameters,
                                           "cum_mean": primi_parameters,
                                           "cum_min": primi_parameters_ignoreCategoricalAcoustic,
                                           "cum_max": primi_parameters_ignoreCategoricalAcoustic
                                          },
                        features_only = True,
                        n_jobs=-1)


feature_defs_test

[<Feature: session_id>,
 <Feature: session_position>,
 <Feature: skip_1>,
 <Feature: skip_2>,
 <Feature: skip_3>,
 <Feature: not_skipped>,
 <Feature: context_switch>,
 <Feature: no_pause_before_play>,
 <Feature: short_pause_before_play>,
 <Feature: long_pause_before_play>,
 <Feature: hist_user_behavior_is_shuffle>,
 <Feature: hour_of_day>,
 <Feature: premium>,
 <Feature: hist_user_behavior_reason_start_appload>,
 <Feature: hist_user_behavior_reason_start_backbtn>,
 <Feature: hist_user_behavior_reason_start_clickrow>,
 <Feature: hist_user_behavior_reason_start_endplay>,
 <Feature: hist_user_behavior_reason_start_fwdbtn>,
 <Feature: hist_user_behavior_reason_start_playbtn>,
 <Feature: hist_user_behavior_reason_start_remote>,
 <Feature: hist_user_behavior_reason_start_trackdone>,
 <Feature: hist_user_behavior_reason_start_trackerror>,
 <Feature: context_type_catalog>,
 <Feature: context_type_charts>,
 <Feature: context_type_editorial_playlist>,
 <Feature: context_type_personalized_playlis

In [25]:
len(feature_defs_test)

738
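
To get a feel for what the groupby transform primitives requested above produce, here is a rough pandas equivalent on a toy column grouped by session. This is my reading of `Diff`, `CumSum`, `CumMean`, `CumMin` and `CumMax`, not the Featuretools implementation, and the toy data and column names are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b"],
    "value": [1.0, 3.0, 6.0, 10.0, 20.0],
})

g = toy.groupby("session_id")["value"]
toy["diff"] = g.diff()                               # change from the previous row in the session
toy["cum_sum"] = g.cumsum()                          # running total within the session
toy["cum_mean"] = g.cumsum() / (g.cumcount() + 1)    # running mean within the session
toy["cum_min"] = g.cummin()
toy["cum_max"] = g.cummax()
print(toy)
```

Each statistic restarts at the first row of a session, which is why the rows need to be in `session_position` order.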

In [26]:
import warnings
def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()
#warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
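
Note that `warnings.catch_warnings()` only changes the filters inside its `with` block, so the cell above does not silence warnings raised by later cells. A small standard-library sketch of that scoping (the helper `noisy` is hypothetical):

```python
import warnings

def noisy():
    warnings.warn("deprecated", DeprecationWarning)
    return "done"

# The outer block records every warning that is actually emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        noisy()                  # suppressed: the "ignore" filter is active here
    n_inside = len(caught)

    noisy()                      # emitted: the inner filter was restored on exit
    n_after = len(caught)

print(n_inside, n_after)
```

To silence warnings around one specific call, that call has to sit inside the `with` block.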

In [27]:
# see this guide for specifying groupby options: https://featuretools.alteryx.com/en/stable/guides/specifying_primitive_options.html
# 'session_id' is specified as a GroupBy option because we care about what happened within each session.
# The tf (track information) dataframe does not contain any order or session information, so it does not need to be specified as a GroupBy option.

#warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
# warnings.simplefilter("ignore")


start_time = time.time()

feature_matrix, feature_defs = ft.dfs(entityset=es,
                        target_dataframe_name="log",
                        groupby_trans_primitives=["Diff","CumSum", "CumMean", "CumMin", "CumMax"],
                        agg_primitives=[],
                        trans_primitives=[],
                        primitive_options={"diff": primi_parameters,
                                           "cum_sum": primi_parameters,
                                           "cum_mean": primi_parameters,
                                           "cum_min": primi_parameters_ignoreCategoricalAcoustic,
                                           "cum_max": primi_parameters_ignoreCategoricalAcoustic
                                          },
                        features_only = False,
                        n_jobs=5) # 8 cores will be very slow...

print('***It takes ',(time.time() - start_time)/60, ' minutes.***')

# save feature matrix
feature_matrix.to_csv('../data/processed/feature_matrix_skip2_TF.csv')


EntitySet scattered to 5 workers in 7 seconds


  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value


  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[name] = f.default_value
  frame[

***It takes  37.16111546754837  minutes.***


In [28]:
feature_matrix

event_id,session_id,session_position,skip_1,skip_2,skip_3,not_skipped,context_switch,no_pause_before_play,short_pause_before_play,long_pause_before_play,...,DIFF(tf.speechiness) by session_id_skip_2_True,DIFF(tf.tempo) by session_id,DIFF(tf.tempo) by session_id_skip_2_False,DIFF(tf.tempo) by session_id_skip_2_True,DIFF(tf.us_popularity_estimate) by session_id,DIFF(tf.us_popularity_estimate) by session_id_skip_2_False,DIFF(tf.us_popularity_estimate) by session_id_skip_2_True,DIFF(tf.valence) by session_id,DIFF(tf.valence) by session_id_skip_2_False,DIFF(tf.valence) by session_id_skip_2_True
0,0_00006f66-33e5-4de7-a324-2d18e439fc1e,1,False,False,False,True,0,0,0,0,...,,,,,,,,,,
20,0_0000a72b-09ac-412f-b452-9b9e79bded8f,1,False,True,True,False,0,0,0,0,...,,,,,,,,,,
40,0_00010fc5-b79e-4cdf-bc4c-f140d0f99a3a,1,False,False,True,False,0,0,0,0,...,0.460207,,,-14.066994,,,0.031839,,,0.409766
60,0_00016a3d-9076-4f67-918f-f29e3ce160dc,1,True,True,True,False,0,0,0,0,...,,,40.448997,,,1.123040,,,0.412382,
80,0_00018b58-deb8-4f98-ac5e-d7e01b346130,1,False,False,True,False,0,0,0,0,...,-0.499567,,,-29.920998,,,-0.426843,,,-0.333178
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167773,0_0eab7430-d291-4d57-9c92-ac7cb682f2c6,20,False,False,False,True,0,1,0,0,...,0.613955,15.776001,15.776001,-7.440002,-0.017276,-0.017276,-0.045441,-0.149514,-0.149514,0.021873
167807,0_0eac164c-f209-4590-8608-a56e67658952,20,False,False,True,False,0,1,0,0,...,-0.851562,14.338005,14.338005,12.658005,-0.191419,-0.191419,-0.113564,-0.126923,-0.126923,-0.053299
167827,0_0eacbee7-9868-48a0-9ab0-f86069329f50,20,True,True,True,False,0,1,0,0,...,-0.105432,15.980011,43.247002,15.980011,0.117369,0.047437,0.117369,0.339712,0.296191,0.339712
167847,0_0ead11fc-f32c-4eb4-8fc1-15b51432a404,20,True,True,True,False,0,0,1,1,...,0.130020,-1.011002,-0.012009,-1.011002,-0.205259,-0.202169,-0.205259,0.402161,0.171030,0.402161
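
In the output above, the `DIFF(...) by session_id` columns are empty for rows with `session_position` 1, which is what a within-session difference would produce, since the first track of a session has no predecessor. The sketch below reproduces that pattern with plain pandas on a made-up toy frame. The values and the column name `tempo_diff` are illustrative assumptions, and this is not necessarily how featuretools computes the feature internally.

```python
import pandas as pd

# Toy data: two sessions with made-up tempo values (not taken from the dataset).
toy_diff = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b"],
    "session_position": [1, 2, 3, 1, 2],
    "tempo": [120.0, 135.5, 128.0, 90.0, 104.0],
})

# Difference to the previous track within the same session.
# The first track of each session has nothing to compare against, so it is NaN.
toy_diff["tempo_diff"] = toy_diff.groupby("session_id")["tempo"].diff()

print(toy_diff)
```

Grouping before differencing keeps the first track of session `b` from being compared with the last track of session `a`.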


In [29]:
feature_defs

[<Feature: session_id>,
 <Feature: session_position>,
 <Feature: skip_1>,
 <Feature: skip_2>,
 <Feature: skip_3>,
 <Feature: not_skipped>,
 <Feature: context_switch>,
 <Feature: no_pause_before_play>,
 <Feature: short_pause_before_play>,
 <Feature: long_pause_before_play>,
 <Feature: hist_user_behavior_is_shuffle>,
 <Feature: hour_of_day>,
 <Feature: premium>,
 <Feature: hist_user_behavior_reason_start_appload>,
 <Feature: hist_user_behavior_reason_start_backbtn>,
 <Feature: hist_user_behavior_reason_start_clickrow>,
 <Feature: hist_user_behavior_reason_start_endplay>,
 <Feature: hist_user_behavior_reason_start_fwdbtn>,
 <Feature: hist_user_behavior_reason_start_playbtn>,
 <Feature: hist_user_behavior_reason_start_remote>,
 <Feature: hist_user_behavior_reason_start_trackdone>,
 <Feature: hist_user_behavior_reason_start_trackerror>,
 <Feature: context_type_catalog>,
 <Feature: context_type_charts>,
 <Feature: context_type_editorial_playlist>,
 <Feature: context_type_personalized_playlis

In [30]:
print(feature_defs[350])
ft.describe_feature(feature_defs[350])

<Feature: CUM_MAX(tf.bounciness) by session_id_skip_2_True>


'The cumulative maximum of the "bounciness" for the instance of "tf" associated with this instance of "log" for each "session_id_skip_2_True".'
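
The description above amounts to a running maximum computed separately for each value of a grouping column. A minimal pandas sketch of that idea follows, using a made-up toy frame. The column `group_key` is a hypothetical stand-in for `session_id_skip_2_True`, whose construction is defined earlier in the notebook and is not reproduced here, and featuretools may compute the feature differently internally.

```python
import pandas as pd

# Toy data: made-up bounciness values for two groups (not taken from the dataset).
# `group_key` is a hypothetical stand-in for the notebook's grouping column.
toy_cummax = pd.DataFrame({
    "group_key": ["a", "a", "a", "b", "b", "b"],
    "bounciness": [0.40, 0.70, 0.55, 0.90, 0.30, 0.95],
})

# Running maximum of bounciness, restarted for each group.
toy_cummax["cum_max_bounciness"] = (
    toy_cummax.groupby("group_key")["bounciness"].cummax()
)

print(toy_cummax)
```

Within each group the new column never decreases: it holds the largest bounciness seen so far in that group, up to and including the current row.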