## Notebook with feature engineering process

### Curated Features list (in addition to columns)

#### 1 Features based on Sessions actions:  
1. [x] Create features event_i as follows:  
    * event_i is the action_info event with index i, one feature per distinct action_info value  
    * its value is the 1-based position of the event's first occurrence in the session, e.g. if the events are show_nan_nan, show_view_p3, then the value for show_view_p3 is 2  
    * normalize by dividing by the total number of events in the user's session
2. [x] COUNT for each action_type
3. [x] MEAN, MAX and other descriptive statistics of secs_elapsed deltas

#### 2 Aggregated on Sessions:  
1. [x] COUNT DISTINCT of device_type
2. [ ] % time spent on each action type
3. [ ] count sessions per each device, MODE of Device type  
4. [ ] given that timestamp_first_active is the start of the session, analyze hour (0-23) of activity

#### 3 Transformed from users:
1. [x] Hour of first activity - users['hr_registered'] = users.timestamp_first_active.dt.hour
2. [x] Day of week of date_account_created

**TODO**: use age_gender_bktd and countries data for feature generation

In [1]:
import pandas as pd
from datetime import datetime
from tqdm.notebook import tqdm
import numpy as np
from scipy import stats
from collections import Counter

pd.options.display.float_format = "{:.2f}".format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
tqdm.pandas()
%load_ext autotime

time: 304 µs (started: 2021-08-12 12:33:25 +00:00)


### 0. Loading Data

In [2]:
users = pd.read_parquet('../data/processed/train_users_2.parquet')
users.shape

(73815, 15)

time: 145 ms (started: 2021-08-12 12:33:25 +00:00)


In [3]:
sessions = pd.read_parquet('../data/processed/sessions_train.parquet')
sessions.shape

(5494799, 7)

time: 2.27 s (started: 2021-08-12 12:33:25 +00:00)


In [4]:
users.head()

Unnamed: 0,id,date_account_created,timestamp_first_active,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination
0,d1mm9tcy42,2014-01-01,2014-01-01 00:09:36,MALE,62.0,basic,0,en,sem-non-brand,google,omg,Web,Windows Desktop,Chrome,other
1,yo8nz8bqcq,2014-01-01,2014-01-01 00:15:58,-unknown-,,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,NDF
2,4grx6yxeby,2014-01-01,2014-01-01 00:16:39,-unknown-,,basic,0,en,sem-brand,google,omg,Web,Windows Desktop,Firefox,NDF
3,ncf87guaf0,2014-01-01,2014-01-01 00:21:46,-unknown-,,basic,0,en,direct,direct,linked,Web,Windows Desktop,Chrome,NDF
4,4rvqpxoh3h,2014-01-01,2014-01-01 00:26:19,-unknown-,,basic,25,en,direct,direct,untracked,iOS,iPhone,-unknown-,GB


time: 38.7 ms (started: 2021-08-12 12:33:28 +00:00)


In [5]:
sessions.head()

Unnamed: 0,user_id,action,action_type,action_detail,device_type,secs_elapsed,action_info
0,d1mm9tcy42,lookup,,,Windows Desktop,319.0,lookup_nan_nan
1,d1mm9tcy42,search_results,click,view_search_results,Windows Desktop,67753.0,search_results_click_view_search_results
2,d1mm9tcy42,lookup,,,Windows Desktop,301.0,lookup_nan_nan
3,d1mm9tcy42,search_results,click,view_search_results,Windows Desktop,22141.0,search_results_click_view_search_results
4,d1mm9tcy42,lookup,,,Windows Desktop,435.0,lookup_nan_nan


time: 19.4 ms (started: 2021-08-12 12:33:28 +00:00)


### 2. Getting features based on Sessions

In [6]:
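# Assign the result back: fillna(inplace=True) on an attribute may not update `sessions` in recent pandas versions.
sessions['secs_elapsed'] = sessions['secs_elapsed'].fillna(-1)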
sessions.sort_values(['user_id', 'secs_elapsed'], inplace=True)
sessions.reset_index(drop=True, inplace=True)
sessions.shape

(5494799, 7)

time: 5.4 s (started: 2021-08-12 12:33:28 +00:00)


In [7]:
sessions.head(10)

Unnamed: 0,user_id,action,action_type,action_detail,device_type,secs_elapsed,action_info
0,00023iyk9l,callback,partner_callback,oauth_response,Mac Desktop,-1.0,callback_partner_callback_oauth_response
1,00023iyk9l,pending,booking_request,pending,Mac Desktop,0.0,pending_booking_request_pending
2,00023iyk9l,personalize,data,wishlist_content_update,Mac Desktop,6.0,personalize_data_wishlist_content_update
3,00023iyk9l,show,,,Mac Desktop,45.0,show_nan_nan
4,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,81.0,similar_listings_data_similar_listings
5,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,94.0,similar_listings_data_similar_listings
6,00023iyk9l,show,,,Mac Desktop,112.0,show_nan_nan
7,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,155.0,similar_listings_data_similar_listings
8,00023iyk9l,index,view,view_search_results,Mac Desktop,163.0,index_view_view_search_results
9,00023iyk9l,show,view,p3,Mac Desktop,182.0,show_view_p3


time: 30.9 ms (started: 2021-08-12 12:33:33 +00:00)


### 2.1 Generating features from the first position of each action_info event in the session stream, normalized by session length

Sessions actions:  
Create features event_i as follows:  
    * event_i is the action_info event with index i, one feature per distinct action_info value  
    * its value is the 1-based position of the event's first occurrence in the session, e.g. if the events are show_nan_nan, show_view_p3, then the value for show_view_p3 is 2  
    * normalize by dividing by the total number of events in the user's session  
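
The rule above can be sketched on a toy session log. This is a minimal illustration, not the notebook's implementation: the frame `toy` and its values are hypothetical, and the rows are assumed to already be in session order.

```python
import pandas as pd

# Toy session log (hypothetical data); rows are assumed to be in session order.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u1", "u2", "u2"],
    "action_info": ["show_nan_nan", "show_view_p3", "show_nan_nan",
                    "lookup_nan_nan", "show_view_p3", "show_view_p3"],
})

# 1-based position of every event inside its user's session.
toy["pos"] = toy.groupby("user_id").cumcount() + 1

# First position of each event per user, pivoted to one column per event.
first_pos = toy.groupby(["user_id", "action_info"])["pos"].min().unstack()

# Number of events per user, used as the normalizer.
session_length = toy.groupby("user_id").size()

# Events a user never triggered stay NaN.
first_pos_norm = first_pos.div(session_length, axis=0).add_prefix("ai_")
print(first_pos_norm)
```

For user `u1`, show_view_p3 first appears at position 2 of 4 events, so its feature value is 0.5.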

In [8]:
actions_info = list(sessions.action_info.unique())
len(actions_info)

336

time: 888 ms (started: 2021-08-12 12:33:33 +00:00)


In [9]:
tmp = sessions[['user_id', 'action_info']].groupby('user_id', as_index=False).agg(list)
tmp.shape

(73815, 2)

time: 3.46 s (started: 2021-08-12 12:33:34 +00:00)


In [10]:
tmp['size'] = tmp.action_info.apply(lambda x: len(x))

time: 92.2 ms (started: 2021-08-12 12:33:38 +00:00)


In [11]:
tmp.head()

Unnamed: 0,user_id,action_info,size
0,00023iyk9l,"[callback_partner_callback_oauth_response, pen...",40
1,001wyh0pz8,"[create_submit_signup, search_click_view_searc...",90
2,0028jgx1x1,"[create_submit_create_user, show_view_user_pro...",31
3,002qnbzfs5,"[campaigns_nan_nan, click_click_book_it, show_...",789
4,0035hobuyj,"[create_submit_create_user, search_results_cli...",489


time: 33.5 ms (started: 2021-08-12 12:33:38 +00:00)


In [12]:
tmp.columns = ['user_id', 'action_info', 'seassion_length']

time: 1.75 ms (started: 2021-08-12 12:33:38 +00:00)


In [13]:
def find_action_info_pos(ai, ais):
    try:
        return ais.index(ai) + 1
    except ValueError:
        return None

time: 1.86 ms (started: 2021-08-12 12:33:38 +00:00)


In [14]:
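for ai in tqdm(actions_info):
    # Divide by the per-user event count column. `tmp.size` is the DataFrame element count (rows * columns),
    # not a column; the output shown below was produced with `tmp.size` and therefore displays 0.00.
    tmp[f'ai_{ai}'] = tmp.action_info.apply(lambda x: find_action_info_pos(ai, x)) / tmp['seassion_length']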


time: 1min 4s (started: 2021-08-12 12:33:38 +00:00)
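
A pitfall worth keeping in mind for divisions like the one above: `.size` is a built-in DataFrame property, so attribute access never returns a column named `size`. The sketch below uses a hypothetical two-row frame to show the difference.

```python
import pandas as pd

# Hypothetical frame with a column that happens to be called "size".
df = pd.DataFrame({"user_id": ["a", "b"], "size": [40, 90]})

# Attribute access resolves to the DataFrame property: rows * columns.
n_elements = df.size

# Bracket access always returns the column.
size_column = df["size"]

print(n_elements)
print(size_column.tolist())
```

Using bracket access, or a column name that does not clash with a DataFrame attribute, avoids the ambiguity.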


In [15]:
tmp.head()

Unnamed: 0,user_id,action_info,seassion_length,ai_callback_partner_callback_oauth_response,ai_pending_booking_request_pending,ai_personalize_data_wishlist_content_update,ai_show_nan_nan,ai_similar_listings_data_similar_listings,ai_index_view_view_search_results,ai_show_view_p3,...,ai_toggle_availability_-unknown-_-unknown-
0,00023iyk9l,"[callback_partner_callback_oauth_response, pen...",40,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,
1,001wyh0pz8,"[create_submit_signup, search_click_view_searc...",90,,,,0.0,,,0.0,...,
2,0028jgx1x1,"[create_submit_create_user, show_view_user_pro...",31,,,,,,,0.0,...,
3,002qnbzfs5,"[campaigns_nan_nan, click_click_book_it, show_...",789,,,,0.0,,,0.0,...,
4,0035hobuyj,"[create_submit_create_user, search_results_cli...",489,,,0.0,0.0,0.0,0.0,0.0,...,

(5 rows × 339 columns; the remaining ai_ columns are elided)


time: 440 ms (started: 2021-08-12 12:34:42 +00:00)


In [16]:
tmp.drop('action_info', axis=1, inplace=True)

time: 239 ms (started: 2021-08-12 12:34:43 +00:00)


#### Checking counts of non-missing values per column

In [17]:
not_missing = pd.DataFrame(tmp.notna().sum()).reset_index()
not_missing.columns = ['col', 'counts']
not_missing['ratio'] = not_missing['counts'].apply(lambda x: round(x / len(users), 4))
not_missing.shape

(338, 3)

time: 230 ms (started: 2021-08-12 12:34:43 +00:00)


In [18]:
not_missing.head()

Unnamed: 0,col,counts,ratio
0,user_id,73815,1.0
1,seassion_length,73815,1.0
2,ai_callback_partner_callback_oauth_response,5334,0.07
3,ai_pending_booking_request_pending,7619,0.1
4,ai_personalize_data_wishlist_content_update,42556,0.58


time: 35.4 ms (started: 2021-08-12 12:34:43 +00:00)


In [19]:
threshold = 0.00005
mask = not_missing.ratio > threshold
mask.sum()

307

time: 22.2 ms (started: 2021-08-12 12:34:43 +00:00)


#### Dropping all columns whose non-missing ratio does not exceed the above threshold

In [20]:
keep_columns = not_missing[mask].col.tolist()
len(keep_columns)

307

time: 7.73 ms (started: 2021-08-12 12:34:43 +00:00)


In [21]:
keep_columns[0], keep_columns[-1]

('user_id', 'ai_envoy_form_-unknown-_-unknown-')

time: 7.86 ms (started: 2021-08-12 12:34:43 +00:00)


In [22]:
features1 = tmp[keep_columns].copy(deep=True)
features1.shape

(73815, 307)

time: 181 ms (started: 2021-08-12 12:34:43 +00:00)
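
The same column filter can be written more compactly, since the mean of a boolean mask is the non-missing ratio. This is a sketch on a hypothetical frame `toy`, and it assumes the feature table has one row per user, so that dividing by the table length matches dividing by `len(users)`.

```python
import numpy as np
import pandas as pd

# Toy feature frame (hypothetical values); NaN means the event never occurred.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "ai_common": [0.1, 0.5, np.nan, 0.2],
    "ai_rare": [np.nan, np.nan, np.nan, np.nan],
})

# Share of non-missing values per column.
ratio = toy.notna().mean()

threshold = 0.00005
keep_columns = ratio[ratio > threshold].index.tolist()
filtered = toy[keep_columns]
print(keep_columns)
```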


### 2.1.1 Count of each action_type normalized

In [23]:
col = 'action_type'
col_values = list(sessions[col].unique())
len(col_values)

10

time: 484 ms (started: 2021-08-12 12:34:43 +00:00)


In [24]:
tmp = sessions[['user_id', col]].groupby('user_id', as_index=False).agg(list)
tmp.shape

(73815, 2)

time: 3.57 s (started: 2021-08-12 12:34:44 +00:00)


In [25]:
tmp['size'] = tmp[col].apply(lambda x: len(x))

time: 85.9 ms (started: 2021-08-12 12:34:48 +00:00)


In [26]:
tmp['counts'] = tmp[col].apply(lambda x: dict(Counter(x)))

time: 669 ms (started: 2021-08-12 12:34:48 +00:00)


In [27]:
tmp.head()

Unnamed: 0,user_id,action_type,size,counts
0,00023iyk9l,"[partner_callback, booking_request, data, None...",40,"{'partner_callback': 1, 'booking_request': 1, ..."
1,001wyh0pz8,"[submit, click, click, -unknown-, -unknown-, c...",90,"{'submit': 3, 'click': 66, '-unknown-': 6, 'vi..."
2,0028jgx1x1,"[submit, view, view, data, view, data, view, d...",31,"{'submit': 1, 'view': 15, 'data': 5, '-unknown..."
3,002qnbzfs5,"[None, click, view, data, click, view, data, v...",789,"{None: 77, 'click': 140, 'view': 216, 'data': ..."
4,0035hobuyj,"[submit, click, click, click, data, None, clic...",489,"{'submit': 4, 'click': 206, 'data': 41, None: ..."


time: 22.4 ms (started: 2021-08-12 12:34:48 +00:00)


In [28]:
tmp = pd.concat([tmp, pd.json_normalize(tmp['counts'])], axis=1)

time: 648 ms (started: 2021-08-12 12:34:48 +00:00)


In [29]:
tmp.drop(['action_type', 'counts'], axis=1, inplace=True)

time: 23.8 ms (started: 2021-08-12 12:34:49 +00:00)


In [30]:
tmp.head()

Unnamed: 0,user_id,size,partner_callback,booking_request,data,NaN,view,click,message_post,submit,-unknown-,booking_response
0,00023iyk9l,40,1.0,1.0,9.0,3.0,21.0,4.0,1.0,,,
1,001wyh0pz8,90,,,2.0,5.0,8.0,66.0,,3.0,6.0,
2,0028jgx1x1,31,,,5.0,,15.0,9.0,,1.0,1.0,
3,002qnbzfs5,789,,1.0,140.0,77.0,216.0,140.0,16.0,15.0,184.0,
4,0035hobuyj,489,,,41.0,171.0,55.0,206.0,3.0,4.0,9.0,


time: 28.7 ms (started: 2021-08-12 12:34:49 +00:00)


In [31]:
cols = list(tmp)[2:]
cols = [f'at_{e}' for e in cols]

time: 1.25 ms (started: 2021-08-12 12:34:49 +00:00)


In [32]:
tmp.columns = ['user_id', 'size'] + cols

time: 1.27 ms (started: 2021-08-12 12:34:49 +00:00)


In [33]:
for e in cols:
    tmp[e] = tmp[e] / tmp['size']

time: 24.7 ms (started: 2021-08-12 12:34:49 +00:00)


In [34]:
tmp.head()

Unnamed: 0,user_id,size,at_partner_callback,at_booking_request,at_data,at_None,at_view,at_click,at_message_post,at_submit,at_-unknown-,at_booking_response
0,00023iyk9l,40,0.03,0.03,0.23,0.07,0.53,0.1,0.03,,,
1,001wyh0pz8,90,,,0.02,0.06,0.09,0.73,,0.03,0.07,
2,0028jgx1x1,31,,,0.16,,0.48,0.29,,0.03,0.03,
3,002qnbzfs5,789,,0.0,0.18,0.1,0.27,0.18,0.02,0.02,0.23,
4,0035hobuyj,489,,,0.08,0.35,0.11,0.42,0.01,0.01,0.02,


time: 34.1 ms (started: 2021-08-12 12:34:49 +00:00)


In [35]:
tmp.drop(['size'], axis=1, inplace=True)

time: 10.4 ms (started: 2021-08-12 12:34:49 +00:00)


In [36]:
tmp.fillna(0, inplace=True)

time: 36.2 ms (started: 2021-08-12 12:34:49 +00:00)


In [37]:
tmp.head()

Unnamed: 0,user_id,at_partner_callback,at_booking_request,at_data,at_None,at_view,at_click,at_message_post,at_submit,at_-unknown-,at_booking_response
0,00023iyk9l,0.03,0.03,0.23,0.07,0.53,0.1,0.03,0.0,0.0,0.0
1,001wyh0pz8,0.0,0.0,0.02,0.06,0.09,0.73,0.0,0.03,0.07,0.0
2,0028jgx1x1,0.0,0.0,0.16,0.0,0.48,0.29,0.0,0.03,0.03,0.0
3,002qnbzfs5,0.0,0.0,0.18,0.1,0.27,0.18,0.02,0.02,0.23,0.0
4,0035hobuyj,0.0,0.0,0.08,0.35,0.11,0.42,0.01,0.01,0.02,0.0


time: 19.6 ms (started: 2021-08-12 12:34:49 +00:00)


In [38]:
features1a = tmp.copy(deep=True)
features1a.shape

(73815, 11)

time: 13.1 ms (started: 2021-08-12 12:34:49 +00:00)
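
The cells above build the normalized counts through lists and `Counter`. A possible shortcut is `pd.crosstab` with `normalize="index"`, sketched here on hypothetical data. Because `crosstab` drops missing categories by default, the sketch labels missing values as the string "None" first, mirroring the `at_None` column above.

```python
import numpy as np
import pandas as pd

# Toy session log (hypothetical data) with one missing action_type.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u1", "u2", "u2"],
    "action_type": ["view", "view", "click", np.nan, "click", "click"],
})

# Label missing values explicitly so they are counted as their own category.
action_type = toy["action_type"].fillna("None")

# Row-normalized counts: each row sums to 1, absent categories are 0.
shares = pd.crosstab(toy["user_id"], action_type, normalize="index").add_prefix("at_")
print(shares)
```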


### 2.2 Generating features based on seconds elapsed info

In [39]:
tmp = sessions[['user_id', 'secs_elapsed']].groupby('user_id', as_index=False).agg(list)
tmp.shape

(73815, 2)

time: 3.2 s (started: 2021-08-12 12:34:49 +00:00)


In [40]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed
0,00023iyk9l,"[-1.0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155...."
1,001wyh0pz8,"[-1.0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, ..."
2,0028jgx1x1,"[-1.0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0,..."
3,002qnbzfs5,"[-1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,..."
4,0035hobuyj,"[-1.0, 0.0, 11.0, 11.0, 11.0, 12.0, 18.0, 23.0..."


time: 24.8 ms (started: 2021-08-12 12:34:53 +00:00)


In [41]:
tmp.secs_elapsed = tmp.secs_elapsed.apply(lambda x: [0] + x[1:])

time: 859 ms (started: 2021-08-12 12:34:53 +00:00)


In [42]:
tmp['deltas'] = tmp['secs_elapsed'].apply(lambda x: [int(j - i) for i, j in zip(x[:-1], x[1:])])

time: 1.88 s (started: 2021-08-12 12:34:54 +00:00)


In [43]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,..."
1,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,..."
2,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,..."
3,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
4,0035hobuyj,"[0, 0.0, 11.0, 11.0, 11.0, 12.0, 18.0, 23.0, 3...","[0, 11, 0, 0, 1, 6, 5, 11, 6, 8, 7, 6, 0, 3, 5..."


time: 33.7 ms (started: 2021-08-12 12:34:55 +00:00)
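
The pairwise differences can also be computed with `np.diff`. The sketch below compares both forms on the first few values of the first user shown above; it is an illustration of the equivalence, not a replacement for the notebook's cell.

```python
import numpy as np

# First values of the first user's sorted secs_elapsed list, as shown above.
secs_elapsed = [0, 0.0, 6.0, 45.0, 81.0, 94.0]

# Loop form, as in the notebook.
deltas_loop = [int(j - i) for i, j in zip(secs_elapsed[:-1], secs_elapsed[1:])]

# Vectorized form: consecutive differences, truncated to integers.
deltas_np = np.diff(np.asarray(secs_elapsed, dtype=float)).astype(int)

print(deltas_loop)
print(deltas_np.tolist())
```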


In [44]:
def get_statistics(x):
    if not x:
        return None, None, None, None
    x = np.array(x)
    return x.mean(), x.std(), x.max(), np.median(x)

time: 1.72 ms (started: 2021-08-12 12:34:55 +00:00)


In [45]:
def get_statistics_no_outliers(x):
    if not x:
        return None, None, None, None, None
    x = np.array(x)
    # Compute the cutoff once; evaluating mean() and std() per element is quadratic in len(x).
    upper = x.mean() + x.std()
    kept = x[x <= upper]
    outliers_count = len(x) - len(kept)
    return kept.mean(), kept.std(), kept.max(), np.median(kept), outliers_count

time: 3.28 ms (started: 2021-08-12 12:34:56 +00:00)


In [46]:
get_statistics(tmp.iloc[0].deltas)

(14542.692307692309, 69958.77759379552, 437348, 64.0)

time: 8.74 ms (started: 2021-08-12 12:34:56 +00:00)


In [47]:
get_statistics_no_outliers(tmp.iloc[0].deltas)

(1165.1351351351352, 2758.241156794633, 11029, 58.0, 2)

time: 21.8 ms (started: 2021-08-12 12:34:56 +00:00)


In [48]:
tmp = pd.concat([tmp, tmp.deltas.progress_apply(lambda x: pd.Series(get_statistics(x)))], axis=1)
tmp.shape

(73815, 7)

time: 49.4 s (started: 2021-08-12 12:34:56 +00:00)
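
Wrapping every result in `pd.Series` inside `apply` is often a large share of the runtime for cells like the one above. One alternative, sketched here on a hypothetical frame `toy`, is to collect the tuples first and build the statistics columns in a single DataFrame construction. Actual speed-ups depend on the data and the pandas version.

```python
import numpy as np
import pandas as pd

def get_statistics(x):
    # Same helper as in the notebook.
    if not x:
        return None, None, None, None
    x = np.array(x)
    return x.mean(), x.std(), x.max(), np.median(x)

# Toy input (hypothetical data); the last user has no deltas.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "deltas": [[0, 6, 39], [35, 45], []],
})

stat_cols = ["deltas_mean", "deltas_std", "deltas_max", "deltas_median"]

# One tuple per user, expanded into columns in a single step.
stats = pd.DataFrame(toy["deltas"].map(get_statistics).tolist(),
                     columns=stat_cols, index=toy.index)
result = pd.concat([toy[["user_id"]], stats], axis=1)
print(result)
```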


In [49]:
tmp.columns = ['user_id', 'secs_elapsed', 'deltas', 'deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median']

time: 855 µs (started: 2021-08-12 12:35:45 +00:00)


In [50]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas,deltas_mean,deltas_std,deltas_max,deltas_median
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,...",14542.69,69958.78,437348.0,64.0
1,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,...",567.96,3206.52,30047.0,33.0
2,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,...",2821.2,7456.5,37388.0,170.0
3,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...",1799.85,34343.17,946465.0,3.0
4,0035hobuyj,"[0, 0.0, 11.0, 11.0, 11.0, 12.0, 18.0, 23.0, 3...","[0, 11, 0, 0, 1, 6, 5, 11, 6, 8, 7, 6, 0, 3, 5...",2490.46,43351.61,954137.0,6.0


time: 40.8 ms (started: 2021-08-12 12:35:45 +00:00)


In [51]:
tmp = pd.concat([tmp, tmp.deltas.progress_apply(lambda x: pd.Series(get_statistics_no_outliers(x)))], axis=1)
tmp.shape

(73815, 12)

time: 7min 51s (started: 2021-08-12 12:35:45 +00:00)


In [52]:
tmp.columns = [
    'user_id', 'secs_elapsed', 'deltas', 'deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median', 
    'deltas_no_mean', 'deltas_no_std', 'deltas_no_max', 'deltas_no_median', 'deltas_no_num_outliers'
]

time: 942 µs (started: 2021-08-12 12:43:36 +00:00)


In [53]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas,deltas_mean,deltas_std,deltas_max,deltas_median,deltas_no_mean,deltas_no_std,deltas_no_max,deltas_no_median,deltas_no_num_outliers
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,...",14542.69,69958.78,437348.0,64.0,1165.14,2758.24,11029.0,58.0,2.0
1,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,...",567.96,3206.52,30047.0,33.0,189.75,501.59,3212.0,33.0,2.0
2,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,...",2821.2,7456.5,37388.0,170.0,988.0,1871.1,9313.0,105.0,2.0
3,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...",1799.85,34343.17,946465.0,3.0,235.77,1650.94,24728.0,3.0,4.0
4,0035hobuyj,"[0, 0.0, 11.0, 11.0, 11.0, 12.0, 18.0, 23.0, 3...","[0, 11, 0, 0, 1, 6, 5, 11, 6, 8, 7, 6, 0, 3, 5...",2490.46,43351.61,954137.0,6.0,352.65,1842.45,23971.0,6.0,2.0


time: 38.6 ms (started: 2021-08-12 12:43:36 +00:00)


In [54]:
tmp.drop(['secs_elapsed', 'deltas'], axis=1, inplace=True)

time: 23.3 ms (started: 2021-08-12 12:43:36 +00:00)


In [55]:
features2 = tmp.copy(deep=True)
features2.shape

(73815, 10)

time: 10.6 ms (started: 2021-08-12 12:43:36 +00:00)


### 2.3 Generating features based on device type info

In [56]:
tmp = sessions[['user_id', 'device_type']].groupby('user_id', as_index=False).agg(set)
tmp.shape

(73815, 2)

time: 2.49 s (started: 2021-08-12 12:43:36 +00:00)


In [57]:
tmp['size'] = tmp.device_type.apply(len)

time: 52.4 ms (started: 2021-08-12 12:43:39 +00:00)


In [58]:
tmp.drop('device_type', axis=1, inplace=True)

time: 19.8 ms (started: 2021-08-12 12:43:39 +00:00)


In [59]:
tmp.head()

Unnamed: 0,user_id,size
0,00023iyk9l,2
1,001wyh0pz8,1
2,0028jgx1x1,2
3,002qnbzfs5,2
4,0035hobuyj,1


time: 18 ms (started: 2021-08-12 12:43:39 +00:00)


In [60]:
tmp.columns = ['user_id', 'device_count']

time: 2.35 ms (started: 2021-08-12 12:43:39 +00:00)


In [61]:
tmp.head()

Unnamed: 0,user_id,device_count
0,00023iyk9l,2
1,001wyh0pz8,1
2,0028jgx1x1,2
3,002qnbzfs5,2
4,0035hobuyj,1


time: 13.3 ms (started: 2021-08-12 12:43:39 +00:00)


In [62]:
features3 = tmp.copy(deep=True)
features3.shape

(73815, 2)

time: 7.54 ms (started: 2021-08-12 12:43:39 +00:00)
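
The distinct device count built in cells 56 to 61 can probably be written as a single groupby. A minimal sketch on toy data (the `demo_sessions` frame is made up for illustration):

```python
import pandas as pd

# Toy stand-in for the sessions table
demo_sessions = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u1', 'u2', 'u2'],
    'device_type': ['Mac Desktop', 'iPhone', 'Mac Desktop', 'iPhone', None],
})

device_count = (
    demo_sessions.groupby('user_id')['device_type']
    .nunique()                 # missing values are not counted by default
    .rename('device_count')
    .reset_index()
)
```

One difference to keep in mind: `nunique()` skips missing values by default, while `agg(set)` followed by `len` counts a missing value as one more element. Passing `dropna=False` to `nunique` should bring the two closer if that behaviour matters.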


### 3.1 Features based on Users table

In [63]:
users['dow_registered'] = users.date_account_created.dt.weekday

time: 21.4 ms (started: 2021-08-12 12:43:39 +00:00)


In [64]:
users['hr_registered'] = users.timestamp_first_active.dt.hour

time: 14.6 ms (started: 2021-08-12 12:43:39 +00:00)


In [65]:
users.sample(5)

Unnamed: 0,id,date_account_created,timestamp_first_active,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,dow_registered,hr_registered
67962,vpete0jrh9,2014-06-20,2014-06-20 05:06:17,FEMALE,22.0,facebook,25,en,direct,direct,untracked,iOS,iPhone,-unknown-,NDF,4,5
22569,28l77h2yg1,2014-03-09,2014-03-09 18:50:47,-unknown-,,basic,0,en,seo,google,linked,Web,Mac Desktop,Chrome,NDF,6,18
72450,jadaf5on4k,2014-06-28,2014-06-28 06:46:46,-unknown-,,basic,25,zh,direct,direct,untracked,iOS,iPhone,-unknown-,NDF,5,6
268,dw1s1p83p7,2014-01-02,2014-01-02 08:07:54,-unknown-,,basic,0,zh,direct,direct,untracked,Web,Windows Desktop,Chrome,NDF,3,8
68656,y5lnjhbocv,2014-06-21,2014-06-21 19:41:33,FEMALE,,facebook,0,en,sem-brand,google,omg,Web,Windows Desktop,Firefox,NDF,5,19


time: 33.5 ms (started: 2021-08-12 12:43:39 +00:00)
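
A quick check of the encoding on two of the sampled rows above: `dt.weekday` maps Monday to 0 and Sunday to 6, and `hr_registered`, despite its name, is taken from `timestamp_first_active`, so it is the hour of first activity. The `demo_users` frame below is only for illustration.

```python
import pandas as pd

# Two rows copied from the sample above
demo_users = pd.DataFrame({
    'date_account_created': pd.to_datetime(['2014-06-20', '2014-03-09']),
    'timestamp_first_active': pd.to_datetime(['2014-06-20 05:06:17',
                                              '2014-03-09 18:50:47']),
})

# Monday=0 ... Sunday=6
demo_users['dow_registered'] = demo_users.date_account_created.dt.weekday
# Hour of day (0-23) of the first recorded activity
demo_users['hr_registered'] = demo_users.timestamp_first_active.dt.hour
```

2014-06-20 was a Friday (4) and 2014-03-09 a Sunday (6), which matches the values in the sample.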


### 3.1.1. Dropping redundant columns

In [66]:
users.drop(['date_account_created', 'timestamp_first_active'], axis=1, inplace=True)

time: 12.2 ms (started: 2021-08-12 12:43:39 +00:00)


In [67]:
users.rename(columns={'id': 'user_id'}, inplace=True)

time: 1.72 ms (started: 2021-08-12 12:43:39 +00:00)


In [68]:
users.head()

Unnamed: 0,user_id,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,dow_registered,hr_registered
0,d1mm9tcy42,MALE,62.0,basic,0,en,sem-non-brand,google,omg,Web,Windows Desktop,Chrome,other,2,0
1,yo8nz8bqcq,-unknown-,,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,NDF,2,0
2,4grx6yxeby,-unknown-,,basic,0,en,sem-brand,google,omg,Web,Windows Desktop,Firefox,NDF,2,0
3,ncf87guaf0,-unknown-,,basic,0,en,direct,direct,linked,Web,Windows Desktop,Chrome,NDF,2,0
4,4rvqpxoh3h,-unknown-,,basic,25,en,direct,direct,untracked,iOS,iPhone,-unknown-,GB,2,0


time: 29.9 ms (started: 2021-08-12 12:43:39 +00:00)


In [69]:
users.shape

(73815, 15)

time: 5.33 ms (started: 2021-08-12 12:43:39 +00:00)


### 4. Assembling all features into one dataset

In [70]:
df = users.merge(features1, on='user_id', how='inner')
df.shape

(73815, 321)

time: 338 ms (started: 2021-08-12 12:43:39 +00:00)


In [71]:
df = df.merge(features1a, on='user_id', how='inner')
df.shape

(73815, 331)

time: 137 ms (started: 2021-08-12 12:43:40 +00:00)


In [72]:
df = df.merge(features2, on='user_id', how='inner')
df.shape

(73815, 340)

time: 140 ms (started: 2021-08-12 12:43:40 +00:00)


In [73]:
df = df.merge(features3, on='user_id', how='inner')
df.shape

(73815, 341)

time: 135 ms (started: 2021-08-12 12:43:40 +00:00)
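
The four merges above all keep 73815 rows, which suggests each feature table has exactly one row per user. A minimal sketch of how that assumption could be enforced rather than checked by eye, using `validate='one_to_one'` and `functools.reduce`. The toy frames and the helper `merge_checked` are hypothetical names used only for this illustration.

```python
from functools import reduce

import pandas as pd

# Toy stand-ins for users and two feature tables
base = pd.DataFrame({'user_id': ['u1', 'u2', 'u3'], 'age': [30.0, None, 41.0]})
feat_a = pd.DataFrame({'user_id': ['u1', 'u2', 'u3'], 'device_count': [2, 1, 1]})
feat_b = pd.DataFrame({'user_id': ['u3', 'u1', 'u2'], 'deltas_mean': [3.5, 10.0, 7.25]})

def merge_checked(left, right):
    """Inner-join on user_id; fail on duplicate keys or silently dropped users."""
    merged = left.merge(right, on='user_id', how='inner', validate='one_to_one')
    assert len(merged) == len(left), 'inner join dropped users'
    return merged

# Fold every feature table into the base table, one merge at a time
combined = reduce(merge_checked, [feat_a, feat_b], base)
```

`validate='one_to_one'` makes pandas raise a `MergeError` if `user_id` is duplicated on either side, and the row-count assertion catches users missing from a feature table.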


In [74]:
df.to_parquet('../data/processed/features.parquet')

time: 1.16 s (started: 2021-08-12 12:43:40 +00:00)
