## Feature engineering notebook

### Curated feature list (in addition to the existing columns)

#### 1. Features based on session actions:  
1. [x] Create features `event_i` as follows:  
    * there is one feature per `action_info` event; `event_i` holds the position at which event i occurs in the user's session  
    * only the first occurrence in the session is taken, e.g. if the events are show_nan_nan, show_view_p3, then the value for show_view_p3 is 2  
    * normalize by dividing by the total number of events in the user's session
2. [x] COUNT for each action_type
3. [x] MEAN, MAX and other descriptive statistics of secs_elapsed deltas
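Items 2 and 3 map naturally onto pandas group-by aggregations. The sketch below uses a tiny hypothetical `sessions` frame with the notebook's column names; the chosen statistics and the `secs_` / `count_type_` prefixes are assumptions. It computes the statistics while `secs_elapsed` still contains NaN: once missing values are filled with -1 (as in section 2), that sentinel would enter the mean and min, so the order of these steps matters.

```python
import pandas as pd

# Hypothetical toy rows with the same column names as `sessions`
sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2"],
    "action_type": ["click", "view", "click", "view", None],
    "secs_elapsed": [10.0, 30.0, 20.0, 5.0, None],
})

# Item 3: descriptive statistics of secs_elapsed per user (NaN values are skipped)
secs_stats = (
    sessions.groupby("user_id")["secs_elapsed"]
    .agg(["mean", "max", "min", "median", "std"])
    .add_prefix("secs_")
)

# Item 2: one COUNT column per action_type; rows with a missing action_type are not counted
type_counts = pd.crosstab(sessions["user_id"], sessions["action_type"]).add_prefix("count_type_")

features = secs_stats.join(type_counts)
print(features)
```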

#### 2. Aggregated over sessions:  
1. [x] COUNT DISTINCT of device_type
2. [ ] % of time spent on each action type
3. [ ] count sessions per device; MODE of device type  
4. [ ] given that timestamp_first_active is the start of the session, analyze the hour (0-23) of activity
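The device and time-share items could be sketched in the same style. This is only an illustration on hypothetical toy data: the sessions table has no session identifier, so logged events per device stand in for "sessions per device", and the share of summed `secs_elapsed` per `action_type` is used as a proxy for "% time spent". Both proxies, and all new column names, are assumptions.

```python
import pandas as pd

# Hypothetical toy rows with the same column names as `sessions`
sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "device_type": ["Mac Desktop", "iPhone", "Mac Desktop", "Windows Desktop"],
    "action_type": ["click", "view", "view", "click"],
    "secs_elapsed": [10.0, 30.0, 60.0, 5.0],
})

g = sessions.groupby("user_id")

# COUNT DISTINCT of device_type
n_devices = g["device_type"].nunique().rename("n_devices")

# MODE of device type: the most frequent device per user
device_mode = g["device_type"].agg(lambda s: s.value_counts().idxmax()).rename("device_mode")

# Events per device (proxy for sessions per device, since there is no session id)
events_per_device = pd.crosstab(sessions["user_id"], sessions["device_type"]).add_prefix("n_events_")

# Share of summed secs_elapsed per action_type (proxy for % of time spent);
# a user whose secs_elapsed sum to 0 would get NaN shares
secs_by_type = sessions.pivot_table(index="user_id", columns="action_type",
                                    values="secs_elapsed", aggfunc="sum", fill_value=0)
time_share = secs_by_type.div(secs_by_type.sum(axis=1), axis=0).add_prefix("share_secs_")

features = pd.concat([n_devices, device_mode, events_per_device, time_share], axis=1)
print(features)
```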

#### 3. Derived from users:
1. [x] Hour of first activity: `users['hour_factive'] = users.timestamp_first_active.dt.hour`
2. [x] Day of week of `date_account_created`
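Both user-level transformations are plain datetime accessors in pandas. In the sketch below the two rows reuse values from the first rows of `users`; `hour_factive` is the notebook's own name, while `dow_account_created` is a hypothetical name for the day-of-week column (pandas numbers Monday as 0).

```python
import pandas as pd

# Two rows taken from the first rows of `users`
users = pd.DataFrame({
    "date_account_created": ["2010-06-28", "2011-05-25"],
    "timestamp_first_active": ["2009-03-19 04:32:55", "2009-05-23 17:48:09"],
})

# Parse to datetime (a no-op if the parquet file already stores datetimes)
users["timestamp_first_active"] = pd.to_datetime(users["timestamp_first_active"])
users["date_account_created"] = pd.to_datetime(users["date_account_created"])

# Hour of first activity, 0-23
users["hour_factive"] = users["timestamp_first_active"].dt.hour
# Day of week of account creation, Monday=0 ... Sunday=6
users["dow_account_created"] = users["date_account_created"].dt.dayofweek
print(users[["hour_factive", "dow_account_created"]])
```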

**TODO**: use the age_gender_bkts and countries data for feature generation

In [1]:
import pandas as pd
from datetime import datetime
from tqdm.notebook import tqdm
import numpy as np
from scipy import stats
from collections import Counter

pd.options.display.float_format = "{:.2f}".format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
tqdm.pandas()
%load_ext autotime

time: 296 µs (started: 2021-08-18 18:40:29 +00:00)


### 0. Loading Data

In [2]:
users = pd.read_parquet('../data/processed/users.parquet')
users.shape

(275547, 16)

time: 382 ms (started: 2021-08-18 18:40:29 +00:00)


In [3]:
sessions = pd.read_parquet('../data/processed/sessions.parquet')
sessions.shape

(10533241, 7)

time: 3.6 s (started: 2021-08-18 18:40:30 +00:00)


In [4]:
users.head()

Unnamed: 0,id,date_account_created,timestamp_first_active,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,train_flag
0,gxn3p5htnn,2010-06-28,2009-03-19 04:32:55,-unknown-,,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,NDF,1
1,820tgsjxq7,2011-05-25,2009-05-23 17:48:09,MALE,38.0,facebook,0,en,seo,google,untracked,Web,Mac Desktop,Chrome,NDF,1
2,4ft3gnwmtx,2010-09-28,2009-06-09 23:12:47,FEMALE,56.0,basic,3,en,direct,direct,untracked,Web,Windows Desktop,IE,US,1
3,bjjt8pjhuk,2011-12-05,2009-10-31 06:01:29,FEMALE,42.0,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,other,1
4,87mebub9p4,2010-09-14,2009-12-08 06:11:05,-unknown-,41.0,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,US,1


time: 42.7 ms (started: 2021-08-18 18:40:33 +00:00)


In [5]:
sessions.head()

Unnamed: 0,user_id,action,action_type,action_detail,device_type,secs_elapsed,action_info
0,d1mm9tcy42,lookup,,,Windows Desktop,319.0,lookup_nan_nan
1,d1mm9tcy42,search_results,click,view_search_results,Windows Desktop,67753.0,search_results_click_view_search_results
2,d1mm9tcy42,lookup,,,Windows Desktop,301.0,lookup_nan_nan
3,d1mm9tcy42,search_results,click,view_search_results,Windows Desktop,22141.0,search_results_click_view_search_results
4,d1mm9tcy42,lookup,,,Windows Desktop,435.0,lookup_nan_nan


time: 28.9 ms (started: 2021-08-18 18:40:33 +00:00)


### 2. Getting features based on Sessions

In [6]:
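sessions['secs_elapsed'] = sessions['secs_elapsed'].fillna(-1)  # -1 marks a missing secs_elapsed; avoids chained inplace fillna
sessions.sort_values(['user_id', 'secs_elapsed'], inplace=True)  # per user, ascending secs_elapsed (filled -1 values first)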
sessions.reset_index(drop=True, inplace=True)
sessions.shape

(10533241, 7)

time: 8.87 s (started: 2021-08-18 18:40:33 +00:00)


In [7]:
sessions.head(10)

Unnamed: 0,user_id,action,action_type,action_detail,device_type,secs_elapsed,action_info
0,00023iyk9l,callback,partner_callback,oauth_response,Mac Desktop,-1.0,callback_partner_callback_oauth_response
1,00023iyk9l,pending,booking_request,pending,Mac Desktop,0.0,pending_booking_request_pending
2,00023iyk9l,personalize,data,wishlist_content_update,Mac Desktop,6.0,personalize_data_wishlist_content_update
3,00023iyk9l,show,,,Mac Desktop,45.0,show_nan_nan
4,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,81.0,similar_listings_data_similar_listings
5,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,94.0,similar_listings_data_similar_listings
6,00023iyk9l,show,,,Mac Desktop,112.0,show_nan_nan
7,00023iyk9l,similar_listings,data,similar_listings,Mac Desktop,155.0,similar_listings_data_similar_listings
8,00023iyk9l,index,view,view_search_results,Mac Desktop,163.0,index_view_view_search_results
9,00023iyk9l,show,view,p3,Mac Desktop,182.0,show_view_p3


time: 21.8 ms (started: 2021-08-18 18:40:42 +00:00)


### 2.1 Generating features from the order of action_info events in the session stream, with normalization

Session actions: create features `event_i` as follows:

* there is one feature per `action_info` event; `event_i` holds the position at which event i occurs in the user's session stream
* only the first occurrence is taken, e.g. if the events are show_nan_nan, show_view_p3, then the value for show_view_p3 is 2
* normalize by dividing by the total number of events in the user's session
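To make the position rules concrete, here is a minimal pure-Python sketch for a single user. The function name `first_positions` and its `normalize` flag are hypothetical; like `find_action_info_pos` below, it returns `None` for events that never occur in the session.

```python
def first_positions(events, vocabulary, normalize=True):
    """Map each event in `vocabulary` to the 1-based position of its first
    occurrence in `events` (None if absent); optionally divide by len(events)."""
    first = {}
    for pos, ev in enumerate(events, start=1):
        first.setdefault(ev, pos)  # keep only the first occurrence of each event
    n = len(events)
    out = {}
    for ev in vocabulary:
        p = first.get(ev)
        out[ev] = None if p is None else (p / n if normalize else p)
    return out


session = ["show_nan_nan", "show_view_p3", "show_view_p3", "show_nan_nan"]
vocab = ["show_nan_nan", "show_view_p3", "lookup_nan_nan"]
print(first_positions(session, vocab, normalize=False))  # raw positions
print(first_positions(session, vocab))                    # divided by session length (4)
```

Here show_view_p3 first appears at position 2, so its raw value is 2 and its normalized value is 2 / 4 = 0.5; the later repeat is ignored.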

### Change from previous approaches: only action_info values present in both the train and test datasets become features; the other action_info values are not dropped from the session streams, so they still count toward positions and session length

In [8]:
# actions_info = list(sessions.action_info.unique())
# len(actions_info)
actions_info = set(sessions[sessions.user_id.isin(users[users.train_flag == 1].id)].action_info.unique())
actions_info = actions_info.intersection(set(sessions[sessions.user_id.isin(users[users.train_flag == 0].id)].action_info.unique()))
actions_info = list(actions_info)
len(actions_info)

336

time: 4.64 s (started: 2021-08-18 18:40:42 +00:00)
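The same train/test intersection is computed here for action_info and in the next cell for action_detail, so a hypothetical helper `shared_values` could factor it out. Two deliberate choices in this sketch: NaN is dropped before intersecting, and the result is sorted. Iterating over a Python `set` of strings gives an order that can change between interpreter runs (string hashing is randomized), so `list(actions_info)` may produce the feature columns in a different order each time; sorting makes the column order reproducible.

```python
import pandas as pd


def shared_values(sessions, users, column):
    """Sorted non-null values of `column` seen in the sessions of both
    train (train_flag == 1) and test (train_flag == 0) users."""
    train_ids = users.loc[users["train_flag"] == 1, "id"]
    test_ids = users.loc[users["train_flag"] == 0, "id"]
    in_train = set(sessions.loc[sessions["user_id"].isin(train_ids), column].dropna().unique())
    in_test = set(sessions.loc[sessions["user_id"].isin(test_ids), column].dropna().unique())
    return sorted(in_train & in_test)


# Hypothetical toy data
users = pd.DataFrame({"id": ["a", "b", "c"], "train_flag": [1, 1, 0]})
sessions = pd.DataFrame({
    "user_id": ["a", "b", "c", "c"],
    "action_info": ["lookup_nan_nan", "show_view_p3", "show_view_p3", "search_nan_nan"],
})
print(shared_values(sessions, users, "action_info"))
```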


In [None]:
# action_detail values present in the sessions of both train and test users
action_details = set(sessions[sessions.user_id.isin(users[users.train_flag == 1].id)].action_detail.unique())
action_details = action_details.intersection(set(sessions[sessions.user_id.isin(users[users.train_flag == 0].id)].action_detail.unique()))
action_details = list(action_details)
len(action_details)

127

time: 3.97 s (started: 2021-08-18 18:40:47 +00:00)


In [None]:
sessions.action_detail.nunique()

155

time: 732 ms (started: 2021-08-18 18:40:51 +00:00)


In [None]:
tmp = sessions[['user_id', 'action_info']].groupby('user_id', as_index=False).agg(list)
tmp.shape

(135483, 2)

time: 5.24 s (started: 2021-08-18 18:40:52 +00:00)


In [None]:
tmp['size'] = tmp.action_info.apply(len)

time: 115 ms (started: 2021-08-18 18:40:57 +00:00)


In [None]:
tmp.head()

Unnamed: 0,user_id,action_info,size
0,00023iyk9l,"[callback_partner_callback_oauth_response, pen...",40
1,0010k6l0om,"[callback_partner_callback_oauth_response, sea...",63
2,001wyh0pz8,"[create_submit_signup, search_click_view_searc...",90
3,0028jgx1x1,"[create_submit_create_user, show_view_user_pro...",31
4,002qnbzfs5,"[campaigns_nan_nan, click_click_book_it, show_...",789


time: 19.5 ms (started: 2021-08-18 18:40:57 +00:00)


In [None]:
tmp.columns = ['user_id', 'action_info', 'seassion_length']

time: 1.94 ms (started: 2021-08-18 18:40:57 +00:00)


In [None]:
def find_action_info_pos(ai, ais):
    """1-based position of the first occurrence of `ai` in the list `ais`, or None if absent."""
    try:
        return ais.index(ai) + 1
    except ValueError:
        return None

time: 737 µs (started: 2021-08-18 18:40:59 +00:00)


In [17]:
# collect the columns first and attach them with a single concat to avoid fragmenting tmp
ai_positions = {f'ai_{ai}': tmp.action_info.apply(lambda x: find_action_info_pos(ai, x))
                for ai in tqdm(actions_info)}
tmp = pd.concat([tmp, pd.DataFrame(ai_positions, index=tmp.index)], axis=1)


time: 2min 4s (started: 2021-08-18 18:41:13 +00:00)
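The loop above scans every user's list once per action_info value (336 passes), which is why it takes about two minutes. A possibly faster alternative, sketched here on a hypothetical toy `tmp`, explodes the lists into one row per event, numbers the events with `groupby(...).cumcount()`, and keeps the minimum position per (user, event) pair. It should yield the same values as `find_action_info_pos` (NaN where an event is absent), but it has not been benchmarked on the real data.

```python
import pandas as pd

# Hypothetical toy stand-in for `tmp`: one list of action_info events per user
tmp = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "action_info": [["show_nan_nan", "show_view_p3", "show_nan_nan"], ["lookup_nan_nan"]],
})
actions_info = ["show_nan_nan", "show_view_p3", "lookup_nan_nan"]

# One row per (user, event), preserving the original list order
long = tmp[["user_id", "action_info"]].explode("action_info", ignore_index=True)
# 1-based position of each event within its user's list
long["pos"] = long.groupby("user_id").cumcount() + 1
# First occurrence per event, spread into one column per action_info value
first_pos = (
    long.groupby(["user_id", "action_info"])["pos"].min()
    .unstack()
    .reindex(columns=actions_info)  # same feature set as the loop; absent events -> NaN
    .add_prefix("ai_")
)
print(first_pos)
```

The result could then be attached with `tmp.join(first_pos, on="user_id")`.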


In [18]:
tmp.head()

Unnamed: 0,user_id,action_info,seassion_length,ai_handle_vanity_url_-unknown-_-unknown-,ai_phone_verification_number_submitted_for_call_-unknown-_-unknown-,ai_home_safety_landing_-unknown-_-unknown-,ai_delete_submit_delete_phone_numbers,ai_search_results_click_view_search_results,ai_guarantee_view_host_guarantee,ai_track_page_view_nan_nan,ai_populate_from_facebook_-unknown-_-unknown-,ai_phone_verification_error_-unknown-_-unknown-,ai_facebook_auto_login_-unknown-_-unknown-,ai_payout_delete_-unknown-_-unknown-,ai_phone_verification_number_submitted_for_sms_-unknown-_-unknown-,ai_push_notification_callback_-unknown-_-unknown-,ai_15_message_post_message_post,ai_login_view_login_page,ai_create_submit_create_user,ai_phone_verification_call_taking_too_long_-unknown-_-unknown-,ai_destroy_-unknown-_-unknown-,ai_sync_-unknown-_-unknown-,ai_contact_new_-unknown-_-unknown-,ai_p4_terms_click_p4_terms,ai_clickthrough_-unknown-_-unknown-,ai_hospitality_-unknown-_-unknown-,ai_reviews_data_listing_reviews,ai_tos_confirm_-unknown-_-unknown-,ai_create_multiple_-unknown-_-unknown-,ai_authorize_-unknown-_-unknown-,ai_localized_-unknown-_-unknown-,ai_recommend_-unknown-_-unknown-,ai_create_view_list_your_space,ai_travel_plans_current_view_your_trips,ai_city_count_-unknown-_-unknown-,ai_home_safety_terms_-unknown-_-unknown-,ai_terms_view_terms_and_privacy,ai_payout_preferences_view_account_payout_preferences,ai_payment_instruments_data_payment_instruments,ai_pending_tickets_-unknown-_-unknown-,ai_other_hosting_reviews_-unknown-_-unknown-,ai_confirm_email_click_confirm_email_link,ai_request_new_confirm_email_click_request_new_confirm_email,ai_complete_redirect_-unknown-_-unknown-,ai_collections_view_user_wishlists,ai_identity_-unknown-_-unknown-,ai_create_-unknown-_-unknown-,ai_kba_update_-unknown-_-unknown-,ai_maybe_information_message_post_message_post,ai_delete_submit_delete_listing,ai_supported_-unknown-_-unknown-,ai_photography_update_-unknown-_-unknown-,ai_at_checkpoint_booking_request_at_checkpoint,ai_kba_-unknown-_-unknown-,ai_envoy_bank_details_redirect_-unknown-_-unknown-,ai_forgot_password_submit_forgot_password,ai_available_data_trip_availability,ai_approve_-unknown-_-unknown-,ai_itinerary_view_guest_itinerary,ai_pending_-unknown-_-unknown-,ai_payoneer_account_redirect_-unknown-_-unknown-,ai_update_submit_update_listing,ai_apply_coupon_click_click_apply_coupon_click,ai_signup_weibo_-unknown-_-unknown-,ai_edit_-unknown-_-unknown-,ai_notifications_data_notifications,ai_remove_dashboard_alert_-unknown-_-unknown-,ai_show_data_translations,ai_update_cached_data_admin_templates,ai_listings_-unknown-_-unknown-,ai_life_-unknown-_-unknown-,ai_preapproval_message_post_message_post,ai_requested_view_p5,ai_show_view_view_listing,ai_office_location_-unknown-_-unknown-,ai_ask_question_submit_contact_host,ai_reviews_data_user_reviews,ai_currencies_nan_nan,ai_signup_weibo_referral_-unknown-_-unknown-,ai_why_host_-unknown-_-unknown-,ai_domains_-unknown-_-unknown-,ai_phone_verification_phone_number_removed_-unknown-_-unknown-,ai_social_connections_data_user_social_connections,ai_ajax_refresh_subtotal_click_change_trip_characteristics,ai_recent_reservations_-unknown-_-unknown-,ai_settings_-unknown-_-unknown-,ai_jumio_-unknown-_-unknown-,ai_feed_-unknown-_-unknown-,ai_localization_settings_nan_nan,ai_click_click_book_it,ai_detect_fb_session_-unknown-_-unknown-,ai_email_by_key_-unknown-_-unknown-,ai_review_page_-unknown-_-unknown-,ai_other_hosting_reviews_first_-unknown-_-unknown-,ai_complete_status_-unknown-_-unknown-,ai_read_policy_click_click_read_policy_click,ai_listings_view_user_listings,ai_recommendations_data_listing_recommendations,ai_pay_-unknown-_-unknown-,ai_referrer_status_-unknown-_-unknown-,ai_apply_coupon_error_click_apply_coupon_error,ai_new_-unknown-_-unknown-,ai_guest_booked_elsewhere_message_post_message_post,ai_photography_-unknown-_-unknown-,ai_webcam_upload_-unknown-_-unknown-,ai_personalize_data_wishlist_content_update,ai_signup_login_view_signup_login_page,ai_campaigns_nan_nan,ai_create_submit_signup,ai_host_summary_view_host_home,ai_decision_tree_-unknown-_-unknown-,ai_privacy_view_account_privacy_settings,ai_southern-europe_-unknown-_-unknown-,ai_update_submit_update_user_profile,ai_issue_-unknown-_-unknown-,ai_acculynk_bin_check_success_-unknown-_-unknown-,ai_social-media_-unknown-_-unknown-,ai_update_nan_nan,ai_envoy_form_-unknown-_-unknown-,ai_similar_listings_v2_nan_nan,ai_ajax_image_upload_-unknown-_-unknown-,ai_transaction_history_paginated_-unknown-_-unknown-,ai_united-states_-unknown-_-unknown-,ai_create_submit_create_listing,ai_authenticate_view_login_page,ai_10_message_post_message_post,ai_friends_view_friends_wishlists,ai_apply_reservation_submit_apply_coupon,ai_reservation_-unknown-_-unknown-,ai_payoneer_signup_complete_-unknown-_-unknown-,ai_sublets_-unknown-_-unknown-,ai_recommendation_page_-unknown-_-unknown-,ai_update_notifications_-unknown-_-unknown-,ai_qt2_view_message_thread,ai_delete_submit_delete_listing_description,ai_friends_new_-unknown-_-unknown-,ai_departments_-unknown-_-unknown-,ai_set_password_view_set_password_page,ai_payment_methods_-unknown-_-unknown-,ai_ajax_price_and_availability_click_alteration_field,ai_show_view_wishlist,ai_ajax_photo_widget_form_iframe_-unknown-_-unknown-,ai_change_password_submit_change_password,ai_rate_-unknown-_-unknown-,ai_airbnb_picks_view_airbnb_picks_wishlists,ai_clear_reservation_-unknown-_-unknown-,ai_listing_view_p3,ai_upload_-unknown-_-unknown-,ai_endpoint_error_-unknown-_-unknown-,ai_ajax_check_dates_click_change_contact_host_dates,ai_create_ach_-unknown-_-unknown-,ai_guest_billing_receipt_-unknown-_-unknown-,ai_image_order_-unknown-_-unknown-,ai_add_note_submit_wishlist_note,ai_toggle_archived_thread_click_toggle_archived_thread,ai_index_data_user_tax_forms,ai_search_-unknown-_-unknown-,ai_ajax_payout_options_by_country_-unknown-_-unknown-,ai_has_profile_pic_-unknown-_-unknown-,ai_index_view_your_listings,ai_weibo_signup_referral_finish_-unknown-_-unknown-,ai_transaction_history_view_account_transaction_history,ai_change_default_payout_-unknown-_-unknown-,ai_ajax_statsd_-unknown-_-unknown-,ai_edit_view_edit_profile,ai_about_us_-unknown-_-unknown-,ai_remove_dashboard_alert_click_remove_dashboard_alert,ai_set_password_submit_set_password,ai_my_view_user_wishlists,ai_references_view_profile_references,ai_ajax_lwlb_contact_click_contact_host,ai_index_view_view_search_results,ai_create_airbnb_-unknown-_-unknown-,ai_phone_verification_number_sucessfully_submitted_-unknown-_-unknown-,ai_print_confirmation_-unknown-_-unknown-,ai_apply_code_-unknown-_-unknown-,ai_request_photography_-unknown-_-unknown-,ai_authenticate_submit_login,ai_my_listings_view_your_reservations,ai_update_hide_from_search_engines_-unknown-_-unknown-,ai_glob_-unknown-_-unknown-,ai_email_itinerary_colorbox_-unknown-_-unknown-,ai_this_hosting_reviews_click_listing_reviews_page,ai_zendesk_login_jwt_-unknown-_-unknown-,ai_click_click_contact_host,ai_index_data_reservations,ai_coupon_code_click_click_coupon_code_click,ai_qt_reply_v2_submit_send_message,ai_acculynk_load_pin_pad_-unknown-_-unknown-,ai_set_default_-unknown-_-unknown-,ai_submit_contact_-unknown-_-unknown-,ai_status_-unknown-_-unknown-,ai_12_message_post_message_post,ai_connect_submit_oauth_login,ai_calendar_tab_inner2_-unknown-_-unknown-,ai_edit_verification_view_profile_verifications,ai_create_submit_create_phone_numbers,ai_show_view_user_profile,ai_agree_terms_check_-unknown-_-unknown-,ai_email_share_submit_email_wishlist,ai_change_availability_submit_change_availability,ai_cancel_submit_guest_cancellation,ai_click_click_complete_booking,ai_popular_view_popular_wishlists,ai_ajax_worth_submit_calculate_worth,ai_message_to_host_change_click_message_to_host_change,ai_update_reservation_requirements_-unknown-_-unknown-,ai_special_offer_message_post_message_post,ai_active_-unknown-_-unknown-,ai_notifications_submit_notifications,ai_search_click_view_search_results,ai_set_user_submit_create_listing,ai_update_friends_display_-unknown-_-unknown-,ai_respond_submit_respond_to_alteration_request,ai_badge_-unknown-_-unknown-,ai_slideshow_-unknown-_-unknown-,ai_how_it_works_-unknown-_-unknown-,ai_open_hard_fallback_modal_-unknown-_-unknown-,ai_phone_number_widget_-unknown-_-unknown-,ai_populate_help_dropdown_-unknown-_-unknown-,ai_qt_reply_v2_-unknown-_-unknown-,ai_press_news_-unknown-_-unknown-,ai_show_view_p1,ai_add_guests_-unknown-_-unknown-,ai_terms_and_conditions_-unknown-_-unknown-,ai_become_user_-unknown-_-unknown-,ai_confirm_email_click_confirm_email,ai_dashboard_view_dashboard,ai_acculynk_session_obtained_-unknown-_-unknown-,ai_founders_-unknown-_-unknown-,ai_apply_coupon_click_success_click_apply_coupon_click_success,ai_travel_plans_previous_view_previous_trips,ai_social_-unknown-_-unknown-,ai_signature_-unknown-_-unknown-,ai_reviews_new_-unknown-_-unknown-,ai_recommendations_data_user_friend_recommendations,ai_verify_-unknown-_-unknown-,ai_uptodate_nan_nan,ai_trust_-unknown-_-unknown-,ai_requirements_-unknown-_-unknown-,ai_top_destinations_-unknown-_-unknown-,ai_toggle_starred_thread_click_toggle_starred_thread,ai_update_submit_update_listing_description,ai_pending_booking_request_pending,ai_show_view_alteration_request,ai_receipt_view_guest_receipt,ai_booking_booking_response_booking,ai_toggle_availability_-unknown-_-unknown-,ai_agree_terms_uncheck_-unknown-_-unknown-,ai_change_view_change_or_alter,ai_profile_pic_-unknown-_-unknown-,ai_views_-unknown-_-unknown-,ai_signed_out_modal_nan_nan,ai_click_click_instant_book,ai_redirect_-unknown-_-unknown-,ai_message_to_host_focus_click_message_to_host_focus,ai_ajax_payout_edit_-unknown-_-unknown-,ai_load_more_-unknown-_-unknown-,ai_cancellation_policy_click_click_cancellation_policy_click,ai_show_view_p3,ai_countries_-unknown-_-unknown-,ai_apply_coupon_error_type_-unknown-_-unknown-,ai_nan_message_post_message_post,ai_patch_-unknown-_-unknown-,ai_impressions_view_p4,ai_phone_verification_nan_nan,ai_faq_-unknown-_-unknown-,ai_11_message_post_message_post,ai_email_wishlist_click_email_wishlist_button,ai_callback_partner_callback_oauth_response,ai_index_view_message_thread,ai_approve_submit_host_respond,ai_jumio_redirect_-unknown-_-unknown-,ai_check_nan_nan,ai_ajax_google_translate_-unknown-_-unknown-,ai_place_worth_view_place_worth,ai_index_-unknown-_-unknown-,ai_reputation_-unknown-_-unknown-,ai_qt_with_data_lookup_message_thread,ai_similar_listings_data_similar_listings,ai_ajax_google_translate_description_-unknown-_-unknown-,ai_update_country_of_residence_-unknown-_-unknown-,ai_notifications_view_account_notification_settings,ai_faq_category_-unknown-_-unknown-,ai_new_view_list_your_space,ai_index_view_user_wishlists,ai_cancellation_policies_view_cancellation_policies,ai_change_currency_-unknown-_-unknown-,ai_new_session_-unknown-_-unknown-,ai_signup_modal_view_signup_modal,ai_forgot_password_click_forgot_password,ai_login_modal_view_login_modal,ai_overview_-unknown-_-unknown-,ai_locations_-unknown-_-unknown-,ai_coupon_field_focus_click_coupon_field_focus,ai_invalid_action_-unknown-_-unknown-,ai_spoken_languages_data_user_languages,ai_manage_listing_view_manage_listing,ai_account_-unknown-_-unknown-,ai_ajax_payout_split_edit_-unknown-_-unknown-,ai_update_-unknown-_-unknown-,ai_header_userpic_data_header_userpic,ai_open_graph_setting_-unknown-_-unknown-,ai_mobile_landing_page_-unknown-_-unknown-,ai_media_resources_-unknown-_-unknown-,ai_p4_refund_policy_terms_click_p4_refund_policy_terms,ai_create_submit_create_alteration_request,ai_tell_a_friend_-unknown-_-unknown-,ai_delete_-unknown-_-unknown-,ai_show_personalize_data_user_profile_content_update,ai_complete_-unknown-_-unknown-,ai_index_nan_nan,ai_ajax_google_translate_reviews_click_translate_listing_reviews,ai_department_-unknown-_-unknown-,ai_acculynk_pin_pad_inactive_-unknown-_-unknown-,ai_languages_multiselect_-unknown-_-unknown-,ai_unavailabilities_data_unavailable_dates,ai_create_paypal_-unknown-_-unknown-,ai_index_view_message_inbox,ai_jumio_token_-unknown-_-unknown-,ai_show_code_-unknown-_-unknown-,ai_questions_-unknown-_-unknown-,ai_lookup_nan_nan,ai_country_options_-unknown-_-unknown-,ai_requested_submit_post_checkout_action,ai_phone_verification_modal_-unknown-_-unknown-,ai_index_view_listing_descriptions,ai_satisfy_nan_nan,ai_show_nan_nan,ai_track_activity_nan_nan,ai_salute_-unknown-_-unknown-,ai_update_submit_update_user,ai_position_-unknown-_-unknown-,ai_payout_update_-unknown-_-unknown-,ai_click_click_request_to_book,ai_ajax_special_offer_dates_available_click_special_offer_field,ai_phone_verification_success_click_phone_verification_success,ai_show_-unknown-_-unknown-,ai_mobile_oauth_callback_-unknown-_-unknown-
0,00023iyk9l,"[callback_partner_callback_oauth_response, pen...",40,,,,,36.0,,,,,,,,,,,,,,,,,,,,,,,,,,14.0,,,,,,,,40.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,16.0,,,,,,,,,,,15.0,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,9.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,13.0,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,10.0,,,39.0,,,,,,,1.0,,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,21.0,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,
1,0010k6l0om,"[callback_partner_callback_oauth_response, sea...",63,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,28.0,,,,,,,,,,,,,,,,,,,,,,29.0,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18.0,,,,,,,,,,,19.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,1.0,,,,,,,48.0,,,,,,,,,,,,,,,,,,,,,,,,,27.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,
2,001wyh0pz8,"[create_submit_signup, search_click_view_searc...",90,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,24.0,,,,,,,,,,,,,,,,,,,,,,,,,15.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.0,,,,,,,,,,,4.0,12.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,76.0,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,36.0,,,90.0,,,,,,,
3,0028jgx1x1,"[create_submit_create_user, show_view_user_pro...",31,,,,,,,,,,,,,,,,1.0,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,16.0,,17.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,002qnbzfs5,"[campaigns_nan_nan, click_click_book_it, show_...",789,,,,,,,,,,,,,66.0,,,,,,,,,,,45.0,,,,,,,,,,,,4.0,,,725.0,,,,583.0,114.0,742.0,,,,,775.0,712.0,,,,,,,,,,,,,,,,,,,,,,,317.0,,,,,,7.0,,,,,,,2.0,,,,,,,34.0,,352.0,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,718.0,,,,,,,,,,,,,336.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,15.0,,,,,,,,,,,749.0,3.0,,,,,,,,,,,11.0,61.0,415.0,,,,,,,,,,,,,,,,,405.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,26.0,,,406.0,,,,,,,,,,,,,,98.0,,,,,,,,,,,,,,,,,,,,,,,,401.0,617.0,,,,,,,,,,37.0,,,,,239.0,,25.0,,,,,,,,,,222.0,,,547.0,,,,,,47.0,


time: 347 ms (started: 2021-08-18 18:43:17 +00:00)


### 2.1.b Adding action_info counts features

In [19]:
def get_action_info_count(ai, ais):
    """Number of times `ai` occurs in the list `ais` (0 if absent)."""
    return ais.count(ai)

time: 610 µs (started: 2021-08-18 18:43:17 +00:00)


In [20]:
# collect the count columns first and attach them with a single concat to avoid fragmenting tmp
ai_counts = {f'count_{ai}': tmp.action_info.apply(lambda x: get_action_info_count(ai, x))
             for ai in tqdm(actions_info)}
tmp = pd.concat([tmp, pd.DataFrame(ai_counts, index=tmp.index)], axis=1)


time: 1min 46s (started: 2021-08-18 18:43:17 +00:00)
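The counts can likewise be computed without the per-event loop, directly from the long `sessions` table, with `pd.crosstab`. The sketch uses hypothetical toy rows; `reindex(columns=actions_info, fill_value=0)` restricts the result to the shared vocabulary and gives 0 for events a user never triggered, which matches what `list.count` returns in `get_action_info_count`.

```python
import pandas as pd

# Hypothetical toy rows with the same column names as `sessions`
sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "action_info": ["show_nan_nan", "show_view_p3", "show_nan_nan", "lookup_nan_nan"],
})
actions_info = ["show_nan_nan", "show_view_p3", "lookup_nan_nan", "search_nan_nan"]

counts = (
    pd.crosstab(sessions["user_id"], sessions["action_info"])  # users x events frequency table
    .reindex(columns=actions_info, fill_value=0)  # shared vocabulary only; unseen events -> 0
    .add_prefix("count_")
)
print(counts)
```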


In [21]:
tmp.head()

Unnamed: 0,user_id,action_info,seassion_length,ai_handle_vanity_url_-unknown-_-unknown-,ai_phone_verification_number_submitted_for_call_-unknown-_-unknown-,ai_home_safety_landing_-unknown-_-unknown-,ai_delete_submit_delete_phone_numbers,ai_search_results_click_view_search_results,ai_guarantee_view_host_guarantee,ai_track_page_view_nan_nan,ai_populate_from_facebook_-unknown-_-unknown-,ai_phone_verification_error_-unknown-_-unknown-,ai_facebook_auto_login_-unknown-_-unknown-,ai_payout_delete_-unknown-_-unknown-,ai_phone_verification_number_submitted_for_sms_-unknown-_-unknown-,ai_push_notification_callback_-unknown-_-unknown-,ai_15_message_post_message_post,ai_login_view_login_page,ai_create_submit_create_user,ai_phone_verification_call_taking_too_long_-unknown-_-unknown-,ai_destroy_-unknown-_-unknown-,ai_sync_-unknown-_-unknown-,ai_contact_new_-unknown-_-unknown-,ai_p4_terms_click_p4_terms,ai_clickthrough_-unknown-_-unknown-,ai_hospitality_-unknown-_-unknown-,ai_reviews_data_listing_reviews,ai_tos_confirm_-unknown-_-unknown-,ai_create_multiple_-unknown-_-unknown-,ai_authorize_-unknown-_-unknown-,ai_localized_-unknown-_-unknown-,ai_recommend_-unknown-_-unknown-,ai_create_view_list_your_space,ai_travel_plans_current_view_your_trips,ai_city_count_-unknown-_-unknown-,ai_home_safety_terms_-unknown-_-unknown-,ai_terms_view_terms_and_privacy,ai_payout_preferences_view_account_payout_preferences,ai_payment_instruments_data_payment_instruments,ai_pending_tickets_-unknown-_-unknown-,ai_other_hosting_reviews_-unknown-_-unknown-,ai_confirm_email_click_confirm_email_link,ai_request_new_confirm_email_click_request_new_confirm_email,ai_complete_redirect_-unknown-_-unknown-,ai_collections_view_user_wishlists,ai_identity_-unknown-_-unknown-,ai_create_-unknown-_-unknown-,ai_kba_update_-unknown-_-unknown-,ai_maybe_information_message_post_message_post,ai_delete_submit_delete_listing,ai_supported_-unknown-_-unknown-,ai_photography_update_-unknown-_-unknown-,ai_at_checkpoint_booking_requ
est_at_checkpoint,ai_kba_-unknown-_-unknown-,ai_envoy_bank_details_redirect_-unknown-_-unknown-,ai_forgot_password_submit_forgot_password,ai_available_data_trip_availability,ai_approve_-unknown-_-unknown-,ai_itinerary_view_guest_itinerary,ai_pending_-unknown-_-unknown-,ai_payoneer_account_redirect_-unknown-_-unknown-,ai_update_submit_update_listing,ai_apply_coupon_click_click_apply_coupon_click,ai_signup_weibo_-unknown-_-unknown-,ai_edit_-unknown-_-unknown-,ai_notifications_data_notifications,ai_remove_dashboard_alert_-unknown-_-unknown-,ai_show_data_translations,ai_update_cached_data_admin_templates,ai_listings_-unknown-_-unknown-,ai_life_-unknown-_-unknown-,ai_preapproval_message_post_message_post,ai_requested_view_p5,ai_show_view_view_listing,ai_office_location_-unknown-_-unknown-,ai_ask_question_submit_contact_host,ai_reviews_data_user_reviews,ai_currencies_nan_nan,ai_signup_weibo_referral_-unknown-_-unknown-,ai_why_host_-unknown-_-unknown-,ai_domains_-unknown-_-unknown-,ai_phone_verification_phone_number_removed_-unknown-_-unknown-,ai_social_connections_data_user_social_connections,ai_ajax_refresh_subtotal_click_change_trip_characteristics,ai_recent_reservations_-unknown-_-unknown-,ai_settings_-unknown-_-unknown-,ai_jumio_-unknown-_-unknown-,ai_feed_-unknown-_-unknown-,ai_localization_settings_nan_nan,ai_click_click_book_it,ai_detect_fb_session_-unknown-_-unknown-,ai_email_by_key_-unknown-_-unknown-,ai_review_page_-unknown-_-unknown-,ai_other_hosting_reviews_first_-unknown-_-unknown-,ai_complete_status_-unknown-_-unknown-,ai_read_policy_click_click_read_policy_click,ai_listings_view_user_listings,ai_recommendations_data_listing_recommendations,ai_pay_-unknown-_-unknown-,ai_referrer_status_-unknown-_-unknown-,ai_apply_coupon_error_click_apply_coupon_error,ai_new_-unknown-_-unknown-,ai_guest_booked_elsewhere_message_post_message_post,ai_photography_-unknown-_-unknown-,ai_webcam_upload_-unknown-_-unknown-,ai_personalize_data_wishlist_content_update,ai_signup_login
_view_signup_login_page,ai_campaigns_nan_nan,ai_create_submit_signup,ai_host_summary_view_host_home,ai_decision_tree_-unknown-_-unknown-,ai_privacy_view_account_privacy_settings,ai_southern-europe_-unknown-_-unknown-,ai_update_submit_update_user_profile,ai_issue_-unknown-_-unknown-,ai_acculynk_bin_check_success_-unknown-_-unknown-,ai_social-media_-unknown-_-unknown-,ai_update_nan_nan,ai_envoy_form_-unknown-_-unknown-,ai_similar_listings_v2_nan_nan,ai_ajax_image_upload_-unknown-_-unknown-,ai_transaction_history_paginated_-unknown-_-unknown-,ai_united-states_-unknown-_-unknown-,ai_create_submit_create_listing,ai_authenticate_view_login_page,ai_10_message_post_message_post,ai_friends_view_friends_wishlists,ai_apply_reservation_submit_apply_coupon,ai_reservation_-unknown-_-unknown-,ai_payoneer_signup_complete_-unknown-_-unknown-,ai_sublets_-unknown-_-unknown-,ai_recommendation_page_-unknown-_-unknown-,ai_update_notifications_-unknown-_-unknown-,ai_qt2_view_message_thread,ai_delete_submit_delete_listing_description,ai_friends_new_-unknown-_-unknown-,ai_departments_-unknown-_-unknown-,ai_set_password_view_set_password_page,ai_payment_methods_-unknown-_-unknown-,ai_ajax_price_and_availability_click_alteration_field,ai_show_view_wishlist,ai_ajax_photo_widget_form_iframe_-unknown-_-unknown-,ai_change_password_submit_change_password,ai_rate_-unknown-_-unknown-,ai_airbnb_picks_view_airbnb_picks_wishlists,ai_clear_reservation_-unknown-_-unknown-,ai_listing_view_p3,ai_upload_-unknown-_-unknown-,ai_endpoint_error_-unknown-_-unknown-,ai_ajax_check_dates_click_change_contact_host_dates,ai_create_ach_-unknown-_-unknown-,ai_guest_billing_receipt_-unknown-_-unknown-,ai_image_order_-unknown-_-unknown-,ai_add_note_submit_wishlist_note,ai_toggle_archived_thread_click_toggle_archived_thread,ai_index_data_user_tax_forms,ai_search_-unknown-_-unknown-,ai_ajax_payout_options_by_country_-unknown-_-unknown-,ai_has_profile_pic_-unknown-_-unknown-,ai_index_view_your_listings,ai_weibo_signup_refer
ral_finish_-unknown-_-unknown-,ai_transaction_history_view_account_transaction_history,ai_change_default_payout_-unknown-_-unknown-,ai_ajax_statsd_-unknown-_-unknown-,ai_edit_view_edit_profile,ai_about_us_-unknown-_-unknown-,ai_remove_dashboard_alert_click_remove_dashboard_alert,ai_set_password_submit_set_password,ai_my_view_user_wishlists,ai_references_view_profile_references,ai_ajax_lwlb_contact_click_contact_host,ai_index_view_view_search_results,ai_create_airbnb_-unknown-_-unknown-,ai_phone_verification_number_sucessfully_submitted_-unknown-_-unknown-,ai_print_confirmation_-unknown-_-unknown-,ai_apply_code_-unknown-_-unknown-,ai_request_photography_-unknown-_-unknown-,ai_authenticate_submit_login,ai_my_listings_view_your_reservations,ai_update_hide_from_search_engines_-unknown-_-unknown-,ai_glob_-unknown-_-unknown-,ai_email_itinerary_colorbox_-unknown-_-unknown-,ai_this_hosting_reviews_click_listing_reviews_page,ai_zendesk_login_jwt_-unknown-_-unknown-,ai_click_click_contact_host,ai_index_data_reservations,ai_coupon_code_click_click_coupon_code_click,ai_qt_reply_v2_submit_send_message,ai_acculynk_load_pin_pad_-unknown-_-unknown-,ai_set_default_-unknown-_-unknown-,ai_submit_contact_-unknown-_-unknown-,ai_status_-unknown-_-unknown-,ai_12_message_post_message_post,ai_connect_submit_oauth_login,ai_calendar_tab_inner2_-unknown-_-unknown-,ai_edit_verification_view_profile_verifications,ai_create_submit_create_phone_numbers,ai_show_view_user_profile,ai_agree_terms_check_-unknown-_-unknown-,ai_email_share_submit_email_wishlist,ai_change_availability_submit_change_availability,ai_cancel_submit_guest_cancellation,ai_click_click_complete_booking,ai_popular_view_popular_wishlists,ai_ajax_worth_submit_calculate_worth,ai_message_to_host_change_click_message_to_host_change,ai_update_reservation_requirements_-unknown-_-unknown-,ai_special_offer_message_post_message_post,ai_active_-unknown-_-unknown-,ai_notifications_submit_notifications,ai_search_click_view_search_results,ai_se
t_user_submit_create_listing,ai_update_friends_display_-unknown-_-unknown-,ai_respond_submit_respond_to_alteration_request,ai_badge_-unknown-_-unknown-,ai_slideshow_-unknown-_-unknown-,ai_how_it_works_-unknown-_-unknown-,ai_open_hard_fallback_modal_-unknown-_-unknown-,ai_phone_number_widget_-unknown-_-unknown-,ai_populate_help_dropdown_-unknown-_-unknown-,ai_qt_reply_v2_-unknown-_-unknown-,ai_press_news_-unknown-_-unknown-,ai_show_view_p1,ai_add_guests_-unknown-_-unknown-,ai_terms_and_conditions_-unknown-_-unknown-,ai_become_user_-unknown-_-unknown-,ai_confirm_email_click_confirm_email,ai_dashboard_view_dashboard,ai_acculynk_session_obtained_-unknown-_-unknown-,ai_founders_-unknown-_-unknown-,ai_apply_coupon_click_success_click_apply_coupon_click_success,ai_travel_plans_previous_view_previous_trips,ai_social_-unknown-_-unknown-,ai_signature_-unknown-_-unknown-,ai_reviews_new_-unknown-_-unknown-,ai_recommendations_data_user_friend_recommendations,ai_verify_-unknown-_-unknown-,ai_uptodate_nan_nan,ai_trust_-unknown-_-unknown-,ai_requirements_-unknown-_-unknown-,ai_top_destinations_-unknown-_-unknown-,ai_toggle_starred_thread_click_toggle_starred_thread,ai_update_submit_update_listing_description,ai_pending_booking_request_pending,ai_show_view_alteration_request,ai_receipt_view_guest_receipt,ai_booking_booking_response_booking,ai_toggle_availability_-unknown-_-unknown-,ai_agree_terms_uncheck_-unknown-_-unknown-,ai_change_view_change_or_alter,ai_profile_pic_-unknown-_-unknown-,ai_views_-unknown-_-unknown-,ai_signed_out_modal_nan_nan,ai_click_click_instant_book,ai_redirect_-unknown-_-unknown-,ai_message_to_host_focus_click_message_to_host_focus,ai_ajax_payout_edit_-unknown-_-unknown-,ai_load_more_-unknown-_-unknown-,ai_cancellation_policy_click_click_cancellation_policy_click,ai_show_view_p3,ai_countries_-unknown-_-unknown-,ai_apply_coupon_error_type_-unknown-_-unknown-,ai_nan_message_post_message_post,ai_patch_-unknown-_-unknown-,ai_impressions_view_p4,ai_phone_verificat
ion_nan_nan,ai_faq_-unknown-_-unknown-,ai_11_message_post_message_post,ai_email_wishlist_click_email_wishlist_button,ai_callback_partner_callback_oauth_response,ai_index_view_message_thread,ai_approve_submit_host_respond,ai_jumio_redirect_-unknown-_-unknown-,ai_check_nan_nan,ai_ajax_google_translate_-unknown-_-unknown-,ai_place_worth_view_place_worth,ai_index_-unknown-_-unknown-,ai_reputation_-unknown-_-unknown-,ai_qt_with_data_lookup_message_thread,ai_similar_listings_data_similar_listings,ai_ajax_google_translate_description_-unknown-_-unknown-,ai_update_country_of_residence_-unknown-_-unknown-,ai_notifications_view_account_notification_settings,ai_faq_category_-unknown-_-unknown-,ai_new_view_list_your_space,ai_index_view_user_wishlists,ai_cancellation_policies_view_cancellation_policies,ai_change_currency_-unknown-_-unknown-,ai_new_session_-unknown-_-unknown-,ai_signup_modal_view_signup_modal,ai_forgot_password_click_forgot_password,ai_login_modal_view_login_modal,ai_overview_-unknown-_-unknown-,ai_locations_-unknown-_-unknown-,ai_coupon_field_focus_click_coupon_field_focus,ai_invalid_action_-unknown-_-unknown-,ai_spoken_languages_data_user_languages,ai_manage_listing_view_manage_listing,ai_account_-unknown-_-unknown-,ai_ajax_payout_split_edit_-unknown-_-unknown-,ai_update_-unknown-_-unknown-,ai_header_userpic_data_header_userpic,ai_open_graph_setting_-unknown-_-unknown-,ai_mobile_landing_page_-unknown-_-unknown-,ai_media_resources_-unknown-_-unknown-,ai_p4_refund_policy_terms_click_p4_refund_policy_terms,ai_create_submit_create_alteration_request,ai_tell_a_friend_-unknown-_-unknown-,ai_delete_-unknown-_-unknown-,ai_show_personalize_data_user_profile_content_update,ai_complete_-unknown-_-unknown-,ai_index_nan_nan,ai_ajax_google_translate_reviews_click_translate_listing_reviews,ai_department_-unknown-_-unknown-,ai_acculynk_pin_pad_inactive_-unknown-_-unknown-,ai_languages_multiselect_-unknown-_-unknown-,ai_unavailabilities_data_unavailable_dates,ai_create_paypal_-
unknown-_-unknown-,ai_index_view_message_inbox,ai_jumio_token_-unknown-_-unknown-,ai_show_code_-unknown-_-unknown-,ai_questions_-unknown-_-unknown-,ai_lookup_nan_nan,ai_country_options_-unknown-_-unknown-,ai_requested_submit_post_checkout_action,ai_phone_verification_modal_-unknown-_-unknown-,ai_index_view_listing_descriptions,ai_satisfy_nan_nan,ai_show_nan_nan,ai_track_activity_nan_nan,ai_salute_-unknown-_-unknown-,ai_update_submit_update_user,ai_position_-unknown-_-unknown-,ai_payout_update_-unknown-_-unknown-,ai_click_click_request_to_book,ai_ajax_special_offer_dates_available_click_special_offer_field,ai_phone_verification_success_click_phone_verification_success,ai_show_-unknown-_-unknown-,ai_mobile_oauth_callback_-unknown-_-unknown-,count_handle_vanity_url_-unknown-_-unknown-,count_phone_verification_number_submitted_for_call_-unknown-_-unknown-,count_home_safety_landing_-unknown-_-unknown-,count_delete_submit_delete_phone_numbers,count_search_results_click_view_search_results,count_guarantee_view_host_guarantee,count_track_page_view_nan_nan,count_populate_from_facebook_-unknown-_-unknown-,count_phone_verification_error_-unknown-_-unknown-,count_facebook_auto_login_-unknown-_-unknown-,count_payout_delete_-unknown-_-unknown-,count_phone_verification_number_submitted_for_sms_-unknown-_-unknown-,count_push_notification_callback_-unknown-_-unknown-,count_15_message_post_message_post,count_login_view_login_page,count_create_submit_create_user,count_phone_verification_call_taking_too_long_-unknown-_-unknown-,count_destroy_-unknown-_-unknown-,count_sync_-unknown-_-unknown-,count_contact_new_-unknown-_-unknown-,count_p4_terms_click_p4_terms,count_clickthrough_-unknown-_-unknown-,count_hospitality_-unknown-_-unknown-,count_reviews_data_listing_reviews,count_tos_confirm_-unknown-_-unknown-,count_create_multiple_-unknown-_-unknown-,count_authorize_-unknown-_-unknown-,count_localized_-unknown-_-unknown-,count_recommend_-unknown-_-unknown-,count_create_view_list_your_space
,count_travel_plans_current_view_your_trips,count_city_count_-unknown-_-unknown-,count_home_safety_terms_-unknown-_-unknown-,count_terms_view_terms_and_privacy,count_payout_preferences_view_account_payout_preferences,count_payment_instruments_data_payment_instruments,count_pending_tickets_-unknown-_-unknown-,count_other_hosting_reviews_-unknown-_-unknown-,count_confirm_email_click_confirm_email_link,count_request_new_confirm_email_click_request_new_confirm_email,count_complete_redirect_-unknown-_-unknown-,count_collections_view_user_wishlists,count_identity_-unknown-_-unknown-,count_create_-unknown-_-unknown-,count_kba_update_-unknown-_-unknown-,count_maybe_information_message_post_message_post,count_delete_submit_delete_listing,count_supported_-unknown-_-unknown-,count_photography_update_-unknown-_-unknown-,count_at_checkpoint_booking_request_at_checkpoint,count_kba_-unknown-_-unknown-,count_envoy_bank_details_redirect_-unknown-_-unknown-,count_forgot_password_submit_forgot_password,count_available_data_trip_availability,count_approve_-unknown-_-unknown-,count_itinerary_view_guest_itinerary,count_pending_-unknown-_-unknown-,count_payoneer_account_redirect_-unknown-_-unknown-,count_update_submit_update_listing,count_apply_coupon_click_click_apply_coupon_click,count_signup_weibo_-unknown-_-unknown-,count_edit_-unknown-_-unknown-,count_notifications_data_notifications,count_remove_dashboard_alert_-unknown-_-unknown-,count_show_data_translations,count_update_cached_data_admin_templates,count_listings_-unknown-_-unknown-,count_life_-unknown-_-unknown-,count_preapproval_message_post_message_post,count_requested_view_p5,count_show_view_view_listing,count_office_location_-unknown-_-unknown-,count_ask_question_submit_contact_host,count_reviews_data_user_reviews,count_currencies_nan_nan,count_signup_weibo_referral_-unknown-_-unknown-,count_why_host_-unknown-_-unknown-,count_domains_-unknown-_-unknown-,count_phone_verification_phone_number_removed_-unknown-_-unknown-,count_so
cial_connections_data_user_social_connections,count_ajax_refresh_subtotal_click_change_trip_characteristics,count_recent_reservations_-unknown-_-unknown-,count_settings_-unknown-_-unknown-,count_jumio_-unknown-_-unknown-,count_feed_-unknown-_-unknown-,count_localization_settings_nan_nan,count_click_click_book_it,count_detect_fb_session_-unknown-_-unknown-,count_email_by_key_-unknown-_-unknown-,count_review_page_-unknown-_-unknown-,count_other_hosting_reviews_first_-unknown-_-unknown-,count_complete_status_-unknown-_-unknown-,count_read_policy_click_click_read_policy_click,count_listings_view_user_listings,count_recommendations_data_listing_recommendations,count_pay_-unknown-_-unknown-,count_referrer_status_-unknown-_-unknown-,count_apply_coupon_error_click_apply_coupon_error,count_new_-unknown-_-unknown-,count_guest_booked_elsewhere_message_post_message_post,count_photography_-unknown-_-unknown-,count_webcam_upload_-unknown-_-unknown-,count_personalize_data_wishlist_content_update,count_signup_login_view_signup_login_page,count_campaigns_nan_nan,count_create_submit_signup,count_host_summary_view_host_home,count_decision_tree_-unknown-_-unknown-,count_privacy_view_account_privacy_settings,count_southern-europe_-unknown-_-unknown-,count_update_submit_update_user_profile,count_issue_-unknown-_-unknown-,count_acculynk_bin_check_success_-unknown-_-unknown-,count_social-media_-unknown-_-unknown-,count_update_nan_nan,count_envoy_form_-unknown-_-unknown-,count_similar_listings_v2_nan_nan,count_ajax_image_upload_-unknown-_-unknown-,count_transaction_history_paginated_-unknown-_-unknown-,count_united-states_-unknown-_-unknown-,count_create_submit_create_listing,count_authenticate_view_login_page,count_10_message_post_message_post,count_friends_view_friends_wishlists,count_apply_reservation_submit_apply_coupon,count_reservation_-unknown-_-unknown-,count_payoneer_signup_complete_-unknown-_-unknown-,count_sublets_-unknown-_-unknown-,count_recommendation_page_-unknown-_-unknown-,
count_update_notifications_-unknown-_-unknown-,count_qt2_view_message_thread,count_delete_submit_delete_listing_description,count_friends_new_-unknown-_-unknown-,count_departments_-unknown-_-unknown-,count_set_password_view_set_password_page,count_payment_methods_-unknown-_-unknown-,count_ajax_price_and_availability_click_alteration_field,count_show_view_wishlist,count_ajax_photo_widget_form_iframe_-unknown-_-unknown-,count_change_password_submit_change_password,count_rate_-unknown-_-unknown-,count_airbnb_picks_view_airbnb_picks_wishlists,count_clear_reservation_-unknown-_-unknown-,count_listing_view_p3,count_upload_-unknown-_-unknown-,count_endpoint_error_-unknown-_-unknown-,count_ajax_check_dates_click_change_contact_host_dates,count_create_ach_-unknown-_-unknown-,count_guest_billing_receipt_-unknown-_-unknown-,count_image_order_-unknown-_-unknown-,count_add_note_submit_wishlist_note,count_toggle_archived_thread_click_toggle_archived_thread,count_index_data_user_tax_forms,count_search_-unknown-_-unknown-,count_ajax_payout_options_by_country_-unknown-_-unknown-,count_has_profile_pic_-unknown-_-unknown-,count_index_view_your_listings,count_weibo_signup_referral_finish_-unknown-_-unknown-,count_transaction_history_view_account_transaction_history,count_change_default_payout_-unknown-_-unknown-,count_ajax_statsd_-unknown-_-unknown-,count_edit_view_edit_profile,count_about_us_-unknown-_-unknown-,count_remove_dashboard_alert_click_remove_dashboard_alert,count_set_password_submit_set_password,count_my_view_user_wishlists,count_references_view_profile_references,count_ajax_lwlb_contact_click_contact_host,count_index_view_view_search_results,count_create_airbnb_-unknown-_-unknown-,count_phone_verification_number_sucessfully_submitted_-unknown-_-unknown-,count_print_confirmation_-unknown-_-unknown-,count_apply_code_-unknown-_-unknown-,count_request_photography_-unknown-_-unknown-,count_authenticate_submit_login,count_my_listings_view_your_reservations,count_update_hide_from
_search_engines_-unknown-_-unknown-,count_glob_-unknown-_-unknown-,count_email_itinerary_colorbox_-unknown-_-unknown-,count_this_hosting_reviews_click_listing_reviews_page,count_zendesk_login_jwt_-unknown-_-unknown-,count_click_click_contact_host,count_index_data_reservations,count_coupon_code_click_click_coupon_code_click,count_qt_reply_v2_submit_send_message,count_acculynk_load_pin_pad_-unknown-_-unknown-,count_set_default_-unknown-_-unknown-,count_submit_contact_-unknown-_-unknown-,count_status_-unknown-_-unknown-,count_12_message_post_message_post,count_connect_submit_oauth_login,count_calendar_tab_inner2_-unknown-_-unknown-,count_edit_verification_view_profile_verifications,count_create_submit_create_phone_numbers,count_show_view_user_profile,count_agree_terms_check_-unknown-_-unknown-,count_email_share_submit_email_wishlist,count_change_availability_submit_change_availability,count_cancel_submit_guest_cancellation,count_click_click_complete_booking,count_popular_view_popular_wishlists,count_ajax_worth_submit_calculate_worth,count_message_to_host_change_click_message_to_host_change,count_update_reservation_requirements_-unknown-_-unknown-,count_special_offer_message_post_message_post,count_active_-unknown-_-unknown-,count_notifications_submit_notifications,count_search_click_view_search_results,count_set_user_submit_create_listing,count_update_friends_display_-unknown-_-unknown-,count_respond_submit_respond_to_alteration_request,count_badge_-unknown-_-unknown-,count_slideshow_-unknown-_-unknown-,count_how_it_works_-unknown-_-unknown-,count_open_hard_fallback_modal_-unknown-_-unknown-,count_phone_number_widget_-unknown-_-unknown-,count_populate_help_dropdown_-unknown-_-unknown-,count_qt_reply_v2_-unknown-_-unknown-,count_press_news_-unknown-_-unknown-,count_show_view_p1,count_add_guests_-unknown-_-unknown-,count_terms_and_conditions_-unknown-_-unknown-,count_become_user_-unknown-_-unknown-,count_confirm_email_click_confirm_email,count_dashboard_view_dashboard,co
unt_acculynk_session_obtained_-unknown-_-unknown-,count_founders_-unknown-_-unknown-,count_apply_coupon_click_success_click_apply_coupon_click_success,count_travel_plans_previous_view_previous_trips,count_social_-unknown-_-unknown-,count_signature_-unknown-_-unknown-,count_reviews_new_-unknown-_-unknown-,count_recommendations_data_user_friend_recommendations,count_verify_-unknown-_-unknown-,count_uptodate_nan_nan,count_trust_-unknown-_-unknown-,count_requirements_-unknown-_-unknown-,count_top_destinations_-unknown-_-unknown-,count_toggle_starred_thread_click_toggle_starred_thread,count_update_submit_update_listing_description,count_pending_booking_request_pending,count_show_view_alteration_request,count_receipt_view_guest_receipt,count_booking_booking_response_booking,count_toggle_availability_-unknown-_-unknown-,count_agree_terms_uncheck_-unknown-_-unknown-,count_change_view_change_or_alter,count_profile_pic_-unknown-_-unknown-,count_views_-unknown-_-unknown-,count_signed_out_modal_nan_nan,count_click_click_instant_book,count_redirect_-unknown-_-unknown-,count_message_to_host_focus_click_message_to_host_focus,count_ajax_payout_edit_-unknown-_-unknown-,count_load_more_-unknown-_-unknown-,count_cancellation_policy_click_click_cancellation_policy_click,count_show_view_p3,count_countries_-unknown-_-unknown-,count_apply_coupon_error_type_-unknown-_-unknown-,count_nan_message_post_message_post,count_patch_-unknown-_-unknown-,count_impressions_view_p4,count_phone_verification_nan_nan,count_faq_-unknown-_-unknown-,count_11_message_post_message_post,count_email_wishlist_click_email_wishlist_button,count_callback_partner_callback_oauth_response,count_index_view_message_thread,count_approve_submit_host_respond,count_jumio_redirect_-unknown-_-unknown-,count_check_nan_nan,count_ajax_google_translate_-unknown-_-unknown-,count_place_worth_view_place_worth,count_index_-unknown-_-unknown-,count_reputation_-unknown-_-unknown-,count_qt_with_data_lookup_message_thread,count_similar_li
stings_data_similar_listings,count_ajax_google_translate_description_-unknown-_-unknown-,count_update_country_of_residence_-unknown-_-unknown-,count_notifications_view_account_notification_settings,count_faq_category_-unknown-_-unknown-,count_new_view_list_your_space,count_index_view_user_wishlists,count_cancellation_policies_view_cancellation_policies,count_change_currency_-unknown-_-unknown-,count_new_session_-unknown-_-unknown-,count_signup_modal_view_signup_modal,count_forgot_password_click_forgot_password,count_login_modal_view_login_modal,count_overview_-unknown-_-unknown-,count_locations_-unknown-_-unknown-,count_coupon_field_focus_click_coupon_field_focus,count_invalid_action_-unknown-_-unknown-,count_spoken_languages_data_user_languages,count_manage_listing_view_manage_listing,count_account_-unknown-_-unknown-,count_ajax_payout_split_edit_-unknown-_-unknown-,count_update_-unknown-_-unknown-,count_header_userpic_data_header_userpic,count_open_graph_setting_-unknown-_-unknown-,count_mobile_landing_page_-unknown-_-unknown-,count_media_resources_-unknown-_-unknown-,count_p4_refund_policy_terms_click_p4_refund_policy_terms,count_create_submit_create_alteration_request,count_tell_a_friend_-unknown-_-unknown-,count_delete_-unknown-_-unknown-,count_show_personalize_data_user_profile_content_update,count_complete_-unknown-_-unknown-,count_index_nan_nan,count_ajax_google_translate_reviews_click_translate_listing_reviews,count_department_-unknown-_-unknown-,count_acculynk_pin_pad_inactive_-unknown-_-unknown-,count_languages_multiselect_-unknown-_-unknown-,count_unavailabilities_data_unavailable_dates,count_create_paypal_-unknown-_-unknown-,count_index_view_message_inbox,count_jumio_token_-unknown-_-unknown-,count_show_code_-unknown-_-unknown-,count_questions_-unknown-_-unknown-,count_lookup_nan_nan,count_country_options_-unknown-_-unknown-,count_requested_submit_post_checkout_action,count_phone_verification_modal_-unknown-_-unknown-,count_index_view_listing_descriptio
ns,count_satisfy_nan_nan,count_show_nan_nan,count_track_activity_nan_nan,count_salute_-unknown-_-unknown-,count_update_submit_update_user,count_position_-unknown-_-unknown-,count_payout_update_-unknown-_-unknown-,count_click_click_request_to_book,count_ajax_special_offer_dates_available_click_special_offer_field,count_phone_verification_success_click_phone_verification_success,count_show_-unknown-_-unknown-,count_mobile_oauth_callback_-unknown-_-unknown-
0,00023iyk9l,"[callback_partner_callback_oauth_response, pen...",40,,,,,36.0,,,,,,,,,,,,,,,,,,,,,,,,,,14.0,,,,,,,,40.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,16.0,,,,,,,,,,,15.0,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,9.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,13.0,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,10.0,,,39.0,,,,,,,1.0,,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,21.0,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0
1,0010k6l0om,"[callback_partner_callback_oauth_response, sea...",63,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,28.0,,,,,,,,,,,,,,,,,,,,,,29.0,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18.0,,,,,,,,,,,19.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,1.0,,,,,,,48.0,,,,,,,,,,,,,,,,,,,,,,,,,27.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0,0,0
2,001wyh0pz8,"[create_submit_signup, search_click_view_searc...",90,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,24.0,,,,,,,,,,,,,,,,,,,,,,,,,15.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.0,,,,,,,,,,,4.0,12.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,76.0,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,36.0,,,90.0,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,4,1,66,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0
3,0028jgx1x1,"[create_submit_create_user, show_view_user_pro...",31,,,,,,,,,,,,,,,,1.0,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,16.0,,17.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,1,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,002qnbzfs5,"[campaigns_nan_nan, click_click_book_it, show_...",789,,,,,,,,,,,,,66.0,,,,,,,,,,,45.0,,,,,,,,,,,,4.0,,,725.0,,,,583.0,114.0,742.0,,,,,775.0,712.0,,,,,,,,,,,,,,,,,,,,,,,317.0,,,,,,7.0,,,,,,,2.0,,,,,,,34.0,,352.0,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,718.0,,,,,,,,,,,,,336.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,15.0,,,,,,,,,,,749.0,3.0,,,,,,,,,,,11.0,61.0,415.0,,,,,,,,,,,,,,,,,405.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,26.0,,,406.0,,,,,,,,,,,,,,98.0,,,,,,,,,,,,,,,,,,,,,,,,401.0,617.0,,,,,,,,,,37.0,,,,,239.0,,25.0,,,,,,,,,,222.0,,,547.0,,,,,,47.0,,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,12,0,0,1,0,0,0,2,20,2,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,72,0,0,0,0,0,0,6,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,20,0,0,0,0,0,0,0,0,0,0,1,109,0,0,0,0,0,0,0,0,0,0,29,9,125,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25,0,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,2,0,0,0,0,0,0,0,0,0,42,0,0,0,0,21,0,45,0,0,0,0,0,0,0,0,0,7,0,0,5,0,0,0,0,0,91,0


time: 760 ms (started: 2021-08-18 18:45:03 +00:00)


In [22]:
tmp.drop('action_info', axis=1, inplace=True)

time: 831 ms (started: 2021-08-18 18:45:04 +00:00)


#### Checking the count of missing values in each column

In [23]:
# not_missing = pd.DataFrame(tmp.notna().sum()).reset_index()
# not_missing.columns = ['col', 'counts']
# not_missing['ratio'] = not_missing['counts'].apply(lambda x: round(x / len(users), 4))
# not_missing.shape

time: 483 µs (started: 2021-08-18 18:45:05 +00:00)


In [24]:
# not_missing.head()

time: 2.7 ms (started: 2021-08-18 18:45:05 +00:00)


In [25]:
# threshold = 0.00005
# mask = not_missing.ratio > threshold
# mask.sum()

time: 2.55 ms (started: 2021-08-18 18:45:05 +00:00)


#### Dropping all columns whose non-missing ratio falls below the threshold above
Decided not to do this in this iteration, so all columns are kept.

In [26]:
# keep_columns = not_missing[mask].col.tolist()
# len(keep_columns)

time: 309 µs (started: 2021-08-18 18:45:05 +00:00)


In [27]:
# keep_columns[0], keep_columns[-1]

time: 388 µs (started: 2021-08-18 18:45:05 +00:00)
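For reference, here is a minimal sketch of the non-missing-ratio filter from the commented-out cells above, run on a hypothetical toy frame (the column `ai_rare_event` and the threshold value are made up). One assumption to flag: `tmp.notna().mean()` divides by the number of rows in `tmp` (users that have sessions), whereas the commented cell divides by `len(users)`, which also counts users with no session rows.

```python
import pandas as pd

# Hypothetical wide feature table with sparse columns
tmp = pd.DataFrame({
    'user_id': ['a', 'b', 'c', 'd'],
    'ai_show_view_p3': [2.0, None, None, 5.0],
    'ai_rare_event': [None, None, None, 1.0],
})

# Share of non-missing values per column (denominator: rows of tmp)
not_missing_ratio = tmp.notna().mean()

# Keep only columns that are populated often enough (threshold is illustrative)
threshold = 0.3
keep_columns = not_missing_ratio[not_missing_ratio > threshold].index.tolist()
```

Here `ai_rare_event` is populated for 1 of 4 users (ratio 0.25) and is dropped.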


### 2.1.c Saving features

In [28]:
# features1 = tmp[keep_columns].copy(deep=True)
features1 = tmp.copy(deep=True)
features1.shape

(135483, 674)

time: 248 ms (started: 2021-08-18 18:45:05 +00:00)


### 2.1.1 Normalized count of each action_type

In [29]:
col = 'action_type'
col_values = list(sessions[col].unique())
len(col_values)

11

time: 738 ms (started: 2021-08-18 18:45:05 +00:00)


In [30]:
tmp = sessions[['user_id', col]].groupby('user_id', as_index=False).agg(list)
tmp.shape

(135483, 2)

time: 5 s (started: 2021-08-18 18:45:06 +00:00)


In [31]:
tmp['size'] = tmp[col].apply(len)

time: 117 ms (started: 2021-08-18 18:45:11 +00:00)


In [32]:
tmp['counts'] = tmp[col].apply(lambda x: dict(Counter(x)))

time: 887 ms (started: 2021-08-18 18:45:11 +00:00)


In [33]:
tmp.head()

Unnamed: 0,user_id,action_type,size,counts
0,00023iyk9l,"[partner_callback, booking_request, data, None...",40,"{'partner_callback': 1, 'booking_request': 1, ..."
1,0010k6l0om,"[partner_callback, click, None, view, None, No...",63,"{'partner_callback': 1, 'click': 16, None: 15,..."
2,001wyh0pz8,"[submit, click, click, -unknown-, -unknown-, c...",90,"{'submit': 3, 'click': 66, '-unknown-': 6, 'vi..."
3,0028jgx1x1,"[submit, view, view, data, view, data, view, d...",31,"{'submit': 1, 'view': 15, 'data': 5, '-unknown..."
4,002qnbzfs5,"[None, click, view, data, click, view, data, v...",789,"{None: 77, 'click': 140, 'view': 216, 'data': ..."


time: 16.3 ms (started: 2021-08-18 18:45:12 +00:00)


In [34]:
tmp = pd.concat([tmp, pd.json_normalize(tmp['counts'])], axis=1)

time: 1.34 s (started: 2021-08-18 18:45:12 +00:00)


In [35]:
tmp.drop(['action_type', 'counts'], axis=1, inplace=True)

time: 59.3 ms (started: 2021-08-18 18:45:13 +00:00)


In [36]:
tmp.head()

Unnamed: 0,user_id,size,partner_callback,booking_request,data,NaN,view,click,message_post,-unknown-,submit,modify,booking_response
0,00023iyk9l,40,1.0,1.0,9.0,3.0,21.0,4.0,1.0,,,,
1,0010k6l0om,63,1.0,,9.0,15.0,17.0,16.0,,5.0,,,
2,001wyh0pz8,90,,,2.0,5.0,8.0,66.0,,6.0,3.0,,
3,0028jgx1x1,31,,,5.0,,15.0,9.0,,1.0,1.0,,
4,002qnbzfs5,789,,1.0,140.0,77.0,216.0,140.0,16.0,184.0,15.0,,


time: 37.1 ms (started: 2021-08-18 18:45:13 +00:00)


In [37]:
cols = list(tmp)[2:]
cols = [f'at_{e}' for e in cols]

time: 1.14 ms (started: 2021-08-18 18:45:14 +00:00)


In [38]:
tmp.columns = ['user_id', 'size'] + cols

time: 668 µs (started: 2021-08-18 18:45:14 +00:00)


In [39]:
# Divide each count by the user's total number of actions
tmp[cols] = tmp[cols].div(tmp['size'], axis=0)

time: 27.1 ms (started: 2021-08-18 18:45:14 +00:00)


In [40]:
tmp.head()

Unnamed: 0,user_id,size,at_partner_callback,at_booking_request,at_data,at_None,at_view,at_click,at_message_post,at_-unknown-,at_submit,at_modify,at_booking_response
0,00023iyk9l,40,0.03,0.03,0.23,0.07,0.53,0.1,0.03,,,,
1,0010k6l0om,63,0.02,,0.14,0.24,0.27,0.25,,0.08,,,
2,001wyh0pz8,90,,,0.02,0.06,0.09,0.73,,0.07,0.03,,
3,0028jgx1x1,31,,,0.16,,0.48,0.29,,0.03,0.03,,
4,002qnbzfs5,789,,0.0,0.18,0.1,0.27,0.18,0.02,0.23,0.02,,


time: 38.3 ms (started: 2021-08-18 18:45:14 +00:00)


In [41]:
tmp.drop(['size'], axis=1, inplace=True)

time: 14.8 ms (started: 2021-08-18 18:45:14 +00:00)


In [42]:
tmp.fillna(0, inplace=True)

time: 88.6 ms (started: 2021-08-18 18:45:14 +00:00)


In [43]:
tmp.head()

Unnamed: 0,user_id,at_partner_callback,at_booking_request,at_data,at_None,at_view,at_click,at_message_post,at_-unknown-,at_submit,at_modify,at_booking_response
0,00023iyk9l,0.03,0.03,0.23,0.07,0.53,0.1,0.03,0.0,0.0,0.0,0.0
1,0010k6l0om,0.02,0.0,0.14,0.24,0.27,0.25,0.0,0.08,0.0,0.0,0.0
2,001wyh0pz8,0.0,0.0,0.02,0.06,0.09,0.73,0.0,0.07,0.03,0.0,0.0
3,0028jgx1x1,0.0,0.0,0.16,0.0,0.48,0.29,0.0,0.03,0.03,0.0,0.0
4,002qnbzfs5,0.0,0.0,0.18,0.1,0.27,0.18,0.02,0.23,0.02,0.0,0.0


time: 21.3 ms (started: 2021-08-18 18:45:14 +00:00)


In [44]:
features1a = tmp.copy(deep=True)
features1a.shape

(135483, 12)

time: 16.1 ms (started: 2021-08-18 18:45:14 +00:00)
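The steps in 2.1.1 (collect lists, `Counter`, `json_normalize`, divide by `size`, `fillna(0)`) amount to a row-normalized contingency table. A compact alternative, sketched on hypothetical toy data, is `pd.crosstab(..., normalize='index')`. Because `crosstab` ignores missing values, the sketch first maps `None` to the string `'None'` so that an `at_None` column is still produced, as in the notebook.

```python
import pandas as pd

# Hypothetical toy sessions frame using the notebook's column names
sessions = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'b'],
    'action_type': ['view', 'click', None, 'view', 'view'],
})

# Treat missing action types as their own category (mirrors at_None)
action_type = sessions['action_type'].fillna('None')

# Share of each action type per user; absent combinations become 0
shares = pd.crosstab(sessions['user_id'], action_type, normalize='index')
shares = shares.add_prefix('at_').reset_index()
shares.columns.name = None
```

Each row of `shares` sums to 1, which matches dividing the counts by `size`.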


### 2.2 Generating features based on secs_elapsed and the deltas between consecutive actions

In [45]:
tmp = sessions[['user_id', 'secs_elapsed']].groupby('user_id', as_index=False).agg(list)
tmp.shape

(135483, 2)

time: 5.4 s (started: 2021-08-18 18:45:14 +00:00)


In [46]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed
0,00023iyk9l,"[-1.0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155...."
1,0010k6l0om,"[-1.0, 3.0, 9.0, 22.0, 26.0, 30.0, 34.0, 36.0,..."
2,001wyh0pz8,"[-1.0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, ..."
3,0028jgx1x1,"[-1.0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0,..."
4,002qnbzfs5,"[-1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,..."


time: 16.5 ms (started: 2021-08-18 18:45:19 +00:00)


In [47]:
# Each user's first secs_elapsed value is -1.0 in the output above; replace it with 0
tmp['secs_elapsed'] = tmp['secs_elapsed'].apply(lambda x: [0] + x[1:])

time: 931 ms (started: 2021-08-18 18:45:19 +00:00)


In [48]:
# Differences between consecutive secs_elapsed values in each user's list
tmp['deltas'] = tmp['secs_elapsed'].apply(lambda x: [int(j - i) for i, j in zip(x[:-1], x[1:])])

time: 2.43 s (started: 2021-08-18 18:45:20 +00:00)


In [49]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,..."
1,0010k6l0om,"[0, 3.0, 9.0, 22.0, 26.0, 30.0, 34.0, 36.0, 39...","[3, 6, 13, 4, 4, 4, 2, 3, 6, 1, 3, 4, 30, 8, 1..."
2,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,..."
3,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,..."
4,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."


time: 20 ms (started: 2021-08-18 18:45:23 +00:00)
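The same deltas and their summary statistics can be computed without building per-user Python lists, by working on the flat sessions frame with `groupby`. This is a sketch on hypothetical toy data and assumes rows are already in session order within each user. Two differences from the notebook are worth noting: `ddof=0` is passed so the standard deviation matches `np.std`, and users with a single action get no row here (the notebook returns `None` for them), so they become NaN after a merge.

```python
import pandas as pd

# Hypothetical toy data; rows assumed to be in session order per user
sessions = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'b'],
    'secs_elapsed': [-1.0, 6.0, 45.0, -1.0, 3.0],
})

# Replace each user's first secs_elapsed value with 0, as the notebook does
is_first = sessions.groupby('user_id').cumcount() == 0
secs = sessions['secs_elapsed'].mask(is_first, 0.0)

# Consecutive differences within each user; a user's first row becomes NaN
sessions['delta'] = secs.groupby(sessions['user_id']).diff()

stats = (
    sessions.dropna(subset=['delta'])
    .groupby('user_id')['delta']
    .agg(
        deltas_mean='mean',
        deltas_std=lambda s: s.std(ddof=0),  # population std, like np.std
        deltas_max='max',
        deltas_median='median',
    )
)
```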


In [50]:
def get_statistics(x):
    """Return (mean, std, max, median) of a list, or four Nones if it is empty."""
    if not x:
        return None, None, None, None
    x = np.array(x)
    return x.mean(), x.std(), x.max(), np.median(x)

time: 1.33 ms (started: 2021-08-18 18:45:23 +00:00)


In [51]:
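def get_statistics_no_outliers(x):
    """Statistics after dropping values above mean + 1 std; also returns how many were dropped."""
    if not x:
        return None, None, None, None, None
    x = np.asarray(x)
    # Compute the threshold once; recomputing x.mean() and x.std() per element is O(n^2)
    threshold = x.mean() + x.std()
    kept = x[x <= threshold]
    outliers_count = len(x) - len(kept)
    return kept.mean(), kept.std(), kept.max(), np.median(kept), outliers_count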

time: 2.27 ms (started: 2021-08-18 18:45:23 +00:00)


In [52]:
get_statistics(tmp.iloc[0].deltas)

(14542.692307692309, 69958.77759379552, 437348, 64.0)

time: 13 ms (started: 2021-08-18 18:45:23 +00:00)


In [53]:
get_statistics_no_outliers(tmp.iloc[0].deltas)

(1165.1351351351352, 2758.241156794633, 11029, 58.0, 2)

time: 12.9 ms (started: 2021-08-18 18:45:23 +00:00)


In [54]:
tmp = pd.concat([tmp, tmp.deltas.progress_apply(lambda x: pd.Series(get_statistics(x)))], axis=1)
tmp.shape

(135483, 7)

time: 1min 6s (started: 2021-08-18 18:45:23 +00:00)


In [55]:
tmp.columns = ['user_id', 'secs_elapsed', 'deltas', 'deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median']

time: 916 µs (started: 2021-08-18 18:46:30 +00:00)


In [56]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas,deltas_mean,deltas_std,deltas_max,deltas_median
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,...",14542.69,69958.78,437348.0,64.0
1,0010k6l0om,"[0, 3.0, 9.0, 22.0, 26.0, 30.0, 34.0, 36.0, 39...","[3, 6, 13, 4, 4, 4, 2, 3, 6, 1, 3, 4, 30, 8, 1...",2062.87,6002.06,34874.0,38.5
2,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,...",567.96,3206.52,30047.0,33.0
3,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,...",2821.2,7456.5,37388.0,170.0
4,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...",1799.85,34343.17,946465.0,3.0


time: 39 ms (started: 2021-08-18 18:46:30 +00:00)
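Creating one `pd.Series` per user inside `progress_apply` is a large part of the cost of the cell above. A common alternative, sketched here on hypothetical toy data, is to collect the returned tuples into a list and build the statistics frame in a single constructor call. The helper is copied from the notebook so the block runs on its own.

```python
import numpy as np
import pandas as pd

def get_statistics(x):
    # Same contract as the notebook helper: (mean, std, max, median) or four Nones
    if not x:
        return None, None, None, None
    x = np.array(x)
    return x.mean(), x.std(), x.max(), np.median(x)

# Hypothetical per-user delta lists, including an empty one
tmp = pd.DataFrame({'user_id': ['a', 'b', 'c'], 'deltas': [[6, 39], [3], []]})

# Build all statistic rows at once from a list of tuples
stats = pd.DataFrame(
    tmp['deltas'].map(get_statistics).tolist(),
    columns=['deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median'],
    index=tmp.index,
)
tmp = pd.concat([tmp, stats], axis=1)
```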


In [57]:
tmp = pd.concat([tmp, tmp.deltas.progress_apply(lambda x: pd.Series(get_statistics_no_outliers(x)))], axis=1)
tmp.shape

(135483, 12)

time: 16min 18s (started: 2021-08-18 18:46:30 +00:00)


In [58]:
tmp.columns = [
    'user_id', 'secs_elapsed', 'deltas', 'deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median', 
    'deltas_no_mean', 'deltas_no_std', 'deltas_no_max', 'deltas_no_median', 'deltas_no_num_outliers'
]

time: 889 µs (started: 2021-08-18 19:02:47 +00:00)


In [59]:
tmp.head()

Unnamed: 0,user_id,secs_elapsed,deltas,deltas_mean,deltas_std,deltas_max,deltas_median,deltas_no_mean,deltas_no_std,deltas_no_max,deltas_no_median,deltas_no_num_outliers
0,00023iyk9l,"[0, 0.0, 6.0, 45.0, 81.0, 94.0, 112.0, 155.0, ...","[0, 6, 39, 36, 13, 18, 43, 8, 19, 187, 26, 15,...",14542.69,69958.78,437348.0,64.0,1165.14,2758.24,11029.0,58.0,2.0
1,0010k6l0om,"[0, 3.0, 9.0, 22.0, 26.0, 30.0, 34.0, 36.0, 39...","[3, 6, 13, 4, 4, 4, 2, 3, 6, 1, 3, 4, 30, 8, 1...",2062.87,6002.06,34874.0,38.5,692.14,1599.51,8000.0,27.0,4.0
2,001wyh0pz8,"[0, 35.0, 80.0, 91.0, 108.0, 118.0, 142.0, 201...","[35, 45, 11, 17, 10, 24, 59, 1, 54, 100, 2, 9,...",567.96,3206.52,30047.0,33.0,189.75,501.59,3212.0,33.0,2.0
3,0028jgx1x1,"[0, 3.0, 5.0, 20.0, 28.0, 75.0, 86.0, 91.0, 97...","[3, 2, 15, 8, 47, 11, 5, 6, 2, 15, 13, 19, 76,...",2821.2,7456.5,37388.0,170.0,988.0,1871.1,9313.0,105.0,2.0
4,002qnbzfs5,"[0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....","[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...",1799.85,34343.17,946465.0,3.0,235.77,1650.94,24728.0,3.0,4.0


time: 35.4 ms (started: 2021-08-18 19:02:47 +00:00)


### 2.2.1 Adding stats on seconds elapsed

In [60]:
tmp = pd.concat([tmp, tmp.secs_elapsed.progress_apply(lambda x: pd.Series(get_statistics(x)))], axis=1)
tmp.shape

(135483, 16)

time: 1min 14s (started: 2021-08-18 19:02:47 +00:00)
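
`get_statistics` runs once per user through `progress_apply`. One alternative avoids the Python-level loop: explode the list column, then aggregate with `groupby`. The sketch below shows this on toy data. Pandas' `std` defaults to `ddof=1`, while `np.std` defaults to `ddof=0`. The standard deviation may therefore differ from `get_statistics`, depending on how that helper is written.

```python
import pandas as pd

# Toy frame shaped like tmp: one list of secs_elapsed values per user
toy = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'secs_elapsed': [[0, 3.0, 9.0], [0, 0.0, 1.0, 1.0]],
})

# One row per (user, value); explode leaves an object column, so cast to float
long = toy.explode('secs_elapsed')
long['secs_elapsed'] = long['secs_elapsed'].astype(float)

# One row per user, with prefixed column names produced directly
stats = (
    long.groupby('user_id')['secs_elapsed']
    .agg(['mean', 'std', 'max', 'median'])
    .add_prefix('secs_elapsed_')
    .reset_index()
)
print(stats)
```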


In [61]:
tmp.columns = [
    'user_id', 'secs_elapsed', 'deltas', 'deltas_mean', 'deltas_std', 'deltas_max', 'deltas_median', 
    'deltas_no_mean', 'deltas_no_std', 'deltas_no_max', 'deltas_no_median', 'deltas_no_num_outliers',
    'secs_elapsed_mean', 'secs_elapsed_std', 'secs_elapsed_max', 'secs_elapsed_median',
]

time: 1.41 ms (started: 2021-08-18 19:04:02 +00:00)


In [62]:
tmp.drop(['secs_elapsed', 'deltas'], axis=1, inplace=True)

time: 66.6 ms (started: 2021-08-18 19:04:02 +00:00)


In [63]:
features2 = tmp.copy(deep=True)
features2.shape

(135483, 14)

time: 19.1 ms (started: 2021-08-18 19:04:02 +00:00)


### 2.3 Generating features based on device type info

In [64]:
tmp = sessions[['user_id', 'device_type']].groupby('user_id', as_index=False).agg(set)
tmp.shape

(135483, 2)

time: 4.75 s (started: 2021-08-18 19:04:02 +00:00)


In [65]:
tmp['size'] = tmp.device_type.apply(len)

time: 115 ms (started: 2021-08-18 19:04:07 +00:00)


In [66]:
tmp.drop('device_type', axis=1, inplace=True)

time: 43.8 ms (started: 2021-08-18 19:04:07 +00:00)


In [67]:
tmp.head()

Unnamed: 0,user_id,size
0,00023iyk9l,2
1,0010k6l0om,1
2,001wyh0pz8,1
3,0028jgx1x1,2
4,002qnbzfs5,2


time: 18.2 ms (started: 2021-08-18 19:04:07 +00:00)


In [68]:
tmp.columns = ['user_id', 'device_count']

time: 902 µs (started: 2021-08-18 19:04:07 +00:00)


In [69]:
tmp.head()

Unnamed: 0,user_id,device_count
0,00023iyk9l,2
1,0010k6l0om,1
2,001wyh0pz8,1
3,0028jgx1x1,2
4,002qnbzfs5,2


time: 16.4 ms (started: 2021-08-18 19:04:07 +00:00)


In [70]:
features3 = tmp.copy(deep=True)
features3.shape

(135483, 2)

time: 7.72 ms (started: 2021-08-18 19:04:07 +00:00)
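
The cells above collect a `set` of device types per user, take its `len`, then drop and rename columns. `nunique` can do the same count in a single `groupby`. The toy data and the `sessions_toy` name below are illustrative. Passing `dropna=False` counts a missing `device_type` as its own category. Whether that is desirable depends on how missing values should be treated.

```python
import pandas as pd

# Toy sessions table (illustrative values only)
sessions_toy = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'c', 'c'],
    'device_type': ['Mac Desktop', 'iPhone', 'Mac Desktop',
                    'Windows Desktop', 'iPhone', None],
})

# Distinct device types per user; dropna=False counts a missing value as one type
device_count = (
    sessions_toy.groupby('user_id')['device_type']
    .nunique(dropna=False)
    .rename('device_count')
    .reset_index()
)
print(device_count)
```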


### 3.1 Features based on Users table

In [71]:
users['dow_registered'] = users.date_account_created.dt.weekday
users['day_registered'] = users.date_account_created.dt.day
users['month_registered'] = users.date_account_created.dt.month
users['year_registered'] = users.date_account_created.dt.year

time: 166 ms (started: 2021-08-18 19:04:08 +00:00)


In [72]:
# Hour (0-23) of first activity: taken from timestamp_first_active, not from date_account_created
users['hr_registered'] = users.timestamp_first_active.dt.hour

time: 36.9 ms (started: 2021-08-18 19:04:08 +00:00)
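
The sketch below wraps the two cells above in one function and applies it to the first user shown in the next table. `add_calendar_features` is a hypothetical helper name. The key detail is that `dt.weekday` counts from Monday = 0, so 2010-06-28, a Monday, maps to `dow_registered` 0.

```python
import pandas as pd

def add_calendar_features(df):
    """Hypothetical helper mirroring the two cells above."""
    out = df.copy()
    created = out['date_account_created']
    out['dow_registered'] = created.dt.weekday  # Monday=0 ... Sunday=6
    out['day_registered'] = created.dt.day
    out['month_registered'] = created.dt.month
    out['year_registered'] = created.dt.year
    # The hour comes from first activity, not from account creation
    out['hr_registered'] = out['timestamp_first_active'].dt.hour
    return out

# First user (gxn3p5htnn) from the users table
row = pd.DataFrame({
    'date_account_created': pd.to_datetime(['2010-06-28']),
    'timestamp_first_active': pd.to_datetime(['2009-03-19 04:32:55']),
})
res = add_calendar_features(row)  # dow 0, day 28, month 6, year 2010, hour 4
```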


In [73]:
users.age.max()

2014.0

time: 17.4 ms (started: 2021-08-18 19:04:08 +00:00)


In [74]:
users.head()

Unnamed: 0,id,date_account_created,timestamp_first_active,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,train_flag,dow_registered,day_registered,month_registered,year_registered,hr_registered
0,gxn3p5htnn,2010-06-28,2009-03-19 04:32:55,-unknown-,,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,NDF,1,0,28,6,2010,4
1,820tgsjxq7,2011-05-25,2009-05-23 17:48:09,MALE,38.0,facebook,0,en,seo,google,untracked,Web,Mac Desktop,Chrome,NDF,1,2,25,5,2011,17
2,4ft3gnwmtx,2010-09-28,2009-06-09 23:12:47,FEMALE,56.0,basic,3,en,direct,direct,untracked,Web,Windows Desktop,IE,US,1,1,28,9,2010,23
3,bjjt8pjhuk,2011-12-05,2009-10-31 06:01:29,FEMALE,42.0,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,other,1,0,5,12,2011,6
4,87mebub9p4,2010-09-14,2009-12-08 06:11:05,-unknown-,41.0,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,US,1,1,14,9,2010,6


time: 43.3 ms (started: 2021-08-18 19:04:08 +00:00)


In [75]:
# Some ages look like birth years (e.g. 1980); convert them to ages relative to 2015
mask = (users.age > 1000) & (users.age < 2000)
users.loc[mask, 'age'] = 2015 - users.loc[mask, 'age']
mask.sum()

71

time: 20.3 ms (started: 2021-08-18 19:04:08 +00:00)


In [76]:
# Treat implausible ages (outside 14-105) and missing ages as unknown (-1)
users.loc[(users['age'] > 105) | (users['age'] < 14), 'age'] = -1
users['age'] = users['age'].fillna(-1)

time: 18.2 ms (started: 2021-08-18 19:04:08 +00:00)
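
The two age-cleaning cells above can be combined into one function, shown here on a few edge cases. `clean_age` and its keyword defaults are hypothetical; the thresholds are copied from the cells above. A value such as 2014 is not treated as a birth year because it is not below 2000. It then exceeds 105 and becomes -1, which is also where the `users.age.max()` value of 2014.0 ends up.

```python
import numpy as np
import pandas as pd

def clean_age(age, ref_year=2015, low=14, high=105):
    """Hypothetical helper combining the two age-cleaning cells above."""
    age = age.astype(float)
    # Values strictly between 1000 and 2000 are treated as birth years
    birth_year = (age > 1000) & (age < 2000)
    age[birth_year] = ref_year - age[birth_year]
    # Implausible ages become the sentinel -1; NaN survives the comparison, so fill it too
    age[(age > high) | (age < low)] = -1
    return age.fillna(-1)

ages = pd.Series([1980, 2014, 30, 150, 10, np.nan])
print(clean_age(ages).tolist())  # [35.0, -1.0, 30.0, -1.0, -1.0, -1.0]
```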


In [77]:
# right=True: group i covers bins[i-1] < age <= bins[i]; group 0 holds unknown ages (-1)
bins = [-1, 20, 25, 30, 40, 50, 60, 75, 85, 105]
users['age_group'] = np.digitize(users['age'], bins, right=True)

time: 25.9 ms (started: 2021-08-18 19:04:08 +00:00)
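
With `right=True`, each bin includes its right edge. The `-1` sentinel for unknown ages lands in group 0, and an age of exactly 20 falls in group 1 rather than group 2. This is consistent with the sample below, where age 29 gets group 3 and ages 31 and 38 get group 4. A quick check on hand-picked ages:

```python
import numpy as np

bins = [-1, 20, 25, 30, 40, 50, 60, 75, 85, 105]
ages = np.array([-1, 18, 20, 21, 29, 38, 56, 105])

# right=True: returned index i satisfies bins[i-1] < age <= bins[i]
groups = np.digitize(ages, bins, right=True)
print(groups.tolist())  # [0, 1, 1, 2, 3, 4, 6, 9]
```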


In [78]:
users.sample(5)

Unnamed: 0,id,date_account_created,timestamp_first_active,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,train_flag,dow_registered,day_registered,month_registered,year_registered,hr_registered,age_group
71413,vwb4kjvpo3,2013-04-17,2013-04-17 04:34:54,-unknown-,-1.0,basic,0,en,other,other,linked,Web,Mac Desktop,Safari,NDF,1,2,17,4,2013,4,0
93932,d1ojm9m718,2013-07-25,2013-07-25 16:44:18,FEMALE,29.0,basic,25,en,direct,direct,untracked,iOS,iPhone,-unknown-,other,1,3,25,7,2013,16,3
110353,m9gb3wtmgs,2013-09-23,2013-09-23 20:13:26,-unknown-,-1.0,basic,0,en,direct,direct,untracked,Web,Windows Desktop,Firefox,NDF,1,0,23,9,2013,20,0
116816,vw754ou9v4,2013-10-14,2013-10-14 21:52:11,-unknown-,31.0,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,US,1,0,14,10,2013,21,4
118992,64j2hbzi4j,2013-10-23,2013-10-23 17:54:46,MALE,38.0,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Safari,NDF,1,2,23,10,2013,17,4


time: 51.5 ms (started: 2021-08-18 19:04:08 +00:00)


In [79]:
users.shape

(275547, 22)

time: 3.39 ms (started: 2021-08-18 19:04:08 +00:00)


### 3.1.1 Dropping redundant columns

In [80]:
users.drop(['date_account_created', 'timestamp_first_active'], axis=1, inplace=True)

time: 56.7 ms (started: 2021-08-18 19:04:08 +00:00)


In [81]:
users.rename(columns={'id': 'user_id'}, inplace=True)

time: 3.49 ms (started: 2021-08-18 19:04:08 +00:00)


In [82]:
users.head()

Unnamed: 0,user_id,gender,age,signup_method,signup_flow,language,affiliate_channel,affiliate_provider,first_affiliate_tracked,signup_app,first_device_type,first_browser,country_destination,train_flag,dow_registered,day_registered,month_registered,year_registered,hr_registered,age_group
0,gxn3p5htnn,-unknown-,-1.0,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,NDF,1,0,28,6,2010,4,0
1,820tgsjxq7,MALE,38.0,facebook,0,en,seo,google,untracked,Web,Mac Desktop,Chrome,NDF,1,2,25,5,2011,17,4
2,4ft3gnwmtx,FEMALE,56.0,basic,3,en,direct,direct,untracked,Web,Windows Desktop,IE,US,1,1,28,9,2010,23,6
3,bjjt8pjhuk,FEMALE,42.0,facebook,0,en,direct,direct,untracked,Web,Mac Desktop,Firefox,other,1,0,5,12,2011,6,5
4,87mebub9p4,-unknown-,41.0,basic,0,en,direct,direct,untracked,Web,Mac Desktop,Chrome,US,1,1,14,9,2010,6,5


time: 77.7 ms (started: 2021-08-18 19:04:08 +00:00)


In [83]:
users.shape

(275547, 20)

time: 7.73 ms (started: 2021-08-18 19:04:08 +00:00)


### 4. Assembling all features into one dataset

In [84]:
df = users.merge(features1, on='user_id', how='left')
df.shape

(275547, 693)

time: 3.17 s (started: 2021-08-18 19:04:08 +00:00)


In [85]:
df = df.merge(features1a, on='user_id', how='left')
df.shape

(275547, 704)

time: 943 ms (started: 2021-08-18 19:04:11 +00:00)


In [86]:
df = df.merge(features2, on='user_id', how='left')
df.shape

(275547, 717)

time: 1.11 s (started: 2021-08-18 19:04:12 +00:00)


In [87]:
df = df.merge(features3, on='user_id', how='left')
df.shape

(275547, 718)

time: 966 ms (started: 2021-08-18 19:04:13 +00:00)
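
All four merges keep 275547 rows, which implies that each feature table has at most one row per `user_id`. `features2` and `features3` cover only 135483 users, so the remaining users get NaN in those columns. The sketch below makes the uniqueness assumption explicit with the `validate` argument of `DataFrame.merge`, which raises `pandas.errors.MergeError` if keys are duplicated. `merge_features` and the toy frames are hypothetical.

```python
from functools import reduce

import pandas as pd

users_toy = pd.DataFrame({'user_id': ['a', 'b', 'c'], 'age': [30, 41, -1]})
feat_a = pd.DataFrame({'user_id': ['a', 'b'], 'device_count': [2, 1]})
feat_b = pd.DataFrame({'user_id': ['a'], 'secs_elapsed_mean': [12.5]})

def merge_features(base, feature_frames):
    """Left-join feature tables onto base, requiring one row per user_id on each side."""
    merged = reduce(
        lambda left, right: left.merge(right, on='user_id', how='left',
                                       validate='one_to_one'),
        feature_frames,
        base,
    )
    # With unique keys, a left join must not change the row count
    if len(merged) != len(base):
        raise ValueError('merge changed the number of rows')
    return merged

df_toy = merge_features(users_toy, [feat_a, feat_b])  # users without features get NaN
print(df_toy)
```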


In [88]:
df.to_parquet('../data/processed/features.parquet')

time: 5.25 s (started: 2021-08-18 19:04:14 +00:00)


### 4.1 Splitting into train and test features

In [89]:
train_features = df[df.train_flag == 1]
train_features.shape

(213451, 718)

time: 4.15 s (started: 2021-08-18 19:04:20 +00:00)


In [90]:
train_features.to_parquet('../data/processed/train_features.parquet')

time: 4.7 s (started: 2021-08-18 19:04:24 +00:00)


In [91]:
test_features = df[df.train_flag == 0]
test_features.shape

(62096, 718)

time: 239 ms (started: 2021-08-18 19:04:29 +00:00)


In [92]:
test_features.to_parquet('../data/processed/test_features.parquet')

time: 2.93 s (started: 2021-08-18 19:04:29 +00:00)
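
The two shapes above add up (213451 + 62096 = 275547), so every row carries a `train_flag` of 0 or 1. The sketch below performs the same split and checks that invariant. `split_by_flag` is a hypothetical helper. The `.copy()` calls keep later column assignments on either part from triggering pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

def split_by_flag(df, flag='train_flag'):
    """Hypothetical helper: split on a 0/1 flag and check that no row is lost."""
    train = df[df[flag] == 1].copy()
    test = df[df[flag] == 0].copy()
    if len(train) + len(test) != len(df):
        raise ValueError(f'{flag} contains values other than 0 and 1')
    return train, test

toy = pd.DataFrame({'user_id': ['a', 'b', 'c'], 'train_flag': [1, 1, 0]})
train_toy, test_toy = split_by_flag(toy)
```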
