# W207, Final Project
Spring, 2018

Team:  Cameron Kennedy, Gaurav Khanna, Aaron Olson

## Data Preparation / Feature Extraction Notebook
Python Notebook 1 of 2

This notebook loads and pre-processes the data.  The other notebook (2 of 2) runs our ML models.

# Introduction
This analysis seeks to predict user churn in a music sharing service.

We will write a more complete description and analysis for submission of our final project.

We prepared the two major data tables (user logs and transactions) independently and then brought them together before analysis.

In [1]:
#Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Load the data. The primary key is `msno` (a string stored as dtype object) that identifies the user; the members and labels tables are indexed by it.

In [2]:
#Load the data
members = pd.read_csv('members_filtered.csv')
transactions = pd.read_csv('transactions_filtered.csv')
user_logs = pd.read_csv('user_logs_filtered.csv')
labels = pd.read_csv('labels_filtered.csv')

#Set indices
members.set_index('msno', inplace = True)
labels.set_index('msno', inplace = True)

#user_logs.head()

In [3]:
user_logs.head()

Unnamed: 0,msno,date,num_25,num_50,num_75,num_985,num_100,num_unq,total_secs
0,MVODUEUlSocm1sXa+zVGpJazPrRFiD4IzEQk0QCdg4U=,20170217,37,2,2,3,30,66,9022.818
1,o3Dg7baW8dXq7Jq7NzlVrWG4mZNVvqp62oWBDO/ybeE=,20160209,36,5,2,3,48,71,13895.453
2,6ERcO7aqAKvrQ2CAvah79dVC7tJVZSjNti1MBfpNVW4=,20151210,26,9,3,0,51,54,13919.805
3,Xt9VAHNtHuST21tkcZSnGKjwv8vF8/COnsf6z28+fKk=,20161025,22,8,4,2,49,75,15147.842
4,zSgTJqoosTiFF7ZZi1DPTHgxLbnd99IgOEsTIDCcZHc=,20160904,26,3,1,0,39,60,10558.829


Getting some info on the useful data

In [4]:
print('Transactions: \n')
transactions.info()
print('User Logs: \n')
user_logs.info()

Transactions: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1353459 entries, 0 to 1353458
Data columns (total 9 columns):
msno                      1353459 non-null object
payment_method_id         1353459 non-null int64
payment_plan_days         1353459 non-null int64
plan_list_price           1353459 non-null int64
actual_amount_paid        1353459 non-null int64
is_auto_renew             1353459 non-null int64
transaction_date          1353459 non-null int64
membership_expire_date    1353459 non-null int64
is_cancel                 1353459 non-null int64
dtypes: int64(8), object(1)
memory usage: 92.9+ MB
User Logs: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19710631 entries, 0 to 19710630
Data columns (total 9 columns):
msno          object
date          int64
num_25        int64
num_50        int64
num_75        int64
num_985       int64
num_100       int64
num_unq       int64
total_secs    float64
dtypes: float64(1), int64(7), object(1)
memory usage: 1.3+ GB


Helper routine to convert the integer dates to datetimes for visualization. The datetime form is not conducive to analysis, though.

In [5]:
def pd_to_date(df_col):
    df_col = pd.to_datetime(df_col, format = '%Y%m%d')
    return df_col

#Convert to date
user_logs['date'] = pd_to_date(user_logs['date'])
#user_logs.head()
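
As a small illustration of the conversion above, the sketch below parses integer dates in the `YYYYMMDD` layout used by the user logs. The sample values are made up for illustration; only `pd.to_datetime` with an explicit `format` comes from the notebook.

```python
import pandas as pd

# Hypothetical sample of integer dates in YYYYMMDD form (not real data)
sample_dates = pd.Series([20170217, 20160209, 20151210])

# An explicit format avoids ambiguous parsing of the 8-digit integers
parsed = pd.to_datetime(sample_dates, format='%Y%m%d')

# Once parsed, subtracting two dates yields a Timedelta with a .days attribute
span_days = (parsed.max() - parsed.min()).days
print(parsed.dt.year.tolist(), span_days)
```

The same subtraction is what later produces the listening tenure in days.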

# User Logs Data: Preparation and Feature Extraction

In [6]:
#Create our groupby user object 
user_logs_gb = user_logs.groupby(['msno'], sort=False)

The list of features 

* User most recent date (max date)
* User first date (min date)
* How long they've been listening:  Min vs. max date by user
* Matrix of all the following (cartesian product)
    * Total X=(seconds, 100, 985, 75, 50, 25, unique), avg per day of X, maybe median per day of X
    * Last day, last 7 days, last 30 days, last 90, 180, 365, total (note last day is relative to user)
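
The statistic-by-window matrix described in the list can be sketched on a toy table. This is only an illustration under assumptions: the rows, the two windows (7 and 30 days), and the names `toy_logs` and `toy_features` are hypothetical, and only `total_secs` is aggregated.

```python
import pandas as pd

# Hypothetical toy log: two users, a few listening days each (not real data)
toy_logs = pd.DataFrame({
    'msno': ['a', 'a', 'a', 'b', 'b'],
    'date': pd.to_datetime(['2017-01-01', '2017-01-20', '2017-01-25',
                            '2017-02-01', '2017-02-03']),
    'total_secs': [100.0, 200.0, 300.0, 50.0, 150.0],
})

# Days between each row and that user's most recent date
user_max = toy_logs.groupby('msno')['date'].transform('max')
toy_logs['days_before_max_date'] = (user_max - toy_logs['date']).dt.days

# One block of (sum, mean) columns per look-back window
blocks = {}
for num_days in [7, 30]:
    recent = toy_logs[toy_logs['days_before_max_date'] < num_days]
    blocks['within_days_%d' % num_days] = (
        recent.groupby('msno')['total_secs'].agg(['sum', 'mean']))

# The dict keys become the top level of the column index
toy_features = pd.concat(blocks, axis=1)
print(toy_features)
```

Because each window is measured from the user's own last day, a user who stopped listening months ago still gets a populated "last 7 days" block.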
 

### Next Cell Is Slow!
It may take roughly 10 to 15 minutes (not measured precisely).

Consider ...
* Breaking it up into pieces.
* Timing it
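
One way to act on these suggestions is to time a small piece in isolation. The sketch below compares a row-by-row `.apply` conversion of timedeltas to integer days with the vectorized `.dt.days` accessor. The data is random and the row count is an arbitrary assumption, so the timings are only indicative.

```python
import time
import numpy as np
import pandas as pd

# Hypothetical stand-in for the date column (random data, not the real logs)
rng = np.random.default_rng(0)
n_rows = 50_000
dates = pd.Series(pd.to_datetime('2015-01-01')
                  + pd.to_timedelta(rng.integers(0, 700, n_rows), unit='D'))
max_date = dates.max()

# Row-by-row conversion with a Python lambda
start = time.perf_counter()
days_apply = (max_date - dates).apply(lambda x: x.days)
apply_secs = time.perf_counter() - start

# Vectorized conversion through the .dt accessor
start = time.perf_counter()
days_dt = (max_date - dates).dt.days
dt_secs = time.perf_counter() - start

print('apply: %.3fs, dt: %.3fs' % (apply_secs, dt_secs))
```

Both variants give the same integers, so the faster one can be swapped in without changing the features.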

In [7]:
#This cell is slow

#Append max date to every row in main table
user_logs['max_date'] = user_logs_gb['date'].transform('max')
user_logs['days_before_max_date'] = (user_logs['max_date'] - user_logs['date']).dt.days
    #.dt.days converts the timedelta to an integer number of days, for easier comparisons later.

#Generate user's first date, last date, and tenure
#Also, the user_logs_features table will be the primary table to return from the user logs data
user_logs_features = (user_logs_gb
    .agg({'date':['max', 'min', lambda x: (max(x) - min(x)).days]})  #.days converts to int
    .rename(columns={'max': 'max_date', 'min': 'min_date','<lambda>':'listening_tenure'})
                      )
#Add a 3rd level, used for joining data later
user_logs_features = pd.concat([user_logs_features], axis=1, keys=['date_features'])

In [8]:
user_logs_features.head()

msno,"(date_features, date, max_date)","(date_features, date, min_date)","(date_features, date, listening_tenure)"
MVODUEUlSocm1sXa+zVGpJazPrRFiD4IzEQk0QCdg4U=,2017-02-27,2015-07-11,597
o3Dg7baW8dXq7Jq7NzlVrWG4mZNVvqp62oWBDO/ybeE=,2017-02-07,2015-03-10,700
6ERcO7aqAKvrQ2CAvah79dVC7tJVZSjNti1MBfpNVW4=,2017-02-17,2015-01-01,778
Xt9VAHNtHuST21tkcZSnGKjwv8vF8/COnsf6z28+fKk=,2017-02-28,2016-09-08,173
zSgTJqoosTiFF7ZZi1DPTHgxLbnd99IgOEsTIDCcZHc=,2017-02-13,2015-01-01,774


In [9]:
#Create Features:
    # Total X=(seconds, 100, 985, 75, 50, 25, unique) and avg per day of X
    # Windows: last 7, 14, 31, 90, 180, 365 days, and 999 (intended as the total); each window is relative to the user's last day
    
for num_days in [7, 14, 31, 90, 180, 365, 999]:
    #Create groupby object for items with x days
    ul_gb_xdays = (user_logs.loc[(user_logs['days_before_max_date'] < num_days)]
                   .groupby(['msno'], sort=False))

    #Generate sum and mean (and count, once) for all the user logs stats
    past_xdays_by_user = (ul_gb_xdays
        .agg({'num_unq':['sum', 'mean', 'count'],
              'total_secs':['sum', 'mean'],
              'num_25':['sum', 'mean'],
              'num_50':['sum', 'mean'],
              'num_75':['sum', 'mean'],
              'num_985':['sum', 'mean'],
              'num_100':['sum', 'mean'],
             })
                      )
    #Append level header
    past_xdays_by_user = pd.concat([past_xdays_by_user], axis=1, keys=['within_days_' + str(num_days)])

    #Join (append) to user_logs_features table
    user_logs_features = user_logs_features.join(past_xdays_by_user, how='inner')

In [10]:
#Next, let's look at changes in last 7 days vs. last 30 days, and last 30 days vs. last 180 days.

#Also, need to think about users with < x days tenure.

In [11]:
#Join members and labels files
features_all = None
features_all = members.join(labels, how='inner')
features_all = features_all.join(user_logs_features, how='inner')

#Note, the warning is okay, and actually helps us by flattening our column headers.

# Test
features_all.head()



msno,city,bd,gender,registered_via,registration_init_time,is_churn,"(date_features, date, max_date)","(date_features, date, min_date)","(date_features, date, listening_tenure)","(within_days_7, num_unq, sum)",...,"(within_days_999, num_25, sum)","(within_days_999, num_25, mean)","(within_days_999, num_50, sum)","(within_days_999, num_50, mean)","(within_days_999, num_75, sum)","(within_days_999, num_75, mean)","(within_days_999, num_985, sum)","(within_days_999, num_985, mean)","(within_days_999, num_100, sum)","(within_days_999, num_100, mean)"
mKfgXQAmVeSKzN4rXW37qz0HbGCuYBspTBM3ONXZudg=,1,0,,13,20170120,0,2017-02-24,2017-01-20,35,1,...,41,6.833333,1,0.166667,1,0.166667,1,0.166667,9,1.5
AFcKYsrudzim8OFa+fL/c9g5gZabAbhaJnoM0qmlJfo=,1,0,,13,20160907,0,2017-02-27,2016-09-07,173,228,...,5289,34.344156,838,5.441558,410,2.662338,323,2.097403,4204,27.298701
qk4mEZUYZq+4sQE7bzRYKc5Pvj+Xc7Wmu25DrCzltEU=,1,0,,13,20160902,0,2017-02-26,2016-09-02,177,243,...,423,3.880734,72,0.66055,58,0.53211,58,0.53211,4308,39.522936
G2UGNLph2J6euGmZ7WIa1+Kc+dPZBJI0HbLPu5YtrZw=,1,0,,13,20161028,0,2017-02-28,2016-10-28,123,121,...,247,3.57971,115,1.666667,74,1.072464,76,1.101449,1272,18.434783
EqSHZpMj5uddJvv2gXcHvuOKFOdS5NN6RalHfzEhhaI=,1,0,,13,20161004,0,2016-10-26,2016-10-04,22,14,...,76,7.6,36,3.6,22,2.2,7,0.7,21,2.1
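
The join above relies on pandas flattening the multi-level columns into tuples as a side effect. As a hedged alternative, the levels could be flattened explicitly before joining. The sketch below uses a hypothetical one-row frame and a `__` separator chosen only for illustration; adopting it would change the column names that later code sees.

```python
import pandas as pd

# Hypothetical three-level columns shaped like those of user_logs_features
columns = pd.MultiIndex.from_tuples([
    ('date_features', 'date', 'max_date'),
    ('within_days_7', 'num_unq', 'sum'),
])
toy = pd.DataFrame([[1, 2]], index=['user_a'], columns=columns)

# Join the levels of each column into a single string label
toy.columns = ['__'.join(levels) for levels in toy.columns]
print(list(toy.columns))
```

String labels are easier to select by name than tuples, at the cost of losing the level structure.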


# Transaction Data: Preparation and Feature Extraction

Grouping by the primary key (MSNO)

In [12]:
# Grouping by the member (msno)
transactions_gb = transactions.sort_values(["transaction_date"]).groupby(['msno'])

# How many groups i.e. members i.e. msno's. We're good if this is the same number as the members table
print('%d Groups/msnos' %(len(transactions_gb.groups)))
print('%d Features' %(len(transactions.columns)))

99825 Groups/msnos
9 Features


The list of features

* Simple features from the latest transaction
    * Plan number of days
    * Plan total amount paid
    * Plan list price
    * is_auto_renew
    * is_cancel
* Synthetic features from the latest transaction
    * Plan actual amount paid per day
* Aggregate values
    * Total number of plan days
    * Total of all the amounts paid for the plan
* Comparing transactions
    * Plan day difference between the latest and previous transaction
    * Amount paid per day difference between the latest and previous transaction
* ...


The aggregate features:

* Total number of plan days
* Total amount paid across all transactions

In [15]:
# Features: Total_plan_days, Total_amount_paid
transactions_features = (transactions_gb
    .agg({'payment_plan_days':'sum', 'actual_amount_paid':'sum' })
    .rename(columns={'payment_plan_days': 'total_plan_days', 'actual_amount_paid': 'total_amount_paid',})
          )
# Index by msno
# transactions_features.set_index('msno', inplace = True)
print('%d Entries in the DF: ' %(len(transactions_features)))
print('%d Features' %(len(transactions_features.columns)))

99825 Entries in the DF: 
2 Features


In [16]:
# Test
transactions_features.head()

msno,total_plan_days,total_amount_paid
+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,543,2831
++5nv+2nsvrWM7dOT+ZiWJ5uTZOzQS0NEvqu3jidTjU=,90,297
++7IULiyKbNc8jllqhRuyKZjX1J4mPF4tsudFCJfv4k=,513,2682
++Ck01c3EF07Ejek2jfXlKut+sEfg+0ry+A5uWeL9vY=,270,891
++FPL1dXZBXC3Cf6gE0HQiIHg1Pd+DBdK7w52xcUmX0=,457,2235


Amount/day for the entire tenure

In [17]:
# Plan actual amount paid/day for all the transactions by a user
# Adding the column amount_paid_per_day
transactions_features['amount_paid_per_day'] = (transactions_features['total_amount_paid']
                                                /transactions_features['total_plan_days'])

print('%d Entries in the DF: ' %(len(transactions_features)))
print('%d Features' %(len(transactions_features.columns)))

99825 Entries in the DF: 
3 Features
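
The division above assumes every user has a non-zero number of plan days, which this notebook does not check. If zeros do occur, the ratio becomes `inf` or `NaN`. A minimal sketch of one possible guard follows, using hypothetical values; whether missing values are the right treatment depends on the models in the second notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical totals; two users have zero plan days (not real data)
toy = pd.DataFrame({'total_amount_paid': [297, 149, 0],
                    'total_plan_days': [90, 0, 0]},
                   index=['u1', 'u2', 'u3'])

# Plain division gives inf for 149/0 and NaN for 0/0
raw = toy['total_amount_paid'] / toy['total_plan_days']

# Treat zero plan days as missing so the ratio is NaN rather than inf
safe = toy['total_amount_paid'] / toy['total_plan_days'].replace(0, np.nan)
print(raw.tolist(), safe.tolist())
```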


In [18]:
# Test
transactions_features.head()

msno,total_plan_days,total_amount_paid,amount_paid_per_day
+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,543,2831,5.213628
++5nv+2nsvrWM7dOT+ZiWJ5uTZOzQS0NEvqu3jidTjU=,90,297,3.3
++7IULiyKbNc8jllqhRuyKZjX1J4mPF4tsudFCJfv4k=,513,2682,5.22807
++Ck01c3EF07Ejek2jfXlKut+sEfg+0ry+A5uWeL9vY=,270,891,3.3
++FPL1dXZBXC3Cf6gE0HQiIHg1Pd+DBdK7w52xcUmX0=,457,2235,4.890591


Latest transaction features:
Because the rows were sorted by transaction date before grouping, the last row of each group is that user's latest transaction.

In [19]:
# Features: latest transaction, renaming the columns
# V1- Fixed the name for plan_list_price column (now called latest_plan_list_price)

# tail(1) keeps the last (most recent) row of each msno group
latest_transaction = transactions_gb.tail(1).rename(columns={
    'payment_method_id': 'latest_payment_method_id',
    'payment_plan_days': 'latest_plan_days',
    'plan_list_price': 'latest_plan_list_price',
    'actual_amount_paid': 'latest_amount_paid',
    'is_auto_renew': 'latest_auto_renew',
    'transaction_date': 'latest_transaction_date',
    'membership_expire_date': 'latest_expire_date',
    'is_cancel': 'latest_is_cancel'})

# Index by msno
latest_transaction.set_index('msno', inplace = True)

print('%d Entries in the DF: ' %(len(latest_transaction)))
print('%d Features' %(len(latest_transaction.columns)))

99825 Entries in the DF: 
8 Features
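
The sort-then-take-last pattern used for the latest transaction can be seen on a toy table. The rows and the name `toy_tx` are hypothetical. Ties between transactions on the same date are not handled specially here.

```python
import pandas as pd

# Hypothetical transactions in arbitrary order (not real data)
toy_tx = pd.DataFrame({
    'msno': ['a', 'b', 'a', 'b', 'a'],
    'transaction_date': [20170101, 20160501, 20160301, 20170201, 20161115],
    'payment_plan_days': [30, 30, 7, 90, 30],
})

# Sort by date first so the last row of each group is the most recent one
latest = (toy_tx.sort_values('transaction_date')
          .groupby('msno')
          .tail(1)
          .set_index('msno'))
print(latest)
```

Without the sort, `tail(1)` would return whichever row happened to be last in the file.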


In [20]:
# Test
latest_transaction.head()


msno,latest_payment_method_id,latest_plan_days,latest_plan_list_price,latest_amount_paid,latest_auto_renew,latest_transaction_date,latest_expire_date,latest_is_cancel
z1Lm/BlRQraiaWJ7RaQWe0+l0Z40ACj7W+zk29FiaS4=,38,30,149,149,0,20150102,20150503,0
IwE/pih8PuqrY/rsnoZ/4TazDliyH9S8VWNc2/d7mJg=,38,30,149,149,0,20150102,20150702,0
ea9rY0uEPY0ImD2QVbYFb+z3zi5wniKWMUM1V8os7OY=,32,410,1788,1788,0,20150104,20170213,0
plhzwjmNJp0HW04NidfVa35JE216RaFYpSeUCwT11zQ=,38,30,149,149,0,20150120,20170103,0
PbSQ2KxR4gRnzjsRd8Up75qMYb70iuMwGk10/jPRljk=,38,360,1200,1200,0,20150123,20170212,0


In [21]:
# Plan actual amount paid/day for the latest transaction
# Adding the column latest_amount_paid_per_day

latest_transaction['latest_amount_paid_per_day'] = (latest_transaction['latest_amount_paid']
                                                /latest_transaction['latest_plan_days'])

print('%d Entries in the DF: ' %(len(latest_transaction)))
print('%d Features' %(len(latest_transaction.columns)))

99825 Entries in the DF: 
9 Features


In [22]:
# Test
latest_transaction.head()

msno,latest_payment_method_id,latest_plan_days,latest_plan_list_price,latest_amount_paid,latest_auto_renew,latest_transaction_date,latest_expire_date,latest_is_cancel,latest_amount_paid_per_day
z1Lm/BlRQraiaWJ7RaQWe0+l0Z40ACj7W+zk29FiaS4=,38,30,149,149,0,20150102,20150503,0,4.966667
IwE/pih8PuqrY/rsnoZ/4TazDliyH9S8VWNc2/d7mJg=,38,30,149,149,0,20150102,20150702,0,4.966667
ea9rY0uEPY0ImD2QVbYFb+z3zi5wniKWMUM1V8os7OY=,32,410,1788,1788,0,20150104,20170213,0,4.360976
plhzwjmNJp0HW04NidfVa35JE216RaFYpSeUCwT11zQ=,38,30,149,149,0,20150120,20170103,0,4.966667
PbSQ2KxR4gRnzjsRd8Up75qMYb70iuMwGk10/jPRljk=,38,360,1200,1200,0,20150123,20170212,0,3.333333


Comparing transactions
* Plan duration difference between the last 2 transactions
* Cost difference between the last 2 transactions

In [24]:
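# TODO Differences among latest and previous transaction

# Getting the 2 latest transactions and grouping by msno again
latest_transaction2_gb = transactions_gb.tail(2).groupby(['msno'])

# Getting the latest but one transaction
# (for a user with a single transaction this is the latest transaction itself)
latest2_transaction = latest_transaction2_gb.head(1)

# Index by msno
latest2_transaction.set_index('msno', inplace = True)

# Amount paid per day for the 2nd latest transaction
# Note: pandas emits a SettingWithCopyWarning here because latest2_transaction
# is a slice of the grouped data; the assignment still takes effect, and
# calling .copy() on the head(1) result would avoid the warning
latest2_transaction['latest2_amount_paid_per_day'] = (latest2_transaction['actual_amount_paid']
                                                /latest2_transaction['payment_plan_days'])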

# Difference in the renewal length among the latest 2 transactions
transactions_features['diff_renewal_duration'] = (latest_transaction['latest_plan_days']
                                                - latest2_transaction['payment_plan_days'])

# Difference in plan cost among the latest 2 transactions
transactions_features['diff_plan_amount_paid_per_day'] = (latest_transaction['latest_amount_paid_per_day'] 
                                                          - latest2_transaction['latest2_amount_paid_per_day'])

print('%d Entries in the DF: ' %(len(transactions_features)))
print('%d Features' %(len(transactions_features.columns)))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


99825 Entries in the DF: 
5 Features
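
An alternative way to compare the last two transactions is `groupby(...).shift`, shown here as a sketch on hypothetical rows rather than as the method used above. One difference is worth noting: for a user with a single transaction, the `tail(2)` / `head(1)` approach appears to compare the transaction with itself and give 0, whereas the `shift` variant gives `NaN`.

```python
import pandas as pd

# Hypothetical transactions; user 'b' has only one (not real data)
toy_tx = pd.DataFrame({
    'msno': ['a', 'a', 'a', 'b'],
    'transaction_date': [20160301, 20161115, 20170101, 20170201],
    'payment_plan_days': [7, 30, 90, 90],
}).sort_values('transaction_date')

# Previous transaction's plan length within each user (NaN for the first one)
toy_tx['prev_plan_days'] = toy_tx.groupby('msno')['payment_plan_days'].shift(1)
toy_tx['diff_renewal_duration'] = (toy_tx['payment_plan_days']
                                   - toy_tx['prev_plan_days'])

# Keep only the difference on each user's latest row
latest_diff = (toy_tx.groupby('msno').tail(1)
               .set_index('msno')['diff_renewal_duration'])
print(latest_diff)
```

Which convention is preferable depends on whether "no previous transaction" should look like "no change" to the model.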


In [25]:
# Test
transactions_features.head()

msno,total_plan_days,total_amount_paid,amount_paid_per_day,diff_renewal_duration,diff_plan_amount_paid_per_day
+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,543,2831,5.213628,0,0.0
++5nv+2nsvrWM7dOT+ZiWJ5uTZOzQS0NEvqu3jidTjU=,90,297,3.3,0,0.0
++7IULiyKbNc8jllqhRuyKZjX1J4mPF4tsudFCJfv4k=,513,2682,5.22807,0,0.0
++Ck01c3EF07Ejek2jfXlKut+sEfg+0ry+A5uWeL9vY=,270,891,3.3,0,0.0
++FPL1dXZBXC3Cf6gE0HQiIHg1Pd+DBdK7w52xcUmX0=,457,2235,4.890591,23,4.966667


Getting all the transaction features in a single DF

In [26]:
# Get all transaction features in a single DF
transactions_features = transactions_features.join(latest_transaction, how = 'inner')

# Test
print('%d Entries in the DF: ' %(len(transactions_features)))
print('%d Features' %(len(transactions_features.columns)))
transactions_features.head()

99825 Entries in the DF: 
14 Features


msno,total_plan_days,total_amount_paid,amount_paid_per_day,diff_renewal_duration,diff_plan_amount_paid_per_day,latest_payment_method_id,latest_plan_days,latest_plan_list_price,latest_amount_paid,latest_auto_renew,latest_transaction_date,latest_expire_date,latest_is_cancel,latest_amount_paid_per_day
+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,543,2831,5.213628,0,0.0,39,30,149,149,1,20170131,20170319,0,4.966667
++5nv+2nsvrWM7dOT+ZiWJ5uTZOzQS0NEvqu3jidTjU=,90,297,3.3,0,0.0,41,30,99,99,1,20170201,20170301,0,3.3
++7IULiyKbNc8jllqhRuyKZjX1J4mPF4tsudFCJfv4k=,513,2682,5.22807,0,0.0,37,30,149,149,1,20170201,20170301,0,4.966667
++Ck01c3EF07Ejek2jfXlKut+sEfg+0ry+A5uWeL9vY=,270,891,3.3,0,0.0,41,30,99,99,1,20170214,20170314,0,3.3
++FPL1dXZBXC3Cf6gE0HQiIHg1Pd+DBdK7w52xcUmX0=,457,2235,4.890591,23,4.966667,41,30,149,149,1,20160225,20160225,1,4.966667


# Combining All the Features into a Single DataFrame and File

Members and labels were already joined with the user logs features above.

The code below joins the transaction features into the primary features DataFrame.

In [27]:
# Joining feature DF's
features_all = features_all.join(transactions_features, how='inner')

In [28]:
# Test
print('%d Entries in the DF: ' %(len(features_all)))
print('%d Features' %(len(features_all.columns)))
features_all.head()

88544 Entries in the DF: 
128 Features


msno,city,bd,gender,registered_via,registration_init_time,is_churn,"(date_features, date, max_date)","(date_features, date, min_date)","(date_features, date, listening_tenure)","(within_days_7, num_unq, sum)",...,diff_plan_amount_paid_per_day,latest_payment_method_id,latest_plan_days,latest_plan_list_price,latest_amount_paid,latest_auto_renew,latest_transaction_date,latest_expire_date,latest_is_cancel,latest_amount_paid_per_day
mKfgXQAmVeSKzN4rXW37qz0HbGCuYBspTBM3ONXZudg=,1,0,,13,20170120,0,2017-02-24,2017-01-20,35,1,...,0.0,30,30,129,129,1,20170220,20170319,0,4.3
AFcKYsrudzim8OFa+fL/c9g5gZabAbhaJnoM0qmlJfo=,1,0,,13,20160907,0,2017-02-27,2016-09-07,173,228,...,0.0,30,30,129,129,1,20170207,20170306,0,4.3
qk4mEZUYZq+4sQE7bzRYKc5Pvj+Xc7Wmu25DrCzltEU=,1,0,,13,20160902,0,2017-02-26,2016-09-02,177,243,...,0.0,30,30,129,129,1,20170202,20170301,0,4.3
G2UGNLph2J6euGmZ7WIa1+Kc+dPZBJI0HbLPu5YtrZw=,1,0,,13,20161028,0,2017-02-28,2016-10-28,123,121,...,0.0,30,30,149,149,1,20170228,20170327,0,4.966667
EqSHZpMj5uddJvv2gXcHvuOKFOdS5NN6RalHfzEhhaI=,1,0,,13,20161004,0,2016-10-26,2016-10-04,22,14,...,0.0,30,30,129,129,1,20170204,20170303,0,4.3
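
The joined frame has 88544 rows while the transaction features have 99825, so the inner joins discard users that are missing from at least one table. A hedged way to audit this is an outer merge with `indicator=True`, sketched here on hypothetical frames named `left` and `right`.

```python
import pandas as pd

# Hypothetical feature tables keyed by msno (not real data)
left = pd.DataFrame({'is_churn': [0, 1, 0]},
                    index=pd.Index(['a', 'b', 'c'], name='msno'))
right = pd.DataFrame({'total_plan_days': [30, 90]},
                     index=pd.Index(['a', 'd'], name='msno'))

# An outer merge keeps every key; _merge records where each key was found
audit = left.merge(right, how='outer', left_index=True, right_index=True,
                   indicator=True)
counts = audit['_merge'].value_counts()

# Only keys marked 'both' survive an inner join
inner = left.join(right, how='inner')
print(counts)
print(len(inner))
```

The `left_only` and `right_only` counts show how many users each inner join would drop and from which side.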


In [30]:
#Write all features to pkl
features_all.to_pickle('features_all.pkl')

#Writing the features to a .pkl file allows us to use the 2nd ipynb file
#without having to run all the code above
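
For reference, a pickle written with `to_pickle` can be read back with `pd.read_pickle`. The sketch below round-trips a hypothetical frame through a temporary directory so that it does not touch `features_all.pkl`.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for the features table (not real data)
toy = pd.DataFrame({'is_churn': [0, 1]},
                   index=pd.Index(['a', 'b'], name='msno'))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_features.pkl')
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# The index, dtypes, and values survive the round trip
same = restored.equals(toy)
print(same)
```

Unlike CSV, the pickle preserves dtypes and the index, which is convenient for the datetime columns created earlier.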