# Starbucks Capstone Challenge

### Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer. 

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

### Example

To give an example, a user could receive a "buy 10 dollars, get 2 dollars off" discount offer on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars, get 2 dollars off" offer but never open it during the 10-day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.
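The scenario above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the function name `offer_outcome` and its arguments are hypothetical, and "influenced" is reduced to "completed and viewed within the validity window", without checking whether the view came before the purchases.

```python
def offer_outcome(receive_hour, duration_days, difficulty, purchases, view_hours):
    """Classify one offer delivery (hypothetical helper, not part of the data set).

    purchases: list of (hour, amount) tuples.
    view_hours: list of hours at which the offer was opened.
    All times are hours since the start of the test.
    """
    window_end = receive_hour + duration_days * 24
    # Total spend inside the validity window.
    spent = sum(amount for hour, amount in purchases
                if receive_hour <= hour <= window_end)
    completed = spent >= difficulty
    viewed = any(receive_hour <= hour <= window_end for hour in view_hours)
    return {"completed": completed, "viewed": viewed,
            "influenced": completed and viewed}


# The case from the text: 15 dollars spent within 10 days, offer never opened.
result = offer_outcome(receive_hour=0, duration_days=10, difficulty=10,
                       purchases=[(30, 6.0), (100, 9.0)], view_hours=[])
print(result)
```

The completion flag is true while the influence flag is false, which is exactly the kind of record that has to be separated out during cleaning.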

### Cleaning

This makes data cleaning especially important and tricky.

You'll also want to take into account that some demographic groups will make purchases even if they don't receive an offer. From a business perspective, if a customer is going to make a 10 dollar purchase without an offer anyway, you wouldn't want to send a buy 10 dollars get 2 dollars off offer. You'll want to try to assess what a certain demographic group will buy when not receiving any offers.

### Final Advice

Because this is a capstone project, you are free to analyze the data any way you see fit. For example, you could build a machine learning model that predicts how much someone will spend based on demographics and offer type. Or you could build a model that predicts whether or not someone will respond to an offer. Or, you don't need to build a machine learning model at all. You could develop a set of heuristics that determine what offer you should send to each customer (e.g., 75 percent of women customers who were 35 years old responded to offer A vs 40 percent from the same demographic to offer B, so send offer A).

# Data Sets

The data is contained in three files:

* portfolio.json - containing offer ids and meta data about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e., BOGO, discount, or informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings) - channels through which the offer is sent (e.g., web, email, mobile, social)

**profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account, stored as an integer in yyyymmdd form (e.g., 20170212)
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income

**transcript.json**
* event (str) - record description (e.g., transaction, offer received, offer viewed)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict) - either an offer id or a transaction amount, depending on the record
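One detail of the schema is easy to miss: `duration` in portfolio.json is in days, while `time` in transcript.json is in hours. The sketch below uses small made-up frames shaped like the schema (the ids `offer_a` and `p1` are hypothetical) to show the unit conversion needed before the two can be compared.

```python
import pandas as pd

# Hypothetical rows shaped like the schema above (ids shortened for readability).
portfolio_demo = pd.DataFrame({"id": ["offer_a"], "offer_type": ["discount"],
                               "difficulty": [10], "reward": [2], "duration": [7]})
transcript_demo = pd.DataFrame({
    "event": ["offer received", "transaction"],
    "person": ["p1", "p1"],
    "time": [168, 300],
    "value": [{"offer id": "offer_a"}, {"amount": 12.5}],
})

# Pull the offer id out of the value dict for the "offer received" rows.
received = transcript_demo[transcript_demo["event"] == "offer received"].copy()
received["offer_id"] = received["value"].map(lambda v: v.get("offer id"))

# duration is in days and time is in hours, so convert before adding.
received = received.merge(portfolio_demo, left_on="offer_id", right_on="id")
received["expires_at"] = received["time"] + received["duration"] * 24
expires_at = int(received["expires_at"].iloc[0])
print(expires_at)  # 168 + 7 * 24 = 336
```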

**Note:** If you are using the workspace, you will need to go to the terminal and run the command `conda update pandas` before reading in the files. This is because the version of pandas in the workspace cannot read in the transcript.json file correctly, but the newest version of pandas can. You can access the terminal from the orange icon in the top left of this notebook.

You can see how to access the terminal and how the install works using the two images below.  First you need to access the terminal:

<img src="pic1.png"/>

Then you will want to run the above command:

<img src="pic2.png"/>

Finally, when you enter back into the notebook (use the jupyter icon again), you should be able to run the below cell without any errors.

In [1]:
import pandas as pd
import numpy as np
import math
import json
from dateutil import rrule
from datetime import datetime
from time import time
from dateutil.parser import parse
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, fbeta_score, make_scorer
import visuals as vs
from sklearn.naive_bayes import GaussianNB
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import clone
from pandas import Series
%matplotlib inline



In [5]:
# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

### Define functions which will be used later:

In [6]:
# 'offer id' and 'offer_id' hold the same information and need to be merged: if offer_id is missing, fall back to the value of 'offer id'
def combine_offer_id(a, b):
    if pd.isna(b):
        return a
    else:
        return b
    
    
# Check whether an action happened within the offer's validity period (duration is in days, times are in hours)
def action_in_valiperd(receive_time, duration, action_time):
    if action_time >= receive_time and action_time <= receive_time+duration*24:
        return 1
    else:
        return 0
    
    
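# Decide whether the customer's behavior was influenced by the offer
def offer_has_effect(is_viewed, is_used, has_trans, offer_type):
    if is_viewed == 0: 
        return 0   # never viewed: always counts as no influence
    else:
        if has_trans == 0:
            return 0   # viewed but no transaction: no influence (completion without a transaction should not occur)
        else: 
            if is_used == 1:
                return 1 # viewed, transacted, and completed the offer: influenced
            else: 
                if offer_type == 'informational':
                    return 1 # viewed and transacted; informational offers cannot be completed, so count as influenced
                else:
                    return 0 # not informational: a transaction without completion counts as no influence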

# Use trans_effect['in_valiperd'] and trans_effect['is_effect'] to decide whether a transaction was influenced by an offer.
def is_trans_effect(in_valiperd, is_effect):
    if pd.isna(is_effect) == False and pd.isna(in_valiperd) == False:
        return in_valiperd*is_effect
    else:
        return 0

    
# Bucket the customer's income
def income_type(x):
    if pd.isna(x):
        return x
    elif x <= 50000:
        return "0~50000"
    elif x > 50000 and x <= 100000:
        return "50001~100000"
    elif x > 100000 and x <= 150000:
        return "100000~150000"
    elif x > 150000 and x <= 200000:
        return "150000~200000"
    else:
        return ">200000"
    
    
# Bucket the customer's age (118 is treated as missing)
def age_type(x):
    if pd.isna(x) or x == 118:
        return "None"
    elif x < 10:
        return "0~9y"
    elif x >= 10 and x < 20:
        return "10~19y"
    elif x >= 20 and x < 30:
        return "20~29y"
    elif x >= 30 and x < 40:
        return "30~39y"
    elif x >= 40 and x < 50:
        return "40~49y"
    elif x >= 50 and x < 60:
        return "50~59y"
    elif x >= 60 and x < 70:
        return "60~69y"
    elif x >= 70 and x < 80:
        return "70~79y"
    elif x >= 80 and x < 90:
        return "80~89y"
    elif x >= 90 and x < 100:
        return "90~99y"
    else:
        return ">100y"
    
# Return y when the flag x equals 1, otherwise 0
def filter_x(x, y):
    if x == 1:
        return y
    else:
        return 0
    
    
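# Number of days between date x (an int such as 20170421) and today.
# Note: the result depends on the day the notebook is run.
def days_from_x(x):
    if pd.isna(x):
        return x
    else:
        now = datetime.now()
        try: 
            start_date = parse(str(x))
            #datediff = rrule.rrule(freq = rrule.DAILY,dtstart=start_date,until=now)   
            #return datediff.count()   # this is too slow, probably because of count()
            datediff = now - start_date
            return datediff.days
        except (ValueError, OverflowError):
            # dateutil raises ValueError (or a subclass) for unparseable input
            return np.nan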

        
# Bucket a number of days into half-year (180-day) bins (days_type)
def days_type(x):
    if pd.isna(x):
        return x
    else:
        return int(x/180)

        
# Convert date x (an int such as 20170421) to a datetime (deprecated, no longer used)
def x_to_datetime(x):
    if pd.isna(x):
        return x
    else:
        try: 
            start_date = parse(str(x))
            return start_date
        except (ValueError, OverflowError):
            return np.nan
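For larger frames, the chained `elif` bucketing in `age_type` can also be expressed with `pd.cut`. The following is a sketch of that alternative, not a replacement used elsewhere in this notebook. The name `age_type_vectorized` is hypothetical, and it assumes ages are non-negative and that 118 marks a missing age, as `age_type` does.

```python
import numpy as np
import pandas as pd


def age_type_vectorized(age):
    """Bucket a Series of ages into the same decade labels as age_type."""
    edges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.inf]
    labels = ["0~9y", "10~19y", "20~29y", "30~39y", "40~49y", "50~59y",
              "60~69y", "70~79y", "80~89y", "90~99y", ">100y"]
    # right=False makes each bin [low, high), matching the >= / < comparisons.
    binned = pd.cut(age, bins=edges, labels=labels, right=False).astype(object)
    # Missing ages and the placeholder 118 both map to the string "None".
    return binned.where(age.notna() & (age != 118), "None")


ages = pd.Series([9, 10, 55, 99, 100, 118, np.nan])
buckets = age_type_vectorized(ages).tolist()
print(buckets)
```

The same pattern would apply to `income_type`, with the income thresholds as bin edges and `right=True` to match its `<=` comparisons.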

## Data cleaning:
### Clean the data: deal with transcript.value

In [7]:
# Expand the value dicts into separate columns (one per dictionary key)
transcript_value_s = transcript['value'].apply(pd.Series)

# 'offer id' and 'offer_id' hold the same information and need to be merged: if offer_id is missing, fall back to the value of 'offer id'
transcript_value_s['offer_id']=transcript_value_s.apply(lambda transcript_value_s: combine_offer_id(transcript_value_s['offer id'],transcript_value_s['offer_id']),axis=1)

# Drop the now-redundant 'offer id' column
transcript_value = transcript_value_s.drop(['offer id'], axis=1)

#transcript_value.head()

# Append the processed transcript_value columns to the transcript table.
# (join_axes was removed in pandas 1.0; both frames share transcript's index, so a plain concat is equivalent.)
transcript_new = pd.concat([transcript, transcript_value], axis=1)

# check how many types of event:
# - offer received
# - offer viewed
# - transaction
# - offer completed
transcript_new.drop_duplicates(['event'])

Unnamed: 0,event,person,time,value,amount,offer_id,reward
0,offer received,78afa995795e4d85b5d9ceeca43f5fef,0,{'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'},,9b98b8c7a33c4b65b9aebfe6a799e6d9,
12650,offer viewed,389bc3fa690240e798340f5a15918d5c,0,{'offer id': 'f19421c1d4aa40978ebb69ca19b0e20d'},,f19421c1d4aa40978ebb69ca19b0e20d,
12654,transaction,02c083884c7d45b39cc68e1314fec56c,0,{'amount': 0.8300000000000001},0.83,,
12658,offer completed,9fa9ae8f57894cc9a3b8a9bbe0fc1b2f,0,{'offer_id': '2906b810c7d4411798c6938adc9daaa5...,,2906b810c7d4411798c6938adc9daaa5,2.0
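`apply(pd.Series)` builds one Series per row and can be slow on a large transcript. A possible alternative, sketched here on a tiny made-up frame (the ids `offer_a` and `offer_b` are hypothetical), is to build the columns from the list of dicts in one pass and merge the two offer-id spellings with `combine_first`.

```python
import pandas as pd

transcript_demo = pd.DataFrame({
    "event": ["offer received", "transaction", "offer completed"],
    "value": [{"offer id": "offer_a"}, {"amount": 0.83},
              {"offer_id": "offer_b", "reward": 2}],
})

# Build all columns from the list of dicts at once; missing keys become NaN.
flat = pd.DataFrame(transcript_demo["value"].tolist(), index=transcript_demo.index)

# Prefer 'offer_id' and fall back to 'offer id' where it is missing.
flat["offer_id"] = flat["offer_id"].combine_first(flat["offer id"])
flat = flat.drop(columns=["offer id"])

offer_ids = flat["offer_id"].tolist()
print(flat)
```

This assumes both key spellings occur at least once in the data, which the event samples above suggest; otherwise one of the two columns would not exist.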


### Refine portfolio and profile table for later use

In [9]:
# Rename the id column to make later merges easier
portfolio.rename(columns={'id':'offer_id'},inplace=True) 
# Rename the id column to make later merges easier
profile.rename(columns={'id':'person_id'},inplace=True) 

# Bucket customers by income
profile['income_type']=profile.apply(lambda profile: income_type(profile['income']),axis=1)
# Bucket customers by age
profile['age_type']=profile.apply(lambda profile: age_type(profile['age']),axis=1)
# Compute how long each customer has been a member, from the sign-up date
profile['membr_days'] = profile.apply(lambda profile: days_from_x(profile['became_member_on']),axis=1)
# Bucket membership length in half-year (180-day) units, as implemented in days_type
profile['membr_days_type'] = profile['membr_days'].apply(lambda x: days_type(x)) 

profile.head()

Unnamed: 0,age,became_member_on,gender,person_id,income,income_type,age_type,membr_days,membr_days_type
0,118,20170212,,68be06ca386d4c31939f3a4f0e3dd783,,,,1130,6
1,55,20170715,F,0610b486422d4921ae7d2bf64640c50b,112000.0,100000~150000,50~59y,977,5
2,118,20180712,,38fe809add3b4fcf9315a9694bb96ff5,,,,615,3
3,75,20170509,F,78afa995795e4d85b5d9ceeca43f5fef,100000.0,50001~100000,70~79y,1044,5
4,118,20170804,,a03223e636434f42ac4c3df47e8bac43,,,,957,5
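Because `days_from_x` measures against `datetime.now()`, the `membr_days` values shown above change every time the notebook is rerun. A reproducible variant could measure against a fixed snapshot date instead. The sketch below assumes a hypothetical `reference_date`; the real choice would depend on when the data was collected.

```python
import pandas as pd

became_member_on = pd.Series([20170212, 20170715, 20180712])

# Parse the yyyymmdd integers in one vectorized call.
joined = pd.to_datetime(became_member_on.astype(str), format="%Y%m%d")

# Hypothetical snapshot date, chosen only for illustration.
reference_date = pd.Timestamp("2018-07-26")
membr_days = (reference_date - joined).dt.days

# Same half-year bucketing as days_type (180-day units).
membr_half_years = membr_days // 180
print(membr_days.tolist(), membr_half_years.tolist())
```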


# A. Analysis part
## 1. Calculate whether each person was affected by each offer (Multi Reaction)
A person can receive the same offer several times and react differently each time, so this table records each reaction separately.

In [10]:
# 1.1.1 Extract the "offer received" events
offer_received = transcript_new[transcript_new['event'] == 'offer received'].loc[:,['person','offer_id','time']]
offer_received.rename(columns={'time':'receive_time','person':'person_id'},inplace=True) 

# 1.1.2 Join the portfolio table to get each offer's validity period
offer_received = pd.merge(offer_received, portfolio, how='left', on=['offer_id'])


# 1.2.1 Extract the "offer viewed" events
offer_viewed = transcript_new[transcript_new['event'] == 'offer viewed'].loc[:,['person','offer_id','time']]
offer_viewed.rename(columns={'time':'view_time','person':'person_id'},inplace=True) 

# 1.2.2 Left-join the views onto the received offers to see which deliveries were viewed.
#       (A view without a matching receipt should not exist in theory; the left join drops such rows as dirty data.)
offer_received = pd.merge(offer_received, offer_viewed, how='left', on=['person_id','offer_id'])

# 1.2.3 Add a column flagging whether the view fell within the validity period.
#       (Note: one delivery can be paired with several views; each pairing is checked separately here.)
offer_received['is_viewed']=offer_received.apply(lambda offer_received: 
                                                 action_in_valiperd(offer_received['receive_time'], 
                                                                    offer_received['duration'], 
                                                                    offer_received['view_time']),axis=1)

# 1.2.4 For each delivery, record whether it was viewed within the validity period.
#       (Note: a user may view an offer several times; this is collapsed to a single 0/1 flag. Change this if view counts are needed later.)
offer_received = offer_received.groupby(['person_id','offer_id','receive_time','duration'],as_index=False)['is_viewed'].max()



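# 1.3.1 Extract the "offer completed" events (handled the same way as the views)
offer_completed = transcript_new[transcript_new['event'] == 'offer completed'].loc[:,['person','offer_id','time','reward']]
offer_completed.rename(columns={'time':'use_time','person':'person_id'},inplace=True) 

# 1.3.2 Left-join the completions onto the received offers to see which deliveries were completed.
#       (A completion without a matching receipt should not exist in theory; the left join drops such rows as dirty data.)
offer_received = pd.merge(offer_received, offer_completed, how='left', on=['person_id','offer_id'])

# 1.3.3 Add a column flagging whether the completion fell within the validity period.
#       (Note: one delivery can be paired with several completions, not necessarily in the same validity period.)
offer_received['is_used']=offer_received.apply(lambda offer_received: 
                                                 action_in_valiperd(offer_received['receive_time'], 
                                                                    offer_received['duration'], 
                                                                    offer_received['use_time']),axis=1)


# 1.3.4 For each delivery, record whether it was completed within the validity period.
#       (Note: in principle one delivery can be completed only once within its validity period.
#       However, the same offer may be delivered twice with overlapping validity periods, in which case each delivery
#       is paired with 2 completion records, possibly with different reward amounts.
#       The data gives no way to tell which completion belongs to which delivery, so the maximum is taken for all of them.)
offer_received = offer_received.groupby(['person_id','offer_id','receive_time','duration','is_viewed'],as_index=False)['is_used'].max()

#offer_received = offer_received.groupby(['person_id','offer_id','receive_time','duration','is_viewed'],as_index=False)['is_used','reward'].max()
# Note: max() has a pitfall here. It takes the maximum of each column independently rather than the row holding the maximum,
#       so it is only accurate for a single column. With 2 or more completion records, the reward picked this way
#       may not be the reward of the completion that was actually counted.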


# 1.4.1 Extract the "transaction" events (every transaction of every customer)
transaction_log = transcript_new[transcript_new['event'] == 'transaction'].loc[:,['person','time','amount']]
transaction_log.rename(columns={'time':'transaction_time','person':'person_id'},inplace=True) 

# 1.4.2 Left-join the transactions onto the received offers to see whether any transaction happened during the validity period.
#       (Only transactions of people who received an offer are considered here.)
offer_received = pd.merge(offer_received, transaction_log, how='left', on=['person_id'])

# 1.4.3 Add a column flagging whether the transaction fell within the validity period.
#       (Note: one delivery can be paired with several transactions, not necessarily in the same validity period.)
offer_received['has_trans']=offer_received.apply(lambda offer_received: 
                                                 action_in_valiperd(offer_received['receive_time'], 
                                                                    offer_received['duration'], 
                                                                    offer_received['transaction_time']),axis=1)

# 1.4.4 For each delivery, record whether a transaction happened within the validity period.
#       (Note: a transaction does not imply a completion, but a completion always implies a transaction.)
offer_received = offer_received.groupby(['person_id','offer_id','receive_time','duration','is_viewed','is_used'],as_index=False)['has_trans'].max()
offer_received.head()
offer_received.head()

Unnamed: 0,person_id,offer_id,receive_time,duration,is_viewed,is_used,has_trans
0,0009655768c64bdeb2e877511632db8f,2906b810c7d4411798c6938adc9daaa5,576,7,0,1,1
1,0009655768c64bdeb2e877511632db8f,3f207df678b143eea3cee63160fa8bed,336,4,1,0,1
2,0009655768c64bdeb2e877511632db8f,5a8bc65990b245e5a138643cd4eb9837,168,3,1,0,1
3,0009655768c64bdeb2e877511632db8f,f19421c1d4aa40978ebb69ca19b0e20d,408,5,1,1,1
4,0009655768c64bdeb2e877511632db8f,fafdcd668e3743c1bb461111dcafc2a4,504,10,1,1,1
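The join-flag-collapse pattern used in steps 1.2 to 1.4 can be hard to follow on the full data, so here is a minimal sketch of it on made-up rows (the ids `p1` and `offer_a` are hypothetical). It uses column-wise comparisons instead of a row-wise `apply`, which gives the same 0/1 flags because comparisons against a missing `view_time` evaluate to False.

```python
import pandas as pd

# Hypothetical data: the same offer is delivered twice to one person.
received = pd.DataFrame({
    "person_id": ["p1", "p1"],
    "offer_id": ["offer_a", "offer_a"],
    "receive_time": [0, 336],   # hours
    "duration": [5, 5],         # days
})
viewed = pd.DataFrame({
    "person_id": ["p1"],
    "offer_id": ["offer_a"],
    "view_time": [350],         # hours
})

# The left join pairs every delivery with every view of that offer by that person.
pairs = received.merge(viewed, how="left", on=["person_id", "offer_id"])
window_end = pairs["receive_time"] + pairs["duration"] * 24
in_window = (pairs["view_time"] >= pairs["receive_time"]) & (pairs["view_time"] <= window_end)
pairs["is_viewed"] = in_window.astype(int)

# Collapse back to one row per delivery: viewed if any paired view was in the window.
per_delivery = pairs.groupby(
    ["person_id", "offer_id", "receive_time", "duration"], as_index=False
)["is_viewed"].max()
flags = per_delivery["is_viewed"].tolist()
print(per_delivery)
```

Only the second delivery (received at hour 336) is flagged, because the view at hour 350 falls outside the first delivery's 120-hour window.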


In [11]:
# 1.5.1 Join the portfolio table again to get the offer type (needed for the checks below)
offer_received = pd.merge(offer_received, portfolio, how='left', on=['offer_id'])
offer_received = offer_received.drop(['duration_y'], axis=1)
offer_received.rename(columns={'duration_x':'duration'},inplace=True) 

# 1.5.2 Add a column flagging whether the offer influenced the customer
offer_received['is_effect']=offer_received.apply(lambda offer_received: 
                                                 offer_has_effect(offer_received['is_viewed'], 
                                                                    offer_received['is_used'], 
                                                                    offer_received['has_trans'],
                                                                    offer_received['offer_type']),axis=1)

# 1.5.3 Add an is_receive column set to 1 everywhere, since every row is a received offer. This simplifies the later groupby.
offer_received['is_receive'] = 1
offer_received.head()

Unnamed: 0,person_id,offer_id,receive_time,duration,is_viewed,is_used,has_trans,channels,difficulty,offer_type,reward,is_effect,is_receive
0,0009655768c64bdeb2e877511632db8f,2906b810c7d4411798c6938adc9daaa5,576,7,0,1,1,"[web, email, mobile]",10,discount,2,0,1
1,0009655768c64bdeb2e877511632db8f,3f207df678b143eea3cee63160fa8bed,336,4,1,0,1,"[web, email, mobile]",0,informational,0,1,1
2,0009655768c64bdeb2e877511632db8f,5a8bc65990b245e5a138643cd4eb9837,168,3,1,0,1,"[email, mobile, social]",0,informational,0,1,1
3,0009655768c64bdeb2e877511632db8f,f19421c1d4aa40978ebb69ca19b0e20d,408,5,1,1,1,"[web, email, mobile, social]",5,bogo,5,1,1
4,0009655768c64bdeb2e877511632db8f,fafdcd668e3743c1bb461111dcafc2a4,504,10,1,1,1,"[web, email, mobile, social]",10,discount,2,1,1
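The nested rules in `offer_has_effect` reduce to a single boolean expression: viewed AND transacted AND (completed OR informational). The sketch below writes that expression column-wise and enumerates every flag combination to make the rule visible. The function name `offer_has_effect_vectorized` is hypothetical.

```python
import itertools

import pandas as pd


def offer_has_effect_vectorized(df):
    """Column-wise form of the rules: viewed AND transacted AND
    (completed OR informational)."""
    informational = df["offer_type"] == "informational"
    effect = (df["is_viewed"] == 1) & (df["has_trans"] == 1) & (
        (df["is_used"] == 1) | informational)
    return effect.astype(int)


# Enumerate every combination of the three flags for two offer types.
rows = [dict(is_viewed=v, is_used=u, has_trans=t, offer_type=o)
        for v, u, t, o in itertools.product([0, 1], [0, 1], [0, 1],
                                            ["discount", "informational"])]
table = pd.DataFrame(rows)
table["is_effect"] = offer_has_effect_vectorized(table)
print(table[table["is_effect"] == 1])
```

Of the 16 combinations, only 3 count as influenced, which shows how strict the definition is: an unviewed offer never counts, regardless of completion.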


In [None]:
#######------------ The first wide table is complete at this point. ---------------------

In [13]:
# Data check:
# Is there any case where an offer is used but no transaction exists? (Logically this should not happen.)
offer_received.query('is_used == 1 and has_trans == 0')
# Result: no such rows, so the data is consistent on this point.

Unnamed: 0,person_id,offer_id,receive_time,duration,is_viewed,is_used,has_trans,channels,difficulty,offer_type,reward,is_effect,is_receive


## 2. Calculate whether each person was affected by each offer (Summarized Reaction)
A person can receive the same offer several times and react differently each time, so this table summarizes the reactions into an average effect_score. For example, if a person receives the same offer 3 times and is affected 2 times, then effect_score = 2/3.

In [14]:
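# 2.1 Compute each customer's reaction to each offer
# (select multiple columns with a list; tuple-style ['a','b'] selection fails on recent pandas)
user_offer_reaction = offer_received.groupby(['person_id','offer_id'],as_index=False)[['is_effect','is_receive']].sum()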

user_offer_reaction['effect_score'] = user_offer_reaction['is_effect']/user_offer_reaction['is_receive']
user_offer_reaction.head()

Unnamed: 0,person_id,offer_id,is_effect,is_receive,effect_score
0,0009655768c64bdeb2e877511632db8f,2906b810c7d4411798c6938adc9daaa5,0,1,0.0
1,0009655768c64bdeb2e877511632db8f,3f207df678b143eea3cee63160fa8bed,1,1,1.0
2,0009655768c64bdeb2e877511632db8f,5a8bc65990b245e5a138643cd4eb9837,1,1,1.0
3,0009655768c64bdeb2e877511632db8f,f19421c1d4aa40978ebb69ca19b0e20d,1,1,1.0
4,0009655768c64bdeb2e877511632db8f,fafdcd668e3743c1bb461111dcafc2a4,1,1,1.0
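To make the 2/3 example concrete, here is a small sketch on made-up deliveries (ids `p1`, `p2`, and `offer_a` are hypothetical). It uses named aggregation to count deliveries directly, which is an alternative to the constant `is_receive` column and not the approach taken in the cell above.

```python
import pandas as pd

# Hypothetical deliveries: p1 received offer_a three times and was influenced twice.
deliveries = pd.DataFrame({
    "person_id": ["p1", "p1", "p1", "p2"],
    "offer_id": ["offer_a", "offer_a", "offer_a", "offer_a"],
    "is_effect": [1, 0, 1, 0],
})

# Sum the influenced deliveries and count all deliveries per person and offer.
reaction = deliveries.groupby(["person_id", "offer_id"], as_index=False).agg(
    is_effect=("is_effect", "sum"),
    is_receive=("is_effect", "count"),
)
reaction["effect_score"] = reaction["is_effect"] / reaction["is_receive"]
scores = reaction.set_index("person_id")["effect_score"].to_dict()
print(scores)
```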


In [None]:
# -------------- This table is used later by Model 2 -----------------

## 3. Transaction distribution for each person: how many transactions were affected by an offer
Calculate the following metrics for each person:
- total order count
- total order amount
- number of orders placed during an offer period
- number of orders affected by an offer
- affected order amount...

In [15]:
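# 2.1.1 Note: there is no transaction id, so the index is used as a unique id, because the same person can have
#       several transactions with the same time and amount.
transaction_log['trans_id'] = transaction_log.index

# 2.2.1 Left-join all transactions onto the first wide table (each customer's reaction to each received offer; the same offer may be received several times)
trans_effect = pd.merge(transaction_log, offer_received, how='left', on=['person_id'])

# 2.2.2 Add a column flagging whether the transaction fell within an offer's validity period.
#       (Note: one transaction can be paired with several deliveries, and more than one of them may be valid at that time.)
trans_effect['in_valiperd']=trans_effect.apply(lambda trans_effect: 
                                                 action_in_valiperd(trans_effect['receive_time'], 
                                                                    trans_effect['duration'], 
                                                                    trans_effect['transaction_time']),axis=1)

# 2.2.3 Combine in_valiperd and is_effect to decide whether the transaction was influenced by an offer: it counts only when both are 1.
trans_effect['trans_effect'] = trans_effect.apply(lambda trans_effect: 
                                                 is_trans_effect(trans_effect['in_valiperd'], 
                                                                    trans_effect['is_effect']),axis=1)

# 2.2.4 For each transaction, record whether it was influenced by any offer
# (select multiple columns with a list; tuple-style ['a','b'] selection fails on recent pandas)
trans_effect = trans_effect.groupby(['trans_id','person_id','transaction_time','amount'],as_index=False)[['in_valiperd','trans_effect']].max()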

trans_effect.head()


Unnamed: 0,trans_id,person_id,transaction_time,amount,in_valiperd,trans_effect
0,12654,02c083884c7d45b39cc68e1314fec56c,0,0.83,1,0.0
1,12657,9fa9ae8f57894cc9a3b8a9bbe0fc1b2f,0,34.56,1,1.0
2,12659,54890f68699049c2a04d415abc25e717,0,13.23,0,0.0
3,12670,b2f1cd155b864803ad8334cdf13c4bd2,0,19.51,1,1.0
4,12671,fe97aa22dd3e48c8b143116a8403dd52,0,18.97,1,1.0


In [16]:
# 2.3.1 Per customer, prepare the order count and the amount of the orders influenced by an offer
user_summary = trans_effect
user_summary['trans_cnt'] = 1
user_summary['effect_amount'] = user_summary.apply(lambda user_summary: 
                                                 filter_x(user_summary['trans_effect'], 
                                                          user_summary['amount']),axis=1)


# 2.3.2 Aggregate the transactions per person
# (select multiple columns with a list; tuple-style ['a','b'] selection fails on recent pandas)
user_summary = user_summary.groupby(['person_id'],as_index=False)[['trans_cnt','amount','in_valiperd','trans_effect','effect_amount']].sum()


# 2.3.3 Join the customer attributes to analyze which customers are more easily influenced by offers
user_summary = pd.merge(user_summary, profile, how='left', on=['person_id'])
user_summary.head()

Unnamed: 0,person_id,trans_cnt,amount,in_valiperd,trans_effect,effect_amount,age,became_member_on,gender,income,income_type,age_type,membr_days,membr_days_type
0,0009655768c64bdeb2e877511632db8f,8,127.6,8,8.0,127.6,33,20170421,M,72000.0,50001~100000,30~39y,1062,5
1,00116118485d4dfda04fdbaba9a87b5c,3,4.09,0,0.0,0.0,118,20180425,,,,,693,3
2,0011e0d4e6b944f998e987f904e8c1e5,5,79.46,4,4.0,65.97,40,20180109,O,57000.0,50001~100000,40~49y,799,4
3,0020c2b971eb4e9188eac86d93036a77,8,196.86,6,5.0,115.57,59,20160304,F,90000.0,50001~100000,50~59y,1475,8
4,0020ccbbb6d84e358d3414a3ff76cffd,12,154.05,11,11.0,137.78,24,20161111,F,60000.0,50001~100000,20~29y,1223,6
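One way to read this table against the original question (which demographic groups respond to offers) is to aggregate it by a demographic bucket. The sketch below does this on made-up rows that reuse the `user_summary` column names. The `effect_share` column is a hypothetical metric introduced here: the share of a group's spend that came from influenced orders.

```python
import pandas as pd

# Hypothetical per-person rows with the same column names as user_summary.
summary_demo = pd.DataFrame({
    "person_id": ["p1", "p2", "p3", "p4"],
    "age_type": ["30~39y", "30~39y", "50~59y", "50~59y"],
    "amount": [100.0, 50.0, 200.0, 100.0],
    "effect_amount": [80.0, 10.0, 50.0, 25.0],
})

# Total spend and influenced spend per age bucket.
by_age = summary_demo.groupby("age_type", as_index=False)[["amount", "effect_amount"]].sum()

# Share of each group's spend that came from offer-influenced orders.
by_age["effect_share"] = by_age["effect_amount"] / by_age["amount"]
shares = by_age.set_index("age_type")["effect_share"].to_dict()
print(by_age)
```

Summing amounts before dividing weights each group by spend. Averaging per-person ratios instead would weight every customer equally, and the two can rank groups differently.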


In [None]:
user_summary.to_excel('user_summary.xlsx')