## Dataset: model_dataset

<img src="./images/model_dataset.png"/>

Partial explanation of the columns:  
1. label_group (object): 4 groups of response to offers
    - 'none_offer'
    - 'no_care'
    - 'tried'
    - 'effective_offer'
2. label_seg (int): 12 segments based on age and income
    - values: 1 ... 12  <br>  
  
(More details in <u>2_heuristic_exploration.ipynb</u>)

###  <u>10 Kinds</u> of offer_id
| offer_id #| type | duration | requirement | reward |
|:-| :-| :-:|:-:|:-:|
| 0 | bogo | 7 | 10 | 10 |
| 1 | bogo | 5 | 10 | 10 |
| 2 | informational | 4 | - | - |
| 3 | bogo | 7 | 5 | 5 |
| 4 | discount | 10 | 20 | 5 |
| 5 | discount | 7 | 7 | 3 |
| 6 | discount | 10 | 10 | 2 |
| 7 | informational | 3 | - | - |
| 8 | bogo | 5 | 5 | 5 |
| 9 | discount | 7 | 10 | 2 |

### <u>12 Segments</u> based on 'age' and 'income'
<br>
    
|Segment #| Age Group (edges inclusive)<br> (experiment in 2018) | Income | 
|---| --- | --- | 
|1| Millennials(-21 & 22-37) | low  | 
|2| Millennials(-21 & 22-37) | medium  | 
|3| Millennials(-21 & 22-37) | high  | 
|4| Gen X(38-53) | low  |
|5| Gen X(38-53) | medium |
|6| Gen X(38-53) | high |
|7| Baby Boomer(54-72) | low  |
|8| Baby Boomer(54-72) | medium |
|9| Baby Boomer(54-72) | high |
|10| Silent(73-90 & 91+) |low |
|11| Silent(73-90 & 91+) | medium |
|12| Silent(73-90 & 91+) | high |

**Notice:**  
- low: 30,000-50,000
- medium: 50,001-82,500
- high: 82,501-120,000
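
The segment table and income bands above can be read as a lookup rule. The following is a minimal sketch of that rule, assuming the segment number is `3 * age_group + income_band + 1`; the function name `assign_segment` and the edge lists are illustrative, and the actual assignment in `2_heuristic_exploration.ipynb` may be implemented differently. Incomes outside 30,000-120,000 are not validated here.

```python
from bisect import bisect_left

# Inclusive upper edges of the first three age groups; ages above 72 fall in "Silent".
AGE_UPPER_EDGES = [37, 53, 72]
# Inclusive upper edges of the low and medium income bands; anything above is "high".
INCOME_UPPER_EDGES = [50_000, 82_500]


def assign_segment(age, income):
    """Return the segment number (1..12) for one person (hypothetical helper)."""
    age_idx = bisect_left(AGE_UPPER_EDGES, age)           # 0..3
    income_idx = bisect_left(INCOME_UPPER_EDGES, income)  # 0..2
    return 3 * age_idx + income_idx + 1


print(assign_segment(25, 40_000))  # Millennial with low income
print(assign_segment(60, 90_000))  # Baby Boomer with high income
```

`bisect_left` places a value equal to an edge in the lower group, which matches the "edges inclusive" convention of the table.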

### <u>4 Groups</u> of possible responsiveness to offer
<br>

|Group| received | viewed | valid completed | transaction amount | Scenario |
| :-| :-: | :-:| :-: | :-: | :- |
|1.none_offer| 0 | 0 | 0 | | has not received the offer |
|2.no_care | 1 | 0 | - | | received but not viewed;<br> regarded as no_care |
|| 1 | 1 | 0 | =0.0 | received and viewed, but no transaction |
|| 1 | 1 | 1<br>viewed after completed |  | received, but completed unintentionally |
|3.tried| 1 | 1 | 0 | >0.0| received and viewed, with transaction(s) |
|4.effctive_offer | 1 | 1 | 1<br>viewed before completed | | viewed before completed: effective offer |
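
The grouping table can be expressed as a small decision function. This is only a sketch of the rules as stated in the table: the function name and argument names are hypothetical, and the returned strings follow the group names in the column list at the top of this notebook.

```python
def classify_response(received, viewed, completed,
                      viewed_before_completed=False, amount=0.0):
    """Map one person-offer record to one of the four response groups."""
    if not received:
        return "none_offer"
    if not viewed:
        return "no_care"  # received but never viewed
    if completed:
        # A completion only counts when the offer was viewed first.
        return "effective_offer" if viewed_before_completed else "no_care"
    # Viewed but not completed: any spending counts as an attempt.
    return "tried" if amount > 0.0 else "no_care"


print(classify_response(1, 1, 0, amount=12.5))
print(classify_response(1, 1, 1, viewed_before_completed=True))
```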

# <a class="anchor" id="Start">Table of Contents</a>

I. [Feature Engineering](#1)<br>
II. [Build Model Pipeline](#2)<br>
III. [Explore Interesting Questions](#3)

    - Q3.1 An offer is about to be sent to a person: will it be effective?
    - Q3.2 An offer has already been sent to a person: is it effective?
    - Q3.3 Given a person, which offer is most likely to be effective?
IV. [Build Neural Network for Regression](#4)<br>
[References](#References)

In [1]:
import pandas as pd
import numpy as np
import math
import json

from time import time
from datetime import date
from collections import defaultdict

import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib inline

model_dataset_raw = pd.read_csv('./data_generated/model_dataset_raw.csv', dtype={'offer_id': str})

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures

from sklearn.metrics import mean_squared_error
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier


from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

In [None]:
from sklearn.pipeline import Pipeline
import pickle

In [None]:
from sklearn.multioutput import MultiOutputClassifier

## <a class="anchor" id="1">[I. Feature Engineering](#Start)</a>

### 1. Add features
- Total transaction amount per person: `'amount_total'`
- Number of offers received per person: `'offer_received_cnt'`

In [None]:
# Load in transactions dataset

# wrangled transcript with updated information of offer
transcript_offer = pd.read_csv('./data_generated/wrangled_transcript_offer.csv', dtype={'person': int})
# recover to original dataset: index is the same
transcript_offer.index = transcript_offer.iloc[:, 0].values
del transcript_offer['Unnamed: 0']

In [None]:
# select the column before aggregating so that non-numeric columns are never summed
transcript_amount = transcript_offer.groupby('person')['amount'].sum()
offer_received_cnt = model_dataset_raw.groupby('person')['offer_id'].count()
persons = transcript_amount.index.tolist()

for person in persons:
    is_person = (model_dataset_raw.person == person)
    model_dataset_raw.loc[is_person,'amount_total'] = transcript_amount.loc[person]
    model_dataset_raw.loc[is_person,'offer_received_cnt'] = offer_received_cnt.loc[person]
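
The per-person loop above scans the whole dataset once for every person. A vectorized alternative is sketched below with `Series.map`; the two small frames are made-up stand-ins for `transcript_offer` and `model_dataset_raw`, not the real data.

```python
import pandas as pd

# Toy stand-ins for transcript_offer and model_dataset_raw (values are made up).
transcript = pd.DataFrame({"person": [1, 1, 2], "amount": [5.0, 7.5, 3.0]})
dataset = pd.DataFrame({"person": [1, 1, 2, 3], "offer_id": ["0", "4", "2", "7"]})

# Per-person aggregates, indexed by person.
amount_total = transcript.groupby("person")["amount"].sum()
received_cnt = dataset.groupby("person")["offer_id"].count()

# Series.map looks each person up in the index of the aggregate;
# persons without a match (person 3 has no transactions) get NaN.
dataset["amount_total"] = dataset["person"].map(amount_total)
dataset["offer_received_cnt"] = dataset["person"].map(received_cnt)

print(dataset)
```

One difference from the loop: here `offer_received_cnt` is filled for every person, while the loop only fills it for persons that appear in the transcript.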

In [None]:
model_dataset = model_dataset_raw.copy()

In [None]:
model_dataset.groupby('label_group').count()

**FOUND:**
1. The 5 persons in group `none_offer` are dropped so that no NaNs remain in the target columns of `model_dataset`.

In [None]:
is_dataset_keep = (model_dataset.label_group != 'none_offer')
model_dataset = model_dataset[is_dataset_keep]

### 2. One-hot encoding of categorical columns
- gender
- label_group
- offer_id

In [None]:
gender_onehot = pd.get_dummies(model_dataset['gender'], prefix='gender')
label_group_onehot = pd.get_dummies(model_dataset['label_group'], prefix='group')
offer_id_onehot =  pd.get_dummies(model_dataset['offer_id'], prefix='offer')

In [None]:
model_dataset = pd.concat([model_dataset, gender_onehot, label_group_onehot, offer_id_onehot], axis=1)
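
As a small illustration of the encoding step, the sketch below one-hot encodes a made-up `gender` column. Passing `dtype=int` is an addition not present in the cell above: recent pandas versions return boolean columns by default, and integer 0/1 columns are sometimes easier to inspect.

```python
import pandas as pd

toy = pd.DataFrame({"gender": ["F", "M", "O", "M"]})  # made-up rows

# dtype=int forces 0/1 integer columns instead of booleans.
gender_onehot = pd.get_dummies(toy["gender"], prefix="gender", dtype=int)
toy = pd.concat([toy, gender_onehot], axis=1)

print(toy.columns.tolist())
```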

### 3. Features of time
1. Time features
    - 'time_received'
    - 'time_viewed'
    - 'time_transaction'
    - 'time_completed'
2. Transform the time_transaction to transaction_cnt
3. Fill the NaNs with 0

In [None]:
model_dataset[(model_dataset.time_transaction.isin(['-1']))].offer_id.unique()  # the '-1' label only occurs for offer_id '2' and '7'

In [None]:
def transform_transaction_cnt(dataset):
    # count transactions as the number of commas in the string
    # ('-1' and NaN contain no comma and therefore give 0)
    dataset['time_transaction'] = dataset['time_transaction'].apply(lambda x: len(str(x).split(','))-1)
    
    # informational offers ('2', '7') flagged as effective are counted as one transaction
    is_group_info = (dataset.offer_id.isin(['2', '7']) & (dataset.label_effective_offer==1))
    dataset.loc[is_group_info, 'time_transaction'] = 1
    
    return dataset

model_dataset = transform_transaction_cnt(model_dataset)
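
To make the counting expression in `transform_transaction_cnt` concrete, here is a minimal sketch of how it behaves on single values. The example strings assume that each transaction time is followed by a comma; that storage format is an assumption, and a string without a trailing comma would be counted one lower.

```python
def count_transactions(raw):
    """Mirror len(str(x).split(',')) - 1, i.e. count the commas in the value."""
    return len(str(raw).split(",")) - 1


for value in ["-1", "132,", "132,168,", float("nan")]:
    print(repr(value), count_transactions(value))
```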

In [None]:
model_dataset.rename(columns={'time_transaction': 'transaction_cnt'}, inplace=True)

# drop columns that are not needed for modeling
model_dataset.drop(['label_effective_offer'], axis=1, inplace=True)

values = {'time_viewed': 0.0, 'time_completed': 0.0} #time_viewed: 49860 non-null, time_completed: 40407 non-null
model_dataset.fillna(value=values, inplace=True)

model_dataset.info()

## <a class="anchor" id="2">[II. Build model Pipeline](#Start)</a>

In [None]:
# keep a copy so that this section can be restarted from here
model_dataset_test = model_dataset.copy()

In [None]:
model_dataset = model_dataset_test.copy()

### 1. Select features and target 
[References[1]](https://github.com/syuenloh/UdacityDataScientistCapstone/blob/master/Starbucks%20Capstone%20Challenge%20-%20Using%20Starbucks%20app%20user%20data%20to%20predict%20effective%20offers.ipynb)

In [None]:
# Target: label_group
model_dataset['label_group'] = model_dataset['label_group'].replace(['no_care','tried', 'effctive_offer'],['0','1','1'])
model_dataset = model_dataset.astype({'label_group': int})

model_dataset.groupby('label_group').count()  
# 31613	 VS 34888: The distribution of the targets seems balanced
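
Since the two classes are roughly balanced, a useful reference point for the classifiers below is the accuracy of always predicting the larger class. The sketch assumes the two counts in the comment above are listed in label order (0, then 1).

```python
# Counts taken from the comment above, assuming label order 0 then 1.
counts = {0: 31613, 1: 34888}

total = sum(counts.values())
baseline_accuracy = max(counts.values()) / total

print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

A model is only informative if its test accuracy is clearly above this baseline.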

In [None]:
def select_features_target(df, target_cols, keep_cols):
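    '''
    INPUT:
    - df(DataFrame): dataset including all candidate features and the target
    - target_cols: 
        one column name(str) or several column names(list of str)
    - keep_cols(list): list of column names to keep as features
    
    OUTPUT:
    - features(DataFrame): the columns of df listed in keep_cols
    - target(Series or DataFrame): the column(s) of df named by target_cols
    '''
    # df['col'] returns a Series, df[['col', ...]] returns a DataFrame
    target = df[target_cols]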
    
    drop_cols = np.setdiff1d(df.columns, keep_cols)
    features = df.drop(drop_cols, axis=1)
    
    return features, target

### 2. Prepare model pipeline
[References[1]](https://github.com/syuenloh/UdacityDataScientistCapstone/blob/master/Starbucks%20Capstone%20Challenge%20-%20Using%20Starbucks%20app%20user%20data%20to%20predict%20effective%20offers.ipynb)

In [None]:
def select_clf(pickle_path, clf_ls, features, target, test_size=0.20, random_state=9):
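    '''
    Fit each classifier in clf_ls inside a StandardScaler pipeline and compare them.

    OUTPUT:
    - visual_results(DataFrame): one row per classifier with the columns
      'model', 'train_time', 'test_time', 'train_score', 'test_score'
    '''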
    # split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(features, target, 
                                                        test_size=test_size, 
                                                        random_state=random_state)
    
    results = defaultdict()
    visual_results = pd.DataFrame(columns=['model', 'train_time', 'test_time',
                                         'train_score', 'test_score'])
    #models = defaultdict()
    #report_ls = []
    
    
    for classifier in clf_ls:
        pipe = Pipeline(steps=[('preprocessor', StandardScaler()),
                               ('clf', classifier)])
                           
        start_train = time()
        model = pipe.fit(X_train, y_train)
        end_train = time()
        results['train_time'] = end_train-start_train
        
        # predict in train set
        pred_train = model.predict(X_train)
        
        # predict in test set and Calculate the time
        start_test = time()
        pred_test = model.predict(X_test)
        end_test = time()
        results['test_time'] = end_test-start_test
    
        # add training accuracy to results
        # (Pipeline.score delegates to the classifier's score, i.e. mean accuracy)
        results['train_score']=model.score(X_train,y_train)
    
        # add testing accuracy to results
        results['test_score']=model.score(X_test,y_test)
        
        
        print("{} trained on {} samples.".format(classifier.__class__.__name__, len(y_train)))
        print("Train time: {}s".format(results['train_time']))
        print("Test time: {}s".format(results['test_time']))
        print("MSE_train: %.4f" % mean_squared_error(y_train,pred_train))
        print("MSE_test: %.4f" % mean_squared_error(y_test,pred_test))
        print("Training accuracy: %.4f" % results['train_score'])
        print("Test accuracy: %.4f" % results['test_score'])
        
        # output the report
        report = classification_report(y_test, pred_test,digits=4) #output_dict=True
        print(report)
                # df_report = pd.DataFrame(report).transpose()
                # report_ls.append(df_report)
        
        # a Series built from scalar values needs an index
        new_model = pd.Series([classifier.__class__.__name__, results['train_time'],
                            results['test_time'], results['train_score'], results['test_score']],
                           index=visual_results.columns)
        # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
        visual_results = pd.concat([visual_results, new_model.to_frame().T], ignore_index=True)
        
        #models[classifier.__class__.__name__] = model
        # the file is overwritten on every iteration, so only the last model is kept
        with open(pickle_path, "wb") as f:  
                pickle.dump(model, f)
        
    return visual_results #,report_ls
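
Because `select_clf` rewrites the pickle file inside the loop, only the last fitted pipeline survives. If all models should be kept, one option is to collect them in a dict and dump it once. The sketch below shows the idea with plain dicts standing in for fitted pipelines; `save_models` and `load_models` are hypothetical helpers, not part of this notebook.

```python
import pickle
import tempfile
from pathlib import Path


def save_models(models, path):
    """Pickle a dict of {model name: fitted object} into a single file."""
    with open(path, "wb") as f:
        pickle.dump(models, f)


def load_models(path):
    """Load the dict written by save_models."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-ins for fitted pipelines; any picklable object is handled the same way.
fitted = {"DecisionTreeClassifier": {"depth": 3}, "AdaBoostClassifier": {"rounds": 50}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models_demo.pckl"
    save_models(fitted, path)
    restored = load_models(path)

print(sorted(restored))
```

With this layout, a single model would be retrieved by its class name before accessing its steps.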

In [None]:
def model_select_param(classifier, param_grid, features, target, test_size=0.20, random_state=9):
    # split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(features, target, 
                                                        test_size=test_size, 
                                                        random_state=random_state)
    
    pipe = Pipeline(steps=[('preprocessor', StandardScaler()),
                        ('clf', classifier)])
    CV = GridSearchCV(pipe, param_grid, n_jobs= 1)
    
    results = defaultdict()
    
    start = time()
    CV.fit(X_train, y_train) 
    end = time()
    
    # Attribute: best_estimator_  best_params_  best_score_
    results['model'] = CV
    results['train_time'] = end - start
    
    # predict in train set
    pred_train = CV.predict(X_train)

    # predict in test set and Calculate the time
    start_test = time()
    pred_test = CV.predict(X_test)
    end_test = time()
    results['test_time'] = end_test-start_test

    # add training accuracy to results
    # (without a scoring argument, GridSearchCV.score uses the best estimator's score, i.e. mean accuracy)
    results['train_score']=CV.score(X_train,y_train)

    # add testing accuracy to results
    results['test_score']=CV.score(X_test,y_test)

    print("{} trained on {} samples.".format(CV.best_estimator_, len(y_train)))
    print("MSE_train: %.4f" % mean_squared_error(y_train, pred_train))
    print("MSE_test: %.4f" % mean_squared_error(y_test, pred_test))
    print("Training accuracy: %.4f" % results['train_score'])
    print("Test accuracy: %.4f" % results['test_score'])
    print(classification_report(y_test, pred_test,digits=4))

    return results
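
`model_select_param` wraps the classifier in a pipeline step named `'clf'`, so the keys of `param_grid` have to address that step with scikit-learn's `<step>__<parameter>` naming. A minimal sketch of building such a grid follows; `prefix_params` is a hypothetical helper and the candidate values are illustrative, not tuned for this dataset.

```python
def prefix_params(step_name, params):
    """Prefix each hyper-parameter name with '<step_name>__' for a Pipeline grid."""
    return {f"{step_name}__{name}": values for name, values in params.items()}


# Illustrative candidates for a GradientBoostingClassifier in the 'clf' step.
param_grid = prefix_params("clf", {"n_estimators": [100, 200], "max_depth": [3, 5]})

print(param_grid)
```

The resulting dict could then be passed as the `param_grid` argument of `model_select_param`.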

## <a class="anchor" id="3">[III. Explore Interesting Questions](#Start)</a>

### Q3.1 An offer is about to be sent to a person: will it be effective?

1. Dataset<br>
Rows whose label_group is one of:
    - no_care
    - tried
    - effective_offer

2. Target   

| Target | Value | Meaning |
| :- | :-: | :- |
| label_group | 0 | the person does not care about the offer |
|       | 1 | within the offer's duration, the person tried or completed the transactions |

3. Features

| Features (column count, default 1) | Reason for selection |
| :- | :- |
| age | basic info about person |
| income | basic info about person |
| member_days | basic info about person |
| (3)gender_ | basic info about person<br>(3 binary 0/1 columns) |
| (10)offer_ | info about offer<br>(10 binary 0/1 columns) |
| amount_total | total amount paid over all transactions |
| offer_received_cnt | number of offers received |
| time_received | time at which this offer was received |


In [None]:
target_cols = 'label_group'

keep_cols = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
            'offer_0', 'offer_1', 'offer_2', 'offer_3', 'offer_4', 'offer_5',
              'offer_6', 'offer_7', 'offer_8', 'offer_9',
             'amount_total', 'offer_received_cnt','time_received']
           
features, target = select_features_target(model_dataset, target_cols, keep_cols)

classifiers = [
    KNeighborsClassifier(3),
    #SVC(kernel="rbf", C=0.025, probability=True),
    #NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]

# test for ideal with group infos
pickle_path = './models_effct_1.pckl'
results_effct_1 = select_clf(pickle_path, classifiers, features, target, test_size=0.20, random_state=9)

In [None]:
results_effct_1

### Q3.2 An offer has already been sent to a person: is it effective?

1. Dataset<br>
Rows whose label_group is one of:
    - no_care
    - tried
    - effective_offer


2. Target   

| Target | Value | Meaning |
| :- | :-: | :- |
| label_group | 0 | the person does not care about the offer |
|       | 1 | within the offer's duration, the person tried or completed the transactions |

3. Features

| Features (column count, default 1) | Reason for selection |
| :- | :- |
| age | basic info about person |
| income | basic info about person |
| member_days | basic info about person |
| (3)gender_ | basic info about person<br>(3 binary 0/1 columns) |
| (10)offer_ | info about offer<br>(10 binary 0/1 columns) |
| amount_with_offer | amount paid in transactions for this offer |
| amount_total | total amount paid over all transactions |
| offer_received_cnt | number of offers received |
| time_received | time at which this offer was received |
| time_viewed | time at which this offer was viewed;<br>0.0 if it was never viewed |


In [None]:
# 'group_effctive_offer', 'group_no_care', 'group_tried', 'transaction_cnt' and 'time_completed' carry direct information about the target classes, so they are not used as features
target_cols = 'label_group'

keep_cols = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
            'offer_0', 'offer_1', 'offer_2', 'offer_3', 'offer_4', 'offer_5',
              'offer_6', 'offer_7', 'offer_8', 'offer_9',
             'amount_with_offer', 'amount_total', 'offer_received_cnt',
            'time_received', 'time_viewed']
           
features, target = select_features_target(model_dataset, target_cols, keep_cols)

classifiers = [
    KNeighborsClassifier(3),
    #SVC(kernel="rbf", C=0.025, probability=True),
    #NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]

# test for ideal with group infos
pickle_path = './models_rec_1.pckl'
results_rec_1 = select_clf(pickle_path, classifiers, features, target, test_size=0.20, random_state=9)

In [None]:
results_rec_1

In [None]:
#load model: feature_importances_
with open(r"./models_rec_1.pckl", "rb") as f:
    models = pickle.load(f)


In [None]:
models

In [None]:
columns = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
            'offer_0', 'offer_1', 'offer_2', 'offer_3', 'offer_4', 'offer_5',
              'offer_6', 'offer_7', 'offer_8', 'offer_9',
             'amount_with_offer', 'amount_total', 'offer_received_cnt',
            'time_received', 'time_viewed']

In [None]:
ss = pd.DataFrame(models['clf'].feature_importances_, index=columns)

In [None]:
ss.T

In [None]:
feature_importances = pd.DataFrame(models['clf'].feature_importances_,
                                   index = columns,
                                    columns=['importance']).sort_values('importance',ascending=False)
feature_importances.plot.bar()
plt.xticks(rotation=80)

In [None]:
models['clf'].feature_importances_

### Q3.3 Given a person, recommend the most effective offer.

Setting: only the person's basic information and aggregated statistics are available; no specific offer is targeted.

1. Dataset<br>
Rows whose label_group is one of:
    - tried (transactions exist in this group)
    - effective_offer


2. Target

| Target | Value | Meaning |
| :- | :-: | :- |
| offer_(10 classes) | 0 | this offer_id is not effective |
|              | 1 | this offer_id is effective |

3. Features

| Features (column count, default 1) | Reason for selection |
| :- | :- |
| age | basic info about person |
| income | basic info about person |
| member_days | basic info about person |
| (3)gender_ | basic info about person<br>(3 binary 0/1 columns) |
| amount_with_offer | amount paid in transactions for this offer |
| amount_total | total amount paid over all transactions |
| offer_received_cnt | number of offers received |
| time_received | time at which this offer was received |
| time_viewed | time at which this offer was viewed;<br>0.0 if it was never viewed |

In [None]:
is_group_effective = (model_dataset.label_group==1)  #tried & effctive_offer
model_dataset_input = model_dataset[is_group_effective]

In [None]:
model_dataset_input.groupby('offer_id').count()  # the number of samples per offer seems far from sufficient

In [None]:
# 'group_effctive_offer', 'group_no_care', 'group_tried', 'transaction_cnt' and 'time_completed' carry direct information about the target classes, so they are not used as features
target_cols = ['offer_0', 'offer_1', 'offer_2', 'offer_3', 'offer_4', 'offer_5',
              'offer_6', 'offer_7', 'offer_8', 'offer_9']

#keep_cols = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
 #            'amount_total', 'offer_received_cnt']

keep_cols = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
             'amount_with_offer', 'amount_total', 'offer_received_cnt',
            'time_received', 'time_viewed']

features, target = select_features_target(model_dataset_input,target_cols, keep_cols)

classifiers = [
    KNeighborsClassifier(3),
    #SVC(kernel="rbf", C=0.025, probability=True),
    #NuSVC(probability=True),
    #DecisionTreeClassifier(),
    #RandomForestClassifier(),
    #AdaBoostClassifier(),  
    MultiOutputClassifier(GradientBoostingClassifier())  # fits one classifier per target column
    ]

# test for ideal with group infos
pickle_path = './models_multiclass_test.pckl'
results_multiclass_test = select_clf(pickle_path, classifiers, features, target, test_size=0.20, random_state=9)

In [None]:
with open(r"./models_multiclass_test.pckl", "rb") as f:
    models_test = pickle.load(f)

In [None]:
test = features.iloc[-10:]

In [None]:
test

In [None]:
# probabilities per target: column 0 is class 0, column 1 is class 1
# (a list of 10 arrays, one per offer label, each of shape n_records x 2)
models_test.predict_proba(test)

In [None]:
model_dataset_input[target_cols].sum()
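
To turn the per-offer probabilities into a recommendation, one simple approach is to pick, for each person, the offer with the highest probability of class 1. The sketch below assumes the list-of-arrays layout that `MultiOutputClassifier.predict_proba` returns (one `(n_samples, 2)` array per target column); the probabilities are made up for two persons and three offers.

```python
import numpy as np

# Made-up probabilities: one (n_samples, 2) array per target column.
proba = [
    np.array([[0.9, 0.1], [0.4, 0.6]]),  # offer_0
    np.array([[0.3, 0.7], [0.8, 0.2]]),  # offer_1
    np.array([[0.5, 0.5], [0.7, 0.3]]),  # offer_2
]
offer_names = ["offer_0", "offer_1", "offer_2"]

# Column 1 holds P(class == 1); stack it into an (n_samples, n_offers) matrix.
p_effective = np.stack([p[:, 1] for p in proba], axis=1)

# Index of the most promising offer for each person.
best = p_effective.argmax(axis=1)
recommended = [offer_names[i] for i in best]

print(recommended)
```

This treats the ten classifiers as independent scores, which is a simplification; a target column that contains a single class in the training data would not produce a two-column array.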

## <a class="anchor" id="4">[IV. Build Neural Network for Regression](#Start)</a>

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim

from skorch import NeuralNetClassifier

from sklearn.utils import shuffle
torch.manual_seed(0)

In [None]:
class Classifier(nn.Module):
    
    def __init__(self, inputs=18, hidden=7, outputs=1):
        super().__init__()
        self.fc1 = nn.Linear(inputs, hidden)  # 18 features as input
        self.fc2 = nn.Linear(hidden, outputs)
        
        self.dropout = nn.Dropout(p=0.25)

    def forward(self, x):
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.fc2(x)  # raw output for regression (binary classification would use sigmoid, multi-class softmax)
        
        return x

In [None]:
model_dataset.info()

In [None]:
# 'group_effctive_offer', 'group_no_care', 'group_tried','transaction_cnt', 'time_completed' has direct information of target classes
target_cols = 'amount_total'

keep_cols = ['age', 'income', 'member_days', 'gender_F', 'gender_M', 'gender_O',
             'reward', 'difficulty','duration', 'email', 'mobile', 'social', 'web',
             'transaction_cnt', 'offer_received_cnt',
             'group_effctive_offer', 'group_no_care', 'group_tried']
            
model_dataset_input = shuffle(model_dataset)
features, target = select_features_target(model_dataset_input, target_cols, keep_cols)

features_array, target_array = np.array(features), np.array(target)
features_tensor, target_tensor = torch.from_numpy(features_array).float(), torch.from_numpy(target_array).float()


X_train, X_test, y_train, y_test = train_test_split(features_tensor, target_tensor, 
                                                        test_size=0.2, 
                                                        random_state=9)

In [None]:
y_train[1]

In [None]:
# Build a model
model = Classifier()  # defaults: 18 inputs, 7 hidden units, 1 output
#criterion = nn.NLLLoss()  # for categorical targets

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

# train the net
epochs = 20
train_size = X_train.shape[0]
test_size = X_test.shape[0]

train_losses, test_losses = [], []
for e in range(epochs):
    start = time()
    running_loss = 0
    for idx in range(train_size):
        features, target = X_train[idx], y_train[idx]
        
        optimizer.zero_grad()
        output = model(features)
        
        loss = criterion(output.squeeze(), target)  # squeeze so output and target have the same shape
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    
    test_loss = 0
    
    with torch.no_grad(): # evaluation phase: no gradients are computed
        model.eval()  # evaluation mode also disables dropout
        
        for idx in range(test_size):
            features, target = X_test[idx], y_test[idx]
            
            output = model(features)  # regression: there is no label accuracy to compute
            test_loss += criterion(output.squeeze(), target).item()
            
    
    train_losses.append(running_loss/train_size)
    test_losses.append(test_loss/test_size)
        
    model.train()
    end = time()
    print("epoch:{}/{}.." .format(e+1, epochs), 
        "Training Loss: {:.3f}..".format(running_loss/train_size),
        "Test Loss: {:.3f}..".format(test_loss/test_size),
        "Time Cost: {:.3f}s..".format(end-start))

In [None]:
net = NeuralNetClassifier(Classifier,
                          max_epochs=10,
                          lr=0.1,)

# sklearn pipe & gridsearch
pipe = Pipeline([('preprocessor', StandardScaler()),
                 ('clf', net)
                ])

pipe.fit

In [None]:
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

## <a class="anchor" id="References">[References](#Start)</a>
[[1]Starbucks Capstone Challenge: Using Starbucks app user data to predict effective offers](https://github.com/syuenloh/UdacityDataScientistCapstone/blob/master/Starbucks%20Capstone%20Challenge%20-%20Using%20Starbucks%20app%20user%20data%20to%20predict%20effective%20offers.ipynb)<br>


In [None]:
model_dataset.offer_received_cnt.hist()  # a Series can be plotted directly

# Tips Summary

## 1.

### [3.3 Conclusion](#Start)
**Notice:**
1. `offer_id == -1` means the person has not received any offer

2. These 5 persons (`offer_id=='-1'`) make up the whole `label_effective_offer == -2` group

### i. In general
1. Only 5 persons never received an offer  
    - 2 in `segment #7`
    - 1 each in `segment #8`, `#9` and `#11`
  
2. Offer distribution by income: see `segment #3` vs. `segment #12`  
    - Young people tend to have less money. 
    - Older people tend to have more savings.

3. Offer distribution by age: see `segment #1` vs. `segment #10`
    - In the low-income group, older persons seem to receive fewer offers than young persons

### ii. In subplots
1. Within each segment, persons receive almost the same number of offers 
2. In `Segment #3`
    - Young persons tend to lack large savings.

# Summary