## Dataset I. target_dataset
<br>
<img src="./images/target_dataset.png"/>

Explanation for the columns:
1. `person` (int)
    - Person ID, already encoded as an integer
2. `offer_id` (object of str)
    - Offer ID, already encoded as an integer but stored as a string
    - **Values**: from '0' to '9'
3. `time_received` (float)
    - Time at which the offer was received
    - **Values**: `NaN` means the offer was not received
4. `time_viewed` (float)
    - Time at which the offer was viewed
    - **Values**: `NaN` means the offer was not viewed
5. `time_transaction` (object of str)
    - Times of the transactions within a transaction unit (one person can have more than one transaction unit for the same `offer_id`)
    - **Values**: a transaction unit may contain more than one transaction, so the times are stored as a `str` (object)
        - `''` means there is no transaction
        - `',3.8,5.9'` means there are two transactions, one at time 3.8 and another at time 5.9
6. `time_completed` (float)
    - Time at which the offer was completed
    - **Values**: `NaN` means the offer was not completed
7. `amount_with_offer` (float)
    - Amount of the transaction(s) within this transaction unit
    - **Values**: `0.0` means no transaction
8. `label_effective_offer` (int)
    - Label marking the completion level of the offer
    - **Values**:
        - `1`: 
            - for an informational offer, there is at least one transaction within the offer duration; 
            - for other offers, there is an 'offer completed' event
        - `0`: 
            - for an informational offer, there is an 'offer received' event but no transaction; 
            - for other offers, there is no 'offer completed' event; there may be some transactions within the duration, but their amount does not fulfil the requirement
        - `-1`: the initial label; it stays `-1` when there is no 'offer received' event
        - `-2`: special case, discovered after the data was wrangled
            - marks people who only have transactions during the whole experiment; no offer was sent to them
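
Two of the encodings above can be sketched in code: decoding a `time_transaction` string, and restating the `label_effective_offer` rules for labels `1`, `0` and `-1`. This is a minimal sketch, not the wrangling code itself. The function names `parse_transaction_times` and `label_effective_offer` and their parameters are hypothetical, and the `-2` case (people who never received any offer) is assumed to be assigned separately.

```python
import math


def parse_transaction_times(raw):
    """Convert a `time_transaction` string such as ',3.8,5.9' into a list of floats."""
    # Assumption: an empty string may be read back from CSV as NaN,
    # so both are treated as "no transaction".
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return []
    # The leading comma yields an empty first piece, which is skipped.
    return [float(piece) for piece in str(raw).split(',') if piece.strip()]


def label_effective_offer(received, is_informational, completed, n_transactions):
    """Hypothetical restatement of the labelling rules for one transaction unit."""
    if not received:
        return -1  # the initial label is kept when there is no 'offer received'
    if is_informational:
        # informational offers count as effective when at least one transaction exists
        return 1 if n_transactions > 0 else 0
    # all other offers need an 'offer completed' event
    return 1 if completed else 0


times = parse_transaction_times(',3.8,5.9')
print(times)
print(label_effective_offer(True, True, False, len(times)))
```

Keeping the times as a list makes it easy to count transactions per unit with `len(times)` instead of counting commas in the raw string.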

## Dataset II. transcript_offer (updated)

```python
### Code
# Show only the updated rows
normal_offer_id = target_dataset.offer_id.unique().tolist()   # ['0','1','2','3','4','5','6','7','8','9']
# Show rows whose offer_id is not one of the normal IDs (some were updated during wrangling)
transcript_offer[~transcript_offer.offer_id.isin(normal_offer_id)].tail(20)
```
<img src="./images/transcript_offer.png"/>
Some values in the `offer_id` column were updated during wrangling.<br>
In the figure, the value '6,5' means:<br>
　　this transaction is valid for two offer IDs: '6' and '5'.
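
If one row per (transaction, offer) pair is needed, a combined value such as '6,5' can be split apart. The sketch below uses a small hypothetical frame (`toy`) rather than the real `transcript_offer`, whose other columns are not shown here, and assumes the combined IDs are always comma-separated.

```python
import pandas as pd

# Hypothetical toy data: only the two columns needed for the illustration
toy = pd.DataFrame({'person': [1, 2, 3],
                    'offer_id': ['6,5', '3', '6,5']})

# Split each combined ID into a list, then give every ID its own row
toy['offer_id'] = toy['offer_id'].str.split(',')
exploded = toy.explode('offer_id').reset_index(drop=True)

print(exploded)
```

After this step each row of `exploded` carries a single offer ID, so it can be matched directly against the normal IDs '0' to '9'.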

# <a class="anchor" id="Table-Start">Table of Contents</a>

I. [Data Exploration](#Data-Exploration)<br>
II. [Data Cleaning](#Data-Cleaning)<br>
III. [Data Preprocessing](#Data-Preprocessing)<br>
　　3.1. [Wrangle Data](#Wrangle-Data)<br>
　　　　Method.1 [Using a modular class](#Method-1-wrangle)<br>
　　　　Method.2 [Based on following functions](#Method-2-wrangle) (better for debugging)<br>
　　3.2. [Save the wrangled data](#Save-Data)<br>
IV. [Explore the wrangled Data](#Wrangled-Data) (preliminary)<br>
[References](#References)

### Imports & Load in data

```python
import pandas as pd
import numpy as np
import math
import json

from time import time
from collections import defaultdict

import matplotlib.pyplot as plt
%matplotlib inline

transcript_offer = pd.read_csv('./wrangled_transcript_offer.csv', dtype={'person': int})
# Restore the original index, which was saved as the first (unnamed) column
transcript_offer.index = transcript_offer.iloc[:, 0].values
del transcript_offer['Unnamed: 0']

target_dataset = pd.read_csv('./modified_wrangled_target_dataset.csv', dtype={'person': int, 'offer_id': str})
# Restore the original index, which was saved as the first (unnamed) column
target_dataset.index = target_dataset.iloc[:, 0].values
del target_dataset['Unnamed: 0']
```
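
The index-restoring steps above can also be expressed with the `index_col` parameter of `pd.read_csv`. The sketch below compares both approaches on a small in-memory CSV (`csv_text`, hypothetical data) instead of the real files, so it is an illustration of the pattern under that assumption rather than a drop-in replacement.

```python
import io

import pandas as pd

# Hypothetical CSV with the saved index in the first, unnamed column
csv_text = ',person,offer_id\n5,12,3\n9,40,"6,5"\n'
dtypes = {'person': int, 'offer_id': str}

# Two-step approach: read everything, then move the first column into the index
manual = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
manual.index = manual.iloc[:, 0].values
del manual['Unnamed: 0']

# One-step approach: let read_csv use the first column as the index
direct = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=dtypes)

print(manual.index.tolist() == direct.index.tolist())
print(manual.to_dict('list') == direct.to_dict('list'))
```

Passing `dtype={'offer_id': str}` matters in both variants: without it, pandas may infer a numeric type for a column that contains only single-digit IDs.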