# Main

This Jupyter notebook walks through a `craftai.pandas` use case.

The work is based on the dataset `yellow.csv` located in the directory _data/_. (It is possible to regenerate this dataset by using the notebook `Preprocessing.ipynb`.)

`yellow.csv` has been extracted from the data available on the ___NYC Taxi and Limousine Commission (TLC)___ [webpage](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page).

__Goal__:
The user is an NYC taxi driver who wants to know where to drive over the next few hours to maximize the chances of finding a client.

In [1]:
import craftai.pandas
import pandas as pd
import numpy as np
import os

import Tools

import matplotlib.pyplot as plt
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

In [2]:
FIGSIZE = (17, 5)
FONT = {"family": "sans-serif", "weight": "normal", "size": 16}

init_notebook_mode(connected=True)

In [5]:
client = craftai.pandas.Client({
  # Read the token from the environment instead of hardcoding a secret in the notebook.
  # The variable name CRAFT_TOKEN is an assumption; use whatever name you export.
  "token": os.environ["CRAFT_TOKEN"]
})

In [6]:
PATH = '../data/' # Modify this to fit your data folder

In [None]:
yellow = pd.read_csv(PATH + "yellow.csv")
yellow.columns = yellow.columns[1:].insert(0, "timestamp")
yellow.timestamp = pd.to_datetime(yellow.timestamp, utc=True)
yellow.set_index("timestamp", drop=True, inplace=True)
yellow.index = yellow.index.tz_convert("America/New_York")
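
The loading cell above parses the timestamps as UTC and then shifts the index to New York local time. A minimal sketch of that conversion on a toy frame (the rows and the `raw`/`toy` names are made up for illustration, not taken from `yellow.csv`):

```python
import pandas as pd

# Toy stand-in for the first rows of the dataset (values are invented).
raw = pd.DataFrame({
    "timestamp": ["2017-12-04 05:00:00", "2017-12-04 06:00:00"],
    "7": [3, 5],
})

# Parse as UTC first, then convert the index to the local timezone.
raw["timestamp"] = pd.to_datetime(raw["timestamp"], utc=True)
toy = raw.set_index("timestamp")
toy.index = toy.index.tz_convert("America/New_York")

print(toy.index[0])  # 2017-12-04 00:00:00-05:00
```

Converting after parsing keeps the underlying instants unchanged; only the displayed wall-clock time moves.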

In [7]:
agent = 'HE4_SRC'

In [8]:
history = client.get_operations_list(agent)

In [14]:
history.to_csv('../../benchmarks/mes/cyberdin2.data',index_label='DATE')

In [None]:
def decide(tree):
    decision = client.decide_from_contexts_df(tree, DECISION_DF)
    return pd.Series({ c: decision[c].values for c in decision.columns })

In [None]:
agents_with_decisions = agents.merge(
    agents.decision_tree.apply(decide), 
    left_index=True, 
    right_index=True
)

agents_with_decisions.head()
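
The merge above relies on `Series.apply` expanding a function that returns a `pd.Series` into one column per key. A rough sketch of that mechanism, using a hypothetical `fake_decide` stub in place of the real craft ai call (the stub, its values, and `toy_agents` are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `decide`: returns one array per output column,
# without contacting craft ai.
def fake_decide(tree):
    return pd.Series({
        "trip_counter_predicted_value": np.array([tree * 1.0, tree * 2.0, tree * 3.0]),
        "trip_counter_standard_deviation": np.array([0.5, 0.5, 0.5]),
    })

toy_agents = pd.DataFrame({
    "agent_id": ["taxi_zone_001", "taxi_zone_002"],
    "decision_tree": [1, 2],  # placeholders for real decision trees
})

# apply() yields a DataFrame with one column per key of the returned Series;
# merging on the index attaches those columns to the matching agent row.
toy_with_decisions = toy_agents.merge(
    toy_agents.decision_tree.apply(fake_decide),
    left_index=True,
    right_index=True,
)
print(toy_with_decisions.columns.tolist())
```

Each cell of the new columns holds a whole array (one value per context row), which is why later cells call `.tolist()` on them.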

Each decision is explainable. You can analyse the tree directly in your <a href="https://integration.craft.ai/inspector/">project's inspector</a>, or you can print the decision rule that led to the decision.

In [None]:
# decision to explain:
agent_id_to_explain = 'taxi_zone_007'
decision_idx_to_explain = 0

line_to_explain = agents_with_decisions[agents_with_decisions.agent_id == agent_id_to_explain]

decision =  line_to_explain.trip_counter_predicted_value.values[0][decision_idx_to_explain]
        
rule_to_explain = line_to_explain.trip_counter_decision_rules.values[0][decision_idx_to_explain]

print('Agent {0} has predicted {1} clients in his zone,\nbecause '
      .format(agent_id_to_explain, decision),
      craftai.format_decision_rules(
          craftai.reduce_decision_rules(rule_to_explain)
      ))

### 4.3. Observe results

In [None]:
# To display without timezone shifting
utc_test_index = pd.date_range('2017-12-04 00:00', 
                        "2017-12-17 23:00", freq="h") 

Reformat results:

In [None]:
predictions = pd.DataFrame(
                agents_with_decisions.trip_counter_predicted_value.tolist()
              ).T
predictions.columns = selected_zones_str
predictions.index = test.index
predictions.head()

In [None]:
# predictions' standard deviation
stds = pd.DataFrame(
                agents_with_decisions.trip_counter_standard_deviation.tolist()
              ).T
stds.columns = selected_zones_str
stds.index = test.index
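
The two reformatting cells above turn a column holding one array per agent into a table with one row per hour and one column per zone. A small sketch of that reshaping with made-up numbers (the zone labels and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# One array of hourly predictions per agent (invented values: 2 zones, 3 hours).
per_agent = pd.Series([
    np.array([1.0, 2.0, 3.0]),
    np.array([10.0, 20.0, 30.0]),
])

# tolist() gives a list of arrays (one row per agent); .T flips it so that
# rows are hours and columns are zones.
wide = pd.DataFrame(per_agent.tolist()).T
wide.columns = ["7", "145"]
wide.index = pd.date_range("2017-12-04 00:00", periods=3, freq="h")
print(wide)
```

Column labels are assigned by position, so the list of zone names must be in the same order as the agent rows.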

Static visualization:

In [None]:
zone = 7

fig, ax = plt.subplots(figsize=FIGSIZE)

plt.plot(utc_test_index, test[str(zone)], label='Reality')
plt.plot(utc_test_index, predictions[str(zone)],
            label='Prediction {:0>3}'.format(zone))

ax.fill_between(utc_test_index, 
                predictions[str(zone)] + stds[str(zone)], 
                predictions[str(zone)] - stds[str(zone)], 
                color='red', alpha=0.15, 
                label='STD {:0>3}'.format(zone))


plt.title('Zone {:0>3} Predictions'.format(zone), fontdict=FONT)
plt.xlabel('Time', fontdict=FONT)
plt.ylabel('#Clients', fontdict=FONT)

ax.set_frame_on(False)
plt.grid(True)
plt.legend(prop={'size': FONT['size']})
plt.show()

To have a deeper analysis, please refer to `Benchmarks.ipynb`.

```python
# Save results for benchmark analysis
predictions.to_csv(PATH + 'craftai.csv')
stds.to_csv(PATH + 'craftai_std.csv')
```

## 5. Conclusion: Evaluate Best Taxi Zone 

Based on the estimates from all agents, find the `taxi_zone` with the most people looking for a taxi at each hour.

In [None]:
def best_zone(row):
    return agents.zone.values[np.argmax(row.values)]

In [None]:
predictions['best_zone'] = predictions[selected_zones_str].apply(best_zone, axis=1)

predictions.sample(10).sort_index()
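
The row-wise selection above can be illustrated on a toy table. This sketch assumes, as the notebook does, that the prediction columns are in the same order as the zone ids; the ids and values here are invented:

```python
import numpy as np
import pandas as pd

zones = np.array([7, 145, 230])  # hypothetical zone ids
toy_predictions = pd.DataFrame(
    [[4.0, 9.0, 1.0],
     [6.0, 2.0, 3.0]],
    columns=[str(z) for z in zones],
)

# Position of each row's maximum, mapped back to the corresponding zone id.
best = zones[np.argmax(toy_predictions.values, axis=1)]
print(best.tolist())  # [145, 7]
```

If the column order and the zone order ever diverge, the lookup silently returns the wrong zone, so it is worth checking the two lists match before applying it.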

In [None]:
predictions[predictions['best_zone']==145].head()

Let's have a look at the results:

In [None]:
nb_hours_to_display = 24 # display the first day

plot_text = {
    'x_tick_labels': selected_zones,
    'y_tick_labels': predictions.index[:nb_hours_to_display],
    'title': 'Where are the clients ?',
    'xlabel': 'Zone',
    'ylabel': 'Time',
    'cbar_label': '# Clients'
}

Tools.plot_matshow(predictions[selected_zones_str][:nb_hours_to_display], plot_text)

In [None]:
hours = ['{:0>2}:00'.format(h) for h in range(24)]
days = [str(d) for d in np.unique(predictions.index.dayofyear)]
mat = predictions.best_zone.values.reshape((14,24)).T

plot_text = {
    'x_tick_labels': days,
    'y_tick_labels': hours,
    'title': 'Which is the best zone to find clients ?',
    'xlabel': 'Day of the year',
    'ylabel': 'Hour of the day',
    'cbar_label': None
}

Tools.plot_matshow(mat, plot_text, selected_zones)
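
The `reshape((14, 24)).T` step above folds a flat hourly series into an hour-by-day grid. A minimal sketch of that folding, with a counter standing in for the `best_zone` values (the `values` array and `n_days` name are assumptions for illustration):

```python
import numpy as np
import pandas as pd

index = pd.date_range("2017-12-04 00:00", "2017-12-17 23:00", freq="h")
values = np.arange(len(index))  # stand-in for the best_zone column

# Each chunk of 24 consecutive hours becomes one row (a day);
# transposing puts hours on the rows and days on the columns.
n_days = len(index) // 24
mat = values.reshape((n_days, 24)).T

print(mat.shape)  # (24, 14)
```

This only works when the series is in chronological order and every day has exactly 24 rows; a missing hour would shift all later entries.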

Thanks for reading this notebook to the end!

 * If you want to work on more data, see `Preprocessing.ipynb`.
 * To see the benchmarks, open `Benchmarks.ipynb`.