# Orders

**Exercise**

Today, we will investigate the **orders** and their associated review scores.

For that purpose, we will create a single data table containing **all orders as index and all properties of these orders as columns.**

Our goal is to create the following DataFrame, which will come in very handy later on for our modelling phase:

  - `order_id` (_str_) _the id of the order_
  - `wait_time` (_float_) _the number of days between order_date and delivered_date_
  - `delay_vs_expected` (_float_) _if the actual delivery date is later than the estimated delivery date, the absolute number of days between the two dates; otherwise 0_
  - `dim_is_five_star` (_int_) _1 if the order received a five-star review, 0 otherwise_
  - `dim_is_one_star` (_int_) _1 if the order received a one-star review, 0 otherwise_
  - `review_score` (_int_) _from 1 to 5_
  - `number_of_products` (_int_) _number of products that the order contains_
  - `number_of_sellers` (_int_) _number of sellers involved in the order_
  - `freight_value` (_float_) _value of the freight paid by the customer_
  - (Optional) `distance_seller_customer` (_float_) _the distance in km between customer and seller_
  
We also want to filter out non-delivered orders, unless explicitly specified otherwise.

❓ Your challenge: 

- Implement each feature as a separate method within the `Order` class available at `olist/order.py`

- Then, create a method `get_training_data()` that returns the complete DataFrame.

- Feel free to use the notebook below to test your code step by step, then copy it into `olist/order.py` once you are confident in the logic.

- Focus on the data manipulation logic for now; we will analyse the dataset visually in the next challenges.

```python
# Automatically reload imported modules every time a Jupyter cell is executed (handy for olist/order.py updates)
%load_ext autoreload
%autoreload 2
```

```python
# Import usual modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```

```python
# Handy tip: show at most 8 rows of a DataFrame at a time (truncated frames show 10 rows by default)
pd.set_option('display.max_rows', 8)
```

```python
# Import olist data
from olist.data import Olist
olist = Olist()
data = olist.get_data()
matching_table = olist.get_matching_table()
```

```python
orders = data['olist_orders_dataset'].copy()  # good practice: work on a copy so the raw data is never modified
```

## get_wait_time
Returns a DataFrame with `[order_id, wait_time, expected_wait_time, delay_vs_expected]`

Hints:
- Don't forget to convert dates from strings to pandas datetimes using [`pandas.to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)
- Take time to understand what python [`datetime`](https://docs.python.org/3/library/datetime.html) objects are
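A minimal sketch of the conversion, on a tiny hand-made frame whose values are invented; the column names assume the standard Olist orders schema (`order_purchase_timestamp`, `order_delivered_customer_date`). Subtracting two datetime columns yields a `timedelta64` column, which `.dt.total_seconds()` turns into a float number of days.

```python
import pandas as pd

# Toy data (invented values); column names assume the Olist orders schema
orders = pd.DataFrame({
    "order_id": ["a", "b"],
    "order_purchase_timestamp": ["2017-10-02 10:56:33", "2018-07-24 20:41:37"],
    "order_delivered_customer_date": ["2017-10-10 21:25:13", None],
})

# Strings -> datetime64; missing values (None) become NaT
for col in ["order_purchase_timestamp", "order_delivered_customer_date"]:
    orders[col] = pd.to_datetime(orders[col])

# datetime - datetime -> timedelta64, then convert to fractional days
delta = orders["order_delivered_customer_date"] - orders["order_purchase_timestamp"]
orders["wait_time"] = delta.dt.total_seconds() / (24 * 3600)
print(orders[["order_id", "wait_time"]])
```

An order that has not been delivered yet has a `NaT` delivery date, so its `wait_time` comes out as `NaN` rather than raising an error.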

```python
# Inspect orders dataframe
```

```python
# handle datetime
```

```python
# compute wait time
```

```python
# compute delay vs expected
```

```python
# check new dataframe and copy code to `olist/order.py`
```
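Putting the steps of this section together, here is a minimal sketch written as a standalone function rather than as the `Order` method. It assumes the Olist column names used below and returns the output columns listed for `get_wait_time`; the `is_delivered` flag is a hypothetical way to honour the rule that non-delivered orders are filtered out unless explicitly specified otherwise.

```python
import pandas as pd


def get_wait_time(orders: pd.DataFrame, is_delivered: bool = True) -> pd.DataFrame:
    """Hypothetical standalone version of Order.get_wait_time (a sketch).

    Assumes Olist column names; `is_delivered` is an illustrative
    parameter, not necessarily the one used in olist/order.py.
    """
    orders = orders.copy()  # never mutate the caller's frame
    if is_delivered:
        # keep only delivered orders, as required by the challenge
        orders = orders.query("order_status == 'delivered'").copy()

    date_cols = [
        "order_purchase_timestamp",
        "order_delivered_customer_date",
        "order_estimated_delivery_date",
    ]
    for col in date_cols:
        orders[col] = pd.to_datetime(orders[col])

    one_day = pd.Timedelta(days=1)
    # days between purchase and actual delivery
    orders["wait_time"] = (
        orders["order_delivered_customer_date"] - orders["order_purchase_timestamp"]
    ) / one_day
    # days between purchase and the estimated delivery date
    orders["expected_wait_time"] = (
        orders["order_estimated_delivery_date"] - orders["order_purchase_timestamp"]
    ) / one_day
    # lateness in days; clipped to 0 when delivered on time or early
    orders["delay_vs_expected"] = (
        (orders["order_delivered_customer_date"] - orders["order_estimated_delivery_date"])
        / one_day
    ).clip(lower=0)

    return orders[["order_id", "wait_time", "expected_wait_time", "delay_vs_expected"]]


# Usage on invented toy data
toy = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "order_status": ["delivered", "delivered", "shipped"],
    "order_purchase_timestamp": ["2018-01-01", "2018-01-01", "2018-01-01"],
    "order_delivered_customer_date": ["2018-01-11", "2018-01-05", None],
    "order_estimated_delivery_date": ["2018-01-08", "2018-01-10", "2018-01-20"],
})
res = get_wait_time(toy)
print(res)
```

Dividing a `timedelta64` Series by `pd.Timedelta(days=1)` gives fractional days directly; `.clip(lower=0)` implements "otherwise 0" without a Python-level loop.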

## get_review_score
Returns a DataFrame with `[order_id, dim_is_five_star, dim_is_one_star]`

```python
# Load reviews dataset (copy it, since we add columns to it below)
reviews = data['olist_order_reviews_dataset'].copy()
reviews
```

```python
# Fill in the functions below; we will apply them element-wise in the next cell to create our new columns
def dim_five_star(x):
    pass

def dim_one_star(x):
    pass
```

```python
reviews["dim_is_five_star"] = reviews["review_score"].map(dim_five_star)

reviews["dim_is_one_star"] = reviews["review_score"].map(dim_one_star)
```

```python
# Check your new dataframe and commit your code to olist/order.py
```
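One possible way to fill in the two helpers, shown on an invented three-row frame whose columns assume the Olist reviews schema. The sketch also keeps `review_score` in the output, since the final training table lists it; whether your `get_review_score` method should return it is your call.

```python
import pandas as pd

# Toy reviews (invented values); columns assume the Olist reviews schema
reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "order_id": ["a", "b", "c"],
    "review_score": [5, 1, 3],
})


def dim_five_star(score: int) -> int:
    """1 for a five-star review, 0 otherwise."""
    return 1 if score == 5 else 0


def dim_one_star(score: int) -> int:
    """1 for a one-star review, 0 otherwise."""
    return 1 if score == 1 else 0


reviews["dim_is_five_star"] = reviews["review_score"].map(dim_five_star)
reviews["dim_is_one_star"] = reviews["review_score"].map(dim_one_star)

# Vectorised alternative, usually faster than .map on large frames:
# reviews["dim_is_five_star"] = (reviews["review_score"] == 5).astype(int)

review_features = reviews[["order_id", "dim_is_five_star", "dim_is_one_star", "review_score"]]
print(review_features)
```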

## get_number_products
Returns a DataFrame with `[order_id, number_of_products]`

```python
# YOUR CODE HERE
```
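A minimal sketch, assuming an order-items table with one row per item (as in the Olist `olist_order_items_dataset`); the toy rows below are invented. Counting rows per order with a named aggregation gives `number_of_products` directly.

```python
import pandas as pd

# Toy order items (invented values); one row per item in an order
items = pd.DataFrame({
    "order_id": ["a", "a", "a", "b"],
    "order_item_id": [1, 2, 3, 1],
    "product_id": ["p1", "p1", "p2", "p3"],
})

# Each row is one item, so counting rows per order gives the number of products
number_of_products = (
    items.groupby("order_id", as_index=False)
    .agg(number_of_products=("order_item_id", "count"))
)
print(number_of_products)
```

This counts two units of the same product twice; if you want distinct products instead, aggregate `product_id` with `"nunique"`.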

## get_number_sellers
Returns a DataFrame with `[order_id, number_of_sellers]`

```python
# YOUR CODE HERE
```
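Here the aggregation must count *distinct* sellers, because several items in one order often come from the same seller. A minimal sketch on invented rows, assuming the items table has a `seller_id` column:

```python
import pandas as pd

# Toy order items (invented values)
items = pd.DataFrame({
    "order_id": ["a", "a", "a", "b"],
    "seller_id": ["s1", "s1", "s2", "s1"],
})

# nunique counts each seller once per order
number_of_sellers = (
    items.groupby("order_id", as_index=False)
    .agg(number_of_sellers=("seller_id", "nunique"))
)
print(number_of_sellers)
```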

## get_price_and_freight
Returns a DataFrame with `[order_id, price, freight_value]`
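A minimal sketch, assuming the items table carries per-item `price` and `freight_value` columns (as in the Olist items dataset); the rows are invented. Summing per order gives the totals the customer paid.

```python
import pandas as pd

# Toy order items (invented values)
items = pd.DataFrame({
    "order_id": ["a", "a", "a", "b"],
    "price": [10.0, 10.0, 25.0, 99.9],
    "freight_value": [5.0, 5.0, 7.5, 12.0],
})

# Sum item-level price and freight for each order
price_and_freight = items.groupby("order_id", as_index=False)[["price", "freight_value"]].sum()
print(price_and_freight)
```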

## get_distance_seller_customer (OPTIONAL - Start next challenge by 3 pm)
Returns a DataFrame with `[order_id, distance_seller_customer]` (the distance in km between customer and seller)

💡 Have a look at the `haversine_distance` formula we coded for you in the `olist.utils` module
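For reference, the haversine formula computes the great-circle distance between two points from their latitudes and longitudes. The function below is an independent sketch, not the `olist.utils` implementation, whose name, argument order, and units may differ; check it before substituting one for the other.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of latitude along a meridian is roughly 111 km
print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

Most of the work in this optional step is attaching customer and seller coordinates to each order; that join is not shown here.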

## Test your newly coded module

```python
%%time
from olist.order import Order
result = Order().get_training_data()
result
```
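One possible way for `get_training_data()` to assemble the per-method frames is to inner-join them on `order_id`. The helper `merge_features` and the toy frames are hypothetical; in `olist/order.py` the inputs would be the outputs of your own methods.

```python
from functools import reduce

import pandas as pd


def merge_features(frames):
    """Inner-join a list of per-order feature frames on order_id (hypothetical helper)."""
    return reduce(lambda left, right: left.merge(right, on="order_id", how="inner"), frames)


# Toy feature frames (invented values), one row per order
wait = pd.DataFrame({"order_id": ["a", "b"], "wait_time": [10.0, 4.0]})
scores = pd.DataFrame({"order_id": ["a", "b"], "review_score": [5, 1]})
products = pd.DataFrame({"order_id": ["a", "c"], "number_of_products": [3, 2]})

training = merge_features([wait, scores, products]).dropna()
print(training)
```

An inner join keeps only orders present in every frame, so orders removed by the delivered-only filter drop out of the final table; a `how="left"` join starting from the wait-time frame would keep every order from that frame and fill missing features with `NaN`.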