# Introduction

Customers do not only look for online shopping with the best variety of products and the best user experience; they also demand fast delivery without delays. Throughout the project we have analyzed various hypotheses that might help Olist optimize business-critical KPIs focused on the customers, the sellers, and the process of purchasing products, retaining customers, attracting the best sellers to the platform, etc. In this last part, we analyze the potential of unlocking the latent but critical value in the process of delivering the products. The delivery estimate matters both when customers choose between products, where the delivery date may be a decisive factor, and when the physical product is delivered to the customer by connecting logistics providers and various sellers. These operational tasks, which are not directly handled by Olist, are critical to Olist's image and to how satisfied customers are after shopping on Olist. 

The goal of the following analysis is threefold. First, we assess whether there is a problem in delivering the products to the customers, by analyzing historical delivery performance and how many delays Olist has had throughout the dataset. Second, we improve Olist's delivery estimate, leading to better decision-making for customers and greater convenience in the later part of the shopping experience. Lastly, we predict whether an order will be delivered on time or late by classifying each order at the time of purchase. 

## KPIs

Operational excellence is at the heart of an e-commerce business if customer satisfaction is a top priority. Optimizing both the delivery estimate and potential delivery delays helps Olist on several important KPIs. We can potentially increase Customer Acquisition and Customer Retention rates if Olist is able to prove a track record of fast and accurate delivery. Furthermore, if we are able to predict whether a package will be late and use a set of tools to counteract a potential late delivery, we can reduce the number of Late Deliveries, which is a KPI in itself. 

# <font color='blue'>Setup 1</font>: Load Libraries

In [1]:
import numpy as np
import pandas as pd
import sys, os
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from math import sin, cos, sqrt, atan2, radians
from sklearn import preprocessing
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import backend as K
%matplotlib inline
sns.set()
sns.color_palette("Paired")
plt.rcParams['figure.figsize'] = (15.0, 8.0)
plt.style.use('ggplot')

Using TensorFlow backend.


# <font color='blue'>Setup 2</font>: Load data

This workbook focuses on the delivery and geospatial parts of the Olist dataset. Therefore, the following datasets are loaded and joined:

- olist_orders_dataset
- olist_order_items_dataset
- olist_customers_dataset
- olist_sellers_dataset
- olist_geolocation_dataset

The tables are combined with left joins, and the date columns are converted to datetime. 
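A left join keeps every row of the left table and fills the right table's columns with missing values where no key matches; this is why rows with unknown geolocations show up as NaN later on. The plain-Python sketch below illustrates the semantics on made-up toy rows (the zip prefixes and coordinates are invented); the notebook itself uses `pandas.DataFrame.merge(..., how='left')`.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`.

    Every left row is kept; right-hand columns are None when no match exists.
    Assumes `key` is unique in right_rows (as after averaging geolocations per zip prefix).
    """
    right_index = {row[key]: row for row in right_rows}
    # Right-hand column names in first-seen order, excluding the join key
    right_cols = list(dict.fromkeys(c for row in right_rows for c in row if c != key))
    joined = []
    for row in left_rows:
        match = right_index.get(row[key], {})
        merged = dict(row)
        for col in right_cols:
            merged[col] = match.get(col)  # None plays the role of NaN
        joined.append(merged)
    return joined

customers = [
    {"customer_id": "c1", "geolocation_zip_code_prefix": 1001},
    {"customer_id": "c2", "geolocation_zip_code_prefix": 9999},  # prefix without geolocation
]
geo = [{"geolocation_zip_code_prefix": 1001, "geolocation_lat": -23.55, "geolocation_lng": -46.63}]

for row in left_join(customers, geo, "geolocation_zip_code_prefix"):
    print(row)
```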

In [2]:
def load_table(tablename):
    """Read a CSV table from the ../data/ folder into a pandas DataFrame."""
    file_path = os.path.join('..', 'data', tablename)
    return pd.read_csv(file_path, sep=',', encoding="latin1")

In [4]:
# Reading the Olist tables; the zip-code column is renamed so customers and sellers can be joined with geolocations
orders = load_table('olist_orders_dataset.csv')
order_items = load_table('olist_order_items_dataset.csv')
customers = load_table('olist_customers_dataset.csv')
customers.columns = ['customer_id','customer_unique_id','geolocation_zip_code_prefix',
                     'customer_city','customer_state']
sellers = load_table('olist_sellers_dataset.csv')
sellers.columns = ['seller_id','geolocation_zip_code_prefix',
                     'seller_city','seller_state']
geo = load_table('olist_geolocation_dataset.csv')

In [None]:
order_items

In [None]:
order_items[order_items['order_id']=='8272b63d03f5f79c56e9e4120aec44ef']

In [None]:
order_items.describe(include='O')

In [None]:
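# order_item_id is a running item number across the whole order, so the number of
# rows per product (count) is the quantity of that product in the order
order_items_filt = (order_items
                    .groupby(['order_id', 'product_id', 'seller_id', 'shipping_limit_date', 'freight_value', 'price'])['order_item_id']
                    .agg(quantity='count')
                    .reset_index())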

In [None]:
order_items_filt['order_price'] = order_items_filt['price']*order_items_filt['quantity']

In [None]:
order_items_filt.describe()

In [None]:
order_items_filt = order_items_filt.groupby(['order_id','seller_id','shipping_limit_date','freight_value','quantity'])['order_price'].agg('sum').reset_index()

In [None]:
order_items_filt

In [None]:
order_items_filt.describe()

# <font color='blue'>Setup 3</font>: Merge datasets

In [None]:
#Looking at the geolocations
geo.head()

Since some zip-code prefixes have several different geolocations very close to each other, we aggregate them into a single mean location per prefix. 
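Averaging per prefix turns many nearby points into one representative coordinate, so each customer and seller gets exactly one location after the join. In pandas this is `geo.groupby('geolocation_zip_code_prefix')[['geolocation_lat', 'geolocation_lng']].mean()`; the dependency-free sketch below shows the same aggregation on invented toy points.

```python
from collections import defaultdict

def mean_location_per_prefix(rows):
    """Average all (lat, lng) pairs that share a zip-code prefix.

    rows: iterable of (zip_prefix, lat, lng) tuples.
    Returns {zip_prefix: (mean_lat, mean_lng)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running lat sum, lng sum, count
    for prefix, lat, lng in rows:
        acc = sums[prefix]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {p: (s_lat / n, s_lng / n) for p, (s_lat, s_lng, n) in sums.items()}

# Toy data: three nearby points for one prefix, a single point for another
points = [(1001, -23.50, -46.60), (1001, -23.52, -46.62), (1001, -23.54, -46.64),
          (2002, -22.90, -43.20)]
print(mean_location_per_prefix(points))
```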

In [None]:
geo = geo.groupby('geolocation_zip_code_prefix')[['geolocation_lat', 'geolocation_lng']].mean().reset_index()

In [None]:
# Merging geo location to sellers and customers dataset
customers_geo = customers.merge(geo, on = "geolocation_zip_code_prefix", how = "left")
customers_geo = customers_geo[['customer_id', 'geolocation_lat','geolocation_lng']]
customers_geo.columns = ['customer_id', 'customer_lat','customer_lon']
#customers_geo = customers_geo.groupby('customer_id')['customer_lat','customer_lon'].agg('mean').reset_index()

sellers_geo = sellers.merge(geo, on = "geolocation_zip_code_prefix", how = "left")
sellers_geo = sellers_geo[['seller_id', 'geolocation_lat','geolocation_lng']]
sellers_geo.columns = ['seller_id', 'seller_lat','seller_lon']
#sellers_geo = sellers_geo.groupby('seller_id')['seller_lat','seller_lon'].agg('mean').reset_index()

In [None]:
# Merging geo location to orders dataset
order_items_all = order_items_filt.merge(sellers_geo, on='seller_id', how='left')

orders_all = orders.merge(customers_geo, on='customer_id', how='left')
orders_all = orders_all.merge(order_items_all, on='order_id',how='left')

orders_all = orders_all[['order_id','order_status', 'customer_id', 'seller_id', 'order_purchase_timestamp',
                         'order_approved_at','order_delivered_carrier_date','shipping_limit_date',
                         'order_delivered_customer_date','order_estimated_delivery_date','order_price',
                         'quantity','freight_value','customer_lat','customer_lon','seller_lat', 'seller_lon']]

In [None]:
orders_all.head()

In [None]:
# Keep only orders that have been delivered
delivered_orders = orders_all[orders_all['order_status']=='delivered']
delivered_orders = delivered_orders.drop('order_status', axis=1)

In [None]:
delivered_orders

In [None]:
#Changing these columns to datetimes
delivered_orders.order_purchase_timestamp = pd.to_datetime(delivered_orders.order_purchase_timestamp)
delivered_orders.order_approved_at = pd.to_datetime(delivered_orders.order_approved_at)
delivered_orders.shipping_limit_date = pd.to_datetime(delivered_orders.shipping_limit_date)
delivered_orders.order_delivered_carrier_date = pd.to_datetime(delivered_orders.order_delivered_carrier_date)
delivered_orders.order_estimated_delivery_date = pd.to_datetime(delivered_orders.order_estimated_delivery_date)
delivered_orders.order_delivered_customer_date = pd.to_datetime(delivered_orders.order_delivered_customer_date)

# <font color='blue'>Part 1</font>: Descriptive analysis

In [None]:
def count_missing_values(df):
    """Print the number and share of missing values per column."""
    num_missing = pd.DataFrame(df.isna().sum(), columns=['Number'])
    num_missing['Percentage'] = round(num_missing['Number'] / len(df), 4)
    
    print(num_missing)

In [None]:
count_missing_values(delivered_orders)

##### Deleting all rows which have NaN values

In [None]:
delivered_orders = delivered_orders.dropna()

In [None]:
delivered_orders.head()

When a purchase is approved, the seller is given a `shipping_limit_date`, the deadline for handing the product over to a third-party logistics company, while the customer is given an `order_estimated_delivery_date`.
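The delivery difference below is the estimated date minus the actual delivery timestamp, in whole days. Note that `.dt.days`, like Python's `timedelta.days`, rounds towards negative infinity, so an order delivered just one hour after the estimated date already gets -1 and counts as late. A minimal sketch with invented timestamps:

```python
from datetime import datetime

def delivery_difference_days(estimated, delivered):
    """Whole days between estimated and actual delivery.

    Positive -> delivered early, negative -> delivered late.
    timedelta.days floors, which mirrors pandas' Series.dt.days.
    """
    return (estimated - delivered).days

estimated = datetime(2018, 3, 20, 0, 0)
print(delivery_difference_days(estimated, datetime(2018, 3, 15, 12, 0)))  # → 4 (4.5 days early)
print(delivery_difference_days(estimated, datetime(2018, 3, 20, 1, 0)))   # → -1 (one hour late)
```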

In [None]:
#Calculating difference between order_estimated_delivery_date and the actual order_delivered_customer_date
delivered_orders['delivery_difference'] = delivered_orders.order_estimated_delivery_date - delivered_orders.order_delivered_customer_date
delivered_orders['delivery_difference'] = delivered_orders['delivery_difference'].dt.days

In [None]:
delivered_orders['delivery_difference'].describe()

On average, the customer receives the product 10.98 days ahead of the estimated delivery date.

In [None]:
delivered_orders.describe(include='O')

In [None]:
print("Number of orders delivered later than the estimated delivery date:", len(delivered_orders[delivered_orders['delivery_difference'] < 0]))

This means that ~8% of the orders were delivered later than the estimated delivery date.

In the olist_orders_dataset we have 95978 unique orders, i.e. ~4% of the orders include multiple sellers. Thus, the products from one seller can be delivered on time while the products from another seller are late.
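The share of multi-seller orders can be checked directly by counting distinct sellers per order; in pandas this would be `delivered_orders.groupby('order_id')['seller_id'].nunique().gt(1).mean()`. A dependency-free sketch of the same count on invented (order, seller) pairs:

```python
from collections import defaultdict

def share_multi_seller_orders(order_seller_pairs):
    """Fraction of orders that involve more than one distinct seller."""
    sellers_per_order = defaultdict(set)
    for order_id, seller_id in order_seller_pairs:
        sellers_per_order[order_id].add(seller_id)
    multi = sum(1 for sellers in sellers_per_order.values() if len(sellers) > 1)
    return multi / len(sellers_per_order)

pairs = [("o1", "s1"), ("o2", "s1"), ("o2", "s2"), ("o3", "s3"), ("o4", "s4")]
print(share_multi_seller_orders(pairs))  # → 0.25 (only o2 has two sellers)
```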

We want to see whether the sellers or the logistics firm are to blame for the orders delivered late. Therefore, we plot all delivered orders with the delivery difference on the x-axis and the shipping difference on the y-axis. With X = delivery difference and Y = shipping difference (both in days, negative meaning late):

- If X >= 0, the order was delivered on time.
- If X < 0 and Y >= 0, the logistics firm is to blame.
- If X < 0, Y < 0 and Y >= X, the seller is to blame.
- If X < 0, Y < 0 and Y < X, both the seller and the logistics firm are to blame.
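The four rules can also be written as a single function. This sketch mirrors the conditions used in `plot_late_order` and in the point counts further down (x = delivery difference, y = shipping difference); the function name `blame_category` is ours and not part of the notebook.

```python
def blame_category(delivery_difference, shipping_difference):
    """Classify one order with the same rules as plot_late_order.

    delivery_difference: estimated delivery date minus actual delivery (days, x-axis)
    shipping_difference: shipping limit date minus hand-over to carrier (days, y-axis)
    """
    x, y = delivery_difference, shipping_difference
    if x >= 0:
        return "on time"                    # grey
    if y >= 0:
        return "logistics firm"             # orange: seller handed over in time
    if y >= x:
        return "seller"                     # yellow
    return "seller and logistics firm"      # red

for x, y in [(3, 1), (-2, 1), (-5, -2), (-2, -5)]:
    print((x, y), blame_category(x, y))
```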

In [None]:
#Calculating difference between shipping_limit_date and order_delivered_carrier_date
delivered_orders['shipping_difference'] = delivered_orders.shipping_limit_date - delivered_orders.order_delivered_carrier_date
delivered_orders['shipping_difference'] = delivered_orders['shipping_difference'].dt.days

##### Removing outlier

In [None]:
# Drop a single outlier row, identified by its index label 89664
delivered_orders = delivered_orders.drop(89664)

In [None]:
delivered_orders['shipping_difference'].describe()

On average, a product is handed over to the logistics firm 2.8 days before the shipping limit date.

In [None]:
delivered_orders[delivered_orders['delivery_difference']<0]

In [None]:
def plot_late_order(df):
    """Scatter delivery difference (x) against shipping difference (y), coloured by who caused the delay."""
    X = df.delivery_difference.to_numpy()
    Y = df.shipping_difference.to_numpy()

    col = np.where((X<0) & (Y<0) & (Y<X),'red',
            np.where((X<0) & (Y<0) & (Y>=X),'yellow',
                np.where((X<0) & (Y>=0),'orange','grey')))
    
    fig, ax = plt.subplots()
    ax.scatter(X, Y, c=col)
    ax.set_xlabel('Delivery difference')
    ax.set_ylabel('Shipping difference')
    ax.set_title('Orders')
    ax.axhline(linewidth=1, color='black')
    ax.axvline(linewidth=1, color='black')
    plt.show()

In [None]:
plot_late_order(delivered_orders)

1. **Grey points**: order was delivered on time
2. **Orange points**: order was late because of the logistics firm
3. **Yellow points**: order was late because of the seller
4. **Red points**: order was late because of both the seller and the logistics firm

In [None]:
delivered_orders[delivered_orders['delivery_difference']<0]

In [None]:
X = delivered_orders.delivery_difference.to_numpy()
Y = delivered_orders.shipping_difference.to_numpy()

# Boolean masks using the same rules as plot_late_order
grey_points = X >= 0
orange_points = (X < 0) & (Y >= 0)
yellow_points = (X < 0) & (Y < 0) & (Y >= X)
red_points = (X < 0) & (Y < 0) & (Y < X)

In [None]:
# The mean of a boolean mask is the share of orders in that category
print(" %5.2f pct. of the orders were delivered on time" % (grey_points.mean() * 100))
print(" %5.2f pct. of the orders were delivered late because of the logistics firm" % (orange_points.mean() * 100))
print(" %5.2f pct. of the orders were delivered late because of the seller" % (yellow_points.mean() * 100))
print(" %5.2f pct. of the orders were delivered late because of both the seller and the logistics firm" % (red_points.mean() * 100))

# <font color='blue'>Part 2</font>: Prediction - Estimated delivery time

In [None]:
delivered_orders['actual'] = delivered_orders.order_delivered_customer_date - delivered_orders.order_purchase_timestamp
delivered_orders['actual'] = delivered_orders['actual'].dt.days

In [None]:
delivered_orders.actual.describe()

In [None]:
delivered_orders.head()

##### Adding features

In [None]:
def geoDistance(lat0, lon0, lat1, lon1):
    # Approx. radius of earth (km)
    R = 6373.0 
    # Convert to radians
    lat0, lon0 = radians(lat0), radians(lon0)
    lat1, lon1 = radians(lat1), radians(lon1)
    
    # Getting differences
    dlon, dlat = lon1 - lon0, lat1 - lat0
    
    # Use haversine formula
    a = sin(dlat / 2)**2 + cos(lat0) * cos(lat1) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

In [None]:
delivered_orders['Distance'] = delivered_orders.apply(lambda row:\
    geoDistance(row['customer_lat'], row['customer_lon'], row['seller_lat'], row['seller_lon']), axis=1)
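`DataFrame.apply` with `axis=1` calls `geoDistance` once per row, which is slow on roughly 100k orders. A NumPy version of the same haversine formula, with the same 6373 km radius, computes all distances in one vectorised pass. This is a sketch; the name `haversine_km` and the toy coordinates are ours.

```python
import numpy as np

EARTH_RADIUS_KM = 6373.0  # same approximate Earth radius as geoDistance

def haversine_km(lat0, lon0, lat1, lon1):
    """Haversine distance in km between points given in degrees.

    Works element-wise on scalars, NumPy arrays or pandas Series.
    """
    lat0, lon0, lat1, lon1 = (np.radians(v) for v in (lat0, lon0, lat1, lon1))
    dlat, dlon = lat1 - lat0, lon1 - lon0
    a = np.sin(dlat / 2) ** 2 + np.cos(lat0) * np.cos(lat1) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# Toy coordinates: the first pair is identical, the second is one degree of latitude apart
customer_lat = np.array([-23.55, 0.0])
customer_lon = np.array([-46.63, 0.0])
seller_lat = np.array([-23.55, 1.0])
seller_lon = np.array([-46.63, 0.0])
print(haversine_km(customer_lat, customer_lon, seller_lat, seller_lon))

# In the notebook this could replace the row-wise apply (not run here):
# delivered_orders['Distance'] = haversine_km(delivered_orders['customer_lat'], delivered_orders['customer_lon'],
#                                             delivered_orders['seller_lat'], delivered_orders['seller_lon'])
```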

In [None]:
#https://www.weatherbit.io/api/weather-history-hourly

In [None]:
delivered_orders["estimated_delivery"] = (delivered_orders.order_estimated_delivery_date-delivered_orders.\
                                          order_purchase_timestamp).dt.total_seconds() / (24 * 60 * 60)


- Compare the actual delivery time with the forecasted delivery time (use the difference as an error measure).
- The forecast should be trained on the actual delivery time; the ideal model would forecast exactly how long a delivery takes.

In [None]:


features = ['order_price','quantity','freight_value','customer_lat',
           'customer_lon','seller_lat','seller_lon','Distance','estimated_delivery']
target = ['actual']

X = np.array(delivered_orders[features])
y = np.array(delivered_orders['actual'])
split_test_size = 0.30

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = split_test_size, random_state=42)

# <font color='blue'>2.2</font>: Establish baseline

# --------plot this for the best model as well--------

In [None]:
#Creating variables for both actual and estimated deliveries
actual_delivery = delivered_orders.actual
estimated_delivery = delivered_orders.estimated_delivery

In [None]:
#Plotting actual delivery times and estimated delivery times
plt.hist(actual_delivery, bins= 100, alpha=0.5, label='Actual delivery')
plt.hist(estimated_delivery, bins= 100, alpha=0.5, label='Estimated delivery')
plt.legend(loc='upper right')
plt.show()

As the plot above shows, Olist systematically overestimates delivery times. This might motivate customers to shop on another e-commerce platform that promises faster delivery.
Therefore, we want to provide a much more accurate estimated delivery time, so that customers get correct information about how quickly their products will actually arrive. 

##### How wrong is Olist's current estimate?

In this cell, we look at the estimated delivery time that Olist gives the customer upon purchase and measure the error of this estimate against the actual delivery time. 
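The error in the next cell is the root mean squared error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$, where $\hat{y}_i$ is the estimated and $y_i$ the actual delivery time in days; the later "Jens" cells use the mean absolute error $\frac{1}{n}\sum_i |\hat{y}_i - y_i|$ instead. A small sketch of both measures on toy numbers:

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error, in the same unit as the inputs (days here)."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error; less sensitive to a few very late orders than RMSE."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

estimated = [20, 25, 18]   # toy estimated delivery times (days)
actual = [10, 12, 21]      # toy actual delivery times (days)
print(round(rmse(estimated, actual), 2), round(mae(estimated, actual), 2))  # → 9.63 8.67
```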

In [None]:
# The last column of X_test is Olist's estimated delivery time (in days)
Y_estimate = X_test[:, -1]
olist_rmse = np.sqrt(((Y_estimate - y_test) ** 2).mean())

print('RMSE of Olist estimate vs. actual delivery:', round(olist_rmse, 2), 'days')

The last feature in the test and training sets (`estimated_delivery`) was only included for the calculation above and will therefore be removed.

In [None]:
#Removing the last column in x_train and x_test
X_test = X_test[:,:-1]
X_train = X_train[:,:-1]

##### Mean prediction

We predict the average delivery time for all orders: every future delivery time is predicted as the mean delivery time in the training set. This acts as a baseline to see whether even such a simple prediction is better than the estimate Olist currently provides.
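Predicting one constant for every order, the training mean is the choice that minimises squared error on the training data, and its RMSE on the test set is essentially the spread of the test delivery times: it equals their population standard deviation exactly when the train and test means coincide. A plain-Python sketch with toy values chosen so that both means are 12 days:

```python
from math import sqrt
from statistics import mean, pstdev

def mean_baseline_rmse(train_days, test_days):
    """RMSE of predicting the training-set mean for every test order."""
    prediction = mean(train_days)
    return sqrt(sum((prediction - y) ** 2 for y in test_days) / len(test_days))

train_days = [8, 10, 14, 16]   # toy training delivery times, mean 12
test_days = [9, 11, 13, 15]    # toy test delivery times, mean 12
print(mean_baseline_rmse(train_days, test_days))
print(pstdev(test_days))       # same value, since the means coincide
```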

In [None]:
#Creating a baseline predicting the average delivery time
Y_pred_baseline = y_train
Y_pred_baseline = np.mean(Y_pred_baseline)

baseline_errors = np.sqrt(((Y_pred_baseline - y_test) ** 2).mean())

print('Baseline RMSE error:', round(np.mean(baseline_errors), 2), 'days')

The baseline model shows that there is potential to be significantly more accurate than Olist's current estimate, simply by predicting the average actual delivery time from the training set. This raises the question of whether we can improve the estimate further with more advanced models that rely on features available when a customer places an order. 

##### Linear regression model

First, we use a linear regression model to see whether we can beat the baseline model. Furthermore, the linear regression model might indicate which features have an impact on the delivery time. 
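Linear regression coefficients can only be compared across features when the features share a common scale. Column-wise min-max scaling, fitted on the training data only, is one way to achieve this (it is what `sklearn.preprocessing.MinMaxScaler` does). This differs from `sklearn.preprocessing.normalize`, which rescales each *row* (sample) to unit norm and therefore mixes prices, coordinates and distances. A plain-Python sketch with invented values:

```python
def fit_min_max(rows):
    """Learn per-column (min, max) from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def transform_min_max(rows, params):
    """Rescale each column with the training (min, max).

    Training values land in [0, 1]; unseen test values may fall outside.
    Constant training columns are mapped to 0.
    """
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0 for value, (lo, hi) in zip(row, params)]
        for row in rows
    ]

train_rows = [[100.0, 2.0], [300.0, 4.0], [200.0, 3.0]]  # e.g. order_price, quantity
test_rows = [[250.0, 5.0]]
params = fit_min_max(train_rows)
print(transform_min_max(test_rows, params))  # → [[0.75, 1.5]]
```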

In [None]:
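# Scale each feature (column) to [0, 1]; fit on the training set only to avoid leakage
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)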

In [None]:
#Defining and fitting a linear regression model
lr = LinearRegression()
lr.fit(X_train, y_train);

In [None]:
#Predicting the estimated delivery times and calculating the error
predictions = lr.predict(X_test)
errors = np.sqrt(((predictions - y_test) ** 2).mean())

print('RMSE for linear regression model: ', round(np.mean(errors), 2), 'days')

In [None]:
for name, coef in zip(features, lr.coef_):
    print("Feature '%s' has the following impact on the estimate: %.4f" % (name, coef))

We see that this model performs slightly better than the baseline. Furthermore, we get an idea of which features play an important role in estimating the delivery time: price and distance are the least important features, while seller latitude and customer latitude are the most important. 

# Missing: description of the FFNN and a short summary of the importance of the features

##### FFNN

In [None]:
def root_mean_squared_error(y_true, y_pred):
    """RMSE loss for Keras (in days)."""
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

In [None]:
# Number of input features
input_ = X_train.shape[1]

# Defining the model: two hidden ReLU layers with dropout and a linear output for regression
model = Sequential()
model.add(Dense(20, input_dim=input_, activation='relu'))
model.add(Dropout(rate=0.25))
model.add(Dense(50, activation='relu'))  # input_dim is only needed on the first layer
model.add(Dropout(rate=0.5))
model.add(Dense(1, activation='linear'))
# Specifying the loss function and optimizer
model.compile(loss=root_mean_squared_error, optimizer='adam')

# Note: the test set is also used as validation data here
history = model.fit(X_train, y_train, epochs=15, batch_size=64, validation_data=(X_test, y_test))

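`val_loss` can be lower than `loss` here for two reasons: dropout is only active during training, not when the validation loss is computed, and the reported training loss is averaged over all batches of an epoch while the weights are still improving, whereas the validation loss is computed with the weights at the end of the epoch (see https://forums.fast.ai/t/validation-loss-lower-than-training-loss/4581).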
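Keras' `Dropout` layer behaves differently in training and inference. During training it zeroes a random fraction `rate` of the activations and scales the survivors by `1 / (1 - rate)` so the expected activation is unchanged; at inference it passes the input through untouched. A NumPy sketch of this "inverted dropout" (the function name is ours, not the Keras implementation):

```python
import numpy as np

def inverted_dropout(activations, rate, training, rng):
    """Zero a fraction `rate` of activations and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(100_000)
train_out = inverted_dropout(x, rate=0.5, training=True, rng=rng)
test_out = inverted_dropout(x, rate=0.5, training=False, rng=rng)
print(train_out.mean(), (train_out == 0).mean())  # mean stays close to 1, about half are zeroed
print(np.array_equal(test_out, x))                # inference leaves the input unchanged
```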

In [None]:
print(history.history.keys())
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Train error', 'Validation error'], loc='upper right')
plt.show()

In [None]:
# Predicting delivery times on the test set with the FFNN
pred = model.predict(X_test).ravel()  # flatten (n, 1) to (n,) so it aligns with y_test
ffnn_rmse = np.sqrt(np.mean((pred - y_test) ** 2))
print('FFNN RMSE:', round(ffnn_rmse, 2), 'days')

# Jens-------------

Ask Jensen:
- The cell below creates a Y_baseline that is as long as y_train. But this Y_baseline consists of the values from "delivery difference"; shouldn't this be `actual`, since we are trying to predict the actual delivery date?

In [None]:
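# Olist's own estimate for the same test orders: train_test_split with the same
# test_size and random_state yields the same row split as X_test/y_test above
_, Y_baseline = train_test_split(delivered_orders['estimated_delivery'].values,
                                 test_size=split_test_size, random_state=42)

baseline_errors = abs(Y_baseline - y_test)

print('Average absolute error of Olist estimate: ', round(np.mean(baseline_errors), 2), 'days')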

In [None]:


lr = LinearRegression()

lr.fit(X_train, y_train)

In [None]:
predictions = lr.predict(X_test)

errors = abs(predictions - y_test)

print('Average absolute error (linear regression): ', round(np.mean(errors), 2), 'days')

In [None]:
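# MAPE is undefined for orders with an actual delivery time of 0 days, so exclude them
nonzero = y_test != 0
mape = 100 * (errors[nonzero] / y_test[nonzero])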

accuracy = 100 - np.mean(mape)

print('Accuracy: ', round(accuracy, 2), '%')

In [None]:
errors

In [None]:
mape

# Jens-------------

# <font color='blue'>Part 3</font>: Prediction - Late delivery (yes or no)

This last analysis seeks to predict whether a delivery will be late or not. An online retail platform like Olist can benefit from this analysis in several ways. First, simply knowing which factors affect the delivery of orders is useful. Second, with this model deployed, Olist would be able to warn sellers and customers when an order is predicted to be delivered late. 

This model uses logistic regression to perform the binary classification, where the target variable $y$ is 1 if the order was delivered on time and 0 if it was late.

Since late orders are the minority class, the model is fitted with `class_weight='balanced'`. This setting weights the penalties for false predictions in the loss function inversely proportional to the class frequencies, which counteracts the class imbalance.
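With `class_weight='balanced'`, scikit-learn weights each class by `n_samples / (n_classes * n_samples_in_class)`. The plain-Python sketch below reproduces that formula on invented labels with 8 late orders out of 100 (roughly the late share found earlier), showing how strongly the minority class is up-weighted:

```python
from collections import Counter

def balanced_class_weights(labels):
    """'Balanced' heuristic: n_samples / (n_classes * count_per_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels: 1 = on time, 0 = late
labels = [1] * 92 + [0] * 8
print(balanced_class_weights(labels))  # late orders weigh about 11.5x more than on-time orders
```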

In [None]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import numpy as np

X = delivered_orders[['quantity','order_price','freight_value','Distance','customer_lat','customer_lon','seller_lat', 'seller_lon']]

# Target: 1 = delivered before the estimated delivery date (on time), 0 = late
y = (delivered_orders.order_estimated_delivery_date > delivered_orders.order_delivered_customer_date).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Standardize features with statistics from the training set only
scaler = preprocessing.StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

In [None]:
y.value_counts()

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Binary target, so no multi_class setting is needed
clf = LogisticRegression(random_state=0, solver='lbfgs', class_weight='balanced').fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

In [None]:
name = ['quantity','order_price','freight_value','Distance','customer_lat','customer_lon','seller_lat', 'seller_lon']
coef = clf.coef_[0]

In [None]:
list(zip(name,coef))

Interpretation of coefficients: 
- Orders with a larger quantity are more likely to be delivered on time. 
- Orders with a longer distance between seller and customer are less likely to be delivered on time.
- Orders with a high customer longitude coordinate (i.e. customers further east, close to the coast) are less likely to be delivered on time.
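Because the features are standardised with `StandardScaler`, each coefficient is the change in the log-odds of on-time delivery for a one-standard-deviation increase in that feature, holding the others fixed; exponentiating gives an odds ratio that is easier to communicate. A sketch with hypothetical coefficient values (the real ones come from `clf.coef_[0]`):

```python
from math import exp

def odds_ratios(names, coefficients):
    """Map each standardised feature to exp(coef): the multiplicative change in
    the odds of on-time delivery per one-standard-deviation increase."""
    return {name: exp(coef) for name, coef in zip(names, coefficients)}

# Hypothetical coefficients, for illustration only
ratios = odds_ratios(["quantity", "Distance"], [0.10, -0.30])
for name, ratio in ratios.items():
    print(f"{name}: odds x {ratio:.2f} per +1 SD")
```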

Features that could prove useful to this model, but have not yet been implemented:
- Weather forecast data
- Historical weather data
- Real distance between seller and customer (on roads)
- Seller track record (in terms of the number of previously late deliveries)
- Seller reviews (average)
- Previously late delivery
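A seller track record must only use information available at purchase time, otherwise the model would see the future. One option, sketched below under that assumption, is to count only the seller's orders that were already delivered before the current purchase. The function name and the dict layout (`seller_id`, `purchased`, `delivered`, `late`) are hypothetical, and the quadratic loop is meant for clarity, not for the full dataset.

```python
from datetime import datetime

def seller_late_share(orders):
    """For each order, the share of the same seller's earlier orders that were late.

    Only orders delivered before the current purchase count, so the feature uses
    information available at purchase time. None means no delivered history yet.
    """
    shares = []
    for current in orders:
        history = [
            o["late"] for o in orders
            if o["seller_id"] == current["seller_id"] and o["delivered"] < current["purchased"]
        ]
        shares.append(sum(history) / len(history) if history else None)
    return shares

orders = [
    {"seller_id": "s1", "purchased": datetime(2018, 1, 1), "delivered": datetime(2018, 1, 10), "late": True},
    {"seller_id": "s1", "purchased": datetime(2018, 1, 5), "delivered": datetime(2018, 1, 12), "late": False},
    {"seller_id": "s1", "purchased": datetime(2018, 2, 1), "delivered": datetime(2018, 2, 8), "late": False},
]
print(seller_late_share(orders))  # → [None, None, 0.5]
```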

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import geopandas as gpd

In [None]:
# Note: the 'late' column follows y above, so 1 = on time and 0 = late
delivered_orders['late'] = y

late_orders = delivered_orders[delivered_orders.late == 0]
df = late_orders[['customer_lat','customer_lon']]
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.customer_lon, df.customer_lat))

# On-time orders (note: this overwrites the original orders table)
orders = delivered_orders[delivered_orders.late == 1]
df1 = orders[['customer_lat','customer_lon']]
gdf1 = gpd.GeoDataFrame(
    df1, geometry=gpd.points_from_xy(df1.customer_lon, df1.customer_lat))

In [None]:
delivered_orders.groupby(['customer_lat','customer_lon']).agg({'late':'sum', 'customer_lat':'count'})

In [None]:
# define color dictionary: on time (1) in blue, late (0) in red
color_map = {1:"blue", 0:'red'}

In [None]:
plt.rcParams['figure.figsize'] = [14, 6]
# gpd.datasets (naturalearth_lowres) is only available in geopandas < 1.0
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# restrict to South America: one map for late orders, one for on-time orders
ax1 = world[world.continent == 'South America'].plot(
    color='white', edgecolor='black')
ax1.set_title('Late orders')

ax2 = world[world.continent == 'South America'].plot(
    color='white', edgecolor='black')
ax2.set_title('On-time orders')

colors_late = [color_map[x] for x in late_orders['late']]
gdf.plot(ax=ax1, color=colors_late, alpha=0.3, markersize=.1)

colors = [color_map[x] for x in orders['late']]
gdf1.plot(ax=ax2, color=colors, alpha=0.1, markersize=.1)
plt.show()


# <font color='blue'>Part 4</font>: Clustering - Warehouse locations

# Conclusion

- Operational aspects that we optimize to ensure a good experience during and after the purchase
- There is a problem, given that 8% of all orders are delayed
- We examine what we can descriptively infer about the causes of the delays and find that: (1) the delays are to a large extent the logistics provider's fault. (2) In a smaller share of the cases, the seller does not have the product ready when the logistics provider is supposed to 'take over'
- We then look at the estimated delivery time, which customers are informed of after the purchase. Here we see that Olist systematically communicates a much longer delivery time than the time it actually takes to deliver. We find that we can indeed learn from the data, and that Olist can give far more precise delivery estimates using the models we have built. 
- Finally, we investigate whether we can flag potentially delayed orders before the delay happens. Here we also show that different models can capture a signal of whether an order will be delivered on time. --> Future work could be to predict the actual delay whenever the model predicts that an order will be late. 


- (1) --> better negotiation of logistics agreements, e.g. imposing fines on the third party for late delivery. 
- (2) --> can be solved, for example, through warehouses and on
- (3) --> the new and better delivery estimates can be used both to attract customers and to retain customers 
- (4) --> a contingency package that can be activated for a flagged order to change the outcome, while the customer is informed as quickly as possible with a new and precise delivery estimate --> if this is done correctly, it also improves the first KPI (number of delays). 




Estimated time of delivery
- here we can track two KPIs, Customer attention and Customer retention respectively, if we can start advertising a fast and precise delivery time --> i.e. we give customers information about their delivery before they buy rather than after --> increased satisfaction

Prediction late yes/no
- the number of delays is a KPI in itself
- descriptive groundwork showing what most often goes wrong 
- further work is then to be able to say how much the package is delayed
- --> so when we get a signal of a delay, SOMETHING HAS TO HAPPEN --> the significant factors could tell us something about this.
- the idea is that we should somehow be able to flag an order and then have a toolbox that can help avoid or reduce the delay, as well as inform the customer as soon as it is discovered and give them a precise prediction of a new delivery date. 