# Executing a Supervised AI Model to Forecast Sales Anomalies
Method: TF Neural Network Classifier

Process:

- Model the data by aggregating order lines and merging them with the order info.
- Predict the standard value of HMS products within each order.
- Use the difference between the predicted value and the actual value to define the anomaly flag.

Features to be used:

- Wholesaler
- Retailer
- Line Count
- Line of HMS Product
- Sum Qty of HMS
- Value HMS

Supervised label: anomaly_label
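The process above compares a predicted standard HMS value with the actual one. As a rough illustration only, the comparison could be sketched as a relative-deviation rule. The `flag_anomaly` helper, the `predicted_value_hms` column, and the 50% tolerance are assumptions made for this sketch and are not taken from the notebook, which uses a trained classifier instead.

```python
import pandas as pd

def flag_anomaly(df, actual="value_hms", predicted="predicted_value_hms", tolerance=0.5):
    """Return 1 where the actual value deviates from the prediction by more than `tolerance` (relative)."""
    # Guard against division by zero when the predicted value is 0.
    denom = df[predicted].abs().clip(lower=1.0)
    deviation = (df[actual] - df[predicted]).abs() / denom
    return (deviation > tolerance).astype(int)

# Toy data (made up for illustration).
demo = pd.DataFrame({
    "value_hms": [100_000, 400_000, 0],
    "predicted_value_hms": [110_000, 150_000, 0],
})
demo["anomaly_label"] = flag_anomaly(demo)
```

Only the second toy order is flagged, because its actual value is far above the predicted one.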

In [27]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [28]:
orders = pd.read_csv("datasource/test_data/test_orders.csv")
prods = pd.read_csv("datasource/test_data/test_order_products.csv")
print("orders:",len(orders))
print("prods:",len(prods))

orders: 3000
prods: 27623


In [29]:
orders.isna().sum()

id                     0
wholesaler_id          0
retailer_id         1452
buyer_type             0
shipping_type          0
order_type             0
book_time              0
last_status            0
last_status_time       0
dtype: int64

In [30]:
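# Assign the result back; fillna(..., inplace=True) on a column selection may not update `orders` in newer pandas.
orders["retailer_id"] = orders["retailer_id"].fillna(0)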
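Since 1452 of the 3000 orders have no `retailer_id`, filling with 0 makes all of them look like one retailer. A possible way to keep that information is sketched below on toy data. The `retailer_missing` column is hypothetical and is not one of the features the saved model expects.

```python
import numpy as np
import pandas as pd

# Toy orders (made up) with one missing retailer.
demo_orders = pd.DataFrame({
    "id": [1, 2, 3],
    "retailer_id": [114640.0, np.nan, 190897.0],
})

# Record which rows had no retailer before overwriting the gap.
demo_orders["retailer_missing"] = demo_orders["retailer_id"].isna().astype(int)
# Assign the filled column back instead of filling in place.
demo_orders["retailer_id"] = demo_orders["retailer_id"].fillna(0)
```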

In [31]:
orders.head(2)

| | id | wholesaler_id | retailer_id | buyer_type | shipping_type | order_type | book_time | last_status | last_status_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3585206 | 137074 | 114640.0 | retailer | pick-up | app | 6/9/2020 8:46 | selesai | 6/22/2020 22:12 |
| 1 | 3585223 | 172815 | 190897.0 | retailer | pick-up | app | 6/9/2020 8:47 | pesanan-diterima | 6/11/2020 15:05 |


In [32]:
prods.head(2)

| | id | sku_id | brand | packaging | packaging_amount | amount | price | book_time | last_status | last_status_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3528407 | 013.000248.001 | PT.Mayora Indah | KRT | 11 | 1 | 118000 | 2020-06-05 08:02:29 | disetujui | 2020-06-05 08:02:29 |
| 1 | 3725739 | 001.4103159.000 | PT.HM Sampoerna | bks | 1 | 5 | 16400 | 2020-06-18 07:11:08 | disetujui | 2020-06-18 07:11:08 |


In [33]:
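# apply is_hms: flag product lines whose brand contains "sampoerna" (case-insensitive).
# A containment test also catches brands that start with the keyword, which find(...) > 0 would miss.
prods["is_hms"] = prods["brand"].astype(str).str.lower().str.contains("sampoerna", regex=False).astype(int)
prods["qty"] = prods["packaging_amount"] * prods["amount"]
prods["value"] = prods["price"] * prods["amount"]
# HMS-only quantity and value; non-HMS lines contribute 0.
prods["qty_hms"] = prods["qty"].where(prods["is_hms"] == 1, 0)
prods["value_hms"] = prods["value"].where(prods["is_hms"] == 1, 0)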
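Substring matching on `brand` depends on how the match position is tested. The toy comparison below shows that `find(...) > 0` skips a brand that begins with the keyword, while a containment test does not. "Sampoerna Retail" is a made-up brand string used only to trigger that case.

```python
import pandas as pd

brands = pd.Series(["PT.HM Sampoerna", "Sampoerna Retail", "PT.Mayora Indah", None])

# str.find returns the match position, so a brand that starts with the keyword yields 0.
by_find_gt0 = brands.apply(lambda b: 1 if str(b).lower().find("sampoerna") > 0 else 0)

# A vectorised containment test counts a match at any position, including 0.
by_contains = brands.str.contains("sampoerna", case=False, na=False).astype(int)
```

The two results differ only for the second brand, where the keyword sits at position 0.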

In [34]:
prods.head(2)

| | id | sku_id | brand | packaging | packaging_amount | amount | price | book_time | last_status | last_status_time | is_hms | qty | value | qty_hms | value_hms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3528407 | 013.000248.001 | PT.Mayora Indah | KRT | 11 | 1 | 118000 | 2020-06-05 08:02:29 | disetujui | 2020-06-05 08:02:29 | 0 | 11 | 118000 | 0 | 0 |
| 1 | 3725739 | 001.4103159.000 | PT.HM Sampoerna | bks | 1 | 5 | 16400 | 2020-06-18 07:11:08 | disetujui | 2020-06-18 07:11:08 | 1 | 5 | 82000 | 5 | 82000 |


In [36]:
prodSums = prods.groupby("id").agg({"is_hms":["count","sum"],"value":"sum","value_hms":"sum","qty_hms":"sum"}).reset_index()
prodSums.columns = ["id","line_count","line_hms","value","value_hms","qty_hms"]
prodSums.head(2)

| | id | line_count | line_hms | value | value_hms | qty_hms |
|---|---|---|---|---|---|---|
| 0 | 3473338 | 3 | 3 | 359000 | 359000 | 15 |
| 1 | 3473442 | 3 | 0 | 653000 | 0 | 0 |
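The per-order aggregation can also be written with pandas named aggregation, which sets the output column names in the same call and avoids renaming a multi-level header afterwards. The sketch below uses made-up toy rows, not the notebook's data.

```python
import pandas as pd

# Toy product lines (made up): order 1 has two lines, order 2 has one.
demo_prods = pd.DataFrame({
    "id": [1, 1, 2],
    "is_hms": [1, 0, 0],
    "value": [82000, 118000, 50000],
    "value_hms": [82000, 0, 0],
    "qty_hms": [5, 0, 0],
})

# Each keyword is an output column: (source column, aggregation function).
demo_sums = demo_prods.groupby("id", as_index=False).agg(
    line_count=("is_hms", "count"),
    line_hms=("is_hms", "sum"),
    value=("value", "sum"),
    value_hms=("value_hms", "sum"),
    qty_hms=("qty_hms", "sum"),
)
```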


In [37]:
orders = orders.merge(prodSums, how="left", on="id")
orders.head(2)

| | id | wholesaler_id | retailer_id | buyer_type | shipping_type | order_type | book_time | last_status | last_status_time | line_count | line_hms | value | value_hms | qty_hms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3585206 | 137074 | 114640.0 | retailer | pick-up | app | 6/9/2020 8:46 | selesai | 6/22/2020 22:12 | 8 | 0 | 476600 | 0 | 0 |
| 1 | 3585223 | 172815 | 190897.0 | retailer | pick-up | app | 6/9/2020 8:47 | pesanan-diterima | 6/11/2020 15:05 | 10 | 4 | 857500 | 278100 | 15 |
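A left merge keeps orders that have no product lines and leaves their aggregate columns as NaN. One way to check for this is sketched below on toy data, using the `validate` and `indicator` options of `DataFrame.merge`. Filling the gaps with 0 is an assumption of this sketch, and the notebook does not say how such orders should be treated.

```python
import pandas as pd

# Toy frames (made up): order 11 has no product lines.
demo_orders = pd.DataFrame({"id": [10, 11, 12], "wholesaler_id": [1, 1, 2]})
demo_sums = pd.DataFrame({"id": [10, 12], "line_count": [8, 10], "value_hms": [0, 278100]})

merged = demo_orders.merge(
    demo_sums,
    how="left",
    on="id",
    validate="one_to_one",  # raise if an id repeats on either side
    indicator=True,         # adds a _merge column: "both" or "left_only"
)

# Orders that found no matching aggregate row.
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()

# Assumption: an order without product lines counts as zero lines and zero value.
agg_cols = ["line_count", "value_hms"]
merged[agg_cols] = merged[agg_cols].fillna(0)
merged = merged.drop(columns="_merge")
```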


In [39]:
feed_cols = ["wholesaler_id","retailer_id","line_count","line_hms","value","value_hms","qty_hms"]
model = tf.keras.models.load_model('aimodel/ws_all')

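def predict(mdl, x, threshold=65):
    """Return 1 if the predicted anomaly probability (in percent) exceeds `threshold`, else 0."""
    # Wrap each feature value in a length-1 batch keyed by feature name.
    input_dict = {name: tf.convert_to_tensor([value]) for name, value in x.items()}
    predictions = mdl.predict(input_dict)
    # The model output is treated as a logit and converted to a percentage.
    prob = tf.nn.sigmoid(predictions[0])[0] * 100
    return 1 if prob > threshold else 0

# Score one order per call; row-by-row calls are slow and are the likely source of the retracing warning shown below.
orders["anomaly_label"] = orders[feed_cols].apply(lambda x: predict(model, x), axis=1)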

W0929 01:54:53.449408 4640052672 def_function.py:120] 5 out of the last 5 calls to <function recreate_function.<locals>.restored_function_body at 0x14b7e96a8> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
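Calling the model once per row is what the retracing warning points at. A batched variant is sketched below, assuming the saved model accepts a dict of 1-D arrays keyed by feature name with one element per order. That matches the shape of the per-row call, where each input is a length-1 batch, but it has not been verified against `aimodel/ws_all`. The helpers `make_batch_inputs` and `logits_to_labels` are hypothetical names, and the logits in the toy check are made up.

```python
import numpy as np
import pandas as pd

def make_batch_inputs(df, cols):
    """Build one array per feature so the model sees all rows in a single call."""
    return {name: df[name].to_numpy(dtype="float64") for name in cols}

def logits_to_labels(logits, threshold=65):
    """Map raw model outputs (assumed to be logits) to 0/1 using a percent threshold."""
    logits = np.asarray(logits, dtype="float64").reshape(-1)
    prob_pct = 100.0 / (1.0 + np.exp(-logits))  # sigmoid, scaled to percent
    return (prob_pct > threshold).astype(int)

# Toy check with made-up logits standing in for model output of shape (N, 1).
demo_logits = np.array([[-2.0], [0.0], [1.0]])
demo_labels = logits_to_labels(demo_logits)

# With the notebook's objects this would read (not run here):
# inputs = make_batch_inputs(orders, feed_cols)
# orders["anomaly_label"] = logits_to_labels(model.predict(inputs))
```

Only the last toy logit maps to a probability above 65%, so only that row is labelled 1.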

In [40]:
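# Reload the raw orders and attach only the predicted label.
so = pd.read_csv("datasource/test_data/test_orders.csv").merge(orders[["id","anomaly_label"]],how="left",on="id")
# index=False keeps the row index out of the file, so no "Unnamed: 0" column appears when it is read back.
so.to_csv("dataresult/supervised_orders.csv", index=False)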

In [41]:
len(so[so["anomaly_label"]==1])


59
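Beyond the raw count, the share of flagged orders and their spread across wholesalers can help judge whether the threshold is reasonable. The sketch below runs on a made-up frame with the same `wholesaler_id` and `anomaly_label` columns as `so`.

```python
import pandas as pd

# Toy labelled orders (made up for illustration).
demo_so = pd.DataFrame({
    "wholesaler_id": [137074, 137074, 172815, 172815, 172815],
    "anomaly_label": [0, 1, 0, 0, 1],
})

# Number and share of orders flagged as anomalous.
flagged = int((demo_so["anomaly_label"] == 1).sum())
share = flagged / len(demo_so)

# Flagged orders per wholesaler, largest first.
per_wholesaler = (
    demo_so.groupby("wholesaler_id")["anomaly_label"].sum().sort_values(ascending=False)
)
```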

The labelled orders are saved to [dataresult/supervised_orders.csv](dataresult/supervised_orders.csv).