<a id='ldata'></a>
 <a id = "11115"></a><br>

<div>  
<h1><center style="background-color:#c0c0c0; color:black;"><strong>  📈Transaction Forecasting (ML Approaches for Time Series )🚀 </strong></center></h1>
</div>

<img src="https://github.com/Mustafa-YAZICI/-Transaction-Forecasting-ML-Approaches-for-Time-Series-/blob/main/tsf.jpg?raw=true" width="600">


<div>  
<h1><center style="background-color:#c0c0c0; color:black;"><strong>TABLE OF CONTENTS </strong></center></h1>
</div>

1. [Description](#1)
2. [Loading libraries and data](#2)
3. [Understanding the data](#3)

4. [Date Features](#4)
5. [Lag/Shifted Features](#5)
6. [Rolling Mean Features](#6)
7. [Exponentially Weighted Mean Features](#7)
8. [Special Days](#8)
9. [One-Hot Encoding](#9)
10. [Custom Cost Function](#10)

11. [Train, Test, and Validation Sets](#11)
12. [LightGBM Model](#12)
13. [SMAPE](#13)
14. [Result](#14)

 <a id = "1"></a><br>

<div>  
<h1><center style="background-color:#c0c0c0; color:black;"><strong>1. DESCRIPTION </strong></center></h1>
</div>
<a id="Description"></a>

<div style="border-radius:10px;
            border : black solid;
            background-color: #FFFFFF;
            font-size:110%;
            text-align: left">

    
    
## Expected Transaction Volume Estimation of Merchants
    
### Business Problem

Iyzico is a financial technology company that facilitates online shopping for both buyers and sellers. It provides payment infrastructure for e-commerce companies, marketplaces, and individual users. In this project, we forecast the total transaction volume for the first 3 months of 2021.

### Dataset & Variables

> The dataset contains 2018-2021 data for 7 merchants.

* **transaction_date :** Date of sales
* **merchant_id :** ID of the merchant (a unique number for each merchant)
* **Total_Transaction :** Number of transactions
* **Total_Paid :** Payment amount

<a id = "2" ></a>
# <span style="font-family:serif; font-size:28px;"> 2. Loading libraries and data</span>
<a id="loading"></a>

In [1]:
import time
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import lightgbm as lgb
import warnings

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('iyzico_data.csv', parse_dates=['transaction_date'], index_col=0)
df.shape

In [None]:
df.head(5)

<a id = "3" ></a>
# <span style="font-family:serif; font-size:28px;"> 3. Understanding the data</span>
<a id = "Undertanding the data" ></a>

In [None]:
df.info()

In [None]:
merchant_ids=df["merchant_id"].unique()
merchant_ids

In [None]:
dfx=df[df["merchant_id"]==merchant_ids[2]]
forecast = pd.DataFrame({
    "transaction_date":dfx.transaction_date,
    "Total_Transaction":dfx.Total_Transaction
})

forecast.set_index("transaction_date").Total_Transaction.plot(color = "orange", figsize = (20,6),legend=True);

We pick one merchant here and build the forecast on it. The same steps can be repeated for the other merchants.

In [None]:
df=df[df["merchant_id"]==merchant_ids[2]].copy()  # copy so later column assignments do not touch a view
df.head(3)

<a id = "4" ></a>
# <span style="font-family:serif; font-size:28px;"> 4. Date Features</span>

In [None]:
def create_date_features(df):
    df['month'] = df.transaction_date.dt.month
    df['day_of_month'] = df.transaction_date.dt.day
    df['day_of_year'] = df.transaction_date.dt.dayofyear
    # dt.weekofyear was removed in pandas 2.0; isocalendar().week is the replacement
    df['week_of_year'] = df.transaction_date.dt.isocalendar().week.astype(int)
    df['day_of_week'] = df.transaction_date.dt.dayofweek
    df['year'] = df.transaction_date.dt.year
    df["is_wknd"] = df.transaction_date.dt.weekday // 4  # 1 for Friday-Sunday, 0 otherwise
    df['is_month_start'] = df.transaction_date.dt.is_month_start.astype(int)
    df['is_month_end'] = df.transaction_date.dt.is_month_end.astype(int)
    return df

df = create_date_features(df)
df.head(3)
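
The `is_wknd` flag relies on integer division. pandas numbers weekdays from Monday = 0 to Sunday = 6, so `weekday // 4` is 1 for Friday, Saturday, and Sunday and 0 for the other days. A minimal check on one arbitrary week (the dates are illustrative, not taken from the dataset):

```python
import pandas as pd

# One arbitrary week, Monday 2020-01-06 through Sunday 2020-01-12
week = pd.DataFrame({"transaction_date": pd.date_range("2020-01-06", periods=7, freq="D")})
week["day_name"] = week.transaction_date.dt.day_name()
week["is_wknd"] = week.transaction_date.dt.weekday // 4

print(week[["day_name", "is_wknd"]])
```

Note that Friday is counted as part of the weekend under this rule.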

In [None]:
df.groupby(["merchant_id","year"]).agg({"Total_Transaction": ["count"]}).T

In [None]:
for merchant_id in df.merchant_id.unique():
    for year in df.year.unique():
        plt.figure(figsize=(10, 5))
        sns.set(font_scale=1)
        sns.barplot(x="day_of_year", y="Total_Transaction", data=df[(df["merchant_id"]==merchant_id) & (df["year"]==year)])
        plt.title(str(year)+"_"+str(merchant_id))
        plt.tight_layout()
        plt.show()
    break

In [None]:
########################
# Random Noise
########################

def random_noise(dataframe):
    return np.random.normal(scale=1.6, size=(len(dataframe),))

<a id = "5" ></a>
# <span style="font-family:serif; font-size:28px;"> 5. Lag/Shifted Features</span>

In [None]:
########################
# Lag/Shifted Features
########################

df.sort_values(by=['merchant_id', 'transaction_date'], axis=0, inplace=True)
df.head(3)

In [None]:
def lag_features(dataframe, lags):
    for lag in lags:
        dataframe['Total_Transaction_lag_' + str(lag)] = dataframe.groupby(["merchant_id"])['Total_Transaction'].transform(
            lambda x: x.shift(lag)) + random_noise(dataframe)
    return dataframe

df = lag_features(df, [91, 98, 105, 112, 119, 126, 182, 364, 546, 728])
df
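
To see what a lag feature contains, the sketch below applies the same `groupby(...).transform(lambda x: x.shift(lag))` pattern to a toy frame with two hypothetical merchants and a lag of 2, leaving out the random noise. Each merchant's series is shifted independently, so values never cross merchant boundaries, and the first `lag` rows of each merchant are `NaN`.

```python
import pandas as pd

# Toy data: two hypothetical merchants with four daily observations each
toy = pd.DataFrame({
    "merchant_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Total_Transaction": [10, 20, 30, 40, 100, 200, 300, 400],
})

lag = 2
toy["Total_Transaction_lag_2"] = toy.groupby("merchant_id")["Total_Transaction"].transform(
    lambda x: x.shift(lag))

print(toy)
```

The smallest lag used above is 91 days, presumably because the forecast horizon is about three months: a lag shorter than the horizon would not be known yet at prediction time.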

<a id = "6" ></a>
# <span style="font-family:serif; font-size:28px;"> 6. Rolling Mean Features</span>

In [None]:
########################
# Rolling Mean Features
########################

def roll_mean_features(dataframe, windows):
    for window in windows:
        dataframe['Total_Paid_roll_mean_' + str(window)] = dataframe.groupby(["merchant_id"])['Total_Paid']. \
                                                          transform(
            lambda x: x.shift(1).rolling(window=window, min_periods=10, win_type="triang").mean()) + random_noise(
            dataframe)
    return dataframe


df = roll_mean_features(df, [91,120,182,365, 546,728])
df
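
The `shift(1)` before `rolling(...)` keeps the current day's value out of its own window. A minimal sketch on a toy series, using a plain (unweighted) window of 3 instead of the triangular weighting above, which needs SciPy:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Without the shift, the window ending at position t includes s[t] itself
leaky = s.rolling(window=3, min_periods=1).mean()

# With shift(1), the window ending at position t only covers values before t
safe = s.shift(1).rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "leaky": leaky, "safe": safe}))
```

In the `safe` column the first row is `NaN` because no earlier value exists, and every later row averages only past observations.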

In [None]:
# ########################
# # Rolling Mean Features
# ########################


# def roll_mean_features(dataframe, windows):
#     for window in windows:
#         dataframe['Total_Transaction_roll_mean_' + str(window)] = dataframe.groupby(["merchant_id"])['Total_Transaction']. \
#                                                           transform(
#             lambda x: x.shift(1).rolling(window=window, min_periods=10, win_type="triang").mean()) + random_noise(
#             dataframe)
#     return dataframe


# df = roll_mean_features(df, [15,30,45,60,120,182,365, 546,728])
# df

<a id = "7" ></a>
# <span style="font-family:serif; font-size:28px;"> 7. Exponentially Weighted Mean Features</span>

In [None]:
########################
# Exponentially Weighted Mean Features
########################

def ewm_features(dataframe, alphas, lags):
    for alpha in alphas:
        for lag in lags:
            dataframe['Total_Transaction_ewm_alpha_' + str(alpha).replace(".", "") + "_lag_" + str(lag)] = \
                dataframe.groupby(["merchant_id"])['Total_Transaction'].transform(lambda x: x.shift(lag).ewm(alpha=alpha).mean())
    return dataframe

alphas = [0.95, 0.9, 0.8, 0.7, 0.5]
lags = [91, 98, 105, 112, 180, 270, 365, 546, 728]

df = ewm_features(df, alphas, lags)
df
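
With the default `adjust=True`, `ewm(alpha=alpha).mean()` is a weighted average in which the observation `i` steps back gets weight `(1 - alpha)^i`. An `alpha` close to 1 therefore tracks the most recent values, while a smaller `alpha` smooths over a longer history. The sketch below reproduces the last smoothed value by hand for a toy series:

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])
alpha = 0.5

ewm_last = s.ewm(alpha=alpha).mean().iloc[-1]

# Manual weighted average: weights (1 - alpha)^i for i = 0 (newest), 1, 2 (oldest)
weights = [(1 - alpha) ** i for i in range(len(s))]
manual = sum(w * x for w, x in zip(weights, s.iloc[::-1])) / sum(weights)

print(ewm_last, manual)
```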

In [None]:
df.groupby(["day_of_year"]).agg({"Total_Paid": ["sum"]})

<a id = "8" ></a>
# <span style="font-family:serif; font-size:28px;"> 8. Special Days</span>

In [None]:
# Flag days of the year whose mean Total_Paid is far from the overall mean
imported_days=[]
ort_=df["Total_Paid"].mean()  # overall mean payment amount
for day in df["day_of_year"].unique():
    dfx=df[df["day_of_year"]==day]["Total_Paid"]
    ratio=int(100*dfx.mean()/ort_)  # this day's mean as a percentage of the overall mean
    if ratio<50 or ratio>150:
        imported_days.append(day)
        if len(imported_days)<5:
            print(day,ratio,dfx.sum()/dfx.std())

In [None]:
df[df["day_of_year"]==26][["Total_Paid","merchant_id"]]

In [None]:
for day in imported_days:
    df["day_"+str(day)]=(df["day_of_year"]==day).astype(int)  # 1 on that day of the year, else 0
df
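
The special-day rule above keeps any day of the year whose mean `Total_Paid` is below 50% or above 150% of the overall mean. A vectorised sketch of the same idea on made-up numbers (the names `overall_mean` and `special_days` are illustrative, and this variant skips the integer truncation used above):

```python
import pandas as pd

# Made-up payments for three days of the year, two observations each
toy = pd.DataFrame({
    "day_of_year": [1, 1, 2, 2, 3, 3],
    "Total_Paid": [100.0, 120.0, 10.0, 20.0, 300.0, 350.0],
})

overall_mean = toy["Total_Paid"].mean()
# Each day's mean as a percentage of the overall mean
ratio = 100 * toy.groupby("day_of_year")["Total_Paid"].mean() / overall_mean
special_days = ratio[(ratio < 50) | (ratio > 150)].index.tolist()

print(ratio.round(1).to_dict(), special_days)
```

Here day 2 is flagged for being unusually low and day 3 for being unusually high, while day 1 stays within the band.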

<a id = "9" ></a>
# <span style="font-family:serif; font-size:28px;"> 9. One-Hot Encoding</span>

In [None]:
########################
# One-Hot Encoding
########################

df = pd.get_dummies(df, columns=['merchant_id', 'day_of_week', "week_of_year",'month'])
df

In [None]:
########################
# Converting Total_Paid to log(1 + Total_Paid)
########################

df['Total_Paid'] = np.log1p(df["Total_Paid"].values)
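
`np.log1p` computes `log(1 + x)`, which compresses large payment amounts and stays defined at zero. `np.expm1` is its inverse and is used later to bring predictions back to the original scale. A quick round-trip check with made-up amounts:

```python
import numpy as np

amounts = np.array([0.0, 9.0, 999.0, 1_000_000.0])  # illustrative values only

logged = np.log1p(amounts)    # log(1 + x)
restored = np.expm1(logged)   # exp(y) - 1

print(logged)
print(restored)
```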

<a id = "10" ></a>
# <span style="font-family:serif; font-size:28px;"> 10. Custom Cost Function</span>

In [None]:
########################
# Custom Cost Function
########################

# MAE, MSE, RMSE, SSE

# MAE: mean absolute error
# MAPE: mean absolute percentage error
# SMAPE: Symmetric mean absolute percentage error (adjusted MAPE)

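def smape(preds, target):
    n = len(preds)  # taken before masking, so dropped pairs still count in the average
    masked_arr = ~((preds == 0) & (target == 0))  # skip pairs where both are 0 (0/0 is undefined)
    preds, target = preds[masked_arr], target[masked_arr]
    num = np.abs(preds - target)
    denom = np.abs(preds) + np.abs(target)
    smape_val = (200 * np.sum(num / denom)) / n  # percentage between 0 and 200
    return smape_val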


def lgbm_smape(preds, train_data):
    labels = train_data.get_label()
    smape_val = smape(np.expm1(preds), np.expm1(labels))
    return 'SMAPE', smape_val, False
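
As implemented above, SMAPE is `200 / n * sum(|pred - target| / (|pred| + |target|))`, so it is 0 for a perfect forecast and at most 200. A small worked example with made-up numbers; the function is repeated in compact form so the snippet runs on its own:

```python
import numpy as np

def smape(preds, target):
    # Same definition as above, condensed
    n = len(preds)
    mask = ~((preds == 0) & (target == 0))
    preds, target = preds[mask], target[mask]
    return 200 * np.sum(np.abs(preds - target) / (np.abs(preds) + np.abs(target))) / n

preds = np.array([100.0, 200.0])    # made-up predictions
target = np.array([110.0, 180.0])   # made-up actuals

# By hand: 200 / 2 * (10 / 210 + 20 / 380)
score = smape(preds, target)
print(round(score, 4))
```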

<a id = "11" ></a>
# <span style="font-family:serif; font-size:28px;"> 11. Train, Test, and Validation Sets</span>

In [None]:
########################
# Time-Based Validation Sets
########################

# Training set: everything before the start of 2020 (through the end of 2019).
train = df.loc[(df["transaction_date"] < "2020-01-01"), :]

# Validation set: the first 3 months of 2020.
val = df.loc[(df["transaction_date"] >= "2020-01-01") & (df["transaction_date"] < "2020-04-01"), :]

cols = [col for col in train.columns if col not in ['transaction_date', "Total_Transaction","Total_Paid"]]

Y_train = train['Total_Paid']
X_train = train[cols]

Y_val = val['Total_Paid']
X_val = val[cols]

Y_train.shape, X_train.shape, Y_val.shape, X_val.shape
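
Because this is a time series, the split is made by date rather than at random, so the model is validated only on data that comes after everything it was trained on. A small sketch with a synthetic daily date range (not the real dataset) confirms that the two sets are disjoint and ordered in time:

```python
import pandas as pd

# Synthetic daily dates spanning the same boundaries as the split above
toy = pd.DataFrame({"transaction_date": pd.date_range("2019-10-01", "2020-06-30", freq="D")})

train = toy.loc[toy["transaction_date"] < "2020-01-01"]
val = toy.loc[(toy["transaction_date"] >= "2020-01-01") & (toy["transaction_date"] < "2020-04-01")]

print(train["transaction_date"].max(), val["transaction_date"].min(), len(val))
```

With complete daily data the validation window covers 91 days (2020 is a leap year), which matches the smallest lag of 91 used in the lag features.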


In [None]:
X_train.head(2)

<a id = "12" ></a>
# <span style="font-family:serif; font-size:28px;"> 12. LightGBM Model</span>

In [None]:
########################
# Time Series Model with LightGBM
########################

# LightGBM parameters
lgb_params = {'num_leaves': 10,
              'learning_rate': 0.02,
              'feature_fraction': 0.8,
              'max_depth': 5,
              'verbose': 0,
              'nthread': -1}

# Training-loop settings, passed to lgb.train() rather than as model parameters
num_boost_round = 10000
early_stopping_rounds = 300

lgbtrain = lgb.Dataset(data=X_train, label=Y_train, feature_name=cols)

lgbval = lgb.Dataset(data=X_val, label=Y_val, reference=lgbtrain, feature_name=cols)

# train == fit
# Early stopping and logging are set via callbacks; the early_stopping_rounds and
# verbose_eval keyword arguments were removed from lgb.train() in LightGBM 4.0.
model = lgb.train(lgb_params, lgbtrain,
                  valid_sets=[lgbtrain, lgbval],
                  num_boost_round=num_boost_round,
                  feval=lgbm_smape,  # a custom evaluation metric goes here
                  callbacks=[lgb.early_stopping(stopping_rounds=early_stopping_rounds),
                             lgb.log_evaluation(period=10)])  # report every 10 iterations
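
A custom evaluation function passed through `feval` receives the raw predictions and the `lgb.Dataset` being evaluated, and returns a tuple `(metric_name, value, is_higher_better)`. The third element is `False` for SMAPE because lower is better, which is how early stopping knows in which direction the validation score should move. The sketch below exercises that contract without training a model; `FakeDataset` is a hypothetical stand-in that only mimics `get_label()`, and the numbers are made up.

```python
import numpy as np

class FakeDataset:
    """Hypothetical stand-in for lgb.Dataset; only get_label() is mimicked."""
    def __init__(self, labels):
        self._labels = np.asarray(labels, dtype=float)

    def get_label(self):
        return self._labels

def smape(preds, target):
    n = len(preds)
    mask = ~((preds == 0) & (target == 0))
    preds, target = preds[mask], target[mask]
    return 200 * np.sum(np.abs(preds - target) / (np.abs(preds) + np.abs(target))) / n

def lgbm_smape(preds, train_data):
    # Labels and predictions are on the log1p scale, so undo it before scoring
    labels = train_data.get_label()
    return 'SMAPE', smape(np.expm1(preds), np.expm1(labels)), False

labels_log = np.log1p([100.0, 200.0])   # made-up targets on the log scale
preds_log = np.log1p([110.0, 180.0])    # made-up predictions on the log scale
name, value, higher_is_better = lgbm_smape(preds_log, FakeDataset(labels_log))
print(name, round(value, 4), higher_is_better)
```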



<a id = "13" ></a>
# <span style="font-family:serif; font-size:28px;"> 13. SMAPE</span>

In [None]:
y_pred_val = model.predict(X_val, num_iteration=model.best_iteration)

smape(np.expm1(y_pred_val), np.expm1(Y_val))   # np.expm1() undoes the log1p transform, restoring the original scale

In [None]:
########################
# Feature Importance
########################

def plot_lgb_importances(model, plot=False, num=10):
    gain = model.feature_importance('gain')
    feat_imp = pd.DataFrame({'feature': model.feature_name(),
                             'split': model.feature_importance('split'),
                             'gain': 100 * gain / gain.sum()}).sort_values('gain', ascending=False)
    if plot:
        plt.figure(figsize=(10, 10))
        sns.set(font_scale=1)
        sns.barplot(x="gain", y="feature", data=feat_imp[0:25])
        plt.title('feature')
        plt.tight_layout()
        plt.show()
    else:
        print(feat_imp.head(num))
    return feat_imp

In [None]:
plot_lgb_importances(model, num=30, plot=True)

In [None]:
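dff=pd.DataFrame(np.expm1(Y_val))
dff["pred"]=np.expm1(y_pred_val)
dff["diff"]=dff["Total_Paid"]-dff["pred"]  # signed error, assumed here to be actual minus predicted
dff.head(3)

In [None]:
dff["diff"].mean()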

<a id = "14" ></a>
# <span style="font-family:serif; font-size:28px;"> 14. Result</span>

In [None]:
dff['Total_Paid'].plot(legend=True, label="Total_Paid")
dff['pred'].plot(legend=True, label="pred", figsize=(16, 8))
plt.show()

**Estimates and actual data largely overlap.**

**Great result 👍**

<img src="https://amessagewithabottle.com/wp-content/uploads/2019/09/moneyball-1024x550.jpg" width="500">

I hope you gained some insight into machine learning. Some of my other machine learning notebooks:

❖ **House Price** Prediction, EDA, ML, Visualization : https://www.kaggle.com/code/mustafayazici/house-price-eda-ml-visualization

❖ **Salary** Prediction with ML Models - ⚾Baseball  : https://www.kaggle.com/code/mustafayazici/salary-prediction-with-ml-models-basebal

❖ **Customer Churn** Prediction : https://www.kaggle.com/code/mustafayazici/telco-churn-prediction

❖ **Customer Segmentation with K-Means Clustering**  : https://www.kaggle.com/code/mustafayazici/customer-segmentation-with-k-means-clustering

❖ **Customer Segmentation with Hierarchical Clustering**  : https://www.kaggle.com/code/mustafayazici/customer-segmentation-with-hierarchical-clustering


❖ **PCA & Clustering :** https://www.kaggle.com/code/mustafayazici/makine-ogrenmesi-modeli-pca-clustering

❖ **User Based** Recommendation : https://www.kaggle.com/code/mustafayazici/user-based-recommendation-system

❖ **Content Based** Recommendation : https://www.kaggle.com/code/mustafayazici/content-based-recommendation

❖ **ARL** Recommendation : https://www.kaggle.com/code/mustafayazici/association-rule-learning-recommendation-system

❖ **My Kaggle** https://www.kaggle.com/mustafayazici/code

❖ **My github** https://github.com/Mustafa-YAZICI?tab=repositories

<img src="https://i.imgflip.com/4of9h0.jpg" width="300">

Please upvote **if you liked** the kernel! 😀

KEEP KAGGLING!