# Prediction of Job Classification by Manual Classification

Classifying Jobs is quite a challenge, especially when we are dealing with large amount of reports. However, in this markdown, I would like to demonstrate how we can do a prediction for jobs and score based on the classified jobs that is manually categorized. Although we might have a better solution regarding Job Classification rather than manually determining which catergory a job report belongs to. We only care about the end result, which is the prediction.

#### *Import packages*

In [21]:
import pandas as pd
import numpy as np
import statistics
import nltk
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
from imblearn.over_sampling import RandomOverSampler
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

#### *Import dataset*

In [None]:
proms_df = pd.read_csv("CSV/PROMS_API_EXTRA.csv")
proms_df

#### *Text processing*

As the HPAM_ATHENA.ipynb has explained about text processing, I will import the processed text and embed the text to the dataset to speed up the process.

In [23]:
proms_df.title = pd.read_csv("Stemmed_Result/stemmed_title.txt").title
proms_df.remarks = pd.read_csv("Stemmed_Result/stemmed_remarks.txt").remarks

#### *Checking empty values*

Sometimes text processing creates empty values. The reason might be because of the text only contains noise (stopwords, punctuations, white space, etc). Ensuring that our data is free from null values is a must.

In [24]:
proms_df.isnull().sum()

title                 2
remarks               5
time_to_completion    0
complexity            0
related_parties       0
score                 0
created_at            0
office                0
division              0
word_count            0
dtype: int64

The number of empty values is apparently so little that it will not create any significant impact if we remove them. As such, I will remove the empty values.

#### *Removing empty values*

In [25]:
proms_df = proms_df[(proms_df.title.isnull() == False) & (proms_df.remarks.isnull() == False)].reset_index(drop=True)

In [26]:
proms_df.isnull().sum()

title                 0
remarks               0
time_to_completion    0
complexity            0
related_parties       0
score                 0
created_at            0
office                0
division              0
word_count            0
dtype: int64

## Manually Classifying Jobs

With the help of Unigram and Bigram, we can determine which word or words oftenly used in our data. Thus we can decide which to name those jobs based on the Unigram and Bigram.

#### *Counts of tasks*

In [None]:
pd.DataFrame(proms_df.title.value_counts())

As we can see above, the top 3 tasks seems to be a call. If we take a closer look, we should not see any difference between "call nasabah" and "calls" as they are generally the same kind of job. This is the reason why we need to properly classify the jobs.

#### *Unigram*

In [None]:
tokenized_title = pd.Series(np.concatenate(list(proms_df.title.str.split())))

bahasaStopwords = StopWordRemoverFactory().get_stop_words()
clean_tokenized_title = tokenized_title[~tokenized_title.isin(bahasaStopwords)]

unigram_freq = pd.DataFrame(clean_tokenized_title).value_counts().sort_values(ascending=False)
unigram_freq

#### *Bigram*

In [None]:
bigram = []
for value in proms_df.title:
    tokenized = value.split()
    tokenized = [word for word in tokenized if word not in bahasaStopwords]
    zipped = nltk.ngrams(tokenized, 2)
    lst = []
    for item in zipped:
        lst.append(item)
    bigram.extend(lst)

bigram_freq = pd.DataFrame(bigram).value_counts().sort_values(ascending=False)
bigram_freq

From the unigram and bigram data counts above, we can determine which word or words is the most important and used it as the new task title.

#### *Observing Unigrams*

In [None]:
observe_call = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("call")].value_counts("title")).reset_index()
observe_hubung = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("hubu")].value_counts("title").reset_index())
observe_telpon = pd.DataFrame(proms_df.loc[(proms_df.title.str.contains("telp")) | (proms_df.title.str.contains("tele"))].value_counts("title").reset_index())
observe_meeting = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("meeting")].value_counts("title")).reset_index()
observe_transaction = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("trans")].value_counts("title")).reset_index()
observe_update = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("update")].value_counts("title")).reset_index()
observe_up = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("up ")].value_counts("title")).reset_index()
observe_review = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("review")].value_counts("title").reset_index())
observe_prospek = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("prosp")].value_counts("title").reset_index())
observe_cl = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("cl")].value_counts("title").reset_index())
observe_data = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("data")].value_counts("title").reset_index())
observe_daily = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("daily")].value_counts("title").reset_index())
observe_siar = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("siar")].value_counts("title").reset_index())
observe_nav = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("nav")].value_counts("title").reset_index())
observe_end = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("end of")].value_counts("title").reset_index())
observe_siap = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("siap")].value_counts("title").reset_index())
observe_input = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("input")].value_counts("title").reset_index())
observe_surat = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("surat")].value_counts("title").reset_index())
observe_email = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("email")].value_counts("title").reset_index())
observe_nasabah = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("nasabah")].value_counts("title").reset_index())
observe_koordinasi = pd.DataFrame(proms_df.loc[proms_df.title.str.contains("koordinasi|kordinasi|coordination")].value_counts("title").reset_index())

observe_call

We might be able to determine which word or words is the most important and change the title which falls under the same category with that title. However, we should make sure whether other tasks that is not on the same category also follows the same keyword. For example, we can see that keyword "call" also pull out "concall" which should not be in the same category as concall is supposed to be an online meeting and not a call.

#### *Renaming tasks that falls into the same category*

In [31]:
proms_df.title.loc[(proms_df.title.str.contains("call|hubu|telp|tele|telf")) &
                   (~proms_df.title.str.contains("meeting|concall|video|conf|con call|extension|telegram|cek|lapor|\
                                                  review|setting|salur|bisa|surat|troubleshoot|mati|ubah|biaya|\
                                                  input|telecom|tagih|bank|cek"))] = "call"
proms_df.title.loc[proms_df.title.str.contains("meeting|concal|con call|video call|confere|confr|zoom")] = "meeting"
proms_df.title.loc[proms_df.title.str.contains("trans")] = "transaction"
proms_df.title.loc[(proms_df.title.str.contains("update")) &
                   (~proms_df.title.str.contains("saldo"))] = "update"
proms_df.title.loc[proms_df.title.str.contains("saldo")] = "balance"
proms_df.title.loc[(proms_df.title.str.contains("up")) &
                   (proms_df.title.str.contains("follow"))] = "follow-up"
proms_df.title.loc[proms_df.title.str.contains("visit|kunj")] = "visit"
proms_df.title.loc[(proms_df.title.str.contains("prospek|propek|prospect")) &
                   (~proms_df.title.str.contains("prospektus|prospectus"))] = "prospect"
proms_df.title.loc[(proms_df.title.str.contains("review"))] = "review"
proms_df.title.loc[(proms_df.title.str.contains("cl|confirmation letter")) &
                   (~proms_df.title.str.contains("cl[aiuoe]"))] = "confirmation-letter"
proms_df.title.loc[proms_df.title.str.contains("data")] = "data"
proms_df.title.loc[proms_df.title.str.contains("daily")] = "daily"
proms_df.title.loc[proms_df.title.str.contains("siar")] = "siar"
proms_df.title.loc[proms_df.title.str.contains("nav")] = "net-asset-value"
proms_df.title.loc[proms_df.title.str.contains("siap|eod|end of day")] = "siap"
proms_df.title.loc[proms_df.title.str.contains("input")] = "input"
proms_df.title.loc[proms_df.title.str.contains("surat|email")] = "mail"
proms_df.title.loc[proms_df.title.str.contains("koordinasi|kordinasi|coordination")] = "coordination"
proms_df.title.loc[~proms_df.title.isin(["call", "meeting",  "transaction", "update", "balance",\
                                         "follow-up", "visit", "prospect", "review", "confirmation-letter",\
                                         "data", "daily", "siar", "net-asset-value", "siap", "input",\
                                         "mail", "coordination"])] = "other"

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Through Unigram, Bigram, and observation of the keywords, we can pull out tasks which falls into the same category to be renamed as the most general terms possible according to Unigram or Bigram. I decided to stop classifying the jobs on the 18th jobs and call the rest of uncategorized jobs as "other".

#### *Percentage of classified jobs*

In [34]:
classified = proms_df.loc[proms_df.title.isin(["call", "meeting",  "transaction", "update", "balance",\
                                               "follow-up", "visit", "prospect", "review", "confirmation-letter",\
                                               "data", "daily", "siar", "net-asset-value", "siap", "input",\
                                               "mail", "coordination"])].reset_index(drop=True).shape[0]

unclassified = proms_df.loc[proms_df.title.isin(["other"])].shape[0]

(classified/(classified+unclassified))*100

62.50164495328333

## Machine Learning

We might be able to predict score directly by training our machine learning using score as the target. However, the large range or the score (from 1 to a 100) creates too many target to train our machine learning. This is why we classify jobs before we predict scores. by separating data based on the job category, we can reduce the amount of target that our machine learning has to learn and create a more accurate prediction.

#### *Train and test for classifying jobs*

In [101]:
x_train, x_test, y_train, y_test = train_test_split(proms_df.loc[:, proms_df.columns != "title"], 
proms_df.title, test_size=0.2, random_state=126)

#### *Machine learning for classifying jobs*

In [103]:
vect_title = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_train.remarks)

model_naive_title = MultinomialNB()
model_naive_title.fit(vect_title.transform(x_train.remarks), y_train)

pred_title = model_naive_title.predict(vect_title.transform(x_test.remarks))

print("Classification Report")
print("===========================")
print("Accuracy", metrics.accuracy_score(y_test, pred_title))
print("===========================\n")
print(metrics.classification_report(y_test, pred_title, labels=y_train.sort_values().unique()))

Classification Report
Accuracy 0.7645455399943604

                     precision    recall  f1-score   support

            balance       0.93      0.96      0.94       841
               call       0.66      0.86      0.74      2163
confirmation-letter       0.85      0.99      0.91       697
       coordination       0.51      0.43      0.47       201
              daily       0.78      0.91      0.84       439
               data       0.68      0.66      0.67       486
          follow-up       0.64      0.79      0.70       890
              input       0.85      0.72      0.78       387
               mail       0.57      0.70      0.63       418
            meeting       0.59      0.84      0.69      1492
    net-asset-value       1.00      0.94      0.97       323
              other       0.85      0.67      0.75      7952
           prospect       0.79      0.78      0.78       230
             review       0.76      0.80      0.78       817
               siap       0.95   

for as much as 19 target to be learned by our machine learning, an overall accuracy of 76% is quite good.

#### *Train and test for predicting score*

In [81]:
x_bal_train, x_bal_test, y_bal_train, y_bal_test = train_test_split(proms_df.loc[proms_df.title == "balance", proms_df.columns != "score"], 
proms_df[proms_df.title == "balance"].score, test_size=0.2, random_state=126)

x_cal_train, x_cal_test, y_cal_train, y_cal_test = train_test_split(proms_df.loc[proms_df.title == "call", proms_df.columns != "score"], 
proms_df[proms_df.title == "call"].score, test_size=0.2, random_state=126)

x_cl_train, x_cl_test, y_cl_train, y_cl_test = train_test_split(proms_df.loc[proms_df.title == "confirmation-letter", proms_df.columns != "score"], 
proms_df[proms_df.title == "confirmation-letter"].score, test_size=0.2, random_state=126)

x_coo_train, x_coo_test, y_coo_train, y_coo_test = train_test_split(proms_df.loc[proms_df.title == "coordination", proms_df.columns != "score"], 
proms_df[proms_df.title == "coordination"].score, test_size=0.2, random_state=126)

x_dai_train, x_dai_test, y_dai_train, y_dai_test = train_test_split(proms_df.loc[proms_df.title == "daily", proms_df.columns != "score"], 
proms_df[proms_df.title == "daily"].score, test_size=0.2, random_state=126)

x_dat_train, x_dat_test, y_dat_train, y_dat_test = train_test_split(proms_df.loc[proms_df.title == "data", proms_df.columns != "score"], 
proms_df[proms_df.title == "data"].score, test_size=0.2, random_state=126)

x_fol_train, x_fol_test, y_fol_train, y_fol_test = train_test_split(proms_df.loc[proms_df.title == "follow-up", proms_df.columns != "score"], 
proms_df[proms_df.title == "follow-up"].score, test_size=0.2, random_state=126)

x_in_train, x_in_test, y_in_train, y_in_test = train_test_split(proms_df.loc[proms_df.title == "input", proms_df.columns != "score"], 
proms_df[proms_df.title == "input"].score, test_size=0.2, random_state=126)

x_mai_train, x_mai_test, y_mai_train, y_mai_test = train_test_split(proms_df.loc[proms_df.title == "mail", proms_df.columns != "score"], 
proms_df[proms_df.title == "mail"].score, test_size=0.2, random_state=126)

x_mee_train, x_mee_test, y_mee_train, y_mee_test = train_test_split(proms_df.loc[proms_df.title == "meeting", proms_df.columns != "score"], 
proms_df[proms_df.title == "meeting"].score, test_size=0.2, random_state=126)

x_nav_train, x_nav_test, y_nav_train, y_nav_test = train_test_split(proms_df.loc[proms_df.title == "net-asset-value", proms_df.columns != "score"], 
proms_df[proms_df.title == "net-asset-value"].score, test_size=0.2, random_state=126)

x_oth_train, x_oth_test, y_oth_train, y_oth_test = train_test_split(proms_df.loc[proms_df.title == "other", proms_df.columns != "score"], 
proms_df[proms_df.title == "other"].score, test_size=0.2, random_state=126)

x_pro_train, x_pro_test, y_pro_train, y_pro_test = train_test_split(proms_df.loc[proms_df.title == "prospect", proms_df.columns != "score"], 
proms_df[proms_df.title == "prospect"].score, test_size=0.2, random_state=126)

x_rev_train, x_rev_test, y_rev_train, y_rev_test = train_test_split(proms_df.loc[proms_df.title == "review", proms_df.columns != "score"], 
proms_df[proms_df.title == "review"].score, test_size=0.2, random_state=126)

x_sr_train, x_sr_test, y_sr_train, y_sr_test = train_test_split(proms_df.loc[proms_df.title == "siar", proms_df.columns != "score"], 
proms_df[proms_df.title == "siar"].score, test_size=0.2, random_state=126)

x_sp_train, x_sp_test, y_sp_train, y_sp_test = train_test_split(proms_df.loc[proms_df.title == "siap", proms_df.columns != "score"], 
proms_df[proms_df.title == "siap"].score, test_size=0.2, random_state=126)

x_tra_train, x_tra_test, y_tra_train, y_tra_test = train_test_split(proms_df.loc[proms_df.title == "transaction", proms_df.columns != "score"], 
proms_df[proms_df.title == "transaction"].score, test_size=0.2, random_state=126)

x_upd_train, x_upd_test, y_upd_train, y_upd_test = train_test_split(proms_df.loc[proms_df.title == "update", proms_df.columns != "score"], 
proms_df[proms_df.title == "update"].score, test_size=0.2, random_state=126)

#### *Machine learning for prediction score*

In [88]:
vect_bal = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_bal_train.remarks)
model_naive_bal = MultinomialNB().fit(vect_bal.transform(x_bal_train.remarks), y_bal_train)

vect_cal = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_cal_train.remarks)
model_naive_cal = MultinomialNB().fit(vect_cal.transform(x_cal_train.remarks), y_cal_train)

vect_cl = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_cl_train.remarks)
model_naive_cl = MultinomialNB().fit(vect_cl.transform(x_cl_train.remarks), y_cl_train)

vect_coo = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_coo_train.remarks)
model_naive_coo = MultinomialNB().fit(vect_coo.transform(x_coo_train.remarks), y_coo_train)

vect_dai = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_dai_train.remarks)
model_naive_dai = MultinomialNB().fit(vect_dai.transform(x_dai_train.remarks), y_dai_train)

vect_dat = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_dat_train.remarks)
model_naive_dat = MultinomialNB().fit(vect_dat.transform(x_dat_train.remarks), y_dat_train)

vect_fol = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_fol_train.remarks)
model_naive_fol = MultinomialNB().fit(vect_fol.transform(x_fol_train.remarks), y_fol_train)

vect_in = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_in_train.remarks)
model_naive_in = MultinomialNB().fit(vect_in.transform(x_in_train.remarks), y_in_train)

vect_mai = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_mai_train.remarks)
model_naive_mai = MultinomialNB().fit(vect_mai.transform(x_mai_train.remarks), y_mai_train)

vect_mee = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_mee_train.remarks)
model_naive_mee = MultinomialNB().fit(vect_mee.transform(x_mee_train.remarks), y_mee_train)

vect_nav = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_nav_train.remarks)
model_naive_nav = MultinomialNB().fit(vect_nav.transform(x_nav_train.remarks), y_nav_train)

vect_oth = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_oth_train.remarks)
model_naive_oth = MultinomialNB().fit(vect_oth.transform(x_oth_train.remarks), y_oth_train)

vect_pro = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_pro_train.remarks)
model_naive_pro = MultinomialNB().fit(vect_pro.transform(x_pro_train.remarks), y_pro_train)

vect_rev = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_rev_train.remarks)
model_naive_rev = MultinomialNB().fit(vect_rev.transform(x_rev_train.remarks), y_rev_train)

vect_sr = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_sr_train.remarks)
model_naive_sr = MultinomialNB().fit(vect_sr.transform(x_sr_train.remarks), y_sr_train)

vect_sp = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_sp_train.remarks)
model_naive_sp = MultinomialNB().fit(vect_sp.transform(x_sp_train.remarks), y_sp_train)

vect_tra = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_tra_train.remarks)
model_naive_tra = MultinomialNB().fit(vect_tra.transform(x_tra_train.remarks), y_tra_train)

vect_upd = CountVectorizer(min_df=5, ngram_range=(1,5)).fit(x_upd_train.remarks)
model_naive_upd = MultinomialNB().fit(vect_upd.transform(x_upd_train.remarks), y_upd_train)

In [89]:
pred_bal = model_naive_bal.predict(vect_bal.transform(x_bal_test.remarks))
pred_cal = model_naive_cal.predict(vect_cal.transform(x_cal_test.remarks))
pred_cl = model_naive_cl.predict(vect_cl.transform(x_cl_test.remarks))
pred_coo = model_naive_coo.predict(vect_coo.transform(x_coo_test.remarks))
pred_dai = model_naive_dai.predict(vect_dai.transform(x_dai_test.remarks))
pred_dat = model_naive_dat.predict(vect_dat.transform(x_dat_test.remarks))
pred_fol = model_naive_fol.predict(vect_fol.transform(x_fol_test.remarks))
pred_in = model_naive_in.predict(vect_in.transform(x_in_test.remarks))
pred_mai = model_naive_mai.predict(vect_mai.transform(x_mai_test.remarks))
pred_mee = model_naive_mee.predict(vect_mee.transform(x_mee_test.remarks))
pred_nav = model_naive_nav.predict(vect_nav.transform(x_nav_test.remarks))
pred_oth = model_naive_oth.predict(vect_oth.transform(x_oth_test.remarks))
pred_pro = model_naive_pro.predict(vect_pro.transform(x_pro_test.remarks))
pred_rev = model_naive_rev.predict(vect_rev.transform(x_rev_test.remarks))
pred_sr = model_naive_sr.predict(vect_sr.transform(x_sr_test.remarks))
pred_sp = model_naive_sp.predict(vect_sp.transform(x_sp_test.remarks))
pred_tra = model_naive_tra.predict(vect_tra.transform(x_tra_test.remarks))
pred_upd = model_naive_upd.predict(vect_upd.transform(x_upd_test.remarks))

In [96]:
print("Accuracy of balance              : ", metrics.accuracy_score(y_bal_test, pred_bal))
print("Accuracy of call                 : ", metrics.accuracy_score(y_cal_test, pred_cal))
print("Accuracy of confirmation-letter  : ", metrics.accuracy_score(y_cl_test, pred_cl))
print("Accuracy of coordination         : ", metrics.accuracy_score(y_coo_test, pred_coo))
print("Accuracy of daily                : ", metrics.accuracy_score(y_dai_test, pred_dai))
print("Accuracy of data                 : ", metrics.accuracy_score(y_dat_test, pred_dat))
print("Accuracy of follow-up            : ", metrics.accuracy_score(y_fol_test, pred_fol))
print("Accuracy of input                : ", metrics.accuracy_score(y_in_test, pred_in))
print("Accuracy of mail                 : ", metrics.accuracy_score(y_mai_test, pred_mai))
print("Accuracy of meeting              : ", metrics.accuracy_score(y_mee_test, pred_mee))
print("Accuracy of net-asset-value      : ", metrics.accuracy_score(y_nav_test, pred_nav))
print("Accuracy of other                : ", metrics.accuracy_score(y_oth_test, pred_oth))
print("Accuracy of prospect             : ", metrics.accuracy_score(y_pro_test, pred_pro))
print("Accuracy of review               : ", metrics.accuracy_score(y_rev_test, pred_rev))
print("Accuracy of siar                 : ", metrics.accuracy_score(y_sr_test, pred_sr))
print("Accuracy of siap                 : ", metrics.accuracy_score(y_sp_test, pred_sp))
print("Accuracy of transaction          : ", metrics.accuracy_score(y_tra_test, pred_tra))
print("Accuracy of update               : ", metrics.accuracy_score(y_upd_test, pred_upd))

Accuracy of balace               :  0.7697841726618705
Accuracy of call                 :  0.7487367937528709
Accuracy of confirmation-letter  :  0.927710843373494
Accuracy of coordination         :  0.3287037037037037
Accuracy of daily                :  0.6376146788990825
Accuracy of data                 :  0.4613935969868173
Accuracy of follow-up            :  0.6070175438596491
Accuracy of input                :  0.455026455026455
Accuracy of mail                 :  0.4434389140271493
Accuracy of meeting              :  0.4352542372881356
Accuracy of net-asset-value      :  0.8395904436860068
Accuracy of other                :  0.46246396791577893
Accuracy of prospect             :  0.7022222222222222
Accuracy of review               :  0.3879849812265332
Accuracy of siar                 :  0.8606811145510835
Accuracy of siap                 :  0.7014925373134329
Accuracy of transaction          :  0.5716272600834492
Accuracy of update               :  0.6725595695618755


Although there are machine learning with a performance of lower than 50%, the overall accuracy of the prediction is still quite good. With further data accumulation, we can adjust our machine learning to improve the overall accuracy in the future.