# **Inference**

Name: Richie Devon Sumantri

Batch : HCK-018

This document tests predictions on raw data using the prediction model that was built earlier.

---

### **Import Libraries**

In [1]:
# Load general-purpose libraries
from IPython.display import display, HTML
import pandas as pd
import numpy as np
import json
import pickle

# Import TensorFlow libraries
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import TextVectorization

# Import stop words and lemmatizer
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')

# Import the preprocessing helper
from function import text_preprocessing

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Max\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Max\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Max\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Max\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
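
The `text_preprocessing` helper imported above lives in `function.py`, which is not reproduced in this notebook. As a rough idea of what such a helper might look like, here is a minimal sketch. The function body, the cleaning rules, and the `TinyLemmatizer` stand-in are all assumptions; only the keyword arguments (`stemmer=`, `stopword=`) are taken from how the function is called in this notebook.

```python
import re


class TinyLemmatizer:
    """Hypothetical stand-in exposing the same .lemmatize(word) method as
    nltk's WordNetLemmatizer, so this sketch runs without NLTK data."""

    def lemmatize(self, word):
        # Naive plural stripping, for illustration only.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            return word[:-1]
        return word


def text_preprocessing(text, stemmer, stopword):
    """Assumed behavior: lowercase, keep letters only, drop stop words,
    then lemmatize each remaining token."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits/punctuation with spaces
    tokens = text.split()                  # whitespace tokenization
    stop_set = set(stopword)               # set membership is fast
    tokens = [stemmer.lemmatize(tok) for tok in tokens if tok not in stop_set]
    return " ".join(tokens)


sample = "Central banks are under pressure to curb inflation!"
cleaned = text_preprocessing(sample, stemmer=TinyLemmatizer(),
                             stopword=["are", "to", "under"])
print(cleaned)
```

In the real notebook the `stemmer` argument receives a `WordNetLemmatizer` instance, which offers the same `lemmatize` method used here.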


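### **Load Model and Preprocessing Pipeline**

This step loads the trained model (`model.h5`) together with the preprocessing artifacts saved during _modelling_: the custom stop-word list, the maximum sentence length, the vocabulary size, and the pickled training text used to rebuild the _preprocessing pipeline_.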

In [51]:
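# Load the trained model
loaded_model = load_model('model.h5')

# Load the custom English stop-word list
with open('stop_words_english.txt', 'r', encoding='utf-8') as file:
    stop_words_list = file.read().splitlines()

# Load the maximum sentence length used during training
with open('max_sen_len.txt', 'r') as f:
    max_sen_len = int(f.read())

# Load the vocabulary size used during training
with open('total_vocab.txt', 'r') as f:
    total_vocab = int(f.read())

# Load the training text used to adapt the vectorizer
with open('train.pickle', 'rb') as f:
    loaded_train = pickle.load(f)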
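
The files read above are presumably written at the end of the training notebook, which is not shown here. The sketch below is one way that save/load round trip could look. The file names match the cell above, but the values and the temporary directory are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative values only; the real ones come from the training notebook.
max_sen_len = 120
total_vocab = 5000
train_texts = ["market rally continue", "team win final"]

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)

    # Save side (assumed to happen in the training notebook).
    (out_dir / "max_sen_len.txt").write_text(str(max_sen_len))
    (out_dir / "total_vocab.txt").write_text(str(total_vocab))
    with open(out_dir / "train.pickle", "wb") as f:
        pickle.dump(train_texts, f)

    # Load side (mirrors the loading cell above).
    loaded_len = int((out_dir / "max_sen_len.txt").read_text().strip())
    loaded_vocab = int((out_dir / "total_vocab.txt").read_text().strip())
    with open(out_dir / "train.pickle", "rb") as f:
        loaded_train = pickle.load(f)

print(loaded_len, loaded_vocab, loaded_train)
```

Because `pickle.load` can execute arbitrary code, it should only be used on files from a trusted source, such as ones produced by your own training run.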



### **Creating Dummy Data**

At this stage, sample data is created so that the trained model can be tried out on it.

In [41]:
# Create dummy data containing three records
dummy_data = [
	{
		'data' : 'Global financial markets experienced volatility today as investors reacted to ongoing concerns about rising inflation and its potential impact on economic growth. Central banks around the world are under pressure to adjust monetary policies to curb inflation while supporting recovery efforts.'
	},
	{
		'data' : "Tesla has announced record profits for the second quarter of 2024, driven by surging demand for electric vehicles (EVs) and the expansion of its production facilities. The company's stock rose sharply following the announcement, reflecting investor confidence in Tesla's growth prospects."
	},
	{
		'data' : "Peace talks between Israel and Palestine have resumed in Geneva, with international mediators seeking to broker a lasting resolution to the decades-long conflict. The negotiations are seen as a critical opportunity to address core issues such as borders, security, and the status of Jerusalem."
	}
]

# Convert the dummy data into a dataframe
dummy_df = pd.DataFrame(dummy_data)

# Display the dataframe title
display(HTML('<center><b><h3>Dummy Data</h3></b></center>'))

# Display the dataframe
dummy_df

| | data |
|---|---|
| 0 | Global financial markets experienced volatilit... |
| 1 | Tesla has announced record profits for the sec... |
| 2 | Peace talks between Israel and Palestine have ... |


### **Data Preprocessing**

In [42]:
# Copy the dummy dataframe
dummy_df_pre = dummy_df.copy()

# Build the English stop-word list (NLTK list plus the custom list)
stopword_eng = list(set(stopwords.words('english') + stop_words_list))

# Initialize the lemmatizer
lemmatizer = WordNetLemmatizer()

# Preprocess the text column
dummy_df_pre = dummy_df_pre['data'].apply(text_preprocessing, stemmer=lemmatizer, stopword=stopword_eng)

# Vectorization: rebuild the layer with the training settings and adapt it on the training text
text_vectorization = TextVectorization(max_tokens=total_vocab,
                                       standardize="lower_and_strip_punctuation",
                                       split="whitespace",
                                       ngrams=(1,2),
                                       output_mode="int",
                                       output_sequence_length=max_sen_len,
                                       encoding='utf-8',
                                       input_shape=(1,))
text_vectorization.adapt(loaded_train)

# Transform the dummy data
dummy_df_pre = text_vectorization(dummy_df_pre)

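
To make the vectorization step more concrete, the following pure-Python sketch mimics what an integer-mode `TextVectorization` layer does conceptually: it builds a frequency-ranked vocabulary of unigrams and bigrams, reserves index 0 for padding and index 1 for out-of-vocabulary tokens, then pads or truncates each sequence to a fixed length. It is a simplified illustration with hypothetical helper names (`make_ngrams`, `build_vocab`, `vectorize`), not the Keras implementation, and details such as tie-breaking and token order may differ.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved indices: padding and out-of-vocabulary


def make_ngrams(text):
    """Return unigrams followed by bigrams of a whitespace-split text."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def build_vocab(corpus, max_tokens):
    """Rank tokens by frequency; keep max_tokens entries including PAD/OOV."""
    counts = Counter(tok for text in corpus for tok in make_ngrams(text))
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: idx for idx, tok in enumerate(most_common, start=2)}


def vectorize(text, vocab, seq_len):
    """Map tokens to ids, then truncate or right-pad to seq_len."""
    ids = [vocab.get(tok, OOV) for tok in make_ngrams(text)][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


train = ["stock market rally", "stock price fall"]
vocab = build_vocab(train, max_tokens=10)
vector = vectorize("stock market crash", vocab, seq_len=8)
print(vector)
```

The sketch also shows why the notebook adapts the layer on the training text with the same `max_tokens` and sequence length: the integer ids only mean something to the model if the vocabulary matches the one used in training.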


### **Predicting the Dummy Data**

At this stage, the vectorized dummy data is passed to the model trained on the _train_ data, and the predicted labels are joined back onto the original dummy data.

In [49]:
# Mapping from class index to target label
label_mapping = {
    0: 'business',
    1: 'entertainment',
    2: 'politics',
    3: 'sport',
    4: 'tech'
}

# Predict the target class for each dummy record
pred = np.argmax(loaded_model.predict(dummy_df_pre), axis=1)
pred_converted = pd.Series(pred).map(label_mapping)

# Join the dummy dataframe with the predicted labels
df_concat = pd.concat([dummy_df, pred_converted], axis=1)
df_concat.columns = ['data', 'Prediction Label']

# Display the prediction results
display(HTML('<center><b><h3>Prediction Results</h3></b></center>'))
df_concat


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 165ms/step


| | data | Prediction Label |
|---|---|---|
| 0 | Global financial markets experienced volatilit... | business |
| 1 | Tesla has announced record profits for the sec... | business |
| 2 | Peace talks between Israel and Palestine have ... | tech |
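
The predict cell keeps only the arg-max class for each record. When inspecting results like these, it can help to look at the winning probability as well. The dependency-free sketch below shows the arg-max and label-mapping logic on made-up softmax outputs; the `probs` values and the `top_prediction` helper are illustrative assumptions and are not produced by the actual model.

```python
label_mapping = {0: "business", 1: "entertainment", 2: "politics",
                 3: "sport", 4: "tech"}

# Made-up class probabilities for two documents (each row sums to 1).
probs = [
    [0.70, 0.05, 0.10, 0.05, 0.10],
    [0.10, 0.05, 0.40, 0.05, 0.40],
]


def top_prediction(row, mapping):
    """Return (label, probability) for the highest-scoring class.

    Ties resolve to the lowest index, the same rule np.argmax follows."""
    best = max(range(len(row)), key=row.__getitem__)
    return mapping[best], row[best]


results = [top_prediction(row, label_mapping) for row in probs]
for label, p in results:
    print(f"{label}: {p:.2f}")
```

A low winning probability, as in the second illustrative row, signals an uncertain prediction that may deserve a manual check.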
