# Development and Operation of a Machine Learning System for Predicting Anxiety Attack Severity

- Name: `Rendika Nurhartanto Suharto`
- Dicoding username: ```RENDIKA NURHARTANTO SUHARTO```

# **1. Import Library and Dependency**

In [44]:
# conda create -n proyek-akhir-mlops python=3.9.15 -y
# conda activate proyek-akhir-mlops
# pip install -r requirements.txt
# pip install autopep8 pylint

# conda create -n tfx-beam python=3.8.18 -y
# conda activate tfx-beam
# pip install -r requirements-2.txt

In [45]:
import os
import pandas as pd # type: ignore
from typing import Text
from absl import logging  # type: ignore
from tfx.orchestration import metadata  # type: ignore
from sklearn.preprocessing import LabelEncoder
from tfx.orchestration import pipeline as tfx_pipeline  # type: ignore
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner  # type: ignore

# **2. Set Variable**

Set the pipeline-level variables: the pipeline name, the output paths, the module file paths, and related settings.

In [46]:
# Define pipeline and schema names
PIPELINE_NAME = "anxiety-pipeline"  # Name of the main pipeline
SCHEMA_PIPELINE_NAME = "anxiety-tfdv-schema"  # Name of the schema pipeline
MODEL_NAME = "anxiety-model" # Name of the saved model

# Directory for storing generated artifacts
PIPELINE_ROOT = os.path.join('RENDIKA_NURHARTANTO_SUHARTO-pipeline', PIPELINE_NAME)  # Root directory for pipeline artifacts

# Path to SQLite DB file for MLMD storage
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.sqlite')  # Path to metadata database

# Output directory for exporting trained models
SERVING_MODEL_DIR = os.path.join('serving_model_dir', MODEL_NAME)  # Directory for saving trained models

# Pipeline inputs
DATA_ROOT = "data"  # Root directory for input data
COMPONENTS_MODULE_FILE = "modules/components.py"
TRANSFORM_MODULE_FILE = "modules/anxiety_transform.py"  # Transformation logic module
TRAINER_MODULE_FILE = "modules/anxiety_trainer.py"  # Model training logic module
TUNER_MODULE_FILE = "modules/anxiety_tuner.py"  # Hyperparameter tuning module

In [47]:
# Define subdirectories
project_root = "C:/Users/rendi/ITTS DATA SCIENCE/Semester 8/MLOps - Dicoding Bonus Course/2. Proyek Akhir - Proyek Pengembangan dan Pengoperasian Sistem Machine Learning"
modules_dir = os.path.join(project_root, "modules")

# Create directories
os.makedirs(modules_dir, exist_ok=True)

# **3. Checking and Processing Dataset with Pandas**

**Name**: Anxiety Attack : Factors, Symptoms, and Severity

**Data Format**: CSV (Comma-Separated Values)

**Size**: 12,000+ records

**Usability in Kaggle**: 10.00

**Description**: This dataset contains over **12,000 records** detailing various factors related to anxiety attacks, including demographics, lifestyle habits, stress levels, and physiological responses. It is designed for **data analysis**, **machine learning**, and **mental health research** to explore patterns, triggers, and potential correlations in anxiety disorders.

**Key Features**:

🧑‍🤝‍🧑 Demographics: Age, Gender, Occupation

🌙 Lifestyle Factors: Sleep, Physical Activity, Diet, Caffeine & Alcohol Intake

💓 Health Indicators: Heart Rate, Breathing Rate, Sweating, Dizziness

🧠 Psychological Factors: Stress Level, Family History, Therapy & Medication

⚠️ Anxiety Attack Severity: Scale from 1 to 10

**Feature Explanation**:

| **Feature**                  | **Description**                                                            |
| ----------------------------: | -------------------------------------------------------------------------- |
| `ID`                         | Unique identifier for each record                                          |
| `Age`                        | Age of the individual (18 to 64 years)                                     |
| `Gender`                     | Gender of the individual (Male, Female, Other)                             |
| `Occupation`                 | Job role of the individual                                                 |
| `Sleep Hours`                | Daily sleep duration (in hours)                                            |
| `Physical Activity`          | Weekly exercise duration (in hours)                                        |
| `Caffeine Intake`            | Daily caffeine intake (in mg)                                              |
| `Alcohol Consumption`        | Weekly alcohol consumption (in drinks)                                     |
| `Smoking`                    | Whether the individual smokes (Yes/No)                                      |
| `Family History of Anxiety`  | Whether the individual has a family history of anxiety (Yes/No)            |
| `Stress Level`               | Stress level (scale from 1 to 10)                                          |
| `Heart Rate`                 | Heart rate (bpm) during an anxiety attack                                  |
| `Breathing Rate`             | Breathing rate (breaths per minute) during an anxiety attack               |
| `Sweating Level`             | Sweating level (scale from 1 to 5)                                         |
| `Dizziness`                  | Whether dizziness was experienced during the attack (Yes/No)               |
| `Medication`                 | Whether the individual is on medication for anxiety (Yes/No)               |
| `Therapy Sessions`           | Number of therapy sessions attended per month                              |
| `Recent Major Life Event`    | Whether the individual has experienced a recent major life event (Yes/No)  |
| `Diet Quality`               | Quality of the individual's diet (scale from 1 to 10)                      |
| `Severity of Anxiety Attack` | Severity of the anxiety attack (scale from 1 to 10)                        |
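
The ranges documented in the table can be turned into a quick sanity check before the data enters the pipeline. The sketch below is only an illustration: `EXPECTED_RANGES` and `find_out_of_range` are hypothetical names, the column names follow the headers of the loaded CSV, and the second sample row is made up to trigger violations.

```python
# Hypothetical range table derived from the feature descriptions above.
EXPECTED_RANGES = {
    "Age": (18, 64),
    "Stress Level (1-10)": (1, 10),
    "Sweating Level (1-5)": (1, 5),
    "Diet Quality (1-10)": (1, 10),
    "Severity of Anxiety Attack (1-10)": (1, 10),
}


def find_out_of_range(rows, expected=EXPECTED_RANGES):
    """Return (row_index, column, value) for every value outside its documented range."""
    violations = []
    for index, row in enumerate(rows):
        for column, (low, high) in expected.items():
            if column not in row:
                continue  # tolerate rows that do not carry this column
            value = float(row[column])
            if not low <= value <= high:
                violations.append((index, column, value))
    return violations


# The first row mirrors real values from the dataset preview; the second is invented.
sample_rows = [
    {"Age": 56, "Stress Level (1-10)": 4, "Sweating Level (1-5)": 3},
    {"Age": 17, "Stress Level (1-10)": 11, "Sweating Level (1-5)": 5},
]
print(find_out_of_range(sample_rows))
```

With a DataFrame, the same check could be fed from `data_check.to_dict("records")`.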

In [48]:
# 1. Load dataset
data_check = pd.read_csv("anxiety_attack_dataset.csv").drop(columns = "ID")

In [49]:
# 2. Check the first 5 rows of the dataset
data_check.head()

Unnamed: 0,Age,Gender,Occupation,Sleep Hours,Physical Activity (hrs/week),Caffeine Intake (mg/day),Alcohol Consumption (drinks/week),Smoking,Family History of Anxiety,Stress Level (1-10),Heart Rate (bpm during attack),Breathing Rate (breaths/min),Sweating Level (1-5),Dizziness,Medication,Therapy Sessions (per month),Recent Major Life Event,Diet Quality (1-10),Severity of Anxiety Attack (1-10)
0,56,Female,Other,9.6,8.3,175,6,No,No,4,145,33,3,No,No,4,Yes,9,10
1,46,Male,Teacher,6.4,7.3,97,6,No,No,3,143,18,5,Yes,No,0,No,9,8
2,32,Female,Doctor,6.9,1.0,467,14,No,No,2,60,34,1,No,No,7,Yes,10,5
3,60,Male,Doctor,9.2,3.7,471,16,No,Yes,6,94,19,1,No,Yes,4,Yes,5,8
4,25,Male,Student,9.2,2.5,364,2,No,Yes,7,152,15,4,No,Yes,0,No,1,1


In [50]:
# 3. Check columns of the dataset
print(f"Columns: {data_check.columns}")

Columns: Index(['Age', 'Gender', 'Occupation', 'Sleep Hours',
       'Physical Activity (hrs/week)', 'Caffeine Intake (mg/day)',
       'Alcohol Consumption (drinks/week)', 'Smoking',
       'Family History of Anxiety', 'Stress Level (1-10)',
       'Heart Rate (bpm during attack)', 'Breathing Rate (breaths/min)',
       'Sweating Level (1-5)', 'Dizziness', 'Medication',
       'Therapy Sessions (per month)', 'Recent Major Life Event',
       'Diet Quality (1-10)', 'Severity of Anxiety Attack (1-10)'],
      dtype='object')


In [51]:
# 4. Check the summary of the dataset
data_check.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12000 entries, 0 to 11999
Data columns (total 19 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Age                                12000 non-null  int64  
 1   Gender                             12000 non-null  object 
 2   Occupation                         12000 non-null  object 
 3   Sleep Hours                        12000 non-null  float64
 4   Physical Activity (hrs/week)       12000 non-null  float64
 5   Caffeine Intake (mg/day)           12000 non-null  int64  
 6   Alcohol Consumption (drinks/week)  12000 non-null  int64  
 7   Smoking                            12000 non-null  object 
 8   Family History of Anxiety          12000 non-null  object 
 9   Stress Level (1-10)                12000 non-null  int64  
 10  Heart Rate (bpm during attack)     12000 non-null  int64  
 11  Breathing Rate (breaths/min)       12000 non-null  int

In [52]:
# 5. Check for duplicate data in the dataframe
data_check.duplicated().sum()

0

In [53]:
# Binning the severity of anxiety attack into categories - reference: https://www.therecoveryvillage.com/mental-health/anxiety/levels-of-anxiety/

def categorize_anxiety(severity):
    if severity <= 3:
        return 'Mild Anxiety'
    elif 4 <= severity <= 6:
        return 'Moderate Anxiety'
    elif 7 <= severity <= 9:
        return 'Severe Anxiety'
    else:
        return 'Panic Level Anxiety'

# 6. Apply the function to create a new column for anxiety category
data_check['Anxiety Category'] = data_check['Severity of Anxiety Attack (1-10)'].apply(categorize_anxiety)
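
As an alternative to the `if`/`elif` chain, the same bins can be expressed as a lookup table. This is a sketch rather than the notebook's method: `UPPER_BOUNDS`, `LABELS`, and `categorize_with_table` are hypothetical names, and the bin edges are copied from `categorize_anxiety`.

```python
from bisect import bisect_left

# Inclusive upper bound of each bin, in ascending order.
# Any severity above the last bound falls into the final label.
UPPER_BOUNDS = [3, 6, 9]
LABELS = [
    "Mild Anxiety",
    "Moderate Anxiety",
    "Severe Anxiety",
    "Panic Level Anxiety",
]


def categorize_with_table(severity):
    """Map a severity score to its category using a sorted table of bin edges."""
    return LABELS[bisect_left(UPPER_BOUNDS, severity)]


for score in range(1, 11):
    print(score, categorize_with_table(score))
```

For the integer scores in this dataset both versions agree. They differ on fractional input: the original function sends a value such as 3.5 to the `else` branch, while the table version places it in the next bin up.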

In [54]:
# 7. Label encoding
label_encoder = LabelEncoder()
data_check['Anxiety Category Encoded'] = label_encoder.fit_transform(data_check['Anxiety Category'])

# Display the mapping produced by the label encoder
dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))

{'Mild Anxiety': 0,
 'Moderate Anxiety': 1,
 'Panic Level Anxiety': 2,
 'Severe Anxiety': 3}
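
Note that `LabelEncoder` sorts class names alphabetically, so `Panic Level Anxiety` receives code 2 and `Severe Anxiety` code 3. A classifier that treats the classes as nominal is not affected, but the codes do not follow severity order. If ordered codes are wanted, an explicit mapping is one option; the names `SEVERITY_ORDER`, `ORDINAL_CODES`, `encode_ordinal`, and `alphabetical_codes` below are hypothetical.

```python
# Categories listed from least to most severe.
SEVERITY_ORDER = [
    "Mild Anxiety",
    "Moderate Anxiety",
    "Severe Anxiety",
    "Panic Level Anxiety",
]
ORDINAL_CODES = {label: code for code, label in enumerate(SEVERITY_ORDER)}


def encode_ordinal(labels):
    """Encode category names so that larger codes mean higher severity."""
    return [ORDINAL_CODES[label] for label in labels]


def alphabetical_codes(labels):
    """Reproduce an alphabetical encoding for comparison with the mapping above."""
    classes = sorted(set(labels))
    return {label: code for code, label in enumerate(classes)}


print(alphabetical_codes(SEVERITY_ORDER))
print(ORDINAL_CODES)
```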

In [55]:
# 8. Drop the Severity of Anxiety Attack (1-10) column
data_check.drop(columns = "Severity of Anxiety Attack (1-10)", inplace = True)

In [56]:
# 9. Save the processed data to a new CSV file
data_check.to_csv('data/anxiety_attack_dataset.csv', index=False)

***Building an Interactive TFX Pipeline from Its Main Components***

This pipeline consists of several core *TFX* components that are chained together to *build*, *train*, and *evaluate* a machine learning model. Each component is explained in detail, with example code that uses a **magic command** to create the Python modules. The *pipeline* is modular and can be adapted to the needs of your project. A magic command such as *%%writefile* makes it easy to create the custom modules needed by components such as *Transform*, *Trainer*, and *Tuner*.

1. ```CsvExampleGen``` -> Reads the data and splits it into two parts: training and evaluation.
2. ```StatisticsGen``` -> Computes dataset statistics used for further analysis and for schema generation.
3. ```SchemaGen``` -> Builds a schema for the dataset from the statistics computed in the previous step, so that incoming data can be checked against expectations.
4. ```ExampleValidator``` -> Validates the data against the generated schema to ensure data quality and consistency.
5. ```Transform``` -> Applies feature transformations (for example, normalization or encoding) to prepare the data for training.
6. ```Tuner``` -> Searches for the best hyperparameters so that the model can reach the best performance it can on this dataset.
7. ```Trainer``` -> Trains the model on the transformed data using the best hyperparameters found by the Tuner.
8. ```Evaluator``` -> Evaluates the trained model with several performance metrics, such as accuracy or AUC.
9. ```Pusher``` -> Saves and deploys the trained model if it meets the evaluation criteria.

The Trainer component consumes the output of the Tuner component. The Pusher pushes the model only when it passes the Evaluator's `CategoricalAccuracy` threshold of ```0.9```.
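
The promotion rule configured for the Evaluator in this notebook combines an absolute accuracy floor of 0.9 with a minimum improvement of 0.0001 over the latest blessed model. The function below is a simplified sketch of that rule, not TFMA's implementation: `passes_thresholds` is a hypothetical name, and the inclusive comparisons and the "no baseline on the first run" behaviour are assumptions.

```python
def passes_thresholds(candidate, baseline=None, lower_bound=0.9, min_improvement=0.0001):
    """Return True if a candidate accuracy would clear both thresholds.

    candidate: accuracy of the newly trained model.
    baseline: accuracy of the latest blessed model, or None if none exists yet.
    """
    # Value threshold: the metric itself must reach the floor.
    if candidate < lower_bound:
        return False
    # Change threshold: only applies when a baseline model is available.
    if baseline is None:
        return True
    return (candidate - baseline) >= min_improvement


print(passes_thresholds(0.95))                 # no baseline yet
print(passes_thresholds(0.85))                 # below the floor
print(passes_thresholds(0.96, baseline=0.95))  # improves on the baseline
```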

# **4. Initializing the TFX Components for the Pipeline**

In [57]:
%%writefile {COMPONENTS_MODULE_FILE}
"""
components.py

This module contains the function that initializes the TFX components for the ML pipeline.
"""

import os
import tensorflow_model_analysis as tfma
from tfx.components import (
    CsvExampleGen, StatisticsGen, SchemaGen, ExampleValidator,
    Transform, Trainer, Tuner, Evaluator, Pusher
)
from tfx.proto import example_gen_pb2, trainer_pb2, pusher_pb2
from tfx.types import Channel
from tfx.dsl.components.common.resolver import Resolver
from tfx.types.standard_artifacts import Model, ModelBlessing
from tfx.dsl.input_resolution.strategies.latest_blessed_model_strategy import (
    LatestBlessedModelStrategy
)

def init_components(config):
    """
    Initialize and return the TFX components for the pipeline.

    Args:
        config (dict): Pipeline configuration containing the module paths, the number of
                       training and evaluation steps, the data path, and the serving
                       model directory.

    Returns:
        tuple: TFX components ready to be used in the pipeline.
    """

    # 1. Dataset split configuration: 90% training, 10% evaluation
    output = example_gen_pb2.Output(
        split_config=example_gen_pb2.SplitConfig(splits=[
            example_gen_pb2.SplitConfig.Split(name="train", hash_buckets=9),
            example_gen_pb2.SplitConfig.Split(name="eval", hash_buckets=1)
        ])
    )

    # 2. ExampleGen component
    example_gen = CsvExampleGen(input_base=config["DATA_ROOT"], output_config=output)

    # 3. StatisticsGen component
    statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])

    # 4. SchemaGen component
    schema_gen = SchemaGen(statistics=statistics_gen.outputs["statistics"])

    # 5. ExampleValidator component
    example_validator = ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"]
    )

    # 6. Transform component (fail fast if the module file is missing)
    assert os.path.exists(config["transform_module"]), "Transform module file not found!"
    transform = Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        module_file=os.path.abspath(config["transform_module"])
    )

    # 7. Tuner component
    assert os.path.exists(config["tuner_module"]), "Tuner module file not found!"
    tuner = Tuner(
        module_file=os.path.abspath(config["tuner_module"]),
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        train_args=trainer_pb2.TrainArgs(splits=["train"], num_steps=config["training_steps"]),
        eval_args=trainer_pb2.EvalArgs(splits=["eval"], num_steps=config["eval_steps"])
    )

    # 8. Trainer component
    assert os.path.exists(config["training_module"]), "Training module file not found!"
    trainer = Trainer(
        module_file=os.path.abspath(config["training_module"]),
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        hyperparameters=tuner.outputs["best_hyperparameters"],
        train_args=trainer_pb2.TrainArgs(splits=["train"], num_steps=config["training_steps"]),
        eval_args=trainer_pb2.EvalArgs(splits=["eval"], num_steps=config["eval_steps"])
    )

    # 9. Model Resolver component (provides the latest blessed model as the baseline)
    model_resolver = Resolver(
        strategy_class=LatestBlessedModelStrategy,
        model=Channel(type=Model),
        model_blessing=Channel(type=ModelBlessing)
    ).with_id("Latest_blessed_model_resolver")

    # 10. Evaluator configuration
    metrics_specs = [
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name="AUC"),
            tfma.MetricConfig(class_name="Precision"),
            tfma.MetricConfig(class_name="Recall"),
            tfma.MetricConfig(class_name="ExampleCount"),
            tfma.MetricConfig(
                class_name="CategoricalAccuracy",
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(lower_bound={"value": 0.9}),
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                        absolute={"value": 0.0001}
                    )
                )
            )
        ])
    ]

    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key="Anxiety Category Encoded")],
        slicing_specs=[tfma.SlicingSpec()],
        metrics_specs=metrics_specs
    )

    evaluator = Evaluator(
        examples=example_gen.outputs["examples"],
        model=trainer.outputs["model"],
        baseline_model=model_resolver.outputs["model"],
        eval_config=eval_config
    )

    # 11. Pusher component
    pusher = Pusher(
        model=trainer.outputs["model"],
        model_blessing=evaluator.outputs["blessing"],
        push_destination=pusher_pb2.PushDestination(
            filesystem=pusher_pb2.PushDestination.Filesystem(
                base_directory=config["serving_model_dir"]
            )
        )
    )

    # Return the components as a tuple for the pipeline
    return (
        example_gen, statistics_gen, schema_gen, example_validator,
        transform, tuner, trainer, model_resolver, evaluator, pusher
    )

Overwriting modules/components.py
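
The `hash_buckets=9` and `hash_buckets=1` settings in the split configuration give an approximate 90/10 split: each record is hashed into one of ten buckets, and the bucket decides the split. The sketch below imitates that idea with SHA-256, which is an assumption made for illustration and not the hash function TFX uses; `SPLITS` and `assign_split` are hypothetical names.

```python
import hashlib

# Same bucket counts as the SplitConfig in components.py.
SPLITS = [("train", 9), ("eval", 1)]


def assign_split(record, splits=SPLITS):
    """Assign a record to a split based on a stable hash of its content."""
    total = sum(buckets for _, buckets in splits)
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for name, buckets in splits:
        upper += buckets
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always below the total")


# Synthetic record keys standing in for the 12,000 CSV rows.
counts = {"train": 0, "eval": 0}
for i in range(12000):
    counts[assign_split(f"row-{i}")] += 1
print(counts)
```

Because the assignment depends only on the record content, the same record lands in the same split on every run, but the split sizes are only approximately 90% and 10%.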


In [58]:
# Inspect the column dtypes of the processed DataFrame
print(data_check.dtypes)

Age                                    int64
Gender                                object
Occupation                            object
Sleep Hours                          float64
Physical Activity (hrs/week)         float64
Caffeine Intake (mg/day)               int64
Alcohol Consumption (drinks/week)      int64
Smoking                               object
Family History of Anxiety             object
Stress Level (1-10)                    int64
Heart Rate (bpm during attack)         int64
Breathing Rate (breaths/min)           int64
Sweating Level (1-5)                   int64
Dizziness                             object
Medication                            object
Therapy Sessions (per month)           int64
Recent Major Life Event               object
Diet Quality (1-10)                    int64
Anxiety Category                      object
Anxiety Category Encoded               int32
dtype: object


In [59]:
for column in data_check.columns:
    if data_check[column].dtype == 'int64' or data_check[column].dtype == 'float64':
        print(f"Column {column} has a numeric dtype")

Column Age has a numeric dtype
Column Sleep Hours has a numeric dtype
Column Physical Activity (hrs/week) has a numeric dtype
Column Caffeine Intake (mg/day) has a numeric dtype
Column Alcohol Consumption (drinks/week) has a numeric dtype
Column Stress Level (1-10) has a numeric dtype
Column Heart Rate (bpm during attack) has a numeric dtype
Column Breathing Rate (breaths/min) has a numeric dtype
Column Sweating Level (1-5) has a numeric dtype
Column Therapy Sessions (per month) has a numeric dtype
Column Diet Quality (1-10) has a numeric dtype


# **5. Transform: Feature Transformation Module for Preprocessing**

In [60]:
%%writefile {TRANSFORM_MODULE_FILE}
"""
anxiety_transform.py

This module handles feature transformation for data preprocessing
using TensorFlow Transform (TFT).
"""

import tensorflow as tf
import tensorflow_transform as tft

# Numerical features in the dataset
NUMERICAL_FEATURES = [
    "Age",
    "Sleep Hours",
    "Physical Activity (hrs/week)",
    "Caffeine Intake (mg/day)",
    "Alcohol Consumption (drinks/week)",
    "Stress Level (1-10)",
    "Heart Rate (bpm during attack)",
    "Breathing Rate (breaths/min)",
    "Sweating Level (1-5)",
    "Therapy Sessions (per month)",
    "Diet Quality (1-10)",
]

# Categorical features in the dataset
CATEGORICAL_FEATURES = [
    "Gender",
    "Occupation",
    "Smoking",
    "Family History of Anxiety",
    "Dizziness",
    "Medication",
    "Recent Major Life Event",
]

# Label key
LABEL_KEY = "Anxiety Category Encoded"

def transformed_name(key):
    """
    Append the '_xf' suffix to the name of a transformed feature.

    Args:
        key (str): Feature name before transformation.

    Returns:
        str: Feature name after transformation.
    """
    return f"{key}_xf"

def preprocessing_fn(inputs):
    """
    Preprocess the input features.

    Args:
        inputs (dict): Dictionary mapping feature keys to raw features.

    Returns:
        dict: Dictionary mapping feature keys to transformed features.
    """
    outputs = {}

    # 1️⃣ Encode categorical features as integers (vocabulary encoding)
    encoded_categorical_features = {
        feature: tft.compute_and_apply_vocabulary(
            tf.strings.strip(tf.strings.lower(inputs[feature]))
        )
        for feature in CATEGORICAL_FEATURES
        if feature in inputs
    }

    # 2️⃣ Combine the numerical features with the encoded categorical features
    all_numeric_features = {**encoded_categorical_features}
    for feature in NUMERICAL_FEATURES:
        if feature in inputs:
            all_numeric_features[feature] = tf.cast(inputs[feature], tf.float32)

    # 3️⃣ Scale every feature, including the encoded categorical ones, to the range [0, 1]
    for feature, tensor in all_numeric_features.items():
        outputs[transformed_name(feature)] = tft.scale_to_0_1(tensor)

    # 4️⃣ Cast the target label to an integer
    if LABEL_KEY in inputs:
        outputs[transformed_name(LABEL_KEY)] = tf.cast(inputs[LABEL_KEY], tf.int64)

    return outputs

Overwriting modules/anxiety_transform.py
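
To make the two transformation steps in `preprocessing_fn` concrete, the sketch below reproduces them in plain Python on a tiny made-up column. It is an approximation for illustration only: `build_vocabulary` and `scale_to_0_1` here are hypothetical stand-ins and not the TFT functions, the alphabetical tie-break is an assumption, and mapping a constant column to 0.0 is a choice of this sketch that may differ from TFT's handling of that edge case.

```python
from collections import Counter


def build_vocabulary(values):
    """Map each cleaned token to an index, most frequent token first."""
    cleaned = [value.lower().strip() for value in values]
    counts = Counter(cleaned)
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: index for index, token in enumerate(ordered)}


def scale_to_0_1(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # edge case handled arbitrarily in this sketch
    return [(value - low) / (high - low) for value in values]


genders = ["Female", "Male ", "female", "Other"]  # made-up sample values
vocabulary = build_vocabulary(genders)
indices = [vocabulary[g.lower().strip()] for g in genders]
scaled = scale_to_0_1(indices)
print(vocabulary, indices, scaled)
```

The example shows a consequence of the module's design: the integer vocabulary indices of categorical features are themselves scaled to [0, 1], so the model sees them as points on a numeric axis rather than as separate categories.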


In [61]:
data_check["Anxiety Category"].value_counts()

Moderate Anxiety       3680
Severe Anxiety         3602
Mild Anxiety           3531
Panic Level Anxiety    1187
Name: Anxiety Category, dtype: int64
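
The class counts above give two useful reference numbers: the accuracy of always predicting the most frequent class, and inverse-frequency class weights for the under-represented `Panic Level Anxiety` class. The sketch below only does the arithmetic; the weighting formula (total divided by the number of classes times the class count) is a common convention and an assumption here, not something the notebook applies.

```python
# Counts copied from the value_counts() output above.
CLASS_COUNTS = {
    "Moderate Anxiety": 3680,
    "Severe Anxiety": 3602,
    "Mild Anxiety": 3531,
    "Panic Level Anxiety": 1187,
}

total = sum(CLASS_COUNTS.values())

# Accuracy obtained by always predicting the most frequent class.
majority_label = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
majority_baseline = CLASS_COUNTS[majority_label] / total

# "Balanced" weights: rarer classes receive proportionally larger weights.
class_weights = {
    label: total / (len(CLASS_COUNTS) * count)
    for label, count in CLASS_COUNTS.items()
}

print(f"total records: {total}")
print(f"majority baseline ({majority_label}): {majority_baseline:.4f}")
for label, weight in class_weights.items():
    print(f"{label}: weight {weight:.3f}")
```

Any trained model should be compared against the majority baseline of roughly 0.31 before its accuracy is interpreted.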

# **6. Tuner: Hyperparameter Tuning Module with Keras Tuner**

In [62]:
%%writefile {TUNER_MODULE_FILE}
"""
anxiety_tuner.py

This module performs hyperparameter tuning for the model using Keras Tuner.
"""

# Import library
import tensorflow as tf
import keras_tuner as kt
import tensorflow_transform as tft
from tfx.v1.components import TunerFnResult
from tfx.components.trainer.fn_args_utils import FnArgs
from anxiety_trainer import NUMERICAL_FEATURES, CATEGORICAL_FEATURES, transformed_name, input_fn

def model_builder(hyperparameters):
    """
    Build a Keras model whose hyperparameters will be tuned.

    Args:
        hyperparameters (kt.HyperParameters): Hyperparameters used for tuning.

    Returns:
        tf.keras.Model: The compiled Keras model.
    """

    input_features = [
        tf.keras.Input(shape=(1,), name=transformed_name(key))
        for key in NUMERICAL_FEATURES + CATEGORICAL_FEATURES
    ]

    concatenate = tf.keras.layers.concatenate(input_features)

    # A wider hyperparameter search space to give the tuner more room to optimize
    unit_1 = hyperparameters.Int('unit_1', min_value=128, max_value=512, step=64)
    dropout_1 = hyperparameters.Float('dropout_1', min_value=0.1, max_value=0.5, step=0.1)

    unit_2 = hyperparameters.Int('unit_2', min_value=64, max_value=256, step=32)
    dropout_2 = hyperparameters.Float('dropout_2', min_value=0.1, max_value=0.5, step=0.1)

    unit_3 = hyperparameters.Int('unit_3', min_value=32, max_value=128, step=32)
    dropout_3 = hyperparameters.Float('dropout_3', min_value=0.1, max_value=0.5, step=0.1)

    learning_rate = hyperparameters.Choice('learning_rate', [0.0001, 0.0005, 0.001, 0.005])

    # Build the model architecture
    deep = tf.keras.layers.Dense(unit_1, activation="relu")(concatenate)
    deep = tf.keras.layers.Dropout(dropout_1)(deep)

    deep = tf.keras.layers.Dense(unit_2, activation="relu")(deep)
    deep = tf.keras.layers.Dropout(dropout_2)(deep)

    deep = tf.keras.layers.Dense(unit_3, activation="relu")(deep)
    deep = tf.keras.layers.Dropout(dropout_3)(deep)

    outputs = tf.keras.layers.Dense(4, activation="softmax")(deep)  # 4 output classes

    model = tf.keras.models.Model(inputs=input_features, outputs=outputs)

    # Compile the model with the tuned learning rate
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
    )

    return model

def tuner_fn(fn_args: FnArgs):
    """
    Run hyperparameter tuning with Keras Tuner.

    Args:
        fn_args (FnArgs): Function arguments from TFX containing data and model information.

    Returns:
        TunerFnResult: The tuning result object expected by TFX.
    """

    tf_transform_output = tft.TFTransformOutput(fn_args.transform_graph_path)

    # Load the training and evaluation datasets
    train_dataset = input_fn(fn_args.train_files, tf_transform_output, batch_size=32)
    eval_dataset = input_fn(fn_args.eval_files, tf_transform_output, batch_size=32)

    # Initialize the RandomSearch tuner
    tuner = kt.RandomSearch(
        model_builder,
        objective='val_sparse_categorical_accuracy',  # Optimize for validation accuracy
        max_trials=10,
        executions_per_trial=2,
        directory=fn_args.working_dir,
        project_name='anxiety_severity_tuner'
    )

    return TunerFnResult(
        tuner=tuner,
        fit_kwargs={
            "x": train_dataset,
            "validation_data": eval_dataset,
            "steps_per_epoch": fn_args.train_steps,
            "validation_steps": fn_args.eval_steps,
            "epochs": 8
        }
    )

Overwriting modules/anxiety_tuner.py
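
It helps to know how large the search space defined in `model_builder` is compared with the `max_trials=10` budget of the random search. The sketch below counts the discrete combinations; `count_steps` is a hypothetical helper, and the dropout ranges are counted in tenths to avoid floating-point rounding.

```python
from math import prod


def count_steps(min_value, max_value, step):
    """Number of values in an inclusive range sampled at a fixed step."""
    return (max_value - min_value) // step + 1


# Ranges copied from model_builder; dropout bounds are expressed in tenths.
space_sizes = {
    "unit_1": count_steps(128, 512, 64),
    "dropout_1": count_steps(1, 5, 1),
    "unit_2": count_steps(64, 256, 32),
    "dropout_2": count_steps(1, 5, 1),
    "unit_3": count_steps(32, 128, 32),
    "dropout_3": count_steps(1, 5, 1),
    "learning_rate": 4,  # four explicit choices
}

total_combinations = prod(space_sizes.values())
max_trials = 10
print(space_sizes)
print(f"combinations: {total_combinations}")
print(f"fraction explored: {max_trials / total_combinations:.6f}")
```

Ten random trials therefore sample only a very small part of the space, so the reported best configuration is the best of those ten trials rather than a tuned optimum.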


# **7. Trainer: Model Training and Serving Module**

In [63]:
%%writefile {TRAINER_MODULE_FILE}
"""
anxiety_trainer.py

This module contains the training functions of the machine learning model for anxiety level classification.
"""

# Import library
import os
import tensorflow as tf
import tensorflow_transform as tft
from anxiety_transform import (
    LABEL_KEY,
    NUMERICAL_FEATURES,
    CATEGORICAL_FEATURES,
    transformed_name,
)

def get_model(hyperparameters, show_summary=True):
    """
    Build the model with the best hyperparameters from the Tuner.

    Args:
        hyperparameters (dict): Dictionary of hyperparameter values.
        show_summary (bool): Print the model summary if True.

    Returns:
        tf.keras.Model: The compiled Keras model.
    """

    input_features = [
        tf.keras.Input(shape=(1,), name=transformed_name(feature))
        for feature in NUMERICAL_FEATURES + CATEGORICAL_FEATURES
    ]

    concatenate = tf.keras.layers.concatenate(input_features)

    # Read the best hyperparameters from the Tuner, falling back to defaults
    unit_1 = hyperparameters.get('unit_1', 128)
    dropout_1 = hyperparameters.get('dropout_1', 0.2)
    unit_2 = hyperparameters.get('unit_2', 64)
    dropout_2 = hyperparameters.get('dropout_2', 0.2)
    unit_3 = hyperparameters.get('unit_3', 32)
    dropout_3 = hyperparameters.get('dropout_3', 0.2)
    learning_rate = hyperparameters.get('learning_rate', 0.001)

    # Dense layers sized by the tuned hyperparameters
    deep = tf.keras.layers.Dense(unit_1, activation="relu")(concatenate)
    deep = tf.keras.layers.Dropout(dropout_1)(deep)

    deep = tf.keras.layers.Dense(unit_2, activation="relu")(deep)
    deep = tf.keras.layers.Dropout(dropout_2)(deep)

    deep = tf.keras.layers.Dense(unit_3, activation="relu")(deep)
    deep = tf.keras.layers.Dropout(dropout_3)(deep)

    # Output layer for multi-class classification
    outputs = tf.keras.layers.Dense(4, activation="softmax")(deep)

    # Build the model
    model = tf.keras.models.Model(inputs=input_features, outputs=outputs)

    # Compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
    )

    if show_summary:
        model.summary()

    return model

def gzip_reader_fn(filenames):
    """Loads compressed data"""
    return tf.data.TFRecordDataset(filenames, compression_type='GZIP')

def get_serve_tf_examples_fn(model, tf_transform_output):
    """Returns a function that parses a serialized tf.Example."""

    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized_tf_examples):
        """Returns the output to be used in the serving signature."""
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop(LABEL_KEY)
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)

        transformed_features = model.tft_layer(parsed_features)

        outputs = model(transformed_features)
        return {"outputs": outputs}

    return serve_tf_examples_fn

def input_fn(file_pattern, tf_transform_output, batch_size=64):
    """
    Generate features and labels for tuning/training.

    Args:
        file_pattern (str): File pattern of the dataset files.
        tf_transform_output: Output of the feature transformation.
        batch_size (int): Batch size.

    Returns:
        tf.data.Dataset: Batched dataset of (features, label) pairs.
    """
    transformed_feature_spec = tf_transform_output.transformed_feature_spec().copy()

    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=file_pattern,
        batch_size=batch_size,
        features=transformed_feature_spec,
        reader=gzip_reader_fn,
        label_key=transformed_name(LABEL_KEY),
    )

    return dataset

def run_fn(fn_args):
    """
    Main function that trains the model using the tuning results from the Tuner.

    Args:
        fn_args: Arguments from TFX containing data and model information.
    """

    # Load the feature transformation output
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    # fn_args.hyperparameters holds the serialized Keras Tuner config;
    # the tuned values are stored under its "values" key
    hyperparameters = (fn_args.hyperparameters or {}).get("values", {})

    # Load the training and evaluation datasets
    train_dataset = input_fn(fn_args.train_files, tf_transform_output, batch_size=64)
    eval_dataset = input_fn(fn_args.eval_files, tf_transform_output, batch_size=64)

    # Build the model with the best hyperparameters
    model = get_model(hyperparameters)

    log_dir = os.path.join(os.path.dirname(fn_args.serving_model_dir), "logs")

    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq="batch")

    # Add callbacks to stop early and to lower the learning rate on plateaus
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=8, restore_best_weights=True
    )

    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001
    )

    # Train the model
    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[tensorboard_callback, early_stopping, reduce_lr],
        epochs=10
    )

    # Save the model for serving
    signatures = {
        "serving_default": get_serve_tf_examples_fn(model, tf_transform_output).get_concrete_function(
            tf.TensorSpec(shape=[None], dtype=tf.string, name="examples")
        ),
    }

    model.save(fn_args.serving_model_dir, save_format="tf", signatures=signatures)

Overwriting modules/anxiety_trainer.py
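
Because `input_fn` relies on `make_batched_features_dataset`, which repeats the data indefinitely by default, the length of a Keras epoch is set by `steps_per_epoch` and not by the dataset size. The arithmetic below is a rough sketch using the step counts configured in the run cell of this notebook (1000 training steps, 250 evaluation steps) and assuming exactly 10,800 training and 1,200 evaluation examples, although the hash-based split is only approximately 90/10. `passes_per_epoch` is a hypothetical helper.

```python
import math


def passes_per_epoch(num_examples, batch_size, steps):
    """How many times the dataset is traversed within one Keras epoch."""
    return steps * batch_size / num_examples


train_examples, eval_examples = 10800, 1200  # assumed exact 90/10 split of 12,000 rows
batch_size = 64

batches_per_pass = math.ceil(train_examples / batch_size)
train_passes = passes_per_epoch(train_examples, batch_size, steps=1000)
eval_passes = passes_per_epoch(eval_examples, batch_size, steps=250)

print(f"batches needed for one pass over the training data: {batches_per_pass}")
print(f"training passes per Keras epoch: {train_passes:.2f}")
print(f"evaluation passes per validation run: {eval_passes:.2f}")
```

Under these assumptions each Keras epoch covers the training data several times, so the `patience` values of the callbacks count much larger units of work than single passes over the data.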


# **8. Initializing the Local Pipeline with Apache Beam**

In [64]:
def init_local_pipeline(
    components, pipeline_root: Text
) -> tfx_pipeline.Pipeline:
    """Build a local TFX pipeline whose metadata is stored in a SQLite database."""
    logging.info(f"Pipeline root set to: {pipeline_root}")

    return tfx_pipeline.Pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=pipeline_root,
        components=components,
        enable_cache=True,
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            METADATA_PATH
        )
    )
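
Since the pipeline metadata is kept in an ordinary SQLite file, it can be inspected with the standard library after a run. The helper below is a generic sketch: `list_tables` is a hypothetical name, the demo database and its `demo_table` are created only for illustration, and nothing here assumes a particular ML Metadata schema.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the table names stored in a SQLite file, sorted alphabetically."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    with closing(sqlite3.connect(db_path)) as connection:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demonstration on a throwaway database instead of the real metadata file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.sqlite")
    with closing(sqlite3.connect(demo_path)) as connection:
        connection.execute("CREATE TABLE demo_table (id INTEGER PRIMARY KEY)")
        connection.commit()
    demo_tables = list_tables(demo_path)

print(demo_tables)
```

After the pipeline has run, calling `list_tables(METADATA_PATH)` should list the tables of the metadata store.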

# **9. Running the Local Pipeline with Apache Beam**

In [65]:
from modules.components import init_components

logging.set_verbosity(logging.INFO)

config = {
    "DATA_ROOT": DATA_ROOT,
    "training_module": TRAINER_MODULE_FILE,
    "transform_module": TRANSFORM_MODULE_FILE,
    "tuner_module": TUNER_MODULE_FILE,
    "training_steps": 1000,
    "eval_steps": 250,
    "serving_model_dir": SERVING_MODEL_DIR,
}

components = init_components(config)

pipeline = init_local_pipeline(components, PIPELINE_ROOT)
BeamDagRunner().run(pipeline=pipeline)

Trial 10 Complete [00h 01m 27s]
val_sparse_categorical_accuracy: 0.2943124920129776

Best val_sparse_categorical_accuracy So Far: 0.3099374920129776
Total elapsed time: 00h 14m 36s
INFO:tensorflow:Oracle triggered exit
INFO:absl:Finished tuning... Tuner ID: tuner0
INFO:absl:Best HyperParameters: {'space': [{'class_name': 'Int', 'config': {'name': 'unit_1', 'default': None, 'conditions': [], 'min_value': 128, 'max_value': 512, 'step': 64, 'sampling': None}}, {'class_name': 'Float', 'config': {'name': 'dropout_1', 'default': 0.1, 'conditions': [], 'min_value': 0.1, 'max_value': 0.5, 'step': 0.1, 'sampling': None}}, {'class_name': 'Int', 'config': {'name': 'unit_2', 'default': None, 'conditions': [], 'min_value': 64, 'max_value': 256, 'step': 32, 'sampling': None}}, {'class_name': 'Float', 'config': {'name': 'dropout_2', 'default': 0.1, 'conditions': [], 'min_value': 0.1, 'max_value': 0.5, 'step': 0.1, 'sampling': None}}, {'class_name': 'Int', 'config': {'name': 'unit_3', 'default': None, 'conditions': [], 'min_value': 32, 'max_value': 128, 'step': 32, 'sampling': None}}, {'class_name': 'Float', 'config': {'name': 'dropout_3', 'default': 0.1, 'conditions': [], 'min_

Results summary
Results in RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Tuner\.system\executor_execution\108\.temp\108\anxiety_severity_tuner
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x0000023210B1D4F0>
Trial summary
Hyperparameters:
unit_1: 448
dropout_1: 0.1
unit_2: 256
dropout_2: 0.4
unit_3: 64
dropout_3: 0.2
learning_rate: 0.0005
Score: 0.3099374920129776
Trial summary
Hyperparameters:
unit_1: 384
dropout_1: 0.1
unit_2: 96
dropout_2: 0.4
unit_3: 96
dropout_3: 0.30000000000000004
learning_rate: 0.005
Score: 0.30924999713897705
Trial summary
Hyperparameters:
unit_1: 128
dropout_1: 0.4
unit_2: 160
dropout_2: 0.30000000000000004
unit_3: 64
dropout_3: 0.2
learning_rate: 0.0005
Score: 0.3085625022649765
Trial summary
Hyperparameters:
unit_1: 320
dropout_1: 0.2
unit_2: 192
dropout_2: 0.2
unit_3: 128
dropout_3: 0.2
learning_rate: 0.001
Score: 0.3062499910593033
Trial summary
Hyperparameters:
unit_1: 512
dropout_1: 0.4
unit_2: 64
dropout_2: 0.2
unit_

INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.trainer.component.Trainer"
    base_type: TRAIN
  }
  id: "Trainer"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "anxiety-pipeline"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "20250210-014456.003231"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "anxiety-pipeline.Trainer"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "Transform"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "anxiety-pipeline"
            }
          }
        }
        context_queries {
          type {
        

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Age_xf (InputLayer)            [(None, 1)]          0           []                               
                                                                                                  
 Sleep Hours_xf (InputLayer)    [(None, 1)]          0           []                               
                                                                                                  
 Physical Activity (hrs/week)_x  [(None, 1)]         0           []                               
 f (InputLayer)                                                                                   
                                                                                                  
 Caffeine Intake (mg/day)_xf (I  [(None, 1)]         0           []                         

INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Assets written to: RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Trainer\model\109\Format-Serving\assets
INFO:absl:Training complete. Model written to RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Trainer\model\109\Format-Serving. ModelRun written to RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Trainer\model_run\109
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 109 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "RENDIKA_NURHARTANTO_SUHARTO-pipeline\\anxiety-pipeline\\Trainer\\model\\109"
custom_properties {
  key: "name"
  value {
    string_value: "anxiety-pipeline:20250210-014456.003231:Trainer:model:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "1.8.0"
  }
}
name: "anxiety-pipeline:20250210-014456.003231:Trainer:model:0"
, artifact_type: name: "Model"
base_type: MODEL
)], 'model_



INFO:absl:The 'example_splits' parameter is not set, using 'eval' split.
INFO:absl:Evaluating model.
INFO:absl:udf_utils.get_fn {'fairness_indicator_thresholds': 'null', 'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "AUC"\n        },\n        {\n          "class_name": "Precision"\n        },\n        {\n          "class_name": "Recall"\n        },\n        {\n          "class_name": "ExampleCount"\n        },\n        {\n          "class_name": "CategoricalAccuracy",\n          "threshold": {\n            "change_threshold": {\n              "absolute": 0.0001,\n              "direction": "HIGHER_IS_BETTER"\n            },\n            "value_threshold": {\n              "lower_bound": 0.9\n            }\n          }\n        }\n      ]\n    }\n  ],\n  "model_specs": [\n    {\n      "label_key": "Anxiety Category Encoded"\n    }\n  ],\n  "slicing_specs": [\n    {}\n  ]\n}', 'example_splits': 'null'} 'custom_extractors'
INFO:absl

INFO:absl:Evaluation complete. Results written to RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Evaluator\evaluation\110.
INFO:absl:Checking validation results.
INFO:absl:Blessing result False written to RENDIKA_NURHARTANTO_SUHARTO-pipeline\anxiety-pipeline\Evaluator\blessing\110.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 110 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'blessing': [Artifact(artifact: uri: "RENDIKA_NURHARTANTO_SUHARTO-pipeline\\anxiety-pipeline\\Evaluator\\blessing\\110"
custom_properties {
  key: "name"
  value {
    string_value: "anxiety-pipeline:20250210-014456.003231:Evaluator:blessing:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "1.8.0"
  }
}
name: "anxiety-pipeline:20250210-014456.003231:Evaluator:blessing:0"
, artifact_type: name: "ModelBlessing"
)], 'evaluation': [Artifact(artifact: uri: "RENDIKA_NURHARTANTO_SUHARTO