<a href="https://colab.research.google.com/github/221230001-wq/221230001-Pengantar-ML/blob/main/week-02/latihan-praktikum-2-numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Practicum 2: Numerical Computing with NumPy Arrays

<a href="https://colab.research.google.com/github/pakizhan-ump/ml-umpontianak/blob/main/Modules/Week-02/Praktikum-02/Praktikum_2_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🎯 Practicum Objectives
Students are able to use the **NumPy** library to create and manipulate multidimensional arrays, and understand its advantages over Python lists for numerical computation.

## 📖 Theoretical Background
**NumPy (Numerical Python)** is a fundamental library for scientific computing in Python. Its core data structure is the **ndarray (n-dimensional array)**, which is designed for efficient numerical operations. Its main advantages are:
* **Homogeneous:** All elements of an array share one data type (for example, all `float64` or all `int32`), which allows more compact memory storage.
* **Fixed size:** An array's size is set when it is created, which makes memory allocation more predictable.
* **Memory efficiency and speed:** Mathematical operations on NumPy arrays are much faster because they run in compiled C code rather than in the Python interpreter.
* **Vectorized operations:** Mathematical operations apply to all elements of an array at once, without writing a `for` *loop*, which makes code shorter and faster.

In machine learning, data such as images (pixel matrices), audio signals, and numerical features are almost always represented as NumPy arrays.
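
The vectorization and homogeneity points above can be illustrated with a small sketch. The names `prices_list`, `prices_arr`, and `mixed` are made up for this example and are not part of the practicum.

```python
import numpy as np

# Hypothetical data, used only for illustration
prices_list = [10.0, 20.0, 30.0]
prices_arr = np.array(prices_list)

# With a list, an explicit loop (here a comprehension) is needed
doubled_list = [p * 2 for p in prices_list]

# With an ndarray, the operation applies to every element at once
doubled_arr = prices_arr * 2

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_arr)    # [20. 40. 60.]

# Homogeneity: mixing int and float values yields one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64
```

Note that `prices_list * 2` would repeat the list rather than double its values, which is one reason lists are awkward for numerical work.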

In [1]:
import numpy as np

# 🔧 FUNDAMENTAL NUMPY OPERATIONS

# 1. ARRAY CREATION & TYPES

In [2]:
print("=== ARRAY CREATION ===")
arr1d = np.array([1, 2, 3, 4, 5])                    # 1D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])            # 2D array
arr3d = np.random.rand(2, 3, 4)                     # 3D random array

print("1D array shape:", arr1d.shape)
print("2D array shape:", arr2d.shape)
print("3D array shape:", arr3d.shape)

=== ARRAY CREATION ===
1D array shape: (5,)
2D array shape: (2, 3)
3D array shape: (2, 3, 4)


# 2. DATA TYPES

In [3]:
arr_int = np.array([1, 2, 3], dtype=np.int32)
arr_float = np.array([1.0, 2.0, 3.0], dtype=np.float64)
arr_bool = np.array([True, False, True])

print("Data types:")
print("arr_int dtype:", arr_int.dtype)
print("arr_float dtype:", arr_float.dtype)

Data types:
arr_int dtype: int32
arr_float dtype: float64
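
As a possible follow-up to the dtype cell, the sketch below shows converting between types with `astype` and inspecting memory use through `itemsize` and `nbytes`. The sample values are chosen only for illustration.

```python
import numpy as np

arr_float = np.array([1.7, 2.2, 3.9], dtype=np.float64)

# astype returns a new array; float -> int conversion truncates toward zero
arr_as_int = arr_float.astype(np.int32)
print(arr_as_int)                               # [1 2 3]

# itemsize is bytes per element, nbytes is the total for the whole array
print(arr_float.itemsize, arr_float.nbytes)     # 8 24
print(arr_as_int.itemsize, arr_as_int.nbytes)   # 4 12

# Boolean arrays store one byte per element
arr_bool = np.array([True, False, True])
print(arr_bool.dtype, arr_bool.itemsize)        # bool 1
```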


# 3. ACCESS & SLICING

In [4]:
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print("\n=== ACCESS & SLICING ===")
print("Element [1,2]:", matrix[1, 2])                # Single element
print("Row 1:", matrix[1, :])                        # Entire row
print("Column 2:", matrix[:, 2])                     # Entire column
print("Submatrix 2x2:\n", matrix[0:2, 1:3])          # Submatrix
print("Every other row:\n", matrix[::2, :])          # Step slicing


=== ACCESS & SLICING ===
Element [1,2]: 7
Row 1: [5 6 7 8]
Column 2: [ 3  7 11]
Submatrix 2x2:
 [[2 3]
 [6 7]]
Every other row:
 [[ 1  2  3  4]
 [ 9 10 11 12]]
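
One detail worth knowing about the slices above is that basic slicing returns a view on the original data, not a copy. The sketch below reuses the same `matrix` values; `row_view` and `row_copy` are names introduced here for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# Basic slicing gives a view: writing through it changes the original
row_view = matrix[1, :]
row_view[0] = 99
print(matrix[1, 0])                          # 99

# An explicit copy is independent of the original
row_copy = matrix[2, :].copy()
row_copy[0] = -1
print(matrix[2, 0])                          # 9

print(np.shares_memory(matrix, row_view))    # True
print(np.shares_memory(matrix, row_copy))    # False
```

Calling `.copy()` is the usual way to keep the original array untouched when a slice will be modified.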


# 4. RESHAPING

In [5]:
print("\n=== RESHAPING ===")
arr = np.arange(12)                                  # 0-11
print("Original shape:", arr.shape)
reshaped = arr.reshape(3, 4)                        # Reshape to 3x4
flattened = reshaped.flatten()                      # Back to 1D
transposed = reshaped.T                             # Transpose

print("Reshaped 3x4:\n", reshaped)
print("Transposed 4x3:\n", transposed)
print("Flattened:", flattened)


=== RESHAPING ===
Original shape: (12,)
Reshaped 3x4:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Transposed 4x3:
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]
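
Two related reshaping details can be sketched as follows: `-1` lets NumPy infer one dimension, and `flatten()` always copies while `ravel()` returns a view when the memory layout allows it. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer the missing dimension (12 / 3 = 4)
auto = arr.reshape(3, -1)
print(auto.shape)            # (3, 4)

# flatten() returns a copy, so the source array is not affected
flat_copy = auto.flatten()
flat_copy[0] = 100
print(auto[0, 0])            # 0

# ravel() returns a view here because the data is contiguous
flat_view = auto.ravel()
flat_view[0] = 100
print(auto[0, 0])            # 100

# The total number of elements must match, otherwise reshape fails
reshape_failed = False
try:
    arr.reshape(5, 3)        # 15 slots for 12 elements
except ValueError:
    reshape_failed = True
print(reshape_failed)        # True
```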


# 5. MATHEMATICAL OPERATIONS

In [6]:
print("\n=== MATHEMATICAL OPERATIONS ===")
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("a + b =", a + b)                             # Element-wise add
print("a * b =", a * b)                             # Element-wise multiply
print("a ** 2 =", a ** 2)                           # Power
print("np.dot(a, b) =", np.dot(a, b))               # Dot product


=== MATHEMATICAL OPERATIONS ===
a + b = [5 7 9]
a * b = [ 4 10 18]
a ** 2 = [1 4 9]
np.dot(a, b) = 32
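
For 2D arrays, the difference between element-wise multiplication and matrix multiplication matters. A minimal sketch, using two small matrices `A` and `B` that are not part of the original cell:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# * multiplies matching positions
elementwise = A * B
print(elementwise)
# [[ 5 12]
#  [21 32]]

# @ (same as np.matmul) performs row-by-column matrix multiplication
product = A @ B
print(product)
# [[19 22]
#  [43 50]]
```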


# 6. AGGREGATION

In [7]:
print("\n=== AGGREGATION ===")
data = np.random.rand(5, 4)                         # 5x4 random data
print("Data:\n", data)
print("Sum all:", np.sum(data))
print("Mean all:", np.mean(data))
print("Std all:", np.std(data))
print("Sum columns:", np.sum(data, axis=0))         # Along columns
print("Mean rows:", np.mean(data, axis=1))          # Along rows


=== AGGREGATION ===
Data:
 [[0.03970543 0.40205667 0.49604086 0.07240419]
 [0.48024949 0.23337366 0.51651072 0.07226172]
 [0.15492007 0.91044786 0.88625473 0.53585156]
 [0.57322333 0.77953707 0.76688782 0.00985132]
 [0.36467854 0.9549235  0.3733182  0.74831707]]
Sum all: 9.370813826689844
Mean all: 0.46854069133449217
Std all: 0.29792754699858964
Sum columns: [1.61277686 3.28033876 3.03901234 1.43868586]
Mean rows: [0.25255179 0.3255989  0.62186856 0.53237489 0.61030933]
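
Because the random data above makes the `axis` argument hard to check by eye, here is a small deterministic sketch. The array `scores` and the use of `keepdims` are additions for illustration.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

# axis=0 collapses the rows: one result per column
print(scores.sum(axis=0))        # [5 7 9]

# axis=1 collapses the columns: one result per row
print(scores.sum(axis=1))        # [ 6 15]

# keepdims=True keeps the reduced axis with length 1,
# so the result can be subtracted from the original directly
row_means = scores.mean(axis=1, keepdims=True)
print(row_means.shape)           # (2, 1)

centered = scores - row_means
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]
```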


# 7. BROADCASTING

In [8]:
print("\n=== BROADCASTING ===")
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])

result = matrix + vector                            # Broadcasting
print("Matrix + Vector:\n", result)


=== BROADCASTING ===
Matrix + Vector:
 [[11 22 33]
 [14 25 36]]
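
Broadcasting compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below shows a column vector that broadcasts and a vector that does not; `column` and `bad_vector` are hypothetical names.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)

# Shape (2, 1) stretches across the 3 columns
column = np.array([[100],
                   [200]])
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# Shape (2,) is compared with the last dimension (3) and does not match
bad_vector = np.array([10, 20])
broadcast_failed = False
try:
    matrix + bad_vector
except ValueError:
    broadcast_failed = True
print(broadcast_failed)                 # True
```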


# 8. RANDOM OPERATIONS

In [9]:
print("\n=== RANDOM OPERATIONS ===")
random_arr = np.random.rand(3, 3)                   # Uniform [0,1)
normal_arr = np.random.randn(3, 3)                  # Normal distribution
integers = np.random.randint(0, 100, (3, 3))        # Random integers

print("Uniform random:\n", random_arr)
print("Normal random:\n", normal_arr)
print("Random integers:\n", integers)


=== RANDOM OPERATIONS ===
Uniform random:
 [[0.37220347 0.83025735 0.3742609 ]
 [0.6840643  0.72393562 0.35095031]
 [0.44998043 0.67189036 0.03435261]]
Normal random:
 [[-0.58222722 -1.11104127 -0.06220788]
 [ 0.19845853  0.56863117  0.52944057]
 [-0.09220783  0.57509391 -1.187073  ]]
Random integers:
 [[29 31 79]
 [27 26  5]
 [71 55 12]]
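
The outputs above change on every run because no seed is set. One way to get reproducible numbers is a seeded `Generator` from `np.random.default_rng`; this sketch is an alternative to the functions used in the cell, not a replacement prescribed by the practicum, and the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random((3, 3))              # uniform on [0, 1)
sample_b = rng_b.random((3, 3))
print(np.array_equal(sample_a, sample_b))    # True

# Generator counterparts of randn and randint
normal_arr = rng_a.standard_normal((3, 3))
integers = rng_a.integers(0, 100, size=(3, 3))   # upper bound excluded
print(normal_arr.shape, integers.shape)      # (3, 3) (3, 3)
```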


# 9. SPLITTING & JOINING

In [10]:
print("\n=== SPLITTING & JOINING ===")
arr = np.arange(12).reshape(3, 4)
sub_arrays = np.split(arr, 3, axis=0)               # Split along rows
print("Original:\n", arr)
print("After split:")
for i, sub in enumerate(sub_arrays):
    print(f"Part {i}:\n{sub}")

# Joining arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
joined = np.concatenate([arr1, arr2], axis=0)       # Vertical join
print("After concatenation:\n", joined)


=== SPLITTING & JOINING ===
Original:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
After split:
Part 0:
[[0 1 2 3]]
Part 1:
[[4 5 6 7]]
Part 2:
[[ 8  9 10 11]]
After concatenation:
 [[1 2]
 [3 4]
 [5 6]]
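
Two variations on the cell above may be useful: `np.array_split` tolerates sizes that do not divide evenly, and `np.hstack` joins arrays side by side. The arrays below are illustrative.

```python
import numpy as np

arr = np.arange(10)

# np.split requires an even division and raises ValueError otherwise
split_failed = False
try:
    np.split(arr, 3)
except ValueError:
    split_failed = True
print(split_failed)                      # True

# np.array_split allows uneven parts
parts = np.array_split(arr, 3)
print([len(p) for p in parts])           # [4, 3, 3]

# Horizontal join: the number of rows must match
left = np.array([[1, 2],
                 [3, 4]])
right = np.array([[5],
                  [6]])
joined = np.hstack([left, right])        # same as concatenate with axis=1
print(joined)
# [[1 2 5]
#  [3 4 6]]
```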


# 10. CONDITIONAL OPERATIONS

In [11]:
print("\n=== CONDITIONAL OPERATIONS ===")
data = np.array([10, 25, 30, 45, 50, 65])
mask = data > 30                                    # Boolean mask
filtered = data[mask]                               # Filter with mask
print("Data > 30:", filtered)

# Where operation
result = np.where(data > 30, data, 0)               # Replace conditionally
print("Where > 30 else 0:", result)


=== CONDITIONAL OPERATIONS ===
Data > 30: [45 50 65]
Where > 30 else 0: [ 0  0  0 45 50 65]
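
Masks can also be combined. The sketch below reuses the same `data` values and assumes the goal is to select a range or its complement. Note that NumPy uses `&` and `|` with parentheses, not Python's `and` / `or`.

```python
import numpy as np

data = np.array([10, 25, 30, 45, 50, 65])

# Combine conditions element-wise; the parentheses are required
in_range = (data >= 25) & (data <= 50)
print(data[in_range])                    # [25 30 45 50]

extremes = data[(data < 20) | (data > 60)]
print(extremes)                          # [10 65]

# Count the matches and find their positions
print(np.count_nonzero(in_range))        # 4
print(np.where(in_range)[0])             # [1 2 3 4]
```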


# 🏋️ EXERCISE 2: NUMPY OPERATIONS FOR MACHINE LEARNING

### DATA PREPROCESSING WITH NUMPY ###

In [14]:
import numpy as np

'''TODO: Implement the preprocessing pipeline'''
# Simulated dataset: 100 samples, 5 features
np.random.seed(42)
X = np.random.randn(100, 5) * 10 + 5  # Mean=5, Std=10

# TODO 1: Z-score normalization: (x - mean) / std
def z_score_normalization(data):
    """
    Normalize data using the Z-score.
    Each feature ends up with mean=0 and standard deviation=1.
    """
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    # Avoid division by zero for features whose standard deviation is 0
    std[std == 0] = 1
    return (data - mean) / std

X_normalized = z_score_normalization(X)

# TODO 2: Handle outliers - replace values beyond 3 std with the boundaries
def handle_outliers(data, std_threshold=3):
    """
    Handle outliers by clipping (capping).
    Values outside the threshold are replaced with the upper
    or lower boundary value.
    """
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    lower_bound = mean - std_threshold * std
    upper_bound = mean + std_threshold * std

    # np.clip is a concise, efficient way to cap values per feature
    cleaned_data = np.clip(data, lower_bound, upper_bound)

    return cleaned_data

# Prefer handling outliers on the original data, not on normalized data
X_cleaned = handle_outliers(X)
# Then normalize the data that has been cleaned of outliers
X_final = z_score_normalization(X_cleaned)


# TODO 3: One-hot encoding for categorical labels
def one_hot_encoding(labels):
    """
    Convert an array of categorical (integer) labels to one-hot encoding.
    Example: [0, 1, 2] -> [[1,0,0], [0,1,0], [0,0,1]]
    """
    # Determine the number of classes dynamically
    num_classes = np.max(labels) + 1
    # Index the rows of an identity matrix to build the one-hot encoding
    return np.eye(num_classes)[labels]


labels = np.array([0, 1, 2, 0, 1, 2, 0])
one_hot_labels = one_hot_encoding(labels)

# TODO 4: Manual train-test split
def train_test_split_numpy(X, y, test_size=0.2, random_state=42):
    """
    Split a dataset (features X and target y) into training and test data.
    """
    # Seed so that the shuffle is the same on every run (reproducibility)
    np.random.seed(random_state)

    n_samples = X.shape[0]
    n_test = int(n_samples * test_size)

    # Shuffle all sample indices
    shuffled_indices = np.random.permutation(n_samples)

    # Take the indices for the test data and the training data
    test_indices = shuffled_indices[:n_test]
    train_indices = shuffled_indices[n_test:]

    # Split the data using the shuffled indices
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]

    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = train_test_split_numpy(X_final, np.random.randint(0, 3, 100))

# --- Result checks ---
assert X_normalized.shape == X.shape, "Shape must not change after normalization"
assert np.allclose(X_normalized.mean(axis=0), 0, atol=1e-10), "Per-feature mean after Z-score must be close to 0"
assert np.allclose(X_normalized.std(axis=0), 1, atol=1e-10), "Per-feature standard deviation after Z-score must be close to 1"

print("✅ NumPy operations completed successfully.")
print("\nSample data after normalization (first 5 rows):")
print(X_final[:5])
print("\nSample labels after one-hot encoding:")
print(one_hot_labels)

✅ NumPy operations completed successfully.

Sample data after normalization (first 5 rows):
[[ 0.604418   -0.21979528  0.76040738  1.46605404 -0.18681207]
 [-0.21141045  1.53420502  0.883546   -0.626682    0.55681147]
 [-0.46735006 -0.55422449  0.34318685 -2.1431177  -1.61406634]
 [-0.57771567 -1.1129603   0.41751976 -1.08729268 -1.31477033]
 [ 1.68601234 -0.30916828  0.16381105 -1.63001031 -0.4838249 ]]

Sample labels after one-hot encoding:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]
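
A quick way to sanity-check a one-hot matrix like the one above is to decode it back with `argmax` and compare against the original labels. This is a minimal sketch that rebuilds the encoding inline so it runs on its own; `decoded` and `one_hot_int` are names introduced here.

```python
import numpy as np

labels = np.array([0, 1, 2, 0, 1, 2, 0])

# Same identity-matrix indexing technique as in the exercise
one_hot = np.eye(labels.max() + 1)[labels]

# Every row contains exactly one 1
print(one_hot.sum(axis=1))               # [1. 1. 1. 1. 1. 1. 1.]

# argmax along each row recovers the class index
decoded = one_hot.argmax(axis=1)
print(decoded)                           # [0 1 2 0 1 2 0]
print(np.array_equal(decoded, labels))   # True

# np.eye returns floats by default; dtype=int gives integer 0/1 values
one_hot_int = np.eye(labels.max() + 1, dtype=int)[labels]
print(one_hot_int[0])                    # [1 0 0]
```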
