# Lab 2: Numerical Computing with NumPy Arrays

<a href="https://colab.research.google.com/github/pakizhan-ump/ml-umpontianak/blob/main/Modules/Week-02/Praktikum-02/Praktikum_2_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🎯 Lab Objectives
Students will be able to use the **NumPy** library to create and manipulate multidimensional arrays, and understand its advantages over Python lists for numerical computing.

## 📖 Background Theory
**NumPy (Numerical Python)** is the fundamental library for scientific computing in Python. Its core data structure is the **ndarray (n-dimensional array)**, which is designed specifically for efficient numerical operations. Its main advantages are:
* **Homogeneous:** All elements of an array must share the same data type (for example, all `float64` or all `int32`), which allows compact, contiguous storage in memory.
* **Fixed Size:** An array's size is set when it is created, which makes memory allocation predictable. Operations that appear to resize an array (such as `np.append`) allocate a new one.
* **Memory Efficiency & Speed:** Mathematical operations on NumPy arrays are typically much faster for large inputs because the loops run in compiled C code rather than in the Python interpreter.
* **Vectorized Operations:** A mathematical operation can be applied to every element of an array in a single expression, without writing a `for` *loop*, which makes code both shorter and faster.

In machine learning, data such as images (matrices of pixels), audio signals, or numerical features are almost always represented as NumPy arrays.
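
As a minimal sketch of the homogeneity and vectorization points above (the variable names here are illustrative and not part of the lab), the following compares a list comprehension with the equivalent array expression:

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]

# Plain Python list: the comprehension visits each element in the interpreter.
doubled_list = [v * 2 for v in values]

# NumPy array: one vectorized expression, executed in compiled code.
arr = np.array(values)
doubled_arr = arr * 2

# Homogeneous storage: mixing ints and floats upcasts every element to float64.
mixed = np.array([1, 2, 3.5])

print(doubled_list)   # [3.0, 4.0, 7.0, 8.0]
print(doubled_arr)    # [3. 4. 7. 8.]
print(mixed.dtype)    # float64
```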

In [1]:
import numpy as np

# 🔧 FUNDAMENTAL NUMPY OPERATIONS

# 1. ARRAY CREATION

In [2]:
print("=== ARRAY CREATION ===")
arr1d = np.array([1, 2, 3, 4, 5])                    # 1D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])            # 2D array
arr3d = np.random.rand(2, 3, 4)                     # 3D random array

print("1D array shape:", arr1d.shape)
print("2D array shape:", arr2d.shape)
print("3D array shape:", arr3d.shape)

=== ARRAY CREATION ===
1D array shape: (5,)
2D array shape: (2, 3)
3D array shape: (2, 3, 4)


# 2. DATA TYPES

In [3]:
arr_int = np.array([1, 2, 3], dtype=np.int32)
arr_float = np.array([1.0, 2.0, 3.0], dtype=np.float64)
arr_bool = np.array([True, False, True])

print("Data types:")
print("arr_int dtype:", arr_int.dtype)
print("arr_float dtype:", arr_float.dtype)

Data types:
arr_int dtype: int32
arr_float dtype: float64
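
Converting between data types is a common follow-up step. The sketch below uses `astype`, which returns a new array; the sample values are made up for illustration:

```python
import numpy as np

prices = np.array([1.9, 2.5, -3.7])     # inferred as float64
as_int = prices.astype(np.int32)        # truncates toward zero, returns a new array
as_f32 = prices.astype(np.float32)      # half the bytes per element
flags = np.array([True, False, True])   # boolean arrays have dtype bool

print(as_int)            # [ 1  2 -3]
print(prices.itemsize)   # 8
print(as_f32.itemsize)   # 4
print(flags.dtype)       # bool
```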


# 3. ACCESS & SLICING

In [4]:
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print("\n=== ACCESS & SLICING ===")
print("Element [1,2]:", matrix[1, 2])                # Single element
print("Row 1:", matrix[1, :])                        # Entire row
print("Column 2:", matrix[:, 2])                     # Entire column
print("Submatrix 2x2:\n", matrix[0:2, 1:3])          # Submatrix
print("Every other row:\n", matrix[::2, :])          # Step slicing


=== ACCESS & SLICING ===
Element [1,2]: 7
Row 1: [5 6 7 8]
Column 2: [ 3  7 11]
Submatrix 2x2:
 [[2 3]
 [6 7]]
Every other row:
 [[ 1  2  3  4]
 [ 9 10 11 12]]
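
One detail worth keeping in mind when slicing: a basic slice is a view that shares memory with the original array, so writing to it changes the original. A small sketch using the same `matrix` values as the cell above:

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

view = matrix[0:2, 1:3]          # basic slice: shares memory with matrix
copy = matrix[0:2, 1:3].copy()   # explicit copy: independent data

view[0, 0] = 99                  # also changes matrix[0, 1]
copy[0, 1] = -1                  # matrix is unaffected

print(matrix[0])                        # [ 1 99  3  4]
print(np.shares_memory(matrix, view))   # True
print(np.shares_memory(matrix, copy))   # False
```

Calling `.copy()` is the usual way to get an independent array when the original must stay unchanged.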


# 4. RESHAPING

In [5]:
print("\n=== RESHAPING ===")
arr = np.arange(12)                                  # 0-11
print("Original shape:", arr.shape)
reshaped = arr.reshape(3, 4)                        # Reshape to 3x4
flattened = reshaped.flatten()                      # Back to 1D
transposed = reshaped.T                             # Transpose

print("Reshaped 3x4:\n", reshaped)
print("Transposed 4x3:\n", transposed)
print("Flattened:", flattened)


=== RESHAPING ===
Original shape: (12,)
Reshaped 3x4:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Transposed 4x3:
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]
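
Two related details, sketched here as an illustration: passing `-1` lets NumPy infer one dimension, and `ravel()` differs from `flatten()` in that it returns a view when the memory layout allows it.

```python
import numpy as np

arr = np.arange(12)
auto = arr.reshape(3, -1)     # -1 lets NumPy infer the missing dimension (4)

flat_copy = auto.flatten()    # always a copy
flat_view = auto.ravel()      # a view here, because the data is contiguous

flat_copy[0] = 100            # does not touch arr
flat_view[1] = 200            # writes through to arr

print(auto.shape)             # (3, 4)
print(arr[:3])                # [  0 200   2]
```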


# 5. MATHEMATICAL OPERATIONS

In [6]:
print("\n=== MATHEMATICAL OPERATIONS ===")
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("a + b =", a + b)                             # Element-wise add
print("a * b =", a * b)                             # Element-wise multiply
print("a ** 2 =", a ** 2)                           # Power
print("np.dot(a, b) =", np.dot(a, b))               # Dot product


=== MATHEMATICAL OPERATIONS ===
a + b = [5 7 9]
a * b = [ 4 10 18]
a ** 2 = [1 4 9]
np.dot(a, b) = 32
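
For 2D arrays, the `*` operator is still element-wise, while the matrix product uses the `@` operator (equivalent to `np.matmul`). A short sketch with two illustrative 2x2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # row-by-column matrix product

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matmul)
# [[19 22]
#  [43 50]]
```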


# 6. AGGREGATION

In [7]:
print("\n=== AGGREGATION ===")
data = np.random.rand(5, 4)                         # 5x4 random data
print("Data:\n", data)
print("Sum all:", np.sum(data))
print("Mean all:", np.mean(data))
print("Std all:", np.std(data))
print("Sum columns:", np.sum(data, axis=0))         # axis=0: collapse rows, one sum per column
print("Mean rows:", np.mean(data, axis=1))          # axis=1: collapse columns, one mean per row


=== AGGREGATION ===
Data:
 [[0.66696757 0.83006066 0.50067015 0.72796391]
 [0.17552716 0.1325406  0.90279381 0.60257077]
 [0.42353525 0.47514293 0.27559082 0.79843927]
 [0.55722158 0.352223   0.78795767 0.61671535]
 [0.19731046 0.1854258  0.69253556 0.01169695]]
Sum all: 9.912889268457183
Mean all: 0.49564446342285917
Std all: 0.25837103051851334
Sum columns: [2.02056202 1.97539299 3.15954801 2.75738625]
Mean rows: [0.68141557 0.45335809 0.49317707 0.5785294  0.27174219]
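
Because the random data above changes on every run, the effect of `axis` is easier to check on a small fixed array. This sketch also shows `keepdims=True`, which keeps the reduced axis with length 1 so the result can be combined with the original array:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])                 # shape (2, 3)

col_sums = data.sum(axis=0)                        # collapses rows: one value per column
row_means = data.mean(axis=1)                      # collapses columns: one value per row
row_means_2d = data.mean(axis=1, keepdims=True)    # shape (2, 1)

print(col_sums)              # [5. 7. 9.]
print(row_means)             # [2. 5.]
print(row_means_2d.shape)    # (2, 1)
print(data - row_means_2d)   # each row centered on its own mean
```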


# 7. BROADCASTING

In [8]:
print("\n=== BROADCASTING ===")
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])

result = matrix + vector                            # Broadcasting
print("Matrix + Vector:\n", result)


=== BROADCASTING ===
Matrix + Vector:
 [[11 22 33]
 [14 25 36]]
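
Broadcasting compares shapes from the trailing dimension backwards, so a vector with one value per row does not line up automatically. The sketch below (with an illustrative `row_offsets` vector) shows the error and one way to resolve it by adding an axis:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row_offsets = np.array([100, 200])    # shape (2,)

# (2, 3) + (2,) fails: the trailing dimensions 3 and 2 do not match.
try:
    matrix + row_offsets
except ValueError as err:
    print("Broadcast error:", err)

# Adding a new axis gives shape (2, 1), which stretches across the columns.
result = matrix + row_offsets[:, np.newaxis]
print(result)
# [[101 102 103]
#  [204 205 206]]
```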


# 8. RANDOM OPERATIONS

In [9]:
print("\n=== RANDOM OPERATIONS ===")
random_arr = np.random.rand(3, 3)                   # Uniform [0,1)
normal_arr = np.random.randn(3, 3)                  # Normal distribution
integers = np.random.randint(0, 100, (3, 3))        # Random integers

print("Uniform random:\n", random_arr)
print("Normal random:\n", normal_arr)
print("Random integers:\n", integers)


=== RANDOM OPERATIONS ===
Uniform random:
 [[0.07120715 0.61192351 0.36306125]
 [0.81133769 0.28943318 0.85401235]
 [0.86202348 0.87925113 0.999382  ]]
Normal random:
 [[ 0.65704245 -0.87321077  0.21639513]
 [-0.65239422 -1.48926219 -1.25284271]
 [ 0.11736275 -0.71583435  0.34680067]]
Random integers:
 [[10 32 29]
 [16 94  8]
 [67 69 72]]
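
The outputs above differ on every run because no seed is set. As an alternative sketch, NumPy's `Generator` interface (`np.random.default_rng`) offers the same kinds of draws with an explicit seed; the seed value 42 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)            # seeded Generator

uniform = rng.random((3, 3))                    # uniform on [0, 1)
normal = rng.standard_normal((3, 3))            # mean 0, std 1
integers = rng.integers(0, 100, size=(3, 3))    # upper bound is exclusive

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))   # True
```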


# 9. SPLITTING & JOINING

In [10]:
print("\n=== SPLITTING & JOINING ===")
arr = np.arange(12).reshape(3, 4)
sub_arrays = np.split(arr, 3, axis=0)               # Split along rows
print("Original:\n", arr)
print("After split:")
for i, sub in enumerate(sub_arrays):
    print(f"Part {i}:\n{sub}")

# Joining arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
joined = np.concatenate([arr1, arr2], axis=0)       # Vertical join
print("After concatenation:\n", joined)


=== SPLITTING & JOINING ===
Original:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
After split:
Part 0:
[[0 1 2 3]]
Part 1:
[[4 5 6 7]]
Part 2:
[[ 8  9 10 11]]
After concatenation:
 [[1 2]
 [3 4]
 [5 6]]
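
`np.split` requires the array to divide evenly. When it does not, `np.array_split` is one option, as sketched below together with a horizontal join along `axis=1`:

```python
import numpy as np

arr = np.arange(10)

# np.array_split tolerates a remainder: the first parts get the extra elements.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

a = np.array([[1, 2], [3, 4]])
b = np.array([[5], [6]])                          # shape (2, 1)
side_by_side = np.concatenate([a, b], axis=1)     # horizontal join, shape (2, 3)
print(side_by_side)
# [[1 2 5]
#  [3 4 6]]
```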


# 10. CONDITIONAL OPERATIONS

In [11]:
print("\n=== CONDITIONAL OPERATIONS ===")
data = np.array([10, 25, 30, 45, 50, 65])
mask = data > 30                                    # Boolean mask
filtered = data[mask]                               # Filter with mask
print("Data > 30:", filtered)

# Where operation
result = np.where(data > 30, data, 0)               # Replace conditionally
print("Where > 30 else 0:", result)


=== CONDITIONAL OPERATIONS ===
Data > 30: [45 50 65]
Where > 30 else 0: [ 0  0  0 45 50 65]
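
Masks can also be combined and used for in-place assignment. A short sketch on the same `data` values:

```python
import numpy as np

data = np.array([10, 25, 30, 45, 50, 65])

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required.
in_range = data[(data >= 25) & (data <= 50)]
print(in_range)                  # [25 30 45 50]

# With a single argument, np.where returns the indices where the condition holds.
idx = np.where(data > 30)[0]
print(idx)                       # [3 4 5]

# A mask on the left-hand side assigns in place.
capped = data.copy()
capped[capped > 45] = 45
print(capped)                    # [10 25 30 45 45 45]
```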


# 🏋️ EXERCISE 2: NUMPY OPERATIONS FOR MACHINE LEARNING

### DATA PREPROCESSING WITH NUMPY ###

In [17]:
'''TODO: Implement the preprocessing pipeline'''
# Simulated dataset: 100 samples, 5 features
np.random.seed(42)
X = np.random.randn(100, 5) * 10 + 5  # Mean=5, Std=10

# TODO 1: Z-score normalization: (x - mean) / std
def z_score_normalization(data):
    """Normalize each column (feature) of data using the Z-score."""
    # Compute the mean and standard deviation of every column (axis=0)
    mean_per_feature = np.mean(data, axis=0)
    std_per_feature = np.std(data, axis=0)

    # Add a small value (epsilon) to avoid division by zero
    return (data - mean_per_feature) / (std_per_feature + 1e-8)

X_normalized = z_score_normalization(X)

# TODO 2: Handle outliers - replace values beyond 3 std with the boundary values
def handle_outliers(data, std_threshold=3):
    """Clip outliers to the boundary values using np.clip.

    Assumes data is already Z-score normalized, so std_threshold
    is measured in standard deviations.
    """
    # np.clip limits every value in data to the range
    # [-std_threshold, +std_threshold].
    return np.clip(data, -std_threshold, std_threshold)

X_cleaned = handle_outliers(X_normalized)

# TODO 3: One-hot encoding for categorical labels
def one_hot_encoding(labels):
    """Convert a 1D array of integer labels into a 2D one-hot array.

    Example: [0, 1, 2, 0, 1] -> array of shape (5, 3).
    Assumes the labels are non-negative integers 0..K-1.
    """
    # Number of classes (e.g. if the largest label is 2, the classes are 0, 1, 2)
    num_classes = np.max(labels) + 1
    # Matrix of zeros with shape (number_of_labels, number_of_classes)
    one_hot = np.zeros((labels.size, num_classes))
    # Advanced indexing: for each row i, set column labels[i] to 1
    one_hot[np.arange(labels.size), labels] = 1
    return one_hot

labels = np.array([0, 1, 2, 0, 1, 2, 0])
one_hot_labels = one_hot_encoding(labels)

# TODO 4: Manual train-test split
def train_test_split_numpy(X, y, test_size=0.2):
    """Split X and y into training and test sets without sklearn."""
    n_samples = X.shape[0]
    # Shuffle the indices 0..n_samples-1
    shuffled_indices = np.random.permutation(n_samples)

    # Size of the test set and the split point
    test_set_size = int(n_samples * test_size)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]

    # Select the rows using the shuffled, partitioned indices
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]

    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = train_test_split_numpy(X, np.random.randint(0, 3, 100))

assert X_normalized.shape == X.shape, "Shape should remain same"
assert np.allclose(X_normalized.mean(axis=0), 0, atol=1e-10), "Per-feature mean should be ~0 after z-score"
assert np.allclose(X_normalized.std(axis=0), 1, atol=1e-10), "Per-feature std should be ~1 after z-score"
print("✅ NumPy operations completed")
print(f"\nShape of X_train: {X_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print("\nOne-Hot Encoding example:")
print(one_hot_labels)


✅ NumPy operations completed

Shape of X_train: (80, 5)
Shape of X_test: (20, 5)

One-Hot Encoding example:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]
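
The pipeline above normalizes the full dataset before splitting it. A common variation, sketched here as an assumption rather than as part of the exercise, is to split first and compute the mean and standard deviation on the training rows only, then reuse them for the test rows. The seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) * 10 + 5

# Split first (the same 80/20 idea as train_test_split_numpy).
indices = rng.permutation(X.shape[0])
X_test, X_train = X[indices[:20]], X[indices[20:]]

# Statistics come from the training rows only ...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# ... and are reused for both sets, so the test rows do not influence them.
X_train_scaled = (X_train - mean) / (std + 1e-8)
X_test_scaled = (X_test - mean) / (std + 1e-8)

print(np.allclose(X_train_scaled.mean(axis=0), 0))   # True
print(X_test_scaled.mean(axis=0).round(2))            # near 0, but not exactly 0
```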
