# 🧠 Face Recognition - Transfer Learning (MobileNetV2)
## For the Final Project: Attendance System

---

### 📌 Platform: Kaggle Notebook

**How to set up on Kaggle:**
1. Open [kaggle.com](https://www.kaggle.com/) → **New Notebook**
2. Upload this notebook or copy-paste its cells
3. **Upload the dataset** as a Kaggle Dataset:
   - Click **"Add Data"** (right sidebar) → **"Upload"** → **"New Dataset"**
   - Upload the dataset folder/zip with this structure:
     ```
     dataset/
     ├── Queensya/
     │   ├── photo_001.jpg
     │   ├── photo_002.jpg
     │   └── ... (20-50 photos)
     ├── Danisw/
     │   └── ...
     └── Person3/
         └── ...
     ```
   - Give the dataset a name, e.g. `face-dataset`
4. **Settings** → enable a **GPU** (Accelerator: GPU T4 x2 or P100)
5. Run all cells from top to bottom!
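Before uploading, it can be worth checking the folder layout locally. Below is a minimal sketch that is not part of the original workflow: `check_dataset` is a hypothetical helper, the 20-photo minimum simply mirrors the "20-50 photos" suggested above, and only `.jpg`/`.jpeg`/`.png` files are counted (matched case-insensitively).

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def check_dataset(root, min_photos=20):
    """Count photos per person folder and list folders below `min_photos`.

    Hypothetical helper: extensions are matched case-insensitively,
    so files such as PHOTO_001.JPG are counted too.
    """
    counts = {}
    for person_dir in sorted(Path(root).iterdir()):
        if person_dir.is_dir():
            counts[person_dir.name] = sum(
                1 for f in person_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
    too_few = [name for name, n in counts.items() if n < min_photos]
    return counts, too_few
```

For example, `check_dataset('dataset')` returns a dict of photo counts per person plus the names of any folders below the minimum; loose files directly under `dataset/` are ignored.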

### 📁 Paths on Kaggle:
| Location | Path |
|----------|------|
| Input (dataset) | `/kaggle/input/face-dataset/dataset/` |
| Output (model) | `/kaggle/working/` |

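### 🎯 Output:
- `keras_model.h5` - Trained model (same file format as a Teachable Machine Keras export)
- `labels.txt` - List of person names
- Copy both files to `minggu-8-final-project/project/models/`
- Note: this model is trained on pixels scaled with `x / 255` (range [0, 1]). Teachable Machine's sample Keras code normalizes with `x / 127.5 - 1` instead, so if the app reuses that code, change the normalization to `x / 255` to match training.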

---
## 📦 Step 1: Set Up the Environment

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

import numpy as np
import matplotlib.pyplot as plt
import os
from pathlib import Path
from datetime import datetime

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")
print(f"Platform: Kaggle")

---
## 📁 Step 2: Find & Load the Dataset

Kaggle stores the datasets you upload under `/kaggle/input/`.

**Make sure you have:**
1. Clicked **"Add Data"** in the right sidebar
2. Uploaded a dataset with one folder per person

The cell below finds the dataset folder automatically.

In [None]:
# ============================================
# AUTO-DETECT DATASET PATH
# ============================================

KAGGLE_INPUT = Path('/kaggle/input')
DATASET_PATH = None
IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}

def list_images(folder):
    """Image files directly inside `folder`; the extension check is case-insensitive (.JPG counts)."""
    return [f for f in folder.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS]

print("🔍 Searching for the dataset in /kaggle/input/...\n")

# Look for a folder whose subfolders (one per person) contain images
for dataset_dir in sorted(KAGGLE_INPUT.iterdir()):
    if not dataset_dir.is_dir():
        continue
    
    # Check every nested subfolder
    for sub in sorted(dataset_dir.rglob('*')):
        if sub.is_dir() and list_images(sub):
            # The parent of a person folder is the dataset root
            candidate = sub.parent
            # Require at least 2 subfolders (at least 2 people)
            person_folders = [d for d in candidate.iterdir() if d.is_dir()]
            if len(person_folders) >= 2:
                DATASET_PATH = str(candidate)
                break
    if DATASET_PATH:
        break

if DATASET_PATH is None:
    print("❌ Dataset not found!")
    print("\n💡 Make sure that:")
    print("   1. You clicked 'Add Data' in the right sidebar")
    print("   2. The dataset has one folder per person")
    print("   3. Each folder contains .jpg/.png photos")
    print("\n📁 Contents of /kaggle/input/:")
    for p in KAGGLE_INPUT.rglob('*'):
        level = len(p.relative_to(KAGGLE_INPUT).parts)
        if level <= 3:
            indent = '  ' * level
            print(f"{indent}{p.name}{'/' if p.is_dir() else ''}")
else:
    print(f"✅ Dataset found: {DATASET_PATH}")
    print(f"\n📂 People found:")
    total_images = 0
    for person_dir in sorted(Path(DATASET_PATH).iterdir()):
        if person_dir.is_dir():
            imgs = list_images(person_dir)
            total_images += len(imgs)
            print(f"   👤 {person_dir.name}: {len(imgs)} photos")
    print(f"\n📊 Total: {total_images} photos")

---
## ⚙️ Step 3: Configuration

Adjust the parameters below if needed.

In [None]:
# ============================================
# CONFIGURATION - adjust if needed
# ============================================

IMG_SIZE = 224              # Input size (same as Teachable Machine)
BATCH_SIZE = 32             # Batch size
EPOCHS = 50                 # Max epochs (early stopping usually stops sooner)
VALIDATION_SPLIT = 0.2      # 20% of each person's photos for validation
LEARNING_RATE = 0.0001      # Initial learning rate

# Output path (Kaggle working directory)
OUTPUT_DIR = '/kaggle/working'

print("✅ Configuration:")
print(f"   Image size    : {IMG_SIZE}x{IMG_SIZE}")
print(f"   Batch size    : {BATCH_SIZE}")
print(f"   Max epochs    : {EPOCHS}")
print(f"   Val split     : {VALIDATION_SPLIT}")
print(f"   Learning rate : {LEARNING_RATE}")
print(f"   Output dir    : {OUTPUT_DIR}")
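To see what `VALIDATION_SPLIT` and `BATCH_SIZE` mean for a given dataset, the arithmetic can be sketched in plain Python. This assumes the per-class slicing used by the legacy Keras `DirectoryIterator` (validation takes `int(split * n)` files of each class folder, training takes the rest) and that one epoch runs `ceil(samples / batch_size)` batches; `split_and_steps` and the folder sizes are hypothetical.

```python
import math


def split_and_steps(photos_per_person, validation_split=0.2, batch_size=32):
    """Estimate (train_images, val_images, train_steps, val_steps) per epoch.

    Hypothetical helper. Assumes each person folder is split on its own:
    int(validation_split * n) files go to validation, the rest to training.
    """
    n_val = sum(int(validation_split * n) for n in photos_per_person)
    n_train = sum(photos_per_person) - n_val
    return (n_train, n_val,
            math.ceil(n_train / batch_size), math.ceil(n_val / batch_size))


# Hypothetical folder sizes: three people with 25, 30 and 40 photos
print(split_and_steps([25, 30, 40]))  # (76, 19, 3, 1)
print(split_and_steps([4]))           # (4, 0, 1, 0)
```

Under these assumptions, a person with fewer than 5 photos contributes no validation images at all, so that person would be missing from the validation metrics.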

---
## 📊 Step 4: Load & Prepare Data

In [None]:
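# Data augmentation for the training data
# (rescale=1/255 maps pixels to [0, 1]; inference code must use the same scaling)
train_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=VALIDATION_SPLIT,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    zoom_range=0.15,
    brightness_range=[0.8, 1.2],
    fill_mode='nearest'
)

# Validation: rescale only, no augmentation.
# Same directory and same validation_split as train_datagen, so both generators
# split each class folder identically and the two subsets do not overlap.
val_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=VALIDATION_SPLIT
)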

# Load training data
train_generator = train_datagen.flow_from_directory(
    DATASET_PATH,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='training',
    shuffle=True
)

# Load validation data
val_generator = val_datagen.flow_from_directory(
    DATASET_PATH,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='validation',
    shuffle=False
)

# Get class names
class_names = list(train_generator.class_indices.keys())
num_classes = len(class_names)

print(f"\n✅ Data loaded!")
print(f"   Training   : {train_generator.samples} samples")
print(f"   Validation : {val_generator.samples} samples")
print(f"   Classes ({num_classes}): {class_names}")

In [None]:
# Visualize sample images with augmentation
plt.figure(figsize=(12, 12))
images, labels = next(train_generator)

for i in range(min(9, len(images))):
    plt.subplot(3, 3, i + 1)
    plt.imshow(images[i])
    label_idx = np.argmax(labels[i])
    plt.title(f"{class_names[label_idx]}", fontsize=14)
    plt.axis('off')

plt.suptitle('📷 Sample Training Images (with Augmentation)', fontsize=16)
plt.tight_layout()
plt.show()

---
## 🏗️ Step 5: Build Model (MobileNetV2 Transfer Learning)

In [None]:
# Load pretrained MobileNetV2 (without its top classification layer)
base_model = MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights='imagenet'
)

# Freeze the base model (don't train it yet)
base_model.trainable = False

# Build classification head
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

# Compile
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("✅ Model built!")
print(f"   Base: MobileNetV2 (ImageNet pretrained)")
print(f"   Output classes: {num_classes}")
print(f"   Total params: {model.count_params():,}")
model.summary()
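As a sanity check on `model.summary()`, the trainable parameters of the classification head can be counted by hand: a `Dense(units)` layer on `d` inputs has `d * units + units` weights, while `GlobalAveragePooling2D` and `Dropout` have none. The sketch assumes MobileNetV2's default width (`alpha=1.0`), whose final feature map has 1280 channels; the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def head_params(num_classes, features=1280):
    """Trainable params of GAP -> Dropout -> Dense(256) -> Dropout -> Dense(128) -> Dense(num_classes)."""
    return (dense_params(features, 256)
            + dense_params(256, 128)
            + dense_params(128, num_classes))


print(head_params(3))  # 361219
```

With the base frozen in Phase 1, this head is the only trainable part, so for 3 people `model.summary()` should report the same figure under trainable params.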

---
## 🏋️ Step 6: Training (Phase 1 - Train Head Only)

In [None]:
# Callbacks
callbacks = [
    EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True,
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=1e-7,
        verbose=1
    ),
    ModelCheckpoint(
        os.path.join(OUTPUT_DIR, 'best_model_phase1.h5'),
        monitor='val_accuracy',
        save_best_only=True,
        verbose=1
    )
]

print("🏋️ Phase 1: Training classification head...\n")

history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=EPOCHS,
    callbacks=callbacks,
    verbose=1
)

print("\n✅ Phase 1 complete!")
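Phase 1 pairs `EarlyStopping(patience=10)` with `ReduceLROnPlateau(patience=5, factor=0.5)`, both watching `val_loss`. Their interplay can be sketched as a simplified simulation. It is not Keras' implementation: it ignores `min_delta`, `cooldown` and checkpointing, and `simulate_callbacks` plus the loss values are hypothetical.

```python
def simulate_callbacks(val_losses, lr0=1e-4, lr_patience=5, factor=0.5,
                       min_lr=1e-7, es_patience=10):
    """Simplified model of ReduceLROnPlateau + EarlyStopping on val_loss.

    Returns (last_epoch_run, lr_after_each_epoch). Epochs are 1-based.
    """
    best = float("inf")
    lr = lr0
    lr_wait = es_wait = 0
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:                  # improvement resets both counters
            best = loss
            lr_wait = es_wait = 0
        else:
            lr_wait += 1
            es_wait += 1
        if lr_wait >= lr_patience:       # plateau: shrink LR, floored at min_lr
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        lrs.append(lr)
        if es_wait >= es_patience:       # early stopping triggers
            return epoch, lrs
    return len(val_losses), lrs


# Hypothetical run: improves for 2 epochs, then plateaus
stop, lrs = simulate_callbacks([1.0, 0.8] + [0.9] * 10)
print(stop, lrs[6], lrs[-1])  # 12 5e-05 2.5e-05
```

In this simplified model, a run that stops improving gets its learning rate halved after 5 stagnant epochs; if that does not help, training stops after 10, and `restore_best_weights=True` then rolls back to the best epoch.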

---
## 🔥 Step 7: Fine-tuning (Phase 2 - Unfreeze Top Layers)

In [None]:
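# Unfreeze the last 30 layers of MobileNetV2
base_model.trainable = True
for layer in base_model.layers[:-30]:
    layer.trainable = False

# Keep BatchNormalization layers frozen: a non-trainable BN layer runs in
# inference mode in TF2 Keras, so its ImageNet statistics are not
# overwritten by small batches from this dataset
for layer in base_model.layers[-30:]:
    if isinstance(layer, layers.BatchNormalization):
        layer.trainable = False

# Recompile with a 10x smaller learning rate (needed for `trainable` changes to take effect)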
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE / 10),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

callbacks_ft = [
    EarlyStopping(
        monitor='val_loss',
        patience=8,
        restore_best_weights=True,
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=4,
        min_lr=1e-8,
        verbose=1
    ),
    ModelCheckpoint(
        os.path.join(OUTPUT_DIR, 'best_model_phase2.h5'),
        monitor='val_accuracy',
        save_best_only=True,
        verbose=1
    )
]

print("🔥 Phase 2: Fine-tuning...\n")

history_fine = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,
    callbacks=callbacks_ft,
    verbose=1
)

print("\n✅ Fine-tuning complete!")

---
## 📈 Step 8: Evaluate & Visualize

In [None]:
# Plot training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
axes[0].plot(history.history['accuracy'], label='Phase 1 Train', color='blue')
axes[0].plot(history.history['val_accuracy'], label='Phase 1 Val', color='blue', linestyle='--')
if history_fine:
    offset = len(history.history['accuracy'])
    epochs_ft = range(offset, offset + len(history_fine.history['accuracy']))
    axes[0].plot(epochs_ft, history_fine.history['accuracy'], label='Phase 2 Train', color='red')
    axes[0].plot(epochs_ft, history_fine.history['val_accuracy'], label='Phase 2 Val', color='red', linestyle='--')
axes[0].set_title('Accuracy', fontsize=14)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss
axes[1].plot(history.history['loss'], label='Phase 1 Train', color='blue')
axes[1].plot(history.history['val_loss'], label='Phase 1 Val', color='blue', linestyle='--')
if history_fine:
    axes[1].plot(epochs_ft, history_fine.history['loss'], label='Phase 2 Train', color='red')
    axes[1].plot(epochs_ft, history_fine.history['val_loss'], label='Phase 2 Val', color='red', linestyle='--')
axes[1].set_title('Loss', fontsize=14)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.suptitle('📈 Training History', fontsize=16)
plt.tight_layout()
plt.show()

# Final score
val_loss, val_accuracy = model.evaluate(val_generator)
print(f"\n📊 Final Validation Accuracy: {val_accuracy:.2%}")
print(f"📊 Final Validation Loss: {val_loss:.4f}")
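The plotting code above places Phase 2 after Phase 1 by shifting its x-axis by the number of Phase 1 epochs. The same idea as a small standalone helper, which is hypothetical and works on the plain `History.history` dicts (each maps a metric name to a list with one value per epoch):

```python
def combine_histories(phase1, phase2, key):
    """Join one metric from two Keras History.history dicts into a single curve.

    Returns (values, boundary): `values` covers both phases in order, and
    `boundary` is the index of the first Phase 2 epoch, i.e. the x-offset
    used for the Phase 2 lines in the plot.
    """
    first = list(phase1[key])
    second = list(phase2.get(key, []))
    return first + second, len(first)
```

For example, `vals, b = combine_histories(history.history, history_fine.history, 'val_loss')` gives one continuous validation-loss curve, and `plt.axvline(b, linestyle=':')` would mark where fine-tuning starts.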

In [None]:
# Confusion Matrix & Classification Report
from sklearn.metrics import confusion_matrix, classification_report

val_generator.reset()
predictions = model.predict(val_generator)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = val_generator.classes

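# Confusion matrix (labels=... keeps it num_classes x num_classes even if
# some person has no images in the validation split)
cm = confusion_matrix(true_classes, predicted_classes, labels=np.arange(num_classes))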

plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap='Blues')
plt.title('Confusion Matrix', fontsize=14)
plt.colorbar()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names, rotation=45, ha='right')
plt.yticks(tick_marks, class_names)

# Annotate cells
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, str(cm[i, j]), ha='center', va='center',
                 color='white' if cm[i, j] > cm.max()/2 else 'black', fontsize=14)

plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()

print("\n📋 Classification Report:")
print(classification_report(true_classes, predicted_classes,
                            labels=np.arange(num_classes),
                            target_names=class_names, zero_division=0))
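The precision and recall in the classification report come straight from the confusion matrix. A plain-Python sketch shows the arithmetic, with rows as true labels and columns as predictions (the `sklearn.metrics.confusion_matrix` convention); `per_class_metrics` and the 3-person matrix are made up for illustration.

```python
def per_class_metrics(cm):
    """Precision and recall per class from a square confusion matrix.

    Rows are true classes, columns are predicted classes. A class with no
    predictions (or no true samples) gets 0.0 instead of dividing by zero.
    """
    n = len(cm)
    metrics = []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics.append((precision, recall))
    return metrics


# Made-up matrix for three people
cm = [[5, 1, 0],
      [0, 6, 0],
      [1, 0, 7]]
print([(round(p, 3), round(r, 3)) for p, r in per_class_metrics(cm)])
# [(0.833, 0.833), (0.857, 1.0), (1.0, 0.875)]
```

Low recall for a person means their photos are often predicted as someone else; low precision means other people are often predicted as them.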

---
## 💾 Step 9: Export Model (Teachable Machine Format)

Output is saved to `/kaggle/working/` → it appears automatically in Kaggle's **Output** tab.

In [None]:
# ============================================
# SAVE to /kaggle/working/ (downloadable from the Output tab)
# ============================================

# Save model
model_path = os.path.join(OUTPUT_DIR, 'keras_model.h5')
model.save(model_path)
print(f"✅ Model saved: {model_path}")

# Save labels.txt (Teachable Machine format: "<index> <name>" per line)
labels_path = os.path.join(OUTPUT_DIR, 'labels.txt')
with open(labels_path, 'w', encoding='utf-8') as f:
    for idx, name in enumerate(class_names):
        f.write(f"{idx} {name}\n")
print(f"✅ Labels saved: {labels_path}")

# Show labels
print(f"\n📋 labels.txt:")
with open(labels_path, 'r') as f:
    print(f.read())

# File sizes
model_size = os.path.getsize(model_path) / (1024 * 1024)
print(f"📦 Model size: {model_size:.1f} MB")
print(f"📊 Accuracy: {val_accuracy:.2%}")
print(f"\n" + "="*50)
print(f"📥 DOWNLOAD:")
print(f"   Open the 'Output' tab in Kaggle's right sidebar")
print(f"   Download keras_model.h5 and labels.txt")
print(f"   Copy them to: minggu-8-final-project/project/models/")
print(f"="*50)
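On the app side, `labels.txt` has to be read back in the same index order the model was trained with. A minimal sketch, assuming the file content is read as text (e.g. with `open(..., encoding='utf-8').read()`); `parse_labels` is a hypothetical helper, not something the project is known to contain, and it assumes the indices run 0, 1, 2, ... without gaps.

```python
def parse_labels(text):
    """Turn labels.txt content ("<index> <name>" per line) into a list of names indexed by class id.

    split(maxsplit=1) keeps names that contain spaces intact.
    """
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        idx, name = line.split(maxsplit=1)
        names[int(idx)] = name
    return [names[i] for i in range(len(names))]


print(parse_labels("0 Danisw\n1 Person3\n2 Queensya\n"))
# ['Danisw', 'Person3', 'Queensya']
```

The indices follow the alphabetical order in which `flow_from_directory` assigns class ids to the person folders, so `names[np.argmax(preds)]` maps a prediction back to a person.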

---
## 🧪 Step 10: Quick Test

In [None]:
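# Test predictions on a few photos per person.
# These may include images used for training, so treat this as a sanity check, not a held-out score.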
from PIL import Image

print("🧪 Testing model...\n")

correct = 0
total = 0

for class_folder in sorted(os.listdir(DATASET_PATH)):
    class_path = os.path.join(DATASET_PATH, class_folder)
    if not os.path.isdir(class_path):
        continue
    
    images = sorted(f for f in os.listdir(class_path) if f.lower().endswith(('.jpg', '.jpeg', '.png')))
    test_images = images[:3]  # Test 3 photos per person
    
    for img_name in test_images:
        img_path = os.path.join(class_path, img_name)
        # convert('RGB') handles grayscale/RGBA files; same [0, 1] scaling as training
        img = Image.open(img_path).convert('RGB').resize((IMG_SIZE, IMG_SIZE))
        img_array = np.array(img) / 255.0
        img_array = np.expand_dims(img_array, axis=0)
        
        preds = model.predict(img_array, verbose=0)
        pred_idx = np.argmax(preds[0])
        pred_name = class_names[pred_idx]
        confidence = preds[0][pred_idx]
        
        is_correct = pred_name == class_folder
        status = "✅" if is_correct else "❌"
        if is_correct:
            correct += 1
        total += 1
        
        print(f"{status} True: {class_folder:15} → Predicted: {pred_name:15} ({confidence:.1%})")

print(f"\n📊 Test Accuracy: {correct}/{total} = {correct/total:.1%}")
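The app needs the same preprocessing as this test and as training: RGB channels, pixels divided by 255, and a batch dimension. A minimal sketch, assuming the frame has already been resized to `IMG_SIZE x IMG_SIZE` and is available as a NumPy array; both helper names are hypothetical. If frames come from OpenCV (`cv2`), they arrive in BGR order and need `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` first.

```python
import numpy as np


def to_model_input(image):
    """Convert a resized HxW or HxWxC uint8 image into a (1, H, W, 3) float32 batch in [0, 1].

    Hypothetical helper mirroring rescale=1./255 from training.
    """
    arr = np.asarray(image)
    if arr.ndim == 2:              # grayscale -> repeat into 3 channels
        arr = np.stack([arr] * 3, axis=-1)
    elif arr.shape[-1] == 4:       # RGBA -> drop the alpha channel
        arr = arr[..., :3]
    arr = arr.astype(np.float32) / 255.0
    return np.expand_dims(arr, axis=0)


def top_prediction(probs, class_names):
    """Return (name, confidence) for the highest-probability class of one image."""
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])
```

In the app this would look like `name, conf = top_prediction(model.predict(to_model_input(frame), verbose=0)[0], class_names)`.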

---

## ✅ Done!

### How to download from Kaggle:
1. In the right sidebar, open the **"Output"** tab
2. Download `keras_model.h5` and `labels.txt`
3. Copy them into the project folder:
   ```
   minggu-8-final-project/project/models/
   ├── keras_model.h5    ← copy here
   └── labels.txt        ← copy here
   ```
4. Run the application:
   ```bash
   cd minggu-8-final-project/project
   python main_app.py
   ```

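### Tips:
- If accuracy is low, add more photos per person
- Make sure the photos vary (angle, expression, lighting)
- The model uses the same **file format** as a Teachable Machine Keras export (`keras_model.h5` + `labels.txt`)
- Files in `/kaggle/working/` are kept after the session ends only if you save a notebook version (**Save Version**); otherwise, download them before the session stops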