# 💨 Smart Cage - MQ2 Gas Sensor Dataset Collection

**Samsung Innovation Campus - Phase 3**

This notebook collects data from an MQ2 gas sensor over MQTT for training an ML model.

**Auto-labeling:**
- **Aman** (safe): gas_duration < 2000 ms
- **Waspada** (caution): 2000 ms <= gas_duration < 4000 ms
- **Bahaya** (danger): gas_duration >= 4000 ms

## 📦 Step 1: Install Dependencies

In [None]:
!pip install paho-mqtt pandas matplotlib

## 📚 Step 2: Import Libraries

In [None]:
import json
import pandas as pd
import paho.mqtt.client as mqtt
from datetime import datetime
import time

print("✅ Libraries imported!")

## ⚙️ Step 3: MQTT Configuration

In [None]:
# MQTT Config
MQTT_BROKER = "broker.hivemq.com"
MQTT_PORT = 1883
TOPIC_GAS_DATA = "final-project/Mahasiswa-Berpola-Pikir/smartcage/gas/data"

# Collection settings
target_samples = 150  # Minimum number of samples for training
data_collected = []

print(f"📡 Broker: {MQTT_BROKER}")
print(f"📍 Topic: {TOPIC_GAS_DATA}")
print(f"🎯 Target: {target_samples} samples")

## 🎯 Step 4: Auto-Labeling Function

In [None]:
def auto_label(duration_ms):
    """
    Auto-label based on how long gas has been detected:
    - Aman: < 2000ms
    - Waspada: 2000ms to < 4000ms
    - Bahaya: >= 4000ms
    """
    if duration_ms < 2000:
        return "Aman"
    elif duration_ms < 4000:
        return "Waspada"
    else:
        return "Bahaya"

# Test labeling
print("Test auto-labeling:")
print(f"  0ms    → {auto_label(0)}")
print(f"  1500ms → {auto_label(1500)}")
print(f"  2500ms → {auto_label(2500)}")
print(f"  5000ms → {auto_label(5000)}")
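
The threshold rules above can also be written in a table-driven form, which keeps the cut-offs in one place if they need tuning later. The sketch below is an alternative, not the notebook's own method; `THRESHOLDS_MS`, `LABELS`, and `label_by_table` are hypothetical names introduced for illustration.

```python
from bisect import bisect_right

# Assumed to mirror auto_label: cut-offs in milliseconds, lower bound inclusive.
THRESHOLDS_MS = [2000, 4000]
LABELS = ["Aman", "Waspada", "Bahaya"]

def label_by_table(duration_ms):
    """Return the label whose half-open interval contains duration_ms."""
    # bisect_right counts how many thresholds are <= duration_ms,
    # so 2000 maps to index 1 ("Waspada") and 4000 to index 2 ("Bahaya").
    return LABELS[bisect_right(THRESHOLDS_MS, duration_ms)]

# Exercise the boundaries, which the quick test above does not cover.
for ms in (0, 1999, 2000, 3999, 4000):
    print(f"{ms:>5} ms -> {label_by_table(ms)}")
```

Checking the exact boundary values (1999, 2000, 3999, 4000) is a cheap way to confirm that the labels agree with the rules listed at the top of the notebook.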

## 📡 Step 5: MQTT Callbacks

In [None]:
def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("✅ Connected to MQTT broker")
        client.subscribe(TOPIC_GAS_DATA)
        print(f"📍 Subscribed to: {TOPIC_GAS_DATA}")
    else:
        print(f"❌ Connection failed with code {rc}")

def on_message(client, userdata, msg):
    global data_collected
    try:
        payload = msg.payload.decode('utf-8')
        data = json.loads(payload)
        
        # Extract data
        gas_detected = data.get('gas_detected', False)
        duration_ms = data.get('duration_ms', 0)
        
        # Auto-label
        label = auto_label(duration_ms)
        
        # Create entry
        entry = {
            'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'gas_detected': gas_detected,
            'duration_ms': duration_ms,
            'label': label
        }
        data_collected.append(entry)
        
        # Progress
        count = len(data_collected)
        print(f"[{count}/{target_samples}] Gas={gas_detected}, Duration={duration_ms}ms → {label}")
        
        # Auto-stop when target reached
        if count >= target_samples:
            print(f"\n✅ Target of {target_samples} samples reached!")
            client.disconnect()
    except Exception as e:
        print(f"❌ Error: {e}")

print("✅ Callbacks defined!")
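
`on_message` assumes each payload is a JSON object with `gas_detected` and `duration_ms` keys. That shape is inferred from the callback, not from the ESP32 firmware, so the sketch below treats it as an assumption and shows one way a payload could be validated before it is labeled. `parse_gas_payload` is a hypothetical helper, not part of paho-mqtt.

```python
import json

def parse_gas_payload(raw):
    """Decode an MQTT payload (bytes) into (gas_detected, duration_ms).

    Returns None when the payload is not a JSON object or the duration
    is not a non-negative number, so the caller can skip the sample.
    """
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    duration_ms = data.get("duration_ms", 0)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(duration_ms, bool) or not isinstance(duration_ms, (int, float)):
        return None
    if duration_ms < 0:
        return None
    return bool(data.get("gas_detected", False)), duration_ms

# Try a well-formed payload and two malformed ones without needing a broker.
print(parse_gas_payload(b'{"gas_detected": true, "duration_ms": 2500}'))
print(parse_gas_payload(b'{"duration_ms": "oops"}'))
print(parse_gas_payload(b'not json'))
```

Returning `None` for bad input, rather than raising, would let the callback skip a malformed message without adding a mislabeled row to the dataset.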

## 🚀 Step 6: Start Collection

**IMPORTANT:**
1. Make sure the ESP32 is running and publishing to MQTT.
2. For a balanced training set, expose the MQ2 to gas (lighter gas, etc.) to obtain "Waspada" and "Bahaya" labels.
3. Collection stops automatically once the target number of samples is reached.

In [None]:
# Reset data
data_collected = []

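# Create client
# paho-mqtt 2.x requires a callback API version; VERSION1 keeps the
# (client, userdata, flags, rc) on_connect signature used above.
client_id = f"MQ2Collector_{int(time.time())}"
if hasattr(mqtt, "CallbackAPIVersion"):
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, client_id=client_id)
else:
    client = mqtt.Client(client_id=client_id)
client.on_connect = on_connect
client.on_message = on_message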

# Connect
print(f"🔌 Connecting to {MQTT_BROKER}...")
client.connect(MQTT_BROKER, MQTT_PORT, 60)

# Run loop until disconnect
print("📡 Collecting data... (expose MQ2 to gas for Waspada/Bahaya labels)")
client.loop_forever()
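
`loop_forever()` blocks until `client.disconnect()` is called, so if the ESP32 never publishes, the cell above never returns. One option is to run the network loop in the background and poll with a deadline. The sketch below shows only the polling part; `wait_until` is a hypothetical helper, and the timeout values are assumptions.

```python
import time

def wait_until(predicate, timeout_s, poll_s=0.5):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate was satisfied, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    # One last check so a result arriving right at the deadline still counts.
    return predicate()

# Stand-in for data_collected so the sketch runs without a broker.
samples = [1, 2, 3]
print(wait_until(lambda: len(samples) >= 3, timeout_s=1.0, poll_s=0.1))
print(wait_until(lambda: len(samples) >= 10, timeout_s=0.3, poll_s=0.1))
```

In the notebook, this could be used by calling `client.loop_start()` instead of `client.loop_forever()`, then `wait_until(lambda: len(data_collected) >= target_samples, timeout_s=600)`, and finally `client.loop_stop()`. `loop_start()` and `loop_stop()` are paho-mqtt's background-thread variants of the network loop; the 600-second limit is only an example.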

## 💾 Step 7: Save to CSV

In [None]:
if len(data_collected) > 0:
    df = pd.DataFrame(data_collected)
    
    # Save CSV
    filename = f"mq2_gas_dataset_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
    df.to_csv(filename, index=False)
    print(f"✅ Dataset saved: {filename}")
    print(f"📊 Total samples: {len(df)}")
    
    # Label distribution
    print("\n📈 Label Distribution:")
    print(df['label'].value_counts())
    
    # Preview
    print("\n📋 Preview:")
    display(df.head(10))
    
    # Statistics
    print("\n📊 Statistics:")
    display(df.describe())
else:
    print("⚠️ No data collected!")
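
Step 6 asks for a balanced training set. The label counts printed above show the raw distribution; a small check like the one below could flag which labels are still underrepresented. `underrepresented_labels` and the 20% minimum share are assumptions chosen for illustration, not values from the project.

```python
from collections import Counter

EXPECTED_LABELS = ("Aman", "Waspada", "Bahaya")

def underrepresented_labels(labels, min_share=0.2):
    """Return the expected labels whose share of the samples is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # With no samples at all, every label is missing.
        return list(EXPECTED_LABELS)
    return [name for name in EXPECTED_LABELS if counts[name] / total < min_share]

# Synthetic labels standing in for [entry["label"] for entry in data_collected].
demo = ["Aman"] * 8 + ["Waspada"] * 1 + ["Bahaya"] * 1
print(underrepresented_labels(demo))
```

If the returned list is not empty, another collection run with the sensor exposed to gas for longer would be one way to fill in the missing classes.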

## 📊 Step 8: Visualize Distribution

In [None]:
import matplotlib.pyplot as plt

if len(data_collected) > 0:
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # Duration histogram
    axes[0].hist(df['duration_ms'], bins=20, edgecolor='black')
    axes[0].set_xlabel('Duration (ms)')
    axes[0].set_ylabel('Count')
    axes[0].set_title('Gas Duration Distribution')
    axes[0].axvline(x=2000, color='orange', linestyle='--', label='Waspada threshold')
    axes[0].axvline(x=4000, color='red', linestyle='--', label='Bahaya threshold')
    axes[0].legend()
    
    # Label pie chart
    label_counts = df['label'].value_counts()
    colors = {'Aman': 'green', 'Waspada': 'orange', 'Bahaya': 'red'}
    pie_colors = [colors.get(label, 'gray') for label in label_counts.index]
    axes[1].pie(label_counts, labels=label_counts.index, autopct='%1.1f%%', 
                colors=pie_colors, startangle=90)
    axes[1].set_title('Label Distribution')
    
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No data to visualize!")