# Applying CRISP-DM Methodology to the PuzzArm Project

**Assessment Portfolio - Part 2**

**Student Name:** [Your Name]  
**Date:** [Date]  
**Course:** AI Clustered Units (ICTAII501, ICTAII502)

---

## Document Purpose

This notebook applies the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to the PuzzArm project, structuring the AI/ML components across its six phases:

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

**Project Recap:** PuzzArm is an AI-powered robotic system that uses a Jetson Nano and xArm1S to solve a number puzzle (pieces 0-9), with dual-arm teleoperation for data collection.

**Iteration Note:** CRISP-DM is cyclical—after Deployment, loop back to Business Understanding for refinements.

## Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Display settings
pd.set_option('display.max_columns', None)

print("Libraries imported successfully!")

---

# Phase 1: Business Understanding

**Objective:** Define the problem, goals, and success criteria in business terms.

**Mapping to Units:** ICTAII501 PC 1.1-1.2

## 1.1 Business Problem Definition

In [None]:
# Business goals configuration (times in hours)
business_goals = {
    'automation_target': 0.80,             # target automation rate, as a fraction
    'time_saving_hours': 2,                # not used in the calculation below
    'target_sessions_per_month': 10,
    'current_manual_time_per_session': 3,
    'target_automated_time': 1.5
}

# Monthly savings = (manual time - automated time) per session x sessions per month
time_saved_per_session = (business_goals['current_manual_time_per_session']
                          - business_goals['target_automated_time'])
time_saved_per_month = time_saved_per_session * business_goals['target_sessions_per_month']

print(f"Projected monthly time savings: {time_saved_per_month:.1f} hours")
print(f"Automation target: {business_goals['automation_target']:.0%}")

## 1.2 Data Mining Goals

In [None]:
# Define success metrics
success_criteria = pd.DataFrame([
    {'Metric': 'Detection Accuracy', 'Target': '90%', 'Priority': 'High'},
    {'Metric': 'Pose Estimation Range', 'Target': '360°', 'Priority': 'High'},
    {'Metric': 'Pick-Place Success', 'Target': '85%', 'Priority': 'Critical'},
    {'Metric': 'Inference Time', 'Target': '<100ms', 'Priority': 'Medium'},
    {'Metric': 'Total Solve Time', 'Target': '<5 min', 'Priority': 'High'}
])

print(success_criteria.to_string(index=False))
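
The targets in the table above are stored as display strings, so they cannot be compared with measurements directly. The following is a minimal sketch of one way to check measured values against numeric thresholds. The `Criterion` class, the `meets_target` helper, and the reading of "90%" as "at least 90%" are assumptions made for illustration, and the measured values are placeholders, not project results. Pose Estimation Range is left out because it describes coverage rather than a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical container for one numeric success criterion."""
    name: str
    threshold: float
    higher_is_better: bool


def meets_target(criterion: Criterion, measured: float) -> bool:
    """Return True when a measured value satisfies the criterion."""
    if criterion.higher_is_better:
        return measured >= criterion.threshold   # e.g. accuracy of at least 90%
    return measured < criterion.threshold        # e.g. inference time under 100 ms


# Thresholds copied from the success criteria table above.
CRITERIA = [
    Criterion('Detection Accuracy', 0.90, True),
    Criterion('Pick-Place Success', 0.85, True),
    Criterion('Inference Time (ms)', 100.0, False),
    Criterion('Total Solve Time (min)', 5.0, False),
]

# PLACEHOLDER values for illustration only; replace with your own measurements.
measured = {
    'Detection Accuracy': 0.92,
    'Pick-Place Success': 0.80,
    'Inference Time (ms)': 85.0,
    'Total Solve Time (min)': 5.0,
}

results = {c.name: meets_target(c, measured[c.name]) for c in CRITERIA}
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Keeping the comparison direction next to each threshold avoids re-parsing strings such as `'<100ms'` later, when the same criteria are revisited in the Evaluation phase.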

## ✏️ Student Input: Your Business Context

**Instructions:** Describe how your image identifier addresses a specific business need.

[Write your business understanding here]

---

# Phase 2: Data Understanding

**Objective:** Collect initial data, explore it, and identify quality issues.

**Mapping to Units:** ICTAII502 PC 1.1-1.6

In [None]:
# Example: Class distribution (replace with your actual data)
classes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
sample_counts = [520, 510, 515, 505, 512, 518, 508, 515, 510, 507]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.bar(classes, sample_counts, color='steelblue', alpha=0.7)
ax1.set_xlabel('Puzzle Piece Class')
ax1.set_ylabel('Number of Samples')
ax1.set_title('Class Distribution')
ax1.axhline(y=np.mean(sample_counts), color='r', linestyle='--', label=f'Mean: {np.mean(sample_counts):.0f}')
ax1.legend()

ax2.pie(sample_counts, labels=classes, autopct='%1.1f%%', startangle=90)
ax2.set_title('Class Distribution %')

plt.tight_layout()
plt.show()
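
The bar chart shows balance visually. A numeric check can back it up when identifying data quality issues. This sketch computes an imbalance ratio and flags classes that fall well below the mean count. The helper names and the 10% tolerance are assumptions chosen for illustration, and the counts are the example values from the cell above.

```python
from statistics import mean


def imbalance_ratio(counts):
    """Largest class size divided by the smallest (1.0 means perfectly balanced)."""
    return max(counts) / min(counts)


def underrepresented(classes, counts, tolerance=0.10):
    """Return classes whose count is more than `tolerance` below the mean count."""
    floor = mean(counts) * (1 - tolerance)
    return [label for label, n in zip(classes, counts) if n < floor]


# Example counts copied from the cell above; replace with your actual data.
classes = [str(d) for d in range(10)]
sample_counts = [520, 510, 515, 505, 512, 518, 508, 515, 510, 507]

ratio = imbalance_ratio(sample_counts)
flagged = underrepresented(classes, sample_counts)

print(f"Imbalance ratio: {ratio:.3f}")
print(f"Under-represented classes: {flagged or 'none'}")
```

A ratio close to 1.0 with no flagged classes suggests that class weighting or resampling is probably unnecessary for this example distribution.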

## ✏️ Student Input: Your Data Exploration

[Write your data exploration findings here]

In [None]:
# Your exploration code here

---

# Phase 3: Data Preparation

**Objective:** Clean, transform, and construct the final dataset.

**Mapping to Units:** ICTAII502 PC 2.1-2.4

In [None]:
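# Dataset split (70/20/10); the test set takes the remainder so the counts always sum to the total
total_samples = 5000  # example total; replace with your actual dataset size
train_count = round(total_samples * 0.70)
val_count = round(total_samples * 0.20)
test_count = total_samples - train_count - val_count

split_df = pd.DataFrame({
    'Set': ['Training', 'Validation', 'Test'],
    'Samples': [train_count, val_count, test_count]
})
# Derive percentages from the counts so the table cannot drift from the split
split_df['Percentage'] = (split_df['Samples'] / total_samples * 100).round(1)

print(split_df.to_string(index=False))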

plt.figure(figsize=(8, 8))
plt.pie([train_count, val_count, test_count], 
        labels=['Training', 'Validation', 'Test'],
        autopct='%1.1f%%',
        colors=['#3498db', '#e74c3c', '#2ecc71'],
        explode=(0.05, 0.05, 0.05))
plt.title('Dataset Split')
plt.show()
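
The counts above describe the size of each set but not how samples are assigned to them. One common approach is a stratified split, which keeps the ten digit classes in the same proportion across training, validation, and test sets. The sketch below uses only the standard library. The function name, the seed, and the assumption of 500 samples per class are illustrative. scikit-learn's `train_test_split` with its `stratify` argument is the usual library route.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.70, val_frac=0.20, seed=42):
    """Split sample indices per class so each set keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = round(len(indices) * val_frac)
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


# Assumed layout for illustration: 10 digit classes with 500 samples each.
labels = [digit for digit in range(10) for _ in range(500)]
train_idx, val_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class first and then concatenating guarantees that no sample index appears in more than one set.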

## ✏️ Student Input: Your Data Preparation

[Write your data preparation steps here]

---

# Phase 4: Modeling

**Objective:** Select and apply ML techniques, tuning parameters.

**Mapping to Units:** ICTAII502 PC 3.1-3.5

In [None]:
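# Simulated training history (synthetic curves for illustration, not real training results)
np.random.seed(42)  # fixed seed so the plots are reproducible between runs
n_epochs = 50
epochs = np.arange(1, n_epochs + 1)
train_loss = 2.5 * np.exp(-0.05 * epochs) + np.random.normal(0, 0.05, n_epochs)
val_loss = 2.8 * np.exp(-0.045 * epochs) + np.random.normal(0, 0.08, n_epochs)
# Clip to [0, 1] so added noise cannot push accuracy outside its valid range
train_acc = np.clip(1 - 0.9 * np.exp(-0.06 * epochs) + np.random.normal(0, 0.01, n_epochs), 0, 1)
val_acc = np.clip(1 - 0.9 * np.exp(-0.055 * epochs) + np.random.normal(0, 0.015, n_epochs), 0, 1)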

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(epochs, train_loss, label='Train Loss', linewidth=2)
ax1.plot(epochs, val_loss, label='Val Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Validation Loss')
ax1.legend()
ax1.grid(alpha=0.3)

ax2.plot(epochs, train_acc * 100, label='Train Acc', linewidth=2)
ax2.plot(epochs, val_acc * 100, label='Val Acc', linewidth=2)
ax2.axhline(y=90, color='r', linestyle='--', label='Target')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy (%)')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()
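
A validation loss curve like the one plotted above is often used to decide when to stop training. The sketch below shows a simple patience-based early stopping rule. The notebook does not state that early stopping is used, so the technique, the function name, and the `patience` and `min_delta` values are assumptions for illustration, and the loss values are a toy example.

```python
def best_epoch_with_patience(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs pass without the validation loss
    improving on the best value by more than `min_delta`.
    """
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch          # stop early
    return best_epoch, len(val_losses)        # ran through every epoch


# Toy validation losses for illustration only.
toy_val_loss = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
best, stopped = best_epoch_with_patience(toy_val_loss, patience=3)
print(f"Best epoch: {best}, stopped at epoch: {stopped}")
```

In practice the weights saved at the best epoch would be restored, since the later epochs did not improve validation loss.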

## ✏️ Student Input: Your Model Building

[Write your model building experience here]

---

# Phase 5: Evaluation

**Objective:** Assess model performance against business goals.

**Mapping to Units:** ICTAII502 PC 5.1-5.6; ICTAII501 PC 3

In [None]:
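# Confusion matrix on simulated predictions (synthetic labels, not real model output)
from sklearn.metrics import confusion_matrix

np.random.seed(42)
n_samples = 500
n_classes = 10
y_true = np.random.randint(0, n_classes, n_samples)
y_pred = y_true.copy()
# Corrupt 8% of predictions; a non-zero offset guarantees each one is actually wrong
error_indices = np.random.choice(n_samples, size=int(n_samples * 0.08), replace=False)
offsets = np.random.randint(1, n_classes, len(error_indices))
y_pred[error_indices] = (y_true[error_indices] + offsets) % n_classes

# Fix the label order so the matrix is always 10x10 and matches the tick labels
cm = confusion_matrix(y_true, y_pred, labels=np.arange(n_classes))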

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=classes, yticklabels=classes)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.tight_layout()
plt.show()

accuracy = np.trace(cm) / np.sum(cm)
print(f"Overall Accuracy: {accuracy*100:.2f}%")
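
Overall accuracy can hide a weak class, so per-class precision and recall are a useful follow-up. The sketch below derives both from a confusion matrix laid out as in the cell above (rows are true labels, columns are predicted labels). The function name and the 3x3 matrix are illustrative assumptions, not project results.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall) tuples, one per class.

    Expects rows = true labels and columns = predicted labels. Works with
    nested lists or a NumPy array.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        true_positive = cm[k][k]
        actual_total = sum(cm[k][j] for j in range(n))      # row sum
        predicted_total = sum(cm[i][k] for i in range(n))   # column sum
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        metrics.append((precision, recall))
    return metrics


# Toy 3-class matrix for illustration only.
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]

toy_metrics = per_class_metrics(toy_cm)
for label, (precision, recall) in enumerate(toy_metrics):
    print(f"Class {label}: precision={precision:.3f}, recall={recall:.3f}")
```

Passing the notebook's own `cm` to `per_class_metrics` gives one pair per puzzle piece class, which helps pinpoint the digits that are most often confused.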

## ✏️ Student Input: Your Evaluation Results

[Write your evaluation findings here]

---

# Phase 6: Deployment

**Objective:** Plan rollout, monitoring, and maintenance.

**Mapping to Units:** ICTAII501 PC 2; ICTAII502 PC 4.1-4.5

In [None]:
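# Example ROS 2 integration (printed as text; running it requires a ROS 2 environment with rclpy)
deployment_code = '''#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class PuzzleDetectorNode(Node):
    def __init__(self):
        super().__init__('puzzle_detector')
        # Subscribe to raw camera frames with a queue depth of 10
        self.subscription = self.create_subscription(
            Image, '/camera/image_raw',
            self.image_callback, 10)

    def image_callback(self, msg):
        # Run inference on the incoming frame (left as a stub)
        pass


def main():
    rclpy.init()
    node = PuzzleDetectorNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
'''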

print("Example ROS2 Node:")
print(deployment_code)
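
Phase 1 sets an inference time target of under 100 ms, so deployment monitoring needs a way to measure latency. The sketch below times repeated calls to an inference function with `time.perf_counter`. The `measure_latency_ms` helper and the `dummy_infer` stand-in are assumptions for illustration. In a real deployment the timed call would be the model invoked from `image_callback`.

```python
import statistics
import time


def measure_latency_ms(infer, sample, warmup=3, runs=20):
    """Return (mean_ms, worst_ms) for repeated calls to `infer(sample)`."""
    for _ in range(warmup):
        infer(sample)   # warm-up calls are excluded from the timings

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), max(timings)


def dummy_infer(image):
    """Stand-in for the real model call (hypothetical)."""
    return sum(image) % 10


TARGET_MS = 100.0   # inference time target from the success criteria
mean_ms, worst_ms = measure_latency_ms(dummy_infer, list(range(1000)))

print(f"Mean latency: {mean_ms:.3f} ms, worst: {worst_ms:.3f} ms")
print(f"Within target: {worst_ms < TARGET_MS}")
```

Reporting the worst case alongside the mean matters on an embedded board, where occasional slow frames can stall the pick-and-place loop even when the average looks fine.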

## ✏️ Student Input: Your Deployment Plan

[Write your deployment experience here]

---

# Overall Reflection

## CRISP-DM Cycle Visualization

In [None]:
# Visualize CRISP-DM cycle
fig, ax = plt.subplots(figsize=(10, 10))

phases_circle = ['Business\nUnderstanding', 'Data\nUnderstanding', 'Data\nPreparation', 
                 'Modeling', 'Evaluation', 'Deployment']
n_phases = len(phases_circle)
angles = np.linspace(0, 2*np.pi, n_phases, endpoint=False)

radius = 1
x = radius * np.cos(angles)
y = radius * np.sin(angles)

# Draw connections
for i in range(n_phases):
    next_i = (i + 1) % n_phases
    ax.annotate('', xy=(x[next_i], y[next_i]), xytext=(x[i], y[i]),
                arrowprops=dict(arrowstyle='->', lw=2, color='#3498db'))

# Draw center
ax.add_patch(plt.Circle((0, 0), 0.2, color='#e74c3c', alpha=0.7))
ax.text(0, 0, 'Data', ha='center', va='center', fontsize=12, fontweight='bold', color='white')

# Draw phase nodes
for i, (phase, angle, xi, yi) in enumerate(zip(phases_circle, angles, x, y)):
    ax.add_patch(plt.Circle((xi, yi), 0.15, color='#2ecc71', alpha=0.7))
    text_offset = 0.35
    text_x = (radius + text_offset) * np.cos(angle)
    text_y = (radius + text_offset) * np.sin(angle)
    ax.text(text_x, text_y, phase, ha='center', va='center', fontsize=10, fontweight='bold')

ax.set_xlim(-1.8, 1.8)
ax.set_ylim(-1.8, 1.8)
ax.set_aspect('equal')
ax.axis('off')
ax.set_title('CRISP-DM Iterative Cycle', fontsize=14, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

## ✏️ Student Reflection (200+ words)

**Instructions:** Reflect on how CRISP-DM guided your work. Address:
- How did CRISP-DM help structure your work?
- What challenges did you face in each phase?
- How did the iterative nature help?
- What would you do differently next time?

[Write your 200+ word reflection here]

**Word Count:** [Your count]

## Competency Mapping Summary

In [None]:
# Competency mapping
competency_mapping = pd.DataFrame([
    {'Unit': 'ICTAII501', 'Element': 'PC 1.1-1.2', 'Evidence': 'Phase 1', 'Status': '✓'},
    {'Unit': 'ICTAII501', 'Element': 'PC 2', 'Evidence': 'Phase 4 & 6', 'Status': '✓'},
    {'Unit': 'ICTAII501', 'Element': 'PC 3', 'Evidence': 'Phase 5', 'Status': '✓'},
    {'Unit': 'ICTAII502', 'Element': 'PC 1.1-1.6', 'Evidence': 'Phase 2', 'Status': '✓'},
    {'Unit': 'ICTAII502', 'Element': 'PC 2.1-2.4', 'Evidence': 'Phase 3', 'Status': '✓'},
    {'Unit': 'ICTAII502', 'Element': 'PC 3.1-3.5', 'Evidence': 'Phase 4', 'Status': '✓'},
    {'Unit': 'ICTAII502', 'Element': 'PC 4.1-4.5', 'Evidence': 'Phase 6', 'Status': '✓'},
    {'Unit': 'ICTAII502', 'Element': 'PC 5.1-5.6', 'Evidence': 'Phase 5', 'Status': '✓'}
])

print("Competency Mapping:")
print(competency_mapping.to_string(index=False))

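# Count only the rows marked complete so the percentage reflects the Status column
total = len(competency_mapping)
completed = int((competency_mapping['Status'] == '✓').sum())
print(f"\nCompletion: {completed}/{total} criteria addressed ({completed / total:.0%})")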

---

# Portfolio Declaration

I declare that:
- This portfolio reflects my own work on the PuzzArm project
- All code, analysis, and reflections are my original work
- I have properly cited any external sources
- The evidence demonstrates competency in ICTAII501 and ICTAII502

**Student Name:** [Your Name]  
**Student ID:** [Your ID]  
**Date:** [Date]  
**Signature:** [Your Signature]

---

## Assessor Section

**Assessor Name:** ________________  
**Date:** ________________  

### Competency Checklist:

**ICTAII501:**
- [ ] PC 1.1-1.2: Business understanding
- [ ] PC 2: Solution design
- [ ] PC 3: Documentation

**ICTAII502:**
- [ ] PC 1.1-1.6: Data analysis
- [ ] PC 2.1-2.4: Feature engineering
- [ ] PC 3.1-3.5: Model building
- [ ] PC 4.1-4.5: Testing
- [ ] PC 5.1-5.6: Evaluation

**Overall:** [ ] Competent  [ ] Not Yet Competent

**Feedback:**

[Assessor comments]

---

## Submission Checklist

- [ ] All student input sections completed
- [ ] Actual data from Part 1 included
- [ ] All visualizations generated
- [ ] Reflection written (200+ words)
- [ ] Metrics tables filled
- [ ] Code examples included
- [ ] Declaration signed
- [ ] Exported to PDF/HTML
- [ ] Supporting files included

---

# End of Portfolio

**Thank you for completing this CRISP-DM assessment!** 🤖🧩

Remember: CRISP-DM is iterative—use insights from this iteration to improve your next version!