# Exercise 4: Embedded Development Environment & Student Implementation Guide

## 🎯 Learning Objectives
By the end of this exercise, you will be able to:
- Set up and use the STM32CubeIDE development environment
- Build and debug embedded AI applications
- Understand the STUDENT_MODE and DUMMY_INPUT_BUFFER configuration system
- Implement core image processing functions for face recognition
- Test and validate your implementations step by step

## 📋 Prerequisites
- Completed Exercises 1-3 (Pipeline understanding, model conversion, flashing)
- STM32CubeIDE installed on your system
- STM32N6 development board connected
- Basic understanding of C programming

## 🔧 Part 1: Development Environment Setup

### Step 1: Opening the Project in STM32CubeIDE

1. **Launch STM32CubeIDE**
   - Open STM32CubeIDE from your applications menu
   - Select or create a workspace directory

2. **Import the Project**
   ```
   File → Import → General → Existing Projects into Workspace
   → Browse to: /home/vboxuser/Desktop/Workshop/EdgeAI_Workshop
   → Select "EdgeAI_Workshop" project
   → Click "Finish"
   ```

3. **Project Structure Overview**
   ```
   EdgeAI_Workshop/
   ├── STM32CubeIDE/           # IDE project files
   ├── Src/                    # Source code (.c files)
   ├── Inc/                    # Header files (.h files)
   ├── Middlewares/            # ST middleware libraries
   ├── STM32Cube_FW_N6/       # STM32 HAL drivers
   ├── Makefile               # Command-line build system
   └── dummy_buffer/          # Test data and utilities
   ```

### Step 2: Verifying Build System

#### Using STM32CubeIDE 

1. **Clean and Build**
   ```
   Project → Clean... → Select "EdgeAI_Workshop" → Clean
   Project → Build Project (or Ctrl+B)
   ```

2. **Expected Output**
   - Build should complete without errors
   - Look for: `Finished building target: STM32N6_GettingStarted_ObjectDetection.elf`
   - Binary files should be generated in `STM32CubeIDE/Debug/`

### ⚠️ Troubleshooting Build Issues

**Common Problems:**
- **Missing ARM toolchain**: Install `arm-none-eabi-gcc`
- **Path issues**: Ensure STM32CubeIDE can find the toolchain
- **Permission errors**: Check file permissions in project directory
- **Memory issues**: Close other applications if build fails due to memory

### Step 3: Debug Mode Setup and Testing

#### Hardware Connection Verification

1. **Check ST-Link Connection**
   - Connect STM32N6 board via USB
   - Verify green LED on ST-Link portion
   - In terminal: `lsusb | grep STMicroelectronics`

#### Testing Debug Session

1. **Start Debug Session**
   ```
   Click "Debug" → Switch to Debug perspective → Yes
   ```

2. **Verify Debug Functionality**
   - Program should halt at `main()` function
   - Set breakpoint at line with `printf("🚀 Starting Edge AI Workshop...")`
   - Press F8 (Resume) and verify it hits the breakpoint
   - Check Variables view shows local variables
   - Verify SWV ITM Console shows printf output


### ⚠️ Troubleshooting Debug Issues

**Common Problems:**
- **"No ST-Link detected"**: Check USB connection, try different port
- **"Target not responding"**: Power cycle the board, check jumper settings
- **Breakpoints not hit**: Ensure debug symbols are included in build

## ⚙️ Part 2: Configuration System Deep Dive

### Understanding the Dual Configuration System

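This project is controlled by two compile-time defines in `Inc/app_config.h`: `STUDENT_MODE` selects which implementation is compiled (yours or the instructor's), and `DUMMY_INPUT_BUFFER` selects the image source (a fixed test image or the live input). You will toggle both throughout the exercise, so make sure you know what each one does before you start implementing.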

### 📍 STUDENT_MODE Configuration

**Location**: `Inc/app_config.h` line 32

```c
/* ========================================================================= */
/* STUDENT MODE CONFIGURATION                                                */
/* ========================================================================= */
/* Enable STUDENT_MODE to use student implementation stub files.            */
/* Comment out this define to use complete instructor implementations.      */
/* ========================================================================= */
//#define STUDENT_MODE
```

#### When STUDENT_MODE is ENABLED (`#define STUDENT_MODE`):
- ✅ **Your implementations** in `Src/crop_img.c`, `Src/face_utils.c`, and `Src/target_embedding.c` are used
- ✅ Function stubs with **TODO comments** and **implementation hints**
- ✅ **Compiler warnings** for unused parameters (guides implementation)
- ✅ **Step-by-step learning** experience

#### When STUDENT_MODE is DISABLED (commented out):
- ✅ **Complete instructor implementations** are used
- ✅ **Fully functional** face detection and recognition
- ✅ **Reference implementation** for comparison
- ✅ **Expected behavior** for testing your implementations

#### 🎯 Recommended Workflow:
1. **Start with STUDENT_MODE disabled** → Test that everything works
2. **Enable STUDENT_MODE** → Implement functions step by step
3. **Toggle between modes** → Compare your results with reference

### 📍 DUMMY_INPUT_BUFFER Configuration

**Location**: `Inc/app_config.h` line 41

```c
/* ========================================================================= */
/* DUMMY INPUT BUFFER CONFIGURATION                                          */
/* ========================================================================= */
/* Enable DUMMY_INPUT_BUFFER to use a predefined test image for debugging.  */
/* This overrides the camera/PC stream input with a constant test pattern.  */
/* Useful for students to test their implementations with known input.      */
/* ========================================================================= */
#define DUMMY_INPUT_BUFFER
```

#### When DUMMY_INPUT_BUFFER is ENABLED:
- ✅ **Consistent test input** → Same image every time
- ✅ **Known ground truth** → Expected results documented
- ✅ **No camera required** → Works without hardware setup
- ✅ **Reproducible debugging** → Identical conditions each run
- ✅ **Implementation validation** → Compare against expected outputs

#### When DUMMY_INPUT_BUFFER is DISABLED:
- ✅ **Live camera input** → Real-time face detection
- ✅ **Production behavior** → How the final system works
- ✅ **Variable conditions** → Different faces, lighting, angles

#### 🎯 What Happens with DUMMY_INPUT_BUFFER Enabled:

1. **Image Buffer Override**:
   ```c
   // In load_dual_dummy_buffers() function:
   memcpy(nn_rgb, dummy_test_nn_rgb, DUMMY_TEST_NN_RGB_SIZE);
   ```

2. **Cropping Override**:
   ```c
   // In img_crop_align565_to_888() function:
   #ifdef DUMMY_INPUT_BUFFER
   memcpy(dst_img, dummy_cropped_face_rgb, dst_width*dst_height*3);
   #endif
   ```

3. **Consistent Test Data**:
   - `dummy_test_img_buffer`: 800x480 RGB565 image (camera frame)
   - `dummy_test_nn_rgb`: 128x128 RGB888 image (NN input)
   - `dummy_cropped_face_rgb`: 112x112 RGB888 face crop (recognition input)
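
The sizes of these three buffers follow directly from their dimensions and pixel formats. The sketch below works the arithmetic out; the helper `image_bytes` and the `BPP_*` constants are made up for this illustration and are not part of the project.

```c
#include <stddef.h>
#include <stdint.h>

/* RGB565 stores one pixel in 2 bytes, RGB888 in 3 bytes. */
enum { BPP_RGB565 = 2, BPP_RGB888 = 3 };

/* Hypothetical helper (not from the project): size in bytes of a tightly
 * packed image with the given dimensions and bytes per pixel. */
static size_t image_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return (size_t)width * height * bytes_per_pixel;
}
```

With these definitions, `image_bytes(800, 480, BPP_RGB565)` is 768,000 bytes, `image_bytes(128, 128, BPP_RGB888)` is 49,152 bytes, and `image_bytes(112, 112, BPP_RGB888)` is 37,632 bytes.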

### 🎛️ Configuration Combinations for Different Learning Phases

| Phase | STUDENT_MODE | DUMMY_INPUT_BUFFER | Purpose |
|-------|--------------|-------------------|----------|
| **Phase 1: Verification** | ❌ Disabled | ❌ Disabled | Test that system works |
| **Phase 2: Implementation** | ✅ Enabled | ✅ Enabled | Implement functions with consistent test data |
| **Phase 3: Validation** | ✅ Enabled | ❌ Disabled | Test your implementations with live input |

### 🔧 How to Change Configurations

1. **Edit `Inc/app_config.h`**
2. **To ENABLE**: Remove `//` → `#define STUDENT_MODE`
3. **To DISABLE**: Add `//` → `//#define STUDENT_MODE`
4. **Rebuild project** (both defines require recompilation)
5. **Flash and test** the new configuration
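
Both switches are ordinary preprocessor defines, so the choice is made when the code is compiled, not at run time. That is why every change needs a rebuild. A minimal sketch of the mechanism follows; the two helper functions are hypothetical and only show which branch the compiler keeps.

```c
/* Assumption: these two lines stand in for Inc/app_config.h.
 * Comment one out and rebuild to compile the other branch instead. */
#define STUDENT_MODE
#define DUMMY_INPUT_BUFFER

/* Hypothetical helper, for illustration only. */
static const char *active_implementation(void)
{
#ifdef STUDENT_MODE
    return "student";
#else
    return "instructor";
#endif
}

/* Hypothetical helper, for illustration only. */
static const char *active_input(void)
{
#ifdef DUMMY_INPUT_BUFFER
    return "dummy buffer";
#else
    return "live input";
#endif
}
```

Only one `return` statement per function survives preprocessing; the other branch never reaches the compiler.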

## 👨‍💻 Part 3: Student Implementation Guide

### Overview of Functions You Need to Implement

When `STUDENT_MODE` is enabled, you need to implement these core functions across multiple files:

#### Image Processing Functions (`Src/crop_img.c`)
| Function | Difficulty | Purpose | Input | Output |
|----------|------------|---------|-------|--------|
| `img_rgb_to_chw_float` | ⭐⭐ Basic | Convert RGB to CHW float | RGB888 image | CHW float array |
| `img_rgb_to_chw_float_norm` | ⭐⭐ Basic | Convert RGB to normalized CHW | RGB888 image | Normalized CHW float |
| `img_crop_resize` | ⭐⭐⭐ Intermediate | Crop and resize image | Source image | Resized crop |
| `img_crop_align` | ⭐⭐⭐⭐ Advanced | Face alignment with rotation | Face coordinates + eyes | Aligned face |
| `img_crop_align565_to_888` | ⭐⭐⭐⭐⭐ Expert | RGB565→RGB888 + alignment | RGB565 + face data | RGB888 aligned face |

#### Face Recognition Functions (`Src/face_utils.c`)
| Function | Difficulty | Purpose | Input | Output |
|----------|------------|---------|-------|--------|
| `embedding_cosine_similarity` | ⭐⭐ Basic | Calculate embedding similarity | Two embeddings | Similarity score |

#### Embedding Management Functions (`Src/target_embedding.c`)
| Function | Difficulty | Purpose | Input | Output |
|----------|------------|---------|-------|--------|
| `embeddings_bank_init` | ⭐ Simple | Initialize embedding storage | None | Void |
| `embeddings_bank_add` | ⭐⭐⭐ Intermediate | Add embedding to bank | Embedding vector | Success/failure |
| `embeddings_bank_reset` | ⭐ Simple | Clear all embeddings | None | Void |
| `embeddings_bank_count` | ⭐ Simple | Get current embedding count | None | Count |
| `compute_target` | ⭐⭐⭐ Intermediate | Compute average embedding | None (internal) | Void |

### 🎯 Recommended Implementation Order

1. **Start Simple**: `embeddings_bank_init`, `embeddings_bank_count`, `embeddings_bank_reset`
2. **Basic Image Processing**: `img_rgb_to_chw_float` and `img_rgb_to_chw_float_norm`
3. **Math Functions**: `embedding_cosine_similarity`
4. **Complex Logic**: `embeddings_bank_add` and `compute_target`
5. **Image Processing**: `img_crop_resize`
6. **Advanced Features**: `img_crop_align` (optional)
7. **Expert Challenge**: `img_crop_align565_to_888` (optional)

### 🔧 Function 1: `img_rgb_to_chw_float` (⭐⭐ Basic)

#### Purpose
Convert RGB image from HWC (Height×Width×Channel) layout to CHW (Channel×Height×Width) layout in float format.

#### Why This Matters
- **Neural networks** often expect CHW layout (channel-first)
- **OpenCV/PIL** typically use HWC layout (channel-last)
- **Memory layout** affects performance and compatibility

#### Function Signature
```c
void img_rgb_to_chw_float(uint8_t *src_image, float32_t *dst_img,
                          const uint32_t src_stride, const uint16_t width,
                          const uint16_t height)
```

#### Input Layout (HWC):
```
src_image: [R₀,G₀,B₀, R₁,G₁,B₁, R₂,G₂,B₂, ...]
           │  pixel 0  │  pixel 1  │  pixel 2  │
```

#### Output Layout (CHW):
```
dst_img: [R₀,R₁,R₂,...] [G₀,G₁,G₂,...] [B₀,B₁,B₂,...]
         │ Red channel │ Green channel │ Blue channel │
```

#### 💡 Implementation Hints

1. **Memory Layout Understanding**:
   ```c
   // For a 2x2 image:
   // HWC: [R₀,G₀,B₀, R₁,G₁,B₁, R₂,G₂,B₂, R₃,G₃,B₃]
   // CHW: [R₀,R₁,R₂,R₃, G₀,G₁,G₂,G₃, B₀,B₁,B₂,B₃]
   ```

2. **Index Calculations**:
   ```c
   // For pixel at (y,x):
   src_offset = y * src_stride + x * 3;  // 3 bytes per pixel (RGB)
   dst_offset = y * width + x;           // Linear index in output
   
   // Channel offsets in CHW layout:
   red_base   = 0;
   green_base = height * width;
   blue_base  = 2 * height * width;
   ```

3. **Complete Implementation Template**:
   ```c
   void img_rgb_to_chw_float(uint8_t *src_image, float32_t *dst_img,
                             const uint32_t src_stride, const uint16_t width,
                             const uint16_t height)
   {
       // TODO: Implement nested loops
       for (uint16_t y = 0; y < height; y++) {
           const uint8_t *pIn = src_image + y * src_stride;
           for (uint16_t x = 0; x < width; x++) {
               // TODO: Calculate output indices
               uint32_t dst_idx = y * width + x;
               
               // TODO: Copy RGB values to CHW layout
               dst_img[dst_idx] = (float32_t)pIn[0];                           // Red
               dst_img[height * width + dst_idx] = (float32_t)pIn[1];          // Green  
               dst_img[2 * height * width + dst_idx] = (float32_t)pIn[2];      // Blue
               
               pIn += 3;  // Move to next pixel
           }
       }
   }
   ```

#### 🧪 Testing Your Implementation

1. **Enable STUDENT_MODE** in `app_config.h`
2. **Build and flash** the project
3. **Set breakpoint** in your function
4. **Inspect variables**:
   - Check that `dst_img[0]` contains first red pixel value
   - Check that `dst_img[width*height]` contains first green pixel value
   - Verify memory layout is correct
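
If you prefer to check the layout on your PC before flashing, a small host-side harness can help. The sketch below repeats the logic of the template from the hints so that it is self-contained. It assumes `float32_t` is a plain `float` (on the target the type comes from the CMSIS headers) and adds `const` to the source pointer. The test image has two padding bytes per row, so the stride is 8 bytes rather than `width * 3 = 6`.

```c
#include <stdint.h>

typedef float float32_t;  /* Assumption: stand-in for the CMSIS type on a PC. */

/* Same logic as the template in the hints, repeated for a standalone build. */
static void img_rgb_to_chw_float(const uint8_t *src_image, float32_t *dst_img,
                                 uint32_t src_stride, uint16_t width, uint16_t height)
{
    const uint32_t plane = (uint32_t)width * height;  /* elements per channel */
    for (uint16_t y = 0; y < height; y++) {
        const uint8_t *pIn = src_image + (uint32_t)y * src_stride;
        for (uint16_t x = 0; x < width; x++) {
            const uint32_t dst_idx = (uint32_t)y * width + x;
            dst_img[dst_idx]             = (float32_t)pIn[0];  /* R plane */
            dst_img[plane + dst_idx]     = (float32_t)pIn[1];  /* G plane */
            dst_img[2 * plane + dst_idx] = (float32_t)pIn[2];  /* B plane */
            pIn += 3;
        }
    }
}

/* 2x2 RGB888 test image, 0xEE marks the padding bytes at the end of each row. */
static const uint8_t test_hwc[16] = {
    10, 20, 30,   11, 21, 31,   0xEE, 0xEE,
    12, 22, 32,   13, 23, 33,   0xEE, 0xEE,
};
```

Calling `img_rgb_to_chw_float(test_hwc, out, 8, 2, 2)` with a 12-element `out` array should fill it with `10, 11, 12, 13` (red plane), `20, 21, 22, 23` (green plane), and `30, 31, 32, 33` (blue plane). If a padding value shows up in the output, the row start was computed from `width * 3` instead of `src_stride`.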

#### ❌ Common Mistakes
- **Wrong stride calculation**: Use `src_stride`, not `width * 3`
- **Channel order confusion**: RGB vs BGR
- **Index overflow**: Ensure proper bounds checking
- **Pointer arithmetic errors**: Be careful with `pIn += 3`

### 🔧 Function 2: `img_rgb_to_chw_float_norm` (⭐⭐ Basic)

#### Purpose
Same as Function 1, but with normalization for neural network input.

#### Normalization Formula
```c
normalized_value = (pixel_value - 127.5f) / 127.5f
```

This maps:
- `0` → `-1.0f`
- `127.5` → `0.0f` 
- `255` → `1.0f`

#### 💡 Implementation Hints

1. **Start with Function 1**: Copy your working `img_rgb_to_chw_float` implementation

2. **Add Normalization**:
   ```c
   // Instead of:
   dst_img[dst_idx] = (float32_t)pIn[0];
   
   // Use:
   dst_img[dst_idx] = ((float32_t)pIn[0] - 127.5f) / 127.5f;
   ```

3. **Alternative Implementation** (loops over the three channels; same result, not faster):
   ```c
   for (uint8_t hidx = 0; hidx < 3; hidx++) {
       dst_img[hidx * height * width + dst_idx] = 
           (((float32_t)pIn[hidx]) / 255.0f - 0.5f) / 0.5f;
   }
   ```

#### 🧪 Testing Your Implementation
- **Check value ranges**: All outputs should be in [-1.0, 1.0]
- **Test extremes**: Black pixels (0) → -1.0, White pixels (255) → 1.0
- **Check gray (127-128)**: Should be close to 0.0
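
The checks above can be tried on a single pixel value first. The sketch below uses two hypothetical helpers: `normalize_pixel` is the formula from this section, and `normalize_pixel_mul` is an algebraically equivalent form, since (p - 127.5) / 127.5 = p * (2 / 255) - 1. A multiply is often cheaper than a divide, but treat that as something to measure on your target rather than a guarantee.

```c
#include <stdint.h>

/* Formula from this section: (p - 127.5) / 127.5 */
static float normalize_pixel(uint8_t p)
{
    return ((float)p - 127.5f) / 127.5f;
}

/* Equivalent form with one multiply and one subtract.
 * Results can differ from normalize_pixel() by float rounding only. */
static float normalize_pixel_mul(uint8_t p)
{
    return (float)p * (2.0f / 255.0f) - 1.0f;
}
```

`normalize_pixel(0)` gives exactly `-1.0f`, `normalize_pixel(255)` gives exactly `1.0f`, and `normalize_pixel(128)` is a small positive number (about 0.004).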

### 🔧 Function 3: `img_crop_resize` (⭐⭐⭐ Intermediate)

#### Purpose
Crop a rectangular region from source image and resize it to destination size using nearest neighbor interpolation.

#### Function Signature
```c
void img_crop_resize(uint8_t *src_image, uint8_t *dst_img,
                     const uint16_t src_width, const uint16_t src_height,
                     const uint16_t dst_width, const uint16_t dst_height,
                     const uint16_t bpp, int x0, int y0,
                     int crop_width, int crop_height)
```

#### Parameters Explained
- `src_image`: Source image buffer
- `dst_img`: Destination image buffer  
- `src_width/height`: Source image dimensions
- `dst_width/height`: Destination image dimensions
- `bpp`: Bytes per pixel (usually 3 for RGB)
- `x0, y0`: Top-left corner of crop region
- `crop_width/height`: Size of crop region

#### 💡 Implementation Hints

1. **Understand the Mapping**:
   ```c
   // For each destination pixel (dst_x, dst_y):
   // Find corresponding source pixel:
   src_x = x0 + (dst_x * crop_width) / dst_width;
   src_y = y0 + (dst_y * crop_height) / dst_height;
   ```

2. **Bounds Checking** (Critical!):
   ```c
   if (src_x < 0) src_x = 0;
   if (src_x >= src_width) src_x = src_width - 1;
   if (src_y < 0) src_y = 0;
   if (src_y >= src_height) src_y = src_height - 1;
   ```

3. **Pixel Copying**:
   ```c
   const uint8_t *pIn = src_image + (src_y * src_width + src_x) * bpp;
   uint8_t *pOut = dst_img + (dst_y * dst_width + dst_x) * bpp;
   
   for (int c = 0; c < bpp; c++) {
       pOut[c] = pIn[c];
   }
   ```

4. **Complete Implementation Template**:
   ```c
   void img_crop_resize(uint8_t *src_image, uint8_t *dst_img,
                        const uint16_t src_width, const uint16_t src_height,
                        const uint16_t dst_width, const uint16_t dst_height,
                        const uint16_t bpp, int x0, int y0,
                        int crop_width, int crop_height)
   {
       for (int dst_y = 0; dst_y < dst_height; dst_y++) {
           // TODO: Calculate source Y coordinate
           int src_y = y0 + (dst_y * crop_height) / dst_height;
           // TODO: Apply bounds checking for src_y
           
           for (int dst_x = 0; dst_x < dst_width; dst_x++) {
               // TODO: Calculate source X coordinate  
               int src_x = x0 + (dst_x * crop_width) / dst_width;
               // TODO: Apply bounds checking for src_x
               
               // TODO: Calculate source and destination pointers
               // TODO: Copy pixel data (all channels)
           }
       }
   }
   ```

#### 🧪 Testing Your Implementation
- **Test with an identity crop**: `x0=0`, `y0=0`, crop size equal to the source size, and destination size equal to the source size (the output should be identical to the input)
- **Test scaling**: Compare 1:1 crop vs 2:1 downscale
- **Test bounds**: Crop near image edges
- **Visual verification**: Check if cropped region looks correct
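
Before running the full function, it can help to check the coordinate mapping and clamping on a few numbers. The helper below is hypothetical (it is not part of the project); it applies the mapping formula and the bounds check from the hints to a single axis.

```c
/* Hypothetical helper: map one destination coordinate to a source coordinate
 * (nearest neighbor, integer division), then clamp it to [0, src_size - 1]. */
static int map_and_clamp(int dst, int dst_size, int origin, int crop_size, int src_size)
{
    int src = origin + (dst * crop_size) / dst_size;
    if (src < 0) {
        src = 0;
    }
    if (src >= src_size) {
        src = src_size - 1;
    }
    return src;
}
```

For a 200-pixel-wide crop starting at `x0 = 100` in an 800-pixel-wide image, resized to 112 pixels, destination columns 0, 56, and 111 map to source columns 100, 200, and 298. With `x0 = 750` the last column would land at 948, so it is clamped to 799; with `x0 = -10` the first column is clamped to 0.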

### 🔧 Function 4: `img_crop_align` (⭐⭐⭐⭐ Advanced - Optional)

#### Purpose
Crop and align face image based on eye positions, applying rotation to normalize face orientation.

#### Why Face Alignment Matters
- **Improves recognition accuracy**: Consistent face orientation
- **Normalizes pose variations**: Reduces head tilt effects  
- **Standardizes input**: Makes recognition more robust

#### Mathematical Concepts

1. **Rotation Angle Calculation**:
   ```c
   float angle = -atan2f(right_eye_y - left_eye_y, right_eye_x - left_eye_x);
   ```

2. **Rotation Matrix**:
   ```c
   float cos_a = cosf(angle);
   float sin_a = sinf(angle);
   
   // Inverse rotation to map destination to source:
   src_x = x_center + (nx * width) * cos_a + (ny * height) * sin_a;
   src_y = y_center + (ny * height) * cos_a - (nx * width) * sin_a;
   ```
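
The inverse mapping can be tried in isolation. The sketch below is a hypothetical helper, not the instructor's code. It assumes that `u` and `v` are the pixel offsets from the centre of the aligned output, that is, the `(nx * width)` and `(ny * height)` terms above, and it takes `cos_a` and `sin_a` as inputs so that it can be checked with exact values.

```c
/* Hypothetical point type for this sketch. */
typedef struct {
    float x;
    float y;
} point_f;

/* Map an offset (u, v) around the centre of the aligned output back to a
 * position in the source image, using the two formulas from this section. */
static point_f aligned_to_source(float u, float v, float cos_a, float sin_a,
                                 float x_center, float y_center)
{
    point_f p;
    p.x = x_center + u * cos_a + v * sin_a;
    p.y = y_center + v * cos_a - u * sin_a;
    return p;
}
```

With level eyes the angle is 0, so `cos_a = 1`, `sin_a = 0`, and the mapping is a pure translation. For a tilted face, `angle = -atan2f(dy, dx)` makes a step along the output's horizontal axis (`u = 1`, `v = 0`) move along the eye line in the source, which is what straightens the face.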

#### 💡 Implementation Hints

This is an **advanced function**. Focus on the basic functions first. If you want to attempt it:

1. **Start with rotation understanding**: Study 2D rotation matrices
2. **Use instructor implementation**: Compare your approach
3. **Test incrementally**: Small rotations first
4. **Visual debugging**: Check if rotation looks correct

#### 🎯 Learning Focus
- **Understand the concept**: Why face alignment helps
- **Study the math**: Rotation matrices and coordinate transforms
- **Observe the results**: Compare aligned vs non-aligned faces

⚠️ **Recommendation**: Implement basic functions first, then return to this if time permits.

### 🔧 Function 5: `img_crop_align565_to_888` (⭐⭐⭐⭐⭐ Expert - Optional)

#### Purpose
Advanced function combining RGB565→RGB888 format conversion with face alignment.

#### RGB565 Format Understanding
```
RGB565: 16-bit format
Bits: [R₄R₃R₂R₁R₀ G₅G₄G₃G₂G₁G₀ B₄B₃B₂B₁B₀]
      │    5 bits   │    6 bits    │   5 bits  │
```

#### RGB565 to RGB888 Conversion
```c
uint16_t pixel = *((uint16_t*)source_ptr);
uint8_t red   = ((pixel >> 11) & 0x1F) << 3;  // 5→8 bits
uint8_t green = ((pixel >> 5) & 0x3F) << 2;   // 6→8 bits  
uint8_t blue  = (pixel & 0x1F) << 3;          // 5→8 bits
```
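
One detail of the shift-only conversion is worth knowing: the low bits of each channel stay zero, so a full-scale 5-bit value (`0x1F`) becomes `0xF8` rather than `0xFF`. The sketch below shows the shift-only form next to a common variant that copies the top bits into the empty low bits. The type and function names are hypothetical, and this guide does not say which variant the instructor implementation uses.

```c
#include <stdint.h>

/* Hypothetical pixel type for this sketch. */
typedef struct {
    uint8_t r;
    uint8_t g;
    uint8_t b;
} rgb888_t;

/* Shift-only expansion, as in the snippet above. */
static rgb888_t rgb565_to_888_shift(uint16_t pixel)
{
    rgb888_t out;
    out.r = (uint8_t)(((pixel >> 11) & 0x1F) << 3);
    out.g = (uint8_t)(((pixel >> 5) & 0x3F) << 2);
    out.b = (uint8_t)((pixel & 0x1F) << 3);
    return out;
}

/* Bit-replication variant: the top bits fill the low bits, so the
 * maximum input maps to 255 and zero still maps to 0. */
static rgb888_t rgb565_to_888_replicate(uint16_t pixel)
{
    const uint8_t r5 = (uint8_t)((pixel >> 11) & 0x1F);
    const uint8_t g6 = (uint8_t)((pixel >> 5) & 0x3F);
    const uint8_t b5 = (uint8_t)(pixel & 0x1F);
    rgb888_t out;
    out.r = (uint8_t)((r5 << 3) | (r5 >> 2));
    out.g = (uint8_t)((g6 << 2) | (g6 >> 4));
    out.b = (uint8_t)((b5 << 3) | (b5 >> 2));
    return out;
}
```

For white (`0xFFFF`) the shift-only form returns (248, 252, 248), while the replication variant returns (255, 255, 255). Whichever you choose, use the same one when comparing your output against reference data.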

#### 🎯 Expert Challenge
This function combines:
- **Format conversion** (RGB565→RGB888)
- **Face alignment** (rotation matrix)
- **Memory management** (different stride calculations)

⚠️ **Recommendation**: Only attempt after mastering all previous functions.

### 🔧 Function 6: `embedding_cosine_similarity` (⭐⭐ Basic)

#### Purpose
Calculate the cosine similarity between two face embedding vectors to determine how similar two faces are.

#### Why Cosine Similarity Matters
- **Face Recognition Core**: Main algorithm for comparing faces
- **Angle-based Comparison**: Measures angle between vectors, not magnitude
- **Robust to Scale**: The result does not depend on the magnitude (norm) of either embedding
- **Interpretable Results**: -1.0 (opposite direction) to +1.0 (same direction)

#### Function Signature
```c
float embedding_cosine_similarity(const float *emb1, const float *emb2, uint32_t len)
```

#### Mathematical Formula
```
cosine_similarity = dot_product(A, B) / (norm(A) * norm(B))

Where:
- dot_product(A, B) = Σ(A[i] * B[i])
- norm(A) = sqrt(Σ(A[i]²))
- norm(B) = sqrt(Σ(B[i]²))
```

#### 💡 Implementation Hints

1. **Input Validation**:
   ```c
   if (!emb1 || !emb2 || len == 0) {
       return 0.0f;  // Invalid input
   }
   ```

2. **Single-Pass Calculation** (Efficient):
   ```c
   float dot_product = 0.0f;
   float norm1_squared = 0.0f;
   float norm2_squared = 0.0f;
   
   for (uint32_t i = 0; i < len; i++) {
       const float val1 = emb1[i];
       const float val2 = emb2[i];
       
       dot_product += val1 * val2;        // Dot product
       norm1_squared += val1 * val1;      // Norm squared for emb1
       norm2_squared += val2 * val2;      // Norm squared for emb2
   }
   ```

3. **Handle Zero Norms**:
   ```c
   if (norm1_squared == 0.0f || norm2_squared == 0.0f) {
       return 0.0f;  // Avoid division by zero
   }
   ```

4. **Complete Implementation Template**:
   ```c
   float embedding_cosine_similarity(const float *emb1, const float *emb2, uint32_t len)
   {
       // TODO: Input validation
       if (!emb1 || !emb2 || len == 0) {
           return 0.0f;
       }
       
       // TODO: Initialize accumulators
       float dot_product = 0.0f;
       float norm1_squared = 0.0f;
       float norm2_squared = 0.0f;
       
       // TODO: Single pass calculation
       for (uint32_t i = 0; i < len; i++) {
           const float val1 = emb1[i];
           const float val2 = emb2[i];
           
           // TODO: Accumulate dot product and norms
           dot_product += val1 * val2;
           norm1_squared += val1 * val1;
           norm2_squared += val2 * val2;
       }
       
       // TODO: Check for zero norms
       if (norm1_squared == 0.0f || norm2_squared == 0.0f) {
           return 0.0f;
       }
       
       // TODO: Calculate and return cosine similarity
       return dot_product / sqrtf(norm1_squared * norm2_squared);
   }
   ```

#### 🧪 Testing Your Implementation

1. **Test with identical vectors**:
   ```c
   float vec1[] = {1.0f, 2.0f, 3.0f};
   float vec2[] = {1.0f, 2.0f, 3.0f};
   float sim = embedding_cosine_similarity(vec1, vec2, 3);
   // Expected: 1.0f (identical)
   ```

2. **Test with opposite vectors**:
   ```c
   float vec1[] = {1.0f, 2.0f, 3.0f};
   float vec2[] = {-1.0f, -2.0f, -3.0f};
   float sim = embedding_cosine_similarity(vec1, vec2, 3);
   // Expected: -1.0f (opposite)
   ```

3. **Test with orthogonal vectors**:
   ```c
   float vec1[] = {1.0f, 0.0f, 0.0f};
   float vec2[] = {0.0f, 1.0f, 0.0f};
   float sim = embedding_cosine_similarity(vec1, vec2, 3);
   // Expected: 0.0f (perpendicular)
   ```

#### ❌ Common Mistakes
- **Forgetting sqrt()**: Use `sqrtf(norm1_squared * norm2_squared)`, not just `norm1_squared * norm2_squared`
- **Division by zero**: Always check for zero norms
- **Using separate loops**: Single pass is more efficient
- **Wrong data types**: Use `float`, not `double` for embedded systems

### 🔧 Embedding Management Functions (⭐-⭐⭐⭐ Simple to Intermediate)

#### Purpose Overview
These functions manage a collection (bank) of face embeddings to create a robust representation of a target person. Multiple embeddings are averaged to handle variations in lighting, angles, and expressions.

#### Key Concepts

1. **Embedding Bank**: Array storing multiple embeddings (up to 10) for one person
2. **Target Embedding**: Average of all embeddings in the bank (normalized)
3. **Normalization**: Ensures embeddings have unit length for cosine similarity
4. **Bank Management**: Add, reset, count operations

#### Constants (defined in `target_embedding.h`)
```c
#define EMBEDDING_SIZE 128        // Size of each embedding vector
#define EMBEDDING_BANK_SIZE 10    // Maximum number of embeddings to store
```
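
The hints for the functions below refer to `embedding_bank`, `target_embedding`, and `bank_count` without showing their declarations. A plausible sketch of that file-scope state is given here; the exact declarations in `Src/target_embedding.c` are an assumption and may differ.

```c
#define EMBEDDING_SIZE 128        /* Size of each embedding vector */
#define EMBEDDING_BANK_SIZE 10    /* Maximum number of embeddings to store */

/* Assumed file-scope state; the names match the hints in this guide. */
static float embedding_bank[EMBEDDING_BANK_SIZE][EMBEDDING_SIZE];  /* one row per stored embedding */
static float target_embedding[EMBEDDING_SIZE];                     /* normalized average of the bank */
static int bank_count = 0;                                         /* rows of embedding_bank in use */
```

Because these are arrays rather than pointers, `sizeof(embedding_bank)` is the full size of the bank (10 × 128 floats), which is what makes the `memset(embedding_bank, 0, sizeof(embedding_bank))` call in the hints clear the whole array.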

---

### 🔧 Function 7: `embeddings_bank_init` (⭐ Simple)

#### Purpose
Initialize the embedding bank to a clean state. Called at system startup.

#### Function Signature
```c
void embeddings_bank_init(void)
```

#### 💡 Implementation Hints
```c
void embeddings_bank_init(void)
{
    // TODO: Set bank_count to 0
    bank_count = 0;
    
    // TODO: Zero out embedding_bank array
    memset(embedding_bank, 0, sizeof(embedding_bank));
    
    // TODO: Zero out target_embedding array
    memset(target_embedding, 0, sizeof(target_embedding));
}
```

#### 🧪 Testing
- Verify `bank_count` is 0 after initialization
- Check that all embedding arrays contain zeros

---

### 🔧 Function 8: `embeddings_bank_count` (⭐ Simple)

#### Purpose
Return the current number of embeddings stored in the bank.

#### Function Signature
```c
int embeddings_bank_count(void)
```

#### 💡 Implementation Hints
```c
int embeddings_bank_count(void)
{
    // TODO: Return current bank count
    return bank_count;
}
```

#### 🧪 Testing
- Should return 0 after initialization
- Should increment after each successful `embeddings_bank_add()`

---

### 🔧 Function 9: `embeddings_bank_reset` (⭐ Simple)

#### Purpose
Reset the embedding bank to empty state. Used when switching between different people.

#### Function Signature
```c
void embeddings_bank_reset(void)
```

#### 💡 Implementation Hints
```c
void embeddings_bank_reset(void)
{
    // TODO: Call initialization function
    embeddings_bank_init();
}
```

#### 🧪 Testing
- Bank should be empty after reset
- Should behave identically to `embeddings_bank_init()`

---

### 🔧 Function 10: `embeddings_bank_add` (⭐⭐⭐ Intermediate)

#### Purpose
Add a new face embedding to the bank. The embedding is normalized before storage, and the target embedding is recomputed.

#### Function Signature
```c
int embeddings_bank_add(const float *embedding)
```

#### Return Values
- **Positive number**: Number of embeddings in bank (success)
- **-1**: Error (bank full or invalid embedding)

#### 💡 Implementation Hints

1. **Check Bank Capacity**:
   ```c
   if (bank_count >= EMBEDDING_BANK_SIZE) {
       return -1;  // Bank is full
   }
   ```

2. **Calculate Embedding Norm**:
   ```c
   float norm = 0.0f;
   for (int i = 0; i < EMBEDDING_SIZE; i++) {
       norm += embedding[i] * embedding[i];
   }
   norm = sqrtf(norm);
   
   if (norm == 0.0f) {
       return -1;  // Invalid embedding (all zeros)
   }
   ```

3. **Normalize and Store**:
   ```c
   for (int i = 0; i < EMBEDDING_SIZE; i++) {
       embedding_bank[bank_count][i] = embedding[i] / norm;
   }
   ```

4. **Complete Implementation Template**:
   ```c
   int embeddings_bank_add(const float *embedding)
   {
       // TODO: Check if bank is full
       if (bank_count >= EMBEDDING_BANK_SIZE) {
           return -1;
       }
       
       // TODO: Calculate norm of input embedding
       float norm = 0.0f;
       for (int i = 0; i < EMBEDDING_SIZE; i++) {
           norm += embedding[i] * embedding[i];
       }
       norm = sqrtf(norm);
       
       // TODO: Check for zero norm
       if (norm == 0.0f) {
           return -1;
       }
       
       // TODO: Normalize and store embedding
       for (int i = 0; i < EMBEDDING_SIZE; i++) {
           embedding_bank[bank_count][i] = embedding[i] / norm;
       }
       
       // TODO: Increment bank count
       bank_count++;
       
       // TODO: Recompute target embedding
       compute_target();
       
       // TODO: Return new bank count
       return bank_count;
   }
   ```

#### 🧪 Testing Your Implementation
- Add single embedding: should return 1
- Add multiple embeddings: count should increment
- Try to overfill bank: should return -1
- Add zero embedding: should return -1
- Check that target embedding is updated after each addition

---

### 🔧 Function 11: `compute_target` (⭐⭐⭐ Intermediate)

#### Purpose
Compute the target embedding as the normalized average of all embeddings in the bank. This is a **static** function (internal use only).

#### Function Signature
```c
static void compute_target(void)
```

#### Algorithm Steps
1. **Handle Empty Bank**: Zero out target if no embeddings
2. **Calculate Average**: Sum all embeddings element-wise, then divide by count
3. **Normalize Result**: Ensure target embedding has unit length

#### 💡 Implementation Hints

1. **Handle Empty Bank**:
   ```c
   if (bank_count == 0) {
       memset(target_embedding, 0, sizeof(target_embedding));
       return;
   }
   ```

2. **Calculate Sum**:
   ```c
   float sum[EMBEDDING_SIZE];
   memset(sum, 0, sizeof(sum));
   
   for (int n = 0; n < bank_count; n++) {
       for (int i = 0; i < EMBEDDING_SIZE; i++) {
           sum[i] += embedding_bank[n][i];
       }
   }
   ```

3. **Calculate Average**:
   ```c
   for (int i = 0; i < EMBEDDING_SIZE; i++) {
       target_embedding[i] = sum[i] / (float)bank_count;
   }
   ```

4. **Normalize Target**:
   ```c
   float norm = 0.0f;
   for (int i = 0; i < EMBEDDING_SIZE; i++) {
       norm += target_embedding[i] * target_embedding[i];
   }
   norm = sqrtf(norm);
   
   if (norm > 0.0f) {
       for (int i = 0; i < EMBEDDING_SIZE; i++) {
           target_embedding[i] /= norm;
       }
   }
   ```

#### 🧪 Testing Your Implementation
- After adding embeddings, check that `target_embedding` has unit length
- Verify that the target is the normalized average of the bank embeddings
- Test with single embedding: target should equal normalized input
- Test with multiple identical embeddings: target should equal normalized input
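
A small worked example shows why the final normalization step is needed: the average of unit-length vectors is generally shorter than unit length. The sketch uses two-element vectors and hypothetical helper names to keep the numbers easy to follow.

```c
#define DEMO_SIZE 2  /* tiny dimension for illustration; the project uses EMBEDDING_SIZE */

/* Squared Euclidean norm of a DEMO_SIZE-element vector. */
static float squared_norm(const float *v)
{
    float acc = 0.0f;
    for (int i = 0; i < DEMO_SIZE; i++) {
        acc += v[i] * v[i];
    }
    return acc;
}

/* Element-wise mean of two DEMO_SIZE-element vectors. */
static void mean_of_two(const float *a, const float *b, float *out)
{
    for (int i = 0; i < DEMO_SIZE; i++) {
        out[i] = (a[i] + b[i]) / 2.0f;
    }
}
```

For `a = {1, 0}` and `b = {0, 1}`, both of unit length, the mean is `{0.5, 0.5}` with a squared norm of 0.5, so its length is about 0.707. Dividing by that length restores a unit vector. When both inputs are identical, the mean equals the input and already has unit length.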

#### ❌ Common Mistakes for Embedding Functions
- **Not normalizing embeddings**: Essential for cosine similarity
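- **Integer division**: Keep the average in floating point. In the hints `sum[i]` is already a `float`, so the division is a float division either way, but the explicit `(float)bank_count` cast makes the intent clear and protects you if the accumulator type ever changes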
- **Forgetting to call `compute_target()`**: Target won't be updated
- **Array bounds errors**: Check `bank_count < EMBEDDING_BANK_SIZE`
- **Memory management**: Use `memset()` for array initialization

## 🧪 Part 4: Testing and Validation Workflow

### Comprehensive Testing Strategy

#### Phase 1: Instructor Implementation Baseline

1. **Disable STUDENT_MODE**:
   ```c
   // In app_config.h:
   //#define STUDENT_MODE  // ← Comment out
   ```

2. **Enable DUMMY_INPUT_BUFFER**:
   ```c
   #define DUMMY_INPUT_BUFFER  // ← Enabled
   ```

3. **Expected Behavior**:
   - Build completes without errors
   - Face detection finds faces in test image
   - Face cropping produces clean 112×112 face images
   - Face recognition identifies known faces
   - System runs smoothly without crashes

#### Phase 2: Student Implementation Testing

1. **Enable STUDENT_MODE**:
   ```c
   #define STUDENT_MODE  // ← Uncomment
   ```

2. **Incremental Implementation**:
   - Implement one function at a time
   - Test after each implementation
   - Compare results with instructor version

3. **Debug Strategy per Function**:

   **For `img_rgb_to_chw_float`**:
   ```c
   // Temporary debug prints (or inspect the same values at a breakpoint):
   printf("First red pixel: %.1f\n", dst_img[0]);
   printf("First green pixel: %.1f\n", dst_img[width*height]);
   printf("First blue pixel: %.1f\n", dst_img[2*width*height]);
   ```

   **For `img_crop_resize`**:
   ```c
   // Check if crop region is reasonable:
   printf("Crop region: (%d,%d) size %dx%d\n", x0, y0, crop_width, crop_height);
   printf("Output size: %dx%d\n", dst_width, dst_height);
   ```

### Debugging Tips and Common Issues

#### 🚨 Compilation Errors

**Problem**: "undefined reference to function_name"
```
Solution: Check that STUDENT_MODE is enabled and function is implemented
```

**Problem**: "unused parameter warnings"
```
Solution: This is normal in STUDENT_MODE - implement the function to use parameters
```

#### 🚨 Runtime Issues

**Problem**: Hard fault / system crash
```
Likely Causes:
- Buffer overflow (check array bounds)
- Null pointer access (verify src_image, dst_img not NULL)
- Stack overflow (check local variable sizes)

Debug Steps:
1. Add NULL checks at function start
2. Verify array indices are within bounds  
3. Use debugger to find exact crash location
```
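
Debug step 1 can be as small as one guard at the top of each image function. The helper below is a hypothetical sketch of such a check, not part of the project.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the arguments are safe to use, 0 otherwise. */
static int image_args_valid(const uint8_t *src_image, const void *dst_img,
                            uint16_t width, uint16_t height)
{
    if (src_image == NULL || dst_img == NULL) {
        return 0;  /* would dereference a null pointer */
    }
    if (width == 0 || height == 0) {
        return 0;  /* nothing to process */
    }
    return 1;
}
```

Inside a `void` function such as `img_rgb_to_chw_float`, the guard would read `if (!image_args_valid(src_image, dst_img, width, height)) return;`. Placing a breakpoint on that `return` tells you at once whether a bad argument is the cause of the fault.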

**Problem**: Incorrect output / artifacts
```
Likely Causes:
- Wrong memory layout (HWC vs CHW)
- Incorrect stride calculation
- Channel order confusion (RGB vs BGR)
- Off-by-one errors in loops

Debug Steps:
1. Print first few pixel values
2. Compare with instructor implementation
3. Check intermediate calculations
```

#### 🔍 Debug Console Output Analysis

**Look for these messages**:
```
✅ Good signs:
"🔄 Loading dual dummy buffers (test image)..."
"🎯 Face detected at (x,y) with confidence..."
"✅ Face cropped successfully"

❌ Warning signs:
"❌ No faces detected"
"⚠️ Face detection confidence too low"
"💥 Hard fault detected"
```

## 📊 Part 5: Performance Analysis and Optimization

### Understanding Embedded Constraints

#### Memory Limitations
```
STM32N6 Memory Map:
- Flash: external (program code, constants); the STM32N6 has no embedded user flash
- RAM: about 4.2MB internal SRAM (variables, stack, heap)
- PSRAM: External memory (image buffers)
```

#### Performance Considerations

1. **CPU Usage**:
   - ARM Cortex-M55 @ 600MHz
   - Integer operations preferred over float
   - SIMD instructions available via Helium/MVE (advanced)

2. **Memory Access Patterns**:
   - Sequential access faster than random
   - Cache-friendly algorithms preferred
   - Minimize memory allocations

3. **Image Processing Optimization**:
   ```c
   // Efficient (cache-friendly):
   for (y = 0; y < height; y++) {
       for (x = 0; x < width; x++) {
           // Process pixel at (x,y)
       }
   }
   
   // Less efficient (cache-unfriendly):
   for (x = 0; x < width; x++) {
       for (y = 0; y < height; y++) {
           // Process pixel at (x,y) - jumps memory locations
       }
   }
   ```

### 📈 Performance Measurement

#### Using the HAL Millisecond Tick
```c
// Wrap the call you want to time. HAL_GetTick() counts milliseconds,
// so very short calls may report 0 ms.
uint32_t start_time = HAL_GetTick();
img_rgb_to_chw_float(...);  // pass your usual arguments
uint32_t end_time = HAL_GetTick();
printf("Function took %lu ms\n", (unsigned long)(end_time - start_time));
```
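
Because the targets in this guide are only a few milliseconds, a 1 ms tick gives coarse results. A free-running cycle counter offers finer resolution; on Cortex-M cores the DWT cycle counter (`DWT->CYCCNT` in CMSIS) is one common source, assuming it is enabled in your project. The sketch below covers only the arithmetic, which is the part that is easy to get wrong; both function names are hypothetical.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wraparound. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Converts CPU cycles to microseconds for a core clock of cpu_hz.
 * Uses 64-bit math so the multiplication cannot overflow. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    if (cpu_hz == 0u) {
        return 0u;               /* avoid division by zero */
    }
    return (uint32_t)(((uint64_t)cycles * 1000000u) / cpu_hz);
}
```

With the 600MHz clock quoted above, 3,000,000 cycles correspond to 5000 µs, which is the 5 ms goal for `img_rgb_to_chw_float`.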

#### Expected Performance Targets
```
Function Performance Goals (128x128 image):
- img_rgb_to_chw_float: < 5ms
- img_crop_resize: < 10ms  
- img_crop_align: < 15ms
```

### 🚀 Optimization Techniques

#### 1. Loop Unrolling
```c
// Instead of:
for (int c = 0; c < 3; c++) {
    dst[c] = src[c];
}

// Use (compilers often unroll small fixed loops on their own at -O2,
// so measure before and after):
dst[0] = src[0];  // Red
dst[1] = src[1];  // Green  
dst[2] = src[2];  // Blue
```

#### 2. Pointer Arithmetic
```c
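// Efficient pointer advancement (copies values in their existing order,
// so on its own it does not convert HWC to CHW):
const uint8_t *pIn = src_image;
float32_t *pOut = dst_img;
// total_values = width * height * channels
for (int i = 0; i < total_values; i++) {
    *pOut++ = (float32_t)*pIn++;
}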
```

#### 3. Compile-Time Constants
```c
// Use #define for known constants:
#define FACE_SIZE 112
#define CHANNELS 3
// Compiler can optimize better with constants
```

## 🎯 Part 6: Integration and Final Testing

### Complete System Integration Test

#### Final Configuration Test Matrix

| Test | STUDENT_MODE | DUMMY_INPUT_BUFFER | Expected Result |
|------|--------------|-------------------|------------------|
| 1 | ❌ | ✅ | Instructor implementation with test data |
| 2 | ✅ | ✅ | Your implementation with test data |
| 3 | ✅ | ❌ | Your implementation with live camera |
| 4 | ❌ | ❌ | Instructor implementation with live camera |

#### Success Criteria

✅ **Test 1 Success** (Baseline):
- System boots and initializes
- Face detection finds test image faces
- Face recognition works correctly
- No crashes or errors

✅ **Test 2 Success** (Your Implementation):
- Results similar to Test 1
- Your functions produce correct output
- Performance is acceptable
- No functional regressions

✅ **Test 3 Success** (Live Testing):
- Works with real camera input
- Handles various faces and conditions
- Maintains performance in real-time
- Robust against edge cases

✅ **Test 4 Success** (Reference):
- Confirms hardware and camera work
- Provides performance baseline
- Validates overall system integration

### Troubleshooting Integration Issues

#### 🚨 "Works in DUMMY mode but fails with camera"

**Possible Causes**:
1. **Timing issues**: Camera input changes while processing
2. **Memory alignment**: Camera data has different stride
3. **Format differences**: Camera RGB vs expected format

**Debug Steps**:
```c
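// Add debug prints to check input.
// actual_stride and expected_stride are placeholders: use the row stride
// reported by your camera buffer and the stride your function assumes.
printf("Camera input: first pixel RGB=(%d,%d,%d)\n", 
       src_image[0], src_image[1], src_image[2]);
printf("Image stride: %d, expected: %d\n", (int)actual_stride, (int)expected_stride);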
```
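
If the camera rows are padded, indexing with `y * width * 3` reads the wrong bytes from the second row onward. A minimal sketch of stride-aware addressing follows; `pixel_at` is a hypothetical helper, and `stride_bytes` is assumed to be the distance in bytes between the starts of two consecutive rows.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the first byte of pixel (x, y) in a row-padded image.
 * stride_bytes:    bytes from the start of one row to the start of the next.
 * bytes_per_pixel: 3 for packed RGB888. */
static const uint8_t *pixel_at(const uint8_t *img, size_t stride_bytes,
                               size_t bytes_per_pixel, size_t x, size_t y)
{
    return img + y * stride_bytes + x * bytes_per_pixel;
}
```

When the stride equals `width * 3` this reduces to the packed case, so the same code path can serve both the dummy buffer and the camera.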

#### 🚨 "Performance degradation with your implementation"

**Optimization Priority**:
1. **Profile each function**: Find the bottleneck
2. **Check memory access patterns**: Ensure cache efficiency
3. **Reduce function calls**: Inline small functions
4. **Use instructor implementation**: As performance reference

#### 🚨 "Intermittent crashes or artifacts"

**Investigation Steps**:
1. **Enable stack monitoring**: Check for stack overflow
2. **Add buffer guards**: Detect buffer overruns
3. **Stress test**: Run for extended periods
4. **Memory debugging**: Check for corruption patterns
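
Buffer guards (step 2) can be sketched as a known byte pattern placed directly after a buffer and checked after each processing step. The names and sizes below are hypothetical, and the buffer is assumed to be allocated with `GUARD_BYTES` of extra space beyond the payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES 16u      /* size of the guard region (assumption) */
#define GUARD_VALUE 0xA5u    /* byte pattern unlikely to occur by accident */

/* Writes the guard pattern into the GUARD_BYTES that follow the payload.
 * The buffer must hold payload_size + GUARD_BYTES bytes. */
static void guard_arm(uint8_t *buf, size_t payload_size)
{
    memset(buf + payload_size, GUARD_VALUE, GUARD_BYTES);
}

/* Returns true while the guard region is untouched. */
static bool guard_intact(const uint8_t *buf, size_t payload_size)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (buf[payload_size + i] != GUARD_VALUE) {
            return false;    /* something wrote past the payload */
        }
    }
    return true;
}
```

Arm the guard once after allocation, then call `guard_intact` after each image function; the first call that returns `false` narrows down which function overran the buffer.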

### Performance Benchmarking

#### Key Metrics to Monitor
```
📊 System Performance Targets:
- Face Detection: < 100ms per frame
- Face Cropping: < 20ms per face
- Face Recognition: < 50ms per face
- Total Pipeline: < 200ms for single face
- Memory Usage: < 80% of available RAM
- CPU Usage: < 90% average
```
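
The timing targets above can be turned into a simple pass/fail check, for example to print a warning when a frame exceeds its budget. This is a sketch only: the macro and function names are hypothetical, and the limits are copied from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Timing budgets in milliseconds, taken from the targets listed above. */
#define DETECTION_BUDGET_MS    100u
#define CROPPING_BUDGET_MS      20u
#define RECOGNITION_BUDGET_MS   50u
#define PIPELINE_BUDGET_MS     200u

/* Returns true when every stage and the whole pipeline finished
 * strictly below its budget. */
static bool within_budget(uint32_t detection_ms, uint32_t cropping_ms,
                          uint32_t recognition_ms, uint32_t total_ms)
{
    return detection_ms   < DETECTION_BUDGET_MS
        && cropping_ms    < CROPPING_BUDGET_MS
        && recognition_ms < RECOGNITION_BUDGET_MS
        && total_ms       < PIPELINE_BUDGET_MS;
}
```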

#### Measuring Real Performance
```c
// Add performance monitoring:
typedef struct {
    uint32_t detection_time_ms;
    uint32_t cropping_time_ms;
    uint32_t recognition_time_ms;
    uint32_t total_time_ms;
} performance_stats_t;

static performance_stats_t stats;  // zero-initialized

// Use in main loop (repeat for each stage and its matching field):
uint32_t start = HAL_GetTick();
// ... perform face detection ...
stats.detection_time_ms = HAL_GetTick() - start;
```



- Configurable alerts and thresholds