# IBM AI Methodology - Fruit Ripeness Classifier

**Student:** Maria Paula Salazar Agudelo  
**Course:** AI Minor - Personal Challenge  
**Date:** 2025

---

## What is this notebook?

This notebook shows how I applied the IBM AI Methodology (10 steps) to my fruit ripeness classification project.

For each step, I'll explain:
- What the step means
- How I applied it to my project
- What I learned or discovered

---

## Step 1: Business Understanding

**What it means:** Understand the problem you're trying to solve and why it matters.

### My Problem

**The situation:**
- When shopping for fruit, it's hard to tell if it's ripe, overripe, or not ready yet
- People often buy fruit that goes bad quickly or tastes bad
- This wastes money and food

**What I want to build:**
A mobile app where you take a photo of fruit and it tells you:
- **Fresh** → Good to buy now
- **Rotten** → Don't buy
- **Unripe** → Wait a few days

**Why it matters:**
- Helps people buy better fruit
- Reduces food waste
- Saves money

**Target users:** Anyone who shops for groceries (especially people like me who aren't good at picking fruit!)

**Success criteria:** Model accuracy ≥ 85%

## Step 2: Analytic Approach

**What it means:** Decide what type of AI/ML technique will solve the problem.

### My Approach

**Problem type:** Image classification

**Why image classification?**
- I have images of fruit
- I need to classify them into categories (fresh/rotten/unripe)
- This is a supervised learning problem

**Technique chosen:** Convolutional Neural Network (CNN) with Transfer Learning

**Why CNNs?**
- CNNs are designed for image data
- They can learn visual patterns (color, texture, spots, bruises)
- They're proven to work well for image classification

**Why Transfer Learning?**
- Training a CNN from scratch needs much more data and compute time than I have
- Pre-trained models (like MobileNetV2) already know how to "see" (detect edges, shapes, colors)
- I only need to teach it my specific fruits - much faster!

**Model selected:** MobileNetV2
- Lightweight (works on mobile phones)
- Fast inference
- Good accuracy
- Pre-trained on ImageNet (1.4M images)

## Step 3: Data Requirements

**What it means:** Figure out what data you need to solve the problem.

### What Data I Need

**Type of data:** Images of fruit

**Categories needed:**
- 3 fruit types: Apples, Bananas, Oranges (common fruits everyone buys)
- 3 ripeness stages: Fresh, Rotten, Unripe
- Total: 9 classes (3 fruits × 3 stages)
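
The nine classes are every combination of ripeness stage and fruit type. A minimal sketch of enumerating them follows; the `stage_fruit` label format and the name `CLASS_LABELS` are assumptions for illustration and do not match the dataset's folder names.

```python
from itertools import product

FRUITS = ["apple", "banana", "orange"]
STAGES = ["fresh", "rotten", "unripe"]

# Every (stage, fruit) pair becomes one class label.
# The "stage_fruit" naming is illustrative, not the dataset's folder naming.
CLASS_LABELS = [f"{stage}_{fruit}" for stage, fruit in product(STAGES, FRUITS)]

print(len(CLASS_LABELS))  # 9
```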

**Data characteristics needed:**
- Different lighting conditions (store lights, natural light)
- Different angles (top view, side view)
- Different backgrounds
- Various fruit sizes
- Clear ripeness indicators visible (brown spots for rotten, green for unripe, etc.)

**Quantity needed:**
- Minimum: 500+ images per class (for basic training)
- Ideal: 1000+ images per class (for good accuracy)
- My target: ~2000 images per class

**Train/Test split:**
- Training: 80% (to train the model)
- Testing: 20% (to evaluate performance)
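
If a dataset does not come pre-split, an 80/20 split can be made per class so that every class keeps the same ratio. The sketch below is one way to do that with the standard library only; `split_per_class` and the example file names are hypothetical.

```python
import random

def split_per_class(files_by_class, train_fraction=0.8, seed=42):
    """Split each class's file list into train/test parts separately."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = sorted(files)  # sort first so the result is reproducible
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test

# Hypothetical file names, only to show the call.
example = {"freshapples": [f"img_{i}.jpg" for i in range(10)]}
train, test = split_per_class(example)
print(len(train["freshapples"]), len(test["freshapples"]))  # 8 2
```

Splitting each class on its own keeps a small class from ending up almost entirely in one of the two sets.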

## Step 4: Data Collection

**What it means:** Actually get the data you need.

### How I Got My Data

**Source:** Kaggle - "Fruit Ripeness Dataset"

**Why this dataset?**
- Already labeled with ripeness stages
- Contains apples, bananas, oranges
- ~20,000 images total
- Free and publicly available on Kaggle
- Already split into train/test folders

**Dataset structure:**
```
data/
├── train/
│   ├── freshapples/
│   ├── freshbanana/
│   ├── freshoranges/
│   ├── rottenapples/
│   ├── rottenbanana/
│   ├── rottenoranges/
│   ├── unripe apple/
│   ├── unripe banana/
│   └── unripe orange/
└── test/
    └── (same structure)
```
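
With one folder per class, verifying the structure comes down to listing the class folders and counting the images inside each. A small sketch of that check is shown here; `count_images` and the set of accepted file extensions are assumptions, not code from the project.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions

def count_images(split_dir):
    """Return {class_folder_name: number_of_image_files} for one split."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images("data/train")` and `count_images("data/test")` should return the same nine folder names in both cases if the two splits share the structure shown above.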

**Download process:**
1. Created Kaggle account
2. Downloaded dataset ZIP file
3. Extracted to `data/` folder
4. Verified folder structure

**No data collection challenges** - dataset was ready to use!

## Step 5: Data Understanding

**What it means:** Explore the data to understand what you have.

### My Data Analysis

**Where:** See `01_Dataset_Analysis.ipynb` for detailed analysis

**What I checked:**

**1. Dataset size:**
- Training images: 16,217
- Test images: 3,739
- Total: 19,956 images

**2. Class distribution:**
- Checked if classes are balanced
- Found some classes have more images than others
- Calculated imbalance ratio to see if it's a problem
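
One common way to define the imbalance ratio is the size of the largest class divided by the size of the smallest; whether the analysis notebook uses exactly this definition is an assumption. The counts in the sketch are made up and are not the real dataset numbers.

```python
def imbalance_ratio(class_counts):
    """Largest class size divided by smallest class size (1.0 = balanced)."""
    sizes = list(class_counts.values())
    if not sizes or min(sizes) == 0:
        raise ValueError("need at least one class and no empty classes")
    return max(sizes) / min(sizes)

# Made-up counts for illustration; not the real dataset numbers.
example_counts = {"class_a": 2000, "class_b": 1500, "class_c": 1000}
print(imbalance_ratio(example_counts))  # 2.0
```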

**3. Image characteristics:**
- Image size: Varies (will need to resize to 224×224)
- Format: JPG
- Color: RGB (3 channels)
- Quality: Good, clear images

**4. Visual inspection:**
- Viewed sample images from each class
- Confirmed ripeness labels look correct
- Checked for obvious errors or mislabeled images

**5. Train/Test split:**
- Already split by the dataset creator
- Ratio: 81% train / 19% test (good split)
- No overlap between train and test (verified)
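
One way to check for train/test overlap is to hash every file and look for digests that appear in both splits. The sketch below assumes that approach; `file_hashes` and `find_overlap` are hypothetical helpers, and the check only catches byte-identical files, not resized or re-encoded copies of the same photo.

```python
import hashlib
from pathlib import Path

def file_hashes(folder):
    """Map SHA-256 digest -> path for every file under `folder`."""
    hashes = {}
    for path in Path(folder).rglob("*"):
        if path.is_file():
            hashes[hashlib.sha256(path.read_bytes()).hexdigest()] = path
    return hashes

def find_overlap(train_dir, test_dir):
    """Return (train_path, test_path) pairs whose bytes are identical."""
    train, test = file_hashes(train_dir), file_hashes(test_dir)
    return [(train[h], test[h]) for h in train.keys() & test.keys()]
```

An empty list from `find_overlap("data/train", "data/test")` would mean no exact duplicates are shared between the two splits.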

**Key findings:**
- Dataset is big enough for training
- Images are clear and well-labeled
- Some class imbalance, but not severe
- Good variety in angles and lighting

## Step 6: Data Preparation

**What it means:** Process and transform the data to make it ready for the model.

### My Data Preparation

**Where:** See `02_Model_Training.ipynb` for implementation

**What I did:**

**1. Image Preprocessing:**
- Resize all images to 224×224 pixels (MobileNetV2's default input size)
- Normalize pixel values from [0, 255] to [0, 1]
- Convert to float32 data type

**2. Data Augmentation (training only):**
- **Rotation:** ±20° (fruit can be tilted)
- **Horizontal flip:** Mirror image
- **Zoom:** ±20% (fruit can be closer/farther)
- **Shift:** ±20% (fruit not always centered)
- **Brightness:** ±20% (different lighting)

**Why augmentation?**
- Creates more training variety
- Prevents overfitting (memorization)
- Model learns to recognize fruit from any angle/lighting
- Makes model more robust for real-world use
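
To make the ranges above concrete, the sketch below draws one random set of augmentation parameters per call, the way an augmentation pipeline would for each training image. It only samples the parameters and does not transform any pixels; `sample_augmentation` and the dictionary keys are hypothetical names, and the 50% flip probability is an assumption.

```python
import random

def sample_augmentation(rng):
    """Draw one random set of augmentation parameters within the stated ranges."""
    return {
        "rotation_deg": rng.uniform(-20.0, 20.0),   # ±20 degrees
        "horizontal_flip": rng.random() < 0.5,      # assumed 50% chance
        "zoom_factor": rng.uniform(0.8, 1.2),       # ±20%
        "shift_x_frac": rng.uniform(-0.2, 0.2),     # ±20% of image width
        "shift_y_frac": rng.uniform(-0.2, 0.2),     # ±20% of image height
        "brightness_factor": rng.uniform(0.8, 1.2), # ±20%
    }

rng = random.Random(0)
for _ in range(3):
    print(sample_augmentation(rng))
```

Because new parameters are drawn every time, the model rarely sees exactly the same version of an image twice across epochs.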

**3. Batching:**
- Batch size: 32 images per batch
- Why? GPU memory limitations + good training stability

**4. Label Encoding:**
- One-hot encoding for 9 classes
- Example: "freshapples" → [1, 0, 0, 0, 0, 0, 0, 0, 0]
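
A minimal sketch of one-hot encoding in plain Python is given below. It assumes the class order is the alphabetically sorted list of folder names, which puts "freshapples" at index 0; the real order depends on how the training pipeline assigns indices.

```python
def one_hot(label, class_names):
    """Return a list with 1 at the label's index and 0 elsewhere."""
    index = class_names.index(label)
    return [1 if i == index else 0 for i in range(len(class_names))]

# Folder names from the dataset, sorted alphabetically (assumed ordering).
CLASS_NAMES = sorted([
    "freshapples", "freshbanana", "freshoranges",
    "rottenapples", "rottenbanana", "rottenoranges",
    "unripe apple", "unripe banana", "unripe orange",
])

print(one_hot("freshapples", CLASS_NAMES))  # [1, 0, 0, 0, 0, 0, 0, 0, 0]
```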

**No manual cleaning needed** - dataset was already clean!

## Step 7: Modeling

**What it means:** Build and train the machine learning model.

### My Model

**Where:** See `02_Model_Training.ipynb` for full training process

**Architecture:**

```
Input: 224×224×3 image
    ↓
MobileNetV2 Base (FROZEN)
    - Pre-trained on ImageNet
    - Extracts features (edges, textures, shapes)
    ↓
GlobalAveragePooling2D
    - Reduces dimensions
    ↓
Dense(256) + ReLU
    - Learns fruit-specific patterns
    ↓
Dropout(0.5)
    - Prevents overfitting
    ↓
Dense(9) + Softmax
    - 9 class probabilities
    ↓
Output: Predicted class
```
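
Because the base is frozen, only the two Dense layers are trained. The arithmetic below estimates how many weights that is, assuming the standard MobileNetV2 (width multiplier 1.0), whose pooled feature vector has 1280 values; pooling and dropout add no parameters. `dense_params` is a hypothetical helper.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

FEATURES = 1280    # pooled MobileNetV2 features (width multiplier 1.0, assumed)
HIDDEN = 256       # Dense(256)
NUM_CLASSES = 9    # Dense(9)

head_params = dense_params(FEATURES, HIDDEN) + dense_params(HIDDEN, NUM_CLASSES)
print(head_params)  # 330249
```

Roughly 330 thousand trainable weights is small compared with the frozen base, which is part of why transfer learning needs less data than training the whole network.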

**Training configuration:**
- **Optimizer:** Adam (learning_rate=0.0001)
- **Loss function:** Categorical crossentropy
- **Metrics:** Accuracy
- **Epochs:** 50 (with early stopping)
- **Batch size:** 32

**Why these choices?**
- Adam optimizer: Works well for most problems, adapts learning rate automatically
- Low learning rate: Small update steps keep training of the new layers on top of the pre-trained features stable
- Categorical crossentropy: Standard for multi-class classification
- Early stopping: Stops training once validation performance plateaus (prevents overfitting)
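
The early stopping rule can be sketched in a few lines of plain Python. This version watches validation loss and uses a `patience` counter; both the monitored metric and the patience value are assumptions, since the actual settings live in the training notebook. The loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once `patience` epochs in a row fail to improve the
    best validation loss by more than `min_delta`. If that never happens,
    the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up loss curve: improves for three epochs, then plateaus.
losses = [0.90, 0.50, 0.30, 0.31, 0.30, 0.32]
print(early_stopping_epoch(losses, patience=3))  # 6
```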

**Training time:** ~12 hours on CPU

**Model size:** 31 MB (small enough for mobile!)

## Step 8: Evaluation

**What it means:** Test the model and measure how well it works.

### My Evaluation Results

**Test accuracy:** 99.8%

**Wait, this seems too good?**

The most likely reasons for such high accuracy are:
- Dataset is very clean and consistent
- Images have clear ripeness indicators
- Transfer learning with MobileNetV2 is powerful
- Data augmentation helped limit overfitting

**But does it work in real life?**

That's why I created the prediction tracking system:
- SQLite database saves every prediction
- Can test on real photos and track accuracy
- Compare test set accuracy vs. real-world accuracy

**Performance metrics:**
- **Training accuracy:** 99.8%
- **Validation accuracy:** 99.8%
- **Test accuracy:** 99.8%
- **Training loss:** 0.012
- **No overfitting!** (train/val accuracies match)

**What I checked:**
1. Overall accuracy (99.8% ✓)
2. Per-class accuracy (all classes > 98% ✓)
3. Confusion matrix (very few mistakes ✓)
4. Confidence levels (mostly > 90% ✓)
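
Per-class accuracy and the confusion matrix both come from comparing true and predicted labels pair by pair. The sketch below shows the idea with the standard library; the function names are hypothetical and the labels are a tiny made-up example, not the project's results.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Tiny made-up example, not the project's results.
y_true = ["fresh", "fresh", "rotten", "unripe", "unripe"]
y_pred = ["fresh", "unripe", "rotten", "unripe", "unripe"]

print(per_class_accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred)[("fresh", "unripe")])  # 1
```

Off-diagonal entries such as `("fresh", "unripe")` are the ones worth inspecting, because they show which classes the model mixes up.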

**Success criteria met:** ✅ Target was ≥85%, achieved 99.8%

## Step 9: Deployment

**What it means:** Make the model available for use.

### My Deployment Plan

**Current status:** Model is trained and saved

**Phase 1: Python Scripts (✅ DONE)**
- Command-line prediction: `python scripts/predict.py image.jpg`
- Database tracking: All predictions saved automatically
- Visualization tools: View prediction history

**Phase 2: Web Demo (PLANNED)**
- Flask web app
- Upload image or use webcam
- Get instant prediction with confidence
- Show prediction history

**Phase 3: Mobile App (FUTURE)**
- Convert to TensorFlow Lite (.tflite)
- Build Flutter app
- Camera integration
- Offline predictions (no internet needed)

**Why this order?**
1. Scripts first → Quick testing and validation
2. Web app second → Demo for teachers/portfolio
3. Mobile app last → Full product (takes more time)

**Deployment files created:**
- `models/fruit_classifier.keras` (trained model)
- `models/class_labels.json` (class names)
- `models/training_config.json` (training info)
- `scripts/predict.py` (prediction script)
- `scripts/db_helper.py` (database functions)
- `predictions.db` (SQLite database)

## Step 10: Feedback

**What it means:** Get feedback, monitor performance, and improve the model.

### My Feedback System

**How I track feedback:**

**1. Database System**
- Every prediction saved to `predictions.db`
- Tracks: image, prediction, confidence, timestamp
- Can mark predictions as correct/incorrect
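
A table that tracks these fields could look like the sketch below. The schema, column names, and helper functions are assumptions for illustration and may differ from what `scripts/db_helper.py` actually does; the example uses an in-memory database and a made-up image path.

```python
import sqlite3
from datetime import datetime, timezone

def init_db(conn):
    """Create the predictions table if it does not exist yet."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS predictions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            image_path TEXT NOT NULL,
            predicted_class TEXT NOT NULL,
            confidence REAL NOT NULL,
            created_at TEXT NOT NULL,
            is_correct INTEGER          -- NULL until someone reviews it
        )
    """)

def save_prediction(conn, image_path, predicted_class, confidence):
    """Insert one prediction with a UTC timestamp and return its row id."""
    cur = conn.execute(
        "INSERT INTO predictions (image_path, predicted_class, confidence, created_at)"
        " VALUES (?, ?, ?, ?)",
        (image_path, predicted_class, confidence,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid

def mark_prediction(conn, prediction_id, is_correct):
    """Record whether a stored prediction turned out to be right."""
    conn.execute(
        "UPDATE predictions SET is_correct = ? WHERE id = ?",
        (int(is_correct), prediction_id),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory DB so the sketch leaves no file behind
init_db(conn)
row_id = save_prediction(conn, "photos/banana_01.jpg", "freshbanana", 0.97)
mark_prediction(conn, row_id, True)
print(conn.execute("SELECT predicted_class, is_correct FROM predictions").fetchall())
```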

**2. Performance Monitoring**
```bash
# Check accuracy on real images
python scripts/view_history.py

# Visualize results
python scripts/visualize_predictions.py
```

**3. Error Analysis**
- Query database for wrong predictions
- See which fruits get confused
- Example: "Does it confuse unripe bananas with fresh bananas?"
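
A query along these lines can answer that kind of question by grouping the wrong predictions into (true class, predicted class) pairs. The sketch assumes the table also stores a `true_class` column filled in during review, which is a hypothetical schema; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: the reviewed true label is stored per prediction.
conn.execute("CREATE TABLE predictions (predicted_class TEXT, true_class TEXT)")
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [
        ("freshbanana", "unripe banana"),
        ("freshbanana", "unripe banana"),
        ("freshbanana", "freshbanana"),
        ("rottenapples", "freshapples"),
    ],
)

# Keep only wrong predictions and count each confused pair, most frequent first.
confused_pairs = conn.execute("""
    SELECT true_class, predicted_class, COUNT(*) AS n
    FROM predictions
    WHERE true_class IS NOT NULL AND true_class != predicted_class
    GROUP BY true_class, predicted_class
    ORDER BY n DESC, true_class
""").fetchall()

for true_class, predicted_class, n in confused_pairs:
    print(f"{true_class} -> {predicted_class}: {n}")
```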

**What to look for:**
- Real-world accuracy lower than test accuracy? → Need more diverse training data
- Specific class performing badly? → Need more examples of that class
- Low confidence predictions? → Model is uncertain, might need retraining

**Improvement plan:**
1. Test on 100 real fruit photos
2. Calculate real-world accuracy
3. If < 85%, retrain with more augmentation
4. If specific fruit fails, add more training images for it
5. Repeat until real-world accuracy ≥ 85%

**Feedback loop:**
```
Use model → Save predictions → Analyze errors →
Collect more data → Retrain → Test again → Deploy
```

**Why this matters:**
- Test data might not match real-world conditions
- Continuous improvement based on actual usage
- Builds trust in the model

## Summary: Complete IBM AI Methodology Applied

| Step | What I Did | Where to See It |
|------|-----------|----------------|
| 1. Business Understanding | Identified fruit shopping problem, defined success criteria (≥85% accuracy) | This notebook |
| 2. Analytic Approach | Chose CNN + Transfer Learning (MobileNetV2) | This notebook |
| 3. Data Requirements | Defined need for 9-class fruit images, ~2000 per class | This notebook |
| 4. Data Collection | Downloaded Kaggle dataset (20K images) | `data/` folder |
| 5. Data Understanding | Analyzed dataset size, distribution, quality | `01_Dataset_Analysis.ipynb` |
| 6. Data Preparation | Preprocessing, augmentation, batching | `02_Model_Training.ipynb` |
| 7. Modeling | Built and trained MobileNetV2 model (50 epochs) | `02_Model_Training.ipynb` |
| 8. Evaluation | Tested model, achieved 99.8% accuracy | `02_Model_Training.ipynb` |
| 9. Deployment | Created prediction scripts, database system | `scripts/predict.py` |
| 10. Feedback | Database tracking, error analysis tools | `scripts/visualize_predictions.py` |

---

## Project Status

✅ **Completed:**
- Data analysis
- Model training
- Model evaluation
- Prediction scripts
- Database system

🔄 **In Progress:**
- Real-world testing
- Performance monitoring

📅 **Planned:**
- Flask web demo
- Mobile app (TensorFlow Lite)
