<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>

# Finger Counting Using Computer Vision

## Introduction
This project aims to build a system that counts the number of raised fingers on a human hand in real time. The count is shown in binary on LEDs controlled by an ESP32 microcontroller. The counting itself relies on computer vision: a deep learning model runs on an NVIDIA Jetson Nano, whose GPU makes real-time inference feasible.

## Context
Embedded platforms such as the Jetson Nano (a small single-board computer with an integrated GPU) and the ESP32 (a microcontroller) are at the heart of many modern applications in robotics and the Internet of Things (IoT). They can process complex signals and control devices in real time. Integrating artificial intelligence into these systems paves the way for more intuitive and interactive applications.

## Problem Statement
Recognizing human gestures is difficult because hand shapes and lighting conditions vary widely. The challenge is to build a system that can interpret these visual signals and translate them into a simple digital format that users can read at a glance.

## Project Development
### Model Architecture
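For this project, we used ResNet-34 (He et al., 2016), a residual convolutional network widely used for image classification. The network starts from ImageNet-pretrained weights, and its final fully connected layer is replaced with a 6-output layer, one output per class (0 to 5 fingers). This fine-tuning was done directly on the Jetson Nano, using its GPU to speed up training.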
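ResNet-34 is built from residual blocks. Following He et al. (2016), each block learns a residual mapping $\mathcal{F}$ that is added to a shortcut connection (an identity, or a projection where the dimensions change). In this project, the replaced fully connected layer then maps the 512-dimensional pooled feature vector $\mathbf{h}$ to six class scores $\mathbf{z}$, one per finger count:

$$
\mathbf{y} = \mathcal{F}\big(\mathbf{x}, \{W_i\}\big) + \mathbf{x},
\qquad
\mathbf{z} = W_{\mathrm{fc}}\,\mathbf{h} + \mathbf{b}, \quad W_{\mathrm{fc}} \in \mathbb{R}^{6 \times 512},\ \mathbf{b} \in \mathbb{R}^{6}
$$

Here $\mathbf{x}$ and $\mathbf{y}$ are a block's input and output. In ResNet-34's basic blocks, $\mathcal{F}$ consists of two stacked $3 \times 3$ convolutions. The second equation corresponds to `torch.nn.Linear(512, len(dataset.categories))` in the notebook's Model cell.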

### Training and Model Performance
The model was trained on a dataset of hand images labelled by the number of raised fingers. GPU acceleration made training practical on the Jetson Nano itself. The resulting model reached satisfactory accuracy and runs fast enough for real-time finger counting.

### Integration with ESP32
The trained model is used to perform inference on the Jetson Nano. The results are then transmitted to the ESP32 via an I2C connection, where they are converted into a binary display on LEDs.
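Six classes (0 to 5) fit in three bits, so three LEDs are enough for the binary display. The sketch below shows one way to turn a predicted count into LED on/off states. It is written in Python for illustration. The number of LEDs and their ordering (most significant bit first) are assumptions, because the project does not specify its wiring. `count_to_led_bits` and `format_leds` are hypothetical helper names.

```python
from typing import List


def count_to_led_bits(count: int, num_leds: int = 3) -> List[bool]:
    """Return LED states for a finger count, most significant bit first.

    Assumption: LEDs are ordered MSB -> LSB. Three LEDs cover 0..7,
    which is enough for the six classes (0 to 5 fingers).
    """
    if not 0 <= count < 2 ** num_leds:
        raise ValueError(f"count {count} does not fit in {num_leds} LEDs")
    # Test each bit from the highest to the lowest
    return [bool((count >> bit) & 1) for bit in reversed(range(num_leds))]


def format_leds(states: List[bool]) -> str:
    """Render LED states as a string of '1' (on) and '0' (off)."""
    return "".join("1" if on else "0" for on in states)


if __name__ == "__main__":
    for n in range(6):
        print(n, format_leds(count_to_led_bits(n)))
```

On the Jetson side the count fits in one byte, so a single I2C write per prediction is enough. One option is the `smbus2` package's `SMBus.write_byte`, assuming the ESP32 is configured as an I2C slave at an address of your choosing. The ESP32 firmware can then extract the same bits with `(count >> bit) & 1` before driving each LED pin.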

## Results and Discussion
Tests showed that the system counts fingers accurately and in real time. The predicted counts also appeared correctly on the LEDs driven by the ESP32, which validates the chosen approach.

## Dataset and Model Training
The initial dataset consisted exclusively of left-hand images, which gave the model its starting point. To improve robustness and accuracy, the dataset was extended with additional photos captured in a controlled environment. These supplementary images were essential for teaching the model to recognize the different finger counts across several hand orientations. The main limitation of the current model is that it only works reliably for the left hand within a limited range of orientations. Future iterations could benefit from a more diverse dataset that includes multiple hand orientations and both left and right hands, which would improve generalization.

## Challenges and Limitations
One of the key challenges faced during the project was the model's initial inability to generalize well to different hand orientations and lighting conditions. This was mitigated by augmenting the dataset with various images of the left hand. Despite improvements, the model still primarily recognizes the left hand within a certain orientation range. Further work is needed to extend the model's capabilities to include a wider variety of hand positions and to ensure robust recognition regardless of the hand used.

Here are the gestures that the current model recognizes:

<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>
<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>
<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>
<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>
<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>
<center><img src="../images/rubblescout.png" alt="Header" style="width: 400px;"/></center>

## Future Work
To overcome these limitations, future work should collect a balanced dataset containing images of both left and right hands, in multiple orientations and under diverse lighting conditions. Two further steps may also improve performance and robustness in real-world scenarios: more extensive data augmentation, and further tuning of the transfer-learning setup (the model already starts from ImageNet-pretrained weights).
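Mirroring is one inexpensive augmentation for the left-hand-only dataset. Flipping an image horizontally turns a left hand into something that looks like a right hand, and the label is unchanged because the number of raised fingers stays the same. The minimal sketch below illustrates this idea on toy images stored as lists of pixel rows. `mirror` and `add_mirrored_copies` are hypothetical helpers, not part of the project code.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def mirror(image: Image) -> Image:
    """Flip an image left-to-right by reversing each row."""
    return [list(reversed(row)) for row in image]


def add_mirrored_copies(samples: Sequence[Tuple[Image, str]]) -> List[Tuple[Image, str]]:
    """Return the original samples plus one mirrored copy of each.

    The label is kept as-is: mirroring changes which hand appears,
    not how many fingers are raised.
    """
    augmented = list(samples)
    augmented.extend((mirror(img), label) for img, label in samples)
    return augmented
```

In the notebook's pipeline, a similar effect comes from adding torchvision's `transforms.RandomHorizontalFlip()` to `TRANSFORMS`. This flips each training image with probability 0.5 by default. A mirrored left hand only approximates a real right hand, because background and lighting are mirrored too. It therefore complements collecting real right-hand images rather than replacing it.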


## Conclusion
This project demonstrates the effectiveness of intelligent embedded systems in recognizing and interpreting human gestures. The successful integration of computer vision and electronics opens the way for innovative applications in robotics and IoT.

## References
- He, K., Zhang, X., Ren, S., and Sun, J. "Deep Residual Learning for Image Recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- NVIDIA. Jetson Nano Developer Kit.
- Espressif Systems. ESP32.


# Getting Started with AI on Jetson Nano
### Interactive Classification Tool

This notebook is an interactive data collection, training, and testing tool, provided as part of the NVIDIA Deep Learning Institute (DLI) course, "Getting Started with AI on Jetson Nano". It is designed to be run on the Jetson Nano in conjunction with the detailed instructions provided in the online DLI course pages. 

To start the tool, set the **Camera** and **Task** code cell definitions, then execute all cells.  The interactive tool widgets will appear at the bottom of the notebook.  You can then use the tool to gather data, add more data, train the model, and test it in an iterative and interactive fashion!

The explanations in this notebook are intentionally minimal to provide a streamlined experience.  Please see the DLI course pages for detailed information on tool operation and project creation.

### Camera
First, create your camera and set it to `running`.  Uncomment the appropriate camera selection lines, depending on which type of camera you're using (USB or CSI). This cell may take several seconds to execute.

<div style="border:2px solid black; background-color:#e3ffb3; font-size:12px; padding:8px; margin-top: auto;">
    <h4><i>Tip</i></h4>
    <p>There can only be one instance of CSICamera or USBCamera at a time.  Before starting this notebook, make sure you have executed the final "shutdown" cell in any other notebooks you have run so that the camera is released. 
    </p>
</div>

In [1]:
# Check device number
!ls -ltrh /dev/video*

crw-rw---- 1 root video 81, 0 Apr 16 14:38 /dev/video0


In [2]:
from jetcam.usb_camera import USBCamera
from jetcam.csi_camera import CSICamera

# for USB Camera (Logitech C270 webcam), uncomment the following line
camera = USBCamera(width=224, height=224, capture_device=0) # confirm the capture_device number

# for CSI Camera (Raspberry Pi Camera Module V2), uncomment the following line
# camera = CSICamera(width=224, height=224, capture_device=0) # confirm the capture_device number

camera.running = True
print("camera created")

camera created


### Task
Next, define your project `TASK` and what `CATEGORIES` of data you will collect.  You may optionally define space for multiple `DATASETS` with names of your choosing. 

Uncomment/edit the associated lines for the classification task you're building and execute the cell.
This cell should only take a few seconds to execute.

In [3]:
import torchvision.transforms as transforms
from dataset import ImageClassificationDataset

TASK = 'fingers'

CATEGORIES = ['0','1', '2', '3', '4', '5']

DATASETS = ['A']

TRANSFORMS = transforms.Compose([
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.2),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

datasets = {}
for name in DATASETS:
    datasets[name] = ImageClassificationDataset('../data/classification/' + TASK + '_' + name, CATEGORIES, TRANSFORMS)
    
print("{} task with {} categories defined".format(TASK, CATEGORIES))

fingers task with ['0', '1', '2', '3', '4', '5'] categories defined
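The pretrained ResNet-34 weights were learned on ImageNet, so the cell normalizes inputs with ImageNet's per-channel mean and standard deviation. Before that step, `ColorJitter(0.2, 0.2, 0.2, 0.2)` randomly scales brightness, contrast and saturation by a factor between 0.8 and 1.2 and shifts hue by up to 0.2, which helps with varying lighting. The framework-free sketch below shows what `ToTensor()` followed by `Normalize()` does to a single 8-bit RGB pixel. `normalize_pixel` is a hypothetical helper written for illustration.

```python
from typing import List, Sequence

# ImageNet channel statistics, matching the transforms.Normalize call above
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int],
                    mean: Sequence[float] = IMAGENET_MEAN,
                    std: Sequence[float] = IMAGENET_STD) -> List[float]:
    """Reproduce ToTensor() followed by Normalize() for one 8-bit RGB pixel."""
    # ToTensor(): scale 0..255 integers to 0.0..1.0 floats
    scaled = [value / 255.0 for value in rgb]
    # Normalize(): per channel, subtract the mean and divide by the std
    return [(value - m) / s for value, m, s in zip(scaled, mean, std)]


print(normalize_pixel((0, 0, 0)))        # black pixel: negative in every channel
print(normalize_pixel((255, 255, 255)))  # white pixel: positive in every channel
```

Any preprocessing used at inference time should apply the same statistics. Otherwise the live camera frames reach the model on a different scale from the training images.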


In [4]:
# Set up the data directory location if not there already
DATA_DIR = '/nvdli-nano/data/classification/'
!mkdir -p {DATA_DIR}

### Data Collection
Execute the cell below to create the data collection tool widget. This cell should only take a few seconds to execute.

In [5]:
import ipywidgets
import traitlets
from IPython.display import display
from jetcam.utils import bgr8_to_jpeg

# initialize active dataset
dataset = datasets[DATASETS[0]]

# unobserve all callbacks from camera in case we are running this cell for second time
camera.unobserve_all()

# create image preview
camera_widget = ipywidgets.Image()
traitlets.dlink((camera, 'value'), (camera_widget, 'value'), transform=bgr8_to_jpeg)

# create widgets
dataset_widget = ipywidgets.Dropdown(options=DATASETS, description='dataset')
category_widget = ipywidgets.Dropdown(options=dataset.categories, description='category')
count_widget = ipywidgets.IntText(description='count')
save_widget = ipywidgets.Button(description='add')

# manually update counts at initialization
count_widget.value = dataset.get_count(category_widget.value)

# sets the active dataset
def set_dataset(change):
    global dataset
    dataset = datasets[change['new']]
    count_widget.value = dataset.get_count(category_widget.value)
dataset_widget.observe(set_dataset, names='value')

# update counts when we select a new category
def update_counts(change):
    count_widget.value = dataset.get_count(change['new'])
category_widget.observe(update_counts, names='value')

# save image for category and update counts
def save(c):
    dataset.save_entry(camera.value, category_widget.value)
    count_widget.value = dataset.get_count(category_widget.value)
save_widget.on_click(save)

data_collection_widget = ipywidgets.VBox([
    ipywidgets.HBox([camera_widget]), dataset_widget, category_widget, count_widget, save_widget
])

# display(data_collection_widget)
print("data_collection_widget created")

data_collection_widget created
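The count widget shows only one category at a time. To check class balance across all six categories at once, you can count the saved images on disk. The stdlib sketch below assumes that `ImageClassificationDataset` stores each category's images as `.jpg` files in a sub-folder named after the category inside the dataset directory. This is an assumption about `dataset.py`, which is not shown here. `count_images_per_category` and `imbalance_ratio` are hypothetical helpers.

```python
import os
from typing import Dict, Sequence


def count_images_per_category(dataset_dir: str,
                              categories: Sequence[str],
                              extension: str = ".jpg") -> Dict[str, int]:
    """Count image files in <dataset_dir>/<category>/ for each category.

    Assumes one sub-folder per category; a missing folder counts as 0.
    """
    counts = {}
    for category in categories:
        folder = os.path.join(dataset_dir, category)
        if os.path.isdir(folder):
            counts[category] = sum(1 for name in os.listdir(folder)
                                   if name.lower().endswith(extension))
        else:
            counts[category] = 0
    return counts


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Largest class size divided by the smallest (inf if any class is empty)."""
    smallest, largest = min(counts.values()), max(counts.values())
    return float("inf") if smallest == 0 else largest / smallest
```

For example, `count_images_per_category('../data/classification/fingers_A', CATEGORIES)` would report how many images were collected for each finger count in dataset `A`. A ratio close to 1.0 means the classes are roughly balanced.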


### Model
Execute the following cell to define the neural network and adjust the fully connected layer (`fc`) to match the outputs required for the project.  This cell may take several seconds to execute.

In [6]:
import torch
import torchvision


device = torch.device('cuda')

# RESNET 18
#model = torchvision.models.resnet18(pretrained=True)
#model.fc = torch.nn.Linear(512, len(dataset.categories))

# RESNET 34
model = torchvision.models.resnet34(pretrained=True)
model.fc = torch.nn.Linear(512, len(dataset.categories))
    
model = model.to(device)

model_save_button = ipywidgets.Button(description='save model')
model_load_button = ipywidgets.Button(description='load model')
model_path_widget = ipywidgets.Text(description='model path', value='/nvdli-nano/data/classification/my_model.pth')

def load_model(c):
    model.load_state_dict(torch.load(model_path_widget.value))
model_load_button.on_click(load_model)
    
def save_model(c):
    torch.save(model.state_dict(), model_path_widget.value)
model_save_button.on_click(save_model)

model_widget = ipywidgets.VBox([
    model_path_widget,
    ipywidgets.HBox([model_load_button, model_save_button])
])

# display(model_widget)
print("model configured and model_widget created")

model configured and model_widget created


### Live Execution
Execute the cell below to set up the live execution widget.  This cell should only take a few seconds to execute.

In [7]:
import threading
import time
from utils import preprocess
import torch.nn.functional as F

state_widget = ipywidgets.ToggleButtons(options=['stop', 'live'], description='state', value='stop')
prediction_widget = ipywidgets.Text(description='prediction')
score_widgets = []
for category in dataset.categories:
    score_widget = ipywidgets.FloatSlider(min=0.0, max=1.0, description=category, orientation='vertical')
    score_widgets.append(score_widget)

def live(state_widget, model, camera, prediction_widget, score_widgets):
    # inference mode: BatchNorm uses its running statistics
    model.eval()
    # classify the latest camera frame until the state toggle leaves 'live'
    while state_widget.value == 'live':
        image = camera.value
        preprocessed = preprocess(image)
        # gradients are not needed for inference; skipping them saves memory and time
        with torch.no_grad():
            output = model(preprocessed)
        output = F.softmax(output, dim=1).cpu().numpy().flatten()
        category_index = output.argmax()
        prediction_widget.value = dataset.categories[category_index]
        for i, score in enumerate(list(output)):
            score_widgets[i].value = score
            
def start_live(change):
    if change['new'] == 'live':
        execute_thread = threading.Thread(target=live, args=(state_widget, model, camera, prediction_widget, score_widgets))
        execute_thread.start()

state_widget.observe(start_live, names='value')

live_execution_widget = ipywidgets.VBox([
    ipywidgets.HBox(score_widgets),
    prediction_widget,
    state_widget
])

# display(live_execution_widget)
print("live_execution_widget created")

live_execution_widget created
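In the live loop, the model returns six raw scores (logits). `F.softmax` turns them into probabilities that sum to 1, which the sliders display, and `argmax` picks the most likely category for the prediction box. The same two steps appear below in plain Python as a framework-free illustration. `softmax` and `predict` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: Sequence[float], categories: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the most likely category and the per-category probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs
```

Subtracting the largest logit does not change the result, because the shift cancels out in the ratio. It does prevent `math.exp` from overflowing when the scores are large.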


### Training and Evaluation
Execute the following cell to define the trainer, and the widget to control it. This cell may take several seconds to execute.

In [8]:
BATCH_SIZE = 8

optimizer = torch.optim.Adam(model.parameters())
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

epochs_widget = ipywidgets.IntText(description='epochs', value=1)
eval_button = ipywidgets.Button(description='evaluate')
train_button = ipywidgets.Button(description='train')
loss_widget = ipywidgets.FloatText(description='loss')
accuracy_widget = ipywidgets.FloatText(description='accuracy')
progress_widget = ipywidgets.FloatProgress(min=0.0, max=1.0, description='progress')

def train_eval(is_training):
    global BATCH_SIZE, model, dataset, optimizer, eval_button, train_button, accuracy_widget, loss_widget, progress_widget, state_widget
    
    try:
        train_loader = torch.utils.data.DataLoader(
            dataset,
            batch_size=BATCH_SIZE,
            shuffle=True
        )

        # stop live inference and lock the buttons while this runs
        state_widget.value = 'stop'
        train_button.disabled = True
        eval_button.disabled = True
        time.sleep(1)

        if is_training:
            model = model.train()
        else:
            model = model.eval()
        while epochs_widget.value > 0:
            i = 0
            sum_loss = 0.0
            error_count = 0.0
            for images, labels in iter(train_loader):
                # send data to device
                images = images.to(device)
                labels = labels.to(device)

                if is_training:
                    # zero gradients of parameters
                    optimizer.zero_grad()

                # track gradients only when training
                with torch.set_grad_enabled(is_training):
                    # execute model to get outputs
                    outputs = model(images)

                    # compute loss (mean over the batch)
                    loss = F.cross_entropy(outputs, labels)

                if is_training:
                    # run backpropagation to accumulate gradients
                    loss.backward()

                    # step optimizer to adjust parameters
                    optimizer.step()

                # increment progress: count misclassified samples in this batch
                error_count += len(torch.nonzero(outputs.argmax(1) - labels).flatten())
                count = len(labels.flatten())
                i += count
                # weight the batch-mean loss by batch size so sum_loss / i is a per-sample mean
                sum_loss += float(loss) * count
                progress_widget.value = i / len(dataset)
                loss_widget.value = sum_loss / i
                accuracy_widget.value = 1.0 - error_count / i
                
            if is_training:
                epochs_widget.value = epochs_widget.value - 1
            else:
                break
    except Exception as e:
        # report the error instead of silently swallowing it
        print(e)
    model = model.eval()

    train_button.disabled = False
    eval_button.disabled = False
    state_widget.value = 'live'
    
train_button.on_click(lambda c: train_eval(is_training=True))
eval_button.on_click(lambda c: train_eval(is_training=False))
    
train_eval_widget = ipywidgets.VBox([
    epochs_widget,
    progress_widget,
    loss_widget,
    accuracy_widget,
    ipywidgets.HBox([train_button, eval_button])
])

# display(train_eval_widget)
print("trainer configured and train_eval_widget created")

trainer configured and train_eval_widget created
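The loss and accuracy widgets show running averages over the samples seen so far in the epoch. The plain-Python sketch below shows that bookkeeping. It assumes each batch loss is a mean over the batch, which is the default reduction of `F.cross_entropy`. Weighting each batch mean by its batch size and dividing by the number of samples gives the true per-sample mean loss, even when the last batch is smaller than `BATCH_SIZE`. Accuracy is the fraction of argmax predictions that match the labels, the same quantity the notebook computes as `1.0 - error_count / i`. `epoch_metrics` is a hypothetical helper.

```python
from typing import Iterable, Sequence, Tuple

# (batch-mean loss, predicted class indices, true class indices)
Batch = Tuple[float, Sequence[int], Sequence[int]]


def epoch_metrics(batches: Iterable[Batch]) -> Tuple[float, float]:
    """Return (mean per-sample loss, accuracy) over all batches of an epoch."""
    total_loss = 0.0
    correct = 0
    seen = 0
    for mean_loss, preds, labels in batches:
        n = len(labels)
        # undo the batch averaging so every sample has equal weight
        total_loss += mean_loss * n
        correct += sum(1 for p, y in zip(preds, labels) if p == y)
        seen += n
    if seen == 0:
        raise ValueError("no samples in epoch")
    return total_loss / seen, correct / seen
```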


### Display the Interactive Tool!

The interactive tool includes widgets for data collection, training, and testing.

<center><img src="../images/classification_tool_key2.png" alt="tool key" width=500/></center>
<br>
<center><img src="../images/classification_tool_key1.png" alt="tool key"/></center>

Execute the cell below to create and display the full interactive widget.  Follow the instructions in the online DLI course pages to build your project.

In [9]:
# Combine all the widgets into one display
all_widget = ipywidgets.VBox([
    ipywidgets.HBox([data_collection_widget, live_execution_widget]), 
    train_eval_widget,
    model_widget
])

display(all_widget)


<h1 style="background-color:#76b900;"></h1>

## Before you go...<br><br>Shut down the camera and/or notebook kernel to release the camera resource

In [None]:
# Attention!  Execute this cell before moving to another notebook
# The USB camera application only requires that the notebook be reset
# The CSI camera application requires that the 'camera' object be specifically released

import os
import IPython

if type(camera) is CSICamera:
    print("Ignore 'Exception in thread' tracebacks\n")
    camera.cap.release()

os._exit(00)

Return to the DLI course pages for the next instructions.

<center><img src="../images/DLI Header.png" alt="Header" style="width: 400px;"/></center>