VNAVI

CI CPU testing

The repository for the Vision-guided Navigation Assistance for the Visually Impaired project at the Shared Reality Lab.

Keywords: React Native, Nginx, Gunicorn, Python, YOLOv5, PyTorch, Docker, Linux.

1    Introduction

This application helps visually impaired people reach objects of interest by analyzing camera-captured images and providing audio navigation on mobile phones. We expect the application to run on multiple mobile platforms, e.g. Android and iOS, and the analysis to be carried out either locally or in the cloud.

2    Functioning Scenarios

To achieve this goal, we decompose it into several scenarios, for example navigation to doorways. In each scenario, the application performs object detection, distance measurement, result rendering, and audio feedback.
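The four stages above can be sketched as a small pipeline. All names and stage bodies below are hypothetical stand-ins for illustration, not the project's actual implementation:

```python
# Illustrative sketch of the per-scenario pipeline stages described above.
# The function names and stage bodies are hypothetical stand-ins.

def detect_objects(image):
    # Stand-in for the YOLOv5 detection step: returns labeled bounding boxes.
    return [{"label": "door", "box": (10, 10, 50, 90)}]

def measure_distance(detection):
    # Stand-in heuristic: nearer objects occupy a larger area of the frame,
    # so we use the inverse of the box area as a rough proximity score.
    x1, y1, x2, y2 = detection["box"]
    return 1.0 / max((x2 - x1) * (y2 - y1), 1)

def run_scenario(image):
    # Stages 1-2: detect objects, then estimate their distance.
    results = []
    for det in detect_objects(image):
        det["distance"] = measure_distance(det)
        results.append(det)
    # Stages 3-4 (result rendering and audio feedback) would consume `results`.
    return results
```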

2.1    Doorway Navigation

This is the scenario we are currently working on. Ideally, the application gives audio guidance informing the user of the location of nearby doorways. However, in the absence of a doorway dataset, we focus on doors and handles specifically.

3    Architecture and Implementation

This section briefly presents key points of the whole system, including the basic architecture, frameworks, and workflows.

3.1    Client Side

As demand for application capabilities grows, it becomes hard and time-consuming for developers to port source code to different platforms. To address this, we use the cross-platform framework React Native for the client app. The app starts the phone's built-in camera and captures pictures that are sent to the server for analysis. After retrieving the analysis result from the server, the client app gives feedback to the user.

Figure 1: App Client View (v0.0.1): camera view and result window.

Figure 2: App Client View (v0.0.2): camera view and result windows.

3.2    Server Side

The Flask server is responsible for receiving and processing requests from clients and responding to them. Nginx and Gunicorn listen for incoming requests and run the Python scripts. We use a customized YOLOv5 (You Only Look Once v5) [1] model to detect and locate doors in the image.

4    The Deep Learning Model

The most important task is creating a robust object detection workflow. After comparing a variety of deep learning computer vision approaches, we chose YOLOv5 because it is highly customizable and has a strong capability of detecting multiple objects.

The DoorDetect dataset [2] serves training and testing purposes. The training set consists of 1092 randomly picked images and their labels; the remaining 121 images and labels are used for testing. A YOLOv5m model with 1280-pixel input is trained on the DoorDetect dataset; the following figures show the training and validation results.
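The 1092/121 split described above can be reproduced with a sketch like the following (the seed and path names are assumptions; the actual split script may differ):

```python
import random

def split_dataset(image_paths, n_train=1092, seed=0):
    # Shuffle deterministically, then take the first n_train items for
    # training and leave the rest for testing (1092/121 as described above).
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    return paths[:n_train], paths[n_train:]

# Hypothetical file names standing in for the 1213 DoorDetect images.
train, test = split_dataset([f"img_{i}.jpg" for i in range(1213)])
```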

Figure 3: Metrics.

Figure 4: Validation.

5    Docker and Docker Compose

The server-side application supports Docker and Docker Compose. The base image is the PyTorch image with CUDA runtime [3]; the specific tag we use is 1.11.0-cuda11.3-cudnn8-runtime. GPU access from the Docker container requires Docker Compose; the initial configuration points to the GPU with index 0 on the device.
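The GPU reservation can be expressed in a Compose file roughly as follows. The service name is a placeholder, but the base image tag matches the one above and the device syntax follows the Compose `deploy.resources` specification:

```yaml
# Sketch of the GPU reservation (device index 0) in docker-compose.
# The service name "vnavi-server" is hypothetical.
services:
  vnavi-server:
    image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
```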

6    References

[1] Ultralytics. You Only Look Once v5 (YOLOv5). Available at: https://github.com/ultralytics/yolov5.

[2] MiguelARD. Door Detect Dataset. Available at: https://github.com/MiguelARD/DoorDetect-Dataset.

[3] PyTorch. PyTorch Docker Image. Available at: https://hub.docker.com/r/pytorch/pytorch.
