VNAVI

CI CPU testing

The repository for the Vision-guided Navigation Assistance for the Visually Impaired project at the Shared Reality Lab.

Keywords: React Native, Nginx, Gunicorn, Python, YOLOv5, PyTorch, Docker, Linux.

1    Introduction

This application helps visually impaired people reach objects of interest by analyzing camera-captured images and providing audio navigation on mobile phones. We expect the application to run on multiple mobile platforms, e.g. Android and iOS, and the analysis to be carried out either locally or in the cloud.

2    Functioning Scenarios

To achieve this goal, we decompose it into several scenarios, for example navigation to doorways. In each scenario, the application performs object detection, distance measurement, result rendering, and audio feedback.
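The four stages above can be sketched as a small pipeline. All names and stage bodies below are hypothetical stand-ins for illustration, not the project's actual implementation:

```python
# Illustrative sketch of the per-scenario pipeline stages described above.
# The function names and stage bodies are hypothetical stand-ins.

def detect_objects(image):
    # Stand-in for the YOLOv5 detection step: returns labeled bounding boxes.
    return [{"label": "door", "box": (10, 10, 50, 90)}]

def measure_distance(detection):
    # Stand-in heuristic: nearer objects occupy a larger area of the frame,
    # so we use the inverse of the box area as a rough proximity score.
    x1, y1, x2, y2 = detection["box"]
    return 1.0 / max((x2 - x1) * (y2 - y1), 1)

def run_scenario(image):
    # Stages 1-2: detect objects, then estimate their distance.
    results = []
    for det in detect_objects(image):
        det["distance"] = measure_distance(det)
        results.append(det)
    # Stages 3-4 (result rendering and audio feedback) would consume `results`.
    return results
```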

2.1    Doorway Navigation

This is the scenario we are currently working on. Ideally, the application gives audio guidance informing the user of the location of nearby doorways. However, in the absence of a doorway dataset, we focus on doors and handles specifically.

3    Architecture and Implementation

This section briefly presents key points of the whole system, including the basic architecture, frameworks, and workflows.

3.1    Client Side

As demand for application capabilities grows, it becomes hard and time-consuming for developers to port source code to different platforms. To address this, we use the cross-platform framework React Native for the client app. The app starts the phone's built-in camera and captures pictures that are sent to the server for analysis. After retrieving the analysis result from the server, the client app gives feedback to the user.

Figure 1: App Client View (v0.0.1): camera view and result window.

Figure 2: App Client View (v0.0.2): camera view and result windows.

3.2    Server Side

The Flask server is responsible for receiving and processing requests from clients and responding to them. Nginx and Gunicorn listen for incoming requests and run the Python scripts. We use a customized YOLOv5 (You Only Look Once v5) [1] model to detect and locate doors in the image.

4    The Deep Learning Model

The most important task is creating a robust object detection workflow. After comparing a variety of deep learning computer vision approaches, we chose YOLOv5 because it is highly customizable and has a strong capability of detecting multiple objects.

The DoorDetect dataset [2] serves training and testing purposes. The training set consists of 1092 randomly picked images and their labels; the remaining 121 images and labels are used for testing. A YOLOv5m model with 1280-pixel input is trained on the DoorDetect dataset; the following figures show the training and validation results.
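The 1092/121 split described above can be reproduced with a sketch like the following (the seed and path names are assumptions; the actual split script may differ):

```python
import random

def split_dataset(image_paths, n_train=1092, seed=0):
    # Shuffle deterministically, then take the first n_train items for
    # training and leave the rest for testing (1092/121 as described above).
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    return paths[:n_train], paths[n_train:]

# Hypothetical file names standing in for the 1213 DoorDetect images.
train, test = split_dataset([f"img_{i}.jpg" for i in range(1213)])
```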

Figure 3: Metrics.

Figure 4: Validation.

5    Docker and Docker Compose

The server-side application supports Docker and Docker Compose. The base image is the PyTorch image with CUDA runtime [3]; the specific tag we use is 1.11.0-cuda11.3-cudnn8-runtime. GPU access from the Docker container requires Docker Compose; the initial configuration points to the GPU with index 0 on the device.
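The GPU reservation can be expressed in a Compose file roughly as follows. The service name is a placeholder, but the base image tag matches the one above and the device syntax follows the Compose `deploy.resources` specification:

```yaml
# Sketch of the GPU reservation (device index 0) in docker-compose.
# The service name "vnavi-server" is hypothetical.
services:
  vnavi-server:
    image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
```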

6    References

[1] Ultralytics. You Only Look Once v5 (YOLOv5). Available at: https://github.com/ultralytics/yolov5.

[2] MiguelARD. Door Detect Dataset. Available at: https://github.com/MiguelARD/DoorDetect-Dataset.

[3] PyTorch. PyTorch Docker Image. Available at: https://hub.docker.com/r/pytorch/pytorch.
