Visionary: Voice Guide for Visually Handicapped People

CS3244 Team Project
Created by National University of Singapore CS3244 Machine Learning Group 26. All members contributed equally.
Jin Shuyuan, Mou Ziyang, Tian Xin, Tian Xueyan, Wang Tengda, Zhao Tianze

1. App features

  • GPS location sharing
    The user's location is shared with family members to help keep blind users safe in an emergency.
  • Camera-based image capture
    Captures the real-time view in front of the user.
  • Object detection and classification
    Focuses on detecting people.
  • Moving direction determination
    Uses bounding box size across sequential images to determine whether the person is moving toward or away from the user.
  • Voice guide
    Based on the object classification and moving direction, outputs an audio guide ("a person is moving to you" / "a person is moving away from you").

2. Detailed instructions for setting up YOLOv3 (4.3 ~ 4.6)

The official YOLOv3 is implemented in C on Linux. For convenience, this project uses a simplified PyTorch version, and some useful parts of that code have been copied into this repository.

  • Make sure your laptop has Python installed. Anaconda is recommended for installing Python as it helps to install various dependencies.
  • Run the following commands using Git Bash
    $ git clone
    $ cd CS3244-ML-Project/
  • Download pretrained weights
    The pretrained weights, trained on the COCO dataset, can be downloaded here.
    Then copy the weights into .\weights

  • Obtain COCO dataset
    Because of limited disk space on our laptops, we do not need to download the whole dataset for testing.
    Currently, .\data\samples contains only a few images. To test more images, put them into the samples folder.
    Alternatively, set webcam = True at line 33 to feed real-time webcam images into the model.

  • Run simple tests
    Please refer to Inference section in

python --cfg cfg/yolov3.cfg --weights weights/yolov3.weights --images data/testdata


python --cfg cfg/yolov3.cfg --weights weights/yolov3.weights --images data/testvideo

3. Moving direction determination and voice guide (4.7 ~ 4.13)

Output .txt files for bounding box and label

  • Set save_txt = True at line 19. This saves .txt files to the Output folder.

Understand .txt file structure

  • Each line contains 6 numbers. The first 4 numbers are the bounding box coordinates, the 5th number is the category of the object, and the last number is the confidence for that category.
  • .\data\coco.names lists the possible categories. The category number is zero-based, so the corresponding type is on line (5th number + 1) of that file.
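The file structure above can be read back with a few lines of Python. This is only a minimal sketch: it assumes the 4 coordinates are stored as x1 y1 x2 y2 (corner format) and that the fields are separated by whitespace, neither of which is stated here. The names Detection, parse_detection_line and class_name are hypothetical helpers, not part of the repository.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    # Assumed corner format: top-left (x1, y1), bottom-right (x2, y2).
    x1: float
    y1: float
    x2: float
    y2: float
    class_id: int      # zero-based index into coco.names
    confidence: float


def parse_detection_line(line: str) -> Detection:
    """Parse one line of an output .txt file into a Detection."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 numbers, got {len(fields)}: {line!r}")
    x1, y1, x2, y2, cls, conf = (float(f) for f in fields)
    return Detection(x1, y1, x2, y2, int(cls), conf)


def class_name(class_id: int, names: List[str]) -> str:
    """Look up the category name; names[0] is line 1 of coco.names."""
    return names[class_id]
```

Here names would be the lines of .\data\coco.names loaded into a list, so index 0 corresponds to the first line of the file.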

Focus on person type objects

For simplicity and purpose of the app, we only focus on person type objects

  • In detect.py, line 80 (if int(cls) == 0:) makes sure that only person bounding boxes are marked in the output images.
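The same check can be applied to detections that have already been read from the .txt files. The sketch below assumes each detection is a 6-element sequence whose 5th element is the category number; keep_persons and PERSON_CLASS_ID are hypothetical names used only for illustration.

```python
PERSON_CLASS_ID = 0  # "person" is category 0, matching the int(cls) == 0 check


def keep_persons(detections):
    """Return only the detections whose category (5th number) is person."""
    return [d for d in detections if int(d[4]) == PERSON_CLASS_ID]
```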

Choose main object

In real conditions, there may be multiple people in front of the user. We compare the sizes of their bounding boxes to determine which person is closest to the user and take that person as the main object.
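One way to sketch this selection is to treat the largest box as the closest person. The code below assumes boxes are given as (x1, y1, x2, y2, ...) in corner format and uses box area as the size measure; both are assumptions, and box_area and choose_main_object are hypothetical names.

```python
def box_area(box):
    """Area of a box whose first four values are x1, y1, x2, y2 (assumed)."""
    x1, y1, x2, y2 = box[:4]
    # Clamp at zero so a malformed box never yields a negative area.
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def choose_main_object(person_boxes):
    """Return the largest person box, or None if nobody was detected."""
    if not person_boxes:
        return None
    return max(person_boxes, key=box_area)
```

Width or height alone could be used as the key instead of area if that matches the comparison used later in the pipeline.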

Determine main object moving direction (4.8 ~ 4.10)

Compare the width or height of the biggest bounding box in the current image with the average width or height of the biggest boxes in the previous 5 images. If the current box is larger, output the audio "Approaching".
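The comparison against the previous 5 images can be sketched with a fixed-length queue. This is an illustration under assumptions, not the project's actual code: it uses box height as the size measure, starts deciding as soon as one earlier image is available, and returns the hypothetical label "Moving away" when the box shrinks. DirectionEstimator is a made-up name.

```python
from collections import deque


class DirectionEstimator:
    """Compare the current box height with the average of recent heights."""

    def __init__(self, window=5):
        # deque(maxlen=window) drops the oldest height automatically.
        self.history = deque(maxlen=window)

    def update(self, box_height):
        """Return "Approaching", "Moving away", or None if undecided."""
        result = None
        if self.history:
            average = sum(self.history) / len(self.history)
            if box_height > average:
                result = "Approaching"
            elif box_height < average:
                result = "Moving away"  # assumed label for a shrinking box
        # Record the current height only after the comparison is done.
        self.history.append(box_height)
        return result
```

Calling update once per image with the height of the main object's box gives one label per image. A small tolerance around the average may be worth adding so that detector jitter does not flip the label on every image.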

Transform detection and direction to voice (4.11 ~ 4.12)
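As a rough sketch of this step, the text to be spoken can be built from the category name and the direction label before it is handed to whatever text-to-speech engine is used. The phrases follow the audio guide wording in the feature list; the function name guide_message and the "Moving away" label are assumptions, and no speech library is called here.

```python
def guide_message(category, direction):
    """Build the sentence to be spoken, or None if there is nothing to say."""
    phrases = {
        "Approaching": "moving to you",
        "Moving away": "moving away from you",  # assumed direction label
    }
    phrase = phrases.get(direction)
    if phrase is None:
        return None
    return f"a {category} is {phrase}"
```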

4. Research on dataset, neural network and methods comparisons (4.1 ~ 4.2 & 4.9 ~ 4.13)

Neural Network NN Link Description Paper
YOLOv3 (for both Windows and Linux) YOLO is a state-of-the-art, real-time object detection system. It has pretrained convolutional weights that were trained on ImageNet.
Faster R-CNN


Dataset Link Description
COCO dataset
Pascal VOC challenge

5. Paper writing (4.14 ~ 4.17)

The paper is written on Overleaf (online LaTeX editing) using the AAAI author toolkit.
Our TeX source code is here

6. Work Allocation

Assignment | People
Research on other papers, rough experiments, and comparison of their disadvantages against YOLOv3 | Wang Tengda, Mou Ziyang, Tian Xin
YOLOv3 detailed structure and technical details | Tian Xueyan, Jin Shuyuan
Bounding box comparison and output video with voice | Zhao Tianze

