Paddle-SVHN

This project reproduces "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" [1] with the PaddlePaddle framework. It was submitted to the Baidu paper reproduction competition.

1 Introduction

The goal of this exercise is to study how the state of the art in visual attention models has evolved, along with the main work on that topic. Two datasets are studied: augmented MNIST and SVHN. The former addresses a canonical problem, handwritten digit recognition, with added clutter and translation. The latter addresses a real-world problem, street view house number (SVHN) transcription. The work listed below is studied to build intuition for choosing a suitable model for each of these challenges.

Paper:

  • [1] Goodfellow I J, Bulatov Y, Ibarz J, et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks[J]. arXiv preprint arXiv:1312.6082, 2013.

Reference project

The link of aistudio:

2 Results Compared

| Methods | Model Download | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Pytorch_SVHN | torch_model | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
| Paddle_SVHN | paddle_model (Extraction_code: v4yj) | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.71% |
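
The Decay Step and Decay Rate columns suggest a stepwise exponential learning-rate schedule. As a rough sketch, assuming a staircase schedule (the function name `decayed_learning_rate` is hypothetical, and the exact step accounting in train.py may differ), the learning rate after a given number of steps could be computed like this:

```python
def decayed_learning_rate(base_lr, step, decay_steps=625, decay_rate=0.9):
    """Staircase exponential decay.

    The rate is multiplied by decay_rate once for every decay_steps
    completed steps. The defaults mirror the table above; the formula
    itself is an assumption, not taken from train.py.
    """
    return base_lr * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    # Print the assumed schedule around the first two decay boundaries.
    for step in (0, 624, 625, 1250):
        print(step, round(decayed_learning_rate(0.01, step), 6))
```

With this form the rate stays constant inside each 625-step window and drops by 10% at each boundary.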

3 Dataset

  • SVHN Dataset format 1
    • test.tar.gz
    • train.tar.gz
    • extra.tar.gz

4 Recommended environment

  • Python 3.6
  • paddlepaddle-gpu 2.0.2
  • visdom
  • protobuf
  • lmdb
  • h5py

5 Start

step1: Clone

git clone https://github.com/JennyVanessa/Paddle-SVHN.git
cd Paddle-SVHN

step2: Pip install

Install PaddlePaddle by following the official tutorial, then install the remaining dependencies:

pip install visdom
pip install h5py
pip install protobuf
pip install lmdb

step3: Download dataset

  1. Download SVHN Dataset format 1

  2. Extract the archives into the data folder. The folder structure should then look like this:

    SVHNClassifier
        - data
            - extra
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - test
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - train
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
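
Before converting, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_items` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# The three splits shown in the folder structure above.
SPLITS = ("train", "test", "extra")


def missing_items(data_dir):
    """Return a list of layout problems under data_dir (empty if none)."""
    problems = []
    root = Path(data_dir)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each split should contain its annotation file ...
        if not (split_dir / "digitStruct.mat").is_file():
            problems.append(f"missing file: {split_dir / 'digitStruct.mat'}")
        # ... and at least one image.
        if not any(split_dir.glob("*.png")):
            problems.append(f"no .png images in: {split_dir}")
    return problems


if __name__ == "__main__":
    for problem in missing_items("./data"):
        print(problem)
```

An empty result means all three splits have a digitStruct.mat file and at least one .png image.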
    

step4: Convert dataset to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

step5: Train

The command below appends the training log to train.log and saves trained models in the ./logs directory.

$ python train.py --data_dir ./data --logdir ./logs >> train.log

The output is:

data/test.lmdb
Start training
=> 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)
=> 2021-11-01 16:48:43.666231: step 200, loss = 7.897614, learning_rate = 0.100000 (2284.5 examples/sec)
=> 2021-11-01 16:49:30.083858: step 300, loss = 7.818493, learning_rate = 0.100000 (2293.1 examples/sec)
=> 2021-11-01 16:50:15.438407: step 400, loss = 7.806008, learning_rate = 0.100000 (2276.1 examples/sec)
=> 2021-11-01 16:51:02.383038: step 500, loss = 7.821648, learning_rate = 0.100000 (2284.2 examples/sec)
=> 2021-11-01 16:51:47.870291: step 600, loss = 7.811975, learning_rate = 0.100000 (2269.1 examples/sec)
=> 2021-11-01 16:52:34.556187: step 700, loss = 7.864832, learning_rate = 0.100000 (2283.2 examples/sec)
=> 2021-11-01 16:53:20.091155: step 800, loss = 7.786717, learning_rate = 0.090000 (2266.6 examples/sec)
=> 2021-11-01 16:54:06.938361: step 900, loss = 7.849339, learning_rate = 0.090000 (2278.9 examples/sec)
=> 2021-11-01 16:54:52.568350: step 1000, loss = 7.795635, learning_rate = 0.090000 (2261.9 examples/sec)
=> Model saved to file: ./logs/model-1000.pdparams
=> patience = 100
=> Evaluating on validation dataset...
==> accuracy = 0.022880, best accuracy 0.000000
...
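
To plot or inspect training progress, the step lines in train.log can be parsed with a regular expression. This is a small sketch that assumes the log keeps exactly the line format shown above; `parse_training_log` is a hypothetical helper, not a script shipped with the project.

```python
import re

# Matches lines such as:
# => 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)
LINE_RE = re.compile(
    r"step (?P<step>\d+), loss = (?P<loss>[\d.]+), "
    r"learning_rate = (?P<lr>[\d.]+) \((?P<speed>[\d.]+) examples/sec\)"
)


def parse_training_log(lines):
    """Extract step, loss, learning rate and throughput from log lines."""
    records = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # skip checkpoint, patience and evaluation lines
            records.append({
                "step": int(match["step"]),
                "loss": float(match["loss"]),
                "learning_rate": float(match["lr"]),
                "examples_per_sec": float(match["speed"]),
            })
    return records


sample = [
    "=> 2021-11-01 16:47:58.488561: step 100, loss = 7.821150, learning_rate = 0.100000 (2290.5 examples/sec)",
    "=> Model saved to file: ./logs/model-1000.pdparams",
    "=> 2021-11-01 16:53:20.091155: step 800, loss = 7.786717, learning_rate = 0.090000 (2266.6 examples/sec)",
]
records = parse_training_log(sample)
print(records[0]["step"], records[0]["loss"])
```

To process a real log, pass an open file instead of the sample list, for example `parse_training_log(open("train.log"))`.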

To resume training from a saved checkpoint:

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pdparams

step6: Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pdparams

The output is:

Start evaluating
Evaluate /home/aistudio/logs/model-359000.pdparams on /home/aistudio/data/test.lmdb, accuracy = 0.953551
Done
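
The reported accuracy is presumably whole-sequence accuracy as in [1], where a prediction counts as correct only if the length and every digit match. The sketch below illustrates that metric under this assumption; `sequence_accuracy` and the (length, digits) tuple layout are hypothetical and are not taken from eval.py.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of examples whose (length, digits) pair matches exactly.

    Each item is a (length, digits) tuple, e.g. (2, [7, 5, 10, 10, 10]).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same size")
    if not targets:
        return 0.0
    correct = sum(
        1
        for (pred_len, pred_digits), (true_len, true_digits) in zip(predictions, targets)
        if pred_len == true_len and list(pred_digits) == list(true_digits)
    )
    return correct / len(targets)


predictions = [
    (2, [7, 5, 10, 10, 10]),
    (3, [1, 2, 3, 10, 10]),
    (2, [4, 1, 10, 10, 10]),
]
targets = [
    (2, [7, 5, 10, 10, 10]),
    (3, [1, 2, 8, 10, 10]),  # third digit differs, so the whole sequence is wrong
    (2, [4, 1, 10, 10, 10]),
]
print(sequence_accuracy(predictions, targets))
```

A single wrong digit makes the whole house number count as an error, which is why this metric is stricter than per-digit accuracy.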

step7: Infer

$ python infer.py --checkpoint=./logs/model-100.pdparams ./image/test1.png

Input image: test1.png

The infer output is:

length: 2
digits: 7 5 10 10 10
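
The output appears to use five fixed digit slots, with the value 10 marking an unused slot, so a length of 2 with digits 7 5 10 10 10 would read as 75. That reading is an assumption based on the output format, and `decode_prediction` below is a hypothetical helper rather than part of infer.py.

```python
NO_DIGIT = 10  # assumed class index for "no digit in this slot"


def decode_prediction(length, digits):
    """Turn a (length, digits) prediction into a house-number string."""
    used = list(digits[:length])
    if NO_DIGIT in used:
        raise ValueError("a slot inside the predicted length is marked empty")
    return "".join(str(d) for d in used)


print(decode_prediction(2, [7, 5, 10, 10, 10]))  # prints 75
```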

step8: Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain

6 Code Structure

├─convert_to_lmdb.py                         
├─dataset.py                
├─eval.py                           
├─model.py    
├─evaluator.py
├─draw_bbox.py
├─example_pb2.py
├─infer.py
├─read_lmdb_sample.py
├─visiualize.py
├─train.py
├─train.log
└─images
   └─test1.png
