Does VGG not support training with CPU? #4

Closed
Ostnie opened this issue Apr 4, 2018 · 6 comments

Comments


Ostnie commented Apr 4, 2018

Hi, I'm a student who has been studying deep learning for several months, and your work has helped me a lot. It's a wonderful project!
But now I have a problem: when I run the VGG model, the code stops running without any error, and the system tells me that Python has stopped working. I have never met such a problem before.
Besides this, on my data AlexNet performs much better than ResNet, and I don't know how to explain it. I used your default parameters for both, and the accuracy of AlexNet is 99.5% while ResNet's is only 91%.
Many thanks!

dgurkaynak (Owner) commented

Thank you @Ostnie 👍

I've just checked: I can train VGGNet on my 2014 MBP without any error. The models should work on both CPU and GPU. No hardware is specified in the code; TensorFlow automatically detects which hardware it should run on. Which versions of Python and TensorFlow are you using? Can you add the console output?

In my experience, ResNet performs better than AlexNet, but I think it's perfectly normal for AlexNet to outperform ResNet in some cases. ResNet is a much deeper network, so it requires more data to train. If your dataset is relatively small, AlexNet can learn better than ResNet.

Ostnie (Author) commented Apr 24, 2018

Hi, how can I use your code to predict on new data? I made some attempts but failed. Could you please give me some examples or advice?

Ostnie (Author) commented Apr 24, 2018

pred.zip

1. Hi, I'm glad to tell you that I have successfully run my code for prediction; it's in the pred.zip I uploaded. But I think the code is a little strange. It may work, but I don't think it is the right way to do prediction: I only changed the code after line 96, and since I guessed that all the variables have to be defined again before a model can be used, I copied lines 0-96 of your code. That feels wrong, and I hope you can tell me the right way to use a trained model for prediction (see the sketch after this list).

2. Another problem: why does the prediction speed get slower and slower as the program runs? At the beginning, one picture takes 0.4 seconds; five minutes later it takes 0.8 seconds per picture, and in the end it takes 1.5 seconds. I don't know how to solve this; could you please help me? (See the sketch after this list.)

3. Oh, I have a new question: if I can't get a good result from this finetuning, what can I do besides changing the learning rate, the trainable layers, and the dropout? Can we change the network structure in this finetuning program?
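
For reference on items 1 and 2, here is an untested sketch of a common pattern: restore the saved graph from the .meta file that tf.train.Saver writes next to the checkpoint (so the model-definition code does not have to be copied by hand), create every op once, and keep the prediction loop free of new graph ops. Ops created inside a loop make the graph grow on every iteration, which is a frequent cause of TensorFlow predictions getting slower over time. This assumes TensorFlow 1.x; the checkpoint path and tensor names below are placeholders that have to be looked up in the real graph.

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the saved graph from its .meta file instead of redefining the model in code.
    saver = tf.train.import_meta_graph('/some/path/to.ckpt.meta')
    saver.restore(sess, '/some/path/to.ckpt')

    graph = tf.get_default_graph()
    # The tensor names below are assumptions -- list the real ones with
    # [op.name for op in graph.get_operations()] and adjust.
    x = graph.get_tensor_by_name('Placeholder:0')
    is_training = graph.get_tensor_by_name('Placeholder_2:0')
    prob = graph.get_tensor_by_name('prob:0')

    # Create every op exactly once, outside the prediction loop.
    pred_op = tf.argmax(prob, axis=1)
    # Freeze the graph: any accidental op creation inside the loop now raises an
    # error instead of silently making each iteration slower.
    sess.graph.finalize()

    image_paths = ['example1.jpg', 'example2.jpg']  # replace with your own files
    for path in image_paths:
        # Load and resize with numpy/PIL here, not with new TF ops.
        batch = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder for the preprocessed image
        pred = sess.run(pred_op, feed_dict={x: batch, is_training: False})
        print(path, pred)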

dgurkaynak (Owner) commented

1. That's a really tough question. Perform a grid search over the hyperparameters. You can also try training on a small subset of your dataset to make sure the network is working at all.

How big is your dataset, and how many classes does it have? You can also try augmenting your dataset (a small sketch follows).
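
For illustration only (not part of the repo's preprocessing code), a minimal augmentation sketch using TensorFlow's built-in image ops, assuming a single float32 image tensor scaled to [0, 1] and that this is applied to training images only:

import tensorflow as tf

def augment(image):
    # image: a [height, width, 3] float32 tensor with values in [0, 1]
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Keep pixel values in a valid range after the random perturbations.
    return tf.clip_by_value(image, 0.0, 1.0)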

Take this example code to do just prediction. I'm not sure it works, since I haven't tested it, but you can easily edit it.

import os, sys
import numpy as np
import tensorflow as tf
import datetime
from model import ResNetModel
sys.path.insert(0, '../utils')
from preprocessor import BatchPreprocessor


tf.app.flags.DEFINE_float('learning_rate', 0.0001, 'Learning rate for adam optimizer')
tf.app.flags.DEFINE_integer('resnet_depth', 50, 'ResNet architecture to be used: 50, 101 or 152')
tf.app.flags.DEFINE_integer('num_classes', 26, 'Number of classes')
tf.app.flags.DEFINE_integer('batch_size', 64, 'Batch size')
tf.app.flags.DEFINE_string('val_file', '../data/val.txt', 'Validation dataset file')

FLAGS = tf.app.flags.FLAGS


def main(_):
    # Placeholders
    x = tf.placeholder(tf.float32, [FLAGS.batch_size, 224, 224, 3])
    y = tf.placeholder(tf.float32, [None, FLAGS.num_classes])
    is_training = tf.placeholder('bool', [])

    # Model
    model = ResNetModel(is_training, depth=FLAGS.resnet_depth, num_classes=FLAGS.num_classes)
    prob = model.inference(x)

    # Accuracy of the model
    correct_pred = tf.equal(tf.argmax(model.prob, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    saver = tf.train.Saver()

    val_preprocessor = BatchPreprocessor(dataset_file_path=FLAGS.val_file, num_classes=FLAGS.num_classes, output_size=[224, 224])

    # Get the number of training/validation steps per epoch
    val_batches_per_epoch = np.floor(len(val_preprocessor.labels) / FLAGS.batch_size).astype(np.int16)


    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Load the pretrained weights
        # model.load_original_weights(sess, skip_layers=train_layers)

        # Directly restore (your model should be exactly the same with checkpoint)
        saver.restore(sess, "/some/path/to.ckpt")


        # Start validation
        test_acc = 0.
        test_count = 0

        for _ in range(val_batches_per_epoch):
            batch_tx, batch_ty = val_preprocessor.next_batch(FLAGS.batch_size)
            acc = sess.run(accuracy, feed_dict={x: batch_tx, y: batch_ty, is_training: False})
            test_acc += acc
            test_count += 1

        test_acc /= test_count
        print("{} Validation Accuracy = {:.4f}".format(datetime.datetime.now(), test_acc))

        # Reset the dataset pointers
        val_preprocessor.reset_pointer()

            
if __name__ == '__main__':
    tf.app.run()
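
If the snippet above is saved as, say, predict.py (a hypothetical file name) in the repo's resnet folder, it could be run with something like: python predict.py --val_file ../data/val.txt --num_classes 26 --batch_size 64, adjusting the flags to your own dataset. This is untested and only shows how the flags defined at the top of the script would be used.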

Ostnie (Author) commented May 4, 2018

I'm a little curious: does nobody use trained models to solve real problems? I have barely seen anyone give a tutorial on how to put a trained model to work; they just train their model and report a number called accuracy. But that makes no sense if the model can't solve problems in the wild. Maybe they just want to write a paper?

dgurkaynak (Owner) commented

Yeah, transfer learning and finetuning are commonly used techniques in the literature. I recommend reading the paper "How transferable are features in deep neural networks?".
