Does VGG not support training with CPU? #4

Closed
Ostnie opened this issue Apr 4, 2018 · 6 comments

Comments


Ostnie commented Apr 4, 2018

Hi, I'm a student who has been studying deep learning for several months, and your work has helped me a lot. It's a wonderful project!
But now I have a problem: when I run the VGG model, the code stops running without any error, and the system tells me that Python has stopped working. I have never met such a problem before.
Besides this, on my data AlexNet performs much better than ResNet, and I don't know how to explain it. I used your default parameters for both, and the accuracy of AlexNet is 99.5% while ResNet's is only 91%.
Many thanks!

dgurkaynak (Owner) commented

Thank you @Ostnie 👍

I've just checked: I can train VGGNet on my 2014 MBP without any error. The models should work on both CPU and GPU. No hardware is specified in the code; TensorFlow automatically detects which hardware it should run on. Which versions of Python and TensorFlow are you using? Can you add the console output?

In my experience, ResNet performs better than AlexNet, but I think it's perfectly normal for AlexNet to outperform ResNet in some cases. ResNet is a much deeper network, so it requires more data to train. If your dataset is relatively small, AlexNet can learn better than ResNet.

Ostnie (Author) commented Apr 24, 2018

Hi, how can I use your code to predict on new data? I made some attempts but failed. Could you please give me some examples or advice?

Ostnie (Author) commented Apr 24, 2018

pred.zip

1. Hi, I'm glad to tell you that I have successfully run my code for prediction; it's in the pred.zip I uploaded. But I think the code is a little strange. It may work, but I don't think it is the right way to do prediction: I only changed the code after line 96, and since I guessed that all the variables have to be defined again before a model can be used, I copied lines 0-96 of your code. That feels wrong, and I hope you can tell me the right way to use a trained model for prediction (see the sketch after this list).

2. Another problem: why does the prediction speed get slower and slower as the program runs? At the beginning, one picture takes 0.4 seconds; five minutes later it takes 0.8 seconds per picture, and in the end it takes 1.5 seconds. I don't know how to solve this; could you please help me? (See the sketch after this list.)

3. Oh, I have a new question: if I can't get a good result from this finetuning, what can I do besides changing the learning rate, the trainable layers, and the dropout? Can we change the network structure in this finetuning program?
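
For reference on items 1 and 2, here is an untested sketch of a common pattern: restore the saved graph from the .meta file that tf.train.Saver writes next to the checkpoint (so the model-definition code does not have to be copied by hand), create every op once, and keep the prediction loop free of new graph ops. Ops created inside a loop make the graph grow on every iteration, which is a frequent cause of TensorFlow predictions getting slower over time. This assumes TensorFlow 1.x; the checkpoint path and tensor names below are placeholders that have to be looked up in the real graph.

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the saved graph from its .meta file instead of redefining the model in code.
    saver = tf.train.import_meta_graph('/some/path/to.ckpt.meta')
    saver.restore(sess, '/some/path/to.ckpt')

    graph = tf.get_default_graph()
    # The tensor names below are assumptions -- list the real ones with
    # [op.name for op in graph.get_operations()] and adjust.
    x = graph.get_tensor_by_name('Placeholder:0')
    is_training = graph.get_tensor_by_name('Placeholder_2:0')
    prob = graph.get_tensor_by_name('prob:0')

    # Create every op exactly once, outside the prediction loop.
    pred_op = tf.argmax(prob, axis=1)
    # Freeze the graph: any accidental op creation inside the loop now raises an
    # error instead of silently making each iteration slower.
    sess.graph.finalize()

    image_paths = ['example1.jpg', 'example2.jpg']  # replace with your own files
    for path in image_paths:
        # Load and resize with numpy/PIL here, not with new TF ops.
        batch = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder for the preprocessed image
        pred = sess.run(pred_op, feed_dict={x: batch, is_training: False})
        print(path, pred)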

dgurkaynak (Owner) commented

1. That's a really tough question. Perform a grid search over the hyperparameters. You can also try training on a small subset of your dataset to make sure the network is working at all.

How big is your dataset, and how many classes does it have? You can also try augmenting your dataset (a small sketch follows).
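
For illustration only (not part of the repo's preprocessing code), a minimal augmentation sketch using TensorFlow's built-in image ops, assuming a single float32 image tensor scaled to [0, 1] and that this is applied to training images only:

import tensorflow as tf

def augment(image):
    # image: a [height, width, 3] float32 tensor with values in [0, 1]
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Keep pixel values in a valid range after the random perturbations.
    return tf.clip_by_value(image, 0.0, 1.0)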

Take this example code to do just prediction. I'm not sure it works, since I haven't tested it, but you can easily edit it.

import os, sys
import numpy as np
import tensorflow as tf
import datetime
from model import ResNetModel
sys.path.insert(0, '../utils')
from preprocessor import BatchPreprocessor


tf.app.flags.DEFINE_float('learning_rate', 0.0001, 'Learning rate for adam optimizer')
tf.app.flags.DEFINE_integer('resnet_depth', 50, 'ResNet architecture to be used: 50, 101 or 152')
tf.app.flags.DEFINE_integer('num_classes', 26, 'Number of classes')
tf.app.flags.DEFINE_integer('batch_size', 64, 'Batch size')
tf.app.flags.DEFINE_string('val_file', '../data/val.txt', 'Validation dataset file')

FLAGS = tf.app.flags.FLAGS


def main(_):
    # Placeholders
    x = tf.placeholder(tf.float32, [FLAGS.batch_size, 224, 224, 3])
    y = tf.placeholder(tf.float32, [None, FLAGS.num_classes])
    is_training = tf.placeholder('bool', [])

    # Model
    model = ResNetModel(is_training, depth=FLAGS.resnet_depth, num_classes=FLAGS.num_classes)
    prob = model.inference(x)

    # Accuracy of the model
    correct_pred = tf.equal(tf.argmax(model.prob, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    saver = tf.train.Saver()

    val_preprocessor = BatchPreprocessor(dataset_file_path=FLAGS.val_file, num_classes=FLAGS.num_classes, output_size=[224, 224])

    # Get the number of training/validation steps per epoch
    val_batches_per_epoch = np.floor(len(val_preprocessor.labels) / FLAGS.batch_size).astype(np.int16)


    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Load the pretrained weights
        # model.load_original_weights(sess, skip_layers=train_layers)

        # Directly restore (your model should be exactly the same with checkpoint)
        saver.restore(sess, "/some/path/to.ckpt")


        # Start validation
        test_acc = 0.
        test_count = 0

        for _ in range(val_batches_per_epoch):
            batch_tx, batch_ty = val_preprocessor.next_batch(FLAGS.batch_size)
            acc = sess.run(accuracy, feed_dict={x: batch_tx, y: batch_ty, is_training: False})
            test_acc += acc
            test_count += 1

        test_acc /= test_count
        print("{} Validation Accuracy = {:.4f}".format(datetime.datetime.now(), test_acc))

        # Reset the dataset pointers
        val_preprocessor.reset_pointer()

            
if __name__ == '__main__':
    tf.app.run()
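
If the snippet above is saved as, say, predict.py (a hypothetical file name) in the repo's resnet folder, it could be run with something like: python predict.py --val_file ../data/val.txt --num_classes 26 --batch_size 64, adjusting the flags to your own dataset. This is untested and only shows how the flags defined at the top of the script would be used.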

Ostnie (Author) commented May 4, 2018

I'm a little curious: does nobody use trained models to solve real problems? I have barely seen anyone give a tutorial on how to put a trained model to work; they just train their model and report a number called accuracy. But that makes no sense if the model can't solve problems in the wild. Maybe they just want to write a paper?

dgurkaynak (Owner) commented

Yeah, transfer learning and finetuning are commonly used techniques in the literature. I recommend reading the paper "How transferable are features in deep neural networks?".
