
Eval Script #653

Merged
merged 8 commits into from Jan 24, 2019

jamesmcclain (Member) commented Jan 7, 2019

Overview

Run DeepLab evaluation script upon request.

Test with this branch.

Note: this currently logs evaluation events to a directory other than the training log directory, because logging to the training log directory appears to disrupt the normal logging behavior of the training script (it might be a problem caused by the slow computer I am using; I will have to investigate more later).
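One way to read "a directory other than the training log directory" is a dedicated eval logdir next to the training one, so the eval summary writer never touches the training script's event files. A minimal sketch, assuming the upstream tensorflow/models DeepLab eval.py and its checkpoint_dir/eval_logdir flags; the exact directory layout used by this PR is an assumption:

```python
import os

def build_eval_cmd(train_logdir, dataset_dir, model_variant="xception_65"):
    """Build a DeepLab eval.py command line whose event files go to a
    directory separate from the training logdir, so the eval summary
    writer does not disturb the training script's own logging."""
    eval_logdir = os.path.join(train_logdir, "eval")  # hypothetical layout
    return [
        "python", "deeplab/eval.py",
        "--checkpoint_dir={}".format(train_logdir),
        "--eval_logdir={}".format(eval_logdir),
        "--dataset_dir={}".format(dataset_dir),
        "--model_variant={}".format(model_variant),
    ]
```

The command list can then be handed to subprocess.Popen so eval runs alongside training.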

The eval script seems to add these:
screenshot_2019-01-07_07-19-48

Checklist

  • Updated docs/changelog.rst
  • Ran scripts/format_code and committed any changes
  • Documentation updated if needed
  • PR has a name that won't get you publicly shamed for vagueness

Closes #647

@jamesmcclain jamesmcclain added the review label Jan 7, 2019

@jamesmcclain jamesmcclain changed the title from Eval to Eval Script Jan 7, 2019

lewfish (Contributor) left a comment

I tried it and the miou metric was logged to the console, but didn't show up in Tensorboard.
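For context, the miou metric mentioned here is the mean intersection-over-union that DeepLab's eval script reports for semantic segmentation. A minimal pure-Python sketch of the computation from a confusion matrix (an illustration, not the implementation the eval script actually uses):

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix,
    where confusion[i][j] counts pixels of true class i predicted as j.
    Per-class IoU = TP / (TP + FP + FN); mIoU averages over the classes
    that appear at all (classes with an empty denominator are skipped)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```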

replace_model: Replace the model checkpoint if it exists.
If false, this will continue training from the
checkpoint if one exists, if the backend allows for this.
do_eval: Boolean determining whether to run the eval script
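As an illustration of how the options quoted above might look in plain Python (the real definitions live in Raster Vision's protobuf config; the defaults below are an assumption for the sketch, not necessarily what was merged):

```python
from dataclasses import dataclass

@dataclass
class DeepLabTrainOptions:
    """Hypothetical stand-in for the protobuf-backed backend config."""
    replace_model: bool = False  # if True, discard any existing checkpoint
    do_eval: bool = False        # if True, run the DeepLab eval script
                                 # alongside training (assumed default)
```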

lewfish (Contributor) commented Jan 14, 2019

I think the default for this should be True. (this change should also occur in the protobuf definition)

lewfish (Contributor) commented Jan 14, 2019

It seems like most people would want to see validation metrics, and only some would want to turn it off (to save CPU power I guess).

jamesmcclain (Author, Member) commented Jan 14, 2019

The eval script is extremely expensive to run.

lewfish (Contributor) commented Jan 14, 2019

Do you know if it uses the CPU in parallel with training on the GPU? I'm hoping that's the case. Using one of the CPUs isn't so bad, especially if we set the number of validation samples to something relatively small so it won't be constantly running.
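One standard way to keep an eval subprocess on the CPU while training holds the GPU is to hide the GPUs from it via CUDA_VISIBLE_DEVICES. Whether this PR launches eval that way is an assumption; the environment-variable mechanism itself is standard CUDA/TensorFlow behavior:

```python
import os

def cpu_only_env(base_env=None):
    """Copy an environment and hide all GPUs from the child process,
    forcing TensorFlow (and other CUDA users) onto the CPU. Intended
    for use as subprocess.Popen(..., env=cpu_only_env())."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string means no visible GPUs
    return env
```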

jamesmcclain (Author, Member) commented Jan 15, 2019

To the best of my knowledge, it uses the same resource(s) as the training script.

Review thread on rastervision/backend/tf_deeplab.py (resolved, outdated)
jamesmcclain (Author, Member) commented Jan 18, 2019

Merging shortly...

jamesmcclain (Author, Member) commented Jan 18, 2019

I tried it and the miou metric was logged to the console, but didn't show up in Tensorboard.

Is this a fresh comment? miou should show up after some time (after the first checkpoint has been created and the eval script has had time to run).
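The "after the first checkpoint has been created" condition can be made explicit when testing: poll the training logdir for a checkpoint before expecting eval metrics in Tensorboard. A sketch, assuming the standard TF1 Saver naming (model.ckpt-*.index) and a flat logdir layout:

```python
import glob
import os
import time

def wait_for_checkpoint(train_logdir, timeout_s=600, poll_s=5):
    """Poll until at least one TF checkpoint index file appears in the
    training logdir; return the newest match, or None on timeout."""
    deadline = time.time() + timeout_s
    pattern = os.path.join(train_logdir, "model.ckpt-*.index")
    while time.time() < deadline:
        matches = glob.glob(pattern)
        if matches:
            return sorted(matches)[-1]
        time.sleep(poll_s)
    return None
```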

lewfish (Contributor) commented Jan 18, 2019

Is this a fresh comment? miou should show up after some time (after the first checkpoint has been created and the eval script has had time to run).

I tried it a while ago, and that's what happened. Maybe I didn't wait long enough for the first checkpoint.

lewfish (Contributor) commented Jan 18, 2019

I'll test it again.

lewfish (Contributor) commented Jan 18, 2019

I followed your testing instructions and didn't see the miou panel in Tensorboard even after checkpoints were written.

screen shot 2019-01-18 at 1 45 51 pm

jamesmcclain added some commits Jan 7, 2019

jamesmcclain (Author, Member) commented Jan 21, 2019

Confirmed working with this branch.

screenshot_2019-01-21_09-47-29

Keep in mind that this is somewhat timing-dependent as well. I was never able to see the miou show up in Tensorboard on my laptop, because running the training and evaluation scripts at the same time seemed to be too taxing (the eval process either stopped making progress, or some other timing bug was exposed). On a more powerful computer, the miou shows up every time.

This goes to my comment above about the eval script being probably too expensive to be practical (or at least desirable).

@jamesmcclain jamesmcclain force-pushed the jamesmcclain:eval branch from 7d178b5 to 8ea561f Jan 21, 2019

lewfish (Contributor) commented Jan 24, 2019

Keep in mind that this is somewhat timing dependent as well. I was never able to see the miou show up in tensorboard on my laptop because running the training and evaluation scripts at the same time seemed to be too taxing (the eval process seemed to either stop making progress or some other timing bug was exposed). On a more powerful computer, the miou shows up every time.

This goes to my comment above about the eval script being probably too expensive to be practical (or at least desirable).

Ah, ok.

lewfish (Contributor) commented Jan 24, 2019

I think this is ready to merge.

@jamesmcclain jamesmcclain merged commit 66a1ef5 into azavea:develop Jan 24, 2019

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

@jamesmcclain jamesmcclain deleted the jamesmcclain:eval branch Jan 24, 2019

@jamesmcclain jamesmcclain removed the review label Jan 24, 2019
