Why need the data processing step? #25

Closed
JonnyKong opened this issue Apr 6, 2018 · 3 comments

Comments

@JonnyKong

Hi Matt,

Thanks for the elegant TensorFlow code. But why is the data processing step necessary?

It seems to me that it's possible to load the dataset into memory before training (at least for the Ms. Pac-Man dataset), and then randomly select 32x32 patches at runtime. Would that make I/O faster?

Thanks in advance

@serkansulun

Here is my hypothesis:
Once the dataset is expressive enough for the problem, it is more efficient to train the network over that same dataset multiple times (i.e., for multiple epochs). In that case, there is no need to repeat the preprocessing on every pass.

@dyelax
Owner

dyelax commented Apr 14, 2018

Sorry for the delayed response! The initial reason I added a separate pre-processing pipeline was that I was being bottlenecked by I/O, and it is much more efficient to read in 32x32 image patches than the full-sized images. I don't remember what the memory specs of my machine were, but I think I was running into issues loading the whole dataset into memory. If you are able to do that, it might make more sense to create the patches at runtime.
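For anyone who wants to try the runtime approach, here is a minimal sketch of cropping a random patch from a clip that is already in memory. It is not taken from this repository: the function name `random_patch`, the `(num_frames, height, width, channels)` array layout, and the 210x160 frame size of the stand-in clip are all assumptions made for illustration.

```python
import numpy as np

def random_patch(frames, patch_size=32, rng=None):
    """Crop one random patch_size x patch_size window from an in-memory clip.

    frames: array of shape (num_frames, height, width, channels) (assumed layout).
    Returns an array of shape (num_frames, patch_size, patch_size, channels).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width, _ = frames.shape
    # Pick the top-left corner so the window stays inside the frame.
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    # The same window is cut from every frame so the patch stays aligned in time.
    return frames[:, top:top + patch_size, left:left + patch_size, :]

# Hypothetical stand-in for a clip that has already been loaded into memory.
clip = np.zeros((5, 210, 160, 3), dtype=np.uint8)
patch = random_patch(clip)
```

Slicing like this returns a view rather than a copy, so the crop itself is cheap; whether it beats reading pre-cut patches from disk depends on whether the full-sized frames fit in memory.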

Edit:
One problem with creating the patches at runtime on the Ms. Pac-Man data is that most patches will have no movement in them. To solve this, the code randomly samples patches until it finds one with movement. This means that some patches take longer to generate than others. It's probably more efficient to pay this generation cost once during pre-processing than to deal with it every epoch during training.
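The "sample until there is movement" idea can be sketched roughly as below. This is an illustration, not the repository's actual implementation: the name `sample_moving_patch`, the movement test (summed absolute difference between consecutive frames), and the `min_diff` and `max_tries` parameters are all assumptions.

```python
import numpy as np

def sample_moving_patch(frames, patch_size=32, min_diff=0.0, max_tries=100, rng=None):
    """Rejection-sample a patch whose content changes across frames.

    frames: array of shape (num_frames, height, width, channels) (assumed layout).
    Returns the first sampled patch with movement, or None after max_tries.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width, _ = frames.shape
    for _ in range(max_tries):
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        patch = frames[:, top:top + patch_size, left:left + patch_size, :]
        # Assumed movement proxy: total absolute change between consecutive frames.
        # Cast to float first so uint8 subtraction cannot wrap around.
        diff = np.abs(np.diff(patch.astype(np.float32), axis=0)).sum()
        if diff > min_diff:
            return patch
    return None  # No moving patch found within the try budget.

# Hypothetical clip: static background with one small region that changes after frame 0.
clip = np.zeros((4, 64, 64, 3), dtype=np.uint8)
clip[1:, 10:20, 10:20, :] = 255
moving = sample_moving_patch(clip, max_tries=1000, rng=np.random.default_rng(0))
```

The number of tries per patch varies with how much of the frame is static, which is the uneven cost described above; running this once offline and saving the accepted patches avoids paying it again on every epoch.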

@JonnyKong
Author

Thanks for your reply! It's very helpful.
