Why is the data processing step needed? #25

JonnyKong · 2018-04-06T11:40:40Z

Hi Matt,

Thanks for the elegant TensorFlow code. But why is the data processing step necessary?

It seems to me that it's possible to load the dataset into memory before training (at least for the Ms. Pac-Man dataset) and then randomly select 32x32 patches at runtime. Would that make I/O faster?
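A minimal NumPy sketch of the runtime-cropping idea, assuming the whole dataset already fits in memory as one `uint8` array of shape `(num_clips, height, width, channels)` with each clip's RGB frames stacked along the channel axis. That layout and the names `random_patch` and `random_batch` are illustrative assumptions, not taken from the repository.

```python
import numpy as np

PATCH = 32  # patch side length from the question (32x32)


def random_patch(clips, rng, patch=PATCH):
    """Crop one random patch x patch window from a random in-memory clip.

    clips: uint8 array of shape (num_clips, height, width, channels), where the
    frames of a clip are stacked on the channel axis (assumed layout).
    """
    n, h, w, _ = clips.shape
    i = rng.integers(n)              # which clip
    y = rng.integers(h - patch + 1)  # top edge of the crop
    x = rng.integers(w - patch + 1)  # left edge of the crop
    return clips[i, y:y + patch, x:x + patch, :]


def random_batch(clips, batch_size, rng):
    """Stack batch_size independent random crops into one training batch."""
    return np.stack([random_patch(clips, rng) for _ in range(batch_size)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 10 clips of 210x160 frames, 5 RGB frames each.
    clips = rng.integers(0, 256, size=(10, 210, 160, 15), dtype=np.uint8)
    batch = random_batch(clips, batch_size=8, rng=rng)
    print(batch.shape)  # (8, 32, 32, 15)
```

Once the data is in memory, each batch is only a few array slices, so there is no per-step disk I/O; the catch is that the full dataset has to fit in RAM in the first place.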

Thanks in advance

serkansulun · 2018-04-12T10:56:23Z

Here is my hypothesis: once the dataset is expressive enough for the problem, it is more efficient to train the network over that same dataset multiple times (i.e., for several epochs). In that case, redoing the preprocessing for every epoch is unnecessary work.
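To make the "preprocess once, reuse every epoch" argument concrete, here is a small sketch under stated assumptions: patches are generated by some hypothetical `make_patch` callable, saved once with `np.save`, and every epoch then just reshuffles and reads the cached array (memory-mapped via `np.load(..., mmap_mode="r")`). The function names and file layout are illustrative, not the repository's actual pipeline.

```python
import os
import tempfile

import numpy as np


def build_patch_cache(make_patch, num_patches, path):
    """Pay the (possibly slow) patch-generation cost once and save the result.

    make_patch: hypothetical zero-argument callable returning one patch array.
    """
    patches = np.stack([make_patch() for _ in range(num_patches)])
    np.save(path, patches)
    return path


def iterate_epochs(path, num_epochs, batch_size, rng):
    """Yield (epoch, batch) pairs, reusing the same cached patches every epoch."""
    # Memory-map the file so the whole array does not have to be read up front.
    patches = np.load(path, mmap_mode="r")
    for epoch in range(num_epochs):
        order = rng.permutation(len(patches))  # reshuffle each epoch
        for start in range(0, len(order), batch_size):
            # Sorted indices keep each read roughly sequential on disk.
            idx = np.sort(order[start:start + batch_size])
            yield epoch, np.asarray(patches[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    calls = {"n": 0}

    def make_patch():
        # Stand-in for an expensive generator (e.g. searching for a patch with movement).
        calls["n"] += 1
        return rng.integers(0, 256, size=(32, 32, 15), dtype=np.uint8)

    with tempfile.TemporaryDirectory() as tmp:
        path = build_patch_cache(make_patch, 100, os.path.join(tmp, "patches.npy"))
        num_batches = sum(1 for _ in iterate_epochs(path, 5, 32, rng))
    print(calls["n"], num_batches)  # 100 20
```

The generator runs 100 times in total, while training reads 20 batches (5 epochs of 4 batches each); with runtime generation, the generation cost would instead be paid again in every epoch.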

dyelax · 2018-04-14T17:42:30Z

Sorry for the delayed response! I originally added a separate preprocessing pipeline because I was bottlenecked by I/O, and it is much more efficient to read in 32x32 image patches than full-sized images. I don't remember the memory specs of my machine, but I think I was running into issues loading the whole dataset into memory. If you are able to do that, it might make more sense to create the patches at runtime.
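A back-of-envelope calculation helps show why reading patches instead of full frames matters for I/O, and why holding every full-size clip in memory can be a problem. All numbers here are assumptions for illustration: 210x160 RGB frames (the native Atari screen size), 5-frame clips, 8-bit pixels, and a hypothetical 100,000 clips; none of them are taken from the repository.

```python
def clip_bytes(height, width, frames, channels=3, bytes_per_value=1):
    """Bytes needed to hold one clip of `frames` stacked frames."""
    return height * width * frames * channels * bytes_per_value


# Assumed, illustrative numbers only (see the lead-in above the code).
FULL_H, FULL_W, FRAMES, NUM_CLIPS = 210, 160, 5, 100_000

full = clip_bytes(FULL_H, FULL_W, FRAMES)  # one full-size clip
patch = clip_bytes(32, 32, FRAMES)         # one 32x32 training patch

print(f"per-sample read: {full} vs {patch} bytes ({full / patch:.1f}x)")
# per-sample read: 504000 vs 15360 bytes (32.8x)
print(f"whole dataset as uint8: {NUM_CLIPS * full / 1e9:.1f} GB")
# whole dataset as uint8: 50.4 GB
f32 = clip_bytes(FULL_H, FULL_W, FRAMES, bytes_per_value=4)
print(f"whole dataset as float32: {NUM_CLIPS * f32 / 1e9:.1f} GB")
# whole dataset as float32: 201.6 GB
```

Under these assumptions, each training sample read from disk is roughly 33x smaller as a patch, and keeping the full frames as `float32` would take four times the memory of `uint8`.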

Edit: One problem with creating the patches at runtime on the Ms. Pac-Man data is that most patches have no movement in them. To solve this, I randomly sample patches until I find one with movement, which means some patches take much longer to generate than others. It's probably more efficient to pay this generation cost once during preprocessing than to deal with it every epoch during training.
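The "sample until there is movement" step can be sketched as rejection sampling. This is a hypothetical illustration, not the repository's code: it assumes RGB frames stacked on the channel axis, measures movement as the summed absolute difference between consecutive frames, and adds a `max_tries` cap (my assumption) so a completely static clip cannot loop forever. The `threshold` value is also an assumption.

```python
import numpy as np


def has_movement(patch, frames, threshold):
    """True if consecutive frames in the patch differ by more than `threshold`.

    patch: (h, w, frames * 3) uint8 array, RGB frames stacked on the channel axis.
    """
    # Cast before differencing so uint8 subtraction cannot wrap around.
    x = patch.astype(np.float32).reshape(patch.shape[0], patch.shape[1], frames, 3)
    diffs = np.abs(np.diff(x, axis=2))  # per-pixel change between consecutive frames
    return float(diffs.sum()) > threshold


def sample_moving_patch(clip, frames, rng, patch=32, threshold=0.0, max_tries=100):
    """Randomly crop until a crop contains movement; returns (crop, found)."""
    h, w, _ = clip.shape
    crop = None
    for _ in range(max_tries):
        y = rng.integers(h - patch + 1)
        x = rng.integers(w - patch + 1)
        crop = clip[y:y + patch, x:x + patch, :]
        if has_movement(crop, frames, threshold):
            return crop, True
    return crop, False  # gave up: fall back to the last (static) crop


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = 4
    clip = np.zeros((210, 160, frames * 3), dtype=np.uint8)
    # A small bright square that moves one pixel to the right per frame.
    for t in range(frames):
        clip[100:104, 50 + t:54 + t, 3 * t:3 * t + 3] = 255
    crop, found = sample_moving_patch(clip, frames, rng, max_tries=1000)
    print(found, crop.shape)
```

Because only a small fraction of crop positions overlap the moving square, the number of tries per patch varies a lot from call to call, which is exactly the uneven generation cost described above.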

JonnyKong · 2018-04-15T04:01:15Z

Thanks for your reply! It's very helpful.

JonnyKong closed this as completed Apr 15, 2018
