
Sort out models, data, and tools #142

Merged
17 commits merged into BVLC:dev on Feb 26, 2014

Conversation

shelhamer (Member)

  1. Re-arrange dirs for cleanliness.
  2. Move learned models and data to caffe-mug repo dropbox for now (and a suitable server later).
  3. Provide scripts to download models and data as needed.
  4. Ignore models and data in the main repo by default. This makes local experimentation convenient.
  5. Keep model definitions around.

Orchestrating updates between commits in caffe and uploaded models and data has overhead, but is worth the separation of concerns and keeping the repo lean.
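The plan above might be realized with a small per-model fetch script; a minimal sketch, assuming placeholder names throughout — the directory, filename, and URL below are illustrative stand-ins, not the actual hosting locations:

```shell
#!/usr/bin/env bash
# Hedged sketch of a fetch script in the spirit of steps 3-4: download a
# learned model into a directory that git ignores by default.
# MODEL_DIR, MODEL_FILE, and MODEL_URL are placeholders.
set -e

MODEL_DIR="${MODEL_DIR:-/tmp/caffe_models_demo}"      # ignored dir (placeholder)
MODEL_FILE="reference_model.binaryproto"              # placeholder filename
MODEL_URL="https://example.org/models/${MODEL_FILE}"  # placeholder URL

mkdir -p "$MODEL_DIR"

# Fetch only if not already present, so repeated runs are cheap.
if [ ! -f "$MODEL_DIR/$MODEL_FILE" ]; then
  echo "would fetch: $MODEL_URL"
  # wget -O "$MODEL_DIR/$MODEL_FILE" "$MODEL_URL"  # the real download step
  : > "$MODEL_DIR/$MODEL_FILE"                     # stand-in for this sketch
fi

echo "model present: $MODEL_DIR/$MODEL_FILE"
```

A matching ignore rule for the model directory then keeps downloaded weights and local experiments out of version control, which is what makes step 4 convenient.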

sergeyk (Contributor) commented Feb 25, 2014

Decided to have a separate repository for data, model definition files, and example models, with scripts in this repo to download them as needed. The separate repo will be stable against master, not dev. @shelhamer will do this.

mavenlin (Contributor)

@sergeyk that will be nice, currently the repo is big and slow to download because of the synsets. Can these be removed from the git history also?

shelhamer (Member, Author)

@mavenlin the synsets are on my list for this reorganization. Filtering them from the history is necessary to save space, and a simple command, but it breaks history.

We'll consider such house cleaning when it comes time to release Caffe 1.0, but we're not going to rewrite history on master so casually now.

shelhamer (Member, Author)

@sergeyk I added you as a collaborator on my fork so that we can jointly take care of the documentation updates triggered by this PR. All you have to do is push to my fork's data-aux branch.

shelhamer (Member, Author)

Instead of scripts to pull models and data, a caffe-mug submodule for data and models was considered, but a submodule lacks choice: it's all or nothing.

shelhamer (Member, Author)

This isn't going to work with GitHub's (not unreasonable) file size and traffic limitations. Git's a drag with large binary files too, so perhaps it's for the best.

The alternative is self-hosting from campus or ICSI.

sergeyk (Contributor) commented Feb 25, 2014

Let's host as many models, sample data, and model def files as possible in a GitHub repo. For really large models, we can upload them to a publicly accessible ICSI place, and note the version (just the date, probably) in the filename.


shelhamer (Member, Author)

Oh, sorry I wasn't clear. This isn't going to work at all: not even the Caffe reference ImageNet model fits on its own, as there's a file size cap of 100 MB.

My fallback plan is ICSI hosting, with the fetch URLs versioned in master's scripts. We can have a simple script to publish the models, defs, and data into a directory named by timestamp.
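The publish side of this fallback could be sketched as follows; the staging and destination paths here are illustrative stand-ins, not the actual ICSI layout:

```shell
#!/usr/bin/env bash
# Hedged sketch: publish models, defs, and data into a directory named by
# timestamp, so the fetch scripts versioned in master can pin an exact
# snapshot. SRC and DEST are placeholders, not real server paths.
set -e

STAMP=$(date +%Y%m%d)                       # version by date
SRC="${SRC:-/tmp/caffe_publish_src}"        # staging dir (placeholder)
DEST="${DEST:-/tmp/caffe_publish}/$STAMP"   # would be a served directory

mkdir -p "$SRC" "$DEST"
cp -R "$SRC/." "$DEST/"

# Record the base URL that a fetch script committed to master would pin.
echo "https://example.edu/caffe/$STAMP/" > "$DEST/BASE_URL"
cat "$DEST/BASE_URL"
```

Because each snapshot lives under its own dated directory, updating master's fetch scripts to a new URL never breaks older checkouts that still point at an earlier snapshot.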

sergeyk (Contributor) commented Feb 25, 2014

I know that the reference ImageNet model won't fit. I still think that prototxt files and small sample data should be hosted on GitHub -- everything that fits under 100 MB, which is going to be basically everything except ImageNet models.

shelhamer (Member, Author)

Resolution: keep model definitions in the repo, drop included data, and add scripts to fetch learned models and data as needed. Auxiliary data and model weights will live on dropbox for the moment, and will find their permanent home on a Berkeley server after March 7. Our group will be bringing a demo server online after that date which can hold the data.

shelhamer (Member, Author)

This feels ready to me, modulo fixing the docs changes this triggers. @sergeyk, how about we update this once #155 is in?

kloudkl (Contributor) commented Feb 25, 2014

It seemed that having a demo was not a high priority (#78). May I ask what demo the server will host?

shelhamer (Member, Author)

Any suggestions on the dir structure or names are welcome; this is the time to arrange everything neatly.

@kloudkl re: demo, there will in fact be a Caffe demo along the lines of the DeCAF demo, and along with it other demos of our research group's projects. @Yangqing was not against a demo so much as spending too much time engineering a simple illustration of the framework and not focusing on the research hacking.

sguada (Contributor) commented Feb 25, 2014

What about this dir structure?

  • data/ // Contains one folder per dataset; each folder can have a script to get the data
    • mnist/
    • cifar10/
    • ilsvrc_2012/
    • ...
  • tools/ // Contains the main Caffe tools for training, testing, fine-tuning, net_speed, ...
    • train_net.cpp
    • test_net.cpp
    • fine_tuning.cpp
    • ...
  • models/ // Contains the different prototxt files defining the models
  • docs/ // Contains the documentation
  • examples/ // Could contain some samples of uses of Caffe, but shouldn't mix everything together

shelhamer (Member, Author)

That looks right, but I'm torn about how examples fit in. Packing example code, model, and data together makes the example clear, but makes reuse awkward. I'll package purely example files together, but keep data on its own.

Collect core Caffe tools like train_net, device_query, etc. together in tools/ and include helper scripts under tools/extra. Data, models, and examples should not be versioned by default; reference versions of these are not to be casually committed.

Plus this makes for a better playground in examples without having to worry about data, intermediate files, or experiments being accidentally tracked.

shelhamer (Member, Author)

Ok everyone, feast your eyes and let me know. Speak now or forever hold your peace.

@Yangqing @jeffdonahue @sergeyk @sguada @longjon

jeffdonahue (Contributor)

Looks great to me! Thanks for the reorganization work @shelhamer.

sguada (Contributor) commented Feb 26, 2014

It looks good, but I cannot compile it due to the hdf5 dependency introduced in #147, so I'm not sure whether it will work or not.

There are also some small errors in the get_data.sh scripts:
./get_ilsvrc_aux.sh: 9: ./get_ilsvrc_aux.sh: Bad substitution
./get_mnist.sh: 4: ./get_mnist.sh: Bad substitution
./get_cifar10.sh: 4: ./get_cifar10.sh: Bad substitution
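"Bad substitution" from sh usually means a bash-only parameter expansion (e.g. ${VAR/pat/repl} or ${VAR:0:n}) was run under a POSIX shell such as dash, which happens when the scripts are invoked as "sh get_mnist.sh" or when the system's /bin/sh is not bash. A minimal reproduction and a portable rewrite, with an illustrative variable name:

```shell
#!/bin/sh
# Runs under any POSIX sh; the bash-only form is shown commented out.
FILE="mnist.tar.gz"

# Bash-only pattern substitution; dash rejects it with "Bad substitution":
#   BASE=${FILE/.tar.gz/}

# POSIX suffix removal works in every sh:
BASE=${FILE%.tar.gz}
echo "$BASE"   # prints: mnist
```

Running the scripts explicitly with bash (bash get_mnist.sh) or giving them a #!/usr/bin/env bash shebang also sidesteps the issue.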

shelhamer (Member, Author)

Perhaps we should merge this to master instead of dev, like Sergey's #157, since this changes a lot and everything should be merged into the new arrangement and not the old.

dev and feature branches will need to be rebased, so we should incorporate this, Jeff's #163 and Eric's #152 into master then rebase dev all at once.

sguada (Contributor) commented Feb 26, 2014

That will solve the problems for now. We still need to figure out a way to either make hdf5 optional or to explain how to install it properly.

Sergio


sergeyk (Contributor) commented Feb 26, 2014

I'm for merging this one and #163 into master directly, but not #152, as it does not currently exist and is an involved code change.

sergeyk (Contributor) commented Feb 26, 2014

Looks good to me, we'll fix potential mistakes once we merge.

shelhamer (Member, Author)

@sguada there's no get_data.sh script? I tested the fetch scripts on OS X and Ubuntu.

@sergeyk: agreed on all counts. I'll merge soon.

sguada (Contributor) commented Feb 26, 2014

@shelhamer I meant all of the scripts: get_ilsvrc_aux.sh, get_mnist.sh, and so on. But maybe it is just me.

sguada (Contributor) commented Feb 26, 2014

@shelhamer besides that it's great; ready to merge.

shelhamer (Member, Author)

@sguada you might have some kind of unusual shell. Check the output of which sh, perhaps?

shelhamer added a commit that referenced this pull request Feb 26, 2014
Sort out models, data, examples, and tools
@shelhamer shelhamer merged commit d323547 into BVLC:dev Feb 26, 2014
@shelhamer shelhamer deleted the data-aux branch February 26, 2014 03:56
@shelhamer shelhamer mentioned this pull request Feb 26, 2014
shelhamer added a commit that referenced this pull request Feb 26, 2014
Sort out models, data, examples, and tools
shelhamer added a commit that referenced this pull request Feb 26, 2014
Sort out models, data, examples, and tools
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
Sort out models, data, examples, and tools

6 participants