Training results with 1 h5 file differ from training with multiple h5 files #5524

### Issue summary
I'm training the siamese neural network from the Caffe examples directory. The only thing I've changed is the input format: I use HDF5 files instead of LMDB.
The input layers for training and testing in the .prototxt files are now as follows:

```
layer {
  name: "pair_data"
  type: "HDF5Data"
  top: "pair_data"
  top: "sim"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/home/amiyusov/Projects/siamese/train/hdf5_files.txt"
    batch_size: 64
  }
}
layer {
  name: "pair_data"
  type: "HDF5Data"
  top: "pair_data"
  top: "sim"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "/home/amiyusov/Projects/siamese/test/hdf5_files.txt"
    batch_size: 100
  }
}
```

I've created 60000 pairs from the MNIST files, written them to .h5 files, and written the paths of the .h5 files to hdf5_files.txt in a Python script as follows:

```
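# List file that the HDF5Data layer's `source` points to
f = open(os.path.join(output_dir, 'hdf5_files.txt'), 'w')
...
# One .h5 file per chunk i, with datasets named after the layer's tops
with h5py.File(os.path.join(output_dir, 'data' + str(i) + '.h5'), 'w') as h5f:
    h5f.create_dataset('pair_data', (batch_size, 2*channels, height, width), dtype=np.float32, data=image_pairs)
    h5f.create_dataset('sim', (batch_size, 1), dtype=np.uint8, data=labels)
...
# Note: plain concatenation, so output_dir has to end with a path separator
f.write(output_dir + 'data' + str(i) + '.h5\n')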
```
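For anyone trying to reproduce this, the elided parts of the script could be filled in roughly as below. This is only a sketch under assumptions, not the exact script used above: `shard_ranges` and `write_shards` are hypothetical names, and `image_pairs` / `labels` are assumed to be arrays that already hold all pairs in memory. With `shard_size` equal to the number of pairs it writes a single file; with 2000 it writes 30 files for 60000 pairs.

```python
import os


def shard_ranges(n_items, shard_size):
    """Return consecutive (start, stop) index pairs covering range(n_items)."""
    return [(start, min(start + shard_size, n_items))
            for start in range(0, n_items, shard_size)]


def write_shards(image_pairs, labels, output_dir, shard_size):
    """Write consecutive slices to data<i>.h5 and list them in hdf5_files.txt.

    Hypothetical helper: image_pairs is assumed to have shape
    (N, 2*channels, height, width) and labels to hold N values.
    """
    # Third-party imports kept local so shard_ranges works without them.
    import h5py
    import numpy as np

    list_path = os.path.join(output_dir, 'hdf5_files.txt')
    with open(list_path, 'w') as list_file:
        ranges = shard_ranges(len(labels), shard_size)
        for i, (start, stop) in enumerate(ranges):
            h5_path = os.path.join(output_dir, 'data' + str(i) + '.h5')
            with h5py.File(h5_path, 'w') as h5f:
                # Dataset names match the tops of the HDF5Data layer.
                h5f.create_dataset(
                    'pair_data',
                    data=np.asarray(image_pairs[start:stop], dtype=np.float32))
                h5f.create_dataset(
                    'sim',
                    data=np.asarray(labels[start:stop],
                                    dtype=np.uint8).reshape(-1, 1))
            list_file.write(h5_path + '\n')
    return list_path
```

Because the slices are consecutive, the 30-file run and the 1-file run written this way contain the same pairs in the same order.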


When I create 1 train .h5 file with 60000 pairs and 1 test .h5 file with 10000 pairs and train my network on that data, I get the following results:
http://i.imgur.com/TJGpfc4.png
However, when I create 30 train .h5 files and 5 test .h5 files with 2000 pairs in each file and train my network with the same parameters I used with 1 .h5 file, I get worse results:
http://i.imgur.com/v8ii9Dm.png

The test loss also behaves differently in the 2 runs: after 50000 iterations with 1 file the test loss ends up around 0.02, but with 30 train files it ends up close to 0.07.

### Steps to reproduce

1. Generate 60000 pairs for the siamese neural network and save them.
2. Create 1 .h5 file with all 60000 pairs generated in step 1; train the siamese neural network with that HDF5 input.
3. Create 30 .h5 files with 2000 pairs each; train the siamese neural network with that HDF5 input.
4. Compare the results of both training runs by monitoring the test loss and by using the IPython notebook from the siamese example directory in the Caffe git repository.
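Before comparing the two training runs, it may help to rule out a difference in the data itself. The sketch below reads both list files and checks that the concatenated datasets are identical. The function names are hypothetical, and it only compares the stored arrays in list-file order; it says nothing about how Caffe iterates over the files.

```python
def parse_list_lines(lines):
    """Return the non-empty, stripped paths from the lines of a list file."""
    return [line.strip() for line in lines if line.strip()]


def load_concatenated(list_path, dataset_name):
    """Concatenate one dataset across all files named in list_path."""
    # Third-party imports kept local so parse_list_lines works without them.
    import h5py
    import numpy as np

    with open(list_path) as list_file:
        h5_paths = parse_list_lines(list_file)
    parts = []
    for h5_path in h5_paths:
        with h5py.File(h5_path, 'r') as h5f:
            parts.append(h5f[dataset_name][...])  # read the whole dataset
    return np.concatenate(parts, axis=0)


def same_data(single_list, multi_list, names=('pair_data', 'sim')):
    """True if both list files describe identical data, in the same order."""
    import numpy as np

    return all(
        np.array_equal(load_concatenated(single_list, name),
                       load_concatenated(multi_list, name))
        for name in names)
```

Calling `same_data` with the paths of the two `hdf5_files.txt` files (1-file and 30-file variants) should return `True` if the split did not change or reorder any pair.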

### System configuration
Operating system: Ubuntu 16.04
Compiler:
CUDA version (if applicable): 7.5
CUDNN version (if applicable):
BLAS:
Python: 2.7

