add loss_mean_squared_per_channel #1863

Merged
davisking merged 9 commits into davisking:master from arrufat:loss-channel
Aug 28, 2019

@arrufat arrufat commented Aug 19, 2019

This pull request adds support for a loss that is computed independently on each plane (channel) of the output tensor.
This loss is useful for estimating keypoints (for pose estimation, etc.).
The article that motivated me to add this kind of loss was Simple Baselines for Human Pose Estimation and Tracking.
I have successfully trained a network that replicates that article with this loss.

As a unit test, I added a small example that takes a matrix with 9 white dots as an input and outputs 9 channels, each one with a white dot if there was a dot at the associated position.

Sorry for the delay in making this PR... I couldn't think of a simple example (that can actually be used to learn how to use the loss)
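
As a rough illustration of what "computed independently on each plane" means, here is a minimal sketch in plain C++ without dlib. The names `plane_t`, `plane_mse` and `per_channel_mse` are hypothetical, and averaging over pixels and then over channels is an assumption about the normalization, not a statement of what the code in dlib/dnn/loss.h does.

```cpp
#include <cstddef>
#include <vector>

// One plane (channel) of the output tensor, stored row-major.
// Hypothetical type used only for this sketch.
using plane_t = std::vector<float>;

// Mean squared error of a single plane against its ground truth.
double plane_mse(const plane_t& output, const plane_t& truth)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < output.size(); ++i)
    {
        const double diff = static_cast<double>(output[i]) - truth[i];
        sum += diff * diff;
    }
    return sum / output.size();
}

// Average of the per-plane errors. Each plane contributes its own
// error term, so no channel depends on any other channel.
// Assumption: the planes are averaged with equal weight.
double per_channel_mse(const std::vector<plane_t>& output,
                       const std::vector<plane_t>& truth)
{
    double sum = 0.0;
    for (std::size_t k = 0; k < output.size(); ++k)
        sum += plane_mse(output[k], truth[k]);
    return sum / output.size();
}
```

For keypoint estimation, each plane would hold the heat map of one keypoint, so `output[k]` and `truth[k]` are compared only with each other.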

@davisking davisking left a comment

Thanks for the PR :)

Just a few minor things and then it's ready to merge.

Comment thread dlib/dnn/loss.h Outdated
Comment thread dlib/dnn/loss.h Outdated
Comment thread dlib/dnn/loss.h

arrufat commented Aug 19, 2019

I've updated with the requested changes. Your suggested name is more self-explanatory.

arrufat commented Aug 21, 2019

My last commit fixed a build error on the Windows checks (it built fine on my machine with GCC 9.1).
Is there anything else I need to do?

@davisking

Thanks, the code looks good. I tried running the test and it passes, but I notice the loss only drops a little. Is it possible to change the test so that it trains to a low loss? Otherwise it's not obvious that it's working. For instance, you could have it do something simple, like training 10 linear functions, rather than using the more complex network that's in the test now.

@davisking

For instance, make up 10 random linear functions, then use them to make training data. E.g. make a random 10x5 matrix w and a random 5x1000 matrix x, and set y to w*x. Train a network with just one linear layer to map x to y. It should be able to learn it to basically 0 error and do so very quickly.
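
The suggestion above can be sketched without dlib as a plain least-squares fit. This is only an illustration of the idea, not the test that ended up in dlib/test/dnn.cpp. The helper names, the uniform [-1, 1] sampling, the fixed seed, the learning rate of 1.0 and the 200 full-batch gradient steps are all assumptions made for this sketch.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using matrix_t = std::vector<std::vector<double>>;

// Fill a rows x cols matrix with uniform values in [-1, 1].
matrix_t random_matrix(std::size_t rows, std::size_t cols, std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    matrix_t m(rows, std::vector<double>(cols));
    for (auto& row : m)
        for (auto& v : row)
            v = dist(rng);
    return m;
}

// Plain matrix product a * b.
matrix_t multiply(const matrix_t& a, const matrix_t& b)
{
    matrix_t c(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t j = 0; j < b[0].size(); ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Mean squared error between two equally sized matrices.
double mean_squared_error(const matrix_t& a, const matrix_t& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            sum += (a[i][j] - b[i][j]) * (a[i][j] - b[i][j]);
    return sum / (a.size() * a[0].size());
}

struct fit_result { double error_before; double error_after; };

// Build y = w * x from random w (10x5) and x (5x1000), then recover w
// with full-batch gradient descent on a single linear layer w_hat.
fit_result fit_random_linear_functions(unsigned seed = 0)
{
    std::mt19937 rng(seed);
    const std::size_t num_outputs = 10, num_inputs = 5, num_samples = 1000;
    const matrix_t w = random_matrix(num_outputs, num_inputs, rng);
    const matrix_t x = random_matrix(num_inputs, num_samples, rng);
    const matrix_t y = multiply(w, x);

    matrix_t w_hat(num_outputs, std::vector<double>(num_inputs, 0.0));
    const double error_before = mean_squared_error(multiply(w_hat, x), y);

    const double learning_rate = 1.0;  // assumption, chosen for this data scale
    for (int step = 0; step < 200; ++step)
    {
        const matrix_t pred = multiply(w_hat, x);
        for (std::size_t i = 0; i < num_outputs; ++i)
            for (std::size_t k = 0; k < num_inputs; ++k)
            {
                // Gradient of the mean squared residual w.r.t. w_hat[i][k].
                double grad = 0.0;
                for (std::size_t j = 0; j < num_samples; ++j)
                    grad += (pred[i][j] - y[i][j]) * x[k][j];
                w_hat[i][k] -= learning_rate * grad / num_samples;
            }
    }
    return {error_before, mean_squared_error(multiply(w_hat, x), y)};
}
```

Because y is an exact linear function of x, the fitted error should fall to essentially zero, which is the behaviour the test is meant to demonstrate.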

arrufat commented Aug 21, 2019

Ok, I'll try that. I also noticed the loss drops only a little, and I think it's because predicting everything as 0 already gives a pretty low error (since there's only one pixel that would differ). I'll think of a more obvious test.
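
To put a number on that observation, here is a small sketch of the all-zero baseline. The function name, the choice of 1 for the dot value and the plane sizes are hypothetical; the actual dimensions used in the test are not stated in this thread.

```cpp
#include <cstddef>
#include <vector>

// MSE of an all-zero prediction against a rows x cols target plane that
// is zero everywhere except for a single pixel set to 1.
// Hypothetical helper, used only to illustrate the baseline error.
double all_zero_baseline_mse(std::size_t rows, std::size_t cols)
{
    std::vector<float> target(rows * cols, 0.0f);
    target[target.size() / 2] = 1.0f;  // the single "white dot"

    double sum = 0.0;
    for (float t : target)
        sum += (0.0 - t) * (0.0 - t);  // prediction is 0 at every pixel
    return sum / target.size();
}
```

With a hypothetical 32x32 plane the baseline is already 1/1024, so a network that learns nothing useful still reports a small loss, and there is little room for the loss to drop visibly.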

Comment thread dlib/test/dnn.cpp Outdated
@davisking

That’s better. But a little linear thing that dropped the loss really low would be nice.

The unit tests also aren’t a good way to do documentation. That kind of thing should go in a fully documented example program (you don’t have to make one for this though).

arrufat commented Aug 22, 2019

Ok, I understand. I'll redo the tests then, probably during this weekend :)

Edit: I did this because I sometimes find myself reading the unit tests to better understand how to use some parts of the library.

arrufat commented Aug 24, 2019

I've implemented the suggested linear test. I did not fiddle much with the parameters, but I'm not very happy with the results. Maybe learning that many random functions is not easy...

  • error_before = 2.03315
  • error_after = 0.216274

For the moment, I haven't removed the previous test. Please let me know what you think, or if I misunderstood your test.

@davisking davisking left a comment

The new tests look good. With some minor tweaks they will probably give really low error.

Comment thread dlib/test/dnn.cpp Outdated
Comment thread dlib/test/dnn.cpp Outdated
Comment thread dlib/test/dnn.cpp Outdated

arrufat commented Aug 25, 2019

Thank you for all the suggestions. I've simplified the MSE computation and reduced the dimensions of the data, but it still does not converge to 0 as fast as I'd like:

  • error_before: 2.95371
  • error_after: 0.137677

arrufat commented Aug 25, 2019

I suspect the tests may be too random. Maybe I need to generate a more structured weight matrix w?

@davisking

Don’t call set_max_num_epochs. That’s probably telling it to stop early.

arrufat commented Aug 25, 2019

The only reason I set the value to 160 is that training stops at epoch 157 (where it reaches a learning rate of 1e-7). I've also tried a smaller learning rate, but I get the same results.

arrufat commented Aug 26, 2019

I've modified the network a little bit:

using net_type = loss_mean_squared_per_channel_and_pixel<num_channels,
                    extract<0, num_channels, 1, dimension,
                    fc<num_outputs,
                    relu<bn_fc<fc<500,
                    input<matrix<float>>>>>>>>;

Now, the errors look like this:

  • error_before: 2.25673
  • error_after: 0.0214044

@davisking

Cool. That's good enough for me :)

Remove the non-simple version of the test and the PR is good to go.

arrufat commented Aug 28, 2019

Great! Thank you! I'm probably more happy than you about this PR (my first serious one).
I wanted to contribute to dlib so badly!!

@davisking

No problem, thanks for making the PR :)

@davisking davisking merged commit 170877d into davisking:master Aug 28, 2019
bkornel added a commit to bkornel/dlib that referenced this pull request Sep 18, 2019
* add loss_mean_squared_per_channel (davisking#1863)

add loss_mean_squared_per_channel_and_pixel

nidegen pushed a commit to kapanu/dlib that referenced this pull request Sep 23, 2020
add loss_mean_squared_per_channel_and_pixel