
Review Geoffrey Hinton's talk titled "What's wrong with convolutional nets?" #157

Closed
quinnliu opened this issue Dec 22, 2014 · 15 comments


@quinnliu added the Research label Dec 22, 2014
@quinnliu added this to the MARK II milestone Dec 22, 2014
@quinnliu commented Dec 22, 2014
@5:01 "capsules receives multi-dimensional prediction vectors from capsules in the layer below
and looks for a tight cluster of predictions."

"If it finds a tight cluster, it outputs:

    1. A high probability that an entity of its type exists in its domain.
    1. The center of gravity of the cluster, which is the generalized pose of that entity.
      "

"This is very good at filtering out noise because high-dimensional coincidences do not happen by chance."
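Not from the talk: a minimal NumPy sketch of the agreement test described above. The talk does not give a concrete tightness measure, so the variance-based score and the exp() squashing below are assumptions for illustration.

```python
import numpy as np

def capsule_agreement(predictions):
    """Toy agreement test for one capsule.

    predictions: (n_lower, pose_dim) pose predictions sent up by
    lower-level capsules.
    Returns (probability that the entity is present, generalized pose).
    """
    center = predictions.mean(axis=0)          # center of gravity of the cluster
    spread = np.mean(np.sum((predictions - center) ** 2, axis=1))
    probability = np.exp(-spread)              # tight cluster -> high probability
    return probability, center

rng = np.random.default_rng(0)
# A high-dimensional coincidence: 5 capsules agree on the same 6-D pose.
agree = np.tile([1.0, 2.0, 0.5, 0.0, 0.3, 1.1], (5, 1)) + 0.01 * rng.standard_normal((5, 6))
noise = rng.standard_normal((5, 6))            # no agreement, just noise

print(capsule_agreement(agree)[0])             # ~1.0: entity present
print(capsule_agreement(noise)[0])             # ~0.0: filtered out as noise
```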

@quinnliu commented Dec 22, 2014

Four arguments against pooling

  • It is a bad fit to the psychology of shape perception:
    • It does not explain why we assign intrinsic coordinate frames to objects and why they have such huge effects
  • It solves the wrong problem
    • We want equivariance, not invariance. Disentangling rather than discarding (see the sketch after this list).
  • It fails to use the underlying linear structure
    • It does not make use of the natural linear manifold that perfectly handles the largest
      source of variance in images
  • Pooling is a poor way to do dynamic routing
    • We need to route each part of the input to the neurons that know how to deal with it.
      Finding the best routing is equivalent to parsing the image
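To make the invariance-vs-equivariance point concrete, here is a toy NumPy contrast (my own illustration, not from the talk): max pooling returns the same output when a part moves, discarding its position, while an equivariant code keeps a pose output that moves with the part.

```python
import numpy as np

x = np.zeros(8)
x[2] = 1.0                    # a part at position 2
x_shifted = np.roll(x, 1)     # the same part moved to position 3

# Pooling: invariance. Both inputs pool to the same value,
# so the part's position has been discarded.
print(x.max(), x_shifted.max())          # 1.0 1.0

# Equivariance: keep the "what" and let the "where" change with the input.
def equivariant(v):
    return v.max(), int(v.argmax())      # (presence, pose)

print(equivariant(x))                    # (1.0, 2)
print(equivariant(x_shifted))            # (1.0, 3)
```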
@quinnliu commented Dec 22, 2014

Some psychological evidence that our visual systems impose coordinate frames in order to represent shapes.

@quinnliu commented Dec 22, 2014

@11:07 He says the "linear manifold used in computer graphics makes it very easy to deal with viewpoints." TODO: what is this exactly?

@11:16 "two things that have no problem with viewpoint. One is our brains and the other is computer graphics and therefore they work the same way." TODO: what is this exactly?
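A hedged partial answer to the first TODO: in computer graphics a viewpoint is an explicit pose (a homogeneous transformation matrix), and a viewpoint change is a single matrix multiplication, i.e. a linear operation in pose space. A minimal 2-D sketch (my own illustration, not from the talk):

```python
import numpy as np

def pose(theta, tx, ty):
    """2-D homogeneous pose: rotation by theta plus translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0., 0., 1.]])

object_pose = pose(0.2, 1.0, 3.0)
viewpoint_change = pose(1.5, -4.0, 2.0)   # a large rotation and translation

# In graphics, any viewpoint change is one matrix multiply, however big
# the change -- viewpoint is handled linearly in pose space.
new_pose = viewpoint_change @ object_pose
print(np.round(new_pose, 3))
```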

@quinnliu commented Dec 22, 2014

@24:06 Conclusions of argument 1

  • People use rectangular coordinate frames embedded in objects.
    • Using a different frame totally changes the perception.
  • The relation between an object and the viewer is represented by a whole
    bunch of active neurons that capture different aspects of the
    relationship, not by a single neuron or a coarse-coded set of neurons.
@quinnliu commented Dec 22, 2014

@24:53 "If you want high accuracy for exactly where something is, you want Neurons with receptive fields that overlapp a whole lot" TODO: very interesting. How can you test this with WalnutiQ?
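One way to see why overlap helps (a toy illustration, not from the talk, assuming Gaussian tuning curves): a population of broadly overlapping receptive fields can be decoded to a position far finer than any single field's width.

```python
import numpy as np

centers = np.linspace(0, 10, 11)   # 11 neurons spaced 1 unit apart
width = 2.0                        # receptive fields overlap a whole lot

def responses(stimulus_pos):
    """Gaussian tuning-curve response of every neuron to one stimulus."""
    return np.exp(-((stimulus_pos - centers) ** 2) / (2 * width ** 2))

def decode(r):
    """Population-vector readout: response-weighted centroid."""
    return np.sum(r * centers) / np.sum(r)

print(decode(responses(5.2)))      # ~5.2: far finer than the 2.0 field width
```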

@quinnliu commented Dec 22, 2014

TODO: consider changing receptive field sizes as you go up the hierarchy.

@quinnliu commented Dec 22, 2014

@26:44 Two types of equivariance

  • If a low-level part moves to a very different position it will be represented by a different
    capsule.
    • This is "place-coded" equivariance
  • If a part only moves a small distance it will be represented by the same capsule but
    the pose outputs of the capsule will change.
    • This is "rate-coded" equivariance

TODO: VERY IMPORTANT TO UNDERSTAND as this principle is directly related to HTM and WalnutiQ brain models.

  • Higher-level capsules have bigger domains, so low-level place-coded equivariance
    gets converted into high-level rate-coded equivariance (see the sketch below).
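A toy sketch of both codes and of the conversion (my own illustration; the one-dimensional tiling and the domain sizes are assumptions):

```python
LOW = 4.0    # spatial domain of a low-level capsule (toy value)
HIGH = 16.0  # bigger domain of a higher-level capsule

def encode(position, domain):
    """(which capsule fires, its pose output) for a part at `position`."""
    return int(position // domain), position % domain

print(encode(1.0, LOW))    # (0, 1.0)
print(encode(1.5, LOW))    # (0, 1.5)  small move: same capsule, new pose (rate-coded)
print(encode(9.0, LOW))    # (2, 1.0)  big move: a different capsule (place-coded)

# With the bigger high-level domain, the same big move stays inside one
# capsule, so it shows up as a pose change: place-coded -> rate-coded.
print(encode(1.0, HIGH))   # (0, 1.0)
print(encode(9.0, HIGH))   # (0, 9.0)
```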
@quinnliu commented Dec 22, 2014

Argument 3: Extrapolating shape recognition to very different viewpoints

  • Current neural net wisdom:
    • Learn different models for different viewpoints.
    • This requires a LOT of training data.
  • A much better approach:
    • The manifold of images of the same rigid shape is highly non-linear in the space of
      pixel intensities.
    • Transform to a space in which the manifold is globally linear (i.e. the graphics
      representation that uses explicit pose coordinates).
    • This allows massive extrapolation (see the sketch below).
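A minimal sketch of the linear-manifold point (my own illustration, assuming a capsule-style part-whole relation): in pose coordinates the relation between a part and its whole is one fixed matrix, so a relation observed at a single viewpoint extrapolates to arbitrarily different viewpoints by plain matrix multiplication.

```python
import numpy as np

def pose(theta, tx, ty):
    """2-D homogeneous pose: rotation by theta plus translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0., 0., 1.]])

# A fixed part-whole relation in pose space, e.g. "the nose sits 1.5
# units above the face's center" -- one matrix, independent of viewpoint.
nose_in_face = pose(0.0, 0.0, -1.5)

for theta in (0.0, 0.7, 3.0):        # 3.0 rad is far outside "training" views
    face = pose(theta, 2.0, 5.0)
    nose = face @ nose_in_face       # still just one linear operation
    print(np.round(nose[:2, 2], 3))  # predicted nose position in the image
```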
@quinnliu commented Dec 22, 2014

To be continued at 29:47...

@quinnliu commented Dec 24, 2014

@35:25 How do convnets do routing?

  • In each pool, the output neuron only attends to the most active neuron in the pool.
    • This is a very primitive way to do routing.
  • A much better routing principle is to send the information to the capsule in the layer above
    that is best at dealing with it.
    • This can be done by getting a capsule to ask for more input from lower-level capsules that
      vote for its cluster and less input from lower-level capsules that vote for its outliers.
    • The total weight on the bottom-up inputs supplied by one lower-level capsule is at most 1 (see the routing sketch below).
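The talk predates a published algorithm for this, so the sketch below follows the routing-by-agreement procedure later published in "Dynamic Routing Between Capsules" (Sabour, Frosst & Hinton, 2017), which matches the description above: each lower-level capsule's coupling weights are a softmax over the higher-level capsules (so they sum to 1), and agreeing with a cluster increases the weight.

```python
import numpy as np

def squash(s):
    """Shrink short vectors toward 0, long vectors toward unit length."""
    n2 = np.sum(s ** 2)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + 1e-9)

def route(u_hat, n_iters=3):
    """u_hat: (n_lower, n_upper, dim) pose predictions from lower capsules."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                 # routing logits
    for _ in range(n_iters):
        # Softmax over upper capsules: each lower capsule's total
        # bottom-up weight is exactly 1.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        v = np.stack([squash((c[:, j, None] * u_hat[:, j]).sum(axis=0))
                      for j in range(n_upper)])
        # Agreement step: lower capsules whose predictions line up with an
        # upper capsule's cluster get more weight there next iteration.
        for j in range(n_upper):
            b[:, j] += u_hat[:, j] @ v[j]
    return v

outputs = route(np.random.default_rng(0).standard_normal((10, 3, 4)))
```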
@quinnliu commented Dec 24, 2014

TODO: explore dynamic routing driven by interaction with the input image, using top-down feedback.

@quinnliu commented Dec 24, 2014

@44:00 Getting lost about how capsules work...

@quinnliu commented Dec 24, 2014

In summary, two important things to understand:

  1. Higher-level capsules have bigger domains so low-level place-coded equivariance gets converted into high-level rate-coded equivariance.
  2. A much better routing principle is to send the information to the capsule in the layer above that is best at dealing with it.
@quinnliu closed this Dec 24, 2014
@ozzie00 commented Dec 14, 2017

It is very clear, thanks.
