
I am in the process of redefining input to normalize it for illumination. This is based on a very general principle:

entropic equalization is proportional to proximity, which means that long-range variation is higher than short-range variation.
boris-kz committed Jul 22, 2018
1 parent 036e95c commit 9d847d5cb1a6e7b596d239522a10b4821fe9a889
Showing with 15 additions and 3 deletions.
  1. +15 −3 line_POC.py
@@ -3,14 +3,26 @@
from time import time
from collections import deque
''' core algorithm level 1: 1D-only proof of concept,
applied here to process lines of grey-scale pixels but not effective for recognition of 2D images.
'''
Core algorithm level 1, 1D-only, applied here to process lines of grey-scale pixels but not effective for recognition of 2D images.
Cross-comparison between consecutive pixels within horizontal scan line (row).
Resulting difference patterns dPs (spans of pixels forming same-sign differences)
and relative match patterns vPs (spans of pixels forming same-sign predictive value)
are redundant representations of each line of pixels.
I am in the process of redefining input to normalize it for illumination. This is based on a very general principle:
entropic equalization is proportional to proximity, which means that long-range variation is higher than short-range variation.
Thus, most of what we perceive is diffuse impact of long-range variation: reflected light is far more common than emitted light.
Such combined-range impact should be disentangled, to insulate local information from variation in longer-range illumination.
Tentative ways to adjust for variation in illumination:
- albedo: brightness / maximal brightness, is more predictive than absolute brightness, but can't be learned from illuminated image.
- lateral ratio of brightness between pixels is invariant to illumination (will match between images with different illumination),
- but it's not compressive and won't match within an image, selectively adjusted on the next level?
- relative match: match / difference, is more predictive than absolute match, same ratio to replace difference?
postfix '_' denotes array name, vs. identical name of array elements '''
@@ -151,7 +163,7 @@ def frame(frame_of_pixels_): # postfix '_' denotes array name, vs. identical na
return frame_of_patterns_ # frame of patterns is output to level 2
# from scipy import misc
# f = misc.face(gray=True) # input frame of pixels
# f = misc.face(gray=True) # input pix-mapped image
# f = f.astype(int)
argument_parser = argparse.ArgumentParser()
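Below is a minimal standalone sketch (not part of line_POC.py; array names are illustrative) of the invariance claim in the new docstring: a uniform illumination change rescales lateral differences between adjacent pixels but leaves lateral ratios unchanged.

import numpy as np

scan_line = np.array([12., 15., 20., 18., 25.])   # toy row of grey-scale pixels
brighter = scan_line * 1.8                        # same surface, uniformly brighter illumination

diff_dim, diff_bright = np.diff(scan_line), np.diff(brighter)   # lateral differences
ratio_dim = scan_line[1:] / scan_line[:-1]                      # lateral ratios
ratio_bright = brighter[1:] / brighter[:-1]

print(np.allclose(diff_dim, diff_bright))    # False: differences scale with illumination
print(np.allclose(ratio_dim, ratio_bright))  # True: ratios are invariant to uniform illumination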

24 comments on commit 9d847d5

Twenkid Aug 4, 2018

Collaborator
  • For adding something, and regarding the 3D reconstruction ("reverse graphics") and "unentanglement" of light.
    The note sounds like an overgeneralization, though, and light is in 3D.

It's true that the mapping from the pixels' coordinates, brightness and colour to the properties of the light source(s) gives the values of the pattern of the reconstructed third dimension of the image as a sensory input, which turns the image into a 3D scene.

  • lateral ratio of brightness between pixels is invariant to illumination (will match between images with different illumination)

Is it? It could be, in an environment where you've set it up that way. The same image with different illumination, or different images with the same illumination? It's possible for flat planes/textures (2D images in an "empty space") illuminated with an artificial, uniform ambient light.

What about shadows, ambient light, ambient occlusion, reflection? What about semi-transparency as well, fog, aerial perspective (long-distance blur and contrast reduction)? There's also one specific surface property: when a light ray hits the surface at a specific angle, it reflects almost all of the light; the law is not linear.

Also, if you are reasoning about illumination, you're supposed to think of scenes, not images.
https://en.wikipedia.org/wiki/Global_illumination

BTW, do you know about histogram equalization?

One simple early discovery could be the light source and its coordinates as a pattern: the highest brightness, induced as a "source" because the brightness diminishes in the directions away from it, so the rest "consumes" it. However, there are reflections and other material properties, different reflectivities and angles (presumably/initially unknown), where the brightness in the projected 2D image depends on the light specifics, the distance, the angle of the surface to the sum of the light rays (the surface normal), and the surface reflectance/refraction properties ("material" specifications in 3D engines).

Yes, initially the brightness map + coordinate distance/other patterns could be assumed as some common "distance" metric, later "unentangled" further.

Requirements for normal rendering:

-- 3D space

-- Coordinates of the light sources, their colour, their intensity and how they spread/how the media affect them (if you start from no assumptions, you don't know how many there are)

-- Surface normals and surface properties ("material" in 3D jargon), because the luminance of a 2D pixel depends, at the very least, on the angle of the light to the specific hit point (see the shading sketch after this list)

-- There's varying ambient light, and depending on the 3D structure there are ambient occlusions (nearby bumps cast shadows around); after each hit of the light rays there are bounces, which add up to the final brightness and colour. Without that, the image looks 3D but with some strange regions, and not photorealistic to a trained eye. Of course, the initial recovered models don't have to be perfect.

-- Other cast shadows - if the "entire scene" (a big enough sphere) is not visible, a piece of 3D geometry which is casting shadows may be missed
-- etc.
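To illustrate the surface-normal item above, here is a minimal Lambertian diffuse-shading sketch (a standard graphics formula, not anything from this repo; all names and values are illustrative):

import numpy as np

def lambert_brightness(albedo, normal, light_dir, intensity, distance):
    # diffuse reflection: brightness ~ albedo * intensity * cos(angle between surface
    # normal and direction to the light), attenuated by inverse-square distance
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    cos_angle = max(0.0, float(np.dot(n, l)))   # back-facing points receive no direct light
    return albedo * intensity * cos_angle / distance ** 2

# same material and light, different surface orientation -> different pixel brightness
print(lambert_brightness(0.6, np.array([0., 0., 1.]), np.array([0., 0., 1.]), 100.0, 2.0))  # 15.0
print(lambert_brightness(0.6, np.array([0., 1., 1.]), np.array([0., 0., 1.]), 100.0, 2.0))  # ~10.6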

This requires exploration/experimentation, a known/discovered depth map (there are such cameras), zoom in/out - another way to induce third-dimension values (variables) as changes in 2D pattern properties, as a scale - plus rotation and other controllable transformations, using the values of the already-known parameters to induce the values of the unknown ones as new patterns.


boris-kz Aug 5, 2018

Owner

Twenkid Aug 6, 2018

Collaborator

, it only matters when light changes within a pattern. Otherwise, there will be comparison by division between strong patterns, which takes care of lighting and a lot other things.

Should the gen. alg. know about "light" without learning? I assume you're working on very primal input processing; it doesn't know whether the light is changing (position, intensity, colour), whether it's the surface geometry, whether there are objects casting shadows, or whether the image is a screen.

The problem is that vPs are basically brightness patterns, they don't look terribly interesting unless defined through relative match = min / diff - ave.

Well, no patterns are interesting individually, by themselves. In ANNs they become interesting when many different patterns and classes are used together to solve a specific problem and do something.

However basic, bright areas are more informative than dark areas.

Why? Did you set it up that way to imply "light", to search for light? (It's true that babies, put in a dark room, scan the area to find light, i.e. searching for light is one of the early exploration behaviors in humans.)

Just because of "bigger summed magnitude"? Aren't the areas with more variations (more frequent and of higher magnitude) more informative? Aren't either dark or light "flat" areas equally informative?

  • lateral ratio of brightness between pixels is invariant to illumination (will match between images with different illumination) Is it?
    I meant the case when nothing or little changes, except for light. It's the definition of albedo: constant percentage of reflected light.

My point is that if you are to learn everything, you don't know whether the changes are caused by light (and what "light" is; is it 3D? you start with 1D). You must first learn what light is and robustly disentangle the cases when the changes in the pictures are due to a particular light and not something else; is it diffuse, directed, reflected...

What about shadows, ambient light, ambient occlusion, reflection? What about semi-transparency as well, fog, aerial perspective (long-distance blur and contrast reduction)? There's also one specific surface property, when the light ray hits the surface at a specific angle, it reflects almost all the light, the law is not linear.
These are not general, they should be learned case-by-case.

Yes, they may cause variations in the picture, and the alg. doesn't know about them; the equal... may not be valid.

BTW, do you know about histogram equalization?
Histograms discard positional information, I don't think it's a good idea.

OK, so you don't know about it - it's a simple way to improve picture quality: it spreads the distribution more evenly across the spectrum and compensates for overexposed or underexposed areas. That's regarding the "entropy" stuff.
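For reference, a minimal numpy sketch of global histogram equalization for an 8-bit grey-scale image (a textbook version, not part of this repo; the function name is illustrative):

import numpy as np

def equalize_histogram(img):
    # img: 2D uint8 array; remap intensities through the normalized cumulative
    # histogram so the output levels cover 0..255 more evenly
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[hist > 0][0]                 # cumulative count at the first occupied level
    span = max(int(cdf[-1] - cdf_min), 1)      # guard against a flat (single-level) image
    lut = np.clip(np.round((cdf - cdf_min) / span * 255), 0, 255).astype(np.uint8)
    return lut[img]                            # apply the look-up table per pixel

For a quick test, the scipy image referenced in the commented-out lines of line_POC.py could be used, e.g. equalize_histogram(misc.face(gray=True)).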

One simple early discovery could be of the light source and its coordinates as a pattern: the highest brightness and inducing it as a "source" as the brightness diminishes in the directions out of it, so the rest "consumes" it. However there are reflections and other material properties and different reflectivity and angles (presumably/initially unknown), where the brightness on the projected 2D-image depends both on the light specifics, the distance, the angle of the surface to the sum of the light rays (the surface normal)
Etc. (...)

All these things should be learnable by a general search algorithm, else it's not general. I am only defining comparison and initial input flow format.

Yes. I meant that you can't make precise assumptions about "light" or the correlations between pixel values from an unknown 3D world without taking into account that you don't know the parameters of the scene.

There are ambiguities and many unknown parameters. Ignoring that is overgeneralizing.


boris-kz Aug 7, 2018

Owner

Twenkid Aug 8, 2018

Collaborator

Not explicitly, I was only thinking of "manually" boosting the power of
initial comparison

It's true that ratio (with some precision) seems to be more stable than +/- under changing light in simple/"average" conditions, and thus makes it easier to detect simple patterns like bright area/dark area in a slightly changing environment.

However, aren't you breaking the conceptual incremental hierarchy of the operations if there's no additional theoretical justification for that shortcut? (I don't understand it yet.)

The problem is that vPs are basically brightness patterns, they don't look
terribly interesting unless defined through relative match = min / diff -
ave.

Well, all patterns are not interesting individually for themselves. In the
ANN they become interesting when many different patterns and classes are
used together to solve a specific problem and to do something.

Or refined by recursive internal search

Theoretically, this search may start being productive once the complexity or internal structure/correlations/variety of the patterns and their hierarchies starts to rival that of the input. If you add one (or a few) variables (values) per level, it would take a while.

You can't see much in the dark. And bigger input ~ bigger impact, even if
that impact is reflection. Not always, but there is nothing darker than
empty space.

Dark is not absolute; you can't see much in too-bright conditions either, and while babies scan to find light, they also know to close their eyes or turn aside when they see too bright a light. So moderate levels are the "best".

For something to be seen initially, enough contrast and distinguishable steps are required to detect "shapes"; therefore, for initial seeing (detecting) of shapes, and view-field-wide, the input shouldn't change too quickly - faster than the system can make sense of it.

It doesn't matter whether the samples are 0,1,2,3,4,5 or 250, 251, 252, 253, 254,255, i.e. the differences matter.
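A two-line numpy check of that point, using the example values from the sentence above:

import numpy as np

low, high = np.array([0, 1, 2, 3, 4, 5]), np.array([250, 251, 252, 253, 254, 255])
print(np.array_equal(np.diff(low), np.diff(high)))   # True: identical lateral differences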

Of course, unless the process starts with a lower sample resolution, so that the low end of the spectrum is counted as a straight "0" etc., or the higher brightness is given a higher weight, or the algorithm gives preference to the higher end of the values.

However now I'm questioning that "higher impact" assumption for vision.

It reminds me of the robotic "turtles", which were searching for light sources: http://cyberneticzoo.com/cyberneticanimals/19xx-cybernetic-tortoise-unknown-russian/

Vision is a "detached" abstract sense, it's sensing at a distance and the specific pixel values make no sense for themselves; indeed, the distance can't be estimated by the brightness of the pixels themselves. Only the extreme light makes sense by its brightness alone, but it's repulsive. And the light intensity in between ( actually the higher relative local contrast ) is attractive for going to explore in its direction, even for the babies.

On the other hand, the bigger value == bigger impact == more important makes more sense for other modalities: the mechanical and proprioceptive inputs, as bigger forces, higher speed, higher temperature etc. can be presumed as more "dangerous" for the agent and may require faster reaction to possible changes. On the other hand, faster reactions are not about slow "generalization".

My point is that if you are to learn everything, you don't know whether
the changes are caused by light (and what is "light", is it 3D (you start
with 1D)). You must first learn what is light and robustly disentangle the
cases when the changes in the pictures are due to a particular light and
not something else, is it diffuse, directed, reflected ...
Yes, but the fact that almost all input is reflected vs. emitted may mean
that I may as well start with comparison by division.

Division, for the albedo - so you'll train the algorithm under varying light (training settings), and the constant rate of change of the brightness within parts of the "objects" (coordinates within the patterns) would suggest that this change is both the pattern of light and the properties of the material?

I think the essence of "refl. vs em." is that "light" (the distribution of "average brightness") attenuates steeply from a light source (inverse-square law with distance) and even more quickly, almost immediately, after bouncing, due to low reflectance; thus, for a rough estimate, the re-bounces can be ignored in the basic computation of the direction of the light source. (However, the exceptions and the errors have to be discovered later.)

And yes, "on average"/in the supposed training environment, the light source could be deliberately set to be single/simple and the scene built of "simple" materials, in order not to confuse the algorithm: no mirrors, no semi-transparent balls, no shiny surfaces or multi-layer translucent objects, etc.

Exclusive match, = |min / diff|, is intuitively appealing: it "feels" that
only clean match should count.
Match can also be "cleaned" by subtraction: min - |diff|, although I
currently can formally justify only:
min - |diff/4|: projected match adjusted for negative (
counter-projected) difference,
or
min - |diff/2| -> vP with value adjusted for its redundancy to overlapping
dPs

I have to reflect on that before answering, but what I "feel" is that your approach is preoccupied with finding the solution in one shot, without any mistakes or search - the "theory first" dogma.

IMO the algorithm should be able to do some search within the primary operators too, since it's hard to estimate them blindly; the operators could be parts of the patterns.

It may be computationally cheaper than selecting and using a less efficient operator (if you can't know from the start); it may be done until a "good one" is found, in an ongoing process - say, parallel hierarchies and cross-comparisons between them, then selection of the better one; for different inputs, a different branch may be selected.

The most general algorithm must be general enough to find and adjust the right operators/hierarchy at an "acceptable" cost, but these are matters of choice. You yourself are now questioning the subtraction.

In ANN, they are trying "Neural Turing Machines"
https://arxiv.org/abs/1410.5401
and "Neural Arithmetic Logic Units": https://arxiv.org/pdf/1808.00508.pdf

The research line would probably start to combine them and use different types of architectures as modules and "operators".

In the case of humans, I suppose the brain also may not "know" the right "algorithm" (the best "operators", given its capabilities to adapt) from the first shot. All people's brains may be rediscovering the wheel and making the same mistakes until finding the right track - since it "worked" anyway, "evolution" didn't care that it was inefficient.

Another architecture may be more "aware", flexible and prepared, still being general.

I can see how that's easier on the eye, but for a computer, dynamic range
only saves on bits per pixel?
It shouldn't matter for me, I can do better by buffering derivatives (vs.
pixel) per pattern.

OK. I meant the philosophical side + your mention of entropy equalization as a driving force. A wider and more even spectrum coverage is generally "prettier" than input which is thrown to one or another extreme of the space of inputs.

I am not trying to be precise here, just to improve the averages.

And you want one "average" (anchor) for all cases.


Twenkid Aug 8, 2018

Collaborator

A related concept, regarding histogram equalization and the detection/recognition of "light" as a pattern, is possibly "lightness induction": the perceived intensity in images, adjusted for the reconstructed expected position and qualities of the light sources in a 3D-reconstructed scene.

Our sense of "photorealism" and of correct light coordinates and intensity is usually right; however, it is confused when the clues in the 2D picture are ambiguous, as in the case explained here:

http://artificial-mind.blogspot.com/2012/01/colour-optical-illusions-are-effect-of.html
...

Other papers which popped up:

Marcelo Bertalmío, "From image processing to computational neuroscience: a neural model based on histogram equalization", Front. Comput. Neurosci., 17 July 2014, https://doi.org/10.3389/fncom.2014.00071
https://www.frontiersin.org/articles/10.3389/fncom.2014.00071/full

Alexander D. Logvinenko, John Kane, Deborah A. Ross ..., "Is lightness induction a pictorial illusion?", Perception, 2002, volume 31, pages 73-82
https://pdfs.semanticscholar.org/59c3/a03c71337ac16479c00f4db9005b3fe4207a.pdf


boris-kz Aug 9, 2018

Owner

Twenkid Aug 10, 2018

Collaborator

boris-kz Aug 11, 2018

Owner

It doesn't matter whether the samples are 0,1,2,3,4,5 or 250, 251, 252,
253, 254,255, i.e. the differences matter.

It does matter, but maybe not as much as the differences, in case of
reflected input. Which would mean that vPs are less valuable than dPs. How
would you quantify relative value of m vs. d?

I don't master your arithmetics

Then you can't contribute. If that "arithmetics" is wrong then basic comparison must be redone,
otherwise it must be part of the answer.
For example:

but I may guess, say, that there are (range_of_input_per_sample - len(dP)) possible matches for varying
samples (2,3,4,5...; 3,4,5...) with the same dP, while a different vP would match only in one.

Or vice versa, depending on whether you are looking at matches of inputs or matches of differences.

Also, given an initial value and dP, the following values can be
reconstructed, so maybe 1/len(dP) makes sense as a number, too.

They can be reconstructed from vPs too.

the notion of impact needs specification.
As for light, it sounds as if you're actually starting from it as primary
knowledge - of energy that's propagated?

Starting from complete ignorance, input is impact on a sensor. That's all that blank algorithm knows.
It may later realise that impact itself is not as predictive (matching) as the difference | ratio between impacts, but only after both input match and difference | ratio match are computed. Which would occur sparsely because comparison among differences is conditional.

Whatever the comparands are, match is defined as a common subset, this is tautological.
But comparison generates redundant vPs and dPs, so I use min - (diff | diff/2) to adjust for redundancy.
This adjustment is a recent modification; it replaces specific olp and olp_ in my code.

So, back to the reflected nature of visual input. Given a constant pattern of albedo in the observed object,
the lateral ratio of brightness will be stable under changing lighting, but lateral match and difference won't be.
That ratio would replace difference to define dPs; the question is how to re-define match.

In my scheme, ratio contains both extended match: integer multiple, and remaining miss: fraction.
And clean match is adjusted for redundancy to overlapping miss patterns.
Which can be done in several ways, including:
clean match = min / difference: neither is redefined by division, only the clean-up is.
clean match = multiple / (multiple * fraction): both are redefined by division.
I am still figuring this out.
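A small numeric sketch of the two candidate "clean match" definitions above, transcribed literally for a single pair of comparands (names are illustrative; the zero-difference and zero-fraction cases are ignored here):

p, pri_p = 24, 9                          # two comparands, e.g. laterally adjacent pixels

mn, mx = min(p, pri_p), max(p, pri_p)
diff = mx - mn                            # lateral difference: 15
ratio = mx / mn                           # lateral ratio: 2.666...
multiple = int(ratio)                     # extended match, integer multiple: 2
fraction = ratio - multiple               # remaining miss, fractional part: 0.666...

clean_match_1 = mn / diff                            # min / difference -> 0.6
clean_match_2 = multiple / (multiple * fraction)     # multiple / (multiple * fraction) -> 1.5

print(multiple, fraction, clean_match_1, clean_match_2)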

"Tactile" impact is a different modality and the algo may find some correlations between the two latter.
I shouldn't be thinking about it now.

The future impact (some sum of match, I guess) is quite general, and the
"future" form of the initial call for exploration would be of a different
spatio-temporal range. Local albedo is flat; all the gradients, contours,
... create the scene, their synergy does. A sphere is a sphere whether
it's darker or lighter; also, if the raw input is inverted, it still easily
matches the non-inverted patterns (it is seen as highly correlated). It's
different but contour-wise the same, some depths may be reversed. If you
replace light with dark, the spatial awareness is the same.

Match is a selection criterion, the algo has to know what to search for. I
think the main reason for my "clean match" intuition is the need to adjust for
vP redundancy to overlapping dPs. You can't test for it unless you
represent that redundancy, and that alone might be harder than figuring it
out from first principles.

It asks for real input and interactions.

Ok, work on the code then.


Twenkid Aug 14, 2018

Collaborator

boris-kz Aug 14, 2018

Owner

Starting from complete ignorance, input is impact on a sensor. That's all that blank algorithm knows.

I.e. just data, nothing more conceptual that translates to higher-level
concepts?... If so, IMO calling it "impact" is a bit confusing, because it suggests
an... impact, something more "real" and direct.

It's the only reality that blank alg knows

The mechanical input has real impact right from the beginning, while visual means nothing alone,

Meaning is feedback; the initial assumption is always value = magnitude. This is built into comp, which is a core operation on all modalities and derived variables of the input. That's what makes this alg general.

Higher levels will compare normalized summed match per variable with that of other variables to determine relative predictive value of each. Then that relative higher-level match is fed back as a filter per variable.
We can increase a brightness match filter (ave_m) manually, to reduce the number (thus cost) of vPs.
Maybe down to a single negative vP per input span, if predictive value of min brightness is negligible.

But comp still needs to be defined from first principles: it is used to compare all other variables too. Starting from d: comp(d) forms d_dPs and d_vPs, although with a different ave_dm filter.

Besides, the mechanical "impact" and input is simpler and more reliable early input

That's how it started in simple animals, but it's not a good way to collect information.
There isn't much bandwidth or range in tactile input.
Vision provides ~80% of primary input in humans for a reason, and it's plenty sufficient.

It may later realise that impact itself is not as predictive (matching) as the difference | ratio
between impacts, but only after both input match and difference | ratio match are computed.
Which would occur sparsely because comparison among differences is conditional.
Whatever the comparands are, match is defined as a common subset, this is tautological.
But comparison generates redundant vPs and dPs, so I use min - (diff | diff/2) to adjust for redundancy.
This adjustment is recent modification, it replaces specific olp and olp_ in my code.

Now I see that the formula min - (diff | diff/2) makes sense, compensating for having the same match
but a bigger difference; but shouldn't diff be abs(diff) from the first level?

Yes, sorry for sloppy typing.
The justification for min - |diff|/2 is that |diff| indicates the value of redundant dPs, which adds to the cost.
It's /2 because min represents two comparands but diff only the larger of them.

Or it is in the code?
d += p - pri_p # fuzzy d accumulates differences between p and all prior and subsequent ps...
m = min(p, pri_p)
v += m + pri_m - abs(d + pri_d) /4 - ave *2 # fuzzy v accumulates deviation of match within bilateral...

Yes, though it's a bit out of date: /4 only adjusts for counter-projection of negative d:
/2 to adjust for the lower value of d vs. m, plus another /2 because it's only negative in one of two directions.
Combined with the adjustment for redundancy (above), it would be m - |d| * 3/4, or approximately m - |d|.
It's a justification for evaluating clean match, but I am not sure if that's complete.
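As a worked numeric illustration of those adjustments (a simplified single-comparison version; the code in line_POC.py accumulates bilaterally and subtracts ave*2, and the ave and pixel values here are arbitrary):

ave = 10                                  # arbitrary match filter for this illustration
p, pri_p = 130, 118                       # two adjacent pixels

d = p - pri_p                             # lateral difference: 12
m = min(p, pri_p)                         # lateral match: 118

v_counter_proj = m - abs(d) / 4 - ave     # /4: counter-projection adjustment only -> 105.0
v_redundancy = m - abs(d) / 2 - ave       # /2: redundancy to overlapping dPs only -> 102.0
v_combined = m - abs(d) * 3 / 4 - ave     # combined, approximately m - |d| - ave -> 99.0

print(v_counter_proj, v_redundancy, v_combined)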

So, back to reflected nature of visual input. Given constant pattern of albedo in observed object,
lateral ratio of brightness will be stable under changing lighting, but lateral match and difference won't.
That ratio would replace difference to define dPs, the question is how to re-define match.

"Object" as images of a scene or "something that's watched" (and not subject),

Yes.

Regarding light, I think one reason ANNs need thousands and
thousands of annotated samples per class is that they don't
"understand" light as a pattern or do 3D reconstruction; they don't
work with the "real" models, which are 3D geometrical ones, of the scene,
the predicted lighting and textures, also in 3D (so they can be normalized
back to an orthogonal view). They also don't focus and segment well into "real"
sub-patterns.
However, if such a "normalized", segmented representation is present,
comparison between objects becomes easier, even trivial.
There were 3D-reconstruction algorithms even slightly before the DNN
revolution; now they are much better, and a combination of them with ANNs
could make a leap, whether with some "glue" or as an end-to-end DNN.

That's the idea behind CapsNets, but they don't work very well at the moment.
And then there is the issue of how to do the comparison itself.

https://arxiv.org/abs/1808.02201
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

That's interesting, but it's a combination of different methods, not a general (scalable) solution.
All this segmentation etc. should be done by core comp between various derivatives from previous comps.

In my scheme, ratio contains both extended match: integer multiple, and remaining miss: fraction.
Match is then cleaned-up: adjusted for redundancy to overlapping miss patterns.
Which can be done in several ways, including:
clean match = min / difference: neither is redefined by division, only the clean-up is.
clean match = multiple / (multiple * fraction): both are redefined by division.
I am still figuring this out.

Now it starts to make sense to me, but I also wonder whether it is a matter of
technicality (in the long run).

It's not, we have to account for redundancy, otherwise match evaluation is pretty meaningless.
This redefines comp by division: a core operation in comparing patterns, even if it's not used on initial inputs.

Division - compute-wise I don't know if it is more efficient (I know the "predictive value" is more important),
and floating-point fractions may require more memory (but it depends on the optimality of the
representation; it could be the same if using floating point for everything).

Yes, I don't know if I want to use it on brightness, because lighting is unlikely to change within a pattern,
and there already is a conditional div_comp between patterns.

"Tactile" impact is a different modality and the algo may find some
correlations between the two latter. I shouldn't be thinking about it now.

I disagree here; multimodality and the actuators are crucial for learning...

Sorry, I won't get into this.


Twenkid Aug 15, 2018

Collaborator

boris-kz Aug 16, 2018

Owner

The human Alg also has cheaper-to-discover, "pre-installed" strong sensori-motor-proprioceptive patterns,
found by sending explorative feedback to them: the actuators...

If there is data on sensor-to-object distance, either changing or combined from multiple sensors, great.
If there is a way to change sensor position by feedback, even better.
But neither is necessary: almost all videos show motion, which induces affine transformations, and they are easy to translate into depth. That has to be done anyway; direct depth info is easier but a lot scarcer.

Motor feedback and additional modalities are nice to have, but definitely not necessary and thinking about it now is a distraction. I know you love distractions, but I need to work on the algorithm.


@Twenkid

Twenkid Aug 17, 2018

Collaborator

Regarding the photorealism of simulations: a few days ago a real-time ray-tracing GPU was announced, implying real-time rendering of sophisticated "professional" scenes:

https://nvidianews.nvidia.com/news/nvidia-reinvents-computer-graphics-with-turing-architecture?linkId=100000003236223:

https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/

(I know it's not essential, nor is very high resolution: just more computation for the same underlying patterns.)

@Twenkid

Twenkid Aug 25, 2018

Collaborator

Regarding:

It's not training, I don't have this distinction between training and
inference. Well, except for filter update, but it's not the same.

IMO it'd be interesting if you elaborated conceptually on the filter update (while I understand that the Alg as a whole is about filter update, the whole has to be comprehended).

In the ANN counterpart, stochastic gradient descent is both the "theoretical" part and the adaptive part: a method that converges to "correct" weights over the whole input.
I recalled also that you've mentioned in an email last year that:

"My algorithm is supposed to do what NN do far more efficiently, one of the two is redundant. It's either-or, on all levels." (emails)

It would help if you could clarify, or give some mapping or suggestions that are easier for ANN people to follow: how your filter update is different and why it's better, why it converges faster and without the faults of SGD that require manual corrections.

I know of one:

  1. It's not "stochastic", it's based on real input and patterns and is more analytical and focused on small adjustments per-pattern.
  2. ANN are too holistic, not hierarchical, and not analytical enough.
  3. In image recognition task, they also are not trained the "right way". IMO they (a system) has to learn to see and recognize parts, to understand light etc., to segment the image, and then be presented with the usual tasks of discriminating classes of objects, based on such constituent patterns.

"Catastrophic forgetfulness"

One of the problems of ANNs that prevents them from generalizing and becoming incremental is the so-called "catastrophic forgetfulness" (catastrophic forgetting).
https://steemit.com/science/@firstscience/what-can-and-what-can-not-neural-network-five-minute-guide-for-beginners

I can imagine and explain it by the lack of formation of real "structures" within the network, beyond the fixed "architecture": a set number of neurons, layers and operations such as regularization. All neurons and filters seem to be roughly equally important (per level).

In order to avoid that, some of the elements have to be "frozen". They may try to adjust that in ANN with inclusion of priorities. Another solution I foresee is to remember a multitude of earlier states of the whole network - I think your related term for that is "buffering"?

When processing new input, especially if recognition fails or is poor, the input would be tested through various older/different configurations and states.

I assume that in CgA the freezing is related to the so called "strong patterns"?

Anyway, some patterns or schema which are harder to change need to emerge and orchestrate the others.

The filter updates and the same-filter-span grouping seem to map to the "learning rate" in ANN, and they're supposed to be small enough to avoid catastrophic forgetting or "swings".

I think the forgetting could be mitigated by more levels and domains, i.e. fewer patterns per level and scope, and a smaller "learning rate": filter-adjustment steps. However, the latter comes with more input, more buffering and slower learning.

Another conceptual solution I see is for the Alg to intrinsically select and keep only the "really general" patterns: ones which are always present per level and scope, given the representation of the Alg, so these patterns are constantly "reminded" by the input itself. That would also suggest the code of the algorithm.

...

  • I remember I might have asked something about that "freezing" or something, but it was long ago.
@boris-kz

boris-kz Aug 26, 2018

Owner

Regarding:

It's not training, I don't have this distinction between training and
inference. Well, except for filter update, but it's not the same.

IMO it'd be interesting if you elaborate conceptually about the filter update (while I understand that the
Alg as a whole is about filt.upd., the whole has to be comprehended).

Ultimate hierarchical algorithm will have three parts:

  • comparison over initial D-cycle: 3 | 4 dimensions,
  • evaluation of filter update for lower levels, with incremental filter complexity,
  • incrementing current comp() for higher level, with additional input complexity

In the counterpart in ANN, the stochastic gradient descent is the "theoretical" part and the adaptive part,
a method that converges to "correct" weights over the whole input.

SGD alters all weights on all nodes. I don't have any weights, filter is a single value per variable type per level, for all of its inputs.
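
A minimal illustration of that contrast (my sketch, with made-up numbers; not actual ANN or CogAlg code):

# ANN: a distinct weight for every connection of every node,
# all of them nudged by SGD on each update
ann_weights = [[0.13, -0.42, 0.08],
               [0.91,  0.27, -0.55]]

# the filters described above: one value per variable type per level,
# shared by all inputs of that level
filters = {
    1: {'match': 8,  'difference': 4},   # level 1
    2: {'match': 32, 'difference': 16},  # level 2
}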

I recalled also that you've mentioned in an email last year that:

"My algorithm is supposed to do what NN do far more efficiently, one of the two is redundant. It's either-or, on all levels." (emails)

It would help if you could clarify, or give some mapping or suggestions that are easier for ANN people to follow: how your filter update is different and why it's better, why it converges faster and without the faults of SGD that require manual corrections.

I think I explained it:
Core operation in cogalg is input-to-input comp(), bit-filtering inputs, integer-filtering matches,
also higher-order filters for higher-power comps.
Core operation in ANN is input matrix * weight matrix, which is a dirty and coarse mix of last-layer
"error" with every weight in the network.
Mine is comparison-first, ANN is comparison-last, and even that comparison is not done right.
How do you compare a clean theory with a bunch of hacks?
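
To make "comparison-first" concrete, a minimal sketch over one line of pixels (my illustration, not the actual line_POC.py code), assuming match = min, miss = signed difference, and a single filter per variable type as described above:

def comp_line(pixels, match_filter=4):
    # cross-compare consecutive pixels: each comparison yields
    # match (min), miss (signed difference), and filtered value
    results = []
    for prior, current in zip(pixels, pixels[1:]):
        match = min(prior, current)
        diff = current - prior
        value = match - match_filter  # one filter per variable type, not per weight
        results.append((match, diff, value))
    return results

print(comp_line([10, 12, 11, 40, 42]))
# [(10, 2, 6), (11, -1, 7), (11, 29, 7), (40, 2, 36)]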

One of the problems of ANNs that prevents them from generalizing and becoming incremental is the so-called "catastrophic forgetfulness".
I can imagine and explain it by the lack of formation of real "structures" within the network, beyond the fixed "architecture": a set number of neurons, layers and operations such as regularization. All neurons and filters seem to be roughly equally important (per level).

It's just generally retarded scaling, because everything is ridiculously coarse.

Another solution I foresee is to remember a multitude of earlier states of the whole network -
I think your related term for that is "buffering"?

Primarily pipelining (my hierarchy is pipeline), secondarily buffering.

When processing new input, especially if recognition fails or is poor, the input would be tested
through various older/different configurations and states.
I assume that in CgA the freezing is related to the so called "strong patterns"?

I don't have any separate testing, everything is done by the same comp().
If by freezing you mean extended / recursive search, yes, that's done within above-average vPs or dPs.
But I don't consider that a failure.

Anyway, some patterns or schema which are harder to change need to emerge and orchestrate the others.
The filter updates and the same-filter-span-thing seem mapped to the "learning rate" in ANN
and they're supposed to be small enough, to avoid catastr.forg. or "swings".

Small or not depends on the input. If you have drastic change in the input, then potential match from comparing before and after goes down. That's not forgetting, just search segmentation.

I think the forgetting could be mitigated by more levels and domains, i.e. fewer patterns per level and scope, and a smaller "learning rate": filter-adjustment steps.
However, the latter comes with more input, more buffering and slower learning.
Another conceptual solution I see is for the Alg to intrinsically select and keep only the "really general" patterns: ones which are always present per level and scope, given the representation of the Alg, so these patterns are constantly "reminded" by the input itself.
That would also suggest the code of the algorithm.

None of that is controlled separately. You simply have hierarchical comparison between filter spans and then between their patterns: if higher levels of compared patterns match, then you dig deeper.
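
A minimal sketch of that "dig deeper" logic (my illustration; the pattern structure here is hypothetical, not CogAlg's actual data layout): patterns are compared by a summary variable first, and only an above-filter match triggers comparison of their elements.

def comp_patterns(p1, p2, match_filter):
    # p1, p2: dicts with a 'summary' value and a list of 'elements'
    top_match = min(p1['summary'], p2['summary'])
    if top_match - match_filter <= 0:
        return top_match, None  # not worth digging deeper
    # higher-level match is above the filter: compare the elements too
    deep = [min(a, b) for a, b in zip(p1['elements'], p2['elements'])]
    return top_match, deep

p1 = {'summary': 20, 'elements': [5, 7, 8]}
p2 = {'summary': 18, 'elements': [6, 6, 9]}
print(comp_patterns(p1, p2, match_filter=10))  # (18, [5, 6, 8])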

@boris-kz

boris-kz Aug 29, 2018

Owner

SGD alters all weights on all nodes. I don't have any weights, filter is a
single value per variable type per level, for all of its inputs.

"Weights" are their and graph theory's term for "variables" or "values"
(one type of them).

My inputs are multivariate patterns, so type has a different meaning here.
I don't think that can be explained before explaining how pattern is formed.
I changed / added two paragraphs to Comparison to ANN and BNN in readme:

It is an inherently statistical method: inputs are summed within samples defined by initially random weights. These weights are trained into meaningful values by Stochastic Gradient Descent, but only after weighted inputs propagate through the whole network, then are compared to a top-layer template to form an error, which then backpropagates through the network again. This cycle is too coarse to scale without supervision or task-specific reinforcement, especially since it must be repeated thousands of times during training.

So, ANN is a comparison-last algorithm. CogAlg is comparison-first: my initial feedback per pixel is simply a prior or adjacent pixel. This is infinitely finer-grained than backpropagation, and resulting patterns are immediately meaningful, if tentative. I also have a higher-order feedback: filters, but they are optional and there is only one filter per pattern’s variable type. ANN has a specific weight for each input: a combination of template and filter. I think these are different orders of feedback, with different scope.
More broadly, SGD minimizes error (my miss), which doesn’t fully correlate with maximizing match.
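
A tiny numeric illustration of that last point (my example, not from the readme): two input pairs can have identical miss but very different match, so minimizing error alone does not rank them by predictive value.

for a, b in [(5, 4), (100, 99)]:
    match = min(a, b)    # shared magnitude
    miss = abs(a - b)    # error / difference
    print(a, b, "miss =", miss, "match =", match)
# both pairs have miss = 1, but match is 4 vs 99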

Yes, too much interdependence and too many variables for each pattern.

The problem is that they mix different orders of feedback: templates and filters.
They should have different specificity: template per search range, filter per filter span.

FILTERS

IMO these "filters-filtering filters" would benefit from semantic
adjustments and an "essay" on filters, because ANN had also the term with
different meaning and different application, and weights are also called
"filter values" :
https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
The ANN filters also do "filter", allowing the previous values to pass, or
be amplified or zeroed.
For an image-processing aware persons to/a "filter" generally means
"Convolution", e.g. Gaussian blur, which explains why they call the
matrices multiplication and the respective matrices "filters"

The best way to explain is to show how they are used.

CgA still has no executable practical code and no material feedback; that is one reason why it's harder to compare it...

Yes, those who don't care about theory won't be impressed.
That's not my target audience.

As for the hacks, it's assumed that humans, and their brain in particular, are a product of "hacks".

And it shows.

Overall, there is evidence that ad-hoc methods with shorter feedback are more successful than over-planned ones in this Universe; they also converge to higher structures even without the higher level of cognition/government being aware of all the details.

This "universe" never had introspection before.
We do have introspection, but it comes with severely retarded attention span and integrity.
If you fix the latter, previous "evidence" is irrelevant.

With so many practitioners and ad-hoc, "glue-code" solutions published every day with immediate feedback, the progress is amazing.
Even if it's not yet encompassed by a general theory of the CogAlg type, from the pixel to the concepts*, it could be assembled ad hoc, using NNs as subsystem elements of another general model which just uses their output.

That's possible, but where is your part in it?
Also, that hack will have to be supervised long before it can scale to unsupervised.
And I will know where to guide it.

  • BTW, in fact there is a theory for some domain, it's just "too simple":
    just find and compute or apply the mutual probabilities of the input
    variables, given a data set and selected constraints. "Training" is mapping
    different probabilities to labels (higher level patterns), which allows for
    selective access of the lower patterns (the distributions over the input
    domain). When applied at pixel-level, it produces similar images to the
    experienced ones or modifies them in an interesting way (auto-encoders,
    GANs).

Yes, "theory" is relative. My theory is more like a theorem: deduction from the definition, although still partly intuitive.

The early test of CogAlg's first level suggested a huge memory footprint even there, for a single 1024x768 image or so.

I'm not concerned about efficiency on lower levels, as long as costs and benefits are explicit.
The latter is what makes it expensive on lower levels, but keeping that info is the only way
to select for (benefits - costs) on higher levels, which is where it really matters.
Anyway, I have specific reasons for each operation, you are welcome to address them.

@Twenkid

Twenkid Sep 12, 2018

Collaborator

With so many practitioners and ad-hoc and "glue-code" solutions, published
every day with immediate feedback, the progress is amazing.
Even if it's not encompassed with a general theory yet of CogAlg type, from
the pixel to the concepts*, it could be made up ad-hoc using the NN as
subsystems-elements of another general model which just uses their output.

That's possible, but where is your part in it?
Also, that hack will have to be supervised long before it can scale to unsupervised.

Boris, did you read about HyperNet?

A novelty which nests other networks. See the abstract and the keywords.

"HyperNets and their application to learning spatial
transformations"

https://arxiv.org/pdf/1807.09226.pdf

Abstract. In this paper we propose a conceptual framework for higher-order artificial
neural networks. The idea of higher-order networks arises naturally
when a model is required to learn some group of transformations, every element
of which is well-approximated by a traditional feedforward network. Thus the
group as a whole can be represented as a hyper network. One of typical examples
of such groups is spatial transformations. We show that the proposed
framework, which we call HyperNets, is able to deal with at least two basic spatial
transformations of images: rotation and affine transformation. We show that
HyperNets are able not only to generalize rotation and affine transformation,
but also to compensate the rotation of images bringing them into canonical
forms.

Keywords: Artificial neural networks, higher-order models, affine transformation,
rotation compensation, ...
