Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add OpenNARS-style Distributor in Bag #84

Merged
merged 5 commits into from
Jan 11, 2024
Merged

add OpenNARS-style Distributor in Bag #84

merged 5 commits into from
Jan 11, 2024

Conversation

maxeeem
Copy link
Collaborator

@maxeeem maxeeem commented Dec 18, 2023

See #83

I added the distributor but looking for your suggestions to see if we can improve the implementation. The issue is that initializing the Distributor object is not exactly quick, there are multiple loops in the __init__ method. To avoid recalculation I added a cache in the form of a static method new but as I am myself new to python, perhaps you guys can think of better ways to optimize the calculations. With this change tests run slower but still less than 1 min.

@maxeeem maxeeem self-assigned this Dec 18, 2023
@maxeeem maxeeem linked an issue Dec 18, 2023 that may be closed by this pull request
@ccrock4t
Copy link
Collaborator

Hi Maxim, I will need more info about what is the Distributor

@maxeeem
Copy link
Collaborator Author

maxeeem commented Dec 19, 2023

@ccrock4t you can see the code from OpenNARS.

As the comments suggest (I will add them to this PR shortly)

A pseudo-random number generator, used in Bag.

And

For any number N < range, there is N+1 copies of it in the array, distributed as evenly as possible

This is all following our discussion over email, regarding selection probability in bag being propositional to priority.

Without this change, the result of test_bag_take_task() looks like this (as @bowen-xu pointed out)

Figure_1

After the change it looks like this which I believe is more along the lines of what we want

Figure_2

@maxeeem maxeeem marked this pull request as ready for review December 19, 2023 17:58
@ccrock4t
Copy link
Collaborator

In terms of speed, it seems like that is the cost of using this algorithm. If we can find out the name of the algorithm, maybe there are discussions online about improving it?

pynars/NARS/DataStructures/_py/Bag.py Outdated Show resolved Hide resolved
pynars/NARS/DataStructures/_py/Bag.py Outdated Show resolved Hide resolved

if self._is_current_level_empty():
self._move_to_next_nonempty_level()
if self._is_current_level_empty() or self.currenCounter == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why we need currentCounter? It seems to hold the most recent index of an Item taken from a Bucket. What is special about index 0 to make Bag change its current Level?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we need to change the level at some point, but why not at every Take()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why we need currentCounter? It seems to hold the most recent index of an Item taken from a Bucket. What is special about index 0 to make Bag change its current Level?

So this is a fun one :) I basically just converted Java to Python and adapted it, so there may be better ways to write all this stuff.

This is related to the THRESHOLD parameter I was asking about over email. It seems similar to the PyNARS take_in_order as far as I understand. In OpenNARS if currentLevel is below THRESHOLD, then currentCounter will be set to 0 and at next call to take we will go inside the conditional, otherwise we will continue taking from the current level until empty.

I suppose we need to change the level at some point, but why not at every Take()?

The way it's currently implemented it is indeed going to change levels at every take since take_in_order is set to True so the implementation is not complete in that regard. In OpenNARS I also wasn't able to find any instances where it would fire off the level completely as THRESHOLD is always set to 1. So I'm not entirely sure what the use case is for this property or how to fully reconcile the two implementations since Bag in PyNARS has quite a few other methods and the code is different compared to OpenNARS. If you guys want, we can have a design discussion and fully document the Bag implementation and figure out all of these edge cases. I would need your help though.

As it is yeah, if take_in_order is set to False then the logic is kinda weird. There isn't a direct match for it in the Java code so looking to you guys for suggestions on how to handle this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the graph is showing the behavior when the level changes every take(), which is good because it matches our expectation. I am not sure about take_in_order itself, but in my opinion for take() we should choose a random level every time, just remove the self.currenCounter unless @bowen-xu sees it should be used for take_in_order.

I agree it is a good idea to document Bag more thoroughly. We could put more info in the Conceptual Design document we have going

Copy link
Owner

@bowen-xu bowen-xu Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take_in_order does not appear in the conceptual design. It can be removed from the code.
I didn't really read the code of Bag in OpenNARS 3.0.4; but I only referred to NARS-Python before. I just found that the previous implementation of Bag in PyNARS is not consistant with OpenNARS.

I suggest that we shall faithfully adopt the orignial design of OpenNARS in terms of Bag. So thanks @maxeeem for helping us consider this issue carefully!

@maxeeem Why not just re-implement the Bag data-structure in python, so that it could be fully the same as the java version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into aligning the implementations and porting it from Java to Python. I did notice that Buffer uses take_in_order=False so we'll need to examine that as well and I imagine Channels which inherits from Buffer.

@ccrock4t
Copy link
Collaborator

ccrock4t commented Dec 21, 2023

It seems a Distributor with value $N$ takes $O(N^2)$ time for initialization, so that is why the slowdown is occurring.

My initial thoughts: the self.order array is being filled one value at a time (in a loop) using local variables rank, time, and index. Can we simply save the state of those variables as members of the object, then calculate the value for self.order on the fly, whenever pick() is called, using the same logic in __init__?

EDIT: it will probably not be that simple

self.levels = tuple(list() for i in range(self.n_levels)) # initialize buckets between 0 and capacity

self.currentCounter = 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "lower camel case" style is better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. Old habits are hard to break :) I'll push up a fix shortly.

@bowen-xu
Copy link
Owner

It seems a Distributor with value N takes O(N2) time for initialization, so that is why the slowdown is occurring.

My initial thoughts: the self.order array is being filled one value at a time (in a loop) using local variables rank, time, and index. Can we simply save the state of those variables as members of the object, then calculate the value for self.order on the fly, whenever pick() is called, using the same logic in __init__?

EDIT: it will probably not be that simple

@ccrock4t Is the slowdown significant? I found that the initialization takes only around 8ms in my laptop.
image

@ccrock4t
Copy link
Collaborator

ccrock4t commented Dec 21, 2023

@ccrock4t Is the slowdown significant? I found that the initialization takes only around 8ms in my laptop.

Not for the current default number of buckets (100), but the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more).

Since it is only a 1-time operation, the slowdown would not really impact user experience during NARS operation. Only on startup. However the memory also takes up $O(n^2)$.

@ccrock4t
Copy link
Collaborator

@bowen-xu

@bowen-xu
Copy link
Owner

bowen-xu commented Dec 21, 2023

@ccrock4t
I agree that "the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more)." However, do we really need 10,000 levels? I think 100 levels is enough, as we don't need such high accuracy for priority. The control mechanism is not accurate.

@ccrock4t
Copy link
Collaborator

ccrock4t commented Dec 21, 2023

do we really need 10,000 levels? I think 100 levels is enough

I agree it is OK for now, since we only tend to use 100 buckets.

But, if we want to try more a lot more buckets in the future, we should revisit this.

@bowen-xu bowen-xu merged commit 0632551 into dev Jan 11, 2024
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Add OpenNARS-style Distributor in Bag
3 participants