@alooow alooow commented Dec 27, 2016

I am working on an implementation of cache-oblivious B-trees, as proposed in issue #219, based on the paper "Cache-Oblivious B-Trees" by Bender, Demaine and Farach-Colton, SIAM J. Comput. 35 (2006) 341-358.

The structure is quite complicated, but at the bottom of everything lies a memory packed array (described in the above paper, as well as in "An adaptive packed-memory array", Bender and Hu, ACM Transactions on Database Systems 32(4), 2007). For use in the B-trees, the structure must offer an "insert after" method and a "delete" method. The presented implementation is still subject to change. One of the problems is that the array's fill level (density) must always stay between a lower and an upper threshold. I tested the current implementation only with the lower threshold set to 0, since otherwise an empty array would not be within the thresholds, which would prevent inserting any element. I will try to add some tests by the end of this week or next. As the array is the basis for future work on the cache-oblivious B-trees, I would really appreciate it if someone could look at it and share their opinion.
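To make the threshold problem concrete, here is a minimal sketch, not the code from this PR. The function names `density` and `within_thresholds` and the threshold values are assumptions chosen for illustration; only the idea of an occupancy vector (`exists`) comes from the implementation under review.

```julia
# Hypothetical helpers, not taken from src/memory_packed_array.jl.
# Density of a packed array: occupied slots divided by capacity.
density(exists::AbstractVector{Bool}) = count(exists) / length(exists)

# True when the density lies inside the closed interval [lower, upper].
within_thresholds(exists, lower, upper) = lower <= density(exists) <= upper

empty_slots = falses(8)   # an empty array with capacity 8

# A lower threshold of 0 admits the empty array.
ok_with_zero = within_thresholds(empty_slots, 0.0, 0.75)

# Any positive lower threshold rejects it, so the first insert could never start.
ok_with_positive = within_thresholds(empty_slots, 0.25, 0.75)
```

One possible way around this, stated only as an idea, would be to skip the lower-threshold check while the array is below some minimum capacity.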

I was also wondering whether the memory packed array would be useful beyond cache-oblivious B-trees. I am not sure what is currently the best way of inserting an element after a given index (I have only tried insert!(collection, index, item)). I am thinking about running some benchmarks to check whether the memory packed array is a faster solution for this.
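For reference, the baseline mentioned above could look like the following sketch on a plain `Vector`. `insert_after!` is a hypothetical helper name used only here; `insert!` is the function from Base.

```julia
# Baseline "insert after index i" on a plain Vector using Base.insert!.
# insert_after! is a hypothetical wrapper, not part of Base or of this PR.
function insert_after!(v::Vector, i::Integer, item)
    # insert! places item at position i + 1 and shifts the tail right by one,
    # so the cost grows with the number of elements after position i.
    insert!(v, i + 1, item)
    return v
end

v = [10, 20, 40]
insert_after!(v, 2, 30)   # 30 now sits between 20 and 40
insert_after!(v, 0, 5)    # i = 0 inserts at the front
```

Any benchmark against the packed array would need to time many inserts at random positions, since a single `insert!` near the end of a vector is cheap.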

codecov-io commented Dec 27, 2016

Codecov Report

Merging #241 into master will decrease coverage by 8.86%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #241      +/-   ##
==========================================
- Coverage   96.24%   87.38%   -8.87%     
==========================================
  Files          28       29       +1     
  Lines        2051     2259     +208     
==========================================
  Hits         1974     1974              
- Misses         77      285     +208

Impacted Files               Coverage Δ
src/memory_packed_array.jl   0% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 830413f...5698f34.

@StephenVavasis
Contributor

This looks like a good start! I'll mention a few stylistic points about Julia:

  • the 'local' keyword is rarely needed
  • statements generally do not need a semicolon terminator
  • Usually j+=1 is preferred to j=j+1.
  • The expression sum(A.exists[from:(to-1)]) will needlessly create a temporary array. You can either replace this with an explicit loop, or if you are targeting 0.6.0, then you can use a new generator expression to loop over the subarray.
  • it is probably best to avoid using size as a variable name since this is also the name of a function in Base.
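
The point about `sum(A.exists[from:(to-1)])` can be sketched as follows. The vector `exists`, the bounds `from` and `to`, and the helper name `count_range` are stand-ins made up for this example, not the PR's actual fields or functions.

```julia
# Stand-in data; in the PR this would be the occupancy vector A.exists.
exists = Bool[true, false, true, true, false, true]
from, to = 2, 5

# Original form: indexing with a range copies the slice before summing.
s_copy = sum(exists[from:(to-1)])

# Explicit loop: no temporary array is created.
function count_range(exists, from, to)
    n = 0
    for j in from:(to-1)
        n += exists[j]   # Bool is promoted to Int when added
    end
    return n
end
s_loop = count_range(exists, from, to)

# Generator expression: also sums without materializing the slice.
s_gen = sum(exists[j] for j in from:(to-1))
```

All three compute the same count; only the first allocates.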

With regard to the data structure itself: if you are planning to develop a drop-in replacement for the current 2-3-tree structure that underlies SortedDict, SortedMultiDict and SortedSet, then almost certainly you will need 'up' pointers at the leaves of the tree. These are necessary to support the token interface to the data structures, and they are not mentioned in the Bender et al. paper quoted above. As far as I can see, your code can already handle 'up' pointers if you designate D (the data type of the array) to be a tuple or small immutable one of whose fields is an 'up' pointer.
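
As a rough illustration of that last suggestion, the element type could bundle the stored key with an 'up' reference. The type name `LeafEntry`, its fields, and the choice of an integer index as the 'up' pointer are all assumptions made for this sketch (written with the `struct` keyword of current Julia); the token interface may need a different representation.

```julia
# Hypothetical element type D for the packed array: a key plus an 'up' pointer.
# The 'up' pointer is modeled here as the integer index of the parent tree node.
struct LeafEntry{K}
    key::K
    up::Int
end

# A plain Vector stands in for the packed array's storage.
entries = [LeafEntry("a", 1), LeafEntry("b", 1), LeafEntry("c", 2)]

# Following the 'up' pointers from the leaves to their parents.
parents = [e.up for e in entries]

# The tuple variant mentioned above would simply be (key, up).
as_tuple = ("a", 1)
```

Because the struct is immutable, moving an element during a rebalance copies the key and its 'up' index together, but changing the parent means writing a new `LeafEntry` into the slot.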

@kmsquire
Member

Looking forward to this!

@tjgreen42

Is this pull request still active? I just stumbled upon it, after doing my own implementation of (adaptive) packed-memory arrays, which I was considering packaging up for candidacy to DataStructures.jl. A second question I have for folks here is whether there are any efforts underway to create a standard library of paged data structures, such as the cache-oblivious B-trees mentioned here. I don't think these would fit properly into DataStructures.jl, because they require a notion of page-access/buffer pool implemented by components TBD.

@alooow
Author

alooow commented Feb 27, 2018

I'm sorry, I had a lot going on last year and I didn't have time to give this issue proper attention. @tjgreen42, I am new to cache-oblivious algorithms (I got involved through this implementation), so my implementation of the MPA may still be quite faulty. In the last commits I was still fixing the delete procedure (it should be working now). If you think your implementation works better, or if you have any suggestions, I will be happy to hear them.

@kmsquire
Member

I'm going to close this for now, and suggest that if @alooow (or anyone else) wants to pursue this further, it can probably be done in a separate package.

@kmsquire kmsquire closed this Sep 11, 2018