Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with with et. al #8

Open
Palmik opened this issue Apr 11, 2013 · 4 comments
Open

Performance issues with with et. al #8

Palmik opened this issue Apr 11, 2013 · 4 comments

Comments

@Palmik
Copy link
Contributor

Palmik commented Apr 11, 2013

Hi, I have created a benchmark that compares the performance of data-store and tables.

What I found is that tables seems to have much worse performance than data-store when querying. I think the problem might be that with et. al not only select the matching elements from the table, but also create new table from these elements (which uses insert, which uses deleteCollisions which, as the insert-collisions benchmark indicates, also seems to be underperfoming).

The problem might also be caused by the benchamark (or bug in data-store, but the results of the benchmarks seem to be the same for both packages), so I encourage everyone to take a look. (For anyone wondering, the NFData instance for tables comes from here)

I am open to any and all suggestions.

@ekmett
Copy link
Owner

ekmett commented Apr 11, 2013

I fully anticipate that tables is about half the speed it could be given a more robust implementation than just deleting all the things and reinserting them. Your performance numbers seem to be about what I'd expect.

That is the trade-off for the nicer API.

We might be able to do much better if we replaced all of the Map, IntMap, and HashMap machinery with something that could track what branches have changed, and/or allow for in place filtering.

Note: with both selects and rebuilds, but when you use the Const Applicative, that reconstruction never happens it is in a function that gets thrown away.

instance Functor (Const a) where
   fmap _ (Const a) = Const a

When you use ^. to read from with, that "reconstruction" never happens.

Another aspect that could lead to better performance would be to enable IntMap and HashMap primary keys.

@Palmik
Copy link
Contributor Author

Palmik commented Apr 14, 2013

Yeah, it seems that most of the time is spent creating the table from the list of 'selected' elements ((xs^.table) in the go functions), which is what I meant by "creating new table".

As a proof of concept, I have added lens based interface to data-store and, as expected, the performance sunk considerably indeed.

I have not yet measured the performance after the commit Taneb pushed to tables few hours ago.

By the way, data-store does not (yet) use (or makes it possible to use) Data.IntMap.IntMap in case the key dimension type (type of the column) is Int. So I do not think that is the main bottleneck.

@startling
Copy link
Collaborator

Could there be a variant of with that's a fold over rows rather than a lens on a new table? Then we could do e.g.

myTable ^.. with' Fst (==) 10

and receive a list rather than a new table, avoiding the overhead of creating a new table. Obviously this isn't as compositional as with, but it could make sense for small queries and for the last of a chain of queries.

@ekmett
Copy link
Owner

ekmett commented Oct 17, 2013

I don't have a particular objection to adding it, though it is a lot less compositional than the current approach.

One option might be to make a couple of RULES that would let you extract

myTable ^.. with Fst (==) 10 . rows

more efficiently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants