It appears that each time a view is queried, all of the data is rerun through the map and reduce functions.
Instead, it would be nice if PouchDB cached incremental data for the output of map and reduce functions, like CouchDB does, so that future uses of the view would be faster, at the expense of additional storage space.
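To make the cost concrete, here is a minimal sketch (assuming the promise-based PouchDB API with a temporary view) of what a query looks like today: the map function is re-run over every document on every call, rather than only over documents changed since the last query.

```js
var db = new PouchDB('appdata');

// Temporary view: without incremental indexing, this map function is
// re-executed against every document each time query() is called.
function byType(doc) {
  if (doc.type) {
    emit(doc.type, null);
  }
}

db.query(byType, { key: 'calendar-event' }).then(function (result) {
  console.log(result.rows); // recomputed from scratch on each query
});
```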
There was a thread on the mailing list titled "Indexes, views and queries in PouchDB" that discussed this but I didn't find any issue tracking such a feature.
Yup worth linking
As stated, PouchDB's primary use case right now is the type of application data that is unlikely to be hurt by full table scans (I am thinking todo lists, calendar events, a subset of your emails).
But in time we obviously don't want limits on what data can be queried.
Since we can't dynamically generate object stores, this will likely be a single table for all views, keyed by view name (or, going whole hog, a new database per view), but first I would love to see some performance tests.
In my case, I'm trying to make a syncable language dictionary with a tagged index (i.e. the entry 'pear' is tagged 'fruit' and 'green'). This results in tens to hundreds of thousands of rows. As a workaround, I can manually store additional records and use range lookups on them, but I was hoping for something a bit more automatic. What makes me interested in PouchDB is its built-in sync functionality, which I'd otherwise need to implement on my own if using IndexedDB directly.
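For reference, the workaround mentioned above can be sketched roughly like this (hedged example with hypothetical names, assuming the promise-based PouchDB API): one small index doc per (tag, entry) pair with a composite _id, queried via an allDocs range lookup.

```js
var db = new PouchDB('dictionary');

// Index the entry 'pear' under the tag 'fruit' with a sortable composite _id.
function tagEntry(tag, entryId) {
  return db.put({ _id: 'tag/' + tag + '/' + entryId, entry: entryId });
}

// Range lookup over all index docs sharing the 'tag/<tag>/' prefix.
function entriesWithTag(tag) {
  return db.allDocs({
    startkey: 'tag/' + tag + '/',
    endkey: 'tag/' + tag + '/\uffff'
  }).then(function (res) {
    return res.rows.map(function (row) {
      return row.id.split('/')[2]; // the entry id
    });
  });
}
```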
Yes, that would be incredibly useful. For example, I am about to use PouchDB for a music player, and each document represents a track. I may have thousands of those, and views have to aggregate artists/albums and other stuff.
Each aggregation takes about 300ms with 500 records; imagine what it would be with 2k or 3k. As a result, the app would be incredibly unresponsive (e.g. on clicking "artists" or "albums"). Of course, this can always be worked around by building/maintaining custom data structures, but if that is the case, there is little to no point in using PouchDB instead of IndexedDB.
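For illustration, the kind of aggregation described above might look like this as a temporary view (a sketch, assuming the promise-based PouchDB API); today the map function runs over every track on each query, which is exactly the cost incremental views would remove.

```js
var db = new PouchDB('music');

var artistsView = {
  map: function (doc) {
    if (doc.type === 'track') {
      emit(doc.artist, null);
    }
  },
  reduce: '_count' // built-in reduce: number of tracks per artist
};

db.query(artistsView, { group: true }).then(function (result) {
  // result.rows: [{ key: 'Artist A', value: 12 }, { key: 'Artist B', value: 7 }, ...]
});
```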
In my case I'm writing an app that will store all its data on the client, with sync to CouchDB for backup. Incremental view data will be crucial. I also think this will make Pouch the go-to solution for offline HTML5 web apps, which will be a big thing. I'm betting big on Pouch but I need this feature. Go Pouch! To infinity and beyond...
Any update on this?
Just wanted to give a quick heads-up on how this should likely be implemented. So: mapreduce.js is decoupled from the rest of Pouch; right now, when a query is made, it goes through the entire changes feed and builds the query result in memory.
To make this build incrementally, mapreduce.js will keep track of _design document changes via the changes feed and store a sequence so it knows which view has which data. Then all it needs to do is process the incoming changes through the map function and store the results; subsequent queries then become key-range queries as opposed to table scans.
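A very rough sketch of that flow, assuming one separate PouchDB database per view for storage and a checkpoint doc holding the last processed sequence (all names here are hypothetical simplifications, not the actual mapreduce.js internals):

```js
function updateView(sourceDb, viewDb, mapFn) {
  // 1. Read the checkpoint: the last change sequence folded into this view.
  return viewDb.get('_local/lastSeq')
    .catch(function () { return { _id: '_local/lastSeq', seq: 0 }; })
    .then(function (checkpoint) {
      // 2. Only process changes made since that sequence.
      return sourceDb.changes({ since: checkpoint.seq, include_docs: true })
        .then(function (changes) {
          var rows = [];
          changes.results.forEach(function (change) {
            if (change.deleted) { return; } // (cleanup of stale rows omitted)
            // 3. Run the map function over the changed doc and collect rows.
            mapFn(change.doc, function emit(key, value) {
              rows.push({
                _id: serializeKey(key) + '/' + change.id, // sortable composite id (hypothetical helper)
                key: key,
                value: value,
                docId: change.id
              });
            });
          });
          // 4. Persist the emitted rows, then the new checkpoint.
          checkpoint.seq = changes.last_seq;
          return viewDb.bulkDocs(rows).then(function () {
            return viewDb.put(checkpoint);
          });
        });
    });
}
// A subsequent query is then a key-range read on viewDb rather than a
// full scan of sourceDb.
```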
The hard part about this is mostly how we store the results of the mapreduce. Obviously this should work in both websql and indexeddb, so either the mapreduce code will have to handle them both separately, or we can attempt to write a thin wrapper over both; #317 might be a good start.
I'll leave a link here to some technical overviews of view indexes in CouchDB that could give some hints on how to implement it in PouchDB: http://wiki.apache.org/couchdb/Technical%20Overview#View_Indexes and http://wiki.apache.org/couchdb/Why%20are%20all%20Views%20in%20a%20single%20Index
View indexes are created/updated on demand, storing the sequence ID of the last document processed so that next time the indexes are updated from that point. Also, all the views of a single design document are indexed at the same time to make it more efficient/performant.
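A compact sketch of that batching idea (hypothetical shape, not actual CouchDB/PouchDB code): all views of one design document share a single pass over the changes feed, so each changed document is read once and fed to every map function.

```js
// views: { viewName: mapFn(doc, emit), ... } for one design document
function indexDesignDoc(changes, views) {
  var output = {}; // emitted rows, grouped per view
  changes.results.forEach(function (change) {
    Object.keys(views).forEach(function (name) {
      output[name] = output[name] || [];
      views[name](change.doc, function emit(key, value) {
        output[name].push({ key: key, value: value, docId: change.id });
      });
    });
  });
  // one shared checkpoint for the whole design document
  return { rows: output, lastSeq: changes.last_seq };
}
```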
'The hard part about this is mostly how we store the results'
This actually isn't hard at all: views are simply new PouchDB databases. all_docs with keys gives you the key lookups, cross-platform and upgrade issues are all handled, and this could actually end up being a fairly tiny patch.
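In that model a view lookup is just an allDocs call against the view's own database, e.g. (hypothetical database and key names):

```js
var viewDb = new PouchDB('myview-idx');

// exact-key lookups via the keys option
viewDb.allDocs({ keys: ['fruit/pear'], include_docs: true });

// range lookup: everything emitted under the 'fruit/' key prefix
viewDb.allDocs({
  startkey: 'fruit/',
  endkey: 'fruit/\uffff',
  include_docs: true
});
```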
First of all, I really like PouchDB and find all the replication functionality and compatibility with CouchDB awesome.
Looking forward to a better-performing view query implementation for querying bigger datasets.
I'd just like to discuss the topic and offer another view on it.
I can see 2 different ways of implementing this:
A. Create a new internal PouchDB database per view and store the computed view rows there (as suggested above).
B. Add a "view-store" to each database backend (websql, idb, ...) and keep the computed view rows inside the same database.
Each one has its own pros/cons.
I think option B is neater in that each database backend (websql, idb, ...) is self-contained and doesn't pollute the environment. Think of how many databases you could end up with if you create one per view! I think it's easier to manage if everything is in one database.
Each database backend can have a "view-store" to keep the calculated rows returned from executing the "map" function of the view. I don't know yet what the best way would be to store the "reduce" part on the current backends (websql, idb).
Some extra metadata should be stored for each view in the database to keep track of the lastSequence processed for each view (this metadata would be created on demand as views are queried).
Some functions would have to be added to the adapters to allow for updating and querying views.
The "view-store" could have the following attributes:
The reason for having this "view-store" is to allow finer control and more options in the query API, and a better-performing implementation for bigger datasets.
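The attribute list from the original comment isn't reproduced above, but one plausible shape for a view-store row and its per-view metadata might look like the following (all field names are assumptions, just to make the proposal concrete):

```js
// One row per emitted (key, value) pair
var viewRow = {
  viewName: 'artists',   // which view the row belongs to (used as a key prefix)
  key: 'Miles Davis',    // key emitted by the map function
  value: null,           // value emitted by the map function
  docId: 'track-0042'    // source document, for include_docs and row cleanup
};

// One metadata record per view
var viewMeta = {
  viewName: 'artists',
  lastSequence: 1042     // last change sequence folded into this view
};
```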
What do you think in comparison to option A?
Why do you think it's 'polluting' to use more than one internal database per PouchDB (we already do this)? It's also very similar to how views are built in CouchDB: one .view file per view.
I don't understand how putting all the views in a single database is expected to be more performant than having a separate database either, and all the options to query views are already implemented in PouchDB.
We already have a cross-browser implementation of a k/v store that knows how to upgrade and is going to be well supported. It took a lot of code to write in the first place, so I think there would need to be a very, very good reason not to use it for storing view data and to instead duplicate a lot of that code from scratch. Of course the mapreduce stuff is all part of an optional plugin, so this doesn't need to be fixed at all, but I expect the default enabled one will use Pouch for view data.
I can see the benefit of reusing the adapter layer for executing view queries but I can see some issues too.
If you create a new internal database for each view to store the view results and speed up queries, how are you going to retrieve the documents when the "include_docs" option is specified in the query?
Please correct me if I'm wrong, but you'll have to copy all the documents into the view as well, thus duplicating a lot of documents for each view created.
Or, if not, you won't be able to perform fast queries using joins (or similar, depending on the adapter).
Also, and this is cosmetic: how do you delete the view databases in websql if you want to tidy up a bit? I know you can delete the tables, but I don't know how to delete/drop the database programmatically.
I'm trying to reimplement this query functionality (similar to the mapreduce plugin) and currently have an early working version, but it doesn't use the adapter layer and only works with the websql adapter, so I'm trying to see what the best option is.
My local device database has thousands of records, and while the current query implementation in the mapreduce plugin performs perfectly fine in a desktop browser, on a mobile device (where I need it) it is slow, which is why I'm reimplementing part of it.
In an absolutely bizarre omission, there is no way to delete databases in websql.
Whatever you do with views, you will need to do a key read when you use include_docs (unless you force-include the doc every time); I don't think which table the key read comes from will have any effect on performance.
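A small sketch of that key read (hypothetical names, assuming view rows only store the source document id): include_docs is answered by a second, batched lookup against the source database rather than by copying documents into every view.

```js
function queryWithDocs(viewDb, sourceDb, opts) {
  return viewDb.allDocs({
    startkey: opts.startkey,
    endkey: opts.endkey,
    include_docs: true              // the view rows themselves
  }).then(function (res) {
    var ids = res.rows.map(function (row) { return row.doc.docId; });
    // single batched key read against the source database
    return sourceDb.allDocs({ keys: ids, include_docs: true });
  });
}
```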
What's the status on this? Keeping map-reduce view re-calculation efficient is a big deal for us because we'll be using complex views with lots of data on mobile clients.