Facets #650
More or less, yes. I'd like to at least make it possible to query this kind of data eventually, but technically it's a bit difficult right now because it's just too slow. Arguably, SQL isn't the best fit for analytics data, but SQLite makes it really easy to self-host, so that's why I decided to stick with SQL for now. I spent quite a bit of time last month improving the database structure, which should make things a lot faster, but I haven't had the time to fully finish it (it's painstaking work, because what works well for ~200M of data doesn't necessarily work well for 100G of data, and testing this sort of thing takes forever).
Hey @arp242, thanks for the response. Yeah, I was almost certain that speed would be the point of contention. You can pre-populate the data using materialised views at the DB level; I've done it this way before. It requires changes to the DB so that when a record is created or changed, a "process" runs that then populates the materialised view. SQLite can easily do this. The materialised views don't have to be like the ones in a fancy DB; they are just normal DB records whose data is all derived from that "process". You can also have two DBs, one of which is the materialised-view DB. I'm not expecting this to just be done; I just wanted to explain one way to do it if you ever want to go that far. https://github.com/cashapp/pranadb is a decent example.
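A minimal in-memory sketch of the idea described above, assuming a single hypothetical `Hit` record with a path and a visitor location: the write path stores the raw record and, in the same step, runs the "process" that increments a derived `(path, location)` counter, so a facet lookup reads the derived row instead of scanning the raw data. All type and method names here are illustrative, not part of any existing codebase.

```go
package main

import "fmt"

// Hit is a hypothetical pageview record (the "base" table).
type Hit struct {
	Path     string
	Location string
}

// facetKey identifies one row of the derived (materialised) table.
type facetKey struct {
	Path     string
	Location string
}

// Store keeps the raw records plus a materialised view that is updated
// in the same write path, so reads never scan the raw records.
type Store struct {
	hits   []Hit
	counts map[facetKey]int // materialised view: (path, location) -> count
}

func NewStore() *Store {
	return &Store{counts: make(map[facetKey]int)}
}

// Insert writes the raw record, then runs the "process" that
// incrementally updates the derived row for this record.
func (s *Store) Insert(h Hit) {
	s.hits = append(s.hits, h)
	s.counts[facetKey{h.Path, h.Location}]++
}

// Count answers a facet query directly from the materialised view.
func (s *Store) Count(path, location string) int {
	return s.counts[facetKey{path, location}]
}

func main() {
	s := NewStore()
	s.Insert(Hit{"/", "NL"})
	s.Insert(Hit{"/", "NL"})
	s.Insert(Hit{"/about", "US"})
	fmt.Println(s.Count("/", "NL")) // prints 2
}
```

In a SQL database the same "process" would typically be a trigger or application code that updates a summary table in the same transaction as the insert.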
That's pretty much how it works now, but it takes a lot of disk space for all possible combinations (dozens of GB per "facet"), and it really degrades insert performance as well, as it needs to update a lot of stuff.
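To illustrate why precomputing "all possible combinations" hurts inserts, here is a small sketch, assuming each hit carries a few hypothetical facet dimensions and that every combination is pre-aggregated, with `*` meaning "rolled up over any value". With k facets, one insert has to update 2^k derived rows, so each extra facet doubles the write work.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cubeKeys returns one aggregate key per subset of the hit's facet
// dimensions. With k facets this yields 2^k keys, i.e. 2^k derived rows
// to update for a single insert.
func cubeKeys(facets map[string]string) []string {
	names := make([]string, 0, len(facets))
	for n := range facets {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic key order

	var keys []string
	for mask := 0; mask < 1<<len(names); mask++ {
		parts := make([]string, 0, len(names))
		for i, n := range names {
			if mask&(1<<i) != 0 {
				parts = append(parts, n+"="+facets[n])
			} else {
				parts = append(parts, n+"=*") // "*" = any value (rolled up)
			}
		}
		keys = append(keys, strings.Join(parts, ","))
	}
	return keys
}

func main() {
	// Hypothetical facet dimensions for one pageview.
	hit := map[string]string{"path": "/", "location": "NL", "browser": "Firefox"}
	for _, k := range cubeKeys(hit) {
		fmt.Println(k)
	}
	fmt.Println(len(cubeKeys(hit)), "rows updated for one insert") // prints "8 rows updated for one insert"
}
```

The row count on disk also grows with the product of each facet's distinct values, which is the disk-space side of the same trade-off.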
Hmm, I guess the facets are also stored in the DB? Hence the slowdown. If it helps, the example above keeps the facets out of the DB but still SQL-searchable, so it's true CQRS: writes are basically independent of reads.
And yeah, there's nothing you can do about disk space. It's a lot. But disk is cheap these days; RAM is not.
Sure, but it's actually not that straightforward; more disk space also means more blocks for the database server to cache in memory, and more blocks to read from the filesystem (i.e. slower performance). That is, it's not "just" disk space.
Yep, no free lunch for disk or otherwise. I'd suggest CQRS then, so each DB stays fast. Use CDC (change data capture) to keep loose coupling; there are CDC drivers for SQLite and PostgreSQL, all in Go. Use NATS to forward the CDC events to the other facet system(s), and use the pranadb approach for the materialised facet server. All of the above can be self-hosted as a single binary, but also scaled out to the cloud from that single binary.
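A rough sketch of that CQRS split, assuming a Go channel as a stand-in for the CDC feed plus NATS subject (no real CDC driver or NATS client is used here): the write side only appends records and emits change events, and a separate projector consumes those events to build the facet read model. `ChangeEvent`, `WriteStore`, `FacetView` and `Project` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ChangeEvent is a hypothetical CDC record emitted for each committed write.
type ChangeEvent struct {
	Op       string // only "insert" in this sketch
	Path     string
	Location string
}

// WriteStore is the command side: it appends records and publishes change
// events, and never touches the facet read model.
type WriteStore struct {
	rows []ChangeEvent
	feed chan<- ChangeEvent // stand-in for CDC + a NATS subject
}

func (w *WriteStore) Insert(path, location string) {
	ev := ChangeEvent{Op: "insert", Path: path, Location: location}
	w.rows = append(w.rows, ev)
	w.feed <- ev // blocks only if the buffer is full; a broker would decouple further
}

// FacetView is the query side, built solely from the change feed.
type FacetView struct {
	mu     sync.Mutex
	counts map[[2]string]int // (path, location) -> count
}

func (v *FacetView) Apply(ev ChangeEvent) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if ev.Op == "insert" {
		v.counts[[2]string{ev.Path, ev.Location}]++
	}
}

func (v *FacetView) Count(path, location string) int {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.counts[[2]string{path, location}]
}

// Project consumes change events until the feed is closed, then signals done.
func Project(feed <-chan ChangeEvent, view *FacetView, done chan<- struct{}) {
	for ev := range feed {
		view.Apply(ev)
	}
	close(done)
}

func main() {
	feed := make(chan ChangeEvent, 64)
	done := make(chan struct{})
	view := &FacetView{counts: make(map[[2]string]int)}
	go Project(feed, view, done)

	w := &WriteStore{feed: feed}
	w.Insert("/", "NL")
	w.Insert("/", "NL")
	w.Insert("/about", "US")

	close(feed) // a real feed stays open; closing lets this demo wait for the projector
	<-done
	fmt.Println(view.Count("/", "NL")) // prints 2
}
```

The read model is eventually consistent: a query issued right after a write may not see it until the projector has applied the event.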
I found a solution which should work and be pretty simple to run: Zinc Search, which is a Go system. It parses the SQL DB to produce the aggregations and hence facets.
I did not look at the underlying aggregation logic, but this thing would be awesome if it also had facets,
so you could cross-reference a specific page against the location of visitors, etc.
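As an illustration of what such a facet would return, here is a query-time sketch, assuming raw hits with a hypothetical `Path` and `Location`: for one page, it counts hits per location and sorts the buckets by count. This full scan is the simple but slow approach; the pre-aggregation discussed in the comments trades disk and insert speed to avoid it.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical pageview with two facet dimensions.
type Hit struct {
	Path     string
	Location string
}

// FacetCount is one bucket of a facet: a value and how many hits had it.
type FacetCount struct {
	Value string
	Count int
}

// LocationFacet scans all hits and returns the distribution of visitor
// locations for a single page, highest count first.
func LocationFacet(hits []Hit, path string) []FacetCount {
	counts := make(map[string]int)
	for _, h := range hits {
		if h.Path == path {
			counts[h.Location]++
		}
	}
	out := make([]FacetCount, 0, len(counts))
	for v, c := range counts {
		out = append(out, FacetCount{v, c})
	}
	// Sort by count descending; break ties by value for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	hits := []Hit{
		{"/pricing", "NL"}, {"/pricing", "US"}, {"/pricing", "NL"},
		{"/blog", "US"},
	}
	fmt.Println(LocationFacet(hits, "/pricing")) // prints [{NL 2} {US 1}]
}
```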
I expect the vision is to keep this out of the feature set? YAGNI, I guess, for most people?