Facets #650

gedw99 · 2022-12-08T14:48:33Z

I did not look at the underlying aggregation logic, but this thing would be awesome if it also had facets.

That way you could cross-reference a specific page against the location of its visitors, etc.

I expect the vision is to keep this out of the feature set? YAGNI, I guess, for most people?
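In this sense, a facet is a breakdown of the data along one dimension (visitor location), restricted by another (a specific page). Below is a minimal Go sketch. It assumes a hypothetical flat `Hit` record rather than GoatCounter's actual schema. Computed on the fly like this, every facet query is a full scan over all hits.

```go
package main

import "fmt"

// Hit is a hypothetical, simplified pageview record; not GoatCounter's schema.
type Hit struct {
	Path     string
	Location string // e.g. an ISO 3166-1 country code
}

// facetCounts counts, for all hits on the given path, how many came from
// each location: the "location" facet of a single page.
func facetCounts(hits []Hit, path string) map[string]int {
	counts := make(map[string]int)
	for _, h := range hits {
		if h.Path == path {
			counts[h.Location]++
		}
	}
	return counts
}

func main() {
	hits := []Hit{
		{"/blog/sqlite", "NL"},
		{"/blog/sqlite", "US"},
		{"/blog/sqlite", "NL"},
		{"/about", "NL"},
	}
	fmt.Println(facetCounts(hits, "/blog/sqlite")) // map[NL:2 US:1]
}
```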

arp242 · 2022-12-10T11:05:07Z

> I expect the vision is to keep this out of the feature set? YAGNI, I guess, for most people?

More or less, yes.

I'd like to at least make it possible to query this kind of data, eventually, but technically it's a bit difficult right now because it's just too slow. Arguably, SQL isn't the best fit for analytics data, but SQLite makes it really easy to self-host so that's why I decided to stick with SQL for now.

I spent quite a bit of time last month improving the database structure, which should make things a lot faster, but haven't had the time to fully finish it (it's a bit painstaking work, because what works well for ~200 MB of data doesn't necessarily work well for 100 GB of data, and testing this sort of thing takes forever).

gedw99 · 2022-12-10T12:05:45Z

Hey @arp242

Thanks for the response.

Yeah, I was almost certain that speed would be the point of contention.

You can pre-populate facets using materialised views at the DB level, so each facet gets filled in as the main data changes.

I've done it this way before. It requires changes to the DB such that whenever a record is created or changed, a "process" runs that populates the materialised view. SQLite can easily do this.

The materialised views don't have to be like those in a fancy DB; they are just normal DB records whose data is entirely derived by that "process".
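The sketch below shows that write-time "process" in Go. It assumes a hypothetical `Store` that keeps a raw hit list and a derived count table side by side as plain records. In SQLite itself a similar effect could plausibly come from an `AFTER INSERT` trigger that upserts into a summary table. The Go version just makes the extra work each insert pays for explicit.

```go
package main

import "fmt"

// facetKey identifies one cell of a hypothetical (path, location) facet.
type facetKey struct {
	Path     string
	Location string
}

// Store holds the main data plus a materialised facet table derived from it.
// All names here are illustrative assumptions.
type Store struct {
	hits   []facetKey       // main data: one entry per hit
	facets map[facetKey]int // derived: count per (path, location)
}

func NewStore() *Store {
	return &Store{facets: make(map[facetKey]int)}
}

// Insert records a hit and, in the same write, runs the "process" that keeps
// the materialised view current. Each extra facet adds work here, which is
// where insert performance suffers.
func (s *Store) Insert(path, location string) {
	k := facetKey{path, location}
	s.hits = append(s.hits, k)
	s.facets[k]++
}

// Count reads the precomputed value without scanning the hits.
func (s *Store) Count(path, location string) int {
	return s.facets[facetKey{path, location}]
}

func main() {
	s := NewStore()
	s.Insert("/blog/sqlite", "NL")
	s.Insert("/blog/sqlite", "NL")
	s.Insert("/about", "US")
	fmt.Println(s.Count("/blog/sqlite", "NL")) // 2
}
```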

You can also have two DBs, one of which is the materialised-view DB.
You can blow it away and replay the "events data" at any time to rebuild it.
That's one way of thinking about it.
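Here is the two-DB idea as a Go sketch, again under illustrative assumptions. A hypothetical append-only `[]Event` log is the source of truth. The `View` is disposable because it is a pure function of that log, so replaying the log from scratch reproduces exactly what incremental updates built.

```go
package main

import (
	"fmt"
	"reflect"
)

// Event is a hypothetical entry in the append-only "events data".
type Event struct {
	Path     string
	Location string
}

// View is the disposable materialised-view DB: counts by path, then location.
type View map[string]map[string]int

// Apply folds a single event into the view.
func (v View) Apply(e Event) {
	if v[e.Path] == nil {
		v[e.Path] = make(map[string]int)
	}
	v[e.Path][e.Location]++
}

// Rebuild starts from an empty view and replays the whole log.
func Rebuild(log []Event) View {
	v := make(View)
	for _, e := range log {
		v.Apply(e)
	}
	return v
}

func main() {
	log := []Event{{"/a", "NL"}, {"/a", "US"}, {"/b", "NL"}}

	// View maintained incrementally as events arrive.
	live := make(View)
	for _, e := range log {
		live.Apply(e)
	}

	// Throwing it away and replaying the log yields the same view.
	fmt.Println(reflect.DeepEqual(live, Rebuild(log))) // true
}
```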

I'm not expecting this to just be done. I just wanted to explain one way to do it, if you ever want to go that far.

https://github.com/cashapp/pranadb is a decent example.

arp242 · 2022-12-10T14:18:46Z

That's pretty much how it works now, but storing all possible combinations takes a lot of disk space (dozens of GB per "facet"), and it really degrades insert performance as well, since every insert needs to update a lot of stuff.
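The disk-space cost is combinatorial. Suppose every combination of dimensions is pre-aggregated, including "not grouped by this dimension". Then the row count per time bucket is bounded by the sum, over every subset of dimensions, of the product of their cardinalities. That sum equals the product of (cardinality + 1) over all dimensions. The cardinalities in this Go sketch are made up purely for illustration.

```go
package main

import "fmt"

// cubeCells returns an upper bound on the number of pre-aggregated rows
// needed to cover every combination of the given dimensions. Expanding
// the product of (c+1) gives one term per subset of dimensions, namely
// the product of that subset's cardinalities.
func cubeCells(cardinalities []int) int {
	total := 1
	for _, c := range cardinalities {
		total *= c + 1
	}
	return total
}

func main() {
	// Hypothetical: 10,000 paths, 250 locations, 50 browsers, per time bucket.
	fmt.Println(cubeCells([]int{10000, 250, 50})) // 128022801
}
```

Most combinations never occur in practice, so real tables are smaller. Still, the bound grows multiplicatively with every dimension added.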

gedw99 · 2022-12-11T14:09:09Z

Hmm, I guess the facets are also stored in the DB? Hence the slowdown.

If it helps, the example above keeps the facets out of the DB, but they're still SQL-searchable.

So it's true CQRS: writes are basically independent of reads.

gedw99 · 2022-12-11T14:10:12Z

And yeah, there is nothing you can do about disk space. It's a lot. But disk is cheap these days; RAM is not.

arp242 · 2022-12-14T17:57:45Z

> And yeah, there is nothing you can do about disk space. It's a lot. But disk is cheap these days; RAM is not.

Sure, but it's actually not that straightforward: more disk space also means more blocks for the database server to cache in memory, and more blocks to read from the filesystem (i.e. slower performance). That is, it's not "just" disk space.

gedw99 · 2023-01-22T19:06:35Z

Yep, no free lunch, for disk or otherwise.

I'd suggest CQRS then, so each DB stays fast.

Use CDC (change data capture) to keep the coupling loose.

There are CDC drivers for SQLite and PostgreSQL, all in Go.

Use NATS to forward the CDC events to the other facet system(s).

Use the PranaDB approach for the materialised facet server.

All of the above can be self-hosted as a single binary, but also scaled out to the cloud from that same single binary.
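Here is a minimal sketch of that pipeline's shape, using only the Go standard library. A buffered channel stands in for a NATS subject. The hypothetical `Change` type stands in for whatever a CDC driver actually emits, so no real CDC or NATS API is shown. The write side only publishes. The facet server consumes on its own goroutine and owns its own storage.

```go
package main

import (
	"fmt"
	"sync"
)

// Change is a hypothetical CDC record for a row written to the main DB.
type Change struct {
	Table string
	Path  string
	Loc   string
}

// FacetServer is the read side; it only sees Change events.
type FacetServer struct {
	mu     sync.Mutex
	counts map[[2]string]int // (path, location) -> count
}

// Consume projects CDC events into facet counts until the channel closes.
func (f *FacetServer) Consume(changes <-chan Change, done *sync.WaitGroup) {
	defer done.Done()
	for c := range changes {
		if c.Table != "hits" {
			continue // this projection only cares about hits
		}
		f.mu.Lock()
		f.counts[[2]string{c.Path, c.Loc}]++
		f.mu.Unlock()
	}
}

// Count reads a facet value; safe to call while Consume is running.
func (f *FacetServer) Count(path, loc string) int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.counts[[2]string{path, loc}]
}

// runPipeline publishes changes and waits until the consumer has drained them.
func runPipeline(changes []Change) *FacetServer {
	bus := make(chan Change, 16) // stand-in for a NATS subject
	fs := &FacetServer{counts: make(map[[2]string]int)}
	var wg sync.WaitGroup
	wg.Add(1)
	go fs.Consume(bus, &wg)
	for _, c := range changes {
		bus <- c // write side: publish and move on
	}
	close(bus)
	wg.Wait()
	return fs
}

func main() {
	fs := runPipeline([]Change{
		{"hits", "/a", "NL"},
		{"hits", "/a", "NL"},
		{"users", "", ""},
	})
	fmt.Println(fs.Count("/a", "NL")) // 2
}
```

This split makes facet counts eventually consistent: they lag behind the write DB by however long the events take to be consumed.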

gedw99 · 2023-10-16T13:57:41Z

I found a solution that should work and be pretty simple to run: ZincSearch, which is a Go system.

It parses the SQL DB to produce the aggregations, and hence the facets.
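Elasticsearch-style search engines generally expose facets as "terms" aggregations: they count documents per distinct value of a field and return the top buckets. The following is a generic Go sketch of that idea, assuming documents flattened to string maps. It is not ZincSearch's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Bucket is one facet value and the number of documents that have it.
type Bucket struct {
	Key   string
	Count int
}

// termsAgg counts documents per distinct value of field and returns the
// top n buckets, highest count first, ties broken by key for stable output.
func termsAgg(docs []map[string]string, field string, n int) []Bucket {
	counts := make(map[string]int)
	for _, d := range docs {
		if v, ok := d[field]; ok {
			counts[v]++
		}
	}
	buckets := make([]Bucket, 0, len(counts))
	for k, c := range counts {
		buckets = append(buckets, Bucket{k, c})
	}
	sort.Slice(buckets, func(i, j int) bool {
		if buckets[i].Count != buckets[j].Count {
			return buckets[i].Count > buckets[j].Count
		}
		return buckets[i].Key < buckets[j].Key
	})
	if len(buckets) > n {
		buckets = buckets[:n]
	}
	return buckets
}

func main() {
	docs := []map[string]string{
		{"path": "/a", "location": "NL"},
		{"path": "/a", "location": "US"},
		{"path": "/b", "location": "NL"},
		{"path": "/c", "location": "DE"},
	}
	fmt.Println(termsAgg(docs, "location", 2)) // [{NL 2} {DE 1}]
}
```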
