Facets #650

Open
gedw99 opened this issue Dec 8, 2022 · 8 comments

Comments

@gedw99

gedw99 commented Dec 8, 2022

I did not look at the underlying aggregation logic, but this would be awesome if it also had facets.

That way you could cross-reference a specific page against the location of visitors, and so on.

I expect the vision is to keep this out of the feature set? YAGNI, I guess, for most people?

@arp242
Owner

arp242 commented Dec 10, 2022

> I expect the vision is to keep this out of the feature set? YAGNI, I guess, for most people?

More or less, yes.

I'd like to at least make it possible to query this kind of data, eventually, but technically it's a bit difficult right now because it's just too slow. Arguably, SQL isn't the best fit for analytics data, but SQLite makes it really easy to self-host so that's why I decided to stick with SQL for now.

I've spent quite a bit of time last month on improving the database structure which should make things a lot faster, but haven't had the time to fully finish it (it's a bit painstaking work, because what works well for ~200M of data doesn't necessarily work well for 100G of data, and testing this sort of thing takes forever).

@gedw99
Author

gedw99 commented Dec 10, 2022

Hey @arp242

Thanks for the response.

Yeah, I was almost certain that speed would be the point of contention.

You can prepopulate using materialised views at the DB level, so each facet gets filled in as the main data changes.

I've done it this way before. It requires changes to the DB so that when a record is changed or created, a "process" runs that populates the materialised view. SQLite can easily do this.

The materialised views don't have to be like the ones in a fancy DB. They are just normal DB records where all the data is derived from that "process".
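
To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and trigger names (`hits`, `facet_path_location`, `hits_to_facet`) are made up for illustration and are not this project's actual schema; the "process" is assumed to be a plain `AFTER INSERT` trigger.

```python
import sqlite3

# Hypothetical schema: one raw table plus one derived "facet" table.
SCHEMA = """
CREATE TABLE hits (
    id       INTEGER PRIMARY KEY,
    path     TEXT NOT NULL,
    location TEXT NOT NULL
);
CREATE TABLE facet_path_location (
    path     TEXT NOT NULL,
    location TEXT NOT NULL,
    total    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (path, location)
);
-- The "process": every new hit bumps the matching facet row.
CREATE TRIGGER hits_to_facet AFTER INSERT ON hits
BEGIN
    INSERT OR IGNORE INTO facet_path_location (path, location, total)
    VALUES (NEW.path, NEW.location, 0);
    UPDATE facet_path_location
       SET total = total + 1
     WHERE path = NEW.path AND location = NEW.location;
END;
"""

def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def record_hit(db, path, location):
    """Insert a raw hit; the trigger keeps the facet table in sync."""
    db.execute("INSERT INTO hits (path, location) VALUES (?, ?)",
               (path, location))

def facet_counts(db, path):
    """Read visitor locations for one page straight from the facet table."""
    rows = db.execute(
        "SELECT location, total FROM facet_path_location "
        "WHERE path = ? ORDER BY location", (path,))
    return dict(rows.fetchall())
```

Reads never touch `hits`, but every insert now performs extra writes, which is the trade-off this approach makes.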

You can also have two DBs, where one is the materialised view DB.
You can blow it away and replay the "events data" at any time to rebuild it.
That's one way of thinking about it.
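
A rough sketch of that rebuild step, again with Python's `sqlite3` and hypothetical table names (`hits`, `facet_path_location`). It assumes the events DB is the source of truth and the facet DB is disposable, so rebuilding simply means creating a fresh facet DB and re-aggregating every event into it.

```python
import sqlite3

def open_events_db():
    """Source-of-truth database holding only raw events (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hits (path TEXT NOT NULL, location TEXT NOT NULL)")
    return db

def rebuild_facets(events_db):
    """Throw away any old facet DB and derive a new one from the events."""
    facets_db = sqlite3.connect(":memory:")
    facets_db.execute(
        "CREATE TABLE facet_path_location ("
        " path TEXT NOT NULL, location TEXT NOT NULL,"
        " total INTEGER NOT NULL, PRIMARY KEY (path, location))")
    # Replay: aggregate all events and bulk-load the result.
    rows = events_db.execute(
        "SELECT path, location, COUNT(*) FROM hits GROUP BY path, location")
    facets_db.executemany(
        "INSERT INTO facet_path_location VALUES (?, ?, ?)", rows)
    facets_db.commit()
    return facets_db
```

Because the facet DB holds nothing that cannot be recomputed, its schema can change freely; the cost is that a full replay over a large events table may take a long time.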

I am not expecting this to just be done. I just wanted to explain one way to do it if you ever want to go that far.

https://github.com/cashapp/pranadb is a decent example.

@arp242
Owner

arp242 commented Dec 10, 2022

That's pretty much how it works now, but it takes a lot of disk space for all possible combinations (dozens of GB per "facet") and it really degrades insert performance as well as it needs to update a lot of stuff.
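
As a back-of-the-envelope illustration of why "all possible combinations" gets large: the worst-case number of rows in a pre-aggregated table is the product of the distinct values in each dimension. The dimension names and cardinalities below are invented for the example, not measurements from this project.

```python
from math import prod

def max_rollup_rows(cardinalities):
    """Upper bound on rows when every combination of values is stored."""
    return prod(cardinalities.values())

# Hypothetical cardinalities for a single time bucket.
example = {"path": 1_000, "location": 200, "browser": 50}
print(max_rollup_rows(example))  # 10000000
```

Real data is usually far sparser than this bound, but each added dimension still multiplies the worst case, and every insert may touch one row per pre-aggregated table.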

@gedw99
Author

gedw99 commented Dec 11, 2022

Hmm, I guess the facets are also stored in the DB? Hence the slowdown.

If it helps, the example above keeps the facets outside the DB, but they are still SQL-searchable.

So it's true CQRS: writes are basically independent of reads.

@gedw99
Author

gedw99 commented Dec 11, 2022

And yeah, there is nothing you can do about disk space. It's a lot. But disk is cheap these days; RAM is not.

@arp242
Owner

arp242 commented Dec 14, 2022

> And yeah, there is nothing you can do about disk space. It's a lot. But disk is cheap these days; RAM is not.

Sure, but it's actually not that straightforward; more disk space also means more blocks for the database server to cache in memory, and more blocks to read from the filesystem (i.e. slower performance). That is, it's not "just" disk space.

@gedw99
Author

gedw99 commented Jan 22, 2023

Yep, no free lunch for disk or otherwise.

I'd suggest CQRS then, so each DB stays fast.

Use CDC (change data capture) to keep the coupling loose.

There are CDC drivers for SQLite and PostgreSQL, all in Go.

Use NATS to forward the CDC events to the other facet system(s).

Use the PranaDB approach for the materialised facet server.

All of the above can be self-hosted as a single binary, and scaled out to the cloud from that same binary.
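
The shape of that pipeline can be sketched in a few lines of Python. This is only an in-process stand-in: a `queue.Queue` plays the role of the CDC stream and the message broker, and the class names (`WriteStore`, `FacetStore`) are hypothetical. No real CDC driver, NATS client, or PranaDB API is used here.

```python
import queue
import sqlite3
from collections import Counter

class WriteStore:
    """Write side: appends raw hits and emits one change event per insert."""

    def __init__(self, changes):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE hits (path TEXT NOT NULL, location TEXT NOT NULL)")
        self.changes = changes  # stand-in for a CDC stream / broker subject

    def record_hit(self, path, location):
        self.db.execute("INSERT INTO hits VALUES (?, ?)", (path, location))
        self.changes.put({"op": "insert", "path": path, "location": location})

class FacetStore:
    """Read side: builds facet counts only from the change events it sees."""

    def __init__(self):
        self.counts = Counter()

    def apply_pending(self, changes):
        """Drain the queue and return how many events were applied."""
        applied = 0
        while True:
            try:
                event = changes.get_nowait()
            except queue.Empty:
                return applied
            if event["op"] == "insert":
                self.counts[(event["path"], event["location"])] += 1
            applied += 1
```

The write path does no aggregation at all, so inserts stay cheap; the price is that facet counts lag behind until the read side has consumed the pending events.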

@gedw99
Author

gedw99 commented Oct 16, 2023

I found a solution which should work and be pretty simple to run: Zinc Search, which is written in Go.

It parses the SQL DB to produce the aggregations, and hence the facets.


2 participants