
Atomic index #5

Merged
merged 3 commits on Sep 5, 2018

Conversation

avigoldman
Contributor

Fixes #1

@Haroenv
Contributor

Haroenv commented Jun 23, 2018

Cool, thanks! I think this should work, but the main reason I wanted atomic operations is to avoid doing a complete reindex on every build. You're right that this will be more correct (i.e. no downtime and no deleted data in the index), but it will still do as many operations as there are objects in the index.

It would be cool to save the hashes of all the objects in a second index, compare those to the hashes of the objects that should be pushed, and then delete or update only the records whose hashes differ.

Does that make sense?

Since this is such a simple file, you could definitely already use this solution in your own app for now; I was just wondering if you'd be interested in exploring it further.

@avigoldman
Contributor Author

Ah, yes. I think I follow now.

So just to outline the steps:

  1. Pull the current index
  2. Hash the new index and the pulled index
  3. Find the diffs
  4. Delete, update, or add the differences

Sound right?
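
Roughly, in code (an untested sketch; assumes the algoliasearch v3 JS client and records that carry a stable objectID, with a naive JSON-based hash):

```js
const algoliasearch = require('algoliasearch');
const crypto = require('crypto');

const client = algoliasearch('APP_ID', 'ADMIN_API_KEY');
const index = client.initIndex('posts'); // placeholder index name

// Content hash of a record (illustrative; assumes stable key order)
const hashOf = obj =>
  crypto.createHash('md5').update(JSON.stringify(obj)).digest('hex');

// 1. Pull the current index (browseAll pages through every record)
const fetchAllObjects = idx =>
  new Promise((resolve, reject) => {
    const hits = [];
    const browser = idx.browseAll();
    browser.on('result', content => hits.push(...content.hits));
    browser.on('end', () => resolve(hits));
    browser.on('error', reject);
  });

async function syncIndex(newObjects) {
  const existing = await fetchAllObjects(index);

  // 2. Hash the new records and the pulled records, keyed by objectID
  const oldHashes = new Map(existing.map(o => [o.objectID, hashOf(o)]));
  const newHashes = new Map(newObjects.map(o => [o.objectID, hashOf(o)]));

  // 3. Find the diffs
  const toSave = newObjects.filter(
    o => oldHashes.get(o.objectID) !== newHashes.get(o.objectID)
  );
  const toDelete = existing
    .map(o => o.objectID)
    .filter(id => !newHashes.has(id));

  // 4. Delete, update, or add only the differences
  //    (saveObjects both adds new records and replaces changed ones)
  if (toSave.length) await index.saveObjects(toSave);
  if (toDelete.length) await index.deleteObjects(toDelete);
}
```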

@Haroenv
Contributor

Haroenv commented Jun 23, 2018

Note that we have the hashes already, since every node in Gatsby carries a hash (its content digest). Storing these in a second index would indeed be the preferred way, I think.

So step 1 would be: get the hashes from the hashes index (each objectID needs to have a hash). Then calculate the hash table for the "to push" index, diff the two sets of hashes, and push/delete accordingly. This last step can probably happen directly in the prod index, since Algolia treats batch operations as atomic (they're applied in the order they arrive at the index).

This should be a good way of handling it.
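
A rough sketch of that flow, reusing the fetchAllObjects and hashOf helpers from the sketch above (the posts_hashes index name is made up; in Gatsby the node's content digest could stand in for hashOf):

```js
// Keep { objectID, hash } pairs in a lightweight second index, so a build
// only has to browse the hashes, never the full prod index.
async function atomicSync(client, newObjects) {
  const prodIndex = client.initIndex('posts');
  const hashIndex = client.initIndex('posts_hashes'); // made-up name

  // 1. Get the stored hashes (one per objectID) from the hashes index
  const stored = await fetchAllObjects(hashIndex);
  const storedHashes = new Map(stored.map(o => [o.objectID, o.hash]));

  // 2. Calculate the hash table for the "to push" records
  const fresh = newObjects.map(o => ({ objectID: o.objectID, hash: hashOf(o) }));
  const freshHashes = new Map(fresh.map(o => [o.objectID, o.hash]));

  // 3. Diff and write straight to the prod index; Algolia applies batched
  //    writes in arrival order, so readers never see a half-updated index
  const changed = newObjects.filter(
    o => storedHashes.get(o.objectID) !== freshHashes.get(o.objectID)
  );
  const removed = [...storedHashes.keys()].filter(id => !freshHashes.has(id));

  if (changed.length) await prodIndex.saveObjects(changed);
  if (removed.length) await prodIndex.deleteObjects(removed);

  // Mirror the diff into the hashes index for the next build
  const changedHashes = fresh.filter(o => storedHashes.get(o.objectID) !== o.hash);
  if (changedHashes.length) await hashIndex.saveObjects(changedHashes);
  if (removed.length) await hashIndex.deleteObjects(removed);
}
```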

Thanks again for picking this up (and sorry for not being able to clone/contribute right now, I'm only on my phone over the weekend).

@Haroenv Haroenv mentioned this pull request Jul 10, 2018
@coreyward

coreyward commented Sep 4, 2018

Weighing in here in hopes of getting some additional attention on this issue. I'm scoping out the stack for a client project now and, on account of this issue, I'll be using Lunr.js instead of Algolia. I believe this is the third separate project where I've had to make that call. Perhaps it doesn't seem like a big deal, but given how development works in an organization (many people running local instances, lots of restarts to pick up new data, test builds, etc.) and the way Algolia prices per indexing operation, this ends up crazy expensive.

For example: with 500 blog posts on a website under active development where Gatsby gets booted 20 times per day on average (fairly conservative), you end up with 300,000 records in Algolia within 30 days (500 × 20 × 30), costing over $300 a month on the Essential plan. And that number keeps growing as Gatsby gets rebooted.

By comparison, I can build this into a Lunr.js index, ship it to the client compressed at about 200 KB, and have a reasonable search for free. I'd rather use Algolia for the additional features, but again, the cost rules it out, and that tracks right back to this specific issue.

Hopefully Algolia can dedicate some resources to this issue, or otherwise make it possible to use this library, by the time my next client project with search begins.

@Haroenv
Contributor

Haroenv commented Sep 4, 2018

Hey @coreyward, I'm aware that this is definitely something to work on, but since I'm working on lots of other things at the moment, I haven't yet had time to fit this in.

Note that this PR has already been tested by @avigoldman, who said it worked; I was looking for a solution that does even fewer operations. The plan I had in mind is:

  1. add proper tests
  2. merge this PR
  3. make a real atomic indexing solution

@Haroenv
Contributor

Haroenv commented Sep 5, 2018

@coreyward, are you using unique objectIDs or not?

@Haroenv Haroenv merged commit f19c332 into algolia:master Sep 5, 2018
@Haroenv
Contributor

Haroenv commented Sep 5, 2018

Thanks @avigoldman and sorry for the delay here.

@coreyward you probably just need to make sure you use unique objectIDs. Whether you update 1000 times per day or once per day then doesn't change how many records you have. However, this will still cause as many operations as there are items on every build; that's a separate issue we can fix another time.
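
For reference, a hedged sketch of what unique objectIDs look like in a gatsby-plugin-algolia config (the GraphQL fields and index name here are placeholders; adapt them to your own schema):

```js
// gatsby-config.js (excerpt)
const query = `{
  allMarkdownRemark {
    edges {
      node {
        fields { slug }
        frontmatter { title }
        excerpt
      }
    }
  }
}`;

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-algolia',
      options: {
        appId: process.env.ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_ADMIN_KEY,
        queries: [
          {
            query,
            indexName: 'posts', // placeholder
            transformer: ({ data }) =>
              data.allMarkdownRemark.edges.map(({ node }) => ({
                // Stable across builds: re-pushing the same slug updates
                // the record in place instead of creating a duplicate
                objectID: node.fields.slug,
                title: node.frontmatter.title,
                excerpt: node.excerpt,
              })),
          },
        ],
      },
    },
  ],
};
```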

@Haroenv Haroenv mentioned this pull request Sep 5, 2018
@avigoldman avigoldman deleted the atomic-index branch September 17, 2018 15:20
@avigoldman avigoldman restored the atomic-index branch September 17, 2018 15:20
@avigoldman avigoldman deleted the atomic-index branch September 17, 2018 15:20