Populating index #71

Closed
caseydm opened this issue Dec 10, 2017 · 5 comments
Comments

@caseydm

caseydm commented Dec 10, 2017

I'm populating an index with around 12 million records. For some reason, the process stopped at around 9 million records. What happens when I run `python manage.py search_index --populate --models employee.Employee` again? Will it skip records that already exist in the index? Will it be faster to get to the remaining 3 million records, or will it take the same amount of time since it needs to iterate through every record?

@sabricot
Member

sabricot commented Dec 12, 2017

Yes, it will iterate through every record. It would be good to add a way to filter/order via the command for this kind of problem. There is a PR (#58), but I haven't really looked into it yet.
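Until the command supports filtering, one possible workaround is to drive indexing from your own script and skip everything up to the last primary key that was indexed. This is only a sketch of that idea in plain Python: `index_from_checkpoint` and `index_batch` are hypothetical names, not part of this library, and `index_batch` stands in for whatever call actually sends a batch to Elasticsearch.

```python
from typing import Callable, Iterable, List, Optional


def index_from_checkpoint(
    record_ids: Iterable[int],
    index_batch: Callable[[List[int]], None],  # hypothetical: sends one batch to the index
    last_indexed_id: Optional[int] = None,
    batch_size: int = 1000,
) -> Optional[int]:
    """Index ids greater than last_indexed_id, in ascending order.

    Returns the highest id handed to index_batch, or the original
    checkpoint if nothing was left to index.
    """
    checkpoint = last_indexed_id
    batch: List[int] = []
    for record_id in sorted(record_ids):
        if last_indexed_id is not None and record_id <= last_indexed_id:
            continue  # already indexed in a previous run
        batch.append(record_id)
        if len(batch) == batch_size:
            index_batch(batch)
            checkpoint = batch[-1]  # persist this value to resume after a crash
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        index_batch(batch)
        checkpoint = batch[-1]
    return checkpoint
```

With a Django queryset, the skipping would normally be done in the database (for example by filtering on `pk` greater than the checkpoint) rather than in Python, so the already indexed rows are never loaded.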

@sabricot sabricot reopened this Dec 12, 2017
@caseydm
Author

caseydm commented Dec 14, 2017

Ok thanks. I am having trouble indexing all of my 12 million records. It seems to stop around 9.5 million. I wonder if using queryset has anything to do with it? Should I write my own bulk update function or is it already doing that in the background?

@caseydm
Author

caseydm commented Dec 16, 2017

Hi all. As an update, I realized the reason the index was stopping short was due to a detached Heroku dyno timing out after 24 hours. I increased the pagination size and was able to complete the job within 24 hours.
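A plausible reason the larger pagination size helped is that it cuts the number of database pages and bulk requests needed to cover the table. The sketch below is generic Python, not this library's code; `paginate` and `pages_needed` are hypothetical helper names, and the page sizes in the usage note are only examples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most page_size items."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    iterator = iter(items)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def pages_needed(total_records: int, page_size: int) -> int:
    """Number of pages required to cover total_records (ceiling division)."""
    return -(-total_records // page_size)
```

For 12 million records, `pages_needed(12_000_000, 1000)` gives 12000 pages, while `pages_needed(12_000_000, 5000)` gives 2400, so each fixed per-page cost is paid five times less often.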

I am really interested in a way to limit the index job size. I have someone adding around 8,000 records per week, so if I could just set a job to run each night and index the last 10,000 records, that would be perfect. Can anybody think of a way to easily implement that?
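A nightly job along those lines could look roughly like the sketch below. It is plain Python with hypothetical names (`index_most_recent`, `index_batch`); it assumes the ids are already ordered newest first and that `index_batch` is whatever function pushes a batch into the index.

```python
from typing import Callable, List, Sequence


def index_most_recent(
    ids_newest_first: Sequence[int],
    index_batch: Callable[[List[int]], None],  # hypothetical: sends one batch to the index
    limit: int = 10_000,
    batch_size: int = 1_000,
) -> int:
    """Index only the newest `limit` ids, in batches; return how many were sent."""
    recent = list(ids_newest_first[:limit])
    for start in range(0, len(recent), batch_size):
        index_batch(recent[start:start + batch_size])
    return len(recent)
```

In a Django project, the input would likely come from something like `Employee.objects.order_by('-pk')[:10000]`, run from a custom management command on a nightly scheduler. Whether the document class can take such a queryset directly is an assumption to check against the installed version.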

@cocoakekeyu

Is that the same as #118?

@sabricot
Member

sabricot commented Nov 7, 2018

Yes, closing in favor of #118.

@sabricot sabricot closed this as completed Nov 7, 2018