
NewsItem creation is slow #4

Open
copelco opened this Issue Mar 7, 2012 · 2 comments

copelco commented Mar 7, 2012

The property transaction scraper imports ~2 records/sec on our laptops and <1 record/sec on AWS. I don't remember the previous scrapers being this slow. Can we speed it up?

@copelco copelco was assigned Mar 14, 2012

copelco commented Mar 14, 2012

I looked into this briefly and believe the slowness is caused by a database-level trigger in PostgreSQL. The trigger is installed in a migration and runs a potentially expensive spatial query on every insert/update/delete to check whether the NewsItem geometry intersects any Location objects. Currently we only have zipcode and city locations, and these alone greatly slow down NewsItem creation. If we add more (townships, etc.), it would get slower.

I don't know if this is a major issue for us yet. I think this would be better implemented as a background task that runs after specific functions or on a set schedule. But if our datasets are small enough for each import, this may be a non-issue.
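To make the background-task idea concrete, here's a minimal sketch of what the batch version of the trigger's work could look like: after an import, recompute the NewsItem-to-Location links in one pass instead of once per row. All names here are illustrative (the real models live in the project), and a plain bounding-box overlap stands in for the PostGIS `ST_Intersects()` test so the example stays self-contained.

```python
# Sketch of the proposed background task: instead of a per-row trigger,
# map each newly imported NewsItem to the Locations it intersects in one batch.
# Geometries are simplified to (xmin, ymin, xmax, ymax) bounding boxes;
# the real implementation would use PostGIS ST_Intersects on full geometries.

def bbox_intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap."""
    axmin, aymin, axmax, aymax = a
    bxmin, bymin, bxmax, bymax = b
    return axmin <= bxmax and bxmin <= axmax and aymin <= bymax and bymin <= aymax

def update_locations(news_items, locations):
    """Batch equivalent of the trigger: item id -> list of intersecting location ids."""
    links = {}
    for item_id, item_box in news_items:
        links[item_id] = [loc_id for loc_id, loc_box in locations
                          if bbox_intersects(item_box, loc_box)]
    return links

# Hypothetical data: two news items, a zipcode location, and a city location.
items = [(1, (0, 0, 2, 2)), (2, (10, 10, 11, 11))]
locs = [("zip-27701", (1, 1, 5, 5)), ("city-durham", (0, 0, 20, 20))]
print(update_locations(items, locs))
# → {1: ['zip-27701', 'city-durham'], 2: ['city-durham']}
```

Running this once per import (or on a schedule, e.g. via Celery beat) amortizes the spatial work across the whole batch instead of paying it on every insert.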

copelco commented Mar 28, 2012

As a test, I disabled the location_updater trigger and was able to create 30k NewsItems in about 10 minutes. Paul passed along a simplified trigger definition (https://gist.github.com/2121898), so we can try that. We could also background the task entirely and use Celery to schedule update tasks.
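For reference, the disable-then-import pattern used in that test can be wrapped in a context manager so the trigger is always re-enabled afterward. The table and trigger names below are assumptions (the real ones come from the project's migration), and a stand-in cursor records the SQL so the pattern is demonstrable without a live database; against PostgreSQL this uses `ALTER TABLE ... DISABLE/ENABLE TRIGGER`, which requires table ownership.

```python
from contextlib import contextmanager

# Hypothetical names; the real table/trigger come from the project's migration.
TABLE = "db_newsitem"
TRIGGER = "location_updater"

@contextmanager
def trigger_disabled(cursor, table=TABLE, trigger=TRIGGER):
    """Disable a row trigger around a bulk import, then always re-enable it."""
    cursor.execute(f"ALTER TABLE {table} DISABLE TRIGGER {trigger}")
    try:
        yield cursor
    finally:
        cursor.execute(f"ALTER TABLE {table} ENABLE TRIGGER {trigger}")

class RecordingCursor:
    """Stand-in cursor that records SQL instead of hitting a database."""
    def __init__(self):
        self.statements = []
    def execute(self, sql, params=None):
        self.statements.append(sql)

cur = RecordingCursor()
with trigger_disabled(cur):
    cur.execute("INSERT INTO db_newsitem (...) VALUES (...)")  # bulk import here
```

After the import, the batched location update (trigger or background task) would run once over the new rows.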

@copelco copelco removed their assignment Sep 23, 2015
