Skip to content

Conversation

@tumb1er
Copy link

@tumb1er tumb1er commented May 22, 2014

No description provided.

@chrisdev
Copy link
Owner

👯 well done @tumb1er

@BertrandBordage
Copy link
Contributor

Sorry @tumb1er, but I couldn't see a performance improvement. Did you measure it?

I made a simple test using IPython:

%timeit MyModel.objects.all()[:1000].to_dataframe()

Where MyModel is a real-life Model with 25 columns and about 3M rows.

Without your pull request I got about 32 ms.
With your pull request I got about 32 ms.

Besides, I have complex views using django-pandas and I didn't see a difference neither.

:\

@BertrandBordage
Copy link
Contributor

In fact, this pull request has a negative impact on performance if you use johnny-cache or django-cacheops. These two ORM caching tools are automatically saving in cache the data fetched by the ORM.

With this pull request, the ORM builds the SQL query but it is executed by pandas. And therefore it disables these caching tools.
Using johnny cache, one of my views takes 150 ms without this pull request, and 220 ms with it…

@tumb1er
Copy link
Author

tumb1er commented May 26, 2014

Wel, django-cacheops is an excellent reason to close this issue :)
Cacheops monkey-patches QuerySet.iterator method for getting result from cache, so you have no place to put custom cursor parsing.
PS. Measure performance while fetching only 1000 rows? really?
In #22 visible speedup had been seen only for fetching 100K-1M rows.

@tumb1er tumb1er closed this May 26, 2014
@BertrandBordage
Copy link
Contributor

I know pandas is made for Big Data, but I don't think anyone would allow django to retrieve > 100K rows as a view is meant to respond in less than 500 ms.

By the way, what you did with the ValueQuerySets is good! In my opinion, we can merge the changes in manager.py and test_manager.py as is.

@tumb1er
Copy link
Author

tumb1er commented May 26, 2014

It's not only about views, it's also about background report computations. So, 1M rows is OK :). By the way, you can add flag to_dataframe(use_raw_cursor=True).

@BertrandBordage
Copy link
Contributor

Yes, that use_raw_cursor is an excellent idea :D

@chrisdev
Copy link
Owner

so @tumb1er @BertrandBordage is there scope for a refactored PR to include the ValueQuerySet and the raw cursor stuff?

@BertrandBordage
Copy link
Contributor

I think that's a good idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants