#22 read_frame performance #24

tumb1er · 2014-05-22T13:15:30Z

No description provided.

Online github editor is unusable for editing python code...

chrisdev · 2014-05-22T13:37:44Z

👯 well done @tumb1er

BertrandBordage · 2014-05-24T01:16:04Z

Sorry @tumb1er, but I couldn't see a performance improvement. Did you measure it?

I made a simple test using IPython:

%timeit MyModel.objects.all()[:1000].to_dataframe()

Where MyModel is a real-life Model with 25 columns and about 3M rows.

Without your pull request I got about 32 ms.
With your pull request I got about 32 ms.

Besides, I have complex views using django-pandas and I didn't see a difference there either.

:\

BertrandBordage · 2014-05-24T15:27:12Z

In fact, this pull request has a negative impact on performance if you use johnny-cache or django-cacheops. These two ORM caching tools automatically cache the data fetched by the ORM.

With this pull request, the ORM still builds the SQL query, but pandas executes it. The query therefore bypasses these caching tools.
Using johnny cache, one of my views takes 150 ms without this pull request, and 220 ms with it…
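
To make the cache-bypass effect concrete, here is a minimal sketch using only the standard library. `CachedQuery` is a hypothetical stand-in for a queryset wrapped by a caching layer; it is not how johnny-cache or django-cacheops are actually implemented. It only shows why reusing the SQL string on a raw cursor never reaches a cache that hooks into iteration.

```python
import sqlite3


class CachedQuery:
    """Toy stand-in for an ORM query wrapped by a caching layer (hypothetical)."""

    _cache = {}  # maps SQL text -> list of rows

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self.db_hits = 0  # how many times the database was actually queried

    def iterator(self):
        # The caching layer intercepts iteration: a cache hit never touches the database.
        if self.sql not in self._cache:
            self.db_hits += 1
            self._cache[self.sql] = self.conn.execute(self.sql).fetchall()
        return iter(self._cache[self.sql])

    def raw_rows(self):
        # Raw-cursor path: only the SQL string is reused, so the cache is skipped.
        self.db_hits += 1
        return self.conn.execute(self.sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

query = CachedQuery(conn, "SELECT id, name FROM item ORDER BY id")

list(query.iterator())
list(query.iterator())  # served from the cache
hits_via_iterator = query.db_hits

query.raw_rows()
query.raw_rows()  # hits the database again
hits_via_raw = query.db_hits - hits_via_iterator
```

In this toy setup the two `iterator()` calls cost one database query in total, while each `raw_rows()` call costs one.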

tumb1er · 2014-05-26T06:29:56Z

Well, django-cacheops is an excellent reason to close this issue :)
Cacheops monkey-patches the QuerySet.iterator method to return results from the cache, so there is no place to plug in custom cursor parsing.
PS. Measuring performance while fetching only 1000 rows? Really?
In #22, a visible speedup was seen only when fetching 100K-1M rows.
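
Since the result depends on the row count, a benchmark should cover several sizes. The sketch below is one way to do that with the standard library only; it uses an in-memory SQLite table rather than Django or pandas, and the helper names (`make_table`, `fetch_per_row`, `fetch_bulk`, `benchmark`) are made up for illustration. It compares building one dict per row against a single `fetchall()`, which only approximates the ORM versus raw-cursor difference discussed here.

```python
import sqlite3
import timeit


def make_table(n_rows):
    """Create an in-memory table with n_rows rows (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE m (id INTEGER PRIMARY KEY, a REAL, b TEXT)")
    conn.executemany(
        "INSERT INTO m (a, b) VALUES (?, ?)",
        ((i * 0.5, "row%d" % i) for i in range(n_rows)),
    )
    return conn


def fetch_per_row(conn):
    # Mimics per-row processing: one dict is built for every row.
    cur = conn.execute("SELECT id, a, b FROM m ORDER BY id")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def fetch_bulk(conn):
    # Mimics the raw-cursor path: take the driver's tuples in one call.
    return conn.execute("SELECT id, a, b FROM m ORDER BY id").fetchall()


def benchmark(sizes=(1000, 100000), repeat=3):
    """Return the best wall-clock time in seconds per strategy and size."""
    results = {}
    for n in sizes:
        conn = make_table(n)
        results[n] = {
            "per_row": min(timeit.repeat(lambda: fetch_per_row(conn), number=1, repeat=repeat)),
            "bulk": min(timeit.repeat(lambda: fetch_bulk(conn), number=1, repeat=repeat)),
        }
        conn.close()
    return results
```

Calling `benchmark()` returns a dict keyed by row count, so the small and large cases can be compared side by side. The absolute numbers depend on the machine and say nothing definitive about the real ORM.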

BertrandBordage · 2014-05-26T08:28:12Z

I know pandas is made for Big Data, but I don't think anyone would let Django retrieve more than 100K rows in a view, since a view is meant to respond in less than 500 ms.

By the way, what you did with the ValueQuerySets is good! In my opinion, we can merge the changes in manager.py and test_manager.py as is.

tumb1er · 2014-05-26T08:35:59Z

It's not only about views, it's also about background report computations. So, 1M rows is OK :). By the way, you can add a flag: to_dataframe(use_raw_cursor=True).
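
A rough sketch of how such an opt-in flag could dispatch between the two paths. Only the flag name `use_raw_cursor` comes from this thread; `ToyQuerySet` and `to_records` are hypothetical, and the code uses SQLite from the standard library instead of Django. In django-pandas the returned columns and rows would presumably be turned into a DataFrame. Keeping the default at `False` preserves the cache-friendly ORM path.

```python
import sqlite3


class ToyQuerySet:
    """Hypothetical stand-in for a Django queryset; not the real ORM."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.columns = list(columns)
        self.sql = "SELECT %s FROM %s" % (", ".join(self.columns), table)

    def iterator(self):
        # "ORM" path: the place a caching layer could hook into.
        for row in self.conn.execute(self.sql):
            yield row


def to_records(qs, use_raw_cursor=False):
    """Return (columns, rows); the raw path skips qs.iterator() entirely."""
    if use_raw_cursor:
        cur = qs.conn.cursor()
        cur.execute(qs.sql)
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    return qs.columns, list(qs.iterator())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO report (total) VALUES (?)", [(1.5,), (2.5,)])
qs = ToyQuerySet(conn, "report", ["id", "total"])

default_result = to_records(qs)                   # cache-friendly default
raw_result = to_records(qs, use_raw_cursor=True)  # opt-in fast path
```

Both calls yield the same columns and rows here; only the route to the database differs.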

BertrandBordage · 2014-05-26T09:03:14Z

Yes, that use_raw_cursor is an excellent idea :D

chrisdev · 2014-05-26T13:13:50Z

So @tumb1er @BertrandBordage, is there scope for a refactored PR that includes the ValueQuerySet and the raw cursor stuff?

BertrandBordage · 2014-05-26T15:52:48Z

I think that's a good idea!

tumb1er added 5 commits May 22, 2014 15:03

use pandas

41d4d24

fix read_frame redefinition chrisdev#22

75a4112

chrisdev#22 fix another redefinition

1a1e6d5

Online github editor is unusable for editing python code...

Update io.py

0687e99

chrisdev#22 another implementation for Dj==1.6

246b328

tumb1er closed this May 26, 2014
