Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

by function then count_values creates wrong result. #1202

Closed
ghost opened this issue Aug 11, 2015 · 1 comment
Closed

by function then count_values creates wrong result. #1202

ghost opened this issue Aug 11, 2015 · 1 comment
Labels
Milestone

Comments

@ghost
Copy link

ghost commented Aug 11, 2015

I'm using the lahman2013.sqlite file to learn blaze. I believe the following is producing the wrong result.

import blaze as bz
_db = bz.Data('sqlite:///data/lahman2013.sqlite')

_sy=bz.by(_db.Teams.name, start_year=_db.Teams.yearID.min())
print(str(bz.compute(_sy, post_compute=False)))

produces

SELECT "Teams".name, min("Teams"."yearID") AS start_year 
FROM "Teams" GROUP BY "Teams".name

which seems correct to me.

I want to now use this result (the first year a team was in the MLB) to see how many teams entered the MLB for each year. I believe the following should produce that..

print(list(_sy.start_year.count_values())

but it produces incorrect counts. When I look at the sql produced I get the following:

print(str(bz.compute(_sy.start_year.count_values(), post_compute=False)))

SELECT anon_1.start_year, anon_1.count 
FROM (SELECT alias_1.start_year AS start_year, count(alias_2.start_year) AS count 
FROM (SELECT min("Teams"."yearID") AS start_year 
FROM "Teams" GROUP BY "Teams".name) AS alias_1, (SELECT alias_1.start_year AS start_year 
FROM (SELECT min("Teams"."yearID") AS start_year 
FROM "Teams" GROUP BY "Teams".name) AS alias_1) AS alias_2 GROUP BY alias_1.start_year) AS anon_1 ORDER BY anon_1.count DESC

which is incorrect. It should produce something close to:

select start_year, count(*)
from
 (SELECT "Teams".name, min("Teams"."yearID") AS start_year 
FROM "Teams" GROUP BY "Teams".name)
group by start_year

I would love to take a stab at fixing this myself. What I would like is for people to verify that I'm doing this correctly..

Thanks!

Billy

@cpcloud
Copy link
Member

cpcloud commented Aug 11, 2015

@Earney Thanks for the report! I have some thoughts on this, but I'm going to be out for the rest of the night. I'll post them first thing tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants