Incorrect query output from group by query (regression in 0.9.0) #9399

Closed
1 task done
segasai opened this issue Oct 19, 2023 · 3 comments · Fixed by #9411

segasai commented Oct 19, 2023

What happens?

Hi,
I have a query that groups by an integer column, so I expected to get only unique values for that column. However, recent versions of DuckDB (0.9.0+) return duplicates.

Query:

with a as (select label,
   avg(x) as xx,
   avg(y) as yy
from tab group by label)
select * from a

Full reproduction is given below.
The issue does not happen with DuckDB 0.8.1.

Thanks !

To Reproduce

Here is the code. It requires a data file ( https://gist.github.com/segasai/36a73d6f3b140e513e1adfc5d05f2c83 ), which the script reads as aa.csv:

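import duckdb
import numpy as np
import pandas as pd

# Load the data file and expose the DataFrame to DuckDB as the table 'tab'
pdf = pd.read_csv('aa.csv')
conn1 = duckdb.connect(':memory:')
conn1.register('tab', pdf)

# Group by the integer key; each label should appear exactly once in the result
R = conn1.execute('''
with a as (select label,
   avg(x) as xx,
   avg(y) as yy
from tab group by label)
select * from a
''').fetchnumpy()

# Number of result rows vs. number of distinct labels; these should be equal
print(len(R['label']), len(np.unique(R['label'])))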

The code prints the number of rows in the result and the number of unique values in the column used as the key. The two numbers must be equal, but they are not.
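To see which keys are affected rather than just how many, a small helper can list the labels that occur more than once. This is only a sketch: `duplicated_keys` is a hypothetical name, not part of DuckDB or of the original report, and the sample lists are made-up values, not rows from the data file.

```python
from collections import Counter

def duplicated_keys(keys):
    """Return a dict mapping each key that occurs more than once to its count."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}

# Made-up label column as a correct GROUP BY would return it: no repeats.
print(duplicated_keys([1, 2, 3]))     # {}

# Made-up label column showing the reported symptom: label 2 appears twice.
print(duplicated_keys([1, 2, 2, 3]))  # {2: 2}
```

With the reproduction script, something like `duplicated_keys(R['label'].tolist())` should return an empty dict on an unaffected version and a non-empty one when the bug is present.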

For clarity, the underlying query is this:

with a as (select label,
   avg(x) as xx,
   avg(y) as yy
from tab group by label)
select * from a
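To make the expected behaviour concrete, here is a minimal pure-Python sketch of what this query is supposed to compute: one output row per distinct label, holding the averages of x and y for that label. It is an illustration only, not how DuckDB implements aggregation. `group_avg` is a hypothetical helper, the rows are made-up values, and NULL handling is left out.

```python
from collections import defaultdict

def group_avg(rows):
    """rows: iterable of (label, x, y) tuples. Returns {label: (avg_x, avg_y)}."""
    # Per-label accumulator: [sum of x, sum of y, row count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for label, x, y in rows:
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

# Made-up input: label 1 occurs twice, label 2 once.
rows = [(1, 1.0, 2.0), (2, 3.0, 4.0), (1, 3.0, 6.0)]
result = group_avg(rows)
print(result)  # {1: (2.0, 4.0), 2: (3.0, 4.0)}

# The invariant the report relies on: one output row per distinct label.
print(len(result) == len({label for label, _, _ in rows}))  # True
```

Because the result is keyed by label, duplicates are impossible by construction. That is the property the GROUP BY output should also have.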

OS:

Linux

DuckDB Version:

0.9.1

DuckDB Client:

Python

Full Name:

Sergey Koposov

Affiliation:

University of Edinburgh

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@szarnyasg
Collaborator

Thanks, I could reproduce the issue

@lnkuiper
Contributor

PR is up #9411

@segasai
Author

segasai commented Oct 27, 2023

Thank you for the fix!
