When two result columns have the same name, one is dropped #280

Closed
davidcrawford opened this Issue Feb 22, 2013 · 11 comments

Comments

4 participants
@davidcrawford

This most commonly happens when issuing queries with functions. If you don't explicitly alias the result, the column is named after the function. For instance:

select origin, avg(delay), avg(distance) 
from flights
group by 1
limit 10;

 origin |       avg        |       avg        
--------+------------------+------------------
 SAN    | 10.3516289991195 | 475.812298209569
 SLC    | 6.66094166419899 | 579.881551673083
 SJC    | 11.6489714664897 | 394.197909754479
 BWI    | 8.25543071161049 | 615.350280898876
 ELP    | 6.72572101236021 |  496.80665097116
 HOU    | 7.62836264353971 |  413.10896972617
 BNA    | 5.42857142857143 | 749.543639921722
 MSY    | 7.46954113924051 | 506.503362341772
 SEA    | 5.51911027568922 | 665.635964912281
 IAH    | 4.94343065693431 |              217
(10 rows)

As far as I can tell, node-postgres only returns rows as objects (not arrays, e.g.). So the second 'avg' overwrites the first one in the object.

That's a great default, but since I'm using node-postgres to build a web client for sql analytic queries, I have to expect my users to write queries like this.

I'd like to propose an addition to the API of an option to emit rows as arrays (or maybe even just a separate event like 'row-array'). I'd love your feedback on whether this is something you'd be interested in pulling into the library, and how you'd recommend doing it.

One consideration is that if you send back an array, you don't get column names. I would propose a 'columns' event that fires before any row events and sends the column information for rows to come. This would be one way of addressing issue #209.

I'm happy to help out on this, though I have no familiarity with the V8 engine so it would be great to get pointers to good docs.

@brianc

This comment has been minimized.

Show comment
Hide comment
@brianc

brianc Feb 23, 2013

Owner

This is pretty easy to do with the JavaScript API, but not very easy with the native bindings.

The JavaScript API uses a Connection instance within the Client. The Connection is responsible for parsing up PostgreSQL socket data and emitting it as events. The events it emits are parsed into JSON structures but otherwise the same exact data sent from the server. One of those events is RowDescription which happens to fire right before any rows come back from the server. You can manually grab this and the DataRow events and do your own rolling up of the data. Example:

pg.connect(function(err, client, done) {
  client.connection.on('rowDescription', function(msg) {
    //figure out some way to make sense of two columns with the same name
  });
  client.connection.on('dataRow', function(msg) {
    //map the data rows to result structures any way you see fit
  });
});

The code above is similar but slightly simplified to how node-postgres itself handles the messages.

Here's how node-postgres parses the row information:

https://github.com/brianc/node-postgres/blob/master/lib/query.js#L58

Owner

brianc commented Feb 23, 2013

This is pretty easy to do with the JavaScript API, but not very easy with the native bindings.

The JavaScript API uses a Connection instance within the Client. The Connection is responsible for parsing up PostgreSQL socket data and emitting it as events. The events it emits are parsed into JSON structures but otherwise the same exact data sent from the server. One of those events is RowDescription which happens to fire right before any rows come back from the server. You can manually grab this and the DataRow events and do your own rolling up of the data. Example:

pg.connect(function(err, client, done) {
  client.connection.on('rowDescription', function(msg) {
    //figure out some way to make sense of two columns with the same name
  });
  client.connection.on('dataRow', function(msg) {
    //map the data rows to result structures any way you see fit
  });
});

The code above is similar but slightly simplified to how node-postgres itself handles the messages.

Here's how node-postgres parses the row information:

https://github.com/brianc/node-postgres/blob/master/lib/query.js#L58

@davidcrawford

This comment has been minimized.

Show comment
Hide comment
@davidcrawford

davidcrawford Feb 23, 2013

Awesome, thanks for the quick feedback! I will give the JavaScript API a try, but would eventually be interested in adding this capability to the native bindings. Is that something you'd want to incorporate?

Awesome, thanks for the quick feedback! I will give the JavaScript API a try, but would eventually be interested in adding this capability to the native bindings. Is that something you'd want to incorporate?

@davidcrawford

This comment has been minimized.

Show comment
Hide comment
@davidcrawford

davidcrawford Feb 23, 2013

Also, how do you link the rowDescription and dataRow events to a particular query? Do you just ensure only one query runs on the connection at a time?

Also, how do you link the rowDescription and dataRow events to a particular query? Do you just ensure only one query runs on the connection at a time?

@brianc

This comment has been minimized.

Show comment
Hide comment
@brianc

brianc Feb 23, 2013

Owner

It can definitely be added to both. Having thought about this a bit more here's what we currently have when we run a query:

var result = {
  rows:[{ colName: 'val1', colName2: 'val2' }]
}

We could modify it to be this:

var result = {
   rows: [...]
   values: ['val1', 'val2']
}

The only thing I don't like about this exactly is the performance implications of creating an array in every result set. Maybe it could be opt-ip:

pg.connect(function(err, client, done) {
  var query = client.query('SELECT NOW()', function(err, res) {
    console.log(res.values); //['2013-02-23 11:11:11T'];
  });
  query.includeValues = true;
});

Thoughts?

Owner

brianc commented Feb 23, 2013

It can definitely be added to both. Having thought about this a bit more here's what we currently have when we run a query:

var result = {
  rows:[{ colName: 'val1', colName2: 'val2' }]
}

We could modify it to be this:

var result = {
   rows: [...]
   values: ['val1', 'val2']
}

The only thing I don't like about this exactly is the performance implications of creating an array in every result set. Maybe it could be opt-ip:

pg.connect(function(err, client, done) {
  var query = client.query('SELECT NOW()', function(err, res) {
    console.log(res.values); //['2013-02-23 11:11:11T'];
  });
  query.includeValues = true;
});

Thoughts?

@davidcrawford

This comment has been minimized.

Show comment
Hide comment
@davidcrawford

davidcrawford Feb 23, 2013

It seems like you want to either get the object or the row, but not both. Don't want to create two representations of a large result set if you'll only use one. If you could get just values, you save the memory footprint of storing a copy of the column names for every row.

Python's psycopg2 gives you arrays by default but lets you pass in a cursor_factory when creating the cursor (analogous to query here, I suppose).

http://www.initd.org/psycopg/docs/extras.html

That seems like a pretty reasonable way to do it.

It seems like you want to either get the object or the row, but not both. Don't want to create two representations of a large result set if you'll only use one. If you could get just values, you save the memory footprint of storing a copy of the column names for every row.

Python's psycopg2 gives you arrays by default but lets you pass in a cursor_factory when creating the cursor (analogous to query here, I suppose).

http://www.initd.org/psycopg/docs/extras.html

That seems like a pretty reasonable way to do it.

@brianc

This comment has been minimized.

Show comment
Hide comment
@brianc

brianc Feb 23, 2013

Owner

Good point about not needing both the object & array type for the same query. Something like a cursor factory would work. Even if it was just as simple as the Query object using a function like "buildResults(rowData)" or something which could be overridden.

Owner

brianc commented Feb 23, 2013

Good point about not needing both the object & array type for the same query. Something like a cursor factory would work. Even if it was just as simple as the Query object using a function like "buildResults(rowData)" or something which could be overridden.

@brianc

This comment has been minimized.

Show comment
Hide comment
@brianc

brianc Feb 23, 2013

Owner

To answer your earlier question: rowDescription and dataRow events are tied to a particular query because each Client instance queues and executes queries one at a time. PostgreSQL backend cannot support more than 1 query at a time per connection.

Owner

brianc commented Feb 23, 2013

To answer your earlier question: rowDescription and dataRow events are tied to a particular query because each Client instance queues and executes queries one at a time. PostgreSQL backend cannot support more than 1 query at a time per connection.

@davidcrawford

This comment has been minimized.

Show comment
Hide comment
@davidcrawford

davidcrawford Feb 23, 2013

Okay, that's what I figured. Thanks for the help.

On Sat, Feb 23, 2013 at 3:17 PM, Brian C notifications@github.com wrote:

To answer your earlier question: rowDescription and dataRow events are
tied to a particular query because each Client instance queues and
executes queries one at a time. PostgreSQL backend cannot support more than
1 query at a time per connection.


Reply to this email directly or view it on GitHubhttps://github.com/brianc/node-postgres/issues/280#issuecomment-14000102.

Okay, that's what I figured. Thanks for the help.

On Sat, Feb 23, 2013 at 3:17 PM, Brian C notifications@github.com wrote:

To answer your earlier question: rowDescription and dataRow events are
tied to a particular query because each Client instance queues and
executes queries one at a time. PostgreSQL backend cannot support more than
1 query at a time per connection.


Reply to this email directly or view it on GitHubhttps://github.com/brianc/node-postgres/issues/280#issuecomment-14000102.

@rpedela

This comment has been minimized.

Show comment
Hide comment
@rpedela

rpedela Jul 19, 2013

Contributor

Why not use an alias?

SELECT origin, AVG(delay) AS avg_delay, AVG(distance) AS avg_distance
FROM flights
GROUP BY 1
LIMIT 10;
Contributor

rpedela commented Jul 19, 2013

Why not use an alias?

SELECT origin, AVG(delay) AS avg_delay, AVG(distance) AS avg_distance
FROM flights
GROUP BY 1
LIMIT 10;
@brianc

This comment has been minimized.

Show comment
Hide comment
@brianc

brianc Jul 19, 2013

Owner

Yea definitely. I think @davidcrawford's original problem was he wasn't in control of the query so he couldn't add an alias to them. The good news is this has been closed by a pull request!

#393

Owner

brianc commented Jul 19, 2013

Yea definitely. I think @davidcrawford's original problem was he wasn't in control of the query so he couldn't add an alias to them. The good news is this has been closed by a pull request!

#393

@brianc brianc closed this Jul 19, 2013

@olsonpm

This comment has been minimized.

Show comment
Hide comment
@olsonpm

olsonpm Jan 11, 2015

Thanks much for this solution - but just wondering, is it really a limitation of the server that they don't return table information along with the result?

e.g. table1.duplicateColumnName vs table2.duplicateColumnName

It just seems odd that although you don't lose data, you still don't know which table the field is from.

olsonpm commented Jan 11, 2015

Thanks much for this solution - but just wondering, is it really a limitation of the server that they don't return table information along with the result?

e.g. table1.duplicateColumnName vs table2.duplicateColumnName

It just seems odd that although you don't lose data, you still don't know which table the field is from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment