Permalink
Fetching contributors…
Cannot retrieve contributors at this time
137 lines (122 sloc) 11.6 KB

SQL to Blaze

This page maps SQL expressions to blaze expressions.

Note

The following SQL expressions are somewhat specific to PostgreSQL, but blaze itself works with any database for which a SQLAlchemy dialect exists.

Prerequisites

If you're interested in testing these against a PostgreSQL database, make sure you've executed the following code in psql session:

CREATE TABLE df (
    id BIGINT,
    amount DOUBLE PRECISION,
    name TEXT
);

On the blaze side of things, the table below assumes the following code has been executed:

>>> from blaze import symbol, by, join, concat
>>> df = symbol('df', 'var * {id: int64, amount: float64, name: string}')

Note

Certain SQL constructs such as window functions don't directly correspond to a particular Blaze expression. Map expressions are the closest representation of window functions in Blaze.

Computation SQL Blaze
Column Arithmetic
select amount * 2 from df
df.amount * 2
Multiple Columns
select id, amount from df
df[['id', 'amount']]
Selection
select * from df where amount > 300
df[df.amount > 300]
Group By
select avg(amount) from df group by name
by(df.name, amount=df.amount.mean())
select avg(amount) from df group by name, id
by(merge(df.name, df.id),
   amount=df.amount.mean())
Join
select * from
    df inner join df2
on df.name = df2.name
join(df, df2, 'name')
Map
select amount + 1 over () from df
df.amount.map(lambda x: x + 1,
              'int64')
Relabel Columns
select
    id,
    name as alias,
    amount as dollars
 from df
df.relabel(name='alias',
           amount='dollars')
Drop duplicates
select distinct * from df
df.distinct()
select distinct(name) from df
df.name.distinct()
/* postgresql only */
select distinct on (name) * from
df order by name
 
Reductions
select avg(amount) from df
df.amount.mean()
select amount, count(amount)
from df group by amount
df.amount.count_values()
Concatenate
select * from df
union all
select * from df
concat(df, df)
Column Type Information
select
    column_name,
    data_type,
    character_maximum_length
from
    information_schema.columns
where
    table_name = 'df'
df.dshape
select
    column_name,
    data_type,
    character_maximum_length
from
    information_schema.columns
where
    table_name = 'df'
        and
    column_name = 'amount'
df.amount.dshape