# BlazingContext API

## Query Tables
[Docs](https://docs.blazingdb.com/docs/single-gpu) | [BlazingSQL Notebooks](https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/docs/blazingsql.ipynb#Create-Tables)

In [None]:
from blazingsql import BlazingContext
bc = BlazingContext()

In [None]:
bc.create_table('taxi', '../../../data/sample_taxi.parquet')

Pull all rows and all columns.

In [None]:
bc.sql('SELECT * FROM taxi')

Determine the average number of riders per trip by hour of the day.

In [None]:
avg_riders_by_hour = '''
                     select
                         avg(cast(passenger_count as float)) as avg_passenger_count,
                         hour(dropoff_ts) as hour_of_the_day
                     from (
                         select
                             passenger_count, 
                             cast(tpep_dropoff_datetime || '.0' as timestamp) dropoff_ts
                         from
                             taxi
                             )
                     group by
                         hour(dropoff_ts)
                     order by
                         hour(dropoff_ts)
                         '''
bc.sql(avg_riders_by_hour)
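
To make the query's logic concrete, here is a minimal plain-Python sketch of the same steps: append `'.0'` to the dropoff string, parse it as a timestamp, then average `passenger_count` per hour. The sample rows and the `avg_by_hour_sketch` helper are hypothetical and exist only for illustration; this is not how BlazingSQL executes the query on the GPU.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (passenger_count, tpep_dropoff_datetime as a string).
sample_rows = [
    (1, "2019-01-01 00:15:00"),
    (3, "2019-01-01 00:45:30"),
    (2, "2019-01-01 13:05:10"),
    (4, "2019-01-02 13:59:59"),
]

def avg_by_hour_sketch(rows):
    """Return [(hour_of_the_day, avg_passenger_count), ...] ordered by hour."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for passenger_count, dropoff in rows:
        # Mirror the inner query: append '.0' and parse the string as a timestamp.
        dropoff_ts = datetime.strptime(dropoff + ".0", "%Y-%m-%d %H:%M:%S.%f")
        # Mirror the outer query: group by hour, averaging the count as a float.
        totals[dropoff_ts.hour] += float(passenger_count)
        counts[dropoff_ts.hour] += 1
    return [(hour, totals[hour] / counts[hour]) for hour in sorted(totals)]

sketch_result = avg_by_hour_sketch(sample_rows)
print(sketch_result)
```

Each tuple corresponds to one row of the SQL result, with the hour playing the role of `hour_of_the_day`.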

Convert the results with `.to_pandas()` for easy Matplotlib visualization.

In [None]:
bc.sql(avg_riders_by_hour).to_pandas().plot(x='hour_of_the_day', y='avg_passenger_count')

### Distributed Queries
[Docs](https://docs.blazingdb.com/docs/distributed) | [BlazingSQL Notebooks](https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/docs/blazingsql.ipynb#Distributed-Queries)

BlazingSQL can distribute query execution across multiple GPUs or servers with Dask. You don't have to pass a list of IPs and ports to BSQL; whatever you configure with Dask gives your BlazingContext instance awareness of where all the GPUs or servers are. Check out blog_posts/[distributed_sql_with_dask.ipynb](../blog_posts/distributed_sql_with_dask.ipynb) or [Distributed SQL with Dask](https://blog.blazingdb.com/distributed-sql-with-dask-2979262acc8a?source=friends_link&sk=077319064cd7d9e18df8c0292eb5d33d) for more.

In [None]:
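from dask_cuda import LocalCUDACluster
from dask.distributed import Client
# Start a local Dask cluster with one worker per visible GPU, then connect to it.
cluster = LocalCUDACluster()
client = Client(cluster)

from blazingsql import BlazingContext
# Pass the Dask client so queries run distributed; 'lo' is the loopback interface.
bc = BlazingContext(dask_client=client, network_interface='lo')

# Register the S3 bucket under the 'bsql' prefix, then create a table from it.
bc.s3('bsql', bucket_name='blazingsql-colab')
bc.create_table('taxi', 's3://bsql/yellow_taxi/taxi_data.parquet')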

Distributed queries return a `dask_cudf.DataFrame`. Learn more with [The DataFrame introductory Notebook](../../../intro_notebooks/the_dataframe.ipynb#Dask-cuDF).

In [None]:
type(bc.sql('SELECT * FROM taxi'))

Pull all rows and all columns.

In [None]:
bc.sql('SELECT * FROM taxi').compute()

Determine the average number of riders per trip by hour of the day.

In [None]:
avg_riders_by_hour = '''
                     select
                         avg(cast(passenger_count as float)) as avg_passenger_count,
                         hour(dropoff_ts) as hour_of_the_day
                     from (
                         select
                             passenger_count, 
                             cast(tpep_dropoff_datetime || '.0' as timestamp) dropoff_ts
                         from
                             taxi
                             )
                     group by
                         hour(dropoff_ts)
                     order by
                         hour(dropoff_ts)
                         '''
bc.sql(avg_riders_by_hour).compute()
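
As a conceptual illustration of why a grouped average can be split across workers, the sketch below computes per-partition sums and counts and then merges them. The partitions and the `partial_aggregate` and `combine` helpers are hypothetical; this is a generic pattern for distributed averages and is not a description of BlazingSQL's or Dask's internals.

```python
from collections import Counter

# Hypothetical partitions of (hour_of_the_day, passenger_count) pairs,
# standing in for the slices of the table held by different workers.
partitions = [
    [(0, 1), (0, 3), (13, 2)],
    [(13, 4), (23, 5)],
]

def partial_aggregate(partition):
    """Compute per-hour sums and counts for a single partition."""
    sums, counts = Counter(), Counter()
    for hour, passengers in partition:
        sums[hour] += passengers
        counts[hour] += 1
    return sums, counts

def combine(partials):
    """Merge partial sums and counts, then divide to get per-hour averages."""
    total_sums, total_counts = Counter(), Counter()
    for sums, counts in partials:
        total_sums.update(sums)      # Counter.update adds values per key
        total_counts.update(counts)
    return {hour: total_sums[hour] / total_counts[hour] for hour in sorted(total_sums)}

distributed_avg = combine(partial_aggregate(p) for p in partitions)
print(distributed_avg)
```

Averaging the per-partition averages directly would give the wrong answer when partitions hold different numbers of rows per hour, which is why the sketch carries sums and counts separately until the final step.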

Call `.compute()`, then convert the results with `.to_pandas()` for easy Matplotlib visualization.

In [None]:
bc.sql(avg_riders_by_hour).compute().to_pandas().plot(x='hour_of_the_day', y='avg_passenger_count')

# BlazingSQL Docs
**[Table of Contents](../TABLE_OF_CONTENTS.ipynb) | [Issues (GitHub)](https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks/issues)**