## Create BlazingSQL tables

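This notebook covers creating BlazingSQL tables from text files, JSON, Apache Parquet, Apache ORC, Apache Hive, cuDF or pandas DataFrames, and files on registered storage plugins.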

[Docs](https://docs.blazingdb.com/docs/text-files) | [Welcome notebook](../welcome.ipynb#BlazingContext-API)

In [1]:
from blazingsql import BlazingContext

bc = BlazingContext()

BlazingContext ready


BlazingSQL requires the full path to your data to create tables. The next cell will identify the path to the data directory for you.

In [2]:
import os
# tag path to data directory
data_dir = f"{os.getcwd().split('/intro_notebooks')[0]}/data"
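The expression above splits the current working directory at `/intro_notebooks` and appends `/data`. A minimal sketch of the same logic as a function, so it can be checked against a sample path; `resolve_data_dir` and the sample directory are hypothetical names used only for illustration:

```python
import os


def resolve_data_dir(cwd=None, marker="/intro_notebooks"):
    """Return '<repo root>/data', where the repo root is everything before `marker`."""
    # default to the notebook's working directory, as the cell above does
    cwd = os.getcwd() if cwd is None else cwd
    # str.split returns the whole string as element 0 when `marker` is absent,
    # so the result then falls back to '<cwd>/data'
    return f"{cwd.split(marker)[0]}/data"


# hypothetical working directory, for illustration only
print(resolve_data_dir("/home/user/bsql-demos/intro_notebooks"))
```

If the notebook is started from a directory that does not contain `/intro_notebooks`, the result points at a `data` folder under the current directory, which may not exist.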

#### Text Files

You can use BlazingSQL to run SQL queries on plain text files (flat files), such as:

- CSV files (comma-separated values)
- TSV files (tab-separated values)
- PSV files (pipe-separated values)

BlazingSQL relies on cuIO when reading files, so it can infer column names from a header row and data types by sampling the data.

[Docs](https://docs.blazingdb.com/docs/text-files)
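To make the idea of header and type inference concrete, here is a rough CPU-only sketch using the Python standard library. It is not how cuIO is implemented; `DELIMITERS`, `infer_type`, `infer_schema`, and the sample text are hypothetical and exist only for this illustration.

```python
import csv
import io

# delimiter conventionally associated with each flat-file extension
DELIMITERS = {".csv": ",", ".tsv": "\t", ".psv": "|"}


def infer_type(values):
    """Pick the narrowest type (int, float, str) that fits every sampled value."""
    for caster, name in ((int, "int64"), (float, "float64")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def infer_schema(text, delimiter=",", sample_rows=100):
    """Read column names from the header row and infer types from a row sample."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    names = next(reader)  # row 0 holds the column names (like header=0)
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*sample))  # transpose sampled rows into columns
    return {name: infer_type(col) for name, col in zip(names, columns)}


# tiny hand-written pipe-separated sample, not the real iris file
psv = "sepal_length|sepal_width|target\n5.1|3.5|0\n4.9|3.0|0\n"
print(infer_schema(psv, delimiter=DELIMITERS[".psv"]))
```

Because only a sample of rows is inspected, a value further down the file can contradict the inferred type, which is the usual trade-off of sampling-based inference.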

In [3]:
# CSV
bc.create_table('iris_csv', f'{data_dir}/iris.csv', header=0)

<pyblazing.apiv2.context.BlazingTable at 0x7f1836ffb510>

#### JSON Files

[Docs](https://docs.blazingdb.com/docs/json)

In [None]:
# create table from JSON file
# (assumes an iris.json file exists in data_dir; adjust the path to your own file)
bc.create_table('iris_json', f'{data_dir}/iris.json')

#### Apache Parquet

[Docs](https://docs.blazingdb.com/docs/apache-parquet)

In [None]:
# create table from Parquet file
# (assumes an iris.parquet file exists in data_dir; adjust the path to your own file)
bc.create_table('iris_parquet', f'{data_dir}/iris.parquet')

#### Apache ORC

[Docs](https://docs.blazingdb.com/docs/apache-orc)

In [None]:
# create table from ORC file
# (assumes an iris.orc file exists in data_dir; adjust the path to your own file)
bc.create_table('iris_orc', f'{data_dir}/iris.orc')

#### Apache Hive

[Docs](https://docs.blazingdb.com/docs/apache-hive)

In [None]:
from pyhive import hive

# connect to Hive and obtain a cursor
cursor = hive.connect('your_hive_ip_address').cursor()

# give create_table the Hive cursor
# the table name must match the table name in Hive
bc.create_table("hive_table_name", cursor)

# query table (result = cuDF DataFrame)
result = bc.sql("select * from hive_table_name")

#### cuDF or pandas DataFrame

[Docs](https://docs.blazingdb.com/docs/gpu-dataframe-gdf)

In [4]:
import cudf

# create cuDF DataFrame
gdf = cudf.read_csv('../data/iris.csv')

# create table from cuDF DataFrame
bc.create_table('cudf_iris', gdf)

<pyblazing.apiv2.context.BlazingTable at 0x7f1836ffb850>

In [5]:
import pandas as pd

# create pandas DataFrame
df = pd.read_csv('../data/iris.csv')

# create table from pandas DataFrame
bc.create_table('pandas_iris', df)

<pyblazing.apiv2.context.BlazingTable at 0x7f18524a3c50>

#### Storage Plugins

We think you should let data rest wherever it likes. Don't worry about syncing; query files directly wherever they reside.

With the BlazingSQL Filesystem API, you can register and connect to multiple storage solutions. 

- [AWS](https://docs.blazingdb.com/docs/s3) 
- [Google Storage](https://docs.blazingdb.com/docs/google-cloud-storage)
- [HDFS](https://docs.blazingdb.com/docs/hdfs)

Once a filesystem is registered, you can use its user-defined name in the file path when creating a table from a file.
    
[Docs](https://docs.blazingdb.com/docs/connecting-data-sources) | [Intro notebook](storage_plugins.ipynb)
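The relationship between a registered name and the paths that use it can be pictured with a small stand-in. This is only a conceptual sketch in plain Python, not BlazingSQL's implementation; `registered`, `register_s3`, and `resolve` are hypothetical helpers, while the name `bsql_data` and bucket `blazingsql-colab` are the ones used in this notebook's S3 example.

```python
from urllib.parse import urlparse

# hypothetical registry: user-defined name -> storage settings
registered = {}


def register_s3(prefix, bucket_name):
    """Remember which bucket a user-defined name refers to."""
    registered[prefix] = {"scheme": "s3", "bucket": bucket_name}


def resolve(path):
    """Split 'scheme://name/key' and look up the bucket behind the name."""
    parts = urlparse(path)
    settings = registered.get(parts.netloc)
    if settings is None or settings["scheme"] != parts.scheme:
        raise KeyError(f"no {parts.scheme} filesystem registered as {parts.netloc!r}")
    return settings["bucket"], parts.path.lstrip("/")


register_s3("bsql_data", bucket_name="blazingsql-colab")
print(resolve("s3://bsql_data/tpch_sf1/orders/0_0_0.parquet"))
```

The point of the indirection is that table definitions mention only the registered name, so the underlying bucket can change without touching the paths used in `create_table`.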

In [6]:
# register AWS S3 storage bucket 
bc.s3('bsql_data', bucket_name='blazingsql-colab')

# create table from S3 bucket
bc.create_table('orders', 's3://bsql_data/tpch_sf1/orders/0_0_0.parquet')

file s3://bsql_data/tpch_sf1/orders/0_0_0.parquet


<pyblazing.apiv2.context.BlazingTable at 0x7f182c462390>

## Query BlazingSQL tables

[Docs](https://docs.blazingdb.com/docs/single-gpu) | [Intro notebook](query_tables.ipynb)

In [7]:
bc.sql('SELECT * FROM orders')

| | o_orderkey | o_custkey | o_orderstatus | o_totalprice | o_orderdate | o_orderpriority | o_clerk | o_shippriority | o_comment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 36901 | O | 173665.468750 | 1996-01-02 | 5-LOW | Clerk#000000951 | 0 | nstructions sleep furiously among |
| 1 | 2 | 78002 | O | 46929.179688 | 1996-12-01 | 1-URGENT | Clerk#000000880 | 0 | foxes. pending accounts at the pending, silen... |
| 2 | 3 | 123314 | F | 193846.250000 | 1993-10-14 | 5-LOW | Clerk#000000955 | 0 | sly final accounts boost. carefully regular id... |
| 3 | 4 | 136777 | O | 32151.779297 | 1995-10-11 | 5-LOW | Clerk#000000124 | 0 | sits. slyly regular warthogs cajole. regular, ... |
| 4 | 5 | 44485 | F | 144659.203125 | 1994-07-30 | 5-LOW | Clerk#000000925 | 0 | quickly. bold deposits sleep slyly. packages u... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1499995 | 5999972 | 143594 | O | 114856.679688 | 1996-05-02 | 3-MEDIUM | Clerk#000000536 | 0 | y express accounts above the blithely bold |
| 1499996 | 5999973 | 32071 | O | 68906.562500 | 1997-07-13 | 4-NOT SPECIFIED | Clerk#000000130 | 0 | special ideas use pending pinto beans. reques... |
| 1499997 | 5999974 | 55448 | F | 92750.898438 | 1993-07-28 | 3-MEDIUM | Clerk#000000776 | 0 | fts. requests affix furiously a |
| 1499998 | 5999975 | 113398 | F | 63216.648438 | 1993-07-25 | 1-URGENT | Clerk#000000813 | 0 | oost! ironic instructions h |


In [8]:
bc.sql('SELECT sepal_length, sepal_width, target FROM iris_csv WHERE target <> 1')

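| | sepal_length | sepal_width | target |
|---|---|---|---|
| 0 | 5.1 | 3.5 | 0 |
| 1 | 4.9 | 3.0 | 0 |
| 2 | 4.7 | 3.2 | 0 |
| 3 | 4.6 | 3.1 | 0 |
| 4 | 5.0 | 3.6 | 0 |
| ... | ... | ... | ... |
| 95 | 6.7 | 3.0 | 2 |
| 96 | 6.3 | 2.5 | 2 |
| 97 | 6.5 | 3.0 | 2 |
| 98 | 6.2 | 3.4 | 2 |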
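In the query above, `<>` is the standard SQL inequality operator, so only rows whose `target` is not 1 are returned. The same filter can be tried without a GPU using Python's built-in `sqlite3` module as a stand-in; this sketch uses a three-row hand-written sample rather than the real iris data, and SQLite is not BlazingSQL, so treat it purely as a check of the SQL semantics.

```python
import sqlite3

# in-memory SQLite database as a CPU stand-in for a BlazingSQL table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iris_csv (sepal_length REAL, sepal_width REAL, target INTEGER)"
)
# small hand-written sample with one row per target class (illustrative only)
conn.executemany(
    "INSERT INTO iris_csv VALUES (?, ?, ?)",
    [(5.1, 3.5, 0), (7.0, 3.2, 1), (6.2, 3.4, 2)],
)

query = "SELECT sepal_length, sepal_width, target FROM iris_csv WHERE target <> 1"
rows = conn.execute(query).fetchall()
print(rows)  # the row with target 1 is filtered out
```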
