# Querying Multiple Data Formats 
In this notebook, we will cover: 
- How to create and then join BlazingSQL tables from CSV, Parquet, and GPU DataFrame (GDF) sources. 

## Import packages and create Blazing Context
You can think of the BlazingContext much like a Spark Context: it is where information such as the FileSystems you have registered and the Tables you have created is stored. If you have issues running this cell, restart the runtime and try running it again.

In [1]:
# Import RAPIDS AI stack
from blazingsql import BlazingContext
import cudf

bc = BlazingContext()

BlazingContext ready


### Create Table from CSV
Here we create a BlazingSQL table directly from a comma-separated values (CSV) file.

In [3]:
# define column names and types
column_names = ['diagnosis_result', 'radius', 'texture', 'perimeter']
column_types = ['float32', 'float32', 'float32', 'float32']

# identify local directory path (SList)
local_path = !pwd

# create table from CSV file
bc.create_table('data_00', local_path[0] + '/data/cancer_data_00.csv', 
                delimiter=',', dtype=column_types, names=column_names)

<pyblazing.apiv2.sql.Table at 0x7f405ae86828>
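
The `!pwd` line above is IPython shell syntax, so it only works inside a notebook. If you run the same steps as a plain Python script, a minimal sketch of the path construction with the standard library might look like the following. The variable names `local_dir` and `csv_path` are illustrative, and the file is assumed to live under `data/` in the current working directory, as in the cell above.

```python
import os

# Absolute path of the current working directory
# (the same value `!pwd` returns in a notebook)
local_dir = os.getcwd()

# Join the path pieces instead of concatenating strings by hand
csv_path = os.path.join(local_dir, 'data', 'cancer_data_00.csv')

print(csv_path)
```

The resulting `csv_path` string can be passed to `bc.create_table` in place of `local_path[0] + '/data/cancer_data_00.csv'`.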

### Create Table from Parquet
Here we create a BlazingSQL table directly from an Apache Parquet file.

In [4]:
# create table from Parquet file
bc.create_table('data_01', local_path[0] + '/data/cancer_data_01.parquet')

<pyblazing.apiv2.sql.Table at 0x7f405ae86ef0>

### Create Table from GPU DataFrame
Here we use cuDF to create a GPU DataFrame (GDF), then use BlazingSQL to create a table from that GDF.

The GDF is the standard memory representation for the RAPIDS AI ecosystem.

In [5]:
# define column names and types
column_names = ['compactness', 'symmetry', 'fractal_dimension']
column_types = ['float32', 'float32', 'float32']

# make GDF with cuDF (uses relative path)
gdf_02 = cudf.read_csv('data/cancer_data_02.csv', delimiter=',',
                       dtype=column_types, names=column_names)

# create BlazingSQL table from GDF
bc.create_table('data_02', gdf_02)

<pyblazing.apiv2.sql.Table at 0x7f405afe8be0>
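
The `names` and `dtype` arguments are parallel lists, so each column name is expected to pair with exactly one type. The helper below, `check_schema`, is a hypothetical convenience function (it is not part of BlazingSQL or cuDF) that sketches one way to catch a length mismatch before reading a file.

```python
def check_schema(names, dtypes):
    """Return a {column name: dtype} mapping, or raise if the lists differ in length."""
    if len(names) != len(dtypes):
        raise ValueError(
            f"{len(names)} column names but {len(dtypes)} column types"
        )
    return dict(zip(names, dtypes))


# Three columns, three types: one dtype per column name
column_names = ['compactness', 'symmetry', 'fractal_dimension']
column_types = ['float32', 'float32', 'float32']

schema = check_schema(column_names, column_types)
print(schema)
```

Calling `check_schema` right before `cudf.read_csv` or `bc.create_table` surfaces a schema mistake with a clear message instead of leaving it to the reader to interpret.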

# Join Tables Together 

Now we can use BlazingSQL to join all three data formats in a single federated query. 

In [6]:
# define a query
query = '''
        SELECT
            a.*, b.area, b.smoothness, c.*
        FROM main.data_00 AS a
            LEFT JOIN data_01 AS b
                ON (a.perimeter = b.perimeter)
            LEFT JOIN data_02 AS c
                ON (b.compactness = c.compactness)
        '''

# join the tables together
join = bc.sql(query).get()

# extract dataframe
result = join.columns

# display results
result

| | diagnosis_result | radius | texture | perimeter | area | smoothness | compactness | symmetry | fractal_dimension |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 19.0 | 27.0 | 72.0 | 394.0 | 0.081 | 0.046999998 | 0.15200001 | 0.057 |
| 1 | 1.0 | 18.0 | 25.0 | 97.0 | 668.0 | 0.117 | 0.148000002 | 0.194999993 | 0.067000002 |
| 2 | 1.0 | 23.0 | 12.0 | 151.0 | 954.0 | 0.143 | 0.277999997 | 0.242000014 | 0.079000004 |
| 3 | 0.0 | 9.0 | 13.0 | 133.0 | 1326.0 | 0.143 | 0.079000004 | 0.181000009 | 0.057 |
| 4 | 1.0 | 21.0 | 27.0 | 130.0 | 1203.0 | 0.125 | 0.159999996 | 0.207000002 | 0.059999999 |
| 5 | 1.0 | 14.0 | 16.0 | 78.0 | 386.0 | 0.070 | 0.284000009 | 0.25999999 | 0.097000003 |
| 6 | 1.0 | 9.0 | 19.0 | 135.0 | 1297.0 | 0.141 | 0.133000001 | 0.181000009 | 0.059 |
| 7 | 0.0 | 25.0 | 25.0 | 83.0 | 477.0 | 0.128 | 0.170000002 | 0.209000006 | 0.075999998 |
| 8 | 1.0 | 19.0 | 24.0 | 88.0 | 520.0 | 0.127 | 0.193000004 | 0.234999999 | 0.074000001 |
| 9 | 1.0 | 24.0 | 21.0 | 103.0 | 798.0 | 0.082 | 0.067000002 | 0.153000012 | 0.057 |
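
To see what the chained `LEFT JOIN` does without a GPU, here is a CPU-only sketch that uses Python's built-in `sqlite3` module instead of BlazingSQL. The table layouts are assumptions inferred from the query and the output above (for example, that `data_01` holds `perimeter`, `area`, `smoothness`, and `compactness`). The sample rows are rounded versions of the first two result rows, and the second `data_02` row is deliberately left out to show how an unmatched row comes back with NULLs.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Assumed layouts for the three tables (inferred, not taken from the real files)
cur.execute('CREATE TABLE data_00 (diagnosis_result REAL, radius REAL, texture REAL, perimeter REAL)')
cur.execute('CREATE TABLE data_01 (perimeter REAL, area REAL, smoothness REAL, compactness REAL)')
cur.execute('CREATE TABLE data_02 (compactness REAL, symmetry REAL, fractal_dimension REAL)')

# Two sample rows per table, rounded from the first two result rows
cur.executemany('INSERT INTO data_00 VALUES (?, ?, ?, ?)',
                [(1.0, 19.0, 27.0, 72.0), (1.0, 18.0, 25.0, 97.0)])
cur.executemany('INSERT INTO data_01 VALUES (?, ?, ?, ?)',
                [(72.0, 394.0, 0.081, 0.047), (97.0, 668.0, 0.117, 0.148)])
# Only the first row has a match in data_02; the second is omitted on purpose
cur.executemany('INSERT INTO data_02 VALUES (?, ?, ?)',
                [(0.047, 0.152, 0.057)])

query = '''
    SELECT a.*, b.area, b.smoothness, c.*
    FROM data_00 AS a
    LEFT JOIN data_01 AS b ON a.perimeter = b.perimeter
    LEFT JOIN data_02 AS c ON b.compactness = c.compactness
    ORDER BY a.perimeter
'''

# Every row of data_00 is kept; columns from c are None where no match exists
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Because both joins are left joins, the result always has at least as many rows as `data_00`, and missing matches appear as `None` rather than dropping the row.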


# You're Ready to Rock
And... that's it! You are now live with BlazingSQL.

Check out our [docs](https://docs.blazingdb.com) to get fancy or to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/).