# Querying Multiple Data Formats 
In this notebook, we will cover: 
- How to create and then join BlazingSQL tables from CSV, Parquet, and GPU DataFrame (GDF) sources. 

## Imports

In [1]:
import os
import cudf
from blazingsql import BlazingContext

## Create BlazingContext
You can think of the BlazingContext much like a SparkContext: it stores information such as the FileSystems you have registered and the Tables you have created.

In [2]:
# start up BlazingSQL
bc = BlazingContext()

BlazingContext ready


### Create Table from CSV
Here we create a BlazingSQL table directly from a comma-separated values (CSV) file.

In [3]:
# define column names and types
column_names = ['diagnosis_result', 'radius', 'texture', 'perimeter']
column_types = ['float32', 'float32', 'float32', 'float32']

# identify local directory path 
cwd = os.getcwd()
# add path to data
data_path = cwd + '/data/cancer_data_00.csv'

# create table from CSV file
bc.create_table('data_00', data_path, dtype=column_types, names=column_names)

<pyblazing.apiv2.context.BlazingTable at 0x7f2a31333da0>
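
Because `names` and `dtype` are passed as two parallel lists, a list with one entry too many or too few is easy to miss. The following is a minimal sketch of a check you could run before calling `bc.create_table`; `build_schema` is a hypothetical helper written for this notebook, not part of BlazingSQL or cuDF.

```python
def build_schema(names, dtypes):
    """Pair each column name with its dtype, failing fast on a length mismatch.

    Hypothetical helper for illustration; not a BlazingSQL or cuDF function.
    """
    if len(names) != len(dtypes):
        raise ValueError(
            f"{len(names)} column names but {len(dtypes)} dtypes"
        )
    return dict(zip(names, dtypes))


column_names = ['diagnosis_result', 'radius', 'texture', 'perimeter']
column_types = ['float32', 'float32', 'float32', 'float32']

# raises ValueError if the two lists disagree in length
schema = build_schema(column_names, column_types)
```

If the check passes, `list(schema)` and `list(schema.values())` give back the same two lists that are passed as `names` and `dtype` above.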

### Create Table from Parquet
Here we create a BlazingSQL table directly from an Apache Parquet file.

In [4]:
# create table from Parquet file
bc.create_table('data_01', cwd + '/data/cancer_data_01.parquet')

<pyblazing.apiv2.context.BlazingTable at 0x7f2a300b7588>
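
Both cells above build file paths by concatenating strings onto `cwd`. As a small sketch of an alternative, the paths could be assembled with `os.path.join`, which handles separators for you; `data_file` is a hypothetical helper name, and the `data/` directory layout is taken from the cells above.

```python
import os


def data_file(name, base_dir=None):
    """Return the path to data/<name> under base_dir (default: current directory).

    Hypothetical helper for illustration only.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    return os.path.join(base_dir, 'data', name)


csv_path = data_file('cancer_data_00.csv')
parquet_path = data_file('cancer_data_01.parquet')
```

These paths can then be handed to `bc.create_table` in place of the concatenated strings.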

### Create Table from GPU DataFrame
Here we use cuDF to create a GPU DataFrame (GDF), then use BlazingSQL to create a table from that GDF.

The GDF is the standard memory representation for the RAPIDS AI ecosystem.

In [5]:
# define column names and types
column_names = ['compactness', 'symmetry', 'fractal_dimension']
column_types = ['float32', 'float32', 'float32']

# make GDF with cuDF (uses relative path)
gdf_02 = cudf.read_csv('data/cancer_data_02.csv', dtype=column_types, names=column_names)

# create BlazingSQL table from GDF
bc.create_table('data_02', gdf_02)

<pyblazing.apiv2.context.BlazingTable at 0x7f2a300b9080>

## Join Tables Together

Now we can use BlazingSQL to join all three data formats in a single federated query. 

In [6]:
# grab everything from 00 & 02, area & smoothness from 01
query = '''
        SELECT 
            a.*, 
            b.area, b.smoothness, 
            c.* 
        FROM 
            data_00 AS a
        LEFT JOIN 
            data_01 AS b
            ON (a.perimeter = b.perimeter)
        LEFT JOIN 
            data_02 AS c
            ON (b.compactness = c.compactness)
        '''

# join the tables together (the result is a cudf.core.dataframe.DataFrame)
gdf = bc.sql(query)

# display result
gdf

| | diagnosis_result | radius | texture | perimeter | area | smoothness | compactness | symmetry | fractal_dimension |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 11.0 | 21.0 | 120.0 | 1033.0 | 0.115 | 0.149000004 | 0.209000006 | 0.063000001 |
| 1 | 0.0 | 17.0 | 21.0 | 86.0 | 563.0 | 0.082 | 0.059999999 | 0.178000003 | 0.056000002 |
| 2 | 1.0 | 19.0 | 26.0 | 94.0 | 578.0 | 0.113 | 0.229000002 | 0.207000002 | 0.077 |
| 3 | 1.0 | 19.0 | 11.0 | 122.0 | 1094.0 | 0.094 | 0.107000001 | 0.170000002 | 0.057 |
| 4 | 0.0 | 10.0 | 17.0 | 87.0 | 566.0 | 0.098 | 0.081 | 0.18900001 | 0.058000002 |
| 5 | 1.0 | 16.0 | 19.0 | 83.0 | 477.0 | 0.128 | 0.170000002 | 0.209000006 | 0.075999998 |
| 6 | 0.0 | 22.0 | 16.0 | 83.0 | 477.0 | 0.128 | 0.170000002 | 0.209000006 | 0.075999998 |
| 7 | 0.0 | 17.0 | 21.0 | 86.0 | 535.0 | 0.116 | 0.123000003 | 0.213000014 | 0.067999996 |
| 8 | 0.0 | 10.0 | 17.0 | 87.0 | 545.0 | 0.104 | 0.143999994 | 0.196999997 | 0.067999996 |
| 9 | 1.0 | 23.0 | 16.0 | 132.0 | 1123.0 | 0.097 | 0.246000007 | 0.24000001 | 0.078000002 |
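
To see how the chained `LEFT JOIN` in the query behaves when a row has no match, here is a minimal sketch that runs a query of the same shape on three tiny tables. It uses Python's built-in `sqlite3` module as a CPU stand-in rather than BlazingSQL, and the table contents are made up for illustration; only the table names and join keys mirror the query above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical miniature tables; the values are invented for this example.
cur.execute('CREATE TABLE data_00 (diagnosis_result REAL, perimeter REAL)')
cur.execute('CREATE TABLE data_01 (perimeter REAL, area REAL, compactness REAL)')
cur.execute('CREATE TABLE data_02 (compactness REAL, symmetry REAL)')

cur.executemany('INSERT INTO data_00 VALUES (?, ?)', [(1.0, 100.0), (0.0, 50.0)])
cur.executemany('INSERT INTO data_01 VALUES (?, ?, ?)', [(100.0, 900.0, 0.25)])
cur.executemany('INSERT INTO data_02 VALUES (?, ?)', [(0.25, 0.5)])

query = '''
    SELECT a.*, b.area, c.symmetry
    FROM data_00 AS a
    LEFT JOIN data_01 AS b ON (a.perimeter = b.perimeter)
    LEFT JOIN data_02 AS c ON (b.compactness = c.compactness)
    ORDER BY a.perimeter
'''
rows = cur.execute(query).fetchall()
conn.close()

# The row with perimeter 50.0 has no partner in data_01, so its area is NULL,
# and because b.compactness is then NULL it cannot match data_02 either.
for row in rows:
    print(row)
```

Every row of `data_00` survives the join; columns from tables without a match come back as `None` in this sketch. Note that joining on floating-point equality only matches values that are bit-for-bit identical, which is why the example keys are chosen to be exactly representable.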


## You're Ready to Rock
And... that's it! You are now live with BlazingSQL.

Check out our [docs](https://docs.blazingdb.com) to get fancy or to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/).