# 10 minutes to BlazingSQL

![BlazingSQL](https://raw.githubusercontent.com/gumdropsteve/bsql-demos/getting_started/imgs/bsql_main.png)

In this notebook, we will cover: 
- how to set up [`BlazingSQL`](https://blazingsql.com) and the [RAPIDS AI](https://rapids.ai/) suite 
- how to read and query CSV files with `cuDF` and `BlazingSQL`


## Setup
### Environment Sanity Check 

- In Google Colab, all RAPIDS packages (BlazingSQL included) require a T4 GPU instance to run
  - The cell below tells you which type of GPU you've been allocated and how to proceed

In [16]:
# capture the nvidia-smi report, one list item per line of output
colab_smi = !nvidia-smi
# pull the GPU model name (e.g. 'Tesla T4') out of the device row
try:
  my_gpu = ' '.join(colab_smi[7].split()[2:4])
# no device row in the report: not on GPU acceleration
except IndexError:
  raise Exception("\nUnfortunately this instance does not have a T4 GPU.\n\n"
                  "Please make sure you've configured Colab to request a GPU instance type.\n\n"
                  "At the top of Colab, try: Runtime -> Change runtime type -> Hardware accelerator -> GPU -> Save\n")

# allocated T4 (compare against a str: my_gpu is a str, not bytes)
if my_gpu == 'Tesla T4':
  print('Woo! You got the right kind of GPU!')
# allocated K80
elif my_gpu == 'Tesla K80':
  raise Exception("\nYou've been allocated a K80 instance\n\n"
                  "Unfortunately, this demo requires a T4 instance\n\n"
                  "At the top of Colab, try: Runtime -> Reset all runtimes...\n")
# allocated some other GPU
else:
  raise Exception("\nYou've achieved wizardry.\nPlease inform winston@blazingsql.com")

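Indexing a fixed line of the `nvidia-smi` report works as long as the report layout stays the same. A sturdier option is to ask `nvidia-smi` for the model name directly with `--query-gpu=name --format=csv,noheader`. The sketch below is an alternative, not part of the original notebook; the helper names `query_gpu_names` and `classify_gpu` are hypothetical, and it avoids `capture_output`/`text` so it also runs on Python 3.6.

```python
import subprocess

def query_gpu_names():
    """Return one GPU model name per line reported by nvidia-smi (empty list if unavailable)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed: no GPU runtime is attached
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]

def classify_gpu(names):
    """Map reported GPU names to the same advice the check above gives."""
    if not names:
        return "no GPU: Runtime -> Change runtime type -> Hardware accelerator -> GPU"
    if any("T4" in name for name in names):
        return "ok"
    if any("K80" in name for name in names):
        return "K80: Runtime -> Reset all runtimes..."
    return "unexpected GPU: " + ", ".join(names)

print(classify_gpu(query_gpu_names()))
```

Matching on a substring such as `"T4"` tolerates small differences in how the driver spells the model name.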

## Installs 

The cells below:
1. install Miniconda
2. install RAPIDS AI and set up the system environment
3. install BlazingSQL (plus the `flatbuffers` Python package)

### Miniconda

In [0]:
# install Miniconda
!wget -c https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

### RAPIDS AI

In [0]:
# install RAPIDS packages
!conda install -q -y --prefix /usr/local -c nvidia -c rapidsai \
  -c numba -c conda-forge -c pytorch -c defaults \
  cudf=0.9 cuml=0.9 cugraph=0.9 python=3.6 cudatoolkit=10.0

# set environment vars
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# copy .so files to current working dir
for fn in ['libcudf.so', 'librmm.so']:
  shutil.copy('/usr/local/lib/'+fn, os.getcwd())

### BlazingSQL

In [0]:
# Install BlazingSQL for CUDA 10.0
! conda install -q -y --prefix /usr/local -c conda-forge -c defaults -c nvidia -c rapidsai \
   -c blazingsql/label/cuda10.0 -c blazingsql \
   blazingsql-calcite blazingsql-orchestrator blazingsql-ral blazingsql-python

In [0]:
# install the flatbuffers Python package (not yet tested in Colab)
!pip install flatbuffers
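Before importing, it can help to confirm that the installs above landed somewhere this kernel can see (the RAPIDS cell appends `/usr/local/lib/python3.6/site-packages/` to `sys.path` for exactly this reason). A minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, and checking these three top-level names is an assumption about which packages matter for this notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that Python cannot locate on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# top-level packages this notebook imports or installs
missing = missing_packages(["cudf", "blazingsql", "flatbuffers"])
if missing:
    print("Not importable yet:", missing, "(try restarting the runtime)")
else:
    print("All packages found")
```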

## Import packages and create a BlazingContext
You can think of the BlazingContext much like a SparkContext
- i.e., this is where information such as the file systems you have registered and the tables you have created is stored
  - If you have issues running this cell, restart the runtime and try running it again
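To make the "context as a registry" idea concrete, here is a toy model of that bookkeeping role in plain Python. It is purely illustrative: `ToyContext`, `register_filesystem`, and `table` are hypothetical names, and this is not how BlazingSQL is implemented; it only shows that a context holds named file systems and named tables that later queries resolve against.

```python
class ToyContext:
    """Hypothetical model of a context's bookkeeping role; not BlazingSQL's code."""

    def __init__(self):
        self.filesystems = {}  # registered file system name -> connection details
        self.tables = {}       # table name -> data the table is backed by

    def register_filesystem(self, name, details):
        self.filesystems[name] = details

    def create_table(self, name, data):
        self.tables[name] = data

    def table(self, name):
        # a query would resolve table names through this lookup
        if name not in self.tables:
            raise KeyError("no table named " + repr(name))
        return self.tables[name]

ctx = ToyContext()
ctx.create_table("music", {"ARTIST": ["Justice", "Odesza"]})
print(sorted(ctx.tables))  # → ['music']
```

This is also why restarting the runtime clears your tables: the registry lives in the context object, which lives in the kernel.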


In [0]:
from blazingsql import BlazingContext
import cudf

bc = BlazingContext()

Already connected to the Orchestrator


## Read CSV
- First we download a CSV file into the Colab runtime (you can also upload one through the Google Colab interface)
  - Then we use cuDF to read the CSV file, which gives us a GPU DataFrame (GDF)
    - To learn more about the GDF and how it enables end-to-end workloads on RAPIDS, check out our [blog post](https://blog.blazingdb.com/blazingsql-part-1-the-gpu-dataframe-gdf-and-cudf-in-rapids-ai-96ec15102240)

In [0]:
# download the test CSV
!wget 'https://s3.amazonaws.com/blazingsql-colab/Music.csv'

--2019-08-15 23:32:28--  https://s3.amazonaws.com/blazingsql-colab/Music.csv
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.107.206
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.107.206|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10473 (10K) [text/csv]
Saving to: ‘Music.csv’


2019-08-15 23:32:29 (155 MB/s) - ‘Music.csv’ saved [10473/10473]
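Before handing the file to cuDF, you can peek at its header and first few rows on the CPU with Python's built-in `csv` module. A minimal sketch; the helper name `peek_csv` is hypothetical, and it only reads from `Music.csv` if the download above succeeded.

```python
import csv
import os
from itertools import islice

def peek_csv(lines, n=3):
    """Return the header row and up to `n` data rows from an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader, [])
    rows = list(islice(reader, n))
    return header, rows

if os.path.exists("Music.csv"):
    with open("Music.csv", newline="") as f:
        header, rows = peek_csv(f)
    print(header)
    for row in rows:
        print(row)
```

Because `csv.reader` accepts any iterable of strings, the same helper works on an open file or on a list of lines.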



In [0]:
# like pandas, cudf can simply read the csv
gdf = cudf.read_csv('Music.csv')

# let's see how it looks
gdf.head()

## Create a Table

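Now we just need to create a table. `bc.create_table('music', gdf)` registers the GDF with the BlazingContext under the name `music`, which the query below reaches as `main.music`.
- Tables can also come from files such as Apache Parquet, an open-source columnar file format built for systems like HDFS; Parquet files carry metadata that self-describes the schema, making import a cinch!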

In [0]:
bc.create_table('music', gdf)

## Query a Table
That's it! Now you can write a SQL query: BlazingSQL processes the data on the GPU and returns the output as a GPU DataFrame (GDF) inside RAPIDS!

In [0]:
# run the query and wait for the result
result = bc.sql('SELECT * FROM main.music').get()

# get the GDF holding the result
result_gdf = result.columns

# print the GDF
print(result_gdf)

                                  ARTIST RATING  ...          LOCATION FESTIVAL_SET
0                            Arcade Fire   10.0  ...         Las Vegas          1.0
1                                Justice   10.0  ...         Las Vegas          1.0
2               Florence and The Machine   10.0  ...         Las Vegas          1.0
3                                 Odesza   10.0  ...             Indio          1.0
4                               Bon Iver   10.0  ...             Indio          1.0
5            LA Philharmonic + Sigur Ros   10.0  ...                LA          0.0
6                              Sigur Ros   10.0  ...             Malmo          0.0
7                            Arcade Fire   10.0  ...             Indio          1.0
8                                 Escort    9.0  ...     San Francisco          0.0
9                                Phoenix    9.0  ...          Berkeley          0.0
10                              Jamie XX    9.0  ...  Golden Gate Park      
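To see the create-a-table-then-query flow without a GPU, here is the same pattern sketched on the CPU with Python's built-in `sqlite3` module. This is an analogy, not BlazingSQL: it uses only the four columns visible in the output above (the columns hidden behind `...` are left out) and five of the rows shown there. SQLite also names its default schema `main`, so the same `main.music` reference works.

```python
import sqlite3

# five rows copied from the query output above (visible columns only)
rows = [
    ("Arcade Fire", 10.0, "Las Vegas", 1.0),
    ("Justice", 10.0, "Las Vegas", 1.0),
    ("Odesza", 10.0, "Indio", 1.0),
    ("Escort", 9.0, "San Francisco", 0.0),
    ("Phoenix", 9.0, "Berkeley", 0.0),
]

conn = sqlite3.connect(":memory:")
# "create table" step: register the data under the name music
conn.execute("CREATE TABLE music (ARTIST TEXT, RATING REAL, LOCATION TEXT, FESTIVAL_SET REAL)")
conn.executemany("INSERT INTO music VALUES (?, ?, ?, ?)", rows)

# "query" step: same SELECT as above, plus a filtered aggregate
everything = conn.execute("SELECT * FROM main.music").fetchall()
festival_counts = conn.execute(
    "SELECT LOCATION, COUNT(*) FROM main.music "
    "WHERE FESTIVAL_SET = 1.0 GROUP BY LOCATION ORDER BY LOCATION"
).fetchall()
conn.close()

print(len(everything))   # → 5
print(festival_counts)   # → [('Indio', 1), ('Las Vegas', 2)]
```

In the notebook, `bc.create_table` plays the role of `CREATE TABLE`/`INSERT` and `bc.sql(...).get()` plays the role of `execute(...).fetchall()`, except that the work runs on the GPU and the result comes back as a GDF.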

# You're Ready to Rock
And... that's it! You are now live with BlazingSQL.
- Check out our [docs](https://docs.blazingdb.com) to get fancy 
  - as well as to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/)