# Getting Started with BlazingSQL

In this notebook, we will cover: 
- How to set up [BlazingSQL](https://blazingsql.com) and the [RAPIDS AI](https://rapids.ai/) suite.
- How to read and query csv files with cuDF and BlazingSQL.

## Setup
### Environment Sanity Check 

RAPIDS packages (BlazingSQL included) require an NVIDIA GPU with the Pascal architecture or newer. On Colab, this translates to a T4 GPU instance (the check below also accepts a P100).

The cell below will let you know what type of GPU you've been allocated, and how to proceed.

In [9]:
# capture the nvidia-smi output (one list entry per line)
colab_smi = !nvidia-smi

# extract the GPU name from the device row of the table
try:
    my_gpu = ' '.join(colab_smi[7].split()[2:4])
# no GPU attached: the output is too short to contain a device row
except IndexError:
    raise Exception("\nPlease make sure you've configured Colab to request a GPU instance type.\n\n"
                    "At top of Colab, try: Runtime -> Change runtime type -> Hardware accelerator -> GPU -> Save\n")

# not allocated a compatible GPU
if (my_gpu != 'Tesla T4') and (my_gpu != 'Tesla P100-PCIE...'):
    # allocated K80
    if my_gpu == 'Tesla K80':
        raise Exception("\nYou've been allocated a K80 instance\n\n"
                    "Unfortunately, this demo requires a T4 instance\n\n"
                    "At top of Colab, try: Runtime -> Reset all runtimes...\n")
    else:
        raise Exception(f"\nYou've achieved wizardry.\nYour GPU is {my_gpu}\nPlease inform info@blazingsql.com")

# allocated a compatible GPU
else:
    print('Woo! You got the right kind of GPU!')

Woo! You got the right kind of GPU!
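
The check above reads the GPU name from a fixed line of the `nvidia-smi` table, which depends on the table layout. Below is a minimal sketch of a less layout-dependent check, assuming the installed `nvidia-smi` supports the `--query-gpu=name --format=csv,noheader` options. The helper names `query_gpu_name` and `is_supported_gpu` and the prefix list are illustrative assumptions, not part of BlazingSQL or RAPIDS.

```python
import subprocess

# Assumption: name prefixes for the GPUs this notebook accepts (T4 and P100).
SUPPORTED_PREFIXES = ("Tesla T4", "Tesla P100")


def query_gpu_name():
    """Return the name of the first GPU reported by nvidia-smi, or None."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed: no usable GPU runtime
        return None
    names = [line.strip() for line in out.splitlines() if line.strip()]
    return names[0] if names else None


def is_supported_gpu(name):
    """True if the GPU name starts with one of the accepted prefixes."""
    return name is not None and name.startswith(SUPPORTED_PREFIXES)
```

With these helpers, `is_supported_gpu(query_gpu_name())` is `False` both when no GPU is attached and when a K80 has been allocated.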


## Installs 

Below you will find three code blocks:
1. The first installs miniconda.
2. The second installs RAPIDS AI and sets up the system environment. 
3. The third installs BlazingSQL.

### Miniconda

In [None]:
# install miniconda
!wget -c https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

### RAPIDS AI

In [None]:
# install RAPIDS packages
!conda install -q -y --prefix /usr/local -c nvidia -c rapidsai \
  -c numba -c conda-forge -c pytorch -c defaults \
  cudf=0.9 cuml=0.9 cugraph=0.9 python=3.6 cudatoolkit=10.0

# set environment vars
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# copy .so files to current working dir
for fn in ['libcudf.so', 'librmm.so']:
    shutil.copy('/usr/local/lib/'+fn, os.getcwd())

### BlazingSQL

In [None]:
# Install BlazingSQL for CUDA 10.0
! conda install -q -y --prefix /usr/local -c conda-forge -c defaults -c nvidia -c rapidsai \
   -c blazingsql/label/cuda10.0 -c blazingsql \
   blazingsql-calcite blazingsql-orchestrator blazingsql-ral blazingsql-python

!pip install flatbuffers
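
Before moving on, it can help to confirm that the packages installed under `/usr/local` are visible to the running kernel. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the package names are taken from the imports used later in this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# An empty list means both packages can be located by the import system.
print(missing_packages(["cudf", "blazingsql"]))
```

If a name is reported as missing, check that the `sys.path.append(...)` line in the RAPIDS cell has run in the current session.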

## Import packages and create Blazing Context
You can think of the BlazingContext much like a Spark Context: it stores information such as the FileSystems you have registered and the Tables you have created. If you have issues running this cell, restart the runtime and try running it again.


In [2]:
from blazingsql import BlazingContext
import cudf

bc = BlazingContext()

BlazingContext ready


## Read CSV
First we need to download a CSV file. Then we use cuDF to read it, which gives us a GPU DataFrame (GDF). To learn more about the GDF and how it enables end-to-end workloads on RAPIDS, check out our [blog post](https://blog.blazingdb.com/blazingsql-part-1-the-gpu-dataframe-gdf-and-cudf-in-rapids-ai-96ec15102240).

In [3]:
#Download the test CSV
!wget 'https://s3.amazonaws.com/blazingsql-colab/Music.csv'

--2019-10-16 20:40:36--  https://s3.amazonaws.com/blazingsql-colab/Music.csv
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.113.141
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.113.141|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10473 (10K) [text/csv]
Saving to: ‘Music.csv.1’


2019-10-16 20:40:36 (215 MB/s) - ‘Music.csv.1’ saved [10473/10473]
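
The log above shows `wget` saving to `Music.csv.1`, which happens when a `Music.csv` already exists from an earlier run. Below is a minimal sketch of a download step that skips files that are already present, using only the standard library; `download_once` is a hypothetical helper, not part of BlazingSQL.

```python
import os
import urllib.request


def download_once(url, path):
    """Download url to path unless path already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

For example, `download_once('https://s3.amazonaws.com/blazingsql-colab/Music.csv', 'Music.csv')` fetches the file on the first run only. Passing `-nc` (`--no-clobber`) to `wget` has a similar effect.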



In [4]:
# like pandas, cudf can simply read the csv
gdf = cudf.read_csv('Music.csv')

# let's see how it looks
gdf.head()

Unnamed: 0,ARTIST,RATING,YEAR,LOCATION,FESTIVAL_SET
0,Arcade Fire,10.0,2018.0,Las Vegas,1.0
1,Justice,10.0,2018.0,Las Vegas,1.0
2,Florence and The Machine,10.0,2018.0,Las Vegas,1.0
3,Odesza,10.0,2018.0,Indio,1.0
4,Bon Iver,10.0,2017.0,Indio,1.0


## Create a Table
Now we just need to create a table. 

In [5]:
bc.create_table('music', gdf)

<pyblazing.apiv2.sql.Table at 0x7f91f27ddc88>

## Query a Table
That's it! Now when you write a SQL query, the data is processed on the GPU with BlazingSQL, and the output is a GPU DataFrame (GDF) inside RAPIDS!

In [6]:
# query 10 events with a rating of at least 7
result = bc.sql('SELECT * FROM main.music where RATING >= 7 LIMIT 10').get()

# get GDF
result_gdf = result.columns

# display GDF (just like pandas)
result_gdf

Unnamed: 0,ARTIST,RATING,YEAR,LOCATION,FESTIVAL_SET
0,Arcade Fire,10.0,2018.0,Las Vegas,1.0
1,Justice,10.0,2018.0,Las Vegas,1.0
2,Florence and The Machine,10.0,2018.0,Las Vegas,1.0
3,Odesza,10.0,2018.0,Indio,1.0
4,Bon Iver,10.0,2017.0,Indio,1.0
5,LA Philharmonic + Sigur Ros,10.0,2017.0,LA,0.0
6,Sigur Ros,10.0,2014.0,Malmo,0.0
7,Arcade Fire,10.0,2014.0,Indio,1.0
8,Escort,9.0,2018.0,San Francisco,0.0
9,Phoenix,9.0,2018.0,Berkeley,0.0
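
To sanity-check the query text without a GPU, the same statement can be run on the CPU with Python's built-in `sqlite3` module. This sketch only illustrates what the SQL selects; it says nothing about how BlazingSQL executes the query. Three rows are copied from the output above, and one row is a clearly hypothetical addition so the filter has something to exclude.

```python
import sqlite3

# Three rows copied from the output above, plus one hypothetical
# low-rated row (NOT from Music.csv) so the filter has something to drop.
rows = [
    ("Arcade Fire", 10.0, 2018.0, "Las Vegas", 1.0),
    ("Escort", 9.0, 2018.0, "San Francisco", 0.0),
    ("Phoenix", 9.0, 2018.0, "Berkeley", 0.0),
    ("Hypothetical Act", 3.0, 2018.0, "Nowhere", 0.0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE music (ARTIST TEXT, RATING REAL, YEAR REAL, "
    "LOCATION TEXT, FESTIVAL_SET REAL)"
)
con.executemany("INSERT INTO music VALUES (?, ?, ?, ?, ?)", rows)

# Same statement as in the cell above; 'main' is also SQLite's default schema.
kept = con.execute(
    "SELECT * FROM main.music where RATING >= 7 LIMIT 10"
).fetchall()
print(kept)
con.close()
```

Because the query has no `ORDER BY`, SQL does not guarantee which ten matching rows a `LIMIT 10` returns; add an explicit ordering if the selection needs to be reproducible.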


## You're Ready to Rock
And... that's it! You are now live with BlazingSQL.


Check out our [docs](https://docs.blazingdb.com) to get fancy or to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/).