# Getting Started with BlazingSQL

In this notebook, we will cover how to read and query CSV files with cuDF and BlazingSQL.

## Imports

In [1]:
import cudf
from blazingsql import BlazingContext

## Create BlazingContext
You can think of the BlazingContext much like a SparkContext: it stores information such as the FileSystems you have registered and the Tables you have created.

In [2]:
bc = BlazingContext()

BlazingContext ready


## Read CSV
First we need to download a CSV file. Then we use cuDF to read the CSV file, which gives us a GPU DataFrame (GDF). To learn more about the GDF and how it enables end-to-end workloads on RAPIDS, check out our [blog post](https://blog.blazingdb.com/blazingsql-part-1-the-gpu-dataframe-gdf-and-cudf-in-rapids-ai-96ec15102240).

In [3]:
# cudf (gpu) dataframe from csv 
gdf = cudf.read_csv('data/Music.csv')

# let's see how it looks
gdf.head()

| Unnamed: 0 | ARTIST | RATING | YEAR | LOCATION | FESTIVAL_SET |
|---|---|---|---|---|---|
| 0 | Arcade Fire | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 1 | Justice | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 2 | Florence and The Machine | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 3 | Odesza | 10.0 | 2018.0 | Indio | 1.0 |
| 4 | Bon Iver | 10.0 | 2017.0 | Indio | 1.0 |
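
If you don't have `Music.csv` on hand yet, a tiny stand-in file is enough to try the rest of the notebook. The sketch below writes the five rows shown above with Python's standard `csv` module. The file name `Music_sample.csv`, the temp directory, and the `write_sample_csv` helper are hypothetical, and the unnamed index column is left out for brevity.

```python
import csv
import os
import tempfile

# Rows copied from the gdf.head() output above; the unnamed index column is omitted.
HEADER = ["ARTIST", "RATING", "YEAR", "LOCATION", "FESTIVAL_SET"]
ROWS = [
    ["Arcade Fire", 10.0, 2018.0, "Las Vegas", 1.0],
    ["Justice", 10.0, 2018.0, "Las Vegas", 1.0],
    ["Florence and The Machine", 10.0, 2018.0, "Las Vegas", 1.0],
    ["Odesza", 10.0, 2018.0, "Indio", 1.0],
    ["Bon Iver", 10.0, 2017.0, "Indio", 1.0],
]


def write_sample_csv(path):
    """Hypothetical helper: write the five sample rows to `path` and return it."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


# Hypothetical location: a temp directory, so a real data/Music.csv is never overwritten.
sample_path = write_sample_csv(os.path.join(tempfile.mkdtemp(), "Music_sample.csv"))

# Read the file back on the CPU to confirm its shape.
with open(sample_path, newline="") as f:
    loaded = list(csv.DictReader(f))
print(len(loaded), loaded[0]["ARTIST"])
```

Passing `sample_path` to `cudf.read_csv` in place of `'data/Music.csv'` should then give a small GDF to experiment with.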


## Create a Table
Now we can create a table named `music` from the GPU DataFrame.

In [4]:
bc.create_table('music', gdf)

<pyblazing.apiv2.context.BlazingTable at 0x7fbf904009e8>

## Query a Table
That's it! Now when you write a SQL query, the data is processed on the GPU with BlazingSQL, and the output is a GPU DataFrame (GDF) inside RAPIDS!

In [5]:
# query 10 events with a rating of at least 7
gdf = bc.sql('SELECT * FROM music WHERE RATING >= 7 LIMIT 10')

# display GDF -- cuDF (GPU) DataFrame
gdf

| Unnamed: 0 | ARTIST | RATING | YEAR | LOCATION | FESTIVAL_SET |
|---|---|---|---|---|---|
| 0 | Arcade Fire | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 1 | Justice | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 2 | Florence and The Machine | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 3 | Odesza | 10.0 | 2018.0 | Indio | 1.0 |
| 4 | Bon Iver | 10.0 | 2017.0 | Indio | 1.0 |
| 5 | LA Philharmonic + Sigur Ros | 10.0 | 2017.0 | LA | 0.0 |
| 6 | Sigur Ros | 10.0 | 2014.0 | Malmo | 0.0 |
| 7 | Arcade Fire | 10.0 | 2014.0 | Indio | 1.0 |
| 8 | Escort | 9.0 | 2018.0 | San Francisco | 0.0 |
| 9 | Phoenix | 9.0 | 2018.0 | Berkeley | 0.0 |
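
The query above uses `LIMIT` without `ORDER BY`, and standard SQL does not promise which matching rows come back in that case. If you want to check the filter logic without a GPU, here is a minimal sketch that runs a similar statement against an in-memory SQLite table. SQLite is a CPU-only stand-in here, not BlazingSQL, and the added `ORDER BY` clause is our own choice to make the result deterministic. The rows are copied from the outputs in this notebook.

```python
import sqlite3

# A few rows copied from this notebook's outputs (ARTIST, RATING, LOCATION only).
rows = [
    ("Arcade Fire", 10.0, "Las Vegas"),
    ("Escort", 9.0, "San Francisco"),
    ("Phoenix", 9.0, "Berkeley"),
    ("The Kooks", 7.0, "San Francisco"),
    ("Chromeo", 4.0, "San Francisco"),
    ("RL Grime", 5.0, "San Francisco"),
]

conn = sqlite3.connect(":memory:")  # CPU-only stand-in, not BlazingSQL
conn.execute("CREATE TABLE music (ARTIST TEXT, RATING REAL, LOCATION TEXT)")
conn.executemany("INSERT INTO music VALUES (?, ?, ?)", rows)

# ORDER BY makes it explicit which rows LIMIT keeps.
top = conn.execute(
    "SELECT ARTIST, RATING FROM music "
    "WHERE RATING >= 7 ORDER BY RATING DESC, ARTIST LIMIT 3"
).fetchall()
print(top)  # [('Arcade Fire', 10.0), ('Escort', 9.0), ('Phoenix', 9.0)]
conn.close()
```

The same `ORDER BY ... LIMIT` pattern can be tried in `bc.sql` if you need a stable top-N result.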


In [6]:
# find instances either in San Francisco or rated 10
query = """
        select
            *
        from
            music
        where
            LOCATION = 'San Francisco'
            or RATING = 10
        """

# execute query (type(result)==cudf.core.dataframe.DataFrame)
result = bc.sql(query)

# flip to pandas & sample the results
result.to_pandas().sample(7)

| Unnamed: 0 | ARTIST | RATING | YEAR | LOCATION | FESTIVAL_SET |
|---|---|---|---|---|---|
| 24 | The Kooks | 7.0 | 2014.0 | San Francisco | 1.0 |
| 30 | James Vincent McMorrow | 6.0 | 2017.0 | San Francisco | 1.0 |
| 55 | Chromeo | 4.0 | 2014.0 | San Francisco | 1.0 |
| 13 | Griz | 8.0 | 2016.0 | San Francisco | 1.0 |
| 44 | First Aid Kit | 5.0 | 2011.0 | San Francisco | 1.0 |
| 5 | LA Philharmonic + Sigur Ros | 10.0 | 2017.0 | LA | 0.0 |
| 46 | RL Grime | 5.0 | 2011.0 | San Francisco | 1.0 |
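
Notice that the sample includes San Francisco events rated well below 10. That is what `OR` asks for: a row is kept if either condition holds. As a rough illustration in plain Python (no GPU needed), the hypothetical `matches` function below mirrors the `WHERE` clause on a few rows copied from the outputs in this notebook.

```python
# Rows copied from this notebook's outputs.
rows = [
    {"ARTIST": "LA Philharmonic + Sigur Ros", "RATING": 10.0, "LOCATION": "LA"},
    {"ARTIST": "Chromeo", "RATING": 4.0, "LOCATION": "San Francisco"},
    {"ARTIST": "Phoenix", "RATING": 9.0, "LOCATION": "Berkeley"},
    {"ARTIST": "Escort", "RATING": 9.0, "LOCATION": "San Francisco"},
]


def matches(row):
    """Hypothetical mirror of: WHERE LOCATION = 'San Francisco' OR RATING = 10."""
    return row["LOCATION"] == "San Francisco" or row["RATING"] == 10


kept = [r["ARTIST"] for r in rows if matches(r)]
print(kept)  # ['LA Philharmonic + Sigur Ros', 'Chromeo', 'Escort']
```

Phoenix is dropped because it is neither in San Francisco nor rated 10. Swapping `or` for `and` would keep only San Francisco events rated 10.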


# You're Ready to Rock
And... that's it! You are now live with BlazingSQL.


Check out our [docs](https://docs.blazingdb.com) to get fancy or to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/).