# 1. CQL to cuDF

## (a) via `pandas`

**Pros:**
- already implemented, convenient

**Cons:**
- issues a request to the Cassandra (C*) database, consuming compute on the server
- "unnecessary" data transformations on client side

In [1]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

import pandas as pd
import pyarrow as pa
from blazingsql import BlazingContext
import cudf

import config

ModuleNotFoundError: No module named 'blazingsql'

In [None]:
# connect to the Cassandra server in the cloud and configure the session settings
cloud_config = {
    'secure_connect_bundle': '/home/ubuntu/secure-connect-clitest.zip'
}
auth_provider = PlainTextAuthProvider(config.username, config.password)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect('clitest')
session.default_fetch_size = 10_000_000  # rows per page; an int, not the float 1e7

In [None]:
def pandas_factory(colnames, rows):
    """Read the data returned by the driver into a pandas DataFrame"""
    return pd.DataFrame(rows, columns=colnames)
session.row_factory = pandas_factory

# run the CQL query and get the data
result_set = session.execute("select * from clitest.chipotle_stores limit 100;")
df = result_set._current_rows
df

In [None]:
# create a new dataframe with the same data, but stored on the GPU
gdf = cudf.DataFrame.from_pandas(df)
gdf["latitude"] = gdf["latitude"].astype("double")
gdf["longitude"] = gdf["longitude"].astype("double")
gdf.head(5)

In [None]:
# BlazingSQL helps us speed up SQL queries using the GPU
bc = BlazingContext(initial_pool_size=1.0486e+10)
bc.create_table("cql_table", gdf)
bc.describe_table("cql_table")
result = bc.sql("select * from cql_table")
result

## (b) via `pyarrow`

This approach is almost identical, but transforms the data from the driver
into a pyarrow Table rather than a pandas DataFrame. The Table is a columnar
structure in host (CPU) memory that cuDF can convert into a GPU DataFrame
with `cudf.DataFrame.from_arrow`.

**Pros:**
- avoids some client-side transformations (no intermediate pandas DataFrame)

**Cons:**
- still issues a request to the Cassandra server, consuming compute there

In the following block of code, we replace the `pandas_factory` function from the previous example, which returns a pandas DataFrame, with `row_factory`, which returns a pyarrow Table.

In [None]:
def getcol(col):
    rtn = pa.array(col)
    # This is a sample type conversion. A full implementation would need to
    # check which Arrow types must be cast manually for compatibility with cuDF.
    if pa.types.is_decimal(rtn.type):
        return rtn.cast('float32')
    return rtn

def row_factory(colnames, rows):
    """Read the data returned by the driver into a pyarrow Table"""
    # Is there a faster way to do this zip operation?
    # Essentially we just need to convert from the row format passed by
    # CQL into the column format of Arrow.
    cols = [getcol(col) for col in zip(*rows)]
    return pa.table(dict(zip(colnames, cols)))

session.row_factory = row_factory
result_set = session.execute("SELECT * FROM clitest.chipotle_stores LIMIT 100;")
table = result_set.current_rows
print(table)
gdf = cudf.DataFrame.from_arrow(table)

# gdf can now be used in the cuDF and BlazingSQL cells from section (a)
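
The row-to-column conversion that `row_factory` performs can be tried without a Cassandra connection. The following is a minimal sketch in plain Python, assuming the driver hands the factory a list of row tuples; `rows_to_columns` and the sample data are hypothetical names made up for illustration, and the Decimal-to-float step only stands in for the Arrow cast in `getcol`.

```python
from decimal import Decimal


def rows_to_columns(colnames, rows):
    """Transpose row tuples into a dict mapping column name -> list of values."""
    if not rows:
        # zip(*rows) yields nothing for an empty result, so build empty columns explicitly
        return {name: [] for name in colnames}
    return {
        name: [float(v) if isinstance(v, Decimal) else v for v in col]
        for name, col in zip(colnames, zip(*rows))
    }


# Hypothetical sample rows, shaped like what a row factory would receive
sample_colnames = ["store_id", "latitude", "longitude"]
sample_rows = [
    (1, Decimal("40.7128"), Decimal("-74.0060")),
    (2, Decimal("34.0522"), Decimal("-118.2437")),
]

columns = rows_to_columns(sample_colnames, sample_rows)
print(columns)
```

The explicit empty-result branch is a design choice of this sketch: without it, a query returning no rows would produce zero columns rather than empty columns under the expected names.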

In [2]:
from cassandra.cluster import Cluster

In [3]:
cluster = Cluster()

In [4]:
session = cluster.connect('baselines')

In [5]:
rows = session.execute('SELECT * FROM iot')

In [7]:
for row in rows:
    print(row)

[output truncated]
Row(machine_id=UUID('787ccf18-2669-45dd-9fcc-c6f8e713b701'), sensor_name='population', time=datetime.datetime(1970, 1, 1, 0, 0, 4, 545000), data=' sem vel orci venenatis, a rutrum odio viverra. [...]

KeyboardInterrupt: 
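
The loop above prints every row of `iot` and had to be interrupted. When only a quick look at the data is needed, the iteration can be bounded on the client. The following is a minimal sketch using `itertools.islice`; `preview` and `fake_rows` are hypothetical helpers, and the generator only stands in for a large result set.

```python
from itertools import islice


def preview(rows, limit=5):
    """Return at most `limit` items from any iterable, leaving the rest unread."""
    return list(islice(rows, limit))


def fake_rows():
    # Hypothetical stand-in for a very large (here: endless) result set
    n = 0
    while True:
        yield {"row": n}
        n += 1


first = preview(fake_rows(), limit=3)
print(first)
```

Since the driver's result set is iterable, `preview(session.execute('SELECT * FROM iot'))` should behave the same way, although this was not run against the cluster here. Adding a `LIMIT` clause to the CQL statement, as in the `chipotle_stores` queries, is the server-side alternative.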