# 3. Cassandra to Arrow

We reuse code from the Cassandra server to read the SSTable, but instead of serializing the data to CQL and deserializing it again on the client, we send it as an [Arrow IPC stream](http://arrow.apache.org/). Arrow stores data in a columnar format, which is better suited to analytics.

Data transformations:

1. SSTable on disk
2. Deserialized into Java objects in the C* server
3. Client makes a request to the server (not to the C* DB)
4. Data serialized as an Arrow IPC stream
5. Stream sent across the network
6. Arrow IPC stream received by the client
7. Stream transformed into an Arrow Table / cuDF

**Pros:**
- makes no requests to the main Cassandra DB, which reduces its load and leaves it free to run other operations
- the Arrow IPC stream involves less serialization and deserialization

**Cons:**
- requires starting Cassandra or using the JVM, which we would like to avoid
- complex architecture

In [1]:
import pyarrow as pa
import pandas as pd
import socket

HOST = '127.0.0.1'
PORT = 9143

In [2]:
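# read data from socket
def fetch_data():
    """Connect to HOST:PORT and return every byte received until the server closes the connection."""
    chunks = []  # collect chunks and join once instead of repeatedly concatenating bytes
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        s.sendall(b'hello world\n')
        while True:
            newdata = s.recv(1024)
            if not newdata:  # an empty read means the server closed the connection
                break
            chunks.append(newdata)
    return b''.join(chunks)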
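
The read-until-close loop can be tried without the Java server running. The sketch below is an illustration only: `recv_all` and `send_and_close` are hypothetical helper names, and `socket.socketpair()` stands in for the real TCP connection. A background thread plays the server by sending a payload and closing its end, which is what signals end-of-stream to the reader.

```python
import socket
import threading


def recv_all(sock, chunk_size=1024):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty read: the peer closed its end
            break
        chunks.append(chunk)
    return b"".join(chunks)


def send_and_close(sock, payload):
    """Stand-in for the server: send everything, then close."""
    with sock:
        sock.sendall(payload)


# A connected pair of sockets replaces the real client/server connection.
server_end, client_end = socket.socketpair()
payload = b"x" * 10_000  # larger than one recv() chunk, so the loop runs several times

sender = threading.Thread(target=send_and_close, args=(server_end, payload))
sender.start()
with client_end:
    received = recv_all(client_end)
sender.join()

print(len(received))
```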

In [3]:
buffer = fetch_data()
reader = pa.ipc.open_stream(buffer)
arrow_table = reader.read_all()
arrow_table.to_pandas() # for visualization

| | partition key | liveness_info_tstamp | clustering key | data | sensor_value | station_id |
|---|---|---|---|---|---|---|
| 0 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.042 | 1970-01-01 00:00:00.042 | ollis quis risus sit amet venenatis. Suspendis... | 94.353640 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 1 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.002 | 1970-01-01 00:00:00.002 | ue sapien et, fermentum neque. Pellentesque mo... | 95.759791 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 2 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.041 | 1970-01-01 00:00:00.041 | Ut suscipit sem vel orci venenatis, a rutrum ... | 86.207276 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 3 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.025 | 1970-01-01 00:00:00.025 | tortor hendrerit, nec ultricies dui vestibulum... | 97.209750 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 4 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.009 | 1970-01-01 00:00:00.009 | ctus mauris nec urna. Duis sit amet enim trist... | 106.497951 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| ... | ... | ... | ... | ... | ... | ... |
| 4995 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.018 | 1970-01-01 00:00:00.018 | mi non consectetur pretium, diam augue maximus... | 107.025525 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 4996 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.000 | 1970-01-01 00:00:00.000 | et odio a dolor placerat bibendum. Praesent au... | 106.780531 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 4997 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.031 | 1970-01-01 00:00:00.031 | . Mauris vestibulum leo eu nunc commodo, at te... | 100.584490 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
| 4998 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 00:00:00.014 | 1970-01-01 00:00:00.014 | o, sollicitudin eget iaculis vitae, dapibus eu... | 99.889286 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
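
The `station_id` values in the output are raw 16-byte strings. Assuming the column is a Cassandra `uuid` (an assumption based on the value length, not something the output states), the bytes can be turned into a readable UUID with the standard library. A minimal sketch using the value shown in the table:

```python
import uuid

# Raw 16-byte value copied from the station_id column of the output.
raw_station_id = b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'

# Assumption: the bytes are a UUID in big-endian (network) byte order.
station_uuid = uuid.UUID(bytes=raw_station_id)
print(station_uuid)
```

The same conversion could be applied to the whole column of the pandas DataFrame, for example by mapping `lambda b: uuid.UUID(bytes=b)` over `station_id`.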


In [5]:
!ls ../cpp/build

CMakeCache.txt          deletion_time.h         sstable_statistics.cpp
[1m[36mCMakeFiles[m[m              results.json            sstable_statistics.h
CPackConfig.cmake       rules.ninja             sstable_summary.cpp
CPackSourceConfig.cmake sstable_data.cpp        sstable_summary.h
build.ninja             sstable_data.h          sstable_to_arrow
cmake_install.cmake     sstable_index.cpp       table.parquet
deletion_time.cpp       sstable_index.h


In [6]:
parquet_table = pd.read_parquet("../cpp/build/table.parquet")

In [7]:
parquet_table

| | _timestamp | partition key | clustering key | data | sensor_value | station_id |
|---|---|---|---|---|---|---|
| 0 | 1924-04-12 | `{'org.apache.cassandra.db.marshal.UUIDType': b...` | 1970-01-01 | vulputate. Vestibulum at imperdiet metus, et ... | 97.651955 | `b'(\xdfc\xb7\xccWC\xcb\x97R\xfa\xe6\x9d\x16S\xda'` |
