# 3. Cassandra to Arrow

We reuse code from the Cassandra server to read the SSTable, but instead of serializing to and deserializing from CQL, the data is sent as an [Arrow IPC stream](http://arrow.apache.org/). Arrow stores data in a columnar format, which is better suited to analytics.

Data transformations:

1. SSTable on disk
2. Deserialized into Java objects in the C* server
3. Client makes a request to the server (not to the C* DB)
4. Data serialized as an Arrow IPC stream
5. Sent across network
6. Arrow IPC stream received by client
7. Transformed into Arrow Table / cuDF

**Pros:**
- does not query the main Cassandra DB, which reduces its load and leaves it free for other operations
- the Arrow IPC stream involves less serialization and deserialization

**Cons:**
- requires starting Cassandra and running a JVM, both of which we would rather avoid
- complex architecture

In [1]:
import pyarrow as pa
import pandas as pd
from utils import fetch_data

In [2]:
buffers = fetch_data()
tables = [pa.ipc.open_stream(buf).read_all() for buf in buffers]
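The implementation of `fetch_data` is not shown here. As a rough sketch of how a client could receive several byte buffers over a socket, the example below assumes a simple length-prefixed wire format: a 4-byte buffer count followed by an 8-byte size before each buffer. That format and the helper names `send_buffers`, `receive_buffers`, and `recv_exact` are assumptions made for illustration, not the project's actual protocol.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from the socket, or raise if it closes early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_buffers(sock, buffers):
    """Hypothetical sender: buffer count, then (size, bytes) per buffer."""
    sock.sendall(struct.pack(">I", len(buffers)))
    for buf in buffers:
        sock.sendall(struct.pack(">Q", len(buf)))
        sock.sendall(buf)


def receive_buffers(sock):
    """Hypothetical receiver mirroring send_buffers."""
    (count,) = struct.unpack(">I", recv_exact(sock, 4))
    buffers = []
    for i in range(count):
        (size,) = struct.unpack(">Q", recv_exact(sock, 8))
        print(f"receiving table {i}")
        buffers.append(recv_exact(sock, size))
    return buffers


# Local demonstration over a connected socket pair; the payloads are
# placeholder bytes, not real Arrow IPC streams.
server_end, client_end = socket.socketpair()
payloads = [b"first", b"second stream", b""]
send_buffers(server_end, payloads)
received = receive_buffers(client_end)
server_end.close()
client_end.close()
```

Each received buffer could then be passed to `pa.ipc.open_stream`, as in the cell above.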

receiving table 0
receiving table 1
receiving table 2


In [7]:
tables[2].flatten().to_pandas().head()

| | partition_key | _ts_row_liveness | _del_time_row_liveness | _ttl_row_liveness | _local_del_time_partition | _marked_for_del_at_partition | _local_del_time_row | _marked_for_delete_at_row |
|---|---|---|---|---|---|---|---|---|
| 0 | 9 | NaT | NaT | NaT | 2021-08-15 20:55:20 | 2021-08-15 20:55:20.051420 | NaT | NaT |
| 1 | 4 | NaT | NaT | NaT | 2021-08-15 20:55:15 | 2021-08-15 20:55:15.208783 | NaT | NaT |
| 2 | 3 | NaT | NaT | NaT | 2021-08-15 20:55:12 | 2021-08-15 20:55:12.761366 | NaT | NaT |
| 3 | 5 | NaT | NaT | NaT | 2021-08-15 20:54:58 | 2021-08-15 20:54:58.051701 | NaT | NaT |
| 4 | 2 | NaT | NaT | NaT | 2021-08-15 20:54:24 | 2021-08-15 20:54:24.977026 | NaT | NaT |


In [3]:
pd.concat(map(lambda x: x.to_pandas(), tables))

| | partition_key | _ts_row_liveness | _del_time_row_liveness | _ttl_row_liveness | _local_del_time_partition | _marked_for_del_at_partition | _local_del_time_row | _marked_for_delete_at_row | value | _ts_value | _del_time_value | _ttl_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2021-08-15 21:06:44.780999 | NaT | NaT | NaT | NaT | NaT | NaT | 439790106.0 | NaT | NaT | NaT |
| 1 | 7 | 2021-08-15 21:06:44.781860 | NaT | NaT | NaT | NaT | NaT | NaT | 564330072.0 | NaT | NaT | NaT |
| 2 | 9 | 2021-08-15 21:06:44.786618 | NaT | NaT | NaT | NaT | NaT | NaT | 97405552.0 | NaT | NaT | NaT |
| 3 | 4 | 2021-08-15 21:06:44.770663 | NaT | NaT | NaT | NaT | NaT | NaT | 351686621.0 | NaT | NaT | NaT |
| 4 | 3 | 2021-08-15 21:06:44.768762 | NaT | NaT | NaT | NaT | NaT | NaT | 352527683.0 | NaT | NaT | NaT |
| 5 | 5 | 2021-08-15 21:06:44.777487 | NaT | NaT | NaT | NaT | NaT | NaT | 114304900.0 | NaT | NaT | NaT |
| 6 | 0 | 2021-08-15 21:06:44.764681 | NaT | NaT | NaT | NaT | NaT | NaT | 382062539.0 | NaT | NaT | NaT |
| 7 | 8 | 2021-08-15 21:06:44.785427 | NaT | NaT | NaT | NaT | NaT | NaT | 296173906.0 | NaT | NaT | NaT |
| 8 | 2 | 2021-08-15 21:06:44.759422 | NaT | NaT | NaT | NaT | NaT | NaT | 949364593.0 | NaT | NaT | NaT |
| 9 | 1 | 2021-08-15 21:06:44.756894 | NaT | NaT | NaT | NaT | NaT | NaT | 774912474.0 | NaT | NaT | NaT |
