# 3. Cassandra to Arrow

We reuse code from the Cassandra server to read the SSTable, but instead of serializing the data to CQL and deserializing it again on the client, we send it as an [Arrow IPC stream](http://arrow.apache.org/). Arrow stores data in a columnar format, which is better suited to analytics.

Data transformations:

1. SSTable on disk
2. Deserialized into Java objects by the C* server code
3. Client makes a request to that server (not to the C* database)
4. Data serialized via Arrow IPC stream
5. Sent across network
6. Arrow IPC stream received by client
7. Transformed into Arrow Table / cuDF

**Pros:**
- makes no requests to the main Cassandra DB, which reduces its load and leaves it free for other operations
- the Arrow IPC stream involves less serialization and deserialization

**Cons:**
- still requires starting Cassandra server code and running the JVM, which we would prefer to avoid
- complex architecture

In [1]:
import pyarrow as pa
import pandas as pd
from utils import fetch_data

In [2]:
buffers = fetch_data()
tables = [pa.ipc.open_stream(buf).read_all() for buf in buffers]

receiving table 0


In [5]:
tables[0].flatten().to_pandas().head()

| | `partition_key_part1.org.apache.cassandra.db.marshal.UUIDType` | `partition_key_part1.org.apache.cassandra.db.marshal.UTF8Type` | `partition_key_part2.org.apache.cassandra.db.marshal.UUIDType` | `partition_key_part2.org.apache.cassandra.db.marshal.UTF8Type` | `_ts_row_liveness` | `_del_time_row_liveness` | `_ttl_row_liveness` | `_local_del_time_partition` | `_marked_for_del_at_partition` | `_local_del_time_row` | ... | `_ttl_data` | `sensor_value` | `_ts_sensor_value` | `_del_time_sensor_value` | `_ttl_sensor_value` | `station_id_part1` | `station_id_part2` | `_ts_station_id` | `_del_time_station_id` | `_ttl_station_id` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1828142208147734908 | dispersion | 11081545588760870504 | dispersion | 1970-01-01 00:00:00.002 | NaT | NaT | NaT | NaT | NaT | ... | NaT | 95.759791 | NaT | NaT | NaT | 2945182322382029771 | 10904053516202300378 | NaT | NaT | NaT |
| 1 | 8329893407965204367 | solubility | 12954906978328135592 | solubility | 1970-01-01 00:00:00.009 | NaT | NaT | NaT | NaT | NaT | ... | NaT | 106.497951 | NaT | NaT | NaT | 2945182322382029771 | 10904053516202300378 | NaT | NaT | NaT |
| 2 | 4678114788215243590 | fitness | 11367188723867789301 | fitness | 1970-01-01 00:00:00.002 | NaT | NaT | NaT | NaT | NaT | ... | NaT | 100.090271 | NaT | NaT | NaT | 2945182322382029771 | 10904053516202300378 | NaT | NaT | NaT |
| 3 | 1127822330704970463 | phase_offset | 10548166629717554178 | phase_offset | 1970-01-01 00:00:00.001 | NaT | NaT | NaT | NaT | NaT | ... | NaT | 94.181632 | NaT | NaT | NaT | 2945182322382029771 | 10904053516202300378 | NaT | NaT | NaT |
| 4 | 4544790122793091762 | periodicity | 11718901302575831064 | periodicity | 1970-01-01 00:00:00.000 | NaT | NaT | NaT | NaT | NaT | ... | NaT | 99.909094 | NaT | NaT | NaT | 2945182322382029771 | 10904053516202300378 | NaT | NaT | NaT |
