# 3. Cassandra to Arrow

We reuse some code from the Cassandra server to read the SSTable, but instead of serializing the data to CQL and deserializing it again on the client, we send it as an [Arrow IPC stream](http://arrow.apache.org/). Arrow stores data in a columnar format, which is better suited to analytics.

Data transformations:

1. SSTable sits on disk
2. SSTable is deserialized into Java objects in the C* server
3. Client makes a request to the server (not to the C* DB)
4. Data is serialized as an Arrow IPC stream
5. Stream is sent across the network
6. Arrow IPC stream is received by the client
7. Stream is transformed into an Arrow Table / cuDF DataFrame

**Pros:**
- doesn't query the main Cassandra DB, which reduces its load and leaves it free for other operations
- less serialization and deserialization, thanks to the Arrow IPC stream

**Cons:**
- still requires starting Cassandra and running the JVM, which we would rather avoid
- complex architecture

In [None]:
import pyarrow as pa
# cudf is not installed on this machine, so the import stays commented out
# import cudf
import socket

HOST = '127.0.0.1'
PORT = 9143

In [None]:
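# read data from socket
def fetch_data():
    """Send a request to the server and return every byte it sends back."""
    chunks = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        # send the request that prompts the server to stream the data
        s.sendall(b'hello world\n')
        while True:
            chunk = s.recv(4096)
            # an empty read means the server closed the connection
            if not chunk:
                break
            chunks.append(chunk)
    # join once at the end instead of repeatedly concatenating bytes
    return b''.join(chunks)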
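
The read-until-close pattern above can be exercised without Cassandra by pointing it at a throwaway local server. The sketch below uses only the standard library; `serve_once`, `fetch`, and the placeholder payload are hypothetical stand-ins, not part of the real server, which would send an Arrow IPC stream instead.

```python
import socket
import threading

def serve_once(server_sock, payload):
    """Accept one client, read its request, send the payload, then close."""
    conn, _ = server_sock.accept()
    with conn:
        conn.recv(1024)        # consume the client's request
        conn.sendall(payload)  # closing the connection signals end of data

def fetch(host, port):
    """Read from the socket until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port)) as s:
        s.sendall(b"hello world\n")
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

# Placeholder bytes; larger than one recv() call so the loop runs several times.
payload = b"x" * 10000

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

thread = threading.Thread(target=serve_once, args=(server, payload))
thread.start()
result = fetch("127.0.0.1", port)
thread.join()
server.close()
```

The client relies on the server closing the connection to mark the end of the stream, so a server that keeps the socket open would leave the loop waiting.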

In [None]:
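buffer = fetch_data()
# parse the received bytes as an Arrow IPC stream
reader = pa.ipc.open_stream(buffer)
# read all record batches into a single Arrow Table
arrow_table = reader.read_all()
arrow_table.to_pandas() # for visualization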