## Real Time Analytics Final Project

<b> CryptoStreamLab: Real-Time Analysis of Cryptocurrency Markets Using Streaming Data </b> \
An Exploration of Real-Time Price and Volume Analysis with Spark, Kafka, and Streamlit for Enhanced Trading Strategies

<b> Authors:\
    1. Cuong Vo - 131116\
    2. Trang Linh Nguyen - 131036\
    3. Aisel Akhmedova - 131008 </b>

<b> Tech Stack:\
    1. Spark: 3.5.0\
    2. Python: 3.10.12\
    3. OS: WSL Linux\
    4. Streamlit: \
    5. Scala: 2.12.18\
    6. Java OpenJDK 64-Bit Server VM: 11.0.25</b>

## Abstract

#### Library Import

In [33]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#### Create Spark Session

In [34]:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pyspark.sql.types 

spark = SparkSession.builder.appName("StreamQuant").getOrCreate()

## Data Description

The dataset consists of real-time cryptocurrency market data streamed from Binance. Each record includes the following fields:

- **Ticker**: The symbol representing the cryptocurrency pair (e.g., BTCUSDT).
- **Timestamp**: The date and time when the data was recorded.
- **Open**: The opening price of the cryptocurrency for the given time interval.
- **Close**: The closing price of the cryptocurrency for the given time interval.
- **Price**: The current price at the time of data capture.
- **Volume**: The total trading volume of the cryptocurrency during the interval.

This data enables real-time analysis of price movements and trading activity for various cryptocurrencies.
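
The field list above can be tied to the raw rows that Binance returns. The following is a minimal sketch, not the project's actual parsing code: `kline_to_record` is a hypothetical helper, and mapping **Price** to the latest close is an assumption. The sample row is a kline taken from this notebook's own output.

```python
from datetime import datetime, timezone

# Positions of the first six fields inside one Binance kline (candlestick) list
OPEN_TIME, OPEN, HIGH, LOW, CLOSE, VOLUME = range(6)


def kline_to_record(ticker, kline):
    """Map one raw kline list to the fields described above (hypothetical helper)."""
    return {
        "Ticker": ticker,
        # The kline open time is given in milliseconds since the Unix epoch
        "Timestamp": datetime.fromtimestamp(kline[OPEN_TIME] / 1000, tz=timezone.utc),
        "Open": float(kline[OPEN]),
        "Close": float(kline[CLOSE]),
        # Assumption: the latest close stands in for the "current" price
        "Price": float(kline[CLOSE]),
        "Volume": float(kline[VOLUME]),
    }


# Sample kline copied from the output shown in this notebook
sample = [1747413300000, '103916.91000000', '103933.56000000', '103898.14000000',
          '103928.62000000', '8.80384000', 1747413359999, '914861.61599580',
          1874, '6.91312000', '718390.06250700', '0']

record = kline_to_record("BTCUSDT", sample)
print(record)
```

Prices and volumes arrive as strings, so the sketch converts them to `float` before any numeric analysis.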

In [35]:
# Set up the connection to Binance
import os
from binance.client import Client

# Read the API credentials from environment variables instead of hardcoding them.
# The variable names BINANCE_API_KEY and BINANCE_API_SECRET are our own choice.
api_key = os.environ["BINANCE_API_KEY"]
api_secret = os.environ["BINANCE_API_SECRET"]

client = Client(api_key, api_secret)


In [36]:
# define the function to capture the data from Binance 
def get_historical_data(symbol, interval, lookback):
    """
    Fetch historical data from Binance for a given symbol and interval.
    :param symbol: The trading pair symbol (e.g., 'BTCUSDT').
    :param interval: The time interval for the data (e.g., '1m', '5m', '1h', '1d').
    :param lookback: The lookback period for the data (e.g., '1 day ago UTC', '1 hour ago UTC').
    """
    try:
        klines = client.get_historical_klines(symbol, interval, lookback)
        return klines
    except Exception as e:
        print(f"Error fetching data for {symbol}: {e}")
        return None

In [37]:
binance_data = get_historical_data('BTCUSDT', '1m', '1 minute ago UTC')

## Kafka Server setup

The Kafka server runs on <b>localhost:9092</b> in the WSL Linux environment.

Check the status of the Kafka server before streaming any data.
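
One simple way to check from Python whether something is listening on the broker address is a plain TCP connection attempt. This is a minimal sketch under stated assumptions: `broker_port_open` is a hypothetical helper, and a successful connection only shows that the port is open, not that Kafka itself is healthy.

```python
import socket


def broker_port_open(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


print("Kafka port reachable:", broker_port_open())
```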

## Streaming Data

### Create Kafka topic
In this part, we use the Linux command line to create a Kafka topic named StreamQuant.

Check that the topic exists.
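
The two steps above (creating the topic and checking for it) can be sketched with the `kafka-topics.sh` script that ships with Kafka. The helper names, the relative `bin/` path, and the partition and replication settings below are assumptions for a single-broker local setup, not the exact commands used in this project.

```python
import subprocess

# Assumption: commands are run from the Kafka installation directory
KAFKA_TOPICS = "bin/kafka-topics.sh"
BOOTSTRAP = "localhost:9092"


def create_topic_cmd(topic, partitions=1, replication_factor=1):
    """Build the command that creates a topic on the local broker."""
    return [
        KAFKA_TOPICS, "--create",
        "--topic", topic,
        "--bootstrap-server", BOOTSTRAP,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


def list_topics_cmd():
    """Build the command that lists all topics on the local broker."""
    return [KAFKA_TOPICS, "--list", "--bootstrap-server", BOOTSTRAP]


def run(cmd):
    """Run a Kafka CLI command and return its standard output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Show the commands without executing them
print(" ".join(create_topic_cmd("StreamQuant")))
print(" ".join(list_topics_cmd()))

# With a broker running, the commands could be executed like this:
# run(create_topic_cmd("StreamQuant"))
# print("StreamQuant" in run(list_topics_cmd()).split())
```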

### Create Kafka Producer
In this part, we use the Binance client created earlier to pull data from Binance and send it to the Kafka topic through a producer.

In [53]:
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',  # Replace with your Kafka broker if different
    value_serializer=lambda v: json.dumps(v).encode('utf-8')  # Serialize to JSON bytes
)

topic = 'StreamQuant'

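# Send the fetched kline 100 times as a test load for the topic
for i in range(100):
    producer.send(topic, value=binance_data[0])
    print(f"Sent: {binance_data[0]}")

# Block once, after the loop, until all buffered messages are delivered
producer.flush()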

Sent: [1747413300000, '103916.91000000', '103933.56000000', '103898.14000000', '103928.62000000', '8.80384000', 1747413359999, '914861.61599580', 1874, '6.91312000', '718390.06250700', '0']
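
The producer cell sends one fetched kline repeatedly, which is useful for testing the topic. For an actual stream, the data has to be fetched again on each round. The following is a minimal sketch of such a polling loop, not the project's implementation: `stream_latest_klines` is a hypothetical helper, and the fetch and send steps are passed in as functions so the loop can be tried without Binance or Kafka.

```python
import time


def stream_latest_klines(fetch, send, iterations, pause_seconds=60.0):
    """Fetch the newest kline each round and send it only when it is new."""
    sent = 0
    last_open_time = None
    for i in range(iterations):
        klines = fetch()
        if klines:  # fetch may return None or an empty list on errors
            latest = klines[-1]
            if latest[0] != last_open_time:  # index 0 is the kline open time
                send(latest)
                last_open_time = latest[0]
                sent += 1
        if i < iterations - 1:
            time.sleep(pause_seconds)
    return sent


# Demo with stand-ins for Binance and Kafka
fake_feed = iter([[[1, "10"]], [[1, "10"]], None, [[2, "11"]]])
outbox = []
count = stream_latest_klines(lambda: next(fake_feed), outbox.append,
                             iterations=4, pause_seconds=0)
print(count, outbox)

# In this notebook the loop could be wired up like this (needs Binance and Kafka):
# stream_latest_klines(
#     lambda: get_historical_data('BTCUSDT', '1m', '1 minute ago UTC'),
#     lambda kline: producer.send(topic, value=kline),
#     iterations=100,
# )
```

Skipping klines whose open time was already sent avoids filling the topic with duplicates when the polling interval is shorter than the kline interval.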

## Trading Strategy

In [None]:
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic StreamQuant --from-beginning

## Visualization