#### Binance Project

This project collects accurate price and timestamp data from Binance using both the REST API and the WebSocket stream. The goal is to build a reliable data pipeline that extracts raw market data, transforms it into a clean and usable format, loads it into the appropriate storage layers, and analyzes it to uncover insights.

To maintain clarity and structure throughout the development process, the documentation and codebase will be organized into four main sections: data extraction, data transformation, data loading, and data analysis. Each section will detail the methods, tools, and design choices involved, creating a clear end-to-end overview of the entire workflow.
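
As a rough illustration of how the four stages could fit together, here is a minimal sketch. The function names (`extract`, `transform`, `load`, `analyze`), the in-memory SQLite table, and the sample messages are all assumptions made for this example; the messages are hand-written to mimic the shape of a Binance kline stream event and are not captured data.

```python
import sqlite3

import pandas as pd

# Hand-written messages shaped like Binance kline stream events (illustrative only).
SAMPLE_MESSAGES = [
    {"e": "kline", "s": "BTCUSDC", "k": {"t": 1700000000000, "c": "100.50", "x": True}},
    {"e": "kline", "s": "BTCUSDC", "k": {"t": 1700000001000, "c": "100.75", "x": True}},
    {"e": "kline", "s": "BTCUSDC", "k": {"t": 1700000002000, "c": "100.60", "x": False}},
]


def extract():
    """Extraction stage: stands in for the WebSocket or REST collection step."""
    return SAMPLE_MESSAGES


def transform(messages):
    """Transformation stage: keep closed candles and convert types."""
    rows = [
        {"timestamp": m["k"]["t"], "price_ws": float(m["k"]["c"])}
        for m in messages
        if m["k"]["x"]  # skip candles that have not closed yet
    ]
    df = pd.DataFrame(rows, columns=["timestamp", "price_ws"])
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    return df


def load(df, conn, table="prices_ws"):
    """Loading stage: write to a SQLite table (hypothetical storage layer)."""
    out = df.assign(timestamp=df["timestamp"].dt.strftime("%Y-%m-%dT%H:%M:%S"))
    out.to_sql(table, conn, if_exists="replace", index=False)


def analyze(conn, table="prices_ws"):
    """Analysis stage: a trivial summary query over the stored rows."""
    query = f"SELECT COUNT(*) AS n, AVG(price_ws) AS mean_price FROM {table}"
    return pd.read_sql(query, conn)


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
summary = analyze(conn)
print(summary)
```

Keeping each stage as a separate function makes it possible to test the transformation logic on recorded messages without opening a live connection.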

In [2]:
import websocket
import json
import pandas as pd
from binance.client import Client
import time

In [3]:
ws_data = []
start_time = None
duration_time = None
ws = None

symbol = "btcusdc"
interval = "1s"
socket_info = f"wss://stream.binance.com:9443/ws/{symbol}@kline_{interval}"

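# Read credentials from the environment instead of hard-coding them in the notebook.
# The variable names below are placeholders; the kline endpoints used here are public
# market data and should not require keys, so both values may be left unset.
import os

api_key = os.environ.get("BINANCE_API_KEY")
secret_api_key = os.environ.get("BINANCE_API_SECRET")
client = Client(api_key, secret_api_key)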

btc = "BTCUSDC"
time_frame = "1s"
lookback_period = "1 minute ago"


In [4]:

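def on_message(ws, msg):
    # Each message is a JSON kline event; the candlestick payload is under "k".
    msg = json.loads(msg)
    candlestick = msg["k"]

    # "x" is True once the candle has closed; only closed candles are recorded.
    if candlestick["x"]:
        ws_data.append({
            "timestamp": candlestick["t"],        # candle open time, in milliseconds
            "price_ws": float(candlestick["c"]),  # close price
        })

    print("Data stream is intact and running, wait for the desired duration to end")

    # The elapsed time is only checked when a message arrives.
    if time.time() - start_time >= duration_time:
        print(f"Reached {duration_time} seconds. Closing WebSocket...")
        ws.close()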

def on_error(ws, error):
    print("ERROR:", error)

def on_close(ws, code, msg):
    print("CONNECTION IS CLOSED")

def start_ws(seconds: int):
    # Open the stream and block until on_message closes it after `seconds` seconds.
    global start_time, duration_time, ws

    start_time = time.time()
    duration_time = seconds

    ws = websocket.WebSocketApp(
        socket_info,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    ws.run_forever()

start_ws(60)

df_ws = pd.DataFrame(ws_data)
df_ws["timestamp"] = pd.to_datetime(df_ws["timestamp"], unit="ms")

def get_btc_data(btc, time_frame, lookback_period):
    # Fetch recent klines over REST and keep only the timestamp and close price.
    btc_info = client.get_historical_klines(btc, time_frame, lookback_period)

    df_rest = pd.DataFrame(btc_info, columns=["timestamp", "open", "high", "low", "close", "volume", "close_time",
                                              "quote_asset_volume", "number_of_trades", "taker_buy_base_asset_volume",
                                              "taker_buy_quote_asset_volume", "ignore"])

    # A 1s candle closes 999 ms after it opens, so flooring the close time to the second
    # yields the open second, which matches the "t" field recorded from the WebSocket.
    df_rest["timestamp"] = pd.to_datetime(df_rest["close_time"], unit="ms").dt.floor("s")
    df_rest["price_rest"] = pd.to_numeric(df_rest["close"]).round(2)

    return df_rest[["timestamp", "price_rest"]]

df_rest = pd.DataFrame(get_btc_data(btc,time_frame,lookback_period))

df_rest["timestamp"] = pd.to_datetime(df_rest["timestamp"]).dt.floor("s")
df_ws["timestamp"]   = pd.to_datetime(df_ws["timestamp"]).dt.floor("s")

# The inner join keeps only the seconds present in both sources.
df_final = pd.merge(df_rest, df_ws, on="timestamp", how="inner")


Data stream is intact and running, wait for the desired duration to end

In [5]:
df_final

| | timestamp | price_rest | price_ws |
|---|---|---|---|
| 0 | 2025-11-21 14:52:54 | 85069.14 | 85069.14 |
| 1 | 2025-11-21 14:52:55 | 85046.11 | 85046.11 |
| 2 | 2025-11-21 14:52:56 | 85063.01 | 85063.01 |
| 3 | 2025-11-21 14:52:57 | 85064.97 | 85064.97 |
| 4 | 2025-11-21 14:52:58 | 85064.98 | 85064.98 |
| 5 | 2025-11-21 14:52:59 | 85072.0 | 85072.0 |
| 6 | 2025-11-21 14:53:00 | 85060.89 | 85060.89 |
| 7 | 2025-11-21 14:53:01 | 85065.81 | 85065.81 |
| 8 | 2025-11-21 14:53:02 | 85032.46 | 85032.46 |
| 9 | 2025-11-21 14:53:03 | 85019.0 | 85019.0 |
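
In the rows shown, the REST and WebSocket prices agree on every timestamp. Because the join is an inner merge, it does not reveal seconds that appear in only one source. The sketch below is one possible way to quantify both agreement and coverage. The helper name `compare_sources` and the `tol` threshold are assumptions for illustration; the prices are copied from the output above, but splitting them into two partially overlapping frames is done only to exercise the function.

```python
import pandas as pd


def compare_sources(df_rest, df_ws, tol=0.01):
    """Report how many seconds match across sources and how far prices diverge."""
    # An outer merge with indicator=True labels each row as both/left_only/right_only.
    merged = pd.merge(df_rest, df_ws, on="timestamp", how="outer", indicator=True)
    both = merged[merged["_merge"] == "both"]
    diff = (both["price_rest"] - both["price_ws"]).abs()
    return {
        "matched": int(len(both)),
        "rest_only": int((merged["_merge"] == "left_only").sum()),
        "ws_only": int((merged["_merge"] == "right_only").sum()),
        "max_abs_diff": float(diff.max()) if len(both) else 0.0,
        "mismatches": int((diff > tol).sum()),
    }


# Illustrative frames: prices taken from the table above, overlap chosen by hand.
rest_sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-11-21 14:52:54", "2025-11-21 14:52:55", "2025-11-21 14:52:56",
    ]),
    "price_rest": [85069.14, 85046.11, 85063.01],
})
ws_sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-11-21 14:52:55", "2025-11-21 14:52:56", "2025-11-21 14:52:57",
    ]),
    "price_ws": [85046.11, 85063.01, 85064.97],
})

report = compare_sources(rest_sample, ws_sample)
print(report)
```

Calling `compare_sources(df_rest, df_ws)` on the frames built earlier in the notebook would give the same kind of report for a live run.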
