# Reading BPipe Data

[![DBR](https://img.shields.io/badge/DBR-16.4-red?logo=databricks&style=for-the-badge)](https://docs.databricks.com/release-notes/runtime/16.4.html)
[![CLOUD](https://img.shields.io/badge/CLOUD-ALL-blue?style=for-the-badge)](https://databricks.com/try-databricks)

This project provides a Spark connector for Bloomberg B-Pipe, allowing real-time and reference market data to stream directly into Databricks. It enables use cases such as intraday risk calculations, while Unity Catalog ensures entitlements, governance, and full auditability of data access. 

In [0]:
%scala
// Verify that both blpapi and the Databricks wrapper are on the classpath
import com.bloomberglp.blpapi._
import com.databricks.fsi.bpipe._

## Static Reference Data

`//blp/staticMktData` is a Bloomberg BLPAPI service that returns metadata and static attributes for securities: values that generally do not change tick by tick, such as security descriptions, ISINs, CUSIPs, SEDOLs, exchange codes, and sector classifications. The connector maps this service to a Spark batch source that accepts the parameters below.

| Property       | Value       |
|----------------| ----------- |
| B-Pipe service | `//blp/staticMktData` |
| Delivery mode      | request / response |
| Spark mode      | batch |

In [0]:
staticMktData = (
  spark
    .read
    .format("//blp/staticMktData")

    # B-PIPE connection
    .option("serverAddresses", "['endpoint1', 'endpoint2']")
    .option("serverPort", 8194)

    # Files were added as spark local files in cluster configuration
    .option("tlsCertificatePath", "rootCertificate.pk7")
    .option("tlsPrivateKeyPath", "privateKey.pk12")
    .option("tlsPrivateKeyPassword", "privateKeyPassword")
    .option("authApplicationName", "applicationName")
    .option("correlationId", 999)

    # Service configuration
    .option("serviceName", "ReferenceDataRequest")
    .option("fields", "['BID', 'ASK', 'LAST_PRICE']")
    .option("securities", "['BGN Curncy', 'GBP BGN Curncy', 'EUR BGN Curncy', 'JPYEUR BGN Curncy']")
    .option("returnEids", True)

    # Start batch ingest
    .load()
)
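Both readers pass list-valued options (`serverAddresses`, `fields`, `securities`) as strings. The following is a minimal sketch of building those strings from Python lists rather than writing them by hand. It assumes the connector accepts a JSON array for every list option; the streaming cell further down uses `json.dumps(all_securities)` for `securities`, but this is not confirmed for the other options. `bpipe_options` is a hypothetical helper and not part of the connector.

```python
import json


def bpipe_options(server_addresses, fields, securities, **extra):
    """Build reader options; list values are serialized as JSON array strings."""
    opts = {
        "serverAddresses": json.dumps(server_addresses),
        "serverPort": 8194,
        "fields": json.dumps(fields),
        "securities": json.dumps(securities),
    }
    # Any additional option (serviceName, TLS paths, ...) is passed through unchanged
    opts.update(extra)
    return opts


opts = bpipe_options(
    ["endpoint1", "endpoint2"],
    ["BID", "ASK", "LAST_PRICE"],
    ["GBP BGN Curncy", "EUR BGN Curncy"],
    serviceName="ReferenceDataRequest",
)
```

The resulting dictionary could then be applied in one call with PySpark's `.options(**opts)` on either `spark.read` or `spark.readStream`, which keeps the connection settings identical between the batch and streaming readers.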

In [0]:
display(staticMktData)

| SECURITY | BID | ASK | LAST_PRICE |
|----------|-----|-----|------------|
| BGN Curncy | 1.6721 | 1.6818 | 1.6769 |
| GBP BGN Curncy | 1.334 | 1.3341 | 1.3341 |
| EUR BGN Curncy | 1.1662 | 1.1663 | 1.1663 |
| JPYEUR BGN Curncy | 0.57265 | 0.57269 | 0.57267 |


## Real-Time Data

`//blp/mktdata` is the Bloomberg API service that streams real-time market data (quotes, trades, market depth) to client applications by subscription.
Because this is live data from the exchanges, delivery must be restricted to the specific application entitled to receive it. Tracking entitlements and lineage through Unity Catalog keeps each B-Pipe feed limited to that application.

| Property       | Value       |
|----------------| ----------- |
| B-Pipe service | `//blp/mktdata` |
| Delivery mode      | publish / subscribe |
| Spark mode      | streaming |


In [0]:
securities = {
  '33828': ['BGN Curncy', 'GBP BGN Curncy', 'EUR BGN Curncy', 'JPYEUR BGN Curncy'],
  '38736': ['5RIZFU2 BGN Curncy', 'AB020310 BGN Curncy', 'AB020406 BGN Curncy', 'AB021030 BGN Curncy', 'AB030510 BGN Curncy', 'AB050710 BGN Curncy'],
  '39489': ['0UU5C 95.3750 COMB Comdty', '0UU5C 95.3750 ELEC Comdty', '0UU5C 95.3750 PIT Comdty', '0UU5C 95.5000 COMB Comdty', '0UU5C 95.5000 ELEC Comdty', '0UU5C 95.5000 PIT Comdty']
}
all_securities = [sec for sublist in securities.values() for sec in sublist]
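The dictionary keys above look like Bloomberg entitlement IDs (the value `33828` also appears in the `EID` column of the streamed output), although the notebook does not say so explicitly. Under that assumption, a sketch of two lookups that help restrict a subscription to what an application is entitled to. `sample_securities`, `securities_for`, and `eid_by_security` are illustrative names, and the sample is shortened so it does not overwrite the notebook's `securities` variable.

```python
# Shortened copy of the mapping above; keys are assumed to be EIDs
sample_securities = {
    "33828": ["BGN Curncy", "GBP BGN Curncy"],
    "38736": ["AB020310 BGN Curncy"],
}


def securities_for(entitled_eids, securities_by_eid):
    """Return the securities whose EID is in the entitled set, in sorted EID order."""
    return [
        sec
        for eid in sorted(entitled_eids)
        if eid in securities_by_eid
        for sec in securities_by_eid[eid]
    ]


# Reverse lookup: security -> EID it was requested under
eid_by_security = {
    sec: eid
    for eid, secs in sample_securities.items()
    for sec in secs
}

subscribed = securities_for({"33828"}, sample_securities)
```

Filtering before subscribing means a security is never requested for an application that lacks the matching EID, instead of being dropped after it has already been delivered.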

In [0]:
from pyspark.sql.functions import current_timestamp, to_utc_timestamp
import json

mktData = (
  
  spark
  
      # mktData is a streaming endpoint
      .readStream
      .format("//blp/mktData")

      # B-PIPE connection
      .option("serverAddresses", "['endpoint1', 'endpoint2']")
      .option("serverPort", 8194)

      # Files were added as spark local files in cluster configuration
      .option("tlsCertificatePath", "rootCertificate.pk7")
      .option("tlsPrivateKeyPath", "privateKey.pk12")
      .option("tlsPrivateKeyPassword", "privateKeyPassword")
      .option("authApplicationName", "applicationName")
      .option("correlationId", 999)

      # Service configuration
      .option("fields", "['MKTDATA_EVENT_TYPE','MKTDATA_EVENT_SUBTYPE','EID','LAST_PRICE','IS_DELAYED_STREAM','TRADE_UPDATE_STAMP_RT']")
      .option("securities", json.dumps(all_securities))

      # Custom logic
      .option("permissive", True)
      .option("timezone", "UTC")
      .load()

      # Add processing timestamp we can use for temporal entitlement
      .withColumn(
        'processed_timestamp',
        to_utc_timestamp(current_timestamp(), "UTC")
      )
)

In [0]:
display(mktData.limit(5))

| SECURITY | MKTDATA_EVENT_TYPE | MKTDATA_EVENT_SUBTYPE | EID | LAST_PRICE | IS_DELAYED_STREAM | TRADE_UPDATE_STAMP_RT | processed_timestamp |
|----------|--------------------|-----------------------|-----|------------|-------------------|-----------------------|---------------------|
| AB021030 BGN Curncy | SUMMARY | INITPAINT | 33828 | | False | | 2025-09-25T20:37:16.875Z |
| EUR BGN Curncy | SUMMARY | INITPAINT | 33828 | 1.1668 | False | | 2025-09-25T20:37:16.875Z |
| GBP BGN Curncy | SUMMARY | INITPAINT | 33828 | 1.3344 | False | | 2025-09-25T20:37:16.875Z |
| GBP BGN Curncy | TRADE | NEW | 33828 | 1.3344 | False | 2025-09-25T20:37:20Z | 2025-09-25T20:37:16.875Z |
| BGN Curncy | SUMMARY | INITPAINT | 33828 | 1.6764 | False | | 2025-09-25T20:37:16.875Z |
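The `processed_timestamp` column is added so it can be used for temporal entitlement. The notebook does not define how that check is performed, so the following is only a sketch of one possible interpretation: an application holds grants per EID, each valid for a time window, and a row is visible only if its processing time falls inside a matching window. `Grant` and `is_entitled` are hypothetical names, and the grant dates are made up for illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Grant(NamedTuple):
    """Hypothetical entitlement grant: one EID, valid for a half-open time window."""
    eid: str
    valid_from: datetime
    valid_to: datetime


def is_entitled(eid, processed_ts, grants):
    """True if any grant covers this EID at the given processing time."""
    return any(
        g.eid == eid and g.valid_from <= processed_ts < g.valid_to
        for g in grants
    )


# Illustrative grant: EID 33828 for September 2025 (UTC)
grants = [
    Grant(
        "33828",
        datetime(2025, 9, 1, tzinfo=timezone.utc),
        datetime(2025, 10, 1, tzinfo=timezone.utc),
    ),
]

# processed_timestamp of the rows displayed above
processed = datetime(2025, 9, 25, 20, 37, 16, 875000, tzinfo=timezone.utc)

covered = is_entitled("33828", processed, grants)
not_covered = is_entitled("38736", processed, grants)
```

In a real deployment the same predicate would more likely be expressed as a join or row filter on the Delta table rather than in driver-side Python; the plain-Python form is used here only to make the window logic explicit.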


In [0]:
_ = (
  mktData
    .writeStream
    .outputMode('append')
    .option('checkpointLocation', '/Volumes/market_data/providers/bloomberg/checkpoints')
    .toTable('market_data.providers.bloomberg_bpipe')
)