# Reading BPipe Data

[![DBR](https://img.shields.io/badge/DBR-16.4-red?logo=databricks&style=for-the-badge)](https://docs.databricks.com/release-notes/runtime/16.4.html)
[![CLOUD](https://img.shields.io/badge/CLOUD-ALL-blue?style=for-the-badge)](https://databricks.com/try-databricks)

This project provides a Spark connector for Bloomberg B-Pipe, allowing real-time and reference market data to stream directly into Databricks. It enables use cases such as intraday risk calculations, while Unity Catalog ensures entitlements, governance, and full auditability of data access. 

In [0]:
%scala

// Make sure that we have both blpapi and databricks wrapper on classpath
import com.databricks.fsi.bpipe._
import com.bloomberglp.blpapi._

// Make sure that we have our private key and certificates on executors' classpath
import org.apache.spark.sql.functions.udf
import org.apache.commons.io.IOUtils
import spark.implicits._

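// Check, at the executor level, that a resource (certificate or key) is readable from the classpath
def isInClassPath(resourceName: String): Boolean = {
  val stream = BPipeConfig.getClass.getResourceAsStream(resourceName)
  if (stream == null) {
    false // resource not found on the classpath
  } else {
    try {
      IOUtils.toByteArray(stream) // make sure the resource can actually be read
      true
    } catch {
      case scala.util.control.NonFatal(_) => false
    } finally {
      stream.close()
    }
  }
}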

val isInClassPathUdf = udf(isInClassPath _)
val df = Seq(
  ("/certificates/rootCertificate.pk7"),
  ("/certificates/073BE6888AE987A5FC5C3C288CBC89E3.pk12")
).toDF("resource_name").withColumn("is_in_classpath", isInClassPathUdf($"resource_name"))

display(df)

resource_name,is_in_classpath
/certificates/rootCertificate.pk7,True
/certificates/073BE6888AE987A5FC5C3C288CBC89E3.pk12,True


## Retrieve specific instruments
We manually selected instruments known to belong to different EIDs (Bloomberg entitlement IDs) in order to test permissioning.

In [0]:
import os
import pandas as pd

instrument_rows = []
instruments_base_dir = '/Volumes/market_data/providers/bloomberg/instruments'
for instrument_file_name in os.listdir(instruments_base_dir):
    instrument_file = os.path.join(instruments_base_dir, instrument_file_name)
    # File names start with the EID, followed by an underscore
    eid = instrument_file_name.split('_')[0]
    with open(instrument_file, 'r') as f:
        for instrument in f.read().splitlines():
            # Skip blank lines (e.g. a trailing newline at the end of the file)
            if instrument.strip():
                instrument_rows.append([eid, instrument.strip()])

instruments_df = pd.DataFrame(instrument_rows, columns=['EID', 'SECURITY'])
display(instruments_df.groupby('EID').sample(n=3))

EID,SECURITY
33828,JPVM215 BGN Curncy
33828,DFPI0625 BGN Curncy
33828,THNA0730 BGN Curncy
38736,USDNGNH1Y BGNL Curncy
38736,GBPEURK3M BGNL Curncy
38736,USDHRKH24H BGNL Curncy
39489,S:FFFF 4-28 ELEC Comdty
39489,SMZ5P 420 ELEC Comdty
39489,VBOV25P1 102.00 Comdty


In [0]:
# Let's get only 10 securities per EID for now
sample_instruments = instruments_df.groupby('EID').sample(n=10)['SECURITY'].to_list()
print('{} securities loaded'.format(len(sample_instruments)))

30 securities loaded
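
Note that pandas `groupby(...).sample(n=10)` raises a `ValueError` when any EID has fewer than 10 instruments (sampling without replacement). A minimal sketch of a capped alternative is shown below, using only the standard library. The helper name `sample_per_eid` and the three-row input are illustrative assumptions, not part of the connector or of the notebook's data files.

```python
import random
from collections import defaultdict


def sample_per_eid(pairs, n, seed=None):
    """Return up to n securities per EID from (eid, security) pairs.

    Hypothetical helper: groups with fewer than n securities are
    returned in full instead of raising an error.
    """
    rng = random.Random(seed)
    by_eid = defaultdict(list)
    for eid, security in pairs:
        by_eid[eid].append(security)

    sampled = []
    for eid in sorted(by_eid):
        securities = by_eid[eid]
        k = min(n, len(securities))  # cap the sample size at the group size
        sampled.extend(rng.sample(securities, k))
    return sampled


pairs = [
    ("33828", "JPVM215 BGN Curncy"),
    ("33828", "DFPI0625 BGN Curncy"),
    ("38736", "USDNGNH1Y BGNL Curncy"),
]
picked = sample_per_eid(pairs, n=2, seed=0)
print(len(picked))  # 3
```

Capping at the group size keeps small EIDs in the test set, which matters here because the point of the sample is to cover every entitlement.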


## Static Reference Data

`//blp/staticMktData` is a Bloomberg BLPAPI service that returns metadata and static attributes of securities: values that generally do not change tick by tick, such as security descriptions, ISINs, CUSIPs, SEDOLs, exchange codes, and sector classifications. We mapped this service to a Spark data source that accepts the parameters below.

| Property       | Value                 |
|----------------|-----------------------|
| B-Pipe service | `//blp/staticMktData` |
| Delivery mode  | request / response    |
| Spark mode     | batch                 |

In [0]:
staticMktData = (
  spark
    .read
    .format("//blp/staticMktData")

    # B-PIPE connection
    .option("serverAddresses", "['cloudpoint1.bloomberg.com', 'cloudpoint2.bloomberg.com']")
    .option("serverPort", 8194)

    # Files were added as spark local files in cluster configuration
    .option("tlsCertificatePath", '/certificates/rootCertificate.pk7')
    .option("tlsPrivateKeyPath", '/certificates/073BE6888AE987A5FC5C3C288CBC89E3.pk12')
    .option("tlsPrivateKeyPassword", "VcRC3uY48vp2wZj5")
    .option("authApplicationName", "blp:dbx-src-test")
    .option("correlationId", 999)

    # Service configuration
    .option("serviceName", "ReferenceDataRequest")
    .option("fields", "['NAME_RT', 'SECURITY_DESCRIPTION_RT', 'BID', 'ASK', 'LAST_PRICE']")
    .option("securities", sample_instruments)
    .option("returnEids", True)

    # Start batch ingest
    .load()
)

In [0]:
display(staticMktData)

SECURITY,NAME_RT,SECURITY_DESCRIPTION_RT,BID,ASK,LAST_PRICE
CNYJ0120 BGN Curncy,CNY OFF SWPT PREM 100 1Y,CNYJ0120 Curncy,0.0,0.0,0.0
CLSE0515 BGN Curncy,CAD SPR %V-100 COR 5Y15Y,CLSE0515 Curncy,0.0,0.0,0.0
USNBFSAO BGN Curncy,USD Cap 4Y13Y,USNBFSAO Curncy,0.0,0.0,0.0
GBPGHFEA BGN Curncy,GBP SWPT %VOL 50 20Y15Y,GBPGHFEA Curncy,0.0,0.0,0.0
BPNI35 BGN Curncy,GBP SWPT NVOL OIS-150 3Y,BPNI35 Curncy,0.0,0.0,0.0
UDTL1003 BGN Curncy,AED SWPT SPRD NVOL 250 1,UDTL1003 Curncy,0.0,0.0,0.0
SAPQ1202 BGN Curncy,ZAR SWPT PREM 75 12Y2Y,SAPQ1202 Curncy,0.0,0.0,0.0
USPUAZ30 BGN Curncy,US SP PR SOFR 350 30Y30Y,USPUAZ30 Curncy,0.0,0.0,0.0
SBBL1205 BGN Curncy,SEK SWPT %VOL OIS-350 12,SBBL1205 Curncy,0.0,0.0,0.0
ISFS121F BGN Curncy,ILS FORWARD SWAP 12YX18M,ISFS121F Curncy,0.0,0.0,0.0
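
The list-valued options in the read above (`serverAddresses`, `fields`) are written as Python-style list literals inside a string, and `securities` is passed as a Python list. A minimal sketch of how such a value could be built and parsed back is shown below. It assumes the connector accepts exactly the `str(list)` form; the helper names are hypothetical and the connector's actual parsing is not shown in this notebook.

```python
import ast


def encode_list_option(values):
    """Render a list of strings as a Python list literal, e.g. "['A', 'B']"."""
    return str(list(values))


def decode_list_option(text):
    """Parse a list literal back into a list of strings, rejecting anything else."""
    parsed = ast.literal_eval(text)  # safe: evaluates literals only
    if not isinstance(parsed, list) or not all(isinstance(v, str) for v in parsed):
        raise ValueError("expected a list of strings, got: {!r}".format(text))
    return parsed


fields = ['NAME_RT', 'BID', 'ASK']
encoded = encode_list_option(fields)
print(encoded)  # ['NAME_RT', 'BID', 'ASK']
decoded = decode_list_option(encoded)
```

Building the string from a list avoids hand-quoting errors when the field or security list is generated programmatically.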


## Real-Time Data

`//blp/mktdata` is the Bloomberg API service that streams real-time market data (quotes, trades, market depth) to client applications by subscription.
Because this is live data from the exchanges, delivery must be restricted to specific applications, with entitlements and lineage tracked through Unity Catalog. A given B-Pipe market data feed must be limited to a single application.

| Property       | Value               |
|----------------|---------------------|
| B-Pipe service | `//blp/mktdata`     |
| Delivery mode  | publish / subscribe |
| Spark mode     | streaming           |


In [0]:
from pyspark.sql.functions import current_timestamp, to_utc_timestamp, split, col

sample_fields = [
  'NAME_RT', 
  'MKTDATA_EVENT_TYPE',
  'MKTDATA_EVENT_SUBTYPE',
  'EID',
  'BID',
  'ASK',
  'LAST_PRICE',
  'VOLUME',
  'BID_SIZE',
  'ASK_SIZE',
  'SIZE_LAST_TRADE',
  'IS_DELAYED_STREAM',
  'TRADE_UPDATE_STAMP_RT'
]

mktData = (
  
  spark
  
      # mktData is a streaming endpoint
      .readStream
      .format("//blp/mktData")

      # B-PIPE connection
      .option("serverAddresses", "['cloudpoint1.bloomberg.com', 'cloudpoint2.bloomberg.com']")
      .option("serverPort", 8194)
      .option("tlsCertificatePath", "/certificates/rootCertificate.pk7")
      .option("tlsPrivateKeyPath", "/certificates/073BE6888AE987A5FC5C3C288CBC89E3.pk12")
      .option("tlsPrivateKeyPassword", "VcRC3uY48vp2wZj5")
      .option("authApplicationName", "blp:dbx-src-test")
      .option("correlationId", 999)

      # Service configuration
      .option("fields", sample_fields)
      .option("securities", sample_instruments)

      # Custom logic
      .option("permissive", True)
      .option("timezone", "UTC")
      .load()

      # Add processing timestamp we can use for temporal entitlement
      .withColumn(
        'processed_timestamp',
        to_utc_timestamp(current_timestamp(), "UTC")
      )
)

In [0]:
display(mktData.limit(20))

SECURITY,NAME_RT,MKTDATA_EVENT_TYPE,MKTDATA_EVENT_SUBTYPE,EID,BID,ASK,LAST_PRICE,VOLUME,BID_SIZE,ASK_SIZE,SIZE_LAST_TRADE,IS_DELAYED_STREAM,TRADE_UPDATE_STAMP_RT,processed_timestamp
SBBL1205 BGN Curncy,SEK SWPT %VOL OIS-350 12,REFERENCE,INITPAINT,35009,,,,,,,,False,,2025-09-26T20:18:44.293Z
SBBL1205 BGN Curncy,,SUMMARY,INITPAINT,33828,,,,,,,,False,,2025-09-26T20:18:44.293Z
CLSE0515 BGN Curncy,CAD SPR %V-100 COR 5Y15Y,REFERENCE,INITPAINT,35009,,,,,,,,False,,2025-09-26T20:18:44.293Z
CLSE0515 BGN Curncy,,SUMMARY,INITPAINT,33828,,,,,,,,False,,2025-09-26T20:18:44.293Z
ISFS121F BGN Curncy,ILS FORWARD SWAP 12YX18M,REFERENCE,INITPAINT,35009,,,,,,,,False,,2025-09-26T20:18:44.293Z
ISFS121F BGN Curncy,,SUMMARY,INITPAINT,33828,,,,,,,,False,,2025-09-26T20:18:44.293Z
CNYJ0120 BGN Curncy,CNY OFF SWPT PREM 100 1Y,REFERENCE,INITPAINT,35009,,,,,,,,False,,2025-09-26T20:18:44.293Z
CNYJ0120 BGN Curncy,,SUMMARY,INITPAINT,33828,,,,,,,,False,,2025-09-26T20:18:44.293Z
UDTL1003 BGN Curncy,AED SWPT SPRD NVOL 250 1,REFERENCE,INITPAINT,35009,,,,,,,,False,,2025-09-26T20:18:44.293Z
UDTL1003 BGN Curncy,,SUMMARY,INITPAINT,33828,,,,,,,,False,,2025-09-26T20:18:44.293Z
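
In the output above, each security produces a `REFERENCE` row and a `SUMMARY` row that carry different EIDs. A minimal sketch of EID-based filtering on such rows is shown below, using plain dictionaries in place of a Spark DataFrame. The `entitled_eids` set and the `filter_entitled` helper are hypothetical; how entitlements are actually enforced is not covered by this notebook.

```python
def filter_entitled(rows, entitled_eids):
    """Keep only the rows whose EID is in the consumer's entitled set."""
    return [row for row in rows if row["EID"] in entitled_eids]


# Two rows taken from the sample output above
rows = [
    {"SECURITY": "SBBL1205 BGN Curncy", "MKTDATA_EVENT_TYPE": "REFERENCE", "EID": 35009},
    {"SECURITY": "SBBL1205 BGN Curncy", "MKTDATA_EVENT_TYPE": "SUMMARY", "EID": 33828},
]

# Hypothetical consumer entitled to EID 33828 only
visible = filter_entitled(rows, entitled_eids={33828})
print([row["MKTDATA_EVENT_TYPE"] for row in visible])  # ['SUMMARY']
```

Because the EID is attached per row rather than per security, a consumer may see some event types for an instrument and not others.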


In [0]:
_ = (
  mktData
    .writeStream
    .outputMode('append')
    .option('checkpointLocation', '/Volumes/market_data/providers/bloomberg/checkpoints')
    .toTable('market_data.providers.bloomberg_bpipe')
)