# B-Pipe Spark Wrapper
In this notebook, we cover a few B-Pipe services and map them to Spark and Spark Streaming applications. For testing purposes, we use an [emulator](https://github.com/Robinson664/bemu) that must be available as part of your cluster libraries. Please note that we could not validate this approach against a live B-Pipe feed, which limits the scope of this exercise to the `//blp/refdata` and `//blp/mktdata` services only. For the purpose of this exercise, let's create a synthetic portfolio of 5 US and London based equities.

In [0]:
import pandas as pd

portfolio = [
  ["AMZN US Equity", 1000],
  ["VOD LN Equity", 599],
  ["AAPL US Equity", 823],
  ["MSFT US Equity", 122],
  ["BARC LN Equity", 1321]
]

securities = [p[0] for p in portfolio]
portfolio_df = pd.DataFrame(portfolio, columns=['security', 'shares'])

## Historical Data Request

`HistoricalDataRequest` is a request type sent via Bloomberg's BLPAPI (the API used with B-Pipe and other Bloomberg services). It asks for a time series of historical data, for example daily closing prices of Apple stock for the past 6 months, or historical yields for a government bond. We mapped this service to a Spark application that accepts the parameters below.

**Required fields:**

|Option|Type|Default|Description|
|---|---|---|---|
|`fields`|`List[String]`|-|Fields to return for the given securities. See the list of supported fields in [DATA\<GO\>](https://data.bloomberg.com/)|
|`securities`|`List[String]`|-|list of securities|
|`startDate`|`Date`|-|Starting date we want to get history from, formatted as YYYY-MM-dd|

**Following options are supported:**

|Option|Type|Default|Description|
|---|---|---|---|
|`endDate`|`Date`|`NOW`|Ending date we want to get history to, formatted as YYYY-MM-dd|
|`periodicityAdjustment`|`Option[String]`|`NONE`|Must be valid periodicity adjustment, like `ACTUAL`, `CALENDAR` or `FISCAL`|
|`periodicitySelection`|`Option[String]`|`NONE`|Must be valid periodicity selection, like `DAILY`, `WEEKLY`, `MONTHLY`, etc.|
|`pricingOption`|`Option[String]`|`NONE`|`PRICING_OPTION_PRICE` or `PRICING_OPTION_YIELD`|
|`overrideOption`|`Option[String]`|`NONE`|`OVERRIDE_OPTION_GPA` or `OVERRIDE_OPTION_CLOSE`|
|`adjustmentNormal`|`Option[String]`|`NONE`||
|`adjustmentAbnormal`|`Option[String]`|`NONE`||
|`adjustmentSplit`|`Option[String]`|`NONE`||
|`maxDataPoints`|`Option[Int]`|`NONE`||

Given the distributed nature of Spark, one can specify the number of partitions to distribute this request across. Each partition is then responsible for its own B-Pipe request against a subset of the securities provided. Although suboptimal when the portfolio mixes securities of different liquidity (different traded volumes), this removes the bottleneck of sending an entire portfolio through a single request. See the next notebook for more information about imbalanced datasets.
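
To make the naive partitioning concrete, here is a minimal sketch of how a list of securities could be split across partitions. The helper `partition_securities` is hypothetical and only illustrates the idea; the wrapper's actual assignment logic may differ.

```python
from typing import List


def partition_securities(securities: List[str], partitions: int) -> List[List[str]]:
    """Split securities into at most `partitions` non-empty groups (round-robin).

    Hypothetical helper for illustration; not part of the wrapper's API.
    """
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    # Never create more groups than there are securities
    n = min(partitions, len(securities))
    return [securities[i::n] for i in range(n)]


groups = partition_securities(
    ["AMZN US Equity", "VOD LN Equity", "AAPL US Equity", "MSFT US Equity", "BARC LN Equity"],
    partitions=2,
)
for idx, group in enumerate(groups):
    # Each group would translate into one B-Pipe request
    print(idx, group)
```

Requesting more partitions than securities (for example 10 partitions for 5 securities) would, under this sketch, simply yield one security per group.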

In [0]:
hist_df = (
  spark
    .read
    .format("//blp/refdata")
    .option("serviceName", "HistoricalDataRequest")
    .option("serviceHost", "127.0.0.1")
    .option("servicePort", 8954)
    .option("correlationId", 999)
    .option("fields", "['BID', 'ASK']")
    .option("securities", securities)
    .option("startDate", "2022-01-01")
    # naive partitioning by securities
    .option("partitions", 10)
    .load().toPandas()
)

In [0]:
display(hist_df)

In [0]:
import plotly.express as px

# bid-ask spread: ask price minus bid price
hist_df['SPREAD'] = hist_df['ASK'] - hist_df['BID']
df = hist_df.pivot(index='TIME', columns='SECURITY', values='SPREAD')
fig = px.area(df, facet_col="SECURITY", facet_col_wrap=2, height=800)
fig.show()

## Reference Data Request

`ReferenceDataRequest` is a type of API call (using Bloomberg's BLPAPI) that asks for metadata or static attributes about securities, that is, things that generally do not change tick by tick, such as security descriptions, ISINs, CUSIPs, SEDOLs, exchange codes, sector classifications, etc. We mapped this service to a Spark application that accepts the parameters below.

**Required fields:**

|Option|Type|Default|Description|
|---|---|---|---|
|`fields`|`List[String]`|-|Fields to return for the given securities. See the list of supported fields in [DATA\<GO\>](https://data.bloomberg.com/)|
|`securities`|`List[String]`|-|list of securities|

**Following options are supported:**

|Option|Type|Default|Description|
|---|---|---|---|
|`overrides`|`Map[String, String]`|`EMPTY`|See list of overrides from [DATA\<GO\>](https://data.bloomberg.com/)|

Given the distributed nature of Spark, one can specify the number of partitions to distribute this request across. Each partition is then responsible for its own B-Pipe request against a subset of the securities provided. Although suboptimal when the portfolio mixes securities of different liquidity (different traded volumes), this removes the bottleneck of sending an entire portfolio through a single request. See the next notebook for more information about imbalanced datasets.

In [0]:
ref_df = (
  spark
    .read
    .format("//blp/refdata")
    .option("serviceName", "ReferenceDataRequest")
    .option("serviceHost", "127.0.0.1")
    .option("servicePort", 8954)
    .option("correlationId", 999)
    .option("fields", "['PX_LAST','BID','ASK','TICKER','CHAIN_TICKERS']")
    .option("securities", securities)
    .option("overrides", "{'CHAIN_PUT_CALL_TYPE_OVRD':'C','CHAIN_POINTS_OVRD':'4','CHAIN_EXP_DT_OVRD':'20141220'}")
    # naive partitioning by securities
    .option("partitions", 5)
    .load().toPandas()
)

In [0]:
display(ref_df)

## Intraday Tick Request

`IntradayTickRequest` is a type of request sent through Bloomberg B-Pipe to retrieve tick-by-tick historical data (individual trades, bids, asks, or quote changes) over a specific short time window, typically minutes to hours within a day. We mapped this service to a Spark application that accepts the parameters below.

**Required fields:**

|Option|Type|Default|Description|
|---|---|---|---|
|`security`|`String`|-|Security we want to retrieve intraday ticks for|
|`startDateTime`|`Date`|-|Starting date we want to get ticks from, formatted as YYYY-MM-dd HH:mm:ss|

**Following options are supported:**

|Option|Type|Default|Description|
|---|---|---|---|
|`endDateTime`|`Date`|`NOW`|Ending date we want to get tick to, formatted as YYYY-MM-dd HH:mm:ss|
|`eventTypes`|`List[String]`|`EMPTY`|Must be valid event types such as `TRADE`, `BID`, `ASK`, `SETTLE`, etc.|
|`returnEids`|`Option[Boolean]`|`NONE`||
|`includeConditionCodes`|`Option[Boolean]`|`NONE`|optionally include special trade flags like auctions|
|`includeExchangeCodes`|`Option[Boolean]`|`NONE`||
|`includeNonPlottableEvents`|`Option[Boolean]`|`NONE`||
|`includeBrokerCodes`|`Option[Boolean]`|`NONE`||
|`includeRpsCodes`|`Option[Boolean]`|`NONE`||
|`includeBicMicCodes`|`Option[Boolean]`|`NONE`||

Given the distributed nature of Spark, one can specify the number of partitions to distribute this request across. In this case, each partition is responsible for its own B-Pipe request against a given time window between the `startDateTime` and `endDateTime` parameters.
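
The time-based partitioning described above can be sketched as follows. This is only an illustration under the assumption that the window is cut into equal-width sub-windows; `split_time_window` is a hypothetical helper, not a function exposed by the wrapper.

```python
from datetime import datetime
from typing import List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"  # matches the YYYY-MM-dd HH:mm:ss option format


def split_time_window(start: str, end: str, partitions: int) -> List[Tuple[str, str]]:
    """Cut [start, end] into `partitions` contiguous, equal-width sub-windows.

    Hypothetical helper for illustration; the wrapper may split differently.
    """
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    start_dt = datetime.strptime(start, FMT)
    end_dt = datetime.strptime(end, FMT)
    if end_dt <= start_dt:
        raise ValueError("end must be after start")
    step = (end_dt - start_dt) / partitions
    # Boundaries of the sub-windows; the last one is pinned to the exact end
    bounds = [start_dt + i * step for i in range(partitions)] + [end_dt]
    return [
        (bounds[i].strftime(FMT), bounds[i + 1].strftime(FMT))
        for i in range(partitions)
    ]


windows = split_time_window("2022-11-01 00:00:00", "2022-11-01 10:00:00", 4)
for window_start, window_end in windows:
    # Each sub-window would translate into one B-Pipe request
    print(window_start, "->", window_end)
```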

In [0]:
tick_df = (
  spark
    .read
    .format("//blp/refdata")
    .option("serviceName", "IntradayTickRequest")
    .option("serviceHost", "127.0.0.1")
    .option("servicePort", 8954)
    .option("correlationId", 999)
    .option("security", securities[0])
    .option("returnEids", True)
    .option("includeNonPlottableEvents", True)
    .option("startDateTime", "2022-11-01 00:00:00")
    # partitioning by date
    .option("partitions", 10)
    .load().toPandas()
)

In [0]:
display(tick_df)

In [0]:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 1, figsize=(10, 4), tight_layout=True, dpi=200)
plt.xlabel("TIME")
plt.ylabel("TRADE")
plt.title(securities[0])
tick_df = tick_df.head(100)
axs.plot(tick_df['TIME'],tick_df['VALUE'])
plt.show()

## Intraday Bar Request

`IntradayBarRequest` is an API request (using BLPAPI) that asks for time-aggregated market data instead of individual ticks: each "bar" summarizes trading over a fixed interval. Each bar typically contains time (timestamp of the bar start), open (first trade price in the interval), high (highest trade price), low (lowest trade price), close (last trade price), volume (total volume traded in the interval) and numEvents (number of ticks that occurred during the interval). We mapped this service to a Spark application that accepts the parameters below.

**Required fields:**

|Option|Type|Default|Description|
|---|---|---|---|
|`security`|`String`|-|Security we want to retrieve intraday bars for|
|`startDateTime`|`Date`|-|Starting date we want to get bars from, formatted as YYYY-MM-dd HH:mm:ss|
|`interval`|`Int`|-|Interval needs to be between 1 and 1440 minutes|

**Following options are supported:**

|Option|Type|Default|Description|
|---|---|---|---|
|`endDateTime`|`Date`|`NOW`|Ending date we want to get bars to, formatted as YYYY-MM-dd HH:mm:ss|
|`eventType`|`Option[String]`|`NONE`|Must be valid event type such as `TRADE`, `BID`, `ASK`, `SETTLE`, etc.|
|`returnEids`|`Option[Boolean]`|`NONE`||
|`gapFillInitialBar`|`Option[Boolean]`|`NONE`||
|`adjustmentNormal`|`Option[Boolean]`|`NONE`||
|`adjustmentAbnormal`|`Option[Boolean]`|`NONE`||
|`adjustmentSplit`|`Option[Boolean]`|`NONE`||
|`adjustmentFollowDPDF`|`Option[Boolean]`|`NONE`||

Given the distributed nature of Spark, one can specify the number of partitions to distribute this request across. In this case, each partition is responsible for its own B-Pipe request against a given time window between the `startDateTime` and `endDateTime` parameters.
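
To illustrate what the bar fields listed at the start of this section mean, here is a minimal sketch that aggregates a handful of made-up ticks into fixed-interval bars. The function `ticks_to_bars` and the sample ticks are hypothetical; in practice B-Pipe performs this aggregation server-side and its alignment rules may differ.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def ticks_to_bars(
    ticks: List[Tuple[datetime, float, int]], interval_minutes: int
) -> List[Dict[str, object]]:
    """Aggregate (time, price, size) ticks into bars of `interval_minutes`.

    Hypothetical illustration; bars are aligned to intervals counted from midnight.
    """
    width = timedelta(minutes=interval_minutes)
    bars: Dict[datetime, Dict[str, object]] = {}
    for ts, price, size in sorted(ticks, key=lambda t: t[0]):
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        bar_start = midnight + ((ts - midnight) // width) * width
        bar = bars.get(bar_start)
        if bar is None:
            # First tick of the interval sets open, high, low and close
            bars[bar_start] = {
                "time": bar_start, "open": price, "high": price, "low": price,
                "close": price, "volume": size, "numEvents": 1,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far in the interval
            bar["volume"] += size
            bar["numEvents"] += 1
    return list(bars.values())


# Made-up ticks, for illustration only
sample_ticks = [
    (datetime(2022, 11, 1, 9, 30, 5), 100.0, 10),
    (datetime(2022, 11, 1, 9, 30, 40), 101.5, 5),
    (datetime(2022, 11, 1, 9, 31, 10), 101.0, 20),
    (datetime(2022, 11, 1, 9, 31, 50), 99.5, 15),
]
bars = ticks_to_bars(sample_ticks, interval_minutes=1)
for bar in bars:
    print(bar)
```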

In [0]:
bar_df = (
  spark
    .read
    .format("//blp/refdata")
    .option("serviceName", "IntradayBarRequest")
    .option("serviceHost", "127.0.0.1")
    .option("servicePort", 8954)
    .option("correlationId", 999)
    .option("interval", 1000)
    .option("security", securities[0])
    .option("startDateTime", "2022-11-01 00:00:00")
    # partitioning by date
    .option("partitions", 10)
    .load().toPandas()
)

In [0]:
display(bar_df)

In [0]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# include candlestick with rangeselector
bar_df = bar_df.head(200)
fig.add_trace(
  go.Candlestick(x=bar_df['TIME'],
  open=bar_df['OPEN'], high=bar_df['HIGH'],
  low=bar_df['LOW'], close=bar_df['CLOSE']),
  secondary_y=True)

# include a go.Bar trace for volumes
fig.add_trace(
  go.Bar(x=bar_df['TIME'], y=bar_df['VOLUME']),
  secondary_y=False)

fig.layout.yaxis2.showgrid=False
fig.show()

## Market Data
`//blp/mktdata` is the Bloomberg API service that streams real-time market data (quotes, trades, market depth) to client applications by subscription. We mapped this service to a Spark Streaming application that accepts the parameters below.

**Required fields:**

|Option|Type|Default|Description|
|---|---|---|---|
|`fields`|`List[String]`|-|Fields to return for the given securities. See the list of supported fields in [DATA\<GO\>](https://data.bloomberg.com/)|
|`securities`|`List[String]`|-|list of securities|

Given the distributed nature of Spark, one can specify the number of partitions to distribute this subscription across. Each partition is then responsible for its own B-Pipe request against a subset of the securities provided. Although suboptimal when the portfolio mixes securities of different liquidity (different traded volumes), this removes the bottleneck of streaming an entire portfolio through a single request. See the next notebook for more information about imbalanced datasets.
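
To show why liquidity matters for partitioning, the sketch below balances securities across partitions with a simple greedy heuristic (heaviest security first, always into the lightest partition). This is not the approach of the next notebook, only one possible illustration; `balance_by_weight` is a hypothetical helper and the weights are made-up relative message rates.

```python
import heapq
from typing import Dict, List


def balance_by_weight(weights: Dict[str, float], partitions: int) -> List[List[str]]:
    """Greedily assign securities to partitions so that total weights stay close.

    Hypothetical helper for illustration; not part of the wrapper's API.
    """
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    # Min-heap of (total weight so far, partition index)
    heap = [(0.0, i) for i in range(partitions)]
    groups: List[List[str]] = [[] for _ in range(partitions)]
    # Place the heaviest securities first
    for security, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(security)
        heapq.heappush(heap, (total + weight, idx))
    return groups


# Made-up relative message rates, for illustration only
weights = {
    "SPY US EQUITY": 8.0,
    "MSFT US EQUITY": 3.0,
    "AAPL US EQUITY": 3.0,
    "VOD LN EQUITY": 1.0,
    "BARC LN EQUITY": 1.0,
}
groups = balance_by_weight(weights, partitions=2)
for idx, group in enumerate(groups):
    print(idx, group, sum(weights[s] for s in group))
```

With these weights, the most active security gets a partition to itself while the four quieter ones share the other, whereas a split by count alone would leave one partition carrying most of the traffic.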

In [0]:
market_df = (
  spark
    .readStream
    .format("//blp/mktdata")
    .option("serviceHost", "127.0.0.1")
    .option("servicePort", 8954)
    .option("correlationId", 999)
    .option("fields", "['BID','ASK','TRADE_UPDATE_STAMP_RT']")
    .option("securities", "['SPY US EQUITY','MSFT US EQUITY']")
    # naive partitioning by security
    # see next notebook for more advanced partitioning logic
    .option("partitions", 10)
    .load()
)

market_sq = (
  market_df
    .writeStream
    .format("memory")         # memory = store in an in-memory table (for testing only)
    .queryName("market_data") # market_data = name of the in-memory table
    .start()
)

In [0]:
import time
time.sleep(200)

In [0]:
display(spark.table('market_data'))

In [0]:
market_sq.stop()