# Request Financial Statement Datasets
For the analysis in this report, we shall be requesting financial statement datasets for the Dow Jones index from 2020-01-01 to 2025-03-31. The helper file in the datarequest folder makes the requests and downloads the datasets into a format ready to use by the Strategy Construction module.

In [1]:
import bql
import os
import importlib
import json

import pandas as pd

from ipywidgets import IntProgress
from IPython.display import display, Markdown

import requesters.data_request_helper as helper
import requesters.financial_news as news_helper
from utils.s3_helper import S3Helper

In [9]:
importlib.reload(helper)
importlib.reload(news_helper)

<module 'requesters.financial_news' from '/project/requesters/financial_news.py'>

### Quarterly Data for the Dow Jones

In [5]:
# Index to use for point in time firms
index = 'INDU Index'
filename = 'data_quarterly.json'
reporting_period = 'Q'
start_date = '2020-01-01'

# rebalance dates for the index
rebalance_dates = ['2024-12-31',
        '2024-09-30',
        '2024-06-30',
        '2024-03-31',
        '2023-12-31',
        '2023-09-30',
        '2023-06-30',
        '2023-03-31',
        '2022-12-31',
        '2022-09-30',
        '2022-06-30',
        '2022-03-31',
        '2021-12-31',
        '2021-09-30',
        '2021-06-30',
        '2021-03-31',
        '2020-12-31',
        '2020-09-30',
        '2020-06-30',
        '2020-03-31',
        '2019-12-31',
        '2019-09-30',
        '2019-06-30',
        '2019-03-31']

In [28]:
data_helper = helper.FinancialDataRequester(index_id=index,
                                            dataset_name='quarterly_pit_indu_blended',
                                            rebalance_dates=rebalance_dates,
                                            reporting_frequency=reporting_period,
                                            start_date=start_date)

100%|██████████| 24/24 [1:04:26<00:00, 161.12s/it]


In [17]:
df_rebalance_dates = data_helper.get_rebalance_dates()

100%|██████████| 24/24 [00:16<00:00,  1.54it/s]

In [18]:
df_rebalance_dates

Unnamed: 0_level_0,Unnamed: 1_level_0,PERIOD_END_DATE
AS_OF_DATE,ID,Unnamed: 2_level_1
2020-01-07,GS UN Equity,2019-09-30
2020-01-07,NKE UN Equity,2019-11-30
2020-01-08,WBA UW Equity,2019-11-30
2020-01-08,WBA UQ Equity,2019-11-30
2020-01-14,JPM UN Equity,2019-12-31
...,...,...
2025-04-24,IBM UN Equity,2025-03-31
2025-04-24,INTC UQ Equity,2025-03-29
2025-04-24,INTC UW Equity,2025-03-29
2025-04-24,MRK UN Equity,2025-03-31


In [19]:
all_data = data_helper.create_financial_dataset()

100%|██████████| 24/24 [02:48<00:00,  7.01s/it]
100%|██████████| 24/24 [00:22<00:00,  1.06it/s]
100%|█████████▉| 456/457 [28:16<00:04,  4.03s/it]

#### Save to Bloomberg Lab S3 Storage

In [20]:
# Write the data to local ephemeral storage
local_file = '/tmp/dow_quarterly_ltm_v3.json'
with open(local_file, 'w') as f:
    json.dump(all_data, f)

In [21]:
# Create S3 Helper object
s3_helper = S3Helper('tmp/fs')

In [22]:
# Upload to Bloomberg Lab S3 Storage
s3_helper.add_file(local_filename=local_file)

### Annual Data for the Dow Jones

In [67]:
# Index to use for point in time firms
index = 'INDU Index'
filename = 'data_annual_pit_dow.json'
reporting_period = 'A'

# rebalance dates for the index
rebalance_dates = ['2024-12-31',
        '2024-09-30',
        '2024-06-30',
        '2024-03-31',
        '2023-12-31',
        '2023-09-30',
        '2023-06-30',
        '2023-03-31',
        '2022-12-31',
        '2022-09-30',
        '2022-06-30',
        '2022-03-31',
        '2021-12-31',
        '2021-09-30',
        '2021-06-30',
        '2021-03-31',
        '2020-12-31',
        '2020-09-30',
        '2020-06-30',
        '2020-03-31',
        '2019-12-31',
        '2019-09-30',
        '2019-06-30',
        '2019-03-31',]

In [68]:
data_helper = helper.FinancialDataRequester(index_id=index,
                                            dataset_name='annual_pit_indu_blended',
                                            rebalance_dates=rebalance_dates,
                                            reporting_frequency=reporting_period)

In [69]:
all_data = data_helper.create_financial_dataset()

100%|██████████| 24/24 [00:48<00:00,  2.04s/it]
 99%|█████████▉| 145/146 [08:51<00:03,  3.69s/it]

In [66]:
all_data['2020-04-24']['AXP UN Equity']['mt']

{'name': 'American Express Co', 'figi': 'BBG000BCQZS4', 'sector': 'Financials'}

### Request Data for Training

In [14]:
# select the index
training_index = 'SPX Index'
filename = 'data_quarterly_pit_spx_refresh_blended.json'
reporting_period = 'Q'
start_date = '2020-01-01'

# rebalance dates for the index
rebalance_dates = ['2024-12-31',
        '2024-09-30',
        '2024-06-30',
        '2024-03-31',
        '2023-12-31',
        '2023-09-30',
        '2023-06-30',
        '2023-03-31',
        '2022-12-31',
        '2022-09-30',
        '2022-06-30',
        '2022-03-31',
        '2021-12-31',
        '2021-09-30',
        '2021-06-30',
        '2021-03-31',
        '2020-12-31',
        '2020-09-30',
        '2020-06-30',
        '2020-03-31',
        '2019-12-31',
        '2019-09-30',
        '2019-06-30',
        '2019-03-31',]

In [16]:
data_helper = helper.FinancialDataRequester(index_id=index,
                                            dataset_name=filename,
                                            rebalance_dates=rebalance_dates,
                                            reporting_frequency=reporting_period,
                                            start_date=start_date)

In [17]:
training_data = data_helper.create_financial_dataset()

100%|██████████| 24/24 [00:32<00:00,  1.36s/it]
100%|█████████▉| 456/457 [29:01<00:03,  3.81s/it]

### Request Financial News Datasets

In [6]:
index_members = ['BBG000B9XRY4', 'BBG000BBJQV0', 'BBG000BBS2Y0', 'BBG000BCQZS4',
       'BBG000BCSST7', 'BBG000BF0K17', 'BBG000BH4R78', 'BBG000BJ81C1',
       'BBG000BKZB36', 'BBG000BLNNH6', 'BBG000BMHYD1', 'BBG000BMX289',
       'BBG000BN2DC2', 'BBG000BNSZP1', 'BBG000BP52R2', 'BBG000BPD168',
       'BBG000BPH459', 'BBG000BR2B91', 'BBG000BR2TH3', 'BBG000BSXQV7',
       'BBG000BVPV84', 'BBG000BW8S60', 'BBG000BWLMJ4', 'BBG000BWXBC2',
       'BBG000C0G1D1', 'BBG000C3J3C9', 'BBG000C5HS04', 'BBG000C6CFJ5',
       'BBG000CH5208', 'BBG000DMBXR2', 'BBG000GZQ728', 'BBG000H556T9',
       'BBG000HS77T5', 'BBG000K4ND22', 'BBG000PSKYX7', 'BBG00BN96922']


In [10]:
news = news_helper.FinancialNewsRequester()

In [12]:
news_headlines = news.build_news_dataset(index_members=index_members,
                                         start_date=start_date)

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.base/java.lang.Thread.run(Thread.java:834)
	at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2658)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2655)
	at org.apache.spark.SparkContext$.setActiveContext(SparkContext.scala:2756)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:2613)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)


### Example prompt
Below is an example of the Income Statement, Balance Sheet and Historical price data that has been generated from the SecurityData class.

In [14]:
with open('Data/prompts.json', 'rb') as f:
    prompts = json.load(f)

In [15]:
print(prompts[0]['prompt'][0]['content'] + prompts[0]['prompt'][1]['content'])

You are a financial analyst.Use the following income statement, balance sheet to estimate the Basic EPS for the next fiscal period. Use only the data in the prompt. Provide a confidence score for how confident you are of the decision. If you are not confident then lower the confidence score.


Income Statement:
                                                        t           t-1           t-2           t-3           t-4           t-5
items                                                                                                                          
Revenue                                      1.387040e+11  1.374120e+11  1.368660e+11  1.363540e+11  1.360970e+11  1.345900e+11
Cost of Revenue                              1.092480e+11  1.077140e+11  1.067900e+11  1.059300e+11  1.053460e+11  1.034980e+11
Gross Profit                                 2.945600e+10  2.969800e+10  3.007600e+10  3.042400e+10  3.075100e+10  3.109200e+10
Operating Expenses                           2.