## Parallel programming with the Refinitiv Data Library
This notebook is a proof of concept (POC) that gives an overview of multithreading and multiprocessing with the Refinitiv Data Library.

#### Learn more

To learn more about the Refinitiv Data Library for Python, please join the Refinitiv Developer Community. By [registering](https://developers.refinitiv.com/iam/register) and [logging in](https://developers.refinitiv.com/content/devportal/en_us/initCookie.html) to the Refinitiv Developer Community portal, you will have free access to a number of learning materials such as 
 [Quick Start guides](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-library-for-python/quick-start), 
 [Tutorials](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-library-for-python/learning), 
 [Documentation](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-library-for-python/docs)
 and much more.

#### Getting Help and Support

If you have any questions regarding using the API, please post them on 
the [Refinitiv Data Q&A Forum](https://community.developers.refinitiv.com/spaces/321/index.html). 
The Refinitiv Developer Community will be happy to help. 

----

## Some Imports to start with

In [1]:
# Standard library
import time

# Third-party: Excel writer backend that pandas can use for to_excel()
import openpyxl

# The main processing class built to compare the different approaches
from data_processor import DataProcessor

## Some constants the script will work with

The following constants drive tests that request a significant amount of data, which we use for our comparisons.  These values can be adjusted to suit your requirements.

In [2]:
# Maximum number of items requested for each testing algorithm.
MAX_ITEMS_PER_REQUEST_FOR_GET_DATA = 7500

# fields to request
data_columns = ["TR.CommonName", "TR.ISINCode", "TR.PriceClose.Currency", "TR.HeadquartersCountry", "TR.TRBCEconomicSector",
                "TR.TRBCBusinessSector", "TR.PriceMainIndexRIC", "TR.FreeFloat", "TR.SharesOutstanding", "TR.CompanyMarketCapitalization",
                "TR.IssuerRating(IssuerRatingSrc=SPI,RatingScope=DMS)", "TR.IssuerRating(IssuerRatingSrc=MIS,RatingScope=DMS)",
                "TR.IssuerRating(IssuerRatingSrc=FDL,RatingScope=DMS)", "TR.PriceClose", "TR.PricePctChg1D", "TR.Volatility5D",
                "TR.Volatility10D", "TR.Volatility30D", "TR.RSISimple14D", "TR.WACCBeta", "TR.BetaFiveYear", "TR.DivAnnouncementDate",
                "TR.DivExDate", "TR.DivPayDate", "TR.DivAdjustedGross", "TR.DivAdjustedNet", "TR.RelValPECOmponent",
                "TR.RelValEVEBITDACOmponent", "TR.RelValDividendYieldCOmponent", "TR.RelValEVSalesCOmponent",
                "TR.RelValPriceCashFlowCOmponent", "TR.RelValPriceBookCOmponent", "TR.RecMean", "TR.RecLabel", "TR.RevenueSmartEst",
                "TR.RevenueMean", "TR.RevenueMedian", "TR.NetprofitSmartEst", "TR.NetProfitMean", "TR.NetProfitMedian", "TR.DPSSmartEst",
                "TR.DPSMean", "TR.DPSMedian", "TR.EpsSmartEst", "TR.EPSMean", "TR.EPSMedian", "TR.PriceSalesRatioSmartEst",
                "TR.PriceSalesRatioMean", "TR.PriceSalesRatioMedian", "TR.PriceMoRegionRank", "TR.SICtryRank", "TR.EQCountryListRank1_Latest",
                "TR.CreditComboRegionRank", "TR.CreditRatioRegionRank", "TR.CreditStructRegRank", "TR.CreditTextRegRank",
                "TR.TRESGScoreGrade(Period=FY0)", "TR.TRESGCScoreGrade(Period=FY0)", "TR.TRESGCControversiesScoreGrade(Period=FY0)"
                ]

## Data Processing functions
To keep the workflow easy to follow, we've placed the main processing functions in a separate module.  Here we
create an instance of `DataProcessor` to access each of these processing methods.  The processing module is also responsible for session management within the 
data library.

In [3]:
# Create our processor - opens a session
processor = DataProcessor()

Opening desktop session...
Desktop Session State: OpenState.Opened


In [None]:
# For a brief overview of this class and the different methods
help(processor)

### The file of RICs

In [5]:
# The input file name
INSTRUMENTS_FILE = "Instruments.txt"

### Load the test data...

In [6]:
INSTRUMENTS_LIST = []

try:
    with open(INSTRUMENTS_FILE, "r") as instrument_file:
        for line in instrument_file:
            # Drop the trailing newline and surrounding whitespace; skip blank lines
            instrument = line.strip()
            if instrument:
                INSTRUMENTS_LIST.append(instrument)
except FileNotFoundError:
    print(f"The file '{INSTRUMENTS_FILE}' does not exist.")

universe_for_screening = INSTRUMENTS_LIST[:MAX_ITEMS_PER_REQUEST_FOR_GET_DATA]
print(f"Total items in the instruments list: {len(INSTRUMENTS_LIST)}. Using {len(universe_for_screening)} items for our tests.")

Total items in the instruments list: 10194. Using 7500 items for our tests.


### Normal processing

Get the data for the entire universe with the normal process.  The normal process passes the entire request to the data library as a single call, without splitting it into batches.

In [7]:
start_time = time.time()
results = processor.internal_get_data(universe = universe_for_screening, fields = data_columns)
print("Normal processing finished: --- %s seconds elapsed ---" % (time.time() - start_time))

Batch of 7500 requested at: 10:54:41.960
Normal processing finished: --- 103.11336874961853 seconds elapsed ---


In [8]:
# Display results
results

Unnamed: 0,Instrument,Company Common Name,ISIN Code,Currency,Country of Headquarters,TRBC Economic Sector Name,TRBC Business Sector Name,Main Index RIC,Free Float,Outstanding Shares,...,Price Momentum Region Rank,Short Interest Country Rank,"Earnings Quality Region Rank.01, Latest",Credit Combined Region Rank,Credit SmartRatios Region Rank,Credit Structural Region Rank,Credit Text Mining Region Rank,ESG Score Grade,ESG Combined Score Grade,ESG Controversies Score Grade
0,ARB.AX,ARB Corporation Ltd,AU000000ARB5,AUD,Australia,Consumer Cyclicals,Automobiles & Auto Parts,.AORD,73805587,82220441,...,79,,47,94,79,91,37,C,C,A+
1,EVT.AX,EVT Ltd,AU000000EVT1,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,116409167,162275357,...,14,,29,20,14,61,26,C,C,A+
2,ALL.AX,Aristocrat Leisure Ltd,AU000000ALL7,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,574432197,648560092,...,70,,77,94,53,93,90,B-,B-,A+
3,ANZ.AX,ANZ Group Holdings Ltd,AU000000ANZ3,AUD,Australia,Financials,Banking & Investment Services,.AORD,2980191835,3001241961,...,70,,25,55,38,50,12,A-,C+,D
4,AMC.AX,Amcor PLC,AU000000AMC4,AUD,United Kingdom,Basic Materials,Applied Resources,.AORD,639059601,1444343212,...,11,,60,3,,,61,B+,B+,A+
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7495,DDD.BK,Do Day Dream PCL,TH8365010009,THB,Thailand,Consumer Non-Cyclicals,Personal & Household Products & Services,.SETI,85962600,317887700,...,1,,11,75,57,78,,,,
7496,LILA.OQ,Liberty Latin America Ltd,BMG9001E1021,USD,United States of America,Technology,Telecommunications Services,.IXIC,35557361,40800000,...,42,100,27,5,3,5,24,D+,D+,A+
7497,LILAK.OQ,Liberty Latin America Ltd,BMG9001E1286,USD,United States of America,Technology,Telecommunications Services,.IXIC,128250028,164039541,...,40,,,5,3,5,24,D+,D+,A+
7498,NTR.TO,Nutrien Ltd,CA67077M1086,CAD,Canada,Basic Materials,Chemicals,.GSPTSE,494354917,494547340,...,12,,54,53,55,33,49,B+,B+,B


### Multithreading
Get the data for the entire universe with a pool of threads, using `ThreadPoolExecutor` to split the request into batches that run on multiple threads.
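
The batching pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the `DataProcessor` implementation: `chunked`, `fetch_batch` and `threaded_fetch` are hypothetical names, `fetch_batch` is a stand-in for a real data request, and the batch size of 800 and the 10 worker threads simply mirror the values printed in the log output of the cell below.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_batch(batch):
    """Hypothetical stand-in for one data request; echoes the instruments back."""
    return [(instrument, "data") for instrument in batch]


def threaded_fetch(universe, batch_size=800, max_workers=10):
    """Submit one request per batch and merge the results in input order."""
    batches = chunked(universe, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order the batches were submitted
        results = pool.map(fetch_batch, batches)
        return [row for batch_rows in results for row in batch_rows]


# Placeholder universe of 7500 made-up instrument codes
universe = [f"RIC{i}" for i in range(7500)]
batch_sizes = [len(b) for b in chunked(universe, 800)]
rows = threaded_fetch(universe)
```

With 7500 instruments and a batch size of 800, this split yields nine full batches and one batch of 300, which is the same shape as the batches reported in the log.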

In [9]:
start_time = time.time()
results = processor.threadPoolExecutor_get_data(universe = universe_for_screening, fields = data_columns)
print("Multithreading finished: --- %s seconds elapsed ---" % (time.time() - start_time))

Process ID:41812 ==> sending batches with max size of 800 on 10 threads at 10:56:33.681
Batch of 800 requested at: 10:56:33.682
Batch of 800 requested at: 10:56:33.684
Batch of 800 requested at: 10:56:33.686
Batch of 800 requested at: 10:56:33.688
Batch of 800 requested at: 10:56:33.690
Batch of 800 requested at: 10:56:33.693
Batch of 800 requested at: 10:56:33.695
Batch of 800 requested at: 10:56:33.697
Batch of 800 requested at: 10:56:33.698
Batch of 300 requested at: 10:56:33.700
Multithreading finished: --- 29.326457738876343 seconds elapsed ---


In [10]:
# Display results
results

Unnamed: 0,Instrument,Company Common Name,ISIN Code,Currency,Country of Headquarters,TRBC Economic Sector Name,TRBC Business Sector Name,Main Index RIC,Free Float,Outstanding Shares,...,Price Momentum Region Rank,Short Interest Country Rank,"Earnings Quality Region Rank.01, Latest",Credit Combined Region Rank,Credit SmartRatios Region Rank,Credit Structural Region Rank,Credit Text Mining Region Rank,ESG Score Grade,ESG Combined Score Grade,ESG Controversies Score Grade
0,ARB.AX,ARB Corporation Ltd,AU000000ARB5,AUD,Australia,Consumer Cyclicals,Automobiles & Auto Parts,.AORD,73805587,82220441,...,79,,47,94,79,91,37,C,C,A+
1,EVT.AX,EVT Ltd,AU000000EVT1,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,116409167,162275357,...,14,,29,20,14,61,26,C,C,A+
2,ALL.AX,Aristocrat Leisure Ltd,AU000000ALL7,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,574432197,648560092,...,70,,77,94,53,93,90,B-,B-,A+
3,ANZ.AX,ANZ Group Holdings Ltd,AU000000ANZ3,AUD,Australia,Financials,Banking & Investment Services,.AORD,2980191835,3001241961,...,70,,25,55,38,50,12,A-,C+,D
4,AMC.AX,Amcor PLC,AU000000AMC4,AUD,United Kingdom,Basic Materials,Applied Resources,.AORD,639059601,1444343212,...,11,,60,3,,,61,B+,B+,A+
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7495,DDD.BK,Do Day Dream PCL,TH8365010009,THB,Thailand,Consumer Non-Cyclicals,Personal & Household Products & Services,.SETI,85962600,317887700,...,1,,11,75,57,78,,,,
7496,LILA.OQ,Liberty Latin America Ltd,BMG9001E1021,USD,United States of America,Technology,Telecommunications Services,.IXIC,35557361,40800000,...,42,100,27,5,3,5,24,D+,D+,A+
7497,LILAK.OQ,Liberty Latin America Ltd,BMG9001E1286,USD,United States of America,Technology,Telecommunications Services,.IXIC,128250028,164039541,...,40,,,5,3,5,24,D+,D+,A+
7498,NTR.TO,Nutrien Ltd,CA67077M1086,CAD,Canada,Basic Materials,Chemicals,.GSPTSE,494354917,494547340,...,12,,54,53,55,33,49,B+,B+,B


### Multiprocessing

Get the data for the entire universe with a pool of processes.
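
A process pool can be sketched in the same spirit. The names below (`split_evenly`, `fetch_partition`, `process_fetch`) are hypothetical and the worker is only a placeholder; in a real worker, each child process would need to open and close its own session, which is the overhead discussed in the conclusion. The default of 2 workers mirrors the log output of the cell below.

```python
from concurrent.futures import ProcessPoolExecutor


def split_evenly(items, parts):
    """Split a list into `parts` contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def fetch_partition(partition):
    """Hypothetical worker: a real one would open a session, request data, then close."""
    return [(instrument, "data") for instrument in partition]


def process_fetch(universe, workers=2):
    """Run one worker per partition in child processes and merge results in order."""
    partitions = split_evenly(universe, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_partition, partitions))
    return [row for part in results for row in part]


if __name__ == "__main__":
    rows = process_fetch([f"RIC{i}" for i in range(7500)])
    print(len(rows))
```

The worker is a top-level function and the entry point is guarded by `if __name__ == "__main__":` because child processes may need to import the worker by name. On platforms that spawn rather than fork child processes, functions defined interactively in a notebook generally cannot be used as workers, which is one practical reason to keep such code in a separate module.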

In [11]:
start_time = time.time()
results = processor.processPoolExecutor_get_data(universe = universe_for_screening, fields = data_columns)
print("Multiprocessing finished: --- %s seconds elapsed ---" % (time.time() - start_time))

Launching process execution using 2 child processes...
Multiprocessing finished: --- 50.276334047317505 seconds elapsed ---


In [12]:
results

Unnamed: 0,Instrument,Company Common Name,ISIN Code,Currency,Country of Headquarters,TRBC Economic Sector Name,TRBC Business Sector Name,Main Index RIC,Free Float,Outstanding Shares,...,Price Momentum Region Rank,Short Interest Country Rank,"Earnings Quality Region Rank.01, Latest",Credit Combined Region Rank,Credit SmartRatios Region Rank,Credit Structural Region Rank,Credit Text Mining Region Rank,ESG Score Grade,ESG Combined Score Grade,ESG Controversies Score Grade
0,ARB.AX,ARB Corporation Ltd,AU000000ARB5,AUD,Australia,Consumer Cyclicals,Automobiles & Auto Parts,.AORD,73805587,82220441,...,79,,47,94,79,91,37,C,C,A+
1,EVT.AX,EVT Ltd,AU000000EVT1,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,116409167,162275357,...,14,,29,20,14,61,26,C,C,A+
2,ALL.AX,Aristocrat Leisure Ltd,AU000000ALL7,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,574432197,648560092,...,70,,77,94,53,93,90,B-,B-,A+
3,ANZ.AX,ANZ Group Holdings Ltd,AU000000ANZ3,AUD,Australia,Financials,Banking & Investment Services,.AORD,2980191835,3001241961,...,70,,25,55,38,50,12,A-,C+,D
4,AMC.AX,Amcor PLC,AU000000AMC4,AUD,United Kingdom,Basic Materials,Applied Resources,.AORD,639059601,1444343212,...,11,,60,3,,,61,B+,B+,A+
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7495,DDD.BK,Do Day Dream PCL,TH8365010009,THB,Thailand,Consumer Non-Cyclicals,Personal & Household Products & Services,.SETI,85962600,317887700,...,1,,11,75,57,78,,,,
7496,LILA.OQ,Liberty Latin America Ltd,BMG9001E1021,USD,United States of America,Technology,Telecommunications Services,.IXIC,35557361,40800000,...,42,100,27,5,3,5,24,D+,D+,A+
7497,LILAK.OQ,Liberty Latin America Ltd,BMG9001E1286,USD,United States of America,Technology,Telecommunications Services,.IXIC,128250028,164039541,...,40,,,5,3,5,24,D+,D+,A+
7498,NTR.TO,Nutrien Ltd,CA67077M1086,CAD,Canada,Basic Materials,Chemicals,.GSPTSE,494354917,494547340,...,12,,54,53,55,33,49,B+,B+,B


### Hybrid
Using a hybrid of the two approaches, get the data for the entire universe with a pool of processes, each of which splits its requests across multiple threads.

In [14]:
start_time = time.time()
results = processor.hybridPoolExecutor_get_data(universe = universe_for_screening, fields = data_columns)
print("Hybrid threading finished: --- %s seconds elapsed ---" % (time.time() - start_time))

Launching process execution using 2 child processes...
Hybrid threading finished: --- 33.30439591407776 seconds elapsed ---


In [15]:
# Display results
results

Unnamed: 0,Instrument,Company Common Name,ISIN Code,Currency,Country of Headquarters,TRBC Economic Sector Name,TRBC Business Sector Name,Main Index RIC,Free Float,Outstanding Shares,...,Price Momentum Region Rank,Short Interest Country Rank,"Earnings Quality Region Rank.01, Latest",Credit Combined Region Rank,Credit SmartRatios Region Rank,Credit Structural Region Rank,Credit Text Mining Region Rank,ESG Score Grade,ESG Combined Score Grade,ESG Controversies Score Grade
0,ARB.AX,ARB Corporation Ltd,AU000000ARB5,AUD,Australia,Consumer Cyclicals,Automobiles & Auto Parts,.AORD,73805587,82220441,...,79,,47,94,79,91,37,C,C,A+
1,EVT.AX,EVT Ltd,AU000000EVT1,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,116409167,162275357,...,14,,29,20,14,61,26,C,C,A+
2,ALL.AX,Aristocrat Leisure Ltd,AU000000ALL7,AUD,Australia,Consumer Cyclicals,Cyclical Consumer Services,.AORD,574432197,648560092,...,70,,77,94,53,93,90,B-,B-,A+
3,ANZ.AX,ANZ Group Holdings Ltd,AU000000ANZ3,AUD,Australia,Financials,Banking & Investment Services,.AORD,2980191835,3001241961,...,70,,25,55,38,50,12,A-,C+,D
4,AMC.AX,Amcor PLC,AU000000AMC4,AUD,United Kingdom,Basic Materials,Applied Resources,.AORD,639059601,1444343212,...,11,,60,3,,,61,B+,B+,A+
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7495,DDD.BK,Do Day Dream PCL,TH8365010009,THB,Thailand,Consumer Non-Cyclicals,Personal & Household Products & Services,.SETI,85962600,317887700,...,1,,11,75,57,78,,,,
7496,LILA.OQ,Liberty Latin America Ltd,BMG9001E1021,USD,United States of America,Technology,Telecommunications Services,.IXIC,35557361,40800000,...,42,100,27,5,3,5,24,D+,D+,A+
7497,LILAK.OQ,Liberty Latin America Ltd,BMG9001E1286,USD,United States of America,Technology,Telecommunications Services,.IXIC,128250028,164039541,...,40,,,5,3,5,24,D+,D+,A+
7498,NTR.TO,Nutrien Ltd,CA67077M1086,CAD,Canada,Basic Materials,Chemicals,.GSPTSE,494354917,494547340,...,12,,54,53,55,33,49,B+,B+,B


### Export to Excel

Export the results to Excel for future use.

In [None]:
results.to_excel("Results.xlsx")

### Close the session

In [None]:
processor.close()

### Conclusion

#### When analysing the execution time of getting data, we note the following points:
- Using the thread-pool or hybrid approach gives the best results (about 29 and 33 seconds in this run, compared with about 103 seconds for normal processing)
- Using the multiprocess approach gives weaker results (about 50 seconds in this run) because starting a process is slower than starting a thread and each process carries the overhead of establishing its own session connection
- In some cases, depending on the number of child processes, the multiprocess approach can be slower than normal processing
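
To put these points in numbers, the short snippet below computes the speedup of each approach relative to normal processing. The timings are copied from the cell outputs above; they come from a single run, so they should be read as indicative rather than as a benchmark.

```python
# Elapsed times in seconds, copied from the cell outputs in this notebook
timings = {
    "normal": 103.11336874961853,
    "multithreading": 29.326457738876343,
    "multiprocessing": 50.276334047317505,
    "hybrid": 33.30439591407776,
}

# Speedup factor of each approach relative to normal processing
baseline = timings["normal"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

# Print the approaches from fastest to slowest
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {timings[name]:8.2f} s   {factor:4.1f}x")
```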

Results, in general, largely depend on the type of underlying data request: the number of fields and the number of instruments requested.  The goal is to provide some useful mechanisms that allow users to optimize requests.

It is also important to consider that the standard Python interpreter (CPython) uses a Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time, so threads run concurrently rather than in parallel.  However, this limitation has little impact here because the requests we're executing are all I/O-bound.  Most of the work for each request is performed on the server, and a thread that is blocked waiting for an I/O operation to complete releases the GIL so that other threads can run in the meantime.
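
The effect of I/O-bound waiting on threads can be illustrated without the data library. In the sketch below, `simulated_request` is a hypothetical stand-in that uses `time.sleep` in place of a network call; like blocking I/O, sleeping does not hold the GIL, so the waits overlap when the calls run on separate threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(delay):
    """Stand-in for an I/O-bound request; the thread waits without holding the GIL."""
    time.sleep(delay)
    return delay


delays = [0.2] * 5

# Run the simulated requests one after another
start = time.perf_counter()
for d in delays:
    simulated_request(d)
sequential = time.perf_counter() - start

# Run the same simulated requests on a pool of threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(delays)) as pool:
    list(pool.map(simulated_request, delays))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```

The sequential loop takes roughly the sum of the delays, while the threaded version takes roughly as long as the single longest delay. A CPU-bound workload would not show the same gain from threads.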