In [1]:
import refinitiv.dataplatform as rdp
import pandas as pd

rdp.open_desktop_session('Your API Key here')

<refinitiv.dataplatform.core.session.desktop_session.DesktopSession at 0x50658d0>

In [2]:
pd.set_option('display.max_colwidth', 140)
rdp.__version__

'1.0.0a7.post1'

## Limits
RDP Search limits the size of the result set returned for a single request.  The following examples provide some useful techniques for dealing with requests whose results reach the upper limits imposed by the backend.


#### Grouping
There may be instances where the result set contains groups of values for properties based on your request.  For example, if I'm interested in retrieving all exchanges within the USA, I can execute this request:

In [9]:
response = rdp.Search.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode, RIC"
)
response.data.total

4155273

In [10]:
response.data.df

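|      | RIC     | ExchangeCode |
|------|---------|--------------|
| 0    | EScv1   | IOM          |
| 1    | NQcv1   | IOM          |
| 2    | ESc1    | IOM          |
| 3    | SPc1    | IOM          |
| 4    | NQc1    | IOM          |
| ...  | ...     | ...          |
| 9995 | UNXP.PK | PNK          |
| 9996 | CVEI.PK | PNK          |
| 9997 | NMNX.PK | PNK          |
| 9998 | BFGX.PK | PNK          |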


In the above example, you can see that the total number of available documents is over 4,000,000.  However, due to the nature of the data set, the same exchange codes are repeated across many documents, so the response was filled to the upper limit of documents allowed in a result set.  **Note**: At the time of this writing, the upper limit is 10,000 documents per result set.
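
To get a feel for the scale involved, the arithmetic below uses the figures from the request above (4,155,273 matching documents, at most 10,000 per response) to estimate the minimum number of responses needed if every document had to be retrieved.  The helper name `min_requests` is purely illustrative and is not part of the RDP library.

```python
import math

def min_requests(total_hits: int, limit: int = 10_000) -> int:
    """Smallest number of responses needed to cover total_hits documents
    when a single response can hold at most `limit` documents."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return math.ceil(total_hits / limit)

# Figures taken from the request above
print(min_requests(4_155_273))  # 416
```

This is a lower bound only: it assumes every response could be filled exactly to the limit, which is rarely achievable in practice.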

Instead of performing multiple calls and pulling out the unique codes within each result set, I can apply the grouping features offered by Search to significantly reduce the result set returned.  For example:

In [5]:
rdp.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode",
    group_by = "ExchangeCode",    # Exchange codes can be grouped
    group_count = 1               # Then limited to 1 for each to create uniqueness
)

|     | ExchangeCode |
|-----|--------------|
| 0   | IOM          |
| 1   | CBT          |
| 2   | CBF          |
| 3   | NSQ          |
| 4   | NYQ          |
| ... | ...          |
| 141 | CCE          |
| 142 | CMX          |
| 143 | CSC          |
| 144 | HAM          |


As you can see, grouping significantly reduces the result set, which can now be retrieved with a single API call.  Using the 'grouping' technique to pull out the unique exchange codes is very useful if you wish to return many other properties as part of your results.  However, if you are strictly after the list of exchange codes, the preferred approach is to use Navigators.
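
To make the effect of `group_by` and `group_count` more concrete, here is a small client-side sketch of the same idea: keep at most N rows for each distinct value of a property.  This only illustrates the semantics as described above; it is not how the Search backend implements grouping, and the function name `group_rows` is an assumption made for this example.

```python
from collections import defaultdict

def group_rows(rows, key, group_count=1):
    """Keep at most `group_count` rows per distinct value of `key`,
    preserving the order in which rows first appear."""
    seen = defaultdict(int)
    kept = []
    for row in rows:
        value = row[key]
        if seen[value] < group_count:
            seen[value] += 1
            kept.append(row)
    return kept

# A few rows taken from the ungrouped result shown earlier
rows = [
    {"RIC": "EScv1", "ExchangeCode": "IOM"},
    {"RIC": "NQcv1", "ExchangeCode": "IOM"},
    {"RIC": "ESc1", "ExchangeCode": "IOM"},
    {"RIC": "UNXP.PK", "ExchangeCode": "PNK"},
    {"RIC": "CVEI.PK", "ExchangeCode": "PNK"},
]
unique_codes = [row["ExchangeCode"] for row in group_rows(rows, "ExchangeCode")]
print(unique_codes)  # ['IOM', 'PNK']
```

Doing this on the client would only ever see the 10,000 rows that were returned, which is why letting Search perform the grouping is the more useful option.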

#### Navigators
If the goal of your search is simply to capture the list of exchange codes, then the preferred approach is to use Navigators.  A navigator lets you categorize and summarize properties within the result set.  For example, I can provide a simple navigator that buckets all the exchange codes found within the result set.  You can do this using the following request:

In [6]:
response=rdp.Search.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "ExchangeCode(buckets:1000)"
)

In [7]:
codes=response.data.raw["Navigators"]["ExchangeCode"]["Buckets"]
print(f'Total exchange codes found: {len(codes)}')

Total exchange codes found: 146


In [8]:
codes

[{'Label': 'ONE', 'Count': 1441128},
 {'Label': 'OPQ', 'Count': 1271714},
 {'Label': 'IOM', 'Count': 821526},
 {'Label': 'PNK', 'Count': 70284},
 {'Label': 'CBT', 'Count': 57871},
 {'Label': 'OBB', 'Count': 32934},
 {'Label': 'OTC', 'Count': 22418},
 {'Label': 'BOS', 'Count': 18419},
 {'Label': 'THM', 'Count': 17951},
 {'Label': 'XPH', 'Count': 15776},
 {'Label': 'MID', 'Count': 14700},
 {'Label': 'PSE', 'Count': 14630},
 {'Label': 'NYS', 'Count': 14232},
 {'Label': 'NYQ', 'Count': 12716},
 {'Label': 'CIN', 'Count': 12680},
 {'Label': 'NMS', 'Count': 10987},
 {'Label': 'NAS', 'Count': 10159},
 {'Label': 'NTV', 'Count': 10142},
 {'Label': 'BZX', 'Count': 10035},
 {'Label': 'BYX', 'Count': 10034},
 {'Label': 'WCB', 'Count': 9915},
 {'Label': 'IUS', 'Count': 9377},
 {'Label': 'BT1', 'Count': 8304},
 {'Label': 'ADC', 'Count': 8272},
 {'Label': 'ARC', 'Count': 7735},
 {'Label': 'ASE', 'Count': 7254},
 {'Label': 'BAT', 'Count': 6976},
 {'Label': 'DEA', 'Count': 6188},
 {'Label': 'DEX', 'Coun
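
Each bucket is a plain dictionary with a `Label` and a `Count`, so turning the navigator output into a lookup table takes one line.  The sketch below uses the first five buckets from the output above as sample data; the helper name `bucket_counts` is an assumption made for this example.

```python
def bucket_counts(buckets):
    """Map each bucket label to its document count."""
    return {bucket["Label"]: bucket["Count"] for bucket in buckets}

# First five buckets from the navigator output above
sample = [
    {"Label": "ONE", "Count": 1441128},
    {"Label": "OPQ", "Count": 1271714},
    {"Label": "IOM", "Count": 821526},
    {"Label": "PNK", "Count": 70284},
    {"Label": "CBT", "Count": 57871},
]
counts = bucket_counts(sample)
print(sorted(counts))  # ['CBT', 'IOM', 'ONE', 'OPQ', 'PNK']
print(counts["PNK"])   # 70284
```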

#### Segmenting the search

When we started with the above search to retrieve the list of exchange codes within the United States, we discovered that the request matched the entire universe of instruments.  If our goal is to capture the entire instrument list, we cannot group and bucket the result set as we did above.  The number of hits is over 4 million, so we are forced to go through a tedious process of segmenting the requests.

One way to do this is to choose some kind of indicator that allows you to split the result set across individual requests.  However, you need to first ask yourself - do I need the entire data universe?  You may only be interested in a specific asset category, which reduces the universe of results significantly.

One possible way to approach this is to first capture the list of asset categories using a navigator on the property 'RCSAssetCategoryLeaf'.  For example:

In [11]:
response=rdp.Search.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "RCSAssetCategoryLeaf"
)
response.data.raw['Navigators']['RCSAssetCategoryLeaf']['Buckets']

[{'Label': 'Equity Future', 'Count': 1459351},
 {'Label': 'Equity Cash Option', 'Count': 1327063},
 {'Label': 'Stock Index Future Option', 'Count': 705436},
 {'Label': 'Ordinary Share', 'Count': 375056},
 {'Label': 'Stock Index Cash Option', 'Count': 70981},
 {'Label': 'American Depository Receipt', 'Count': 27709},
 {'Label': 'Unit', 'Count': 22923},
 {'Label': 'Equity Future Option', 'Count': 21068},
 {'Label': 'Preferred Share', 'Count': 18452},
 {'Label': 'Equity Future Spread', 'Count': 17561},
 {'Label': 'Stock Index Future', 'Count': 16760},
 {'Label': 'Preference Share', 'Count': 12795},
 {'Label': 'Depository Receipt', 'Count': 8018},
 {'Label': 'Company Warrant', 'Count': 7963},
 {'Label': 'Depository Share', 'Count': 5949},
 {'Label': 'Bond', 'Count': 5765},
 {'Label': 'Equity-Linked Security', 'Count': 5497},
 {'Label': 'Right', 'Count': 5220},
 {'Label': 'Fully Paid Ordinary Share', 'Count': 5206},
 {'Label': 'Index-Linked Security', 'Count': 4436},
 {'Label': 'Convertible

The result not only provides the complete list of categories from which you can select the ones you want, but also shows the number of results for each.  This allows you to further tune your requests based on these totals.

However, the above summary shows many categories that easily exceed the limits of the server.  If you need to segment further, one option is to use the ***market cap*** to segment a specific asset category.
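
As a rough sketch of how these totals can drive the decision, the snippet below splits categories into those that fit within a single response and those that need further segmentation.  It assumes the 10,000-document limit mentioned earlier and uses a handful of the buckets shown above as sample data; `split_by_limit` is a hypothetical helper, not an RDP function.

```python
def split_by_limit(buckets, limit=10_000):
    """Return (labels that fit in one response, labels that exceed the limit)."""
    fits = [b["Label"] for b in buckets if b["Count"] <= limit]
    too_large = [b["Label"] for b in buckets if b["Count"] > limit]
    return fits, too_large

# A few asset-category buckets from the navigator output above
sample = [
    {"Label": "Ordinary Share", "Count": 375056},
    {"Label": "Preference Share", "Count": 12795},
    {"Label": "Depository Receipt", "Count": 8018},
    {"Label": "Bond", "Count": 5765},
]
fits, too_large = split_by_limit(sample)
print(fits)       # ['Depository Receipt', 'Bond']
print(too_large)  # ['Ordinary Share', 'Preference Share']
```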

For example, let's choose an asset category where we can get a breakdown of the market cap:

In [12]:
# The following navigator will prepare the buckets of evenly distributed market cap ranges such that they fulfill 
# the limit requirements.  Below, I chose 12 as this will produce reasonable buckets we can work with.
response=rdp.Search.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = "RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share'",
    top = 0,
    navigators = "MktCapTotal(type:range, buckets:12)"
)
response.data.raw["Navigators"]["MktCapTotal"]["Buckets"]

[{'Label': 'Below 2641502.85',
  'Filter': 'MktCapTotal lt 2641502.85',
  'Count': 9143},
 {'Label': 'Between 2641502.85 And 16922967.89',
  'Filter': '(MktCapTotal ge 2641502.85 and MktCapTotal lt 16922967.89)',
  'Count': 9140},
 {'Label': 'Between 16922967.89 And 50256419.01',
  'Filter': '(MktCapTotal ge 16922967.89 and MktCapTotal lt 50256419.01)',
  'Count': 9143},
 {'Label': 'Between 50256419.01 And 118758453.87',
  'Filter': '(MktCapTotal ge 50256419.01 and MktCapTotal lt 118758453.87)',
  'Count': 9147},
 {'Label': 'Between 118758453.87 And 239386327.47',
  'Filter': '(MktCapTotal ge 118758453.87 and MktCapTotal lt 239386327.47)',
  'Count': 9114},
 {'Label': 'Between 239386327.47 And 434231718.15',
  'Filter': '(MktCapTotal ge 239386327.47 and MktCapTotal lt 434231718.15)',
  'Count': 9118},
 {'Label': 'Between 434231718.15 And 795912673.35',
  'Filter': '(MktCapTotal ge 434231718.15 and MktCapTotal lt 795912673.35)',
  'Count': 9164},
 {'Label': 'Between 795912673.35 And 146

The first thing to note is that the 'Count' values for each bucket are within the valid limit of the server.  Based on this output, we can use the convenient Filter expressions provided to drive our segmented search requests.
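
The idea of driving segmented requests from the bucket `Filter` expressions can be sketched as follows.  The helper `build_segment_filters` is an assumption made for this example: it appends each bucket's filter to a base filter and refuses any bucket whose count is above the limit (assumed here to be 10,000).  The sample data is the first two buckets from the output above.

```python
def build_segment_filters(base_filter, buckets, limit=10_000):
    """Combine a base filter with each bucket's Filter expression.

    Raises ValueError if a bucket holds more documents than `limit`,
    since that segment could not be retrieved in a single response.
    """
    filters = []
    for bucket in buckets:
        if bucket["Count"] > limit:
            raise ValueError(
                f"Bucket {bucket['Label']!r} has {bucket['Count']} documents, "
                f"above the limit of {limit}"
            )
        filters.append(f"{base_filter} and {bucket['Filter']}")
    return filters

base = ("RCSExchangeCountryLeaf eq 'United States' "
        "and RCSAssetCategoryLeaf xeq 'Ordinary Share'")

# First two buckets from the MktCapTotal navigator output above
buckets = [
    {"Label": "Below 2641502.85",
     "Filter": "MktCapTotal lt 2641502.85",
     "Count": 9143},
    {"Label": "Between 2641502.85 And 16922967.89",
     "Filter": "(MktCapTotal ge 2641502.85 and MktCapTotal lt 16922967.89)",
     "Count": 9140},
]

segment_filters = build_segment_filters(base, buckets)
for expression in segment_filters:
    print(expression)
```

Each resulting string can then be passed as the `filter` argument of a search request, one request per segment.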

For demonstration purposes, I will select one to retrieve the list of RICs for the specific asset category with the specified market cap range.

In [13]:
# Define our filter using the second market cap bucket (index 1).
# The variable is named segment_filter to avoid shadowing Python's built-in filter().
range1 = response.data.raw["Navigators"]["MktCapTotal"]["Buckets"][1]["Filter"]
segment_filter = f"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and {range1}"
segment_filter

"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and (MktCapTotal ge 2641502.85 and MktCapTotal lt 16922967.89)"

In [14]:
rdp.search(
    view = rdp.SearchViews.EquityQuotes,
    filter = segment_filter,
    top = 10000
)

|      | PI        | RIC         | BusinessEntity | DocumentTitle | PermID      |
|------|-----------|-------------|----------------|---------------|-------------|
| 0    | 239209890 | OBLN.O      | QUOTExEQUITY   | Obalon Therapeutics Inc, Ordinary Share, NASDAQ Global Market Consolidated | 25769765340 |
| 1    | 17578958  | NURO.O      | QUOTExEQUITY   | NeuroMetrix Inc, Ordinary Share, NASDAQ Capital Market Consolidated | 55839263413 |
| 2    | 50438793  | RCON.O      | QUOTExEQUITY   | Recon Technology Ltd, Ordinary Share, NASDAQ Capital Market Consolidated | 55851636267 |
| 3    | 85531230  | FRAN.O      | QUOTExEQUITY   | Francesca's Holdings Corp, Ordinary Share, NASDAQ Global Select Consolidated | 21475810695 |
| 4    | 130103904 | CGIX.O      | QUOTExEQUITY   | Cancer Genetics Inc, Ordinary Share, NASDAQ Capital Market Consolidated | 21521824998 |
| ...  | ...       | ...         | ...            | ...           | ...         |
| 9135 | 191993984 | TIL.DY^E18  | QUOTExEQUITY   | Till Capital Ord Shs, Ordinary Share, Delisted, Cboe EDGA Exchange - Nasdaq Capital Market | 21564515770 |
| 9136 | 23637191  | RUCW.PK^J09 | QUOTExEQUITY   | Rubicon Medical Ord Shs, Ordinary Share, Delisted, OTC Markets Group Inc - No Information | 21475467767 |
| 9137 | 55387019  | ORNI.PK^L08 | QUOTExEQUITY   | Oragenics Ord Shs, Ordinary Share, Delisted, OTC Markets Group Inc - Current Information | 21475535438 |
| 9138 | 695963    | RHSL.PK^I10 | QUOTExEQUITY   | Royal Holdings Services Ord Shs, Ordinary Share, Delisted, OTC Markets Group Inc - No Information | 55835418417 |


Based on the buckets I defined, I can now safely use a filter to pull out a segment of instruments.  Although navigators and filters make it convenient to define the segments and stay within these limits, the work involved is still relatively complicated.
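
If every segment does need to be retrieved, the per-segment results have to be merged on the client.  The following is a minimal sketch of that loop, not a definitive implementation.  The `run_search` argument is a hypothetical callable that takes a filter expression and returns a list of row dictionaries; in practice it might wrap a search call and convert the resulting DataFrame with pandas' `DataFrame.to_dict("records")`.  A stand-in function is used here so the sketch runs without a session.

```python
def collect_segments(filters, run_search):
    """Run one search per filter expression and merge the rows,
    dropping any RIC that has already been collected."""
    seen = set()
    merged = []
    for expression in filters:
        for row in run_search(expression):
            if row["RIC"] not in seen:
                seen.add(row["RIC"])
                merged.append(row)
    return merged

# Stand-in for a real search call (hypothetical canned data)
def fake_search(expression):
    canned = {
        "segment A": [{"RIC": "OBLN.O"}, {"RIC": "NURO.O"}],
        "segment B": [{"RIC": "NURO.O"}, {"RIC": "RCON.O"}],
    }
    return canned[expression]

rows = collect_segments(["segment A", "segment B"], fake_search)
print([row["RIC"] for row in rows])  # ['OBLN.O', 'NURO.O', 'RCON.O']
```

The market cap ranges produced by the navigator do not overlap, so the duplicate check is purely defensive; it matters more if you segment on a property where one instrument can match several filters.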

While it may be possible to pull out excessive amounts of data, you should ask yourself whether you need to do this.  In most cases, you can reduce the result set when you set up your search instead of pulling in everything and then massaging the results once you have them in hand.  Search was designed specifically to allow users to filter out unwanted content before the results are returned.  If you approach your search patterns this way, you will avoid most situations where you need complicated algorithms to pull excessive amounts of data. Whether you narrow the request to the categories you are interested in or to data for a specific region, you will find that you can significantly simplify your logic and avoid issues with limits.