## Limits
RDP Search limits the size of the result set when you request large data sets.  The following examples provide some useful techniques for dealing with results that reach the upper limits imposed by the backend.

In [1]:
import refinitiv.data as rd
from refinitiv.data.content import search
import pandas as pd

# Default session - desktop
rd.open_session()

<refinitiv.data.session.Definition object at 0x195f54c1840 {name='workspace'}>

In [2]:
pd.set_option('display.max_colwidth', 140)
rd.__version__

'1.1.0'

#### Grouping
There may be instances where the result set contains repeated values for a property, depending on your request.  For example, if I'm interested in retrieving all exchanges within the USA, I can execute this request:

In [3]:
response = search.Definition(
    view = search.Views.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode, RIC"
).get_data()
response.total

Result is maxed at 10000 while the total is 3994341 rows.
Requested - 10000, skipped - 0 rows.


3994341

In [4]:
response.data.df

Unnamed: 0,ExchangeCode,RIC
0,NSQ,TSLA.O
1,NSQ,NVDA.O
2,NSQ,AAPL.O
3,NSQ,MSFT.O
4,NSQ,AMZN.O
...,...,...
9995,PNK,MBCI.PK
9996,PNK,NWYU.PK
9997,PNK,DEER.PK
9998,PNK,FSMK.PK


In the above example, you can see that the total number of available documents is nearly 4,000,000 and that Search provides a warning.  However, due to the nature of the data set, the same exchange codes are repeated across many documents, which pushed the result set to the upper limit of documents it can return.  **Note**: At the time of this writing, the upper limit is 10000 documents per result set.

Instead of performing multiple calls and pulling out the unique codes within each result set, I can apply the grouping features offered by Search to significantly reduce the result set returned.  For example:

In [5]:
rd.discovery.search(
    view = search.Views.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode",
    group_by = "ExchangeCode",    # Exchange codes can be grouped
    group_count = 1               # Then limited to 1 for each to create uniqueness
)

Unnamed: 0,ExchangeCode
0,NSQ
1,NYQ
2,NMQ
3,ASQ
4,NAQ
...,...
146,CC3
147,CCE
148,CSC
149,HAM


As you can see, grouping significantly reduces the result set, which can now be retrieved with a single API call.  Using the 'grouping' technique to pull out the unique exchange codes is very useful if you wish to return many other properties as part of your results.  However, if you are strictly after the list of exchange codes, the preferred approach is to use Navigators.

#### Navigators
If the goal of your search is simply to capture the list of exchange codes, then the preferred approach is to use Navigators.  A navigator lets you categorize and summarize properties within the result set.  For example, I can provide a simple navigator that limits the number of buckets, or results, within the result set.

> Note: You can find more details and examples within the 'Search - Navigators' notebook.

You can do this using the following request:

In [6]:
response=search.Definition(
    view = search.Views.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "ExchangeCode(buckets:300)"   # Limit the results
).get_data()

In [9]:
codes=response.data.raw["Navigators"]["ExchangeCode"]["Buckets"]
print(f'Total exchange codes found: {len(codes)}')

Total exchange codes found: 151


In [10]:
codes

[{'Label': 'ONE', 'Count': 1441126},
 {'Label': 'OPQ', 'Count': 1390740},
 {'Label': 'IOM', 'Count': 358413},
 {'Label': 'PNK', 'Count': 72396},
 {'Label': 'CBT', 'Count': 61977},
 {'Label': 'CME', 'Count': 34439},
 {'Label': 'OBB', 'Count': 32916},
 {'Label': 'OTC', 'Count': 26543},
 {'Label': 'CBO', 'Count': 20421},
 {'Label': 'BOS', 'Count': 19415},
 {'Label': 'THM', 'Count': 19003},
 {'Label': 'XPH', 'Count': 16773},
 {'Label': 'MID', 'Count': 15697},
 {'Label': 'PSE', 'Count': 15636},
 {'Label': 'NYS', 'Count': 15238},
 {'Label': 'IUS', 'Count': 14590},
 {'Label': 'CIN', 'Count': 13677},
 {'Label': 'NTV', 'Count': 13654},
 {'Label': 'NYQ', 'Count': 13548},
 {'Label': 'BZX', 'Count': 13546},
 {'Label': 'BYX', 'Count': 13545},
 {'Label': 'NMS', 'Count': 11846},
 {'Label': 'BT1', 'Count': 11815},
 {'Label': 'NAS', 'Count': 11492},
 {'Label': 'ARC', 'Count': 11250},
 {'Label': 'WCB', 'Count': 9914},
 {'Label': 'NQT', 'Count': 9792},
 {'Label': 'ADC', 'Count': 9267},
 {'Label': 'NOI', 
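
The navigator buckets are a plain list of dictionaries, so they are easy to reshape. The following is a minimal sketch that uses the first few buckets from the output above as sample data; the names `sample_buckets`, `count_by_code` and `exchange_codes` are only illustrative.

```python
# Sample data copied from the first few navigator buckets shown above.
sample_buckets = [
    {'Label': 'ONE', 'Count': 1441126},
    {'Label': 'OPQ', 'Count': 1390740},
    {'Label': 'IOM', 'Count': 358413},
    {'Label': 'PNK', 'Count': 72396},
]

# Map each exchange code to the number of matching documents.
count_by_code = {bucket['Label']: bucket['Count'] for bucket in sample_buckets}

# The plain list of exchange codes, in the order the navigator returned them.
exchange_codes = list(count_by_code)
print(exchange_codes)
```

With a live response, the same comprehension could be applied to the `codes` list retrieved earlier.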

#### Segmenting the search
When we started with the above search to retrieve the list of exchange codes within the United States, we discovered that the request matched the entire universe of instruments.  If our goal is to capture the entire instrument list, we cannot group and bucket the result set as we did above.  The number of hits is nearly 4 million, so we are forced to go through a tedious process of segmenting the requests.

One way to do this is to choose some kind of indicator that will allow you to group your individual requests to successfully segment the result set.  However, you need to first ask yourself - do I need the entire data universe?  You may only be interested in a specific asset category thus reducing the universe of results significantly.

One possible way to approach this is to first capture the list of asset categories using a navigator on the property: 'RCSAssetCategoryLeaf'.  
For example:

In [11]:
response=search.Definition(
    view = search.Views.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "RCSAssetCategoryLeaf"
).get_data()
response.data.raw['Navigators']['RCSAssetCategoryLeaf']['Buckets']

[{'Label': 'Equity Future', 'Count': 1461265},
 {'Label': 'Equity Cash Option', 'Count': 1444454},
 {'Label': 'Ordinary Share', 'Count': 438208},
 {'Label': 'Stock Index Future Option', 'Count': 261729},
 {'Label': 'Stock Index Cash Option', 'Count': 97348},
 {'Label': 'Equity Future Spread', 'Count': 49253},
 {'Label': 'Unit', 'Count': 47701},
 {'Label': 'American Depository Receipt', 'Count': 31775},
 {'Label': 'Stock Index Future', 'Count': 23091},
 {'Label': 'Company Warrant', 'Count': 22537},
 {'Label': 'Preferred Share', 'Count': 21870},
 {'Label': 'Preference Share', 'Count': 12756},
 {'Label': 'Right', 'Count': 8185},
 {'Label': 'Depository Share', 'Count': 8184},
 {'Label': 'Depository Receipt', 'Count': 8112},
 {'Label': 'Equity-Linked Security', 'Count': 6840},
 {'Label': 'Bond', 'Count': 6608},
 {'Label': 'Fully Paid Ordinary Share', 'Count': 5892},
 {'Label': 'Index-Linked Security', 'Count': 4436},
 {'Label': 'Convertible Preference Share', 'Count': 2876},
 {'Label': 'Cur

The result of this will not only provide the complete list of categories for you to potentially select the desired ones, but for each, you can see the number of results.  This will further allow you to tune your requests based on these totals.
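
As a rough illustration of tuning requests based on these totals, the sketch below splits categories into those that fit in a single request and those that need further segmenting. It uses a handful of buckets copied from the output above; `SERVER_LIMIT` and the variable names are assumptions made for this example, with the limit set to the 10000 documents noted earlier.

```python
SERVER_LIMIT = 10000  # upper limit on documents per request, as noted earlier

# Sample data copied from some of the asset category buckets shown above.
category_buckets = [
    {'Label': 'Equity Future', 'Count': 1461265},
    {'Label': 'Ordinary Share', 'Count': 438208},
    {'Label': 'Preference Share', 'Count': 12756},
    {'Label': 'Right', 'Count': 8185},
    {'Label': 'Bond', 'Count': 6608},
]

# Categories small enough to retrieve with one request.
fits_in_one_request = [
    b['Label'] for b in category_buckets if b['Count'] <= SERVER_LIMIT
]

# Categories that exceed the limit and must be broken up further.
needs_segmenting = [
    b['Label'] for b in category_buckets if b['Count'] > SERVER_LIMIT
]

print('Single request:', fits_in_one_request)
print('Needs segmenting:', needs_segmenting)
```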

However, the above summary shows many categories that easily exceed the limits of the server.  If you need to further segment, you can possibly use the ***market cap*** to segment a specific asset category.

For example, let's choose an asset category where we can get a breakdown of the market cap:

In [12]:
# The following navigator prepares buckets of evenly distributed market cap ranges such that each one fits within
# the server limit.  Below, I chose 13 as this produces reasonable buckets we can work with.
response=search.Definition(
    view = search.Views.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share'",
    top = 0,
    navigators = "MktCapTotal(type:range, buckets:13)"
).get_data()
response.data.raw["Navigators"]["MktCapTotal"]["Buckets"]

[{'Label': 'Below 2741194.19',
  'Filter': 'MktCapTotal lt 2741194.19',
  'Count': 9622},
 {'Label': 'Between 2741194.19 And 14267535.77',
  'Filter': '(MktCapTotal ge 2741194.19 and MktCapTotal lt 14267535.77)',
  'Count': 9624},
 {'Label': 'Between 14267535.77 And 39227374.59',
  'Filter': '(MktCapTotal ge 14267535.77 and MktCapTotal lt 39227374.59)',
  'Count': 9642},
 {'Label': 'Between 39227374.59 And 88167264.13',
  'Filter': '(MktCapTotal ge 39227374.59 and MktCapTotal lt 88167264.13)',
  'Count': 9594},
 {'Label': 'Between 88167264.13 And 174202330.2',
  'Filter': '(MktCapTotal ge 88167264.13 and MktCapTotal lt 174202330.2)',
  'Count': 9604},
 {'Label': 'Between 174202330.2 And 300766245.49',
  'Filter': '(MktCapTotal ge 174202330.2 and MktCapTotal lt 300766245.49)',
  'Count': 9680},
 {'Label': 'Between 300766245.49 And 496317381.96',
  'Filter': '(MktCapTotal ge 300766245.49 and MktCapTotal lt 496317381.96)',
  'Count': 9588},
 {'Label': 'Between 496317381.96 And 879779951.0

The first thing to note is that the 'Count' values for each bucket are within the valid limit of the server.  Based on this output, we can use the convenient Filter expressions provided to drive our segmented search requests.
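
One way to turn the buckets into request filters is sketched below. It uses the first three buckets from the output above as sample data, and `build_segment_filters`, `BASE_FILTER` and `SERVER_LIMIT` are hypothetical names introduced for this example rather than part of the library.

```python
SERVER_LIMIT = 10000  # upper limit on documents per request, as noted earlier

BASE_FILTER = (
    "RCSExchangeCountryLeaf eq 'United States' "
    "and RCSAssetCategoryLeaf xeq 'Ordinary Share'"
)

# Sample data copied from the first three market cap buckets shown above.
market_cap_buckets = [
    {'Label': 'Below 2741194.19',
     'Filter': 'MktCapTotal lt 2741194.19',
     'Count': 9622},
    {'Label': 'Between 2741194.19 And 14267535.77',
     'Filter': '(MktCapTotal ge 2741194.19 and MktCapTotal lt 14267535.77)',
     'Count': 9624},
    {'Label': 'Between 14267535.77 And 39227374.59',
     'Filter': '(MktCapTotal ge 14267535.77 and MktCapTotal lt 39227374.59)',
     'Count': 9642},
]


def build_segment_filters(base_filter, buckets, limit=SERVER_LIMIT):
    """Combine the base filter with each bucket's Filter expression.

    Raises ValueError if a bucket is too large to fetch in one request.
    """
    filters = []
    for bucket in buckets:
        if bucket['Count'] > limit:
            raise ValueError(f"Bucket '{bucket['Label']}' exceeds the limit")
        filters.append(f"{base_filter} and {bucket['Filter']}")
    return filters


segment_filters = build_segment_filters(BASE_FILTER, market_cap_buckets)
for segment_filter in segment_filters:
    print(segment_filter)
```

Raising an error for an oversized bucket is a design choice for this sketch; an alternative would be to request more buckets from the navigator and try again.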

For demonstration purposes, I will select one to retrieve the list of RICs for the specific asset category with the specified market cap range.

In [13]:
# Define our filter
range1 = response.data.raw["Navigators"]["MktCapTotal"]["Buckets"][1]["Filter"]
filter = f"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and {range1}"
filter

"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and (MktCapTotal ge 2741194.19 and MktCapTotal lt 14267535.77)"

In [15]:
response = search.Definition(
    view = search.Views.EQUITY_QUOTES,
    filter = filter,
    top = 10000
).get_data()
f'Request resulted in a segment of {response.total} documents.'

'Request resulted in a segment of 9624 documents.'

Based on the buckets I defined, I can now safely use a filter to pull out a segment of instruments.  Despite using a combination of navigators and filters to conveniently define how to break up the segments to avoid these limits, the work to do so is still relatively complicated.
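
To retrieve every segment rather than just one, the same request can be repeated once per filter. The sketch below shows only the looping logic: `collect_segments` and `fetch_segment` are hypothetical names, and the stand-in fetch function returns made-up placeholder rows so the example runs without an open session.

```python
from typing import Callable, Dict, Iterable, List


def collect_segments(
    filters: Iterable[str],
    fetch_segment: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Run one request per filter and concatenate the returned rows."""
    rows: List[Dict] = []
    for segment_filter in filters:
        rows.extend(fetch_segment(segment_filter))
    return rows


def fake_fetch(segment_filter: str) -> List[Dict]:
    """Stand-in for a real Search request; the rows are placeholders."""
    canned = {
        'segment A': [{'RIC': 'AAA.X'}],
        'segment B': [{'RIC': 'BBB.X'}, {'RIC': 'CCC.X'}],
    }
    return canned[segment_filter]


all_rows = collect_segments(['segment A', 'segment B'], fake_fetch)
print(f'Collected {len(all_rows)} rows from 2 segments.')
```

In a live session, `fetch_segment` could wrap the `search.Definition(...).get_data()` call shown above and return the rows from `response.data.df`, for example via `to_dict('records')`.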

While it may be possible to pull out excessive amounts of data, you should ask yourself if you need to do this.  In most cases, you can reduce the result set when you set up your search instead of pulling in everything and then massaging the results once you have them in hand.  Search was designed specifically to allow users to filter out unwanted content prior to returning the results.  If you approach your searches this way, you will avoid situations where you need complicated algorithms to pull excessive amounts of data unnecessarily.  Whether you narrow the request to the categories of interest or to data for a specific region, you will find that you can significantly simplify your logic and avoid issues with limits.

In [16]:
rd.close_session()