## Limits
RDP Search imposes limits on the size of the result set when you request large data sets. The following examples show some useful techniques for dealing with results that reach the upper limits imposed by the backend.

```python
import refinitiv.data as rd
from refinitiv.data.content import search
import pandas as pd

# Default session - desktop
rd.open_session(app_key='Your API Key here')
```

```
<refinitiv.data.session.Definition object at 0x188e2421220 {name='default'}>
```

```python
pd.set_option('display.max_colwidth', 140)
rd.__version__
```

```
'1.0.0b9'
```

#### Grouping
There may be instances where many documents in the result set share the same value for a property. For example, if I'm interested in retrieving all exchanges within the USA, I can execute this request:

```python
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode, RIC"
).get_data()
response.total
```

```
4093040
```

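```python
response.data.df
```

|      | ExchangeCode | RIC     |
|------|--------------|---------|
| 0    | IOM          | EScv1   |
| 1    | IOM          | NQcv1   |
| 2    | IOM          | ESc1    |
| 3    | IOM          | NQc1    |
| 4    | CBT          | YMc1    |
| ...  | ...          | ...     |
| 9995 | PNK          | CLLA.PK |
| 9996 | PNK          | PBEV.PK |
| 9997 | PNK          | BDYS.PK |
| 9998 | PNK          | MGGI.PK |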


In the above example, you can see that the total number of available documents is over 4,000,000. However, because the same exchange code appears on many documents, the response filled up to the maximum number of documents with largely repeated codes. **Note**: At the time of this writing, the upper limit is 10,000 documents per request.
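
A quick way to tell whether a response has been capped is to compare the reported total with the number of rows returned. The sketch below does that and also works out the minimum number of requests a complete pull would need. It assumes the 10,000-document cap noted above; `MAX_DOCS`, `is_truncated` and `min_requests` are names introduced here for illustration, not part of the library. With a live response you would pass `response.total` and `len(response.data.df)`.

```python
import math

# Assumed per-request cap, taken from the note above; not a library constant.
MAX_DOCS = 10_000

def is_truncated(total_hits: int, rows_returned: int) -> bool:
    """True when the server matched more documents than it returned."""
    return total_hits > rows_returned

def min_requests(total_hits: int, limit: int = MAX_DOCS) -> int:
    """Lower bound on the number of segmented requests needed to retrieve every hit."""
    return math.ceil(total_hits / limit)

# Figures from the request above: 4,093,040 hits, 10,000 rows returned.
print(is_truncated(4_093_040, 10_000))  # True
print(min_requests(4_093_040))          # 410
```

Even in the ideal case of perfectly packed segments, pulling the whole universe would take at least 410 requests, which is why the rest of this section looks for ways to avoid a full pull.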

Instead of performing multiple calls and pulling out the unique codes within each result set, I can apply the grouping features offered by Search to significantly reduce the result set returned.  For example:

```python
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 10000,
    select = "ExchangeCode",
    group_by = "ExchangeCode",    # Exchange codes can be grouped
    group_count = 1               # Then limited to 1 for each to create uniqueness
).get_data()
response.data.df
```

|     | ExchangeCode |
|-----|--------------|
| 0   | IOM          |
| 1   | CBT          |
| 2   | CBF          |
| 3   | IMM          |
| 4   | NSQ          |
| ... | ...          |
| 144 | CC3          |
| 145 | CCE          |
| 146 | CSC          |
| 147 | HAM          |


As you can see, grouping significantly reduces the result set, which now allows me to retrieve the exchange codes in a single API call. Using grouping to pull out the unique exchange codes is very useful if you also want to return other properties in your results. However, if you only need the list of exchange codes, the preferred approach is to use Navigators.
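
To make the effect of `group_by` and `group_count` concrete, here is a local pandas emulation on a small hand-made frame. It only illustrates the idea (keep at most N documents per distinct value); the server's choice and ordering of documents within each group may differ. The sample rows are borrowed from the first output above, and `emulate_grouping` is a hypothetical helper, not a library function.

```python
import pandas as pd

# A few rows shaped like the first response above (ExchangeCode, RIC).
df = pd.DataFrame({
    "ExchangeCode": ["IOM", "IOM", "IOM", "IOM", "CBT", "PNK", "PNK"],
    "RIC": ["EScv1", "NQcv1", "ESc1", "NQc1", "YMc1", "CLLA.PK", "PBEV.PK"],
})

def emulate_grouping(frame: pd.DataFrame, group_by: str, group_count: int) -> pd.DataFrame:
    """Keep at most `group_count` rows per distinct `group_by` value,
    preserving the original row order."""
    return frame.groupby(group_by, sort=False).head(group_count).reset_index(drop=True)

unique_codes = emulate_grouping(df, "ExchangeCode", 1)
print(unique_codes["ExchangeCode"].tolist())  # ['IOM', 'CBT', 'PNK']
```

Doing this on the server rather than locally is the point: the duplicates never count against the 10,000-document limit.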

#### Navigators
If the goal of your search is simply to capture the list of exchange codes, the preferred approach is to use Navigators. A navigator categorizes and summarizes the values of a property across the result set. For example, I can request a simple navigator on `ExchangeCode` and limit the number of buckets, or distinct values, that it returns. Because only the summary is needed, `top` is set to 0 so that no documents are returned:

```python
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "ExchangeCode(buckets:300)"   # Limit the results
).get_data()
```

```python
codes = response.data.raw["Navigators"]["ExchangeCode"]["Buckets"]
print(f'Total exchange codes found: {len(codes)}')
```

```
Total exchange codes found: 149
```
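
The navigator above asked for up to 300 buckets and returned 149, so the list appears to be complete. A simple heuristic generalizes this check: if the number of buckets returned reaches the number requested, there may be more distinct values than were returned. This is an assumption about how the bucket cap behaves, not documented library behavior, and `may_be_truncated` is a hypothetical helper.

```python
def may_be_truncated(buckets: list, requested_buckets: int) -> bool:
    """Heuristic: a navigator that fills every requested bucket may have
    more distinct values than it returned, so retry with a larger cap."""
    return len(buckets) >= requested_buckets

# 149 buckets came back for a cap of 300 (see the output above).
print(may_be_truncated([{}] * 149, 300))  # False
print(may_be_truncated([{}] * 300, 300))  # True
```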


```python
codes
```

```
[{'Label': 'OPQ', 'Count': 1507604},
 {'Label': 'ONE', 'Count': 1441126},
 {'Label': 'IOM', 'Count': 375379},
 {'Label': 'PNK', 'Count': 71492},
 {'Label': 'CBT', 'Count': 64209},
 {'Label': 'CME', 'Count': 38056},
 {'Label': 'OBB', 'Count': 32916},
 {'Label': 'OTC', 'Count': 24817},
 {'Label': 'BOS', 'Count': 19401},
 {'Label': 'THM', 'Count': 18975},
 {'Label': 'XPH', 'Count': 16758},
 {'Label': 'MID', 'Count': 15683},
 {'Label': 'PSE', 'Count': 15611},
 {'Label': 'NYS', 'Count': 15223},
 {'Label': 'CIN', 'Count': 13663},
 {'Label': 'NYQ', 'Count': 13660},
 {'Label': 'NTV', 'Count': 13421},
 {'Label': 'BZX', 'Count': 13313},
 {'Label': 'BYX', 'Count': 13312},
 {'Label': 'NMS', 'Count': 11690},
 {'Label': 'BT1', 'Count': 11582},
 {'Label': 'NAS', 'Count': 11427},
 {'Label': 'ARC', 'Count': 11018},
 {'Label': 'IUS', 'Count': 10980},
 {'Label': 'CBO', 'Count': 10246},
 {'Label': 'WCB', 'Count': 9915},
 {'Label': 'NQT', 'Count': 9574},
 {'Label': 'ADC', 'Count': 9255},
 {'Label': 'ASE', ...
```
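
The navigator buckets are plain dictionaries with `Label` and `Count` keys, so turning them into a DataFrame for sorting or inspection takes a single call. The sketch uses the first few buckets from the output above as sample input; with a live response you would pass `response.data.raw["Navigators"]["ExchangeCode"]["Buckets"]` instead. `buckets_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# First few buckets copied from the output above (sample input only).
codes = [
    {'Label': 'OPQ', 'Count': 1507604},
    {'Label': 'ONE', 'Count': 1441126},
    {'Label': 'IOM', 'Count': 375379},
    {'Label': 'PNK', 'Count': 71492},
]

def buckets_to_frame(buckets: list[dict]) -> pd.DataFrame:
    """Convert navigator buckets into a DataFrame sorted by descending Count.
    Any extra keys (such as 'Filter' on range buckets) are dropped."""
    frame = pd.DataFrame(buckets, columns=["Label", "Count"])
    return frame.sort_values("Count", ascending=False, ignore_index=True)

exchanges = buckets_to_frame(codes)
print(exchanges["Label"].tolist())  # ['OPQ', 'ONE', 'IOM', 'PNK']
print(exchanges["Count"].sum())     # 3395601
```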

#### Segmenting the search
When we started with the above search to retrieve the list of exchange codes within the United States, we discovered that the query matched the entire universe of instruments. If our goal is to capture the entire instrument list, we cannot group and bucket the result set as we did above. The number of hits exceeds 4 million, so we are forced to go through a tedious process of segmenting the requests.

One way to do this is to choose a property whose values split the result set into segments that are each small enough to retrieve in a single request. However, you first need to ask yourself: do I need the entire data universe? You may only be interested in a specific asset category, which reduces the universe of results significantly.

One possible approach is to first capture the list of asset categories using a navigator on the `RCSAssetCategoryLeaf` property. For example:

```python
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States'",
    top = 0,
    navigators = "RCSAssetCategoryLeaf"
).get_data()
response.data.raw['Navigators']['RCSAssetCategoryLeaf']['Buckets']
```

```
[{'Label': 'Equity Cash Option', 'Count': 1498984},
 {'Label': 'Equity Future', 'Count': 1460808},
 {'Label': 'Ordinary Share', 'Count': 417961},
 {'Label': 'Stock Index Future Option', 'Count': 282823},
 {'Label': 'Stock Index Cash Option', 'Count': 98489},
 {'Label': 'Equity Future Spread', 'Count': 48767},
 {'Label': 'Unit', 'Count': 44397},
 {'Label': 'American Depository Receipt', 'Count': 30166},
 {'Label': 'Company Warrant', 'Count': 24757},
 {'Label': 'Preferred Share', 'Count': 20930},
 {'Label': 'Stock Index Future', 'Count': 20321},
 {'Label': 'Preference Share', 'Count': 12757},
 {'Label': 'Depository Receipt', 'Count': 7996},
 {'Label': 'Depository Share', 'Count': 7541},
 {'Label': 'Right', 'Count': 7020},
 {'Label': 'Bond', 'Count': 6447},
 {'Label': 'Equity-Linked Security', 'Count': 6253},
 {'Label': 'Fully Paid Ordinary Share', 'Count': 5632},
 {'Label': 'Index-Linked Security', 'Count': 4436},
 {'Label': 'Convertible Preference Share', 'Count': 2847},
 {'Label': 'Cum...
```

This not only provides the list of categories, from which you can select the ones you need, but also shows the number of matching documents in each. You can use these totals to tune your requests.
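
One way to use those totals is to split the categories into those that fit in a single request and those that need further segmenting. The sketch below does that split; `plan_requests` and its 10,000 default are assumptions for illustration, and the sample buckets are taken from the output above.

```python
# Sample buckets taken from the asset-category output above.
categories = [
    {'Label': 'Equity Cash Option', 'Count': 1498984},
    {'Label': 'Ordinary Share', 'Count': 417961},
    {'Label': 'Depository Receipt', 'Count': 7996},
    {'Label': 'Right', 'Count': 7020},
    {'Label': 'Bond', 'Count': 6447},
]

def plan_requests(buckets, limit=10_000):
    """Split navigator buckets into labels retrievable in one request
    and labels whose Count exceeds the (assumed) per-request limit."""
    single, needs_segmenting = [], []
    for bucket in buckets:
        target = single if bucket["Count"] <= limit else needs_segmenting
        target.append(bucket["Label"])
    return single, needs_segmenting

single, needs_segmenting = plan_requests(categories)
print(single)            # ['Depository Receipt', 'Right', 'Bond']
print(needs_segmenting)  # ['Equity Cash Option', 'Ordinary Share']
```

Each label in `single` could then be added to the base filter with an `RCSAssetCategoryLeaf xeq '...'` clause, the same form used in the market cap request in this section, while the labels in `needs_segmenting` call for another way of splitting.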

However, the summary above shows many categories that far exceed the server's 10,000-document limit. If you need to segment further, one option is to split a specific asset category by ***market cap***.

For example, let's choose an asset category and request a breakdown by market cap:

```python
# The following navigator prepares market cap ranges that each contain a similar number
# of documents, so that every bucket stays within the limit.  I chose 13 buckets because
# this produces bucket sizes we can work with.
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = "RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share'",
    top = 0,
    navigators = "MktCapTotal(type:range, buckets:13)"
).get_data()
response.data.raw["Navigators"]["MktCapTotal"]["Buckets"]
```

```
[{'Label': 'Below 3223369.15',
  'Filter': 'MktCapTotal lt 3223369.15',
  'Count': 9452},
 {'Label': 'Between 3223369.15 And 18680018.75',
  'Filter': '(MktCapTotal ge 3223369.15 and MktCapTotal lt 18680018.75)',
  'Count': 9452},
 {'Label': 'Between 18680018.75 And 52061314.88',
  'Filter': '(MktCapTotal ge 18680018.75 and MktCapTotal lt 52061314.88)',
  'Count': 9423},
 {'Label': 'Between 52061314.88 And 110563932.92',
  'Filter': '(MktCapTotal ge 52061314.88 and MktCapTotal lt 110563932.92)',
  'Count': 9451},
 {'Label': 'Between 110563932.92 And 207182282.56',
  'Filter': '(MktCapTotal ge 110563932.92 and MktCapTotal lt 207182282.56)',
  'Count': 9444},
 {'Label': 'Between 207182282.56 And 340020481.62',
  'Filter': '(MktCapTotal ge 207182282.56 and MktCapTotal lt 340020481.62)',
  'Count': 9447},
 {'Label': 'Between 340020481.62 And 562119974.92',
  'Filter': '(MktCapTotal ge 340020481.62 and MktCapTotal lt 562119974.92)',
  'Count': 9447},
 {'Label': 'Between 562119974.92 And 980...
```

The first thing to note is that the `Count` of each bucket shown is within the server's limit. Based on this output, we can use the convenient `Filter` expressions provided to drive our segmented search requests.
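
Before driving requests from these buckets, it can be worth confirming programmatically that none exceeds the limit, then collecting the `Filter` expressions. A minimal sketch, assuming the bucket shape shown above (`Label`, `Filter`, `Count`) and the 10,000-document cap; `segment_filters` is a hypothetical helper name, and the two sample buckets are copied from the output above.

```python
# Two range buckets copied from the MktCapTotal navigator output above.
buckets = [
    {'Label': 'Below 3223369.15',
     'Filter': 'MktCapTotal lt 3223369.15',
     'Count': 9452},
    {'Label': 'Between 3223369.15 And 18680018.75',
     'Filter': '(MktCapTotal ge 3223369.15 and MktCapTotal lt 18680018.75)',
     'Count': 9452},
]

def segment_filters(buckets, limit=10_000):
    """Return each bucket's Filter expression, failing loudly if any bucket
    is too large to retrieve in a single request."""
    oversized = [b["Label"] for b in buckets if b["Count"] > limit]
    if oversized:
        # These ranges would need to be split further, e.g. by requesting more buckets.
        raise ValueError(f"Buckets over the {limit}-document limit: {oversized}")
    return [b["Filter"] for b in buckets]

filters = segment_filters(buckets)
print(len(filters))  # 2
print(filters[0])    # MktCapTotal lt 3223369.15
```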

For demonstration purposes, I'll select one bucket and retrieve the list of RICs for that asset category within its market cap range.

```python
# Define our filter using the second market cap bucket (index 1)
range1 = response.data.raw["Navigators"]["MktCapTotal"]["Buckets"][1]["Filter"]
segment_filter = f"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and {range1}"
segment_filter
```

```
"RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share' and (MktCapTotal ge 3223369.15 and MktCapTotal lt 18680018.75)"
```

```python
response = search.Definition(
    view = search.SearchViews.EQUITY_QUOTES,
    filter = segment_filter,
    top = 10000
).get_data()
f'Request resulted in a segment of {response.total} documents.'
```

```
'Request resulted in a segment of 9452 documents.'
```

Using the filters from the navigator buckets, I can now safely pull out one segment of instruments at a time. Even though navigators and filters make it convenient to define segments that stay within the limits, the overall process is still relatively complicated.
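
Retrieving every segment means looping over the bucket filters, combining each with the base filter, and stacking the results. The sketch below keeps the search call behind a `fetch` parameter so that the loop can run without a live session; `fetch_all_segments` and `fake_fetch` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable

import pandas as pd

def fetch_all_segments(
    base_filter: str,
    segment_filters: Iterable[str],
    fetch: Callable[[str], pd.DataFrame],
) -> pd.DataFrame:
    """Run one request per segment and stack the results into a single frame."""
    frames = []
    for segment in segment_filters:
        # Same pattern as the filter built above: base conditions AND one range.
        frames.append(fetch(f"{base_filter} and {segment}"))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# Stand-in for the real search call, so the loop can run without a session.
def fake_fetch(full_filter: str) -> pd.DataFrame:
    return pd.DataFrame({"Filter": [full_filter]})

base = "RCSExchangeCountryLeaf eq 'United States' and RCSAssetCategoryLeaf xeq 'Ordinary Share'"
segments = ["MktCapTotal lt 3223369.15",
            "(MktCapTotal ge 3223369.15 and MktCapTotal lt 18680018.75)"]
result = fetch_all_segments(base, segments, fake_fetch)
print(len(result))  # 2
```

With a session open, `fetch` could wrap the same call used above, for example `lambda f: search.Definition(view=search.SearchViews.EQUITY_QUOTES, filter=f, top=10000).get_data().data.df`. A real wrapper should probably also compare `response.total` with the number of rows returned, so that a segment that has grown past the limit is not silently truncated.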

While it may be possible to pull out excessive amounts of data, you should ask yourself whether you need to. In most cases, you can reduce the result set when you set up your search instead of pulling in everything and then massaging the results once you have them in hand. Search was designed specifically to let users filter out unwanted content before the results are returned. If you keep this in mind when designing your searches, you will avoid situations where you need complicated algorithms to pull unnecessarily large amounts of data. Whether you narrow the request to the categories you're interested in or to data for a specific region, you will find that you can significantly simplify your logic and avoid issues with limits.