In [1]:
'''
in this notebook i'm trying to reconstruct ticker renames for all tickers (active and inactive).

what i've found so far:

1) in a lot of cases polygon doesn't provide a figi for a ticker. this happens not only for tickers that
    were delisted before a figi was assigned, but even for active companies whose figi is shown on tradingview.
    for example, XP (BBG00QVJYGM9) has been trading for about 5 years, since 2020. polygon has the ticker XP but no figi for it.

2) one figi sometimes has more than one ticker at the same time. some examples:

```
    2025-05-30,VLGE.A,BBG000BWGK40,True,          ,usd,,2016-05-18T00:00:00Z,us,stocks,VILLAGE SUPER MARKET CL-A NEW,XNAS,
    2025-05-30,VLGEA ,BBG000BWGK40,True,0000103595,usd,,2025-05-30T00:00:00Z,us,stocks,Village Super Market         ,XNAS,CS
```


    here the tickers become identical once the '.' is removed, but look at the next case:

```
    2012-04-24,ABD  ,BBG000J06K07,True,0000712034,usd,,2012-04-26T00:00:00Z,us,stocks,ACCO BRANDS CORPORATION,XNYS,CS
    2012-04-24,ACCOw,BBG000J06K07,True,0000712034,usd,,2012-04-30T00:00:00Z,us,stocks,ACCO BRANDS CORPORATION W.I.,XNYS,
```

    a 'w' suffix usually marks a when-issued line (note the "W.I." in the name) of the same base ticker,
    but here the whole ticker is different (ABD vs ACCOw).

3) sometimes the same ticker is returned twice with nearly identical records. for example,
https://api.polygon.io/v3/reference/tickers?ticker=UST&market=stocks&date=2025-06-01&active=true&order=asc&limit=100&sort=ticker&apiKey= 

returns:

```
    {
    "results": [
        {
        "ticker": "UST",
        "name": "ProShares Ultra 7-10 Year Treasury",
        "market": "stocks",
        "locale": "us",
        "primary_exchange": "ARCX",
        "type": "ETF",
        "active": true,
        "currency_name": "usd",
        "composite_figi": "BBG000BH4371",
        "share_class_figi": "BBG001SK30C5",
        "last_updated_utc": "2024-09-23T00:00:00Z"
        },
        {
        "ticker": "UST",
        "name": "ProShares Ultra 7-10 Year Treasury",
        "market": "stocks",
        "locale": "us",
        "primary_exchange": "ARCX",
        "type": "ETF",
        "active": true,
        "currency_name": "usd",
        "cik": "0001373525",
        "composite_figi": "BBG000BH4371",
        "share_class_figi": "BBG001SK30C5",
        "last_updated_utc": "2025-06-02T00:00:00Z"
        }
    ],
    "status": "OK",
    "request_id": "54d441e0bb407fc21739f13b8a5af347",
    "count": 2
    }
```

    the only differences are cik (missing in the first record) and last_updated_utc.
'''
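
a minimal sketch of how the near-duplicate records from finding (3) could be collapsed. this is not something polygon provides: `dedupe_ticker_records` is a hypothetical helper, and it assumes that the record with the latest `last_updated_utc` is the more trustworthy one and that fields missing from it can be filled from the older record.

```python
def dedupe_ticker_records(results):
    """Collapse records sharing (ticker, composite_figi) into one record each.

    Assumption: the record with the latest last_updated_utc wins, and fields
    it lacks are filled in from the older record.
    """
    merged = {}
    for rec in results:
        key = (rec["ticker"], rec.get("composite_figi"))
        best = merged.get(key)
        if best is None:
            merged[key] = dict(rec)
            continue
        # ISO-8601 UTC strings in the same format order correctly as plain strings
        if rec.get("last_updated_utc", "") >= best.get("last_updated_utc", ""):
            newer, older = rec, best
        else:
            newer, older = best, rec
        combined = dict(older)
        # non-empty fields of the newer record override the older ones
        combined.update({k: v for k, v in newer.items() if v not in (None, "")})
        merged[key] = combined
    return list(merged.values())


# the two UST records from the response above, reduced to the relevant fields
results = [
    {"ticker": "UST", "composite_figi": "BBG000BH4371",
     "last_updated_utc": "2024-09-23T00:00:00Z"},
    {"ticker": "UST", "composite_figi": "BBG000BH4371", "cik": "0001373525",
     "last_updated_utc": "2025-06-02T00:00:00Z"},
]
deduped = dedupe_ticker_records(results)
print(deduped)
```

keying on (ticker, composite_figi) rather than on figi alone keeps cases like VLGE.A / VLGEA as separate records, since those are different tickers and not duplicates.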


In [2]:
import pandas as pd
import sys
import os
# Add the parent directory to Python path to import the settings module
sys.path.append(os.path.dirname(os.path.abspath('')))
import settings

In [3]:
tickers_dir = os.path.join(settings.ABSOLUTE_DATA_DIR, 'tickers')
active_tickers_file = os.path.join(tickers_dir, 'tickers_history_active.csv')
inactive_tickers_file = os.path.join(tickers_dir, 'tickers_history_inactive.csv')

In [4]:
with open(active_tickers_file, newline='', encoding='utf-8') as fin:
    print(fin.readline())  # Print header
    print(fin.readline())  # the first row

date,ticker,composite_figi,active,cik,currency_name,delisted_utc,last_updated_utc,locale,market,name,primary_exchange,type

2003-09-10,A,BBG000C2V3D6,True,0001090872,usd,,2004-06-24T00:00:00Z,us,stocks,"AGILENT TECHNOLOGIES, INC",XNYS,



In [5]:
'''
what will not work:
- within a given day figi is not unique, because several tickers can share the same figi.
- it's not a good idea to "compress" the history by just removing duplicate rows, because such
  "compressed" data will lose the information about when a ticker disappeared from the history.
'''


In [None]:
'''
algorithm (this is the plan; the loop below is a simpler first pass that only records the first and
last active date per (figi, ticker)):
1. loop through active_tickers_file.
2. collect all rows for the first day.
   check that there are no duplicate tickers within one day.
   create a dictionary that maps a ticker to its row.
3. repeat (2) for the next day.
   loop over all tickers in the current day and compare them to the tickers of the previous day.
   if anything changed in (composite_figi, active, cik, delisted_utc, name, primary_exchange, type),
   log it into the history as history[figi][ticker][date] = row.
   new tickers are added the same way.
   when a row from active_today is processed, remove its ticker from active_yesterday.
   at the end active_yesterday holds the tickers that are missing today. add them to the history as
   history[figi][ticker][date] = None
'''
import csv
from datetime import datetime, timedelta

history = {}
counter = 0
active_today = {}
active_yesterday = {}

with open(active_tickers_file, newline='', encoding='utf-8') as fin:
    reader = csv.DictReader(fin)
    for row in reader:
        counter += 1
        if counter % (1000 * 1000) == 0:
            print(f"Processed {counter} rows")
        figi = row["composite_figi"]
        if not figi:
            # Skip rows without FIGI
            continue
        ticker = row["ticker"]
        date = datetime.fromisoformat(row["date"]).date()
        # rows are expected in date order: the first row seen for a
        # (figi, ticker) pair sets first_active, later rows move last_active
        tickers = history.setdefault(figi, {})
        if ticker not in tickers:
            # first time encountering this ticker for this FIGI
            tickers[ticker] = {'first_active': date}
        else:
            # known ticker for this FIGI
            tickers[ticker]['last_active'] = date
history
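
the day-by-day comparison described in the docstring above could be sketched roughly like this. it is only a sketch under assumptions: `diff_days` and `TRACKED` are hypothetical names, rows are assumed to be dicts sorted by date (as `csv.DictReader` would produce from the csv), and a duplicate ticker within one day is treated as an error.

```python
from itertools import groupby

# fields whose change should be logged (taken from the algorithm description)
TRACKED = ("composite_figi", "active", "cik", "delisted_utc",
           "name", "primary_exchange", "type")


def diff_days(rows):
    """Yield (date, ticker, row_or_None) events from rows sorted by date.

    row_or_None is the row when a ticker appears or one of the TRACKED fields
    changes, and None when a ticker present on the previous day is missing.
    """
    yesterday = {}
    for date, day_rows in groupby(rows, key=lambda r: r["date"]):
        today = {}
        for row in day_rows:
            if row["ticker"] in today:
                raise ValueError(f"duplicate ticker {row['ticker']} on {date}")
            today[row["ticker"]] = row
        for ticker, row in today.items():
            # popping leaves only the disappeared tickers in yesterday
            prev = yesterday.pop(ticker, None)
            if prev is None or any(prev.get(f) != row.get(f) for f in TRACKED):
                yield date, ticker, row
        for ticker in yesterday:
            yield date, ticker, None
        yesterday = today


# toy input mirroring the FB -> META rename seen in the history
rows = [
    {"date": "2022-06-08", "ticker": "FB", "composite_figi": "BBG000MM2P62"},
    {"date": "2022-06-09", "ticker": "META", "composite_figi": "BBG000MM2P62"},
]
events = list(diff_days(rows))
for date, ticker, row in events:
    print(date, ticker, "missing" if row is None else "new or changed")
```

unlike plain de-duplication of rows, this keeps an explicit event on the day a ticker disappears.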

In [7]:
history['BBG000MM2P62']

{'FB': {'first_active': datetime.date(2012, 5, 18),
  'last_active': datetime.date(2022, 6, 8)},
 'META': {'first_active': datetime.date(2022, 6, 9),
  'last_active': datetime.date(2025, 6, 11)}}

In [8]:
history['BBG000J06K07']

{'ABDw': {'first_active': datetime.date(2005, 8, 9),
  'last_active': datetime.date(2005, 8, 16)},
 'ABD': {'first_active': datetime.date(2005, 8, 17),
  'last_active': datetime.date(2012, 4, 30)},
 'ACCOw': {'first_active': datetime.date(2012, 4, 24),
  'last_active': datetime.date(2012, 4, 30)},
 'ACCO': {'first_active': datetime.date(2012, 5, 2),
  'last_active': datetime.date(2025, 6, 11)}}

In [None]:
# now analyze inactive tickers
surprise = {}

with open(inactive_tickers_file, newline='', encoding='utf-8') as fin:
    reader = csv.DictReader(fin)
    for row in reader:
        counter += 1
        if counter % (1000 * 1000) == 0:
            print(f"Processed {counter} rows")
        figi  = row["composite_figi"]
        if not figi:
            # Skip rows without FIGI
            continue
        ticker= row["ticker"]
        date  = datetime.fromisoformat(row["date"]).date()
        if figi not in history or ticker not in history[figi]:
            # either the FIGI never appeared among active tickers,
            # or the FIGI is known but this ticker is new for it
            flag = 'unknown_figi' if figi not in history else 'unknown_ticker'
            seen = surprise.setdefault(figi, {})
            if ticker not in seen:
                # an unknown FIGI can also show up with more than one ticker
                seen[ticker] = {'first_time': date, flag: True}
            else:
                seen[ticker]['last_time'] = date
            continue
        history_figi_ticker = history[figi][ticker]
        if history_figi_ticker.get('end') is None:
            # first inactive record for this (figi, ticker)
            history_figi_ticker['end'] = date
            history_figi_ticker['delisted_utc'] = row['delisted_utc']
        else:
            history_figi_ticker['last_inactive'] = date
surprise

In [10]:
'''
ok. so there are several observations here:

1) sometimes an inactive-ticker record carries a figi that never appeared among the active tickers.

i searched manually for BBG00FZLQV86, BBG000F4MYC2, BBG000GQR282 in the active tickers csv and found nothing,
while they are present in the inactive tickers csv.

this can also be confirmed via https://polygon.io/docs/rest/stocks/tickers/all-tickers
for example, for CNET on 2020-10-13 there is only one active ticker, with "last_updated_utc": "2016-05-18T00:00:00Z".
on 2020-10-14 BBG00DP4PXQ7 appears with "delisted_utc": "2020-10-14T00:00:00Z". it looks like there were 4 years
without trades for this ticker and the figi was not filled in until delisting.

2) if the figi is known for a ticker, then the first deactivation record always appears after active records,
so the ordering is sane. but again, it's possible that there were no active records at all.

3) sometimes deactivation records for a given figi stop appearing. a good example is BBG00GVX5D49, which appeared
only once, on 2021-08-02.
'''


In [11]:
# the code below shows that it's possible that the ticker remains active even after an inactive record appears.
count = 0
for figi, tickers in history.items():
    for ticker, dates in tickers.items():
        last_active_date = dates.get('last_active', None)
        end_date = dates.get('end', None)
        if last_active_date and end_date and last_active_date > end_date:
            # This is a case where the last active date is after the end date, which is unexpected
            print(f"Unexpected dates for {figi} {ticker}: last_active={last_active_date}, end={end_date}")
            count += 1
    if count > 50:
        print("found enough already")
        break

Unexpected dates for BBG000B9XB24 AAME: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000C2LZP3 AAON: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000B9XRY4 AAPL: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000DK5Q25 ABB: last_active=2024-05-17, end=2023-05-23
Unexpected dates for BBG000CK8P25 ABMC: last_active=2009-09-02, end=2004-04-19
Unexpected dates for BBG000C101X4 ABMD: last_active=2022-12-22, end=2009-10-29
Unexpected dates for BBG000HYNQP6 AXAS: last_active=2021-08-03, end=2009-10-29
Unexpected dates for BBG00D8FFF27 ACET: last_active=2024-05-17, end=2009-10-29
Unexpected dates for BBG000BB2LK1 ACMT.A: last_active=2004-12-03, end=2004-11-08
Unexpected dates for BBG000KF9J02 ACTG: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000BB5006 ADBE: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000JG0547 ADP: last_active=2025-06-11, end=2009-10-29
Unexpected dates for BBG000BM7HL0 ADSK: last_active=

In [12]:
# more observations
# ticker events api doesn't have data for delisted tickers. for example:

import sys
import os
# Add the parent directory to Python path to import api_key module
sys.path.append(os.path.dirname(os.path.abspath('')))

import api_key
from polygon import RESTClient

API_KEY = api_key.read_api_key()
client = RESTClient(API_KEY)

print("first check that api works for META")
events = client.get_ticker_events('META')
print(events)

print()
print("now check that it doesn't return data for the delisted ticker FNM")
try:
    events = client.get_ticker_events('FNM')
    print(events)
except Exception as e:
    print(f"Error fetching events for FNM: {e}")

print()
print("now check that there are ticker details for FNM in the past:")
details = client.get_ticker_details('FNM', '2004-11-01')
print(details)

print()
print("and there are aggregates for FNM in the past:")
aggs = client.get_aggs('FNM', 1, 'day', '2004-11-01', '2004-11-02')
print(aggs)

Successfully read api_key.txt. Key len 32
first check that api works for META
TickerChangeResults(name='Meta Platforms, Inc. Class A Common Stock', composite_figi='BBG000MM2P62', cik='0001326801', events=[{'ticker_change': {'ticker': 'META'}, 'type': 'ticker_change', 'date': '2022-06-09'}, {'ticker_change': {'ticker': 'FB'}, 'type': 'ticker_change', 'date': '2012-05-18'}])

now check that it doesn't return data for the delisted ticker FNM
Error fetching events for FNM: {"status":"NOT_FOUND","request_id":"e6d250958318aedc5acdcb19b403f7bc","message":"ID not found"}

now check that there are ticker details for FNM in the past:
TickerDetails(active=True, address=CompanyAddress(address1='3900 WISCONSIN AVE N.W.', address2=None, city='WASHINGTON', state='DC', country=None, postal_code='20016'), branding=None, cik='0000310522', composite_figi='BBG000BJQ328', currency_name='usd', currency_symbol=None, base_currency_name=None, base_currency_symbol=None, delisted_utc=None, description=None, tick

In [13]:
# but in fact from history analysis i can see:
def print_details(ticker, date):
    try:
        details = client.get_ticker_details(ticker, date)
        print(f'{ticker} on {date}: {details}')
    except Exception as e:
        print(f"Error fetching details for {ticker} on {date}: {e}")

import pandas as pd

def timestamp_to_date(timestamp):
    try:
        date = pd.to_datetime(timestamp, unit='ms').date()
        return date
    except Exception as e:
        print(f"Error converting timestamp {timestamp} to date: {e}")
        return None

def print_day_of_week(date):
    try:
        day_of_week = pd.to_datetime(date).day_name()
        print(f'{date} is a {day_of_week}')
    except Exception as e:
        print(f"Error fetching day of week for {date}: {e}")

def print_aggs(ticker, start_date, end_date):
    try:
        aggs = client.get_aggs(ticker, 1, 'day', start_date, end_date)
        print(f'{ticker} from {start_date} to {end_date}: {aggs}')
    except Exception as e:
        print(f"Error fetching aggregates for {ticker} from {start_date} to {end_date}: {e}")

def pretty_print_map(data):
    for key, value in data.items():
        print(f'{key}: {value}')

print("# ticker events API says that FORTY was renamed from FORT on 2004-11-08:")
print()
# sometimes the ticker events API is misleading.
# for example, for FORTY it says that there was a ticker change
events = client.get_ticker_events('FORTY')
print(events.events[0])
print(events.events[1])
print()
print("# that is wrong because it was trading under ticker FORTY since the beginning. see in aggregates API:")
print_aggs('FORTY', '2003-09-10', '2003-09-10')
print(f'1063166400000 is {timestamp_to_date(1063166400000)}')

print(
'''
# proof from flat files that FORTY was traded under ticker FORTY on 2003-09-10:
ticker,conditions,correction,exchange,id,participant_timestamp,price,sequence_number,sip_timestamp      ,size,tape,trf_id,trf_timestamp

FORTY,           ,0         ,11      ,  ,0                    ,12.13,54390          ,1063200772000000000,500 ,3   ,0,0
FORTY,,0,12,,0,12.18,54683,1063200774000000000,500,3,0,0
FORTY,,0,12,,0,12.02,164233,1063201566000000000,100,3,0,0
FORTY,,0,12,,0,12,164235,1063201566000000000,300,3,0,0
FORTY,,0,12,,0,12,186085,1063201737000000000,700,3,0,0
FORTY,,0,3,,0,12.01,186176,1063201738000000000,300,3,0,0
FORTY,,0,12,,0,12.02,405193,1063203807000000000,100,3,0,0
FORTY,,0,12,,0,12.01,405213,1063203808000000000,200,3,0,0
FORTY,,0,12,,0,12.011,451218,1063204318000000000,100,3,0,0
FORTY,,0,12,,0,12.01,1255112,1063218714000000000,1800,3,0,0
FORTY,,0,12,,0,12,1255113,1063218714000000000,200,3,0,0
FORTY,,0,12,,0,12,1257062,1063218737000000000,1200,3,0,0
FORTY,,0,3,,0,12,1257254,1063218739000000000,100,3,0,0
FORTY,15,0,12,,0,12,1743433,1063224092000000000,0,3,0,0
FORTY,15,0,3,,0,12,1746692,1063224185000000000,0,3,0,0
''')

print()
print("# and it was never traded as FORT or FORT.Y:")
print_aggs('FORT', '2003-09-10', '2025-06-11')
print_aggs('FORT.Y', '2003-09-10', '2025-06-11')

print()
print("# ticker details API also is wrong because it doesn't find FORTY before 2004-11-08:")
print_details('FORTY', '2004-11-01')
print_day_of_week('2004-11-01')
print()
print("# while there were trades on 2004-11-01:")
print_aggs('FORTY', '2004-11-01', '2004-11-01')
print()
print("# the same on 2003-09-10:")
print_details('FORTY', '2003-09-10')
print()
print("# trades:")
print_aggs('FORTY', '2003-09-10', '2003-09-10')

print()
print("# history collected from active tickers also is not good because FORTY appears in it only on 2004-11-08.")
print('history["BBG000JLM9R9"] = {')
pretty_print_map(history['BBG000JLM9R9'])
print('}')
print("# before 2004-11-08 there is no FORTY in tickers history. there is only FORT.Y for which there is no data in aggregates API.")

# ticker events API says that FORTY was renamed from FORT on 2004-11-08:

{'ticker_change': {'ticker': 'FORTY'}, 'type': 'ticker_change', 'date': '2004-11-08'}
{'ticker_change': {'ticker': 'FORT'}, 'type': 'ticker_change', 'date': '2003-09-10'}

# that is wrong because it was trading under ticker FORTY since the beginning. see in aggregates API:
FORTY from 2003-09-10 to 2003-09-10: [Agg(open=12.13, high=12.18, low=12, close=12, volume=6100, vwap=12.03, timestamp=1063166400000, transactions=13, otc=None)]
1063166400000 is 2003-09-10

# proof from flat files that FORTY was traded under ticker FORTY on 2003-09-10:
ticker,conditions,correction,exchange,id,participant_timestamp,price,sequence_number,sip_timestamp      ,size,tape,trf_id,trf_timestamp

FORTY,           ,0         ,11      ,  ,0                    ,12.13,54390          ,1063200772000000000,500 ,3   ,0,0
FORTY,,0,12,,0,12.18,54683,1063200774000000000,500,3,0,0
FORTY,,0,12,,0,12.02,164233,1063201566000000000,100,3,0,0
FORTY,,0,1

In [None]:
'''
the API looks unreliable, so for now my plan is to get the best possible data out of what is available.

the plan is to use the following algorithm:
1. take the history from the active tickers file.
2. start from the first active date in the history.
3. among all tickers for the same figi, choose the one that has the biggest USD volume.
also add a check: if the ticker that was traded yesterday is also traded today and its price differs from
the selected ticker's price by more than 5%, log it as a potential issue.

but the first step is probably to just dump all daily aggregates for all tickers in the history.
this will speed up the process because i won't need to query the API for each ticker every time there is doubt about
which ticker to prefer.
'''
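
a rough sketch of step 3 of the plan above, assuming the daily aggregates have already been dumped. `pick_primary_ticker` is a hypothetical helper, USD volume is approximated as close * volume (vwap * volume may be a better proxy where vwap is available), and the prices and volumes in the example are made up for illustration.

```python
def pick_primary_ticker(bars, previous=None, max_price_gap=0.05):
    """Choose the ticker with the biggest USD volume for one FIGI on one day.

    bars maps ticker -> (close, volume) for that day.
    previous is the ticker chosen on the previous day, if any.
    Returns (chosen_ticker, warnings).
    """
    if not bars:
        return None, []
    # USD volume approximated as close * volume
    chosen = max(bars, key=lambda t: bars[t][0] * bars[t][1])
    warnings = []
    if previous is not None and previous != chosen and previous in bars:
        prev_close = bars[previous][0]
        chosen_close = bars[chosen][0]
        # flag the switch if yesterday's ticker still trades at a price
        # that differs from the selected one by more than max_price_gap
        if chosen_close and abs(prev_close - chosen_close) / chosen_close > max_price_gap:
            warnings.append(
                f"{previous} close {prev_close} differs from {chosen} "
                f"close {chosen_close} by more than {max_price_gap:.0%}"
            )
    return chosen, warnings


# made-up numbers, only to exercise both branches
chosen, warnings = pick_primary_ticker(
    {"ABD": (10.0, 1000), "ACCOw": (12.0, 5000)}, previous="ABD")
print(chosen, warnings)

same, no_warnings = pick_primary_ticker({"FB": (100.0, 10)}, previous="FB")
print(same, no_warnings)
```

keeping the function pure (one day and one figi in, a choice plus warnings out) makes it easy to rerun over the dumped aggregates with different thresholds.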
