In [1]:
import os, re, wrds
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm_notebook as tqdm

In [2]:
os.chdir('..')

---

In [None]:
db = wrds.Connection()

In [53]:
ccm = db.get_table(library = 'crsp_a_ccm', table = 'ccmxpf_lnkhist')

The filtering steps below follow https://iangow.github.io/far_book/identifiers.html (quoted passages are taken from that source).

"One thing you will see is that there are cases where lpermno is NA, so “matching” these rows will result in non-matches, which is of no real value. The only value might be in determining whether the non-match has linktype of NR, which means that lack of a link has been “confirmed by research” (presumably by CRSP), or of NU, which means the link is “not yet confirmed” by research."

In [54]:
ccm = ccm.loc[ccm['lpermno'].notna()]
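
Before dropping the unlinked rows, it can be useful to see which linktype values they carry. The following is a minimal sketch on a made-up toy frame; the rows are hypothetical and only mimic the columns of ccmxpf_lnkhist.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for ccmxpf_lnkhist (not real CCM data)
toy = pd.DataFrame({
    'gvkey':    ['001000', '001000', '001001', '001002'],
    'linktype': ['NU', 'LU', 'NR', 'LC'],
    'lpermno':  [np.nan, 25881.0, np.nan, 10023.0],
})

# Tabulate the link types among rows that have no PERMNO
na_types = toy.loc[toy['lpermno'].isna(), 'linktype'].value_counts()

# Same filter as in the cell above: keep only rows with a PERMNO
linked = toy.loc[toy['lpermno'].notna()]
```

In this toy frame the unlinked rows are exactly the NR and NU rows, which matches the description in the quoted passage; on the real table this would need to be checked rather than assumed.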

"The cases where linktype is LD represent cases where two GVKEYs map to a single PERMNO at the same time and, according to WRDS, “this link should not be used.”"

In [55]:
ccm = ccm.loc[ccm['linktype'] != 'LD']

"The cases where linktype is LX represent cases where the security referred to on Compustat is one that trades on a foreign exchange and CRSP is merely “helpfully” linking to a different security that *is* found on CRSP."

In [56]:
ccm = ccm.loc[ccm['linktype'] != "LX"]

"The remaining category for discussion is where linktype is LN. These are cases where a link exists, but Compustat does not have price data to allow CRSP to check the quality of the link. While researcher discretion might be used to include these, most researchers appear to exclude these cases and we will do likewise. Given the above, we are only including cases where linktype is in LC (valid, researched link), LU (unresearched link), or LS (link valid for this lpermno only)."

In [57]:
ccm = ccm.loc[ccm['linktype'] != "LN"]
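
The three exclusions above (LD, LX, LN) can also be written as a single allow-list on LC, LU, and LS. A small sketch on hypothetical toy rows, assuming that rows without a PERMNO (NR, NU) have already been dropped and that no other link types occur:

```python
import pandas as pd

# Hypothetical toy rows, one per link type that survives the lpermno filter
toy = pd.DataFrame({
    'gvkey':    ['001000', '001001', '001002', '001003', '001004', '001005'],
    'linktype': ['LC', 'LU', 'LS', 'LD', 'LX', 'LN'],
})

# Exclusion-style filter, as in the cells above
excluded = toy.loc[~toy['linktype'].isin(['LD', 'LX', 'LN'])]

# Allow-list filter, as described in the quoted passage
allowed = toy.loc[toy['linktype'].isin(['LC', 'LU', 'LS'])]

# Both approaches keep the same rows on this toy frame
same_rows = excluded.index.equals(allowed.index)
```

The allow-list has the advantage of failing closed: an unexpected link type would be dropped rather than silently kept.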

"Now, let’s consider linkprim, which WRDS explains as follows:

`linkprim clarifies the link’s relationship to Compustat’s marked primary security within the related range. “P” indicates a primary link marker, as identified by Compustat in monthly security data. “C” indicates a primary link marker, as identified by CRSP to resolve ranges of overlapping or missing primary markers from Compustat in order to produce one primary security throughout the company history. “J” indicates a joiner secondary issue of a company, identified by Compustat in monthly security data.`


This suggests we should omit cases where linkprim equals J. Given that cases where linkprim equals N are duplicated links due to the existence of Canadian securities for a US-traded firm, we will exclude these too."

In [58]:
ccm = ccm.loc[ccm['linkprim'].isin(['C', 'P'])]

From here on, the code is my own:

In [59]:
ccm.head()

Unnamed: 0,gvkey,linkprim,liid,linktype,lpermno,lpermco,linkdt,linkenddt
2,1000,P,1,LU,25881.0,23369.0,1970-11-13,1978-06-30
4,1001,P,1,LU,10015.0,6398.0,1983-09-20,1986-07-31
8,1002,C,1,LC,10023.0,22159.0,1972-12-14,1973-06-05
11,1003,C,1,LU,10031.0,6672.0,1983-12-07,1989-08-16
14,1004,P,1,LU,54594.0,20000.0,1972-04-24,


In [60]:
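# Parse link start/end dates; an open-ended link has a missing linkenddt (NaT)
ccm['date_from'] = pd.to_datetime(ccm['linkdt'])
ccm['date_to'] = pd.to_datetime(ccm['linkenddt'])

# First fiscal year covered: links starting after June count from the next year
ccm['fyear_from'] = np.where(ccm['date_from'].dt.month <= 6,
                            ccm['date_from'].dt.year,
                            ccm['date_from'].dt.year + 1)

# Last fiscal year covered: links ending by June count through the previous year
# (NaT end dates yield NaN here and are filled in a later cell)
ccm['fyear_to'] = np.where(ccm['date_to'].dt.month <= 6,
                          ccm['date_to'].dt.year - 1,
                          ccm['date_to'].dt.year)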
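
To make the June cutoff concrete, here is a plain-Python sketch of the same mapping applied to two of the rows shown in the ccm.head() output above. The helper names first_fyear and last_fyear are hypothetical and exist only for this illustration.

```python
from datetime import date

def first_fyear(link_start):
    # Links starting January-June count from that year, otherwise from the next
    return link_start.year if link_start.month <= 6 else link_start.year + 1

def last_fyear(link_end):
    # Links ending January-June count through the previous year, otherwise that year
    return link_end.year - 1 if link_end.month <= 6 else link_end.year

# gvkey 1000: linked 1970-11-13 through 1978-06-30
years_1000 = list(range(first_fyear(date(1970, 11, 13)),
                        last_fyear(date(1978, 6, 30)) + 1))

# gvkey 1002: linked 1972-12-14 through 1973-06-05
years_1002 = list(range(first_fyear(date(1972, 12, 14)),
                        last_fyear(date(1973, 6, 5)) + 1))
```

Under this rule the first link covers fiscal years 1971 through 1977, while the second covers none (its first year, 1973, is later than its last year, 1972), which is why such rows are dropped in the next cell.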

In [65]:
# Drop links that cover no fiscal year; rows with a missing fyear_to compare as False and are kept
ccm = ccm.loc[~(ccm['fyear_from'] > ccm['fyear_to'])].reset_index(drop = True)

In [67]:
# Open-ended links (missing end date) are assumed to run through fiscal year 2024
ccm['fyear_to'] = ccm['fyear_to'].fillna(2024)

In [71]:
ccm = ccm[['gvkey', 'lpermno', 'fyear_from', 'fyear_to']]

In [74]:
between = ccm.to_dict(orient = 'index')

In [76]:
in_between = []

# For each link, list every fiscal year from fyear_from through fyear_to (inclusive)
for v in between.values():
    in_between.append(list(range(int(v['fyear_from']), int(v['fyear_to']) + 1)))

In [78]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
ccm = ccm.assign(fyear = pd.Series(in_between, index = ccm.index))

In [80]:
ccm = ccm.explode('fyear')

In [82]:
ccm = ccm.drop(columns = ['fyear_from', 'fyear_to'])
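
After exploding, it may be worth checking whether any gvkey maps to more than one lpermno in the same fiscal year, since overlapping links would create duplicate matches in a later merge. A minimal sketch on hypothetical toy rows (the second PERMNO is made up):

```python
import pandas as pd

# Hypothetical toy links: two links for one gvkey that overlap in fiscal year 1973
toy = pd.DataFrame({
    'gvkey':      ['001000', '001000'],
    'lpermno':    [25881.0, 25882.0],
    'fyear_from': [1971, 1973],
    'fyear_to':   [1973, 1974],
})

# Same expand-then-explode idea as above, written as a list comprehension
toy['fyear'] = [list(range(a, b + 1))
                for a, b in zip(toy['fyear_from'], toy['fyear_to'])]
long_form = toy.explode('fyear').drop(columns = ['fyear_from', 'fyear_to'])

# explode() leaves an object column and repeats the original index labels
long_form['fyear'] = long_form['fyear'].astype(int)
long_form = long_form.reset_index(drop = True)

# Rows where the same gvkey-fyear pair appears more than once
dupes = long_form.loc[long_form.duplicated(['gvkey', 'fyear'], keep = False)]
```

How to resolve such duplicates (for example by preferring one linkprim value) is a research decision that this sketch does not make.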

In [90]:
ccm.to_csv(os.path.join(os.getcwd(), '1_data', 'compustat_crsp_link.tsv'),
           sep = '\t', index = False)
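
When the saved file is read back, one thing to watch is the type of gvkey. Assuming gvkey is stored as a zero-padded string (as it typically is on WRDS), a default pd.read_csv would parse it as an integer and drop the leading zeros. The sketch below uses an in-memory buffer instead of the real file, and the funda frame with its at column is a hypothetical stand-in for Compustat data.

```python
import io
import pandas as pd

# Hypothetical link rows in the same layout as the saved file
link = pd.DataFrame({
    'gvkey':   ['001000', '001000'],
    'lpermno': [25881.0, 25881.0],
    'fyear':   [1971, 1972],
})

# Write to an in-memory buffer with the same options as the cell above
buffer = io.StringIO()
link.to_csv(buffer, sep = '\t', index = False)

# Default parsing turns gvkey into an integer
buffer.seek(0)
naive = pd.read_csv(buffer, sep = '\t')

# Forcing a string dtype preserves the zero-padded key
buffer.seek(0)
typed = pd.read_csv(buffer, sep = '\t', dtype = {'gvkey': str})

# Hypothetical Compustat-style firm-year, merged on gvkey and fyear
funda = pd.DataFrame({'gvkey': ['001000'], 'fyear': [1972], 'at': [100.0]})
merged = funda.merge(typed, on = ['gvkey', 'fyear'], how = 'left')
```

A left merge keeps firm-years without a link, so unmatched rows show up as a missing lpermno rather than disappearing.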