# Project Guidance: Noisy Prices and Return-based Anomalies in Corporate Bonds
- Task: Replicate Table 1
- Data used: FINRA TRACE
- Available code and data: https://openbondassetpricing.com/code/

## Step 1: Bond Data Processing for Price, Volume, and Illiquidity Analysis
This Python script extracts, cleans, and summarizes corporate bond data from WRDS. It pulls bond characteristics from Mergent FISD and trade records from Enhanced TRACE, then computes three sets of daily measures for each bond: prices, volumes, and illiquidity metrics.

### Initialization and Environment Setup
This section imports the necessary libraries, suppresses warnings with the warnings module, and reads the data and output paths from the config module.

In [None]:
import pandas as pd
import numpy as np
import wrds
from itertools import chain
import datetime as dt
import warnings
warnings.filterwarnings("ignore")

import config
from pathlib import Path
OUTPUT_DIR = Path(config.OUTPUT_DIR)
DATA_DIR = Path(config.DATA_DIR)
WRDS_USERNAME = config.WRDS_USERNAME

Libraries like pandas and numpy are imported for data manipulation, while wrds is used for database access. Warning suppression and directory paths are set up for efficient data handling.

### Connecting to WRDS Database
This snippet establishes a connection to the WRDS database using the username provided in the configuration file.

In [None]:
db = wrds.Connection(wrds_username=WRDS_USERNAME)

The connection to the WRDS database is essential for querying and downloading bond data directly from their platform.

### Downloading and Processing Mergent Files
Queries the WRDS database to download issuer and issue data, merging them based on issuer_id.

In [None]:
fisd_issuer = db.raw_sql("""SELECT issuer_id,country_domicile FROM fisd.fisd_mergedissuer""")
fisd_issue = db.raw_sql("""SELECT ... FROM fisd.fisd_mergedissue""")
fisd = pd.merge(fisd_issue, fisd_issuer, on=['issuer_id'], how="left")

This step retrieves and combines issuer and issue data from WRDS, creating a comprehensive dataset for bond analysis.

### Filtering Bond Data
Applies filters to the merged dataset to retain bonds meeting specific criteria, such as being US-based and non-convertible.

In [None]:
fisd = fisd[(fisd.country_domicile == 'USA') & (fisd.convertible == 'N')]
...
fisd = fisd[~fisd.dated_date.isnull()]

Filters are used to refine the bond dataset, focusing on bonds of interest based on predefined criteria.

### Saving the Processed FISD Data
The filtered dataset is saved to a specified directory for future use, ensuring the processed data is accessible for analysis or further processing.

In [None]:
path = './' / Path(DATA_DIR) / "pulled" / "fisd.csv"
fisd.to_csv(path, index=False)

The processed and filtered bond data is stored in a CSV file, making it readily available for subsequent analysis or steps in the processing pipeline.

### Preparing and Cleaning Data for Analysis
Additional preparation and cleaning steps are undertaken, including ensuring unique identifiers for bonds and dividing the dataset into manageable chunks for processing.

In [None]:
CUSIP_Sample = list(fisd['complete_cusip'].unique())
cusip_chunks = list(divide_chunks(CUSIP_Sample, 500))

This section involves further preparation of the bond data, including deduplication and segmentation into smaller chunks for efficient processing.
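
The helper divide_chunks is called above but is not defined in this excerpt. A minimal sketch of what such a helper could look like, assuming it simply yields consecutive slices of at most n items (the body below is an assumption, not the original implementation):

```python
def divide_chunks(items, n):
    """Yield consecutive slices of `items`, each holding at most `n` elements.

    Hypothetical implementation; the original helper is not shown in the text.
    """
    for start in range(0, len(items), n):
        yield items[start:start + n]


# Five items split into chunks of two; the last chunk is shorter.
chunks = list(divide_chunks(list(range(5)), 2))
```

With a chunk size of 500, each TRACE query in the next step would then receive at most 500 CUSIPs.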

### Iterative Data Cleaning and Analysis for Prices, Volume, and Illiquidity
Processes each data chunk to calculate daily prices, volumes, and illiquidity metrics.

In [None]:
for i in range(0, len(cusip_chunks)):
    trace = db.raw_sql('SELECT ... FROM trace.trace_enhanced WHERE cusip_id in %(cusip_id)s', params=parm)
    ...
    # Calculate Equal-Weighted Prices
    prc_EW = trace.groupby(['cusip_id', 'trd_exctn_dt'])['rptd_pr'].mean().to_frame(name='prc_ew')
    # Calculate Volume-Weighted Prices
    trace['dollar_vol'] = trace['entrd_vol_qt'] * trace['rptd_pr'] / 100
    prc_VW = trace.groupby(['cusip_id', 'trd_exctn_dt']).apply(lambda x: np.sum(x['rptd_pr'] * (x['entrd_vol_qt'] / x['entrd_vol_qt'].sum()))).to_frame(name='prc_vw')
    # Calculate Volumes
    VolumesAll = trace.groupby(['cusip_id', 'trd_exctn_dt'])['entrd_vol_qt'].sum().to_frame(name='volume')
    # Calculate Illiquidity Measures
    bid_ask_spread = calculate_bid_ask_spread(trace)

This section performs comprehensive data cleaning and calculates key financial metrics, including prices (equal-weighted and volume-weighted), trading volumes, and illiquidity measures such as the bid-ask spread.
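
To make the difference between the two price measures concrete, the sketch below computes both on a toy frame with made-up trades. It uses the same column names as the loop above but replaces the groupby-apply with two grouped sums, which gives the same volume-weighted average and is usually faster on large TRACE extracts. The helper column pr_x_vol is introduced only for this illustration.

```python
import pandas as pd

# Three made-up trades in one bond on one day.
trace = pd.DataFrame({
    'cusip_id': ['A', 'A', 'A'],
    'trd_exctn_dt': pd.to_datetime(['2020-01-02'] * 3),
    'rptd_pr': [100.0, 102.0, 104.0],
    'entrd_vol_qt': [1000.0, 1000.0, 2000.0],
})
keys = ['cusip_id', 'trd_exctn_dt']

# Equal-weighted price: plain average of the reported prices.
prc_ew = trace.groupby(keys)['rptd_pr'].mean()

# Volume-weighted price: sum(price * volume) / sum(volume) per bond-day.
trace['pr_x_vol'] = trace['rptd_pr'] * trace['entrd_vol_qt']
sums = trace.groupby(keys)[['pr_x_vol', 'entrd_vol_qt']].sum()
prc_vw = sums['pr_x_vol'] / sums['entrd_vol_qt']
```

Here the larger trade at 104 pulls the volume-weighted price (102.5) above the equal-weighted price (102.0).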

### Saving Final Analysis Results
Stores the calculated prices, volumes, and illiquidity metrics for further analysis or reporting.

In [None]:
PricesExport.to_csv('./' / Path(DATA_DIR) / "pulled" / 'Prices.csv.gzip', compression='gzip')
VolumeExport.to_csv('./' / Path(DATA_DIR) / "pulled" / 'Volumes.csv.gzip', compression='gzip')
IlliqExport.to_csv('./' / Path(DATA_DIR) / "pulled" / 'Illiq.csv.gzip', compression='gzip')

This code saves the results as gzip-compressed CSV files. Together, the three outputs give daily prices, trading volumes, and illiquidity measures for each bond.
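
One practical detail when reading these files back: pandas infers compression from extensions such as .gz, but not from .gzip, so compression='gzip' has to be passed explicitly. The round trip below is a small sketch on a made-up one-row frame written to a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

prices = pd.DataFrame({'cusip_id': ['A'], 'prc_ew': [101.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'Prices.csv.gzip'
    prices.to_csv(path, compression='gzip', index=False)
    # The .gzip suffix is not auto-detected, so state the compression explicitly.
    restored = pd.read_csv(path, compression='gzip')
```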

## Step 2: Converting WRDS Bond Ratings to Categorical Scores
This Python script fetches bond ratings from the WRDS (Wharton Research Data Services) database, specifically Moody's and Standard & Poor's (S&P) ratings. It maps each rating to a numeric score and then groups the scores into three broad quality categories: 'A and above', 'BBB', or 'Junk'. The script then cleans and saves the processed data for subsequent analysis.

### Initialization and Setup
The script begins by importing necessary libraries, suppressing warnings, and setting up directories for output.

In [None]:
import pandas as pd
from dateutil.relativedelta import *
from pandas.tseries.offsets import *
import datetime as datetime
import wrds
import warnings
warnings.filterwarnings("ignore")

import config
from pathlib import Path
OUTPUT_DIR = Path(config.OUTPUT_DIR)
DATA_DIR = Path(config.DATA_DIR)
WRDS_USERNAME = config.WRDS_USERNAME

### Ratings Conversion Mappings
Although mappings for S&P and Moody's ratings to numeric values are defined, the primary goal is to categorize these ratings into qualitative buckets.

In [None]:
sp_rating_mapping = { "AAA": 1, "AA+": 2, ..., "D":22 }
moody_rating_mapping = { "Aaa": 1, "Aa1": 2, ..., "C": 21 }

### Categorizing Numeric Ratings
The script categorizes numeric ratings into 'A and above', 'BBB', or 'Junk', facilitating easier qualitative analysis.

In [None]:
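def rating_to_category(rating):
    # Cutoffs assume the standard notch scale implied by the mappings above:
    # 1-7 = AAA..A- (Aaa..A3), 8-10 = BBB+..BBB- (Baa1..Baa3), 11+ = speculative grade.
    if pd.isna(rating):
        return None
    if 0 <= rating <= 7:
        return 'A and above'
    elif 8 <= rating <= 10:
        return 'BBB'
    else:
        return 'Junk'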

### Filtering and Processing Ratings
Defines functions to filter the DataFrame for S&P and Moody's ratings, convert them using the mappings, remove duplicates, and assign categories based on the converted ratings.

In [None]:
def get_sp_rating(df):
    # .copy() avoids modifying a view of df when the new columns are added
    rat = df[df['rating_type'] == "SPR"].copy()
    rat["spr"] = rat["rating"].map(sp_rating_mapping)
    rat['category'] = rat['spr'].apply(rating_to_category)
    # Keep the first rating per issue and rating date
    return rat.drop_duplicates(subset=['issue_id', 'rating_date'])

def get_moody_rating(df):
    rat = df[df['rating_type'] == "MR"].copy()
    rat["mr"] = rat["rating"].map(moody_rating_mapping)
    rat['category'] = rat['mr'].apply(rating_to_category)
    return rat.drop_duplicates(subset=['issue_id', 'rating_date'])
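
Two behaviors of this filter-map-deduplicate pattern are easy to miss: rating codes that are absent from the mapping become NaN, and drop_duplicates keeps only the first row for each issue and date. The sketch below shows both on a made-up frame. The mapping sp_subset holds only the two entries spelled out in the text, so it is a stand-in for the full mapping, and the sample rows are invented for illustration.

```python
import pandas as pd

# Stand-in for the full mapping: only the two entries shown in the text.
sp_subset = {"AAA": 1, "AA+": 2}

df = pd.DataFrame({
    'issue_id': [10, 10, 10, 11],
    'rating_type': ['SPR', 'SPR', 'MR', 'SPR'],
    'rating_date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-02-01'],
    'rating': ['AAA', 'AA+', 'Aaa', 'NR'],
})

rat = df[df['rating_type'] == 'SPR'].copy()
rat['spr'] = rat['rating'].map(sp_subset)  # 'NR' is not in the mapping -> NaN
rat = rat.drop_duplicates(subset=['issue_id', 'rating_date'])  # keeps the first row
```

Issue 10 has two S&P rows on the same date, and only the first ('AAA') survives. Issue 11 keeps its 'NR' row with a missing score.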

### Merging and Cleaning the Data
Combines the processed S&P and Moody's ratings, fills missing S&P scores with the Moody's score, and keeps one rating per issue and rating date. Non-rated and special codes are removed later, in the main execution block.

In [None]:
def concat_moody_sp(ratsp, ratsmd):
    df = pd.concat([ratsp, ratsmd], axis=0)
    df['spr'] = df['spr'].fillna(df['mr'])
    return df.drop_duplicates(subset=['issue_id', 'rating_date'])
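
Because the S&P rows are concatenated first, the deduplication keeps the S&P rating whenever both agencies rate an issue on the same date. The sketch below illustrates this on made-up scores. It adds ignore_index=True, which is not in the function above, so that the toy frames (whose row labels overlap) get unique labels after concatenation.

```python
import pandas as pd

ratsp = pd.DataFrame({
    'issue_id': [1], 'rating_date': ['2020-01-01'], 'spr': [9.0],
})
ratsmd = pd.DataFrame({
    'issue_id': [1, 2], 'rating_date': ['2020-01-01', '2020-03-01'], 'mr': [8.0, 12.0],
})

df = pd.concat([ratsp, ratsmd], axis=0, ignore_index=True)
df['spr'] = df['spr'].fillna(df['mr'])  # use Moody's score where S&P is missing
combined = df.drop_duplicates(subset=['issue_id', 'rating_date'])
```

Issue 1 keeps its S&P score of 9, while issue 2, which only Moody's rates, ends up with a score of 12.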

### Execution and Data Saving
The main execution block connects to WRDS, downloads rating data, merges with issue IDs, processes ratings for both S&P and Moody's, combines them, filters out non-relevant ratings, and saves the categorized ratings.

In [None]:
if __name__ == "__main__":
    db = wrds.Connection(wrds_username=WRDS_USERNAME)
    rat_raw = db.raw_sql("""SELECT issue_id, rating_type, rating_date, rating FROM fisd.fisd_ratings""")
    # Named issue_ids rather than id, which would shadow the Python builtin
    issue_ids = db.raw_sql("""SELECT complete_cusip, issue_id, offering_date FROM fisd.fisd_mergedissue""")

    rat = pd.merge(rat_raw, issue_ids, how='inner', on='issue_id')
    ratsp = get_sp_rating(rat[rat['rating_type'] == "SPR"])
    ratsmd = get_moody_rating(rat[rat['rating_type'] == "MR"])

    rating = concat_moody_sp(ratsp, ratsmd)
    rating = rating[~rating['rating'].isin(["NR", 'NR/NR', 'SUSP', 'P-1', '0', 'NAV'])]

    rating.sort_values(['complete_cusip', 'rating_date'], inplace=True)
    rating.to_csv(Path(DATA_DIR) / "pulled" / 'rating.csv', index=False)

This script successfully categorizes bond ratings into three broad quality categories based on Moody's and S&P's ratings, providing a simplified but insightful view of bond credit quality for further analysis.

## Step 3: Downloading and Processing Bond Market Data
This script downloads a compressed dataset of daily bond market data (the file dated December 2023) from a public source, extracts the contents, and prepares the data for analysis. The data is then loaded into a pandas DataFrame, with some initial cleaning applied to standardize column names and formats.

### Downloading the Dataset
The script uses the requests library to fetch a compressed (.zip) file containing the bond market data from a specified URL.

In [None]:
import requests

file_url = 'https://openbondassetpricing.com/wp-content/uploads/2023/12/BondDailyPublicDec2023.csv.zip'
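response = requests.get(file_url)
response.raise_for_status()  # stop on an HTTP error instead of saving an error page as the zip file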

with open('BondDailyPublicDec2023.csv.zip', 'wb') as file:
    file.write(response.content)

print('Download completed!')

This part of the script sends a GET request to the specified URL to download the file. It then saves the file locally, indicating completion with a print statement.

### Extracting the Dataset
After downloading, the script uses the zipfile library to extract the contents of the .zip file to a specified directory.

In [None]:
import zipfile

with zipfile.ZipFile('BondDailyPublicDec2023.csv.zip', 'r') as zip_ref:
    zip_ref.extractall('data')

print('Extraction completed! The file is now under folder data')

This segment opens the downloaded .zip file in read mode and extracts all files within it to the 'data' directory. It confirms completion by printing a message.
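
Since the loading step depends on the exact name of the file inside the archive, it can help to list the archive members before extracting. The sketch below builds a small throwaway archive in a temporary directory to show the calls. The member name inner_file.csv and its contents are made up and say nothing about what the real download contains.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / 'example.zip'

    # Build a tiny archive so the example runs without a download.
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('inner_file.csv', 'cusip_id,prclean\nA,101.5\n')

    with zipfile.ZipFile(archive, 'r') as zf:
        member_names = zf.namelist()  # inspect contents before extracting
        zf.extractall(Path(tmp) / 'data')

    extracted_ok = (Path(tmp) / 'data' / 'inner_file.csv').exists()
```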

### Loading and Initial Data Cleaning
Finally, the script loads the extracted CSV file into a pandas DataFrame, performs some initial cleaning steps such as converting column names to lowercase and parsing dates, and removes any unnecessary columns.

In [None]:
import pandas as pd

data = pd.read_csv(r'data/BondDailyPublic.csv.gzip', compression='gzip')
data.columns = data.columns.str.lower()
data['trd_exctn_dt'] = pd.to_datetime(data['trd_exctn_dt'])

if 'unnamed: 0' in data.columns:
    data.drop(columns=['unnamed: 0'], inplace=True)

The data is read into a DataFrame with column names converted to lowercase for consistency. The script also converts the trading execution date column to datetime format for easier manipulation. If there are any unnamed columns (often artifacts from the CSV format or index columns), these are removed to clean the dataset further.

## Step 4: Enhancing Bond Market Data Analysis: Processing for Daily Returns and Credit Spread
### Overview
This Python script is one stage of a larger bond market analysis workflow. Specifically, it:

1. Accepts bond market data generated by a previous script (load_return_cs.py).
2. Cleans and filters the data based on specific conditions derived from academic research.
3. Calculates daily returns and credit spreads for corporate bonds, outputting the refined data for further analysis.

### Initial Data Preparation
The script starts by importing necessary libraries and setting up environment configurations, including data directories and warning suppression.

In [None]:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
import numpy as np
from scipy.stats.mstats import winsorize
import datetime as dt
import warnings
warnings.filterwarnings("ignore")

import config
from pathlib import Path

OUTPUT_DIR = Path(config.OUTPUT_DIR)
DATA_DIR = Path(config.DATA_DIR)
WRDS_USERNAME = config.WRDS_USERNAME
START_DATE = config.START_DATE
END_DATE = config.END_DATE

### Extracting Price and Credit Spread Information
The first function, extract_price_cs, processes the raw data by standardizing column names, converting dates, and extracting essential information for analysis.

In [None]:
def extract_price_cs(data):
    data.columns = data.columns.str.lower()
    data['trd_exctn_dt'] = pd.to_datetime(data['trd_exctn_dt'])
    if 'unnamed: 0' in data.columns:
        data.drop(columns=['unnamed: 0'], inplace=True)

    data.sort_values(by=['cusip_id', 'trd_exctn_dt'], inplace=True)
    df_return_cs = data[['cusip_id','trd_exctn_dt','prclean','cs_dur']]

    return df_return_cs

### Filtering Based on Business Days
Next, filter_less_than_five_busn_days drops observations that occur more than five business days after the bond's previous trade, so the analysis only includes bonds with regular trading activity.

In [None]:
def filter_less_than_five_busn_days(data):
    data['days_since_last_trade'] = data.groupby('cusip_id')['trd_exctn_dt'].diff().dt.days.fillna(0).astype(int)
    data['business_days_since_last_trade'] = data.apply(calculate_business_days, axis=1)
    data = data[data['business_days_since_last_trade'] <= 5]
    return data
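
The helper calculate_business_days is called above but is not defined in this excerpt. One way to obtain the same quantity is sketched below, assuming the intent is to count business days between consecutive trades of the same bond while skipping weekends and US federal holidays. The function name business_days_since_last_trade is hypothetical, and the vectorized np.busday_count approach is a substitute for the row-wise apply, not the original implementation.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def business_days_since_last_trade(data):
    """Add the business-day gap to each bond's previous observation (0 for the first)."""
    data = data.sort_values(['cusip_id', 'trd_exctn_dt']).reset_index(drop=True)
    prev = data.groupby('cusip_id')['trd_exctn_dt'].shift(1)
    has_prev = prev.notna().to_numpy()

    holidays = USFederalHolidayCalendar().holidays(
        start=data['trd_exctn_dt'].min(), end=data['trd_exctn_dt'].max()
    ).values.astype('datetime64[D]')

    gaps = np.zeros(len(data), dtype=int)
    # busday_count counts valid days in [previous trade, current trade).
    gaps[has_prev] = np.busday_count(
        prev[has_prev].to_numpy().astype('datetime64[D]'),
        data.loc[has_prev, 'trd_exctn_dt'].to_numpy().astype('datetime64[D]'),
        holidays=holidays,
    )
    data['business_days_since_last_trade'] = gaps
    return data


example = pd.DataFrame({
    'cusip_id': ['A', 'A', 'A'],
    'trd_exctn_dt': pd.to_datetime(['2024-01-05', '2024-01-08', '2024-01-17']),
})
result = business_days_since_last_trade(example)
```

In this example the gap from Friday 5 January to Monday 8 January is one business day, and the gap from 8 January to 17 January is six business days because the weekend and the 15 January federal holiday are skipped. The last row would therefore be dropped by a filter of at most five business days.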

### Monthly Trade Volume Filtering
The script further filters out bonds with fewer than five trades per month to focus on more liquid assets.

In [None]:
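def filter_less_than_five_trades_per_months(data):
    data['year_month'] = data['trd_exctn_dt'].dt.to_period('M')
    eligible_bonds = data.groupby(['cusip_id', 'year_month']).size().reset_index(name='monthly_trades')
    # Keep only bond-months with at least five observations; without this step the merge keeps every row
    eligible_bonds = eligible_bonds[eligible_bonds['monthly_trades'] >= 5]
    data = data.merge(eligible_bonds[['cusip_id', 'year_month']], on=['cusip_id', 'year_month'], how='inner')
    return data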
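
The same bond-month filter can be written without a merge by broadcasting the group size back onto each row. The sketch below is an alternative formulation on made-up data, not the function above. It counts rows per bond and month, which in a daily panel corresponds to the number of days with a trade.

```python
import pandas as pd

dates = pd.to_datetime([
    '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07', '2020-01-08',  # bond A
    '2020-01-02', '2020-01-03',                                            # bond B
])
data = pd.DataFrame({'cusip_id': ['A'] * 5 + ['B'] * 2, 'trd_exctn_dt': dates})

data['year_month'] = data['trd_exctn_dt'].dt.to_period('M')
# transform('size') returns the group size aligned to the original rows.
data['monthly_trades'] = (
    data.groupby(['cusip_id', 'year_month'])['trd_exctn_dt'].transform('size')
)
kept = data[data['monthly_trades'] >= 5]
```

Bond A has five observations in January 2020 and is kept, while bond B has two and is dropped.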

### Calculating Daily Returns and Handling Reversals
The final processing step involves calculating daily returns, removing significant reversals, and converting returns to basis points.

In [None]:
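def calc_daily_returns_remove_large_reversals(data):
    data['daily_return'] = data.groupby('cusip_id')['prclean'].pct_change()
    # Previous return of the same bond, needed to detect sign reversals
    data['previous_return'] = data.groupby('cusip_id')['daily_return'].shift(1)
    # Drop returns of 20% or more that reverse the sign of the previous return
    data = data[~((abs(data['daily_return']) >= 0.2) & (data['daily_return'] * data['previous_return'] < 0))]
    # Drop remaining returns above 20% in absolute value (this also drops rows with a missing return)
    data = data[abs(data['daily_return']) <= 0.2].copy()
    data['daily_return'] = data['daily_return'] * 10000
    data.rename(columns={'daily_return': 'daily_return_bps', 'cs_dur': 'cs_dur_bps'}, inplace=True)
    return data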
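
A small worked example helps to see which rows these filters remove. The sketch below applies the same two conditions to a made-up price path and assumes the previous return is the bond's own lagged daily return. The names is_large, is_reversal, and kept are introduced for this illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    'cusip_id': ['A'] * 4,
    'prclean': [100.0, 130.0, 100.0, 101.0],
})
data['daily_return'] = data.groupby('cusip_id')['prclean'].pct_change()
data['previous_return'] = data.groupby('cusip_id')['daily_return'].shift(1)

# A large return (at least 20%) with the opposite sign of the previous return.
is_large = data['daily_return'].abs() >= 0.2
is_reversal = is_large & (data['daily_return'] * data['previous_return'] < 0)

kept = data[~is_reversal & (data['daily_return'].abs() <= 0.2)].copy()
kept['daily_return_bps'] = kept['daily_return'] * 10000
```

The jump from 100 to 130 is removed by the 20% cap, the drop back to 100 is flagged as a reversal, and the first row is removed because its return is missing. Only the final 1% move survives, which is about 100 basis points.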

### Execution
The script reads the initial dataset, applies the defined processing functions sequentially, and saves the cleaned and enhanced dataset to a CSV file for further analysis.

In [None]:
if __name__ == "__main__":
    raw_price_cs = pd.read_csv(Path(DATA_DIR) / "pulled" /'BondDailyPublic.csv.gzip', compression='gzip')
    df = extract_price_cs(raw_price_cs)
    df_filter_less_five = filter_less_than_five_busn_days(df)
    df_less_five_trade = filter_less_than_five_trades_per_months(df_filter_less_five)
    df_final = calc_daily_returns_remove_large_reversals(df_less_five_trade)
    df_final.to_csv(Path(DATA_DIR) / "pulled" / 'daily_return_cs.csv', index=False)

This structured approach ensures that the bond market data is cleaned, filtered, and enhanced accurately and efficiently, making it ready for in-depth financial analysis or modeling.

## Step 5: Integrating and Analyzing Bond Market Data with Ratings
This Python script performs advanced processing on bond market data, integrating various sources including trading data, bid-ask spread, returns, and bond ratings. The goal is to prepare a comprehensive dataset for in-depth analysis, specifically focusing on calculating daily returns, credit spreads, and correlating these with bond ratings. The final output includes a LaTeX table summarizing key statistics across different market periods.

### Initial Setup and Data Preparation
The script begins by importing necessary libraries and configuring environment settings, including output and data directories.

In [None]:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
import numpy as np
from scipy.stats.mstats import winsorize
import datetime as dt
import warnings
warnings.filterwarnings("ignore")

import config
from pathlib import Path

OUTPUT_DIR = Path(config.OUTPUT_DIR)
DATA_DIR = Path(config.DATA_DIR)
WRDS_USERNAME = config.WRDS_USERNAME
START_DATE = config.START_DATE
END_DATE = config.END_DATE

### Processing Rating Data
process_rating_data prepares the raw rating data for merging by renaming columns and sorting.

In [None]:
def process_rating_data(df):
    df = df[['complete_cusip','rating_date', 'rating','category']].sort_values(by=['complete_cusip','rating_date']).reset_index(drop=True)
    df = df.rename(columns = {'complete_cusip':'cusip_id', 'rating_date':'date'})
    return df

### Merging DataFrames
merge_df combines two datasets, such as bid-ask spread and returns, matching by bond ID and date.

In [None]:
def merge_df(df1, df2):
    # rename() ignores missing columns by default, so no try/except is needed
    df1 = df1.rename(columns={'trd_exctn_dt': 'date'})
    df2 = df2.rename(columns={'trd_exctn_dt': 'date'})

    res_df = pd.merge(df1, df2, on=['cusip_id', 'date'], how='inner')
    return res_df

### Integrating Ratings with Market Data
merge_rating aligns processed market data with rating data, ensuring ratings are forward-filled to match trade dates.

In [None]:
def merge_rating(data, rating):
    data['source'] = 'A'
    rating['source'] = 'R'

    df = pd.concat([data,rating],axis=0).sort_values(by=['cusip_id','date','source'])
    df_filled = df.groupby('cusip_id').apply(lambda group: group.ffill()).reset_index(drop=True)
    # Combine both conditions into one mask instead of chaining two boolean filters
    keep = (df_filled['source'] == 'A') & df_filled['category'].notna()
    df_filled = df_filled[keep].reset_index(drop=True).drop(columns=['source', 'rating'])

    return df_filled
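
The concatenate, sort, and forward-fill pattern above is effectively an as-of join: each trade receives the most recent earlier rating for the same bond. For comparison, the sketch below expresses that idea with pd.merge_asof on made-up data. It is an alternative technique, not the function above. Setting allow_exact_matches=False mirrors the sort order used above, where a trade row ('A') sorts before a rating row ('R') with the same date and therefore does not pick up a same-day rating.

```python
import pandas as pd

trades = pd.DataFrame({
    'cusip_id': ['X', 'Y', 'X'],
    'date': pd.to_datetime(['2020-01-10', '2020-01-15', '2020-02-10']),
    'daily_return_bps': [5.0, -2.0, 1.0],
})
ratings = pd.DataFrame({
    'cusip_id': ['X', 'X', 'Y'],
    'date': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-02-01']),
    'category': ['BBB', 'Junk', 'A and above'],
})

# Both inputs must be sorted by the `on` key; `by` restricts matches to the same bond.
merged = pd.merge_asof(
    trades.sort_values('date'),
    ratings.sort_values('date'),
    on='date',
    by='cusip_id',
    direction='backward',
    allow_exact_matches=False,
).dropna(subset=['category'])
```

Bond X picks up 'BBB' for its January trade and 'Junk' for its February trade. Bond Y trades before its first rating, so that row has no category and is dropped.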

### Deriving Summary Statistics Table
derive_table calculates mean values for spread, bias, daily returns, and credit spreads across various market periods, producing a table that is later exported to LaTeX for publication or further analysis.

In [None]:
def derive_table(res_df):
    df = res_df.copy()
    df['date'] = pd.to_datetime(df['date'])
    # Subsample definitions
    subsamples = {...}

    mean_values = {}
    for subsample, (start_date, end_date) in subsamples.items():
        ...
    for k,v in mean_values.items():
        ...
    res_df['variables'] = res_df.index
    res_df = res_df.set_index(['category','variables'])

    return res_df
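
The loop bodies in derive_table are elided, so the sketch below only illustrates the general pattern of averaging a variable by rating category within date windows. The window names and dates are placeholders chosen for this example and are not the subsample periods used by the script or the paper. The data are made up as well.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2005-01-03', '2005-01-04', '2009-01-05', '2009-01-06']),
    'category': ['BBB', 'Junk', 'BBB', 'BBB'],
    'daily_return_bps': [1.0, 3.0, 5.0, 7.0],
})

# Placeholder windows for illustration only; not the actual subsample definitions.
subsamples = {
    'window_1': ('2005-01-01', '2005-12-31'),
    'window_2': ('2009-01-01', '2009-12-31'),
}

means = {}
for name, (start, end) in subsamples.items():
    in_window = df['date'].between(pd.Timestamp(start), pd.Timestamp(end))
    means[name] = df.loc[in_window].groupby('category')['daily_return_bps'].mean()

# One column per window, one row per rating category.
table = pd.concat(means, axis=1)
```

Categories with no observations in a window appear as missing values in the resulting table.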

### Main Execution
The script reads in trading data, spread-bias data, and ratings, processes each, and merges them to create a unified dataset. It then computes summary statistics and formats them into a LaTeX table, saved for easy inclusion in documents.

In [None]:
if __name__ == "__main__":
    # Loading data
    spreadbias, ret_cs, rating = ...
    # Processing and merging data
    all_df = merge_df(spreadbias, ret_cs)
    all_df = merge_rating(all_df, rating)
    # Deriving the summary table and saving as LaTeX
    res_df = derive_table(all_df)
    with open(Path(OUTPUT_DIR) / 'derived_table.tex', "w") as text_file:
        text_file.write(res_df.to_latex(float_format=lambda x: '{:.3f}'.format(x)))

This script showcases a comprehensive approach to bond market data analysis, integrating diverse data sources to derive insightful metrics and summarizing the findings in a format suitable for academic publication.

## Conclusion

In this workflow, we combined bond trading data, bid-ask spreads, daily returns, credit spreads, and bond ratings from WRDS (TRACE and Mergent FISD) and openbondassetpricing.com. Our objective was to build a cleaned dataset that links these market measures to the credit quality indicated by bond ratings.

Starting from the initial setup that involved importing necessary libraries and setting up configurations, we meticulously processed the raw data. This included standardizing column names, ensuring consistent date formats, and sorting data for efficient processing. We integrated key pieces of information such as bond IDs, trading dates, prices, and credit spreads into a cohesive dataset.

Through a series of functions, we merged trading data with bid-ask spreads and returns, while another critical step involved integrating the processed market data with bond ratings. This integration allowed us to forward-fill ratings to match the trading dates, ensuring each bond's financial metrics were accurately paired with its credit rating.

The workflow was designed to filter out irrelevant or insufficiently traded bonds, focusing on those with a minimum trading frequency. This ensured the reliability and relevance of our analysis. Additionally, we calculated daily returns, adjusted for extreme values to avoid skewing the analysis, and converted these returns and credit spreads into basis points for a standardized comparison.

Ultimately, we derived a comprehensive table summarizing key financial metrics across different market periods, such as the full sample period, pre-crisis, crisis, post-crisis, and more. This table, formatted in LaTeX for easy inclusion in academic publications, provides a clear overview of the bond market's behavior over time, segmented by credit quality categories.

In conclusion, the pipeline produces a cleaned daily bond dataset and a summary table that relates spreads, returns, and credit spreads to credit rating categories, which is the basis for replicating Table 1.