# Analysing the best performing model using Captum to understand the decision making process from the model

##### This notebook follows the official tutorial for BERT models from Captum as can be found in this link: https://captum.ai/tutorials/Bert_SQUAD_Interpret

In [45]:
import pandas as pd
import torch
from captum.attr import LayerIntegratedGradients, visualization as viz
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig

# Prepare data and load model

In [46]:
# Load prediction and test data
model_pred_df = pd.read_csv(
    '../2. models/predictions/nickmuchi-sec-bert-finetuned-finance-classification_False_FinancialStrengthText_MAMConfig_2024-04-25_predictions.csv')
test_df = pd.read_csv('../1. data/final/test.csv')

# Identify the indices of texts with the three lowest and highest prediction values
lowest_indices = model_pred_df['1'].nsmallest(3).index
highest_indices = model_pred_df['1'].nlargest(3).index

# Retrieve the actual texts corresponding to these indices for analysis
lowest_texts = test_df.loc[lowest_indices, 'FinancialStrengthText']
highest_texts = test_df.loc[highest_indices, 'FinancialStrengthText']

In [47]:
# Set up the computation device for PyTorch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [48]:
# Load the pre-trained model and its configuration from a checkpoint
checkpoint_path = '../results/llm_fine-tine_MAMConfig/nickmuchi-sec-bert-finetuned-finance-classification/checkpoint-276'
config = AutoConfig.from_pretrained(checkpoint_path)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path, config=config)
model.to(device)
model.eval()  # Set model to evaluation mode
model.zero_grad()  # Zero gradients to ready for new computation

In [49]:
# Load tokenizer to convert text into format suitable for the model
tokenizer = AutoTokenizer.from_pretrained('nickmuchi/sec-bert-finetuned-finance-classification')

# Define special token IDs that will be used to format input data properly
ref_token_id = tokenizer.pad_token_id  # Padding token for reference sequences
sep_token_id = tokenizer.sep_token_id  # Separator token (often used in BERT models)
cls_token_id = tokenizer.cls_token_id  # Classifier token (marks start of sequence in BERT)

# Captum part

#### Helper functions

In [50]:
def predict(inputs):
    """Make a forward pass through the model to obtain predictions."""
    return model(inputs)[0]

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):
    """Prepare input and reference pairs required for Captum's attribution analysis."""
    text_ids = tokenizer.encode(text, add_special_tokens=False)
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]
    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device)


def custom_forward(inputs):
    """Custom forward function to extract specific outputs for analysis with Captum."""
    preds = predict(inputs)
    return torch.softmax(preds, dim=1)[0][0].unsqueeze(-1)


def summarize_attributions(attributions):
    """Normalize and summarize attributions across the input tokens."""
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions / torch.norm(attributions)

### Analyze the text

In [51]:
def analyze_texts(texts):
    """Analyze given texts to visualize model attributions and their impact on predictions."""
    lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings)

    for text in texts:
        input_ids, ref_input_ids = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
        score = predict(input_ids)

        attributions, delta = lig.attribute(inputs=input_ids, baselines=ref_input_ids, return_convergence_delta=True)
        attributions_sum = summarize_attributions(attributions)

        indices = input_ids[0].detach().tolist()
        all_tokens = tokenizer.convert_ids_to_tokens(indices)

        # Prepare data for visualization
        score_vis = viz.VisualizationDataRecord(
            attributions_sum,
            torch.softmax(score, dim=1)[0][0],
            torch.argmax(torch.softmax(score, dim=1)[0]),
            0,
            text,
            attributions_sum.sum(),
            all_tokens,
            delta)

        print('\033[1m', 'Visualization For Score', '\033[0m')
        viz.visualize_text([score_vis])

### Perform analysis on texts with the lowest and highest model predictions

In [52]:
print("Analyzing lowest texts:")
analyze_texts(lowest_texts)

Analyzing lowest texts:
[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.99),"As of September 2023, Philips had EUR 7 billion in net debt, which represented a 2.3 net debt/EBITDA ratio. Debt is denominated in euros and U.S. dollars. We dont think Philips' indebtedness level will be problematic, given its relatively stable cash flow generation, and we believe the company will have additional room for acquisitions, investments and dividends/buybacks.",3.07,"[CLS] as of september 2023 , philips had eur 7 billion in net debt , which represented a 2 . 3 net debt / ebitda ratio . debt is denominated in euros and u . s . dollars . we don ##t think philips ' indebtedness level will be problem ##atic , given its relatively stable cash flow generation , and we believe the company will have additional room for acquisitions , investments and dividends / buyback ##s . [SEP]"
,,,,


[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.99),"We think Hengan is in an average financial state, as it should have no difficulty in fulfilling its debt obligations but has relatively high leverage versus domestic peers. Hengans debt/equity ratio has hovered around 1 time historically, which is higher than 0.4 times for Vinda, whereas C&S Paper and Baiya Sanitary have minimal debt. But Hengans quick ratio of 1.0 time is above that of Vinda and C&S, despite being lower than that of Baiya. The company also has a decent current ratio of 1.4 times as of the end of 2022. Its EBITDA/interest coverage ratio of 8 times and the cash balance of roughly CNY 20 billion also affirm our confidence in its ability to fulfill its debt obligations. We think the company would have no difficulty in refinancing when its debt matures and there is no obvious need for additional leverage barring investments in new product categories.",2.23,"[CLS] we think heng ##an is in an average financial state , as it should have no difficulty in fulfilling its debt obligations but has relatively high leverage versus domestic peers . heng ##ans debt / equity ratio has hov ##ered around 1 time historically , which is higher than 0 . 4 times for vin ##da , whereas c & s paper and bai ##ya sanitary have minimal debt . but heng ##ans quick ratio of 1 . 0 time is above that of vin ##da and c & s , despite being lower than that of bai ##ya . the company also has a dece ##nt current ratio of 1 . 4 times as of the end of 2022 . its ebitda / interest coverage ratio of 8 times and the cash balance of roughly cn ##y 20 billion also affirm our confidence in its ability to fulfill its debt obligations . we think the company would have no difficulty in refinancing when its debt matures and there is no obvious need for additional leverage barring investments in new product categories . [SEP]"
,,,,


[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.99),"In our view, Newmont is in sound financial health. After acquiring Newcrest in November 2023, net debt of roughly USD 4.9 billion remains manageable, at about 1.1 times forecast 2023 EBITDA. We forecast net debt to decline and remain below 1 times trailing 12 months EBITDA over the remainder of our five-year forecast period, which we think is appropriate given operating leverage and exposure to cyclical gold prices. Newmont is in the midst of a material capital investment cycle, including development projects such as Tanami Expansion 2, Ahafo North, and Cerro Negro District Expansion 1. We think Newmont will likely be able to fund its development projects with cash on hand, its undrawn credit facilities, and internally generated cash flows. In terms of acquisitions, its purchase of Goldcorp in 2019 was funded primarily through equity, as was its purchase of competitor Newcrest in November 2023. While Newmont assumed Newcrests net debt, it didn't materially increase Newmont's leverage. We think any material future acquisitions will also likely mostly be funded by equity, given the companys focus on maintaining a strong balance sheet. Even after the completion of the acquisition of Goldcorp, Newmont's financial health remained in decent shape. The deal was mostly funded through equity, and the target was not particularly leveraged. Liquidity was further improved by the sale of Red Lake, the 50% stake in Kalgoorlie Consolidated Gold Mines, and the stake in Continental Gold. Newmont generated USD 1.4 billion in cash proceeds while receiving fair prices for assets sold. As such, we view Newmont's liquidity position as favorable, further supported by its cash and marketable securities and revolving credit facilities. After bedding down its Newcrest acquisition, Newmont is also targeting around USD 2 billion in proceeds from the likely sale of some of what it views as its less attractive assets.",9.39,"[CLS] in our view , newmont is in sound financial health . after acquiring new ##crest in november 2023 , net debt of roughly usd 4 . 9 billion remains manageable , at about 1 . 1 times forecast 2023 ebitda . we forecast net debt to decline and remain below 1 times trailing 12 months ebitda over the remainder of our five - year forecast period , which we think is appropriate given operating leverage and exposure to cyclical gold prices . newmont is in the mids ##t of a material capital investment cycle , including development projects such as tan ##ami expansion 2 , ah ##af ##o north , and cerro neg ##ro district expansion 1 . we think newmont will likely be able to fund its development projects with cash on hand , its undrawn credit facilities , and internally generated cash flows . in terms of acquisitions , its purchase of gold ##corp in 2019 was funded primarily through equity , as was its purchase of competitor new ##crest in november 2023 . while newmont assumed new ##crest ##s net debt , it did ##n ' t materially increase newmont ' s leverage . we think any material future acquisitions will also likely mostly be funded by equity , given the companys focus on maintaining a strong balance sheet . even after the completion of the acquisition of gold ##corp , newmont ' s financial health remained in dece ##nt shape . the deal was mostly funded through equity , and the target was not particularly leveraged . liquidity was further improved by the sale of red lake , the 50 % stake in kal ##go ##or ##lie consolidated gold mines , and the stake in continental gold . newmont generated usd 1 . 4 billion in cash proceeds while receiving fair prices for assets sold . as such , we view newmont ' s liquidity position as favorable , further supported by its cash and marketable securities and revolving credit facilities . after bedding down its new ##crest acquisition , newmont is also targeting around usd 2 billion in proceeds from the likely sale of some of what it views as its less attractive assets . [SEP]"
,,,,


In [53]:
print("Analyzing highest texts:")
analyze_texts(highest_texts)

Analyzing highest texts:
[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,1 (0.00),"In a debt-free position, Medibank is in sound financial health. We believe Medibank can fund long-term organic growth from cash flows while maintaining the current 75%-85% target dividend payout range. As at June 30, 2023, Medibank held AUD 1.2 billion of insurance business-related capital, equating to 14.6% of annual premiums, above the firm's 10%-12% target range under the new accounting standards.Given low claims volatility in health insurance the insurer could carry some debt, but given a large acquisition is not expected, we believe the conservative balance sheet is likely to remain a feature of Medibank.Investment assets of AUD 3.3 billion were allocated 22% to cash, 58% to fixed income, and 20% to equities, property, and other assets as at June 30, 2023.",3.0,"[CLS] in a debt - free position , medi ##bank is in sound financial health . we believe medi ##bank can fund long - term organic growth from cash flows while maintaining the current 75 % - 85 % target dividend payout range . as at june 30 , 2023 , medi ##bank held aud 1 . 2 billion of insurance business - related capital , equ ##ating to 14 . 6 % of annual premiums , above the firm ' s 10 % - 12 % target range under the new accounting standards . given low claims volatility in health insurance the insurer could carry some debt , but given a large acquisition is not expected , we believe the conservative balance sheet is likely to remain a feature of medi ##bank . investment assets of aud 3 . 3 billion were allocated 22 % to cash , 58 % to fixed income , and 20 % to equities , property , and other assets as at june 30 , 2023 . [SEP]"
,,,,


[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,1 (0.00),"We think Textron is in solid financial shape. Over the previous economic cycle, the company generally carried 2.5-3 turns of gross debt, relative to EBITDA, which we don't view as an unreasonable amount of leverage. Despite a sharp economic turn in 2020, we calculate that the companys more stable (primarily military) businesses still generated enough EBITDA to cover interest expense 5.6 times. We think the firm will be able to manage its moderate debt load in a variety of operating environments. Textron has minimal debt maturities in 2023. The next major tranche of debt coming due is $493 million in 2024.",-0.23,"[CLS] we think textron is in solid financial shape . over the previous economic cycle , the company generally carried 2 . 5 - 3 turns of gross debt , relative to ebitda , which we don ' t view as an unreasonable amount of leverage . despite a sharp economic turn in 2020 , we calculate that the companys more stable ( primarily military ) businesses still generated enough ebitda to cover interest expense 5 . 6 times . we think the firm will be able to manage its moderate debt load in a variety of operating environments . textron has minimal debt maturities in 2023 . the next major tranche of debt coming due is $ 493 million in 2024 . [SEP]"
,,,,


[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,1 (0.00),"CNH maintains a sound balance sheet. Outstanding industrial debt in the third fiscal quarter stood at $4.6 billion. The captive finance arm holds considerably more debt than the industrial business, but this is reasonable, given its status as a lender to both customers and dealers. Total finance arm debt came in at $21.8 billion in the third quarter, along with $22.7 billion in finance receivables and $521 million in cash. In terms of liquidity, we believe the company can meet its near-term debt obligations, given its strong cash balance. The companys cash position as of the third quarter stood at $2.4 billion on its industrial balance sheet. We also find comfort in CNHs ability to tap into available lines of credit to meet any short-term needs. The company has access to $3.9 billion in credit facilities. In our view, CNH maintains a strong financial position supported by a clean balance sheet and strong free cash flow prospects.We also think CNH can generate solid free cash flow throughout the economic cycle. We believe the company can generate $1.2 billion in free cash flow in our midcycle year, supporting its ability to return free cash flow to shareholders, mostly through dividends. Additionally, we think management is determined to rationalize its product portfolio and manufacturing operations. The company is working to reduce a significant portion of its products in the construction business, refocusing their efforts on higher volume models. We believe this will allows CNH to run leaner in its manufacturing operations. If successful, this will put CNH on much better footing from a cost perspective, further supporting its ability to return cash to shareholders.",8.28,"[CLS] cnh maintains a sound balance sheet . outstanding industrial debt in the third fiscal quarter sto ##od at $ 4 . 6 billion . the captive finance arm holds considerably more debt than the industrial business , but this is reasonable , given its status as a lender to both customers and dealers . total finance arm debt came in at $ 21 . 8 billion in the third quarter , along with $ 22 . 7 billion in finance receivables and $ 521 million in cash . in terms of liquidity , we believe the company can meet its near - term debt obligations , given its strong cash balance . the companys cash position as of the third quarter sto ##od at $ 2 . 4 billion on its industrial balance sheet . we also find comfort in cnh ##s ability to tap into available lines of credit to meet any short - term needs . the company has access to $ 3 . 9 billion in credit facilities . in our view , cnh maintains a strong financial position supported by a clean balance sheet and strong free cash flow prospects . we also think cnh can generate solid free cash flow throughout the economic cycle . we believe the company can generate $ 1 . 2 billion in free cash flow in our mid ##cyc ##le year , supporting its ability to return free cash flow to shareholders , mostly through dividends . additionally , we think management is determined to rationalize its product portfolio and manufacturing operations . the company is working to reduce a significant portion of its products in the construction business , refocus ##ing their efforts on higher volume models . we believe this will allows cnh to run lean ##er in its manufacturing operations . if successful , this will put cnh on much better foot ##ing from a cost perspective , further supporting its ability to return cash to shareholders . [SEP]"
,,,,


### Manually add and analyze a specific custom text

In [55]:
custom_text = [
    "We believe Under Armour has enough liquidity to get through the current economic environment. Prior to the outbreak of COVID-19, the firms long-term debt consisted only of $593 million in 3.25% senior unsecured notes that mature in 2026. Then, in May 2020, Under Armour completed an offering of $500 million in 1.5% convertible senior notes that mature in 2024. However, as this additional funding has proven to be unnecessary, the firm has already paid down more than 80% of this convertible debt. Even after these debt repayments, at the end of June 2023, the firm had $704 million in cash as well as $1.1 billion in borrowing capacity under its revolver. We expect Under Armour will operate in a net cash position for the foreseeable future. Under Armours free cash flow to the firm has averaged about $300 million per year over the past five years but has been erratic. We forecast free cash flow at only $117 million in fiscal 2024, down from $332 million in the prior year, due to investments related to its Protect This House 3 plan and lower earnings due to low wholesale orders.Although Under Armour does not pay dividends, it recently authorized its first share buyback program. The firm repurchased $300 million in shares in February 2022, and then $125 million in buybacks in fiscal 2023. Moreover, its restructuring has reduced base operating expenses by about $200 million, and we forecast its capital expenditures will remain low at 3%-4% of sales in the long term. Under Armour may use some of its free cash flow for acquisitions, but we do not forecast acquisitions due to the uncertainty concerning timing and size. Although its growth has been largely organic, the firm acquired three fitness apps for a combined $710 million in past years as part of a strategy that has been mostly abandoned. It has also made some smaller investments, such as $39.2 million in its Japanese licensee, Dome, in 2018. Under Armour later had to write down this investment."]
analyze_texts(custom_text)

[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,1 (0.00),"We believe Under Armour has enough liquidity to get through the current economic environment. Prior to the outbreak of COVID-19, the firms long-term debt consisted only of $593 million in 3.25% senior unsecured notes that mature in 2026. Then, in May 2020, Under Armour completed an offering of $500 million in 1.5% convertible senior notes that mature in 2024. However, as this additional funding has proven to be unnecessary, the firm has already paid down more than 80% of this convertible debt. Even after these debt repayments, at the end of June 2023, the firm had $704 million in cash as well as $1.1 billion in borrowing capacity under its revolver. We expect Under Armour will operate in a net cash position for the foreseeable future. Under Armours free cash flow to the firm has averaged about $300 million per year over the past five years but has been erratic. We forecast free cash flow at only $117 million in fiscal 2024, down from $332 million in the prior year, due to investments related to its Protect This House 3 plan and lower earnings due to low wholesale orders.Although Under Armour does not pay dividends, it recently authorized its first share buyback program. The firm repurchased $300 million in shares in February 2022, and then $125 million in buybacks in fiscal 2023. Moreover, its restructuring has reduced base operating expenses by about $200 million, and we forecast its capital expenditures will remain low at 3%-4% of sales in the long term. Under Armour may use some of its free cash flow for acquisitions, but we do not forecast acquisitions due to the uncertainty concerning timing and size. Although its growth has been largely organic, the firm acquired three fitness apps for a combined $710 million in past years as part of a strategy that has been mostly abandoned. It has also made some smaller investments, such as $39.2 million in its Japanese licensee, Dome, in 2018. Under Armour later had to write down this investment.",11.61,"[CLS] we believe under armour has enough liquidity to get through the current economic environment . prior to the outbreak of cov ##id - 19 , the firms long - term debt consisted only of $ 593 million in 3 . 25 % senior unsecured notes that mature in 2026 . then , in may 2020 , under armour completed an offering of $ 500 million in 1 . 5 % convertible senior notes that mature in 2024 . however , as this additional funding has proven to be unnecessary , the firm has already paid down more than 80 % of this convertible debt . even after these debt repayments , at the end of june 2023 , the firm had $ 704 million in cash as well as $ 1 . 1 billion in borrowing capacity under its revolver . we expect under armour will operate in a net cash position for the foreseeable future . under armour ##s free cash flow to the firm has averaged about $ 300 million per year over the past five years but has been err ##atic . we forecast free cash flow at only $ 117 million in fiscal 2024 , down from $ 332 million in the prior year , due to investments related to its protect this house 3 plan and lower earnings due to low wholesale orders . although under armour does not pay dividends , it recently authorized its first share buyback program . the firm repurchased $ 300 million in shares in february 2022 , and then $ 125 million in buyback ##s in fiscal 2023 . moreover , its restructuring has reduced base operating expenses by about $ 200 million , and we forecast its capital expenditures will remain low at 3 % - 4 % of sales in the long term . under armour may use some of its free cash flow for acquisitions , but we do not forecast acquisitions due to the uncertainty concerning timing and size . although its growth has been largely organic , the firm acquired three fitness apps for a combined $ 710 million in past years as part of a strategy that has been mostly abandoned . it has also made some smaller investments , such as $ 39 . 2 million in its japanese licensee , dome , in 2018 . under armour later had to write down this investment . [SEP]"
,,,,
