## Summarizer for Policies and Standards in Climate Risk

The summarizer below uses two summarization methods, extractive and abstractive,
to summarize documents on climate risk.

## Step 1: Data Collection and Cleaning
The first step includes the following:
1. Reading and cleaning the PDF files from Google Drive.
2. Combining the extracted PDF text into a single string.
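
As a rough illustration of the cleaning idea in this step, the sketch below strips a running header and a footer from plain strings and then joins the pages. It does not read a PDF. The function name `strip_header_and_footer`, the sample pages, and the default footer marker (a run of spaces) are assumptions made for the example.

```python
def strip_header_and_footer(page_text, num_header_words, footer_marker=" " * 11):
    """Drop the first N words (running header) and everything after a footer marker."""
    marker_pos = page_text.find(footer_marker)
    if marker_pos != -1:
        # Keep only the text that precedes the footer
        page_text = page_text[:marker_pos]
    words = page_text.split()
    return " ".join(words[num_header_words:])


# Hypothetical page texts: a two-word header, and a footer on the first page only
pages = [
    "Guide 12 Climate risks may affect banks." + " " * 11 + "Footer text 12",
    "Guide 13 Institutions should disclose material risks.",
]

cleaned = [strip_header_and_footer(p, num_header_words=2) for p in pages]
full_text_demo = " ".join(cleaned)
print(full_text_demo)
```

The header length differs between documents, which is presumably why the notebook passes it in as a parameter rather than hard-coding it.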

In [6]:
!pip install transformers




In [7]:
import os
#change working directory to google drive climate risk
os.chdir('/content/drive/My Drive/Climate Risk 2')

In [8]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.2.0-py3-none-any.whl (290 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.2.0


In [9]:
from pypdf import PdfReader

def read_and_clean_pdf(path,num_header):
  # reading the pdf document
  reader = PdfReader(path)

  #List of cleaned pdf
  cleaned_pdf_pages = []

  number_of_pages = len(reader.pages)

  # printing number of pages in pdf file
  print(f'number of pdf pages: {number_of_pages}')

  for i in range(0, number_of_pages):
    page = reader.pages[i]

    # extracting text from page
    text = page.extract_text()

    # A long run of spaces marks the beginning of the page footer;
    # keep only the text that precedes it
    footer_marker = '           '
    if footer_marker in text:
      text = text[:text.index(footer_marker)]

    # Drop the first num_header words, i.e. the running page header
    # (for example 'Guide on climate- related and environmental risks 6 2')
    words = text.split()
    result = ' '.join(words[num_header:])

    cleaned_pdf_pages.append(result)

  return cleaned_pdf_pages


In [10]:
#read pdf file
cleaned_pdf_pages = read_and_clean_pdf('/content/drive/MyDrive/Climate Risk 2/Policies/cleaned_polices/SEC Final Rules_cleaned.pdf',1)


number of pdf pages: 220


In [11]:
for i in range(len(cleaned_pdf_pages)):
  print(f'page {i + 1} : {cleaned_pdf_pages[i]}')

page 1 : risks and would provide investors wi th a more balanced perspective of the overall impacts of climate on a company’s business and operating performance .312 c. Final Rule s We are adopting final rule s (Item 1502(a)) to require the disclosure of any climate -related risks that have materially impacted or are reasonably likely to have a material impact on the registrant, including on its business strategy, results of operations, or financial condition, with several modifications in response to commenter concerns . 313 We disagree with t hose commenters who stated that a climate -related risk disclosure provision was not necessary because the Commission’s general risk factors disclosure rule already requires such disclosure . 314 In our view, a separate disclosure provision specifically focus ed on climate -related risks will help investors better understand a registrant’s assessment of whether its business is, or is reasonably likely to be, exposed to a material climate -relate

In [12]:
# Delete unwanted character
cleaned_pdf_pages = [s.replace('','') for s in cleaned_pdf_pages]


In [13]:
# Join all strings in cleaned_pdf_pages into a single string
full_text = ' '.join(cleaned_pdf_pages)

full_text

"risks and would provide investors wi th a more balanced perspective of the overall impacts of climate on a company’s business and operating performance .312 c. Final Rule s We are adopting final rule s (Item 1502(a)) to require the disclosure of any climate -related risks that have materially impacted or are reasonably likely to have a material impact on the registrant, including on its business strategy, results of operations, or financial condition, with several modifications in response to commenter concerns . 313 We disagree with t hose commenters who stated that a climate -related risk disclosure provision was not necessary because the Commission’s general risk factors disclosure rule already requires such disclosure . 314 In our view, a separate disclosure provision specifically focus ed on climate -related risks will help investors better understand a registrant’s assessment of whether its business is, or is reasonably likely to be, exposed to a material climate -related risk, 

In [14]:
# total characters in the file
len(full_text)

478946

## Step 2: Extractive Summary
This step includes the following:
1. Load the Facebook BART model (`facebook/bart-large-cnn`).
2. Check the maximum number of tokens the model accepts, with (1024) and without (1022) special tokens.
3. Chunk the combined text from Step 1 into groups of sentences that stay below BART's token limit.
4. Tokenize the individual chunks and the full text, and compare the sum of the chunk token counts with the full-text token count.
5. Summarize each chunk with BART and join the chunk summaries into one string.
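
The chunking in item 3 can be sketched as a greedy loop that keeps adding sentences until the next one would exceed the token budget. This is a minimal sketch, not the notebook's exact code: `chunk_sentences` is a hypothetical helper, and the default `count_tokens` simply counts whitespace-separated words as a stand-in for the model tokenizer.

```python
def chunk_sentences(sentences, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack sentences into chunks of at most max_tokens tokens each."""
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would overflow it
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    # Flush whatever is left after the last sentence
    if current:
        chunks.append(" ".join(current))
    return chunks


demo_sentences = ["a b c.", "d e.", "f g h i.", "j."]
demo_chunks = chunk_sentences(demo_sentences, max_tokens=5)
print(demo_chunks)
```

A single sentence longer than `max_tokens` still ends up as its own oversized chunk in this sketch, so a separate fallback split is needed for that case.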

In [1]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


In [2]:
# max tokens including the special tokens
tokenizer.model_max_length

1000000000000000019884624838656

In [3]:
# run if tokenizer.model_max_length is not 1024
tokenizer.model_max_length = 1024
tokenizer.model_max_length

1024

In [4]:
# number of special tokens
tokenizer.num_special_tokens_to_add()

2

In [5]:
# max tokens excluding the special tokens
tokenizer.max_len_single_sentence

1022
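
The very large value first reported by `tokenizer.model_max_length` appears to be a placeholder used when no limit is configured, which is why the notebook overrides it with 1024. The sketch below shows one way to derive the two limits from a reported value. The helper `effective_limits` and the cutoff `UNSET_THRESHOLD` are assumptions for illustration, not part of the `transformers` API.

```python
# Cutoff above which a reported limit is treated as "not configured".
# The exact value is an assumption chosen for this example.
UNSET_THRESHOLD = 10**12


def effective_limits(reported_max, fallback_max=1024, num_special_tokens=2):
    """Return (limit including special tokens, limit excluding special tokens)."""
    if reported_max > UNSET_THRESHOLD:
        limit = fallback_max
    else:
        limit = reported_max
    return limit, limit - num_special_tokens


print(effective_limits(1000000000000000019884624838656))  # (1024, 1022)
print(effective_limits(512))  # (512, 510)
```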

In [None]:
# run if tokenizer.max_len_single_sentence is not 1022
# tokenizer.max_len_single_sentence = 1022

In [15]:
# convert the contents in full text to sentences
import nltk
nltk.download('punkt')
sentences = nltk.tokenize.sent_tokenize(full_text)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [16]:
# find the max tokens in the longest sentence
max([len(tokenizer.tokenize(sentence)) for sentence in sentences])

366

In [17]:
length = 0
chunk = ""
chunks = []

for sentence in sentences:
    sentence_length = len(tokenizer.tokenize(sentence))

    if length + sentence_length <= tokenizer.max_len_single_sentence - 1:  # the sentence still fits in the current chunk
        chunk += sentence + " "  # add the sentence to the chunk
        length += sentence_length # update length with the new sentence token length
    else:
        # Add the current chunk to the list of chunks
        chunks.append(chunk.strip())
        # Reset the chunk and length
        chunk = sentence + " "
        length = sentence_length

# Add the last chunk if it's not empty
if chunk.strip():
    chunks.append(chunk.strip())

# Ensure no chunk exceeds the tokenizer max length
for i in range(len(chunks)):
    while len(tokenizer.tokenize(chunks[i])) > tokenizer.max_len_single_sentence - 1:
        chunk = chunks[i]
        # Find a suitable split point
        tokens = tokenizer.tokenize(chunk)
        split_point = tokenizer.max_len_single_sentence - 1

        # Split the chunk at the split point
        first_chunk_tokens = tokens[:split_point]
        remaining_tokens = tokens[split_point:]

        # Convert tokens back to sentences (or strings)
        first_chunk = tokenizer.convert_tokens_to_string(first_chunk_tokens)
        remaining_chunk = tokenizer.convert_tokens_to_string(remaining_tokens)

        # Update chunks list
        chunks[i] = first_chunk
        chunks.insert(i + 1, remaining_chunk)

len(chunks)


101
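
The second loop in the cell above handles chunks that are still too long by cutting them at a fixed token offset. The same idea, reduced to plain lists, might look like the sketch below. `split_tokens` is a hypothetical helper, and the integer list stands in for real tokenizer output.

```python
def split_tokens(tokens, max_tokens):
    """Cut a token list into consecutive windows of at most max_tokens items."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


demo_windows = split_tokens(list(range(7)), max_tokens=3)
print(demo_windows)  # [[0, 1, 2], [3, 4, 5], [6]]
```

A cut at a fixed offset can fall in the middle of a sentence, which may explain the very short chunks in the token counts that follow.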

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')



In [18]:
#checking the token size for each sentence chunk without special tokens
[len(tokenizer.tokenize(c)) for c in chunks]

[970,
 962,
 998,
 1000,
 930,
 1010,
 1012,
 1015,
 1021,
 875,
 1012,
 948,
 959,
 1018,
 974,
 1016,
 996,
 892,
 987,
 1021,
 2,
 1008,
 988,
 1005,
 1021,
 1,
 965,
 999,
 1018,
 976,
 868,
 945,
 1016,
 939,
 973,
 1021,
 971,
 990,
 1003,
 943,
 906,
 1019,
 1002,
 990,
 997,
 888,
 898,
 971,
 962,
 999,
 996,
 1008,
 967,
 997,
 955,
 955,
 1016,
 973,
 986,
 954,
 1020,
 986,
 992,
 988,
 983,
 999,
 1016,
 924,
 1020,
 992,
 1006,
 992,
 971,
 933,
 896,
 1006,
 1015,
 992,
 898,
 1015,
 965,
 985,
 959,
 1008,
 1014,
 985,
 943,
 982,
 986,
 972,
 960,
 997,
 982,
 960,
 1000,
 917,
 1016,
 1013,
 952,
 988,
 835]

In [19]:
#checking the token size for each sentence chunk with special tokens
[len(tokenizer(c).input_ids) for c in chunks]

[972,
 964,
 1000,
 1002,
 932,
 1012,
 1014,
 1017,
 1023,
 877,
 1014,
 950,
 961,
 1020,
 976,
 1018,
 998,
 894,
 989,
 1023,
 4,
 1010,
 990,
 1007,
 1023,
 3,
 967,
 1001,
 1020,
 978,
 870,
 947,
 1018,
 941,
 975,
 1023,
 973,
 992,
 1005,
 945,
 908,
 1021,
 1004,
 992,
 999,
 890,
 900,
 973,
 964,
 1001,
 998,
 1010,
 969,
 999,
 957,
 957,
 1018,
 975,
 988,
 956,
 1022,
 988,
 994,
 990,
 985,
 1001,
 1018,
 926,
 1022,
 994,
 1008,
 994,
 973,
 935,
 898,
 1008,
 1017,
 994,
 900,
 1017,
 967,
 987,
 961,
 1010,
 1016,
 987,
 945,
 984,
 988,
 974,
 962,
 999,
 984,
 962,
 1002,
 919,
 1018,
 1015,
 954,
 990,
 837]

In [20]:
#sum of all token size for sentence chunks with special tokens
sum([len(tokenizer(c).input_ids) for c in chunks])

96972

In [21]:
#token size for full text
len(tokenizer.tokenize(full_text))

Token indices sequence length is longer than the specified maximum sequence length for this model (96748 > 1024). Running this sequence through the model will result in indexing errors


96748

In [22]:
#sum of all token size for sentence chunks without special tokens
sum([len(tokenizer.tokenize(c)) for c in chunks])

96770
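
The three totals above can be cross-checked with simple arithmetic. The sketch below uses only the numbers printed by the notebook. The explanation given in the final comment is a plausible reading, not something the notebook verifies.

```python
num_chunks = 101                 # len(chunks)
tokens_without_special = 96770   # sum over chunks, tokenizer.tokenize
tokens_with_special = 96972      # sum over chunks, tokenizer(...).input_ids
full_text_tokens = 96748         # tokenizer.tokenize(full_text)

# Each chunk gains two special tokens (start and end of sequence)
special_per_chunk = (tokens_with_special - tokens_without_special) // num_chunks
print(special_per_chunk)  # 2

# The chunked text is slightly longer than the full text in tokens
drift = tokens_without_special - full_text_tokens
print(drift)  # 22
# Possibly caused by tokens at chunk boundaries being encoded differently
# once the surrounding whitespace changes.
```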

In [23]:
# inputs to the model
inputs = [tokenizer(chunk, return_tensors="pt") for chunk in chunks]

In [24]:
facebook_bart_summary = []
for i in range(len(inputs)):
  output = model.generate(**inputs[i])
  facebook_bart_summary.append(tokenizer.decode(output[0], skip_special_tokens=True))
  print(f'summarized {i} in {len(inputs)}')

summarized 0 in 101
summarized 1 in 101
summarized 2 in 101
summarized 3 in 101
summarized 4 in 101
summarized 5 in 101
summarized 6 in 101
summarized 7 in 101
summarized 8 in 101
summarized 9 in 101
summarized 10 in 101
summarized 11 in 101
summarized 12 in 101
summarized 13 in 101
summarized 14 in 101
summarized 15 in 101
summarized 16 in 101
summarized 17 in 101
summarized 18 in 101
summarized 19 in 101
summarized 20 in 101
summarized 21 in 101
summarized 22 in 101
summarized 23 in 101
summarized 24 in 101
summarized 25 in 101
summarized 26 in 101
summarized 27 in 101
summarized 28 in 101
summarized 29 in 101
summarized 30 in 101
summarized 31 in 101
summarized 32 in 101
summarized 33 in 101
summarized 34 in 101
summarized 35 in 101
summarized 36 in 101
summarized 37 in 101
summarized 38 in 101
summarized 39 in 101
summarized 40 in 101
summarized 41 in 101
summarized 42 in 101
summarized 43 in 101
summarized 44 in 101
summarized 45 in 101
summarized 46 in 101
summarized 47 in 101
su

In [None]:
# #using with pipeline
# from transformers import pipeline
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# other_facebook_bart_summary = []
# for chunk in chunks:
#   other_facebook_bart_summary.append(summarizer(chunk))


In [26]:
# Join all chunk summaries in facebook_bart_summary into a single string
extractive_summary = ' '.join(facebook_bart_summary)

extractive_summary

"The Commission is adopting final rule s ( Item 1502(a) to require the disclosure of any climate -related risks that have materially impacted or are reasonably likely to have a material impact on the registrant. The final rules are responsive to investors’ need for decision -useful information regarding registran ts’ material climate-related risks. In addition, because a registrant may be able to assess the material risks posed by its value chain, this change will also limit the burdens of climate risk assessment on third parties. The final rules’ definition of “ transition risk s” includes the same non- 323 See, e.g., letter from Chamber. Final rules require a registrant to provide information necessary to an understanding of the nature of the risk presented and the extent of the registrant’s exposure to the risk. Some commenters opposed proposed Item 1502 because in their view it would be difficult to distinguish between a climate -related physical risk and an ordinary weather risk. 

## Step 3: Abstractive Summary
This step includes the following:
1. Load the Google Pegasus model.
2. Chunk the combined text from Step 1 into groups of sentences that stay below Pegasus's token limit.
3. Summarize each chunk with Pegasus and join the chunk summaries into one string.
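
The summarize-then-join pattern in item 3 does not depend on the model itself, so it can be sketched with any function that maps a chunk to a summary. In the sketch below, `summarize_chunks` and the stand-in `first_sentence` are hypothetical. In the notebook, the callable would wrap the tokenizer and `model.generate` calls.

```python
from typing import Callable, Iterable, List


def summarize_chunks(chunks: Iterable[str], summarize_one: Callable[[str], str]) -> str:
    """Summarize each chunk independently and join the results with spaces."""
    summaries: List[str] = []
    for chunk in chunks:
        summaries.append(summarize_one(chunk).strip())
    # Skip empty summaries so the joined text has no doubled spaces
    return " ".join(s for s in summaries if s)


def first_sentence(text: str) -> str:
    """Stand-in for a model call: return only the first sentence of the chunk."""
    return text.split(". ")[0].rstrip(".") + "."


demo_summary = summarize_chunks(["A is b. C is d.", "E is f."], first_sentence)
print(demo_summary)  # A is b. E is f.
```

Because each chunk is summarized in isolation, the joined result can repeat points or lose cross-chunk context, which is one reason the notebook runs a further summarization pass in Step 4.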


In [27]:
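from transformers import AutoTokenizer, PegasusModel

model_name = "google/pegasus-large"

# Only the tokenizer is used by the chunking cell that follows. PegasusModel is
# the bare encoder-decoder without a language-modeling head, so the summaries
# are generated with PegasusForConditionalGeneration, loaded in a later cell.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusModel.from_pretrained(model_name)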



Some weights of PegasusModel were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [28]:
# convert the contents in full text to sentences
import nltk
nltk.download('punkt')
sentences = nltk.tokenize.sent_tokenize(full_text)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [29]:
length = 0
chunk = ""
chunks = []
count = -1
for sentence in sentences:
  count += 1
  combined_length = len(tokenizer.tokenize(sentence)) + length # add the no. of sentence tokens to the length counter

  if combined_length  <= (tokenizer.max_len_single_sentence - 1): # if it doesn't exceed
    chunk += sentence + " " # add the sentence to the chunk
    length = combined_length # update the length counter

    # if it is the last sentence
    if count == len(sentences) - 1:
      chunks.append(chunk.strip()) # save the chunk

  else:
    chunks.append(chunk.strip()) # save the chunk

    # reset
    length = 0
    chunk = ""

    # take care of the overflow sentence
    chunk += sentence + " "
    length = len(tokenizer.tokenize(sentence))
len(chunks)

96

In [30]:
# inputs to the model
inputs = [tokenizer(chunk, return_tensors="pt") for chunk in chunks]

In [None]:
# from transformers import AutoTokenizer, PegasusForConditionalGeneration

# model_name = "google/pegasus-large"

# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = PegasusForConditionalGeneration.from_pretrained(model_name)

# pegasus_summary = []
# for input in inputs:
#   input_ids = tokenizer.encode(input, return_tensors='pt')
#   output = model.generate(input_ids)
#   pegasus_summary.append(tokenizer.decode(output[0], skip_special_tokens=True))

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [31]:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Load Pegasus model and tokenizer
model_name = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

pegasus_summary = []

for i in range(len(chunks)):
  # Tokenize input text
  input_ids = tokenizer.encode(chunks[i], return_tensors="pt", max_length=1024, truncation=True)

  # Generate summary
  summary_ids = model.generate(input_ids, num_beams=4, length_penalty=2.0, early_stopping=True)
  final_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
  pegasus_summary.append(final_summary)
  print(f'summarized {i} in {len(chunks)}')


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



summarized 0 in 96
summarized 1 in 96
summarized 2 in 96
summarized 3 in 96
summarized 4 in 96
summarized 5 in 96
summarized 6 in 96
summarized 7 in 96
summarized 8 in 96
summarized 9 in 96
summarized 10 in 96
summarized 11 in 96
summarized 12 in 96
summarized 13 in 96
summarized 14 in 96
summarized 15 in 96
summarized 16 in 96
summarized 17 in 96
summarized 18 in 96
summarized 19 in 96
summarized 20 in 96
summarized 21 in 96
summarized 22 in 96
summarized 23 in 96
summarized 24 in 96
summarized 25 in 96
summarized 26 in 96
summarized 27 in 96
summarized 28 in 96
summarized 29 in 96
summarized 30 in 96
summarized 31 in 96
summarized 32 in 96
summarized 33 in 96
summarized 34 in 96
summarized 35 in 96
summarized 36 in 96
summarized 37 in 96
summarized 38 in 96
summarized 39 in 96
summarized 40 in 96
summarized 41 in 96
summarized 42 in 96
summarized 43 in 96
summarized 44 in 96
summarized 45 in 96
summarized 46 in 96
summarized 47 in 96
summarized 48 in 96
summarized 49 in 96
summarized

In [32]:
pegasus_summary

['315 The final rules, by contrast, are responsive to investors’ need for decision -related risk disclosure, and we have used the proposed rule to describe registrant’s climate -related risks as less burdensome than the proposed rule would have required.317 We have used the proposed rule to describe registrant’s climate -related risks as less burdensome than the proposed rule would have required. Furthermore, adopting a climate -related risk disclosure rule that uses similar definitions (set forth below in Item 1500) and is based on the climate -related disclosure framework of the TCFD, with which many registrants and investors are already familiar, will assist in standardiz ing climate -related risk disclosure and help elicit more consistent, comparable, and useful information for investors and limit the reporting burden for those registrants that are already providing some climate -related disclosure based on the TCFD framework.',
 '326 As discussed in more detail in section II.K.3.c

In [33]:
# Join all chunk summaries in pegasus_summary into a single string
abstractive_summary = ' '.join(pegasus_summary)

abstractive_summary

'315 The final rules, by contrast, are responsive to investors’ need for decision -related risk disclosure, and we have used the proposed rule to describe registrant’s climate -related risks as less burdensome than the proposed rule would have required.317 We have used the proposed rule to describe registrant’s climate -related risks as less burdensome than the proposed rule would have required. Furthermore, adopting a climate -related risk disclosure rule that uses similar definitions (set forth below in Item 1500) and is based on the climate -related disclosure framework of the TCFD, with which many registrants and investors are already familiar, will assist in standardiz ing climate -related risk disclosure and help elicit more consistent, comparable, and useful information for investors and limit the reporting burden for those registrants that are already providing some climate -related disclosure based on the TCFD framework. 326 As discussed in more detail in section II.K.3.c.v, a

## Step 4: Final Summarization
This step includes the following:
1. Combine the extractive and abstractive summaries into one full summary.
2. Load the Google Pegasus model.
3. Chunk the full summary into groups of sentences that stay below Pegasus's token limit.
4. Summarize each chunk with Pegasus and join the chunk summaries into one string.
5. Create a dataframe to store the summaries for each file.
6. Load the existing CSV file into a dataframe and merge it with the new dataframe.
7. Save the merged dataframe back to the existing CSV file.
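
Items 5 to 7 amount to appending one row per document to a growing CSV file. The sketch below shows that idea with the standard-library `csv` module instead of pandas, so it is an alternative technique rather than the notebook's method. The helper `append_summary_row`, the reduced column list, the temporary file, and the placeholder summary strings are all assumptions for illustration.

```python
import csv
import os
import tempfile

# Reduced, illustrative subset of the columns used in the notebook
FIELDS = ["Country", "Name of Regulation/Standards", "Final Summarization"]


def append_summary_row(csv_path, row, fieldnames=FIELDS):
    """Append one row to csv_path, writing the header first if the file is new."""
    file_exists = os.path.exists(csv_path) and os.path.getsize(csv_path) > 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()
        writer.writerow(row)


demo_path = os.path.join(tempfile.mkdtemp(), "summaries_demo.csv")
append_summary_row(demo_path, {
    "Country": "EU",
    "Name of Regulation/Standards": "Example guide",
    "Final Summarization": "Placeholder summary one.",
})
append_summary_row(demo_path, {
    "Country": "USA",
    "Name of Regulation/Standards": "Example rule",
    "Final Summarization": "Placeholder summary two.",
})

with open(demo_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows))  # 2
```

Appending avoids rereading and rewriting the whole file on every run, although it does not guard against adding the same document twice.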



In [34]:
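#merge extractive and abstractive, separated by a space so that the last
#sentence of one summary is not glued to the first sentence of the other
full_summary = extractive_summary + ' ' + abstractive_summary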
full_summary

"The Commission is adopting final rule s ( Item 1502(a) to require the disclosure of any climate -related risks that have materially impacted or are reasonably likely to have a material impact on the registrant. The final rules are responsive to investors’ need for decision -useful information regarding registran ts’ material climate-related risks. In addition, because a registrant may be able to assess the material risks posed by its value chain, this change will also limit the burdens of climate risk assessment on third parties. The final rules’ definition of “ transition risk s” includes the same non- 323 See, e.g., letter from Chamber. Final rules require a registrant to provide information necessary to an understanding of the nature of the risk presented and the extent of the registrant’s exposure to the risk. Some commenters opposed proposed Item 1502 because in their view it would be difficult to distinguish between a climate -related physical risk and an ordinary weather risk. 

In [35]:
# convert the contents of the full summary to sentences
import nltk
nltk.download('punkt')
sum_sentences = nltk.tokenize.sent_tokenize(full_summary)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [36]:
length = 0
chunk = ""
chunks = []
count = -1
for sentence in sum_sentences:
  count += 1
  combined_length = len(tokenizer.tokenize(sentence)) + length # add the no. of sentence tokens to the length counter

  if combined_length  <= (tokenizer.max_len_single_sentence - 1): # if it doesn't exceed
    chunk += sentence + " " # add the sentence to the chunk
    length = combined_length # update the length counter

    # if it is the last sentence
    if count == len(sum_sentences) - 1:
      chunks.append(chunk.strip()) # save the chunk

  else:
    chunks.append(chunk.strip()) # save the chunk

    # reset
    length = 0
    chunk = ""

    # take care of the overflow sentence
    chunk += sentence + " "
    length = len(tokenizer.tokenize(sentence))
len(chunks)

25

In [37]:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Load Pegasus model and tokenizer
model_name = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

full_summary = []

for i in  range(len(chunks)):
  # Tokenize input text
  input_ids = tokenizer.encode(chunks[i], return_tensors="pt", max_length=1024, truncation=True)

  # Generate summary
  summary_ids = model.generate(input_ids, num_beams=4, length_penalty=2.0, early_stopping=True)
  final_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
  full_summary.append(final_summary)
  print(f'summarized {i} in {len(chunks)}')

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


summarized 0 in 25
summarized 1 in 25
summarized 2 in 25
summarized 3 in 25
summarized 4 in 25
summarized 5 in 25
summarized 6 in 25
summarized 7 in 25
summarized 8 in 25
summarized 9 in 25
summarized 10 in 25
summarized 11 in 25
summarized 12 in 25
summarized 13 in 25
summarized 14 in 25
summarized 15 in 25
summarized 16 in 25
summarized 17 in 25
summarized 18 in 25
summarized 19 in 25
summarized 20 in 25
summarized 21 in 25
summarized 22 in 25
summarized 23 in 25
summarized 24 in 25


In [38]:
# Join all strings in full_summary into a single string
summary = ' '.join(full_summary)

summary

'The Commission is adopting final rule s ( Item 1502(a) to require the disclosure of any climate -related risks that have materially impacted or are reasonably likely to have a material impact on the registrant. The Commission proposed to require a registrant to describe the actual and potential impacts on its strategy, business model, and outlook of those climate -related risks. Commenters generally supported the proposed impact disclosure provision but recommended that the Commission add a materiality qualifier to elicit disclosure of only the most likely and significant impacts. The proposed scenario analysis disclosure provision would have included as an example of potential scenarios to be considered “ an increase of no greater than 3 oC, 2 14C, or 1.5 1C above pre-industrial levels.” We have removed the proposed provision stating that the disclosure should include both qualitative and quantitative information. g.G (g,g) (g),g.h (g,.g), g.H (g.,g), G.H. The final rule will require

Risk management and mitigation 3.10 Where the potential impacts of the financial risks from climate change are assessed to be material (for example as a result of scenario analysis), the PRA expects firms to evidence how they will mitigate these financial risks and to have a credible plan or policies in place for managing exposures. Disclosure 3.18 Banks and insurers h ave existing requirements to disclose information on material risks set out within their Pillar 3 disclosures (as required under Capital Requirements Regulation (575/2013) (CRR) and Solvency II), and on principal risks and uncertainties in their Strategic Report ( as required under the UK Companies Act).

In [40]:
import pandas as pd

csv_file_path = 'policies summaries.csv'
#read df
df = pd.read_csv(csv_file_path)

# Print DataFrame
df.tail()

Unnamed: 0,Country,Region,Relevant Authority,Year,Status,Regulation classification,Name of Regulation/Standards,Extractive Summary,Abstractive Summary,Final Summarization
10,USA,North America,Office of the Comptroller of the Currency (OCC),2021,Published,Other Regulators (securities and policy bodies),Principles for Climate-Related Financial Risk ...,The OCC plans to elaborate on these principles...,The OCC plans to elaborate on these principles...,An effective climate -related scenario analysi...
11,Canada,North America,Bank of Canada (BoC) and Office of the Superin...,2023,Published,Central Banks and National Prudential Authorit...,OSFI B-15 Climate Risk Management,Senior Management has overall accountability f...,Senior Management has overall accountability f...,"When undertaking climate scenario analyses, th..."
12,Malaysia,APAC,Bank Negara Malaysia,2022,Published,Central Banks and National Prudential Authorities,Climate Risk Management and Scenario Analysis,PART B REQUIREMENTS AND GUIDANCE 7 Level of Ap...,PART B REQUIREMENTS AND GUIDANCE 7 Level of Ap...,S 7.4 For branches of foreign financial instit...
13,EU,Europe,European Central Bank,2020,Published,Central Banks and National Prudential Authorities,Guide on Climate-related and environmental ris...,"As se t out in the EBA Guidelines, institution...","As se t out in the EBA Guidelines, institution...",Climate -related and environmental risks may d...
14,United Kingdom,Europe,Prudential Regulation Authority (PRA),2019,Published,Central Banks and National Prudential Authorities,Enhancing banks’ and insurers’ approaches to m...,The magnitude of the financial risks from clim...,The risk appetite statement should include the...,Risk management and mitigation 3.10 Where the ...


In [41]:
#create df with summaries
import pandas as pd

def create_newdf(country, region, relevant_authority, year, status, regulation_classification,name_of_standard,extractive_summary,abstractive_summary,summary):
  df2 = pd.DataFrame({'Country ': [country],
                    'Region' : [region],
                    'Relevant Authority ': [relevant_authority],
                    'Year': [year],
                    'Status': [status],
                    'Regulation classification': [regulation_classification],
                    'Name of Regulation/Standards': [name_of_standard],
                    'Extractive Summary': extractive_summary,
                    'Abstractive Summary': abstractive_summary,
                    'Final Summarization': summary})
  return df2

In [42]:
#create df with summaries

df2 = create_newdf('USA', 'North America','Securities and Exchange Commission', '2024', 'Published', 'Other Regulators (securities and policy bodies)','The Enhancement and Standardization of Climate-Related Disclosures for Investors',extractive_summary,abstractive_summary,summary)

df2.head()

Unnamed: 0,Country,Region,Relevant Authority,Year,Status,Regulation classification,Name of Regulation/Standards,Extractive Summary,Abstractive Summary,Final Summarization
0,USA,North America,Securities and Exchange Commission,2024,Published,Other Regulators (securities and policy bodies),The Enhancement and Standardization of Climate...,The Commission is adopting final rule s ( Item...,"315 The final rules, by contrast, are responsi...",The Commission is adopting final rule s ( Item...


In [43]:
#merge both dfs and write in csv

merged_df = pd.concat([df, df2], ignore_index=True)

merged_df.to_csv(csv_file_path, index=False)

print(f"DataFrame has been written to '{csv_file_path}' successfully.")

DataFrame has been written to 'policies summaries.csv' successfully.


In [44]:
#read the updated csv and view the last additions
df_updated = pd.read_csv(csv_file_path)

# Print DataFrame
df_updated.tail()

Unnamed: 0,Country,Region,Relevant Authority,Year,Status,Regulation classification,Name of Regulation/Standards,Extractive Summary,Abstractive Summary,Final Summarization
11,Canada,North America,Bank of Canada (BoC) and Office of the Superin...,2023,Published,Central Banks and National Prudential Authorit...,OSFI B-15 Climate Risk Management,Senior Management has overall accountability f...,Senior Management has overall accountability f...,"When undertaking climate scenario analyses, th..."
12,Malaysia,APAC,Bank Negara Malaysia,2022,Published,Central Banks and National Prudential Authorities,Climate Risk Management and Scenario Analysis,PART B REQUIREMENTS AND GUIDANCE 7 Level of Ap...,PART B REQUIREMENTS AND GUIDANCE 7 Level of Ap...,S 7.4 For branches of foreign financial instit...
13,EU,Europe,European Central Bank,2020,Published,Central Banks and National Prudential Authorities,Guide on Climate-related and environmental ris...,"As se t out in the EBA Guidelines, institution...","As se t out in the EBA Guidelines, institution...",Climate -related and environmental risks may d...
14,United Kingdom,Europe,Prudential Regulation Authority (PRA),2019,Published,Central Banks and National Prudential Authorities,Enhancing banks’ and insurers’ approaches to m...,The magnitude of the financial risks from clim...,The risk appetite statement should include the...,Risk management and mitigation 3.10 Where the ...
15,USA,North America,Securities and Exchange Commission,2024,Published,Other Regulators (securities and policy bodies),The Enhancement and Standardization of Climate...,The Commission is adopting final rule s ( Item...,"315 The final rules, by contrast, are responsi...",The Commission is adopting final rule s ( Item...
