<a href="https://colab.research.google.com/github/LordLean/Extracting-Green-Bonds-Use-of-Proceeds/blob/main/QA_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Information Retieval

## Answer Retriever


In [None]:
!pip install rank-bm25

!pip install PyPDF2

!pip install tabula-py

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rank-bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank-bm25
Successfully installed rank-bm25-0.2.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting PyPDF2
  Downloading PyPDF2-2.10.2-py3-none-any.whl (214 kB)
[K     |████████████████████████████████| 214 kB 8.1 MB/s 
Installing collected packages: PyPDF2
Successfully installed PyPDF2-2.10.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tabula-py
  Downloading tabula_py-2.5.0-py3-none-any.whl (12.0 MB)
[K     |████████████████████████████████| 12.0 MB 7.1 MB/s 
Collecting distro
  Downloading distro-1.7.0-py3-none-any.whl (20 kB)
Installing collected packages: distro, tabula-py
Successfully installed distro-1.7.0 tabula-py-2.5.0


In [None]:
import numpy as np

import tabula
from rank_bm25 import BM25Okapi
from PyPDF2 import PdfReader

In [None]:
class TableReader:

  def __init__(self, pdf):
    self.pdf = pdf
    self.dfs = None

  def read_pages(self, pages="all", multiple_tables=True, stream=True):
    '''
    Return tables discovered within pdf.
    '''
    self.dfs = tabula.read_pdf(self.pdf, pages=pages, multiple_tables=multiple_tables, stream=stream)
    self.__clean_dfs()
    return self.dfs

  def __clean_dfs(self, thresh=2):
    self.dfs = [df.dropna(thresh=thresh) for df in self.dfs]


class Reader:

  def __init__(self, filename):
    self.reader = PdfReader(filename)
    self.tb = TableReader(filename)
    self.page_viewer = {page_num : {} for page_num in range(self.reader.numPages)}
    self.idx2page_item = []
  
  def __extract_text(self,):
    '''
    Page-wise text extraction and tokenize for BM25.
    '''
    text_index_mem = 0
    # List to store each tokenized corpus
    tokenized_corpus_list = []
    for i in range(self.reader.numPages):
      raw_text = self.reader.getPage(i).extractText()
      self.page_viewer[i]["raw_text"] = raw_text
      # Split text
      corpus = raw_text.split("\n \n")
      # Store results.
      self.page_viewer[i]["corpus"] = corpus
      for item in corpus:
        self.idx2page_item.append((i, item)) # page,textItem
      # Tokenize
      tokenized_corpus = [doc.split(" ") for doc in corpus]
      tokenized_corpus_list.append(tokenized_corpus)
    # BM25 computations only after the complete tokenized corpus is collated. 
    # Merge tokenized corpus'.
    tokenized_corpus_complete = [item for sublist in tokenized_corpus_list for item in sublist]
    # BM25
    self.bm25 = BM25Okapi(tokenized_corpus_complete)

  def __extract_tables(self):
    '''
    Page-wise table extractor.
    '''
    for i in range(self.reader.numPages):
      # page=0 will throw error using tabula.
      page = str(i+1)
      self.page_viewer[i]["tables"] = self.tb.read_pages(pages=page)

  def extract_pdf(self):
    # Extract data
    self.__extract_text()
    # self.__extract_tables()

  def print_page(self, page_num):
    '''
    Print separated sections of text given a page.
    '''
    corpus = self.page_viewer[page_num]["corpus"]
    for item in (corpus):
      print("\n{}\n".format("-"*60))
      print(item)
    print("\n{}\n".format("-"*60))
    for df in self.page_viewer[page_num]["tables"]:
      print(df.style)
      display(df)

  def __score(self, queries, weights):
    '''
    Compute the average BM25 score of each given query on each page of text.
    '''
    self.ranked_scores = []
    for query in queries:
      # tokenize query by whitespace.
      tokenized_query = query.split()
      # Compute score.
      doc_scores = self.bm25.get_scores(tokenized_query)
      self.ranked_scores.append(doc_scores)
    # Compute average (weighted) score against all queries.
    if not len(weights):
      # Equal weighting.
      self.average_score = np.average(self.ranked_scores, axis=0)
    elif len(queries) != len(weights):
        # Unequal number of elements.
        raise ValueError("Number of query and weight elements passed must be equal.")
    else:
      # Weighted average.
      self.average_score = np.average(self.ranked_scores, weights=weights, axis=0)
 
  def get_ranked_texts(self, queries, weights=[], n=5):
    '''
    Return n pages which scored highest using BM25.
    '''
    # Run score method to calculate BM25.
    self.__score(queries, weights)
    try:
      idx = sorted(range(len(self.average_score)), key=lambda i: self.average_score[i], reverse=True)[:n]
    except IndexError:
      idx = sorted(range(len(self.average_score)), key=lambda i: self.average_score[i], reverse=True)
    final_results = []
    for i in range(n):
      page_num, text = self.idx2page_item[idx[i]]
      # tables = self.page_viewer[page_num]["tables"]
      # final_results.append({"page_num":page_num, "text":text, "tables":tables})
      final_results.append(text)

    return final_results
    

## Answer Re-ranker (Neural: BERT / T5)

In [None]:
!pip install pygaggle

!pip install transformers==4.6.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pygaggle
  Downloading pygaggle-0.0.3.1.tar.gz (33 kB)
Collecting coloredlogs==14.0
  Downloading coloredlogs-14.0-py2.py3-none-any.whl (43 kB)
[K     |████████████████████████████████| 43 kB 2.1 MB/s 
Collecting pydantic==1.5
  Downloading pydantic-1.5-cp37-cp37m-manylinux2014_x86_64.whl (7.3 MB)
[K     |████████████████████████████████| 7.3 MB 12.0 MB/s 
[?25hCollecting pyserini==0.10.1.0
  Downloading pyserini-0.10.1.0-py3-none-any.whl (63.3 MB)
[K     |████████████████████████████████| 63.3 MB 16 kB/s 
Collecting spacy==2.2.4
  Downloading spacy-2.2.4-cp37-cp37m-manylinux1_x86_64.whl (10.6 MB)
[K     |████████████████████████████████| 10.6 MB 18.9 MB/s 
Collecting tokenizers==0.9.4
  Downloading tokenizers-0.9.4-cp37-cp37m-manylinux2010_x86_64.whl (2.9 MB)
[K     |████████████████████████████████| 2.9 MB 38.6 MB/s 
[?25hCollecting tqdm==4.45.0
  Downloading tqdm-4.45

In [None]:
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import MonoT5, MonoBERT

class Reranker:

  def __init__(self):
    self.mono5t = MonoT5()
    self.monobert = MonoBERT()

  def rerank(self, query, texts, method="T5"):
    query = Query(query)
    texts = [Text(text, {"docid" : i}, 0) for i, text in enumerate(texts)]

    if method == "T5":
      reranker = self.mono5t
    if method == "BERT":
      reranker = self.monobert

    reranked = reranker.rerank(query, texts)
    reranked.sort(key=lambda x: x.score, reverse=True)

    return reranked

2022-08-18 11:45:24 [INFO] loader: Loading faiss with AVX2 support.
2022-08-18 11:45:24 [INFO] loader: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2022-08-18 11:45:24 [INFO] loader: Loading faiss.
2022-08-18 11:45:24 [INFO] loader: Successfully loaded faiss.


# QA Model

In [3]:
# Download zipped model
!gdown 10wA2fWuOUlGZCUDwUqruZSqDO5jn7HSg

# Unzip
!unzip 1892258.zip

# Delete
!rm 1892258.zip

Downloading...
From: https://drive.google.com/uc?id=10wA2fWuOUlGZCUDwUqruZSqDO5jn7HSg
To: /content/1892258.zip
100% 406M/406M [00:01<00:00, 203MB/s]
Archive:  1892258.zip
   creating: content/finbert-pretrain-finetuned-squad/model/
  inflating: content/finbert-pretrain-finetuned-squad/model/vocab.txt  
  inflating: content/finbert-pretrain-finetuned-squad/model/tokenizer.json  
  inflating: content/finbert-pretrain-finetuned-squad/model/pytorch_model.bin  
  inflating: content/finbert-pretrain-finetuned-squad/model/training_args.bin  
  inflating: content/finbert-pretrain-finetuned-squad/model/special_tokens_map.json  
  inflating: content/finbert-pretrain-finetuned-squad/model/config.json  
  inflating: content/finbert-pretrain-finetuned-squad/model/tokenizer_config.json  


In [2]:
# !pip install transformers 

from transformers import pipeline

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.2-py3-none-any.whl (4.7 MB)
[K     |████████████████████████████████| 4.7 MB 5.2 MB/s 
[?25hCollecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
[K     |████████████████████████████████| 120 kB 44.5 MB/s 
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
[K     |████████████████████████████████| 6.6 MB 42.8 MB/s 
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.2


In [6]:
model_dir = "content/finbert-pretrain-finetuned-squad/model"

question_answering = pipeline("question-answering", model=model_dir, tokenizer=model_dir)

In [10]:
context = """
Vía Célere intends to report on allocation of proceeds on
its website, on an annual basis, until full allocation. The allocation
reporting will include the total amount allocated to projects, the share
of financing vs. refinancing, and unallocated proceeds. In addition,
Vía Célere is committed to reporting on relevant impact metrics, such
as energy consumption reduction (in kWh) or emission reduction (in
tons of CO2e). Sustainalytics views Vía Célere’s allocation and impact
reporting as aligned with market practice.
"""

question = "how often is allocation of proceeds reported?"

In [11]:
result = question_answering(question=question, context=context, device=0)
print("Answer:", result['answer'])
print("Score:", result['score'])

Answer: annual
Score: 0.4391261041164398


# ICMA Database Upload

In [None]:
!wget https://www.icmagroup.org/assets/documents/Sustainable-finance/Database/ICMA-Sustainable-Bonds-Database-120822.xlsx

--2022-08-18 11:45:53--  https://www.icmagroup.org/assets/documents/Sustainable-finance/Database/ICMA-Sustainable-Bonds-Database-120822.xlsx
Resolving www.icmagroup.org (www.icmagroup.org)... 91.216.93.249
Connecting to www.icmagroup.org (www.icmagroup.org)|91.216.93.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 274575 (268K) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
Saving to: ‘ICMA-Sustainable-Bonds-Database-120822.xlsx’


2022-08-18 11:45:55 (422 KB/s) - ‘ICMA-Sustainable-Bonds-Database-120822.xlsx’ saved [274575/274575]



In [None]:
import pandas as pd
import openpyxl

In [None]:
filename = "ICMA-Sustainable-Bonds-Database-120822.xlsx"

# select green bond sheet.
gb_sheet = pd.ExcelFile(filename).sheet_names[0] 

df = pd.read_excel(filename, sheet_name=gb_sheet, header=1)

In [None]:
# Use openpyxl to load xls with hyperlink text.
wb = openpyxl.load_workbook(filename)
ws = wb[gb_sheet]

hyperlink_list = []

for i in range(len(df)):
  try:
    hyperlink_list.append(ws.cell(row=(3+i), column=6).hyperlink.target)
  except:
    # Nan 
    hyperlink_list.append(None)

# Add list to df.
df["External Review Report Text"] = hyperlink_list

In [None]:
df["Issuer Category/Sector"].unique()

array(['Financial Institution', 'Corporate-Energy', 'Utility',
       'Corporate-Infrastructure', 'Corporate-Real Estate',
       'Corporate-Transportation', 'MDB', 'Agency', 'Corporate-agri food',
       'Corporate-Consumer services', 'Corporate-Consumer goods',
       'Sovereign', 'Corporate-Industry', 'Municipal', nan,
       'Corporate-Technology', 'Corporate-consumer services',
       'Corporate-Tourism', 'Corporate-Real estate', 'Corporate-Telecom',
       'Corporate-Water', 'Corporate-Healthcare'], dtype=object)

In [None]:
european = [
    'Spain', "The Netherlands", "Italy", "Sweden", "Norway", "France", "Luxembourg",
    "UK", "Belgium", "Hungary", "Switzerland", "Germany", "Finland", "Iceland", "Poland",
    "Czech Republic", "Denmark", "Ireland", "Greece", "Guernsey", "Austria", "Latvia",
    "Lithuania", "Romania", "Slovenia", "Slovakia",
]
sector = "Corporate-Energy"
external = "CICERO" # second-party opinion


df = df.loc[
    (df["Jurisdiction"].isin(european)) &
    (df["Issuer Category/Sector"] == sector) &
    (df["External Review Report"] == external)
] 

files = df["External Review Report Text"].to_list()

In [None]:
name2url = {link.strip().rsplit('/', 1)[-1] : link.strip() for link in files}
url2name = {link.strip() : link.strip().rsplit('/', 1)[-1]for link in files}
url2name

{'http://www.icmagroup.org/Emails/icma-vcards/Advanced%20Soltech_External%20Review%20Report.pdf': 'Advanced%20Soltech_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/Agder%20energi_External%20Review%20Report.pdf': 'Agder%20energi_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/AkershusEnergi_External%20Review%20Report.pdf': 'AkershusEnergi_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/Arendals_External%20Review%20Report.pdf': 'Arendals_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/Caruna_External%20Review%20Report.pdf': 'Caruna_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/Columbus_External%20Review%20Report.pdf': 'Columbus_External%20Review%20Report.pdf',
 'https://www.icmagroup.org/Emails/icma-vcards/East%20Renewable_External%20Review%20Report.pdf': 'East%20Renewable_External%20Review%20Report.pdf',
 'http://www.icmagrou

In [None]:
# Create documents folder
!mkdir documents

In [None]:
# Create sector Specific directory
!mkdir documents/Corporate-Energy

In [None]:
import os.path
import urllib.request

for link, name in url2name.items():
    filename = os.path.join('./documents/{}'.format(sector), name)
    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(link, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')

Downloading: ./documents/Corporate-Energy/Advanced%20Soltech_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Agder%20energi_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/AkershusEnergi_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Arendals_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Caruna_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Columbus_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/East%20Renewable_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Fortum_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Hafslund_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Latvenergo%20AS_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Latvenergo_External%20Review%20Report.pdf
Downloading: ./documents/Corporate-Energy/Lietuvos%20Energija_E

# Run Questions

In [None]:
from tqdm.notebook import tqdm_notebook

In [None]:
# Set of query, question pairs per area of interest.

# For each area of questioning: list of IR queries with paired question.
query_questioner = {
    # Area of investigation.
    "Alignment" : [
        # (list of IR queries, question_for_QA_model)
        (["four core components of the GBP", "Alignment with Green Bond Principles"], "sustainalytics is of the opinion that the Bonds are what?"),
        (["Green finance framework"], "the green finance framework is what?"),
    ],
    "SDG Goals" : [
        (["UN Sustainable Development Goals", "SDG"], "Which sustainable development goals are advanced?"),
    ],
    "Use of Proceeds" : [
        ([" eligible category for the use of proceeds"], "What are the eligible categories for the use of proceeds?"),
        (["eligible category for the use of proceeds", "UN Sustainable Development Goals", "SDG"], "What do the eligible categories lead to?"),
    ],
    "Project Evaluation" : [
        (["Project Evaluation Selection"], "who manages evaluating and selecting projects?"),
    ],
    "Management of Proceeds" : [
        (["management proceeds"], "who is responsible for management of proceeds?"),
        (["unallocated proceeds"], "unallocated proceeds will be held where?"),
    ],
    "Reporting" : [
        (["reporting", "impact reporting"], "what is reported?"),
        (["until full allocation", "report", "on the allocation of the net proceeds of issued green finance instruments"], "on what basis?"),
    ],
}

In [None]:
# dataframe to hold results
df_test = pd.DataFrame(columns=["Company", "External Reviewer", "Sector", "Area of Interest", "IR Query", "Question", "Score", "Answer", "Answer Sentence", "Answer Full Text", "Priority Flag"])
df_test

Unnamed: 0,Company,External Reviewer,Sector,Area of Interest,IR Query,Question,Score,Answer,Answer Sentence,Answer Full Text,Priority Flag


In [None]:
class QueryAnswer:

  def __init__(self, company, area, retrieval_query, qa_question, score, answer, answer_sentence, answer_full_text):
    self.company = company
    self.area = area
    self.retrieval_query = retrieval_query
    self.qa_question = qa_question
    self.score = score
    self.answer = answer
    self.answer_sentence = answer_sentence
    self.answer_full_text = answer_full_text

In [None]:
reranker = Reranker()

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1841.0, style=ProgressStyle(description…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=891691413.0, style=ProgressStyle(descri…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1199.0, style=ProgressStyle(description…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=791656.0, style=ProgressStyle(descripti…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1389353.0, style=ProgressStyle(descript…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=314.0, style=ProgressStyle(description_…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1340665848.0, style=ProgressStyle(descr…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=571.0, style=ProgressStyle(description_…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=231508.0, style=ProgressStyle(descripti…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=28.0, style=ProgressStyle(description_w…




HBox(children=(FloatProgress(value=0.0, description='Downloading', max=466062.0, style=ProgressStyle(descripti…




In [None]:
bm25_count = 30
reranked_count = 15

In [None]:
# Run retrieval and QA models over specificed SPO documents.
for company in tqdm_notebook(name2url.keys()):
  # Extract PDF
  print(company)
  filename = "documents/{}/{}".format(sector,company)
  reader = Reader(filename)
  reader.extract_pdf()
  # Iterate through query, question pairs 
  for area, pair_list in tqdm_notebook(query_questioner.items()):
    for queries, question in pair_list:

      # Get BM25 rankings
      try:
        texts = reader.get_ranked_texts(queries, n=bm25_count)
      # No answers.
      except IndexError:
        print("No Results: {}\n   {}".format(company, pair_list))
        break
      # Rerank
      reranked = reranker.rerank(queries, texts, method="T5")
      reranked = [item for item in reranked if len(item.text.strip())>0]
      reranked = reranked[:reranked_count]

      # List to store results.
      queryAnswer_list = []
      # Iterate through reranker list of text.
      for i in range(len(reranked)):
        # Feed context and question in QA model, return top 3 results.
        context = reranked[i].text
        # topk answers
        topk = 3
        results = question_answering(question=question, context=context, device=0, topk=topk)
        # if len(results) == 4 then only one answer has been found and 4 corresponds to the number of keys.
        if len(results) == 4: 
          # Wrap in list to avoid TypeError later on.
          results = [results]
        # Iterate through each result to append to results list. -- ultimately we want to consider all results, including those on repeated texts, if those results best answered the question.
        for result in results:
          # Match answer to sentence
          all_stops = [i for i, ltr in enumerate(context) if ltr == "."]
          sentence = ""
          try:
            if all_stops:
              if result["end"] <= all_stops[0]:
                sentence = context[:all_stops[0]]
              elif result["start"] >= all_stops[-1]:
                sentence = context[all_stops[-1]:]
              else:
                for i, stop_idx in enumerate(all_stops):
                  if result["start"] >= stop_idx:
                    try:
                      sentence = context[all_stops[i] : all_stops[i+1]]
                    except IndexError:
                      sentence = context[all_stops[i]:]
            else:
              sentence = context
          except:
            pass
          # Save discovered results in QueryAnswer obj.
          item = QueryAnswer(
              company = company,
              area = area,
              retrieval_query = queries,
              qa_question = question,
              score = result["score"],
              answer = result["answer"],
              answer_sentence = sentence,
              answer_full_text=context
          )
          queryAnswer_list.append(item)

      # Get top 3 results based on QA returned score.
      top_n = 3
      queryAnswer_list.sort(key = lambda x : x.score, reverse=True)
      final_queryAnswers = queryAnswer_list[:top_n]
      # Add new record to dataframe.
      for i, result in enumerate(final_queryAnswers):
        new_row = {
            "Company" : result.company,
            "External Reviewer" : external, # second-party opinion provider
            "Sector" : sector, # issue sector.
            "Area of Interest" : result.area,
            "IR Query" : result.retrieval_query,
            "Question" : result.qa_question,
            "Score" : result.score,
            "Answer" : result.answer,
            "Answer Sentence" : result.answer_sentence,
            "Answer Full Text" : result.answer_full_text,
            "Priority Flag" : i
        }
        df_test = df_test.append(new_row, ignore_index=True)

HBox(children=(FloatProgress(value=0.0, max=18.0), HTML(value='')))

Advanced%20Soltech_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Advanced%20Soltech_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Advanced%20Soltech_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Advanced%20Soltech_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Advanced%20Soltech_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Advanced%20Soltech_External%20Review%20Report.pdf
   [(['management proceeds'], 

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

  f"This sequence already has {self.eos_token}. In future versions this behavior may lead to duplicated eos tokens being added."



AkershusEnergi_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))


Arendals_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Arendals_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Arendals_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Arendals_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Arendals_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Arendals_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of proceeds?'),

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))


Columbus_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Columbus_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Columbus_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Columbus_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Columbus_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Columbus_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of proceeds?'),

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: East%20Renewable_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: East%20Renewable_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: East%20Renewable_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: East%20Renewable_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: East%20Renewable_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is re

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Fortum_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Fortum_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Fortum_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Fortum_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Fortum_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of proceeds?'), (['unallo

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Hafslund_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Hafslund_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Hafslund_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Hafslund_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Hafslund_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of proceeds?'),

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Latvenergo%20AS_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Latvenergo%20AS_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Latvenergo%20AS_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Latvenergo%20AS_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Latvenergo%20AS_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is respons

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Latvenergo_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Latvenergo_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Latvenergo_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Latvenergo_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Latvenergo_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of pr

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Lietuvos%20Energija_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Lietuvos%20Energija_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Lietuvos%20Energija_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Lietuvos%20Energija_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Lietuvos%20Energija_External%20Review%20Report.pdf
   [(['management proceed

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: NEI_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: NEI_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: NEI_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: NEI_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: NEI_External%20Review%20Report.pdf
   [(['management proceeds'], 'who is responsible for management of proceeds?'), (['unallocated proceeds'

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))


Reykjavik%20Energy_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))


SFE_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))


Stockholm%20Exergi_External%20Review%20Report.pdf


HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

No Results: Stockholm%20Exergi_External%20Review%20Report.pdf
   [(['four core components of the GBP', 'Alignment with Green Bond Principles'], 'sustainalytics is of the opinion that the Bonds are what?'), (['Green finance framework'], 'the green finance framework is what?')]
No Results: Stockholm%20Exergi_External%20Review%20Report.pdf
   [(['UN Sustainable Development Goals', 'SDG'], 'Which sustainable development goals are advanced?')]
No Results: Stockholm%20Exergi_External%20Review%20Report.pdf
   [([' eligible category for the use of proceeds'], 'What are the eligible categories for the use of proceeds?'), (['eligible category for the use of proceeds', 'UN Sustainable Development Goals', 'SDG'], 'What do the eligible categories lead to?')]
No Results: Stockholm%20Exergi_External%20Review%20Report.pdf
   [(['Project Evaluation Selection'], 'who manages evaluating and selecting projects?')]
No Results: Stockholm%20Exergi_External%20Review%20Report.pdf
   [(['management proceeds'], 

HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))





In [None]:
df_test.loc[
    (df_test["Area of Interest"] == "Use of Proceeds") &
    (df_test["Priority Flag"].isin([0,1])) &
    (df_test["Question"] == "What are the eligible categories for the use of proceeds?")
]["Answer"].to_list()

['green \nbond proceeds',
 'Green Shading',
 'hydropower, district heating, solar PV, and hydrogen',
 'money market instruments',
 'proje ct',
 'fossil power sources',
 'proje cts',
 'ren ewable energy  projects',
 'framew ork',
 'Fair, Good or Excellent',
 'Large hydro and road construction',
 'green bond  investments',
 'Fair, Good or Excellent',
 '9']

In [None]:
from google.colab import files
df_test.to_csv("experimental_results_cicero.csv")
files.download("experimental_results_cicero.csv")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>