# KPI Inference
This notebook takes the paragraphs identified as relevant to each KPI in the relevance inference stage, together with the fine-tuned KPI extraction model from the training stage, and runs inference to return specific answers for the KPIs.

In [1]:
from config_qa_farm_train import QAFileConfig, QAInferConfig
import pprint
import pathlib
import os
from src.data.s3_communication import S3Communication, S3FileType
from src.models.text_kpi_infer import TextKPIInfer
from dotenv import load_dotenv
import zipfile
import config

07/10/2022 10:32:46 - INFO - farm.modeling.prediction_head -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .


In [2]:
# Load credentials
dotenv_dir = os.environ.get(
    "CREDENTIAL_DOTENV_DIR", os.environ.get("PWD", "/opt/app-root/src")
)
dotenv_path = pathlib.Path(dotenv_dir) / "credentials.env"
if os.path.exists(dotenv_path):
    load_dotenv(dotenv_path=dotenv_path, override=True)

In [3]:
# init s3 connector
s3c = S3Communication(
    s3_endpoint_url=os.getenv("S3_ENDPOINT"),
    aws_access_key_id=os.getenv("S3_LANDING_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("S3_LANDING_SECRET_KEY"),
    s3_bucket=os.getenv("S3_BUCKET"),
)
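
Because `os.getenv` returns `None` for an unset variable, a missing credential may only surface later, when the connector is first used. A minimal sketch of an up-front check follows. `missing_env_vars` is a hypothetical helper, not part of `S3Communication`, and the variable names are the ones read in the cell above.

```python
import os

# Variable names taken from the connector cell above.
REQUIRED_VARS = (
    "S3_ENDPOINT",
    "S3_LANDING_ACCESS_KEY",
    "S3_LANDING_SECRET_KEY",
    "S3_BUCKET",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
# The endpoint and bucket values are placeholders for illustration only.
example_env = {"S3_ENDPOINT": "https://s3.example.invalid", "S3_BUCKET": "demo-bucket"}
missing = missing_env_vars(env=example_env)
print(missing)
```

Calling `missing_env_vars()` with no arguments checks the real environment, so it could be run right after `load_dotenv` to fail early with a readable message.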

In [4]:
# Settings for data files and checkpoint parameters
file_config = QAFileConfig("infer_demo")
infer_config = QAInferConfig("infer_demo")

In [5]:
# When running in Automation using Elyra and Kubeflow Pipelines,
# set AUTOMATION = 1 as an environment variable
if os.getenv("AUTOMATION"):

    # relevance inference results dir (mkdir is a no-op if it already exists)
    pathlib.Path(infer_config.relevance_dir['Text']).mkdir(parents=True, exist_ok=True)

    # kpi inference results dir
    pathlib.Path(infer_config.result_dir['Text']).mkdir(parents=True, exist_ok=True)

    # load dir
    pathlib.Path(infer_config.load_dir['Text']).mkdir(parents=True, exist_ok=True)

    # download relevance predictions from s3
    s3c.download_files_in_prefix_to_dir(
        config.BASE_INFER_RELEVANCE_S3_PREFIX,
        infer_config.relevance_dir['Text'],
    )

In [6]:
config.CHECKPOINT_S3_PREFIX

'test_cdp/saved_models'

In [8]:
str(model_rel_zip)

'/opt/app-root/src/aicoe-osc-demo/models/KPI_EXTRACTION.zip'

In [12]:
model_root = pathlib.Path(file_config.saved_models_dir).parent
model_rel_zip = pathlib.Path(model_root, 'KPI_EXTRACTION.zip')
s3c.download_file_from_s3(model_rel_zip, config.CHECKPOINT_S3_PREFIX, "KPI_EXTRACTION.zip")
with zipfile.ZipFile(model_rel_zip, 'r') as z:
    z.extractall(model_root)
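
The download-and-extract step assumes the archive is intact. The sketch below, using only the standard library, checks an archive with `ZipFile.testzip()` before extracting and returns the member names. `extract_checked` and the throwaway archive built here are illustrative assumptions; the real layout of `KPI_EXTRACTION.zip` is not shown in this notebook.

```python
import pathlib
import tempfile
import zipfile


def extract_checked(zip_path, target_dir):
    """Verify member CRCs, then extract everything into target_dir."""
    with zipfile.ZipFile(zip_path, "r") as z:
        bad_member = z.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad_member}")
        z.extractall(target_dir)
        return z.namelist()


# Build a small stand-in archive so the example runs without S3 access.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    demo_zip = tmp_path / "demo_model.zip"
    with zipfile.ZipFile(demo_zip, "w") as z:
        z.writestr("demo_model/config.json", "{}")
    members = extract_checked(demo_zip, tmp_path / "models")
    extracted_ok = (tmp_path / "models" / "demo_model" / "config.json").exists()

print(members, extracted_ok)
```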

## Inference

We can use the saved model and test it on some real examples.

First, let's load the model:

In [13]:
file_config.saved_models_dir

'/opt/app-root/src/aicoe-osc-demo/models/KPI_EXTRACTION'

In [14]:
tki = TextKPIInfer(infer_config)



Now, let's make a prediction on a single paragraph and question pair.

In [15]:
context = """the paris agreement on climate change drafted in 2015 aims to reduce worldwide emissions of greenhouse
gases to a level intended to limit a rise in global temperatures to below 2 degrees or, better still,
to below 1.5 degrees. verbund’s target of reducing greenhouse gas emissions by 90% measured beginning from
the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and parts of scope 3 emissions
for energy and air travel. the science based targets initiative validated this goal as science-based in october 2016,
i.e. it meets global standards. according to current planning, the target can be achieved.
however, if the grid operator requires higher generation volumes
"""
question = "What is the target year for climate commitment?"


In [16]:
QA_input = [
    {
        "qas": [question],
        "context":  context
    }
]

result = tki.infer_on_dict(QA_input)[0]
pprint.pprint(result)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 23.68 Batches/s]


{'predictions': [{'answers': [{'answer': '2021',
                               'context': 'the basis year 2011 5 million '
                                          'tonnes co2e until 2021 includes '
                                          'scope 1, scope 2 market- based and '
                                          'par',
                               'document_id': '0-0',
                               'offset_answer_end': 364,
                               'offset_answer_start': 360,
                               'offset_context_end': 412,
                               'offset_context_start': 312,
                               'probability': None,
                               'score': -3.186553478240967},
                              {'answer': 'no_answer',
                               'context': '',
                               'document_id': '0-0',
                               'offset_answer_end': 0,
                               'offset_answer_start': 0,
     

What does the prediction result show? 

In [17]:
# This is the best answer. It is either a text span or 'no_answer', whichever scores higher.
# Here the top answer is the span '2021'.
result['predictions'][0]['answers'][0]

{'score': -3.186553478240967,
 'probability': None,
 'answer': '2021',
 'offset_answer_start': 360,
 'offset_answer_end': 364,
 'context': 'the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and par',
 'offset_context_start': 312,
 'offset_context_end': 412,
 'document_id': '0-0'}
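
In this output the answer offsets appear to be character positions in the full paragraph, while the returned `context` is a 100-character window starting at `offset_context_start`. Under that reading, subtracting the window start from the answer offsets recovers the answer from the window. The sketch copies the values printed above; `answer_from_window` is a hypothetical helper.

```python
# Top answer copied from the cell output above.
top_answer = {
    "answer": "2021",
    "offset_answer_start": 360,
    "offset_answer_end": 364,
    "context": "the basis year 2011 5 million tonnes co2e until 2021 "
               "includes scope 1, scope 2 market- based and par",
    "offset_context_start": 312,
    "offset_context_end": 412,
}


def answer_from_window(ans):
    """Slice the answer out of the context window using paragraph-level offsets."""
    start = ans["offset_answer_start"] - ans["offset_context_start"]
    end = ans["offset_answer_end"] - ans["offset_context_start"]
    return ans["context"][start:end]


recovered = answer_from_window(top_answer)
print(recovered)  # 2021
```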

In [18]:
# No-answer score: it is lower than the span score above, so the model favors an answer being present in the context.
result['predictions'][0]['answers'][1]

{'score': -6.710481643676758,
 'probability': None,
 'answer': 'no_answer',
 'offset_answer_start': 0,
 'offset_answer_end': 0,
 'context': '',
 'offset_context_start': 0,
 'offset_context_end': 0,
 'document_id': '0-0'}
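
The two entries can be compared directly: the gap between the best span score and the no-answer score gives a rough sense of how strongly the model prefers an extracted answer. A minimal sketch using the two scores printed above; `pick_answer` and its `min_margin` parameter are illustrative assumptions, not part of `TextKPIInfer`.

```python
# Answers and scores copied from the two cell outputs above.
answers = [
    {"answer": "2021", "score": -3.186553478240967},
    {"answer": "no_answer", "score": -6.710481643676758},
]


def pick_answer(candidates, min_margin=0.0):
    """Return (answer, margin), where margin = best span score - no_answer score.

    The best span is returned only if it beats no_answer by more than min_margin.
    """
    spans = [c for c in candidates if c["answer"] != "no_answer"]
    no_answer = next(c for c in candidates if c["answer"] == "no_answer")
    best_span = max(spans, key=lambda c: c["score"])
    margin = best_span["score"] - no_answer["score"]
    chosen = best_span["answer"] if margin > min_margin else "no_answer"
    return chosen, margin


chosen, margin = pick_answer(answers)
print(chosen, round(margin, 3))  # 2021 3.524
```

Raising `min_margin` makes the selection more conservative: with `min_margin=5.0` the same input would fall back to `no_answer`.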

Now, let's use the model to infer KPI answers from the relevance results.

In [19]:
infer_config.relevance_dir

{'Text': '/opt/app-root/src/aicoe-osc-demo/data/infer_relevance'}

In [20]:
kpi_df = s3c.download_df_from_s3(
    "aicoe-osc-demo/kpi_mapping",
    config.KPI_MAPPING_CSV,
    filetype=S3FileType.CSV,
    header=0,
)
kpi_df.head()

Unnamed: 0,kpi_id,question,sectors,add_year,kpi_category,Unnamed: 5,Unnamed: 6
0,0.0,What is the company name?,"OG, CM, CU",False,TEXT,,
1,1.0,In which year was the annual report or the sus...,"OG, CM, CU",False,TEXT,,
2,2.0,What is the total volume of proven and probabl...,OG,True,"TEXT, TABLE",,
3,2.1,What is the volume of estimated proven hydroca...,OG,True,"TEXT, TABLE",,
4,2.2,What is the volume of estimated probable hydro...,OG,True,"TEXT, TABLE",,


In [21]:
tki.infer_on_relevance_results(infer_config.relevance_dir['Text'], kpi_df)

07/10/2022 11:04:26 - INFO - src.models.text_kpi_infer -   #################### Starting KPI Inference for the following relevance CSV files found in /opt/app-root/src/aicoe-osc-demo/data/infer_kpi:
['sustainability-report-2019_predictions_relevant.csv', '2020-cdp-climate-response_predictions_relevant.csv', 'PGE_Corporation_CDP_Climate_Change_Questionnaire_2021_predictions_relevant.csv', 'vodafone-group-cdp-climate-change-questionnaire2021_predictions_relevant.csv', 'Unilever CDP Climate Response_predictions_relevant.csv', 'gap_inc-_cdp_climate_change_questionnaire_2021_predictions_relevant.csv', 'Corning_Incorporated_CDP_Climate_Change_Questionnaire_2021_FINAL_predictions_relevant.csv', 'Michelin-CDP-Climate-Change-2021_def_predictions_relevant.csv', 'bp-cdp-climate-change-questionnaire-2021_predictions_relevant.csv', 'Adobe_CDP_Climate_Change_Questionnaire_2021_predictions_relevant.csv', '2020-cdp-climate-response-checkpoint_predictions_relevant.csv', 'Apple_CDP-Climate-Change-Questi

Unnamed: 0,pdf_name,kpi,kpi_id,answer,page,paragraph,source,score,no_ans_score,no_answer_score_plus_boost,index
0,sustainability-report-2019,Absolute target,,US,25.0,US This year Equinor’s Empire Wind project suc...,Text,-1.477865,6.261431,-8.738569,
1,sustainability-report-2019,Absolute target,,"’s Climate Roadmap sets out new short-, mid- a...",10.0,"Equinor’s Climate Roadmap sets out new short-,...",Text,-5.128082,5.475263,-9.524737,
2,sustainability-report-2019,Absolute target,,Process safety,18.0,Process safety We continued to see a reduction...,Text,-5.214777,6.629411,-8.370589,
3,sustainability-report-2019,Absolute target,,’s broader leadership is in the same way asses...,6.0,At Equinor climate and sustainability is embed...,Text,-5.421431,6.867361,-8.132639,
4,sustainability-report-2019,Absolute target changes,,targeted tailored,24.0,"In 2019, we increased the number of targeted t...",Text,-3.673779,-0.187777,-15.187777,
...,...,...,...,...,...,...,...,...,...,...,...
131,Bayer AG Climate Change 2021,What percentage of your total operational spen...,,6,19.0,Figure or percentage in reporting year 6,Text,-0.602398,6.323046,-8.676954,
132,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,2010000,28.0,Gross global Scope 1 emissions (metric tons CO...,Text,7.808585,10.135893,-4.864107,
133,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,3580000,34.0,Metric numerator (Gross global combined Scope ...,Text,2.033325,10.106841,-4.893159,
134,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,3580000,34.0,Metric numerator (Gross global combined Scope ...,Text,2.033325,10.106841,-4.893159,
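
The result table holds several candidate answers per report and KPI. One simple way to post-process it is to keep the highest-scoring candidate for each (`pdf_name`, `kpi`) pair. The sketch below does this in plain Python on the first five rows shown above, with only the columns it needs. `best_per_kpi` is a hypothetical helper, and ranking by `score` alone is an assumption rather than the pipeline's documented selection rule. In the rows shown, `no_answer_score_plus_boost` equals `no_ans_score` minus 15, which the last lines check.

```python
# First five rows of the table above; "row" is the printed row index.
PDF = "sustainability-report-2019"
rows = [
    {"row": 0, "pdf_name": PDF, "kpi": "Absolute target",
     "score": -1.477865, "no_ans_score": 6.261431, "no_answer_score_plus_boost": -8.738569},
    {"row": 1, "pdf_name": PDF, "kpi": "Absolute target",
     "score": -5.128082, "no_ans_score": 5.475263, "no_answer_score_plus_boost": -9.524737},
    {"row": 2, "pdf_name": PDF, "kpi": "Absolute target",
     "score": -5.214777, "no_ans_score": 6.629411, "no_answer_score_plus_boost": -8.370589},
    {"row": 3, "pdf_name": PDF, "kpi": "Absolute target",
     "score": -5.421431, "no_ans_score": 6.867361, "no_answer_score_plus_boost": -8.132639},
    {"row": 4, "pdf_name": PDF, "kpi": "Absolute target changes",
     "score": -3.673779, "no_ans_score": -0.187777, "no_answer_score_plus_boost": -15.187777},
]


def best_per_kpi(records):
    """Keep the highest-scoring record for each (pdf_name, kpi) pair."""
    best = {}
    for rec in records:
        key = (rec["pdf_name"], rec["kpi"])
        if key not in best or rec["score"] > best[key]["score"]:
            best[key] = rec
    return best


best = best_per_kpi(rows)
best_rows = sorted(rec["row"] for rec in best.values())
print(best_rows)  # [0, 4]

# Difference between the boosted and raw no-answer scores, rounded to 6 decimals.
boosts = {round(r["no_answer_score_plus_boost"] - r["no_ans_score"], 6) for r in rows}
print(boosts)  # {-15.0}
```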


In [22]:
if os.getenv("AUTOMATION"):
    # upload the predicted files to s3
    s3c.upload_files_in_dir_to_prefix(
        infer_config.result_dir['Text'],
        config.BASE_INFER_KPI_S3_PREFIX
    )

# Conclusion
This notebook ran the _KPI_ inference on a sample dataset and stored the output in CSV format.

In [9]:
config.BASE_INFER_KPI_S3_PREFIX

'test_cdp/pipeline_run/small/infer_KPI'

In [10]:
config.CHECKPOINT_S3_PREFIX

'test_cdp/saved_models'

In [11]:
s3c.upload_file_to_s3('/opt/app-root/src/aicoe-osc-demo/models/KPI_EXTRACTION.zip', config.CHECKPOINT_S3_PREFIX, "KPI_EXTRACTION.zip")

{'ResponseMetadata': {'RequestId': 'Q7Z5DBH1BBXSWM3V',
  'HostId': 'E8jq+aBgRY9xhgohZfMp+hXVQnPipYKuAu8i1esmTdWMn5IbN+Hz8H7e6YP0J057ne52nOmWguI=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'E8jq+aBgRY9xhgohZfMp+hXVQnPipYKuAu8i1esmTdWMn5IbN+Hz8H7e6YP0J057ne52nOmWguI=',
   'x-amz-request-id': 'Q7Z5DBH1BBXSWM3V',
   'date': 'Sun, 10 Jul 2022 11:01:48 GMT',
   'etag': '"9ff39b913cc404563ee70807ced14a15"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"9ff39b913cc404563ee70807ced14a15"'}