# Automatic Sustainability Objective Detection Demo

Given any sustainability report, we automatically detect the objectives it states. The report can
- be in any format (PDF, HTML, etc.).
- be of any length (from a few pages to hundreds).
- come from any domain (pharmaceutical, electronics, etc.).

For example, this demo uses the sustainability report located [here](https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019).

## === Setup ===

### Importing Libraries

In [None]:
import os
import sys
import pathlib
import urllib3
import datetime
import minio
import pandas
import IPython.display
import transformers

sys.path.append("../source")
import document
import data_preprocessing
import transformer_model

pandas.set_option("display.max_rows", None)
pandas.set_option("display.max_columns", None)
pandas.set_option("display.max_colwidth", None)

### Setting up the Data Preprocessor

In [None]:
sustainability_keywords = [
    # Environmental terms
    "green", "environment", "carbon", "footprint", "co2", "emission", "pollution",
    "recycle", "waste", "plant", "energy", "renewable", "water", "electricity",
    # Social terms
    "diversity", "employee", "women", "female", "human", "inclusion", "health", "safety", "security",
    # "goal", "sustainable", "zero", "right"
]
data_preprocessor = data_preprocessing.DataPreprocessing()
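
As a rough illustration of what the size and keyword filter applied later in this demo might do, here is a minimal sketch. The `filter_blocks` helper is hypothetical, and it assumes that sizes are measured in characters and that keywords are matched as case-insensitive substrings; the actual `data_preprocessing` module may behave differently.

```python
def filter_blocks(blocks, keywords, size_range=(0, 300)):
    """Keep blocks whose length is within size_range and that mention a keyword.

    Hypothetical helper: length is counted in characters and keywords are
    matched as case-insensitive substrings (both are assumptions).
    """
    low, high = size_range
    lowered_keywords = [keyword.lower() for keyword in keywords]
    kept = []
    for block in blocks:
        text = block.lower()
        in_range = low <= len(block) <= high
        has_keyword = any(keyword in text for keyword in lowered_keywords)
        if in_range and has_keyword:
            kept.append(block)
    return kept


sample_blocks = [
    "We will reach net-zero carbon by 2040.",  # short and mentions "carbon"
    "Table of contents",                       # no keyword
    "Our employees " + "matter " * 60,         # mentions a keyword but is too long
]
kept_blocks = filter_blocks(sample_blocks, ["carbon", "employee"])
```

Substring matching is deliberately loose: "plant" also matches "planted", which keeps more candidate blocks for the model to score.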

### Loading Our Sustainability Objective Detection Model

In [None]:
model = transformer_model.TransformerModel(
    name="climatebert/environmental-claims",
    load_from="../models/climatebert/environmental-claims",
)
pipe = model.load_pipeline()
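
The `transformer_model` module is not shown in this demo. One plausible way to build an equivalent pipeline directly with the `transformers` library is sketched below. The `build_pipeline` helper is hypothetical, and it assumes a recent `transformers` release in which text-classification pipelines accept `top_k`.

```python
def build_pipeline(model_dir):
    """Hypothetical stand-in for TransformerModel.load_pipeline()."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    # top_k=None asks for a score for every label instead of only the best one.
    return pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        top_k=None,
    )
```

Calling `build_pipeline("../models/climatebert/environmental-claims")` would then return a callable that accepts a list of strings, provided the model files exist at that path.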

## === Processing New Sustainability Reports ===

### Loading the Sustainability Report

In [None]:
url = "https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019"
IPython.display.IFrame(url, width=1000, height=800)

### Running the Model on the New Sustainability Report

In [None]:
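# Download the report, parse it as a PDF, and split it into text blocks.
doc = document.Document(url)
doc.content_type = "pdf"
content = doc.request_url()
parsed_content = doc.parse_content(content)
text_blocks = doc.segment_text(parsed_content)
# Keep an untouched copy of each block so the original wording can be shown next to its score.
tdf = pandas.DataFrame({"URL": url, "Text Blocks": text_blocks, "Original Text Blocks": text_blocks})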

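# Clean the blocks lightly, then keep only blocks within the (0, 300) size range
# that mention at least one sustainability keyword.
tdf = data_preprocessor.clean_text_blocks(tdf, "Text Blocks", level="minimal")
tdf = data_preprocessor.filter_text_blocks(tdf, "Text Blocks", keep_only_size=(0, 300), keep_only_keywords=sustainability_keywords)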

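# Score every remaining block with the model.
predictions = pipe(tdf["Text Blocks"].tolist())
# Assumes each prediction lists one score per label and that index 1 is the positive (objective) class.
tdf["Goal Score"] = [p[1]["score"] for p in predictions]
# Drop the cleaned text and show the original blocks, highest score first.
tdf = tdf.drop(["Text Blocks"], axis=1)
tdf = tdf.sort_values("Goal Score", ascending=False)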
tdf.head(20)
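
Reading the score by position depends on the order in which the pipeline returns labels. A slightly more defensive variant looks the score up by label name. The sketch below uses made-up predictions and assumes the positive label is called `"yes"`; check the model's configuration for the actual label names.

```python
def score_for_label(prediction, label):
    """Return the score attached to `label` in one per-class prediction list."""
    for entry in prediction:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(f"label {label!r} not found in prediction")


# Made-up predictions in the list-of-dicts shape returned per input text.
sample_predictions = [
    [{"label": "no", "score": 0.1}, {"label": "yes", "score": 0.9}],  # ordered by label index
    [{"label": "yes", "score": 0.7}, {"label": "no", "score": 0.3}],  # ordered by score
]
goal_scores = [score_for_label(p, "yes") for p in sample_predictions]
```

With this helper, the "Goal Score" column could be filled with `[score_for_label(p, "yes") for p in predictions]`, which gives the same result whichever way the labels are ordered.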