# Automatic Sustainability Objective Detection Demo

Given a sustainability report, this demo automatically detects the objectives it states. The report can:
- be in any format (PDF, HTML, etc.).
- have any length (a few to hundreds of pages).
- be from any domain (pharmaceutical, electronics, etc.).

For example, see [Amazon's sustainability report](https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019).

## === Setup ===

### Importing Libraries

In [1]:
import sys
import pandas
import IPython.display

sys.path.append("../source")
import document
import data_preprocessing
import transformer_model

pandas.set_option("display.max_rows", None)
pandas.set_option("display.max_columns", None)
pandas.set_option("display.max_colwidth", None)


### Setting up the Data Preprocessor

In [2]:
data_preprocessor = data_preprocessing.DataPreprocessing()

### Loading Our Trained Models

In [3]:
# Text classifier that scores each text block as "Goal" or "Not Goal"
target_values = ["Not Goal", "Goal"]
goal_detection_model = transformer_model.TextClassification(target_values, name="climatebert/environmental-claims",
                                                            load_from="../models/goal-detection/climatebert/environmental-claims")
# Text classifier that labels each text block as dated or undated
target_values = ["Specific & Dated", "Specific & Undated"]
detail_detection_model = transformer_model.TextClassification(target_values, name="climatebert/environmental-claims",
                                                              load_from="../models/detail-detection/climatebert/environmental-claims")
# Token classifier that extracts the attributes listed below from each text block
target_attributes = ["Due", "Baseline", "Change Number", "Change Unit"]
detail_extraction_model = transformer_model.TokenClassification(target_attributes, name="roberta-base", load_from="../models/detail-extraction/roberta-base")
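
The `transformer_model` module is not shown in this notebook, so how it turns model outputs into per-label scores is unknown here. As a rough sketch only, assuming the classifier emits one raw logit per entry in `target_values` and that scores are softmax-normalized, the mapping from logits to a score such as "Goal" could look like this. The function names and the logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_by_label(logits, target_values):
    """Pair each label with its softmax probability (assumed behavior)."""
    return dict(zip(target_values, softmax(logits)))


target_values = ["Not Goal", "Goal"]
scores = score_by_label([-2.0, 3.0], target_values)  # made-up logits
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Under this assumption, the order of `target_values` matters: each position has to match the class index the model was trained with.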

### Objective Extraction Helper Function

In [4]:
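def extract_objectives_from_url(url, content_type="pdf"):
    """Download a report, split it into text blocks, and run the three models on them.

    Returns a DataFrame with one row per retained text block, sorted by "Goal Score".
    """
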
    # Extracting Text Blocks of the Sustainability Report
    doc = document.Document(url)
    doc.content_type = content_type
    content = doc.request_url()
    parsed_content = doc.parse_content(content)
    text_blocks = doc.segment_text(parsed_content)
    tdf = pandas.DataFrame({"URL": url, "Text Blocks": text_blocks})
    
    # Running the Goal Detection Model
    tdf["text"] = tdf["Text Blocks"].copy()
    tdf = data_preprocessor.clean_text_blocks(tdf, "text", level="essential")
    tdf = data_preprocessor.filter_text_blocks(tdf, "text", keep_only_size=(0, 300))
    predictions = goal_detection_model.predict(tdf["text"].tolist())
    tdf["Goal Score"] = predictions["Goal"].values
    tdf = tdf.drop(["text"], axis=1)
    tdf = tdf.sort_values("Goal Score", ascending=False)

    # Running the Detail Detection Model
    tdf["text"] = tdf["Text Blocks"].copy()
    tdf = data_preprocessor.clean_text_blocks(tdf, "text", level="essential")
    tdf = data_preprocessor.filter_text_blocks(tdf, "text", keep_only_size=(0, 300))
    predictions = detail_detection_model.predict(tdf["text"].tolist())
    tdf["Status"] = predictions["Class"].values
    tdf = tdf.drop(["text"], axis=1)
    
    # Running the Detail Extraction Model
    tdf["text"] = tdf["Text Blocks"].copy()
    tdf = data_preprocessor.clean_text_blocks(tdf, "text", level="essential")
    tdf = data_preprocessor.filter_text_blocks(tdf, "text", keep_only_size=(0, 300))
    predictions = detail_extraction_model.predict(tdf["text"].tolist())
    for target_attribute in target_attributes:
        tdf[target_attribute] = predictions[target_attribute].values
    tdf = tdf.drop(["text"], axis=1)

    return tdf    
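
The helper above calls `clean_text_blocks` with `level="essential"` and `filter_text_blocks` with `keep_only_size=(0, 300)`, but the `data_preprocessing` module itself is not shown. The sketch below is a hypothetical stand-in, not the module's actual code: it assumes that cleaning collapses whitespace (including PDF line breaks) and that the size bounds are inclusive character counts.

```python
import re


def clean_text_block(text):
    """Collapse runs of whitespace, including line breaks, into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def filter_by_size(blocks, keep_only_size=(0, 300)):
    """Keep blocks whose character length lies within the inclusive bounds."""
    low, high = keep_only_size
    return [block for block in blocks if low <= len(block) <= high]


raw_blocks = [
    "Reach net-zero carbon \nemissions across our \noperations by 2040",
    "x" * 500,  # stand-in for a block that is too long to keep
]
cleaned = [clean_text_block(block) for block in raw_blocks]
kept = filter_by_size(cleaned)
print(kept)
```

If the real bounds are measured in words or tokens rather than characters, only the `len(block)` expression would need to change.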

## === Processing New Sustainability Reports ===

#### Case 1: Amazon

In [5]:
url = "https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019"
IPython.display.display(IPython.display.IFrame(url, width=1000, height=800))
df = extract_objectives_from_url(url)
df.head(20)



|  | URL | Text Blocks | Goal Score | Status | Due | Baseline | Change Number | Change Unit |
|---|---|---|---|---|---|---|---|---|
| 1053 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Amazon investment to upskill 300,000 of our own employees by 2025 as part of our Upskilling 2025 pledge | 0.992178 | Specific & Dated | 2025.0 |  |  |  |
| 307 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Constructing data centers using steel made with renewable energy and up to 100% recycled content | 0.991469 | Specific & Undated |  |  |  |  |
| 1395 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Top Five Sourcing Countries in 2021 | 0.991396 | Specific & Dated | 2021.0 |  |  | sourcing countries |
| 841 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Upskill 300,000 Amazon employees by 2025 | 0.99099 | Specific & Dated | 2025.0 |  |  |  |
| 180 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Inspire and empower others to join us on a mission to reach net-zero carbon by 2040 | 0.990716 | Specific & Dated | 2040.0 |  |  |  |
| 245 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Reach net-zero carbon emissions across our operations by 2040 | 0.990702 | Specific & Dated | 2040.0 |  |  | carbon emissions |
| 305 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Reach net-zero carbon emissions across our operations by 2040 | 0.990702 | Specific & Dated | 2040.0 |  |  | carbon emissions |
| 178 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Make 50% of Amazon shipments net-zero carbon by 2030 | 0.990393 | Specific & Dated | 2030.0 |  |  |  |
| 165 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | Increase in the number of Black directors and vice presidents in 2021 | 0.989542 | Specific & Dated | 2021.0 |  |  |  |
| 154 | https://sustainability.aboutamazon.com/pdfBuilderDownload?name=sustainability-thinking-big-december-2019 | On a path to powering our operations with 100% renewable energy by 2025 | 0.9888 | Specific & Dated | 2025.0 |  |  | renewable energy |
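
Rows 245 and 305 in the Amazon output contain the same text block, since reports often repeat a headline objective on several pages. As an optional post-processing step that is not part of the original pipeline, repeated blocks could be dropped after normalizing whitespace and case. The function names below are hypothetical, and the rows are plain dictionaries so that the sketch is self-contained.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def drop_duplicate_blocks(rows):
    """Keep the first row for each distinct normalized text block."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row["Text Blocks"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


net_zero = "Reach net-zero carbon \nemissions across our \noperations by 2040"
rows = [
    {"index": 245, "Text Blocks": net_zero, "Goal Score": 0.990702},
    {"index": 305, "Text Blocks": net_zero, "Goal Score": 0.990702},
    {"index": 178, "Text Blocks": "Make 50% of \nAmazon shipments \nnet-zero carbon \nby 2030", "Goal Score": 0.990393},
]
unique_rows = drop_duplicate_blocks(rows)
print([row["index"] for row in unique_rows])
```

On the DataFrame itself, `df.drop_duplicates(subset="Text Blocks")` would handle exact repeats; the normalization step only matters when the copies differ in spacing or capitalization.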


#### Case 2: Shell

In [6]:
url = "https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf"
IPython.display.display(IPython.display.IFrame(url, width=1000, height=800))
df = extract_objectives_from_url(url)
df.head(20)



|  | URL | Text Blocks | Goal Score | Status | Due | Baseline | Change Number | Change Unit |
|---|---|---|---|---|---|---|---|---|
| 952 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Increase in water recycling in oil sands mining from 2015 | 0.99339 | Specific & Dated |  | 2015.0 |  | water recycling |
| 296 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Achieve operational spills below a volume of 0.7 (‘000 tonnes) (classified as “hydrocarbons reaching soil or water”). | 0.986593 | Specific & Dated |  |  |  |  |
| 290 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Achieve total recordable case frequency (TRCF) – the number of injuries per million working hours – below 0.96 for employees and contractors. | 0.985948 | Specific & Dated |  |  |  |  |
| 361 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Achieve a refinery energy intensity below 92.2 (based on the Refineries Energy Index). | 0.985399 | Specific & Dated |  |  |  |  |
| 293 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Achieve a number of operational leaks below 54 (classified as “operational Tier 1 process safety events”). | 0.983593 | Specific & Dated |  |  |  |  |
| 1581 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Implement mitigation plan through project development and construction and then in ongoing operations. | 0.956351 | Specific & Dated |  |  |  | mitigation plan |
| 950 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Reduction in operational spills in Nigeria from 2015 | 0.508105 | Specific & Dated |  | 2015.0 |  | operational spills |
| 299 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | Reduce flaring in our upstream business (million tonnes CO2 equivalent). Our policy is to reduce any continuous flaring or venting to as low a level as reasonably practical. | 0.201811 | Specific & Dated |  |  |  | flaring |
| 1504 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | The alliance’s goal is for 100 million households to gain access to clean and efficient cookstoves and fuels by 2020. | 0.126916 | Specific & Dated | 2020.0 |  | 100.0 | households |
| 1613 | https://reports.shell.com/sustainability-report/2016/servicepages/downloads/files/entire_shell_sr16.pdf | All Shell employees and contract staff must follow our Code of Conduct. | 0.099415 | Specific & Undated |  |  |  |  |
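
In the Shell output the goal score falls from about 0.96 to 0.51 and then to 0.10 within the first ten rows, so taking the top rows with `df.head(20)` mixes confident detections with weak ones. One option, sketched here with a hypothetical helper and an arbitrary cutoff of 0.9 (an assumption, not a value tuned in this demo), is to filter by score before displaying.

```python
def filter_by_goal_score(rows, min_score=0.9):
    """Keep only rows whose goal score reaches the cutoff."""
    return [row for row in rows if row["Goal Score"] >= min_score]


# A few (index, score) pairs copied from the Shell output above
shell_scores = [(952, 0.99339), (1581, 0.956351), (950, 0.508105), (1613, 0.099415)]
rows = [{"index": index, "Goal Score": score} for index, score in shell_scores]

confident = filter_by_goal_score(rows)
print([row["index"] for row in confident])
```

The DataFrame equivalent is `df[df["Goal Score"] >= 0.9]`. A sensible cutoff depends on how many false positives are acceptable and would need to be checked against labeled data.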


#### Case 3: Google

In [7]:
url = "https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf"
IPython.display.display(IPython.display.IFrame(url, width=1000, height=800))
df = extract_objectives_from_url(url)
df.head(20)



|  | URL | Text Blocks | Goal Score | Status | Due | Baseline | Change Number | Change Unit |
|---|---|---|---|---|---|---|---|---|
| 1027 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | go beyond our own operational footprint, enabling renewable energy | 0.994788 | Specific & Undated |  |  |  | renewable energy |
| 75 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | continue to look for collaborative partnerships and innovative opportunities | 0.991912 | Specific & Undated |  |  |  |  |
| 1442 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | provide electric vehicle charging stations for 10% of total parking spaces at | 0.991542 | Specific & Dated |  |  |  |  |
| 580 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | Pursue third-party green or healthy-building certifications for office projects, such as LEED, WELL Building Standard, and Living Building Challenge. | 0.990175 | Specific & Undated |  |  |  | certifications |
| 603 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | 100% of device orders shipping to and from Google customers will be carbon neutral by 2020. | 0.989968 | Specific & Dated | 2020.0 |  |  |  |
| 1544 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | 100% of Made by Google products will include recycled materials, with a drive | 0.98822 | Specific & Dated |  |  |  |  |
| 552 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | Maintain ISO 50001 energy management system certification for all Google-owned data centers that meet certain operational milestones. | 0.987721 | Specific & Undated |  |  |  | certification |
| 590 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | Provide electric vehicle charging stations for 10% of total parking spaces at our Bay Area headquarters. | 0.987116 | Specific & Dated |  |  |  |  |
| 945 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | under construction in Tennessee and Alabama will be matched with 100% | 0.986263 | Specific & Dated |  |  |  |  |
| 1432 | https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf | reduce single-use beverages per seated headcount by 20% relative | 0.986101 | Specific & Dated |  |  |  |  |


#### Case 4: General Electric

In [9]:
url = "https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf"
IPython.display.display(IPython.display.IFrame(url, width=1000, height=800))
df = extract_objectives_from_url(url)
df.head(20)



|  | URL | Text Blocks | Goal Score | Status | Due | Baseline | Change Number | Change Unit |
|---|---|---|---|---|---|---|---|---|
| 1143 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | Support 100% Sustainable Aviation Fuel approval and adoption | 0.994338 | Specific & Dated |  |  |  | sustainable aviation fuel |
| 890 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | 15% DECREASE in fuel consumption from the twin-aisle CF6-80C2 to GEnx engine | 0.991525 | Specific & Dated |  |  |  |  |
| 891 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | 10% DECREASE in fuel consumption from the large twin-aisle GE90-115B to GE9X engine | 0.98852 | Specific & Dated |  |  |  |  |
| 2201 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | Do check that load does not exceed equipment load capacity | 0.98711 | Specific & Undated |  |  |  |  |
| 786 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | Reduce Emissions by... 45% | 0.984752 | Specific & Dated |  |  |  |  |
| 838 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | 22% reduction in carbon intensity | 0.969347 | Specific & Dated |  |  |  |  |
| 2209 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | Don’t use any damaged lifting equipment or accessories | 0.948679 | Specific & Undated |  |  |  |  |
| 863 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | GE Vernova will focus, working with other industry participants, on bringing into service breakthrough technologies by the early 2030s to help achieve absolute emission reductions for the power sector’s path to net zero. | 0.94277 | Specific & Dated |  |  |  | breakthrough technologies |
| 2405 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | To support the energy efficiency program, in 2023, the plan is to deploy energy management best practices and key performance indicators (KPIs) for energy usage across all supply chain and engine maintenance sites globally, including the 18 top emitting sites. | 0.940781 | Specific & Dated |  |  |  | practices |
| 1141 | https://www.ge.com/sites/default/files/ge2022_sustainability_report.pdf | All GE and joint venture engines can operate on approved Sustainable Aviation Fuel | 0.922541 | Specific & Undated |  |  |  |  |
