<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Carbon Footprint Analytics</b>
<br>
</header>
<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial'>
 We all know that Earth's climate is changing and that the change has harmful effects on all of us.
The rising temperature of Earth is attributed to the greenhouse effect.
The greenhouse effect is the way in which heat is trapped close to Earth's surface by “greenhouse gases”.
Carbon dioxide (CO2) is the most significant of these, though other gases also contribute.
<br><br>As the climate emergency worsens, regulatory requirements will call for an enterprise view from a greenhouse gas perspective as well.
Requirements for this type of reporting will increase year by year.
<br><br>
    We have developed a prototype of <b>Carbon Footprint Analytics</b> which can be used for this type of analysis and reporting. A company already has many data sources available, both financial and application data, covering almost every aspect of the business.
The challenge for good analysis and reporting on greenhouse gases is not so much to create additional data as to transform the existing data into a greenhouse gas emission view of the enterprise.
All the enterprise data already available needs to be converted to greenhouse gas emissions.
Properly managing and governing all these data transformations over time is the pivotal point of greenhouse gas reporting, and this is exactly what Carbon Footprint Analytics aims to do.

<img id="DM" src="images/architecture.png" alt="architecture" width="1200" />

<p style = 'font-size:16px;font-family:Arial'>The basic principle of using Carbon Footprint Analytics is quite simple. It consists of three steps.<br>
Step one, provide your company-specific data according to the input interface of the engine.
This data represents all the activities and consumption of the enterprise that entail greenhouse gas emissions.
<br>
Step two, run the engine. It processes your input and converts it into the greenhouse gas emissions of your enterprise.<br>
Step three, the engine exposes the greenhouse gas footprint of your company.

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>1. Carbon Footprint Calculations</b></p>

<img id="DM" src="images/co2_calculation.png" alt="co2_calculation" width="1200" />

<p style = 'font-size:16px;font-family:Arial'>At a glance, the carbon footprint calculation looks really easy: take the activities (the consumption) of the company, multiply each by its emission factor, and sum the results.
For example, if an activity consumes 10 megawatt-hours (MWh) of electricity and each MWh is worth 0.3 kilograms of CO2 equivalents, then the greenhouse gas emissions of that activity are 3 kilograms of CO2 equivalents. <br>
In reality, the factors vary from region to region.<br>
The carbon footprint of a MWh depends on how the electricity is produced: does it come from a brown coal power plant or from a photovoltaic system? One kWh produced in Alaska has a different footprint from one produced in California. In addition, factors may change over the course of the day, because the electricity mix at noon differs from the mix at midnight. For optimizing the carbon footprint and working towards net zero, these may be valuable differences to track.
<br> For enterprise reporting there are potentially many different data sources to be combined, which differ in quality, level of detail and units of measurement. Operating this enterprise reporting can get out of hand rather quickly and cause a lot of unproductive effort. Carbon Footprint Analytics is designed to deal with this complexity. The model is flexible enough to handle all sorts of accreditation levels and units of measurement. It leverages ClearScape Analytics to deal with time travel, handle geospatial challenges and improve data quality via text analysis functions.<br><br>Now that we understand why Carbon Footprint Analytics is needed and what calculations it performs, let us take a look at some of its tables and the type of output it generates.
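<p style = 'font-size:16px;font-family:Arial'>To make the calculation described above concrete, here is a minimal sketch in plain Python. It is not the engine's implementation. The class names, region labels, validity periods and factor values are all made up for illustration. It shows only the principle: convert each activity to a common unit, look up the emission factor that applies to its region and date, multiply, and sum.</p>

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical unit conversions to a common unit (MWh); not taken from the engine.
TO_MWH = {"MWh": 1.0, "kWh": 0.001}


@dataclass(frozen=True)
class EmissionFactor:
    region: str
    valid_from: date          # inclusive
    valid_to: date            # exclusive
    kg_co2e_per_mwh: float


@dataclass(frozen=True)
class Activity:
    region: str
    day: date
    quantity: float
    unit: str


def find_factor(factors, region, day):
    """Return the factor that is valid for the region on the given day."""
    for f in factors:
        if f.region == region and f.valid_from <= day < f.valid_to:
            return f
    raise LookupError(f"no emission factor for {region} on {day}")


def total_emissions(activities, factors):
    """Sum quantity (converted to MWh) times the matching emission factor."""
    total = 0.0
    for a in activities:
        mwh = a.quantity * TO_MWH[a.unit]
        total += mwh * find_factor(factors, a.region, a.day).kg_co2e_per_mwh
    return total


# Made-up illustrative values: factors differ by region and change over time.
factors = [
    EmissionFactor("REGION_A", date(2020, 1, 1), date(2021, 1, 1), 0.3),
    EmissionFactor("REGION_A", date(2021, 1, 1), date(2022, 1, 1), 0.25),
    EmissionFactor("REGION_B", date(2020, 1, 1), date(2022, 1, 1), 0.5),
]
activities = [
    Activity("REGION_A", date(2020, 6, 1), 10.0, "MWh"),    # 10 MWh at 0.3
    Activity("REGION_A", date(2021, 6, 1), 4000.0, "kWh"),  # 4 MWh at 0.25
    Activity("REGION_B", date(2021, 6, 1), 2.0, "MWh"),     # 2 MWh at 0.5
]

total = total_emissions(activities, factors)
print(f"{total:.2f} kg CO2e")
```

<p style = 'font-size:16px;font-family:Arial'>Even in this toy version, most of the code deals with units, regions and validity periods rather than with the multiplication itself, which is the complexity the engine is meant to manage.</p>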

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>2. Import python packages, connect to Vantage and explore the dataset</b></p>

In [None]:
import getpass
import warnings

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

from teradataml.dataframe.dataframe import DataFrame
from teradataml.analytics.sqle import NGramSplitter
from teradataml.dataframe.dataframe import in_schema
from teradataml.context.context import create_context, remove_context, get_context
from teradataml.dataframe.copy_to import copy_to_sql
from teradataml.options.display import display

from teradatasqlalchemy.types import *

import matplotlib.pyplot as plt

%matplotlib inline

warnings.filterwarnings('ignore')


<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, then use the down arrow to go to the next cell.<br>The command below makes a connection to the Vantage environment.
</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
eng.execute('''SET query_band='DEMO=CarbonFootprintAnalytics.ipynb;' UPDATE FOR SESSION; ''')


<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. Since this demo uses temporal tables, we will create the databases and tables in local storage and use them in the notebook. Please execute the procedure in the next cell.</p>

In [None]:
%run -i ../run_procedure.py "call get_data('DEMO_ESG_cloud');"
# takes about 25 seconds, estimated space: 0 MB
# %run -i ../run_procedure.py "call get_data('DEMO_ESG_local');"
# takes about 1 minute 20 seconds, estimated space: 70 MB

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'> <b> 3. Data Model </b> </p>
<p style = 'font-size:16px;font-family:Arial'>Before moving forward, let us look at the data model of Carbon Footprint Analytics.

<p style = 'font-size:16px;font-family:Arial'>The Data Model of Carbon Footprint Analytics</p>
<img id="DM" src="images/DM.png" alt="Data Model" width="1200" />

<p style = 'font-size:16px;font-family:Arial'>The standard reference tables can be pre-built, while the other information related to the enterprise is gathered from the enterprise's available data. The engine processes the input data and generates the tables needed for its calculations.</p>

In [None]:
#standard reference table pre-built in Carbon Footprint Analytics engine
df1 = DataFrame(in_schema("DEMO_ESG","GHG_Ref"))
df1

<p style = 'font-size:16px;font-family:Arial'>
Carbon Footprint Analytics calculates emissions of three types (called scopes) for each event or activity.

In [None]:
df2 = DataFrame(in_schema("DEMO_ESG","Scope_Ref"))
df2

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'> <b> 4. Geospatial Enhancer </b> </p>

<p style = 'font-size:16px;font-family:Arial'>Another important feature in Carbon Footprint Analytics is the Geospatial Enhancer. As we have seen, carbon footprint calculations differ across regions, so we need geospatial data to assign the right emission factor to each activity. We have collected millions of geolocated points from <b>geonames.org</b>, a website that gathers geospatial data from multiple sources.
We also integrated electricity grid shape files from the <b>US Environmental Protection Agency</b>, which are necessary to correctly estimate the carbon footprint of electricity consumption in the US.<br>Let us take a look at this table.

In [None]:
df3 = DataFrame.from_query("select * from DEMO_ESG.Geo_T_Ref sample 10;")
df3

<p style = 'font-size:16px;font-family:Arial'>As we can see from the sample data above, the Geospatial Enhancer provides not only the coordinates of each location but also additional information about what type of site it is. This information is available for both artificial and natural sites around the world.
Here, for example, you can see all the forests in the US.

In [None]:
#Forests all US
geo = pd.read_sql('''select geo_name,geo_latitude,geo_longitude, country_code_iso,geo_feature_code_name from DEMO_ESG.Geo_T_Ref
where country_code_iso = 'US' and geo_feature_code_name like '%forest%';''',eng)
fig1 = px.scatter_mapbox(geo, lat="geo_latitude", lon="geo_longitude", hover_name="geo_name", 
                         color="country_code_iso", zoom=3, height=100)
fig1.update_layout(mapbox_style="open-street-map")
fig1.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig1.show()

<p style = 'font-size:16px;font-family:Arial'>We use the Geospatial Enhancer to get the location of an event or activity. The list of all airports, for example, is very useful: with their geographic coordinates we can calculate the spherical distance between two airports.
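<p style = 'font-size:16px;font-family:Arial'>As an illustration of the spherical distance mentioned above, the sketch below uses the haversine formula in plain Python. This is a generic technique and not necessarily how the engine or Vantage computes distances. The Earth radius and the airport coordinates are approximate values assumed for this example.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate, illustrative coordinates (latitude, longitude) for two airports
san_diego = (32.73, -117.19)
honolulu = (21.32, -157.92)

distance = haversine_km(*san_diego, *honolulu)
print(f"{distance:.0f} km")
```

<p style = 'font-size:16px;font-family:Arial'>A flight distance obtained this way can then serve as the activity quantity that is multiplied by an emission factor for air travel.</p>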

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'> <b> 5. Effect View </b> </p>

<p style = 'font-size:16px;font-family:Arial'>We encapsulated all the complexity of carbon footprint calculations in the engine, but the purpose is to give the end user the simplest experience possible. Hence, we created a single object, the <b>Effect View</b>. Once Carbon Footprint Analytics has processed the input information, it generates the Effect View, which contains the carbon footprint details of all events in the company that generate greenhouse gas emissions. All the different types of reports and dashboards can be created from this view.

In [None]:
df4 = DataFrame(in_schema("DEMO_ESG","Effect"))
df4

<p style = 'font-size:16px;font-family:Arial'>This table contains the main characteristics of each greenhouse gas emitting event and its carbon footprint in kilograms of CO2 equivalents. It also contains all the details of the carbon footprint calculation for every single event: the geolocation, the protocol used to determine the right emission factor, the type of greenhouse gas emitted, and the original and converted units.<br> Let us build some reports from this table.

In [None]:
qry = ''' 
SELECT scope_number ||'_'||scope_category_description as scope_category
,count(*) as cnt
,sum(co2e_emission) as co2_emission 
from DEMO_ESG.Effect
group by 1;
'''

res = pd.read_sql(qry, eng)

res.head()
ax = res.plot.bar(x='scope_category', y='co2_emission')
plt.title("Carbon Footprint per activity")
plt.xlabel("scope_category")
plt.ylabel("co2_emission")


<p style = 'font-size:16px;font-family:Arial'> This chart displays the CO2 emissions for the different scope categories.
With the sample data that we currently have, Scope 2 electricity consumption is by far the most emitting activity in the company.

<p style = 'font-size:16px;font-family:Arial'>Let us now see the total carbon footprint generated over the years by geographic location.

In [None]:
qry = ''' 
select   extract (year from begin_valid_date ) as yr 
,event_geo_name
,event_geo_latitude
,event_geo_longitude
,event_country_code
,sum(co2e_emission ) as co2 from DEMO_ESG.Effect group by 1,2,3,4,5  order by yr;
'''

df = pd.read_sql(qry, eng)

fig2 = px.scatter_mapbox(df, lat="event_geo_latitude", lon="event_geo_longitude", hover_name="event_geo_name",size="co2",
                         color="co2", size_max=70, zoom=0.75, 
                         animation_frame="yr",color_continuous_scale=px.colors.sequential.Bluered, 
                         height = 600
                  )
fig2.update_layout(mapbox_style="open-street-map")
fig2.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig2.update_layout(title_text = 'Carbon Emissions generated over the years' ,title_y=1)
fig2.show()

<p style = 'font-size:16px;font-family:Arial'>From the map above we can see that San Diego is the biggest point of emission, as this is where our headquarters are located. In 2020, however, you can see some emissions from Hawaii, which at first glance look like a data quality issue but are not: they are due to a global event that took place in Hawaii in 2020.

<p style = 'font-size:16px;font-family:Arial'>How much carbon emission is generated by each region over the years?

In [None]:
qry = ''' 
select  extract (year from begin_valid_date ) as yr,
event_reporting_region,
sum(co2e_emission ) as co2 from DEMO_ESG.Effect group by 1,2 order by yr;

'''

df = pd.read_sql(qry, eng)
fig = px.bar(df, x="yr", y="co2", color="event_reporting_region", barmode="group")
fig.show()

<p style = 'font-size:16px;font-family:Arial'> How much emission is generated per activity?

In [None]:
qry = ''' 
select  substr(begin_valid_date,1,4)  as yr,
activity_type_desc as activity
,sum(co2e_emission ) as co2 from DEMO_ESG.Effect  group by 1,2 order by 1,2;
'''

df = pd.read_sql(qry, eng)
fig = px.bar(df, x="co2", y="yr", color="activity")
fig.show()

<p style = 'font-size:16px;font-family:Arial'> From the graph we can see that carbon emissions decreased significantly from 2020 to 2021 due to the pandemic.
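<p style = 'font-size:16px;font-family:Arial'>A change like this can also be quantified rather than read off the chart. The sketch below assumes a pandas DataFrame with the same columns as the query result above (<code>yr</code>, <code>activity</code>, <code>co2</code>). The values are made up for illustration and are not the demo data.</p>

```python
import pandas as pd

# Made-up values shaped like the query result (yr, activity, co2); not demo data.
df_demo = pd.DataFrame({
    "yr": ["2019", "2019", "2020", "2020", "2021", "2021"],
    "activity": ["Electricity", "Air travel"] * 3,
    "co2": [100.0, 50.0, 90.0, 30.0, 60.0, 20.0],
})

# Total emissions per year, then the percentage change versus the previous year.
yearly = df_demo.groupby("yr")["co2"].sum().sort_index()
yoy_pct = yearly.pct_change() * 100

summary = pd.DataFrame({"co2": yearly, "yoy_change_pct": yoy_pct.round(1)})
print(summary)
```

<p style = 'font-size:16px;font-family:Arial'>The first year has no predecessor, so its change is shown as missing (NaN).</p>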

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'> <b>6. CleanUp </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_ESG');" 
#Takes 10 seconds

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'> <b>7. Conclusion </b> </p>

<p style = 'font-size:16px;font-family:Arial'>In this demonstration we have seen that companies need to calculate their carbon footprint, and that this need will grow given the footprint's environmental impact. We have seen how the carbon footprint engine works and some of the ways it can be used. It can help companies better understand their greenhouse gas emissions and take the best possible decisions to decrease their carbon footprint. Based on the emission data generated, companies can also build AI/ML models that predict the carbon footprint of the company and plan activities to reduce it.<br><br>
<i>*Sample data collected from Jan 2018 to Jan 2022<br>
*Internal reference tables are calculated by the engine itself using the company's data and hence can vary from company to company</i><br>    
 

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2023 Teradata. All Rights Reserved</footer>