## Energy saved from recycling
<p>Did you know that recycling saves energy by reducing or eliminating the need to make materials from scratch? For example, manufacturers of aluminum cans can skip the energy-intensive process of producing aluminum from ore by cleaning and melting recycled cans instead. Aluminum is classified as a non-ferrous metal.</p>
<p>Singapore has an ambitious goal of becoming a zero-waste nation. The amount of waste disposed of in Singapore has increased seven-fold over the last 40 years. At this rate, Semakau Landfill, Singapore’s only landfill, will run out of space by 2035. Making matters worse, Singapore has limited land for building new incineration plants or landfills.</p>
<p>The government would like to motivate citizens by sharing the total energy that the combined recycling efforts have saved every year. They have asked you to help them.</p>
<p>You have been provided with three datasets. The data come from different teams, so the names of waste types may differ.</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:16px"><b>datasets/wastestats.csv - Recycling statistics per waste type for the period 2003 to 2017</b>
    </div>
    <div>Source: <a href="https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling">Singapore National Environment Agency</a></div>
<ul>
    <li><b>waste_type: </b>The type of waste recycled.</li>
    <li><b>waste_disposed_of_tonne: </b>The amount of waste that could not be recycled (in metric tonnes).</li>
    <li><b>total_waste_recycled_tonne: </b>The amount of waste that could be recycled (in metric tonnes).</li>
    <li><b>total_waste_generated_tonne: </b>The total amount of waste collected before recycling (in metric tonnes).</li>
    <li><b>recycling_rate: </b>The amount of waste recycled per tonne of waste generated.</li>
    <li><b>year: </b>The recycling year.</li>
</ul>
    </div>
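
The analysis below uses only four of these six columns. The following is a minimal sketch of loading just those four with the `usecols` parameter of `pandas.read_csv`, as an alternative to dropping columns after loading. It assumes the file's headers match the column names used in the notebook's code. The one-row CSV is a toy excerpt: its Glass 2015 tonnages come from this notebook, and the fields for the two unused columns are deliberately left empty.

```python
import io

import pandas as pd

# Toy one-row excerpt in the column layout described above; the two unused
# fields are left empty on purpose (they are not real values).
raw = io.StringIO(
    "waste_type,waste_disposed_of_tonne,total_waste_recycled_tonne,"
    "total_waste_generated_tonne,recycling_rate,year\n"
    "Glass,,14600,75200,,2015\n"
)

# read only the four columns the energy calculation uses
needed = ['waste_type', 'total_waste_recycled_tonne',
          'total_waste_generated_tonne', 'year']
subset = pd.read_csv(raw, usecols=needed)

# note: usecols keeps the file's column order, not the order of `needed`
print(subset.columns.tolist())
```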
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6; margin-top: 17px;">
    <div style="font-size:16px"><b>datasets/2018_2019_waste.csv - Recycling statistics per waste type for the period 2018 to 2019</b>
    </div>
    <div> Source: <a href="https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling">Singapore National Environment Agency</a></div>
<ul>
    <li><b>Waste Type: </b>The type of waste recycled.</li>
    <li><b>Total Generated ('000 tonnes): </b>The total amount of waste collected before recycling (in thousands of metric tonnes).</li>
    <li><b>Total Recycled ('000 tonnes): </b>The amount of waste that could be recycled (in thousands of metric tonnes).</li>
    <li><b>Year: </b>The recycling year.</li>
</ul>
    </div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6; margin-top: 17px;">
    <div style="font-size:16px"><b>datasets/energy_saved.csv - Estimates of the energy (kWh) and crude oil (barrels) saved per metric tonne of each waste type</b>
    </div>
<ul>
    <li><b>material: </b>The type of waste recycled.</li>
    <li><b>energy_saved: </b>An estimate of the energy saved (in kilowatt-hours, kWh) by recycling one metric tonne of waste.</li>
    <li><b>crude_oil_saved: </b>An estimate of the number of barrels of oil saved by recycling a metric tonne of waste.</li>
</ul>

</div>
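
Taken together, the three files describe one quantity. Let $R_{m,y}$ be the metric tonnes of material $m$ recycled in year $y$, and let $e_m$ be that material's `energy_saved` estimate in kWh per metric tonne. The yearly total the government wants to report, summed over the materials considered, is then:

$$
E_y = \sum_{m} R_{m,y}\, e_m \qquad \text{(kWh)}
$$

For 2018 and 2019, whose file reports tonnages in thousands of metric tonnes, $R_{m,y}$ is 1000 times the listed value.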

In [2]:
import pandas as pd

# EDA

# cleaning dfs

In [3]:
# reading file
df1 = pd.read_csv('datasets/wastestats.csv')

# dropping columns not needed for the energy calculation
df1 = df1.drop(columns=['waste_disposed_of_tonne', 'recycling_rate'])

# keeping only the years 2015-2017
df1 = df1[df1['year'].isin([2015, 2016, 2017])]

# listing all waste types, to see which spellings occur in the raw data
all_materials = list(df1['waste_type'].unique())

# mapping every spelling of the materials of interest to one canonical name
name_map = {
    'Ferrous Metal': 'Ferrous Metal',
    'Ferrous Metals': 'Ferrous Metal',
    'Ferrous metal': 'Ferrous Metal',
    'Glass': 'Glass',
    'Non-ferrous Metals': 'Non-Ferrous Metal',
    'Non-ferrous metal': 'Non-Ferrous Metal',
    'Non-ferrous metals': 'Non-Ferrous Metal',
    'Plastic': 'Plastic',
    'Plastics': 'Plastic',
}

# keeping only materials of interest (.copy() avoids a SettingWithCopyWarning below)
df1 = df1[df1['waste_type'].isin(list(name_map))].copy()

# harmonizing names
df1['waste_type'] = df1['waste_type'].replace(name_map)

# sorting and resetting the index
df1 = df1.sort_values(['waste_type', 'year']).reset_index(drop=True)

# printing result
df1

|   | waste_type | total_waste_recycled_tonne | total_waste_generated_tonne | year |
|---|---|---|---|---|
| 0 | Ferrous Metal | 1333300.0 | 1348500 | 2015 |
| 1 | Ferrous Metal | 1351500.0 | 1357500 | 2016 |
| 2 | Ferrous Metal | 1371000.0 | 1378800 | 2017 |
| 3 | Glass | 14600.0 | 75200 | 2015 |
| 4 | Glass | 14700.0 | 72300 | 2016 |
| 5 | Glass | 12400.0 | 71300 | 2017 |
| 6 | Non-Ferrous Metal | 160400.0 | 180000 | 2015 |
| 7 | Non-Ferrous Metal | 95900.0 | 97200 | 2016 |
| 8 | Non-Ferrous Metal | 92200.0 | 93700 | 2017 |
| 9 | Plastic | 57800.0 | 824600 | 2015 |
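
The cleaning step above still enumerates every spelling by hand. One hypothetical alternative is to normalize each name to a lookup key and then map keys to canonical names. In this sketch the key is the name in lower case with one trailing plural "s" removed. `harmonize_waste_type` and `CANONICAL` are illustrative names, not part of the original notebook. The normalization rule is an assumption: it covers the spellings listed in this notebook but is not guaranteed to cover every variant in the raw files.

```python
import pandas as pd

# Hypothetical lookup: normalized key -> canonical display name.
CANONICAL = {
    'ferrous metal': 'Ferrous Metal',
    'glass': 'Glass',
    'non-ferrous metal': 'Non-Ferrous Metal',
    'plastic': 'Plastic',
}


def harmonize_waste_type(names: pd.Series) -> pd.Series:
    """Map raw waste-type spellings to canonical names; other types become NaN."""
    key = (
        names.str.strip()
        .str.lower()
        # remove one trailing plural "s", but not the final "s" of "glass"
        .str.replace(r'(?<!s)s$', '', regex=True)
    )
    return key.map(CANONICAL)


raw = pd.Series(['Ferrous Metals', 'Ferrous metal', 'Glass',
                 'Non-ferrous metals', 'Plastics', 'Paper/Cardboard'])
print(harmonize_waste_type(raw).tolist())
# ['Ferrous Metal', 'Ferrous Metal', 'Glass', 'Non-Ferrous Metal', 'Plastic', nan]
```

Applied to `df1['waste_type']`, the rows mapped to NaN (waste types outside the four of interest) can then be removed with `dropna(subset=['waste_type'])`. The same function also handles the `'Plastics'` and `'Non-Ferrous Metal'` spellings in the 2018-2019 file.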


## df2

In [4]:
# reading file
df2 = pd.read_csv('datasets/2018_2019_waste.csv')

# renaming columns for consistency with df1
# NOTE: the two tonnage columns are reported in thousands of metric tonnes
# ('000 tonnes); they are renamed here, but their values are not rescaled
df2 = df2.rename(columns={
    "Waste Type": "waste_type",
    "Total Generated ('000 tonnes)": "total_waste_generated_tonne",
    "Total Recycled ('000 tonnes)": "total_waste_recycled_tonne",
    "Year": "year",
})

# listing all waste types, to see which spellings occur in the raw data
all_materials = list(df2['waste_type'].unique())

# mapping the spellings of the materials of interest to the names used in df1
name_map = {
    'Ferrous Metal': 'Ferrous Metal',
    'Glass': 'Glass',
    'Non-Ferrous Metal': 'Non-Ferrous Metal',
    'Plastics': 'Plastic',
}

# keeping only materials of interest (.copy() avoids a SettingWithCopyWarning below)
df2 = df2[df2['waste_type'].isin(list(name_map))].copy()

# harmonizing names
df2['waste_type'] = df2['waste_type'].replace(name_map)

# sorting and resetting the index
df2 = df2.sort_values(['waste_type', 'year']).reset_index(drop=True)

# printing result
df2

|   | waste_type | total_waste_generated_tonne | total_waste_recycled_tonne | year |
|---|---|---|---|---|
| 0 | Ferrous Metal | 1269 | 126 | 2018 |
| 1 | Ferrous Metal | 1278 | 1270 | 2019 |
| 2 | Glass | 64 | 12 | 2018 |
| 3 | Glass | 75 | 11 | 2019 |
| 4 | Non-Ferrous Metal | 171 | 170 | 2018 |
| 5 | Non-Ferrous Metal | 126 | 124 | 2019 |
| 6 | Plastic | 949 | 41 | 2018 |
| 7 | Plastic | 930 | 37 | 2019 |
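
The 2018-2019 tonnages are in thousands of metric tonnes, so they cannot be compared directly with df1's values. The recycling rate (recycled divided by generated) is unit-free, however. The following is a minimal sanity check on the Ferrous Metal rows printed in the df1 and df2 outputs. The 0.5 threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Ferrous Metal rows as printed above: 2015-2017 from df1 (metric tonnes),
# 2018-2019 from df2 (thousands of metric tonnes)
ferrous = pd.DataFrame({
    'year': [2015, 2016, 2017, 2018, 2019],
    'recycled': [1333300, 1351500, 1371000, 126, 1270],
    'generated': [1348500, 1357500, 1378800, 1269, 1278],
})

# a ratio of two tonnages in the same unit, so the unit cancels out
ferrous['recycling_rate'] = ferrous['recycled'] / ferrous['generated']

# flag years whose rate is far from the median rate (0.5 is an arbitrary threshold)
median_rate = ferrous['recycling_rate'].median()
ferrous['suspicious'] = (ferrous['recycling_rate'] - median_rate).abs() > 0.5

print(ferrous[['year', 'recycling_rate', 'suspicious']])
```

Only 2018 is flagged. In that year, 126 of 1269 thousand tonnes were recycled (a rate of about 0.10), against roughly 0.99 in every other year. This figure may be worth checking against the source before drawing conclusions from it.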


## concatenation

In [25]:
# concatenating df1 (2015-2017) and df2 (2018-2019)
frames = [df1, df2]
df = pd.concat(frames)

# sorting and resetting the index
df = df.sort_values(['waste_type', 'year']).reset_index(drop=True)

# printing final table
df

|   | waste_type | total_waste_recycled_tonne | total_waste_generated_tonne | year |
|---|---|---|---|---|
| 0 | Ferrous Metal | 1333300.0 | 1348500 | 2015 |
| 1 | Ferrous Metal | 1351500.0 | 1357500 | 2016 |
| 2 | Ferrous Metal | 1371000.0 | 1378800 | 2017 |
| 3 | Ferrous Metal | 126.0 | 1269 | 2018 |
| 4 | Ferrous Metal | 1270.0 | 1278 | 2019 |
| 5 | Glass | 14600.0 | 75200 | 2015 |
| 6 | Glass | 14700.0 | 72300 | 2016 |
| 7 | Glass | 12400.0 | 71300 | 2017 |
| 8 | Glass | 12.0 | 64 | 2018 |
| 9 | Glass | 11.0 | 75 | 2019 |
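
After concatenation, each (waste_type, year) pair should occur exactly once, and each material should have a row for every year from 2015 to 2019. The following is a minimal sketch of that check, run on a toy frame that holds only the key columns of the rows shown above. `materials_missing_years` is a hypothetical helper.

```python
import pandas as pd


def materials_missing_years(frame: pd.DataFrame) -> list:
    """Return the waste types that lack a row for at least one year in `frame`."""
    coverage = pd.crosstab(frame['waste_type'], frame['year'])
    has_gap = (coverage == 0).any(axis=1)
    return coverage.index[has_gap.to_numpy()].tolist()


# toy stand-in for the concatenated table: only the key columns are needed
keys = pd.DataFrame({
    'waste_type': ['Ferrous Metal'] * 5 + ['Glass'] * 5,
    'year': [2015, 2016, 2017, 2018, 2019] * 2,
})

# duplicated() marks every repeat of an earlier (waste_type, year) pair
n_duplicates = keys.duplicated(['waste_type', 'year']).sum()
print('duplicate keys:', n_duplicates)
print('complete table:', materials_missing_years(keys))

# dropping Glass 2019 (row 9) shows what a gap looks like
print('with a gap:', materials_missing_years(keys.drop(index=9)))
```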


## calculating energy saved

In [33]:
# reading file: header=3 skips the leading rows above the column names, and
# index_col=0 uses the first column (the measure names) as the index
df3 = pd.read_csv('datasets/energy_saved.csv', header=3, index_col=0)

# transposing so that each waste type becomes a row
df3 = df3.transpose()

# energy values are strings of the form "<number> <unit>": keep the number,
# convert it to an integer, and move the waste types from the index to a column
# (selecting 'energy_saved' also leaves out the unused 'crude_oil saved' column)
df3 = (
    df3['energy_saved']
    .str.split(expand=True)[0]
    .pipe(pd.to_numeric)
    .rename('energy_saved (Kwh)')
    .rename_axis('waste_type')
    .reset_index()
)

df3

|   | waste_type | energy_saved (Kwh) |
|---|---|---|
| 0 | Plastic | 5774 |
| 1 | Glass | 42 |
| 2 | Ferrous Metal | 642 |
| 3 | Non-Ferrous Metal | 14000 |
| 4 | Paper | 4000 |
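
The split above assumes that each raw value is a number, a single space, and a unit. A slightly more defensive sketch extracts the leading number with a regular expression, so stray spacing or a thousands separator would not break the conversion. The raw strings below, including the unit text and the comma in one of them, are hypothetical. Only the numbers come from the table above.

```python
import pandas as pd

# Hypothetical raw values; the unit text and the "14,000" formatting are assumptions.
raw = pd.Series(['5774 Kwh', '42 Kwh', '642 Kwh', '14,000 Kwh', ' 4000  Kwh'],
                index=['Plastic', 'Glass', 'Ferrous Metal', 'Non-Ferrous Metal', 'Paper'])

energy = (
    raw.str.extract(r'^\s*([\d,]+(?:\.\d+)?)', expand=False)  # leading number only
    .str.replace(',', '', regex=False)                         # drop thousands separators
    .pipe(pd.to_numeric)
    .rename('energy_saved (Kwh)')
    .rename_axis('waste_type')
    .reset_index()
)
print(energy)
```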


In [34]:
df3.dtypes

waste_type            object
energy_saved (Kwh)     int64
dtype: object

In [36]:
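# attaching the per-tonne energy estimate to every row; the default inner join
# keeps only waste types present in both tables, so 'Paper' is dropped
df_final = pd.merge(df, df3, on='waste_type')

df_final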

|   | waste_type | total_waste_recycled_tonne | total_waste_generated_tonne | year | energy_saved (Kwh) |
|---|---|---|---|---|---|
| 0 | Ferrous Metal | 1333300.0 | 1348500 | 2015 | 642 |
| 1 | Ferrous Metal | 1351500.0 | 1357500 | 2016 | 642 |
| 2 | Ferrous Metal | 1371000.0 | 1378800 | 2017 | 642 |
| 3 | Ferrous Metal | 126.0 | 1269 | 2018 | 642 |
| 4 | Ferrous Metal | 1270.0 | 1278 | 2019 | 642 |
| 5 | Glass | 14600.0 | 75200 | 2015 | 42 |
| 6 | Glass | 14700.0 | 72300 | 2016 | 42 |
| 7 | Glass | 12400.0 | 71300 | 2017 | 42 |
| 8 | Glass | 12.0 | 64 | 2018 | 42 |
| 9 | Glass | 11.0 | 75 | 2019 | 42 |
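
`pd.merge` defaults to an inner join, so a waste type that exists in only one table disappears without warning. That is how `Paper` drops out here, and the same would happen to any material whose spelling differed between the two tables. The following is a minimal sketch of making this visible with the `validate` and `indicator` options of `pd.merge`, using toy frames built from values shown above.

```python
import pandas as pd

# toy subset of the recycling table and the full energy table shown above
recycled = pd.DataFrame({
    'waste_type': ['Ferrous Metal', 'Glass'],
    'year': [2015, 2015],
    'total_waste_recycled_tonne': [1333300.0, 14600.0],
})
energy = pd.DataFrame({
    'waste_type': ['Plastic', 'Glass', 'Ferrous Metal', 'Non-Ferrous Metal', 'Paper'],
    'energy_saved (Kwh)': [5774, 42, 642, 14000, 4000],
})

# validate='many_to_one' raises MergeError if a waste type repeats in `energy`;
# how='outer' with indicator=True labels each row 'both', 'left_only' or 'right_only'
check = pd.merge(recycled, energy, on='waste_type', how='outer',
                 validate='many_to_one', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['waste_type', '_merge']]
print(unmatched)
```

In this toy case, only energy-table rows go unmatched. In the notebook, any `left_only` row would signal a recycled material with no energy estimate.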


In [37]:
df_final.dtypes

waste_type                      object
total_waste_recycled_tonne     float64
total_waste_generated_tonne      int64
year                             int64
energy_saved (Kwh)               int64
dtype: object

In [39]:
# energy saved (kWh) = metric tonnes recycled x estimated kWh saved per tonne
df_final['total_energy_saved'] = df_final['total_waste_recycled_tonne'] * df_final['energy_saved (Kwh)']

df_final

|   | waste_type | total_waste_recycled_tonne | total_waste_generated_tonne | year | energy_saved (Kwh) | total_energy_saved |
|---|---|---|---|---|---|---|
| 0 | Ferrous Metal | 1333300.0 | 1348500 | 2015 | 642 | 855978600.0 |
| 1 | Ferrous Metal | 1351500.0 | 1357500 | 2016 | 642 | 867663000.0 |
| 2 | Ferrous Metal | 1371000.0 | 1378800 | 2017 | 642 | 880182000.0 |
| 3 | Ferrous Metal | 126.0 | 1269 | 2018 | 642 | 80892.0 |
| 4 | Ferrous Metal | 1270.0 | 1278 | 2019 | 642 | 815340.0 |
| 5 | Glass | 14600.0 | 75200 | 2015 | 42 | 613200.0 |
| 6 | Glass | 14700.0 | 72300 | 2016 | 42 | 617400.0 |
| 7 | Glass | 12400.0 | 71300 | 2017 | 42 | 520800.0 |
| 8 | Glass | 12.0 | 64 | 2018 | 42 | 504.0 |
| 9 | Glass | 11.0 | 75 | 2019 | 42 | 462.0 |
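
Each `total_energy_saved` entry is the recycled tonnage multiplied by the per-tonne estimate, in kWh. Values this large are easier to communicate in gigawatt-hours (1 GWh = 1,000,000 kWh). Here is a worked example for the first row of the table:

```python
# Ferrous Metal, 2015: values from the table above
recycled_tonnes = 1_333_300      # total_waste_recycled_tonne
kwh_per_tonne = 642              # energy_saved (Kwh), per metric tonne

total_kwh = recycled_tonnes * kwh_per_tonne
total_gwh = total_kwh / 1e6      # 1 GWh = 1,000,000 kWh

print(f"{total_kwh:,} kWh = {total_gwh:,.1f} GWh")
# 855,978,600 kWh = 856.0 GWh
```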


In [41]:
# total energy saved (kWh) per year, summed over all four materials
df_final.groupby('year').agg({"total_energy_saved":"sum"})

| year | total_energy_saved |
|---|---|
| 2015 | 3435929000.0 |
| 2016 | 2554433000.0 |
| 2017 | 2470596000.0 |
| 2018 | 2698130.0 |
| 2019 | 2765440.0 |
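
The 2018 and 2019 totals are roughly a thousand times smaller than those for 2015-2017. According to the dataset description, the 2018-2019 file reports tonnages in thousands of metric tonnes, and the cleaning cells rename those columns to `..._tonne` without rescaling them. The following is a minimal sketch of the rescaling, using only the 2018-2019 rows and energy estimates displayed above. In the notebook itself, the multiplication would belong in the df2 cell, before concatenation.

```python
import pandas as pd

# df2 rows as displayed above (recycled tonnage in thousands of metric tonnes)
df2 = pd.DataFrame({
    'waste_type': ['Ferrous Metal', 'Ferrous Metal', 'Glass', 'Glass',
                   'Non-Ferrous Metal', 'Non-Ferrous Metal', 'Plastic', 'Plastic'],
    'year': [2018, 2019] * 4,
    'total_waste_recycled_tonne': [126, 1270, 12, 11, 170, 124, 41, 37],
})
# per-tonne energy estimates from df3 (kWh per metric tonne)
df3 = pd.DataFrame({
    'waste_type': ['Plastic', 'Glass', 'Ferrous Metal', 'Non-Ferrous Metal', 'Paper'],
    'energy_saved (Kwh)': [5774, 42, 642, 14000, 4000],
})

# convert '000 tonnes to metric tonnes so the units match df1
df2['total_waste_recycled_tonne'] = df2['total_waste_recycled_tonne'] * 1000

merged = pd.merge(df2, df3, on='waste_type', validate='many_to_one')
merged['total_energy_saved'] = (
    merged['total_waste_recycled_tonne'] * merged['energy_saved (Kwh)']
)
yearly = merged.groupby('year')['total_energy_saved'].sum()
print(yearly)
```

With the rescaling, 2018 comes out at 2,698,130,000 kWh and 2019 at 2,765,440,000 kWh. Both are the same order of magnitude as the 2015-2017 totals.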
