# Data Engineering

## Purpose of this notebook

This notebook cleans the source datasets and transforms them into a format suitable for loading into MongoDB Atlas.


## Environment Setup, Data Cleaning and Data Transformation

In [1]:
import pandas
import etl_processor

### Data Source

- Electricity Generation and Consumption
    - Description: Electricity generation and consumption across different sectors and households, from Jan 1975 to Mar 2024
    - Link: https://tablebuilder.singstat.gov.sg/table/TS/M890841 
    - Data Source: SingStat Table Builder (Department of Statistics)
- Electricity Generation by Monthly data
    - Description: Electricity generation per month, from Jan 1975 to Mar 2024
    - Link: https://tablebuilder.singstat.gov.sg/table/TS/M890831 
    - Data Source: SingStat Table Builder (Department of Statistics)
- Peak System Demand from 2005 to Jul 2021
    - Description: Monthly peak system demand in megawatts (MW), from 2005 to Jul 2021
    - Link: https://beta.data.gov.sg/datasets/d_926d3e304c0b41e56d4cbd3304acf105/view 
    - Data Source: data.gov.sg
- Solar PV Installations by URA Planning Region
    - Description: 
        - Solar PV installations in the different regions of Singapore, broken down by residential status
        - Columns cover the number of solar PV installations, the installed capacity in kWac and the share of total installed capacity (percentage)
    - Link: https://beta.data.gov.sg/datasets/d_cd4f91f7a1ebb2b7ceb1a70c0dbb706d/view
    - Data Source: data.gov.sg 
- Total Final Energy Consumption 2009 to 2019
    - Description: Energy consumption in kilotonnes of oil equivalent (ktoe) across energy types and sectors, from 2009 to 2019
    - Link: https://beta.data.gov.sg/datasets/d_500440fba49cfc69f395e6dd1df967de/view 
    - Data Source: data.gov.sg
- Total Final Energy Consumption by Energy Type and Sector
    - Description: Energy consumption in kilotonnes of oil equivalent (ktoe) across energy types and sectors
    - Link: https://tablebuilder.singstat.gov.sg/table/TS/M891111 
    - Data Source: SingStat Table Builder (Department of Statistics Singapore)
- Licensed Local Food Farm
    - Description: Number of licensed food farms in Singapore, by food category
    - Link: https://tablebuilder.singstat.gov.sg/table/TS/M891471 
    - Data Source: SingStat Table Builder (Department of Statistics Singapore)
- Local Production Annual
    - Description: Annual value of local food production, by food type
    - Link: https://tablebuilder.singstat.gov.sg/table/TS/M890721 
    - Data Source: SingStat Table Builder (Department of Statistics Singapore)
- Value of Local Food Production in Singapore
    - Description: Value of local food production in Singapore, in million dollars per year, by food type
    - Link: https://beta.data.gov.sg/datasets/d_21ae7d83dd7ee33c8c932ba2564fd8aa/view
    - Data Source: data.gov.sg



### Data Preparation

#### Total Final Energy Consumption by Energy Type and Sector

In [2]:
total_final_energy_consumption_by_energy_type_and_sector_for_total_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_for_total()
total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat()
total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity()
total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas()
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products()
total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil()


#### Electricity Generation by Monthly data

In [3]:
electricity_generation_by_month_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_electricity_generation_monthly_data()

#### Electricity Generation and Consumption

In [4]:
electricity_generation_and_consumption_by_month_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_electricity_generation_and_consumption()

#### Total Final Energy Consumption by Energy Type and Sector from 2009 to 2019

In [5]:
total_final_energy_consumption_by_energy_type_and_sector_csv: pandas.DataFrame = etl_processor.csv_file_data_source.retrieve_data_for_total_final_energy_consumption_2009_to_2019()

#### Peak System Demand from 2005 to Jul 2021

In [6]:
peak_system_demand_dataframe: pandas.DataFrame = etl_processor.csv_file_data_source.retrieve_data_for_peak_system_demand_2005_to_jul_2021()

#### Solar PV Installations by URA Planning Region

In [7]:
solar_pv_installations_by_ura_planning_region_dataframe: pandas.DataFrame = etl_processor.csv_file_data_source.retrieve_data_for_solar_pv_installations_by_ura_planning_region()

#### Local Production Annual

In [8]:
local_food_production_annual_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_local_production_annual()


#### Licensed Local Food Farm

In [9]:
licensed_local_food_farm_dataframe: pandas.DataFrame = etl_processor.excel_file_data_source.retrieve_data_for_licensed_local_food_farm()

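### Checking for Null or NA Values

Note: `isna().count()` returns the number of rows in each column whether or not any values are missing, so the outputs below are row counts; `isna().sum()` would give the number of missing values. `isnull()` is an alias of `isna()`, so each pair of cells returns the same result.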

In [16]:
licensed_local_food_farm_dataframe.isna().count()

Data Series
Number Of Licensed Local Food Farms    5
Sea-Based Seafood                      5
Land-Based Seafood                     5
Vegetables                             5
Hen Shell Eggs                         5
Others                                 5
dtype: int64
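
The difference between counting rows and counting missing values can be sketched on a small frame. This is a minimal illustration with made-up values, not data from the notebook's datasets. It also checks for the literal string `'na'`, which the profiling output for the local food production data reports as the most frequent value and which `isna()` does not treat as missing.

```python
import pandas as pd

# Illustrative frame only; the values are made up, not taken from the datasets.
sample = pd.DataFrame(
    {
        "Vegetables": [10.0, None, 12.5],
        "Seafood": ["na", "4.1", "na"],
    }
)

# count() after isna() counts the boolean cells, i.e. the number of rows.
row_counts = sample.isna().count()

# sum() after isna() counts the True cells, i.e. the actual missing values.
missing_counts = sample.isna().sum()

# Placeholder strings such as 'na' are not treated as missing by isna(),
# so they need a separate comparison.
placeholder_counts = (sample == "na").sum()

print(row_counts.to_dict())
print(missing_counts.to_dict())
print(placeholder_counts.to_dict())
```

Running both `isna().sum()` and the placeholder comparison before the transformation step gives a clearer picture of how many cells will later be replaced with 0.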

In [17]:
licensed_local_food_farm_dataframe.isnull().count()

Data Series
Number Of Licensed Local Food Farms    5
Sea-Based Seafood                      5
Land-Based Seafood                     5
Vegetables                             5
Hen Shell Eggs                         5
Others                                 5
dtype: int64

In [18]:
local_food_production_annual_dataframe.isna().count()

Data Series
index                                                                               48
Total Value Of Local Production (Million Dollars)                                   48
Seafood (Million Dollars)                                                           48
Vegetables (Million Dollars)                                                        48
Hen Shell Eggs (Million Dollars)                                                    48
Local Production Of Seafood (Tonnes)                                                48
Local Production Of Vegetables (Tonnes)                                             48
Local Production Of Hen Shell Eggs (Million Pieces)                                 48
Local Production Of Aquarium Fish (Million Pieces)                                  48
Local Production Of Aquatic Plants And Tissue Culture Plantlets (Million Plants)    48
Local Production Of Orchids (Million Stalks)                                        48
Local Production Of Ornamental 

In [19]:
local_food_production_annual_dataframe.isnull().count()

Data Series
index                                                                               48
Total Value Of Local Production (Million Dollars)                                   48
Seafood (Million Dollars)                                                           48
Vegetables (Million Dollars)                                                        48
Hen Shell Eggs (Million Dollars)                                                    48
Local Production Of Seafood (Tonnes)                                                48
Local Production Of Vegetables (Tonnes)                                             48
Local Production Of Hen Shell Eggs (Million Pieces)                                 48
Local Production Of Aquarium Fish (Million Pieces)                                  48
Local Production Of Aquatic Plants And Tissue Culture Plantlets (Million Plants)    48
Local Production Of Orchids (Million Stalks)                                        48
Local Production Of Ornamental 

In [20]:
solar_pv_installations_by_ura_planning_region_dataframe.isna().count()

year                      140
ura_planning_region       140
residential_status        140
num_solar_pv_inst         140
inst_cap_kwac             140
total_inst_cap_percent    140
dtype: int64

In [21]:
solar_pv_installations_by_ura_planning_region_dataframe.isnull().count()

year                      140
ura_planning_region       140
residential_status        140
num_solar_pv_inst         140
inst_cap_kwac             140
total_inst_cap_percent    140
dtype: int64

In [22]:
peak_system_demand_dataframe.isna().count()

year                     199
mth                      199
peak_system_demand_mw    199
dtype: int64

In [23]:
peak_system_demand_dataframe.isnull().count()

year                     199
mth                      199
peak_system_demand_mw    199
dtype: int64

In [25]:
total_final_energy_consumption_by_energy_type_and_sector_csv.isna().count()

year                330
sector              330
energy_products     330
consumption_ktoe    330
dtype: int64

In [26]:
total_final_energy_consumption_by_energy_type_and_sector_csv.isnull().count()

year                330
sector              330
energy_products     330
consumption_ktoe    330
dtype: int64

In [27]:
electricity_generation_and_consumption_by_month_dataframe.isna().count()

Data Series    18
2023           18
2022           18
2021           18
2020           18
2019           18
2018           18
2017           18
2016           18
2015           18
2014           18
2013           18
2012           18
2011           18
2010           18
2009           18
2008           18
2007           18
2006           18
2005           18
2004           18
2003           18
2002           18
2001           18
2000           18
1999           18
1998           18
1997           18
1996           18
1995           18
1994           18
1993           18
1992           18
1991           18
1990           18
1989           18
1988           18
1987           18
1986           18
1985           18
1984           18
1983           18
1982           18
1981           18
1980           18
1979           18
1978           18
1977           18
1976           18
1975           18
dtype: int64

In [28]:
electricity_generation_and_consumption_by_month_dataframe.isnull().count()

Data Series    18
2023           18
2022           18
2021           18
2020           18
2019           18
2018           18
2017           18
2016           18
2015           18
2014           18
2013           18
2012           18
2011           18
2010           18
2009           18
2008           18
2007           18
2006           18
2005           18
2004           18
2003           18
2002           18
2001           18
2000           18
1999           18
1998           18
1997           18
1996           18
1995           18
1994           18
1993           18
1992           18
1991           18
1990           18
1989           18
1988           18
1987           18
1986           18
1985           18
1984           18
1983           18
1982           18
1981           18
1980           18
1979           18
1978           18
1977           18
1976           18
1975           18
dtype: int64

In [29]:
electricity_generation_by_month_dataframe.isna().count()

Data Series    1
2024 Mar       1
2024 Feb       1
2024 Jan       1
2023 Dec       1
              ..
1975 May       1
1975 Apr       1
1975 Mar       1
1975 Feb       1
1975 Jan       1
Length: 592, dtype: int64

In [30]:
electricity_generation_by_month_dataframe.isnull().count()

Data Series    1
2024 Mar       1
2024 Feb       1
2024 Jan       1
2023 Dec       1
              ..
1975 May       1
1975 Apr       1
1975 Mar       1
1975 Feb       1
1975 Jan       1
Length: 592, dtype: int64

In [31]:
total_final_energy_consumption_by_energy_type_and_sector_for_total_dataframe.isna().count()



Data Series    7
2021           7
2020           7
2019           7
2018           7
2017           7
2016           7
2015           7
2014           7
2013           7
2012           7
2011           7
2010           7
2009           7
dtype: int64

In [32]:
total_final_energy_consumption_by_energy_type_and_sector_for_total_dataframe.isnull().count()

Data Series    7
2021           7
2020           7
2019           7
2018           7
2017           7
2016           7
2015           7
2014           7
2013           7
2012           7
2011           7
2010           7
2009           7
dtype: int64

In [33]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe.isnull().count()


Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [34]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe.isna().count()

Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [35]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe.isna().count()



Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [36]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe.isnull().count()

Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [38]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe.isna().count()


Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [37]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe.isnull().count()

Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [39]:
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe.isna().count()


Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [40]:
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe.isnull().count()

Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [41]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe.isnull().count()


Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

In [42]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe.isna().count()

Data Series    6
2021           6
2020           6
2019           6
2018           6
2017           6
2016           6
2015           6
2014           6
2013           6
2012           6
2011           6
2010           6
2009           6
dtype: int64

### Data Transformation

Transformation steps for the Local Food Production Annual dataset:
- Replace every 'na' field with the integer 0
- Set the 'Data Series' column as the index of the dataframe
- Strip surrounding whitespace from the column names after the transpose
- Reset the index of the dataframe
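
The steps above can be sketched in plain pandas. This is a minimal sketch under assumptions: `transform_wide_table` is a hypothetical stand-in, not the actual `etl_processor` implementation, the sample values are made up, and the transpose is included because the third step implies it.

```python
import pandas as pd


def transform_wide_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the etl_processor transformer."""
    # Replace the 'na' placeholder with the integer 0.
    cleaned = raw.replace("na", 0)
    # Use the series names as the index, then flip so each year becomes a row.
    cleaned = cleaned.set_index("Data Series").transpose()
    # Strip padding from the column names produced by the transpose.
    cleaned.columns = cleaned.columns.str.strip()
    # Move the year labels out of the index into a regular column.
    return cleaned.reset_index()


# Made-up values shaped like the wide Excel export (one column per year).
raw = pd.DataFrame(
    {
        "Data Series": [" Seafood (Million Dollars) ", "Vegetables (Million Dollars)"],
        "2023": ["na", 5.0],
        "2022": [3.0, "na"],
    }
)

result = transform_wide_table(raw)
print(result)
```

After `reset_index()` the year labels land in a column named `index`, which matches the first column name in the null-check output for this dataframe.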

In [10]:
local_food_production_annual_dataframe: pandas.DataFrame = etl_processor.excel_file_data_transformer.transform_local_food_production_annual_data(local_food_production_annual_dataframe)

Transformation steps for the Licensed Local Food Farm dataset:
- Strip surrounding whitespace from the dataframe column names
- Set the 'Data Series' column as the index of the dataframe
- Transpose the dataframe
- Strip surrounding whitespace from the column names again after the transpose


In [11]:
licensed_local_food_farm_dataframe: pandas.DataFrame = etl_processor.excel_file_data_transformer.transform_licensed_local_food_farm_data(licensed_local_food_farm_dataframe)

Transformation steps for the Electricity Generation and Consumption monthly dataset:
- Strip surrounding whitespace from the dataframe column names
- Set 'Data Series' as the index of the dataframe
- Transpose the dataframe
- Strip surrounding whitespace from the column names again, since the transpose produces new columns
- Replace every 'na' field in the dataframe with the integer 0




In [12]:
electricity_generation_and_consumption_by_month_dataframe = etl_processor.excel_file_data_transformer.transform_electricity_generation_and_consumption_monthly_data(electricity_generation_and_consumption_by_month_dataframe)

Transformation steps for the Electricity Generation By Month dataset:
- Rename the 'Data Series' column to 'Month'
- Set 'Month' as the index of the dataframe
- Transpose the dataframe
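
A minimal sketch of these three steps, assuming a single-row table with one column per month as in the null-check output for this dataframe. The values are made up, and the final `strip()` is an optional extra that is not part of the listed steps.

```python
import pandas as pd

# Made-up single-row table shaped like the monthly export (values are illustrative).
raw = pd.DataFrame(
    {
        "Data Series": ["Electricity Generation"],
        "2024 Mar ": [4800.0],
        "2024 Feb ": [4300.0],
    }
)

monthly = (
    raw.rename(columns={"Data Series": "Month"})  # step 1: rename the label column
    .set_index("Month")                           # step 2: use it as the index
    .transpose()                                  # step 3: one row per month
)

# Optional extra, not part of the listed steps: trim padded labels such as "2024 Mar ".
monthly.index = monthly.index.str.strip()
print(monthly)
```

Trimming the labels may be worth considering because the month keys reported in the load logs, such as "2024 Mar ", carry a trailing space.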


In [13]:
electricity_generation_by_month_dataframe = etl_processor.excel_file_data_transformer.transform_electricity_generation_by_month(electricity_generation_by_month_dataframe)

Transformation steps for the Total Energy Consumption by Energy Type and Sector dataset:
- Strip surrounding whitespace from the dataframe column names
- Keep only three columns ('Data Series', '2021', '2020')
- Replace every '-' field with the integer 0
- Exclude the first row of the dataframe
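
These steps can be sketched as follows. This is a minimal illustration with made-up values; it assumes the year columns are labelled with strings, and it is not the actual `etl_processor` code.

```python
import pandas as pd

# Made-up rows shaped like one of the Excel sheets; the first row is the total.
raw = pd.DataFrame(
    {
        " Data Series ": ["Petroleum Products", "Industry-Related", "Others"],
        " 2021 ": [100.0, 80.0, "-"],
        "2020": [90.0, 70.0, "-"],
        "2019": [95.0, 75.0, "-"],
    }
)

# Strip padding from the column names.
raw.columns = raw.columns.str.strip()

# Keep the label column and the two years that the CSV source does not cover,
# then replace cells that are exactly '-' with 0. Values that merely contain
# a hyphen, such as "Industry-Related", are left untouched.
subset = raw[["Data Series", "2021", "2020"]].replace("-", 0)

# Drop the first row (the total for the energy type).
subset = subset.iloc[1:]
print(subset)
```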


Purpose of this transformation for the Total Energy Consumption by Energy Type and Sector dataset:
- To merge the data from the Excel file into the dataframe that holds the data from the CSV file
- To make all data follow the format of the CSV data source

These steps prepare the two dataframes (Excel source and CSV source) for merging.

In [14]:
total_final_energy_consumption_by_energy_type_and_sector_for_total_dataframe = etl_processor.excel_file_data_transformer.transform_total_energy_consumption_by_energy_type_and_sector_for_all_products(total_final_energy_consumption_by_energy_type_and_sector_for_total_dataframe)

In [15]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe = etl_processor.excel_file_data_transformer.transform_total_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat(total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe)

In [16]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe = etl_processor.excel_file_data_transformer.transform_total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity(total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe)

In [17]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe = etl_processor.excel_file_data_transformer.transform_total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas(total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe)

In [18]:
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe = etl_processor.excel_file_data_transformer.transform_total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products(total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe)

In [19]:
total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe = etl_processor.excel_file_data_transformer.transform_total_final_energy_consumption_by_energy_type_and_sector_for_crude_oil(total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe)

Merge all Excel-sourced dataframes for Total Energy Consumption by Energy Type and Sector into the CSV-sourced dataframe.




In [20]:

csv_dataframe_columns = ['year', 'sector', 'energy_products', 'consumption_ktoe']

# Energy type labels, in the same order as the dataframes in dataframe_list
dataframe_information_list = ["Crude Oil", "Petroleum Products", "Coal And Peat", "Electricity", "Natural Gas"]
dataframe_list = [
    total_final_energy_consumption_by_energy_type_and_sector_total_for_crude_oil_dataframe,
    total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe,
    total_final_energy_consumption_by_energy_type_and_sector_total_for_coal_and_peat_dataframe,
    total_final_energy_consumption_by_energy_type_and_sector_total_for_electricity_dataframe,
    total_final_energy_consumption_by_energy_type_and_sector_total_for_natural_gas_dataframe,
]

total_final_energy_consumption_by_energy_type_and_sector_csv = etl_processor.csv_file_data_transformer.merge_all_data_of_total_final_energy_consumption_by_energy_type(
    dataframe_list=dataframe_list,
    resulting_dataframe_column=csv_dataframe_columns,
    dataframe_information=dataframe_information_list,
    resulting_dataframe=total_final_energy_consumption_by_energy_type_and_sector_csv,
)
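
One way such a merge could work is to reshape each wide Excel-sourced dataframe into the long CSV layout and append it. The sketch below is an assumption about the approach, not the actual body of `merge_all_data_of_total_final_energy_consumption_by_energy_type`; `wide_to_long` is a hypothetical helper and the values are made up.

```python
import pandas as pd

CSV_COLUMNS = ["year", "sector", "energy_products", "consumption_ktoe"]


def wide_to_long(wide: pd.DataFrame, energy_product: str) -> pd.DataFrame:
    """Hypothetical helper: reshape one wide sheet into the CSV column layout."""
    # One row per (sector, year) pair instead of one column per year.
    long = wide.melt(id_vars="Data Series", var_name="year", value_name="consumption_ktoe")
    long = long.rename(columns={"Data Series": "sector"})
    long["energy_products"] = energy_product
    long["year"] = long["year"].astype(int)
    return long[CSV_COLUMNS]


# Made-up rows: one existing CSV record and one wide Excel-style record.
existing = pd.DataFrame(
    {
        "year": [2019],
        "sector": ["Households"],
        "energy_products": ["Electricity"],
        "consumption_ktoe": [600.0],
    }
)
electricity = pd.DataFrame({"Data Series": ["Households"], "2021": [650.0], "2020": [640.0]})

merged = pd.concat([existing, wide_to_long(electricity, "Electricity")], ignore_index=True)
print(merged)
```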


### Data Loading to MongoDB Atlas

#### Electricity Generation and Consumption By Month

In [21]:
etl_processor.load.ElectricityGenerationAndConsumptionByMonth(electricity_generation_and_consumption_by_month_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_and_consumption_by_month index: year_1 dup key: { year: 2023 }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_and_consumption_by_month index: year_1 dup key: { year: 2023 }', 'keyPattern': {'year': 1}, 'keyValue': {'year': 2023}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_and_consumption_by_month index: year_1 dup key: { year: 2022 }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_and_consumption_by_month index: year_1 dup key: { year: 2022 }', 'keyPattern': {'year': 1}, 'keyValue': {'year': 2022}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Gre

<etl_processor.load.ElectricityGenerationAndConsumptionByMonth at 0x7f13fcaf9ee0>

#### Electricity Generation By Month

In [22]:
etl_processor.load.ElectricityGenerationByMonth(electricity_generation_by_month_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_monthly_data index: month_1 dup key: { month: "2024 Mar " }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_monthly_data index: month_1 dup key: { month: "2024 Mar " }', 'keyPattern': {'month': 1}, 'keyValue': {'month': '2024 Mar '}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_monthly_data index: month_1 dup key: { month: "2024 Feb " }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.electricity_generation_monthly_data index: month_1 dup key: { month: "2024 Feb " }', 'keyPattern': {'month': 1}, 'keyValue': {'month': '2024 Feb '}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: 

<etl_processor.load.ElectricityGenerationByMonth at 0x7f13fc6a1850>

#### Licensed Local Food Farm

In [23]:
etl_processor.load.LicensedLocalFoodFarm(licensed_local_food_farm_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.licensed_local_food_farm index: year_1 dup key: { year: "2023" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.licensed_local_food_farm index: year_1 dup key: { year: "2023" }', 'keyPattern': {'year': 1}, 'keyValue': {'year': '2023'}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.licensed_local_food_farm index: year_1 dup key: { year: "2022" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.licensed_local_food_farm index: year_1 dup key: { year: "2022" }', 'keyPattern': {'year': 1}, 'keyValue': {'year': '2022'}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.licensed_local_food_farm index: year_1 dup key: { year:

<etl_processor.load.LicensedLocalFoodFarm at 0x7f13fc95c2c0>

#### Local Food Production Annual data

In [24]:
etl_processor.load.LocalFoodProduction(local_food_production_annual_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.local_food_production index: year_1 dup key: { year: "2023 " }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.local_food_production index: year_1 dup key: { year: "2023 " }', 'keyPattern': {'year': 1}, 'keyValue': {'year': '2023 '}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.local_food_production index: year_1 dup key: { year: "2022 " }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.local_food_production index: year_1 dup key: { year: "2022 " }', 'keyPattern': {'year': 1}, 'keyValue': {'year': '2022 '}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.local_food_production index: year_1 dup key: { year: "2021 " 

<etl_processor.load.LocalFoodProduction at 0x7f13fc669e20>

#### Solar PV Installation by URA Planning Region

In [25]:
etl_processor.load.SolarPVInstallationsByURAPlanningRegion(solar_pv_installations_by_ura_planning_region_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.solar_p_v_installations_by_u_r_a_planning_region index: year_1 dup key: { year: 2008 }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.solar_p_v_installations_by_u_r_a_planning_region index: year_1 dup key: { year: 2008 }', 'keyPattern': {'year': 1}, 'keyValue': {'year': 2008}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.solar_p_v_installations_by_u_r_a_planning_region index: year_1 dup key: { year: 2008 }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.solar_p_v_installations_by_u_r_a_planning_region index: year_1 dup key: { year: 2008 }', 'keyPattern': {'year': 1}, 'keyValue': {'year': 2008}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG

<etl_processor.load.SolarPVInstallationsByURAPlanningRegion at 0x7f13fcdfa0c0>
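
The repeated `dup key: { year: 2008 }` messages above are consistent with several rows sharing the same `year` value, although re-running the load against an already populated collection would also produce duplicate key errors. Before loading, a candidate key can be checked for uniqueness in pandas. This is a minimal sketch using the first four rows of this dataframe as shown in the Data Profiling section; the composite key is an assumption for illustration, not the collection's actual index.

```python
import pandas as pd

# First four rows of the solar PV dataframe, key columns only.
rows = pd.DataFrame(
    {
        "year": [2008, 2008, 2008, 2008],
        "ura_planning_region": ["Central", "Central", "East", "East"],
        "residential_status": [
            "Non-Residential",
            "Residential",
            "Non-Residential",
            "Residential",
        ],
    }
)

# Rows that repeat an earlier value of the candidate key.
dup_on_year = int(rows.duplicated(subset=["year"]).sum())

# Assumed composite key: year + region + residential status.
composite_key = ["year", "ura_planning_region", "residential_status"]
dup_on_composite = int(rows.duplicated(subset=composite_key).sum())

print(dup_on_year, dup_on_composite)
```

If a key reports duplicates here, a unique index on that key alone would reject every row after the first for each value.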

#### Peak System Demand

In [26]:
etl_processor.load.PeakSystemDemand(peak_system_demand_dataframe)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.peak_system_demand index: year_month_1 dup key: { year_month: "2005-1" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.peak_system_demand index: year_month_1 dup key: { year_month: "2005-1" }', 'keyPattern': {'year_month': 1}, 'keyValue': {'year_month': '2005-1'}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.peak_system_demand index: year_month_1 dup key: { year_month: "2005-2" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.peak_system_demand index: year_month_1 dup key: { year_month: "2005-2" }', 'keyPattern': {'year_month': 1}, 'keyValue': {'year_month': '2005-2'}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysi

<etl_processor.load.PeakSystemDemand at 0x7f13fc6a2f00>

#### Total Final Energy Consumption by Energy Type and Sector

In [30]:
etl_processor.load.TotalFinalEnergyConsumptionByEnergyTypeAndSector(total_final_energy_consumption_by_energy_type_and_sector_csv)

Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.total_final_energy_consumption_by_energy_type_and_sector index: year_index_1 dup key: { year_index: "0-2009" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.total_final_energy_consumption_by_energy_type_and_sector index: year_index_1 dup key: { year_index: "0-2009" }', 'keyPattern': {'year_index': 1}, 'keyValue': {'year_index': '0-2009'}})
Tried to save duplicate unique keys (E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.total_final_energy_consumption_by_energy_type_and_sector index: year_index_1 dup key: { year_index: "1-2009" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: SG-Green-Plan-Data-Analysis-DB.total_final_energy_consumption_by_energy_type_and_sector index: year_index_1 dup key: { year_index: "1-2009" }', 'keyPattern': {'year_i

<etl_processor.load.TotalFinalEnergyConsumptionByEnergyTypeAndSector at 0x7f13fc6a01d0>

### Data Profiling

In [None]:
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe.describe()

Unnamed: 0,Data Series,2021,2020,2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009
count,6,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6,6.0,6,6.0,6.0
unique,6,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6,6.0,6,6.0,6.0
top,Petroleum Products,9009.4,8713.3,10125.1,9073.5,9149.5,9351.2,9993.4,8968.3,8475,7946.8,7614,7790.2,6474.8
freq,1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,1.0,1,1.0,1.0
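
The summary above reports `count`, `unique`, `top` and `freq` rather than numeric statistics, which is what `describe()` returns for object-typed columns. A minimal sketch of converting such columns before profiling follows, using a few values from this dataframe's 2021 and 2020 columns; treating '-' as missing is an assumption made here for illustration.

```python
import pandas as pd

# A few 2021/2020 values from the petroleum products sheet; '-' marks no data.
frame = pd.DataFrame(
    {
        "2021": [6423.7, 75.2, "-"],
        "2020": [6406.7, 69.3, "-"],
    }
)

# Mixed numbers and strings give object columns, so describe() is categorical.
object_summary = frame.describe()

# Coerce to numbers; anything unparseable such as '-' becomes NaN.
numeric = frame.apply(pd.to_numeric, errors="coerce")
numeric_summary = numeric.describe()

print(object_summary.index.tolist())
print(numeric_summary.index.tolist())
```

With numeric columns, `describe()` returns the mean, standard deviation and quartiles, as it already does for the solar PV dataframe.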


In [None]:
total_final_energy_consumption_by_energy_type_and_sector_for_petroleum_products_dataframe.set_index("Data Series").transpose()

Data Series,Petroleum Products,Industry-Related,Commerce And Services-Related,Transport-Related,Households,Others
2021,9009.4,6423.7,75.2,2485.5,25.1,-
2020,8713.3,6406.7,69.3,2214.3,23.1,-
2019,10125.1,7611.1,70.4,2420.2,23.5,-
2018,9073.5,6519.2,74.4,2455.1,24.8,-
2017,9149.5,6562.8,75.2,2486.5,25.1,-
2016,9351.2,6567.2,77.3,2680.9,25.8,-
2015,9993.4,7265.1,80.0,2621.5,26.7,-
2014,8968.3,6665.5,78.9,2199.8,24.1,-
2013,8475.0,6225.3,87.3,2133.3,29.1,-
2012,7946.8,5562.0,68.3,2292.7,23.7,-


In [None]:
local_food_production_annual_dataframe.describe()

Data Series,Total Value Of Local Production (Million Dollars),Seafood (Million Dollars),Vegetables (Million Dollars),Hen Shell Eggs (Million Dollars),Local Production Of Seafood (Tonnes),Local Production Of Vegetables (Tonnes),Local Production Of Hen Shell Eggs (Million Pieces),Local Production Of Aquarium Fish (Million Pieces),Local Production Of Aquatic Plants And Tissue Culture Plantlets (Million Plants),Local Production Of Orchids (Million Stalks),Local Production Of Ornamental Plants (Million Plants)
count,48,48,48,48,48,48,48,48,48,48,48
unique,6,6,5,6,10,48,10,7,4,18,20
top,na,na,na,na,na,16915,na,na,na,na,na
freq,43,43,43,43,39,1,39,41,45,22,23


In [None]:
solar_pv_installations_by_ura_planning_region_dataframe

Unnamed: 0,year,ura_planning_region,residential_status,num_solar_pv_inst,inst_cap_kwac,total_inst_cap_percent
0,2008,Central,Non-Residential,4,73.2,30.0
1,2008,Central,Residential,4,19.9,10.0
2,2008,East,Non-Residential,1,2.3,0.0
3,2008,East,Residential,1,6.6,0.0
4,2008,North-East,Non-Residential,10,65.3,20.0
...,...,...,...,...,...,...
135,2021,North-East,Residential,616,2990.3,0.9
136,2021,North,Non-Residential,720,52915.5,15.5
137,2021,North,Residential,57,524.7,0.2
138,2021,West,Non-Residential,1015,154485.1,45.2


In [None]:
solar_pv_installations_by_ura_planning_region_dataframe.describe()

Unnamed: 0,year,num_solar_pv_inst,inst_cap_kwac,total_inst_cap_percent
count,140.0,140.0,140.0,140.0
mean,2014.5,158.292857,10124.407857,10.002857
std,4.045603,213.314153,24071.582149,12.083701
min,2008.0,0.0,0.0,0.0
25%,2011.0,13.5,136.775,0.5
50%,2014.5,50.0,953.35,4.25
75%,2018.0,254.0,6329.925,15.5
max,2021.0,1015.0,154485.1,47.3
