In [None]:
import pandas as pd

In [None]:
# Load the datasets (CSV files)
manufacturing_df = pd.read_csv("/content/dataset_csv/Manufacturing-Operations-mar-25.csv")
logistics_df = pd.read_csv("/content/dataset_csv/Supply-Chain-Logistics-mar-25.csv")
offices_df = pd.read_csv("/content/dataset_csv/Offices-Facilities-mar-25.csv")
employee_df = pd.read_csv("/content/dataset_csv/Employee-Commute-Travel-mar-25.csv")
product_df = pd.read_csv("/content/dataset_csv/Product-Usage-mar-25.csv")
rnd_df = pd.read_csv("/content/dataset_csv/RnD-Activities-mar-25.csv")
iot_df = pd.read_csv("/content/dataset_csv/IoT-Integrations-mar-25.csv")
electricity_bills_df = pd.read_csv("/content/dataset_csv/Energy-Bills-mar-25.csv")
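All eight `read_csv` calls follow one pattern, so they could also be driven from a single mapping. The sketch below assumes the same `/content/dataset_csv` layout. `SOURCE_FILES` and `load_sources` are hypothetical names that are not part of the original notebook. Checking for missing files up front gives one readable error instead of a failure partway through the cell.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from a short source key to its March 2025 CSV file
SOURCE_FILES = {
    "manufacturing": "Manufacturing-Operations-mar-25.csv",
    "logistics": "Supply-Chain-Logistics-mar-25.csv",
    "offices": "Offices-Facilities-mar-25.csv",
    "employee": "Employee-Commute-Travel-mar-25.csv",
    "product": "Product-Usage-mar-25.csv",
    "rnd": "RnD-Activities-mar-25.csv",
    "iot": "IoT-Integrations-mar-25.csv",
    "electricity_bills": "Energy-Bills-mar-25.csv",
}


def load_sources(data_dir, files=SOURCE_FILES):
    """Read every CSV in `files` from `data_dir` into a dict of DataFrames keyed by source."""
    data_dir = Path(data_dir)
    missing = [name for name in files.values() if not (data_dir / name).exists()]
    if missing:
        # Report all missing files at once before reading anything
        raise FileNotFoundError(f"Missing input files in {data_dir}: {missing}")
    return {key: pd.read_csv(data_dir / name) for key, name in files.items()}
```

Usage: `frames = load_sources("/content/dataset_csv")`, after which `frames["iot"]` plays the role of `iot_df`.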

Extract the relevant CO2 emissions column from each dataset. Column names are not identical across files (for example 'CO2 Emissions (kg)' versus 'CO2 Emissions (kg CO2)'), so check each file's header and adjust the selections as necessary.
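Instead of hard-coding each column name, the CO2 column could be located by its prefix. The sketch below makes several assumptions. `extract_emissions` is a hypothetical helper. It assumes each file has exactly one column whose name starts with "CO2 Emissions". It also assumes both the "(kg)" and "(kg CO2)" variants are kilograms of CO2. The helper raises an error when zero or several columns match, so a renamed header cannot be silently mis-selected.

```python
import pandas as pd


def extract_emissions(df, source, prefix="CO2 Emissions"):
    """Return a copy of df with 'Date', a unified 'CO2 (kg)' column and a 'Source' label.

    Assumes exactly one column whose name starts with `prefix`; raises KeyError otherwise.
    """
    matches = [col for col in df.columns if col.startswith(prefix)]
    if len(matches) != 1:
        raise KeyError(f"{source}: expected one column starting with {prefix!r}, found {matches}")
    # Select, rename to a common name, and copy so later column additions are safe
    out = df[["Date", matches[0]]].rename(columns={matches[0]: "CO2 (kg)"}).copy()
    out["Source"] = source
    return out
```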


In [None]:
# Manufacturing Operations (assuming CO2 Emissions column is 'CO2 Emissions (kg)')
manufacturing_emissions = manufacturing_df[['Date', 'CO2 Emissions (kg)']]

In [None]:
# Supply Chain and Logistics (assuming CO2 Emissions column is 'CO2 Emissions (kg)')
# Check the actual column name in your CSV file for logistics data.
logistics_emissions = logistics_df[['Date', 'CO2 Emissions (kg)']]

In [None]:
# Offices and Facilities (assuming CO2 Emissions column is 'CO2 Emissions (kg)')
offices_emissions = offices_df[['Date', 'CO2 Emissions (kg)']]

In [None]:
# Employee Commuting (assuming CO2 Emissions column is 'CO2 Emissions (kg)')
employee_emissions = employee_df[['Date', 'CO2 Emissions (kg)']]

In [None]:
# Product Use and End-of-Life (assuming CO2 Emissions column is 'CO2 Emissions from Product Use (kg CO2)')
product_emissions = product_df[['Date','Product ID', 'CO2 Emissions from Product Use (kg CO2)']]

In [None]:
# R&D Activities (assuming CO2 Emissions column is 'CO2 Emissions (kg)')
rnd_emissions = rnd_df[['Date', 'CO2 Emissions (kg)']]

In [None]:
# IoT Integrations (assuming CO2 Emissions column is 'CO2 Emissions (kg CO2)')
iot_emissions = iot_df[['Date', 'CO2 Emissions (kg CO2)']]

In [None]:
# Electricity Bills (assuming CO2 Emissions column is 'CO2 Emissions (kg CO2)')
electricity_bills_emissions = electricity_bills_df[['Date', 'CO2 Emissions (kg CO2)']]

# Categorizing into Scopes

To assign each emission source to **Scope 1**, **Scope 2**, or **Scope 3**, we first need the GHG Protocol definitions of the three scopes:
1.	**Scope 1:** Direct emissions from owned or controlled sources.
  * These could be from fuel consumption in company-owned vehicles or facilities, industrial processes, etc.
2.	**Scope 2:** Indirect emissions from the generation of purchased electricity, steam, heat, or cooling consumed by the company.
  * The company uses the energy, but the emissions are produced where it is generated.
3.	**Scope 3:** All other indirect emissions that occur in the value chain, both upstream and downstream.
  * This includes emissions from transportation and distribution (not owned or controlled by the company), employee commuting, product use, supply chain, business travel, waste disposal, etc.

**Steps to Categorize Emissions**
1.	**Manufacturing Operations:** Likely **Scope 1** (direct emissions from facilities, machinery, etc.).
2.	**Logistics / Supply Chain:** Likely **Scope 3** (indirect emissions from transportation and distribution).
3.	**Offices / Facilities:** Mostly **Scope 2** (purchased electricity used in buildings). On-site fuel combustion, such as gas heating, would be Scope 1.
4.	**Employee Commuting:** **Scope 3** (indirect emissions from employee travel).
5.	**Product Use and End-of-Life:** **Scope 3** (indirect emissions from the use and disposal of sold products).
6.	**R&D Activities:** Could be **Scope 1** or **Scope 3**, depending on whether the activities produce direct emissions (Scope 1) or come from outsourced processes (Scope 3). This notebook treats them as Scope 1.
7.	**IoT Integrations:** Likely **Scope 3**, assuming the emissions come from external services rather than company-owned equipment.
8.	**Electricity Bills:** **Scope 2** (emissions from purchased electricity).


In [None]:
# Define the scope for each dataset based on the emission source.
# 'electricity' is included because the merged electricity-bill columns carry the suffix '_electricity'.
def categorize_scope(source):
    if source in ['manufacturing', 'rnd']:
        return 'Scope 1'  # Direct emissions from company operations
    elif source in ['electricity_bills', 'electricity', 'offices']:
        return 'Scope 2'  # Indirect emissions from purchased electricity
    elif source in ['logistics', 'employee', 'product', 'iot']:
        return 'Scope 3'  # Indirect emissions from the value chain
    return None  # Unknown source key
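`categorize_scope()` returns None for any key it does not list. A typo or a newly added source therefore silently produces a missing scope. A table-driven variant makes the mapping explicit and fails loudly instead. This is a sketch only. `SCOPE_BY_SOURCE` and `scope_for` are hypothetical names, and the scope assignments simply mirror the list above.

```python
# Hypothetical lookup table mirroring categorize_scope(); 'electricity' is the
# suffix the electricity-bill columns receive when the datasets are merged.
SCOPE_BY_SOURCE = {
    "manufacturing": "Scope 1",
    "rnd": "Scope 1",
    "offices": "Scope 2",
    "electricity_bills": "Scope 2",
    "electricity": "Scope 2",
    "logistics": "Scope 3",
    "employee": "Scope 3",
    "product": "Scope 3",
    "iot": "Scope 3",
}


def scope_for(source):
    """Return the scope for a source key; unknown keys raise instead of returning None."""
    try:
        return SCOPE_BY_SOURCE[source]
    except KeyError:
        raise ValueError(f"No scope defined for source {source!r}") from None
```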

In [None]:
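# Add a Scope column to each dataset. assign() returns a new DataFrame, which
# avoids the SettingWithCopyWarning raised when adding a column to a column selection.
manufacturing_emissions = manufacturing_emissions.assign(Scope='Scope 1')
logistics_emissions = logistics_emissions.assign(Scope='Scope 3')
offices_emissions = offices_emissions.assign(Scope='Scope 2')
employee_emissions = employee_emissions.assign(Scope='Scope 3')
product_emissions = product_emissions.assign(Scope='Scope 3')
rnd_emissions = rnd_emissions.assign(Scope='Scope 1')
iot_emissions = iot_emissions.assign(Scope='Scope 3')
electricity_bills_emissions = electricity_bills_emissions.assign(Scope='Scope 2')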

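In pandas versions without Copy-on-Write enabled, adding a column to the result of `df[['Date', col]]` can raise a SettingWithCopyWarning. This happens because pandas flags that result as possibly derived from `df`. Taking an explicit `.copy()` of the selection makes it an independent DataFrame, so the new column can be added safely. The small check below uses made-up data and turns any warning into an error to show that none is raised.

```python
import warnings

import pandas as pd

# Made-up frame standing in for one of the source datasets
source_df = pd.DataFrame({"Date": ["2025-03-01", "2025-03-02"],
                          "CO2 Emissions (kg)": [1.0, 2.0],
                          "Machine": ["M1", "M2"]})

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning inside this block becomes an exception
    emissions = source_df[["Date", "CO2 Emissions (kg)"]].copy()  # independent copy
    emissions["Scope"] = "Scope 1"                                # safe column addition

print(emissions)
```

The original `source_df` is left unchanged. Only `emissions` gains the `Scope` column.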

Combine all DataFrames into one, merging on 'Date'.

In [None]:
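# Outer-merge all datasets on 'Date' (the 'Date' values must use the same format in every file).
# pd.merge adds a suffix only when column names collide, so some columns (e.g. the offices CO2 and Scope columns) keep their original names.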
final_df = pd.merge(manufacturing_emissions, logistics_emissions, on='Date', how='outer', suffixes=('_manufacturing', '_logistics'))
final_df = pd.merge(final_df, offices_emissions, on='Date', how='outer', suffixes=('', '_offices'))
final_df = pd.merge(final_df, employee_emissions, on='Date', how='outer', suffixes=('', '_employee'))
final_df = pd.merge(final_df, product_emissions, on='Date', how='outer', suffixes=('', '_product'))
final_df = pd.merge(final_df, rnd_emissions, on='Date', how='outer', suffixes=('', '_rnd'))
final_df = pd.merge(final_df, iot_emissions, on='Date', how='outer', suffixes=('', '_iot'))
final_df = pd.merge(final_df, electricity_bills_emissions, on='Date', how='outer', suffixes=('', '_electricity'))
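The wide merge has two side effects. Columns keep or gain suffixes depending on name collisions. And if any dataset has several rows per date (for example, one per product ID), an outer merge on 'Date' repeats the other datasets' values on each of those rows. A long ("tidy") layout built with `pd.concat` avoids both problems, because every source row appears exactly once with explicit `Source` and `Scope` labels. The sketch below uses made-up data. `to_long` is a hypothetical helper, and the sketch assumes all emission columns are in kilograms of CO2.

```python
import pandas as pd


def to_long(df, source, scope, co2_col):
    """Reshape one source into rows of (Date, Source, Scope, CO2 (kg))."""
    return pd.DataFrame({
        "Date": pd.to_datetime(df["Date"]),  # parse so every source shares one dtype
        "Source": source,
        "Scope": scope,
        "CO2 (kg)": pd.to_numeric(df[co2_col], errors="coerce"),
    })


# Made-up inputs: one row per day for logistics, two products on the first day
logistics = pd.DataFrame({"Date": ["2025-03-01", "2025-03-02"],
                          "CO2 Emissions (kg)": [125.0, 375.0]})
product = pd.DataFrame({"Date": ["2025-03-01", "2025-03-01", "2025-03-02"],
                        "Product ID": ["P1", "P2", "P1"],
                        "CO2 Emissions from Product Use (kg CO2)": [10.0, 20.0, 30.0]})

long_df = pd.concat([
    to_long(logistics, "logistics", "Scope 3", "CO2 Emissions (kg)"),
    to_long(product, "product", "Scope 3", "CO2 Emissions from Product Use (kg CO2)"),
], ignore_index=True)

# Daily totals: each source row is counted exactly once
daily_total = long_df.groupby("Date")["CO2 (kg)"].sum()
print(daily_total)  # 2025-03-01 -> 155.0, 2025-03-02 -> 405.0
```

Per-scope daily totals then follow from `long_df.pivot_table(index="Date", columns="Scope", values="CO2 (kg)", aggfunc="sum")`.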

In [None]:
# Define a list of columns to exclude from numeric conversion
exclude_columns = ['Date', 'Product ID', 'Scope'] + [col for col in final_df.columns if 'scope' in col.lower()]

In [None]:
# Convert all columns except those in exclude_columns list to numeric, coercing errors to NaN
final_df = final_df.apply(lambda col: pd.to_numeric(col, errors='coerce') if col.name not in exclude_columns else col)

# Sum CO2 emissions across all sources for each row, using only columns whose name contains 'CO2 Emissions' (missing values are skipped)
final_df['Total_CO2_Emissions (kg)'] = final_df.drop(columns=exclude_columns).filter(like='CO2 Emissions').sum(axis=1, skipna=True)
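With `skipna=True`, a row where every CO2 column is missing sums to 0 rather than NaN. This can hide gaps introduced by the outer join. Passing `min_count=1` keeps such rows as NaN. A small check on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up wide frame: the second date has no data from either source
wide = pd.DataFrame({
    "Date": ["2025-03-01", "2025-03-02"],
    "CO2 Emissions (kg)_a": [100.0, np.nan],
    "CO2 Emissions (kg)_b": [np.nan, np.nan],
})
co2 = wide.filter(like="CO2 Emissions")

print(co2.sum(axis=1, skipna=True))               # 100.0 and 0.0
print(co2.sum(axis=1, skipna=True, min_count=1))  # 100.0 and NaN
```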

In [None]:
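# Map each source-suffixed column name to its scope. This is per-column metadata, so it is kept
# in a separate Series rather than assigned row-wise to final_df['Scope'] (which would misalign rows).
column_sources = final_df.columns.to_series().str.extract(r'_(manufacturing|logistics|offices|employee|product|rnd|iot|electricity)$')[0]
column_scopes = column_sources.dropna().apply(categorize_scope)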
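Scope labels belong to columns rather than rows in the wide layout. Per-scope daily totals therefore come from grouping the CO2 columns by their scope. The sketch below uses made-up data and a hand-written column-to-scope mapping; `example_wide` and `example_scopes` are hypothetical names. It transposes the CO2 columns into rows, groups those rows by scope, and transposes back.

```python
import pandas as pd

# Made-up wide frame in the same shape as final_df (one CO2 column per source)
example_wide = pd.DataFrame({
    "Date": ["2025-03-01", "2025-03-02"],
    "CO2 Emissions (kg)_manufacturing": [450.0, 410.0],
    "CO2 Emissions (kg)_logistics": [125.0, 375.0],
    "CO2 Emissions (kg CO2)_electricity": [80.0, 90.0],
})

# Hypothetical column -> scope mapping
example_scopes = pd.Series({
    "CO2 Emissions (kg)_manufacturing": "Scope 1",
    "CO2 Emissions (kg)_logistics": "Scope 3",
    "CO2 Emissions (kg CO2)_electricity": "Scope 2",
})

# Columns become rows, rows are grouped by scope and summed, then transposed back
scope_totals = example_wide[example_scopes.index].T.groupby(example_scopes).sum().T
scope_totals.insert(0, "Date", example_wide["Date"])
print(scope_totals)
```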

In [None]:
# Keep a separate dataset with only the date and the total CO2 emissions
total_emissions_df = final_df[['Date', 'Total_CO2_Emissions (kg)']]

In [None]:
# Save the final dataset to a CSV file
final_df.to_csv("/content/dataset_csv/combined_co2_emissions.csv", index=False)

In [None]:
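# Save a copy of the final dataset to the report folder (created if it does not exist)
import os
os.makedirs("/content/report", exist_ok=True)
final_df.to_csv("/content/report/combined_co2_emissions.csv", index=False)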

In [None]:
# Save the dataset with only the total emissions to a separate CSV file
total_emissions_df.to_csv("/content/dataset_csv/total_co2_emissions_only.csv", index=False)

In [None]:
# Save a copy of the total-emissions dataset to the report folder
total_emissions_df.to_csv("/content/report/total_co2_emissions_only.csv", index=False)

In [None]:
# Print first few rows of the final dataset
print(final_df.head())

         Date  CO2 Emissions (kg)_manufacturing Scope_manufacturing  \
0  2025-03-01                               450             Scope 1   
1  2025-03-02                               410             Scope 1   
2  2025-03-03                               490             Scope 1   
3  2025-03-04                               460             Scope 1   
4  2025-03-05                               530             Scope 1   

   CO2 Emissions (kg)_logistics Scope_logistics  CO2 Emissions (kg)    Scope  \
0                         125.0         Scope 3                 320     None   
1                         375.0         Scope 3                 340  Scope 1   
2                         250.0         Scope 3                 348  Scope 1   
3                         300.0         Scope 3                 312  Scope 3   
4                         200.0         Scope 3                 360  Scope 3   

   CO2 Emissions (kg)_employee Scope_employee Product ID  \
0                        10.00  

# Explanation:
1.	**Loading Data:** We assume that the datasets are stored as CSV files. The pd.read_csv() function loads each dataset into a DataFrame.
2.	**Extracting CO2 Emissions:** We select the date column and the CO2 emissions column from each dataset (plus 'Product ID' for product use). The column names differ between files ('CO2 Emissions (kg)', 'CO2 Emissions (kg CO2)', 'CO2 Emissions from Product Use (kg CO2)'), so check each file's header and adjust the selection if yours differ.
3.	**categorize_scope() Function:** This function maps a source key such as 'manufacturing' or 'iot' to 'Scope 1', 'Scope 2', or 'Scope 3', and returns None for any key it does not list. Edit the key lists to match your own data sources.
4.	**Adding Scope to the Datasets:** Each dataset is given a new column (Scope), where we specify which scope applies to each emission source.
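5.	**Merging Data:** We merge all the datasets on the 'Date' column, so the 'Date' values must be formatted identically in every file. An outer join (how='outer') keeps every date that appears in any dataset; sources with no data for that date get NaN. If a dataset has more than one row per date (for example, several product IDs), the merge repeats the other datasets' values on each of those rows.
6.	**Summing Emissions:** After converting the non-identifier columns to numbers (invalid values become NaN), we add a Total_CO2_Emissions (kg) column that sums every column whose name contains 'CO2 Emissions', skipping missing values, giving the total CO2 footprint for each row.
7.	**Saving the Final Dataset:** We save the combined dataset, and a version with only the date and total emissions, to CSV files with to_csv(), writing each to both /content/dataset_csv and /content/report.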
8.	**Printing the Data:** The script prints the first few rows of the final dataset to give you a preview of the combined data.
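Step 5 relies on each dataset having at most one row per date. One way to check this is to compare the merged grand total with the sum of each source's own total. A mismatch usually means the merge repeated rows. The sketch below uses made-up data, and `merged_total_matches` is a hypothetical helper.

```python
import pandas as pd


def merged_total_matches(sources, merged_total, tol=1e-6):
    """Check that the merged grand total equals the sum of each source's own total.

    `sources` maps a name to a (DataFrame, CO2 column name) pair.
    """
    expected = sum(pd.to_numeric(df[col], errors="coerce").sum() for df, col in sources.values())
    return abs(expected - merged_total) <= tol


# Made-up data: the product source has two rows on 2025-03-01
a = pd.DataFrame({"Date": ["2025-03-01"], "CO2 Emissions (kg)": [100.0]})
p = pd.DataFrame({"Date": ["2025-03-01", "2025-03-01"],
                  "CO2 Emissions from Product Use (kg CO2)": [10.0, 20.0]})

merged = pd.merge(a, p, on="Date", how="outer")
merged_total = merged.filter(like="CO2 Emissions").sum(axis=1).sum()

ok = merged_total_matches({"a": (a, "CO2 Emissions (kg)"),
                           "p": (p, "CO2 Emissions from Product Use (kg CO2)")}, merged_total)
print(merged_total, ok)  # the 100 kg row appears twice after the merge, so the check fails
```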
