# Greenhouse Gas Emissions Prediction - Week 1

## Objective
- Load and explore the Excel dataset containing supply chain emission factors and data quality metrics.
- Combine relevant sheets into a single dataset.
- Preprocess the data by selecting the required features and cleaning missing values.


In [None]:
# Importing libraries
import pandas as pd

# Load the Excel file
excel_path = "Data_Set.xlsx"
excel_file = pd.ExcelFile(excel_path)

# Display all sheet names
excel_file.sheet_names

## Selecting Sheets
We will select all sheets that end with `_Summary_Commodity`.


In [None]:
# Filter sheet names
commodity_sheets = [sheet for sheet in excel_file.sheet_names if sheet.endswith('_Summary_Commodity')]
commodity_sheets
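
To make the effect of the suffix filter concrete, here is a minimal sketch that applies the same `endswith` test to a list of hypothetical sheet names. The names in `example_sheet_names` are invented for illustration and are not read from `Data_Set.xlsx`.

```python
# Hypothetical sheet names, for illustration only (not taken from the workbook)
example_sheet_names = [
    "Cover",
    "2015_Summary_Commodity",
    "2015_Summary_Industry",
    "2016_Summary_Commodity",
    "2016_Detail_Commodity",
]

# Same suffix test as the filter above: keep names ending in "_Summary_Commodity"
example_matches = [
    name for name in example_sheet_names if name.endswith("_Summary_Commodity")
]
print(example_matches)
```

Only names whose final characters match the suffix exactly are kept, so a sheet ending in `_Detail_Commodity` or `_Summary_Industry` is left out.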

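## Loading and Combining Data
Load each selected sheet, keep only the feature and target columns, drop rows with missing values, and concatenate the results into a single DataFrame. Sheets that lack any required column are skipped.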

In [None]:
# Define feature columns and target
feature_cols = [
    'Supply Chain Emission Factors without Margins',
    'Margins of Supply Chain Emission Factors',
    'DQ ReliabilityScore of Factors without Margins',
    'DQ TemporalCorrelation of Factors without Margins',
    'DQ GeographicalCorrelation of Factors without Margins',
    'DQ TechnologicalCorrelation of Factors without Margins',
    'DQ DataCollection of Factors without Margins'
]
target_col = 'Supply Chain Emission Factors with Margins'

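# Load each selected sheet, keep only the required columns, and drop incomplete rows
required_cols = feature_cols + [target_col]
dataframes = []
for sheet in commodity_sheets:
    df = excel_file.parse(sheet_name=sheet)  # reuse the already-open workbook
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        # Report skipped sheets instead of dropping them silently
        print(f"Skipping {sheet}: missing columns {missing}")
        continue
    dataframes.append(df[required_cols].dropna())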

# Combine all sheets
combined_df = pd.concat(dataframes, ignore_index=True)
combined_df.head()
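
The select, clean, and combine steps can also be tried without the workbook. The following is a minimal sketch on two toy DataFrames standing in for sheets. The helper `select_and_clean`, the short column names, and all values are assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

def select_and_clean(frame, required_cols):
    """Return only required_cols with incomplete rows dropped, or None if a column is absent."""
    if any(col not in frame.columns for col in required_cols):
        return None
    return frame[required_cols].dropna()

# Toy stand-ins for two sheets; names and values are made up
toy_cols = ["factor", "margin", "target"]
sheet_a = pd.DataFrame({
    "factor": [0.1, 0.2, None],   # last row has a missing value and will be dropped
    "margin": [0.01, 0.02, 0.03],
    "target": [0.11, 0.22, 0.33],
    "extra": [1, 2, 3],           # not required, so it is discarded
})
sheet_b = pd.DataFrame({"factor": [0.5], "margin": [0.05]})  # no "target" column, so it is skipped

cleaned = []
for toy_sheet in (sheet_a, sheet_b):
    result = select_and_clean(toy_sheet, toy_cols)
    if result is not None:
        cleaned.append(result)

# ignore_index=True renumbers rows from 0 across the combined frames
toy_combined = pd.concat(cleaned, ignore_index=True)
print(toy_combined)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so if no sheet passes the column check the combine step fails rather than returning an empty DataFrame.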

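## Summary
The `_Summary_Commodity` sheets have been loaded, reduced to the seven feature columns and the target column, cleaned of rows with missing values, and combined into `combined_df` for further analysis.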
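
One way to confirm the preprocessing result is a short report of row count, column count, and remaining missing values. The sketch below is an illustration only: `preprocessing_report` is a hypothetical helper, and `toy_frame` is a made-up stand-in so the example runs without the workbook.

```python
import pandas as pd

def preprocessing_report(frame):
    """Summarize row count, column count, and remaining missing values."""
    return {
        "rows": len(frame),
        "columns": frame.shape[1],
        "missing_values": int(frame.isna().sum().sum()),
    }

# Toy frame standing in for combined_df; values are made up
toy_frame = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
report = preprocessing_report(toy_frame)
print(report)
```

Calling `preprocessing_report(combined_df)` in the notebook should show eight columns (seven features plus the target) and zero missing values if the steps above ran as intended.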
