#### Objective
To transform newly detected recycler data into the standard CPCB reporting format with correct column structure and serial numbering.

#### Brief Description
This script reads raw recycler data and maps selected columns into a CPCB-compliant format using a predefined column mapping.

* Source columns are renamed to the CPCB column names.
* A serial number column (`Sr. No.`) is generated automatically, starting at 1.
* Only relevant fields such as industry name, address, district, product, capacity, and contact details are retained.
* The transformed dataset is saved as a new Excel file, ready for submission or integration with CPCB records.
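
As a rough illustration of the steps above, the following sketch applies a two-entry mapping to a small in-memory table. The sample rows, the `Internal Code` column, and the shortened `sample_mapping` are made up for illustration; the actual script reads its data from the Excel sheet and uses the full mapping.

```python
import pandas as pd

# Hypothetical sample rows standing in for the raw recycler sheet
raw = pd.DataFrame({
    "IndustryName": ["Alpha Solvents", "Beta Recyclers"],
    "District": ["Pune", "Thane"],
    "Internal Code": ["X1", "X2"],  # not in the mapping, so it is dropped
})

# Shortened mapping for illustration: output column name -> source column name
sample_mapping = {
    "Name of Industry": "IndustryName",
    "District": "District",
}

# Start with the serial number column, then copy each mapped source column
formatted = pd.DataFrame({"Sr. No.": range(1, len(raw) + 1)})
for out_col, src_col in sample_mapping.items():
    if src_col in raw.columns:
        formatted[out_col] = raw[src_col]

print(formatted)
```

Only the mapped columns survive, in the order the mapping lists them, with `Sr. No.` first.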

In [98]:
import pandas as pd

# Define column mapping: output (CPCB) column name -> source column name in the input sheet
column_mapping = {
    "Name of Industry": "IndustryName",
    "Address": "Address",
    "District": "District",
    "RO": "RO OFFICER",
    "Product": "Product UOM",
    "Capacity (Quantity)": "Total",
    "Unit of Measurement": "Uom name",
    "Contact Person Name": "Name",
    "Phone Number": "TelNo",
    "E-mail": "Email"
}

# Load the Excel file (replace input_file with your actual file path)
input_file = "C:/Users/Atique/Rutuja_Mam_Coding/Recycler_Directory/Mar17/New_Detected_Combined.xlsx"
df = pd.read_excel(input_file, sheet_name='SpentSolvent')

# Create a new DataFrame with the required format
new_df = pd.DataFrame()

# Add Sr. No. column
new_df["Sr. No."] = range(1, len(df) + 1)

# Copy each source column into the output under its new name;
# source columns missing from the sheet are skipped
for new_col, old_col in column_mapping.items():
    if old_col in df.columns:
        new_df[new_col] = df[old_col]

# Retain the 'Sheet' column if it exists
if "Sheet" in df.columns:
    new_df["Sheet"] = df["Sheet"]

# Save the transformed data to a new Excel file
output_file = "MPCB_Format_SpentSolvent.xlsx"
new_df.to_excel(output_file, index=False)

print(f"Data has been transformed and saved to {output_file}")


Data has been transformed and saved to MPCB_Format_SpentSolvent.xlsx
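
The mapping loop skips any source column that is absent from the sheet, so a misspelled header would silently produce an output file with fewer columns. One possible check is sketched below. The helper `find_missing_sources` and the sample data are hypothetical and not part of the original script.

```python
import pandas as pd

def find_missing_sources(df, mapping):
    """Return {output column: source column} for source columns absent from df.

    Hypothetical helper; `mapping` is output column name -> source column name.
    """
    return {out: src for out, src in mapping.items() if src not in df.columns}

# Made-up sample data: "TelNo" is deliberately left out
sample = pd.DataFrame({
    "IndustryName": ["Alpha Solvents"],
    "Email": ["a@example.com"],
})
sample_mapping = {
    "Name of Industry": "IndustryName",
    "Phone Number": "TelNo",
    "E-mail": "Email",
}

missing = find_missing_sources(sample, sample_mapping)
for out_col, src_col in missing.items():
    print(f"Missing source column '{src_col}' (needed for '{out_col}')")
```

In the script, the same call could be made as `find_missing_sources(df, column_mapping)` right after the sheet is loaded, before the output file is written.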
