# **Overview** 📚


This Python notebook outlines the steps used to compile a focused subset of the *Cooling_Boiler_Generator_Data_Summary_2023.csv* dataset. The subset is intended to support analysis of temperature efficiency across generator systems. The process covers importing the data, filtering it by system type and column, and exporting the result.


Our main goal is to efficiently extract the data corresponding to utility and generator systems, focusing on variables like temperature output.

# **Getting Started!** ✅

The only file required to reproduce this process is:

*   *Cooling_Boiler_Generator_Data_Summary_2023.csv*

After downloading the file to your device, upload it to your Google Colab session through the file browser (the folder icon in the left sidebar). If the file is stored in Google Drive, run the code below instead:

(You will be asked to grant Colab permission to access your Google Drive.)

```python
# Mount Google Drive so its files are visible under /content/drive
from google.colab import drive
drive.mount('/content/drive')

# Assumes the file is stored at 'My Drive/data/Cooling_Boiler_Generator_Data_Summary_2023.csv'
file_path = '/content/drive/My Drive/data/Cooling_Boiler_Generator_Data_Summary_2023.csv'
```

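Step 2 reads the CSV by its bare file name, which only works if the file was uploaded to the working directory, while the cell above stores the Drive location in `file_path`. If you are not sure which applies, a small helper can pick whichever location exists. This is a minimal sketch: `resolve_csv_path` is a hypothetical helper name, and the two candidate paths are simply the ones used in this notebook.

```python
import os

def resolve_csv_path(candidates):
    """Hypothetical helper: return the first path in `candidates` that exists.

    Raises FileNotFoundError if none of the paths exist.
    """
    for path in candidates:
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"None of these paths exist: {candidates}")

# Candidate locations used in this notebook (adjust to your own setup)
CANDIDATES = [
    '/content/drive/My Drive/data/Cooling_Boiler_Generator_Data_Summary_2023.csv',  # Google Drive
    'Cooling_Boiler_Generator_Data_Summary_2023.csv',  # manual upload to the working directory
]
```

Usage: `csv_path = resolve_csv_path(CANDIDATES)` followed by `pd.read_csv(csv_path)`. The Drive path is listed first, so it wins if both copies exist.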


# **Step 1: Import Libraries** ✅

We begin by importing the libraries needed to load and manipulate the data. Run the code below to complete this step:

```python
import pandas as pd  # pandas reads CSV files into DataFrames and lets us filter and reshape them
import numpy as np   # numpy provides the numerical arrays that pandas is built on
```

# **Step 2: Load the Dataset** ✅

To begin, we load the dataset into memory as a pandas DataFrame, a two-dimensional data structure with labeled rows and columns (similar to a table).
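To make "two-dimensional" concrete, here is a tiny DataFrame built by hand. It is only an illustration: the values are copied from the first data rows of the Step 2 output below, and the variable name `example` is arbitrary.

```python
import pandas as pd

# Rows are records and columns are named fields
example = pd.DataFrame({
    'Utility ID': [195, 195],
    'State': ['AL', 'AL'],
    'Plant Name': ['Barry', 'Barry'],
    'Month': [1, 2],
})

print(example.shape)     # (number of rows, number of columns)
print(example['Month'])  # selecting one column returns a Series
print(example.iloc[0])   # selecting one row by position also returns a Series
```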

You can see what this structure looks like after running the code below:

```python
# Read the CSV into a DataFrame (if you mounted Google Drive, use pd.read_csv(file_path) instead)
df = pd.read_csv('Cooling_Boiler_Generator_Data_Summary_2023.csv')
df.head()  # display the first five rows
```


Unnamed: 0,Unnamed: 1.1,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69
0,,,,,,,,,,,...,,,,,,,,,,
1,\n \n \n \n \n \nUtility ID,State,Plant Code,Plant Name,Year,Month,Generator ID,Boiler ID,Cooling ID,Generator Primary Technology,...,Combined Heat and Power Generator?,Generator Primary Energy Source Code,Generator Prime Mover Code,Generator Duct Burners?,Sector,Steam Plant Type,Number Operable Generators,Number Operable Boilers,Number Operable Cooling Systems,Relationship Type
2,195,AL,3,Barry,2023,1,Many,Many,1-3,Natural Gas Steam Turbine,...,N,NG,ST,,Electric Utility,1,2,2,1,1C MB MG
3,195,AL,3,Barry,2023,2,Many,Many,1-3,Natural Gas Steam Turbine,...,N,NG,ST,,Electric Utility,1,2,2,1,1C MB MG
4,195,AL,3,Barry,2023,3,Many,Many,1-3,Natural Gas Steam Turbine,...,N,NG,ST,,Electric Utility,1,2,2,1,1C MB MG


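*Make sure the column names and values match what you expected before continuing!* Here they do not: every column is labelled `Unnamed`, the row with index 0 is empty, and the real column labels (with stray line breaks in front of `Utility ID`) appear in the row with index 1, so the actual data starts at index 2.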
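Rather than reading the position of the label row off the printed output, you can search for it. This is a minimal sketch, assuming the labels start in the first column as in the output above; `find_header_row` is a hypothetical helper, and the toy frame only mimics the layout shown.

```python
import numpy as np
import pandas as pd

def find_header_row(raw: pd.DataFrame, label: str = 'Utility ID') -> int:
    """Hypothetical helper: return the position of the first row whose first
    cell, after stripping whitespace and line breaks, equals `label`."""
    first_col = raw.iloc[:, 0].astype(str).str.strip().to_numpy()
    positions = np.flatnonzero(first_col == label)
    if positions.size == 0:
        raise ValueError(f'no row starts with {label!r}')
    return int(positions[0])

# Toy frame mimicking the layout seen above: an empty row, then the label row
raw = pd.DataFrame([
    [None, None, None],
    ['\n \n \nUtility ID', 'State', 'Generator ID'],
    ['195', 'AL', 'Many'],
])
print(find_header_row(raw))  # → 1
```

Applied to the `df` loaded in Step 2, this should also return `1`, given the output shown above.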

# **Step 3: Filter for the Relevant System Types** ✅

In this process, we are only interested in looking at **utility** and **generator** systems.

Analyze and run the code below:

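```python
# Reload the raw file (same call as in Step 2); dtype=str keeps IDs such as "Many" and "1-3" as text
df = pd.read_csv('Cooling_Boiler_Generator_Data_Summary_2023.csv', dtype=str)

# The real column labels sit in the row with index 1 (see the Step 2 output).
# Strip the stray whitespace and line breaks from them and use them as the header.
data = df.iloc[2:].copy()
data.columns = df.iloc[1].astype(str).str.strip().tolist()

# Select the two columns by name rather than by position
# (positionally, 'Utility ID' is column 0 and 'Generator ID' is column 6)
df_subset = data[['Utility ID', 'Generator ID']].reset_index(drop=True)

print(df_subset.head())  # print the first few rows of the subset
```

Selecting by position with `df.iloc[:, [6, 8]]` would instead pick up the *Generator ID* and *Cooling ID* columns and keep the empty row and the label row as data. With the code above, the first rows should show a Utility ID of `195` and a Generator ID of `Many`, matching the Barry plant rows in the Step 2 output.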
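Before exporting, a quick sanity check on the subset can catch leftover label rows or missing IDs. This is a minimal sketch, assuming a DataFrame with `Utility ID` and `Generator ID` columns; `summarize_subset` is a hypothetical helper, and the toy rows mimic the empty row, label row, and Barry rows seen in Step 2.

```python
import pandas as pd

def summarize_subset(subset: pd.DataFrame) -> dict:
    """Hypothetical helper: basic counts for a Utility ID / Generator ID subset."""
    return {
        'rows': len(subset),
        # Missing values per column (e.g. from empty rows in the raw file)
        'missing': subset.isna().sum().to_dict(),
        # Rows that still hold the column label instead of data
        'label_rows': int((subset['Utility ID'] == 'Utility ID').sum()),
        # Rows whose Generator ID is the literal value 'Many'
        'many_generator_rows': int((subset['Generator ID'] == 'Many').sum()),
        # Distinct non-missing Utility ID values
        'unique_utilities': int(subset['Utility ID'].nunique()),
    }

toy = pd.DataFrame({
    'Utility ID': [None, 'Utility ID', '195', '195'],
    'Generator ID': [None, 'Generator ID', 'Many', 'Many'],
})
print(summarize_subset(toy))
```

On a correctly parsed subset, `missing` should be low and `label_rows` should be `0`.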


# **Step 4: Exporting the Subset as a New CSV File** ✅

Now that we have our refined subset, let's export it to a clean, compiled **CSV** file.

Run the code below:

```python
# index=False stops pandas from writing the row index as an extra, unnamed column
df_subset.to_csv('compiled_utility_generator_subset.csv', index=False)
```

This code will create a new file called *compiled_utility_generator_subset.csv*, containing only the two relevant columns.
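To see what `index=False` changes, you can write a small frame to an in-memory buffer and read it back. This sketch uses `io.StringIO` instead of a real file; `round_trip` is a hypothetical helper, and the two rows are copied from the Step 2 output.

```python
import io
import pandas as pd

subset = pd.DataFrame({'Utility ID': ['195', '195'], 'Generator ID': ['Many', 'Many']})

def round_trip(df: pd.DataFrame, index: bool) -> pd.DataFrame:
    """Hypothetical helper: write `df` to CSV text in memory, then read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=index)
    buf.seek(0)
    return pd.read_csv(buf, dtype=str)

with_index = round_trip(subset, index=True)
without_index = round_trip(subset, index=False)
print(list(with_index.columns))     # ['Unnamed: 0', 'Utility ID', 'Generator ID']
print(list(without_index.columns))  # ['Utility ID', 'Generator ID']
```

With the index written, the header cell above it is empty, so pandas names that column `Unnamed: 0` when the file is read again.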

# **Step 5: Download the New CSV File** ✅

To download your file in Colab:


* Click the folder icon in the left sidebar
* Locate compiled_utility_generator_subset.csv
* Click the three dots > Download
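You can also trigger the download from code. Colab's own `google.colab.files.download` function starts a browser download of a file in the session; the wrapper name `download_if_in_colab` is hypothetical, and the import guard simply lets the same cell run outside Colab without failing.

```python
def download_if_in_colab(path: str) -> bool:
    """Hypothetical wrapper: download `path` in Colab, otherwise just report it.

    Returns True if a download was triggered, False otherwise.
    """
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        print(f'Not running in Colab; the file is at {path}')
        return False
    files.download(path)
    return True
```

Usage: `download_if_in_colab('compiled_utility_generator_subset.csv')`.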

You have now successfully compiled a clean subset of the original dataset, isolating only the Utility ID and Generator ID columns for more targeted analysis! 🎊


