# Project 6: Power Generators in Wisconsin

## Your Information

At the start of each assignment, you will need to provide us with your name and the name of the partner you worked with on this assignment (if you had one). Double click on the cell below, or click once and hit enter, to edit it. Replace "First Last" with your first name and last name. Replace "None" with the first and last name of your partner if you had one for this assignment. We ask for this information so we don't accuse you of cheating when your code looks like your partner's.

Please keep these lines commented so they don't cause an error.

In [None]:
# MY NAME: Hyokyung Kim
# MY PARTNER's NAME: None

## Imports

Every project will begin with some import statements. It's crucial that you run the cell below; otherwise, we will not be able to grade your code and provide feedback to you.

In [1]:
# it is considered a good coding practice to place all import statements at the top of the notebook

import os
import csv
import student_grader
student_grader.initialize(os.getcwd(), "p6")

## Learning Objectives:

In this assignment, you will demonstrate how to:

* access and utilize data in CSV files
* process real world datasets
* use string methods and the `sorted` function / `.sort` method to order data
* use sets to determine the unique values in a list

# Lab portion (32 questions)

## Segment 1: Loading Data from CSVs

### About the dataset

Open `power_generators.csv` by clicking on the file in the file tab in Jupyter Lab, or open it with Microsoft Excel (or some other spreadsheet viewing software), and have a look at the data. The first few rows of the dataset are reproduced here:

entity_id|entity_name|plant_id|plant_name|generator_id|county|net_summer_capacity|net_winter_capacity|technology|latitude|longitude
---|---|---|---|---|---|---|---|---|---|---
13781|Northern States Power Co - Minnesota|1756|Saxon Falls|1|Iron|0.5|0.5|Conventional Hydroelectric|46.5392|-90.3742
13781|Northern States Power Co - Minnesota|1756|Saxon Falls|2|Iron|0.5|0.6|Conventional Hydroelectric|46.5392|-90.3742
20847|Wisconsin Electric Power Co|1775|Brule|1|Florence|1.3|1.3|Conventional Hydroelectric|45.9472|-88.2189
20847|Wisconsin Electric Power Co|1775|Brule|2|Florence|2|2|Conventional Hydroelectric|45.9472|-88.2189
20847|Wisconsin Electric Power Co|1775|Brule|3|Florence|2|2|Conventional Hydroelectric|45.9472|-88.2189

**WARNING**: If you open `power_generators.csv` using a Spreadsheet Viewer, you need to be careful **not** to modify the dataset in any way. Leading zeroes are *intentionally* a part of some plant and generator ids. In particular, make sure you **do not save** the file before you close the file.

The `power_generators.csv` file has data about every Power Generator in operation within the state of Wisconsin, as of February 2024. Each row of data represents a **single** generator within the state of Wisconsin. The columns contain the following data about each generator (along with the correct data type you **must** represent it as):

1. `entity_id` - the **ID** of the **entity** that operates the Power Generator (`int`)
2. `entity_name` - the **name** of the **entity** that operates the Power Generator (`str`)
3. `plant_id` - the **ID** of the **Power Plant** hosting the Power Generator (`int`)
4. `plant_name` - the **name** of the **Power Plant** hosting the Power Generator (`str`)
5. `generator_id` - the **ID** of the specific **Power Generator** within its Power Plant (`str`)
6. `county` - the **name** of the **county** that the **Power Plant** is located in (`str`)
7. `net_summer_capacity` - the maximum **capacity** of the **Power Generator** (in units of MW) during the Summer months (`float`)
8. `net_winter_capacity` - the maximum **capacity** of the **Power Generator** (in units of MW) during the Winter months (`float`)
9. `technology` - the **technology** used by the **Power Generator** (`str`)
10. `latitude` - the **latitude** where the **Power Plant** is located (`float`)
11. `longitude` - the **longitude** where the **Power Plant** is located (`float`)

**Warning**: Keep in mind while writing your project that some entries may be **missing data** for specific columns. Sadly, data in real life is often messy. **In P6, we will have to deal with missing data.**

### Task 1.1: Processing the CSV file

You will now read this dataset with Python. [Chapter 14](https://automatetheboringstuff.com/chapter14/) of Automate the Boring Stuff introduces CSV files and provides a code snippet you can reuse. You can use the same code snippet for P6. Run the next few cells and see their outputs.

#### Lab Function 1: `process_csv(filename)`

In [2]:
# modified from https://automatetheboringstuff.com/chapter14/

def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data

In [3]:
# this call to process_csv reads the data in "power_generators.csv"
csv_data = process_csv("power_generators.csv")

# this will display the first three items in the list `csv_data`
csv_data[:3]

[['entity_id',
  'entity_name',
  'plant_id',
  'plant_name',
  'generator_id',
  'county',
  'net_summer_capacity',
  'net_winter_capacity',
  'technology',
  'latitude',
  'longitude'],
 ['13781',
  'Northern States Power Co - Minnesota',
  '1756',
  'Saxon Falls',
  '1',
  'Iron',
  '0.5',
  '0.5',
  'Conventional Hydroelectric',
  '46.5392',
  '-90.3742'],
 ['13781',
  'Northern States Power Co - Minnesota',
  '1756',
  'Saxon Falls',
  '2',
  'Iron',
  '0.5',
  '0.6',
  'Conventional Hydroelectric',
  '46.5392',
  '-90.3742']]

The variable `csv_data` stores the contents of the file `power_generators.csv` as a **list of lists** (i.e., `csv_data` is a **list**, and the elements of this list are **lists** themselves). In the next subsection, you will learn to access data stored within this data structure.
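As a minimal sketch of what a list of lists looks like, the snippet below parses a tiny made-up CSV from an in-memory string instead of from `power_generators.csv`. The names `sample_text` and `sample_data` are hypothetical and exist only for this illustration.

```python
import csv
import io

# Hypothetical three-line CSV; the real file has many more rows and columns.
sample_text = "plant_name,county\nSaxon Falls,Iron\nBrule,Florence\n"

# csv.reader accepts any iterable of lines, including an in-memory text buffer.
sample_data = list(csv.reader(io.StringIO(sample_text)))

print(sample_data)        # the outer list holds one inner list per line
print(sample_data[0])     # the header row
print(sample_data[1][1])  # the cell at row 1, column 1
```

Note that every value comes back as a string, which matches the quoted values in the output above.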

### Task 1.2: Accessing the contents of the dataset

You will now index the data to extract the correct answers for the questions listed below. Some have been done for you. To understand the results better, locate the values in the `power_generators.csv` file.

#### Lab question 1

What are the **names** of the **columns** in the dataset?

**Hint:** Take a look at the output of the cell above and see where the column names are stored. Use **indexing** to extract the csv header from the `csv_data` variable.

**Note:** Indexing starts at 0.

Points possible: 3

In [4]:
# replace the ... with your code
csv_header = csv_data[0]


csv_header

['entity_id',
 'entity_name',
 'plant_id',
 'plant_name',
 'generator_id',
 'county',
 'net_summer_capacity',
 'net_winter_capacity',
 'technology',
 'latitude',
 'longitude']

In [5]:
student_grader.check("lab-q1")

Make sure you saved the notebook before running this cell. Running check for lab-q1...
Great job! You passed all test cases for this question.


#### Lab question 2

How many **rows** are in the dataset (excluding the **header**)?

In this assignment, DO NOT attempt to display `csv_rows` (i.e., do not print out the variable or add the variable name to the end of the cell). `csv_rows` has over 600 lists and will take up unnecessary space.

Points possible: 3

In [12]:
# We have done this one for you as an example, do not modify
csv_rows = csv_data[1:] 
num_rows = len(csv_rows)

num_rows

626

In [13]:
student_grader.check("lab-q2")

Make sure you saved the notebook before running this cell. Running check for lab-q2...
Great job! You passed all test cases for this question.


#### Lab question 3

What are the **first** *ten* rows in the dataset?

Points possible: 3

In [14]:
# We have done this one for you as an example, do not modify
first_ten_rows = csv_rows[:10]

first_ten_rows

[['13781',
  'Northern States Power Co - Minnesota',
  '1756',
  'Saxon Falls',
  '1',
  'Iron',
  '0.5',
  '0.5',
  'Conventional Hydroelectric',
  '46.5392',
  '-90.3742'],
 ['13781',
  'Northern States Power Co - Minnesota',
  '1756',
  'Saxon Falls',
  '2',
  'Iron',
  '0.5',
  '0.6',
  'Conventional Hydroelectric',
  '46.5392',
  '-90.3742'],
 ['20847',
  'Wisconsin Electric Power Co',
  '1775',
  'Brule',
  '1',
  'Florence',
  '1.3',
  '1.3',
  'Conventional Hydroelectric',
  '45.9472',
  '-88.2189'],
 ['20847',
  'Wisconsin Electric Power Co',
  '1775',
  'Brule',
  '2',
  'Florence',
  '2',
  '2',
  'Conventional Hydroelectric',
  '45.9472',
  '-88.2189'],
 ['20847',
  'Wisconsin Electric Power Co',
  '1775',
  'Brule',
  '3',
  'Florence',
  '2',
  '2',
  'Conventional Hydroelectric',
  '45.9472',
  '-88.2189'],
 ['4247',
  'Consolidated Water Power Co',
  '3971',
  'Biron',
  '1',
  'Wood',
  '1.3',
  '1.3',
  'Conventional Hydroelectric',
  '44.4306',
  '-89.7808'],
 ['4247

In [15]:
student_grader.check("lab-q3")

Make sure you saved the notebook before running this cell. Running check for lab-q3...
Great job! You passed all test cases for this question.


#### Lab question 4

What are the **last** *ten* rows in the dataset?

Points possible: 3

In [16]:
# replace the ... with your code

last_ten_rows = csv_rows[-10:]  # slice `csv_rows` (header excluded), consistent with the previous question

last_ten_rows

[['20860',
  'Wisconsin Public Service Corp',
  '66059',
  'Weston RICE',
  'W13',
  'Marathon',
  '18.8',
  '18.8',
  'Natural Gas Internal Combustion Engine',
  '44.856372',
  '-89.65402'],
 ['20860',
  'Wisconsin Public Service Corp',
  '66059',
  'Weston RICE',
  'W14',
  'Marathon',
  '18.8',
  '18.8',
  'Natural Gas Internal Combustion Engine',
  '44.856372',
  '-89.65402'],
 ['20860',
  'Wisconsin Public Service Corp',
  '66059',
  'Weston RICE',
  'W15',
  'Marathon',
  '18.8',
  '18.8',
  'Natural Gas Internal Combustion Engine',
  '44.856372',
  '-89.65402'],
 ['20860',
  'Wisconsin Public Service Corp',
  '66059',
  'Weston RICE',
  'W16',
  'Marathon',
  '18.8',
  '18.8',
  'Natural Gas Internal Combustion Engine',
  '44.856372',
  '-89.65402'],
 ['20860',
  'Wisconsin Public Service Corp',
  '66059',
  'Weston RICE',
  'W17',
  'Marathon',
  '18.8',
  '18.8',
  'Natural Gas Internal Combustion Engine',
  '44.856372',
  '-89.65402'],
 ['65501',
  'Dane County Solar LLC',
  

In [17]:
student_grader.check("lab-q4")

Make sure you saved the notebook before running this cell. Running check for lab-q4...
Great job! You passed all test cases for this question.


#### Advice for reading large files

In general, when you want to confirm that you are reading a large file correctly, it is a good idea to check that you have the correct number of rows, and that the first and last few rows are correct. Here, you were given access to the grader check, which knows the correct answers, so it was easy for you to check. Otherwise, you would have to manually open `power_generators.csv` and confirm that you have not made any mistakes. It is recommended that you manually open `power_generators.csv` in any case to verify that the data matches your answers for the previous three questions.
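The checks described above can be sketched on a small in-memory CSV. This is only an illustration under stated assumptions: `sample_text`, `data`, `header`, `rows`, and `bad_rows` are hypothetical names, and the final field-count check is an extra suggestion that the assignment does not require.

```python
import csv
import io

# Hypothetical stand-in for a large file; only the shape of the checks matters.
sample_text = "id,name\n1,a\n2,b\n3,c\n"
data = list(csv.reader(io.StringIO(sample_text)))
header, rows = data[0], data[1:]

print(len(rows))   # compare with the row count you expect
print(rows[:2])    # first few rows
print(rows[-2:])   # last few rows

# Extra check (not part of the assignment): every row should have
# as many fields as the header.
bad_rows = [row for row in rows if len(row) != len(header)]
print(bad_rows)
```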

#### Lab question 5

What values are present in the *first* **row** of the dataset?

**Hint**: You already know how to extract a single element from a list. That is all you need to do here.

Points possible: 3

In [18]:
# replace the ... with your code using the 'csv_rows' variable defined before
first_row = csv_rows[0]

first_row

['13781',
 'Northern States Power Co - Minnesota',
 '1756',
 'Saxon Falls',
 '1',
 'Iron',
 '0.5',
 '0.5',
 'Conventional Hydroelectric',
 '46.5392',
 '-90.3742']

In [19]:
student_grader.check("lab-q5")

Make sure you saved the notebook before running this cell. Running check for lab-q5...
Great job! You passed all test cases for this question.


#### Lab question 6

What is the `entity_name` of the *first* power generator?

**Hint:** The **column index** for the `entity_name` column is `1`. You may **hardcode** the **column index** as `1` **just for this question**.

In the last question, you extracted a single **row** from the file `power_generators.csv`. You will now extract data from a single **cell** of the file.

To extract data from a single cell of the csv file, we need two things:
  1. row index
  2. column index
    
You already know how to extract a row of data with `csv_rows[row_idx]`. Given this list, can you now extract the data in a particular cell using the **column index**?
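The two-step lookup can be sketched on made-up data. The list `toy_rows` and its contents are hypothetical and are not taken from the dataset.

```python
# Hypothetical rows, not taken from the dataset.
toy_rows = [
    ["Alpha Plant", "Dane"],
    ["Beta Plant", "Iron"],
]

row_idx = 1
col_idx = 0

row = toy_rows[row_idx]   # the first index selects a row (itself a list)
value = row[col_idx]      # the second index selects a cell within that row

print(value)
# The two steps can be chained into a single expression.
print(value == toy_rows[row_idx][col_idx])
```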

Points possible: 2

In [20]:
# replace the ... with your code
first_entity_name = csv_rows[0][1]  # row index 0 is the first generator; column index 1 is entity_name

first_entity_name

'Northern States Power Co - Minnesota'

In [21]:
student_grader.check("lab-q6")

Make sure you saved the notebook before running this cell. Running check for lab-q6...
Great job! You passed all test cases for this question.


#### Lab question 7

What is the **index** of the column `technology`?

You solved the previous question by **hardcoding** the **column index**, when you were just given the **column name**. This is however a **bad practice**, and you **must not** do it in your project. It would be much safer to somehow **extract** the **column index** **from** the **column name**, and then use the **column index**. The following (built-in) list method helps us with that:

**Syntax:** `list.index(value)`

This method returns the index of the first occurrence of `value` in the `list`. You can see it in action in the question below.
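A small sketch of `list.index` on a made-up header follows. The list `toy_header` is hypothetical; the behavior shown (a `ValueError` when the value is absent) is standard for Python lists.

```python
# Hypothetical header used only for illustration.
toy_header = ["name", "county", "capacity"]

position = toy_header.index("county")
print(position)  # position of the first match

# Looking up a value that is not in the list raises ValueError.
try:
    toy_header.index("altitude")
    found = True
except ValueError:
    found = False
print(found)
```

A misspelled column name therefore fails loudly instead of silently returning the wrong column.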

Points possible: 2

In [22]:
# We have done this one for you as an example, do not modify
technology_index = csv_header.index('technology')
technology_index

8

In [23]:
student_grader.check("lab-q7")

Make sure you saved the notebook before running this cell. Running check for lab-q7...
Great job! You passed all test cases for this question.


#### Lab question 8

What is the `technology` of the *first* generator in the dataset (first row, i.e. `csv_rows[0]`)?

Remember that you already computed the `technology_index` variable in the previous question.

Points possible: 2

In [24]:
# replace the ... with your code
technology_first_row = csv_rows[0][technology_index]

technology_first_row

'Conventional Hydroelectric'

In [25]:
student_grader.check("lab-q8")

Make sure you saved the notebook before running this cell. Running check for lab-q8...
Great job! You passed all test cases for this question.


### Task 1.3: Build a helper function for quick data access

#### Lab Function 2: `cell_temp(row_idx, col_name)`

It is quite cumbersome to extract data from `power_generators.csv` by indexing `csv_rows` and using the `index` method each time. To save yourself some time and effort, fill in the details of the following helper function.

Note that missing data in this dataset is represented by a string containing just a single space. That isn't always the case, but it is in this dataset. This function returns `None` when it encounters missing data.

Later in this notebook, we're going to write a better version of this function called just `cell`, so we're calling this one `cell_temp` where `temp` means temporary.

Points possible: 3

In [28]:
# Replace each ... with your code

def cell_temp(row_idx, col_name):
    col_idx = csv_header.index(col_name)
    val = csv_rows[row_idx][col_idx]
    
    # DO NOT EDIT the lines below
    if val == " ":
        return None
    return val

# test your function yourself by uncommenting the function call below and experimenting with different arguments
# cell_temp(0, 'entity_name')

In [29]:
student_grader.check("lab-cell_temp")

Make sure you saved the notebook before running this cell. Running check for lab-cell_temp...
Great job! You passed all test cases for this question.


#### Lab question 9

What is the `technology` used by the *first* power generator?

You **must** answer this question by calling the `cell_temp` function.

Points possible: 3

In [30]:
# replace the ... with your code
technology_first_row_l9 = cell_temp(0, 'technology')

technology_first_row_l9

'Conventional Hydroelectric'

In [31]:
student_grader.check("lab-q9")

Make sure you saved the notebook before running this cell. Running check for lab-q9...
Great job! You passed all test cases for this question.


#### Lab question 10

What is the `plant_name` of the *second* power generator?

You **must** answer this question by calling the `cell_temp` function.

Points possible: 3

In [32]:
# replace the ... with your code

plant_name_second_row = cell_temp(1, 'plant_name')

plant_name_second_row

'Saxon Falls'

In [33]:
student_grader.check("lab-q10")

Make sure you saved the notebook before running this cell. Running check for lab-q10...
Great job! You passed all test cases for this question.


#### Lab question 11

What is the `latitude` of the *third* power generator?

You **must** answer this question by calling the `cell_temp` function.

Points possible: 3

In [34]:
# replace the ... with your code
latitude_third_row = cell_temp(2, 'latitude')

latitude_third_row

'45.9472'

In [35]:
student_grader.check("lab-q11")

Make sure you saved the notebook before running this cell. Running check for lab-q11...
Great job! You passed all test cases for this question.


#### Lab question 12

What is the `generator_id` of the *tenth* power generator?

You **must** answer this question by calling the `cell_temp` function.

Points possible: 3

In [36]:
# replace the ... with your code

generator_id_tenth_row = cell_temp(9, 'generator_id')

generator_id_tenth_row

'5'

In [37]:
student_grader.check("lab-q12")

Make sure you saved the notebook before running this cell. Running check for lab-q12...
Great job! You passed all test cases for this question.


#### Lab question 13

How **many** power generators are in the `county` *Iron*?

You **must** use `cell_temp` to extract data from the csv file.

**Hint:** You must loop through the entire dataset. Use `cell_temp` to extract the `county` of each generator.

Points possible: 3

In [38]:
# replace the ... with your code
# use `cell_temp` to determine if the power generator
# at `i` is from the correct county

iron_generators = 0
for i in range(num_rows):
    if cell_temp(i, 'county') == 'Iron':
        iron_generators += 1

iron_generators

2

In [39]:
student_grader.check("lab-q13")

Make sure you saved the notebook before running this cell. Running check for lab-q13...
Great job! You passed all test cases for this question.


#### Lab question 14

List the names (`plant_name`) of all the power plants operated by the entity with the `entity_name` *Butter Solar, LLC*.

Your output **must** be a *list*. You **must** use `cell_temp` to extract any data from the csv file.

**Hint:** Loop through the entire dataset and use `cell_temp` to determine if each generator is operated by the required `entity_name`. Use `cell_temp` once again to extract the `plant_name` of each such generator, and use the `<list>.append` method to add the `plant_name` to your list.

Remember that you can uncomment lines of code by holding command (Mac) or control (Windows) and hitting `/`.

Points possible: 3

In [40]:
# Initialize an empty list to store plant names
butter_solar_plants = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the entity_name at row i matches "Butter Solar, LLC"
    if cell_temp(i, 'entity_name') == 'Butter Solar, LLC':
        # Append the plant_name to the list if the entity matches
        butter_solar_plants.append(cell_temp(i, 'plant_name'))

# Output the list of plant names
butter_solar_plants

['Arcadia Solar',
 'Fennimore Solar',
 'New Lisbon Solar',
 'Cumberland Solar',
 'Cashton Solar',
 'Elroy Solar']

In [41]:
student_grader.check("lab-q14")

Make sure you saved the notebook before running this cell. Running check for lab-q14...
Great job! You passed all test cases for this question.


#### Lab Function 3: `cell(row_idx, col_name)`

Our helper function `cell_temp` could use some improvement. As you have seen, the function currently returns columns such as `entity_id` as **strings**, even though it would make more sense to represent them as **ints**. Let us ensure that the function returns the required type on its own. 

We will define a new function `cell` and test its implementation.


- Recall that the **correct** datatypes for each of the columns are as follows:
    1. `entity_id` - **`int`**
    2. `entity_name` - **`str`**
    3. `plant_id` - **`int`**
    4. `plant_name` - **`str`**
    5. `generator_id` - **`str`** (note that this is an id but it's a string!)
    6. `county` - **`str`**
    7. `net_summer_capacity` - **`float`**
    8. `net_winter_capacity` - **`float`**
    9. `technology` - **`str`**
    10. `latitude` - **`float`**
    11. `longitude` - **`float`**


An `if` condition will become very long if you keep using `or` to separate each column comparison operation. Thus, it is easier to make a list of all the column names whose values require `int` conversion (for example) and use `in` operator to check if the column name is in that list.


```python
if col_name in [..., ..., ...]:
```
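As a rough sketch of this pattern on a made-up table, the helper below picks a conversion based on list membership. The names `INT_COLS`, `FLOAT_COLS`, and `convert` are hypothetical and are not part of the assignment.

```python
# Hypothetical column groups for a made-up table.
INT_COLS = ["id", "year"]
FLOAT_COLS = ["price"]

def convert(col_name, val):
    # Membership tests replace a long chain of `or` comparisons.
    if col_name in INT_COLS:
        return int(val)
    elif col_name in FLOAT_COLS:
        return float(val)
    return val  # every other column stays a string

print(convert("id", "42"))
print(convert("price", "2"))
print(convert("code", "007"))
```

Leaving an identifier such as `"007"` as a string preserves its leading zeroes, which `int` conversion would drop.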

Points possible: 0

In [42]:
def cell(row_idx, col_name):
    # Get the column index from the header
    col_idx = csv_header.index(col_name)
    
    # Extract the value from the specific row and column
    val = csv_rows[row_idx][col_idx]
    
    # Return None if the value is a single space (missing data)
    if val == " ":
        return None
    # Convert specific columns to int
    elif col_name in ['entity_id', 'plant_id']:
        val = int(val)
    # Convert specific columns to float
    elif col_name in ['net_summer_capacity', 'net_winter_capacity', 'latitude', 'longitude']:
        val = float(val)
    # Leave other columns (like strings) as they are
    return val

# Test your function by experimenting with different parameters
# Example: cell(0, 'latitude')

In [43]:
student_grader.check("lab-cell")

Make sure you saved the notebook before running this cell. Running check for lab-cell...
Great job! You passed all test cases for this question.


#### Lab question 15

What is the `entity_id` of the *last* power generator?

Your output **must** be an `int`. You **must** call the `cell` function to answer this question.

Points possible: 3

In [44]:
# we have done this for you
entity_id_last_row = cell(-1, 'entity_id')

entity_id_last_row

60025

In [45]:
student_grader.check("lab-q15")

Make sure you saved the notebook before running this cell. Running check for lab-q15...
Great job! You passed all test cases for this question.


#### Lab question 16

What is the `plant_id` of the *last* power generator?

Points possible: 3

In [46]:
# Extract the plant_id of the last power generator using the cell function
plant_id_last_row = cell(num_rows - 1, 'plant_id')

plant_id_last_row

66987

In [47]:
student_grader.check("lab-q16")

Make sure you saved the notebook before running this cell. Running check for lab-q16...
Great job! You passed all test cases for this question.


#### Lab question 17

What is the `net_summer_capacity` of the *fifth* power generator?

Points possible: 3

In [48]:
# Extract the net_summer_capacity of the fifth power generator using the cell function
net_summer_capacity_fifth_row = cell(4, 'net_summer_capacity')

net_summer_capacity_fifth_row

2.0

In [49]:
student_grader.check("lab-q17")

Make sure you saved the notebook before running this cell. Running check for lab-q17...
Great job! You passed all test cases for this question.


#### Lab question 18

What is the `latitude` of the *hundredth* power generator?

Points possible: 3

In [50]:
# we have done this one for you
latitude_hundredth_row = cell(99, 'latitude')

latitude_hundredth_row

45.41167

In [51]:
student_grader.check("lab-q18")

Make sure you saved the notebook before running this cell. Running check for lab-q18...
Great job! You passed all test cases for this question.


#### Lab question 19

What is the **list** of `net_winter_capacity` of the generators from the `county` *Sheboygan*?

Points possible: 3

In [52]:
# Initialize an empty list to store the net_winter_capacity values
sheboygan_winter_capacities = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the county at row i is 'Sheboygan'
    if cell(i, 'county') == 'Sheboygan':
        # Append the net_winter_capacity to the list
        sheboygan_winter_capacities.append(cell(i, 'net_winter_capacity'))

# Output the list of net_winter_capacity for Sheboygan county
sheboygan_winter_capacities

[409.3, 172.6, 169.5, 81.8, 1.0, 2.3]

In [53]:
student_grader.check("lab-q19")

Make sure you saved the notebook before running this cell. Running check for lab-q19...
Great job! You passed all test cases for this question.


## Segment 3: Sorting Data

There are two major ways to sort lists in Python: (1) with the `sorted` function and (2) with the `.sort` method. For each method, let's examine (a) how it modifies existing structures, and (b) what new values it returns, if any.

The default sorting order is ascending. You can change that to descending by passing the keyword argument `reverse=True`. The same keyword argument works with both the `.sort` method and the `sorted` function.
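A minimal sketch of the `reverse` keyword on a made-up list follows. The variable names are hypothetical.

```python
values = [3, 1, 2]

ascending = sorted(values)                 # default order
descending = sorted(values, reverse=True)  # same function, reversed order

print(ascending)
print(descending)

# The list method accepts the same keyword argument.
copy_of_values = list(values)
copy_of_values.sort(reverse=True)
print(copy_of_values)
```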

### Task 3.1: Sort lists using `sorted()`

#### Lab question 20

What does the function call `sorted(sheboygan_winter_capacities)` do? Does the original list passed to `sorted` change?

Points possible: 1

In [58]:
# Observe the output here

sorted_sheboygan_winter_capacities = sorted(sheboygan_winter_capacities)

print("Returned value:", sorted_sheboygan_winter_capacities)
print("Original list after sorting:", sheboygan_winter_capacities)


Returned value: [1.0, 2.3, 81.8, 169.5, 172.6, 409.3]
Original list after sorting: [409.3, 172.6, 169.5, 81.8, 1.0, 2.3]


In [59]:
student_grader.check("lab-q20")

Make sure you saved the notebook before running this cell. Running check for lab-q20...
Returned value: [1.0, 2.3, 81.8, 169.5, 172.6, 409.3]
Original list after sorting: [409.3, 172.6, 169.5, 81.8, 1.0, 2.3]
Great job! You passed all test cases for this question.


### Task 3.2: Sort lists using `.sort()`

#### Lab question 21

What does the method call `sheboygan_winter_capacities.sort()` do? Does the original list (the one whose `.sort` method is called) change? How is this different from when you used `sorted` in the question above?

Points possible: 1

In [60]:
# Observe the output here
result = sheboygan_winter_capacities.sort()

print("Returned value:", result)
print("Original list after sorting:", sheboygan_winter_capacities)

Returned value: None
Original list after sorting: [1.0, 2.3, 81.8, 169.5, 172.6, 409.3]


In [61]:
student_grader.check("lab-q21")

Make sure you saved the notebook before running this cell. Running check for lab-q21...
Returned value: None
Original list after sorting: [1.0, 2.3, 81.8, 169.5, 172.6, 409.3]
Great job! You passed all test cases for this question.


#### Sorting in reverse order

In the questions above, we saw that `sorted` will return a sorted version of the list passed to the function without modifying the original copy of that list. On the other hand, the `.sort` method of a list will modify the list calling that method, and it has no return value.
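One common pitfall that follows from this difference is assigning the result of `.sort()` to a variable. The sketch below uses made-up lists with hypothetical names.

```python
names = ["pear", "apple", "fig"]

# Pitfall: .sort() returns None, so `wrong` ends up as None
# (while `names` itself is now sorted in place).
wrong = names.sort()
print(wrong)
print(names)

# sorted() returns a new list and leaves its argument untouched.
original = ["pear", "apple", "fig"]
right = sorted(original)
print(right)
print(original)
```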

Now run the code cells below. Can you explain the output? Which one modifies the original `sheboygan_winter_capacities` list?

In [62]:
# sort in descending order
reverse_sorted_sheboygan_winter_capacities = sorted(sheboygan_winter_capacities, reverse=True) 
reverse_sorted_sheboygan_winter_capacities 

[409.3, 172.6, 169.5, 81.8, 2.3, 1.0]

In [63]:
# sort in descending order
sheboygan_winter_capacities.sort(reverse=True)

sheboygan_winter_capacities

[409.3, 172.6, 169.5, 81.8, 2.3, 1.0]

### Task 3.3: Sorting to find the median

#### Lab Function 4: `median(items)`

Now, let's try using sorting to solve a common problem - that of finding the median of a given distribution of values. Recall that the median is the **middle number** in a sorted (ascending or descending) list of numbers.   
  
In a sorted list, if the list has an **odd** number of elements, the median is the middle number:

For example, for the list `[10, 20, 30, 40, 50]` --> median is `30`. Note that the middle index is `2` for a list of length `5`. Likewise, the middle index is `4` for a list of length `9`. That's calculated as `9 // 2` where `//` is floor division.

If a sorted list has an **even** number of elements, the median is the **average** of the **two middle numbers**:

For example, for the list `[10, 20, 30, 40]` --> median is `25.0`. Note that the median is a float here because you're dividing by 2. For a list of length `4`, the first middle is at index `1` (`4 // 2 - 1`) while the second middle is at index `2` (`4 // 2`).
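The index arithmetic for both cases can be checked directly on the two example lists. This sketch only illustrates the indices; the variable names are hypothetical, and it is not a full `median` function.

```python
odd = [10, 20, 30, 40, 50]
even = [10, 20, 30, 40]

# Odd length: a single middle index.
mid = len(odd) // 2
print(mid, odd[mid])

# Even length: two middle indices, whose values are averaged.
second = len(even) // 2
first = second - 1
average = (even[first] + even[second]) / 2
print(first, second, average)
```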

**Note:** The function **must not** change the original list's order. Think about whether you should use the `.sort` method or the `sorted` function here.

Points possible: 4

In [64]:
def median(items):
    # Sort the list without changing the original order
    sorted_list = sorted(items)
    
    # Determine the length of the list
    list_len = len(sorted_list)
    
    # Check if the length is odd
    if list_len % 2 != 0:
        # Return the middle element if the list length is odd
        return sorted_list[list_len // 2]
    
    else:
        # For even length, find the two middle elements
        first_middle = sorted_list[(list_len // 2) - 1]
        second_middle = sorted_list[list_len // 2]
        
        # Return the average of the two middle elements
        return (first_middle + second_middle) / 2

# Test your function by experimenting with different parameters
# Example: 
print(median([1, 2, 3, 4]))  # Should return 2.5
print(median([10, 20, 30, 40, 50]))  # Should return 30

2.5
30


In [65]:
student_grader.check("lab-median")

Make sure you saved the notebook before running this cell. Running check for lab-median...
2.5
30
Great job! You passed all test cases for this question.


#### Lab question 22

What is the median of the list `list1 = [5, 3, 1, 2, 4]`?

Points possible: 3

In [66]:
list1 = [5, 3, 1, 2, 4]

# Call the median function with list1
median1 = median(list1)

median1


3

In [67]:
student_grader.check("lab-q22")

Make sure you saved the notebook before running this cell. Running check for lab-q22...
Great job! You passed all test cases for this question.


#### Lab question 23

What is the median of the list `list2 = [5, 3, 1, 2, 4, 6]`?

Points possible: 3

In [68]:
list2 = [5, 3, 1, 2, 4, 6]

# Call the median function with list2
median2 = median(list2)

median2

3.5

In [69]:
student_grader.check("lab-q23")

Make sure you saved the notebook before running this cell. Running check for lab-q23...
Great job! You passed all test cases for this question.


#### Lab question 24

What is the **median** `latitude` of all power generators in the dataset?

**Hint:** First create a *list* of the `latitude`(s) of all the power generators in the dataset, and then use the `median` function to find the **median** of that list.

Points possible: 3

In [70]:
# Step 1: Create a list of latitudes for all the power generators
latitudes = []

# Loop through all rows in the dataset and extract the latitude using the cell function
for i in range(num_rows):
    latitudes.append(cell(i, 'latitude'))

# Step 2: Use the median function to find the median latitude
median_latitude = median(latitudes)

median_latitude


44.3364

In [71]:
student_grader.check("lab-q24")

Make sure you saved the notebook before running this cell. Running check for lab-q24...
Great job! You passed all test cases for this question.


#### Lab question 25

What is the **median** `net_summer_capacity` of all generators powered by the `technology` *Nuclear*?

**Hint:** First create a *list* of the `net_summer_capacity` of all the power generators in the dataset with the required `technology`, and then use the `median` function to find the **median** of that list.

Points possible: 3

In [72]:
# Step 1: Create a list of net_summer_capacity for all generators with technology "Nuclear"
nuclear_summer_capacities = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the technology of the generator is "Nuclear"
    if cell(i, 'technology') == 'Nuclear':
        # Append the net_summer_capacity of the generator to the list
        nuclear_summer_capacities.append(cell(i, 'net_summer_capacity'))

# Step 2: Use the median function to find the median of the nuclear summer capacities
median_nuclear_summer_capacities = median(nuclear_summer_capacities)

median_nuclear_summer_capacities

598.0

In [73]:
student_grader.check("lab-q25")

Make sure you saved the notebook before running this cell. Running check for lab-q25...
Great job! You passed all test cases for this question.


## Segment 4: Missing Data

So far, we have carefully avoided having to deal with missing data in the dataset. We have defined our `cell` function to identify missing data in the dataset, and to return `None` every time the data we request is missing. This will make it a little easier for us to identify missing data while we work with the dataset.

#### Lab question 26

What is the **list** of net summer capacities (`net_summer_capacity`) of all generators from the `county` *Columbia*?

Points possible: 3

In [74]:
# Initialize an empty list to store the net_summer_capacity values
columbia_summer_capacities = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the county at row i is 'Columbia'
    if cell(i, 'county') == 'Columbia':
        # Append the net_summer_capacity to the list
        columbia_summer_capacities.append(cell(i, 'net_summer_capacity'))

# Output the list of net_summer_capacity for Columbia county
columbia_summer_capacities

[9.5, None, None, None, 579.3, 568.8, 162.0, 5.0]

In [75]:
student_grader.check("lab-q26")

Make sure you saved the notebook before running this cell. Running check for lab-q26...
Great job! You passed all test cases for this question.


### Task 4.1: Ignoring missing data

As you can see in the list above, some of the data for power generators in *Columbia* was **missing** in the dataset. Now, if we wanted to find the **median** `net_summer_capacity` of the generators in the `county` of *Columbia*, it would not make any sense to include the data from these generators with missing data. Therefore, we will have to **ignore** all the **missing data** while computing the median.
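The filtering step can be sketched on its own, using the Columbia values from the output above. Note that `drop_missing` and `sketch_median` are hypothetical helpers written only for this illustration; they are not the notebook's `cell` or `median` functions, which may be implemented differently.

```python
# Values copied from the Columbia output above; None marks missing data
capacities = [9.5, None, None, None, 579.3, 568.8, 162.0, 5.0]

def drop_missing(values):
    """Return a new list without the None entries (hypothetical helper)."""
    kept = []
    for value in values:
        if value is not None:
            kept.append(value)
    return kept

def sketch_median(values):
    """Median of a non-empty list of numbers (illustrative stand-in)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

valid = drop_missing(capacities)
print(valid)
print(sketch_median(valid))
```

The check uses `is not None` rather than a plain truthiness test, so that a capacity of `0.0` would still count as valid data.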

#### Lab question 27

What is the **median** `net_summer_capacity` of all generators from the `county` *Columbia*?

You **must** ignore any generators for which the `net_summer_capacity` data is **missing**.

Points possible: 3

In [76]:
columbia_summer_capacities_valid = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Skip rows with missing net_summer_capacity
    net_summer_capacity = cell(i, 'net_summer_capacity')
    if net_summer_capacity is None:
        continue
    
    # Check if the generator is from Columbia county
    if cell(i, 'county') == 'Columbia':
        # Append the valid net_summer_capacity to the list
        columbia_summer_capacities_valid.append(net_summer_capacity)

# Calculate the median of the valid net_summer_capacity values
median_columbia_summer_capacities = median(columbia_summer_capacities_valid)

median_columbia_summer_capacities

162.0

In [77]:
student_grader.check("lab-q27")

Make sure you saved the notebook before running this cell. Running check for lab-q27...
Great job! You passed all test cases for this question.


## Segment 5: Sets

In class, we learned about the Python `list` sequence. Another simpler structure you'll sometimes find useful is the `set`. A set is **not** a sequence because it does not keep all the values in any particular order.

### Task 5.1: Create a set

You can create sets the same way as lists, just **replacing** the *square brackets*(`[]`) with *curly braces*(`{}`). In the cell below, create a set with the same elements as the example list provided.
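As a small sketch of the syntax (the variable names and the repeated `"Brule"` entry are made up for illustration), the same items written in square brackets and in curly braces behave differently when a value is repeated:

```python
# Same items, written once as a list and once as a set
plants_list = ["Saxon Falls", "Brule", "Biron", "Brule"]
plants_set = {"Saxon Falls", "Brule", "Biron", "Brule"}

print(len(plants_list))  # the list keeps the repeated "Brule"
print(len(plants_set))   # the set stores "Brule" only once

# Careful: empty curly braces do NOT create an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces))
print(type(empty_set))
```

Empty braces create a different type (a dictionary), so an empty set has to be written as `set()`.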

#### Lab question 28

Create a set that has the same content as `example_list` by adding each item in `example_list` into the curly braces.

Points possible: 3

In [78]:
example_list = ["Saxon Falls", "Brule", "Biron"]
print(example_list)

# Creating a set with the same items as example_list
example_set = {"Saxon Falls", "Brule", "Biron"}

example_set


['Saxon Falls', 'Brule', 'Biron']


{'Biron', 'Brule', 'Saxon Falls'}

In [79]:
student_grader.check("lab-q28")

Make sure you saved the notebook before running this cell. Running check for lab-q28...
['Saxon Falls', 'Brule', 'Biron']
Great job! You passed all test cases for this question.


### Task 5.2: Check if an element is present in a list or set

The `in` operator is used to check if an element is present in a list or set. Try it below:

In [None]:
"Biron" in example_list

#### Lab question 29

Check if `plant_name` *Saxon Falls* is **present in** the set `example_set`.

Points possible: 4

In [80]:
# Check if "Saxon Falls" is present in the set example_set
saxon_falls_check = "Saxon Falls" in example_set

saxon_falls_check


True

In [81]:
student_grader.check("lab-q29")

Make sure you saved the notebook before running this cell. Running check for lab-q29...
Great job! You passed all test cases for this question.


### Task 5.3: Check the ordering of elements in a list or set

Sets have no inherent ordering, so they don't support indexing.

![Sets do not support indexing](./images/index_list_versus_sets.png)

Unlike lists, the order does not matter to sets for comparisons. Try evaluating this boolean expression:

In [82]:
["Saxon Falls", "Brule", "Biron"] == ["Biron", "Brule", "Saxon Falls"]

False

And now try this:

In [83]:
{"Saxon Falls", "Brule", "Biron"} == {"Biron", "Brule", "Saxon Falls"}

True
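The indexing restriction mentioned above can be seen directly. This is only a sketch: it uses `try`/`except` to catch the error so the cell keeps running, and `sorted` is one possible way to get an indexable, predictable ordering out of a set.

```python
plants = {"Saxon Falls", "Brule", "Biron"}

# Indexing a set raises a TypeError because sets have no positions
try:
    plants[0]
    indexing_worked = True
except TypeError as error:
    indexing_worked = False
    print("TypeError:", error)

# sorted() always returns a list, which can be indexed
ordered = sorted(plants)
print(ordered[0])
```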

### Task 5.4: Convert between lists and sets

You can switch back and forth between lists and sets with ease. Let's try it.

#### Lab question 30

What is the **list** of all plant names (`plant_name`) operated by the entity with the ID (`entity_id`) *13781*?

Points possible: 4

In [84]:
plant_names_13781 = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the entity_id matches 13781
    if cell(i, 'entity_id') == 13781:
        # Append the plant_name to the list
        plant_names_13781.append(cell(i, 'plant_name'))

# Output the list of plant names
plant_names_13781


['Saxon Falls',
 'Saxon Falls',
 'Bay Front',
 'Bay Front',
 'Big Falls',
 'Big Falls',
 'Big Falls',
 'Ladysmith Dam',
 'Ladysmith Dam',
 'Ladysmith Dam',
 'Thornapple',
 'Thornapple',
 'White River (WI)',
 'White River (WI)',
 'Cedar Falls (WI)',
 'Cedar Falls (WI)',
 'Cedar Falls (WI)',
 'Chippewa Falls',
 'Chippewa Falls',
 'Chippewa Falls',
 'Chippewa Falls',
 'Chippewa Falls',
 'Chippewa Falls',
 'Dells',
 'Dells',
 'Dells',
 'Dells',
 'Dells',
 'French Island',
 'French Island',
 'French Island',
 'French Island',
 'Holcombe',
 'Holcombe',
 'Holcombe',
 'Jim Falls',
 'Jim Falls',
 'Jim Falls',
 'Menomonie',
 'Menomonie',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'St Croix Falls',
 'Trego',
 'Trego',
 'Wheaton',
 'Wheaton',
 'Wheaton',
 'Wheaton',
 'Wheaton',
 'Wissota',
 'Wissota',
 'Wissota',
 'Wissota',
 'Wissota',
 'Wissota',
 'Cornell',
 'Cornell',
 'Cornell',
 'Cornell',
 'Apple Rive

In [85]:
student_grader.check("lab-q30")

Make sure you saved the notebook before running this cell. Running check for lab-q30...
Great job! You passed all test cases for this question.


#### Lab question 31

What is the **set** of all plant names (`plant_name`) operated by the entity with the ID (`entity_id`) *13781*?

**Hint:** You can convert a *list* into a *set* by typecasting. For example, to convert a *list* `example_list` into a *set*, you can use `set(example_list)`.

We note that each power plant features multiple times in the list in the question above, since each power plant may have multiple generators, and each generator shows up once in the dataset. So here we're converting the list to a set.

Points possible: 4

In [86]:
# Create a list of plant names for entity_id 13781
plant_names_13781 = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the entity_id matches 13781
    if cell(i, 'entity_id') == 13781:
        # Append the plant_name to the list
        plant_names_13781.append(cell(i, 'plant_name'))

# Convert the list to a set to remove duplicates
plant_names_13781_set = set(plant_names_13781)

# Output the length of the list and the set
print('Length of list:', len(plant_names_13781))
print('Length of set:', len(plant_names_13781_set))

# Output the set of plant names
plant_names_13781_set


Length of list: 68
Length of set: 19


{'Apple River',
 'Bay Front',
 'Big Falls',
 'Cedar Falls (WI)',
 'Chippewa Falls',
 'Cornell',
 'Dells',
 'French Island',
 'Holcombe',
 'Jim Falls',
 'Ladysmith Dam',
 'Menomonie',
 'Saxon Falls',
 'St Croix Falls',
 'Thornapple',
 'Trego',
 'Wheaton',
 'White River (WI)',
 'Wissota'}

In [87]:
student_grader.check("lab-q31")

Make sure you saved the notebook before running this cell. Running check for lab-q31...
Length of list: 68
Length of set: 19
Great job! You passed all test cases for this question.


#### Lists vs sets length

As you can see, the number of elements is different! This is because a set is a collection of **unique** elements. Therefore, there can be no duplicates in a **set**.

**Be careful!** When going from a set to a list, Python has to choose how to order the previously unordered values. If you run the same code, there's no guarantee Python will always choose the same way to order the set values in the new list.

In the previous question, the length of the list was `68`, while the length of the set was `19`. That means there were `19` unique items in the set, while the other `68 - 19 = 49` were repeated (not unique) values.
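The same arithmetic can be sketched on a much smaller, made-up list. If you need a predictable order after removing duplicates, one option is to call `sorted` on the set instead of `list`.

```python
# A short, made-up list with repeated plant names
names = ["Dells", "Wissota", "Dells", "Cornell", "Wissota", "Dells"]

unique_names = set(names)
num_repeats = len(names) - len(unique_names)
print(len(names), len(unique_names), num_repeats)

# list(unique_names) may come back in any order;
# sorted(unique_names) always gives the same alphabetical list
stable_list = sorted(unique_names)
print(stable_list)
```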

### Task 5.5: Remove Duplicates

Let's use the uniqueness property of sets above to remove duplicates from a list by **converting** from a **list to a set** and **back to a list again**.

#### Lab question 32

What is the **unique list** of all plant names (`plant_name`) operated by the entity with the ID (`entity_id`) *4247*?

**Hint:** Just as you can convert a *list* into a *set* by typecasting, you can convert a *set* into a *list*. For example, to convert a *set* `example_set` into a *list*, you can use `list(example_set)`.

Points possible: 4

In [88]:
plant_names_4247 = []

# Step 1: Collect plant names for entity_id 4247
for i in range(num_rows):
    if cell(i, 'entity_id') == 4247:
        plant_names_4247.append(cell(i, 'plant_name'))

# Step 2: Convert the list to a set to remove duplicates, then back to a list
plant_names_4247 = list(set(plant_names_4247))

# Output the unique list of plant names
plant_names_4247

['Whiting', 'Biron', 'Du Bay', 'Wisconsin Rapids', 'Stevens Point']

In [89]:
student_grader.check("lab-q32")

Make sure you saved the notebook before running this cell. Running check for lab-q32...
Great job! You passed all test cases for this question.


### Submitting the lab

Submit your `p6.ipynb` on Gradescope to the lab-p6 assignment, like usual. Remember that the grades for the lab portion of the project and the actual assignment grade are independent. You will submit the same notebook (at different levels of completion) to two different assignments.

# Project portion (20 questions)

## Dataset:

The dataset is the same as what was used in the lab portion. A small portion of the dataset `power_generators.csv` you will be working with for this project is reproduced here:

entity_id|entity_name|plant_id|plant_name|generator_id|county|net_summer_capacity|net_winter_capacity|technology|latitude|longitude
---|---|---|---|---|---|---|---|---|---|---
13781|Northern States Power Co - Minnesota|1756|Saxon Falls|1|Iron|0.5|0.5|Conventional Hydroelectric|46.5392|-90.3742
13781|Northern States Power Co - Minnesota|1756|Saxon Falls|2|Iron|0.5|0.6|Conventional Hydroelectric|46.5392|-90.3742
20847|Wisconsin Electric Power Co|1775|Brule|1|Florence|1.3|1.3|Conventional Hydroelectric|45.9472|-88.2189
20847|Wisconsin Electric Power Co|1775|Brule|2|Florence|2|2|Conventional Hydroelectric|45.9472|-88.2189
20847|Wisconsin Electric Power Co|1775|Brule|3|Florence|2|2|Conventional Hydroelectric|45.9472|-88.2189

Each row of data represents a **single** generator. The columns contain the following data about each generator (along with the correct data type you **must** represent it as):

1. `entity_id` - the **ID** of the **entity** that operates the Power Generator (`int`)
2. `entity_name` - the **name** of the **entity** that operates the Power Generator (`str`)
3. `plant_id` - the **ID** of the **Power Plant** hosting the Power Generator (`int`)
4. `plant_name` - the **name** of the **Power Plant** hosting the Power Generator (`str`)
5. `generator_id` - the **ID** of the specific **Power Generator** within its Power Plant (`str`)
6. `county` - the **name** of the **county** that the **Power Plant** is located in (`str`)
7. `net_summer_capacity` - the maximum **capacity** of the **Power Generator** (in units of MW) during the Summer months (`float`)
8. `net_winter_capacity` - the maximum **capacity** of the **Power Generator** (in units of MW) during the Winter months (`float`)
9. `technology` - the **technology** used by the **Power Generator** (`str`)
10. `latitude` - the **latitude** where the **Power Plant** is located (`float`)
11. `longitude` - the **longitude** where the **Power Plant** is located (`float`)

## Project Requirements:

You **may not** hardcode indices in your code unless specified in the question. If you are not sure what hardcoding is, here is a simple test you can use to determine whether you have hardcoded:

*If we were to change the data (e.g. add more power generators, remove some power generators, or swap some columns or rows), would your code still find the correct answer to the question as it is asked?*

If your answer to that question is *No*, then you have likely hardcoded something. Please reach out to TAs/PMs during office hours to find out how you can **avoid hardcoding**.
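As a rough illustration of the test above, compare a hardcoded column index with a lookup by column name. The `header` and `row` lists here are simplified stand-ins built from the sample rows, not the notebook's actual data structures.

```python
header = ["entity_id", "entity_name", "plant_id", "plant_name"]
row = ["13781", "Northern States Power Co - Minnesota", "1756", "Saxon Falls"]

# Hardcoded: only correct while plant_name happens to be column 3
hardcoded = row[3]

# Not hardcoded: find the column position from the header
looked_up = row[header.index("plant_name")]
print(hardcoded, "|", looked_up)

# Now suppose the columns are swapped in the file
swapped_header = ["plant_name", "entity_id", "entity_name", "plant_id"]
swapped_row = ["Saxon Falls", "13781", "Northern States Power Co - Minnesota", "1756"]

hardcoded_after_swap = swapped_row[3]
looked_up_after_swap = swapped_row[swapped_header.index("plant_name")]
print(hardcoded_after_swap, "|", looked_up_after_swap)
```

After the swap, the hardcoded index returns the plant ID instead of the plant name, while the lookup by name still finds the right column.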

**Store** your final answer for each question in the **variable specified for each question**. This step is important because Otter grades your work by comparing the value of this variable against the correct answer.

For some of the questions, we'll ask you to write (then use) a function to compute the answer.  If you compute the answer **without** creating the function we ask you to write, the Gradescope autograder will **deduct** points, even if the way you did it produced the correct answer.

#### Required Functions:
- `process_csv`
- `cell`
- `find_entities_with_phrase`
- `num_generators_by`
- `find_indices_within`
- `median`
- `total_summer_capacity_of`
- `avg_winter_capacity_of`
    
You are only allowed to use Python commands and concepts that have been taught in the course prior to the release of P6. Therefore, **you should not use concepts/modules such as dictionaries, or the pandas module, to name a few examples**. Otherwise, the Gradescope autograder will **deduct** points, even if the way you did it produced the correct answer.

## Project Questions (20)

#### Project question 1

What **unique** technologies (`technology`) are used by the power generators in Wisconsin? (Note that all power generators in the dataset are from Wisconsin.)

Your output **must** be a *list* which stores all the **unique** technologies (i.e., without any duplicates). The order **does not** matter.

Points possible: 4

In [91]:
# Initialize an empty list to store the technology values
technologies = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Append the technology to the list
    technologies.append(cell(i, 'technology'))

# Convert the list to a set to remove duplicates, then back to a list
technologies = list(set(technologies))

# Output the list of unique technologies
technologies



['Batteries',
 'Petroleum Liquids',
 'Conventional Hydroelectric',
 'Natural Gas Fired Combustion Turbine',
 'Other Waste Biomass',
 'Natural Gas Internal Combustion Engine',
 'Petroleum Coke',
 'Conventional Steam Coal',
 'Onshore Wind Turbine',
 'Wood/Wood Waste Biomass',
 'Solar Photovoltaic',
 'Natural Gas Steam Turbine',
 'Natural Gas Fired Combined Cycle',
 'Landfill Gas',
 'Nuclear']

In [92]:
student_grader.check("q1")

Make sure you saved the notebook before running this cell. Running check for q1...
Great job! You passed all test cases for this question.


#### Project question 2

How many power generators are in the `county` *Dane*?

Points possible: 4

In [93]:
# Initialize a counter for the number of power generators in Dane county
count_dane = 0

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the county at row i is 'Dane'
    if cell(i, 'county') == 'Dane':
        # Increment the counter
        count_dane += 1

# Output the number of power generators in Dane county
count_dane


47

In [94]:
student_grader.check("q2")

Make sure you saved the notebook before running this cell. Running check for q2...
Great job! You passed all test cases for this question.


#### Project question 3

What is the **total** `net_summer_capacity` of all the power generators in Wisconsin?

Your answer **must** be a **float** that represents the total `net_summer_capacity`. You **must** **ignore** all power generators whose `net_summer_capacity` data is **missing**.

Points possible: 4

In [95]:
# Initialize a variable to store the total net_summer_capacity
total_summer_capacity = 0

# Loop through all rows in the dataset
for i in range(num_rows):
    # Extract the net_summer_capacity for each row
    summer_capacity = cell(i, 'net_summer_capacity')
    
    # Ignore rows with missing net_summer_capacity data
    if summer_capacity is not None:
        # Add the summer capacity to the total
        total_summer_capacity += summer_capacity

# Output the total net_summer_capacity
total_summer_capacity


17628.09999999994

In [96]:
student_grader.check("q3")

Make sure you saved the notebook before running this cell. Running check for q3...
Great job! You passed all test cases for this question.


#### Project Function 1: `find_entities_with_phrase(phrase)`

We require you to complete the below function. You can review string methods from lecture slides.

When you call `find_entities_with_phrase(phrase)`, your function should loop through the data, look for `phrase` in the `entity_name` of each row, and add each matching entity name to a list. It should then return that list of entity names.

When comparing `phrase` and `entity_name`, this should be a **case-insensitive** comparison. This can be done by converting both variables to the same case before checking whether the phrase is in the entity name. You can use the string `.lower()` method to convert a string to lowercase.
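A minimal sketch of the comparison on its own, outside the loop. `contains_phrase` is a hypothetical helper used only to show the idea; it is not one of the required functions.

```python
def contains_phrase(phrase, text):
    """Case-insensitive substring check (hypothetical helper)."""
    return phrase.lower() in text.lower()

# A plain `in` check is case sensitive
print("water" in "Consolidated Water Power Co")

# Lowercasing both sides makes the check ignore case
print(contains_phrase("water", "Consolidated Water Power Co"))
print(contains_phrase("WATER", "Whitewater Operating Services LLC"))
```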

Points possible: 5

In [97]:
def find_entities_with_phrase(phrase):
    # Initialize an empty list to store entity names
    entity_names = []

    # Loop through all rows in the dataset
    for i in range(num_rows):
        # Extract the entity_name for the current row
        entity_name = cell(i, 'entity_name')
        
        # Perform a case-insensitive comparison to check if the phrase is in the entity_name
        if phrase.lower() in entity_name.lower():
            # Add the entity name to the list
            entity_names.append(entity_name)
    
    # Return a unique list of entity names
    return list(set(entity_names))

# Test your function
find_entities_with_phrase("Water")

['Whitewater Operating Services LLC', 'Consolidated Water Power Co']

In [98]:
student_grader.check("find_entities_with_phrase")

Make sure you saved the notebook before running this cell. Running check for find_entities_with_phrase...
Great job! You passed all test cases for this question.


#### Project question 4

Find all entity names (`entity_name`) that contain the string *"Madison"* (case insensitive).
    
Your output **must** be a **list**. The order **does not** matter. You **must** use the `find_entities_with_phrase` function to answer this question.

Points possible: 3

In [99]:
# Use the find_entities_with_phrase function to find all entity names containing "Madison"
madison_entities = find_entities_with_phrase("Madison")

# Output the list of entities
madison_entities

['Madison Gas & Electric Co']

In [100]:
student_grader.check("q4")

Make sure you saved the notebook before running this cell. Running check for q4...
Great job! You passed all test cases for this question.


#### Project question 5

Find all unique entity names (`entity_name`) that contain **either** *"Wisconsin"* **or** *"Power"* (case insensitive).

If an entity's name contains **both** *"Wisconsin"* and *"Power"*, then the `entity_name` must be included **only once** in your list.

Your output **must** be a **list**. The order **does not** matter.

**Hint**: You can use the `find_entities_with_phrase` function on *"Wisconsin"* and *"Power"* to answer this question.

Points possible: 5

In [101]:
# Find entities with "Wisconsin"
entities_contain_wisconsin_power = find_entities_with_phrase("Wisconsin")

# Extend the list to include entity names that contain "Power"
entities_contain_wisconsin_power.extend(find_entities_with_phrase("Power"))

# Remove duplicates by converting the list to a set and then back to a list
entities_contain_wisconsin_power = list(set(entities_contain_wisconsin_power))

# Output the final list of unique entity names
entities_contain_wisconsin_power

['Northwestern Wisconsin Elec Co',
 'North Central Power Co Inc',
 'Consolidated Water Power Co',
 'Wisconsin Power & Light Co',
 'Wisconsin River Power Company',
 'HQC Rock River Solar Power Generation Station LLC',
 'Wisconsin Electric Power Co',
 'Wisconsin Public Service Corp',
 'State of Wisconsin',
 'Dairyland Power Coop',
 'Dahlberg Light & Power Co',
 'Northern States Power Co - Minnesota']

In [102]:
student_grader.check("q5")

Make sure you saved the notebook before running this cell. Running check for q5...
Great job! You passed all test cases for this question.


#### Project question 6

Find all entity names (`entity_name`) that contain **both** *"Solar"* **and** *"LLC"* (case insensitive).

Your output **must** be a **list**. The order **does not** matter.

**Hint**: One way to solve this is as follows. You can use the `find_entities_with_phrase` function on *"Solar"* and loop through those entity names to find ones that contain *"LLC"* to answer this question.

Points possible: 4

In [103]:
# Step 1: Find entities that contain "Solar"
entities_contain_solar_llc = []
entities_contain_solar = find_entities_with_phrase("Solar")

# Step 2: Loop through the entities that contain "Solar" and check if they also contain "LLC"
for entity in entities_contain_solar:
    if "LLC".lower() in entity.lower():
        entities_contain_solar_llc.append(entity)

# Output the list of entity names that contain both "Solar" and "LLC"
entities_contain_solar_llc

['Middleton Biogas Solar, LLC',
 'Butter Solar, LLC',
 'HQC Rock River Solar Power Generation Station LLC',
 'Dane County Solar LLC',
 'Flambeau Solar Partners, LLC']

In [104]:
student_grader.check("q6")

Make sure you saved the notebook before running this cell. Running check for q6...
Great job! You passed all test cases for this question.


#### Project question 7

Find the generator IDs (`generator_id`) of all the generators that use the `technology` *"Wood/Wood Waste Biomass"* within the power plant with the `plant_id` *50614*.

Your output **must** be a *list*. The IDs **must** be sorted in **descending (alphabetical) order**.
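Because `generator_id` is stored as a `str`, sorting compares the IDs character by character rather than numerically. The sketch below uses a made-up mix of IDs (only `"GEN1"` and `"WEST"` come from the dataset) to show what descending string order looks like.

```python
# Made-up mix of string IDs; "10" and "2" are strings, not numbers
ids = ["GEN1", "WEST", "10", "2", "east"]

# sorted() returns a new list and leaves the original unchanged
descending = sorted(ids, reverse=True)
print(descending)  # ['east', 'WEST', 'GEN1', '2', '10']

# .sort() reorders the list in place and returns None
ids.sort(reverse=True)
print(ids == descending)
```

In string order, digits come before uppercase letters, which come before lowercase letters, so `"2"` sorts after `"10"` in ascending order even though 2 is the smaller number.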

Points possible: 5

In [105]:
# Initialize an empty list to store the generator IDs
plant_50614_generators = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the plant_id matches 50614 and the technology is "Wood/Wood Waste Biomass"
    if cell(i, 'plant_id') == 50614 and cell(i, 'technology') == "Wood/Wood Waste Biomass":
        # Append the generator_id to the list
        plant_50614_generators.append(cell(i, 'generator_id'))

# Sort the generator IDs in descending alphabetical order
plant_50614_generators.sort(reverse=True)

# Output the sorted list of generator IDs
plant_50614_generators

['WEST', 'GEN1']

In [106]:
student_grader.check("q7")

Make sure you saved the notebook before running this cell. Running check for q7...
Great job! You passed all test cases for this question.


#### Project question 8

What are the power plants (`plant_name`) that contain generators which use the `technology` *Conventional Hydroelectric* and have a `net_summer_capacity` greater than *5*?

You **must** **ignore** all generators with **missing** `net_summer_capacity` data. Thus, if your `cell` function returns `None` for this property, you should skip the current index with a `continue`.

Your output **must** be a *list* of **unique** plant names (`plant_name`). The names **must** be sorted in **ascending (alphabetical) order**.

Points possible: 5

In [107]:
# Initialize an empty list to store the plant names
powerful_hydro_electric_plants = []

# Loop through all rows in the dataset
for i in range(num_rows):
    # Check if the technology is "Conventional Hydroelectric"
    if cell(i, 'technology') == "Conventional Hydroelectric":
        # Get the net_summer_capacity and skip rows with missing data
        net_summer_capacity = cell(i, 'net_summer_capacity')
        if net_summer_capacity is None:
            continue
        
        # Check if net_summer_capacity is greater than 5
        if net_summer_capacity > 5:
            # Append the plant_name to the list
            powerful_hydro_electric_plants.append(cell(i, 'plant_name'))

# Remove duplicates and sort the list in ascending (alphabetical) order
powerful_hydro_electric_plants = sorted(list(set(powerful_hydro_electric_plants)))

# Output the sorted list of plant names
powerful_hydro_electric_plants

['Cornell',
 'Flambeau Hydroelectric Station',
 'Grandfather Falls',
 'Holcombe',
 'Jim Falls',
 'Kilbourn',
 'Prairie Du Sac']

In [108]:
student_grader.check("q8")

Make sure you saved the notebook before running this cell. Running check for q8...
Great job! You passed all test cases for this question.


#### Project Function 2: `num_generators_by(entity_name)`

Please complete the below function. This function should return the number of power generators operated by the given `entity_name`.

Points possible: 3

In [109]:
def num_generators_by(entity_name):
    # Initialize a counter for the number of generators
    generator_count = 0

    # Loop through all rows in the dataset
    for i in range(num_rows):
        # Check if the entity_name matches the input entity_name
        if cell(i, 'entity_name') == entity_name:
            # Increment the counter if there's a match
            generator_count += 1

    # Return the final count
    return generator_count

# Test your function by calling it with an entity name
num_generators_by("City of New Lisbon")

4

In [110]:
student_grader.check("num_generators_by")

Make sure you saved the notebook before running this cell. Running check for num_generators_by...
Great job! You passed all test cases for this question.


#### Project question 9

How **many** generators are operated by the entity (`entity_name`) *Madison Gas & Electric Co*?

You **must** use the `num_generators_by` function to answer this question.

Points possible: 2

In [111]:
# Use the num_generators_by function to find how many generators are operated by "Madison Gas & Electric Co"
num_generators_by_mge = num_generators_by("Madison Gas & Electric Co")

# Output the result
num_generators_by_mge


19

In [112]:
student_grader.check("q9")

Make sure you saved the notebook before running this cell. Running check for q9...
Great job! You passed all test cases for this question.


#### Project question 10

How **many** generators are operated by entities whose name (`entity_name`) **contains** the **phrase** *River* (case insensitive)?

You **must** use the `num_generators_by` and `find_entities_with_phrase` functions to answer this question. You will need to use a loop to answer this question.

Points possible: 3

In [113]:
# Step 1: Find all entities with "River" in their name (case insensitive)
river_entities = find_entities_with_phrase("River")

# Step 2: Initialize a counter for the total number of generators
num_generators_by_river = 0

# Step 3: Loop through each entity and count its generators
for entity in river_entities:
    num_generators_by_river += num_generators_by(entity)

# Output the total number of generators operated by entities with "River" in their name
num_generators_by_river

10

In [114]:
student_grader.check("q10")

Make sure you saved the notebook before running this cell. Running check for q10...
Great job! You passed all test cases for this question.


#### Project question 11

Which entity (`entity_name`) operates the **most** generators within Wisconsin?

You **must** use the `num_generators_by` function to answer this question. You do **not** have to worry about any ties. There is a **unique** entity with the most generators in the dataset.

**Hint**: You must first create a list of unique entity names from the dataset, then loop through them to find the entity with the most number of generators.

Take a look back at the `find_entities_with_phrase` function and reason through why `find_entities_with_phrase("")` will give you a list of all unique entities. You do **not** have to use this function call to answer the question, but it can make this question easier.
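The key fact behind this hint is that the empty string counts as a substring of every string. A small sketch with a few entity names from the dataset:

```python
names = ["Dairyland Power Coop", "State of Wisconsin", "Madison Gas & Electric Co"]

# The empty string is "in" every string, including another empty string
print("" in "Dairyland Power Coop")
print("" in "")

# So a substring filter with "" keeps every name
matches = []
for name in names:
    if "".lower() in name.lower():
        matches.append(name)
print(matches == names)
```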

Points possible: 5

In [115]:
# Initialize variables to keep track of the entity with the most generators
most_generators_entity = None
max_generators = 0

# Get a list of all unique entities
all_unique_entities = find_entities_with_phrase("")

# Loop through each entity and count the number of generators
for entity in all_unique_entities:
    # Get the number of generators for the current entity
    generator_count = num_generators_by(entity)
    
    # Update if the current entity has more generators than the previous maximum
    if generator_count > max_generators:
        max_generators = generator_count
        most_generators_entity = entity

# Output the entity that operates the most generators
most_generators_entity

'Northern States Power Co - Minnesota'

In [116]:
student_grader.check("q11")

Make sure you saved the notebook before running this cell. Running check for q11...
Great job! You passed all test cases for this question.


#### Project Function 3: `find_indices_within(lat_min, lat_max, long_min, long_max)` 

Please complete the below function. `find_indices_within` should return a list of *row indices* of all generators located within the
latitudes `lat_min` and `lat_max` (both inclusive) and the longitudes `long_min` and `long_max` (both inclusive).
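One way to express inclusive bounds in Python is a chained comparison. The sketch below uses a hypothetical helper `within` and a short made-up list of `(latitude, longitude)` pairs (the first two are taken from the sample rows); it is not the required function itself.

```python
def within(value, low, high):
    """True if low <= value <= high, with both ends included."""
    return low <= value <= high

print(within(43.0, 43.0, 43.5))   # on the lower bound
print(within(43.5, 43.0, 43.5))   # on the upper bound
print(within(43.6, 43.0, 43.5))   # outside the range

# Made-up coordinate pairs: (latitude, longitude)
points = [(46.5392, -90.3742), (45.9472, -88.2189), (45.0, -89.0)]

inside = []
for i in range(len(points)):
    lat, lon = points[i]
    # With negative longitudes, the minimum is the more negative number
    if within(lat, 45.0, 46.0) and within(lon, -89.0, -88.0):
        inside.append(i)
print(inside)
```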

Points possible: 5

In [117]:
def find_indices_within(lat_min, lat_max, long_min, long_max):
    indices = []
    
    # Loop through all rows in the dataset
    for i in range(num_rows):
        # Get the latitude and longitude for the current row
        latitude = cell(i, 'latitude')
        longitude = cell(i, 'longitude')
        
        # Check if the latitude and longitude fall within the specified bounds
        if lat_min <= latitude <= lat_max and long_min <= longitude <= long_max:
            indices.append(i)  # Append the row index to the list if the conditions are met
    
    return indices

# Example test: Uncomment the following line to test the function with specific latitude and longitude ranges
# find_indices_within(43, 43.5, -90.5, -90)

In [118]:
student_grader.check("find_indices_within")

Make sure you saved the notebook before running this cell. Running check for find_indices_within...
Great job! You passed all test cases for this question.


#### Project question 12

How **many** power generators are located **within** the *City of Milwaukee* (`42.9870 <= latitude <= 43.1936`, `-88.0636 <= longitude <= -87.8727`)?

Note that simply checking if the `county` is *Milwaukee* will lead you to count generators that are within *Milwaukee County*, but not within the City. Use the coordinates given above to determine the generators that lie within the City.

You **must** use the `find_indices_within` function and `len` to answer this question.

Points possible: 3

In [119]:
# Use the find_indices_within function to find all generators within the specified latitude and longitude range for Milwaukee
num_generators_in_milwaukee = len(find_indices_within(42.9870, 43.1936, -88.0636, -87.8727))

# Output the number of generators within the City of Milwaukee
num_generators_in_milwaukee

8

In [120]:
student_grader.check("q12")

Make sure you saved the notebook before running this cell. Running check for q12...
Great job! You passed all test cases for this question.


#### Project question 13

What are the **unique** technologies (`technology`) used by power generators located **near** the *University of Wisconsin-Madison Department of Computer Sciences* (`43.0675 <= latitude <= 43.0725`, `-89.4100 <= longitude <= -89.4000`)?

You may assume that any power generator that lies within the coordinates given above is **near** the *University of Wisconsin-Madison Department of Computer Sciences*. You **must** use the `find_indices_within` function to answer this question.

You **must** return a **list** of **unique** technologies. The order **does not** matter. 

Points possible: 3
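
To see why the order does not matter, here is a small sketch of de-duplicating a list with a set. The values in `toy_technologies` are invented for illustration and are not taken from the dataset.

```python
# Made-up technology values for four generators (illustration only)
toy_technologies = ["Wind", "Solar", "Wind", "Solar"]

# A set keeps one copy of each value; list() turns it back into a list
unique_toy = list(set(toy_technologies))

# The order of unique_toy is not guaranteed, so sort a copy before comparing
print(sorted(unique_toy))  # ['Solar', 'Wind']
```

A set does not promise any particular ordering, which is why two correct answers may list the same technologies in a different order.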

In [122]:
# Step 1: Find the indices of generators within the specified latitude and longitude range
plant_indices_near_uwm = find_indices_within(43.0675, 43.0725, -89.4100, -89.4000)

# Step 2: Collect the technologies used by these generators
uw_madison_technologies = []
for idx in plant_indices_near_uwm:
    uw_madison_technologies.append(cell(idx, 'technology'))

# Step 3: Remove duplicates by converting the list to a set and back to a list
uw_madison_technologies = list(set(uw_madison_technologies))

# Output the list of unique technologies
uw_madison_technologies

['Petroleum Liquids', 'Natural Gas Steam Turbine']

In [123]:
student_grader.check("q13")

Make sure you saved the notebook before running this cell. Running check for q13...
Great job! You passed all test cases for this question.


#### Project question 14

Which power plant (`plant_name`) in *North Wisconsin* (`44.9657 <= latitude <= 46.6989`, `-92.1908 <= longitude <= -87.6449`) has the generator with the **highest** `net_summer_capacity`?

You may assume that any power generator that lies within the coordinates given above is **in** *North Wisconsin*. You **may** assume that **none** of the `net_summer_capacity` values of the power generators within this area are **missing**.

You do **not** have to worry about any ties. There is a **unique** generator in *North Wisconsin* with the highest `net_summer_capacity`.

You **must** use the `find_indices_within` function to answer this question.

Points possible: 5
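
One possible way to find the row with the largest value while remembering *which* row it was is sketched below. The dictionaries `toy_capacity` and `toy_plant` are hypothetical stand-ins for looking values up by row index; they are not how the notebook stores its data.

```python
# Made-up capacities and plant names keyed by row index (illustration only)
toy_capacity = {3: 12.5, 7: 40.0, 9: 8.0}
toy_plant = {3: "Plant A", 7: "Plant B", 9: "Plant C"}

best_index = None
for idx in toy_capacity:
    # The first index always becomes the running best; later ones must beat it
    if best_index is None or toy_capacity[idx] > toy_capacity[best_index]:
        best_index = idx

print(toy_plant[best_index])  # Plant B

# The built-in max with a key function finds the same index in one call
same_index = max(toy_capacity, key=toy_capacity.get)
```

Tracking the index rather than only the largest value is what lets you look up a different column (here, the plant name) afterwards.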

In [124]:
# Step 1: Find the indices of generators in North Wisconsin
plant_indices_north_wisconsin = find_indices_within(44.9657, 46.6989, -92.1908, -87.6449)

# Step 2: Initialize variables to track the highest net_summer_capacity and its index
# (None means "no generator seen yet", so the first generator always becomes the maximum)
max_capacity = None
max_capacity_index = None

# Step 3: Loop through the indices to find the generator with the highest net_summer_capacity
for idx in plant_indices_north_wisconsin:
    # Get the net_summer_capacity of the current generator
    capacity = cell(idx, 'net_summer_capacity')
    
    # Update if this is the first generator or it has a higher capacity than the current maximum
    if max_capacity is None or capacity > max_capacity:
        max_capacity = capacity
        max_capacity_index = idx

# Step 4: Get the plant name for the generator with the highest capacity
north_wisconsin_most_powerful = cell(max_capacity_index, 'plant_name')

# Output the plant name
north_wisconsin_most_powerful

'West Marinette'

In [125]:
student_grader.check("q14")

Make sure you saved the notebook before running this cell. Running check for q14...
Great job! You passed all test cases for this question.


#### Project question 15

What is the **median** `net_winter_capacity` of *Conventional Hydroelectric* (`technology`) power generators **near** *Lake Winnebago* (`43.6961 <= latitude <= 44.3512`, `-88.5375 <= longitude <= -88.2713`)?

You may assume that any power generator that lies within the coordinates given above is **near** *Lake Winnebago*. You **may** assume that **none** of the `net_winter_capacity` values of the power generators within this area are **missing**.

You **must** use the `find_indices_within` and `median` functions to answer this question.

Points possible: 4

In [126]:
# Step 1: Find the indices of generators near Lake Winnebago
winnebago_indices = find_indices_within(43.6961, 44.3512, -88.5375, -88.2713)

# Step 2: Collect the net_winter_capacity for Conventional Hydroelectric generators
winnebago_hydro_winter_capacities = []
for idx in winnebago_indices:
    if cell(idx, 'technology') == "Conventional Hydroelectric":
        winnebago_hydro_winter_capacities.append(cell(idx, 'net_winter_capacity'))

# Step 3: Get the median of the collected net_winter_capacity values
winnebago_hydro_winter_capacity = median(winnebago_hydro_winter_capacities)

# Output the median net_winter_capacity
winnebago_hydro_winter_capacity


0.7

In [127]:
student_grader.check("q15")

Make sure you saved the notebook before running this cell. Running check for q15...
Great job! You passed all test cases for this question.


#### Project Function 4: `total_summer_capacity_of(plant_name)` 

Please complete the function below. This function must take in a `plant_name` and return the **total** `net_summer_capacity` of all the power generators within that power plant.

This function can be **case-sensitive**. You **only** need to consider the power plants whose names **exactly match** `plant_name`. This means that if the plant name at an index you're considering is not exactly `plant_name`, you should skip that row with `continue`. You must also skip generators with missing `net_summer_capacity` data.

Points possible: 6

In [128]:
def total_summer_capacity_of(plant_name):
    total_capacity = 0  # Initialize the total capacity to 0

    # Loop through all rows in the dataset
    for i in range(num_rows):
        # Check if the plant_name matches exactly
        if cell(i, 'plant_name') != plant_name:
            continue  # Skip if the plant_name doesn't match

        # Get the net_summer_capacity and skip if it is missing (None)
        summer_capacity = cell(i, 'net_summer_capacity')
        if summer_capacity is None:
            continue  # Skip if the capacity is missing

        # Add the capacity to the total
        total_capacity += summer_capacity

    return total_capacity  # Return the total capacity

# Test your function by calling it with different plant names
# Example: total_summer_capacity_of("Kilbourn")

In [129]:
student_grader.check("total_summer_capacity_of")

Make sure you saved the notebook before running this cell. Running check for total_summer_capacity_of...
Great job! You passed all test cases for this question.


#### Project question 16

What is the **net summer capacity** of the **power plant** with the `plant_name` *Point Beach Nuclear Plant*?

The **net summer capacity** of a **power plant** refers to the **total** `net_summer_capacity` of **all** the generators within the power plant.

You **must** use the `total_summer_capacity_of` function to answer this question.

Points possible: 2

In [130]:
# Use the total_summer_capacity_of function to find the net summer capacity of the "Point Beach Nuclear Plant"
point_beach_summer_capacity = total_summer_capacity_of("Point Beach Nuclear Plant")

# Output the net summer capacity
point_beach_summer_capacity

1211.0

In [131]:
student_grader.check("q16")

Make sure you saved the notebook before running this cell. Running check for q16...
Great job! You passed all test cases for this question.


#### Project question 17

Find the **median** of the **net summer capacities** of **all** the **power plants** in Wisconsin.

The **net summer capacity** of a **power plant** refers to the **total** `net_summer_capacity` of **all** the generators within the power plant.

You **must** use the `total_summer_capacity_of` function to answer this question.

**WARNING**: You **must not** find the **median** across all the power **generators**. Multiple generators may belong to the same power plant. Instead, you **must** find the **median** across the **total** power generated by each **power plant**.

**Hint**: You must first make a list of all the **unique** power plants (`plant_name`) in the dataset, then make a **list** of all their **total** net summer capacities (using `total_summer_capacity_of`), and finally, find the **median** of this list.

Points possible: 5
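
The warning above can be made concrete with a small sketch on made-up data. Here `toy_rows` and `toy_total_of` are hypothetical names used only for this illustration, and `statistics.median` from the standard library stands in for the notebook's own `median` function, which may be implemented differently.

```python
from statistics import median  # stand-in for the notebook's own median helper

# Made-up rows of (plant_name, net_summer_capacity); illustration only
toy_rows = [
    ("Plant A", 1.0),
    ("Plant A", 1.0),
    ("Plant A", 1.0),
    ("Plant B", 10.0),
    ("Plant C", 20.0),
]

def toy_total_of(plant_name):
    """Sum the capacities of every generator belonging to plant_name."""
    total = 0
    for plant, capacity in toy_rows:
        if plant == plant_name:
            total += capacity
    return total

# Median across generators: the three small Plant A generators dominate
generator_median = median([capacity for _, capacity in toy_rows])

# Median across plants: unique names first, then one total per plant
unique_plants = list(set(plant for plant, _ in toy_rows))
plant_totals = [toy_total_of(plant) for plant in unique_plants]
plant_median = median(plant_totals)

print(generator_median, plant_median)  # 1.0 10.0
```

The two medians differ because a plant with many small generators is counted several times in the first calculation but only once in the second.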

In [132]:
# Step 1: Make a list of all unique power plant names
power_plants = []

# Loop through all rows in the dataset to collect plant names
for i in range(num_rows):
    power_plants.append(cell(i, 'plant_name'))

# Remove duplicates by converting the list to a set, then back to a list
power_plants = list(set(power_plants))

# Step 2: Make a list of total net summer capacities for each power plant
summer_capacities = []

for plant in power_plants:
    summer_capacities.append(total_summer_capacity_of(plant))

# Step 3: Find the median of the summer capacities
median_summer_capacity = median(summer_capacities)

# Output the median summer capacity
median_summer_capacity

7.25

In [133]:
student_grader.check("q17")

Make sure you saved the notebook before running this cell. Running check for q17...
Great job! You passed all test cases for this question.


#### Project Function 6: `avg_winter_capacity_of(technology)`

We **require** you to complete the function below. This function must take in a `technology` and return the **average** (i.e., **mean**) `net_winter_capacity` of **all** the power generators which use that particular `technology`. If a particular `technology` has **some** generators with **missing data**, those generators **must** be **ignored** while finding the average (i.e., ignored in **both** the *numerator* and the *denominator*).

This function can be **case-sensitive**. You **only** need to consider the technologies whose names **exactly match** `technology`.

Points possible: 4
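
The sketch below shows, on made-up numbers, what ignoring missing data in both the numerator and the denominator means. It assumes missing values are represented as `None`; the list `toy_winter` is invented for this example.

```python
# Made-up winter capacities for one technology; None marks missing data
toy_winter = [2.0, None, 4.0, None, 6.0]

total = 0
count = 0
for value in toy_winter:
    if value is None:
        continue  # missing: adds to neither the sum nor the count
    total += value
    count += 1

# Correct: divide by the number of generators that actually have data
average = total / count  # 12.0 / 3

# Incorrect: the denominator still counts the two missing generators
wrong_average = total / len(toy_winter)  # 12.0 / 5

print(average, wrong_average)
```

Dividing by the full length of the list would pull the average down, because the missing generators would be treated as if they had a capacity of zero.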

In [134]:
def avg_winter_capacity_of(technology):
    total_capacity = 0  # To store the total winter capacity
    count = 0  # To count the number of generators

    # Loop through all rows in the dataset
    for i in range(num_rows):
        # Check if the technology matches exactly
        if cell(i, 'technology') != technology:
            continue  # Skip if the technology doesn't match
        
        # Get the net_winter_capacity and skip if it is missing (None)
        winter_capacity = cell(i, 'net_winter_capacity')
        if winter_capacity is None:
            continue  # Skip if the capacity is missing
        
        # Add the capacity to the total and increment the count
        total_capacity += winter_capacity
        count += 1

    # Return the average winter capacity (or 0 if no matching generators)
    if count == 0:
        return 0
    return total_capacity / count

In [135]:
student_grader.check("avg_winter_capacity_of")

Make sure you saved the notebook before running this cell. Running check for avg_winter_capacity_of...
Great job! You passed all test cases for this question.


#### Project question 18

What is the **average** `net_winter_capacity` of all power generators that use the `technology` *Conventional Hydroelectric*?

You **must** use the `avg_winter_capacity_of` function to answer this question.

Points possible: 2

In [136]:
# Compute the average net_winter_capacity of Conventional Hydroelectric generators
hydro_avg_winter_capacity = avg_winter_capacity_of("Conventional Hydroelectric")

# Output the result
hydro_avg_winter_capacity

1.7491304347826093

In [137]:
student_grader.check("q18")

Make sure you saved the notebook before running this cell. Running check for q18...
Great job! You passed all test cases for this question.


#### Project question 19

Which `technology` has the **highest** `net_winter_capacity` on **average**?

You **must** use the `avg_winter_capacity_of` function to answer this question. You do **not** have to worry about any ties. There is a **unique** technology with the highest `net_winter_capacity` on average.

**Hint**: You already created a list of the **unique** technologies to answer Question 1. Loop through that list and find which of those technologies has the **highest** average `net_winter_capacity` (using `avg_winter_capacity_of`).

Points possible: 5

In [138]:
# Initialize variables to track the technology with the highest average net_winter_capacity
max_winter_capacity_tech = None
max_avg_capacity = 0

# Build the list of unique technologies from the dataset
unique_technologies = list(set([cell(i, 'technology') for i in range(num_rows)]))

# Loop through the unique technologies and find the one with the highest average net_winter_capacity
for tech in unique_technologies:
    avg_capacity = avg_winter_capacity_of(tech)
    
    # Update if this technology has a higher average capacity than the current maximum
    if avg_capacity > max_avg_capacity:
        max_avg_capacity = avg_capacity
        max_winter_capacity_tech = tech

# Output the technology with the highest average net_winter_capacity
max_winter_capacity_tech

'Nuclear'

In [139]:
student_grader.check("q19")

Make sure you saved the notebook before running this cell. Running check for q19...
Great job! You passed all test cases for this question.


#### Project question 20

Find the **difference** between the **average** net winter capacities of the `technology` with the **highest** average and the **second highest** average.

You must use the `avg_winter_capacity_of` function to answer this question.

**Hint**: You have already found the `technology` with the **highest** average. You need to find the `technology` with the **second highest** average, and find the difference between their average net winter capacities using the `avg_winter_capacity_of` function.

**Extra Hint**: If you create a list of average winter capacities and then sort it in reverse order, the first and second items will be the items with the highest and second highest values, respectively.

Points possible: 4
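
The idea in the extra hint can be sketched as follows. The values in `toy_averages` are invented for illustration and do not come from the dataset.

```python
# Made-up averages for four technologies (illustration only)
toy_averages = [3.5, 120.0, 48.0, 0.75]

# sorted() returns a new list; reverse=True puts the largest value first
ranked = sorted(toy_averages, reverse=True)

# The first two items are the highest and second highest values
difference = ranked[0] - ranked[1]

print(ranked)      # [120.0, 48.0, 3.5, 0.75]
print(difference)  # 72.0
```

Because the real averages are floating-point numbers, the difference you compute on the dataset may carry a tiny rounding tail in its last digits.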

In [140]:
# Step 1: Create a list of average winter capacities for each technology
avg_winter_capacities = []

# Build the list of unique technologies from the dataset
unique_technologies = list(set([cell(i, 'technology') for i in range(num_rows)]))

# Loop through the unique technologies and calculate their average net_winter_capacity
for tech in unique_technologies:
    avg_winter_capacities.append((tech, avg_winter_capacity_of(tech)))

# Step 2: Sort the list of average capacities in descending order based on the average values
avg_winter_capacities.sort(key=lambda x: x[1], reverse=True)

# Step 3: Find the difference between the highest and second highest average net_winter_capacity
diff_avg_winter_capacity = avg_winter_capacities[0][1] - avg_winter_capacities[1][1]

# Output the difference
diff_avg_winter_capacity

271.00000000000006

In [141]:
student_grader.check("q20")

Make sure you saved the notebook before running this cell. Running check for q20...
Great job! You passed all test cases for this question.


## Submission and Grading

**Congrats on finishing p6!**

Now, please submit your `p6.ipynb` file on Gradescope, following the same process you used when submitting earlier projects. Please post on Piazza if you run into any issues when uploading your notebook to Gradescope.