In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("pre04.ipynb")

<table style="width: 100%;">
<tr style="background-color: transparent;">
<td width="100px"><img src="https://cs104williams.github.io/assets/cs104-logo.png" width="90px" style="text-align: center"/></td>
<td>
  <p style="margin-bottom: 0px; text-align: left; font-size: 18pt;"><strong>CSCI 104: Data Science and Computing for All</strong><br>
                Williams College<br>
                Fall 2025</p>
</td>
</tr>
</table>


# Prelab 4: Functions

**Instructions**
- Before you begin, execute the cell at the TOP of the notebook to load the provided tests, as well as the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute these cells again.
- Be sure to consult your [Python Reference](https://cs104williams.github.io/assets/python-library-ref.html)!
- Complete this notebook by filling in the cells provided. 
- Please be sure to not re-assign variables throughout the notebook.  For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously.
- There are no hidden tests in prelabs.

<hr/>
<h2>Setup</h2>


In [None]:
# Run this cell to set up the notebook.
# These lines import the numpy, datascience, and cs104 libraries.

import numpy as np
from datascience import *
from cs104 import *
%matplotlib inline

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 1. Defining Functions (25 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Understand the components of a *function* in Python 
- Create Python functions
- Distinguish `print()` and `return` 
</font>

A function definition has a few parts, which we review using the `double` function from lecture:

<img width=75% src="double.png"/>

* **Name**: the name of the function.  Like other names we've defined, it can't start with a number or contain spaces. 
* **Parameters**: the number and the names of the parameters to your function.  A function can have any number of arguments (including 0). 
    
    If we want our function to take more than one argument, we add a comma between each argument name. Note that if we had zero arguments, we'd still place the parentheses () after the function's name. 
    
* **Signature**: The "function signature line" is the line that contains the keyword `def` (which tells Python we're "defining" a function), the function name, and the function parameters.

* **Docstring**: Functions can perform sophisticated computations, so you should write an explanation of what your function does.  For small functions, this is less important, but it's a good habit to learn from the start.  Conventionally, Python functions are documented by writing an **indented** triple-quoted string right after the function's signature line.

* **Body**: This is the code that runs when the function is called.  Every line **must be indented**, typically by pressing Tab. Some notes about the body of the function:
    - We can write code that we would write anywhere else.  
    - We use the arguments defined in the function signature. We can do this because we assume that when we call the function, values are already assigned to those arguments.
    - We generally avoid referencing variables defined *outside* the function. If you would like to reference variables outside of the function, pass them through as arguments.

* **Return statement**: The keyword `return` is part of the function's body and tells Python to make the value of the function call equal to whatever comes right after `return`.  A `return` instruction only makes sense in the context of a function, and **can never be used outside of a function**. Also, a `return` statement is always the last line that runs in a function call, because Python stops executing the body of a function once it hits a `return` statement. If a function does not have a return statement, calling it gives back the special value `None` rather than a useful result; if you expect a value back from the function, make sure to include a return statement. 

    *Note:*  `return` inside a function tells Python what value the function evaluates to. However, there are other functions, like `print`, that have no `return` value. For example, `print` simply prints a certain value out to the console. `return` and `print` are **very** different. 
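
As a minimal sketch of how these parts fit together, the cell below defines a small function and labels each component in comments. The name `halve` is made up for this illustration and is not the function from lecture.

```python
# Signature line: the keyword def, the function name, and one parameter.
def halve(x):
    """Returns half of x."""    # Docstring: indented and triple-quoted
    result = x / 2               # Body: uses the parameter x
    return result                # Return statement: the value of the call

# The call halve(10) evaluates to whatever the function returned.
five = halve(10)
print(five)
```

Because `halve` returns its result, the value can be given a name (`five` here) and used in later computations.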

#### Part 1.1 (5 pts)


Let's write a function `to_percentage` that converts a proportion to a percentage by multiplying it by 100.  For example, the value of `to_percentage(0.5)` should be the number 50 (no percent sign).

Complete the definition of `to_percentage` in the cell below.  Call your function to convert the proportion .2 to a percentage.  Name that percentage `twenty_percent`.



In [None]:
def to_percentage(proportion):
    """Converts a proportion to a percentage."""
    factor = 100 # factor to multiply by the input parameter
    return ...
    
twenty_percent = to_percentage(...)
twenty_percent

In [None]:
grader.check("p1.1")

Here's something important about functions: the names assigned *within* a function body are only accessible within the function body. Once the function has returned, those names are gone.  So even if you created a variable called `factor` and defined `factor = 100` inside of the body of the `to_percentage` function and then called `to_percentage`, `factor` would not have a value assigned to it outside of the body of `to_percentage`:

In [None]:
# You should get an error when you run this.  (If you don't, 
# you might have defined factor somewhere above.)
factor

#### Part 1.2 (5 pts)


Use `to_percentage` again to convert the proportion named `a_proportion` (defined below) to a percentage called `a_percentage`.

*Note:* You don't need to define `to_percentage` again!  Like other named values, functions stick around after you define them.

In [None]:
a_proportion = 2**(0.5) / 2
a_percentage = ...
a_percentage

In [None]:
grader.check("p1.2")

As we've seen with built-in functions, functions can also take strings (or arrays, or tables) as arguments, and they can return those things, too.

In the following cell, we will define a function called `disemvowel`.  It takes in a single string as its argument. It returns a copy of that string, but with all the characters that are vowels removed.  (In English, the vowels are the characters "a", "e", "i", "o", and "u".) 

To remove all the "a"s from a string, we used `a_string.replace("a", "")`.  The `.replace` method for strings returns a new string, so we can call `replace` multiple times, one after the other. 

In [None]:
def disemvowel(text):
    """Removes all vowels from a string."""
    return text.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")

# An example call to the function.  (It's often helpful to run
# an example call from time to time while we're writing a function,
# to see how it currently works.)
disemvowel("Can you read this without vowels?")

#### Calls on calls on calls

Just as you write a series of lines to build up a complex computation, it's useful to define a series of small functions that **build on each other**.  Since you can write any code inside a function's body, you can call other functions you've written.

If a function is like a **recipe**, defining a function in terms of other functions is like having a recipe for cake telling you to follow another recipe to make the frosting, and another to make the jam filling.  This makes the cake recipe shorter and clearer, and it avoids having a bunch of duplicated frosting recipes.  It's a foundation of productive programming.
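
Here is a small sketch of one function built on another. Both `square` and `sum_of_squares` are hypothetical names chosen for this example; they are not part of the lab.

```python
def square(x):
    """Returns x multiplied by itself."""
    return x * x

def sum_of_squares(a, b):
    """Returns a squared plus b squared, reusing square for each part."""
    return square(a) + square(b)

print(sum_of_squares(3, 4))
```

The body of `sum_of_squares` stays short because the squaring logic lives in one place. If `square` ever needed to change, only one definition would need editing.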

For example, suppose you want to count the number of characters *that aren't vowels* in a piece of text.  One way to do that is to remove all the vowels and count the size of the remaining string.

#### Part 1.3 (5 pts)


Write a function called `num_non_vowels` to do that.  It should take a string as its argument and return a number.  That number should be the number of characters (including spaces) in the argument string that aren't vowels. You should use the `disemvowel` function we provided above inside of the `num_non_vowels` function.

*Hint:* Remember that the function `len` takes a string as its argument and returns the number of characters in it.

In [None]:
def num_non_vowels(text):
    """The number of characters in a string, minus the vowels."""
    ...

# Try calling your function yourself to make sure the output is what
# you expect. 

num_non_vowels("hoo hoo hoo")   # should be 5: 3 h's, and 2 spaces

In [None]:
grader.check("p1.3")

Functions can also encapsulate code that *displays output* instead of computing a value. For example, if you call `print` inside a function, and then call that function, something will get printed.

Let's use the `majors` dataset which we explored in lecture. This contains the average number of majors at Williams per year for two year ranges. 

In [None]:
majors = Table.read_table("majors.csv")
majors.show(5)

Suppose you'd like to display the 5th most popular major in 2018-2021, printed in a human-readable way.  You might do this:

In [None]:
rank = 5
major_fifth_from_top = majors.sort("2018-2021", descending=True).column("Major").item(rank-1)
print("Major ranked", rank, "by size in 2018-2021 was", major_fifth_from_top)

After writing this, you realize you also want to print out the 1st and 3rd largest majors.  Instead of copying your code, you decide to put it in a function.  Since the rank varies, you make that an argument to your function.

#### Part 1.4 (5 pts)


Write a function called `print_kth_top_major`.  It should take a single argument, the rank *k* (like 1, 3, or 5 in the above examples).  It should print out a message like the one above.  

*Note:* Your function shouldn't have a `return` statement.

In [None]:
def print_kth_top_major(k):
    """
    Prints the kth top major where 
    k is the rank of the major
    """
    major_kth_from_top = ...
    ...
    
# Example calls to your function:
print_kth_top_major(1)
print_kth_top_major(3)

The two calls to your function above should print:

    Major ranked 1 by size in 2018-2021 was Economics
    Major ranked 3 by size in 2018-2021 was Mathematics

Try passing in other values too!  1, 11, 21, ...

There are no grader tests for this one since all it does is print messages...  You'll get credit for this part as long as you try it!

**Interactive cells.** We'll now show you how to make interactive cells that allow you to select inputs to a function using pop-ups or sliders. This is how we've been making our interactive graph visualizations in lecture.  

To do so, we make use of a new library function `interact` that takes as its first argument a function you'd like to interact with.  Subsequent arguments describe the possible values for each argument to that function.   Below, our function is `print_kth_top_major` and we will let the user choose a value of `k` using a slider that goes from 1 to 10.

In [None]:
interact(print_kth_top_major, k=Slider(1, 10))

#### Part 1.5 (5 pts)


Write a new function called `kth_top_major_for_years`.  It should take two arguments: a rank, and then the label for the column with the desired year range (e.g., `'2008-2012'`).  

For this function, return the appropriate major without printing any messages.  That is, your function should end with a `return` statement rather than a `print` statement.

In [None]:
def kth_top_major_for_years(k, year_range):
    """k is the rank of the major; year_range is a string indicating the year range"""
    major_kth_from_top = ...
    ...
    
# Example calls to your function:
print('5th most popular in 2008-2012:', kth_top_major_for_years(5, '2008-2012'))
print('5th most popular in 2018-2021:', kth_top_major_for_years(5, '2018-2021'))

In [None]:
grader.check("p1.5")

#### Note: `print` is not the same as `return`
The `print_kth_top_major(k)` function prints the major at the given rank for 2018-2021.  However, since the function does not return any value, we cannot use the major's name in later computations after the call. Let's look at an example of another function that prints a value but does not return it.

In [None]:
def print_number_five():
    print(5)

In [None]:
print_number_five()

However, if we try to use the output of `print_number_five()`, we see that the value `5` is printed but we get a TypeError when we try to add the number 2 to it!

In [None]:
print_number_five_output = print_number_five()
print_number_five_output + 2

It may seem that `print_number_five()` is returning a value, 5. In reality, it just displays the number 5 to you without giving you the actual value! If your function prints out a value **without returning it** and you try to use that value, you will run into errors, so be careful.
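
To see the two behaviors side by side, here is a short sketch with two made-up functions, `show_five` and `give_five` (hypothetical names, not used elsewhere in this notebook). One prints and one returns.

```python
def show_five():
    """Prints 5 but has no return statement."""
    print(5)

def give_five():
    """Returns 5 without printing anything."""
    return 5

shown = show_five()       # 5 appears on the screen during this call
print(shown is None)      # True: the call gave back None, not 5
print(give_five() + 2)    # 7: a returned value can be used in arithmetic
```

The name `shown` ends up holding `None`, which is why adding a number to the result of a print-only function raises a `TypeError`.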

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 2. Apply (10 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Use apply to perform operations on all values in Table columns.
- Write helper functions to support transforming Table data.
</font>

[Cricket Creek Farm](https://cricketcreekfarm.com/) is a local, organic dairy farm in Williamstown, MA that makes cheese.  We'll use some information about their cheese varieties to practice using the [apply method](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#apply) on tables.  First, load the data by running the following cell.

In [None]:
cheeses = Table.read_table("cricket_creek_cheeses.csv")
cheeses

Notice that the `weight` of the cheese wheels or blocks for each variety is given as a string like `'5lbs'`, indicating five pounds.  

We'd like to change those strings to `float` values representing weights in ounces.  

Given that there are 16 ounces in a pound, we'd like to convert the string `'0.5lbs'` into the number 8.0. Let's apply the following steps: 
1. First, replace `'lbs'` with the empty string `''`, as we did above for vowels in `disemvowel`.
1. Then use `float()` to convert the remaining string (`'0.5'` in this case) to a float. 
1. Finally, apply the conversion of pounds to ounces.

Running the following cell demonstrates why we cannot just use array broadcasting:

In [None]:
float(cheeses.column("weight").replace('lbs', '')) * 16

In essence, the `replace` method for strings is not designed to be broadcast to the elements of an array.  Thus, we will define a function to do our conversion and then use `apply` on our table.
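
As a rough picture of what applying a function to every value in a column amounts to, here is a sketch in plain Python. The helper `apply_to_each` and the function `percent_string_to_proportion` are hypothetical names invented for this illustration, and the values are made up. This is only a conceptual model, not how the `datascience` library implements `apply`.

```python
def percent_string_to_proportion(percent_text):
    """Converts a string like '50%' to the proportion 0.5."""
    return float(percent_text.replace('%', '')) / 100

def apply_to_each(function, values):
    """Calls function on each value and collects the results in a list."""
    return [function(value) for value in values]

# A stand-in for a column of strings (made-up values).
percents = ['50%', '12.5%', '100%']
print(apply_to_each(percent_string_to_proportion, percents))
```

Notice that the function itself is passed as an argument, written without parentheses, and is then called once per value.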

#### Part 2.1 (5 pts)


Complete the following definition of `lbs_string_to_oz` that takes a parameter `weight_text` as a string of the form `'5lbs'` and returns that weight in ounces as a `float`.  

*Note:* You won't need to use the `cheeses` table in this part -- we'll use that table in the next part.

In [None]:
def lbs_string_to_oz(weight_text):
    """Takes a weight_text as a string containing a number followed by 'lbs'.  
       Returns that weight as a number of ounces."""
    ...

# Two calls to manually test your code.
print('Converting 5lbs produces', lbs_string_to_oz('5lbs'))   # Should be 80.0
print('Converting 0.5625lbs produces', lbs_string_to_oz('0.5625lbs'))   # Should be 9.0

In [None]:
grader.check("p2.1")

#### Part 2.2 (5 pts)


Now, we'll use the `lbs_string_to_oz` function we just created to convert the `'weight'` column into a column storing weights in ounces. We've given you most of the code.  You just need to fill in the parameters to `apply`.  See the documentation for [apply](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#apply) to see a description of the parameters and examples.

In [None]:
weights_in_oz = cheeses.apply(..., ...)

cheese_with_oz = cheeses.with_column("weight (ounces)", weights_in_oz)
cheese_with_oz

In [None]:
grader.check("p2.2")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 3. Group (15 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Use `group` to aggregate information from related rows in a table.
</font>

As we've seen in lecture, our table's [group](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#group) method aggregates all rows with the same value for a given column into a single row in the resulting table.  Here, we'll write a few grouping operations on our Williams majors from 2018-2021 to gain a better intuition for what it means to aggregate rows into a single row.  
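
For intuition, counting the rows that share a value can be sketched in plain Python. The rows below are made up (they are not the majors data), and `Counter` from Python's standard library stands in for the table operation; the `datascience` library may do this differently internally.

```python
from collections import Counter

# Made-up rows of (category, item).
rows = [('fruit', 'apple'), ('vegetable', 'kale'),
        ('fruit', 'pear'), ('fruit', 'plum')]

# Count how many rows share each category value.
counts = Counter(category for category, item in rows)
print(counts['fruit'], counts['vegetable'])
```

Four input rows collapse into two groups, one per distinct category, each paired with the number of rows it absorbed.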

First, run the following cell to load and subset the data:

In [None]:
majors_2018_2021 = Table.read_table("majors.csv")
majors_2018_2021 = majors_2018_2021.drop("2008-2012")
majors_2018_2021

#### Part 3.1 (5 pts)


Let's count the number of distinct Major degrees in each Division. For example, American Studies and Anthropology are both Majors in Division 2.  

To do so, we give only one parameter to `group`, namely the column with the values we're using to divide the rows into groups.  

Complete the following code to compute the counts for each Division. 

*Note:* Be sure to carefully examine the labels of the columns in the resulting table. 

In [None]:
division_counts = majors_2018_2021.group(...)
division_counts

In [None]:
grader.check("p3.1")

#### Part 3.2 (5 pts)


Now, we'll use the second form of group that takes two parameters: 
1. The name of the column containing the values we're using to group the rows 
2. The function to apply to the array of values in each other column for that group.  
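
One way to picture this two-parameter form is as two steps: collect the values for each group, then hand each collection to the aggregation function. The sketch below does this in plain Python with made-up data; `group_values` and `aggregate` are hypothetical helper names, and this is a conceptual model rather than the library's actual implementation.

```python
# Made-up rows of (category, amount).
rows = [('fruit', 3), ('vegetable', 5), ('fruit', 4)]

def group_values(rows):
    """Collects the amounts for each category into a list."""
    groups = {}
    for category, amount in rows:
        groups.setdefault(category, []).append(amount)
    return groups

def aggregate(groups, function):
    """Applies function to each category's list of amounts."""
    return {category: function(values) for category, values in groups.items()}

totals = aggregate(group_values(rows), sum)
print(totals)
```

Swapping `sum` for another function that takes a collection of values, such as `max` or `len`, changes what each group is summarized by without changing the grouping step.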

We'll start by computing the total number of student majors in each Division.  

*Hint*: Recall the `sum` function.

Again, notice the column names in the resulting table, and also that if the aggregation function doesn't make sense for data in a column, you will just see an empty column in the result.   You can always drop those columns so they don't clutter your tables, but for now we'll just leave them in.

In [None]:
majors_by_division = majors_2018_2021.group(..., ...)
majors_by_division

In [None]:
grader.check("p3.2")

#### Part 3.3 (5 pts)


Let's do one last quick `group` operation.  

Using `max` as the aggregation function, create a table that, for each division, shows the size of the largest major within that division.  You'll want to drop the "Major" column before grouping, since we are not interested in the name of the major.  Moreover, the group operation computes the values for the `Major` and `2018-2021` columns separately, so there is no guarantee that the "Major" column in the resulting table will match the data in the "2018-2021" column.  (You can try your group operation without first dropping "Major" to see this potential pitfall.)
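
The pitfall can be seen in a tiny plain-Python sketch with made-up data for a single group. The names `group_names` and `group_sizes` are hypothetical stand-ins for two columns of one group's rows.

```python
# One group's rows, stored column by column (made-up values).
group_names = ['Zucchini', 'Apple']
group_sizes = [3, 10]

# Taking max of each column separately, as a per-column aggregation would.
largest_name = max(group_names)   # strings compare alphabetically
largest_size = max(group_sizes)
print(largest_name, largest_size)

# The size that actually belongs to largest_name is different.
print(group_sizes[group_names.index(largest_name)])
```

The name that is alphabetically last gets paired with the largest size even though they come from different rows, which is why the name column is worth dropping first.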

In [None]:
max_by_division = ...
max_by_division

In [None]:
grader.check("p3.3")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)