In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("pre03.ipynb")

<table style="width: 100%;">
<tr style="background-color: transparent;">
<td width="100px"><img src="https://cs104williams.github.io/assets/cs104-logo.png" width="90px" style="text-align: center"/></td>
<td>
  <p style="margin-bottom: 0px; text-align: left; font-size: 18pt;"><strong>CSCI 104: Data Science and Computing for All</strong><br>
                Williams College<br>
                Fall 2025</p>
</td>
</tr>
</table>


# Prelab 3: Strings, Array Broadcasting, and Visualization

**Instructions**
- Before you begin, execute the cell at the TOP of the notebook to load the provided tests, as well as the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute these cells again.  
- Be sure to consult your [Python Reference](https://cs104williams.github.io/assets/python-library-ref.html)!
- Complete this notebook by filling in the cells provided. 
- Please do not reassign variables anywhere in the notebook.  For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously.
- There are no hidden tests in prelabs.

<hr/>
<h2>Setup</h2>


In [None]:
# Run this cell to set up the notebook.
# These lines import the numpy, datascience, and cs104 libraries.

import numpy as np
from datascience import *
from cs104 import *
%matplotlib inline

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 1. Text (15 pts)



<font color='#B1008E'>
    
#### Learning objectives
- Manipulate text data via strings
- Practice converting between data types
- Use strings in combination with functions
</font>

Programming doesn't just concern numbers. Text is one of the most common data types used in programs. 

Text is represented by a **string value** in Python. The word "string" is a programming term for a sequence of characters. A string might contain a single character, a word, a sentence, or a whole book.

To distinguish text data from actual code, we demarcate strings by putting quotation marks around them. Single quotes (`'`) and double quotes (`"`) are both valid, but the types of opening and closing quotation marks must match. The contents can be any sequence of characters, including numbers and symbols. 
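
As a small illustration of the quoting rules above, the sketch below builds the same string with each style of quotation mark. The variable names are made up for this example.

```python
# Single and double quotes create the same kind of value.
single = 'CS 104'
double = "CS 104"
print(single == double)

# Pick the other style when the text itself contains a quotation mark.
contraction = "It's a string"
quoted = 'She said "hello"'
print(contraction)
print(quoted)
```

Mixing the two, as in `'CS 104"`, is a syntax error because the opening and closing marks don't match.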

We've seen strings before in `print` statements.  Below, two different strings are passed as arguments to the `print` function.

In [None]:
# Run this cell to see this example
print("I ♥️", 'CS 104')

Just as variables can be assigned to numbers, variables can also be assigned to strings.  The names and strings aren't required to be similar in any way. 

In [None]:
# Run this cell to see this example
one = 'two'
plus = '*'
print(one, plus, one)

#### String Methods

As with tables, strings can be transformed using **methods**. Many string methods exist, but we'll focus on just one here.

Specifically, the `replace` method replaces all occurrences of a specified phrase with another specified phrase in a string.

For example, run the following cell:

In [None]:
text = 'I like bananas and more bananas!'
print(text.replace('bananas', 'apples'))

The `replace` method returns (evaluates to) a new string, leaving the original string unchanged.  Try to predict the output of this example, then run the cell!

In [None]:
# Replace one letter
hello = 'Hello'
print(hello.replace('o', 'a'), hello)

You can call functions on the results of other functions.  For example, `max(abs(-5), abs(3))` evaluates to 5.  Similarly, you can call methods on the results of other method or function calls.

You may have already noticed one difference between functions and methods: a function like `max` is called by its name alone, but a string method like `replace` is called on a value, with a `.` between the value and the method name. 
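
Here is a short side-by-side sketch of the two calling styles. The names `greeting`, `larger`, and `changed` are made up for this example.

```python
greeting = 'Hello'

# Function call: the function name comes first, with its arguments in parentheses.
larger = max(abs(-5), abs(3))

# Method call: the value comes first, then a dot, then the method name.
changed = greeting.replace('H', 'J')

# print is itself a function, here applied to the results above.
print(larger, changed)
```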

You can refer to the [Python reference](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html) page whenever you're unsure of how to call a function or method.

In [None]:
# Calling replace on the output of another call to replace
'train'.replace('t', 'ing').replace('in', 'de')

Here's a picture of how Python evaluates a "chained" method call like that:

<img src="chaining_method_calls.png"/>
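
The same evaluation can also be written out one step at a time. This sketch stores each intermediate result in a name (`step_one` and `step_two` are introduced here only for illustration).

```python
# First call: replace every 't' in 'train' with 'ing'.
step_one = 'train'.replace('t', 'ing')
print(step_one)

# Second call: replace every 'in' in that result with 'de'.
step_two = step_one.replace('in', 'de')
print(step_two)

# The chained form evaluates to the same string.
print(step_two == 'train'.replace('t', 'ing').replace('in', 'de'))
```

Chaining simply skips the intermediate names: each `replace` is called on whatever string the previous call returned.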

#### Part 1.1 (5 pts)


 Use `replace` to transform the string `'hitchhiker'` into `'matchmaker'`. Assign your result to `new_word`.



In [None]:
the_word = 'hitchhiker'
new_word = ...
new_word

In [None]:
grader.check("p1.1")

There are many more string methods in Python, but most programmers don't memorize their names or how to use them.  In the "real world," people usually just search the internet for documentation and examples. A complete [list of string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) appears in the Python language documentation.

#### Converting to and from Strings

Strings and numbers are different **types** (even when a string contains the digits of a number). For example, evaluating the following cell causes an error because an integer cannot be added to a string.

In [None]:
8 + "8"

However, there are built-in functions to convert numbers to strings and strings to numbers. Some of these built-in functions have restrictions on the type of argument they take:

|Function | Description |
|- | - |
|`int`|Converts a string of digits or a float to an integer ("int") value |
|`float`|Converts a string of digits (perhaps with a decimal point) or an int to a decimal ("float") value |
|`str`|Converts any value to a string |
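
The sketch below tries each conversion function once and shows one of the restrictions. The `try`/`except` lines are only there so the cell keeps running after the error; they are not something you need for this prelab.

```python
# int drops the fractional part of a float rather than rounding.
print(int(3.9))

# float accepts a string that contains a decimal point.
print(float("2.5"))

# str works on any value, which makes it handy for building messages.
print("Room " + str(104))

# int does not accept a string that contains a decimal point.
try:
    int("2.5")
except ValueError as error:
    print("int could not convert this string:", error)
```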

Try to predict what data type and value `example` evaluates to, then run the cell.

In [None]:
example = 8 + int("10") + float("8")

print(example)
print("This example returned a " + str(type(example)) + "!")

#### Part 1.2 (5 pts)


Suppose you're writing a program that looks for dates in a text, and you want your program to find the amount of time that elapsed between two years it has identified.  It doesn't make sense to subtract two texts, but you can first convert the text containing the years into numbers.  Finish the code below to compute the number of years that elapsed between `one_year` and `another_year`.  Don't just write the numbers `1618` and `1648` (or `30`); use a conversion function to turn the given text data into numbers.

In [None]:
# Some text data:
one_year = "1618"
another_year = "1648"

# Complete the next line.  Note that we can't just write:
#   difference = another_year - one_year
# If you don't see why, try seeing what happens when you
# write that here.
difference = ...
difference

In [None]:
grader.check("p1.2")

#### Length of strings

String values, like numbers, can be arguments to functions and can be returned by functions. 

The function `len` (derived from the word "length") takes a single string as its argument and returns the number of characters (including spaces) in the string.
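
For instance, here is `len` applied to two short strings. The name `course` is made up for this example.

```python
course = 'CS 104'

# Six characters: two letters, one space, and three digits.
print(len(course))

# An empty string has length 0.
print(len(''))
```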

#### Part 1.3 (5 pts)


Use `len` to find the number of characters in the long string in the next cell.  Characters include things like spaces and punctuation. Assign `sentence_length` to that number.  (The string is the first sentence of the English translation of the French [Declaration of the Rights of Man](http://avalon.law.yale.edu/18th_century/rightsof.asp).)  

In [None]:
a_very_long_sentence = "The representatives of the French people, organized as a National Assembly, believing that the ignorance, neglect, or contempt of the rights of man are the sole cause of public calamities and of the corruption of governments, have determined to set forth in a solemn declaration the natural, unalienable, and sacred rights of man, in order that this declaration, being constantly before all the members of the Social body, shall remind them continually of their rights and duties; in order that the acts of the legislative power, as well as those of the executive power, may be compared at any moment with the objects and purposes of all political institutions and may thus be more respected, and, lastly, in order that the grievances of the citizens, based hereafter upon simple and incontestable principles, shall tend to the maintenance of the constitution and redound to the happiness of all."
sentence_length = ...
sentence_length

In [None]:
grader.check("p1.3")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 2. Array Broadcasting (15 pts)



<font color='#B1008E'>
    
#### Learning objectives
- Practice using *array broadcasting* to make data science calculations easier
</font>

The array datatype we use in this class ([Chapter 5 in the book](https://inferentialthinking.com/chapters/05/1/Arrays.html)) uses *array broadcasting* to apply an arithmetic operation to every element of an array at once. This problem will help you understand why array broadcasting can save you time in your data science workflow and give you more practice with operations on arrays -- we'll use them throughout the rest of the semester!
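
As a minimal sketch of the idea, the example below converts some made-up lengths from inches to centimeters. It builds the array with `np.array` so that it runs with NumPy alone; the data and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical lengths in inches, made up for this example.
lengths_in_inches = np.array([1, 12, 36])

# One expression converts every element; no need to handle them one at a time.
lengths_in_cm = lengths_in_inches * 2.54
print(lengths_in_cm)

# Operations can be combined, and each applies to every element.
print((lengths_in_inches + 1) / 2)
```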

#### Part 2.1 (5 pts)


We've loaded an array of temperatures in the next cell.  Each number is the highest temperature observed on a day at a climate observation station, mostly from the US.  

In [None]:
# Just run this cell
max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")
print("The number of elements in this array is", len(max_temperatures))
max_temperatures

Since these values are from the US government agency [NOAA](https://www.noaa.gov/), all the temperatures are in Fahrenheit.  

Let's use [array broadcasting](https://inferentialthinking.com/chapters/05/1/Arrays.html) to convert them all to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$. Make sure to **ROUND** the final result to the nearest integer after converting to Celsius, using the `np.round` function, which takes an array and rounds each number in it to the nearest integer.
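
Here is a quick look at how `np.round` behaves, using made-up values rather than the temperature data.

```python
import numpy as np

values = np.array([1.4, 2.6, -0.7])
rounded = np.round(values)
print(rounded)

# np.round returns a new array; the original is unchanged.
print(values)

# Values exactly halfway between two integers go to the nearest even one.
print(np.round(np.array([0.5, 1.5, 2.5])))
```

Note that the rounded array still holds floats such as `3.0`, not Python ints.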

In [None]:
celsius_max_temperatures = ...
celsius_max_temperatures

In [None]:
grader.check("p2.1")

#### Part 2.2 (5 pts)


The cell below loads all the *lowest* temperatures from each day (in Fahrenheit).  Compute the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature.  **Pay attention to the units: give your answer in Celsius!** Make sure **NOT** to round your answer for this question! 

*Note:* Remember that in the previous part, `celsius_max_temperatures` was rounded, so you might not want to use that in this question.
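
To see why rounding early matters, the sketch below subtracts two arrays element by element, once with the exact values and once after rounding. The prices are made up for illustration.

```python
import numpy as np

# Hypothetical opening and closing prices for three days.
opening = np.array([10.4, 12.6, 11.0])
closing = np.array([11.6, 12.4, 14.0])

# Subtraction pairs up elements by position.
change = closing - opening
print(change)

# Rounding each array first gives noticeably different differences.
rounded_change = np.round(closing) - np.round(opening)
print(rounded_change)
```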

In [None]:
min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")

celsius_temperature_ranges = ...
celsius_temperature_ranges

In [None]:
grader.check("p2.2")

#### Part 2.3 (5 pts)


Using `celsius_temperature_ranges`, compute the largest daily temperature range observed at any weather station.  Since we're using `celsius_temperature_ranges`, your answer will be in Celsius.  The cell also converts it to Fahrenheit for you.

In [None]:
largest_range_celsius = ...
print("The largest range is", largest_range_celsius, "in Celsius, or", 9/5 * largest_range_celsius, "in Fahrenheit. Yikes!")

In [None]:
grader.check("p2.3")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 3. World Population (15 pts)



<font color='#B1008E'>
    
#### Learning objectives
- Create basic line plots and interpret their results.
</font>

The following cell loads in a table of the world population from the years 1950 to 2022.

In [None]:
world_population = Table.read_table("world_population.csv")
world_population.show(5)

#### Part 3.1 (5 pts)


Create a line plot of this data, with "Year" on the x-axis and "Population" on the y-axis.  You'll want to use the [plot](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#plot) method on tables.

In [None]:
plot = ...
plot.set_title("World Population")

According to the plot, in what year did the world population cross seven billion for the first time?
Set the variable `first_time_seven_billion` to your answer, representing that year as an integer like `1999`.

In [None]:
first_time_seven_billion = ...

In [None]:
grader.check("p3.1")

#### Part 3.2 (5 pts)


Now compute how much the population changed during each year since 1950.  To do this, use the `np.diff` function applied to the "Population" column.  Recall that [np.diff](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#diff) takes an array and creates a new array containing the differences between consecutive elements in the original.
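
As a reminder of what `np.diff` produces, here it is on a small made-up array (not the population data).

```python
import numpy as np

squares = np.array([1, 4, 9, 16])

# Each element of the result is one value minus the value before it.
steps = np.diff(squares)
print(steps)

# The result has one fewer element than the input.
print(len(squares), len(steps))
```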

In [None]:
diffs = ...
diffs

In [None]:
grader.check("p3.2")

#### Part 3.3 (5 pts)


The following cell plots the difference in population between years.  We build a new table with one fewer row than the original to hold the differences.  The row with the year 1951 then contains the difference in population between 1950 and 1951 that you computed in the previous step.
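
The same bookkeeping on a tiny made-up example: four yearly counts give three differences, so the matching year labels start one year later. All names and numbers here are assumptions for illustration.

```python
import numpy as np

# Made-up yearly counts for 2000 through 2003.
counts = np.array([50, 53, 59, 60])

# Three differences: 2000 to 2001, 2001 to 2002, and 2002 to 2003.
changes = np.diff(counts)

# Label each difference with the later of its two years.
change_years = np.arange(2001, 2004)

print(change_years)
print(changes)
print(len(change_years) == len(changes))
```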

In [None]:
diffs_table = Table().with_columns("Year", np.arange(1951,2023), "Difference", diffs)
plot = diffs_table.plot("Year", "Difference")
plot.set_title("Yearly Change in Population")

According to the plot, in about which year did world population growth peak?
* 1980
* 1990
* 2000
* 2010

Set the variable `peak_year` to your answer in the cell below.

In [None]:
peak_year = ...

In [None]:
grader.check("p3.3")

<hr class="m-0" style="border: 3px solid #500082;"/>

# You're Done!
Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to 
the corresponding assignment. For Prelab N, the assignment will be called "Prelab N Autograder".

Once you have submitted, your Gradescope assignment should show you passing all the tests you passed in your assignment notebook.


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)