# **Show Your Work**: An Introduction to Computational Notebooks for K-12 Teachers

<div class="alert alert-info">
This notebook is designed for teachers in math and science who are new to computational notebooks. Our goal is to introduce you to what notebooks are, why they're useful, and how you can start thinking about them for your classroom.
</div>

### What is a Computational Notebook?

In the professional world, from science and engineering to finance and journalism, computational notebooks (like this one!) have become a standard tool. These notebooks are typically used to present investigations of data through an interactive document with runnable code, corresponding output (like tables and graphs), and explanatory text and pictures all in one place. They create a complete story of an analysis.

Let's try a quick example to see what we mean. Below, the grey rectangle is what's called a "code block" or "code cell."

Each line of code below is followed by a comment (<span style='color:green'>_#like this_</span>). The hash symbol (`#`) tells the computer to ignore the rest of the line -- it's meant only for human readers, to help make sense of the code. Find the play button for this notebook, and click it to run all the code in the cell.

In [None]:
x = 4 + 5 # make x equal to 4 plus 5
x # output the contents of x

Below is another block. To keep things interesting, we've designed this notebook so running the block will create an error. Try it out to see what this looks like.

In [None]:
y # output the contents of y
y = 5 # put 5 in the contents of y

Every error gives you a clue about how you can fix your code. This one tells you that you need to assign a value (number or some text) to y before asking to show it. Edit the code block above so something is put into y _before_ showing it.

When you are working with code, order matters. In notebooks, there are two different kinds of order: The order of _lines_ like in the box above, and the order that you choose to run _code cells_ with the play button. Try running the code cell below. You will see a familiar error.

In [None]:
z

Instead of trying to fix the error, try running the block below. It won't give you any output, since you are not asking to show anything. But _then_, run the block above one more time to output z. 

In [None]:
z = "Look at me now!"

What this allows you to do is tinker, edit, and discover things through making changes to code. If you get a result that's surprising or that inspires more questions, go ahead and add or edit the code and run it again to learn more! We'll try an example of what this means for data investigations soon.

It's helpful to remember that the order of cells matters. Sometimes, when you get an error in a Jupyter notebook, it is simply because you forgot to press "play" on an earlier cell that the current code needs in order to run correctly.
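If you're not sure whether an earlier cell has been run, you can ask R directly. The short sketch below uses R's built-in `exists()` function. The name `my_first_note` is made up for this illustration and doesn't appear anywhere else in this notebook.

```r
# Ask R whether a name has been given a value yet in this session.
before <- exists("my_first_note")   # has anything been stored under this name?
before                              # FALSE if no cell has assigned it yet

my_first_note <- "Now I exist!"     # assign a value, as an earlier cell would

after <- exists("my_first_note")
after                               # TRUE once the assignment has run
```

A check like this can save time when an error message mentions an object that was "not found."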

### Why Should I Learn About Computational Notebooks?

As a teacher, you know especially well that science is not just about getting an answer. It's about the process, and about the decisions you make when you make observations, build models, make calculations, or collect data. Notebooks can make this process transparent. Anyone can see the exact steps—the data, the code, the model, the reasoning—that led to a conclusion. This is called **reproducibility**, and it's a cornerstone of modern scientific practice.

Another important feature of notebooks is that they allow students to create simulations with probabilistic elements, modeling **uncertainty** in science and statistics. Here is a very simple example of what that looks like. Try running the cell a few times.

In [None]:
sample.int(10, 1) # output one random integer from 1 to 10
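Random results and reproducibility can work together. As a minimal sketch, R's built-in `set.seed()` function fixes the starting point of the random number generator, so a simulation gives the same results each time it is rerun. The seed value 2024 and the die-rolling scenario are arbitrary choices for illustration.

```r
# Simulate ten rolls of a six-sided die, twice, starting from the same seed.
set.seed(2024)                                 # any fixed number works; 2024 is arbitrary
rolls_a <- sample.int(6, 10, replace = TRUE)   # ten random integers from 1 to 6

set.seed(2024)                                 # reset to the same starting point
rolls_b <- sample.int(6, 10, replace = TRUE)

identical(rolls_a, rolls_b)   # TRUE: the same seed reproduces the same rolls
table(rolls_a)                # count how many times each face came up
```

Removing the `set.seed()` lines brings back results that change from run to run, which is what you saw in the cell above.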

Sharing a notebook is like sharing your lab notes, calculations, simulations, analyses, and your final report all at once. This kind of transparency is necessary for good scientific and statistical work, and can be very useful for students, too. It demystifies data analysis, showing it as a series of deliberate, understandable steps. It directly connects to the ways that science and math education should work: by having students experience how knowledge is built, revised, and validated. And, it allows students to not just consume information, but to interact with it, change it, and ask their own questions.

The biggest hurdle is the initial learning curve: learning the basics of a coding language can feel intimidating. That's why we are here! And as you'll see, you don't need to be a software developer to do powerful things.

### How do notebooks compare to other tools I use in my classroom?

You already use tools for calculation and data in your classroom. How might notebooks like this one fit in?

* **Paper & Pencil** activities are classic, inexpensive, highly accessible, flexible, and excellent for sketching ideas or working through a single problem. However, they can become unwieldy when you want students to work with large datasets, or to tinker with models or simulations that are difficult to reproduce with hands-on materials.

* **Spreadsheets** like Excel or Google Sheets are great for entering data, simple formulas, and basic charts, and it's good for students to become comfortable with spreadsheets as a popular professional tool. However, with spreadsheets, the *process* and *story* are hidden. It's hard to track all the steps, clicks, and ideas that created a chart.

* **Scientific and Graphing Calculators** like TI-85 or Desmos are best for manual calculations and graphing functions. But these tools are not designed for more flexible analysis of datasets with multiple variables and are limited in terms of visualization and tracking process.

* **Interactive Visualization Tools** like CODAP, Tuva, and DataClassroom make it very easy to quickly explore patterns and relationships with data through menu-based and drag-and-drop methods. Some also include tools to help you document the process. However, these tools are designed specifically for education and are not used in the professional world. They are also more open-ended, which can make it harder for students to focus on certain details of their exploration.

The main point is that notebooks integrate the _narrative_ (text and pictures), the _analysis_ (code), and _results_ (your tables and plots) into a single, interactive story.

# A ❄︎Cool❄︎ Investigation: Penguin Populations

<div class="alert alert-info">
This activity is designed to give you a little taste of what it can feel like to explore questions with data using Jupyter notebooks. We picked a topic that's easy to imagine doing in a classroom, and we'll share standards connections at the end of the investigation.
</div>

We're going to analyze a simple, fun dataset about penguins. The data were collected in the Palmer Archipelago, Antarctica.

<div style="display: flex; justify-content: space-around;">
<img src="adelie.jpg" width="150">
<img src="gentoo.jpg" width="150">
<img src="chinstrap.jpg" width="150">
</div>

### Key Data Concepts

Before we dive in, let's define some key terms. A dataset is a structured collection of information. In our dataset, each **case**, or row in the table, describes a single penguin. The characteristics or measurements we record for each penguin are called **attributes**, and each one is a column in the table. In our dataset, some of the attributes are:

* `species` (the species of the penguin observed)
* `island` (the island where the penguin was observed)
* `bill_length_mm` (the length of the penguin's bill in millimeters)
* `flipper_length_mm` (the length of the penguin's flipper in millimeters)

This simple, consistent structure is the key to making data analysis work.
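To make cases and attributes concrete, here is a small sketch that builds a three-penguin table by hand. The table name `tiny` and all of the values in it are invented for illustration; they are not taken from the real dataset.

```r
# Build a tiny table: each row is a case, each column is an attribute.
# All values here are made up for illustration.
tiny <- data.frame(
  species           = c("Adelie", "Gentoo", "Chinstrap"),
  island            = c("Torgersen", "Biscoe", "Dream"),
  bill_length_mm    = c(39.1, 46.1, 46.5),
  flipper_length_mm = c(181, 211, 192)
)

nrow(tiny)     # number of cases (rows)
names(tiny)    # names of the attributes (columns)
tiny[2, ]      # one case: everything recorded about the second penguin
tiny$species   # one attribute: the species of every penguin
```

Because every case records the same attributes, the same code works whether the table has three penguins or three hundred.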

## Let's get started

You will use code to explore the penguin data. As you do, we will have you keep track of your ideas using these text boxes. This will give you practice with **documenting** your ideas and analysis decisions.

Good documentation does at least two things:
1.  **Explanatory (in Text Cells):** Explaining the background information and *why* we are doing something, like we're doing here.
2.  **Technical (in Code Comments):** Using comments (lines starting with `#`) inside the code to explain what a specific line of code does.
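Here is a small sketch of what technical comments can look like. The measurement is made up for illustration. Notice that the comments say what each line does and why.

```r
# Store one flipper measurement, in millimeters (made-up value).
flipper_length_mm <- 195

# Convert to centimeters so it is easier to compare with a classroom ruler.
flipper_length_cm <- flipper_length_mm / 10

flipper_length_cm   # output the converted value: 19.5
```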

### Step 1: Load the Data

Let's start by importing some libraries. These are code toolkits that allow you to perform certain kinds of specialized analysis. Below, we import libraries for loading data tables and for building data plots. Then, in the same code block, we'll load our data.

In [None]:
# This is an R code cell.

# First, we import libraries. These are toolkits that add functions.
# readr is for loading data.
# ggplot2 is for making beautiful plots.
# We only need to install them once.
install.packages("readr")
install.packages("ggplot2")

library(readr)
library(ggplot2)

# Load the dataset from a URL. We give it the name 'penguins'.
# The data is now stored in a tibble (R's modern data frame).
penguins <- read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv')

First, let's take a look at the first few rows of this dataset. Take a moment before you hit run to make a prediction about what the output will look like.

In [None]:
# Use the head() function to display the first 6 rows of our table.
# This is a great way to confirm the data loaded correctly and see its structure.
head(penguins)

<div class="alert alert-success"> 
<b>Considering the data you see above,</b><br>
* What do you notice and wonder about this dataset? <br> 
* What are some questions you could explore?
<br><br>
<i>Double click the text cell below to add your responses.</i>
</div>

**DOUBLE CLICK HERE.** When you are finished, press the play button to "close" the cell and reformat the text.

### Step 2: Get Descriptive Statistics

Now that the data is loaded and you've seen a few cases, let's get a statistical summary of _all_ the cases. This helps us understand some characteristics of the numerical attributes in this dataset, things like: What's the average bill length? What are the shortest and longest flippers that were observed?

In [None]:
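# The summary() function calculates key descriptive statistics
# (minimum, quartiles, median, mean, maximum) for each numeric column.
# For text columns such as species, it reports only the length and type.
summary(penguins)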

<div class="alert alert-success"> 
<b>Take some time to review the descriptive statistics above:</b><br>
* In what years were these data collected? <br>
* How would you describe a penguin that's 5000 grams, in terms of the general population? <br>
* How would you describe this penguin population in general? <br>
* Is there anything you find surprising about the summary of data?
</div>

**DOUBLE CLICK HERE** to add your responses to the questions above.
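One detail worth a closer look in any summary is missing values, which R writes as `NA`. The sketch below uses a short list of made-up flipper lengths (not real data) to show how a single missing value affects a calculation, and how the `na.rm = TRUE` option tells R to skip missing values.

```r
# Five flipper lengths in millimeters; one was never recorded (made-up values).
flipper_mm <- c(181, 186, NA, 195, 210)

sum(is.na(flipper_mm))            # how many values are missing: 1
mean(flipper_mm)                  # NA, because one value is unknown
mean(flipper_mm, na.rm = TRUE)    # 193, the average of the four known values
range(flipper_mm, na.rm = TRUE)   # 181 210, the shortest and longest known values
```

The same `na.rm = TRUE` option appears in the plotting code in Step 3.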

### Step 3: Visualize the Data

Let's start to explore some relationships in this dataset. We'll start by showing you how to create a simple scatterplot of the penguins' flipper length versus bill length. The diagram below shows how a penguin's bill is measured:

<img src="bill.png" width="250" height="150">

Before going to the next step, take some time to think about what you expect to see!

In [None]:
# We use the ggplot2 library to build our plot layer by layer.
ggplot(
  data = penguins,                     # Use our penguins data frame
  mapping = aes(x = flipper_length_mm, # Map flipper length to the x-axis
                y = bill_length_mm)    # Map bill length to the y-axis
) +
  geom_point(na.rm = TRUE)             # Add the points (a scatterplot), skipping penguins with missing values

<div class="alert alert-success"> 
<b>Consider the scatterplot above:</b><br>
* Does it look like what you were expecting? Why, or why not? <br>
* How would you describe the general distribution of points in the scatterplot? What might be some reasons for that distribution? 
</div>

**DOUBLE CLICK HERE** to add your responses to the questions above.

You might wonder whether something like the penguin's island, sex, or species has something to do with these patterns. You can explore this question by using these attributes to color the scatterplot points. Try coloring the plot according to sex: in the code above, add `color = sex` inside the parentheses of `aes()`, after `y = bill_length_mm`, with a comma separating the two, and run it. It might take a few tries to get it right; don't give up! Then, use the spaces above to take more notes, ask more questions, and continue exploring.

You can also create new scatterplots with different attributes and colors, working until you are able to identify a satisfying, well-supported explanation for the irregular shape of the plot.
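Numbers can back up what you see in a colored plot. As one possible approach (not the only one), base R's `aggregate()` function calculates a statistic separately for each group. The tiny table below is invented for illustration, with placeholder groups "A" and "B"; to try the idea on the real data, you could swap in `penguins` and one of its attributes.

```r
# A made-up table with two groups of bill lengths (not real data).
toy <- data.frame(
  group          = c("A", "A", "B", "B"),
  bill_length_mm = c(38, 40, 47, 49)
)

# Calculate the mean bill length separately for each group.
group_means <- aggregate(bill_length_mm ~ group, data = toy, FUN = mean)
group_means   # group A averages 39, group B averages 48
```

If the group averages are far apart, that is one piece of evidence that the grouping attribute helps explain the clusters in a scatterplot.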

### Step 4: Share and Compare Your Findings

<div class="alert alert-success"> 
<b>Once you are satisfied with your investigation:</b><br>
* How would you describe this population of penguins? <br>
* What are some things that might be impacting the characteristics of these penguins? <br>
* What science and math ideas did you find yourself using during this investigation?
</div>

**DOUBLE CLICK HERE** to add your responses to the questions above.

As you might have discovered, there are quite a few paths that this investigation can take! If you're interested, check out [this video](https://docs.google.com/document/d/10_5XSb_BIQ5CzFdHhFs5ThVD-MnItylUMFcrNoTRY2k/edit?usp=sharing) to see some of these paths.

# Key Takeaways about Computational Notebooks

<div class="alert alert-info">
While this activity is meant for <i>you</i> as an educator, we designed it with classroom practice in mind. Exploring the Palmer Penguins dataset has great connections to several Next Generation Science Standards Disciplinary Core Ideas and Science and Engineering Practices:
</div>

* _Interdependent Relationships in Ecosystems_. Organisms, and populations of organisms, are dependent on their environmental interactions both with other living things and with nonliving factors (LS2.A).

* _Analyzing and Interpreting Data_. Construct, analyze, and/or interpret graphical displays of data and/or large data sets to identify linear and nonlinear relationships (MS).

* _Developing and Using Models_. Develop, revise, and/or use a model based on evidence to illustrate and/or predict the relationships between systems or between components of a system (HS).

One strength of computational notebooks is that they can provide clear models of code, while also allowing the space to explore and make discoveries with data. Another is that as you learn more about the code, you can begin to create your own real, integrated data stories.

Working in a notebook is a new skill. The most important thing you can bring to it is a productive mindset. Encourage these habits of mind in yourself and your students.

* **Embrace Tinkering:** What happens if you change some of the code you were introduced to in this notebook? Try it! The worst that can happen is you get an error or a weird-looking graph. You can't permanently break anything. If your notebook gets messy or you get stuck, you can always reset everything.
* **Code as Expression:** Students should read and change the code in ways that reflect _scientific ideas_ about what they are exploring. Just like data can provide evidence in science, and science helps explain data, code can help externalize and test data and science ideas.
* **Document Your Journey:** Use text cells as your lab notebook. Write down your questions before you write code. After you get an output, write a sentence about what you think it means. This practice of "documenting as you go" is a powerful skill for thinking and for collaboration.
* **Errors are Your Friends:** You *will* get error messages. Everyone who codes does. An error is not a sign of failure; it is a clue. Read it carefully. Often, it will tell you exactly where you made a typo. Learning to read error messages is a core part of learning to code.
* **Be Curious:** This is the most important disposition of all. A notebook is a tool for asking and answering questions. Use it to explore your curiosity and empower your students to explore theirs.

### Credits

This notebook was developed as part of "Show Your Work" (SyW), a research and development project at UC Berkeley to introduce computational notebooks to K-12 educators.

The SyW team includes, in alphabetical order: Pavritha Arun Anand, Sun Young Ban, Chul Huang, JungMin Shin, Michelle Wilkerson, and Xiaoyue Zhang.

Preliminary drafting of the notebook was done with the assistance of Google Gemini Pro 2.5. 

The Palmer Penguins dataset was downloaded from https://github.com/allisonhorst/palmerpenguins, licensed under CC0 1.0 Universal. The data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network.

Adelie, Gentoo, and Chinstrap Penguin photos by [Andrew Shiva](https://commons.wikimedia.org/wiki/User:Godot13) / Wikipedia / CC BY-SA 4.0. The Gentoo photo was cropped; no other modifications were made to the images.

Bill diagram artwork by [@allison_horst](https://allisonhorst.com/).

SyW is supported by a grant from the Barbara Y. White Bequest and by the CalTeach BERET-AIRE (NSF Award - 2419242) Summer Research Institute. 