# **Show Your Work**: An Introduction to Computational Notebooks for K-12 Teachers

<div class="alert alert-info">
This notebook is designed for teachers in math and science who are new to computational notebooks. Our goal is to introduce you to what notebooks are, explore how they might be useful, and help you decide if they are right for your classroom.
</div>

<hr style="border: 5px solid #003262;" />

### 🔸 What is a Computational Notebook?



In the professional world, from science and engineering to finance and journalism, **computational notebooks** (like this one!) have become a standard tool. These notebooks are typically used to present data analyses through an interactive document that includes

* **Runnable code**
* **Corresponding output** (such as tables and graphs)
* **Explanatory text**

all in one place. Together, these elements create a **complete story of an analysis**.

<br/>

---

Let’s try a quick example to show what we mean by “runnable code.” Below is what’s called a **code block** or a **code cell**.

Each line of code in the block is followed by a **comment**
 (<span style='color:green'>_#like this_</span>).
The hash (`#`) tells the computer to **ignore** everything that comes after it on that line. Comments are written only for humans, to help make the code easier to understand.

Click on the code cell below, find the **play button** for this notebook, and click it to **run all the code** in this block.

In [None]:
x = 4 + 5  # make x equal to 4 plus 5
x          # output the contents of x

Below is another code block. To keep things interesting, we've designed this notebook so running the block will **create an error.** Try it out to see what the error looks like.

In [None]:
y        # output the contents of y
y = 5    # put 5 in the contents of y

Every error gives you a clue about how you can fix your code.

This one tells you that you need to assign a value (a number or some text) to **`y`** *before* asking the computer to show it.

Edit the code block above to assign a value to **`y`** before outputting it. Run the block to test and see what happens.
    
When you edit a code cell, the line on the left-hand side turns orange. This is a reminder that something has been changed, but you haven't yet run the changed code.

---

When you are working with code, **order matters**.

In notebooks, there are two different kinds of order:

* The order of **lines** (like in the box above)
* The order in which you **run code blocks** using the play button

Try running the code block below. You’ll see a familiar error.

In [None]:
z

Now, instead of trying to fix the error, try running the block below. It won’t give you any output, since you're not asking for any.

But *then*, run the block **above** one more time to display **`z`**.

In [None]:
z = "Look at me now!"

What this allows you to do is **tinker, edit, and discover** things through making changes to code.  
If you get a result that’s surprising or that inspires more questions, go ahead—**edit the code and run it again** to learn more!

We’ll try an example of what this means for data investigations soon.

It’s helpful to remember that **the order of cells matters** because sometimes, when you get an error in a Jupyter notebook, it could just be because you forgot to press “play” on an earlier box. The current code might depend on something that hasn’t executed yet.
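To see both kinds of order in one place, here is a minimal sketch. It uses a made-up name, `w`, that does not appear anywhere else in this notebook. It also uses `try` and `except`, which are Python tools for catching an error so the rest of the block can keep running. You don't need to learn those yet; they are only here so you can see the "before" and "after" side by side.

```python
# A sketch of "order matters," using a hypothetical name `w`.
try:
    print(w)              # w has no value yet, so Python raises a NameError
except NameError as err:
    print("Error:", err)  # show the error message instead of stopping

w = "Now I exist!"        # assign a value to w
print(w)                  # this works because the assignment has already run
```

The first `print` fails and the second succeeds, even though they are the same instruction. The only difference is whether the assignment ran first.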

<hr style="border: 1px solid #fdb515;" />

### 🔸 Why Should I Learn About Computational Notebooks?

As a teacher, you know especially well that science is not just about getting an answer.  
It’s about the **process**, and about the **decisions** you make when you:

- Make observations  
- Build models  
- Make calculations  
- Collect data

Notebooks can make this process **transparent**.  
Anyone can see the exact steps — the data, the code, the model, the reasoning — that led to a conclusion.

This is called **reproducibility**, and it’s a cornerstone of modern scientific practice.

Another important feature of notebooks is that they allow students to create simulations that have **probabilistic elements**, modeling **uncertainty** in science and statistics.

Here's a very simple example of what that looks like. Try running the cell a few times.

In [None]:
import random # load a library for random numbers

random.randint(0, 10) # output a random integer from 0 to 10 (both included)
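
A single random number is only a starting point. As a slightly larger sketch of a simulation with a probabilistic element, the block below flips a pretend coin 100 times and counts the heads. The names `flips` and `heads` are our own choices for this illustration, and the count will usually change each time you run it.

```python
import random  # load a library for random numbers

flips = []                  # start with an empty list of results
for i in range(100):        # repeat the next line 100 times
    flips.append(random.choice(["heads", "tails"]))  # record one random flip

heads = flips.count("heads")  # count how many flips came up heads
print(heads, "heads out of", len(flips), "flips")
```

Running this several times gives students a concrete way to talk about variation: the result is rarely exactly 50, but it is also rarely far from it.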

Sharing a notebook is like sharing your:

- Lab notes  
- Calculations  
- Simulations  
- Analyses  
- Final report

This kind of **transparency** is necessary for good scientific and statistical work, and can be very useful for students, too.  
It **demystifies data analysis**, showing it as a series of deliberate, understandable steps.

It directly connects to the ways that science and math education should work:

- Students experience how knowledge is built, revised, and validated  
- They interact with the code and change it  
- They ask their own questions

The biggest hurdle is the **initial learning curve**. Learning the basics of a coding language can feel intimidating.  

That’s why we are here! And as you’ll see, **you don’t need to be a software developer** to do powerful things.

<hr style="border: 1px solid #fdb515;" />

## 🔸 How do notebooks compare to other tools I use in my classroom?

You already use tools for calculation and data in your classroom.  
How might notebooks like this one fit in?

- #### 📝 Paper & Pencil  
  Classic, inexpensive, highly accessible, flexible, and excellent for sketching ideas or working through a single problem.  

  However, they can become unwieldy when you want students to work with **large datasets**, or to **tinker with models or simulations**.

- #### 📊 Spreadsheets and Calculators (Excel, Google Sheets, or TI-85)  
  Great for entering simple formulas and making basic graphs.  
  It’s good for students to become comfortable with spreadsheets as a popular professional tool.  

  However, spreadsheets and calculators aren't designed for flexible analysis with many variables at once, or for tracking the **process and justification** for your analysis decisions.

- #### 📈 Interactive Graphing Tools (CODAP, Desmos, Tuva, DataClassroom)  
  These make it very easy to explore **patterns and relationships** with data using menu-based and drag-and-drop methods. Some also include tools to document the process.  
      
  However, these tools are designed specifically for **education**, and are **not used in the professional world**. It can be hard to strike a balance between tools that are too **open-ended**, which can make it harder to focus on certain details, or too **constrained** to allow students to explore.

- #### ▶️ Interactive Programming Tools (Code.org)  
  Other interactive tools, like notebooks, integrate text and images with runnable, editable code as a webpage or interactive book.
      
  Computational notebooks allow students to edit not just code, but also the text and other parts of a document. In this way, students can slowly become **authors** of work over time by adding new text, new code, and new ideas.

<br>

### 🔑 Main Point  
  Notebooks integrate the **narrative** (text and pictures), the **analysis** (code), and the **results** (your tables and plots) into a **single, interactive story**.

<hr style="border: 5px solid #003262;" />

# ❄️ A Cool Investigation: Penguin Populations

<div class="alert alert-info">
This activity is designed to give you a little taste of what it can feel like to explore questions with data using Jupyter notebooks. We picked a topic that's easy to imagine doing in a classroom, and we'll share standards connections at the end of the investigation.
</div>

We're going to analyze a simple, fun dataset about penguins. The data were collected in the Palmer Archipelago, Antarctica.


<div style="display: flex; justify-content: space-around;">

<img src="https://github.com/CalCoRE/show-your-work/blob/main/images/adelie.jpg?raw=true" width="150">

<img src="https://github.com/CalCoRE/show-your-work/blob/main/images/gentoo.jpg?raw=true" width="150">

<img src="https://github.com/CalCoRE/show-your-work/blob/main/images/chinstrap.jpg?raw=true" width="150">

</div>

Before we dive in, let's define some key terms. A dataset is a structured collection of information. In our dataset: 
* A **case**, or row in the table, describes a single observation (in this case, a penguin). 
* An **attribute**, or column in the table, describes characteristics or measurements we record for each observation. 
                                                                                
Our dataset includes several attributes such as `species`, `island`, `bill_length_mm`, `flipper_length_mm`, and more.

This simple, consistent structure is the key to making data analysis work.
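If it helps to see cases and attributes in code, here is a minimal sketch using plain Python. The two penguins below are made up for illustration; they are not real measurements from the dataset, and the names `tiny_dataset`, `first_case`, and `bill_lengths` are our own.

```python
# Two made-up penguins (NOT real measurements from the dataset).
tiny_dataset = [
    {"species": "Adelie", "island": "Torgersen", "bill_length_mm": 39.0},
    {"species": "Gentoo", "island": "Biscoe", "bill_length_mm": 47.5},
]

# One case (row): everything we recorded about a single penguin.
first_case = tiny_dataset[0]
print(first_case)

# One attribute (column): the same measurement for every penguin.
bill_lengths = [case["bill_length_mm"] for case in tiny_dataset]
print(bill_lengths)
```

Because every case has the same attributes, the same line of code can pull out one attribute for the whole dataset, whether it holds two penguins or hundreds.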

<hr style="border: 1px solid #fdb515;" />

## 🔹 Let's get started

You will use code to explore the penguin data. As you do, we will have you keep track of your ideas using these text boxes. This will give you practice with **documenting** your ideas and analysis decisions.

Good documentation does at least two things:

1.  **Explanatory (in Text Cells):** Explaining the background information and *why* we are doing something, like we're doing here.

2.  **Technical (in Code Comments):** Using comments (lines starting with `#`) inside the code to explain what a specific line of code does.

---
### 🔸 Step 1: Load the Data

Let's start by importing some libraries. These are code toolkits that allow you to perform certain kinds of specialized analysis. Below, we import libraries for making data tables, and for building data plots. Then, in the same code block, we'll load our data.

In [None]:
# First, we import libraries. These add new functions we can use.

import pandas as pd # pandas is for making data tables
import seaborn as sns # seaborn is for making beautiful plots

# The code below loads the dataset from a file. 
# We give the loaded dataset the name 'penguins'.
penguins = pd.read_csv('https://github.com/CalCoRE/show-your-work/blob/main/data/penguins.csv?raw=true')

Let's take a look at the first few rows of this dataset. Take a moment before you hit run to make a prediction about what the output will look like.

In [None]:
# The .head() function will display the first 5 rows of penguins
# It helps confirm the data loaded correctly, and shows its structure.
penguins.head()

**DOUBLE CLICK** the green box below to write responses to the questions. When you are done, click the "run" button to close and format the box.

<div class="alert alert-success">

**Considering the data you see above,**



* What do you notice and wonder about this dataset? 



* What are some questions you could explore?



<br>

---

### 🔸 Step 2: Get Descriptive Statistics

Now that the data is loaded and you've seen a few cases, let's get a statistical summary of _all_ the cases. This helps us understand some characteristics of the numerical attributes in this dataset, things like: What's the average bill length? What are the shortest and longest flippers that were observed?

In [None]:
# The .describe() function calculates key descriptive statistics
# for all the numerical columns in our data table.
penguins.describe()
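
To take some of the mystery out of a summary table, here is a rough sketch that works out three of the same kinds of statistics (mean, min, and max) by hand for one short list. The four bill lengths are made up for illustration and are not taken from the dataset. The sketch uses Python's built-in `statistics` library rather than pandas.

```python
import statistics  # a built-in library for basic statistics

# Four made-up bill lengths in millimeters (NOT real data).
bill_lengths = [39.1, 46.5, 50.0, 42.3]

average = statistics.mean(bill_lengths)  # add them up, divide by how many
shortest = min(bill_lengths)             # the smallest value in the list
longest = max(bill_lengths)              # the largest value in the list

print("mean:", average)
print("min:", shortest)
print("max:", longest)
```

A summary table repeats calculations like these for every numerical attribute at once, which is why it is a convenient first look at a large dataset.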

<div class="alert alert-success">

**Take some time to review the descriptive statistics above:**



* In what years were these data collected?



* How would you describe a penguin that's 5000 grams, in terms of the general population?



* How would you describe this penguin population in general?



* Is there anything you find surprising about the summary of data?



**DOUBLE CLICK THE GREEN BOX** to add your responses above.

<br>

---

### 🔸 Step 3: Visualize the Data



Let's start to explore some relationships in this dataset. We'll start by showing you how to create a simple scatterplot of the penguins' bill length versus depth:




<img src="https://github.com/CalCoRE/show-your-work/blob/main/images/bill.png?raw=true" width="250" height="150">

Before going to the next step, take some time to think about what you expect to see!

In [None]:
# We use the seaborn library (sns) to create a scatter plot.
sns.scatterplot(
    data=penguins,        # Use our penguins data
    x='bill_length_mm',   # The column for the x-axis
    y='bill_depth_mm'     # The column for the y-axis
)

<div class="alert alert-success">

**Consider the scatterplot above:**



* Does it look like what you were expecting? Why, or why not?


* How would you describe the general distribution of points in the scatterplot? What might be some reasons for that distribution?




**DOUBLE CLICK THE GREEN BOX** to add your responses above.

You might wonder whether something like the penguin's island, sex, or species has something to do with these patterns. You can explore this question by using these attributes to color the scatterplot points. Try coloring the plot according to sex! Add the line `hue = 'sex'` inside the `scatterplot` function in the code above and run it. Keep in mind that each item inside the parentheses needs a comma between it and the next one. **It might take a few tries to get it right; don't give up!** Then, use the spaces above to take more notes, ask more questions, and continue exploring.

You can also create new scatterplots with different attributes and colors. For example, in addition to coloring data points, you can change their size using (for example) `size = 'sex'`.

Use these tools to try to identify a satisfying candidate explanation for the irregular shape of the plot. 

---

### 🔸 Step 4: Share and Compare Your Findings

<div class="alert alert-success">

**Once you are satisfied with your investigation:**



* How would you describe this population of penguins?


* What are some things that might be impacting the characteristics of these penguins?


* What science and math ideas did you find yourself using during this investigation?



**DOUBLE CLICK THE GREEN BOX** to add your responses above.

As you might have discovered, there are quite a few paths that this investigation can take! If you're interested, check out [this document](https://docs.google.com/document/d/10_5XSb_BIQ5CzFdHhFs5ThVD-MnItylUMFcrNoTRY2k/edit?usp=sharing) to see some of these paths.

<hr style="border: 5px solid #003262;" />

# Curriculum Connections

<div class="alert alert-info">
While this activity is meant for <i>you</i> as an educator, we designed it with classroom practice in mind. Exploring the Palmer Penguins dataset connects to several sets of standards:
</div>

- In the **Next Generation Science Standards**, it connects to ideas about Interdependent Relationships in Ecosystems (LS2.A), Analyzing and Interpreting Data (Middle School Science and Engineering Practice), and Developing and Using Models (High School Science and Engineering Practice).

- In the **Common Core State Standards in Mathematics**, it connects to summarizing, representing, and interpreting data on two quantitative variables (HSS.ID.B.6) and can be connected to discussions of linear and non-linear models.

- In the **Computer Science Teachers Association K-12 Standards**, it connects to representing data using multiple encoding schemes (2-DA-07), creating interactive data visualizations of real-world phenomena (3A-DA-11), and using data analysis tools and techniques to identify patterns in data representing complex systems (3B-DA-05).

---
# Key Takeaways about Computational Notebooks

One strength of computational notebooks is that they can provide **clear models of code**, while also allowing the space to **explore and make discoveries with data**.

Another is that as you learn more about the code, you can begin to create your own **real, integrated data stories**.

Working in a notebook is a new skill. The most important thing you can bring to it is a **productive mindset**. Encourage these habits of mind in yourself and your students:

- **💡 Embrace Tinkering**  
  You can’t permanently break anything. If your notebook gets messy or you get stuck, you can always reset everything.

- **🧠 Code as Expression**  
  Students should read and change the code in ways that reflect _scientific ideas_ about what they are exploring. Code can help externalize and test their ideas about science _and_ data.

- **📝 Document Your Journey**  
  Use text cells as your lab notebook. Write down your questions before you write code. This practice of “documenting as you go” is a powerful skill for thinking and for collaboration.

- **❌ Errors are Your Friends**  
  Everyone gets error messages. An error is not a sign of failure; it is a **clue**. Learning to read error messages is a core part of learning to code.

- **🔍 Be Curious**  
  This is the most important disposition of all. A notebook is a tool for asking and answering questions. Use it to explore your curiosity and empower your students to explore theirs.

<hr style="border: 5px solid #003262;" />

### 🔸 Credits

This notebook was developed as part of "Show Your Work" (SyW), a research and development project at UC Berkeley to introduce computational notebooks to K-12 educators.

The SyW team includes, in alphabetical order: Pavritha Arun Anand, Sun Young Ban, Chul Huang, JungMin Shin, Michelle Wilkerson, and Xiaoyue Zhang.

This specific notebook includes contributions from Michelle Wilkerson, JungMin Shin, Sun Young Ban, and Xiaoyue Zhang.

Preliminary drafting of the notebook was done with the assistance of Google Gemini Pro 2.5.

The Palmer Penguins dataset was downloaded from https://github.com/allisonhorst/palmerpenguins, licensed under CC0 1.0 Universal. The data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network.

Adelie, Gentoo, and Chinstrap Penguin photos by [Andrew Shiva](https://commons.wikimedia.org/wiki/User:Godot13) / Wikipedia / CC BY-SA 4.0. The Gentoo photo was cropped; no other modifications were made to the images.

Bill diagram artwork by [@allison_horst](https://allisonhorst.com/).

SyW is supported by a grant from the Barbara Y. White Bequest and by the CalTeach BERET-AIRE (NSF Award - 2419242) Summer Research Institute.