<a href="https://colab.research.google.com/github/LarrySnyder/ASJ/blob/main/intro_notebooks/Intro_to_Jupyter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hi there! 👋

You found your way to our first Jupyter notebook in Colab. Congrats!

This file is read-only. To work with it, you first need to **save a copy to your Google Drive:**

1. Go to the *File* menu. (The *File* menu inside the notebook, right below `Intro to Jupyter`—not the *File* menu in your browser, at the top of your screen.)
2. Choose *Save a copy in Drive*. (Log in to your Google account, if necessary.) Feel free to move it to a different folder in your Drive, if you want.
3. Colab should open up a new browser tab with your copy of the notebook. Double-click the filename at the top of the window and rename it `Intro to Jupyter [your name(s)]`. 
4. Close the original read-only notebook in your browser.



---
> 👓 **Note:** This notebook is part of the *Algorithms and Social Justice* course at Lehigh University, Profs. Larry Snyder and Suzanne Edwards.
---


---
> 👓 **Note:** Portions of this notebook are adapted from the *Data-4ac* course at the University of California–Berkeley, Spring 2021, Prof. Margarita Boenig-Liptsin. Available at https://github.com/ds-modules/data-4ac. 
---



# Introduction to Jupyter Notebooks

Welcome to this Jupyter Notebook! **Notebooks** are documents that support interactive computing in which code is interwoven with text, visualizations, and more.

The Jupyter Notebook was first released as a web-based interface in 2011. It originally supported the programming languages **Ju**lia, **Py**thon, and **R** (hence "Jupyter"), but now supports over 40 different languages.

Notebook interfaces for other programming languages (e.g., Mathematica and MATLAB) have been around even longer, but Jupyter notebooks have become a standard way to share research methodology and results for open-source scientific tools. 


## Why (Jupyter) Notebooks?
Notebooks are used for *literate programming*, a programming paradigm introduced by Donald Knuth in 1984, in which program code is accompanied by an explanation written in a natural language. This approach effectively treats software as a work of literature ([Knuth, *Literate Programming*](http://www.literateprogramming.com/knuthweb.pdf)). It helps readers and users build a strong conceptual map of what is happening in the code and follow the flow and logic of the program.

Jupyter leverages this idea and enables users to create and share documents that combine code, visualizations, narrative text, equations, and rich media. Notebooks are multipurpose and can be used in any discipline. The notebook is like a laboratory notebook, but for computing. Researchers can write code to work with their data while supplementing their methods with explanations, analysis, or hypotheses. Notebooks are also used in education because they enable students to engage with content presented in different forms, try out computation without prior experience, and practice programming in a scaffolded way.

---
> 👓 In ISE/WGSS 296, we'll be using Jupyter notebooks to introduce you to how algorithm designers implement their algorithms, to learn about issues of social justice using real-world data sets, and to learn how to reason about the human choices embedded in algorithms.
---

You're using this Jupyter notebook in Google's Colaboratory system (or **Colab**, for short). Colab is only one of many ways to access a Jupyter notebook, but it's convenient for us since it can be shared easily, it lives in the Google ecosystem, and it bypasses some of the annoying Python-related stuff that you'd need to do if you run Jupyter locally on your computer.

## Notebook Cells

A notebook is composed of rectangular sections called **cells**. There are 2 kinds of cells: markdown and code. A **markdown cell**, such as this one, contains text. A **code cell** contains computer code—often in Python, the programming language that we will be using throughout this class, but sometimes in another language.

You can select any cell by clicking it once.


To "run" a code cell (i.e. tell the computer to perform the programmed instructions in the cell), select it and then either:
- press `Shift` + `Return`, or
- click the ▶ icon to the left of the cell.

If a code cell is running, you will see a progress indicator to the left of the cell. Once the cell has finished running, a number will appear in place of the progress indicator, and any output from the code will appear under the cell.


## Let's Try It!

Time to create your (maybe) first Python program. Everybody's favorite first program prints "Hello, world!" to the screen, and that's what yours will do, too.

In the cell below this one, delete the ellipsis (`...`) and type

```
print('Hello, world!')
```

Type it exactly as it is above—don't change any of the punctuation, capitalization, etc. 

After you type the command, and while the cursor is still in the cell, run the cell using one of the methods described above.

If you did everything correctly, the output of your code—`Hello, world!`—will appear below the cell.


In [None]:
# YOUR CODE HERE!
...

# Working with Notebooks



## Comments

You'll notice that many code cells contain lines of colored text that start with a `#`. These are **comments**. Comments often contain helpful information about what the code does or what you are supposed to do in the cell. The leading `#` tells the computer to ignore whatever text follows it—that text is meant for humans, not for the computer.
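
As a small sketch of how this looks in practice, a comment can take up a whole line or follow code on the same line. (The variable name `total` is just an illustrative choice.)

```python
# This whole line is a comment, so Python skips it.
total = 5451 + 1812  # A comment can also follow code on the same line.
print(total)  # prints 7263
```

Removing or rewording either comment would not change what the code does.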

## Editing Markdown Cells

You can change the text in a markdown cell by double-clicking it. Text in markdown cells is written in [**Markdown**](https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet), a formatting language for plain text, so you may see some funky symbols should you try and edit a markdown cell we've already written. Once you've made changes to a markdown cell, you can exit editing mode by running the cell the same way you'd run a code cell.

**Try double-clicking on this text to see what some Markdown formatting looks like.**

## Adding and Deleting Cells

You can add cells by clicking either `+ Code` or `+ Text` at the top of the screen. This will add a cell immediately below the currently selected cell.

To delete a cell, select it and then click the trashcan icon in the popup menu at the top-right of the cell. (If you accidentally delete a cell that you need, you can undo it using 'Edit' > 'Undo Delete Cells'. If you accidentally delete content in a cell, you can use `Ctrl` + `Z` to undo.)

## Saving and Loading

Your notebook will automatically save your text and code edits, as well as any graphs you generate or any calculations you make. However, you can also manually save the notebook in its current state by using `Ctrl` + `S` or by going to the `File` menu and selecting `Save`.

Next time you open your notebook, it will look the same as when you last saved it!

**Note:** When you load a notebook you will see all the outputs from your last saved session (such as graphs, computations, etc.) but you won't be able to use any of the variables you assigned in your code without running it again. (You'll learn a little bit about variables in the next section.)

An easy way to "catch up" to the last work you did is to highlight the cell you left off on, then go to the `Runtime` menu and click `Run before`.

# Some Things You Can Do with Jupyter Notebooks

Here are a handful of examples of other things you can do with Jupyter notebooks. This is just to illustrate the range of possibilities. It is by no means exhaustive!

## Crunch Numbers

Jupyter notebooks provide full access to Python, which has enormously powerful tools for doing calculations. For example:

In [None]:
# Lehigh has 5451 undergraduates and 1812 graduate students.
# How many students does Lehigh have in total?
# (Source: https://www1.lehigh.edu/about/university-statistics)
5451 + 1812

(Don't forget to run each code cell!)
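
One detail worth knowing: a notebook automatically displays only the value of the last line in a cell. If you want a single cell to show more than one result, one option is to wrap each result in `print()`. A minimal sketch (the variable name `students` is our own choice):

```python
# Store the total so we can use it more than once.
students = 5451 + 1812

# print() shows a value no matter where it appears in the cell.
print(students)         # prints 7263
print(students - 1812)  # prints 5451
```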

In [None]:
# Lehigh's class of 2020 had 1528 students. 95% of them are employed, continuing
# their education, or pursuing military or volunteer service.
# How many students is this?
# (Use * for multiplication, / for division.)
1528 * 0.95

In [None]:
# Average high temperature on Lehigh commencement day is 74°F.
# What is this in Celsius?
(74 - 32) * 5 / 9



For some mathematical tasks, we need to use Python **packages**. (A package is a collection of code written by other people that we can use in our own code.) We first have to **import** the package, then we can use it. One useful package is called `math`.

In [None]:
import math

# Side of a square whose area is 90.
math.sqrt(90)

In [None]:
# Cosine of π/4.
math.cos(math.pi / 4)

# (Once we import a package, we don't have to import it again until we restart 
# the notebook.)

Often, it's convenient to store numbers in **variables** to use them again later:

In [None]:
# Calculate the sine of 60°. (Python's sin() function assumes the input is in
# radians, not degrees. Fortunately, Python also has a function to convert
# for us.)
angle_in_deg = 60
angle_in_rad = math.radians(angle_in_deg)
math.sin(angle_in_rad)
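
Variables can also make a calculation easier to reread and reuse. As a sketch, here is the commencement-day temperature conversion from earlier, rewritten with variables. (The names `temp_f` and `temp_c` are illustrative choices, and `round()` is Python's built-in rounding function.)

```python
# Temperature in degrees Fahrenheit.
temp_f = 74

# Convert to degrees Celsius and store the result.
temp_c = (temp_f - 32) * 5 / 9

# Round to one decimal place for display.
round(temp_c, 1)  # 23.3
```

To convert a different temperature, you would only need to change the first line and rerun the cell.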

#### Questions

**Question 1a**

Lehigh has 5451 undergraduates and 552 full-time faculty. What is the student : faculty ratio?

In [None]:
# YOUR CODE HERE!
...

**Question 1b**

Approximately 2015 of Lehigh's undergraduates are in CAS. What percentage of Lehigh undergraduates are in CAS?

In [None]:
# YOUR CODE HERE!
...

**Question 1c**

Calculate the square root of 40. 

In [None]:
# YOUR CODE HERE!
...

**Question 1d**

Calculate the natural logarithm of 7. (Use the `math.log()` function.)

In [None]:
# YOUR CODE HERE!
...

## Work with Datasets

You can load a dataset from a URL or from your Google Drive, and work with it here in the notebook. We'll be doing lots of this during this course.

Here's a quick example, in which we'll explore a dataset published by the city of Chicago that provides information about monthly ridership on the "L" (Chicago's public rail transit system) at each station. We'll use the `pandas` package, which is used for data handling. 

In [None]:
# Import the pandas package (and abbreviate it as 'pd').
import pandas as pd

# Load the data from the city of Chicago data portal.
rides = pd.read_csv('https://data.cityofchicago.org/resource/t2rn-p8d7.csv')

# Display the first few lines of the table.
rides.head()

In [None]:
# Get a summary of the data.
rides.describe()

From the summary we can see that, for example, on average, 89,448.7 riders use each station per month.
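
The `describe()` table bundles several statistics together. As a small sketch, the same kinds of numbers can also be pulled out one at a time for a single column. The table below is made up for illustration, and the column name `monthtotal` is an assumption about the ridership data (you can check the real names with `rides.columns`).

```python
import pandas as pd

# Made-up numbers for illustration only; this is not real ridership data.
toy = pd.DataFrame({
    'stationame': ['A', 'B', 'C', 'D'],
    'monthtotal': [1000, 2000, 3000, 6000],
})

# Each statistic in describe() can also be computed on its own.
print(toy['monthtotal'].mean())  # average of the column
print(toy['monthtotal'].min())   # smallest value
print(toy['monthtotal'].max())   # largest value
```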

In [None]:
# Show all rows for Loyola station.
rides[rides['stationame'] == 'Loyola']

So, for example, during the month of April, 2001, there were an average of 4878.9 rides per weekday.
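
The filtering line above does two things at once, which can be easier to see when they are split into separate steps. Here is a sketch on a tiny made-up table. (The column name `avg_weekday_rides` and all of the values are assumptions for illustration.)

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'stationame': ['Loyola', 'Howard', 'Loyola'],
    'avg_weekday_rides': [100.0, 250.0, 120.0],
})

# Step 1: the comparison produces one True/False value per row.
is_loyola = toy['stationame'] == 'Loyola'
print(is_loyola.tolist())  # prints [True, False, True]

# Step 2: indexing with those True/False values keeps only the True rows.
loyola_rows = toy[is_loyola]
print(loyola_rows)
```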

#### Questions

Use the code above as a guide to answer the questions below.

**Question 2a**

Load the "Average Daily Traffic Counts" dataset from the city of Chicago data portal at https://data.cityofchicago.org/resource/pfsx-4n4m.csv. (The dataset contains manual counts of cars passing specific locations, taken in 2006.) Display the first few lines of the table.

In [None]:
# YOUR CODE HERE!
...

**Question 2b**

Display a summary of the data.

In [None]:
# YOUR CODE HERE!
...

**Question 2c**

What is the average "total passing vehicle volume" across all measurements? What is the smallest traffic volume measured? The largest?

**Answer:** *YOUR ANSWER HERE*

**Question 2d**

Display all the rows for measurements taken on Homan Ave (i.e., all rows in which the `street` column equals `Homan Ave`).

In [None]:
# YOUR CODE HERE!
...

**Question 2e**

What address on Homan Ave had the largest traffic count, and what was the count?

**Answer:** *YOUR ANSWER HERE*

## Visualize Data

Data visualization has become increasingly important in using data to understand the world, communicate narratives, and advocate for change. Python includes many tools for data visualization, most of which can be used within Jupyter notebooks. Here's an example, which uses the data from the FiveThirtyEight story "[Comic Books Are Still Made By Men, For Men And About Men](https://fivethirtyeight.com/features/women-in-comic-books/)." (FiveThirtyEight, like many other news outlets, makes its data publicly available on the code-sharing site GitHub.) The dataset we'll use lists all DC Comics characters introduced between 1935 and 2014, including each character's gender and other characteristics, the year they were introduced, and the number of appearances of the character in comic books.

In [None]:
# Read the dataset and display the first few lines.
dc = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/comic-characters/dc-wikia-data.csv')
dc.head()

In [None]:
# Create a histogram showing the number of characters introduced each year,
# like the first figure in the article.
# (Might as well use the built-in FiveThirtyEight style, too!)
import matplotlib as mpl
mpl.style.use('fivethirtyeight')
hist = dc.hist('YEAR', bins=79)
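
A histogram sorts values into bins and counts how many fall into each one. With roughly one bin per year, each bar is close to a per-year count. As a sketch of the counting behind the picture, here is the same idea on a few made-up years (not taken from the DC dataset):

```python
import pandas as pd

# Made-up introduction years for six imaginary characters.
years = pd.Series([1940, 1940, 1941, 1943, 1943, 1943], name='YEAR')

# Count how many times each year appears, listed in year order.
counts = years.value_counts().sort_index()
print(counts)
```

In this toy example, 1943 appears three times, so its bar would be the tallest.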

In [None]:
# Create a scatter plot showing the number of appearances of each character
# versus the year the character was introduced.
scatter = dc.plot.scatter(x='YEAR', y='APPEARANCES')
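
Plots are usually easier to read with a title and descriptive axis labels. One possible way to add them, sketched here on a tiny made-up table so that it runs on its own, is to use the matplotlib `Axes` object that `plot.scatter()` returns. (The values and label text are illustrative assumptions.)

```python
import pandas as pd

# A tiny made-up table; these values are not from the DC dataset.
toy = pd.DataFrame({
    'YEAR': [1940, 1960, 1980, 2000],
    'APPEARANCES': [300, 120, 45, 10],
})

# plot.scatter() returns a matplotlib Axes object, which we can label.
ax = toy.plot.scatter(x='YEAR', y='APPEARANCES')
ax.set_title('Appearances vs. year introduced (toy data)')
ax.set_xlabel('Year introduced')
ax.set_ylabel('Number of appearances')
```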

#### Questions

Use the code above as a guide to answer the questions below.

**Question 3a**

Load the data from the FiveThirtyEight story "Our Guide To The Exuberant Nonsense Of College Fight Songs" from the FiveThirtyEight GitHub repository at https://raw.githubusercontent.com/fivethirtyeight/data/master/fight-songs/fight-songs.csv. (The dataset contains data on the fight songs from schools in the Power Five conferences, including when each song was written and by whom, and the tempo and duration of the song.) Display the first few lines of the table.

In [None]:
# YOUR CODE HERE!
...

**Question 3b**

The `number_fights` column indicates the number of times the song says "fight." Create a histogram that shows the number of songs with each number of "fights." (Set `bins=18`.)

In [None]:
# YOUR CODE HERE!
...

**Question 3c**

Create a scatter plot showing the beats per minute of each song (`bpm`) versus the duration of the song in seconds (`sec_duration`). (This is a simpler version of the figure in the article.)

In [None]:
# YOUR CODE HERE!
...

------

# Notebooks in Practice

Jupyter notebooks encourage exploration, allowing users to iteratively update code and document the results. Notebooks are great for exploring algorithms and data in an interactive way—interacting both with the code and with other users. 

Scientific papers, the primary means for communicating scientific research for centuries, are static documents that often use technical jargon, replace concrete ideas with abstract or symbolic representations, and report the results of experiments but not the actual implementation or data.

Notebooks foster transparency and encourage reproducibility. They are an effective way to supplement—or sometimes replace—static documents like scientific papers with interactive ones that provide the reader with all of the code and data necessary to reproduce the results, tinker with the approach, and build new experiments using the original ones as a starting point. In other words, Jupyter notebooks support the computational work of researchers from different fields, from astronomy to psychology to literature, and therefore enable new ways for researchers in very different domains to share research tools and methods, and to learn from one another.


### Things to Think About

As we mentioned above, notebooks are used to make programming easier to read by interleaving code with text and other media types. Just as literature allows for creativity and supports multiple interpretations, we can treat notebooks as a medium that lets us tell complex stories that incorporate programming. 

What does that imply about notebooks as a medium and about us as readers? Can you think of ways to incorporate Jupyter notebooks into social justice projects?

(*You do not need to submit responses to these questions. These are for you to think about on your own.*)


---

# How to Submit this Notebook

* Please click the "Share" button at the top of the page. In the Share options:

    * Under "General Access", change to "Restricted" (instead of "Anyone with the link").
    * At the top, share it with Oumaima (ous219@lehigh.edu), Larry (lvs2@lehigh.edu), and Suzanne (sme6@lehigh.edu).
    * Click "Copy Link".
    * Click "Done".

* Send a Slack DM to Oumaima with the link you copied.
