# Lecture 15: Introduction to the Policy context

Aidan Feldman

Computing in Context (SIPA)

## Structure for today

1. Intro
1. Going over course info like the syllabus, tools, etc.
1. Rewind: Programming languages, data, and Jupyter

## About me

- Coding since 2005 🖥
- Government since 2014 🦅
- Teaching since 2011 🎓
- Also a modern dancer 💃 and cyclist 🚲

### Day jobs

Currently [freelancing](https://blog.afeld.me/know-any-teams-who-could-use-my-help-87a4247c51c1) with the [Colorado Behavioral Health Administration](https://bha.colorado.gov/). In the past, I have worked for...

#### Government

- [NYC Office of Technology & Innovation (OTI)](https://www.nyc.gov/content/oti/pages/)
- [California Department of Transportation (CalTrans)](https://dot.ca.gov/)
- [General Services Administration (GSA)](https://www.gsa.gov/)
   - [18F](https://en.wikipedia.org/wiki/18F)
   - [TTS](https://www.gsa.gov/about-us/organization/federal-acquisition-service/technology-transformation-services)
- [Census xD](https://xd.gov/)
- [NYC Planning Labs](https://labs.planning.nyc.gov/)

#### Non-profits

- [Reinvent Albany](https://reinventalbany.org/)
- [Upsolve](https://upsolve.org/)
- [VoteAmerica](https://www.voteamerica.org/)

#### Tech companies

- [GitHub](https://github.com/)
- [Artsy](https://www.artsy.net/)

## Intros

- Name
- Pronouns
- Why you're taking this class / what you want to do with it
   - The more specific, the better.

## Access the course site

[**computing-in-context.afeld.me**](https://computing-in-context.afeld.me/)

You can also get there through CourseWorks.

## Class structure

### Class materials walkthrough

New context-specific stuff:

- [Syllabus](https://computing-in-context.afeld.me/)
- Files
- [CourseWorks](https://courseworks2.columbia.edu/courses/207091)
- [Ed](https://courseworks2.columbia.edu/courses/203144/external_tools/37606?display=borderless)

## Disclaimers

### Me

- Here to teach you to:
   - Do a lot with just a little code
   - Troubleshoot
   - Google stuff
- Not a statistician

### You

- Are not going to understand everything the first time
- Will want to throw your computer out a window at one or many points in the class
   - Celebrate the little victories
- Will get out of it what you put into it

### Politics/protests/war

## ⏪ Restart

## Spreadsheets vs. programming languages

What do you like about spreadsheets?

### Why spreadsheets

- The easy stuff is easy
- Lots of people know how to use them
- Mostly just have to point, click, and scroll
- Data and logic live together as one

### Why programming languages

- Data and logic _don't_ live together
   - Why might this matter?

- More powerful, flexible, and expressive than spreadsheet formulas; don't have to cram into a single line

   ```
   =SUM(INDEX(C3:E9,MATCH(B13,C3:C9,0),MATCH(B14,C3:E3,0)))
   ```

- Better at working with large data
   - [Google Sheets](https://support.google.com/drive/answer/37603) and [Excel](https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3) have hard limits at 1-5 million rows, but get slow long before that
- Reusable code (packages)
- Automation
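
As a rough illustration of not having to cram everything into a single line, the two-way lookup in the spreadsheet formula above could be spread over a few named steps in Python. The table contents and the names `sales`, `region`, and `quarter` are made up for this sketch.

```python
# Hypothetical data: rows are regions, columns are quarters.
sales = {
    "North": {"Q1": 120, "Q2": 135},
    "South": {"Q1": 90, "Q2": 110},
}

# The values to look up (like cells B13 and B14 in the formula above).
region = "South"
quarter = "Q2"

# Step 1: find the row. Step 2: find the column within that row.
row = sales[region]
value = row[quarter]

print(value)
```

Note that the data (`sales`) and the logic (the lookup steps) are separate here, so the same steps could be pointed at a different table.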

### Side-by-side<sup>1</sup>

|                       Task |      Spreadsheets      | Programming Languages |
| -------------------------: | :--------------------: | :-------------------: |
|           **Loading data** |          Easy          |        Medium         |
|           **Viewing data** |          Easy          |        Medium         |
|         **Filtering data** |          Easy          |        Medium         |
|      **Manipulating data** |         Medium         |        Medium         |
|           **Joining data** |          Hard          |        Medium         |
| **Complicated transforms** | Impossible<sup>2</sup> |        Medium         |
|             **Automation** | Impossible<sup>2</sup> |        Medium         |
|        **Making reusable** |   Limited<sup>3</sup>  |        Medium         |
|         **Large datasets** |       Impossible       |         Hard          |

1. These ratings are obviously subjective
1. Not counting scripting, such as [Excel's new Python+pandas support](https://support.microsoft.com/en-us/office/introduction-to-python-in-excel-55643c2e-ff56-4168-b1ce-9428c8308545)
1. [Google Sheets supports named functions](https://support.google.com/docs/answer/12504534)

### Python vs. other languages

- Good for general-purpose _and_ data stuff
- Widely used in both industry and academia
- Relatively easy to learn
- Open source

![Python logo](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/110px-Python-logo-notext.svg.png)

### Where to Python

Python can be run in:

- A text file, using the `python` command
- [The interactive Python interpreter / command prompt / shell](https://www.python.org/shell/)
- An [integrated development environment (IDE)](https://runestone.academy/ns/books/published/thinkcspy/Appendices/usingIDE.html) like [Spyder](https://www.spyder-ide.org/) or [PyCharm](https://www.jetbrains.com/pycharm/)
- A [Jupyter notebook](https://docs.jupyter.org/en/latest/#what-is-a-notebook)
    - [Various other tools](https://python-public-policy.afeld.me/en/columbia/resources.html#jupyter-outside-this-course) are built around them
    - What we'll be using for this class

Each can be on your computer ("local"), or in the cloud somewhere. All call `python` under the hood, more or less.

## Packages

- a.k.a. "libraries"
- Developers create them to make code/functionality reusable and easily sharable
- Software plugins that you `import`
- Main packages we’ll use:
    - `pandas`
    - `plotly`

> A module is a file containing Python definitions and statements.

https://docs.python.org/3/tutorial/modules.html

Your code, part of the [standard library](https://docs.python.org/3/library/index.html), or part of a package.
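
A small sketch of what importing a module looks like. It uses `statistics` from the standard library so nothing extra needs to be installed; the `scores` list is made-up data.

```python
# `statistics` is a module in the Python standard library.
import statistics

scores = [70, 85, 90, 95]  # made-up data

# Call a function defined in the module, using the module name as a prefix.
average = statistics.mean(scores)
print(average)
```

Importing a package like `pandas` works the same way, except the package has to be installed first.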

### Pandas

_Review from Lab 7_

- A Python package (bundled up code that you can reuse)
- Very common for data science in Python
- [A lot like R](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html)
   - Both organize around "data frames"
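
A minimal sketch of a data frame, assuming `pandas` is installed. The column names and numbers are invented for illustration.

```python
import pandas as pd

# Made-up data: each dictionary key becomes a column.
df = pd.DataFrame(
    {
        "borough": ["Bronx", "Brooklyn", "Queens"],
        "complaints": [120, 300, 210],
    }
)

# Filter to the rows where a condition is true.
busy = df[df["complaints"] > 200]

# Summarize a column.
total_complaints = df["complaints"].sum()

print(busy)
print(total_complaints)
```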

## Jupyter

- Web-based programming environment
- Supports Python by default, and other languages with added [kernels](https://docs.jupyter.org/en/stable/projects/kernels.html)
- Nicely displays output of your code so you can check and share the results
- Avoids using the command line

We'll be using [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/) through the [Anaconda Distribution](https://www.anaconda.com/download).

### Command line vs. Jupyter

![Command line vs. Jupyter output](img/cli_vs_jupyter.png)

### Jupyter basics

A "cell" can be either code or [Markdown](https://www.markdownguide.org/getting-started/) (text). Raw Markdown looks like this:

```
## A heading

Plain text

[A link](https://somewhere.com)
```

#### Running

- You "run" a cell by either:
    - Pressing the ▶️ button
    - Pressing `Control`+`Enter` on your keyboard
- Cells only run when you tell them to, and in the order you run them
    - Generally, you want to do so from the top every time you open a notebook

#### Output

- The last thing in a code cell is what gets displayed when it's run
- The output gets saved as part of the notebook
- Existing output from a cell doesn't mean that cell has been run during this session
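
To make the "last thing in a cell" rule concrete, here is a sketch of what a single code cell might contain. The variable names are made up.

```python
price = 4
quantity = 3

# This value is computed but not displayed, because it isn't the last line.
price + quantity

# print() always displays, wherever it appears in the cell.
print("Calculating the total...")

# In a Jupyter code cell, the value of the last expression is displayed
# automatically. Outside Jupyter (e.g. in a script), nothing is shown for it.
total = price * quantity
total
```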

### Some pandas/Jupyter best practices

- Make variable names descriptive
    - Ignore that all examples use `requests`
- Only do one thing per line
    - Makes troubleshooting easier
- Make notebooks [idempotent](https://en.wikipedia.org/wiki/Idempotence)
    - Makes your work reproducible
    - Use `Restart and run all` (⏩ button in toolbar)
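
A rough sketch of what idempotent means for notebook code. The function names and data are hypothetical; the point is only the difference between changing data in place and building a new result from untouched inputs.

```python
# Made-up starting data, as if it had just been loaded.
raw_prices = [10, 20, 30]


# Not idempotent: each call changes the list itself, so running it
# twice doubles the values twice.
def double_in_place(prices):
    for i in range(len(prices)):
        prices[i] = prices[i] * 2


# Idempotent: builds a new list from the untouched original, so the
# result is the same no matter how many times it is run.
def doubled_copy(prices):
    return [price * 2 for price in prices]


doubled = doubled_copy(raw_prices)
doubled = doubled_copy(raw_prices)  # simulate running the cell again
print(doubled)
```

If a cell behaves like `double_in_place()`, re-running it gives different results each time, which is why restarting and running everything from the top is a useful check.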