<img src="../../shared/img/slides_banner.svg" width=2560></img>

# 00a - Welcome

In [None]:
%matplotlib notebook

In [None]:
import sys

sys.path.append("../../")

from shared.src import quiet
from shared.src import seed
from shared.src import style

In [None]:
from IPython.display import HTML, IFrame, Image, YouTubeVideo
import matplotlib.pyplot as plt
import pandas as pd

import utils.daft

# What is this class?

## Data Science for Research Psychology

## What do we mean by _Data Science_?

### Data 8: The Foundations of Data Science
> The UC Berkeley Foundations of Data Science course combines  ... **inferential thinking** \[and\] **computational thinking** ... The course teaches **critical concepts and skills in computer programming and statistical inference**

"Data Science" is the intersection of Computer Science and Statistics.

### Ubiquitous computing power has changed statistical practice ...

so it can and should also change statistical education.

#### Ubiquitous Computing

Around 3 out of 4 U.S. adults carry a powerful computer around in their pockets.

In [None]:
# SOURCE: Pew Research Center
IFrame(src="https://www.pewinternet.org/chart/mobile-phone-ownership/iframe/",
       width="1840px", height="480px", scrolling="no")

#### Consider Apollo 11

> ... the latest phones typically have 4GB of RAM. ... This is more than one million ... times more memory than the Apollo computer had in RAM ...

> ... But memory isn’t the only thing that matters. The Apollo 11 computer had a processor ... which ran at 0.043 MHz ... the iPhone in your pocket has over 100,000 times the processing power of the computer that landed man on the moon 50 years ago.

[SOURCE: realclearscience.com](https://www.realclearscience.com/articles/2019/07/02/your_mobile_phone_vs_apollo_11s_guidance_computer_111026.html)
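The "more than one million times" memory figure can be checked with quick arithmetic. This sketch assumes two things not stated in the quote: the Apollo Guidance Computer's erasable memory (its RAM) held 2,048 words of 16 bits each (15 data bits plus a parity bit), and the phone's "4GB" means 4 GiB (4 × 2^30 bytes).

```python
# Back-of-the-envelope check of the memory comparison quoted above.
# Assumption: Apollo Guidance Computer RAM = 2,048 words x 2 bytes per word.
agc_ram_bytes = 2048 * 2

# Assumption: "4GB" of phone RAM interpreted as 4 GiB.
phone_ram_bytes = 4 * 2**30

ratio = phone_ram_bytes / agc_ram_bytes
print(f"The phone has about {ratio:,.0f} times as much RAM.")
```

With decimal gigabytes (4 × 10^9 bytes) the ratio comes out just under a million instead, so either way it is on the order of a million.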

In [None]:
print("Margaret Hamilton, Programming Lead for Apollo 11 Guidance Computer, with her codebase.")
Image("img/margaret_hamilton.jpg", height="25%", width="25%")

SOURCE: https://qz.com/726338/the-code-that-took-america-to-the-moon-was-just-published-to-github-and-its-like-a-1960s-time-capsule/

[Source code](https://github.com/chrislgarry/Apollo-11/) and
[simulator](https://github.com/virtualagc/virtualagc)
can be downloaded from GitHub in seconds,
as files roughly the size of what it takes to stream 10 minutes of music.

With the click of a button and
for just [10 cents/s](https://aiimpacts.org/recent-trend-in-the-cost-of-computing/),
you can rent 100,000 times that compute power!

Compare that to advances in rocket technology:
a trip to space, let alone to the moon, remains out of reach for the average American.

#### As in other fields, this has fundamentally changed statistics.

In [None]:
Image("img/f_table.gif")

[SOURCE](https://www.oreilly.com/library/view/a-step-by-step-approach/9781590474174/9781590474174_app03.html)

This table contains the "Critical Values" for an important and often-used statistic, $F$.
Without the ability to compute these values, it's impossible to do certain common statistical tests.
These values were difficult enough to compute that, until the advent of ubiquitous computing,
they had to be computed by specialists and then published in tables, like this one.

This led early statistics to emphasize specific tests for which these tables were computable
with the resources of the time.

Now, these values can be computed in a fraction of a second by anyone with the right software packages,
so critical values can be found on the fly, faster than they could be looked up in a printed table.
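As a sketch of what "on the fly" looks like, the snippet below rebuilds a small grid of $F$ critical values with `scipy.stats.f.ppf`, the inverse of the $F$ distribution's cumulative distribution function. It assumes SciPy is installed (the notebook does not import it elsewhere), and the significance level and degrees of freedom are illustrative choices, not the exact rows and columns of the pictured table.

```python
import pandas as pd
from scipy import stats

alpha = 0.05               # upper-tail probability for the test
dfd_values = [10, 20, 34]  # rows: denominator degrees of freedom
dfn_values = [1, 2, 3]     # columns: numerator degrees of freedom

# The critical value is the point with probability alpha above it,
# i.e. the (1 - alpha) quantile of the F(dfn, dfd) distribution.
crit_table = pd.DataFrame(
    [[stats.f.ppf(1 - alpha, dfn, dfd) for dfn in dfn_values] for dfd in dfd_values],
    index=pd.Index(dfd_values, name="df denominator"),
    columns=pd.Index(dfn_values, name="df numerator"),
)
print(crit_table.round(2))
```

Each entry takes microseconds to compute, so any combination of degrees of freedom is available without a precomputed table.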

### ... and computing has become more statistical.

The increasing ease and decreasing cost of collecting, saving, and analyzing data
has made statistical analysis of data a focus of computing and
revolutionized business, science, and politics.

In [None]:
YouTubeVideo(
    id="8WVoJ6JNLO8", start=256, width=1.5 * 600, height=1.5 * 450)

[SOURCE: The Visual Capitalist](https://www.visualcapitalist.com/animation-top-15-global-brands-2000-2018/)

In [None]:
Image("img/top-10-companies-100-years.jpg", width=600)

[SOURCE: The Visual Capitalist](https://www.visualcapitalist.com/most-valuable-companies-100-years/)

This has led to the oft-repeated,
[oft-disputed](https://towardsdatascience.com/data-is-not-the-new-oil-bdb31f61bc2d) quote "data is the new oil".

### The centrality of statistical computing has led to the rise of the field of Data Science.

In [None]:
df = pd.read_csv("data/ds_v_stats.csv"); df["Date"] = pd.to_datetime(df["Date"])
print("SOURCE: Google Trends Data (https://www.google.com/trends)")
f, ax = plt.subplots(figsize=(10, 4))

In [None]:
ax.plot("Date", "Data Scientist", data=df, lw=4); ax.plot("Date", "Statistician", data=df, lw=4);
ax.legend(); ax.set_xlabel("Date"); ax.set_ylabel("Google Search Interest in US\n% of Max"); plt.tight_layout();

### But the rise of data is about more than money!

Social scientists are increasingly raising alarms about how data is being used
and the impacts of data-driven decision-making.


The world needs more informed, conscious citizens to resolve political and ethical concerns around data.

In [None]:
HTML("src/aoc_tweet.html")

In [None]:
Image("img/eubanks_auto_ineq.jpg") # amazon.com/Automating-Inequality-High-Tech-Profile-Police/dp/1250074312

In [None]:
Image("img/oneil_math_destruction.jpg") # amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418831/

### The Approach:

- "Data Science" is the intersection of **computer science and statistics**. These ideas should be **taught together, just as they are used together**.

- The combination of computer science and statistics can and should be **taught differently from the sum of its parts**.

- **Contemporary technical tools for education** should be used to scale up and streamline the teaching of statistics.

# What is this class?

## ~~Data Science~~ for Research Psychology

## What do we mean by _for Research Psychology_?

### This course will connect with research psychology in two ways:

- Learning to **evaluate quantitative claims** in psychology research using **traditional statistical methods**

- Learning skills and tools used to engage in **new forms of psychology research enabled by computational methods**

#### Learning to evaluate claims using traditional statistical methods

In the traditional method, values called
_statistics_ (e.g. $F$, $p$, $t$)
are calculated on data and compared to reference values
to determine whether patterns in data are likely due to chance.
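A minimal sketch of that recipe, using a two-sample $t$ statistic on simulated data: the statistic is computed from the data, then compared to a reference (critical) value from the $t$ distribution. The group sizes, the true difference of 0.5, and the 0.05 significance level are all hypothetical choices, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical simulated data: two groups of 20 whose true means differ by 0.5 SD.
group_a = rng.normal(loc=0.0, scale=1.0, size=20)
group_b = rng.normal(loc=0.5, scale=1.0, size=20)

# Student's t statistic (equal variances assumed), computed by hand.
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
t_stat = (group_b.mean() - group_a.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Reference value: the two-sided critical t at alpha = 0.05.
dof = n_a + n_b - 2
t_crit = stats.t.ppf(1 - 0.05 / 2, dof)

print(f"t({dof}) = {t_stat:.3f}, critical value = {t_crit:.3f}")
print("unlikely under chance alone" if abs(t_stat) > t_crit else "consistent with chance")

# SciPy computes the same statistic (plus a p-value) in one call.
scipy_result = stats.ttest_ind(group_b, group_a)
print(f"scipy: t = {scipy_result.statistic:.3f}, p = {scipy_result.pvalue:.3f}")
```

Computing the statistic by hand once makes it clear what the one-line library call is doing.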

#### Example: MDMA and Responses to Rejection

The street reputation of MDMA (molly, ecstasy) is that, in addition to more traditional stimulant effects, it makes you feel more loving, socially connected, and empathic.

What empirical evidence can we find of this?

In a paper back in 2013, I examined this with some colleagues at the University of Chicago.

We used a simulated social exclusion/inclusion paradigm called _Cyberball_.

In [None]:
Image("img/cyberball.png")

[SOURCE](https://www1.psych.purdue.edu/~willia55/Announce/cyberball.htm)

We found that MDMA buffered against the impacts of simulated social rejection.

In [None]:
Image("img/frye_et_al_2013_fig2.png")

and we reported our results like this:

> MDMA reduced the effect of rejection on mood and self-esteem (linear dose × social condition, F[1, 34] = 4.206, p < .05 and F[1, 34] = 5.626, p < .05, respectively ... )

\- [Frye et al., 2013](https://www.researchgate.net/publication/259246733_MDMA_decreases_the_effects_of_simulated_social_rejection)
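The reported $F$ statistics and their degrees of freedom are enough to recover approximate $p$-values. The sketch below assumes each $p$-value is the upper-tail area of the $F(1, 34)$ distribution beyond the observed statistic (SciPy's survival function `stats.f.sf`). These are recomputed values, not numbers quoted from the paper, which reports only "p < .05".

```python
from scipy import stats

# F statistics and degrees of freedom as reported in the quoted passage.
reported = {"mood": 4.206, "self-esteem": 5.626}
dfn, dfd = 1, 34

# Upper-tail probability of F(dfn, dfd) beyond each observed statistic.
p_values = {name: stats.f.sf(F, dfn, dfd) for name, F in reported.items()}

for name, p in p_values.items():
    print(f"{name}: F[{dfn}, {dfd}] = {reported[name]}, p = {p:.3f}")
```

Both values fall below .05, consistent with the reported result, and the larger $F$ gives the smaller $p$.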

In this class, we will learn to perform analyses like these
and report our results in this fashion.

But we will also learn to recognize these as claims about a _model of the world_ and how it relates to our data.

We'll represent our models as Python code and as graphs, like this one:

In [None]:
utils.daft.display_mdma_model()
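To give a flavor of what "a model as Python code" can mean, here is a hypothetical generative model in which rejection lowers mood and a higher dose shrinks that drop. It is not the model from Frye et al., 2013 or the one drawn by `utils.daft.display_mdma_model()`; the function `simulate_mood_change`, the dose units, and every parameter value are made up for illustration.

```python
import numpy as np

def simulate_mood_change(dose, rejected, n, rng,
                         rejection_effect=-10.0, buffering_per_unit=0.5, noise_sd=5.0):
    """Hypothetical generative model of mood change after Cyberball.

    Rejection shifts mean mood by `rejection_effect` points; each unit of dose
    removes a fraction `buffering_per_unit` of that shift (capped at 100%).
    All parameter values are illustrative assumptions, not estimates.
    """
    buffering = min(buffering_per_unit * dose, 1.0)
    mean = rejection_effect * (1.0 - buffering) if rejected else 0.0
    return rng.normal(loc=mean, scale=noise_sd, size=n)

rng = np.random.default_rng(seed=0)
doses = [0.0, 0.5, 1.0]  # hypothetical dose levels, arbitrary units
n = 5000                 # simulated participants per cell, far more than a real study

rejection_effect_by_dose = {}
for dose in doses:
    included = simulate_mood_change(dose, rejected=False, n=n, rng=rng)
    excluded = simulate_mood_change(dose, rejected=True, n=n, rng=rng)
    # Estimated effect of rejection at this dose: excluded minus included mean.
    rejection_effect_by_dose[dose] = excluded.mean() - included.mean()
    print(f"dose {dose}: estimated effect of rejection on mood = {rejection_effect_by_dose[dose]:.2f}")
```

In this simulation, the estimated effect of rejection shrinks as dose increases. A dose × social condition test asks whether a pattern like that shows up in real data.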

#### Learning tools for computational psychology

In addition to providing new ways to think about old approaches,
computation will allow us to ask questions that would be impossible otherwise.

#### Example: Semantic Maps of the Brain

Recent research from Berkeley by [Huth et al., 2016](https://www.nature.com/articles/nature17637):

> In this experiment people passively **listened to stories from the Moth Radio Hour** while **brain activity was recorded**. Voxel-wise modeling was used to determine how **each individual brain location responded to 985 distinct semantic concepts** in the stories. The demo shows how these concepts are mapped across the cortical surface.

DEMO: <https://gallantlab.org/huth2016/>

Use the buttons in the top-left that say "Next" for a guided tour.

The cell below shows a brain region I found interesting:
a region for moral judgment?

In [None]:
Image("img/huth_demo.png")

Even sophisticated experiments like this one rely on proposing a model
and comparing its predictions to the data you observed.

In [None]:
utils.daft.display_huth_model()  # after some simplifications
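One common way to build a voxel-wise encoding model is regularized linear regression from stimulus features to each voxel's response, judged by how well it predicts held-out data. The sketch below assumes closed-form ridge regression on entirely synthetic data. The sizes, noise level, and penalty `alpha` are hypothetical, and it omits many details of the actual Huth et al., 2016 analysis.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the experiment (all sizes hypothetical):
# time points of the story, semantic features per time point, recorded voxels.
n_time, n_features, n_voxels = 600, 50, 10

X = rng.normal(size=(n_time, n_features))               # semantic features over time
true_weights = rng.normal(size=(n_features, n_voxels))  # each voxel's semantic "tuning"
Y = X @ true_weights + rng.normal(scale=5.0, size=(n_time, n_voxels))  # noisy responses

# Fit on the first part of the story, evaluate on the held-out remainder.
train, test = slice(0, 500), slice(500, None)

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^(-1) X^T Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

W = fit_ridge(X[train], Y[train], alpha=10.0)
Y_pred = X[test] @ W

# Compare predictions to observed held-out data, one correlation per voxel.
voxel_r = np.array([np.corrcoef(Y_pred[:, v], Y[test][:, v])[0, 1] for v in range(n_voxels)])
print(np.round(voxel_r, 2))
```

The per-voxel prediction correlation is the "compare predictions to the data" step. On real recordings, each voxel's fitted weights are what can then be projected onto a semantic map.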

### Questions we will learn to answer:

- how do I use statistical methods to **interpret the outcome of an experiment**?

- how do I know when my statistical **methods might fail** me?

- what are the common statistical **methods I will encounter** as I practice psychological research?

- how do I **communicate scientific findings** to other psychologists?

# How does this class work?

This course

- ... is centered on **Jupyter notebooks**, which allow you to interactively run Python code with rich multimedia outputs using the browser.

- ... is **deployed on cloud computing platforms**, to avoid technical problems with installation.

- ... uses **partial autograding** to allow for real-time feedback on assignments.

This course is in its first iteration and is _highly experimental_.

I'll be asking for your feedback during the course,
via Piazza and through a middle-of-term evaluation.

I'll also ask for your patience with technical glitches
and other foibles as the course is rolled out.

Let's check out the [course website](https://charlesfrye.github.io/psych101d).

The most important component is the table of course materials,
which is populated with links.

Some of them look like the image below:

In [None]:
Image(url="https://charlesfrye.github.io/psych101d/content/shared/img/interact_badge.svg")

Click these if you are a Berkeley-affiliated user. You'll need a CalNet ID.

The others look like this:

In [None]:
Image(url="https://mybinder.org/badge_logo.svg")

These are for non-Berkeley folks who want to check out the course materials.

Notice that there are some materials already present, in case you want a sneak preview of where the course is going.

These materials are subject to change, so don't put too much work into them yet.

The course has

- weekly labs, on Fridays
- weekly homeworks
- two exams: midterm and final
- a final (group) project

## Homeworks

Homeworks give you a chance to "drill" on concepts from the lectures.
Some will be due before lab, with the intent that you practice a skill that's important for doing the lab.

## Labs

Labs are more open-ended.
They are designed to teach you how to "play with" concepts and tools from the lectures.

Labs are designed so that you can complete them within the two-hour in-person session on Fridays.
Attendance is not mandatory.
It is, however, encouraged, since you will be able to compare notes with your classmates and ask questions of the instructors.
Given the open-ended nature of the lab assignments and the limited amount of automatic grading on labs,
the direct feedback given in the lab sections will be especially useful.

## Group Project

In the group project, you will apply the modeling skills you've developed in the course to real data, ideally from a psychological experiment.

The exact format of the group project is to be determined.
Possibilities include a poster, a write-up, or a Jupyter notebook.

Groups will be no larger than 3 and no smaller than 1.

There will be a final presentation, tentatively scheduled for the last lab session, on 12/6/2019.

## Exams

The exams will be in-person and have a written component. Their final format is to be determined.

Together, the exams count for less of your grade than the homeworks and labs do.
If your exam scores put you at a grade boundary, your relative performance on the homeworks and labs will be considered.

The midterm is tentatively scheduled for 10/30/2019, in-class.

# What's Next?

We _will_ meet for the first lab, on August 30th.

We will go over how to use Jupyter notebooks and the OK autograder.

There is no class on September 2nd, for Labor Day.

On Wednesday, September 4th, we'll do a whirlwind review of Python.

See the [Python Resources](https://charlesfrye.github.io/psych101d/res/) if you're feeling rusty.