<img src="../../shared/img/slides_banner.svg" width=2560></img>

# Welcome

In [None]:
%matplotlib notebook

In [None]:
import sys

sys.path.append("../../")

from shared.src import quiet
from shared.src import seed

In [None]:
from IPython.display import HTML, Image, YouTubeVideo
import matplotlib.pyplot as plt
import pandas as pd

# What is this class?

This is an experimental new course that builds on [data8](http://data8.org), _The Foundations of Data Science_.

It is a rethinking of a long-standing course, PSYCH101, _Research and Data Analysis in Psychology_.

PSYCH101-D brings the _approach_ of data8 to the _topics_ of PSYCH101.

## What is data8?

### Data 8: The Foundations of Data Science
The UC Berkeley Foundations of Data Science course combines three perspectives: **inferential thinking, computational thinking**, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches **critical concepts and skills in computer programming and statistical inference**, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social issues surrounding data analysis such as privacy and design.

### The Core Idea of data8:

Ubiquitous computing power and the rise of data have changed statistical practice, so they can and should also change statistical education.

#### Ubiquitous Computing: Consider Apollo 11

> ... the latest phones typically have 4GB of RAM. ... This is more than one million (1,048,576 to be exact) times more memory than the Apollo computer had in RAM ...

> ... But memory isn’t the only thing that matters. The Apollo 11 computer had a processor – an electronic circuit that performs operations on external data sources – which ran at 0.043 MHz ... the iPhone in your pocket has over 100,000 times the processing power of the computer that landed man on the moon 50 years ago.

[SOURCE: realclearscience.com](https://www.realclearscience.com/articles/2019/07/02/your_mobile_phone_vs_apollo_11s_guidance_computer_111026.html)
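
The "1,048,576 to be exact" figure in the quote can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that "4GB" means 4 GiB and that the Apollo Guidance Computer had 32,768 bits of RAM; that second number does not appear in the excerpt above and is an assumption of this example.

```python
BITS_PER_BYTE = 8

# A phone with 4 GB of RAM, read here as 4 GiB (4 * 2**30 bytes).
phone_ram_bits = 4 * 2**30 * BITS_PER_BYTE

# ASSUMPTION: Apollo Guidance Computer RAM, in bits (not stated in the quote).
agc_ram_bits = 32_768

ratio = phone_ram_bits // agc_ram_bits
print(f"Phone RAM is {ratio:,} times the Apollo computer's RAM")
# prints: Phone RAM is 1,048,576 times the Apollo computer's RAM
```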

In [None]:
print("The Director of Software Engineering for Apollo 11, with her codebase.")
Image("img/margaret_hamilton.jpg")

SOURCE: https://qz.com/726338/the-code-that-took-america-to-the-moon-was-just-published-to-github-and-its-like-a-1960s-time-capsule/

The Apollo 11 code is now available for [download on GitHub.com](https://github.com/chrislgarry/Apollo-11/),
as a file smaller than what you would use to stream music, let alone Netflix.

#### The Rise of Data

In [None]:
YouTubeVideo(
    id="8WVoJ6JNLO8", start=180, width=1.5 * 600, height=1.5 * 450)

[SOURCE: The Visual Capitalist](https://www.visualcapitalist.com/animation-top-15-global-brands-2000-2018/)

In [None]:
Image("img/top-10-companies-100-years.jpg", width=600)

[SOURCE: The Visual Capitalist](https://www.visualcapitalist.com/most-valuable-companies-100-years/)

In [None]:
df = pd.read_csv("data/ds_v_stats.csv")
df["Date"] = pd.to_datetime(df["Date"])
print("SOURCE: Google Trends Data (https://www.google.com/trends)")
f, ax = plt.subplots(figsize=(10, 4))
ax.plot("Date", "Data Scientist", data=df, lw=4)
ax.plot("Date", "Statistician", data=df, lw=4)
ax.legend()
ax.set_xlabel("Date")
ax.set_ylabel("Google Search Interest in US\n% of Max")
plt.tight_layout()

In [None]:
# Bootstraps! Overplot 100 faint resamples (rows drawn with replacement) per series.
for column, color in [("Data Scientist", "C0"), ("Statistician", "C1")]:
    for _ in range(100):
        resample = df.sample(frac=1, replace=True).sort_values("Date")
        ax.plot("Date", column, data=resample, lw=4, color=color, alpha=0.01)
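
The bootstrap cell above redraws each line from rows sampled with replacement. The sketch below shows the same resampling idea using only the standard library. The `interest` values are made up for illustration and are not the Google Trends data, and `bootstrap_means` is a hypothetical helper; `rng.choices(values, k=n)` plays the role that `df.sample(frac=1, replace=True)` plays in the cell above.

```python
import random
import statistics


def bootstrap_means(values, n_resamples=1000, seed=0):
    """Return the mean of each resample drawn with replacement from values."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(values)
    # Each resample has the same size as the original data.
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)]


# Hypothetical search-interest values (percent of max), for illustration only.
interest = [12, 15, 20, 28, 35, 47, 60, 72]

means = bootstrap_means(interest)
print("observed mean:", statistics.mean(interest))
print("spread of bootstrap means:", statistics.stdev(means))
```

The spread of the resampled means gives a rough sense of how much the summary would vary if the data had come out slightly differently, which is what the faint overplotted lines convey visually.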

This has led to the oft-repeated,
[oft-disputed](https://towardsdatascience.com/data-is-not-the-new-oil-bdb31f61bc2d) quote "data is the new oil".

But the rise of data is about more than money!

Social scientists are increasingly raising alarms about how data is being used and the impacts data-driven decision-making is having.

To understand their arguments and the stakes well enough to be truly informed citizens, we need to engage with data science.

In [None]:
Image("img/eubanks_auto_ineq.jpg") # amazon.com/Automating-Inequality-High-Tech-Profile-Police/dp/1250074312

In [None]:
Image("img/oneil_math_destruction.jpg") # amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418831/

### The Approach:

- "Data Science" is the intersection of computer science and statistics. These ideas should be taught together, just as they are used together.

- The combination of computer science and statistics can and should be taught differently from the sum of its parts.

- Contemporary technical tools for education should be used to scale up and smooth over the process of teaching statistics.

## What is PSYCH101?

PSYCH101 is the quantitative research methods course in the Psychology Department.

It teaches the core concepts of statistics for students in the Psychology major.

### The Core Idea of PSYCH101:

Train undergraduate students in the language and practice of statistics for empirical research in psychology.

This is implemented in a variety of ways.

Some versions of the course emphasize programming, others pen-and-paper mathematics.

But all versions attempt to teach students the following:

- how do I use statistical methods to interpret the outcome of an experiment?

- how do I know when my statistical methods might fail me?

- what are the common statistical methods I will encounter as I practice psychological research?

- how do I communicate scientific findings to other psychologists?

## What is PSYCH101-D?

### The Core Idea of PSYCH101-D:

Take advantage of ubiquitous, easy computing, new educational tools, and the increasing technical sophistication of undergraduates to teach the fundamental ideas of scientific inference and experimentation in a new way.

Train those undergraduate students who have some background in data science in the language and practice of statistics for empirical research in _increasingly computational, data-driven_ sub-fields of psychology.

#### Computational Methods in Psychology

Ubiquitous computing power has enabled new kinds of research in psychology.

An example: recent research from Berkeley by [Huth et al., 2016](https://www.nature.com/articles/nature17637):

> In this experiment people passively listened to stories from the Moth Radio Hour while brain activity was recorded. Voxel-wise modeling was used to determine how each individual brain location responded to 985 distinct semantic concepts in the stories. The demo shows how these concepts are mapped across the cortical surface.

DEMO: https://gallantlab.org/huth2016/

Use the buttons in the top-left that say "Next" for a guided tour.

The cell below shows a brain region I found interesting:
a region for moral judgment?

In [None]:
Image("img/huth_demo.png")

# How does this class work?

This course is

- centered on Jupyter notebooks, which allow you to interactively run Python code with rich multimedia outputs using the browser...

- deployed on cloud computing platforms, to avoid technical problems with installation...

- partially autograded, to allow for real-time feedback on assignments.
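
As a rough illustration of what autograded feedback can look like, the sketch below checks a student-written function against a few known answers and reports the result of each check. It is not the autograder used in this course; `student_mean` and `run_checks` are hypothetical names invented for this example.

```python
def student_mean(values):
    """A hypothetical student answer: the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def run_checks(answer_fn):
    """Run a few input/expected-output checks and return one feedback line per check."""
    cases = [([1, 2, 3], 2.0), ([10], 10.0), ([0.5, 1.5], 1.0)]
    feedback = []
    for inputs, expected in cases:
        result = answer_fn(inputs)
        status = "ok" if abs(result - expected) < 1e-9 else "FAILED"
        feedback.append(f"{status}: mean({inputs}) returned {result}, expected {expected}")
    return feedback


for line in run_checks(student_mean):
    print(line)
```

Checks like these can only confirm answers with a known correct output, which is one reason open-ended work still needs feedback from a person.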

Let's check out the [course website](https://charlesfrye.github.io/psych101d).

The most important component is the table of course materials,
which is populated with links.

Some of them look like the image below:

In [None]:
Image(url="https://charlesfrye.github.io/psych101d/content/shared/img/interact_badge.svg")

Click these if you are a Berkeley-affiliated user. You'll need a CalNet ID.

The others look like this:

In [None]:
Image(url="https://mybinder.org/badge_logo.svg")

These are for non-Berkeley folks who want to check out the course materials.

Notice that there are some materials already present, in case you want a sneak preview of where the course is going.

These materials are subject to change, so don't put too much work into them yet.

The course has

- weekly labs, on Fridays
- weekly homeworks
- two exams: midterm and final
- a final (group) project

## Homeworks

Homeworks give you a chance to "drill" on concepts from the lectures.
Some will be due before lab, with the intent that you practice a skill that's important for doing the lab.

## Labs

Labs are more open-ended.
They are designed to teach you how to "play with" concepts and tools from the lectures.

Each lab is designed so that it can be completed within the two-hour in-person session on Fridays.
Attendance is not mandatory.
It is, however, encouraged, since you will be able to compare notes with your classmates and ask questions of the instructors.
Given the open-ended nature of the lab assignments and the limited amount of automatic grading on labs,
the direct feedback given in the lab sections will be especially useful.

## Group Project

In the group project, you will apply the modeling skills you've developed in the course to real data, ideally from a psychological experiment.

The exact format of the group project is to be determined.
Possibilities include a poster, a write-up, or a Jupyter notebook.

Groups will be no larger than 3 and no smaller than 1.

There will be a final presentation, tentatively scheduled for the last lab session, on 12/6/2019.

## Exams

The exams will be in-person and have a written component. Their final format is to be determined.

Together, the exams count for less of your grade than the homeworks and labs do.
Your relative performance on the homeworks and labs will be considered if the exams bring you to a grade border.

The midterm is tentatively scheduled for 10/30/2019, in-class.

# What's Next?

We _will_ meet for the first lab, on August 30th.

We will go over how to use Jupyter notebooks and the OK autograder.

There is no class on September 2nd. Thank a labor union for inventing weekends.

On Wednesday, September 4th, we'll do a whirlwind review of Python.

See the [Python Resources](https://charlesfrye.github.io/psych101d/res/) if you're feeling rusty.