# SW 282: Lab 6 - Scatterplots

---

### Professor Erin Kerrison

In this notebook, you will apply the matplotlib skills you learned last week to one of the most fundamental data visualizations: the scatterplot. You will also practice spotting relationships between the variables shown in scatterplots.

---

### Table of Contents

1. [Scatterplots](#section-1) <br>
&nbsp;&nbsp;&nbsp; a. [Variable Relationships](#section-1a)
2. [Practice](#section-2)

---

In [None]:
from datascience import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
from ipywidgets import interact, Dropdown, IntSlider
import ipywidgets as widgets
import warnings
warnings.simplefilter('ignore', FutureWarning)
%matplotlib inline
hybrid = Table.read_table("data/hybrid.csv")

## 1. Scatterplots <a id="section-1"></a>

Recall from the last lab that you learned how to create plots using the plotting library matplotlib. One type of plot that wasn't covered in detail, although you saw it at the end, was the scatterplot. Scatterplots are created in much the same way as line plots; the only difference is that the function call is `plt.scatter` rather than `plt.plot`.

Consider the scatterplot below from last week's data set.

In [None]:
plt.scatter(hybrid.column("acceleration"), hybrid.column("msrp"))
plt.xlabel("Acceleration")
plt.ylabel("MSRP")
plt.title("MSRP vs. Acceleration");

Notice how similar the code is to what we've been doing in matplotlib so far. Scatterplots are easy to code and give you a good look at the relationships between variables.
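To see the difference between the two function calls directly, here is a minimal sketch that draws the same points with `plt.plot` and with `plt.scatter`. The arrays `x_demo` and `y_demo` are made-up values for illustration and are not taken from the `hybrid` table.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data for illustration only (not from the hybrid table)
x_demo = np.array([1, 2, 3, 4, 5])
y_demo = np.array([2, 3, 5, 4, 6])

plt.figure(figsize=(10, 4))

# Left panel: plt.plot connects the points, in order, with line segments
plt.subplot(1, 2, 1)
lines = plt.plot(x_demo, y_demo)
plt.title("plt.plot")

# Right panel: plt.scatter draws one unconnected marker per (x, y) pair
plt.subplot(1, 2, 2)
points = plt.scatter(x_demo, y_demo)
plt.title("plt.scatter")
```

Both calls take the x-values first and the y-values second, which is why switching from one plot type to the other usually means changing only the function name.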

### 1a. Variable Relationships <a id="section-1a"></a>

Scatterplots are a great way to see if there is a relationship between two variables. Consider the scatterplot generated below.

In [None]:
x = np.random.uniform(0, 10, 100)
y = 3 * x + 7 + np.random.normal(-2, 2, 100)
plt.scatter(x, y);

What kind of relationship do these variables share? If you said "linear," good job! These variables are close to linearly related. In fact, we can calculate their _correlation coefficient_, a number between -1 and 1 that measures the strength and direction of a linear relationship. A value close to 1 means that the variables share a strong positive linear relationship.

In [None]:
np.corrcoef(x, y)[0,1]
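If you are curious what `np.corrcoef` is computing, the sketch below works out the same number by hand: convert both arrays to standard units, multiply them pairwise, and take the average. The helper names `standard_units` and `correlation` are hypothetical names chosen for this illustration, and the data are regenerated with a fixed seed so that the example is repeatable.

```python
import numpy as np

def standard_units(values):
    """Convert an array to standard units (mean 0, standard deviation 1)."""
    return (values - np.mean(values)) / np.std(values)

def correlation(x_vals, y_vals):
    """Average of the pairwise products of the two arrays in standard units."""
    return np.mean(standard_units(x_vals) * standard_units(y_vals))

# Same recipe as the cell above, but with a seeded generator (an assumption
# made here for repeatability)
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, 100)
y_demo = 3 * x_demo + 7 + rng.normal(-2, 2, 100)

r_manual = correlation(x_demo, y_demo)
r_numpy = np.corrcoef(x_demo, y_demo)[0, 1]

# The two values should agree up to floating-point rounding
print(r_manual, r_numpy)
```

`np.corrcoef` returns a 2-by-2 matrix of correlations, which is why the cell above indexes it with `[0, 1]` to pull out the correlation between `x` and `y`.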

There are other types of relationships between variables. Run the cell below to visualize scatterplots showing different kinds of relationships.

In [None]:
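def polyplot(deg):
    # Sample 100 x-values and build a random polynomial of degree `deg`
    x = np.random.uniform(-100, 100, 100)
    y = np.zeros(100)
    for power in np.arange(1, deg + 1):
        # One term per power, each with a random positive coefficient
        y += np.random.uniform(0, 100) * x ** power
    # Add noise scaled to the largest y-value so the trend stays visible
    maximum = max(y)
    y += np.random.normal(-maximum/6, maximum/6, 100)
    plt.scatter(x, y)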
    
interact(polyplot, deg=IntSlider(value=2, min=1, max=10));
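One caution when reading these plots: the correlation coefficient only measures *linear* association. The sketch below, which uses made-up noise-free data, compares a straight line with a parabola that is symmetric about zero. The parabola is a perfectly predictable relationship, yet its correlation coefficient comes out very close to 0.

```python
import numpy as np

# Evenly spaced x-values, symmetric about zero (made-up data for illustration)
x_demo = np.linspace(-10, 10, 101)

y_line = 2 * x_demo + 1      # exactly linear
y_parabola = x_demo ** 2     # exactly quadratic, symmetric about x = 0

r_line = np.corrcoef(x_demo, y_line)[0, 1]
r_parabola = np.corrcoef(x_demo, y_parabola)[0, 1]

# r_line is essentially 1; r_parabola is essentially 0
print(r_line, r_parabola)
```

This is one reason to look at the scatterplot itself rather than relying on a single summary number.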

## 2. Practice <a id="section-2"></a>

In this section, you will practice making scatterplots of numerical data. We begin by loading our dataset below.

In [None]:
wine = Table.read_table("data/wine.csv")
wine

<div class="alert alert-info">

**QUESTION:** In each of the code cells below, create a scatterplot of `y` vs. `x` from the `wine` table. The first three pairs of variables are given to you, and you will choose the last two. Include axis labels and a title. In the Markdown cell that follows each code cell, write an interpretation of the scatterplot. If you see any relationship between the variables, make a note of it.

</div>

_Hint:_ Remember that you'll need to extract your values into arrays using `Table.column()`...
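As a reminder of the general pattern, here is a minimal sketch on made-up data. The helper `labeled_scatter` and the arrays `heights` and `weights` are hypothetical and are not part of the `wine` table; with a table, the two arrays would come from calls to `Table.column()` instead.

```python
import numpy as np
import matplotlib.pyplot as plt

def labeled_scatter(x_values, y_values, x_label, y_label):
    """Hypothetical helper: scatter y against x with axis labels and a title."""
    points = plt.scatter(x_values, y_values)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title("{} vs. {}".format(y_label, x_label))
    return points

# Made-up arrays standing in for two numerical columns of a table
heights = np.array([150, 160, 170, 180, 190])
weights = np.array([52, 58, 69, 77, 88])

points = labeled_scatter(heights, weights, "Height (cm)", "Weight (kg)")
```

Note that the title follows the "`y` vs. `x`" convention used throughout this lab, with the vertical-axis variable named first.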

In [None]:
x1, y1 = "Alcohol", "Proline"

...

_Type your answer here, replacing this text._

In [None]:
x2, y2 = "Total Phenols", "Flavanoids"

...

_Type your answer here, replacing this text._

In [None]:
x3, y3 = "Color Intensity", "Proline"

...

_Type your answer here, replacing this text._

In [None]:
x4, y4 = "", ""

...

_Type your answer here, replacing this text._

In [None]:
x5, y5 = "", ""

...

_Type your answer here, replacing this text._

Now that you have some practice in making your own scatterplots, take a look at the widget generated by running the cell below. It will allow you to select two variables to plot against one another.

In [None]:
def scatterplot(x, y):
    xvals = wine.column(x)
    yvals = wine.column(y)
    plt.scatter(xvals, yvals)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title("{} vs. {}".format(y, x));
    
cols = wine.labels[1:]
interact(scatterplot, x=Dropdown(options=cols), y=Dropdown(options=cols));

---

## Submission

Congrats on finishing another lab notebook! To turn in this lab assignment, follow the steps below:

> 1. Press `Control + P` (or `Command + P` on Mac) to open the Print preview.
> 2. Change the destination so that it saves locally on your own computer.
> 3. Save as PDF.
> 4. If you are stuck, follow further instructions [here](https://www.wikihow.com/Save-a-Web-Page-as-a-PDF-in-Google-Chrome).
> 5. Upload this PDF to bCourses.

---
Notebook developed by: Chris Pyles

Data Science Modules: http://data.berkeley.edu/education/modules