# Example: Cheese vs. Rain
A **spurious** relationship exists when two variables are correlated but have no causal connection. To establish causation, you must be able to:

- Show an observed association (correlation)

- Prove that the independent variable occurs before the dependent one

- Rule out other potential causes

Explore this example, which compares the per capita consumption of American cheese in the US with precipitation amounts in Arkansas.

1. The r-value is 0.87. What kind of correlation does this represent?

2. Even though the scatterplots are in different parts of the graph, do they look as though they increase and decrease at the same time?

3. Why would this be considered a spurious relationship?

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

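# load the two data sets
cheese = pd.read_csv("cheese.csv")
precipitation = pd.read_csv("precipitation.csv")

x = cheese.year
y1 = cheese.pounds
y2 = precipitation.inches

# r-value (Pearson correlation) between cheese consumption and precipitation
correlation = y1.corr(y2)
print(correlation)

# add the title and axis labels
plt.title('Cheese vs. Rain')
plt.xlabel('Year')
plt.ylabel('Inches/Pounds')

# plot both variables against year, labeled so they can be told apart
plt.scatter(x, y1, label='Cheese (pounds)')
plt.scatter(x, y2, label='Precipitation (inches)')
plt.legend()

plt.show()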
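
The `corr` call in the cell above returns Pearson's r, which is the default method in pandas. As a rough illustration of what that number measures, the sketch below computes r directly from its definition and compares it with the built-in result. The function name `pearson_r` and the five values in each series are made up for this illustration; they are not the cheese or precipitation data.

```python
import numpy as np
import pandas as pd


def pearson_r(a, b):
    """Pearson's r: shared variation divided by the product of each spread."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_dev = a - a.mean()  # distance of each value from its mean
    b_dev = b - b.mean()
    return (a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum())


# Made-up illustrative values (hypothetical, not from the data files)
s1 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
s2 = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

manual = pearson_r(s1, s2)
builtin = s1.corr(s2)  # pandas uses method="pearson" by default

print(manual, builtin)
```

Both values should agree up to floating-point rounding. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.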

# Problem 1 - Spurious Correlation

In the last activity, we looked at an existing spurious correlation. In the next activity, you will create one of your own!

This website, Spurious Correlations, allows you to find correlations between variables that don’t have a causal relationship to one another.

https://tylervigen.com/discover

Follow the steps provided to create your own spurious relationship.

1. Go to the Spurious Correlations website.

2. Pick a variable type that is of interest to you and then click **View Variables**.

3. This will load a new drop-down menu with a list of variables that fit the category you chose. Pick one, and click **Correlate**.

4. You should now have a list of variables that correlate with the variable that you chose. Pick one of these variables and click **Chart**.

Once you have your chart, use the data shown to recreate the chart and calculate the r-value in this item.

1. Create two CSV files, using the sidebar on the left.

2. Copy and paste the data for each variable into its own file, one variable per file.

3. Import the data and name your x, y1 and y2 values.

4. Graph a scatterplot for variables x and y1.

5. Graph a scatterplot for variables x and y2.

6. Determine the correlation between y1 and y2.
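
The steps above can be sketched roughly as follows. This is only one possible layout, not a required solution. The column names `year` and `value`, the in-memory CSV text, and every number in it are placeholders standing in for the files you create; replace them with your own data. Merging on `year` is an added precaution so that each pair of values comes from the same year.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder CSV text standing in for the two files you create.
# Column names and all numbers here are hypothetical.
first_csv = io.StringIO("year,value\n2000,10\n2001,12\n2002,15\n2003,14\n2004,18\n")
second_csv = io.StringIO("year,value\n2000,3.1\n2001,3.4\n2002,4.0\n2003,3.9\n2004,4.6\n")

first = pd.read_csv(first_csv)    # with real files: pd.read_csv("your_first_file.csv")
second = pd.read_csv(second_csv)  # with real files: pd.read_csv("your_second_file.csv")

# Align the two variables on year so the rows match up
merged = first.merge(second, on="year", suffixes=("_1", "_2"))

x = merged["year"]
y1 = merged["value_1"]
y2 = merged["value_2"]

# r-value between the two variables
r = y1.corr(y2)
print(f"r = {r:.2f}")

# scatterplots of each variable against year
plt.title("Variable 1 vs. Variable 2")
plt.xlabel("Year")
plt.ylabel("Value")
plt.scatter(x, y1, label="Variable 1")
plt.scatter(x, y2, label="Variable 2")
plt.legend()
plt.show()
```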

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Problem 2 - Spurious Correlation Reflection

It turns out that many different variables are correlated with one another, even though they don’t have any real relationship.

The website Spurious Correlations highlights many pairs of variables that are unrelated yet strongly correlated.

Using the information from the spurious relationship that you created in the previous item, answer the following questions.

1. What was the correlation of the two variables that you chose? Describe the relationship between these two variables. Are they positively or negatively correlated?

2. What makes this a spurious correlation? Consider the three conditions that must be met to establish a causal relationship.

    - Show an observed association (correlation)

    - Prove that the independent variable occurs before the dependent one

    - Rule out other potential causes

3. What other reasons might be responsible for the relationship between the two variables that you chose? Could there be any moderating or mediating variables that cause this relationship?
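
One possible reason, offered here only as an illustration, is a third factor that both variables share, such as a general trend over time. The sketch below builds two synthetic series whose only link is that both rise over time, then compares the correlation of the raw values with the correlation of the step-to-step changes. All numbers are simulated under assumed trend and noise sizes; none come from the Spurious Correlations website.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
steps = np.arange(200)          # hypothetical time steps

# Two synthetic series that share only an upward trend; their noise is independent
a = pd.Series(0.5 * steps + rng.normal(0, 5, size=steps.size))
b = pd.Series(0.2 * steps + rng.normal(0, 2, size=steps.size))

# Correlation of the raw values, trend included
r_levels = a.corr(b)

# Correlation of step-to-step changes, which removes the shared trend
r_changes = a.diff().corr(b.diff())

print(f"r on raw values: {r_levels:.2f}")
print(f"r on step-to-step changes: {r_changes:.2f}")
```

If the correlation is strong for the raw values but much weaker for the changes, the shared trend is a likely explanation for the association.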

In [None]:
#Answers go here

# Problem 3 - Moderators and Mediators

This article includes highlights and an abstract section of a research study. An abstract is just a short summary of the completed research. Read the two sections and consider the following questions.

1. The study reports that physical activity and body image are positively associated among men. What do you think the r-value should be for the researchers to be able to make this conclusion?

2. Age and the intensity of physical activity are listed as moderators in this study. How do you think they moderate the results? How would age and intensity strengthen or weaken the relationship?

3. What could be a possible mediating variable between physical activity and body image?

4. What would you like to know more about in order to trust the results of this study?
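
To make the idea of a moderator concrete, here is a minimal sketch using entirely simulated data. A moderator changes how strong a relationship is, so the correlation differs from one group to another. The group labels `A` and `B`, the variable names, and the effect sizes are all hypothetical. Which group is stronger was chosen arbitrarily and says nothing about what the study found.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 500

# Entirely synthetic data; names and effect sizes are hypothetical
activity = rng.normal(0, 1, size=n)
group = np.where(np.arange(n) < n // 2, "A", "B")  # two levels of the moderator
slope = np.where(group == "A", 0.8, 0.2)           # the moderator changes the strength
outcome = slope * activity + rng.normal(0, 1, size=n)

df = pd.DataFrame({"activity": activity, "group": group, "outcome": outcome})

# Correlation computed separately within each level of the moderator
r_by_group = {
    name: part["activity"].corr(part["outcome"])
    for name, part in df.groupby("group")
}
print(r_by_group)
```

A mediator works differently: it sits between the two variables, so that the first affects the mediator and the mediator in turn affects the second.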

Source:
Bassett-Gunter, Rebecca, et al. “Physical Activity and Body Image among Men and Boys: A Meta-Analysis.” Body Image, Elsevier, 27 July 2017, https://www.sciencedirect.com/science/article/abs/pii/S1740144516303758?via%3Dihub.

In [None]:
# answer here