**Running a Paired T-test in a Spreadsheet**

To practice running a t-test, open <a href='http://video.udacity-data.com.s3.amazonaws.com/topher/2016/September/57d7fd08_superherodata-matched/superherodata-matched.xlsx'>Super Hero Data Matched</a>. In that data set, you'll see six superheroes and six supervillains. Each superhero is matched with the corresponding rival.

On average, the villains appear to be smarter and the heroes appear to be stronger. A t-test can tell us whether the differences in intelligence and strength are statistically significant.
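
As a quick check on the "on average" claim, the group means can be computed directly. The sketch below hard-codes the twelve rows shown in the tables of this notebook instead of reading the workbook, so it runs on its own; the names `data` and `group_means` are illustrative and not part of the original exercise.

```python
import pandas as pd

# Values copied from the hero and villain tables in this notebook.
data = pd.DataFrame({
    "Name": ["Iron Man", "Captain America", "Hulk", "Hawkeye", "Thor", "Spider-Man",
             "Ultron", "Red Skull", "Abomination", "Bullseye", "Loki", "Green Goblin"],
    "Alignment": ["good"] * 6 + ["bad"] * 6,
    "Intelligence": [100, 63, 88, 62, 69, 88, 95, 90, 85, 75, 87, 85],
    "Strength": [85, 65, 100, 12, 100, 55, 83, 30, 80, 11, 57, 35],
})

# Mean of each trait per alignment group ("good" = heroes, "bad" = villains).
group_means = data.groupby("Alignment")[["Intelligence", "Strength"]].mean()
print(group_means.round(2))
```

With these values the villains average about 86.2 in intelligence against about 78.3 for the heroes, while the heroes average 69.5 in strength against about 49.3 for the villains.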

Try it on your own. If you get stuck, take a look at the Super Hero Data Matched Solution file.

In [2]:
import pandas as pd
import numpy as np

In [79]:
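# Read the first four columns (Name, Alignment, Intelligence, Strength) and keep the six hero rows.
# An explicit column list is used because recent pandas versions reject an integer `usecols`.
df_hero = pd.read_excel('superherodata-matched.xlsx', usecols=[0, 1, 2, 3]).iloc[:6]
df_hero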

Unnamed: 0,Name,Alignment,Intelligence,Strength
0,Iron Man,good,100.0,85.0
1,Captain America,good,63.0,65.0
2,Hulk,good,88.0,100.0
3,Hawkeye,good,62.0,12.0
4,Thor,good,69.0,100.0
5,Spider-Man,good,88.0,55.0


In [80]:
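# Skip the first 9 rows so reading starts at the first villain; there is no header row at that position.
df_villain = pd.read_excel('superherodata-matched.xlsx', usecols=[0, 1, 2, 3], skiprows=9, header=None).iloc[:6]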
df_villain.columns = df_hero.columns
df_villain

Unnamed: 0,Name,Alignment,Intelligence,Strength
0,Ultron,bad,95.0,83.0
1,Red Skull,bad,90.0,30.0
2,Abomination,bad,85.0,80.0
3,Bullseye,bad,75.0,11.0
4,Loki,bad,87.0,57.0
5,Green Goblin,bad,85.0,35.0


In [81]:
df = pd.concat([df_hero, df_villain], ignore_index=True)
df

Unnamed: 0,Name,Alignment,Intelligence,Strength
0,Iron Man,good,100.0,85.0
1,Captain America,good,63.0,65.0
2,Hulk,good,88.0,100.0
3,Hawkeye,good,62.0,12.0
4,Thor,good,69.0,100.0
5,Spider-Man,good,88.0,55.0
6,Ultron,bad,95.0,83.0
7,Red Skull,bad,90.0,30.0
8,Abomination,bad,85.0,80.0
9,Bullseye,bad,75.0,11.0
10,Loki,bad,87.0,57.0
11,Green Goblin,bad,85.0,35.0


In [82]:
df_match = pd.read_excel('superherodata-matched.xlsx', usecols=[5,6]).iloc[:6]
df_match

Unnamed: 0,Hero,Rival
0,Iron Man,Ultron
1,Captain America,Red Skull
2,Hulk,Abomination
3,Hawkeye,Bullseye
4,Thor,Loki
5,Spider-Man,Green Goblin
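
A paired test only makes sense if row *i* of the hero data and row *i* of the villain data belong to the same matchup. In this workbook the two blocks already appear in matching order, but the Hero/Rival table can be used to enforce the alignment explicitly. The following is a minimal sketch with the Strength values hard-coded from the tables above; the frame and column names (`heroes`, `villains`, `pairs`, `HeroStrength`, `RivalStrength`) are illustrative assumptions.

```python
import pandas as pd

# Strength values and pairings copied from the tables above.
heroes = pd.DataFrame({
    "Name": ["Iron Man", "Captain America", "Hulk", "Hawkeye", "Thor", "Spider-Man"],
    "Strength": [85, 65, 100, 12, 100, 55],
})
villains = pd.DataFrame({
    "Name": ["Ultron", "Red Skull", "Abomination", "Bullseye", "Loki", "Green Goblin"],
    "Strength": [83, 30, 80, 11, 57, 35],
})
match = pd.DataFrame({
    "Hero": ["Iron Man", "Captain America", "Hulk", "Hawkeye", "Thor", "Spider-Man"],
    "Rival": ["Ultron", "Red Skull", "Abomination", "Bullseye", "Loki", "Green Goblin"],
})

# Attach each hero's strength, then the matched rival's strength.
# validate="one_to_one" raises if a name appears more than once on either side.
pairs = (
    match
    .merge(heroes.rename(columns={"Name": "Hero", "Strength": "HeroStrength"}),
           on="Hero", validate="one_to_one")
    .merge(villains.rename(columns={"Name": "Rival", "Strength": "RivalStrength"}),
           on="Rival", validate="one_to_one")
)

# Per-pair difference: this is the quantity a paired t-test works on.
pairs["Difference"] = pairs["HeroStrength"] - pairs["RivalStrength"]
print(pairs)
```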


Run a t-test on the strength and intelligence of the heroes vs. the villains. Use a two-tailed, two-sample, equal-variance test. Are the differences between the heroes and villains statistically significant?

<a href='https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html'>scipy.stats.ttest_ind documentation</a>

In [84]:
from scipy import stats
np.random.seed(42)

In [87]:
#t-test to check if STRENGTH differences between the heroes and villains are statistically significant
stats.ttest_ind(df_hero.Strength, df_villain.Strength)

Ttest_indResult(statistic=1.1147174588142275, pvalue=0.291044408801752)

In [93]:
#t-test to check if INTELLIGENCE differences between the heroes and villains are statistically significant
stats.ttest_ind(df_hero.Intelligence, df_villain.Intelligence)

Ttest_indResult(statistic=-1.1205095545912165, pvalue=0.28868503197303447)
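
To see where the equal-variance statistic comes from, the Strength result above can be reproduced by hand with the textbook pooled-variance formula, t = (mean1 - mean2) / sqrt(s_p^2 * (1/n1 + 1/n2)). This is a sketch using only the standard library, with the Strength values hard-coded from the tables above; it illustrates the formula and is not a description of how SciPy computes the result internally.

```python
import math
import statistics

# Strength values copied from the hero and villain tables above.
hero_strength = [85, 65, 100, 12, 100, 55]
villain_strength = [83, 30, 80, 11, 57, 35]

n1, n2 = len(hero_strength), len(villain_strength)
mean_diff = statistics.mean(hero_strength) - statistics.mean(villain_strength)

# Pooled variance: the two sample variances (n - 1 denominator),
# weighted by their degrees of freedom.
pooled_var = (
    (n1 - 1) * statistics.variance(hero_strength)
    + (n2 - 1) * statistics.variance(villain_strength)
) / (n1 + n2 - 2)

# Two-sample t statistic and its degrees of freedom.
t_stat = mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
print(round(t_stat, 4), dof)
```

The statistic agrees with the `ttest_ind` output for Strength (about 1.1147) on 10 degrees of freedom.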

Now run a paired t-test on the strength and intelligence of the heroes vs. the villains. This test compares each hero/villain pair individually rather than collectively. Continue using a two-tailed test. Now are the differences between the heroes and villains statistically significant?

<a href='https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html'>scipy.stats.ttest_rel documentation</a>

In [90]:
#paired t-test to check if STRENGTH differences between the heroes and villains are statistically significant
stats.ttest_rel(df_hero.Strength, df_villain.Strength)

Ttest_relResult(statistic=2.911987470743906, pvalue=0.03332259946582927)
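
A paired t-test is equivalent to a one-sample t-test on the per-pair differences, tested against a mean of zero. The sketch below checks that equivalence for Strength, using values hard-coded from the tables above; the 0.05 significance level is a conventional choice assumed here, not something specified by the exercise.

```python
import numpy as np
from scipy import stats

# Strength values copied from the tables above, in matched hero/rival order.
hero_strength = np.array([85, 65, 100, 12, 100, 55], dtype=float)
villain_strength = np.array([83, 30, 80, 11, 57, 35], dtype=float)

# Per-pair differences (hero minus rival).
diffs = hero_strength - villain_strength

# The paired test and the one-sample test on the differences should agree.
paired = stats.ttest_rel(hero_strength, villain_strength)
one_sample = stats.ttest_1samp(diffs, popmean=0.0)

alpha = 0.05  # assumed significance level
significant = paired.pvalue < alpha
print(paired.statistic, one_sample.statistic, paired.pvalue, significant)
```

For Strength, the paired test clears the 5% threshold (p ≈ 0.033) even though the unpaired test above did not (p ≈ 0.291), because pairing removes the variation between matchups and leaves only the within-pair differences.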

In [92]:
#paired t-test to check if INTELLIGENCE differences between the heroes and villains are statistically significant
stats.ttest_rel(df_hero.Intelligence, df_villain.Intelligence)

Ttest_relResult(statistic≈-1.4327, pvalue≈0.2114)

**Solution:**

<a href='http://video.udacity-data.com.s3.amazonaws.com/topher/2016/September/57d80da4_superherodata-matched-solution/superherodata-matched-solution.xlsx'>Super Hero Data Matched Solution</a>