## Stroop Effect 

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

### 1. What is our independent variable? What is our dependent variable?

##### Independent Variable

* Word condition: congruent words or incongruent words
* In this experiment we examine whether reaction time changes when the color named by the word and the ink (font) color are the same or different.

##### Dependent Variable

The dependent variable is the reaction time the participant takes to name the ink (font) colors in each type of list.

### 2. What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

##### Why?

To examine whether reaction time differs between the congruent and incongruent word conditions.

##### What type of test?

* According to the null hypothesis:
* Incongruent words do not increase reaction time: the mean time for incongruent words is the same as, or less than, the mean time for congruent words.

* Null Hypothesis (H0): $\mu_D$ <= 0, where $\mu_D$ = μi − μc is the difference between the population mean incongruent word time μi and the population mean congruent word time μc

* According to the alternate hypothesis: there is a significant increase in reaction time for the incongruent words condition.
* Alternate Hypothesis (HA): $\mu_D$ > 0, with $\mu_D$ = μi − μc defined in the same way

We will run a paired t-test to check the difference between the two means. Because the same subject is tested under both conditions, the two samples are dependent, so this is a dependent (paired) two-sample t-test. We chose a t-test rather than a z-test because the population parameters are unknown: the data are only a sample of the population, the sample size is less than 30, and we do not know the population standard deviation. The test is one-sided because we expect the difference either to be statistically indistinguishable from zero or to show that incongruent words take longer.

The test statistic will be the following:

$\Large t = \frac{\bar{x_i}-\bar{x_c}}{\frac{s}{\sqrt{n}}}$

* Where $\bar{x_i}$ represents the sample mean of incongruent times
* $\bar{x_c}$ represents the sample mean of congruent times
* standard error (se) = $\frac{s}{\sqrt{n}}$
* Where s represents the sample standard deviation of the paired differences and n the number of pairs

We also assume n-1 degrees of freedom, and consider the results statistically significant at $\alpha$ = 0.05
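
The formula above can be sketched in plain Python. This is only an illustration: the function name `paired_t_statistic` and the toy timings are assumptions made for the example, not part of the Stroop dataset.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for the paired differences first[i] - second[i]."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    s = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean(diffs) / (s / sqrt(n)), n - 1


# Made-up timings in seconds, for illustration only (not the Stroop data).
toy_incongruent = [20.0, 22.0, 19.0, 25.0]
toy_congruent = [12.0, 15.0, 13.0, 16.0]

t_value, dof = paired_t_statistic(toy_incongruent, toy_congruent)
print("t = {:.4f}, df = {}".format(t_value, dof))
```

Because the test is paired, the two columns collapse into a single list of differences, and the calculation is the same as a one-sample t-test on those differences.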

A right-tailed test (sometimes called an upper-tailed test) is one where the alternate hypothesis contains a greater than (>) symbol. In other words, the inequality points to the right.

* Ex: Null hypothesis: no change (H0: μ = 1).
* Ex: Alternate hypothesis: HA: μ > 1.

The important factor here is that the alternate hypothesis (HA), not the null hypothesis, determines whether you have a right-tailed test.
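
As a rough sketch of how the tail direction changes the decision rule, the hypothetical helper below compares a t-statistic with a critical value. The function name and the example t values are assumptions for illustration. The constant 1.714 is the one-tailed t-table value for $\alpha$ = 0.05 with 23 degrees of freedom, used here as an example.

```python
def reject_null(t_stat, t_crit, tail):
    """Decide a one-tailed t-test; t_crit is the positive critical value."""
    if tail == "right":   # HA: mean difference > 0
        return t_stat > t_crit
    if tail == "left":    # HA: mean difference < 0
        return t_stat < -t_crit
    raise ValueError("tail must be 'right' or 'left'")


T_CRIT = 1.714  # one-tailed critical value, alpha = 0.05, 23 degrees of freedom

print(reject_null(2.5, T_CRIT, "right"))   # statistic beyond the upper critical value
print(reject_null(1.2, T_CRIT, "right"))   # statistic inside the acceptance region
print(reject_null(-2.5, T_CRIT, "left"))   # mirrored case for a left-tailed test
```

Flipping the order of subtraction in the differences flips the sign of the t-statistic, which turns a right-tailed test into the equivalent left-tailed one.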


In [None]:
#Importing the required libraries
import pandas as pd
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style("white")

* We are using the Stroop dataset here
* We will print out the first few records of the data frame
* The output shows the variables in the Stroop data frame

In [None]:
#Read the data into a pandas dataframe and add a subject column
stroop_data = pd.read_csv('C:\\Users\\alive\\Documents\\Data Analyst Nano Degree\\inferential statistics\\stroopdata.csv')
stroop_data['Subject'] = stroop_data.index + 1
stroop_data.head()

In [None]:
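# Additional column with the time difference (Congruent - Incongruent).
# Negative values mean the incongruent list took longer; this is the negative of (incongruent - congruent) used in the hypotheses.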

stroop_data['time_diff'] = stroop_data['Congruent'] - stroop_data['Incongruent']
stroop_data

#### Summary Statistics that describe variable's numeric values

In [None]:
stroop_data.shape

i.e. 24 rows and 4 columns (`Congruent`, `Incongruent`, `Subject`, `time_diff`)

In [None]:
stroop_data.columns

Above are the column names


In [None]:
# Sum method (total of columns)
stroop_data.sum()

### 3. Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.


In [None]:
# Median method for middle value
stroop_data.median()

In [None]:
# Mean of columns
stroop_data.mean()

In [None]:
stroop_data.count()

In [None]:
# Max values of each variable in data frame
stroop_data.max()


In [None]:
#call the .idxmax method to identify the row of the max value
congruent_max  = stroop_data.Congruent
congruent_max.idxmax()

10 represents index value of row where the max value is

In [None]:
#call the .idxmax method to identify the row of the max value
incongruent_max  = stroop_data.Incongruent
incongruent_max.idxmax()

14 represents index value of row where the max value is

In [None]:
#.std method calculates the standard deviation for each column
stroop_data.std()

In [None]:
#.var method calculates variance in columns
stroop_data.var()

In [None]:
#unique values in a variable (congruent)
congruent_unique = stroop_data.Congruent
congruent_unique.value_counts().head()


In [None]:
#unique values in a variable(incongruent)
incongruent_unique = stroop_data.Incongruent
incongruent_unique.value_counts().head()


In [None]:
#Entire statistical description
stroop_data.describe()

* The difference between the mean times for the incongruent and congruent conditions = 7.96 seconds
* The standard deviation of the paired differences = 4.86 seconds
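
The standard deviation of the paired differences is not the same thing as the difference between the two standard deviations. The sketch below shows this on made-up timings (the `toy_` lists are assumptions for illustration, not the Stroop data), using only the standard library.

```python
from statistics import mean, stdev

# Made-up timings in seconds, for illustration only.
toy_congruent = [12.0, 15.0, 13.0, 16.0]
toy_incongruent = [20.0, 22.0, 19.0, 25.0]
toy_diff = [c - i for c, i in zip(toy_congruent, toy_incongruent)]

# The mean of the differences always equals the difference of the means...
print("mean of differences:", mean(toy_diff))
print("difference of means:", mean(toy_congruent) - mean(toy_incongruent))

# ...but the same is not true for the standard deviation.
print("std of differences:", round(stdev(toy_diff), 4))
print("difference of stds:", round(stdev(toy_congruent) - stdev(toy_incongruent), 4))
```

The paired t-test uses the standard deviation of the differences, which is why the `time_diff` column is the one that matters for the test.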

### 4. Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

In [None]:
stroop_data['Congruent'].plot(kind='hist')

Most participants complete the congruent test in between 9 and 18 seconds, i.e. around the mean of 14 seconds.

In [None]:
stroop_data['Incongruent'].plot(kind='hist')

* As you can see, most participants complete the incongruent test in between 16 and 25 seconds, i.e. right around the mean of 22 seconds
* The data also has some outliers at around 35 seconds.
* So we can say from the above two graphs that the congruent lists were read faster than the incongruent lists.


In [None]:
stroop_data[['Congruent','Incongruent']].boxplot();

In the boxplot above, we can see a clear difference between the medians (Q2) of the congruent and the incongruent boxplots.

* The Congruent Boxplot has no outliers
* While the incongruent plot shows two outliers at around 35 seconds.

The congruent data is slightly negatively skewed, as suggested by the mean (14.0511) being smaller than the median (14.3565).

The incongruent data is slightly positively skewed, as suggested by the median (21.0175) being smaller than the mean (22.0159).
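
The mean-versus-median comparison can be written as a small helper. This is a rule-of-thumb sketch rather than a formal skewness test, and the function name `skew_direction` is an assumption made for the example. The numbers are the means and medians quoted above.

```python
def skew_direction(mean_value, median_value):
    """Rule of thumb: a mean below the median suggests left skew, above suggests right skew."""
    if mean_value < median_value:
        return "negative (left) skew"
    if mean_value > median_value:
        return "positive (right) skew"
    return "roughly symmetric"


print("Congruent:", skew_direction(14.0511, 14.3565))
print("Incongruent:", skew_direction(22.0159, 21.0175))
```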

### 5. Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

##### Degrees of freedom:
The degrees of freedom in our case are n − 1, where n represents the number of pairs (subjects in this case).

In [None]:
n=24
df = n-1
df

In [None]:
mean_of_differences =   -7.964792
std_of_differences =   4.864827
print("The mean of the difference: {:.4f}".format(mean_of_differences))
print("The standard deviation of the difference: {:.4f}".format(std_of_differences))

In [None]:
#Standard Error of the mean
Stderr_mean = std_of_differences/float(sqrt(n))
print("Standard Error of the mean value: {:.4f}".format(Stderr_mean))

In [None]:
# Calculate t-statistic

t_statistic_one_tail = mean_of_differences/float(Stderr_mean)

print("t-statistic value: {:.4f}".format(t_statistic_one_tail))
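
Written out with the numbers used in the cells above, the calculation is as follows. The differences here are congruent minus incongruent, so the statistic is negative.

$\Large se = \frac{s}{\sqrt{n}} = \frac{4.864827}{\sqrt{24}} \approx 0.9930$

$\Large t = \frac{\bar{x_c}-\bar{x_i}}{se} = \frac{-7.964792}{0.9930} \approx -8.02$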

In [None]:
from scipy import stats

# t-critical value at alpha = 0.05 and n = 24 for a one-tailed t-test.
# time_diff is Congruent - Incongruent, so the rejection region is the lower tail.

t_critical_one_tail = stats.t.ppf(0.05, df)
print("t-critical value at alpha of 0.05 for one-tailed t-test: {:.4f}".format(t_critical_one_tail))

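So I have conducted a one-tailed test:

* Sample size n = 24
* Degrees of freedom df = 23
* 𝞪 = 0.05
* t critical = -1.714 (lower tail, because the difference is computed as congruent minus incongruent)
* The sample mean difference xc - xi = -7.96; the sample standard deviation of the differences (std) = 4.86
* t-statistic = -8.02

Because the t-statistic (-8.02) is far below the critical value (-1.714), it falls in the rejection region.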

If the p-value is less than alpha, the null hypothesis should be rejected.

In [None]:
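#One-tailed p-value from the cumulative distribution function.
#time_diff is Congruent - Incongruent, so the evidence against H0 lies in the lower tail.

pval = stats.t.cdf(t_statistic_one_tail, df)

print("p-value: {:.4e}".format(pval))

The one-tailed p-value is about 2.05e-08, half of the two-tailed value of 4.1030e-08. This means that, if the null hypothesis were true, the chance of observing a difference at least this extreme would be about 0.00000002. Our p-value is far lower than our significance level α (0.05), so we reject the null hypothesis. That means participants need more time to say the color of the ink in the incongruent words list, which is the direction stated in the alternate hypothesis.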

In [None]:
#Paired t-test on response time for congruent vs incongruent words
#(ttest_rel reports a two-sided p-value by default)
print(stats.ttest_rel(stroop_data['Congruent'],stroop_data['Incongruent']))

In [None]:
#Confidence intervals (CI) are a useful statistic to include 
#because they indicate where the true population mean might be. 
#It is common to report 95% confidence intervals.
stats.norm.interval(0.95, loc = mean_of_differences, scale = Stderr_mean)

Confidence interval = (-9.91, -6.02)

We can be 95% confident that the true mean difference (congruent minus incongruent) lies between -9.91 and -6.02 seconds, i.e. on average participants took about 6.0 to 9.9 seconds longer to name the ink colors in the incongruent words condition than in the congruent one.
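
The interval above comes from `stats.norm.interval`, which uses the normal distribution. With only 24 pairs, a t-based interval is a common alternative and comes out slightly wider. The sketch below is one way to compute it using only the standard library. The constant 2.0687 is taken from a t-table (95% confidence, 23 degrees of freedom) and is an assumption of this example, as are the variable names.

```python
from math import sqrt

mean_of_differences = -7.964792  # Congruent - Incongruent, as in the cells above
std_of_differences = 4.864827
n = 24
T_CRIT_TWO_SIDED = 2.0687  # t-table value: 95% confidence, 23 degrees of freedom

stderr_mean = std_of_differences / sqrt(n)
margin = T_CRIT_TWO_SIDED * stderr_mean
ci_low = mean_of_differences - margin
ci_high = mean_of_differences + margin

print("95% CI (t-based): ({:.2f}, {:.2f})".format(ci_low, ci_high))
```

Either way the interval lies entirely below zero, so the conclusion is unchanged.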

### 6. Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect?


When we read, the brain automatically processes the meaning of words, whereas recognizing colors is not an “automatic process”. The two processes interfere when the brain has to name the ink color of a mismatched color word, i.e. when the ink color is different from the color the word names. The experiment supports the idea that when a color word is printed in the same color as the word names, people can name the ink color more quickly.

Similar effects:

* Compare normal words with words turned upside down
* Compare full words with their corresponding shortcut (abbreviated) words.


##### References

https://en.wikipedia.org/wiki/Stroop_effect

http://www.statstutor.ac.uk/resources/uploaded/paired-t-test.pdf

http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests

http://www.statisticshowto.com/p-value/

http://www.statisticshowto.com/how-to-decide-if-a-hypothesis-test-is-a-left-tailed-test-or-a-right-tailed-test/

https://www.youtube.com/watch?v=rWFDXt-MlNs

