<center> <img src="https://yildirimcaglar.github.io/ds3000/ds3000.png"> </center>

<center> <h2> Independent-Samples t Test </h2></center>

## Outline
1. <a href='#1'>SciPy</a>
2. <a href='#2'>Exploratory Data Analysis</a>
3. <a href='#3'>Independent-Samples t Test</a>
4. <a href='#4'>Assumption Checks</a>
5. <a href='#5'>Reporting the Results</a>

<a id="1"></a>

## 1. SciPy
* Fundamental library for scientific computing
    * https://docs.scipy.org/doc/scipy/reference/
* SciPy has a special module, stats, dedicated to common statistical tests used in data analysis
    * https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

In [None]:
from scipy import stats

In [None]:
import pandas as pd

### 1.1. Dataset from a Between-Subjects Experiment

In [None]:
data = pd.read_csv("res/wand_candles_data.csv")
data

<a id="2"></a>

## 2. Exploratory Data Analysis
* Involves checking the descriptive stats and visualizing the data before conducting the test

### 2.1. Descriptive Statistics

In [None]:
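# TODO in video. One possible completion (an assumption; the video may differ):
descriptives = data.groupby("Group")[["Candles"]].agg(["count", "mean", "std", "sem"])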
descriptives
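As a rough illustration of grouped descriptive statistics, the sketch below builds a tiny made-up DataFrame and summarizes it per group with pandas. The column names `Group` and `Score` and the values are hypothetical and are not the wand/candles data.

```python
import pandas as pd

# Made-up scores for illustration only; not the wand/candles dataset
toy = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Score": [4, 6, 8, 1, 2, 3],
})

# One row per group; columns form a two-level index: (variable, statistic)
toy_descriptives = toy.groupby("Group")[["Score"]].agg(["count", "mean", "std", "sem"])
print(toy_descriptives)

# Selecting the variable drops the outer column level
toy_score_stats = toy_descriptives["Score"]
print(toy_score_stats)
```

Selecting the outer column label leaves a plain table with one column per statistic, which is the shape a plotting function typically expects.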

### 2.2. Visualizing the Data

In [None]:
descriptives = descriptives["Candles"]
descriptives

In [None]:
descriptives.reset_index(inplace=True)

In [None]:
import plotly.express as px
# Vertical bars need error bars on the y axis only
graph = px.bar(descriptives, x="Group", y="mean", error_y="sem", template='none', width=500,
               labels={"mean": "Number of Candles", "Group": "Wand Used"})

graph.update_traces(marker_color="#d4202f")
graph.update_traces(marker= dict(line={"width":3,"color":"#000000"}))

graph.update_xaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})
graph.update_yaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})


graph.show()

<a id="3"></a>

## 3. Independent-Samples t Test
* Use the **ttest_ind()** function in SciPy's stats module
* **ttest_ind()** accepts two sequence-like objects (lists, Series, etc.), one holding the scores of each group being compared
    * **ttest_ind(group_a, group_b)**
* **ttest_ind()** returns a tuple-like result containing the calculated t statistic and p-value
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
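To make the computation concrete, the sketch below reproduces the t statistic by hand using the pooled-variance formula, which is what ttest_ind() applies under its default `equal_var=True` setting. The two lists are made-up scores for illustration, not the wand data, and the variable names are hypothetical.

```python
import math
import statistics

# Made-up scores for two independent groups (illustration only)
group_a = [12, 15, 14, 16, 13]
group_b = [9, 11, 10, 8, 12]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Sample variances (n - 1 in the denominator)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# Standard error of the difference between the two means
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_by_hand = (mean_a - mean_b) / se_diff
print(t_by_hand)  # about 4.0 for these scores
```

Passing the same two lists to ttest_ind() should give a matching t statistic, up to floating-point rounding.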

In [None]:
stats.ttest_ind?

In [None]:
elder_wand = data[data["Group"] == "Elder Wand"]["Candles"]
elder_wand

In [None]:
regular_wand = data[data["Group"] == "Regular Wand"]["Candles"]
regular_wand

In [None]:
# TODO in video. Refer to the corresponding video.
stats.ttest_ind(elder_wand, regular_wand)

### 3.1. t Test Results
* The ttest_ind() function returns a tuple-like result containing the calculated t statistic and p-value
    * the first element of the tuple is the t statistic
    * the second element of the tuple is the p-value
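Besides indexing, the result can be unpacked like a tuple or read through its `statistic` and `pvalue` attributes. A minimal sketch with made-up scores (the list names are hypothetical):

```python
from scipy import stats

# Made-up scores for illustration only
group_a = [12, 15, 14, 16, 13]
group_b = [9, 11, 10, 8, 12]

result = stats.ttest_ind(group_a, group_b)

# Tuple-style unpacking
t_value, p_value = result

# Attribute access on the same result object
print(result.statistic, result.pvalue)
```

Attribute access tends to read more clearly than `results[0]` and `results[1]` when the values are used far from the call.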

In [None]:
results = stats.ttest_ind(elder_wand, regular_wand)

In [None]:
#t value
tstatistic = results[0]
tstatistic

In [None]:
#p value in scientific notation
pvalue = results[1]
pvalue

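### 3.2. Displaying the p-value in Fixed-Point Notation
* When the p-value is extremely small, Python displays it in scientific notation (e.g., 3.2e-07).
* The built-in **format()** function returns a string showing the value in fixed-point notation; the p-value itself is already a float.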

In [None]:
format(1.243345543049043, ".2f")

In [None]:
format(pvalue, '.10f')
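Note that format() produces text rather than a new number. The sketch below uses a made-up p-value to show this, along with the equivalent f-string form:

```python
small_p = 3.2e-07  # made-up value for illustration

as_text = format(small_p, ".10f")
print(as_text)        # fixed-point text with 10 decimal places
print(type(as_text))  # format() returns a str, not a float

# An f-string applies the same format specification
same_text = f"{small_p:.10f}"
print(same_text)
```

Because the result is a string, keep the original float around for any further calculations or comparisons.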

### 3.3. Degrees of Freedom
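Degrees of freedom (df) are the number of scores that remain free to vary once the two group means have been estimated.
* Older SciPy releases do not include the degrees of freedom in the ttest_ind() result; recent releases (1.11 and later, per the SciPy documentation) expose it as the **df** attribute of the result.
* Either way, we can calculate it ourselves!
* For an independent-samples t test with equal variances assumed, df is calculated as follows:
    * df = n1 + n2 - 2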

In [None]:
df = len(elder_wand) + len(regular_wand) - 2
df
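A sketch that works across SciPy versions: it computes df by hand and, where the installed release provides a `df` attribute on the result, uses that value instead. The scores are made up and the variable names are hypothetical.

```python
from scipy import stats

# Made-up scores for illustration only
group_a = [12, 15, 14, 16, 13]
group_b = [9, 11, 10, 8, 12]

result = stats.ttest_ind(group_a, group_b)

manual_df = len(group_a) + len(group_b) - 2

# Use result.df when this SciPy version provides it; otherwise use the manual value
df_value = getattr(result, "df", manual_df)
print(manual_df, df_value)
```

With equal variances assumed, both values should agree.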

In [None]:
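def report_independent_t(t, p, df):
    # Report very small p-values as "p<.001" rather than a misleading "p=0.000"
    p_text = "p<.001" if p < 0.001 else "p=%.3f" % p
    print("t(%d)=%.2f, %s" % (df, t, p_text))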

In [None]:
report_independent_t(tstatistic, pvalue, df)

<a id="4"></a>

## 4. Assumption Checks
* The independent-samples t test makes two assumptions that can be checked from the data:
    * Assumption of equality of variances
    * Assumption of normality


### 4.1. Checking for Equality of Variances
* Levene’s Test of Equality of Variances
    * Use the **levene()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/stats.html

* **levene()** returns a tuple containing the test statistic and p-value
    * You want a non-significant result (p > .05), which gives no evidence that the group variances differ

In [None]:
levene_results = stats.levene(elder_wand, regular_wand)
levene_results
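If Levene's test comes out significant, a common fallback is Welch's t test, which does not assume equal variances and is available through the `equal_var=False` argument of ttest_ind(). The sketch below uses made-up scores and a conventional .05 cutoff; it is one way to connect the check to the test, not a step taken from this notebook.

```python
from scipy import stats

# Made-up scores for illustration only
group_a = [12, 15, 14, 16, 13]
group_b = [9, 11, 10, 8, 12]

levene_stat, levene_p = stats.levene(group_a, group_b)

# Non-significant Levene result: keep the equal-variance assumption
equal_variances = bool(levene_p > 0.05)

# equal_var=False would switch ttest_ind() to Welch's t test
result = stats.ttest_ind(group_a, group_b, equal_var=equal_variances)
print(equal_variances, result.statistic, result.pvalue)
```

Note that Welch's test uses a different degrees-of-freedom formula, so df = n1 + n2 - 2 applies only to the equal-variance case.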

### 4.2. Checking for Normality
* Shapiro-Wilk Test of Normality
    * Use the **shapiro()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro

* **shapiro()** returns a tuple containing the test statistic and p-value
    * You want a non-significant result (p > .05), which gives no evidence of a departure from normality

* Pass in the full set of scores, not a summary of them

In [None]:
shapiro_results = stats.shapiro(data["Candles"])
shapiro_results
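The cell above tests the pooled scores. Some analysts instead check normality separately within each group, since the assumption concerns the distribution of scores within each condition. A minimal sketch of that variant, using made-up scores and hypothetical group labels:

```python
from scipy import stats

# Made-up scores for illustration only
toy_groups = {
    "A": [12, 15, 14, 16, 13],
    "B": [9, 11, 10, 8, 12],
}

shapiro_p_values = {}
for label, scores in toy_groups.items():
    # shapiro() returns the W statistic and the p-value
    statistic, p_value = stats.shapiro(scores)
    shapiro_p_values[label] = p_value
    print(label, statistic, p_value)
```

With samples this small the test has little power, so a non-significant result is weak evidence of normality.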

<a id="5"></a>

  
## 5. Reporting the Results
* Report
    * descriptives
    * assumption checks
    * t statistic, degrees of freedom, and p-value
    * a bar graph

In [None]:
descriptives

In [None]:
tstatistic, pvalue, df

In [None]:
levene_results

In [None]:
shapiro_results

In [None]:
graph.show()