In [None]:
%matplotlib inline

In [None]:
%pip install pandas seaborn

In [None]:
import pandas as pd
import seaborn as sns

# Pandas and DataFrames

Often, we have tables of data: collections of named columns arranged in rows. The **Pandas** package gives us a **DataFrame** class that lets us index these columns by name, the same way we index dicts, while keeping the benefits of NumPy arrays, so we can still write vectorized code.

Let's start playing with the analysis now.  We'll examine Pandas in more depth in the coming days.

## Today's Dataset: Mental Rotation Psychology Experiment

![Mental Rotation Task Example](http://mercercognitivepsychology.pbworks.com/f/1353970952/mental-rotation-image.gif)

## Loading the Data

Load the file "MentalRotation.csv" from the URL below using `pd.read_csv()` and store the result in a variable named `df`, which the examples in this notebook assume. Then use it to answer the following questions about the results of the Mental Rotation psychology experiment. If you reach the end of the exercises, explore the dataset and DataFrames further and see what else you can find out about this experiment!

In [None]:
url = "https://raw.githubusercontent.com/nickdelgrosso/CodeTeachingMaterials/main/datasets/MentalRotation.csv"
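As a rough sketch of how `pd.read_csv()` behaves, the example below reads a small CSV from an in-memory string instead of a URL, so it runs without a network connection. The text and the column names (`Fruit`, `Count`, `Price`) are made up for illustration and are not part of the Mental Rotation dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file (made-up columns).
csv_text = """Fruit,Count,Price
apple,3,0.50
pear,5,0.75
plum,2,0.20
"""

# pd.read_csv accepts a URL, a file path, or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy)
print(toy.dtypes)  # the first row became column names; types were inferred
```

Passing the `url` variable in place of `io.StringIO(csv_text)` works the same way.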

## Examining the Dataset

| With Slicing | With Method | With Function |
| :-- | :-- | :-- |
| `df[:5]` | `df.head()` |   |
| `df[-5:]` | `df.tail()` |  |
|  | `df.sample(5)` |   | 
|  | `df.info()` |   |
|  | `df.describe()` |   |
|  | `df.shape[0]` | `len(df)` |
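To see how the entries in this table relate to each other, here is a minimal sketch on a toy DataFrame. The `Trial` and `Score` columns are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy table with made-up columns, ten rows long.
toy = pd.DataFrame({
    "Trial": range(1, 11),
    "Score": [7, 9, 4, 8, 6, 5, 9, 3, 8, 7],
})

first_three = toy.head(3)   # same rows as toy[:3]
last_three = toy.tail(3)    # same rows as toy[-3:]
random_two = toy.sample(2)  # two rows chosen at random
n_rows = len(toy)           # same value as toy.shape[0]

print(first_three, last_three, random_two, n_rows, sep="\n\n")
```

Both `head()` and `tail()` default to 5 rows when called without an argument.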

Print the first 5 lines of the dataset.

Look at the last 5 lines of the dataset.

Check 3 random lines in the dataset.

How many total trials (rows) are in the study?

## Calculating Values on Columns

| Method | Example |
| :-- | :-- |
| `.max()` | `df['Height'].max()` |
| `.min()` | `df['Weight'].min()` |
| `.mean()` | `df['Time'].mean()` |
| `.median()` | `df['Speed'].median()` |
| `.value_counts()` | `df['Kind'].value_counts()` |
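The sketch below applies these methods to a toy DataFrame. The columns `Kind`, `Weight`, and `Passed` are invented for the example. It also shows one handy trick: taking the mean of a column of 0s and 1s gives the proportion of 1s.

```python
import pandas as pd

# Toy table with made-up columns.
toy = pd.DataFrame({
    "Kind": ["cat", "dog", "cat", "bird", "cat", "dog"],
    "Weight": [4.0, 20.0, 5.0, 0.5, 3.0, 30.0],
    "Passed": [1, 1, 0, 1, 0, 1],
})

heaviest = toy["Weight"].max()            # largest value in the column
middle = toy["Weight"].median()           # middle value after sorting
pass_rate = toy["Passed"].mean()          # mean of 0/1 values = proportion of 1s
kind_counts = toy["Kind"].value_counts()  # number of rows for each unique value

print(heaviest, middle, pass_rate)
print(kind_counts)
```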


What is the maximum number of trials that one subject performed?

What was the median reaction time across all subjects?

What was the average accuracy rate (i.e. proportion of correct trials) across all subjects?

How many trials were shown at each Angle?

How many trials were answered correctly and incorrectly, for each angle? (hint: `df[['A', 'B']]`)

In [None]:
df[['Angle', 'Correct']].value_counts()

Angle  Correct
0      1          1216
50     1          1198
100    1          1121
150    1          1052
       0           204
100    0           140
50     0            82
0      0            58
dtype: int64

### Making New Columns

| Syntax | 
| :-- |
| `df['NewCol'] = df['OldCol'] * 10` |
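As a small illustration of the syntax above, this sketch adds two columns to a toy DataFrame: one computed with arithmetic and one converted to another type with `.astype()`. All column names here are hypothetical.

```python
import pandas as pd

# Toy table with made-up columns.
toy = pd.DataFrame({
    "DurationMs": [1500, 250, 3200],
    "Passed": [1, 0, 1],
})

# Arithmetic on a column is applied to every row at once (vectorized).
toy["DurationSecs"] = toy["DurationMs"] / 1000

# .astype() converts a whole column to another type.
toy["HasPassed"] = toy["Passed"].astype(bool)

print(toy)
```

Assigning to a column name that does not exist yet creates it. Assigning to an existing name overwrites that column.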

Make a "TimeSecs" column that converts the Time column from milliseconds to seconds by dividing it by 1000.

Make an "IsCorrect" column by converting the "Correct" column to *bool* (True/False) values.

### Logical Indexing

| Syntax |
| :-- |
| `df[df['Time'] > 3]` |

Example: How many trials used an angle of 150?

In [None]:
len(df[df['Angle'] == 150])

1256
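The same pattern extends to more than one condition. The sketch below, on a toy DataFrame with made-up columns (`Player`, `Level`, `Score`), shows that a comparison produces a column of True/False values, and that two such masks can be combined with `&` as long as each one is wrapped in parentheses.

```python
import pandas as pd

# Toy table with made-up columns.
toy = pd.DataFrame({
    "Player": [1, 1, 2, 2, 2],
    "Level": [10, 20, 10, 20, 20],
    "Score": [5.0, 7.0, 6.0, 9.0, 3.0],
})

# A comparison gives one True/False value per row.
mask = toy["Level"] == 20
n_level_20 = mask.sum()  # True counts as 1, so this counts matching rows

# Combine conditions with & (and) or | (or); the parentheses are required.
both = toy[(toy["Player"] == 2) & (toy["Level"] == 20)]
mean_score = both["Score"].mean()

print(n_level_20, mean_score)
```

Python's `and` keyword does not work here, because it cannot compare two columns row by row.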

How many trials had response times longer than 3 seconds?

What was the accuracy of subject 9?

What was the average response time of subject 32?

What was the average response time for subject 12 on trials with an Angle of 50? (Hint: `(A) & (B)`)

Was there an overall difference in response accuracy between matching and non-matching trials?

Is there a response time difference between matching and non-matching trials?

## Group By

| Syntax |
| :-- |
| `df.groupby('Age').Time.mean()` |

Example: What was the response accuracy for matching and non-matching trials?

In [None]:
df.groupby('Matching').IsCorrect.mean()

Matching
0    0.909163
1    0.899961
Name: IsCorrect, dtype: float64
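Grouping also works with several columns at once: pass a list of column names to `groupby()`. The sketch below uses a toy DataFrame with invented columns (`Team`, `Round`, `Score`) and selects the value column with brackets, which behaves the same as the attribute style shown above.

```python
import pandas as pd

# Toy table with made-up columns.
toy = pd.DataFrame({
    "Team": ["a", "a", "a", "b", "b", "b"],
    "Round": [1, 1, 2, 1, 2, 2],
    "Score": [2.0, 4.0, 6.0, 1.0, 3.0, 5.0],
})

# One grouping column: one mean per team.
by_team = toy.groupby("Team")["Score"].mean()

# Two grouping columns: one mean per (team, round) combination.
by_team_round = toy.groupby(["Team", "Round"])["Score"].mean()

print(by_team)
print(by_team_round)
```

The result of grouping by two columns is indexed by pairs of values, so a single entry can be looked up with `by_team_round.loc[("a", 1)]`.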

Example: What was the response accuracy for each Angle?

What was the response accuracy for each Angle and Matching/Nonmatching value?

What was the average response time for each Angle and Matching/Nonmatching value?

What was the average response time for each Angle and Matching/Nonmatching value, for each subject?

## Plotting with Pandas

| Syntax |
| :-- |
| `df['Column'].plot(kind='hist')` |
| `df['Column'].plot.hist()` |
| `df.hist('Column', by='Group')` |
| `df.plot(x='Age', y='Height', kind='scatter')` |
| `df.plot.scatter(x='Age', y='Height')` |
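Plotting methods also work on the result of a calculation, not only on the original columns. As a minimal sketch on a toy DataFrame (the `Group` and `Score` columns are made up), the example below computes a mean per group and draws it as a bar chart.

```python
import pandas as pd

# Toy table with made-up columns.
toy = pd.DataFrame({
    "Group": ["a", "a", "b", "b"],
    "Score": [1.0, 3.0, 2.0, 6.0],
})

# groupby(...).mean() returns a Series, which has its own .plot() method.
group_means = toy.groupby("Group")["Score"].mean()

# .plot() returns a Matplotlib Axes object that can be customized further.
ax = group_means.plot(kind="bar")
ax.set_ylabel("Mean Score")
```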

Plot the response time distribution as a histogram.

Plot the average response time for each stimulus category (matching and non-matching)

Is there a correlation between Angle of mental rotation and response time? Visualize the relationship using a scatter plot.

Is there a relationship between subject age and average response time? Visualize the relationship using a box plot.

Did participants get faster or slower as they did more trials? Visualize the relationship using a scatter plot.

Plot the response time distribution, with a separate subplot for each subject.

## Plotting with Seaborn

| Syntax |
| :-- |
| `sns.catplot(data=df, x='Col1', y='Col2', hue='Col3', kind='bar')` |
| `sns.lineplot(data=df, x='Col1', y='Col2', hue='Col3')` |
| `sns.lmplot(data=df, x='Col1', y='Col2', hue='Col3')` |

Is there a difference between average response time for matching and non-matching trials?

Is there a correlation between Angle of mental rotation and response time? Visualize the relationship.

Does the relationship between Angle of mental rotation and response time differ between stimulus categories?

Does the relationship between Angle of mental rotation and response time in each stimulus category differ between participants younger than 22 and participants older than 22?

Did participants get faster or slower as they did more trials? Visualize the relationship using a line plot.