# Data Dunkers Analysis - Notre Dame

This notebook will allow us to analyse the data we collected using the [Google Form](https://docs.google.com/forms/d/e/1FAIpQLSe2Xe8Iz1NWpEYhv0gPa1eO9fJiQ_wRjeRyGlEK9w2mbHcTJw/viewform).

The first step is to import the collected data. Click on the cell below, then click the play button on its left.

In [None]:
import pandas as pd
df = pd.read_csv('https://docs.google.com/spreadsheets/d/1WNS4hQpYv_UqrDS9fbxZaDMGRWCgLACwPLyabCihUUA/export?format=csv')

print(f'There are {df.shape[0]} rows and {df.shape[1]} columns of data\n')
for column in df.columns:
    print(column)
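
`pd.read_csv` accepts a file-like object as well as a URL, so the import step can be tried without the spreadsheet. The sketch below reads a few made-up rows and confirms that `shape` is a `(rows, columns)` pair. The column names and values are hypothetical, not the survey data.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the Google Form export
csv_text = """Initials,Height (cm),Wingspan (cm),Handedness
DD43,170,172,Right
PS43,158,155,Left
AB12,181,185,Right
"""

# read_csv works on any file-like object, not just a URL
sample_df = pd.read_csv(io.StringIO(csv_text))

rows, columns = sample_df.shape  # shape is a (rows, columns) tuple
print(f'There are {rows} rows and {columns} columns of data')
```

The sketch uses the name `sample_df` so that running it does not overwrite the real `df`.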

## Filtering and Sorting

In the code cell below, change the line `me = 'DD43'` to your initials and number. For example, `me = 'PS43'`.

Then run the cell to **filter** the data to show just your row.

In [None]:
me = 'DD43'  # change this to your initials and number

# Show only the row(s) whose Initials match
df[df['Initials'] == me]
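
The expression inside the square brackets is a *boolean mask*: a column of `True`/`False` values, one per row, and `df[mask]` keeps only the `True` rows. Here is a minimal sketch with hypothetical rows. It also shows that conditions can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses.

```python
import pandas as pd

# Hypothetical rows, not the collected survey data
sample_df = pd.DataFrame({
    'Initials': ['DD43', 'PS43', 'AB12'],
    'Height (cm)': [170, 158, 181],
    'Handedness': ['Right', 'Left', 'Right'],
})

mask = sample_df['Initials'] == 'PS43'  # True only for the matching row
print(mask.tolist())                    # [False, True, False]
just_me = sample_df[mask]
print(just_me)

# Combine conditions with & (and) or | (or); the parentheses are required
tall_right = sample_df[(sample_df['Height (cm)'] > 165) & (sample_df['Handedness'] == 'Right')]
print(tall_right['Initials'].tolist())  # ['DD43', 'AB12']
```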

Run the following code cell to **sort** the values by the last column and display the top **10**.

In [None]:
print(f'Sorting by: {df.columns[-1]}')

df.sort_values(by=df.columns[-1], ascending=False).head(10)
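
With `ascending=False`, the largest values come first, so `.head(10)` gives the ten highest. The default `ascending=True` would give the ten lowest instead. pandas also has `DataFrame.nlargest`, which returns the same top rows for a numeric column. Here is a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical heights, not the survey data
sample_df = pd.DataFrame({
    'Initials': ['DD43', 'PS43', 'AB12', 'CD05'],
    'Height (cm)': [170, 158, 181, 165],
})

# Largest first, keep the top 2
top = sample_df.sort_values(by='Height (cm)', ascending=False).head(2)

# Same rows using nlargest (works on numeric columns only)
top_alt = sample_df.nlargest(2, 'Height (cm)')

# Smallest first instead (ascending=True is the default)
bottom = sample_df.sort_values(by='Height (cm)').head(2)

print(top['Initials'].tolist())     # ['AB12', 'DD43']
print(bottom['Initials'].tolist())  # ['PS43', 'CD05']
```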

To display a list of the available columns, run the following code cell.

In [None]:
df.columns
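
Sorting, scatterplots and correlation maps work best with numeric columns. `df.dtypes` shows how pandas stored each column: `int64` and `float64` are numbers, while text columns show up as `object` (or a string type in newer pandas versions). Here is a sketch with hypothetical columns:

```python
import pandas as pd

# Hypothetical columns, not the survey data
sample_df = pd.DataFrame({
    'Initials': ['DD43', 'PS43'],
    'Height (cm)': [170, 158],
    'Wingspan (cm)': [172.5, 155.0],
})

print(sample_df.dtypes)

# Names of the numeric columns only, using the same select_dtypes filter as the correlation map
numeric_columns = sample_df.select_dtypes(include=['number']).columns.tolist()
print(numeric_columns)  # ['Height (cm)', 'Wingspan (cm)']
```

A column you expected to be numeric might show up as text. This probably means some entries contain non-numeric characters, such as units typed into the form.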

### Your Turn

In the code cell below, edit the code to sort by one of the columns. Paste the exact column title between the quotes, for example `df.sort_values(by='Height (cm)', ascending=False)`.

In [None]:
df.sort_values(by='', ascending=False)  # put a column title between the quotes
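
`by` also accepts a list of column titles. The data is sorted by the first column, and any ties are broken by the next one. `ascending` can then be a list with one entry per column. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical rows, not the survey data
sample_df = pd.DataFrame({
    'Initials': ['DD43', 'PS43', 'AB12', 'CD05'],
    'Handedness': ['Right', 'Left', 'Right', 'Left'],
    'Height (cm)': [170, 158, 181, 165],
})

# Alphabetical by Handedness, then tallest first within each group
result = sample_df.sort_values(by=['Handedness', 'Height (cm)'], ascending=[True, False])
print(result['Initials'].tolist())  # ['CD05', 'PS43', 'AB12', 'DD43']
```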

## Graphing

### Scatterplot

We can create a scatterplot from two data columns.

*You may need to change the column titles if these are not available in your data set.*

In [None]:
import plotly.express as px
px.scatter(df, x='Height (cm)', y='Wingspan (cm)')

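You can also add a `title` and a `trendline`. Here `trendline='ols'` draws an ordinary least squares line of best fit. It needs the `statsmodels` package to be installed.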

In [None]:
px.scatter(df, x='Height (cm)', y='Wingspan (cm)', title='Wingspan vs Height', trendline='ols')
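
An ordinary least squares trendline is the straight line `y = slope * x + intercept` that makes the squared vertical distances to the points as small as possible. As a rough sketch, NumPy's `polyfit` with degree 1 computes the same kind of line. The heights and wingspans below are hypothetical.

```python
import numpy as np

# Hypothetical heights and wingspans in cm, not the survey data
heights = np.array([150.0, 160.0, 170.0, 180.0])
wingspans = np.array([148.0, 161.0, 169.0, 182.0])

# A degree-1 polyfit is a least squares straight-line fit
slope, intercept = np.polyfit(heights, wingspans, 1)
print(f'slope = {slope:.3f}, intercept = {intercept:.3f}')

# Predicted wingspan for a 175 cm person, read off the fitted line
predicted = slope * 175 + intercept
print(predicted)
```

The slope tells you roughly how many extra centimetres of wingspan go with each extra centimetre of height in this made-up data.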

### Bar Charts and Histograms

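A bar chart works in a similar way. A histogram counts how many rows have each value (or fall in each range of values) of a single column.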

In [None]:
px.bar(df, x='Timestamp', y='Height (cm)', title='Height Over Time')

In [None]:
px.histogram(df, x='Handedness', title='Handedness')
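
For a text column like `Handedness`, the histogram shows how many rows have each value. `value_counts()` gives those same counts as a table, which can be handy for checking the graph. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical answers, not the survey data
sample_df = pd.DataFrame({'Handedness': ['Right', 'Left', 'Right', 'Right', 'Ambidextrous']})

# Number of rows for each distinct value, most common first
counts = sample_df['Handedness'].value_counts()
print(counts)
print(counts['Right'])  # 3
```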

### Your Turn

Try creating a graph of your own in a new code cell, using one of the examples above as a starting point.

## Correlation Map

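We can also create a correlation map to see how the numeric variables are related. Each number is a correlation coefficient between -1 and 1:

- Values close to 1 mean two variables tend to increase together.
- Values close to -1 mean one variable tends to decrease as the other increases.
- Values near 0 mean there is little straight-line relationship.

The diagonal is always 1 because each variable is perfectly correlated with itself.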

In [None]:
numeric_df = df.select_dtypes(include=['number'])
numeric_df.corr()
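
By default, `DataFrame.corr()` computes the Pearson correlation coefficient. This is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand and compares the result with pandas. The columns, including `Sprint time (s)`, are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, not the survey data
sample_df = pd.DataFrame({
    'Height (cm)': [150.0, 160.0, 170.0, 180.0],
    'Wingspan (cm)': [148.0, 161.0, 169.0, 182.0],
    'Sprint time (s)': [9.0, 8.0, 7.0, 6.0],
})

def pearson(x, y):
    # Centre each column on its mean
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of products divided by the square root of the product of the sums of squares
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson(sample_df['Height (cm)'], sample_df['Wingspan (cm)'])
r_pandas = sample_df.corr().loc['Height (cm)', 'Wingspan (cm)']
print(round(r_manual, 4), round(r_pandas, 4))  # both about 0.996

# A negative value means one column goes down as the other goes up
r_sprint = sample_df.corr().loc['Height (cm)', 'Sprint time (s)']
print(r_sprint)  # about -1 for this made-up data
```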

You can also add [colors](https://matplotlib.org/stable/gallery/color/colormap_reference.html) and number formatting to the correlation map.

In [None]:
numeric_df.corr().style.background_gradient(cmap='cividis').format(precision=3)
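
When a correlation map has many columns, it can help to list each pair of columns from the strongest relationship to the weakest. The sketch below assumes a numeric DataFrame like `numeric_df` above, filled with hypothetical values here. It visits each pair once and sorts by absolute value, so strong negative correlations rank alongside strong positive ones.

```python
from itertools import combinations

import pandas as pd

# Hypothetical numeric columns, not the survey data
sample_df = pd.DataFrame({
    'Height (cm)': [150.0, 160.0, 170.0, 180.0],
    'Wingspan (cm)': [148.0, 161.0, 169.0, 182.0],
    'Sprint time (s)': [9.0, 8.0, 7.0, 6.0],
})

corr = sample_df.corr()

# One row per pair of different columns (each pair listed once)
rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
pairs = pd.DataFrame(rows, columns=['Column A', 'Column B', 'r'])

# Strongest relationships first, whether positive or negative
strongest = pairs.sort_values(by='r', key=lambda s: s.abs(), ascending=False)
print(strongest)
```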

### Your Turn

Is there anything surprising about the correlation map?

... your answer here ...

## The End

Congratulations, you have completed this notebook. Hopefully you can use these examples to create your own graphs.

Check out other resources at [DataDunkers.ca](https://datadunkers.ca).

You can also try [chart.misterhay.com](https://chart.misterhay.com) for generating graphs and Python code.