![Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

# Your Own Data - Measurements

In this notebook, you will measure and record your own basketball-related data, then compare it with the existing data set.

## Data Collection

You will need:

* a measuring tape or ruler
* a basketball and hoop (or a crumpled paper and a waste basket)

Open [this form](https://docs.google.com/forms/d/e/1FAIpQLSeKwSCTExShTYpVzSqx_8Ik36yuFtyOTefqJWa1h0PSXTCwnw/viewform) and record your data.

## Data Analysis

Run the following code cell to import the data set and display its last five rows.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

url = 'https://docs.google.com/spreadsheets/d/1ToPtuvdbMKDqhS3uod08I94D2lKA3h7ONg4gOuKmlEY/export?format=csv'
df = pd.read_csv(url)
df.tail()

Now that you have the data, run the following code cell to compare heights, wingspans, hand spans, and shoe sizes.

In [None]:
px.scatter(df, x='Height (cm)', y='Wingspan (cm)', color='Hand span (cm)', size='Shoe size', hover_data=['Initials'], title='Height vs Wingspan<br><sub>Dot size represents Shoe Size</sub>')

We can also add an annotation to show where your data fit in.

In the first line of the following cell, replace `'RC78'` in `me = 'RC78'` with your initials, exactly as you entered them in the form.

In [None]:
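me = 'RC78'  # replace with your initials, exactly as entered in the form

my_data = df[df['Initials'] == me]  # keep only the rows whose initials match
fig = px.scatter(df, x='Height (cm)', y='Wingspan (cm)', color='Hand span (cm)', size='Shoe size', hover_data=['Initials'], title='Height vs Wingspan<br><sub>Dot size represents Shoe Size</sub>')
if my_data.empty:
    # Without this check, .values[0] raises an IndexError when no row matches
    print(f'No row found for {me}. Check the spelling and capitalization.')
else:
    fig.add_annotation(x=my_data['Height (cm)'].values[0], y=my_data['Wingspan (cm)'].values[0], text=me, showarrow=True)
fig.show()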
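
The expression `df[df['Initials'] == me]` above is a boolean mask: the comparison produces one `True` or `False` per row, and only the `True` rows are kept. The following is a minimal sketch of that idea on made-up rows. The `sample` table and the `find_row` helper are hypothetical and are not part of the lesson's data or code.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the form responses.
sample = pd.DataFrame({
    'Initials': ['AB12', 'CD34', 'EF56'],
    'Height (cm)': [160, 172, 181],
    'Wingspan (cm)': [158, 175, 186],
})

def find_row(data, initials):
    """Return the first row whose 'Initials' match, or None if nothing matches."""
    matches = data[data['Initials'] == initials]  # boolean mask keeps matching rows
    if matches.empty:
        return None
    return matches.iloc[0]

row = find_row(sample, 'CD34')
print(row['Height (cm)'], row['Wingspan (cm)'])

# Initials that are not in the table give no match at all.
print(find_row(sample, 'ZZ99'))
```

The match is exact, so `'cd34'` or `'CD34 '` (with a trailing space) would not find the `'CD34'` row.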

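We can also look at how strongly each pair of numeric columns in our data set is correlated. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.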

In [None]:
px.imshow(df.corr(numeric_only=True), title='Correlation Matrix for Student Data', height=800, text_auto='.3f')
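
Each cell in the correlation matrix is, by default, a Pearson correlation coefficient. As a rough sketch of what that number means, the code below computes it by hand for two columns and compares the result with `corr`. The `sample` values are made up for illustration and are not taken from the student data.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, made up for illustration only.
sample = pd.DataFrame({
    'Height (cm)': [150.0, 160.0, 170.0, 180.0],
    'Wingspan (cm)': [148.0, 162.0, 171.0, 183.0],
})

x = sample['Height (cm)']
y = sample['Wingspan (cm)']

# Pearson correlation: how much x and y vary together,
# scaled by how much each one varies on its own.
manual_r = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

# pandas computes the same value for every pair of numeric columns.
pandas_r = sample.corr(numeric_only=True).loc['Height (cm)', 'Wingspan (cm)']

print(round(manual_r, 3), round(pandas_r, 3))
```

A strong correlation only says that two columns tend to rise and fall together. It does not show that one causes the other.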

We can also choose two columns to compare. The code cell below prints the available column names, then groups the data by `x_column` and plots the mean of `y_column` for each group.

You can change the values of `x_column` and `y_column` to compare different columns.

In [None]:
for column in df.columns:
    print(column)

x_column = 'Handedness'
y_column = 'Reaction time (ms)'
px.bar(df.groupby(x_column).mean(numeric_only=True), y=y_column, title=f'Average {y_column} versus {x_column}')
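
To see what `groupby(...).mean(numeric_only=True)` returns before it is plotted, here is a small sketch on made-up responses. The `sample` table is hypothetical, and the group sizes are printed only as extra context.

```python
import pandas as pd

# Hypothetical responses, made up for illustration only.
sample = pd.DataFrame({
    'Handedness': ['Left', 'Right', 'Right', 'Left', 'Right'],
    'Reaction time (ms)': [250, 300, 280, 270, 320],
    'Height (cm)': [160, 172, 181, 158, 169],
})

# One row per group; each numeric column holds that group's mean.
group_means = sample.groupby('Handedness').mean(numeric_only=True)
print(group_means)

# How many rows went into each group's mean.
group_sizes = sample.groupby('Handedness').size()
print(group_sizes)
```

Checking the group sizes is worthwhile because a mean based on only a few rows can shift a lot when a single response changes.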

## Questions

1. Which columns in the dataset are categorical? Can we only group by categorical columns?
2. For reaction time, is a value lower than the average better or worse? Explain your reasoning.
3. If a student entered their height in meters instead of centimeters (e.g. 1.75 instead of 175), how would this affect the calculated average height?

---

### Online Access
You can run this notebook online using the following links:

*   [**Google Colab**](https://colab.research.google.com/github/Data-Dunkers/student/blob/main/activities/your-data-measurements.ipynb)
*   [**Callysto Hub**](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Fstudent&branch=main&subPath=activities/your-data-measurements.ipynb&depth=1)