# Robot Data Exploration Notebook

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Learning Objectives

In this notebook you will:
- Review an introduction to the Robot Lab
- Learn about loading data into a Jupyter notebook
- Practice two methods of loading data into Jupyter
- Learn about different types of visualizations
- Create some of your own visualizations from your data

<hr>
<hr>

## Introduction

Now that you’ve assembled your _DASH robot_ and tested it, you have a better sense of its capabilities. You might have found it to be a bit of a challenge to get it to run in a straight line when you were testing for **maximum running speed**. Maybe you tried to stabilize it with a couple of **washers**. But then you might have rethought that strategy when you tried to measure the **maximum gap distance** crossed.

Our goal in this project is not only to verify that we can make a biologically inspired _hexapedal_ robot, but also to make a product that can be relied on to perform consistently. 

In this notebook, we’ll try to get a sense of how consistent your team’s robot was with the rest of the class. You’ll load data from the whole class, and you’ll visualize it in different ways.

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Setup

<div class="alert alert-block alert-info">
    <p style="font-size:20px">
        Just run this cell to make sure we import the correct modules, and everything will work!
    </p>
</div>

In [None]:
import matplotlib.pyplot as plt
from datascience import *
from utils import *
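# The "seaborn" style was renamed "seaborn-v0_8" in Matplotlib 3.6, so use whichever name is available
plt.style.use("seaborn" if "seaborn" in plt.style.available else "seaborn-v0_8")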
%matplotlib inline

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Getting the Data

<hr>
<hr>

### Methods in Making Tables

In this portion of the notebook, we will use two different methods to create tables in Jupyter Notebooks using the data we gathered. 

- **Method 1**: Create arrays that represent columns and put them together to make a table

    > This is a bit tedious
- **Method 2**: Upload a `.csv` file of the data

    > This is a much easier way to have the computer do the work for you!

<div class="alert alert-block alert-success">
    <p style="font-size:20px">
        Remember, if you have questions on any of the terminology in this notebook, we define all of it for you in Notebook 0; 
        feel free to use that as a resource to help you complete this notebook!
    </p>
</div>

<hr>
<hr>

### Method 1: Appending Columns

The first method involves creating arrays of our desired column values and assigning each array to a variable. 

<hr>

#### Example 1

Here we are using data from a past semester. We will only enter the first $10$ rows because, as you will see, typing in data by hand gets repetitive.

After we create all of the columns, we need to add them to a table. However, we don't have a table yet, so we have to make a new one. We do this with the line that says `old_data = Table()`. 

Finally, we need to name the columns and add them to the table, which we do on the line that begins with `old_data = old_data.with_columns(...)`.

In [None]:
# Each variable will represent a column in our table
team_numbers = [1, 2, 4, 4, 4, 4, 4, 4, 5, 7]

max_speed = [0.467, 0.938521, 0.861822, 0.983234, 0.9525, 0.888746, 0.870966, 0.870966, 0.57, 1.31]

gap_crossing_speed = [0.467, 0.938521, 0.938521, 0.938521, 0.938521, 0.938521, 0.938521, 0.889, 0.05, 1.31]

max_distance = [5.08, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 8.9, 3.0, 4.0]

old_data = Table()

old_data = old_data.with_columns(
        "Team Number", team_numbers, 
        "Max Running Speed", max_speed, 
        "Gap Crossing Speed", gap_crossing_speed, 
        "Max Gap Crossing Distance", max_distance)

old_data

As you can see from above, we made our table just by typing in numbers!
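
One thing to keep in mind with this method is that the values are matched up by position: the first entry of every list describes the same row, so every list should have the same number of entries. The sketch below illustrates that idea in plain Python (it does not use the `datascience` library) with the first three values from Example 1. The names `lengths` and `rows` are made up for this illustration.

```python
# First three values of two columns from Example 1
team_numbers = [1, 2, 4]
max_speed = [0.467, 0.938521, 0.861822]

# Every column should have the same number of entries
lengths = {len(team_numbers), len(max_speed)}
print(len(lengths) == 1)

# zip pairs the values up by position, giving one tuple per table row
rows = list(zip(team_numbers, max_speed))
for row in rows:
    print(row)
```

If one list were shorter than the others, some rows would be missing a value, which is why it is worth counting your entries before building the table.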

<hr>

#### Question 1


Now it's your turn! Create a table with the **first $10$ rows** of the data in the _Google Sheet_ where you recorded your data!

<hr style="border-style:dashed">

##### Question 1.0

Create a blank table, just as we did above.

In [None]:
# Replace the ... with the appropriate code
new_data = Table() #SOLUTION

<hr style="border-style:dashed">

##### Question 1.1

Enter the data for the `Team Number` column and choose a name for that column, then do the same for the rest of the columns!

<b>Don't forget to run all of the cells!</b>

In [None]:
# Replace the ... with the **first 10** column values from the robot data you gathered.
team_number = [...] #SOLUTION: this should be the first 10 values from the Team Numbers column on the google sheet
# Replace the ... with the name that you have chosen for the column.
team_col_name = "..." #SOLUTION: this should be the name of the column they chose for the team number, can be any string

In [None]:
# Replace the ... with the **first 10** column values from the robot data you gathered.
running_number = [...] #SOLUTION: this should be the first 10 values from the Running Speed column on the google sheet
# Replace the ... with the name that you have chosen for the column.
running_col_name = "..." #SOLUTION: this should be the name of the column they chose for the Running Speed column, can be any string

In [None]:
# Replace the ... with the **first 10** column values from the robot data you gathered.
cross_speed = [...] #SOLUTION: this should be the first 10 values from the Crossing Speed column on the google sheet
# Replace the ... with the name that you have chosen for the column.
cross_speed_col_name = "..." #SOLUTION: this should be the name of the column they chose for the Crossing Speed column, can be any string

In [None]:
# Replace the ... with the **first 10** column values from the robot data you gathered.
cross_distance = [...] #SOLUTION: this should be the first 10 values from the Crossing Distance column on the google sheet
# Replace the ... with the name that you have chosen for the column.
cross_distance_col_name = "..." #SOLUTION: this should be the name of the column they chose for the Crossing Distance column, can be any string

In [None]:
# Replace the ... with the **first 10** column values from the robot data you gathered.
washers = [...] #SOLUTION: this should be the first 10 values from the Washers column on the google sheet
# Replace the ... with the name that you have chosen for the column.
washers_col_name = "..." #SOLUTION: this should be the name of the column they chose for the Washers column, can be any string

<hr style="border-style:dashed">

##### Question 1.2


Now that we have made the columns for the data, we just need to add them to the blank table we made earlier!

Fill in the code below to create the table with all the data you just entered.

> _**HINT:** You only need to use the variables you defined in part 1.1_

In [None]:
new_data = new_data.with_columns(
    # Replace the ... in the line below with 
    # the name of the column and the values (in that order)
    # for: **team number**
    team_col_name, team_number, #SOLUTION
    # Replace the ... in the line below with 
    # the name of the column and the values (in that order)
    # for: **running speed**
    running_col_name, running_number, #SOLUTION
    # Replace the ... in the line below with 
    # the name of the column and the values (in that order)
    # for: **crossing speed**
    cross_speed_col_name, cross_speed, #SOLUTION
    # Replace the ... in the line below with 
    # the name of the column and the values (in that order)
    # for: **crossing distance**
    cross_distance_col_name, cross_distance, #SOLUTION
    # Replace the ... in the line below with 
    # the name of the column and the values (in that order)
    # for: **washers**
    washers_col_name, washers #SOLUTION
    )
new_data

You did it! You made the table just by typing! That works for data with only a few rows, but it quickly gets tiring. 

So, let's see how we can have the computer do this work for us!

<hr>
<hr>

### Method 2: Uploading a `csv` File

As you can see, the previous method is rather tedious, especially if we want to create a large table with many rows and columns. In Method 2, we create a table by uploading the `csv` file containing the data. This is the faster, preferred method.

<hr>

#### Question 2.1: Downloading the `csv` File from Google Sheets

In order to upload data onto _Jupyter Notebooks_, we need to convert the existing Google Sheet into a `csv` (_Comma Separated Values_) file. 

To do so, open the Google Sheet you want to download and go to `File > Download > Comma Separated Values (csv)`.

**You can also run the cell below to display a 7-second tutorial**

In [None]:
play_video("sheets")
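
For a sense of what a `csv` file actually contains, here is a minimal sketch using Python's built-in `csv` module. The file contents are typed in as a short string instead of being read from a real download, and they reuse two columns and two rows from Example 1; your own file will have different column names and more rows.

```python
import csv
import io

# A tiny csv "file" held in a string: the first line is the header,
# and each later line is one row (values taken from Example 1)
csv_text = "Team Number,Max Running Speed\n1,0.467\n2,0.938521\n"

# csv.DictReader uses the header line as the keys for each row
reader = csv.DictReader(io.StringIO(csv_text))
parsed_rows = list(reader)

print(reader.fieldnames)
for row in parsed_rows:
    print(row["Team Number"], row["Max Running Speed"])
```

Notice that the commas are what separate one value from the next. Read this way, every value comes back as text, so turning the text into numbers is part of the work that a table-loading function does for you.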

<hr>

#### Question 2.2: Uploading the `csv` File to Jupyter

Now, we need to upload the `csv` file to Jupyter. Run the cell below to show a very short clip on how to upload your data.

In [None]:
play_video("upload")

<hr>

#### Question 2.3: Rename the `csv` File

The last step in the process is to change the name of the file. This will make our task of typing out the name of the file much easier! Again, run the cell below for a clip on how to rename a file.

In [None]:
play_video("rename")

<hr>

#### Question 2.4: Load the `csv`

We're almost done! All we have to do is run the next cell. The only thing you need to do is replace the `...` with the name of the file you chose!

> _**HINT:** Make sure the file name ends with `.csv`_

**Note:** The data may be a little messy, so we defined our own function (called `clean`) to clean it up for you. Don't worry about understanding how it works!

In [None]:
file_name = "..." #SOLUTION: this should be the name of the file they uploaded to the datahub and then renamed
my_data = Table.read_table(file_name)
my_data = clean(my_data)
my_data
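
If the cell above complains that it cannot find your file, a quick check of the file name can help. The sketch below is one possible way to do that in plain Python. The helper `check_csv_name` and the file name `robot_data.csv` are made up for this illustration, and it assumes the file was uploaded to the same folder as this notebook.

```python
from pathlib import Path


def check_csv_name(name):
    """Return a list of problems found with a csv file name (empty if none)."""
    problems = []
    if not name.endswith(".csv"):
        problems.append("name does not end with .csv")
    if not Path(name).is_file():
        problems.append("no file with this name in the current folder")
    return problems


# "robot_data.csv" is a placeholder: use the name you chose for your file
print(check_csv_name("robot_data.csv"))
```

An empty list means neither problem was found. Anything else tells you which part of the name to fix before running the loading cell again.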

Voilà! Now, let's make some plots to investigate this data!

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Visualizations

Now we will begin visualizing the data you have collected! In this section, we will guide you through the steps of creating three different types of graphs. This continues what you learned about visualizations in Notebook 0 (again, feel free to use it as a reference here).

<hr>

### Graph Definitions:

- A **scatter plot** consists of one point for each row of the table. 
    - Its first argument is the label of the column to be plotted on the horizontal axis
    - Its second argument is the label of the column on the vertical axis
- A **histogram** is a plot that represents the distribution of a single variable
    - Its first and only argument is the label of the numerical column whose distribution you would like to view
- A **bar graph** is a plot that shows how a variable differs across groups
    - Its first argument is the column for the groups
    - Its second argument is the column for the values
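
To make the idea of a distribution more concrete, here is a minimal sketch in plain Python of the counting behind a histogram, using the maximum running speeds from Example 1. The bin edges are chosen just for this illustration. The `hist` method picks its own bins and may scale the bar heights differently (for example, as percentages), so the plots you make later will not match this picture exactly.

```python
# Max running speeds from Example 1
max_speed = [0.467, 0.938521, 0.861822, 0.983234, 0.9525,
             0.888746, 0.870966, 0.870966, 0.57, 1.31]

# Bin edges chosen for this illustration: each bin is 0.25 wide
edges = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]

# Count the values that fall in each bin (left edge included, right edge excluded)
counts = [0] * (len(edges) - 1)
for value in max_speed:
    for i in range(len(counts)):
        if edges[i] <= value < edges[i + 1]:
            counts[i] += 1

# Draw one row of # marks per bin as a rough text histogram
for i, count in enumerate(counts):
    print(f"{edges[i]} to {edges[i + 1]}: {'#' * count}")
```

Most of these ten speeds land in the bin from 0.75 to 1.0, with only a few values on either side. That clustering is the kind of pattern a histogram makes easy to spot.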

<hr>

### Question 3

Create a scatter plot using two variables for each sub-part.

**Syntax:**
```python
my_data.scatter("column name for x-axis", "column name for y-axis")
```

<hr style="border-style:dashed">

#### Question 3.1

Create a scatter plot of _`Maximum Running Speed`_ on the **x-axis** and _`Maximum Gap Distance Crossed`_ on the **y-axis**

In [None]:
my_data.scatter("Maximum Running Speed", "Maximum Gap Distance Crossed") #SOLUTION

<hr style="border-style:dashed">

#### Question 3.2

Create a scatter plot of _`Maximum Running Speed`_ on the **x-axis** and _`Gap Crossing Speed`_ on the **y-axis**

In [None]:
my_data.scatter("Maximum Running Speed", "Gap Crossing Speed") #SOLUTION

<hr style="border-style:dashed">

#### Question 3.3

Create a scatter plot of _`Maximum Gap Distance Crossed`_ on the **x-axis** and _`Gap Crossing Speed`_ on the **y-axis**

In [None]:
my_data.scatter("Maximum Gap Distance Crossed", "Gap Crossing Speed") #SOLUTION

<hr>

### Question 4

Create a histogram using one variable for each sub-part.

**Syntax:** 
```python
my_data.hist("column name")
```

<hr style="border-style:dashed">

#### Question 4.1

Create a histogram of the _`Maximum Running Speed`_ column

In [None]:
my_data.hist("Maximum Running Speed") #SOLUTION

<hr style="border-style:dashed">

#### Question 4.2

Create a histogram of the _`Gap Crossing Speed`_ column

In [None]:
my_data.hist("Gap Crossing Speed") #SOLUTION

<hr style="border-style:dashed">

#### Question 4.3

Create a histogram of the _`Maximum Gap Distance Crossed`_ column

In [None]:
my_data.hist("Maximum Gap Distance Crossed") #SOLUTION

<hr style="border-style:dashed">

#### Question 4.4

Create a histogram of the _`Washers`_ column

In [None]:
my_data.hist("Washers") #SOLUTION

<hr>

### Question 5

Create a bar graph using two variables for each sub-part.

**Syntax:**
```python
my_data.bar("column for groups", "column for values")
```

<hr style="border-style:dashed">

#### Question 5.1

Plot a bar graph using the _`Team Number`_ column for the groups and the _`Maximum Running Speed`_ column for the values

In [None]:
my_data.bar("Team Number", "Maximum Running Speed") #SOLUTION

<hr style="border-style:dashed">

#### Question 5.2

Plot a bar graph using the _`Team Number`_ column for the groups and the _`Gap Crossing Speed`_ column for the values

In [None]:
my_data.bar("Team Number", "Gap Crossing Speed") #SOLUTION

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Conclusion

Now that you’ve visualized the data in a variety of ways, here are some questions to consider:
- What patterns do you see when visualizing the performance characteristics of the _DASH robots_?
- What visualizations are most helpful for the different performance characteristics?
- Are there any other relationships between performance characteristics that you’d like to explore further?

This notebook is also designed to **prepare you for future projects** where you might need to work with a completely different set of data. The notebook walks you through different ways to upload data and visualize it. It can serve as a resource for you to return to in the future. After you’ve completed the steps and used the notebook to think about the _DASH robot’s_ performance characteristics, feel free to get creative. Visualize data in new ways to see if you can find meaningful relationships, load a new set of data if you have one, or just try playing around with the code.

<hr>
<hr>

### Explore Data Science

If you enjoyed working with data in Jupyter notebooks, you may want to learn more about the Data Science community here at Cal. The following resources are a great place to look!

- [Data Science Department Homepage](https://data.berkeley.edu)
- [Data Science Modules](https://ds-modules.github.io/)
- [Data Science Course Offerings at Berkeley](https://data.berkeley.edu/academics/undergraduate-programs/courses)
- [Data 8 Course Information (Intro to Data Science)](https://data8.org) 

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## A Final Request: Feedback Form

<div class="alert alert-block alert-info">
    <p style="font-size:20px">We encourage you to fill out the following feedback form to share your experience with this notebook, which was created by UC Berkeley Data Science Undergraduate Studies – Modules. The form will take no longer than 5 minutes. We appreciate all feedback, which helps us improve student learning and the experience of using Jupyter Notebooks for data science education. You can fill out the survey by running the cell below and clicking the button that appears. Thank you in advance for your time!</p>
</div>

In [5]:
feedback_button()