![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Turtles and Data

This notebook uses what is probably the largest turtle command data set ever assembled (that said, we only spent about 5 minutes looking before just making one). Using this notebook and our previous knowledge of turtles, let's learn some basics about data and pandas in Python.

## Pandas

Pandas (Python Data Analysis Library, name derived from PANel DAta) is a free library for Python which makes loading and manipulating data more straightforward by taking advantage of something known as a **data frame**. Think of a data frame as something like a spreadsheet, but better. Let's see what that looks like. First we need to import some code libraries.

In [None]:
import pandas as pd
import numpy as np 
from mobilechelonian import Turtle
from helper.turthelp import turtle_data
print('Libraries successfully imported.')

### Loading Data 

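Here we load a `csv` (comma-separated values) file containing the data set. A copy is stored on the Cybera cloud (the commented-out URL in the cell below), but the cell as written reads the local copy, `turtles-drawings.csv`. Loading is done with the pandas function `read_csv`, where the argument we pass it is the file path or URL.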

In [None]:
#csv_file = "https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_233e84cd313945c992b4b585f7b9125d/callysto-open-data/turtles_drawings.csv"
csv_file = "./turtles-drawings.csv"
df = pd.read_csv(csv_file)
df

You'll notice that there are nearly 4000 rows of data - more than we'd want to deal with in a spreadsheet. You'll also note that the rows don't appear to be in any particular order - we're going to have to manipulate our data frame in order to draw shapes!

But first, let's go over what each column is:

1. `angle`, number or `NaN` (not a number): The angle through which the turtle will turn, if the `direction` is a turn.
2. `direction`, number or `NaN`: The direction (forward or backward, or left or right turn) the turtle will move.
3. `length`, number or `NaN`: The distance (in pixels) the turtle should move forward or backward, if `direction` is forward/backward.
4. `order`, number: The order in which commands should be run to draw a given shape (counting starts at 0).
5. `shape`, string: The shape the row is associated with.
6. `color`, string: The color the pen will draw with.
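
One way to check these column descriptions against a data frame is to ask pandas what it inferred. The sketch below builds a small made-up sample instead of reading the real file; the three rows and the direction values `forward` and `left` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the real file has many more rows.
sample_csv = io.StringIO(
    "angle,direction,length,order,shape,color\n"
    ",forward,100,0,box,green\n"
    "90,left,,1,box,green\n"
    ",forward,100,2,box,green\n"
)
sample = pd.read_csv(sample_csv)

print(sample.dtypes)             # data type pandas inferred for each column
print(sample["shape"].unique())  # distinct shapes present in the sample
print(sample.isna().sum())       # number of NaN values in each column
```

Empty fields in the file become `NaN`, which is why `angle` and `length` can be either a number or `NaN`.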

### A few basic Pandas Commands 

The list below will use the name `df`, short for data frame, to represent the variable name that any data frame is assigned to.

### Selecting Data

To select a single column, use 
```python 
df["column_name"]
```
To select multiple columns, use 
```python 
df[["list","of","column","names"]]
```
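
As a small worked example of both forms, the sketch below uses a made-up frame called `toy` (a hypothetical stand-in for the turtle data, with only three of its columns).

```python
import pandas as pd

# Hypothetical toy frame reusing some of the turtle data's column names.
toy = pd.DataFrame({
    "order": [0, 1, 2],
    "shape": ["box", "box", "box"],
    "color": ["green", "green", "green"],
})

one_column = toy["color"]              # single brackets give a Series
two_columns = toy[["order", "shape"]]  # a list of names gives a DataFrame

print(type(one_column).__name__)
print(list(two_columns.columns))
```

Note the double brackets in the second form: the inner pair is an ordinary Python list of column names.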

### Filtering Data

To filter data to only rows where a column is a certain value, use
```python
df[df['column_name'] == value]
``` 

To combine multiple requirements, wrap each one in parentheses, such as
```python
df[(df['column_name1'] == value1) & (df['column_name2'] == value2)]
``` 
where `&` means "and". This command means we're looking for **rows** where `column_name1` is equal to `value1` **AND** where `column_name2` is equal to `value2`. 

Similarly, for "or", which is represented by a vertical bar `|`, use
```python
df[(df['column_name1'] == value1) | (df['column_name2'] == value2)]
``` 
This command means we're looking for **rows** where `column_name1` is equal to `value1` **OR** where `column_name2` is equal to `value2`.
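
To see the difference between the three filters, here is a minimal sketch on a made-up four-row frame (the name `toy` and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy frame: two boxes and two stars in various colors.
toy = pd.DataFrame({
    "shape": ["box", "box", "star", "star"],
    "color": ["green", "red", "green", "yellow"],
})

boxes = toy[toy["shape"] == "box"]                                       # one condition
green_boxes = toy[(toy["shape"] == "box") & (toy["color"] == "green")]   # both must hold
box_or_green = toy[(toy["shape"] == "box") | (toy["color"] == "green")]  # either may hold

print(len(boxes), len(green_boxes), len(box_or_green))  # prints: 2 1 3
```

The parentheses matter: `&` and `|` bind more tightly than `==` in Python, so leaving them out changes what is being compared.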


### Sorting Values

To sort alphanumerically by a single column use
```python
df.sort_values(by="column_name")
``` 

To sort alphanumerically by multiple columns, in the order that your list is given, use
```python
df.sort_values(by=["list", "of", "names"])
``` 
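
The sketch below tries both kinds of sort on a made-up frame (`toy` and its values are hypothetical). It also shows that `sort_values` returns a new, sorted data frame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical toy frame with rows deliberately out of order.
toy = pd.DataFrame({
    "color": ["red", "green", "red", "green"],
    "order": [1, 1, 0, 0],
})

by_order = toy.sort_values(by="order")            # sort on one column
by_both = toy.sort_values(by=["color", "order"])  # color first, then order within each color

print(by_both)
print(toy["order"].tolist())  # the original frame keeps its row order
```

Because the result is a new data frame, assign it to a variable (or pass it straight to another function) if you want to keep the sorted version.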

## Using the `turtle_data` Function to Draw Shapes

We've provided a simple function, `turtle_data`, which parses a data frame like the one above and draws the corresponding turtle graphics, so you can focus on manipulating the data frame. It takes the turtles DataFrame as its input and handles the rest, as the cell below demonstrates. **Note**: The drawing produced below will not look correct, because we haven't yet sorted the data or filtered it by color.

In [None]:
demo_frame = df[df['shape'] == 'box']
turtle_data(demo_frame)
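
The source of `turtle_data` is not shown in this notebook, so the following is only a rough sketch of the general idea of reading commands row by row. The helper `describe_commands`, the toy frame, and the direction values `forward` and `left` are all assumptions made for illustration; instead of drawing, the helper just describes each row in words.

```python
import pandas as pd

def describe_commands(frame):
    """Hypothetical helper: turn each row into a readable command string."""
    commands = []
    for row in frame.itertuples(index=False):
        if row.direction in ("left", "right"):
            # Turning rows use the angle column.
            commands.append(f"{row.color}: turn {row.direction} {row.angle:g} degrees")
        else:
            # Moving rows use the length column.
            commands.append(f"{row.color}: move {row.direction} {row.length:g} pixels")
    return commands

# Hypothetical two-row frame, stored out of order like the real data.
toy = pd.DataFrame({
    "angle": [90.0, float("nan")],
    "direction": ["left", "forward"],
    "length": [float("nan"), 100.0],
    "order": [1, 0],
    "shape": ["box", "box"],
    "color": ["green", "green"],
})

for command in describe_commands(toy.sort_values(by="order")):
    print(command)
```

Running the helper without the `sort_values` call lists the turn before the move, which is the same kind of problem that makes the unsorted drawing above look wrong.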

## Task

Filter the data appropriately to draw the box, then try other shapes/colors!

**Instructor**: All shapes/colors can be done with similar code; simply change which color/shape you want.

In [None]:
# enter code below
box_frame = df[df['shape'] == 'box']
green_box = box_frame[box_frame['color']=='green'].sort_values(by='order')

turtle_data(green_box)

In [None]:
turtle_data(df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by="order"))

## Alternating Colors

This is a little more advanced, and there are many ways to do this. Here is one example of building a new data frame while alternating colors.

In [None]:
star_frame = df[df['shape'] == 'star']

ac = star_frame.sort_values(by=['order', 'color']).reset_index(drop=True)
colors = ac['color'].unique()
rows = []
for i in range(int(ac['order'].max()) + 1):
    # cycle through the available colors, one per drawing step
    color = colors[i % len(colors)]
    # keep only the row for step i that is drawn in that color
    rows.append(ac[(ac['order'] == i) & (ac['color'] == color)])

# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
alter_frame = pd.concat(rows, ignore_index=True)
turtle_data(alter_frame)
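
To see the color cycling on its own, without any drawing, here is a minimal sketch on a made-up frame in which every step is stored once per color. The frame `toy` and its two colors are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame: three steps, each stored once for each of two colors.
toy = pd.DataFrame({
    "order": [0, 0, 1, 1, 2, 2],
    "color": ["blue", "red", "blue", "red", "blue", "red"],
})

colors = toy["color"].unique()
pieces = []
for step in range(toy["order"].max() + 1):
    # The remainder wraps back to the first color after the last one.
    color = colors[step % len(colors)]
    pieces.append(toy[(toy["order"] == step) & (toy["color"] == color)])

alternating = pd.concat(pieces, ignore_index=True)
print(alternating["color"].tolist())  # prints: ['blue', 'red', 'blue']
```

The `%` (remainder) operator is what makes the colors repeat: step 2 with two colors gives remainder 0, so it reuses the first color.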

### Congratulations

You've successfully programmed your turtle using an instruction set from a pandas dataframe!

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)