![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Turtles and Data

This notebook uses what is probably the largest data set of turtle commands ever assembled. That said, we spent only about five minutes looking for one before deciding to make our own.

Using this notebook and our previous knowledge of turtles, let's learn some basics about data and pandas in Python with turtles.

## Pandas

Pandas (Python Data Analysis Library, name derived from PANel DAta) is a free library for Python which makes loading and manipulating data more straightforward by taking advantage of something known as a **data frame**. Think of a data frame as something like a spreadsheet, but better.

First we need to import some code libraries.

In [None]:
import pandas as pd
import numpy as np 
from mobilechelonian import Turtle
from helper.turthelp import turtle_data
print('Libraries successfully imported.')

### Loading Data 

Here we load a CSV (comma-separated values) file of the data set. The code below reads a local copy; the commented-out line points to a copy stored on the Cybera cloud. Loading is done with the pandas function `read_csv`, whose argument is the file's location (either a URL or a local path).

In [None]:
#csv_file = "https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_233e84cd313945c992b4b585f7b9125d/callysto-open-data/turtles_drawings.csv"
csv_file = "./turtles-drawings.csv"
df = pd.read_csv(csv_file)
df

You'll notice that there are nearly 4000 rows of data, more than we'd want to deal with in a spreadsheet. You'll also notice that the rows don't appear to be in any particular order, so we will need to manipulate our data frame before we can draw shapes.
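
A quick way to check the size and column names of a data frame is sketched below. It uses a tiny made-up data frame (`sample`, a hypothetical stand-in with the same column names as the turtle data) so that it runs without the CSV file; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the turtle data; the values are made up.
sample = pd.DataFrame({
    "angle": [float("nan"), 90.0, float("nan"), 90.0],
    "direction": ["forward", "right", "forward", "right"],
    "length": [100.0, float("nan"), 100.0, float("nan")],
    "order": [0, 1, 2, 3],
    "shape": ["box", "box", "box", "box"],
    "color": ["green", "green", "green", "green"],
})

n_rows, n_cols = sample.shape        # (number of rows, number of columns)
column_names = list(sample.columns)  # column names as a plain list
first_rows = sample.head(2)          # only the first two rows

print(n_rows, n_cols)
print(column_names)
print(first_rows)
```

The same three tools (`.shape`, `.columns`, and `.head()`) work on the full `df` loaded above.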

But first, let's go over what each column is:

1. `angle` : The angle at which a turn will be taken, if the `direction` is a turn (number or `NaN`, which means not a number).
2. `direction` : The direction (forward or backward, or left or right turn) the turtle will move (string).
3. `length` : The distance (in pixels) the turtle should move forward or backward, if `direction` is forward/backward (number or `NaN`).
4. `order` : The order in which commands should be run to draw a given shape (number, counting starts at 0).
5. `shape` : The shape the row is associated with (string).
6. `color` : The pen color to draw with (string).
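
Missing values (`NaN`) can be located with the `isna()` method. The sketch below uses two made-up rows rather than the real data, and it assumes that `direction` is stored as text; both the values and that assumption are for illustration only.

```python
import pandas as pd

# Hypothetical rows: a forward move has no angle, a turn has no length.
rows = pd.DataFrame({
    "direction": ["forward", "left"],
    "angle": [float("nan"), 144.0],
    "length": [150.0, float("nan")],
})

missing = rows[["angle", "length"]].isna()  # True wherever a value is NaN
missing_per_column = missing.sum()          # count of NaN values in each column

print(missing)
print(missing_per_column)
```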

### Filtering and Sorting Data

To filter the dataframe by values in a column and sort the values, use the following code:

In [None]:
df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by='order')

Let's explore the commands in that line of code that gave us a filtered and sorted dataframe.

```python
df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by='order')
```

First we are using part of the `df` dataframe, where the column `df['shape']` is equal to `'star'` **and** the column `df['color']` is equal to `'yellow'`.

Because we have more than one condition we use parentheses `()` around them and the symbol `&` between them.
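
To see what the conditions do, it can help to build them one at a time. In this sketch on a small made-up data frame, each condition produces a column of `True`/`False` values, and `&` combines them row by row. The parentheses matter because Python evaluates `&` before `==`.

```python
import pandas as pd

# Hypothetical data for illustration only.
sample = pd.DataFrame({
    "shape": ["star", "star", "box", "star"],
    "color": ["yellow", "red", "yellow", "yellow"],
    "order": [1, 0, 0, 0],
})

is_star = sample["shape"] == "star"      # True/False for every row
is_yellow = sample["color"] == "yellow"  # True/False for every row
both = is_star & is_yellow               # True only where both are True

yellow_stars = sample[both]              # keep only the rows marked True
print(both.tolist())
print(yellow_stars)
```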

`.sort_values(by='order')` is a *method*, which is like a *function* that belongs to dataframes. It sorts the rows of the dataframe by the values in the `order` column. There are [more options](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) for the sorting process if you are interested.
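
As one example of those options, the `ascending` parameter reverses the sort. The sketch below uses made-up rows; note that `sort_values` returns a new dataframe and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical drawing steps, deliberately out of order.
steps = pd.DataFrame({
    "order": [2, 0, 1],
    "direction": ["forward", "forward", "left"],
})

in_order = steps.sort_values(by="order")                        # 0, 1, 2
reverse_order = steps.sort_values(by="order", ascending=False)  # 2, 1, 0

print(in_order)
print(reverse_order)
```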

The next cell has more information about pandas commands, or you can skip down to using the drawing function.

## Some Pandas Commands 

The list below will use the name `df`, short for data frame, to represent the variable name that any data frame is assigned to.

### Selecting Data

To select a single column, use 
```python 
df["column_name"]
```
To select multiple columns, use 
```python 
df[["list","of","column","names"]]
```
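
A minimal sketch of the difference between the two forms, using a made-up data frame: a single column name gives back a pandas `Series`, while a list of names gives back a smaller `DataFrame`.

```python
import pandas as pd

# Hypothetical data for illustration only.
sample = pd.DataFrame({
    "shape": ["box", "star"],
    "color": ["green", "yellow"],
    "order": [0, 0],
})

one_column = sample["shape"]              # a Series (one column)
two_columns = sample[["shape", "color"]]  # a DataFrame with two columns

print(type(one_column).__name__)
print(type(two_columns).__name__)
```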

### Filtering Data

To filter data to only rows where a column is a certain value, use
```python
df[df['column_name'] == value]
``` 

To have multiple requirements, wrap each one in parentheses, such as 
```python
df[(df['column_name1'] == value1) & (df['column_name2'] == value2)]
``` 
where `&` means "and". This command means we're looking for **rows** where `column_name1` is equal to `value1` **AND** where `column_name2` is equal to `value2`. 

Similarly, "or" is represented by a vertical bar `|`:
```python
df[(df['column_name1'] == value1) | (df['column_name2'] == value2)]
``` 
This command means we're looking for **rows** where `column_name1` is equal to `value1` **OR** where `column_name2` is equal to `value2`.
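
Here is a small sketch of an "or" filter on made-up data. It also shows `isin`, a pandas method that is a convenient shortcut when the "or" is between several values of the same column.

```python
import pandas as pd

# Hypothetical data for illustration only.
sample = pd.DataFrame({
    "shape": ["box", "star", "circle", "box"],
    "color": ["green", "yellow", "red", "red"],
})

# Rows that are stars OR red (or both).
either = sample[(sample["shape"] == "star") | (sample["color"] == "red")]

# Rows whose shape is box OR star, written with isin.
box_or_star = sample[sample["shape"].isin(["box", "star"])]

print(either)
print(box_or_star)
```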


### Sorting Values

To sort alphanumerically by a single column use
```python
df.sort_values(by="column_name")
``` 

To sort alphanumerically by multiple columns, in the order that your list is given, use
```python
df.sort_values(by=["list", "of", "names"])
``` 
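
With several columns, the first name in the list is the main sort key and later names break ties. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration only.
sample = pd.DataFrame({
    "shape": ["star", "box", "star", "box"],
    "order": [1, 1, 0, 0],
})

# Sort by shape first; within each shape, sort by order.
by_shape_then_order = sample.sort_values(by=["shape", "order"])

print(by_shape_then_order)
```

The original row labels (the index) travel with the rows, which is why they appear shuffled after sorting.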

## Using the `turtle_data` Function to Draw Shapes

To allow you to focus on manipulating the data, we have written a function to parse the dataframe and draw turtle graphics.

The line `from helper.turthelp import turtle_data` in the first code cell of this notebook imported that function from the included Python code file.

The cell below demonstrates how to use it. The `turtle_data` function takes the turtles dataframe as input and handles the rest. The result of running the code below will not look correct, because we have filtered only by shape: the rows are not yet filtered by color or sorted by `order`.

In [None]:
demo_frame = df[df['shape'] == 'box']
turtle_data(demo_frame)
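
One way to check whether a frame is ready to draw is to count its colors and test whether `order` is already increasing. The sketch below uses a made-up frame (`demo`) that assumes the same kind of problem: several colors mixed together and steps out of order.

```python
import pandas as pd

# Hypothetical frame: one shape, two colors, steps out of order.
demo = pd.DataFrame({
    "shape": ["box", "box", "box", "box"],
    "color": ["green", "red", "green", "red"],
    "order": [1, 0, 0, 1],
})

rows_per_color = demo["color"].value_counts()            # rows for each color
already_sorted = demo["order"].is_monotonic_increasing   # False if out of order

print(rows_per_color)
print(already_sorted)
```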

## Task

Filter the data appropriately to draw a box, then try other shapes and colors. You can use a single line of code to filter and sort the data, as we did above, or multiple steps like:

```python
df_star = df[df['shape'] == 'star']
df_yellow_star = df_star[df_star['color'] == 'yellow']
df_sorted = df_yellow_star.sort_values(by='order')
turtle_data(df_sorted)
```

**Instructor**: All shapes and colors can be drawn with similar code; simply change the color and shape you filter for.

In [None]:
# enter code below
df_box = df[df['shape'] == 'box']
df_green_box = df_box[df_box['color']=='green']
df_sorted = df_green_box.sort_values(by='order')
turtle_data(df_sorted)

In [None]:
turtle_data(df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by="order"))

# Congratulations

You've successfully programmed your turtle using an instruction set from a pandas dataframe!

## Advanced: Alternating Colors

As an advanced challenge, you can try drawing shapes with alternating colors from the dataframe.

**Instructor**: This is more difficult, and there are many ways to do this. Here is one example of building a new dataframe while alternating colors.

In [None]:
star_frame = df[df['shape'] == 'star']
ac = star_frame.sort_values(by=['order', 'color']).reset_index(drop=True)
colors = ac['color'].unique()
steps = []
for i in range(int(ac['order'].max()) + 1):
    # cycle through the available colors, one per drawing step
    color = colors[i % len(colors)]
    steps.append(ac[(ac['order'] == i) & (ac['color'] == color)])
# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
alter_frame = pd.concat(steps, ignore_index=True)
turtle_data(alter_frame)
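
As noted, there are many ways to do this. One alternative, sketched here on a made-up frame with only two colors, avoids the loop over dataframes: work out which color each step should have, then keep only the rows whose color matches. The frame `ac` below is a hypothetical stand-in and is not passed to `turtle_data`.

```python
import pandas as pd

# Hypothetical stand-in: every step exists once in each color.
ac = pd.DataFrame({
    "order": [0, 0, 1, 1, 2, 2],
    "color": ["blue", "red", "blue", "red", "blue", "red"],
    "direction": ["forward"] * 6,
})
ac = ac.sort_values(by=["order", "color"]).reset_index(drop=True)

colors = list(ac["color"].unique())

# Color wanted for each row: step 0 gets colors[0], step 1 gets colors[1], ...
wanted = pd.Series(
    [colors[step % len(colors)] for step in ac["order"]],
    index=ac.index,
)

# Keep only the rows whose color is the one wanted for that step.
alter_frame = ac[ac["color"] == wanted].reset_index(drop=True)
print(alter_frame)
```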

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)