![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Ftmteachingturtles&branch=master&subPath=TMDataTurtles/turtles-and-data-student.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Turtles and Data

This notebook will use what is probably the largest turtles command data set ever assembled. That said, we only spent about five minutes looking before just making one.

Using this notebook and our previous knowledge of Python turtles, let's learn some basics about data manipulation.

## pandas

[pandas](https://pandas.pydata.org) is a free library for Python that makes loading and manipulating data more straightforward by using something called a **dataframe**. Think of a dataframe as a spreadsheet, but better.

### Loading Data 

Here we are reading a `csv` (comma-separated values) file of the data set using the pandas command `read_csv`. The argument we pass to `read_csv` is the file name and location (either a local file or one on the internet).

In [None]:
import pandas as pd # the library for data analysis

# store the location of the file in a variable; ./ means the current directory
csv_file = "./turtles-drawings.csv"

# read the csv file into a dataframe called df
df = pd.read_csv(csv_file)

# display the dataframe we called df
df
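
To see what `read_csv` does without needing the real file, here is a minimal sketch that reads a few lines of made-up CSV text held in memory. The rows and the name `small_df` are invented for illustration; only the column names echo the turtle data set.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file (the first line is the header row)
csv_text = "direction,length,order\nforward,100,0\nbackward,50,1\n"

# read_csv accepts a file-like object as well as a file path or web address
small_df = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns)
print(small_df.shape)
```

The header line becomes the column names, and every following line becomes one row of the dataframe.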

You'll notice that there are nearly 4000 rows of data, more than we'd want to deal with in a spreadsheet. You'll also notice that the rows don't appear to be in any particular order, so we'll need to manipulate our dataframe before we can draw shapes.

But first, let's go over what each column means:

1. `angle` : The angle of the turn, if `direction` is a left or right turn (number or `NaN`, which means "not a number").
2. `direction` : The command the turtle will carry out: forward, backward, left, or right (string).
3. `length` : The distance (in pixels) the turtle should move, if `direction` is forward or backward (number or `NaN`).
4. `order` : The order in which commands should be run to draw a given shape (number, counting starts at 0).
5. `shape` : The shape the row is associated with (string).
6. `color` : The pen color to draw with (string).
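To make the role of `NaN` concrete, here is a small sketch using a hand-made dataframe with the same six columns. The two rows and the shape name `corner` are invented for illustration and are not taken from the real data set.

```python
import pandas as pd

# Hypothetical miniature of the data set: two commands for one made-up shape
mini = pd.DataFrame({
    "angle": [float("nan"), 90.0],       # only the turn has an angle
    "direction": ["forward", "left"],
    "length": [100.0, float("nan")],     # only the move has a length
    "order": [0, 1],
    "shape": ["corner", "corner"],
    "color": ["blue", "blue"],
})

# isna() gives True wherever a value is missing (NaN)
missing = mini[["angle", "length"]].isna()
print(missing)
```

Each row fills in either `angle` or `length`, depending on whether the command is a turn or a move.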

### Filtering and Sorting Data

To filter the dataframe by values in a column and sort the values, use the following code:

In [None]:
df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by='order')

Let's explore the commands in that line of code that gave us a filtered and sorted dataframe.

```python
df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by='order')
```

First we are using part of the `df` dataframe, where the column `df['shape']` is equal to `'star'` **and** the column `df['color']` is equal to `'yellow'`.

Because we have more than one condition we use parentheses `()` around them, and the symbol `&` between them.

`.sort_values(by='order')` is a *method* (a *function* that belongs to dataframes) that sorts the rows by the values in the `order` column. There are [more options](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) for the sorting process if you are interested.
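As one example of those options, the sketch below sorts a tiny made-up dataframe in descending order with `ascending=False`. The data and the variable names are invented for illustration.

```python
import pandas as pd

# Made-up commands listed out of order
steps = pd.DataFrame({
    "order": [2, 0, 1],
    "direction": ["left", "forward", "right"],
})

# ascending=False sorts from the largest value to the smallest
reversed_steps = steps.sort_values(by="order", ascending=False)

print(reversed_steps["order"].tolist())
print(steps["order"].tolist())  # the original dataframe is left unchanged
```

Note that `sort_values` returns a new, sorted dataframe instead of changing the one it was called on.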

The next cell has more information about pandas commands, or you can skip down to using the drawing function.

## Some pandas Commands

The examples below use the name `df`, short for dataframe, to stand for whatever variable your dataframe is assigned to.

### Selecting Data

To select a single column, use 
```python 
df["column_name"]
```
To select multiple columns, use 
```python 
df[["list","of","column","names"]]
```
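As a quick illustration on a made-up dataframe (the name `toy` and its contents are invented here), selecting one column gives back a pandas `Series`, while selecting a list of columns gives back a smaller dataframe:

```python
import pandas as pd

# Made-up data with three columns
toy = pd.DataFrame({
    "shape": ["star", "box"],
    "color": ["yellow", "green"],
    "order": [0, 0],
})

one_column = toy["shape"]              # single brackets: one column
two_columns = toy[["shape", "color"]]  # double brackets: a list of columns

print(type(one_column).__name__)
print(type(two_columns).__name__)
```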

### Filtering Data

To filter data to only rows where a column is a certain value, use
```python
df[df['column_name'] == value]
``` 

To combine multiple conditions, wrap each one in parentheses, such as 
```python
df[(df['column_name1'] == value1) & (df['column_name2'] == value2)]
``` 
where `&` means "and". This command means we're looking for **rows** where `column_name1` is equal to `value1` **AND** where `column_name2` is equal to `value2`. 

Similarly, "or" is represented by a vertical bar `|`:
```python
df[(df['column_name1'] == value1) | (df['column_name2'] == value2)]
``` 
This command means we're looking for **rows** where `column_name1` is equal to `value1` **OR** where `column_name2` is equal to `value2`.
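The difference between `&` and `|` is easiest to see by counting rows. This sketch uses a four-row made-up dataframe; the name `toy` and its values are invented for illustration.

```python
import pandas as pd

# Every combination of two shapes and two colors
toy = pd.DataFrame({
    "shape": ["star", "star", "box", "box"],
    "color": ["yellow", "red", "yellow", "red"],
})

# AND: both conditions must be true for a row to be kept
both = toy[(toy["shape"] == "star") & (toy["color"] == "yellow")]

# OR: a row is kept if at least one condition is true
either = toy[(toy["shape"] == "star") | (toy["color"] == "yellow")]

print(len(both))    # only the yellow star
print(len(either))  # both stars, plus the yellow box
```

The parentheses matter because Python evaluates `&` and `|` before `==`, so leaving them out changes what is being compared.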

### Sorting Values

To sort alphanumerically by a single column use
```python
df.sort_values(by="column_name")
``` 

To sort alphanumerically by multiple columns, in the order they appear in your list, use
```python
df.sort_values(by=["list", "of", "names"])
```
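Here is a small sketch of sorting by two columns on made-up data (the name `toy` and its rows are invented). Rows are sorted by the first column in the list, and the second column is used to order rows that tie on the first.

```python
import pandas as pd

# Made-up rows, deliberately jumbled
toy = pd.DataFrame({
    "shape": ["star", "box", "star", "box"],
    "order": [1, 1, 0, 0],
})

# Sort by shape first, then by order within each shape
sorted_toy = toy.sort_values(by=["shape", "order"])

print(sorted_toy.values.tolist())
```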

## Using the `turtle_data` Function to Draw Shapes

To allow you to focus on manipulating the data, let's declare a function to parse the dataframe and draw turtle graphics.

In [None]:
from mobilechelonian import Turtle # for drawing with turtles

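def turtle_data(df):
    """Draw the commands in df, one row at a time, in the order the rows appear."""
    t = Turtle()
    t.speed(10)
    for index, row in df.iterrows():
        t.pencolor(row.color)  # set the pen color for this command
        # only one of these branches runs, depending on the direction
        if row.direction == "left":
            t.left(row.angle)
        elif row.direction == "right":
            t.right(row.angle)
        elif row.direction == "forward":
            t.forward(row.length)
        elif row.direction == "backward":
            t.backward(row.length)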
print('You can now use the turtle_data function.')

The cell below demonstrates how to use the function. The `turtle_data` function simply takes the turtles DataFrame as input, and it handles the rest.

In [None]:
# create a new dataframe where shape is star, color is yellow, and values are sorted
demo_frame = df[(df['shape']=='star') & (df['color'] == 'yellow')].sort_values(by='order')

# call the function using that new dataframe
turtle_data(demo_frame)
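
Before drawing, it can help to check which commands a filtered dataframe actually contains. The sketch below is one possible way to do that: `describe_commands` is a hypothetical helper written for this example (it is not part of the notebook, pandas, or mobilechelonian), and the two-row dataframe is made up.

```python
import pandas as pd

def describe_commands(frame):
    """Return the drawing commands in frame as readable strings (nothing is drawn)."""
    commands = []
    for row in frame.itertuples(index=False):
        if row.direction in ("left", "right"):
            commands.append(f"{row.direction} {row.angle:g} degrees")
        elif row.direction in ("forward", "backward"):
            commands.append(f"{row.direction} {row.length:g} pixels")
    return commands

# Made-up commands, listed out of order; "corner" is an invented shape name
toy = pd.DataFrame({
    "angle": [90.0, float("nan")],
    "direction": ["left", "forward"],
    "length": [float("nan"), 50.0],
    "order": [1, 0],
    "shape": ["corner", "corner"],
    "color": ["blue", "blue"],
})

command_list = describe_commands(toy.sort_values(by="order"))
print(command_list)
```

If the printed list is out of order or mixes several shapes, the filter or sort probably needs another look before calling `turtle_data`.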

To see the list of shapes in the original dataframe, we can use `df['shape'].unique()`.

In [None]:
df['shape'].unique()

We can also list the unique colors.

In [None]:
df['color'].unique()
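
Not every color is necessarily available for every shape, so it can be useful to list the combinations that occur together. This sketch shows one way to do that with `drop_duplicates` on a made-up dataframe; the name `toy` and its rows are invented for illustration.

```python
import pandas as pd

# Made-up rows; the yellow star appears twice
toy = pd.DataFrame({
    "shape": ["star", "star", "box", "star"],
    "color": ["yellow", "yellow", "green", "red"],
})

# Keep only the two columns of interest, then drop repeated rows
pairs = toy[["shape", "color"]].drop_duplicates()

print(pairs.values.tolist())
```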

## Task

Filter the data appropriately to draw a box, then try other shapes and colors.

You can use a single line of code to filter and sort the data, as we did above, or multiple steps like:

```python
df_star = df[df['shape'] == 'star']
df_yellow_star = df_star[df_star['color'] == 'yellow']
df_sorted = df_yellow_star.sort_values(by='order')
turtle_data(df_sorted)
```

In [None]:
# enter code below


# Congratulations

You've successfully programmed your turtle using an instruction set from a pandas dataframe!

## Advanced: Alternating Colors

As an advanced challenge, you can try drawing shapes with alternating colors from the dataframe.
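One possible starting point, assuming "alternating" means the pen color changes with every command, is to replace the `color` column with a repeating pattern before drawing. The helper `alternate_colors` and the small dataframe below are hypothetical, written only for this sketch; other interpretations of the challenge are just as valid.

```python
import pandas as pd

def alternate_colors(frame, colors):
    """Return a copy of frame, sorted by 'order', with colors cycling row by row."""
    result = frame.sort_values(by="order").copy()
    # i % len(colors) cycles through 0, 1, ..., len(colors) - 1 and repeats
    result["color"] = [colors[i % len(colors)] for i in range(len(result))]
    return result

# Made-up commands that all start out yellow
toy = pd.DataFrame({
    "direction": ["forward", "left", "forward", "left"],
    "order": [3, 0, 2, 1],
    "color": ["yellow"] * 4,
})

striped = alternate_colors(toy, ["red", "blue"])
print(striped["color"].tolist())
```

The result can be passed to `turtle_data` like any other filtered and sorted dataframe, provided it also has the `angle` and `length` columns that the function reads.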

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)