# Turtles and Data

This notebook will use what is probably the largest turtles command data set ever assembled (that said, we only spent about 5 minutes looking before just making one...). Using this notebook and our previous knowledge of turtles, let's learn some basics about using data and pandas in Python with turtles.

## Pandas

Pandas (Python Data Analysis Library, name derived from PANel DAta) is a free library for Python which makes loading and manipulating data more straightforward by taking advantage of something known as a **data frame**. Think of a data frame as something like an Excel spreadsheet on steroids. Let's take a look at what that looks like.


In [None]:
import pandas as pd
import numpy as np 
from mobilechelonian import Turtle
# Note: move this to a helper function later
def turtle_data(df):
    '''
    Read basic turtle control data from a data frame and use it
    to move the turtle around. Very basic so far.

    TODO: other command types?
    '''
    t = Turtle()
    t.speed(10)
    # Each row of the data frame is one command
    for index, row in df.iterrows():
        t.pencolor(row.color)
        if np.isnan(row.length):
            # No length, so this row is a turn
            if row.direction == "left":
                t.left(row.angle)
            elif row.direction == "right":
                t.right(row.angle)
        elif np.isnan(row.angle):
            # No angle, so this row is a move
            if row.direction == "forward":
                t.forward(row.length)
            elif row.direction == "backward":
                t.backward(row.length)
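
To see how `turtle_data` decides what each row means without opening a turtle window, here is a minimal sketch that applies the same checks but only collects text descriptions. The helper name `describe_commands` and the two rows of `toy` data are made up for illustration; they are not part of the notebook's data set.

```python
import numpy as np
import pandas as pd


def describe_commands(df):
    """Hypothetical helper: list the turtle call each row would trigger."""
    commands = []
    for _, row in df.iterrows():
        if pd.isna(row["length"]):
            # No length, so the row is a turn and the angle is used
            commands.append(f"{row['direction']}({row['angle']})")
        elif pd.isna(row["angle"]):
            # No angle, so the row is a move and the length is used
            commands.append(f"{row['direction']}({row['length']})")
    return commands


# Made-up rows with the same column names as the real data
toy = pd.DataFrame({
    "angle": [np.nan, 90.0],
    "direction": ["forward", "left"],
    "length": [50.0, np.nan],
    "color": ["blue", "blue"],
})

print(describe_commands(toy))
```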

### Loading Data 

Here we are downloading a `csv` (comma separated values) file of the data set stored on the Cybera cloud. This is done using the pandas command `read_csv`, where the argument we pass it looks huge and terrible, but that's _actually_ just the address (URL) of the file!

In [None]:
df = pd.read_csv("https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_233e84cd313945c992b4b585f7b9125d/callysto-open-data/turtles_drawings.csv")
df

You'll notice that there are nearly 4000 rows of data - more than we'd want to deal with in Excel. You'll also note that the rows don't appear to be in any particular order - we're going to have to manipulate our data frame in order to draw shapes! But first, let's go over what each column is:

1. `angle`, number or `NaN` (not a number): the angle at which a turn will be taken, if the `direction` is a turn.
2. `direction`, string: the direction (forward, backward, left, right) the turtle will move.
3. `length`, number or `NaN`: how far the turtle should move forward or backward, if `direction` is forward/backward.
4. `order`, number: the order in which commands should be run to draw a given shape (counting starts at 0).
5. `shape`, string: the shape the row is associated with.
6. `color`, string: the color of the shape to draw.
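
A quick way to check a column description like the one above is to ask pandas about the data frame directly. The sketch below uses a tiny made-up frame called `toy` (its rows are invented for illustration, not copied from the real file), but the same calls should work on `df`.

```python
import numpy as np
import pandas as pd

# Made-up rows that follow the column layout described above
toy = pd.DataFrame({
    "angle": [np.nan, 90.0, np.nan],
    "direction": ["forward", "left", "forward"],
    "length": [50.0, np.nan, 50.0],
    "order": [0, 1, 2],
    "shape": ["box", "box", "box"],
    "color": ["blue", "blue", "blue"],
})

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # the column names

# Count the NaN entries in each column
missing = toy.isna().sum()
print(missing)
```

Move rows have no `angle` and turn rows have no `length`, so those are the two columns where you would expect to see `NaN` counted.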

### A few basic Pandas Commands 

The commands below use the name `df`, short for data frame, to represent the variable name that any data frame is assigned to.

### Sorting
To sort alphanumerically by a single column, use
```python
df.sort_values(by="column_name")
```
To sort alphanumerically by multiple columns, in the order that your list is given, use
```python
df.sort_values(by=["list", "of", "names"])
```
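
As a small illustration of both forms of `sort_values`, here is a sketch on a made-up frame called `toy` (the rows are invented for this example). Note that `sort_values` returns a new, sorted data frame and leaves the original untouched, so the result needs to be assigned to a name if you want to keep it.

```python
import pandas as pd

# Made-up rows, deliberately out of order
toy = pd.DataFrame({
    "shape": ["box", "triangle", "box", "triangle"],
    "order": [1, 0, 0, 1],
})

# Sort by a single column
by_order = toy.sort_values(by="order")

# Sort by shape first, then by order within each shape
by_shape_then_order = toy.sort_values(by=["shape", "order"])

print(by_shape_then_order)
```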
### Selecting Data
To select a single column, use 
```python 
df['column_name']
```
To select multiple columns, use 
```python 
df[["column_name1", "column_name2"]]
```
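
The difference between single and double brackets is easy to miss, so here is a short sketch on a made-up frame called `toy` (invented for illustration). Single brackets with one name give back a single column (a pandas `Series`), while double brackets, that is a list of names inside the selection brackets, give back a smaller data frame.

```python
import pandas as pd

# Made-up rows for illustration
toy = pd.DataFrame({
    "shape": ["box", "box"],
    "color": ["blue", "red"],
    "order": [0, 1],
})

one_column = toy["color"]               # a Series
two_columns = toy[["shape", "color"]]   # a DataFrame with two columns

print(type(one_column).__name__)
print(type(two_columns).__name__)
```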
### Filtering Data
To filter data to only rows where a column is a certain value, use

```python
df[df['column_name'] == value]
```
To have multiple requirements, wrap each one in parentheses, such as
```python
df[(df['column_name1'] == value1) & (df['column_name2'] == value2)]
``` 
where `&` means "and". This command means we're looking for **rows** where `column_name1` is equal to `value1` **AND** where `column_name2` is equal to `value2`.


Similarly, "or" is represented by a vertical bar `|`
```python
df[(df['column_name1'] == value1) | (df['column_name2'] == value2)]
``` 
This command means we're looking for **rows** where `column_name1` is equal to `value1` **OR** where `column_name2` is equal to `value2`.
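
To make the difference between `&` and `|` concrete, here is a minimal sketch on a made-up frame called `toy`. The shape and color values are invented for illustration and may not match what is in the real data set.

```python
import pandas as pd

# Made-up rows for illustration
toy = pd.DataFrame({
    "shape": ["box", "box", "triangle", "star"],
    "color": ["blue", "red", "blue", "red"],
})

# Rows that are boxes AND blue
both = toy[(toy["shape"] == "box") & (toy["color"] == "blue")]

# Rows that are boxes OR blue (or both)
either = toy[(toy["shape"] == "box") | (toy["color"] == "blue")]

print(len(both), len(either))
```

The parentheses around each comparison matter: `&` and `|` bind more tightly than `==` in Python, so leaving them out changes what gets compared.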


### Your Task

Using the dataframe we loaded earlier, `df`, and the provided function `turtle_data`, draw a blue box. See usage of `turtle_data` below (Note: This will _not_ draw a shape as the data is random) 
 

In [None]:
# df[:10] means "take the first 10 rows of the data frame"
turtle_data(df[:10])

## Next Task

Now that you've drawn the box, check out which other shapes you can create using the data set. 