# Stuff we want to do to `DataFrame`s

+ Extract a column
+ Extract a subset of the columns
+ Extract particular rows by index
+ Extract rows that match a criterion in a column
  - "I want all the rows where the `shoe_size` column has value > `7.5`."
  
This gets confusing if we're not careful. We'll explore a few concepts that will help us.

# `print` oddities

Two ways to display values in a notebook:
+ The value of the last expression in a code cell, which Jupyter displays automatically
+ The built-in function `print`, which prints a string representation of the value

### Printing strings: there's a difference!

In [3]:
habitat = 'forest'

In [4]:
habitat

'forest'

In [5]:
print(habitat)

forest


The built-in function `print` leaves out the quotes. The notebook's automatic display of the last expression keeps them.
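One way to see where the quotes come from: the notebook's automatic display shows (essentially) the `repr` of a value, while `print` writes its `str`. For a string, `repr` includes the quotes and `str` does not. This is a small sketch that reuses the `habitat` value from the cells above.

```python
habitat = 'forest'

# repr() gives the form the notebook shows for a last expression
shown_by_notebook = repr(habitat)
# str() gives the text that print() writes
shown_by_print = str(habitat)

print(shown_by_notebook)  # 'forest'
print(shown_by_print)     # forest
```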

### Printing lists: no difference in how it looks

In [6]:
habitats = ['forest', 'park', 'street']

In [7]:
habitats

['forest', 'park', 'street']

In [9]:
print(habitats)

['forest', 'park', 'street']


### Printing DataFrames: big difference!

In [11]:
import pandas as df
faithful_df = df.read_csv('faithful.csv')
faithful_head_df = faithful_df.head() # head() keeps only the first 5 rows, to save vertical space

In [12]:
faithful_head_df

|   | Index | "Eruption length (mins)" | Eruption wait (mins) |
|---|---|---|---|
| 0 | 1 | 3.600 | 79 |
| 1 | 2 | 1.800 | 54 |
| 2 | 3 | 3.333 | 74 |
| 3 | 4 | 2.283 | 62 |
| 4 | 5 | 4.533 | 85 |


In [13]:
print(faithful_head_df)

   Index   "Eruption length (mins)"  Eruption wait (mins)
0      1                      3.600                    79
1      2                      1.800                    54
2      3                      3.333                    74
3      4                      2.283                    62
4      5                      4.533                    85


So: to show the contents of a `DataFrame` or a `Series`, it's nicer to use an expression as the last thing in a cell rather than calling `print`.
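A rough look at why the two differ: `print` shows the `str` of the `DataFrame`, which is plain space-aligned text, while a notebook asks pandas for an HTML table through the `_repr_html_` hook. The leading underscore marks `_repr_html_` as a display hook rather than something you'd normally call yourself; it appears here only to illustrate. The two-row `small_df` is a hypothetical stand-in built from the wait times above.

```python
import pandas as pd

# Hypothetical two-row frame; the wait times are copied from the faithful rows above
small_df = pd.DataFrame({'Eruption wait (mins)': [79, 54]})

# str() is what print() shows: plain, space-aligned text
text_version = str(small_df)
# _repr_html_() is the hook a notebook calls to get an HTML table
html_version = small_df._repr_html_()

print(text_version)
print(html_version[:80])
```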

# Python type `list`

+ A list begins with `[` and ends with `]`.
+ In between is a comma-separated sequence of values.
+ Each value has an _index_, starting at `0`.

### Indexing

In [14]:
habitats = ['forest', 'park', 'street']
habitats[0]

'forest'

In [16]:
# Here is nicer output.
print(f'There are {len(habitats)} items in the habitats list.')
print(f'0: {habitats[0]}')
print(f'1: {habitats[1]}')
print(f'2: {habitats[2]}')
print(f'3: {habitats[3]}') # This will result in an error.

There are 3 items in the habitats list.
0: forest
1: park
2: street


IndexError: list index out of range

You can also count from the end: the last item is at index `-1`, the one before it at `-2`, and so on.

In [17]:
print(habitats[-1])
print(habitats[-2])
print(habitats[-3])
print(habitats[-4]) # Another error

street
park
forest


IndexError: list index out of range
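A quick check of how negative indexes line up with ordinary ones: for a list of length `n`, index `-k` names the same item as index `n - k`. That's also why `-4` fails on a 3-item list, since `3 - 4` is below `0`. A minimal sketch:

```python
habitats = ['forest', 'park', 'street']
n = len(habitats)

# Index -k names the same item as the ordinary index n - k
for k in range(1, n + 1):
    print(f'{-k}: {habitats[-k]}    {n - k}: {habitats[n - k]}')
```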

### Adding lists

You can add `list`s together. This makes a new `list`.

In [19]:
habitats + habitats

['forest', 'park', 'street', 'forest', 'park', 'street']

In [20]:
[1, 3, 5] + [2, 4, 6]

[1, 3, 5, 2, 4, 6]
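Because `+` builds a new `list`, the lists you added are left unchanged. In this small sketch, `'river'` is just a made-up extra habitat:

```python
habitats = ['forest', 'park', 'street']
combined = habitats + ['river']   # builds a brand-new list

print(combined)   # ['forest', 'park', 'street', 'river']
print(habitats)   # unchanged: ['forest', 'park', 'street']
```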

### I'll have a slice of that, please

You can extract a sublist like this:

In [23]:
coyote_counts = [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]
coyote_counts[0:3]

[59, 9, 10]

Slicing: start at the first index, and go up to but not including the second index.
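A handy consequence of the "up to but not including" rule is that a slice `[start:stop]` has `stop - start` items, provided both indexes are within the list. A small sketch with arbitrary choices of `start` and `stop`:

```python
coyote_counts = [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

start, stop = 2, 6
piece = coyote_counts[start:stop]

print(piece)       # [10, 89, 19, 23]
print(len(piece))  # 4, which is stop - start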

### Slicing tricks

You can omit one or both of the indexes:

In [24]:
coyote_counts[:3] # Up to but not including index 3

[59, 9, 10]

In [25]:
coyote_counts[3:] # Everything from index 3 to the end

[89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

In [26]:
coyote_counts[:] # Creates a copy of the whole list

[59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

This `[:]` syntax and concept will be important when we get to `DataFrame`s in a bit.
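To see why the copy matters, compare a `[:]` copy with a plain assignment. The copy is a separate list, so changing it leaves the original alone. A plain `=` only gives the same list a second name. (`[:]` makes a *shallow* copy: the items themselves aren't duplicated, which makes no difference for numbers like these.) The names `backup` and `alias` are made up for this sketch.

```python
coyote_counts = [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

backup = coyote_counts[:]   # slice: a new list holding the same items
backup[0] = -1              # change the copy only
print(coyote_counts[0], backup[0])   # 59 -1
print(backup is coyote_counts)       # False: two separate lists

alias = coyote_counts       # no slice: just a second name for the same list
alias[0] = -1
print(coyote_counts[0])              # -1: changing alias changed coyote_counts
print(alias is coyote_counts)        # True
```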

In [28]:
# Challenge: how to extract [9, 10, 89, 19]?
coyote_counts[1:5]

[9, 10, 89, 19]

In [29]:
# How do we extract the last 4 numbers, [8, 23, 30, 30]?
coyote_counts[-4:]

[8, 23, 30, 30]

### Heterogeneous lists

Lists can contain a mix of types.

In [30]:
list1 = ['A', 1, 'B', 2]
list2 = ['C', 3.14159]
list1 + list2

['A', 1, 'B', 2, 'C', 3.14159]
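Each item keeps its own type inside the list. A quick way to check, using the lists from the cell above:

```python
list1 = ['A', 1, 'B', 2]
list2 = ['C', 3.14159]
combined = list1 + list2

# type() reports each item's own type; the list doesn't force them to match
item_types = [type(item).__name__ for item in combined]
print(item_types)  # ['str', 'int', 'str', 'int', 'str', 'float']
```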

### Some functions that use `list`s

In [46]:
print(len(coyote_counts))
print(sum(coyote_counts))
print(min(coyote_counts))
print(max(coyote_counts))

14
395
0
89
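These functions combine nicely. For example, the average count is `sum` divided by `len`, and the built-in `sorted` returns a new, ordered list whose first and last items match `min` and `max`. A small sketch:

```python
coyote_counts = [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

# Average = total / number of items
mean_count = sum(coyote_counts) / len(coyote_counts)
print(round(mean_count, 2))   # 28.21

# sorted() returns a new, ordered list; the original keeps its order
ordered = sorted(coyote_counts)
print(ordered[0] == min(coyote_counts), ordered[-1] == max(coyote_counts))  # True True
```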


### Stepping: every third item, or backwards

In [32]:
coyote_counts

[59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

In [37]:
coyote_counts[::3]

[59, 89, 54, 29, 30]

In [38]:
coyote_counts[::-1]

[30, 30, 23, 8, 29, 0, 12, 54, 23, 19, 89, 10, 9, 59]
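The third number in a slice is the step size, so the full form is `[start:stop:step]`. A step of `2` takes every other item, and changing the start index picks which half you get:

```python
coyote_counts = [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30]

# Step 2 from the beginning: indexes 0, 2, 4, ...
print(coyote_counts[::2])     # [59, 10, 19, 54, 0, 8, 30]
# Step 2 starting at index 1: indexes 1, 3, 5, ...
print(coyote_counts[1::2])    # [9, 89, 23, 12, 29, 23, 30]
```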

# Dictionaries: type `dict`

Dictionaries map keys to values. In an ordinary printed dictionary, the keys are words and the values are their definitions.

In [40]:
# Here's a dictionary!
urbanwildlife = {
  'habitat': ['forest', 'park', 'street', 'forest', 'street', 'park', 'forest', 'park', 'street', 'forest', 'street', 'park', 'street', 'park'],
  'coyote': [59, 9, 10, 89, 19, 23, 54, 12, 0, 29, 8, 23, 30, 30],
  'dog': [72, 197, 8811, 3, 555, 374, 1535, 101, 2216, 23, 1082, 35, 1635, 1469],
  'fox': [3, 10, 63, 54, 251, 43, 69, 57, 4, 6, 0, 6, 10, 3],
  'raccoon': [986, 64, 129, 213, 221, 73, 135, 24, 17, 528, 25, 106, 140, 114],
  'site': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'],
}

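* Each entry inside a dictionary is a _key_, then a `:`, then a _value_.
* In the example dictionary above, each key is a string and each value is a list.
* Entries are separated by commas `,` (a comma after the last entry is fine; Python ignores it).
* Indentation inside the braces is optional, but putting each entry on its own indented line makes the dictionary much easier to read.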
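You can ask a dictionary for its keys, and loop over its key/value pairs with `.items()`. The sketch below uses a hypothetical two-key cut-down of `urbanwildlife` and checks that every value list has the same length (something that will matter when these lists become `DataFrame` columns):

```python
# Hypothetical cut-down version of urbanwildlife
urbanwildlife_small = {
    'habitat': ['forest', 'park', 'street'],
    'coyote': [59, 9, 10],
}

print(list(urbanwildlife_small.keys()))   # ['habitat', 'coyote']

# Length of each value list, keyed by column name
lengths = {key: len(value) for key, value in urbanwildlife_small.items()}
print(lengths)   # {'habitat': 3, 'coyote': 3}
```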

### Looking up keys to get values

Given a key, we can look up its value:

In [41]:
urbanwildlife['fox']

[3, 10, 63, 54, 251, 43, 69, 57, 4, 6, 0, 6, 10, 3]

What is `urbanwildlife['fox'][-1]`?

In [42]:
urbanwildlife['fox'][-1]

3

What is `urbanwildlife['habitat'][:3]`?

In [43]:
urbanwildlife['habitat'][:3]

['forest', 'park', 'street']
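Looking up a key that isn't there raises a `KeyError`. You can check first with `in`, or use the dictionary method `.get`, which returns `None` for a missing key. In this sketch, `wildlife_subset` is a hypothetical small dictionary and `'badger'` is a key it deliberately lacks:

```python
wildlife_subset = {
    'coyote': [59, 9, 10],
    'fox': [3, 10, 63],
}

print('fox' in wildlife_subset)        # True: 'fox' is a key
print('badger' in wildlife_subset)     # False
print(wildlife_subset.get('badger'))   # None instead of an error

try:
    wildlife_subset['badger']
except KeyError as err:
    print('KeyError:', err)            # KeyError: 'badger'
```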

### Making your own `DataFrame`s

You can use dictionaries to make `DataFrame`s!
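Roughly, each key becomes a column name and each position in the value lists becomes a row, so all the lists need the same length. A small sketch with a hypothetical two-site dictionary `tiny`:

```python
import pandas as pd

# Hypothetical two-site subset of the urban wildlife data
tiny = {
    'site': ['A', 'B'],
    'fox': [3, 10],
}
tiny_df = pd.DataFrame(tiny)

print(list(tiny_df.columns))    # ['site', 'fox']: one column per key
print(tiny_df.shape)            # (2, 2): 2 rows, 2 columns
print(tiny_df['fox'].tolist())  # [3, 10]: same lookup syntax as a dict
```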

In [45]:
import pandas as pd   # Hey, that's different! pd, not df, is the conventional alias for pandas.

df = pd.DataFrame(urbanwildlife)
df

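|   | habitat | coyote | dog | fox | raccoon | site |
|---|---|---|---|---|---|---|
| 0 | forest | 59 | 72 | 3 | 986 | A |
| 1 | park | 9 | 197 | 10 | 64 | B |
| 2 | street | 10 | 8811 | 63 | 129 | C |
| 3 | forest | 89 | 3 | 54 | 213 | D |
| 4 | street | 19 | 555 | 251 | 221 | E |
| 5 | park | 23 | 374 | 43 | 73 | F |
| 6 | forest | 54 | 1535 | 69 | 135 | G |
| 7 | park | 12 | 101 | 57 | 24 | H |
| 8 | street | 0 | 2216 | 4 | 17 | I |
| 9 | forest | 29 | 23 | 6 | 528 | J |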
