Pandas
======

- Pandas is built on top of NumPy
- It adds the higher-level *Series* (a labelled 1-D array that behaves much like a dict)
  and *DataFrame* (like a table) objects
- The contained values are typically NumPy scalar types (e.g., `float64`)
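
As a quick illustration of the last point, the sketch below builds a small Series and DataFrame and inspects their dtypes. The names `prices` and `frame` and their values are made up for this example and are not used elsewhere in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustrating dtypes
prices = pd.Series([1.5, 2.25, 4.0])
frame = pd.DataFrame({'price': [1.5, 2.25], 'qty': [3, 7]})

# A Series has a single dtype; each element comes back as a NumPy scalar
print(prices.dtype)
print(type(prices.iloc[0]))

# A DataFrame has one dtype per column
print(frame.dtypes)
```

Each DataFrame column is itself a Series, which is why the dtype is reported per column.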

Series
------

In [1]:
import numpy as np
import pandas as pd

### Create from a dict ###

- Use the `Series` constructor (accepts any dict-like object)
- Can also take a data iterable (plus an optional same-length iterable
  representing data labels)

In [2]:
test_balance_data = {
    'alice': 20.00,
    'bob': 20.18,
    'carol': 1.05,
    'dan': 42.42,
}
balances = pd.Series(test_balance_data)
balances

alice    20.00
bob      20.18
carol     1.05
dan      42.42
dtype: float64

In [3]:
values = list(test_balance_data.values())
unlabeled_balances = pd.Series(values)
unlabeled_balances

0    20.00
1    20.18
2     1.05
3    42.42
dtype: float64

In [4]:
labels = list(test_balance_data.keys())
# Passing `index=` attaches the labels, so give the result its own name
labeled_balances = pd.Series(values, index=labels)
labeled_balances

alice    20.00
bob      20.18
carol     1.05
dan      42.42
dtype: float64
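
The three constructions above can be compared directly. The following is a minimal sketch with a shortened two-entry dict; the names `from_dict`, `from_lists`, and `unlabeled` are chosen here for illustration.

```python
import pandas as pd

data = {'alice': 20.00, 'bob': 20.18}

# Dict keys become labels automatically
from_dict = pd.Series(data)

# The same Series built from separate value and label lists
from_lists = pd.Series(list(data.values()), index=list(data.keys()))

# Without labels, pandas falls back to a default integer index (0, 1, ...)
unlabeled = pd.Series(list(data.values()))

print(from_dict.equals(from_lists))  # same values and same labels
print(list(unlabeled.index))
```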

### Accessing Data ###

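- by position (`.iloc`)
- by label, if labelled (`.loc`)
- like a dict (`.items()`, `.keys()`), **but note that `.values` is an
  attribute, NOT a method: write `.values`, not `.values()`**
- with dot notation (as long as the label is a valid Python identifier
  and does not clash with an existing Series attribute)
- by slice (but note that slicing with labels includes the end label,
  while slicing by position excludes the end position)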

In [5]:
# Optional import to use markdown-formatted output
from IPython.display import display, Markdown

def render(md):
    return display(Markdown(md))

In [6]:
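# NOTE: integer keys on a label-indexed Series fall back to positional access.
# Recent pandas releases deprecate this fallback, so prefer `.iloc` for positions.
print( balances[0] )
print( type(balances[0]) )
print( balances[-1] )
print( balances.iloc[0] )  # explicit positional access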

20.0
<class 'numpy.float64'>
42.42
20.0


In [7]:
print( balances['alice'] )
print( balances['dan'] )
print( balances.loc['alice'] )

20.0
42.42
20.0


In [8]:
for label, value in balances.items():
    print(f'The label {label} has a value of {value}')

The label alice has a value of 20.0
The label bob has a value of 20.18
The label carol has a value of 1.05
The label dan has a value of 42.42


In [9]:
print( balances.keys() )
print( balances.values )  # .values NOT .values()

Index(['alice', 'bob', 'carol', 'dan'], dtype='object')
[20.   20.18  1.05 42.42]


In [10]:
try:
    balances['bmo']
except KeyError:
    render('Accessing a non-existent key raises `KeyError`')

Accessing a non-existent key raises `KeyError`

In [11]:
if 'bmo' not in balances:
    print("Use `in` to test the existence of a label")

Use `in` to test the existence of a label
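
One detail worth spelling out: as with a dict, `in` checks labels, not values. The sketch below uses a shortened copy of the balances data, and also shows `Series.get`, which returns a default instead of raising `KeyError`. The variable names are illustrative only.

```python
import pandas as pd

balances = pd.Series({'alice': 20.00, 'bob': 20.18})

label_found = 'alice' in balances        # membership is tested against the labels
value_as_label = 20.00 in balances       # a value is not a label
value_found = 20.00 in balances.values   # search the underlying array for values

# .get() mirrors dict.get(): a default (None unless given) instead of KeyError
missing = balances.get('bmo')
fallback = balances.get('bmo', 0.0)

print(label_found, value_as_label, value_found, missing, fallback)
```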


In [12]:
balances.alice

20.0
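
Dot notation has two limits that are easy to trip over. The sketch below uses a made-up Series `s` with one label that is not a valid identifier (`'first name'`) and one that collides with an existing Series method (`'count'`); in both cases bracket access still works.

```python
import pandas as pd

# Hypothetical labels chosen to show where dot notation breaks down
s = pd.Series({'alice': 20.00, 'first name': 1.0, 'count': 3.0})

print(s.alice)            # works: 'alice' is a valid identifier with no clash

# 'first name' contains a space, so only bracket access is possible
print(s['first name'])

# 'count' is also a Series method, and the method wins attribute lookup
print(callable(s.count))  # s.count is the method, not the stored value
print(s['count'])         # bracket access returns the stored value
```

For this reason bracket or `.loc` access is the safer default in reusable code.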

In [13]:
# Slicing by position excludes the end position
balances.iloc[0:3]  # items 0, 1, 2

alice    20.00
bob      20.18
carol     1.05
dtype: float64

In [14]:
# Slicing by label includes the end label
balances.loc['alice':'dan']

alice    20.00
bob      20.18
carol     1.05
dan      42.42
dtype: float64
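
To make the difference between the two slice styles concrete, the sketch below takes slices that look alike and compares which labels each returns. The names `by_position` and `by_label` are chosen for this example.

```python
import pandas as pd

balances = pd.Series({'alice': 20.00, 'bob': 20.18, 'carol': 1.05, 'dan': 42.42})

# Positional slice: stops before position 2, so 'carol' is excluded
by_position = balances.iloc[0:2]

# Label slice: the end label 'carol' is included
by_label = balances.loc['alice':'carol']

print(list(by_position.index))
print(list(by_label.index))
```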