# Pandas DataFrame

> DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a dict-like container for Series objects.

`Series` is the data structure for a single column of a `DataFrame`. A `DataFrame` is made up of one or more `Series` that share the same index.
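
As a minimal sketch of the "dict-like container" idea, the snippet below builds a `DataFrame` from a dict of `Series` and then pulls one column back out. The names `food`, `gold`, and `units` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Two Series sharing the same index labels (hypothetical sample data).
labels = ['villager', 'halberdier', 'arbalest']
food = pd.Series([50, 35, 0], index=labels)
gold = pd.Series([0, 0, 45], index=labels)

# A dict of Series becomes a DataFrame; the dict keys become column labels.
units = pd.DataFrame({'food': food, 'gold': gold})

# Selecting a single column by its key returns a Series again.
column = units['gold']
print(type(column).__name__)
print(list(units.columns))
```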

## Object Creation

In [16]:
from IPython.display import display, HTML
import pandas as pd
import numpy as np


series = pd.Series([1, 3, 5, np.nan, 6, 8])
display(series)

dates = pd.date_range('20200101', periods=60) # Create 60 days from 2020-01-01
df = pd.DataFrame(np.random.randn(60, 4), index=dates, columns=['x0', 'x1', 'x2', 'x3'])
display(df.head())

# Alternatively, read a CSV file directly from a URL
df = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
df = df.set_index('date')  # Use the 'date' column as the row index
display(df.head())

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

                  x0        x1        x2        x3
2020-01-01  0.282005  0.138954 -0.913562 -1.797510
2020-01-02 -1.103609 -1.162046  0.886719  0.424802
2020-01-03  1.516900 -1.363741 -0.068121  0.115255
2020-01-04  0.292872 -0.696904 -0.096230 -2.226183
2020-01-05 -0.343698 -0.211762 -1.388476  0.896453


               county       state     fips  cases  deaths
date
2020-01-21  Snohomish  Washington  53061.0      1       0
2020-01-22  Snohomish  Washington  53061.0      1       0
2020-01-23  Snohomish  Washington  53061.0      1       0
2020-01-24       Cook    Illinois  17031.0      1       0
2020-01-24  Snohomish  Washington  53061.0      1       0
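
The index assignment in the cell above can also be folded into the `read_csv` call itself. The sketch below assumes a small in-memory CSV (`csv_text`, made up for illustration, with the same column names as the file above) so that it runs without network access. `index_col` and `parse_dates` are standard `read_csv` parameters; parsing the dates gives a `DatetimeIndex` instead of plain strings.

```python
import io
import pandas as pd

# Hypothetical sample with the same columns as the CSV used above.
csv_text = """date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
2020-01-24,Cook,Illinois,17031,1,0
"""

# index_col picks the index column; parse_dates converts it to datetimes.
covid = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=['date'])

print(covid.index.dtype)
print(list(covid.columns))
```

With a real URL, the first argument would be the URL string in place of the `io.StringIO` object.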


In [18]:
# Keep rows with more than 100 deaths, sorted from most to fewest deaths
display(df[df['deaths'] > 100].sort_values(by=['deaths'], ascending=False))

                   county     state  fips  cases  deaths
date
2020-04-04  New York City  New York   NaN  63307    2254
2020-04-03  New York City  New York   NaN  57160    1867
2020-04-02  New York City  New York   NaN  51810    1562
2020-04-01  New York City  New York   NaN  47440    1374
2020-03-31  New York City  New York   NaN  43139    1096
2020-03-30  New York City  New York   NaN  38087     914
2020-03-29  New York City  New York   NaN  33768     776
2020-03-28  New York City  New York   NaN  30766     672
2020-04-03        Unknown  New York   NaN     37     608
2020-03-27  New York City  New York   NaN  25399     450
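
The filter above indexes the frame with a boolean `Series`. Several conditions can be combined with `&` (and) or `|` (or), as long as each condition is wrapped in parentheses. The following is a small sketch on made-up numbers; the frame `sample` and its values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical data, for illustration only.
sample = pd.DataFrame({
    'state': ['New York', 'New York', 'Washington', 'Illinois'],
    'cases': [5000, 40, 120, 900],
    'deaths': [300, 110, 3, 150],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (sample['deaths'] > 100) & (sample['state'] == 'New York')

# Rows where the mask is True, ordered from most to fewest deaths.
result = sample[mask].sort_values(by='deaths', ascending=False)
print(result)
```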


## Retrieve Data from DataFrame

- `.iloc` is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
- `.loc` is primarily label based, but may also be used with a boolean array.
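
A short sketch contrasting the two accessors on a small frame. The frame `units` is an illustrative assumption. The main difference to watch for is that `.iloc` slices exclude the end position, while `.loc` slices include the end label.

```python
import pandas as pd

units = pd.DataFrame(
    {'food': [50, 35, 0], 'gold': [0, 0, 45]},
    index=['villager', 'halberdier', 'arbalest'],
)

# Same row, addressed by position and by label.
first_by_position = units.iloc[0]
first_by_label = units.loc['villager']

# Position slice 0:2 stops before position 2; the label slice includes 'halberdier'.
by_position = units.iloc[0:2]
by_label = units.loc['villager':'halberdier']

# Both accessors accept a boolean array of the same length as the axis.
mask = (units['gold'] == 0).to_numpy()
no_gold_iloc = units.iloc[mask]
no_gold_loc = units.loc[mask]
print(no_gold_loc)
```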

In [2]:
index = ['villager', 'halberdier', 'arbalest', 'paladin', 'imperial_skirmisher', 'champion']
columns = ['food', 'wood', 'gold', 'build_time']
data = [
    [50, 0, 0, 25],
    [35, 25, 0, 22],
    [0, 25, 45, 27],
    [60, 0, 75, 30],
    [25, 35, 0, 22],
    [60, 0, 20, 21]
]

df = pd.DataFrame(data=data, index=index, columns=columns)
display(df)

# Direct indexing accepts column labels.
display(df['food']) # Returns a Series
display(df[['food']]) # Returns a DataFrame
display(df[['food', 'wood']])

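# The .loc examples below are bare expressions, so the notebook only
# displays the result of the last one.
# Locate by list of labels
df.loc[['villager', 'arbalest']]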
# Locate by range of labels
df.loc['villager':'arbalest']
# Locate by single label for row and column
df.loc['paladin', 'build_time']
# Locate by slice with labels for row and single label for column. 
df.loc['villager':'paladin', 'build_time']
# Locate by boolean array
df.loc[df['gold'] == 0]

                     food  wood  gold  build_time
villager               50     0     0          25
halberdier             35    25     0          22
arbalest                0    25    45          27
paladin                60     0    75          30
imperial_skirmisher    25    35     0          22
champion               60     0    20          21


villager               50
halberdier             35
arbalest                0
paladin                60
imperial_skirmisher    25
champion               60
Name: food, dtype: int64

                     food
villager               50
halberdier             35
arbalest                0
paladin                60
imperial_skirmisher    25
champion               60


                     food  wood
villager               50     0
halberdier             35    25
arbalest                0    25
paladin                60     0
imperial_skirmisher    25    35
champion               60     0


                     food  wood  gold  build_time
villager               50     0     0          25
halberdier             35    25     0          22
imperial_skirmisher    25    35     0          22

## Operations on DataFrame
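
To swap the values of two columns, assign them to each other in a single statement. Running the swap a second time restores the original values.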

In [3]:
# Swap the values of the 'food' and 'wood' columns
df[['food', 'wood']] = df[['wood', 'food']]
print(df)
# Swap again to restore the original values
df[['food', 'wood']] = df[['wood', 'food']]
print(df)

                     food  wood  gold  build_time
villager                0    50     0          25
halberdier             25    35     0          22
arbalest               25     0    45          27
paladin                 0    60    75          30
imperial_skirmisher    35    25     0          22
champion                0    60    20          21
                     food  wood  gold  build_time
villager               50     0     0          25
halberdier             35    25     0          22
arbalest                0    25    45          27
paladin                60     0    75          30
imperial_skirmisher    25    35     0          22
champion               60     0    20          21
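
One caveat when doing the same swap through `.loc`: according to the pandas indexing guide, `.loc` aligns on column labels when a `DataFrame` is assigned, so the labels are matched up again and the values are not swapped. The guide's suggested workaround is to assign the raw values. A minimal sketch, using a hypothetical two-row frame `units`:

```python
import pandas as pd

units = pd.DataFrame(
    {'food': [50, 35], 'wood': [0, 25]},
    index=['villager', 'halberdier'],
)

# to_numpy() drops the column labels, so the assignment is purely positional:
# 'food' receives the old 'wood' values and 'wood' receives the old 'food' values.
units.loc[:, ['food', 'wood']] = units[['wood', 'food']].to_numpy()
print(units)
```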
