# Pandas Data Frames

* Think of a DataFrame as a very powerful Excel spreadsheet.
* When a DataFrame is built from a dictionary, the keys become the column names.
* When the inputs have different row labels, the DataFrame's index is the union of all of them.
* Missing data is filled in with `NaN`.

**Create data frame from dictionary with defined indices**

In [20]:
import pandas as pd
items = {'Bob' : pd.Series([245, 25, 55], index =['bike','pants','watch']), 
         'Alice': pd.Series([40, 110, 500, 45], index =['book','glasses', 'bike', 'pants'])}
print(items)
print(type(items))


{'Bob': bike     245
pants     25
watch     55
dtype: int64, 'Alice': book        40
glasses    110
bike       500
pants       45
dtype: int64}
<class 'dict'>


In [9]:
shopping_carts = pd.DataFrame(items)
shopping_carts

|  | Bob | Alice |
|---|---|---|
| bike | 245.0 | 500.0 |
| book | NaN | 40.0 |
| glasses | NaN | 110.0 |
| pants | 25.0 | 45.0 |
| watch | 55.0 | NaN |
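
The values above print as floats (`245.0`) even though the inputs were integers. `NaN` is a floating-point value, so a column that gains a missing entry is stored as `float64`. A minimal sketch that rebuilds the same data and inspects this (the names `carts`, `col_dtypes` and `missing_per_column` are only illustrative):

```python
import pandas as pd

# Same data as the shopping-cart example above.
items = {'Bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series([40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}
carts = pd.DataFrame(items)

# NaN is a float, so columns with missing values are upcast to float64.
col_dtypes = carts.dtypes
print(col_dtypes)

# Count the missing entries in each column.
missing_per_column = carts.isna().sum()
print(missing_per_column)
```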


When indices are not defined, default integer indices (0, 1, 2, ...) are used.

In [19]:
items_num = {'Bob' : pd.Series([245, 25, 55]), 
         'Alice': pd.Series([40, 110, 500, 45])}
df = pd.DataFrame(items_num)
df

|  | Bob | Alice |
|---|---|---|
| 0 | 245.0 | 40 |
| 1 | 25.0 | 110 |
| 2 | 55.0 | 500 |
| 3 | NaN | 45 |


In [12]:
shopping_carts.index

Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

In [13]:
shopping_carts.columns

Index(['Bob', 'Alice'], dtype='object')

In [14]:
shopping_carts.values

array([[245., 500.],
       [ nan,  40.],
       [ nan, 110.],
       [ 25.,  45.],
       [ 55.,  nan]])

In [15]:
shopping_carts.shape

(5, 2)

In [16]:
shopping_carts.size

10

In [17]:
shopping_carts.ndim

2

**Selectively load information into Data Frames**

In [23]:
bob_df = pd.DataFrame(items, columns=['Bob'])
bob_df

|  | Bob |
|---|---|
| bike | 245 |
| pants | 25 |
| watch | 55 |


In [25]:
sel_index = pd.DataFrame(items, index=['pants','book'])
sel_index

|  | Bob | Alice |
|---|---|---|
| pants | 25.0 | 45 |
| book | NaN | 40 |


In [27]:
sel_index_alice = pd.DataFrame(items, index=['pants','book'], columns=['Alice'])
sel_index_alice

|  | Alice |
|---|---|
| pants | 45 |
| book | 40 |


**Creating data frames from Dict of Lists**

In [39]:
nums = {'Integers' : [1, 2, 3, 4],
       'Floats' : [4.5, 2.3, 6.3, 5.5]}
df = pd.DataFrame(nums)
df

|  | Integers | Floats |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 2.3 |
| 2 | 3 | 6.3 |
| 3 | 4 | 5.5 |


We can supply index labels while creating a data frame.

In [40]:
dfi = pd.DataFrame(nums, index=['label1','label2','label3','label4'])
dfi

|  | Integers | Floats |
|---|---|---|
| label1 | 1 | 4.5 |
| label2 | 2 | 2.3 |
| label3 | 3 | 6.3 |
| label4 | 4 | 5.5 |
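
Unlike a dictionary of Series, a dictionary of plain lists is not padded with `NaN`: the lists must all have the same length. The sketch below illustrates this with a hypothetical `uneven` dictionary and shows one possible workaround, which is to wrap each list in a `pd.Series` first.

```python
import pandas as pd

# Hypothetical input: the two lists have different lengths.
uneven = {'Integers': [1, 2, 3],
          'Floats': [4.5, 2.3, 6.3, 5.5]}

# Plain lists of unequal length are rejected with a ValueError.
try:
    pd.DataFrame(uneven)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Wrapping each list in a Series lets pandas align on the default
# integer index and fill the gap with NaN.
padded = pd.DataFrame({name: pd.Series(values) for name, values in uneven.items()})
print(padded)
```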


### Accessing elements

In [62]:
items = [{'bikes' : 20, 'pants' : 30, 'watches' : 35},
         {'watches' : 10, 'glasses' : 50, 'bikes' : 15, 'pants' : 5}]
# Use a list for the index: a set is unordered, so the labels could be assigned in any order
store_items = pd.DataFrame(items, index=['store1', 'store2'])
store_items

|  | bikes | pants | watches | glasses |
|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN |
| store2 | 15 | 5 | 10 | 50.0 |


In [63]:
store_items[['bikes']]

|  | bikes |
|---|---|
| store1 | 20 |
| store2 | 15 |


In [48]:
store_items[['bikes', 'pants']]

|  | bikes | pants |
|---|---|---|
| store1 | 20 | 30 |
| store2 | 15 | 5 |


In [50]:
store_items.loc[['store1']]

|  | bikes | pants | watches | glasses |
|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN |


To access an individual element with chained brackets, give the column label first and then the row label: **[column label]** [row label]

In [51]:
store_items['pants']['store2']

5
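
The chained form above reads column first, then row. The label-based accessors `.loc` and `.at` take the labels in the opposite order (row, then column) and do the lookup in a single step. A minimal sketch on a rebuilt copy of `store_items` (the variable names `chained`, `by_loc` and `by_at` are only illustrative):

```python
import pandas as pd

store_items = pd.DataFrame(
    [{'bikes': 20, 'pants': 30, 'watches': 35},
     {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}],
    index=['store1', 'store2'])

chained = store_items['pants']['store2']      # column label, then row label
by_loc = store_items.loc['store2', 'pants']   # row label, then column label
by_at = store_items.at['store2', 'pants']     # same order, scalar access only

print(chained, by_loc, by_at)
```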

**Adding elements to an existing Data Frame**

Adding a column: 

In [64]:
store_items['shirts'] = [15, 2]
store_items

|  | bikes | pants | watches | glasses | shirts |
|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15 |
| store2 | 15 | 5 | 10 | 50.0 | 2 |


Adding a column computed from arithmetic on other columns:

In [65]:
store_items['suits'] = store_items['shirts'] + store_items['pants']
store_items

|  | bikes | pants | watches | glasses | shirts | suits |
|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15 | 45 |
| store2 | 15 | 5 | 10 | 50.0 | 2 | 7 |


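Add a **row** using `pd.concat` (`DataFrame.append` was removed in pandas 2.0)

In [66]:
item3 = {'bikes' : 20, 'glasses' : 4, 'pants' : 30, 'watches' : 35}
# Use a list for the index label rather than a set
df3 = pd.DataFrame(item3, index=['store3'])
df3

|  | bikes | glasses | pants | watches |
|---|---|---|---|---|
| store3 | 20 | 4 | 30 | 35 |


In [67]:
# pd.concat stacks the rows; columns missing from df3 are filled with NaN
store_items = pd.concat([store_items, df3])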
store_items

|  | bikes | pants | watches | glasses | shirts | suits |
|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15.0 | 45.0 |
| store2 | 15 | 5 | 10 | 50.0 | 2.0 | 7.0 |
| store3 | 20 | 30 | 35 | 4.0 | NaN | NaN |


Add a new column and fill it only for **selected indices** using slicing

In [68]:
store_items['new watches'] = store_items['watches'][1:]
store_items

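|  | bikes | pants | watches | glasses | shirts | suits | new watches |
|---|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15.0 | 45.0 | NaN |
| store2 | 15 | 5 | 10 | 50.0 | 2.0 | 7.0 | 10.0 |
| store3 | 20 | 30 | 35 | 4.0 | NaN | NaN | 35.0 |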
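
The slice works because assigning a Series to a column aligns on index labels, not on position: rows whose label is absent from the Series get `NaN`. A small sketch with made-up data (`stock` and `incoming` are hypothetical names) shows that the order of the Series does not matter:

```python
import pandas as pd

stock = pd.DataFrame({'watches': [35, 10, 35]},
                     index=['store1', 'store2', 'store3'])

# Deliberately out of order and with no entry for store1.
incoming = pd.Series([7, 3], index=['store3', 'store2'])

# Values are matched to rows by label; store1 has no match and gets NaN.
stock['new watches'] = incoming
print(stock)
```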


Add a new column at a specific position using **insert**

In [69]:
store_items.insert(5, 'shoes', [8 , 5, 2])
store_items

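|  | bikes | pants | watches | glasses | shirts | shoes | suits | new watches |
|---|---|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15.0 | 8 | 45.0 | NaN |
| store2 | 15 | 5 | 10 | 50.0 | 2.0 | 5 | 7.0 | 10.0 |
| store3 | 20 | 30 | 35 | 4.0 | NaN | 2 | NaN | 35.0 |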


**Removing rows and columns**

1. `pop` - can only remove a column
2. `drop` - can remove rows (`axis=0`) or columns (`axis=1`)

In [70]:
store_items.pop('new watches')
store_items

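|  | bikes | pants | watches | glasses | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15.0 | 8 | 45.0 |
| store2 | 15 | 5 | 10 | 50.0 | 2.0 | 5 | 7.0 |
| store3 | 20 | 30 | 35 | 4.0 | NaN | 2 | NaN |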
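
Note that `pop` modifies the data frame in place and returns the removed column as a Series, so the values can be kept if they are still needed. A minimal sketch on a small stand-in frame (`df` and `removed` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5]},
                  index=['store1', 'store2'])

# pop removes the column from df and hands it back as a Series.
removed = df.pop('pants')

print(removed)
print(df.columns.tolist())
```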


In [73]:
store_items = store_items.drop(['store3'], axis=0)
store_items

|  | bikes | pants | watches | glasses | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store1 | 20 | 30 | 35 | NaN | 15.0 | 8 | 45.0 |
| store2 | 15 | 5 | 10 | 50.0 | 2.0 | 5 | 7.0 |


In [74]:
store_items = store_items.drop(['watches','shoes'], axis=1)
store_items

|  | bikes | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store1 | 20 | 30 | NaN | 15.0 | 45.0 |
| store2 | 15 | 5 | 50.0 | 2.0 | 7.0 |
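
As an alternative to `axis`, `drop` also accepts `index=` and `columns=` keywords, which can make the intent easier to read and allow rows and columns to be removed in one call. A sketch on a small stand-in frame (`df` and `trimmed` are illustrative names); like the calls above, it returns a new data frame and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15, 20],
                   'watches': [35, 10, 35],
                   'shoes': [8, 5, 2]},
                  index=['store1', 'store2', 'store3'])

# Remove one row and two columns in a single call.
trimmed = df.drop(index=['store3'], columns=['watches', 'shoes'])

print(trimmed)
```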


Use the **rename** function to change column or index names

In [76]:
store_items = store_items.rename(columns = {'bikes' : 'hats'})
store_items

|  | hats | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store1 | 20 | 30 | NaN | 15.0 | 45.0 |
| store2 | 15 | 5 | 50.0 | 2.0 | 7.0 |


In [78]:
store_items = store_items.rename(index={'store2' : 'second store'})
store_items

|  | hats | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store1 | 20 | 30 | NaN | 15.0 | 45.0 |
| second store | 15 | 5 | 50.0 | 2.0 | 7.0 |


In [82]:
# Use an existing column as the new index. set_index moves 'pants' out of the
# columns, so running this cell a second time fails with the KeyError below.
store_items = store_items.set_index('pants')
store_items

KeyError: "None of ['pants'] are in the columns"
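
`set_index` raises this KeyError whenever the named column is not among the columns, for example once `'pants'` has already been moved into the index. A minimal sketch, on a small stand-in frame with illustrative names, of guarding the call so it is safe to re-run and of undoing it with `reset_index`:

```python
import pandas as pd

df = pd.DataFrame({'hats': [20, 15], 'pants': [30, 5]},
                  index=['store1', 'second store'])

# Move 'pants' from the columns into the index.
indexed = df.set_index('pants')

# Guard: only call set_index again if 'pants' is still a column.
if 'pants' in indexed.columns:
    indexed = indexed.set_index('pants')

# reset_index turns the index back into an ordinary column.
# The original row labels ('store1', ...) are not recovered.
restored = indexed.reset_index()

print(indexed)
print(restored)
```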