# Pandas - DataFrames
#python/pandas

In [6]:
import pandas as pd

The second main data structure in pandas is a DataFrame. A DataFrame is a 2D object with labelled rows and columns and can hold multiple data types.

In [7]:
#Create a dictionary that contains two Series
items = {'bob': pd.Series([245,25,55], index=['bike', 'trousers','watch']),
          'alice': pd.Series([40,110,500,45],index=['book','glasses','bike','trousers'])}

type(items)

dict

In [8]:
#Create a DataFrame from the items dictionary
shopping_carts = pd.DataFrame(items)
shopping_carts

| | bob | alice |
|---|---|---|
| bike | 245.0 | 500.0 |
| book | NaN | 40.0 |
| glasses | NaN | 110.0 |
| trousers | 25.0 | 45.0 |
| watch | 55.0 | NaN |


The row labels are built from the union of the index labels of the two Series, and they appear in sorted order in the output above. The column labels are taken from the keys of the dictionary. Here the columns keep the order of the dictionary keys (bob, then alice); older versions of pandas sorted them alphabetically instead. When importing data into a DataFrame from a file, the columns keep the order they have in the file.

Where there is no value for a given row and column, we get a NaN value. This stands for 'not a number'.
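
A side effect of NaN worth knowing is that it is a floating-point value, which is why the integer prices above are displayed as 245.0, 500.0 and so on. The sketch below rebuilds the same DataFrame, checks the column types and counts the missing entries per column; the name missing_per_column is only illustrative.

```python
import pandas as pd

# Same two Series as in the items dictionary above
items = {'bob': pd.Series([245, 25, 55], index=['bike', 'trousers', 'watch']),
         'alice': pd.Series([40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'trousers'])}
shopping_carts = pd.DataFrame(items)

# Both columns contain NaN, so both are stored as float64
print(shopping_carts.dtypes)

# isna() marks the missing entries; sum() counts them per column
missing_per_column = shopping_carts.isna().sum()
print(missing_per_column)
```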

NB: If we don't provide row labels, pandas will use numerical indices when it creates the DataFrame (starting from index 0).
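
As a small illustration of the default index, the sketch below builds a DataFrame from plain lists without passing index. The column names and the variable unlabelled are made up for this example.

```python
import pandas as pd

# No index is given, so pandas labels the rows 0, 1, 2
unlabelled = pd.DataFrame({'price': [245, 25, 55],
                           'quantity': [1, 2, 1]})
print(unlabelled.index)
```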

We can return information about the data in our DataFrame using attributes.

In [9]:
shopping_carts.index

Index(['bike', 'book', 'glasses', 'trousers', 'watch'], dtype='object')

In [10]:
shopping_carts.columns

Index(['bob', 'alice'], dtype='object')

In [11]:
shopping_carts.values

array([[245., 500.],
       [ nan,  40.],
       [ nan, 110.],
       [ 25.,  45.],
       [ 55.,  nan]])

In [12]:
shopping_carts.shape

(5, 2)

In [13]:
shopping_carts.ndim

2

In [14]:
shopping_carts.size

10

We can also be specific about the data that we load into our DataFrame by using the keywords columns and index.

In [15]:
bob_shopping_cart = pd.DataFrame(items, columns = ['bob'])
bob_shopping_cart

| | bob |
|---|---|
| bike | 245 |
| trousers | 25 |
| watch | 55 |


In [16]:
sel_shopping_cart = pd.DataFrame(items, index=['trousers','book'])
sel_shopping_cart

| | bob | alice |
|---|---|---|
| trousers | 25.0 | 45 |
| book | NaN | 40 |


We can create DataFrames from dictionaries of lists or arrays in the same way as for a dictionary of Series; however, the length of the lists or arrays must be the same.

In [17]:
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame and provide the row index
df = pd.DataFrame(data, index = ['label 1', 'label 2', 'label 3'])

# We display the DataFrame
df

| | Integers | Floats |
|---|---|---|
| label 1 | 1 | 4.5 |
| label 2 | 2 | 8.2 |
| label 3 | 3 | 9.6 |
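
To see the equal-length requirement in action, the sketch below passes lists of different lengths and catches the resulting error. The names bad_data and raised are illustrative, and the exact error message may differ between pandas versions, so it is printed rather than compared.

```python
import pandas as pd

bad_data = {'Integers': [1, 2, 3],
            'Floats': [4.5, 8.2]}  # one value short

try:
    pd.DataFrame(bad_data)
    raised = False
except ValueError as err:
    # pandas refuses to build a DataFrame from lists of unequal length
    raised = True
    print('ValueError:', err)
```

A dictionary of Series does not have this restriction, because the Series are aligned on their index labels and the gaps are filled with NaN.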


The last method for manually creating pandas DataFrames that we want to look at is using a list of Python dictionaries. The procedure is the same as before: we start by creating the list of dictionaries and then pass it to the pd.DataFrame() function. Each dictionary becomes one row, and the keys become the column labels.

In [18]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame 
store_items = pd.DataFrame(items2)

# We display the DataFrame
store_items

| | bikes | glasses | pants | watches |
|---|---|---|---|---|
| 0 | 20 | NaN | 30 | 35 |
| 1 | 15 | 50.0 | 5 | 10 |


In [19]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame  and provide the row index
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2'])

# We display the DataFrame
store_items

| | bikes | glasses | pants | watches |
|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 35 |
| store 2 | 15 | 50.0 | 5 | 10 |


## Accessing Elements in Pandas DataFrames

We access items in a DataFrame using the column and row labels, for example:

In [20]:
store_items[['bikes']]

| | bikes |
|---|---|
| store 1 | 20 |
| store 2 | 15 |


In [21]:
store_items[['bikes','watches']]

| | bikes | watches |
|---|---|---|
| store 1 | 20 | 35 |
| store 2 | 15 | 10 |


In [22]:
store_items.loc[['store 1']]

| | bikes | glasses | pants | watches |
|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 35 |


In [23]:
store_items['bikes']['store 1']

20
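
The lookup above chains two steps, column first and then row. A single-step alternative is .loc with both labels, where the row label comes first. The sketch below rebuilds store_items from the same data; the name bikes_store_1 and the new value 16 are illustrative.

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

# Row label first, then column label, in a single lookup
bikes_store_1 = store_items.loc['store 1', 'bikes']
print(bikes_store_1)

# The same form can be used to assign a value to one cell
store_items.loc['store 2', 'bikes'] = 16
```

Using .loc for assignment is generally safer than chained indexing, which may end up modifying a temporary copy rather than the DataFrame itself.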

We can modify an existing DataFrame like this:

In [24]:
store_items['shirts'] = [15,2]
store_items

| | bikes | glasses | pants | watches | shirts |
|---|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 35 | 15 |
| store 2 | 15 | 50.0 | 5 | 10 | 2 |


We can see that when we add a new column, the new column is added at the end of our DataFrame.

We can also add new columns to our DataFrame by using arithmetic operations between other columns in our DataFrame. Let's see an example:

In [25]:
# We make a new column called suits by adding the number of shirts and pants
store_items['suits'] = store_items['pants'] + store_items['shirts']

# We display the modified DataFrame
store_items

| | bikes | glasses | pants | watches | shirts | suits |
|---|---|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 35 | 15 | 45 |
| store 2 | 15 | 50.0 | 5 | 10 | 2 | 7 |


Suppose now that you have opened a new store and need to add its stock to your DataFrame. We can do this by adding a new row to the store_items DataFrame. To add rows to our DataFrame, we first create a new DataFrame and then append it to the original DataFrame. Let's see how this works:

In [26]:

# We create a list containing one Python dictionary with the number of items at the new store
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]

# We create a new DataFrame with the new_items and provide an index labeled store 3
new_store = pd.DataFrame(new_items, index = ['store 3'])

# We display the items at the new store
new_store

| | bikes | glasses | pants | watches |
|---|---|---|---|---|
| store 3 | 20 | 4 | 30 | 35 |


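We now add this row to our store_items DataFrame by using the .append() method. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell below only runs on older versions; pd.concat([store_items, new_store]) is the replacement.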

In [34]:
# We append store 3 to our store_items DataFrame
store_items = store_items.append(new_store)

# We display the modified DataFrame
store_items

Warning output (truncated): ... of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'.


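| | bikes | glasses | pants | shirts | suits | watches |
|---|---|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 15.0 | 45.0 | 35 |
| store 2 | 15 | 50.0 | 5 | 2.0 | 7.0 | 10 |
| store 3 | 20 | 4.0 | 30 | NaN | NaN | 35 |
| store 3 | 20 | 4.0 | 30 | NaN | NaN | 35 |
| store 3 | 20 | 4.0 | 30 | NaN | NaN | 35 |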


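Notice that appending the new row has put the columns in alphabetical order. The pandas version used here sorts the columns when the two DataFrames do not have the same columns; passing sort=False to .append() keeps them unsorted. Notice also that store 3 appears three times, most likely because the append cell was run more than once (its prompt number jumps from 26 to 34), and each run adds another copy of the row.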
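
On pandas versions where .append() is no longer available, the same row can be added with pd.concat(). The following is a minimal sketch under that assumption, rebuilding small versions of store_items and new_store from the data used earlier; the name combined is illustrative.

```python
import pandas as pd

store_items = pd.DataFrame(
    [{'bikes': 20, 'pants': 30, 'watches': 35},
     {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}],
    index=['store 1', 'store 2'])
new_store = pd.DataFrame(
    [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}],
    index=['store 3'])

# Stack the two DataFrames row-wise; sort=False leaves the columns unsorted
combined = pd.concat([store_items, new_store], sort=False)
print(combined)
```

Because pd.concat() returns a new DataFrame, running the cell twice does not add store 3 a second time as long as the result is kept under a separate name rather than assigned back to store_items.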

We can also add new columns to our DataFrame by using only data from particular rows in particular columns. For example, suppose that you want to stock stores 2 and 3 with new watches and you want the quantity of the new watches to be the same as the watches already in stock for those stores. One way to do this is to assign a slice of the existing watches column to the new column; the rows outside the slice are then filled with NaN.
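
No cell for this step appears here, so the following is only a sketch of one way it could be done. It uses a small stand-in DataFrame called stock (a hypothetical name) with one row per store, and takes the column label new watches from a later comment.

```python
import pandas as pd

stock = pd.DataFrame({'watches': [35, 10, 35]},
                     index=['store 1', 'store 2', 'store 3'])

# Take the watches values for stores 2 and 3 only; assignment aligns on the
# row labels, so store 1 receives NaN in the new column
stock['new watches'] = stock['watches'].iloc[1:]
print(stock)
```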

It is also possible to insert new columns anywhere we want in a DataFrame. The dataframe.insert(loc, label, data) method inserts a new column at position loc, with the given column label and the given data. Let's add a new column named shoes right before the suits column. Since suits has numerical index value 4, we use this value as loc. The data must supply one value per row, so five values are needed here. Let's see how this works:

In [35]:
# We insert a new column with label shoes right before the column with numerical index 4
store_items.insert(4, 'shoes', [8,5,4,3,2])

# we display the modified DataFrame
store_items

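| | bikes | glasses | pants | shirts | shoes | suits | watches |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 15.0 | 8 | 45.0 | 35 |
| store 2 | 15 | 50.0 | 5 | 2.0 | 5 | 7.0 | 10 |
| store 3 | 20 | 4.0 | 30 | NaN | 4 | NaN | 35 |
| store 3 | 20 | 4.0 | 30 | NaN | 3 | NaN | 35 |
| store 3 | 20 | 4.0 | 30 | NaN | 2 | NaN | 35 |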
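
Counting columns by hand to find loc is easy to get wrong once the DataFrame changes. As a possible alternative, the sketch below looks the position up with columns.get_loc() on a small stand-in DataFrame; the names df and suits_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'shirts': [15, 2], 'suits': [45, 7]},
                  index=['store 1', 'store 2'])

# Find the position of suits rather than counting columns by hand
suits_position = df.columns.get_loc('suits')

# insert() modifies df in place; shoes lands directly before suits
df.insert(suits_position, 'shoes', [8, 5])
print(list(df.columns))
```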


Just as we can add rows and columns, we can also delete them. To delete rows and columns from our DataFrame we will use the .pop() and .drop() methods. The .pop() method only allows us to delete columns, while the .drop() method can be used to delete both rows and columns by use of the axis keyword. Let's see some examples:

In [36]:
# We remove the watches column
store_items.pop('watches')

# we display the modified DataFrame
store_items

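| | bikes | glasses | pants | shirts | shoes | suits |
|---|---|---|---|---|---|---|
| store 1 | 20 | NaN | 30 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 50.0 | 5 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 4.0 | 30 | NaN | 4 | NaN |
| store 3 | 20 | 4.0 | 30 | NaN | 3 | NaN |
| store 3 | 20 | 4.0 | 30 | NaN | 2 | NaN |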


In [38]:
# We remove the pants and shoes columns
store_items = store_items.drop(['pants', 'shoes'], axis = 1)

# we display the modified DataFrame
store_items

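| | bikes | glasses | shirts | suits |
|---|---|---|---|---|
| store 1 | 20 | NaN | 15.0 | 45.0 |
| store 2 | 15 | 50.0 | 2.0 | 7.0 |
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |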


In [39]:
# We remove the store 2 and store 1 rows
store_items = store_items.drop(['store 2', 'store 1'], axis = 0)

# we display the modified DataFrame
store_items

| | bikes | glasses | shirts | suits |
|---|---|---|---|---|
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |
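
Two details of these methods are easy to miss: .pop() changes the DataFrame in place and hands back the removed column, whereas .drop() leaves the original untouched and returns a new DataFrame. The sketch below shows both on a small stand-in DataFrame, and also uses the columns= and index= keywords that .drop() accepts as an alternative to axis. All variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5], 'watches': [35, 10]},
                  index=['store 1', 'store 2'])

# pop() removes the column in place and returns it as a Series
watches = df.pop('watches')

# drop() returns a new DataFrame and leaves df itself unchanged
without_pants = df.drop(columns=['pants'])
without_store_1 = df.drop(index=['store 1'])
```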


Sometimes we might need to change the row and column labels. Let's change the bikes column label to hats using the .rename() method:

In [41]:
# We change the column label bikes to hats
store_items = store_items.rename(columns = {'bikes': 'hats'})

# we display the modified DataFrame
store_items

| | hats | glasses | shirts | suits |
|---|---|---|---|---|
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |
| store 3 | 20 | 4.0 | NaN | NaN |
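
Row labels can be changed the same way by passing index= instead of columns=. A minimal sketch on a stand-in DataFrame follows; the label last store is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'hats': [20, 15]}, index=['store 1', 'store 2'])

# rename() returns a new DataFrame with the row label replaced
df = df.rename(index={'store 2': 'last store'})
print(df.index)
```

Labels that do not appear in the mapping are left as they are.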
