# DataFrames with Pandas

DataFrames power data analysis in Python: they let us work with grids of data much as we would with a conventional spreadsheet. They give us labelled rows and columns, built-in functions, filtering and many more tools for getting insight from our data with little effort.

This page introduces how to create DataFrames, along with a few functions and tools for making selections within them. Let’s import our packages to get started (I’m sure you’ve installed them by now!).


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Our table below has a scout report on 4 different players’ shooting, passing and defending skills.

playerList = ["Pagbo","Grazeman","Cantay","Ravane"]
skillList =["Shooting","Passing","Defending"]

In [3]:
# For this example, we use a random number generator as our scout.
# I wouldn't recommend this for an actual team!
# Create a 4-row, 3-column array of random integers from 1 to 9 (randint excludes the upper bound, 10).
scoreArray = np.random.randint(1,10,(4,3))
scout_df = pd.DataFrame(data=scoreArray, index=playerList, columns=skillList)
scout_df

          Shooting  Passing  Defending
Pagbo            3        4          8
Grazeman         7        5          2
Cantay           2        1          5
Ravane           3        2          6
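Because the scores are random, your table will not match the one shown here. As a minimal sketch, assuming you want the same numbers on every run, NumPy's `default_rng` accepts a seed. The seed value 42 and the names `rng`, `player_list`, `skill_list` and `score_array` are my own choices for illustration, not part of the original notebook:

```python
import numpy as np
import pandas as pd

player_list = ["Pagbo", "Grazeman", "Cantay", "Ravane"]
skill_list = ["Shooting", "Passing", "Defending"]

# A seeded generator returns the same "random" scores every time it is run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed (assumption)

# integers(low, high) excludes `high`, so every score falls between 1 and 9.
score_array = rng.integers(1, 10, size=(4, 3))

scout_df = pd.DataFrame(data=score_array, index=player_list, columns=skill_list)
print(scout_df)
print(scout_df.shape)
```

The DataFrame is built exactly as before: the array supplies the values, `index` labels the rows and `columns` labels the columns.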


## Selecting and indexing


In [4]:
#Square brackets for columns
scout_df['Shooting']


Pagbo       3
Grazeman    7
Cantay      2
Ravane      3
Name: Shooting, dtype: int32

In [5]:
# For rows, we use .loc with the row's name (its index label)
# It turns out that DataFrame rows are also Series!
scout_df.loc['Pagbo']

Shooting     3
Passing      4
Defending    8
Name: Pagbo, dtype: int32

In [6]:
# Or, if we use an index number, .iloc (the slice 1:3 returns positions 1 and 2; the end is excluded)
scout_df.iloc[1:3]

          Shooting  Passing  Defending
Grazeman         7        5          2
Cantay           2        1          5
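The same three tools combine to pick out several columns or a single cell. The sketch below rebuilds the scout report with the scores from the table above, so it runs on its own; the variable names `attack`, `pagbo_defending`, `same_cell`, `by_label` and `by_position` are made up for this example:

```python
import pandas as pd

# Scores copied from the scout report shown earlier.
scout_df = pd.DataFrame(
    {"Shooting": [3, 7, 2, 3], "Passing": [4, 5, 1, 2], "Defending": [8, 2, 5, 6]},
    index=["Pagbo", "Grazeman", "Cantay", "Ravane"],
)

# A list of column names inside the square brackets returns a smaller DataFrame.
attack = scout_df[["Shooting", "Passing"]]

# .loc takes labels: row label first, then column label.
pagbo_defending = scout_df.loc["Pagbo", "Defending"]

# .iloc takes integer positions: the same cell is row 0, column 2.
same_cell = scout_df.iloc[0, 2]

# Label slices with .loc include the end label; .iloc slices exclude the end position.
by_label = scout_df.loc["Grazeman":"Cantay"]
by_position = scout_df.iloc[1:3]

print(attack)
print(pagbo_defending, same_cell)
print(by_label.equals(by_position))
```

One thing to watch: `.loc["Grazeman":"Cantay"]` includes Cantay, while `.iloc[1:3]` stops before position 3, which is why both return the same two rows.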


## Creating and removing columns/rows
DataFrames make it really easy for us to be flexible with our datasets. Let’s ask our scout for their thoughts on more players and skills.

In [8]:
#Scout, what about their communication?
scout_df['Communication'] = np.random.randint(1,10,4)
scout_df

          Shooting  Passing  Defending  Communication
Pagbo            3        4          8              5
Grazeman         7        5          2              9
Cantay           2        1          5              3
Ravane           3        2          6              1


Our new manager doesn’t care about defending; they want these scores removed from the report. The ‘.drop’ method makes this easy:



In [9]:
#axis=1 refers to columns
scout_df = scout_df.drop('Defending',axis=1)
scout_df

          Shooting  Passing  Communication
Pagbo            3        4              5
Grazeman         7        5              9
Cantay           2        1              3
Ravane           3        2              1
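Notice that the cell above assigns the result back to `scout_df`. That matters because ‘.drop’ returns a new DataFrame rather than changing the original. Here is a small sketch of that behaviour, using the scores from the tables above; `trimmed` and `also_trimmed` are names I have made up for the example:

```python
import pandas as pd

# The scout report as it stood before the Defending column was removed.
scout_df = pd.DataFrame(
    {
        "Shooting": [3, 7, 2, 3],
        "Passing": [4, 5, 1, 2],
        "Defending": [8, 2, 5, 6],
        "Communication": [5, 9, 3, 1],
    },
    index=["Pagbo", "Grazeman", "Cantay", "Ravane"],
)

# drop returns a new DataFrame; scout_df itself is untouched until we reassign it.
trimmed = scout_df.drop("Defending", axis=1)
print("Defending" in scout_df.columns)
print("Defending" in trimmed.columns)

# columns= is an equivalent spelling of axis=1 that some find easier to read.
also_trimmed = scout_df.drop(columns="Defending")
print(trimmed.equals(also_trimmed))
```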


Turns out, though, that our scout didn’t do their homework on the players’ transfer fees. Grazeman is far too expensive and we need to swap him for another player, Gomez.

For rows, we use ‘.drop’ with the axis argument set to 0:

In [11]:
scout_df = scout_df.drop('Grazeman',axis=0)
scout_df

        Shooting  Passing  Communication
Pagbo          3        4              5
Cantay         2        1              3
Ravane         3        2              1
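Row drops have a few variations worth knowing. The sketch below, again built from the scores shown above, tries three of them; the names `without_grazeman`, `shortlist` and `unchanged`, and the player "Nobody", are invented for illustration:

```python
import pandas as pd

# The scout report before Grazeman was removed.
scout_df = pd.DataFrame(
    {"Shooting": [3, 7, 2, 3], "Passing": [4, 5, 1, 2], "Communication": [5, 9, 3, 1]},
    index=["Pagbo", "Grazeman", "Cantay", "Ravane"],
)

# index= is an alternative spelling of axis=0.
without_grazeman = scout_df.drop(index="Grazeman")

# A list of labels drops several rows in one call.
shortlist = scout_df.drop(["Grazeman", "Cantay"], axis=0)

# Dropping a label that is not in the index raises a KeyError by default;
# errors="ignore" skips missing labels instead.
unchanged = scout_df.drop("Nobody", axis=0, errors="ignore")  # "Nobody" is a made-up name

print(list(without_grazeman.index))
print(list(shortlist.index))
print(unchanged.equals(scout_df))
```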


To add a new player, we refer to the new row with ‘.loc’ (just as we did to refer to an existing one earlier). We then give this new row a list or Series of values, one per column. Once again, we just use random numbers to fill the row here.

In [13]:
scout_df.loc['Gomez'] = np.random.randint(1,10,3)
scout_df


        Shooting  Passing  Communication
Pagbo          3        4              5
Cantay         2        1              3
Ravane         3        2              1
Gomez          5        7              7
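A list and a Series behave slightly differently when used to fill a new row. As a rough sketch, using fixed scores taken from the table above instead of random ones (`from_list` and `from_series` are names made up for this example):

```python
import pandas as pd

# The scout report after Grazeman was dropped.
scout_df = pd.DataFrame(
    {"Shooting": [3, 2, 3], "Passing": [4, 1, 2], "Communication": [5, 3, 1]},
    index=["Pagbo", "Cantay", "Ravane"],
)

# Option 1: a plain list is matched to the columns by position.
from_list = scout_df.copy()
from_list.loc["Gomez"] = [5, 7, 7]

# Option 2: a Series is matched to the columns by label, so its order does not matter.
from_series = scout_df.copy()
from_series.loc["Gomez"] = pd.Series({"Communication": 7, "Shooting": 5, "Passing": 7})

print(from_list)
print((from_list == from_series).all().all())
```

With a list, the number of values has to match the number of columns. The Series route is a little more typing, but it protects against putting a score in the wrong column.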


## Conditional Selection


In [15]:
# Let’s see where players’ attributes are above 5:
scout_df > 5

        Shooting  Passing  Communication
Pagbo      False    False          False
Cantay     False    False          False
Ravane     False    False          False
Gomez      False     True           True


In [16]:
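# Passing a True/False test on one column back into the DataFrame keeps only the rows that pass
scout_df[scout_df['Shooting']>5]


Empty DataFrame
Columns: [Shooting, Passing, Communication]
Index: []

No player has a shooting score above 5, so the result is empty.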
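Conditions can also be combined. The sketch below uses the final scores from the table above and shows the usual pattern: wrap each condition in parentheses and join them with `&` (and) or `|` (or). The names `good_passers`, `both` and `either` are invented for this example:

```python
import pandas as pd

# The scout report after Gomez was added.
scout_df = pd.DataFrame(
    {"Shooting": [3, 2, 3, 5], "Passing": [4, 1, 2, 7], "Communication": [5, 3, 1, 7]},
    index=["Pagbo", "Cantay", "Ravane", "Gomez"],
)

# A single condition on one column keeps the rows where it is True.
good_passers = scout_df[scout_df["Passing"] > 5]

# & means "and": both conditions must hold. Each condition needs its own parentheses.
both = scout_df[(scout_df["Passing"] > 5) & (scout_df["Communication"] > 5)]

# | means "or": at least one condition must hold.
either = scout_df[(scout_df["Shooting"] > 2) | (scout_df["Passing"] > 5)]

print(good_passers)
print(both)
print(either)
```

Python's plain `and` and `or` keywords do not work here, because each condition is a whole Series of True/False values rather than a single one.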
