# <font color='#eb3483'> Introduction to Pandas </font>

Pandas is a Python library for data analysis built on top of NumPy. Its central data structure is the `DataFrame`: a `pandas.DataFrame` is essentially a table, similar to an Excel spreadsheet, with labeled columns and rows.

http://pandas.pydata.org/

The standard way of importing pandas is:

In [4]:
import pandas as pd

In this notebook we will cover:
<font color='#eb3483'>
1. Dataframes
1. Reading and writing a data frame
1. Inspecting a data frame
1. Indexing 

    </font>


##  <font color='#eb3483'> 1. Building DataFrames </font>

There are many ways to create a dataframe.

In [5]:
# We can pass in a 2D list (one inner list per row), specify the column names, and build the dataframe with pd.DataFrame()
rick_morty = pd.DataFrame(
    [
        ["Rick", "Sanchez", 60],
        ["Morty", "Smith", 14]
    ], columns = ["first_name", "last_name", "age"]
)
rick_morty

|   | first_name | last_name | age |
|---|---|---|---|
| 0 | Rick | Sanchez | 60 |
| 1 | Morty | Smith | 14 |


In [6]:
type(rick_morty)

pandas.core.frame.DataFrame
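
Besides a 2D list, `pd.DataFrame()` also accepts a dictionary that maps column names to lists of values, or a list of dictionaries (one per row). The sketch below rebuilds the same two rows both ways; it is only an illustration of those two constructor inputs.

```python
import pandas as pd

# Dictionary of columns: each key becomes a column name, each list holds that column's values
from_dict = pd.DataFrame({
    "first_name": ["Rick", "Morty"],
    "last_name": ["Sanchez", "Smith"],
    "age": [60, 14],
})

# List of dictionaries: each dict is one row, its keys are the column names
from_records = pd.DataFrame([
    {"first_name": "Rick", "last_name": "Sanchez", "age": 60},
    {"first_name": "Morty", "last_name": "Smith", "age": 14},
])

# Both inputs produce the same table
print(from_dict.equals(from_records))  # True
```

The dictionary form is handy when your data is already organized by column; the list-of-dicts form fits data that arrives one record at a time.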

In [7]:
# We can take a peek at our dataframe using the built-in .head() method
rick_morty.head()

We can create an empty dataframe

In [8]:
df3 = pd.DataFrame()

In [9]:
#It's as you would expect ... empty
df3

Now let's add columns to the empty dataframe

In [10]:
df3['teaching_team'] = ["Martin", "Connor", "Daniyar"]

In [11]:
df3

|   | teaching_team |
|---|---|
| 0 | Martin |
| 1 | Connor |
| 2 | Daniyar |


In [12]:
#guess our age (guess carefully...)
#assign it to column 'age' in df3


In [13]:
df3['age']=[25, 26, 28]
df3

|   | teaching_team | age |
|---|---|---|
| 0 | Martin | 25 |
| 1 | Connor | 26 |
| 2 | Daniyar | 28 |
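
When assigning a list to a column, it must contain exactly one value per row; a single scalar value is instead copied to every row. A small sketch on a toy copy of the teaching-team table (the `course` column is a made-up example):

```python
import pandas as pd

team = pd.DataFrame({"teaching_team": ["Martin", "Connor", "Daniyar"]})

# A scalar is broadcast to every row
team["course"] = "pandas"

# A list needs one value per row; a length mismatch raises ValueError
try:
    team["age"] = [25, 26]  # only 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)

print(team)
```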


We can see the column names with `.columns`

In [14]:
df3.columns

Index(['teaching_team', 'age'], dtype='object')

We can select a single column by putting its name in square brackets; the result is a pandas `Series`

In [15]:
df3['teaching_team']

0     Martin
1     Connor
2    Daniyar
Name: teaching_team, dtype: object
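
Passing a single column name returns a `Series`, while passing a *list* of names returns a smaller `DataFrame`. A quick check on a toy copy of `df3`:

```python
import pandas as pd

df = pd.DataFrame({"teaching_team": ["Martin", "Connor", "Daniyar"], "age": [25, 26, 28]})

one_col = df["teaching_team"]             # single name -> Series
two_cols = df[["teaching_team", "age"]]   # list of names -> DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
```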

We can also sort our dataframe by its columns with `sort_values`.

How do we get help? Let's try a few things.

In [16]:
# Let's get some help: a trailing ? shows the docstring in Jupyter/IPython (in plain Python, use help(df3.sort_values))
df3.sort_values?

In [17]:
#Let's sort by name
df3.sort_values(by="teaching_team")

|   | teaching_team | age |
|---|---|---|
| 1 | Connor | 26 |
| 2 | Daniyar | 28 |
| 0 | Martin | 25 |


In [18]:
# Sort by age; ascending=False makes the order descending
df3.sort_values(by="age", ascending=False)

|   | teaching_team | age |
|---|---|---|
| 2 | Daniyar | 28 |
| 1 | Connor | 26 |
| 0 | Martin | 25 |
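
Note that `sort_values` returns a new, sorted dataframe and leaves the original untouched; to keep the sorted version you assign it to a variable. A minimal sketch on a toy copy of `df3`:

```python
import pandas as pd

df = pd.DataFrame({"teaching_team": ["Martin", "Connor", "Daniyar"], "age": [25, 26, 28]})

by_age = df.sort_values(by="age", ascending=False)

# df keeps its original row order; only by_age is sorted
print(df["teaching_team"].tolist())      # ['Martin', 'Connor', 'Daniyar']
print(by_age["teaching_team"].tolist())  # ['Daniyar', 'Connor', 'Martin']
```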


Selecting a column that does not exist raises a `KeyError` (the same error you get when looking up a missing key in a dictionary)

In [19]:
df3["address"]

KeyError: 'address'
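
To avoid the error, you can check whether the column exists first, much like `key in some_dict`, or use `DataFrame.get()`, which returns a default (`None` unless you pass another) instead of raising. A short sketch on a toy copy of `df3`:

```python
import pandas as pd

df = pd.DataFrame({"teaching_team": ["Martin", "Connor", "Daniyar"], "age": [25, 26, 28]})

# Membership test on the column labels
print("address" in df.columns)  # False

# .get() returns None for a missing column instead of raising KeyError
print(df.get("address"))        # None
```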

## <font color='#eb3483'> 2. Reading/Writing data with dataframes </font>

We rarely have to build a dataframe by hand, but now we know how.
More often we load data from a file: pandas can import from and export to many file types, including CSV, JSON, and Excel.

For example, we can read a CSV file containing information about the Avengers (taken from [here](https://github.com/fivethirtyeight/data/tree/master/avengers))

In [20]:
avengers = pd.read_csv("data/avengers.csv")

In [34]:
# look at the head of our data set

avengers.head()

|   | URL | name | appearances | current | gender | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | 1963 | Death: "Later while under the influence of Imm... |
| 3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | 1963 | Dies in Fear Itself brought back because that'... |


We can save the dataframe back to a CSV file with `to_csv`. By default this method also writes the index as a separate, unnamed column; we can avoid this by passing the argument `index=False`.

In [35]:
avengers.to_csv("avengers2.csv")

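or we can export to Excel using `to_excel`. This requires a separate Excel engine package: older pandas versions wrote the legacy `.xls` format through `xlwt`, but support for `xlwt` was removed in pandas 2.0, so here we write a modern `.xlsx` file, which pandas handles through `openpyxl`.

In [39]:
avengers.to_excel("avengers.xlsx", index=False)

Likewise, we can read from an Excel file easily (`.xlsx` files are read through `openpyxl`; the legacy `.xls` format needs the package `xlrd`)

In [40]:
avengers_reloaded = pd.read_excel("avengers.xlsx")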

In [41]:
avengers_reloaded.head()

|   | URL | name | appearances | current | gender | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | 1963 | Death: "Later while under the influence of Imm... |
| 3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | 1963 | Dies in Fear Itself brought back because that'... |


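Bothered by an extra `Unnamed: 0` column? If you read `avengers2.csv` back in with `pd.read_csv`, the index that `to_csv` wrote out comes back as an ordinary column with an empty header, which pandas names `Unnamed: 0`. To avoid it, don't save the index: `avengers.to_csv("avengers2.csv", index=False)`. Alternatively, tell pandas that the first column is the index when reading: `pd.read_csv("avengers2.csv", index_col=0)`.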
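
Here is a minimal round trip that reproduces the `Unnamed: 0` column and both fixes. To keep it self-contained it writes to an in-memory buffer (`io.StringIO`) instead of a real file, and uses a tiny made-up two-row frame; `to_csv()` called without a path returns the CSV text as a string.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Thor Odinson", "Janet van Dyne"], "appearances": [2402, 1165]})

# Default to_csv() writes the index, which comes back as "Unnamed: 0"
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())     # ['Unnamed: 0', 'name', 'appearances']

# Fix 1: don't write the index at all
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(without_index.columns.tolist())  # ['name', 'appearances']

# Fix 2: tell read_csv that the first column is the index
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(restored.columns.tolist())       # ['name', 'appearances']
```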

## <font color='#eb3483'> 3. Inspecting a dataframe </font>

Once we have read in a dataframe, we generally take a quick look around (just as you would in Excel).

We can see the first rows of a dataframe with `head()`

In [42]:
avengers.head()

|   | URL | name | appearances | current | gender | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | 1963 | Death: "Later while under the influence of Imm... |
| 3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | 1963 | Dies in Fear Itself brought back because that'... |


and the last rows with `tail()`

In [43]:
avengers.tail()

|   | URL | name | appearances | current | gender | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 168 | http://marvel.wikia.com/Eric_Brooks_(Earth-616)# | Eric Brooks | 198 | YES | MALE | 2013 |  |
| 169 | http://marvel.wikia.com/Adam_Brashear_(Earth-6... | Adam Brashear | 29 | YES | MALE | 2014 |  |
| 170 | http://marvel.wikia.com/Victor_Alvarez_(Earth-... | Victor Alvarez | 45 | YES | MALE | 2014 |  |
| 171 | http://marvel.wikia.com/Ava_Ayala_(Earth-616)# | Ava Ayala | 49 | YES | FEMALE | 2014 |  |
| 172 | http://marvel.wikia.com/Kaluu_(Earth-616)# | Kaluu | 35 | YES | MALE | 2015 |  |
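
Both `head()` and `tail()` show 5 rows by default and take an optional number of rows. A quick sketch on a made-up 10-row frame:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

print(len(df.head()))           # 5 rows by default
print(df.head(3)["x"].tolist())  # [0, 1, 2]
print(df.tail(2)["x"].tolist())  # [8, 9]
```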


We can see the size of a dataframe (n_rows, n_columns) with `shape`

In [44]:
avengers.shape

(173, 7)

We can see the data type of each column with `dtypes`

In [45]:
avengers.dtypes

URL              object
name             object
appearances       int64
current          object
gender           object
starting_date     int64
notes            object
dtype: object
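
`shape` is a plain `(rows, columns)` tuple, so it can be unpacked, and `len()` of a dataframe counts its rows. `dtypes` is itself a `Series` indexed by column name, so you can look up a single column's type. A small sketch using two made-up rows modeled on the Avengers data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Kaluu", "Ava Ayala"],
    "appearances": [35, 49],
    "starting_date": [2015, 2014],
})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)       # 2 3
print(len(df) == n_rows)    # True: len() counts rows

# dtypes is a Series indexed by column name
print(df.dtypes["appearances"])
```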

We can look at the column names using `.columns`

In [46]:
avengers.columns

Index(['URL', 'name', 'appearances', 'current', 'gender', 'starting_date',
       'notes'],
      dtype='object')

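We can use `.describe()` to get summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the dataframe's columns. By default only numeric columns are included, which is why `name`, `gender` and the other text columns are missing below.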

In [47]:
avengers.describe()

|   | appearances | starting_date |
|---|---|---|
| count | 173.0 | 173.0 |
| mean | 414.052023 | 1988.445087 |
| std | 677.99195 | 30.374669 |
| min | 2.0 | 1900.0 |
| 25% | 58.0 | 1979.0 |
| 50% | 132.0 | 1996.0 |
| 75% | 491.0 | 2010.0 |
| max | 4333.0 | 2015.0 |
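
Passing `include="all"` also summarizes the text columns, adding the rows `unique` (number of distinct values), `top` (most frequent value) and `freq` (its count). A sketch on three made-up rows modeled on the Avengers data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Thor Odinson", "Janet van Dyne", "Kaluu"],
    "gender": ["MALE", "FEMALE", "MALE"],
    "appearances": [2402, 1165, 35],
})

# Default: numeric columns only
print(df.describe().columns.tolist())  # ['appearances']

# include="all": text columns get count/unique/top/freq
summary = df.describe(include="all")
print(summary.loc["top", "gender"])    # MALE
print(summary.loc["freq", "gender"])   # 2
```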


<hr>

## <font color='#eb3483'> 4. Indexes </font>

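Every dataframe has an index: the row labels that pandas uses to look up and align rows, which is what makes many complex data manipulations possible. By default, the index is simply the row number (a `RangeIndex` running from 0 to n-1).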

In [48]:
avengers.index

RangeIndex(start=0, stop=173, step=1)
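
The same default appears on any freshly built dataframe, and you can also supply your own labels through the `index` argument of the constructor. A small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Rick", "Morty", "Summer"]})
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(df.index.tolist())  # [0, 1, 2]

# Custom labels passed at construction time
labeled = pd.DataFrame({"age": [60, 14]}, index=["Rick", "Morty"])
print(labeled.index.tolist())  # ['Rick', 'Morty']
```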

We can turn one of the columns into the index using `set_index()`. It returns a new dataframe, so we assign the result back to `avengers`.

In [49]:
avengers.head()

|   | URL | name | appearances | current | gender | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | 1963 | Death: "Later while under the influence of Imm... |
| 3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | 1963 | Dies in Fear Itself brought back because that'... |


In [50]:
# let's change it to gender
avengers = avengers.set_index('gender')

In [51]:
avengers.head()

| gender | URL | name | appearances | current | starting_date | notes |
|---|---|---|---|---|---|---|
| MALE | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| FEMALE | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| MALE | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | 1963 | Death: "Later while under the influence of Imm... |
| MALE | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| MALE | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | 1963 | Dies in Fear Itself brought back because that'... |
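
As a small preview of why the index matters: once `gender` is the index, rows can be looked up by label with `.loc`, and because index labels may repeat, asking for `"MALE"` returns every matching row. The three-row frame below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Henry Pym", "Janet van Dyne", "Thor Odinson"],
    "gender": ["MALE", "FEMALE", "MALE"],
})

by_gender = df.set_index("gender")  # new DataFrame; df itself is unchanged

print("gender" in df.columns)         # True
print("gender" in by_gender.columns)  # False: it moved into the index

# Label-based lookup: all rows whose index label is "MALE", column "name"
print(by_gender.loc["MALE", "name"].tolist())  # ['Henry Pym', 'Thor Odinson']
```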


In [52]:
avengers = avengers.reset_index()


In [53]:
avengers.head()
# Notice our index is row numbers again, and gender is back as a regular column!

|   | gender | URL | name | appearances | current | starting_date | notes |
|---|---|---|---|---|---|---|---|
| 0 | MALE | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1 | FEMALE | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 2 | MALE | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | 1963 | Death: "Later while under the influence of Imm... |
| 3 | MALE | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 4 | MALE | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | 1963 | Dies in Fear Itself brought back because that'... |
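
By default `reset_index()` turns the old index back into a regular column, as seen above. If you don't need the old labels, `drop=True` discards them instead. A sketch on a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Kaluu", "Fiona"], "gender": ["MALE", "FEMALE"]}).set_index("gender")

# Default: the old index becomes the first column again
restored = df.reset_index()
print(restored.columns.tolist())  # ['gender', 'name']

# drop=True throws the old index away
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())   # ['name']
print(dropped.index.tolist())     # [0, 1]
```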


In [54]:
# make appearances the index
avengers = avengers.set_index('appearances')
avengers.head()

| appearances | gender | URL | name | current | starting_date | notes |
|---|---|---|---|---|---|---|
| 1269 | MALE | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | YES | 1963 | Merged with Ultron in Rage of Ultron Vol. 1. A... |
| 1165 | FEMALE | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | YES | 1963 | Dies in Secret Invasion V1:I8. Actually was se... |
| 3068 | MALE | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | YES | 1963 | Death: "Later while under the influence of Imm... |
| 2089 | MALE | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | YES | 1963 | Dies in Ghosts of the Future arc. However "he ... |
| 2402 | MALE | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | YES | 1963 | Dies in Fear Itself brought back because that'... |


We can also sort our dataframe by its index using the `sort_index` method

In [57]:
avengers = avengers.sort_index()
avengers.head()


| appearances | gender | URL | name | current | starting_date | notes |
|---|---|---|---|---|---|---|
| 2 | FEMALE | http://marvel.wikia.com/Fiona_(Inhuman)_(Earth... | Fiona | YES | 1900 |  |
| 2 | FEMALE | http://marvel.wikia.com/Moira_Brandon_(Earth-6... | Moira Brandon | NO | 1993 | Died in her second appearance earns honorary A... |
| 3 | MALE | http://marvel.wikia.com/Doug_Taggert_(Earth-616)# | Doug Taggert | NO | 2005 | Accidently killed by Zaran |
| 4 | MALE | http://marvel.wikia.com/Gene_Lorrene_(Earth-616)# | Gene Lorrene | NO | 2005 |  |
| 6 | MALE | http://marvel.wikia.com/Dennis_Sykes_(Earth-616)# | Dennis Sykes | NO | 2010 | Died in Heroic_Age:_One_Month_to_Live_Vol_1_5 ... |


In [58]:
# To sort in descending order, we just add the ascending=False argument

avengers = avengers.sort_index(ascending=False)
avengers.head()

| appearances | gender | URL | name | current | starting_date | notes |
|---|---|---|---|---|---|---|
| 4333 | MALE | http://marvel.wikia.com/Peter_Parker_(Earth-616)# | Peter Benjamin Parker | YES | 1990 | Since joining the New Avengers: First death Ki... |
| 3458 | MALE | http://marvel.wikia.com/Steven_Rogers_(Earth-616) | Steven Rogers | YES | 1964 | Dies at the end of Civil War. Later comes back. |
| 3130 | MALE | http://marvel.wikia.com/James_Howlett_(Earth-6... | James "Logan" Howlett | YES | 2005 | Died in Death_of_Wolverine_Vol_1_4. Has not ye... |
| 3068 | MALE | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | YES | 1963 | Death: "Later while under the influence of Imm... |
| 2402 | MALE | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | YES | 1963 | Dies in Fear Itself brought back because that'... |
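
Unlike dictionary keys, index labels do not have to be unique: in the ascending sort, two characters share `appearances == 2`. The sketch below, on three made-up rows, checks this with `index.is_unique` and sorts in both directions. Rows with equal labels may come out in either order, so it only inspects the index values.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Fiona", "Doug Taggert", "Moira Brandon"]},
    index=pd.Index([2, 3, 2], name="appearances"),
)

print(df.index.is_unique)  # False: the label 2 appears twice

print(df.sort_index().index.tolist())                 # [2, 2, 3]
print(df.sort_index(ascending=False).index.tolist())  # [3, 2, 2]
```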
