# Import Pandas

The standard way of importing pandas is:

In [15]:
import pandas as pd

In [16]:
pd.__version__

'0.23.1'

http://pandas.pydata.org/

pandas is a data analysis library built on top of NumPy. Its main data structure is the `DataFrame`: a table, similar to an Excel spreadsheet, made of rows and columns. Each column of a `pandas.DataFrame` is a `pandas.Series`, which is essentially a NumPy array with an index and some additional functionality.
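To see that relationship concretely, here is a minimal sketch with a small made-up dataframe (the column names `a` and `b` are illustrative only). Selecting a column returns a `Series`, and its `.values` attribute exposes the NumPy array underneath:

```python
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col = df["a"]        # selecting one column gives a Series
arr = col.values     # the underlying NumPy array

print(type(col))     # <class 'pandas.core.series.Series'>
print(type(arr))     # <class 'numpy.ndarray'>

# A Series supports NumPy-style vectorized arithmetic...
doubled = col * 2
# ...but, unlike a bare array, it keeps its index labels
print(doubled.index.tolist())  # [0, 1, 2]
```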

# Create a pd.Series

The columns of a pandas dataframe are Series (`pandas.Series`).

In [17]:
series1 = pd.Series([1,2,3])

In [18]:
series1

0    1
1    2
2    3
dtype: int64

In [19]:
names = pd.Series(['Manuel', 'Michael', 'Hugo'])

In [20]:
names

0     Manuel
1    Michael
2       Hugo
dtype: object

In [21]:
type(names)

pandas.core.series.Series
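A Series can also carry its own index labels and a name. This sketch reuses the names and ages from this notebook; `index=` and `name=` are standard `pd.Series` arguments, and using the names as labels is just one illustrative choice:

```python
import pandas as pd

# Use the people's names as index labels instead of 0, 1, 2
ages = pd.Series([33, 25, 14], index=["Manuel", "Michael", "Hugo"], name="age")

print(ages["Michael"])       # label-based lookup -> 25
print(ages.name)             # age
print(ages.index.tolist())   # ['Manuel', 'Michael', 'Hugo']
```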

# Create a DataFrame

There are many ways to create a dataframe. One is to pass a list of rows together with the column names:

In [22]:
rick_morty = pd.DataFrame(
    [
        ["Rick", "Sanchez", 60],
        ["Morty", "Smith", 14]
    ], columns = ["first_name", "last_name", "age"]
)
rick_morty

  first_name last_name  age
0       Rick   Sanchez   60
1      Morty     Smith   14


In [23]:
type(rick_morty)

pandas.core.frame.DataFrame

In [24]:
age = pd.Series([33, 25, 14])

In [25]:
df2 = pd.DataFrame([names, age])

In [26]:
df2.head()

        0        1     2
0  Manuel  Michael  Hugo
1      33       25    14
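Note that passing a *list* of Series makes each Series a row, which is why `df2` has the names in one row and the ages in another. To get them as columns instead, a common alternative is to pass a dict that maps column names to Series (the column names `name` and `age` below are our own choice):

```python
import pandas as pd

names = pd.Series(["Manuel", "Michael", "Hugo"])
age = pd.Series([33, 25, 14])

rows = pd.DataFrame([names, age])                 # each Series becomes a row
cols = pd.DataFrame({"name": names, "age": age})  # each Series becomes a column

print(rows.shape)  # (2, 3)
print(cols.shape)  # (3, 2)
print(cols)
```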


We can also create an empty dataframe:

In [27]:
df3 = pd.DataFrame()

In [28]:
df3

# Adding columns to a DataFrame

We can add columns to a dataframe the same way we would add keys to a dictionary:

In [29]:
df3['name'] = names

In [30]:
df3

      name
0   Manuel
1  Michael
2     Hugo


In [31]:
df3['age'] = age

In [32]:
df3

      name  age
0   Manuel   33
1  Michael   25
2     Hugo   14
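Assignment accepts more than a Series. In this sketch (illustrative data; the `team` and `score` columns are made up), a list must have one value per row, a scalar is repeated on every row, and a Series is matched to rows by index label, so rows without a matching label get `NaN`:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Manuel", "Michael", "Hugo"]})

df["age"] = [33, 25, 14]   # a list: must have exactly one value per row
df["team"] = "A"           # a scalar: broadcast to every row

# A Series is aligned on the index: only labels 0 and 2 exist here,
# so row 1 gets NaN (and the column becomes float64)
df["score"] = pd.Series([10, 20], index=[0, 2])

print(df)
```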


We can list a dataframe's columns with `.columns`:

In [33]:
df3.columns

Index(['name', 'age'], dtype='object')

We can select a column the same way we would look up a key in a dict:

In [34]:
df3['name']

0     Manuel
1    Michael
2       Hugo
Name: name, dtype: object
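Passing a list of column names (note the double brackets) selects several columns at once and returns a DataFrame rather than a Series. A small sketch that rebuilds the same data:

```python
import pandas as pd

df3 = pd.DataFrame({"name": ["Manuel", "Michael", "Hugo"], "age": [33, 25, 14]})

one = df3["name"]               # a single label -> Series
several = df3[["name", "age"]]  # a list of labels -> DataFrame

print(type(one).__name__)       # Series
print(type(several).__name__)   # DataFrame
```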

Selecting a column that does not exist raises a `KeyError` (the same error we get when looking up a missing key in a dictionary):

In [35]:
df3["address"]

KeyError: 'address'
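To avoid the `KeyError`, we can check whether a column exists first, or use `get`, which (like `dict.get`) returns a default value, `None` unless another one is given, for a missing column:

```python
import pandas as pd

df3 = pd.DataFrame({"name": ["Manuel", "Michael", "Hugo"], "age": [33, 25, 14]})

print("address" in df3.columns)   # False
print(df3.get("address"))         # None
print(df3.get("address", "n/a"))  # n/a
```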

# Editing the index

Dataframes have an index that labels each row. By default the index is just the row number, starting at 0. pandas uses these labels to look up rows and to align data when combining Series or dataframes.

In [36]:
df3.index

RangeIndex(start=0, stop=3, step=1)

In [37]:
df3 = df3.set_index('name')

In [38]:
df3

         age
name
Manuel    33
Michael   25
Hugo      14


In [39]:
df3.index

Index(['Manuel', 'Michael', 'Hugo'], dtype='object', name='name')
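Once the names are the index, rows can be looked up by label with `.loc`, while `.iloc` still selects by integer position. A sketch that rebuilds the same dataframe:

```python
import pandas as pd

df3 = pd.DataFrame({"name": ["Manuel", "Michael", "Hugo"], "age": [33, 25, 14]})
df3 = df3.set_index("name")

print(df3.loc["Michael", "age"])  # by label -> 25
print(df3.iloc[0]["age"])         # by position (first row, Manuel) -> 33
```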

# Reset the index

In [40]:
df3 = df3.reset_index()

In [41]:
df3

      name  age
0   Manuel   33
1  Michael   25
2     Hugo   14
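By default `reset_index` moves the old index back into a regular column, as it did with `name` above. Passing `drop=True` discards the old index labels instead. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Manuel", "Michael", "Hugo"], "age": [33, 25, 14]}).set_index("name")

moved = df.reset_index()             # 'name' becomes a column again
dropped = df.reset_index(drop=True)  # 'name' labels are thrown away

print(list(moved.columns))    # ['name', 'age']
print(list(dropped.columns))  # ['age']
```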


In [42]:
df3 = df3.set_index('age')

In [43]:
df3

        name
age
33    Manuel
25   Michael
14      Hugo


# Sorting the index

In [44]:
df3 = df3.sort_index()

In [45]:
df3

        name
age
14      Hugo
25   Michael
33    Manuel


In [46]:
df3 = df3.sort_index(ascending=False)

In [47]:
df3

        name
age
33    Manuel
25   Michael
14      Hugo
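`sort_index` (like `set_index` and `reset_index`) returns a new dataframe and leaves the original unchanged, which is why the cells above reassign the result to `df3`. A quick check:

```python
import pandas as pd

df = pd.DataFrame({"age": [33, 25, 14], "name": ["Manuel", "Michael", "Hugo"]}).set_index("age")

sorted_df = df.sort_index()

print(df.index.tolist())         # [33, 25, 14] (unchanged)
print(sorted_df.index.tolist())  # [14, 25, 33]
```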


In [48]:
df3 = df3.reset_index()

In [49]:
df3

   age     name
0   33   Manuel
1   25  Michael
2   14     Hugo


# Sorting by a column

In [50]:
df3.sort_values(by="name")

   age     name
2   14     Hugo
0   33   Manuel
1   25  Michael


In [51]:
df3.sort_values(by="age", ascending=False)

   age     name
0   33   Manuel
1   25  Michael
2   14     Hugo
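`by` also accepts a list of columns, with a matching list for `ascending`, which is useful for breaking ties. In this sketch, `Ana` is an extra, made-up row added so that two people share the same age:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [33, 25, 14, 25],
    "name": ["Manuel", "Michael", "Hugo", "Ana"],  # "Ana" is illustrative only
})

# Oldest first; people with the same age are ordered alphabetically
result = df.sort_values(by=["age", "name"], ascending=[False, True])
print(result["name"].tolist())  # ['Manuel', 'Ana', 'Michael', 'Hugo']
```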


# Reading/Writing data with dataframes

pandas can read from and write to many file formats, including CSV, JSON and Excel.

For example, we can read a CSV file with information about the Avengers (taken from [here](https://github.com/fivethirtyeight/data/tree/master/avengers)):

In [52]:
avengers = pd.read_csv("data/avengers.csv")


In [53]:
avengers.head()


We can save the dataframe back to a CSV file with `to_csv`. By default this method also writes the index as a separate column; passing `index=False` avoids this.

In [None]:
avengers.to_csv("avengers2.csv", index=False)
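Since `data/avengers.csv` may not be available, here is a self-contained sketch of the same round trip on a small made-up dataframe. When `to_csv` is called without a path it returns the CSV text as a string, and `io.StringIO` lets `read_csv` read that string as if it were a file:

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Manuel", "Michael"], "age": [33, 25]})

with_index = df.to_csv()                # first column holds the index (0, 1)
without_index = df.to_csv(index=False)  # only the data columns

print(with_index.splitlines()[0])     # ,name,age
print(without_index.splitlines()[0])  # name,age

# Reading back: index_col=0 turns the written index column back into the index
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
plain = pd.read_csv(io.StringIO(without_index))
print(plain)
```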

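or we can export to Excel with `to_excel`. Writing the legacy `.xls` format requires the separate package `xlwt` (recent pandas versions no longer support writing `.xls`; `.xlsx` files use `openpyxl` instead).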

In [None]:
avengers.to_excel("avengers.xls")

Likewise, we can read an Excel file with `read_excel` (for `.xls` files this requires the package `xlrd`).

In [None]:
avengers_reloaded = pd.read_excel("avengers.xls")

In [None]:
avengers_reloaded.head()

# Inspecting a dataframe

We can see the first rows of a dataframe (five by default) with `head()`:

In [None]:
avengers.head()

and the last ones with `tail()`:

In [None]:
avengers.tail()

We can see the size of a dataframe as an `(n_rows, n_columns)` tuple with `shape`:

In [None]:
avengers.shape

We can see the data type of each column with `dtypes`:

In [None]:
avengers.dtypes

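We can use `describe` to compute summary statistics for the dataframe's columns. By default it only covers the numeric columns, reporting their count, mean, standard deviation, minimum, quartiles and maximum.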

In [None]:
avengers.describe()
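Because `data/avengers.csv` may not be available, this sketch applies the same inspection methods to a small made-up dataframe. `head` and `tail` also accept the number of rows to show, and `describe(include="all")` also summarizes non-numeric columns (count, number of unique values, most frequent value and its frequency):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Manuel", "Michael", "Hugo"],
    "age": [33, 25, 14],
})

print(df.head(2))   # first two rows
print(df.tail(1))   # last row
print(df.shape)     # (3, 2)
print(df.dtypes)    # one dtype per column

summary = df.describe()                  # numeric columns only: here just 'age'
print(summary.loc["mean", "age"])        # 24.0
everything = df.describe(include="all")  # also summarizes 'name'
print(everything.loc["unique", "name"])  # 3
```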