# <font color='#eb3483'> Introduction to Pandas </font>

Pandas is a data analysis library built on top of NumPy. Among many other things, it provides a very useful data structure called a `DataFrame`. A `pandas.DataFrame` is essentially a table with rows and columns, similar to an Excel spreadsheet. If you have experience with the R programming language, the `pandas.DataFrame` is very similar to an R `data.frame`.

http://pandas.pydata.org/

The standard way of importing pandas is:

In [None]:
import pandas as pd

In this notebook we will cover:
<font color='#eb3483'>
1. Dataframes
1. Reading and writing a data frame
1. Inspecting a data frame
1. Indexing

    </font>


##  <font color='#eb3483'> 1. Building DataFrames </font>

There are many ways to create a dataframe.

In [None]:
# We can pass a 2D list (one inner list per row) plus the column names to the DataFrame() constructor
rick_morty = pd.DataFrame(
    [
        ["Rick", "Sanchez", 60],
        ["Morty", "Smith", 14]
    ], columns = ["first_name", "last_name", "age"]
)
rick_morty

In [None]:
type(rick_morty)
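
Another common way to build a dataframe is from a dictionary that maps column names to lists of values. This is a small sketch that reuses the same data as above; it is one option among several rather than the preferred one.

```python
import pandas as pd

# Each dictionary key becomes a column name, each list becomes that column's values.
rick_morty = pd.DataFrame(
    {
        "first_name": ["Rick", "Morty"],
        "last_name": ["Sanchez", "Smith"],
        "age": [60, 14],
    }
)
print(rick_morty)
```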

In [None]:
# We can take a peek at our dataframe using the built-in methods .head() and .tail()
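
A minimal sketch of both methods, rebuilding the small `rick_morty` frame so the snippet runs on its own. Because the frame only has two rows, we ask for a single row each time so the difference is visible.

```python
import pandas as pd

rick_morty = pd.DataFrame(
    [["Rick", "Sanchez", 60], ["Morty", "Smith", 14]],
    columns=["first_name", "last_name", "age"],
)

# head() and tail() return 5 rows by default; passing a number changes that.
first_row = rick_morty.head(1)
last_row = rick_morty.tail(1)
print(first_row)
print(last_row)
```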


In [None]:
# We can also ask for a specific row by referring to its row index:
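
One way to do this, sketched here with the same two-row frame, is `.iloc` (selection by integer position) or `.loc` (selection by index label). With the default index the two give the same row.

```python
import pandas as pd

rick_morty = pd.DataFrame(
    [["Rick", "Sanchez", 60], ["Morty", "Smith", 14]],
    columns=["first_name", "last_name", "age"],
)

# .iloc uses positions (0, 1, ...); .loc uses index labels.
# The default index labels are also 0, 1, ..., so both pick the second row here.
second_row = rick_morty.iloc[1]
same_row = rick_morty.loc[1]
print(second_row)
```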

We can create an empty dataframe

In [None]:
df3 = pd.DataFrame()

In [None]:
#It's as you would expect ... empty
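
A quick way to confirm this, shown as a small sketch, is to look at the `empty` and `shape` attributes.

```python
import pandas as pd

df3 = pd.DataFrame()

# An empty dataframe has no rows and no columns.
print(df3.empty)  # True
print(df3.shape)  # (0, 0)
```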


Now let's add columns to the empty dataframe

In [None]:
df3['Colours'] = ["Red", "Yellow", "Blue"]

In [None]:
# create a list of values and assign it to a new column 'number' in df3
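
A possible version of this step is sketched below. The values in `number` are hypothetical, since the notebook does not say which numbers to use; the only requirement is that the list has one value per existing row.

```python
import pandas as pd

df3 = pd.DataFrame()
df3["Colours"] = ["Red", "Yellow", "Blue"]

# Hypothetical values, chosen only for illustration.
number = [3, 1, 2]
df3["number"] = number
print(df3)
```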

We can see the column names with `.columns`

We can see the values of a column
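
Both lookups can be sketched as follows, assuming `df3` holds the `Colours` column from above plus a `number` column with illustrative values.

```python
import pandas as pd

# The 'number' values are hypothetical.
df3 = pd.DataFrame({"Colours": ["Red", "Yellow", "Blue"], "number": [3, 1, 2]})

# .columns holds the column labels.
column_names = list(df3.columns)
print(column_names)

# Square brackets with a column name return that column as a Series.
colours = df3["Colours"]
print(colours)
```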

We can also sort our dataframe by our columns

How do we get help? Let's try a few things, for example `help(df3.sort_values)` or, inside a notebook, `df3.sort_values?`

In [None]:
#Let's sort by name


In [None]:
#Try and sort by number value

#How can we change this to descending?
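
One way to fill in these cells is `sort_values`, sketched here on a stand-in `df3` with hypothetical `number` values. Note that `sort_values` returns a new dataframe and leaves `df3` itself untouched.

```python
import pandas as pd

# The 'number' values are hypothetical.
df3 = pd.DataFrame({"Colours": ["Red", "Yellow", "Blue"], "number": [3, 1, 2]})

# Text columns are sorted alphabetically.
by_name = df3.sort_values("Colours")

# Numeric columns are sorted from smallest to largest by default.
by_number = df3.sort_values("number")

# ascending=False reverses the order.
by_number_desc = df3.sort_values("number", ascending=False)
print(by_number_desc)
```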

Selecting a column that does not exist will raise a `KeyError` (the same error as when selecting a missing key in a dictionary)

In [None]:
df3["Black"]
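
If we want the notebook to keep running, a minimal sketch is to catch the error or to check for the column first.

```python
import pandas as pd

df3 = pd.DataFrame({"Colours": ["Red", "Yellow", "Blue"]})

# Option 1: catch the KeyError.
caught = False
try:
    df3["Black"]
except KeyError:
    caught = True
    print("There is no column called 'Black'")

# Option 2: test for the column before selecting it.
has_black = "Black" in df3.columns
print(has_black)
```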

## <font color='#eb3483'> 2. Reading/Writing data with dataframes </font>

We rarely have to build a data frame by hand, but now we know how, just in case.   
Pandas can import from and export to many file types, including CSV, JSON and Excel.

For example, we can read a CSV file containing information about the Avengers (taken from [here](https://github.com/fivethirtyeight/data/tree/master/avengers))

In [None]:
# read in data

In [None]:
# look at the head of our data set
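
The sketch below shows the pattern with `pd.read_csv`. To keep it runnable without the real file, it reads a tiny made-up CSV from a string; the column names and rows are hypothetical stand-ins, not the actual Avengers data. With a file on disk we would pass its path instead, for example `pd.read_csv("avengers.csv")` (file name assumed).

```python
import io

import pandas as pd

# Hypothetical stand-in data, not the real Avengers file.
csv_text = """name,gender,appearances
Hero A,MALE,300
Hero B,FEMALE,100
Hero C,MALE,200
"""

# read_csv accepts a file path or any file-like object.
avengers = pd.read_csv(io.StringIO(csv_text))
print(avengers.head())
```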

We can save the dataframe back to a csv file with `to_csv`. By default this method also writes the index as a separate column; we can avoid this by passing the argument `index=False`.

In [None]:
avengers.to_csv("avengers2.csv")

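or we can export to Excel using `to_excel` (this requires a separate package; recent pandas versions use `openpyxl` for `.xlsx` files, while `xlwt` was only used for the legacy `.xls` format)

Likewise, we can easily read from an Excel file with `read_excel` (`openpyxl` handles `.xlsx` files; `xlrd` is only needed for legacy `.xls` files)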
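
A round trip through Excel might look like the sketch below. It uses a small hypothetical dataframe and a temporary folder, and it assumes an Excel engine such as `openpyxl` is installed; if none is available, pandas raises an `ImportError`, which the sketch catches.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"name": ["Hero A", "Hero B"], "appearances": [300, 100]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "avengers.xlsx")
    try:
        df.to_excel(path, index=False)    # write without the index column
        round_trip = pd.read_excel(path)  # read the same file back
    except ImportError:
        # No Excel engine (for example openpyxl) is installed.
        round_trip = None

print(round_trip)
```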

Bothered by that extra `Unnamed: 0` column? It is the old index: `to_csv` wrote it to the file as a first column without a name, and reading the file back turns it into an ordinary column. To avoid saving it, use `index=False` when saving: `avengers.to_csv("avengers2.csv", index=False)`.
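
The effect of `index=False` can be checked with a small experiment. This sketch uses a hypothetical two-row dataframe and temporary files, then compares the columns we get back in each case.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data.
avengers = pd.DataFrame({"name": ["Hero A", "Hero B"], "appearances": [300, 100]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    avengers.to_csv(with_index)                  # index written as the first column
    avengers.to_csv(without_index, index=False)  # index left out

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'name', 'appearances']
print(cols_without)  # ['name', 'appearances']
```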

## <font color='#eb3483'> 3. Inspecting a dataframe </font>

Once we have read in a data frame, we generally have a quick look around (just as you would in Excel).




We can see the first rows of a dataframe with `head()`

and the last ones with `tail()`

We can see the size of a dataframe (n_rows, n_columns) with `shape`

We can see the data type of each column with `dtypes`

We can look at the column names using `.columns`
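
These three attributes can be tried out as in the sketch below, which uses a small hypothetical stand-in for the `avengers` dataframe. Note that `shape`, `dtypes` and `columns` are attributes, so they are used without parentheses.

```python
import pandas as pd

# Hypothetical stand-in data.
avengers = pd.DataFrame(
    {
        "name": ["Hero A", "Hero B", "Hero C"],
        "gender": ["MALE", "FEMALE", "MALE"],
        "appearances": [300, 100, 200],
    }
)

# shape is a (n_rows, n_columns) tuple.
n_rows, n_columns = avengers.shape
print(n_rows, n_columns)

# dtypes lists the data type of each column.
print(avengers.dtypes)

# columns holds the column labels.
print(list(avengers.columns))
```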

We can use `.describe()` to find statistical information about the dataframe's columns.
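
As a small illustration on hypothetical data: by default `describe()` summarises only the numeric columns, reporting the count, mean, standard deviation, minimum, quartiles and maximum of each.

```python
import pandas as pd

# Hypothetical stand-in data.
avengers = pd.DataFrame(
    {
        "name": ["Hero A", "Hero B", "Hero C"],
        "appearances": [300, 100, 200],
    }
)

# The text column 'name' is left out of the summary.
summary = avengers.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
mean_appearances = summary.loc["mean", "appearances"]
print(mean_appearances)
```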

<hr>

## <font color='#eb3483'> 4. Indexes </font>

Dataframes have an index that allows us to perform complex data manipulations. By default, the index is the row number.

We can change the index to one of the columns using `set_index()`

In [None]:
# let's change the index to gender


In [None]:
#reset the index

In [None]:
# make appearances the index
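
The three steps above can be sketched as follows on a hypothetical stand-in dataframe with `gender` and `appearances` columns. Both `set_index()` and `reset_index()` return a new dataframe, so we store the results in new variables.

```python
import pandas as pd

# Hypothetical stand-in data.
avengers = pd.DataFrame(
    {
        "name": ["Hero A", "Hero B", "Hero C"],
        "gender": ["MALE", "FEMALE", "MALE"],
        "appearances": [300, 100, 200],
    }
)

# Use the gender column as the index; it stops being a regular column.
by_gender = avengers.set_index("gender")

# reset_index() turns the index back into a column and restores 0, 1, 2, ...
restored = by_gender.reset_index()

# Any other column can be used as the index in the same way.
by_appearances = avengers.set_index("appearances")
print(by_appearances)
```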


We can also sort our dataframe by the index using the `sort_index()` method

In [None]:
# to make it descending we just add the ascending=False argument
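
A short sketch of both sort directions, again on hypothetical data with `appearances` as the index:

```python
import pandas as pd

# Hypothetical stand-in data, indexed by number of appearances.
avengers = pd.DataFrame(
    {
        "name": ["Hero A", "Hero B", "Hero C"],
        "appearances": [300, 100, 200],
    }
).set_index("appearances")

# Ascending order is the default.
ascending = avengers.sort_index()

# ascending=False sorts from the largest index value to the smallest.
descending = avengers.sort_index(ascending=False)
print(descending)
```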
