# Pandas Introduction

Pandas is one of the most widely used Python libraries in data science.
It provides high-performance, easy-to-use data structures and data analysis tools.
Its central object is the DataFrame, an in-memory 2D table
that can be compared to a spreadsheet with column names and row labels.
On top of these 2D tables, pandas offers many additional features,
such as creating pivot tables, computing columns based on other columns, and plotting graphs.
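
As a rough illustration of two of these features, the sketch below computes a column from other columns and then builds a pivot table. The `sales` table, its column names, and its values are invented for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year": [2022, 2023, 2022, 2023],
    "units": [10, 12, 7, 9],
    "unit_price": [2.5, 2.5, 3.0, 3.0],
})

# Compute a new column from two existing columns.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Pivot: one row per region, one column per year, summing the revenue.
revenue_by_year = sales.pivot_table(
    values="revenue", index="region", columns="year", aggfunc="sum"
)
print(revenue_by_year)
```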

In [10]:
import pandas as pd

### Pandas Series

A Pandas Series is a one-dimensional array of indexed data. 
It can be created from a list or array and used as follows.

In [11]:
data = pd.Series([0.25, 0.5, 0.75, 0.1])
data

0    0.25
1    0.50
2    0.75
3    0.10
dtype: float64

In [12]:
data.values

array([0.25, 0.5 , 0.75, 0.1 ])

In [13]:
# The index is an array-like object of type pd.Index
data.index

RangeIndex(start=0, stop=4, step=1)
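
The same pieces can be seen when the Series is built from a NumPy array instead of a list. This is a small sketch; the names `arr` and `series` are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Build the Series from a NumPy array rather than a Python list.
arr = np.array([0.25, 0.5, 0.75, 0.1])
series = pd.Series(arr)

print(type(series.values))   # the underlying values are a NumPy array
print(list(series.index))    # default integer labels: [0, 1, 2, 3]
print(series.dtype)          # element type inherited from the array
```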

Data can be accessed by the associated index via the Python square-bracket notation:

In [14]:
print(data[1])
print(data[1:3])

0.5
1    0.50
2    0.75
dtype: float64
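
With the default index, labels and positions coincide, so it is not obvious which of the two a lookup uses. The sketch below uses the `.loc` (label-based) and `.iloc` (position-based) accessors on a Series whose integer labels are deliberately shifted; the Series `s` is made up for this example.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2.
s = pd.Series([0.25, 0.5, 0.75], index=[1, 2, 3])

print(s.loc[1])    # label 1    -> 0.25
print(s.iloc[1])   # position 1 -> 0.5
```

Using `.loc` and `.iloc` makes the intent explicit whenever the index consists of integers.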


The Pandas Series has an explicitly defined index associated with
the values. This explicit index definition gives the Series object additional capabilities.
For example, the index need not be an integer, but can consist of values of any desired type.
If we wish, we can use strings as an index:

In [15]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data)
print(data['b'])

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
0.5
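
Even when the index is made of integers, it does not have to be contiguous or ordered. A minimal sketch with arbitrarily chosen labels:

```python
import pandas as pd

# Integer labels in no particular order; the values are looked up by label.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

print(data[5])            # the value stored under label 5 -> 0.5
print(list(data.index))   # labels keep the order in which they were given
```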


In this way, you can think of a Pandas Series as a specialization of a Python dictionary.
A dictionary is a structure that maps arbitrary keys to a set of arbitrary
values, and a Series is a structure that maps typed keys to a set of typed values. This
typing is important: the type information of a Pandas Series makes
it much more efficient than a Python dictionary for certain operations.
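
The dictionary analogy can be sketched with a few dictionary-style operations, followed by one array-style operation that a plain dictionary does not offer. The label `'e'` and its value are added only for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])

# Dictionary-style operations
print("b" in data)          # membership is tested against the index labels
print(list(data.keys()))    # the labels, comparable to dict.keys()
data["e"] = 1.25            # assigning to a new label adds an entry

# Array-style operation: applied to every value at once
doubled = data * 2
print(doubled["e"])         # 2.5
```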

You can also construct a Series object from a Python dictionary: 

In [16]:
city_population = {"Antwerpen":525935, "Gent":262219, "Brugge":118325, "Mechelen":86616, "Aalst":86445, "Kortrijk":76735, "Oostende":71494, "Genk":66227}
population = pd.Series(city_population)
print(population)
print(population['Brugge'])

Antwerpen    525935
Gent         262219
Brugge       118325
Mechelen      86616
Aalst         86445
Kortrijk      76735
Oostende      71494
Genk          66227
dtype: int64
118325
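
When a dictionary is passed together with an explicit `index`, only the listed keys are used, in the order given. The sketch below reuses three of the cities; "Leuven" is included only because it is absent from the dictionary, to show that a missing key yields `NaN`.

```python
import pandas as pd

city_population = {"Antwerpen": 525935, "Gent": 262219, "Brugge": 118325}

# Keep only two of the keys, in the order requested.
subset = pd.Series(city_population, index=["Brugge", "Antwerpen"])
print(subset)

# A label that is not a key of the dictionary gets NaN as its value.
with_missing = pd.Series(city_population, index=["Gent", "Leuven"])
print(with_missing)
```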


Unlike a dictionary, though, the Series also supports array-style operations such as slicing:

In [17]:
population['Gent':'Oostende']

Gent        262219
Brugge      118325
Mechelen     86616
Aalst        86445
Kortrijk     76735
Oostende     71494
dtype: int64
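
Note that a slice by label includes its end label, unlike a positional slice. A short sketch on a shortened version of the population data, also showing boolean masking as another array-style operation (the 100,000 threshold is arbitrary):

```python
import pandas as pd

population = pd.Series(
    {"Antwerpen": 525935, "Gent": 262219, "Brugge": 118325, "Mechelen": 86616}
)

by_label = population["Gent":"Brugge"]   # end label "Brugge" is included
by_position = population.iloc[1:3]       # stops before position 3
print(by_label.equals(by_position))      # True

# Boolean masking: keep the entries whose value satisfies a condition.
large = population[population > 100_000]
print(large)
```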

### Pandas DataFrame

- A Series is the analog of a one-dimensional array with flexible indices.
- A DataFrame is the analog of a two-dimensional array with both flexible row indices and flexible column names.

Let's first create a new Series holding the area of each city:

In [18]:
city_area = {"Antwerpen":204.51, "Gent":156.18, "Brugge":138.40, "Mechelen":33.71, "Aalst":78.12, "Kortrijk":80.02, "Oostende":37.72, "Genk":87.85}
print(city_area)
area = pd.Series(city_area)
print(area)

{'Antwerpen': 204.51, 'Gent': 156.18, 'Brugge': 138.4, 'Mechelen': 33.71, 'Aalst': 78.12, 'Kortrijk': 80.02, 'Oostende': 37.72, 'Genk': 87.85}
Antwerpen    204.51
Gent         156.18
Brugge       138.40
Mechelen      33.71
Aalst         78.12
Kortrijk      80.02
Oostende      37.72
Genk          87.85
dtype: float64


Now, along with the population Series from before, we can use a
dictionary to construct a single two-dimensional object containing this information:

In [19]:
cities = pd.DataFrame({'population': population, 'area': area})
print(cities)

           population    area
Antwerpen      525935  204.51
Gent           262219  156.18
Brugge         118325  138.40
Mechelen        86616   33.71
Aalst           86445   78.12
Kortrijk        76735   80.02
Oostende        71494   37.72
Genk            66227   87.85
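
The rows are matched by index label, not by position. The sketch below uses shortened versions of the two Series, with the areas given in a different order and with no area for "Genk" (omitted on purpose), to show that the values still line up and that the missing cell becomes `NaN`.

```python
import pandas as pd

population = pd.Series({"Gent": 262219, "Brugge": 118325, "Genk": 66227})
# Different order, and deliberately no entry for "Genk".
area = pd.Series({"Brugge": 138.40, "Gent": 156.18})

cities = pd.DataFrame({"population": population, "area": area})
print(cities)
```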


Like the Series object, the DataFrame has an index attribute that gives access to the index labels:

In [20]:
print(cities.index)

Index(['Antwerpen', 'Gent', 'Brugge', 'Mechelen', 'Aalst', 'Kortrijk',
       'Oostende', 'Genk'],
      dtype='object')


Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

In [21]:
cities.columns

Index(['population', 'area'], dtype='object')
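
Together, the row labels and the column labels address individual cells. A minimal sketch on a two-city version of the table, using the `.loc` accessor:

```python
import pandas as pd

cities = pd.DataFrame(
    {"population": [525935, 262219], "area": [204.51, 156.18]},
    index=["Antwerpen", "Gent"],
)

print(cities.shape)                 # (number of rows, number of columns)
print(list(cities.columns))         # column labels as a plain list
print(cities.loc["Gent", "area"])   # one cell, addressed by row and column label
```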

We can also think of a DataFrame as a specialization of a dictionary. Where
a dictionary maps a key to a value, a DataFrame maps a column name to a Series of
column data. For example, asking for the 'area' column returns the Series object
containing the areas we saw earlier:

In [22]:
print(cities['area'])

Antwerpen    204.51
Gent         156.18
Brugge       138.40
Mechelen      33.71
Aalst         78.12
Kortrijk      80.02
Oostende      37.72
Genk          87.85
Name: area, dtype: float64
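
A few more dictionary-style operations work on the columns as well. The sketch below uses a two-city version of the table; it also shows that a single label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

cities = pd.DataFrame(
    {"population": [525935, 262219], "area": [204.51, 156.18]},
    index=["Antwerpen", "Gent"],
)

print("area" in cities)          # membership is tested against the column labels
print(list(cities.keys()))       # the column labels, comparable to dict.keys()
print(type(cities["area"]))      # a single label returns a Series
print(type(cities[["area"]]))    # a list of labels returns a DataFrame

# Attribute-style access returns the same column when the name is a valid identifier.
print(cities.area.equals(cities["area"]))
```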
