# Introduction to Pandas

### Outline
* [Getting started](#getting-started)
* [Data structures](#data-structures)
    * Series
    * DataFrame
* [Series overview](#series-overview)
    * Creating 
    * index
    * values
* [DataFrame overview](#dataframe-overview)
    * Creating
    * Adding columns
    * Appending rows to the DataFrame
* [DataFrame characteristics](#dataframe-characteristics)
    * shape
    * describe 
    * head
    * tail
* [DataFrame indexing](#dataframe-indexing)
    * Using iloc to select a single row
    * Using iloc to select multiple rows (slicing)
    * Using iloc to select multiple rows with steps (slicing with steps)
    * Setting an index
    * Using loc to select a single row
* [Conditionals](#conditionals)
* [Resources](#resources)

<a id="getting-started"></a>
### Getting Started

By convention, pandas is imported under the alias `pd`.

In [1]:
import pandas as pd

<a id="data-structures"></a>
### Data Structures

**Series**
A Series is designed to accommodate a one-dimensional sequence of data.

**DataFrame**
A DataFrame is designed to hold two-dimensional, tabular data: each row is a case and each column is one attribute of that case.

<a id="series-overview"></a>
### Series Overview

[Series](https://pandas.pydata.org/pandas-docs/stable/reference/series.html) is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

In [2]:
s = pd.Series([1, 2, 3, 4])

In [3]:
s

0    1
1    2
2    3
3    4
dtype: int64

In [4]:
s.index

RangeIndex(start=0, stop=4, step=1)

In [5]:
type(s.index)

pandas.core.indexes.range.RangeIndex

In [6]:
s.values

array([1, 2, 3, 4])

In [7]:
type(s.values)

numpy.ndarray
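
The Series above uses the default integer labels. As a minimal sketch of what "labeled" means in practice, the example below builds a Series with string labels; the variable names and the three sample values are illustrative and not part of the original notebook.

```python
import pandas as pd

# Illustrative data: points scored at three races, labeled by race name.
points = pd.Series([15, 0, 18], index=["Australia", "Bahrain", "Azerbaijan"])

# Look up a value by its label rather than by its position.
azerbaijan_points = points["Azerbaijan"]

# With explicit labels the index is a regular Index, not a RangeIndex.
labels = list(points.index)

# A Series can also hold mixed Python objects; its dtype is then `object`.
mixed = pd.Series(["Monaco", 4, 12.0])

print(azerbaijan_points)
print(labels)
print(mixed.dtype)
```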

<a id="dataframe-overview"></a>
### DataFrame Overview

[DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) is a 2-dimensional labeled data structure with columns of potentially different types.
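
To illustrate "columns of potentially different types", here is a small sketch that builds a DataFrame from a dictionary in one step and inspects the per-column dtypes. The name `races` and the three-row sample are assumptions made for the example; the notebook itself builds its DataFrame column by column.

```python
import pandas as pd

# Illustrative three-race sample; each dictionary key becomes a column.
races = pd.DataFrame({
    "grand_prix": ["Australia", "Bahrain", "China"],
    "position": [3, 0, 3],
    "points": [15, 0, 15],
})

# Each column carries its own dtype: text for grand_prix, integers for the rest.
print(races.dtypes)

# Selecting a single column returns a Series.
positions = races["position"]
print(type(positions))
```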

**Creating a DataFrame**

In [8]:
dataframe = pd.DataFrame()

In [9]:
dataframe

**Adding columns to a DataFrame**

In [10]:
dataframe['grand_prix'] = [
    'Australia', 
    'Bahrain', 
    'China', 
    'Azerbaijan', 
    'Spain', 
    'Monaco', 
    'Canada', 
    'France', 
    'Austria', 
    'Great Britain', 
    'Germany', 
    'Hungary', 
    'Belgium', 
    'Italy', 
    'Singapore', 
    'Russia', 
    'Japan', 
    'United States', 
    'Mexico', 
    'Brazil',
]

In [11]:
# a position of zero indicates a DNF (did not finish)
dataframe['position'] = [3, 0, 3, 2, 0, 4, 6, 3, 2, 3, 3, 3, 0, 2, 5, 4, 5, 1, 3, 3]

In [12]:
dataframe['points'] = [15, 0, 15, 18, 0, 12, 8, 15, 18, 15, 15, 15, 0, 18, 10, 12, 10, 25, 15, 15]

In [13]:
dataframe

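|  | grand_prix | position | points |
|---|---|---|---|
| 0 | Australia | 3 | 15 |
| 1 | Bahrain | 0 | 0 |
| 2 | China | 3 | 15 |
| 3 | Azerbaijan | 2 | 18 |
| 4 | Spain | 0 | 0 |
| 5 | Monaco | 4 | 12 |
| 6 | Canada | 6 | 8 |
| 7 | France | 3 | 15 |
| 8 | Austria | 2 | 18 |
| 9 | Great Britain | 3 | 15 |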


**Appending rows to the DataFrame**

In [14]:
abu_dhabi_race = pd.Series(['Abu Dhabi', 0, 0], index=['grand_prix', 'position', 'points'])

In [15]:
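# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
dataframe = pd.concat([dataframe, abu_dhabi_race.to_frame().T.infer_objects()], ignore_index=True)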

<a id="dataframe-characteristics"></a>
### DataFrame Characteristics


In [16]:
dataframe.shape

(21, 3)

In [17]:
dataframe.describe()

|  | position | points |
|---|---|---|
| count | 21.0 | 21.0 |
| mean | 2.619048 | 11.952381 |
| std | 1.716863 | 6.888223 |
| min | 0.0 | 0.0 |
| 25% | 2.0 | 10.0 |
| 50% | 3.0 | 15.0 |
| 75% | 3.0 | 15.0 |
| max | 6.0 | 25.0 |


In [18]:
dataframe['points'].describe()

count    21.000000
mean     11.952381
std       6.888223
min       0.000000
25%      10.000000
50%      15.000000
75%      15.000000
max      25.000000
Name: points, dtype: float64
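
Each entry in the `describe()` summary corresponds to an ordinary Series method. The sketch below checks that correspondence on the first seven values of the `points` column; the standalone `points` Series is rebuilt here only so the example runs on its own.

```python
import pandas as pd

# First seven values of the points column, rebuilt so the example is self-contained.
points = pd.Series([15, 0, 15, 18, 0, 12, 8], name="points")

summary = points.describe()

# The summary is itself a Series, so entries can be read by label.
print(summary["mean"], points.mean())

# std is the sample standard deviation (ddof=1), the default of Series.std().
print(summary["std"], points.std())

# The percentile rows match quantile(); 50% is the median.
print(summary["25%"], points.quantile(0.25))
print(summary["50%"], points.median())
```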

In [19]:
# view the first few rows 
dataframe.head()

|  | grand_prix | position | points |
|---|---|---|---|
| 0 | Australia | 3 | 15 |
| 1 | Bahrain | 0 | 0 |
| 2 | China | 3 | 15 |
| 3 | Azerbaijan | 2 | 18 |
| 4 | Spain | 0 | 0 |


In [20]:
dataframe.head(3)

|  | grand_prix | position | points |
|---|---|---|---|
| 0 | Australia | 3 | 15 |
| 1 | Bahrain | 0 | 0 |
| 2 | China | 3 | 15 |


In [21]:
dataframe.tail()

|  | grand_prix | position | points |
|---|---|---|---|
| 16 | Japan | 5 | 10 |
| 17 | United States | 1 | 25 |
| 18 | Mexico | 3 | 15 |
| 19 | Brazil | 3 | 15 |
| 20 | Abu Dhabi | 0 | 0 |


In [22]:
dataframe.tail(7)

|  | grand_prix | position | points |
|---|---|---|---|
| 14 | Singapore | 5 | 10 |
| 15 | Russia | 4 | 12 |
| 16 | Japan | 5 | 10 |
| 17 | United States | 1 | 25 |
| 18 | Mexico | 3 | 15 |
| 19 | Brazil | 3 | 15 |
| 20 | Abu Dhabi | 0 | 0 |
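
`head(n)` and `tail(n)` behave like positional slices from the start and the end of the DataFrame. A minimal sketch, assuming a five-row sample named `races` that is not part of the original notebook:

```python
import pandas as pd

# Illustrative five-race sample.
races = pd.DataFrame({
    "grand_prix": ["Australia", "Bahrain", "China", "Azerbaijan", "Spain"],
    "position": [3, 0, 3, 2, 0],
})

first_two = races.head(2)
last_two = races.tail(2)

# head(n) matches iloc[:n]; tail(n) matches iloc[-n:].
same_head = first_two.equals(races.iloc[:2])
same_tail = last_two.equals(races.iloc[-2:])
print(same_head, same_tail)

# Asking for more rows than exist returns the whole DataFrame.
everything = races.head(10)
print(len(everything))
```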


<a id="dataframe-indexing"></a>
### DataFrame Indexing


**Using iloc to select a single row**

In [23]:
# select the first row by index position
dataframe.iloc[0]

grand_prix    Australia
position              3
points               15
Name: 0, dtype: object

In [24]:
# select the last row by index position
dataframe.iloc[-1]

grand_prix    Abu Dhabi
position              0
points                0
Name: 20, dtype: object

**Using iloc to select multiple rows (slicing)**

In [25]:
# select the rows at positions 7 through 13 (the stop position, 14, is excluded)
dataframe.iloc[7:14]

|  | grand_prix | position | points |
|---|---|---|---|
| 7 | France | 3 | 15 |
| 8 | Austria | 2 | 18 |
| 9 | Great Britain | 3 | 15 |
| 10 | Germany | 3 | 15 |
| 11 | Hungary | 3 | 15 |
| 12 | Belgium | 0 | 0 |
| 13 | Italy | 2 | 18 |


**Using iloc to select multiple rows with steps (slicing with steps)**

In [26]:
# select every third row, starting at position 3 (the fourth race)
dataframe.iloc[3::3]

|  | grand_prix | position | points |
|---|---|---|---|
| 3 | Azerbaijan | 2 | 18 |
| 6 | Canada | 6 | 8 |
| 9 | Great Britain | 3 | 15 |
| 12 | Belgium | 0 | 0 |
| 15 | Russia | 4 | 12 |
| 18 | Mexico | 3 | 15 |
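
`iloc` slices follow the same `start:stop:step` rules as Python lists: positions are zero-based and the stop position is excluded. The sketch below illustrates this on a ten-row sample; the name `races` and the single-column frame are assumptions made for the example.

```python
import pandas as pd

# First ten races, rebuilt here so the example is self-contained.
races = pd.DataFrame({
    "grand_prix": [
        "Australia", "Bahrain", "China", "Azerbaijan", "Spain",
        "Monaco", "Canada", "France", "Austria", "Great Britain",
    ],
})

# Start at position 3 (the fourth row) and take every third row.
every_third = races.iloc[3::3]
print(list(every_third.index))

# The stop position is excluded: 2:5 selects positions 2, 3 and 4.
middle = races.iloc[2:5]
print(list(middle.index))

# A list of positions selects arbitrary rows; -1 counts from the end.
first_and_last = races.iloc[[0, -1]]
print(list(first_and_last["grand_prix"]))
```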


**Setting an index**

In [27]:
# drop=False keeps grand_prix as a column as well as using it for the index
dataframe = dataframe.set_index('grand_prix', drop=False)

**Using loc to select a single row**

In [28]:
# select the row corresponding to the Brazilian Grand Prix
dataframe.loc['Brazil']

grand_prix    Brazil
position           3
points            15
Name: Brazil, dtype: object
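
Beyond a single row, `loc` also accepts a column label, a list of labels, or a label slice. A minimal sketch, assuming a three-row sample named `races`; unlike the notebook, it lets `set_index` drop the `grand_prix` column so that the race name lives only in the index.

```python
import pandas as pd

# Illustrative sample indexed by race name (grand_prix is moved into the index).
races = pd.DataFrame({
    "grand_prix": ["United States", "Mexico", "Brazil"],
    "position": [1, 3, 3],
    "points": [25, 15, 15],
}).set_index("grand_prix")

# Row label plus column label returns a single value.
brazil_points = races.loc["Brazil", "points"]
print(brazil_points)

# A list of labels returns a DataFrame with those rows, in the order given.
two_races = races.loc[["Mexico", "Brazil"]]
print(list(two_races.index))

# Label slices include both endpoints, unlike positional iloc slices.
label_slice = races.loc["United States":"Mexico"]
print(list(label_slice.index))
```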

<a id="conditionals"></a>
### Conditionals
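
Conditional selection in pandas is usually expressed with boolean masks: a comparison on a column yields a Series of `True`/`False` values, and indexing with that Series keeps only the rows where it is `True`. The following is a minimal sketch rather than the notebook's own example; the `races` sample and the variable names are assumptions, and it reuses the convention that a position of zero means the driver did not finish.

```python
import pandas as pd

# Illustrative six-race sample; a position of 0 marks a DNF.
races = pd.DataFrame({
    "grand_prix": ["Australia", "Bahrain", "China", "Azerbaijan", "Spain", "Monaco"],
    "position": [3, 0, 3, 2, 0, 4],
    "points": [15, 0, 15, 18, 0, 12],
})

# A comparison produces a boolean Series aligned with the rows.
finished = races["position"] > 0

# Indexing with the mask keeps only the rows where it is True.
finished_races = races[finished]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
podiums = races[(races["position"] > 0) & (races["position"] <= 3)]

# loc accepts a mask together with a column label.
dnf_names = races.loc[races["position"] == 0, "grand_prix"].tolist()

print(len(finished_races))
print(list(podiums["grand_prix"]))
print(dnf_names)
```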


<a id="resources"></a>
### Resources

* [User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)