In [1]:
import pandas as pd
import numpy as np

### pandas has two types of data structures: Series and DataFrames

In [3]:
# Population of the G7 countries, in millions
# Unlike a Python list, a pandas Series has a single datatype (dtype)
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

In [5]:
g7_pop.dtype

dtype('float64')

In [6]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])
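
As a small sketch of the datatype point, the cell below uses made-up values (the names `ints`, `mixed` and `as_float` are illustrative and not part of the G7 data). It shows that pandas infers one dtype for the whole Series, and that mixing integers with a float upcasts every element to `float64`.

```python
import pandas as pd

# Illustrative values only; not part of the G7 data above.
ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5, 3])

print(ints.dtype)   # an integer dtype (typically int64)
print(mixed.dtype)  # float64: the integers were upcast to match 2.5

# The dtype can also be requested explicitly when creating the Series.
as_float = pd.Series([1, 2, 3], dtype='float64')
print(as_float.dtype)  # float64
```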

A pandas Series looks like a plain Python list or a NumPy array, but it actually behaves more like a Python dict. A Series has an `index`, which by default is similar to the automatic positional index of a Python list.

In [7]:
type(g7_pop.values)

numpy.ndarray
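
To make the default index visible, here is a minimal sketch on a shortened copy of the data (the name `s` is illustrative). When no index is given, pandas assigns a `RangeIndex` that counts up from 0, much like list positions.

```python
import pandas as pd

# A Series created without an explicit index.
s = pd.Series([35.467, 63.951, 80.940])

print(s.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(s.index))  # [0, 1, 2]
```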

But in contrast to a list, we can explicitly define the index:

In [8]:
g7_pop.index = [
    'Canada',
    'Franc', 
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

In [9]:
g7_pop

Canada             35.467
Franc              63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64

We can say that a Series looks like an "ordered dictionary". In fact, we can create a Series directly from a dictionary:

In [11]:
test = pd.Series({
    'Canada': 35,
    'France': 63,
    'Japan': 88,
    'Italy': 68
}, name='G7 population in million')

In [12]:
test

Canada    35
France    63
Japan     88
Italy     68
Name: G7 population in million, dtype: int64
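
The dictionary analogy goes a bit further. The sketch below rebuilds the same values under the illustrative name `pop` and tries a few dict-style operations. Note that `in` tests the index labels, not the values.

```python
import pandas as pd

pop = pd.Series({'Canada': 35, 'France': 63, 'Japan': 88, 'Italy': 68})

print('Japan' in pop)    # True: membership checks the index labels
print(88 in pop)         # False: values are not searched
print(list(pop.keys()))  # ['Canada', 'France', 'Japan', 'Italy']

# Iterate over (label, value) pairs, as with dict.items().
for country, value in pop.items():
    print(country, value)

# Convert back to a plain Python dict.
print(pop.to_dict())
```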

### Indexing

In [13]:
g7_pop

Canada             35.467
Franc              63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64

In [14]:
g7_pop['Canada']

35.467

Elements can also be selected by numeric position with the `iloc` attribute:

In [15]:
g7_pop.iloc[0]

35.467

In [16]:
g7_pop.iloc[-1]

318.523

In [18]:
g7_pop[['Canada', 'Japan']]

Canada     35.467
Japan     127.061
dtype: float64

In [19]:
g7_pop.iloc[[0, 1]]

Canada    35.467
Franc     63.951
dtype: float64
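
The difference between labels and positions matters most when the labels are themselves integers. The following sketch uses a hypothetical Series `s` whose integer labels are deliberately out of order, and `loc`, the label-based counterpart of `iloc`, to make the lookup explicit.

```python
import pandas as pd

# Integer labels in reverse order, so label 0 is NOT at position 0.
s = pd.Series([10.0, 20.0, 30.0], index=[2, 1, 0])

print(s.loc[0])   # 30.0: the element whose label is 0
print(s.iloc[0])  # 10.0: the element at position 0
```

Using `loc` for labels and `iloc` for positions keeps the intent unambiguous in cases like this.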

Slicing also works, but **important**: when slicing by label, pandas includes the upper limit, unlike Python lists and positional slicing with `iloc`.

In [21]:
g7_pop['Canada': 'Italy']

Canada     35.467
Franc      63.951
Germany    80.940
Italy      60.665
dtype: float64
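
A short sketch comparing the two kinds of slice, on a four-country copy of the data (the names `s`, `by_label` and `by_position` are illustrative):

```python
import pandas as pd

s = pd.Series(
    [35.467, 63.951, 80.940, 60.665],
    index=['Canada', 'France', 'Germany', 'Italy'],
)

# Label-based slice: the end label 'Germany' IS included.
by_label = s.loc['Canada':'Germany']
print(len(by_label))     # 3

# Position-based slice: the end position 2 is NOT included.
by_position = s.iloc[0:2]
print(len(by_position))  # 2
```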

### Conditional selection (boolean arrays)
The same boolean array techniques that apply to NumPy arrays can be used with a pandas Series as well.

In [22]:
g7_pop > 70

Canada            False
Franc             False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
dtype: bool

In [23]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
dtype: float64

In [25]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
dtype: float64

In [26]:
g7_pop.std()

97.24996987121581

In [28]:
g7_pop[(g7_pop > g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]

Franc              63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64
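
Boolean masks can also be combined with `&` (and) and negated with `~` (not), alongside `|` (or). The sketch below, which re-creates the population figures under the illustrative name `pop`, selects the countries within half a standard deviation of the mean and then the ones outside that band. Each comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `>` and `<`.

```python
import pandas as pd

pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
)

lower = pop.mean() - pop.std() / 2
upper = pop.mean() + pop.std() / 2

# Both conditions must hold: inside the band around the mean.
within = pop[(pop > lower) & (pop < upper)]
print(list(within.index))
# ['France', 'Germany', 'Italy', 'Japan', 'United Kingdom']

# Negate the combined mask: everything outside the band.
outside = pop[~((pop > lower) & (pop < upper))]
print(list(outside.index))
# ['Canada', 'United States']
```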

### Modifying Series

In [29]:
g7_pop['Canada'] = 40.5

In [30]:
g7_pop

Canada             40.500
Franc              63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64

In [31]:
g7_pop.iloc[0] = 40.6

In [32]:
g7_pop

Canada             40.600
Franc              63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64

In [33]:
g7_pop[g7_pop > 70] = 99.99

In [34]:
g7_pop

Canada            40.600
Franc             63.951
Germany           99.990
Italy             60.665
Japan             99.990
United Kingdom    64.511
United States     99.990
dtype: float64
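
All of the assignments above change the Series in place. As a minimal sketch (with the illustrative names `original` and `capped`), working on a `copy()` keeps the original values intact, and assigning to a label that does not exist yet adds a new entry.

```python
import pandas as pd

original = pd.Series(
    [40.6, 63.951, 80.940],
    index=['Canada', 'France', 'Germany'],
)

# Modify a copy so the original Series is left untouched.
capped = original.copy()
capped[capped > 70] = 99.99

print(original['Germany'])  # 80.94
print(capped['Germany'])    # 99.99

# Assigning to a new label appends an entry to the Series.
capped['Italy'] = 60.665
print(len(capped))          # 4
```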