## SERIES

A Series is a one-dimensional labelled array-like object. It holds a sequence of values together with an associated array of labels, called the index.

In [1]:
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S

0    11
1    28
2    72
3     3
4     5
5     8
dtype: int64

In [2]:
print(S.index)
print(S.values)

RangeIndex(start=0, stop=6, step=1)
[11 28 72  3  5  8]
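The values behind a Series are stored in a NumPy array, so the two are closely related. Here is a minimal sketch of moving between them. It assumes a pandas version recent enough to have `to_numpy()`, which was added in pandas 0.24:

```python
import numpy as np
import pandas as pd

S = pd.Series([11, 28, 72, 3, 5, 8])

# to_numpy() returns the underlying data as a NumPy ndarray
arr = S.to_numpy()
print(type(arr))          # <class 'numpy.ndarray'>

# The default index is a RangeIndex 0..n-1, so positions and labels coincide
print(list(S.index))      # [0, 1, 2, 3, 4, 5]

# Arithmetic works element-wise, as with ndarrays, and keeps the index
print((S * 2).tolist())   # [22, 56, 144, 6, 10, 16]
```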


So far our Series has not been very different from a NumPy ndarray. This changes as soon as we define Series objects with custom indices:

In [3]:
import pandas as pd
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
S

apples      20
oranges     33
cherries    52
pears       10
dtype: int64

In [4]:
print(S['apples'])

20


In [5]:
print(S[['apples', 'oranges', 'cherries']])

apples      20
oranges     33
cherries    52
dtype: int64
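With custom labels, it helps to distinguish between access by label and access by position. It also matters that operations between two Series match entries by label. The sketch below illustrates both points. The second Series `S2` and its numbers are hypothetical and were chosen only for illustration:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

# .loc selects by label, .iloc by integer position
print(S.loc['oranges'])   # 33
print(S.iloc[1])          # 33

# Hypothetical second delivery, listed in a different order
S2 = pd.Series([17, 13, 31, 32], index=['pears', 'cherries', 'oranges', 'apples'])

# Addition aligns on the labels, not on the positions
total = S + S2
print(total['apples'])    # 52  (20 + 32)
print(total['pears'])     # 27  (10 + 17)
```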


**CREATING SERIES OBJECTS FROM DICTIONARIES**

In [6]:
cities = {"London": 8615246,
          "Berlin": 3562166,
          "Madrid": 3165235,
          "Rome": 2874038,
          "Paris": 2273305,
          "Vienna": 1805681,
          "Bucharest": 1803425,
          "Hamburg": 1760433,
          "Budapest": 1754000,
          "Warsaw": 1740119,
          "Barcelona": 1602386,
          "Munich": 1493900,
          "Milan": 1350680}
city_series = pd.Series(cities)
print(city_series)

London       8615246
Berlin       3562166
Madrid       3165235
Rome         2874038
Paris        2273305
Vienna       1805681
Bucharest    1803425
Hamburg      1760433
Budapest     1754000
Warsaw       1740119
Barcelona    1602386
Munich       1493900
Milan        1350680
dtype: int64
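A Series built from a dictionary keeps some dict-like behaviour. The keys become the index, and `in` checks the labels rather than the values. The sketch below uses a shortened copy of the dictionary. The boolean filter at the end is an extra illustration that is not part of the original example:

```python
import pandas as pd

# A shortened version of the cities dictionary above
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038}
city_series = pd.Series(cities)

# The dictionary keys become the index, in insertion order
print(list(city_series.index))   # ['London', 'Berlin', 'Madrid', 'Rome']

# Like a dict, `in` tests the labels (keys), not the values
print("Berlin" in city_series)   # True
print(8615246 in city_series)    # False

# A boolean condition selects the matching entries
big = city_series[city_series > 3000000]
print(list(big.index))           # ['London', 'Berlin', 'Madrid']
```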


**NAN - MISSING DATA**

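If we pass both a dictionary and an index, only the labels in the index are used. In the following example some cities from the dictionary are left out, and two cities ("Zurich" and "Stuttgart") don't occur in the dictionary at all. pandas marks their values as NaN (Not a Number):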

In [7]:
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series

London       8615246.0
Paris        2273305.0
Zurich             NaN
Berlin       3562166.0
Stuttgart          NaN
Hamburg      1760433.0
dtype: float64
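NaN is a floating-point value, which is why the dtype changes from int64 to float64. Passing an index together with a dictionary appears to behave like building the full Series first and then reindexing it. The sketch below compares the two approaches using a shortened dictionary:

```python
import pandas as pd

cities = {"London": 8615246, "Berlin": 3562166, "Paris": 2273305,
          "Hamburg": 1760433, "Madrid": 3165235}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]

a = pd.Series(cities, index=my_cities)
b = pd.Series(cities).reindex(my_cities)

print(a.equals(b))    # True
print(a.dtype)        # float64: NaN forces a float dtype
print("Madrid" in a)  # False: keys not listed in the index are dropped
```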

**THE METHODS ISNULL() AND NOTNULL()**

In [8]:
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.isnull())

London       False
Paris        False
Zurich        True
Berlin       False
Stuttgart     True
Hamburg      False
dtype: bool


In [9]:
print(my_city_series.notnull())

London        True
Paris         True
Zurich       False
Berlin        True
Stuttgart    False
Hamburg       True
dtype: bool
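The boolean Series returned by these methods can be used directly to select or count the missing entries. The sketch below rebuilds the series from above with explicit `np.nan` values so that it is self-contained. `isna()` and `notna()` are the aliases that pandas also provides:

```python
import numpy as np
import pandas as pd

s = pd.Series([8615246, 2273305, np.nan, 3562166, np.nan, 1760433],
              index=["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"])

mask = s.isnull()
# The boolean Series can be used directly as a filter
print(list(s[mask].index))          # ['Zurich', 'Stuttgart']
# True counts as 1, so sum() gives the number of missing values
print(mask.sum())                   # 2
# notnull() is the element-wise negation of isnull()
print((~mask).equals(s.notnull()))  # True
# isna() is an alias with the same behaviour
print(s.isna().equals(mask))        # True
```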


**FILTERING OUT MISSING DATA**

It's possible to filter out missing data with the Series method **dropna**. It returns a Series which consists only of non-null data:

In [10]:
print(my_city_series.dropna())

London     8615246.0
Paris      2273305.0
Berlin     3562166.0
Hamburg    1760433.0
dtype: float64
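`dropna` returns a new Series and leaves the original untouched. Its result matches filtering with `notnull()`. As the sketch below shows, the dtype stays float64 even though no NaN remains, which becomes relevant further down:

```python
import numpy as np
import pandas as pd

s = pd.Series([8615246, 2273305, np.nan, 3562166, np.nan, 1760433],
              index=["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"])

cleaned = s.dropna()
# dropna() returns a new Series; the original still has all six entries
print(len(s), len(cleaned))            # 6 4
# It selects the same entries as filtering with notnull()
print(cleaned.equals(s[s.notnull()]))  # True
# The float dtype introduced by NaN is kept
print(cleaned.dtype)                   # float64
```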


**FILLING IN MISSING DATA**

In many cases you don't want to filter out missing data, but to fill in appropriate data for the empty gaps. A suitable method in many situations is **fillna**:

In [11]:
print(my_city_series.fillna(0))

London       8615246.0
Paris        2273305.0
Zurich             0.0
Berlin       3562166.0
Stuttgart          0.0
Hamburg      1760433.0
dtype: float64
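`fillna` accepts any scalar, including one computed from the data itself. The sketch below fills the gaps with the mean of the known values. This is only an illustration of the mechanism. For city populations, a mean is hardly more appropriate than 0:

```python
import numpy as np
import pandas as pd

s = pd.Series([8615246, 2273305, np.nan, 3562166, np.nan, 1760433],
              index=["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"])

# mean() skips NaN by default, so this is the mean of the four known values
fill_value = s.mean()
print(fill_value)         # 4052787.5
filled = s.fillna(fill_value)
print(filled["Zurich"])   # 4052787.5
# fillna() returns a new Series; s itself still contains NaN
print(s.isnull().sum())   # 2
```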


Okay, a population of 0 is not what we call "fill in appropriate data for the empty gaps". If we call fillna with a dict, we can provide the appropriate data, i.e. the populations of Zurich and Stuttgart:

In [12]:
missing_cities = {"Stuttgart":597939, "Zurich":378884}
my_city_series.fillna(missing_cities)

London       8615246.0
Paris        2273305.0
Zurich        378884.0
Berlin       3562166.0
Stuttgart     597939.0
Hamburg      1760433.0
dtype: float64
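When `fillna` receives a dict, the values are matched to the Series by label, and only the NaN positions are filled. In the sketch below, the entries for "London" and "Vienna" are hypothetical and were added only to show that existing values are not overwritten and that unknown labels are ignored:

```python
import numpy as np
import pandas as pd

s = pd.Series([8615246, 2273305, np.nan, 3562166, np.nan, 1760433],
              index=["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"])

# Only NaN positions are filled; London keeps its existing value,
# and "Vienna" is ignored because it is not in the index
filled = s.fillna({"London": 0, "Zurich": 378884, "Vienna": 1805681})
print(filled["London"])     # 8615246.0
print(filled["Zurich"])     # 378884.0
print(filled["Stuttgart"])  # nan: no value was supplied for it
```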

We still have the problem that integer values (values which should be integers, like numbers of people) are converted to float as soon as we have NaN values. We can solve this problem by filling the gaps with **fillna** and then converting the result back to integers with **astype**:

In [13]:
cities = {"London": 8615246,
          "Berlin": 3562166,
          "Madrid": 3165235,
          "Rome": 2874038,
          "Paris": 2273305,
          "Vienna": 1805681,
          "Bucharest": 1803425,
          "Hamburg": 1760433,
          "Budapest": 1754000,
          "Warsaw": 1740119,
          "Barcelona": 1602386,
          "Munich": 1493900,
          "Milan": 1350680}

my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series = my_city_series.fillna(0).astype(int)
print(my_city_series)

London       8615246
Paris        2273305
Zurich             0
Berlin       3562166
Stuttgart          0
Hamburg      1760433
dtype: int32
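Filling with 0 makes the integer conversion possible, but it also records Zurich and Stuttgart as having no inhabitants. The sketch below shows two alternatives. It assumes pandas 1.0 or later for the nullable `"Int64"` dtype. The first option fills in the real values and then casts to an explicit `int64`. The second keeps the gaps and uses the nullable integer dtype. `astype(int)` maps to NumPy's default integer type, whose width can depend on the platform and the NumPy version, whereas `"int64"` is explicit:

```python
import numpy as np
import pandas as pd

s = pd.Series([8615246, 2273305, np.nan, 3562166, np.nan, 1760433],
              index=["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"])
missing_cities = {"Stuttgart": 597939, "Zurich": 378884}

# Option 1: supply the real populations, then cast to a fixed-width integer type
exact = s.fillna(missing_cities).astype("int64")
print(exact.dtype)            # int64
print(exact["Zurich"])        # 378884

# Option 2: keep the gaps; the nullable "Int64" dtype (capital I) stores pd.NA
nullable = s.astype("Int64")
print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 2
```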


## DATAFRAME

A DataFrame can be seen as a concatenation of Series placed side by side as columns, each Series having the same index, i.e. the index of the DataFrame.
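To make this concrete, the sketch below builds a small DataFrame from two Series that share an index. It does this once from a dict of Series and once with `pd.concat` along the columns. The shop names and stock figures are hypothetical:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
# Hypothetical stock figures for two shops, sharing the same index
shop1 = pd.Series([20, 33, 52, 10], index=fruits)
shop2 = pd.Series([17, 13, 31, 32], index=fruits)

# Each Series becomes one column; the shared index becomes the row index
df = pd.DataFrame({"shop1": shop1, "shop2": shop2})
print(df)

# The same table built by concatenating the Series along the columns
df2 = pd.concat([shop1, shop2], axis=1, keys=["shop1", "shop2"])
print(list(df2.columns))   # ['shop1', 'shop2']
```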