# `pandas` Series

## Setup

In [1]:
import pandas as pd

## Creation

Creation of an example Series from a dictionary that maps countries to their capitals:

In [2]:
data = {
    "Spain": "Madrid",
    "Belgium": "Brussels",
    "France": "Paris",
    "Italy": "Roma",
    "Germany": "Berlin",
    "Portugal": "Lisbon",
    "Norway": "Oslo",
    "Greece": "Athens",
}

In [3]:
# For now, let's forget about these steps:
s = pd.Series(data).astype("string")
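In the cell above, the dictionary keys become the row labels and the values become the data. The sketch below builds the same kind of Series from two lists, a list of values plus an explicit `index`, to show that both routes give an equivalent result. It uses a hypothetical three-country subset (`small_data`) so it does not overwrite `data` or `s`:

```python
import pandas as pd

# The dictionary keys become the index (row labels); the values become the data
small_data = {"Spain": "Madrid", "Belgium": "Brussels", "France": "Paris"}
from_dict = pd.Series(small_data).astype("string")

# Equivalent construction: a list of values plus an explicit list of labels
from_lists = pd.Series(
    ["Madrid", "Brussels", "Paris"],
    index=["Spain", "Belgium", "France"],
    dtype="string",
)

print(from_dict.equals(from_lists))  # True
```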

Apple stock data, taken from the [`matplotlib` sample datasets](https://github.com/matplotlib/sample_data/blob/master/aapl.csv); only the daily trading volume is kept:

In [4]:
# For now, let's forget about these steps:
apple = pd.read_csv("AAPL.csv")
apple["Date"] = apple["Date"].astype("datetime64[ns]")
apple = apple.set_index("Date")
apple = apple.sort_index()
apple = apple["Volume"]
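The cleanup steps above can also be written as one chain, letting `read_csv` parse the dates (`parse_dates`) and set the index (`index_col`) while reading. This is a minimal sketch that assumes a tiny inline CSV (`csv_text`, hypothetical) with only the `Date` and `Volume` columns instead of the real `AAPL.csv`, so it runs on its own:

```python
import io

import pandas as pd

# Hypothetical stand-in for AAPL.csv: only the two columns used here, with a
# few Date/Volume pairs copied from the notebook output, deliberately unsorted
csv_text = """Date,Volume
1984-09-10,2346400
1984-09-07,2981600
1984-09-11,5444000
"""

# parse_dates converts "Date" to datetimes while reading, index_col makes it
# the index, and selecting a single column of the DataFrame returns a Series
apple_small = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
    .sort_index()["Volume"]
)
print(apple_small)
```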

## Demo 1: Anatomy of a `pandas` Series

Check the Series:

In [5]:
s

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
dtype: string

Check the type of the Series:

In [6]:
type(s)

pandas.core.series.Series

A Series has **row labels** (shown on the left of the output; also called the **index**).

Ideally, **a Series contains elements of the same data type** (e.g. strings, integers, floats, booleans, or dates).

Some values may be missing. Depending on the data type, they are shown as `<NA>` (as for the `string` dtype used here), `NaN` (floats) or `NaT` (dates).

In a Series, **rows are ordered**.
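The points above can be checked on a small example. This sketch uses a hypothetical Series with a made-up missing entry ("Atlantis"), plus two tiny numeric and mixed Series, to show how missing values are displayed, that rows keep their order, and what happens when data types are mixed:

```python
import pandas as pd

# Hypothetical data: one capital is missing (None)
capitals = pd.Series({"Spain": "Madrid", "Atlantis": None}, dtype="string")
print(capitals)  # the missing value is displayed as <NA>

# Rows are ordered: position 0 is the first entry of the dictionary
print(capitals.iloc[0])  # Madrid

# Floats use NaN for missing values instead of <NA>
numbers = pd.Series([1.5, None, 3.0])
print(numbers.dtype)  # float64

# Mixing data types falls back to the generic "object" dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object
```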

## Exercise 1

Check the Series:

In [7]:
apple

Date
1984-09-07     2981600
1984-09-10     2346400
1984-09-11     5444000
1984-09-12     4773600
1984-09-13     7429600
                ...   
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, Length: 6081, dtype: int64

Check the type of the Series:

In [8]:
type(apple)

pandas.core.series.Series

## Demo 2: View a `Series`

In [9]:
s

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
dtype: string

Check the first 5 rows of the Series:

In [10]:
s.head()

Spain        Madrid
Belgium    Brussels
France        Paris
Italy          Roma
Germany      Berlin
dtype: string

Check the first 3 rows of the Series:

In [11]:
s.head(3)

Spain        Madrid
Belgium    Brussels
France        Paris
dtype: string

Check the last 5 rows of the Series:

In [12]:
s.tail()

Italy         Roma
Germany     Berlin
Portugal    Lisbon
Norway        Oslo
Greece      Athens
dtype: string

Check the last 2 rows of the Series:

In [13]:
s.tail(2)

Norway      Oslo
Greece    Athens
dtype: string
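`head()` and `tail()` select rows by position, so they behave like positional `iloc` slices, and a negative `n` drops rows from the opposite end. A minimal sketch on a copy of the capitals data (named `capitals` here so that `s` is left alone):

```python
import pandas as pd

capitals = pd.Series(
    {"Spain": "Madrid", "Belgium": "Brussels", "France": "Paris",
     "Italy": "Roma", "Germany": "Berlin", "Portugal": "Lisbon",
     "Norway": "Oslo", "Greece": "Athens"}
).astype("string")

# head(n) and tail(n) are positional, like iloc slicing
print(capitals.head(3).equals(capitals.iloc[:3]))   # True
print(capitals.tail(2).equals(capitals.iloc[-2:]))  # True

# A negative n returns everything except the last (head) or first (tail) |n| rows
print(capitals.head(-2).index.tolist())
# ['Spain', 'Belgium', 'France', 'Italy', 'Germany', 'Portugal']
print(capitals.tail(-6).index.tolist())
# ['Norway', 'Greece']
```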

## Exercise 2

In [14]:
apple

Date
1984-09-07     2981600
1984-09-10     2346400
1984-09-11     5444000
1984-09-12     4773600
1984-09-13     7429600
                ...   
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, Length: 6081, dtype: int64

Check the first 5 rows of the Series:

In [15]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check the first 10 rows of the Series:

In [16]:
apple.head(10)

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
1984-09-14    8826400
1984-09-17    6886400
1984-09-18    3495200
1984-09-19    3816000
1984-09-20    2387200
Name: Volume, dtype: int64

Check the last 5 rows of the Series:

In [17]:
apple.tail()

Date
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, dtype: int64

Check the last 8 rows of the Series:

In [18]:
apple.tail(8)

Date
2008-10-03    81942800
2008-10-06    75264900
2008-10-07    67099000
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, dtype: int64

## Demo 3: Shape

Check the shape:

In [19]:
s.shape

(8,)

<div class="alert alert-info">

<b>Note:</b> The shape of a <code>Series</code> with <code>N</code> elements is <code>(N,)</code>, whereas the shape of a <code>DataFrame</code> with a single column is <code>(N, 1)</code>.

</div>

Check the length:

In [20]:
len(s)

8
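For a Series, the length, the `size` attribute and the single entry of `shape` all report the number of elements, and `ndim` confirms that a Series is one-dimensional. A short sketch on a hypothetical three-element Series:

```python
import pandas as pd

capitals = pd.Series(["Madrid", "Brussels", "Paris"], index=["Spain", "Belgium", "France"])

# For a Series, len(), .size and the single entry of .shape agree
print(len(capitals), capitals.size, capitals.shape)  # 3 3 (3,)

# A Series is one-dimensional
print(capitals.ndim)  # 1
```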

## Exercise 3

In [21]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check the shape:

In [22]:
apple.shape

(6081,)

Check the length:

In [23]:
len(apple)

6081

## Demo 4: Index

Check the index (i.e. the row labels):

In [24]:
s.index

Index(['Spain', 'Belgium', 'France', 'Italy', 'Germany', 'Portugal', 'Norway',
       'Greece'],
      dtype='object')

Check the type of the index:

In [25]:
type(s.index)

pandas.core.indexes.base.Index
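The index is what label-based operations work on. As a sketch on a hypothetical three-country Series: `.loc` looks values up by label, and the `in` operator checks the labels, not the values:

```python
import pandas as pd

capitals = pd.Series(["Madrid", "Brussels", "Paris"], index=["Spain", "Belgium", "France"])

# Label-based lookup goes through the index
print(capitals.loc["France"])  # Paris

# Membership tests with `in` check the index (labels), not the values
print("France" in capitals)  # True
print("Paris" in capitals)   # False

# The index can be converted to a plain Python list
print(capitals.index.tolist())  # ['Spain', 'Belgium', 'France']
```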

## Exercise 4

In [26]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check the index:

In [27]:
apple.index

DatetimeIndex(['1984-09-07', '1984-09-10', '1984-09-11', '1984-09-12',
               '1984-09-13', '1984-09-14', '1984-09-17', '1984-09-18',
               '1984-09-19', '1984-09-20',
               ...
               '2008-10-01', '2008-10-02', '2008-10-03', '2008-10-06',
               '2008-10-07', '2008-10-08', '2008-10-09', '2008-10-10',
               '2008-10-13', '2008-10-14'],
              dtype='datetime64[ns]', name='Date', length=6081, freq=None)

Check the type of the index:

In [28]:
type(apple.index)

pandas.core.indexes.datetimes.DatetimeIndex
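A `DatetimeIndex` adds date-aware features on top of a plain `Index`. This sketch assumes a hypothetical three-row version of `apple` (`apple_small`), built from values in the output above, and shows date components and slicing with date strings:

```python
import pandas as pd

# Hypothetical three-row version of `apple`, using values from the output above
apple_small = pd.Series(
    [2981600, 2346400, 5444000],
    index=pd.DatetimeIndex(["1984-09-07", "1984-09-10", "1984-09-11"], name="Date"),
    name="Volume",
)

# A DatetimeIndex exposes date components directly
print(apple_small.index.year.tolist())        # [1984, 1984, 1984]
print(apple_small.index.day_name().tolist())  # ['Friday', 'Monday', 'Tuesday']

# Date strings can be used to slice a range of rows (on a sorted index)
print(apple_small.loc["1984-09-10":].tolist())  # [2346400, 5444000]
```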

## Demo 5: Data types

Check the data type of the Series:

In [29]:
s.dtype

StringDtype
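The `string` dtype here comes from the `.astype("string")` call in the Setup cell. The sketch below, on hypothetical data, shows that conversion step on its own and the same method applied to numbers; the default dtype printed for `raw` is left without an expected output because it depends on the pandas version:

```python
import pandas as pd

raw = pd.Series({"Spain": "Madrid", "Belgium": "Brussels"})

# Without astype, the default dtype for strings depends on the pandas version
# (object in pandas 1.x and 2.x)
print(raw.dtype)

# astype("string") selects the dedicated string dtype, as in the Setup cell
as_string = raw.astype("string")
print(as_string.dtype)  # string

# The same method converts between numeric types
as_float = pd.Series([1, 2, 3]).astype("float64")
print(as_float.tolist())  # [1.0, 2.0, 3.0]
```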

## Exercise 5

In [30]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check the data type of the Series:

In [31]:
apple.dtype

dtype('int64')

## Demo 6: Missing data

Check which values are missing (`True` marks a missing value):

In [32]:
s.isnull()

Spain       False
Belgium     False
France      False
Italy       False
Germany     False
Portugal    False
Norway      False
Greece      False
dtype: bool

Check which values are present (the opposite of `isnull()`):

In [33]:
s.notnull()

Spain       True
Belgium     True
France      True
Italy       True
Germany     True
Portugal    True
Norway      True
Greece      True
dtype: bool

Count the number of non-missing values:

In [34]:
s.count()

8
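Since `s` has no missing values, the checks above all come out clean. The sketch below uses a hypothetical Series with one missing capital to show how the same tools summarise missing data:

```python
import pandas as pd

# Hypothetical Series in which one capital is missing
capitals = pd.Series({"Spain": "Madrid", "Belgium": None, "France": "Paris"}, dtype="string")

# isnull()/notnull() are aliases of isna()/notna()
print(capitals.isnull().equals(capitals.isna()))  # True

# Summing the boolean mask counts the missing values (True counts as 1)
print(capitals.isnull().sum())  # 1

# count() skips missing values, whereas len() counts every row
print(capitals.count(), len(capitals))  # 2 3

# hasnans answers "is anything missing at all?" in one step
print(capitals.hasnans)  # True
```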

## Exercise 6

In [35]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check which values are missing (`True` marks a missing value):

In [36]:
apple.isnull()

Date
1984-09-07    False
1984-09-10    False
1984-09-11    False
1984-09-12    False
1984-09-13    False
              ...  
2008-10-08    False
2008-10-09    False
2008-10-10    False
2008-10-13    False
2008-10-14    False
Name: Volume, Length: 6081, dtype: bool

Check which values are present (the opposite of `isnull()`):

In [37]:
apple.notnull()

Date
1984-09-07    True
1984-09-10    True
1984-09-11    True
1984-09-12    True
1984-09-13    True
              ... 
2008-10-08    True
2008-10-09    True
2008-10-10    True
2008-10-13    True
2008-10-14    True
Name: Volume, Length: 6081, dtype: bool

Count the number of non-missing values:

In [38]:
apple.count()

6081

## Demo 7: `name` attributes

Check the `name` attribute of the Series (it is `None` by default, so nothing is displayed):

In [39]:
s.name

In [40]:
s

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
dtype: string

Check the `name` attribute of the index (also `None` for now, so nothing is displayed):

In [41]:
s.index.name

Set the name of the Series:

In [42]:
s.name = "capitals"

In [43]:
s

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: capitals, dtype: string

In [44]:
s.name

'capitals'

Set the name of the index:

In [45]:
s.index.name = "country"

In [46]:
s

country
Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: capitals, dtype: string

In [47]:
s.index.name

'country'
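Assigning to `s.name` and `s.index.name` modifies the Series in place. As an alternative, `rename()` and `rename_axis()` return a renamed copy and leave the original untouched. A sketch on a hypothetical two-row Series:

```python
import pandas as pd

capitals = pd.Series(["Madrid", "Brussels"], index=["Spain", "Belgium"])

# rename() sets the Series name and rename_axis() sets the index name;
# both return a new Series instead of modifying the original in place
named = capitals.rename("capitals").rename_axis("country")

print(named.name, named.index.name)        # capitals country
print(capitals.name, capitals.index.name)  # None None
```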

## Exercise 7

In [48]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Volume, dtype: int64

Check the `name` attribute of the Series:

In [49]:
apple.name

'Volume'

Check the `name` attribute of the index:

In [50]:
apple.index.name

'Date'

Set the name of the Series to "Apple Stock":

In [52]:
apple.name = "Apple Stock"

In [53]:
apple.name

'Apple Stock'

In [54]:
apple

Date
1984-09-07     2981600
1984-09-10     2346400
1984-09-11     5444000
1984-09-12     4773600
1984-09-13     7429600
                ...   
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Apple Stock, Length: 6081, dtype: int64

## Demo 8: Underlying values (`numpy` and extension arrays)

Check the underlying values:

In [55]:
s.values

<StringArray>
['Madrid', 'Brussels', 'Paris', 'Roma', 'Berlin', 'Lisbon', 'Oslo', 'Athens']
Length: 8, dtype: string

Note that with the `string` dtype the underlying values are not stored in a plain `numpy` array, but in a pandas extension array (`StringArray`):

In [56]:
type(s.values)

pandas.core.arrays.string_.StringArray
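When a guaranteed `numpy` array is needed, `.to_numpy()` is the explicit route, while `.array` returns whatever array pandas actually stores. A sketch on a hypothetical two-row Series; the class name printed for `.array` is left without an expected output because it can vary with the pandas version and string storage backend:

```python
import numpy as np
import pandas as pd

capitals = pd.Series(["Madrid", "Brussels"], index=["Spain", "Belgium"], dtype="string")

# .array returns the array pandas actually stores (an extension array here)
print(type(capitals.array).__name__)

# .to_numpy() always returns a numpy.ndarray; for the string dtype the
# elements are Python str objects in an object-dtype array
as_numpy = capitals.to_numpy()
print(type(as_numpy) is np.ndarray, as_numpy.dtype)  # True object

# For a numeric dtype such as int64, .to_numpy() gives a native numpy array
print(pd.Series([1, 2, 3]).to_numpy().dtype)  # int64
```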

## Exercise 8

In [57]:
apple.head()

Date
1984-09-07    2981600
1984-09-10    2346400
1984-09-11    5444000
1984-09-12    4773600
1984-09-13    7429600
Name: Apple Stock, dtype: int64

Check the underlying values:

In [58]:
apple.values

array([ 2981600,  2346400,  5444000, ..., 79260700, 54967000, 70749800],
      dtype=int64)

Check the type of the underlying values:

In [59]:
type(apple.values)

numpy.ndarray

## Bonus: Conversion to `DataFrame`

Turn a Series into a DataFrame:

In [60]:
df = s.to_frame()

In [61]:
df

          capitals
country
Spain       Madrid
Belgium   Brussels
France       Paris
Italy         Roma
Germany     Berlin
Portugal    Lisbon
Norway        Oslo
Greece      Athens

In [62]:
type(df)

pandas.core.frame.DataFrame

<div class="alert alert-info">

<b>Note:</b> The <code>DataFrame</code> gets the same index as the <code>Series</code>!

</div>
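Besides `to_frame()`, two related methods are handy: `reset_index()` turns the index into an ordinary column instead of keeping it as the index, and `squeeze()` turns a one-column DataFrame back into a Series. A minimal sketch on a hypothetical two-row version of `s`:

```python
import pandas as pd

capitals = pd.Series(
    ["Madrid", "Brussels"],
    index=pd.Index(["Spain", "Belgium"], name="country"),
    name="capitals",
)

# to_frame() keeps the index; the Series name becomes the column label
frame = capitals.to_frame()
print(frame.columns.tolist())  # ['capitals']

# reset_index() instead moves the index into an ordinary column
flat = capitals.reset_index()
print(flat.columns.tolist())  # ['country', 'capitals']

# squeeze() turns a one-column DataFrame back into a Series
back = frame.squeeze()
print(type(back).__name__, back.shape)  # Series (2,)
```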

Note that the shapes of a Series and of a DataFrame with a single column are not the same:

In [63]:
s.shape

(8,)

In [64]:
df.shape

(8, 1)
