# `pandas` - Series

__Contents__:
1. Series 
1. Numeric Indexed Series
1. Character Indexed Series
1. Time Indexed Series

## 1. Series

Documentation:
- https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

Load libraries.

In [6]:
import pandas  as pd
import numpy   as np
(pd.__version__,
 np.__version__
)

There are two main data structures in pandas:
- __Series__, which are 1-dimensional and hold values of a single dtype (e.g. `float64`, `int64`, `bool`, `object`)
- __DataFrames__, which are 2-dimensional tables

The columns of a DataFrame can be retrieved as a Series or as a DataFrame with a single column.

A __Series__ is a sequence of values and a corresponding sequence of names (an index) for these values.
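
As a minimal sketch of the values-plus-index idea, the example below builds a small Series with explicit labels and reads both parts back. The data and the labels `'a'`, `'b'`, `'c'` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: three values, each with its own label.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(list(s.index))   # the labels (the index)
print(list(s.values))  # the values, in the same order as the labels
print(s['b'])          # look up a single value by its label
```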

Create a Series object `my_series` for a range.

In [10]:
my_series = pd.Series(range(0,50,10))
my_series

Use the `dtype` attribute to get the data type of the values stored in the Series.

In [12]:
my_series.dtype

The `size` attribute returns the number of elements in the Series.

In [14]:
my_series.size

The `shape` attribute returns the shape of the data.

In [16]:
my_series.shape

To get the number of elements in the Series, we can also use the built-in `len()` function.

In [18]:
len(my_series)
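
The three ways of asking "how many elements" agree with each other. The short check below, a sketch that rebuilds the same range as above, shows that `size` is a plain integer while `shape` is a one-element tuple.

```python
import pandas as pd

s = pd.Series(range(0, 50, 10))

n = len(s)
print(s.size == n)      # size is an integer
print(s.shape == (n,))  # shape is a tuple with a single entry for a Series
```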

The `value_counts` method returns the count of each distinct value in a Series, sorted from most to least frequent.

In [20]:
my_series.value_counts()
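
Because every value in `my_series` is unique, each count above is 1. A Series with repeated values makes the result easier to read. The sketch below uses made-up string data.

```python
import pandas as pd

# Hypothetical data with repeated values.
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

counts = s.value_counts()
print(counts)  # the distinct values become the index, the counts become the values
```

The result is itself a Series, so an individual count can be read with `counts['a']`.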

The values of a Series can be retrieved as a 1-dimensional numpy array using the `values` attribute.

In [22]:
print(type(my_series.values))
my_series.values
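
Recent pandas versions (0.24 and later) also offer the `to_numpy()` method, which the pandas documentation recommends over `values`. A minimal sketch, assuming such a version is installed:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(0, 50, 10))

arr = s.to_numpy()  # method form; requires pandas 0.24 or later
print(type(arr))
print(arr.ndim)     # a Series always converts to a 1-dimensional array
```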

There are two ways to compute on the values of a Series.
1. Through methods of the Series object, such as `.mean()`, `.median()`, `.sum()`, which aggregate the values into a single number.
1. Through numpy functions, such as `numpy.square()`, `numpy.add()`, which operate element-wise and return a new Series.

A couple of each are shown below. More can be found in the documentation.

In [24]:
my_series.mean()

In [25]:
my_series.median()

In [26]:
my_series.sum()

In [27]:
np.square(my_series)

In [28]:
np.add(1000,my_series)
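
Note that the two groups of calls above return different kinds of results. The sketch below, which rebuilds the same range, checks that a Series method such as `sum()` collapses the values to one number, while a numpy function such as `np.square()` returns a Series of the same length with the original index.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(0, 50, 10))

total = s.sum()         # reduction: a single number
squared = np.square(s)  # element-wise: a new Series, same index as s

print(total)
print(squared)
```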

This notebook distinguishes three kinds of Series by the type of their index:
- __Numeric Series__, where rows are labeled by integers (the default)
- __Character Series__, where rows are labeled by strings
- __Time Series__, where rows are labeled by `datetime` values

The __index__ of a Series is a corresponding sequence of values used to refer to individual elements of a Series or sets of elements of a Series. 

The indexing (retrieval of elements) is slightly different for each type of Series.
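
As a quick preview, the sketch below builds one small Series of each kind and prints the type of its index. The data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical three-element Series, one per kind of index.
numeric = pd.Series([1, 2, 3])                         # default integer index
labeled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # string labels
dated = pd.Series([1, 2, 3],
                  index=pd.date_range('2018-02-03', periods=3, freq='D'))

for s in (numeric, labeled, dated):
    print(type(s.index).__name__)
```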

## 2. Numeric Indexed Series

Documentation:
- https://docs.scipy.org/doc/numpy/reference/routines.random.html

Create a Series from a numpy array of 11 random integers from 100 to 109 (the upper bound 110 passed to `randint` is exclusive).

In [33]:
num_series = pd.Series(np.random.randint(100,110,11))
type(num_series)

Have a look at the Series.

In [35]:
num_series

Select a group of values with the `.loc` attribute, which accepts index labels or, as here, a boolean mask.

In [37]:
num_series.loc[num_series > 107]

Select a group of values by integer position using the `.iloc` attribute.

In [39]:
num_series.iloc[[0,2,4]]

Note that `.iloc` uses 0-based positions. When slicing, the start bound is included, while the stop bound is excluded.

In [41]:
num_series.iloc[:3]
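
With the default index, labels and positions coincide, so `.loc` and `.iloc` are easy to confuse. The sketch below uses a made-up Series whose integer labels (10, 20, 30, 40) differ from its positions (0, 1, 2, 3) to show the difference.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions.
s = pd.Series([100, 101, 102, 103], index=[10, 20, 30, 40])

by_label = s.loc[20]         # the element labeled 20
by_position = s.iloc[1]      # the element at position 1 (the same element)

label_slice = s.loc[10:30]   # label slice: the end label 30 is included
pos_slice = s.iloc[0:2]      # position slice: the stop position 2 is excluded

mask_result = s.loc[s > 101] # boolean mask: keep the values greater than 101

print(label_slice)
print(pos_slice)
```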

The above section introduces the numeric indexed Series and ways to access the values in these Series.

## 3. Character Indexed Series

Create a Series with a character index by passing a list of strings to the `index` parameter.

In [45]:
my_series_chr_ndx = pd.Series(np.random.randint(100,110,12), 
                      index=['jan','feb','mar','apr','may','jun',
                             'jul','aug','sep','oct','nov','dec'])
my_series_chr_ndx

Retrieve the value of a single element using an index label value inside square brackets.

In [47]:
my_series_chr_ndx['jan']

Retrieve multiple elements using the colon operator between index label values.

In [49]:
my_series_chr_ndx['jan':'apr']

Notice that the endpoints are included.

The `.loc` attribute does the same thing, using index labels to retrieve elements in the Series.

In [52]:
my_series_chr_ndx.loc['jan':'apr']

The `.iloc` attribute takes integer positions, not labels, to retrieve elements in the Series.

In [54]:
my_series_chr_ndx.iloc[0:4]
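
The two slices above return the same four elements even though their bounds look different: a label slice includes its end label, while a position slice excludes its stop position. A minimal sketch with a shorter, made-up Series:

```python
import pandas as pd

months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']
s = pd.Series(range(100, 106), index=months)

by_label = s.loc['jan':'apr']  # 'apr' is included
by_position = s.iloc[0:4]      # position 4 ('may') is excluded

print(by_label.equals(by_position))
```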

To retrieve several specific elements, pass a list of index labels inside the square brackets (hence the double brackets).

In [56]:
my_series_chr_ndx[['jan','dec']]

The above section introduces creating a character indexed Series and retrieving elements in the Series.

## 4. Time Indexed Series

Documentation:
- https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
- http://pandas.pydata.org/pandas-docs/stable/timeseries.html

First create a timestamp `start_datetime` using the `to_datetime` function. The function converts its argument to a datetime value; the `format` argument describes the layout of the string being converted.

In [61]:
start_datetime = pd.to_datetime('201802031100', format='%Y%m%d%H%M')
start_datetime
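
The `format` string uses the standard `strftime` directives: `%Y` is the four-digit year, `%m` the month, `%d` the day, `%H` the hour (24-hour clock) and `%M` the minute. The sketch below parses the same string and reads the components back from the resulting `Timestamp`.

```python
import pandas as pd

ts = pd.to_datetime('201802031100', format='%Y%m%d%H%M')

# Each directive maps onto one component of the timestamp.
print(ts.year, ts.month, ts.day, ts.hour, ts.minute)
```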

Use the `date_range` function to return a fixed frequency `DatetimeIndex`. Exactly three of the four parameters `start`, `end`, `periods`, and `freq` must be specified.

In [63]:
time_ndx = pd.date_range(start_datetime, periods=72, freq='H')
time_ndx
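
To illustrate the "three of four" rule, the sketch below builds the same hourly index in three ways, each time leaving out a different parameter. It passes the frequency as a `pd.Timedelta` rather than a string alias, an assumption made here only because the spelling of the hourly alias differs between pandas versions.

```python
import pandas as pd

start = pd.Timestamp('2018-02-03 11:00')
step = pd.Timedelta(hours=1)  # one-hour frequency, given as a Timedelta

a = pd.date_range(start=start, periods=72, freq=step)    # no end
b = pd.date_range(start=start, end=a[-1], freq=step)     # no periods
c = pd.date_range(end=a[-1], periods=72, freq=step)      # no start

print(a[0], a[-1], len(a))
print(a.equals(b), a.equals(c))
```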

Create a Series `my_series_dt_ndx` with a time index.

In [65]:
my_series_dt_ndx = pd.Series(np.random.randint(100,110,72), index=time_ndx)

Use the `iloc` attribute to retrieve the first 10 rows of the Series.

In [67]:
my_series_dt_ndx.iloc[:10]

Use the `loc` attribute to retrieve all values for a given day by passing the date as a string.

In [69]:
my_series_dt_ndx.loc['2018-02-03']

In [70]:
my_series_dt_ndx.loc['Feb 3, 2018']

Note that when slicing, all the matching times in the range will be included (including the endpoint).

In [72]:
my_series_dt_ndx.loc['2018-02-03 11:00:00':'2018-02-03 19:00:00']
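
The row counts make the inclusion rules concrete. The sketch below rebuilds an hourly index with the same start and length as above, but uses the numbers 0 to 71 as stand-in values instead of random integers so the results are reproducible.

```python
import pandas as pd

idx = pd.date_range('2018-02-03 11:00', periods=72, freq=pd.Timedelta(hours=1))
s = pd.Series(range(72), index=idx)  # stand-in values, not the random data above

# A date string selects every timestamp on that day: 11:00 through 23:00.
one_day = s.loc['2018-02-03']

# A slice between two timestamps includes both endpoints: 11:00 through 19:00.
window = s.loc['2018-02-03 11:00':'2018-02-03 19:00']

print(len(one_day), len(window))
```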

The above section introduces time indexed Series and ways to retrieve values in the Series.

Related/useful documentation:
- http://pandas.pydata.org/pandas-docs/stable/index.html
- https://pandas.pydata.org/pandas-docs/stable/dsintro.html

__The End__