# Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers,
Python objects, etc.). The axis labels are collectively referred to as the index.

In [1]:
import pandas as pd
import numpy as np

##### Create series from NumPy array
The number of labels in `index` must equal the number of elements in the array.

In [2]:
my_simple_series = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
my_simple_series

a   -0.205270
b   -0.526918
c    0.326489
d   -0.403725
e   -0.299000
dtype: float64

In [3]:
my_simple_series.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
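
As a minimal sketch of the length rule stated above, the snippet below passes three values with only two labels, which pandas rejects with a `ValueError`. The names `values`, `labels`, and `mismatch_rejected` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

values = np.arange(3)   # three elements
labels = ['a', 'b']     # only two labels

try:
    pd.Series(values, index=labels)
    mismatch_rejected = False
except ValueError as err:
    # pandas refuses to build a Series when the lengths differ
    mismatch_rejected = True
    print(f"ValueError: {err}")
```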

##### Create series from NumPy array, without explicit index

In [5]:
my_simple_series = pd.Series(np.random.randn(5))
my_simple_series

0   -1.024356
1   -0.795926
2    0.566463
3   -0.114793
4   -0.346569
dtype: float64

Slice a Series by position, like a NumPy array

In [6]:
my_simple_series[:3]

0   -1.024356
1   -0.795926
2    0.566463
dtype: float64
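
Plain `[]` slicing works by position here. When a Series has custom labels, it can help to state the intent explicitly. The sketch below, using an illustrative Series named `labeled`, contrasts `.iloc` (position based, end excluded) with `.loc` (label based, end included).

```python
import numpy as np
import pandas as pd

labeled = pd.Series(np.arange(5.0), index=['a', 'b', 'c', 'd', 'e'])

# Position-based: positions 0, 1, 2 (the end position is excluded)
first_three_by_position = labeled.iloc[:3]

# Label-based: labels 'a' through 'c' (the end label is included)
first_three_by_label = labeled.loc['a':'c']

print(first_three_by_position)
print(first_three_by_label)
```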

##### Create series from Python dictionary

In [7]:
my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_second_series = pd.Series(my_dictionary)
my_second_series

a      45.0
b     -19.5
c    4444.0
dtype: float64

Access a series like a dictionary

In [10]:
my_second_series['b']

-19.5

Note that the display order follows the order of `index`, and that a label missing from the dictionary (here `'d'`) gets `NaN`.

In [11]:
pd.Series(my_dictionary, index=['b', 'c', 'd', 'a'])

b     -19.5
c    4444.0
d       NaN
a      45.0
dtype: float64

In [12]:
my_second_series.get('a')

45.0

In [13]:
unknown = my_second_series.get('f')
type(unknown)

NoneType
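
The dictionary-style behaviours above can be sketched side by side: `get` returns `None` (or a supplied default) for a missing label, `in` tests the index labels, and square-bracket access raises `KeyError`. The Series name `prices` and the default `0.0` are illustrative choices.

```python
import pandas as pd

prices = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

has_f = 'f' in prices             # membership is tested against the index labels
fallback = prices.get('f', 0.0)   # the default is returned instead of None

try:
    prices['f']                   # bracket access does not fall back
    raised = False
except KeyError:
    raised = True

print(has_f, fallback, raised)
```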

##### Create series from scalar
If the data is a scalar value, an index must be provided. The value is repeated to match the length of the index.

In [14]:
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

# Vectorized Operations
- There is no need to write loops for element-by-element operations.
- pandas Series objects can be passed to **_most_** NumPy functions.

In [15]:
import pandas as pd
import numpy as np

In [16]:
my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_series = pd.Series(my_dictionary)
my_series

a      45.0
b     -19.5
c    4444.0
dtype: float64

##### Add Series without a loop

In [17]:
my_series + my_series

a      90.0
b     -39.0
c    8888.0
dtype: float64

In [18]:
my_series

a      45.0
b     -19.5
c    4444.0
dtype: float64

##### Series within arithmetic expression

In [19]:
my_series + 5

a      50.0
b     -14.5
c    4449.0
dtype: float64

##### Series used as argument to NumPy function

In [20]:
np.exp(my_series)

a    3.493427e+19
b    3.398268e-09
c             inf
dtype: float64
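
The `inf` for label `c` appears because `exp(4444)` is larger than any finite float64 value, and NumPy normally reports such an overflow with a `RuntimeWarning`. The sketch below shows one possible way to silence the warning locally with `np.errstate` and then find the overflowed entries. The names `exponentials`, `overflowed`, and `finite_only` are illustrative.

```python
import numpy as np
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

# Ignore floating-point overflow warnings only inside this block
with np.errstate(over='ignore'):
    exponentials = np.exp(my_series)

overflowed = np.isinf(exponentials)        # boolean Series, True where the result is inf
finite_only = exponentials[~overflowed]    # keep only the finite results

print(overflowed)
print(finite_only)
```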

A key difference between Series and ndarray is that operations between Series automatically align the data based on
label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.
A label that appears in only one of the operands produces NaN in the result.

In [21]:
my_series[1:]

b     -19.5
c    4444.0
dtype: float64

In [22]:
my_series[:-1]

a    45.0
b   -19.5
dtype: float64

In [23]:
my_series[1:] + my_series[:-1]

a     NaN
b   -39.0
c     NaN
dtype: float64
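
If NaN for unmatched labels is not wanted, one option is the `add` method with `fill_value`, which substitutes a value for a label that is missing on one side. This is a sketch of that alternative, not something the original notebook does. The names `left`, `right`, and `filled_sum` are illustrative.

```python
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

left = my_series.iloc[1:]     # labels b, c
right = my_series.iloc[:-1]   # labels a, b

# Treat a label missing from one operand as 0 instead of producing NaN
filled_sum = left.add(right, fill_value=0)
print(filled_sum)
```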

### Apply Python functions on an element-by-element basis

In [24]:
def multiply_by_ten(input_element):
    return input_element * 10.0

In [25]:
my_series.map(multiply_by_ten)

a      450.0
b     -195.0
c    44440.0
dtype: float64
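
For simple arithmetic like this, `map` and a vectorized expression give the same values. `map` calls the Python function once per element, so it is mainly useful when no vectorized equivalent exists. A small comparison, with illustrative names `mapped` and `vectorized`:

```python
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

def multiply_by_ten(input_element):
    return input_element * 10.0

mapped = my_series.map(multiply_by_ten)   # one Python call per element
vectorized = my_series * 10.0             # a single array operation

print(mapped.equals(vectorized))
```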

### Vectorized string methods
Series is equipped with a set of string processing methods, exposed through the `.str` accessor, that make it easy to operate on each element of the array. Perhaps most importantly, these methods skip missing/NA values automatically and leave them as NaN in the result.

In [26]:
series_of_strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [27]:
series_of_strings.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
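
Other `.str` methods treat the missing entry the same way. The sketch below uses `str.len` and `str.contains`. Passing `na=False` to `str.contains` is one way to get a plain boolean result with the missing entry counted as no match. The names `lengths` and `contains_a` are illustrative.

```python
import numpy as np
import pandas as pd

series_of_strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

lengths = series_of_strings.str.len()                        # the missing entry stays NaN
contains_a = series_of_strings.str.contains('a', na=False)   # the missing entry counts as no match

print(lengths)
print(contains_a)
```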

# Date Arithmetic

| Type      | Description                                                       |
|-----------|-------------------------------------------------------------------|
| date      | Store calendar date (year, month, day) using a Gregorian Calendar |
| datetime  | Store both date and time                                          |
| timedelta | Difference between two datetime values                            |

##### common date arithmetic operations
- calculate differences between dates
- generate sequences of dates and time spans
- convert time series to a particular frequency
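
The three operations listed above can be sketched in a few lines. The dates, the seven-day length, and the two-day target frequency are arbitrary choices for illustration.

```python
import pandas as pd

# 1. Difference between two dates
start = pd.Timestamp('2016-05-30')
end = pd.Timestamp('2016-09-05')
difference = end - start                  # a Timedelta

# 2. Sequence of dates
days = pd.date_range(start, periods=7, freq='D')

# 3. Convert a daily time series to a two-day frequency
daily = pd.Series(range(7), index=days)
every_other_day = daily.asfreq('2D')

print(difference.days)
print(every_other_day)
```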

### Date, time, functions

| Function                                            | Description                                                                 |
|-----------------------------------------------------|-----------------------------------------------------------------------------|
| `to_datetime(*args, **kwargs)`                      | Convert argument to datetime                                                |
| `to_timedelta(*args, **kwargs)`                     | Convert argument to timedelta                                               |
| `date_range([start, end, periods, freq, tz, ...])`  | Return a fixed frequency datetime index, with day (calendar) as the default |
| `bdate_range([start, end, periods, freq, tz, ...])` | Return a fixed frequency datetime index, with business day as the default   |
| `period_range([start, end, periods, freq, name])`   | Return a fixed frequency period index, with day (calendar) as the default   |
| `timedelta_range([start, end, periods, freq, ...])` | Return a fixed frequency timedelta index, with day as the default           |
| `infer_freq(index[, warn])`                         | Infer the most likely frequency given the input index                       |
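
A brief sketch of several functions from the table follows. The input dates and the variable names are chosen for illustration only.

```python
import pandas as pd

stamps = pd.to_datetime(['2016-05-30', '2016-09-05'])      # DatetimeIndex
span = pd.to_timedelta('4 days 7 hours')                   # Timedelta

# 2016-05-30 is a Monday, so five business days run Monday to Friday
business_days = pd.bdate_range('2016-05-30', periods=5)

# Monthly periods from May to September 2016
months = pd.period_range('2016-05', '2016-09', freq='M')

# Recover the frequency of a regularly spaced index
inferred = pd.infer_freq(pd.date_range('2016-05-30', periods=5, freq='D'))

print(stamps, span, business_days, months, inferred, sep='\n')
```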

In [29]:
import pandas as pd
import numpy as np
from datetime import datetime

##### now()

In [30]:
now = datetime.now()
now

datetime.datetime(2018, 5, 10, 19, 56, 13, 645136)

In [31]:
now.year, now.month, now.day

(2018, 5, 10)

In [32]:
delta = now - datetime(2001, 1, 1)
delta

datetime.timedelta(6338, 71773, 645136)

In [33]:
delta.days

6338
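
Because `datetime.now()` changes on every run, a reproducible variant of the subtraction above can use a fixed date. The sketch below takes midnight of the date shown in the earlier output and also reads the difference through `total_seconds()`. The variable names are illustrative.

```python
from datetime import datetime, timedelta

fixed_day = datetime(2018, 5, 10)
delta = fixed_day - datetime(2001, 1, 1)

whole_days = delta.days                          # 6338, as in the output above
seconds_total = delta.total_seconds()            # the same span expressed in seconds
one_week_later = fixed_day + timedelta(weeks=1)  # a timedelta can be added to a datetime

print(whole_days, seconds_total, one_week_later)
```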

### Parsing Timedelta
##### from string

In [34]:
pd.Timedelta('4 days 7 hours')

Timedelta('4 days 07:00:00')

##### named keyword arguments

In [35]:
# note: these MUST be specified as keyword arguments
pd.Timedelta(days=1, seconds=1)

Timedelta('1 days 00:00:01')

##### integers with a unit

In [36]:
pd.Timedelta(1, unit='d')

Timedelta('1 days 00:00:00')
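
The three construction styles are interchangeable. As a small check, the sketch below builds the same span three ways and adds it to a `Timestamp`. The variable names and the choice of 103 hours (4 * 24 + 7) are illustrative.

```python
import pandas as pd

from_string = pd.Timedelta('4 days 7 hours')
from_keywords = pd.Timedelta(days=4, hours=7)
from_integer = pd.Timedelta(103, unit='h')    # 4 * 24 + 7 hours

# A Timedelta can be added directly to a Timestamp
shifted = pd.Timestamp('2016-05-30') + from_string

print(from_string == from_keywords == from_integer)
print(shifted)
```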

##### create a range of dates from a timedelta

In [37]:
us_memorial_day = datetime(2016, 5, 30)
print(us_memorial_day)
us_labor_day = datetime(2016, 9, 5)
print(us_labor_day)
us_summer_time = us_labor_day - us_memorial_day
print(us_summer_time)
type(us_summer_time)

2016-05-30 00:00:00
2016-09-05 00:00:00
98 days, 0:00:00


datetime.timedelta

In [38]:
us_summer_time_range = pd.date_range(us_memorial_day, periods=us_summer_time.days, freq='D')

##### summer_time time series with random data

In [39]:
us_summer_time_time_series = pd.Series(np.random.randn(us_summer_time.days), index=us_summer_time_range)
us_summer_time_time_series.tail()

2016-08-31   -2.631141
2016-09-01    1.092692
2016-09-02    0.828220
2016-09-03    0.141695
2016-09-04   -1.111708
Freq: D, dtype: float64
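
The range above ends on 2016-09-04 because `periods` equals the number of days in the difference, which excludes the end date itself. If Labor Day should be included, one alternative is to pass both endpoints to `date_range`. The names `by_periods` and `by_endpoints` are illustrative.

```python
from datetime import datetime

import pandas as pd

us_memorial_day = datetime(2016, 5, 30)
us_labor_day = datetime(2016, 9, 5)
us_summer_time = us_labor_day - us_memorial_day

# Start date plus a number of periods: stops the day before Labor Day
by_periods = pd.date_range(us_memorial_day, periods=us_summer_time.days, freq='D')

# Explicit start and end: both endpoints are included
by_endpoints = pd.date_range(us_memorial_day, us_labor_day, freq='D')

print(len(by_periods), by_periods[-1])
print(len(by_endpoints), by_endpoints[-1])
```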