[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/prof-tcsmith/mis307.git/HEAD?labpath=notebooks%2Fpandas_02.ipynb)

<a href="https://colab.research.google.com/github/prof-tcsmith/mis307/blob/master/notebooks/pandas_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Pandas Series Introduction

The Series pandas datatype is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). 

Each element of the Series can be accessed by specifying its index label.

The general pattern for initializing and naming a Series is:

```
s = pd.Series(data, index=index)
```

Where data can be a dictionary, tuple, list or scalar.  
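The scalar case is not covered by the examples that follow, so here is a minimal sketch of it. When `data` is a scalar, pandas repeats it once for every label in `index`. The labels `a`, `b`, `c` are made up purely for illustration.

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration
labels = ["a", "b", "c"]

# A scalar value is repeated for every label in the index
s_scalar = pd.Series(5.0, index=labels)
print(s_scalar)
```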

### A few examples of creating and using Pandas Series

We can create pandas Series from tuples, lists and dictionaries. Let's create a few of these data structures:

In [1]:
index_names = ("First", "Second", "Third")
a = (1,2,3)
b = [4.4,5.567,6.987645]
c = {"Seventh":7,'Eigth':8,"Ninth":9}

You must import pandas before using it. It's very common to import pandas as pd (this saves typing when referencing pandas later in your program).

In [2]:
import pandas as pd

Let's create a data series from the integers tuple:

In [3]:
s1 = pd.Series(a)

In [4]:
s1

0    1
1    2
2    3
dtype: int64

A pandas Series contains values, a label for each value, and the data type used to store the values. In the case above, we see that our numbers were stored as int64 (short for 64-bit integers, a fixed-size type, unlike Python's built-in int, which has arbitrary precision) and that the default labels (the index) for these values run from 0 through 2.

If we want to add index names, we can do the following:
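The three parts of a Series can also be inspected one at a time. The following is a small sketch that rebuilds the same tuple-based Series under a new name, `s`, so that it runs on its own:

```python
import pandas as pd

s = pd.Series((1, 2, 3))

print(s.to_numpy())  # the underlying values as a NumPy array
print(s.index)       # the default labels, a range starting at 0
print(s.dtype)       # the type used to store the values
```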

In [5]:
s1 = pd.Series(a, index=index_names)
s1

First     1
Second    2
Third     3
dtype: int64

If we'd rather store these values as floating-point numbers, we can do the following:

In [6]:
s1 = pd.Series(a, index=index_names, dtype='float32')
s1

First     1.0
Second    2.0
Third     3.0
dtype: float32

> There are a number of other datatypes you can use, including:
    int8,
    int16, 
    int32,
    int64,
    float16,
    float32,
    float64,
    boolean,
    category
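As a hedged illustration of a few of these data types, the sketch below converts an existing Series with `astype()` and builds a categorical Series directly. The variable names and the color values are made up for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Convert an existing Series to a different data type
as_small_int = s.astype("int8")    # 8-bit integers, range -128 to 127
as_float = s.astype("float64")

# A categorical Series stores each distinct value once and refers to it by code
colors = pd.Series(["red", "blue", "red"], dtype="category")

print(as_small_int.dtype)
print(as_float.dtype)
print(colors.cat.categories.tolist())
```

Smaller types such as int8 save memory but can only hold a limited range of values, so the default is usually the safer choice unless memory is a concern.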

You can usually trust the default data type that pandas chooses based on your data. For instance, in our list b, we have a number of floating-point values:

In [7]:
s2 = pd.Series(b, index=index_names) 
s2

First     4.400000
Second    5.567000
Third     6.987645
dtype: float64

But you can override the defaults:

In [8]:
s2 = pd.Series(b, index=index_names, dtype='float32')  # index is the second parameter, so the index= keyword is optional here; dtype sets the data type
s2

First     4.400000
Second    5.567000
Third     6.987645
dtype: float32

In [9]:
s3 = pd.Series(c)
s3

Seventh    7
Eigth      8
Ninth      9
dtype: int64

In [10]:
s3 = pd.Series(c, index=index_names)
s3

First    NaN
Second   NaN
Third    NaN
dtype: float64

But, we could also extract the values from the dictionary and supply the index names:

In [11]:
s3 = pd.Series(c.values(), index=index_names)
s3

First     7
Second    8
Third     9
dtype: int64

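... as we see above, dictionaries are a little different. A dictionary already contains a name (its key) for each element, so when we create a Series from a dictionary, pandas uses the keys as the index and we do not need an index argument. If we pass an index anyway, pandas looks up each index label among the dictionary keys. None of our index_names match a key, which is why every value came back as NaN.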
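To see how pandas treats a dictionary combined with an index, here is a small sketch in which only one label matches a dictionary key. The label "Tenth" is made up for illustration and does not appear in the dictionary.

```python
import pandas as pd

c = {"Seventh": 7, "Eigth": 8, "Ninth": 9}

# "Ninth" is a key in the dictionary; "Tenth" is a made-up label that is not
s = pd.Series(c, index=["Ninth", "Tenth"])
print(s)
```

The matching label keeps its value and the missing one becomes NaN. Because NaN is a floating-point value, the whole Series is stored as float64 rather than int64.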

### Iteration of a Pandas Series 

As you would expect, we can also easily iterate through the values in a Series:

In [12]:
for element in s1:
    print(element)

1.0
2.0
3.0


... and also iterate over key-value pairs

In [13]:
for key, val in s2.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    print(key, val)

First 4.400000095367432
Second 5.566999912261963
Third 6.987645149230957


### A Pandas Series is an Object - and contains many useful methods

To see a list of methods supported by an object, you can use the dir() function. The dir() function will list all the attributes and methods found within a given object.

In [14]:
# dir(s3)

... for the full list with descriptions, see the pandas Series documentation (https://pandas.pydata.org/docs/reference/api/pandas.Series.html)
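The raw output of dir() is long and includes many internal names. One possible way to make it easier to read, sketched here with a throwaway Series named `s`, is to keep only the names that do not start with an underscore:

```python
import pandas as pd

s = pd.Series([7, 8, 9])

# Keep only the public names (those that do not start with an underscore)
public_names = [name for name in dir(s) if not name.startswith("_")]

print(len(public_names))
print(public_names[:10])
```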

### Yes, Series is very "dictionary-like"

In [15]:
s1["First"]

1.0

In [16]:
s1["Second"]

2.0

In [17]:
s1["Third"]

3.0

In [18]:
# s1["Fourth"]

... and, as we would see if we ran the code above, s1["Fourth"], a Series raises a KeyError if we attempt to access an element with a label that does not exist in the index.

Like in our use of dict, we can also try "getting" a value from a Series object:

In [19]:
print(s1.get("Fourth"))

None


In [20]:
print(s1.get("Third"))

3.0


As with dictionaries, get() lets you try a key to see if it exists without raising an exception. It returns None when the key is missing.
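Two related dictionary-style idioms are sketched below: supplying a fallback value to `get()`, and testing for a label with `in`. The Series is rebuilt here under the name `s` so that the example runs on its own, and the fallback value 0.0 is an arbitrary choice.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["First", "Second", "Third"])

print(s.get("Fourth", 0.0))  # the second argument is returned when the label is missing
print("Fourth" in s)         # membership tests check the index labels, not the values
print("Third" in s)

# Direct indexing with a missing label raises KeyError
try:
    s["Fourth"]
except KeyError as err:
    print("KeyError:", err)
```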

### Vectorized Operations with Pandas Series

Here's where a pandas Series starts to differ from a dictionary. Arithmetic operators apply to every element of a Series at once (vectorized operations), so we can write whole expressions without an explicit loop.

In [21]:
s1 + s1

First     2.0
Second    4.0
Third     6.0
dtype: float32

In [22]:
s1 + 2

First     3.0
Second    4.0
Third     5.0
dtype: float32

In [23]:
s1 * 10

First     10.0
Second    20.0
Third     30.0
dtype: float32

In [24]:
s1*2+s2**3

First      87.184006
Second    176.529617
Third     347.187042
dtype: float32
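In the examples above both Series share the same index. One detail worth knowing is that arithmetic between two Series matches elements by label rather than by position. The sketch below uses two made-up Series, `x` and `y`, whose indexes only partly overlap:

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0], index=["First", "Second", "Third"])
y = pd.Series([10.0, 20.0, 30.0], index=["Second", "Third", "Fourth"])

# Labels present in only one Series produce NaN
total = x + y
print(total)

# add() with fill_value treats a missing label as 0 instead
filled = x.add(y, fill_value=0)
print(filled)
```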

### Slicing Pandas Series

We can also use the index/slice operator on a Pandas Series...

In [25]:
s1[1:]+s2[1:]

Second    7.567000
Third     9.987645
dtype: float32

In [26]:
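s1.iloc[-1]+s2.iloc[1]  # use .iloc for positions; plain integers on a labeled index are deprecated in recent pandas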

8.566999
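When the index holds labels rather than integers, it helps to state explicitly whether a slice refers to positions or to labels. A minimal sketch, using a standalone Series named `s`:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["First", "Second", "Third"])

print(s.iloc[0:2])              # by position: the stop position is excluded
print(s.loc["First":"Second"])  # by label: the stop label is included
print(s.iloc[-1])               # last element by position
```

Both slices select the same two elements here, but for different reasons: the positional slice stops before position 2, while the label slice includes "Second".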