___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Series

The first main pandas data type we will learn about is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
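Here is a minimal sketch of both points, assuming a recent pandas and NumPy install. A numeric Series hands back a NumPy array, while a Series of mixed Python objects falls back to the generic `object` dtype. The labels `'a'`, `'b'`, `'c'` are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

# Numeric data: the values come back as a NumPy array
nums = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(type(nums.to_numpy()))  # <class 'numpy.ndarray'>
print(nums['b'])              # label-based lookup: 20

# Mixed Python objects: pandas stores them with the object dtype
mixed = pd.Series([1, 'two', 3.0], index=['a', 'b', 'c'])
print(mixed.dtype)            # object
```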

Let's explore this concept through some examples:

## Imports

In [2]:
import numpy as np
import pandas as pd

## Creating a Series from Python Objects

In [3]:
# help(pd.Series)  # uncomment to read the full Series documentation

### Index and Data Lists

We can create a Series from Python lists (or from NumPy arrays).

In [5]:
myindex = ['USA','Canada','Mexico']

In [6]:
mydata = [1776,1867,1821]

First, pass in just the data, as a list -

In [7]:
myser = pd.Series(data=mydata)

Displaying the result shows that pandas has added a default integer index (0, 1, 2) -

In [8]:
myser

0    1776
1    1867
2    1821
dtype: int64
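To confirm what that default index actually is, a short check: when no index is supplied, pandas generates a `RangeIndex` counting up from 0.

```python
import pandas as pd

mydata = [1776, 1867, 1821]
myser = pd.Series(data=mydata)

# With no index supplied, pandas generates a RangeIndex 0, 1, 2, ...
print(myser.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(myser.index))  # [0, 1, 2]
```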

Now do the same, but pass in both the data and the index -

In [10]:
newser = pd.Series(data=mydata,index=myindex)

In [11]:
newser

USA       1776
Canada    1867
Mexico    1821
dtype: int64
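The index must supply exactly one label per value. A sketch of what happens when it does not: pandas raises a `ValueError` rather than guessing how to pair them.

```python
import pandas as pd

mydata = [1776, 1867, 1821]

# Supplying only two labels for three values is rejected
raised = False
try:
    pd.Series(data=mydata, index=['USA', 'Canada'])
except ValueError as err:
    raised = True
    print('ValueError:', err)
```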

Retrieve a value either by its integer position or by its index label -

In [14]:
newser.iloc[1]  # position-based lookup

1867

In [15]:
newser['USA']

1776
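Plain square brackets decide between label and position based on the key, which can be ambiguous. A sketch using the explicit accessors `.loc` (label-based) and `.iloc` (position-based) instead. As far as we know, pandas 2.1 and later emit a `FutureWarning` when an integer key inside `[]` is treated as a position on a label index, so the explicit forms are the safer habit.

```python
import pandas as pd

newser = pd.Series(data=[1776, 1867, 1821], index=['USA', 'Canada', 'Mexico'])

print(newser.loc['USA'])  # label-based: 1776
print(newser.iloc[1])     # position-based: 1867
print(newser.iloc[-1])    # negative positions count from the end: 1821
```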

#### From a NumPy array of random numbers

In [18]:
ran_data = np.random.randint(0,100,4)

In [19]:
ran_data

array([34, 57, 13, 65])

Create a list of names -

In [20]:
names = ['Andrew','Bobo','Claire','David']

Pass both the random numbers (as the data) and the names (as the index) into the Series -

In [21]:
ages = pd.Series(ran_data,names)

In [22]:
ages

Andrew    34
Bobo      57
Claire    13
David     65
dtype: int32
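Because the numbers are random, your output will differ from the one above. A minimal sketch, assuming NumPy 1.17 or later, that uses a seeded `Generator` so the same ages appear on every run. The seed `42` is arbitrary. The `int32` dtype shown above likely reflects the platform the notebook was run on; `Generator.integers` defaults to `int64`.

```python
import numpy as np
import pandas as pd

names = ['Andrew', 'Bobo', 'Claire', 'David']

# A seeded Generator produces the same "random" numbers on every run
rng = np.random.default_rng(42)
ages = pd.Series(rng.integers(0, 100, 4), index=names)
print(ages)

# Re-creating the generator with the same seed repeats the sequence
again = pd.Series(np.random.default_rng(42).integers(0, 100, 4), index=names)
print(ages.equals(again))  # True
```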

### From a Dictionary

Start with a dictionary mapping names to ages -

In [23]:
ages = {'Sammy':5,'Frank':10,'Spike':7}

In [24]:
ages

{'Sammy': 5, 'Frank': 10, 'Spike': 7}

pandas automatically uses the dictionary keys as the index and the values as the data -

In [25]:
pd.Series(ages)

Sammy     5
Frank    10
Spike     7
dtype: int64
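When the data is a dictionary, an explicit `index` argument selects and orders the keys. A short sketch: any label that is not a key (here the made-up name `'Rex'`) gets a missing value, and the NaN turns the dtype into float64.

```python
import pandas as pd

ages = {'Sammy': 5, 'Frank': 10, 'Spike': 7}

# The index picks which keys to keep and in what order;
# 'Rex' is not a key, so its value is missing (NaN)
subset = pd.Series(ages, index=['Spike', 'Sammy', 'Rex'])
print(subset)
```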

# Key Ideas of a Series

## Named Index

Imaginary sales data for the 1st and 2nd quarters of a global company -


In [27]:
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100,'China': 500, 'India': 210,'USA': 260}

Convert each dictionary into a pandas Series -


In [28]:
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)

In [29]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

Retrieve values by their named index -


In [30]:
sales_Q1['Japan']

80

Or do the same with their integer-based position -


In [31]:
sales_Q1.iloc[0]  # position-based lookup

80

**Be careful with potential errors!** Each of the lookups below raises a `KeyError`, which is why they are commented out.

Wrong Name -


In [37]:
# sales_Q1['France']

Accidental Extra Space -


In [33]:
# sales_Q1['USA ']

Capitalization Mistake -


In [40]:
# sales_Q1['usa']
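If a label might be missing, you can catch the `KeyError` or use `Series.get()`, which returns a default instead of raising. A minimal sketch using the Q1 data:

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# Square-bracket lookup of a missing label raises KeyError
missing_raised = False
try:
    sales_Q1['France']
except KeyError:
    missing_raised = True
    print('France is not in the index')

# .get() returns a default (None unless given) instead of raising
print(sales_Q1.get('France'))   # None
print(sales_Q1.get('usa', 0))   # 0
print(sales_Q1.get('USA', 0))   # 250
```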

## Operations

Grab just the index labels with `keys()` to check what is available -


In [41]:
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')
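`keys()` is an alias for the `.index` attribute. A quick sketch showing that, and that the `in` operator on a Series checks the index labels rather than the values, which is a cheap way to test a label before looking it up:

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# keys() gives the same labels as .index
print(sales_Q1.keys().equals(sales_Q1.index))  # True

# `in` checks the index labels, not the values
print('USA' in sales_Q1)   # True
print('usa' in sales_Q1)   # False (labels are case-sensitive)
print(250 in sales_Q1)     # False (250 is a value, not a label)
```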

Arithmetic operations are broadcast across the entire Series -


In [42]:
# Multiply all in Q1 by 2
sales_Q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64
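Each arithmetic operator also has a method form (`mul`, `div`, `add`, `sub`, ...). A sketch checking that the methods give the same results as the operators. The method forms matter later because, unlike the operators, they accept extra arguments such as `fill_value`.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Method forms match the operator forms, values and dtype included
doubled = sales_Q1.mul(2)
print(doubled.equals(sales_Q1 * 2))               # True
print(sales_Q2.div(100).equals(sales_Q2 / 100))   # True
```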

In [43]:
# Divide all by 100 (e.g. to express sales in hundreds)
sales_Q2 / 100

Brazil    1.0
China     5.0
India     2.1
USA       2.6
dtype: float64

Notice that pandas has automatically converted the result from int64 to float64, because `/` performs true division.
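If you want to stay with integers, floor division (`//`) keeps an integer dtype. A short sketch comparing the two on the Q2 data:

```python
import pandas as pd

sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

true_div = sales_Q2 / 100    # true division always yields float64
floor_div = sales_Q2 // 100  # floor division keeps the integer dtype
print(true_div.dtype)        # float64
print(floor_div.dtype)       # int64
print(floor_div.tolist())    # [1, 5, 2, 2]
```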

## Operations Between Series

Because the two Series have slightly different index labels (Japan appears only in Q1 and Brazil only in Q2), pandas aligns them by label and marks each mismatched label with NaN. The NaN values also force the result to float64 -


In [44]:
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64
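A sketch of how to inspect that alignment: the result's index is the union of both indexes, and `isna()` picks out the labels that appeared in only one of the two Series.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

total = sales_Q1 + sales_Q2

# The result's index is the union of both indexes
print(total.index.tolist())                # ['Brazil', 'China', 'India', 'Japan', 'USA']

# isna() flags labels present in only one of the two Series
print(total[total.isna()].index.tolist())  # ['Brazil', 'Japan']
```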

To avoid this, use the `add()` method with the `fill_value` argument, which substitutes that value for a label missing from one of the two Series before adding -


In [45]:
sales_Q1.add(sales_Q2,fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64
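`fill_value` works the same way with the other arithmetic methods. As an illustration, a sketch of the quarter-over-quarter change with `sub()`. Treating a country missing from a quarter as zero sales is an assumption made for this example and may not suit real data.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Q2 minus Q1; a country missing from one quarter is assumed to have 0 sales
change = sales_Q2.sub(sales_Q1, fill_value=0)
print(change)
```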

That is all we need to know about Series for now. Up next: DataFrames!