# Series

The first main pandas data type we will learn about is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

## Imports

In [1]:
import numpy as np
import pandas as pd

In [2]:
mydata = [1776,1867,1821]
arr = np.array(mydata)
arr

array([1776, 1867, 1821])

In [3]:
arr[0:2]

array([1776, 1867])
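The array above can only be addressed by integer position. As a minimal sketch of the two differences described in the introduction (the labels and values here are made up for illustration), a Series can be indexed by label and can hold mixed Python objects, in which case pandas stores them with the generic `object` dtype:

```python
import numpy as np
import pandas as pd

# A NumPy array is addressed by integer position only.
arr = np.array([10, 20, 30])
first_by_position = arr[0]

# A Series carries the same values plus an axis label for each one.
labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
first_by_label = labeled['a']

# A Series can also hold arbitrary Python objects (illustrative values).
mixed = pd.Series(['text', 3.14, [1, 2]], index=['x', 'y', 'z'])

print(first_by_position, first_by_label, mixed.dtype)
```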

## Creating a Series from Python Objects

In [7]:
# help(pd.Series)
# https://pandas.pydata.org/docs/reference/api/pandas.Series.html

### Index and Data Lists

We can create a Series from Python lists or from NumPy arrays. If no index is passed, pandas assigns a default integer index starting at 0.

In [8]:
myindex = ['USA','Canada','Mexico']

In [9]:
mydata = [1776,1867,1821]

In [10]:
myser = pd.Series(data=mydata)

In [11]:
myser

0    1776
1    1867
2    1821
dtype: int64

In [12]:
pd.Series(data=mydata,index=myindex)

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [13]:
pd.Series(data=[67,78,56,90],index=['Maths','English','Science','Telugu'])

Maths      67
English    78
Science    56
Telugu     90
dtype: int64

In [18]:
ran_data = np.random.randint(0,100,4)

In [19]:
ran_data

array([70, 82, 22, 11], dtype=int32)

In [20]:
names = ['Andrew','Bobo','Claire','David']

In [22]:
ages = pd.Series(ran_data,names)

In [23]:
ages

Andrew    70
Bobo      82
Claire    22
David     11
dtype: int32
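One thing the examples above rely on implicitly is that the index has exactly one label per value. The following sketch, which deliberately passes one label too few, shows that pandas raises a `ValueError` in that case rather than guessing:

```python
import pandas as pd

values = [1776, 1867, 1821]
too_few_labels = ['USA', 'Canada']  # deliberately one label short

try:
    pd.Series(data=values, index=too_few_labels)
    mismatch_detected = False
except ValueError as err:
    # pandas refuses to build a Series whose index and data lengths differ
    mismatch_detected = True
    print(f"ValueError: {err}")
```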

### From a Dictionary

In [24]:
ages = {'Sammy':5,'Frank':10,'Spike':7,'Andrew':2}

In [25]:
ages

{'Sammy': 5, 'Frank': 10, 'Spike': 7, 'Andrew': 2}

In [26]:
pd.Series(ages)

Sammy      5
Frank     10
Spike      7
Andrew     2
dtype: int64
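When a dictionary is combined with an explicit `index`, pandas selects values by key rather than by position. The sketch below assumes a hypothetical label, `'Zoe'`, that is not a key in the dictionary; it comes back as `NaN`, which also turns the result into a float Series:

```python
import pandas as pd

ages = {'Sammy': 5, 'Frank': 10, 'Spike': 7, 'Andrew': 2}

# 'Zoe' is a hypothetical label with no matching dictionary key.
selected = pd.Series(ages, index=['Frank', 'Sammy', 'Zoe'])

print(selected)
```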

# Key Ideas of a Series

## Named Index

In [27]:
# Imaginary Sales Data for 1st and 2nd Quarters for Global Company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100,'China': 500, 'India': 210,'USA': 260}

In [28]:
# Convert into Pandas Series
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)

In [29]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [30]:
sales_Q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [31]:
type(sales_Q1)

pandas.core.series.Series

In [32]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [33]:
# Call values based on Named Index
sales_Q1['Japan']

np.int64(80)

In [34]:
# Integer-based position lookup is still available through .iloc
sales_Q1.iloc[0]

np.int64(80)

In [35]:
sales_Q1.iloc[3]

np.int64(250)

In [36]:
sales_Q1['India']

np.int64(200)

In [37]:
sales_Q1.iloc[2]

np.int64(200)
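The lookups above mix label-based and position-based access. As a short sketch, `.loc` is the explicit label-based counterpart of `.iloc`, and the two differ when slicing: a label slice includes its end label, while a position slice excludes its stop position.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

by_label = sales_Q1.loc['India']     # explicit label-based lookup
by_position = sales_Q1.iloc[2]       # explicit position-based lookup

label_slice = sales_Q1.loc['China':'India']   # includes both end labels
position_slice = sales_Q1.iloc[1:2]           # excludes position 2

print(label_slice)
print(position_slice)
```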

In [38]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

**Be careful with potential errors!**

In [39]:
# Wrong Name
sales_Q1['France']

KeyError: 'France'

In [40]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [41]:
# Accidental Extra Space
sales_Q1['USA ']

KeyError: 'USA '

In [42]:
# Capitalization Mistake
sales_Q1['usa']

KeyError: 'usa'
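One way to guard against these `KeyError`s, sketched here under the assumption that a default of 0 is acceptable for a missing country, is to test membership with `in` or to use `Series.get`, which returns a default instead of raising. Hand-typed labels can also be normalized first; the `strip().upper()` step below only works for this example because the real label happens to be all uppercase.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# `in` checks the index labels, not the values.
has_france = 'France' in sales_Q1

# get() returns the supplied default (0 is an assumption) instead of raising.
france_sales = sales_Q1.get('France', 0)

# Normalize a hand-typed label before the lookup (illustrative only).
raw_label = 'usa '
clean_label = raw_label.strip().upper()
usa_sales = sales_Q1[clean_label]

print(has_france, france_sales, usa_sales)
```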

## Index Names

In [44]:
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')
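The labels returned by `keys()` are also available through the `index` attribute. A small sketch showing that the two hold the same labels, and that they can be converted to a plain Python list when needed:

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

labels_from_keys = sales_Q1.keys()
labels_from_index = sales_Q1.index

# Both expose the same Index of labels.
same_labels = labels_from_keys.equals(labels_from_index)

# Convert to a plain list when a regular Python container is needed.
label_list = list(sales_Q1.index)

print(same_labels, label_list)
```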

## Operations

In [45]:
# Grab just the index keys
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')

In [46]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [47]:
# Operations are broadcast across the entire Series
sales_Q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64

In [48]:
sales_Q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [49]:
sales_Q2 / 100

Brazil    1.0
China     5.0
India     2.1
USA       2.6
dtype: float64
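Note that the division above changed the dtype from `int64` to `float64`. As a brief sketch of how the choice of operator affects the result, true division yields floats, floor division keeps integers, and a comparison broadcasts to a boolean Series:

```python
import pandas as pd

sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

true_div = sales_Q2 / 100     # float result
floor_div = sales_Q2 // 100   # integer result, fractional part dropped
above_200 = sales_Q2 > 200    # element-wise comparison gives booleans

print(true_div.dtype, floor_div.dtype, above_200.dtype)
```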

## Between Series

In [50]:
print(sales_Q1)
print('')
print(sales_Q2)

Japan     80
China    450
India    200
USA      250
dtype: int64

Brazil    100
China     500
India     210
USA       260
dtype: int64


In [51]:
# Values are matched by label; labels found in only one Series give NaN
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64
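The result above comes from pandas aligning the two Series by label before adding, not by position. A minimal sketch of that behavior: the first pair below has the same labels in a different order and still adds correctly, while the second pair reproduces the mismatch and lists which labels ended up as `NaN`.

```python
import pandas as pd

# Same labels, different order: values are still matched by label.
first = pd.Series({'China': 450, 'India': 200, 'USA': 250})
second = pd.Series({'USA': 260, 'China': 500, 'India': 210})
total = first + second

# Labels present in only one Series produce NaN in the result.
sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})
combined = sales_Q1 + sales_Q2
missing_labels = combined[combined.isna()].index.tolist()

print(total)
print(missing_labels)
```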

In [52]:
# add() accepts fill_value, used in place of a label missing from one Series
sales_Q1.add(sales_Q2,fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64

That is all we need to know about Series. Up next: DataFrames!