# Pandas Series

The first main pandas data type we will look at is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [3]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

**Using Lists**

In [4]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [5]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [6]:
# data and index can also be passed positionally
pd.Series(my_list,labels)

a    10
b    20
c    30
dtype: int64
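The index needs exactly one label per value. The following is a minimal sketch, assuming a recent pandas release, of what happens when the lengths differ; the name `short_labels` is only illustrative.

```python
import pandas as pd

my_list = [10, 20, 30]
short_labels = ['a', 'b']  # hypothetical: one label too few

try:
    pd.Series(my_list, index=short_labels)
    raised = False
except ValueError as err:
    # pandas refuses to build a Series whose data and index lengths differ
    raised = True
    print("ValueError:", err)
```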

**NumPy Arrays**

In [7]:
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [8]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64

**Dictionary**

In [9]:
pd.Series(d)

a    10
b    20
c    30
dtype: int64
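When a dictionary is combined with an explicit `index`, the values are looked up by key rather than taken in order. The sketch below illustrates this under that assumption; the label `'z'` is a made-up key that is not in the dictionary.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Labels found in the dict keep their value; 'z' has no entry, so it becomes NaN
picked = pd.Series(d, index=['b', 'c', 'z'])
print(picked)
```

Because NaN is a floating-point value, the resulting Series is stored as floats rather than integers.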

### Data in a Series

A pandas Series can hold a variety of object types:

In [10]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [11]:
# Even functions (although you are unlikely to need this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
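The `dtype` shown under each Series depends on what it holds. A small sketch of how pandas picks it for a few inputs, with illustrative variable names:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])        # all integers -> integer dtype
floats = pd.Series([1, 2.5])       # an int mixed with a float -> float64
mixed = pd.Series([1, 'a', 2.5])   # unrelated Python types -> object

print(ints.dtype, floats.dtype, mixed.dtype)
```

The `object` dtype is the general fallback: it stores references to Python objects, so numeric operations on it are typically slower than on a numeric dtype.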

## Using an Index

The key to using a Series is understanding its index. pandas uses these index labels or positions for fast lookups of information (it works like a hash table or dictionary).

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [36]:
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])

In [38]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [37]:
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])

In [39]:
ser2

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

In [16]:
ser1['USA']

1

In [58]:
# Positional access; plain ser1[0] is deprecated for a label-indexed Series in recent pandas
ser1.iloc[0]

1

In [60]:
ser1[1:3]

Germany    2
USSR       3
dtype: int64

In [61]:
ser1[:3]

USA        1
Germany    2
USSR       3
dtype: int64

In [2]:
# Label-based access
ser1.loc['USA']

1

In [3]:
# Position-based access
ser1.iloc[3]

4
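Slicing behaves differently for labels and positions, which is easy to trip over. A minimal sketch that rebuilds `ser1` so it runs on its own; the label `'France'` and the default `-1` are illustrative choices.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

by_label = ser1.loc['USA':'USSR']   # label slice: both endpoints are included
by_position = ser1.iloc[0:2]        # position slice: the stop is excluded

print(list(by_label.index))
print(list(by_position.index))

# .get returns a default instead of raising KeyError for a missing label
missing = ser1.get('France', default=-1)
print(missing)
```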

## Operations

In [77]:
print(ser1.mean())
print(ser2.mean())

2.5
3.0


In [76]:
print(ser1.sum())
print(ser2.sum())

10
12


In [86]:
print("Maximum:- ",ser1.max())
print("Label of Maximum element:- ",ser1.idxmax())

Maximum:-  4
Label of Maximum element:-  Japan


In [87]:
print("Minimum:- ",ser1.min())
print("Label of Minimum element:- ",ser1.idxmin())

Minimum:-  1
Label of Minimum element:-  USA
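`idxmax` and `idxmin` return a single label, so it helps to know what happens with ties. A short sketch on made-up data in which the maximum occurs twice:

```python
import pandas as pd

tied = pd.Series([4, 1, 4], index=['x', 'y', 'z'])  # hypothetical data with a tie

# idxmax reports only the first label that holds the maximum
first_max_label = tied.idxmax()

# A boolean mask recovers every label that holds the maximum
all_max_labels = tied[tied == tied.max()].index.tolist()

print(first_max_label, all_max_labels)
```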


In [98]:
ser1.between(1,2)

USA         True
Germany     True
USSR       False
Japan      False
dtype: bool
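By default `between` includes both boundaries. The sketch below assumes pandas 1.3 or newer, where the `inclusive` parameter takes the strings `"both"`, `"neither"`, `"left"` or `"right"`, and uses the resulting mask to filter the Series.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# Exclude both boundaries: only values strictly between 1 and 3 match
strict = ser1.between(1, 3, inclusive="neither")

# A boolean Series can be used directly to select the matching rows
print(ser1[strict])
```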

In [99]:
# Count how often each distinct value occurs in the Series
ser1.value_counts()

4    1
3    1
2    1
1    1
dtype: int64
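Every value in `ser1` is distinct, so each count above is 1. A sketch on made-up data with repeats shows the method more clearly; `votes` is a hypothetical name.

```python
import pandas as pd

votes = pd.Series(['a', 'b', 'a', 'c', 'a'])  # hypothetical data with repeats

counts = votes.value_counts()                 # sorted from most to least frequent
shares = votes.value_counts(normalize=True)   # relative frequencies instead of counts

print(counts)
print(shares)
```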

In [24]:
ser1.add(ser2)

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64

In [23]:
ser1.add(ser2,fill_value=0)

Germany    4.0
Italy      5.0
Japan      8.0
USA        2.0
USSR       3.0
dtype: float64
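The `add` method aligns the two Series on their labels before adding, and the `+` operator does the same. The sketch below rebuilds both Series so it runs on its own, compares the two spellings, and lists the labels that had no partner.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

via_operator = ser1 + ser2
via_method = ser1.add(ser2)

# equals treats NaNs in the same position as equal
same = via_operator.equals(via_method)

# Labels present in only one Series end up as NaN
unmatched = via_operator[via_operator.isna()].index.tolist()

print(same, unmatched)
```

The result is a float Series even though both inputs hold integers, because NaN cannot be stored in a plain integer dtype.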

In [81]:
print("\n1st Series \n----------------------------------------\n",ser1)
print("\n2nd Series \n----------------------------------------\n",ser2)
print("\nResult\n----------------------------------------\n",ser1.sub(ser2))
print("\nResult\n----------------------------------------\n",ser1.sub(ser2,fill_value=0))


1st Series 
----------------------------------------
 USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

2nd Series 
----------------------------------------
 USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

Result
----------------------------------------
 Germany    0.0
Italy      NaN
Japan      0.0
USA        0.0
USSR       NaN
dtype: float64

Result
----------------------------------------
 Germany    0.0
Italy     -5.0
Japan      0.0
USA        0.0
USSR       3.0
dtype: float64


In [78]:
print("\n1st Series \n----------------------------------------\n",ser1)
print("\n2nd Series \n----------------------------------------\n",ser2)
print("\nResult\n----------------------------------------\n",ser1.ge(ser2))


1st Series 
----------------------------------------
 USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

2nd Series 
----------------------------------------
 USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

Result
----------------------------------------
 Germany     True
Italy      False
Japan       True
USA         True
USSR       False
dtype: bool
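Unlike arithmetic, the comparison operators do not align labels. As a sketch, assuming a recent pandas release, `ser1 >= ser2` raises a `ValueError` for these differently labelled Series, while the `ge` method aligns them first:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

try:
    ser1 >= ser2
    operator_raised = False
except ValueError as err:
    # The operator form requires identically labelled Series
    operator_raised = True
    print("ValueError:", err)

# The method form aligns on labels; unmatched labels compare as False
aligned = ser1.ge(ser2)
print(aligned)
```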


In [73]:
# Covariance (computed over the labels the two Series share)
ser1.cov(ser2)

2.3333333333333335
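To see where this number comes from, the sample covariance can be recomputed by hand over the labels that both Series share (USA, Germany and Japan). This is a sketch of the arithmetic, not a description of how pandas computes it internally; `common`, `x`, `y` and `manual` are illustrative names.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Keep only the labels that appear in both Series
common = ser1.index.intersection(ser2.index)
x, y = ser1.loc[common], ser2.loc[common]

# Sample covariance: sum of products of deviations, divided by n - 1
manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(common) - 1)

print(manual, ser1.cov(ser2))
```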