The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [3]:
s[0] = "miau"
s

0    miau
1     3.0
2     5.0
3     NaN
4     6.0
5     8.0
dtype: object
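The assignment above changes the dtype from float64 to object, because a string cannot be stored in a float array. Here is a minimal sketch of the same effect, assuming you would rather make the upcast explicit (recent pandas versions warn when an assignment forces it implicitly). The variable names `numbers` and `mixed` are illustrative only:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 3, 5, np.nan, 6, 8])
print(numbers.dtype)  # float64: np.nan forces the integers to floats

# Cast to object first so that storing a string is an explicit choice.
mixed = numbers.astype(object)
mixed[0] = "miau"
print(mixed.dtype)    # object

# astype returned a new Series, so the original keeps its float dtype.
print(numbers.dtype)  # float64
```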

In [4]:
s.to_numpy()

array(['miau', 3.0, 5.0, nan, 6.0, 8.0], dtype=object)

In [5]:
s.array

<PandasArray>
['miau', 3.0, 5.0, nan, 6.0, 8.0]
Length: 6, dtype: object

In [6]:
s.at[0]  # access a single value by its label

'miau'

In [7]:
s.axes

[RangeIndex(start=0, stop=6, step=1)]

In [8]:
s.values

array(['miau', 3.0, 5.0, nan, 6.0, 8.0], dtype=object)

In [9]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

**Using Lists**

In [10]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [11]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [12]:
df_2 = pd.Series(my_list,labels)
np.exp(df_2)

a    2.202647e+04
b    4.851652e+08
c    1.068647e+13
dtype: float64

In [13]:
np.asarray(df_2)

array([10, 20, 30])
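The two cells above show that NumPy functions accept a Series directly. As a small sketch (the data and the name `areas` are made up for illustration), note that a NumPy ufunc returns a Series and keeps the index labels, while `np.asarray` drops them:

```python
import numpy as np
import pandas as pd

areas = pd.Series([4, 9, 16], index=["a", "b", "c"])

roots = np.sqrt(areas)     # element-wise ufunc: the result is still a Series
print(roots)

plain = np.asarray(areas)  # plain ndarray: the labels are gone
print(type(plain))
```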

**NumPy Arrays**

In [14]:
arr_1 = pd.Series(arr)
arr_1

0    10
1    20
2    30
dtype: int64

In [15]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64

**Dictionary**

In [16]:
d_s = pd.Series(d)
d_s

a    10
b    20
c    30
dtype: int64

In [17]:
#If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
pd.Series(d, index=['c','b','a'])

c    30
b    20
a    10
dtype: int64
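It is worth seeing what happens when the index asks for a label that is not a key of the dictionary. A minimal sketch, using a hypothetical dictionary `prices` and a label `"z"` that it does not contain:

```python
import pandas as pd

prices = {"a": 10, "b": 20, "c": 30}  # hypothetical data

# "z" is not a key in the dict, so its value becomes NaN;
# the NaN in turn forces the dtype from int64 to float64.
# "b" is not requested, so it is left out of the result.
selected = pd.Series(prices, index=["c", "a", "z"])
print(selected)
```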

In [18]:
# From a scalar value: the value is repeated for every label in the index
pd.Series(5.0, index =['a','b','c'])

a    5.0
b    5.0
c    5.0
dtype: float64

In [19]:
#While Series is ndarray-like, if you need an actual ndarray, then use Series.to_numpy().
d_s.to_numpy()

array([10, 20, 30])

### Data in a Series

A pandas Series can hold a variety of object types:

In [20]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [21]:
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

## Using an Index

The key to using a Series is understanding its index. pandas uses these index labels (names or numbers) for fast lookups of information, much like a hash table or dictionary.

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [22]:
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

In [23]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [24]:
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

In [25]:
ser2

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

In [26]:
ser1['USA']

1
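Square brackets with a label are only one way to look things up. The sketch below (the Series is rebuilt under the name `ser` so the block runs on its own) contrasts label-based access with `.loc`, position-based access with `.iloc`, and a membership test on the index:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=["USA", "Germany", "USSR", "Japan"])

by_label = ser.loc["Germany"]    # label-based lookup
by_position = ser.iloc[1]        # position-based lookup, same element
has_italy = "Italy" in ser       # membership tests the index labels, not the values
subset = ser[["USA", "Japan"]]   # a list of labels returns a Series

print(by_label, by_position, has_italy)
print(subset)
```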

Operations are also aligned on the index: values with matching labels are combined, and a label present in only one Series produces NaN:

In [27]:
ser1 + ser2

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
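The result index is the union of both indexes, and the integers become floats because NaN needs a float dtype. A short sketch of how you might inspect this and keep only the labels that matched (the two Series are rebuilt so the block is self-contained):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=["USA", "Germany", "USSR", "Japan"])
ser2 = pd.Series([1, 2, 5, 4], index=["USA", "Germany", "Italy", "Japan"])

total = ser1 + ser2

# The labels of the result are the union of both indexes.
print(ser1.index.union(ser2.index))

# Drop the unmatched labels and go back to integers.
matched = total.dropna().astype("int64")
print(matched)
```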

In [28]:
ser1.values

array([1, 2, 3, 4])

In [29]:
ser1.to_numpy()

array([1, 2, 3, 4])

In [30]:
# Using the get method, a missing label will return None or specified default:
ser2.get("Mex", np.nan)

nan
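For comparison, here is a small sketch of how `get` differs from square-bracket access when the label is missing. The flag `raised` is only there to make the behaviour visible:

```python
import pandas as pd

ser = pd.Series([1, 2, 5, 4], index=["USA", "Germany", "Italy", "Japan"])

print(ser.get("Mex"))     # None: no default given
print(ser.get("Mex", 0))  # 0: the supplied default

# Square brackets raise instead of returning a default.
raised = False
try:
    ser["Mex"]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```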

In [31]:
ser_1 = pd.Series([1,2,3])
ser_idx = pd.Index([4,5,6])

np.maximum(ser_1, ser_idx)

0    4
1    5
2    6
dtype: int64

In [32]:
# idxmin() and idxmax() return the index label of the minimum and maximum value, respectively
ser_1.idxmin(), ser_1.idxmax()

(0, 2)
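With the default integer index, the labels returned above look like positions. A quick sketch with string labels makes it clearer that `idxmin()` and `idxmax()` return labels, which you can pass straight back to `.loc`:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=["USA", "Germany", "USSR", "Japan"])

smallest = ser.idxmin()  # label of the minimum value
largest = ser.idxmax()   # label of the maximum value

print(smallest, largest)
print(ser.loc[largest])  # the maximum value itself
```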

In [33]:
#Return a Series/DataFrame with absolute numeric value of each element.
series_1 = pd.Series([-1,-2,3,4])
series_1.abs() 

0    1
1    2
2    3
3    4
dtype: int64

In [34]:
a_1 = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b_1 = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

a_1.add(b_1)

a    2.0
b    NaN
c    NaN
d    NaN
e    NaN
dtype: float64

In [35]:
a_1.add(b_1, fill_value=0)

a    2.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
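Notice that `e` is still NaN: `fill_value` only substitutes a value that is missing on one side, so a label that is missing on both sides stays missing. If you also want those filled, one option (a sketch, not the only way) is to chain `fillna`:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

partial = a.add(b, fill_value=0)  # "e" is missing on both sides, so it stays NaN
complete = partial.fillna(0)      # fill whatever is still missing afterwards

print(partial)
print(complete)
```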

In [36]:
#.add_prefix(), Prefix labels with string prefix.
a_1.add_prefix('item_')

item_a    1.0
item_b    1.0
item_c    1.0
item_d    NaN
dtype: float64

In [37]:
a_1

a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64

In [38]:
#.add_suffix(), Suffix labels with string suffix.
a_1.add_suffix('_item')

a_item    1.0
b_item    1.0
c_item    1.0
d_item    NaN
dtype: float64

In [39]:
#.agg(), Aggregate using one or more operations over the specified axis.

a_1.agg(['min', 'max']) 

min    1.0
max    1.0
dtype: float64
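Since every non-missing value in `a_1` is 1.0, the output above is not very telling. Here is a small sketch with made-up data showing that a single function name returns a scalar, a list of names returns a Series, and NaN is skipped by default:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, np.nan])

total = s.agg("sum")                     # one function name: a scalar
summary = s.agg(["min", "max", "mean"])  # a list of names: a Series labelled by function

print(total)
print(summary)
```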

In [40]:
#.align(), Align two objects on their axes with the specified join method.

a_1.align(b_1, join='inner')

(a    1.0
 b    1.0
 d    NaN
 dtype: float64,
 a    1.0
 b    NaN
 d    1.0
 dtype: float64)
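For contrast with the inner join above, this sketch uses `join='outer'`, which keeps every label from both Series and inserts NaN where a label was absent:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

left, right = a.align(b, join="outer")

# Both results now share the same index: the union of the labels.
print(left)
print(right)
```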

In [41]:
c_1 = pd.concat([a_1, b_1])  # Series.append was removed in pandas 2.0; use pd.concat instead
c_1

a    1.0
b    1.0
c    1.0
d    NaN
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64

In [42]:
c_1.loc['a']

a    1.0
a    1.0
dtype: float64
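Combining two Series can leave duplicate labels in the index, which is why `.loc['a']` returns a Series here rather than a single value. A minimal sketch with made-up data, showing how to check for duplicates and what `.loc` returns in each case:

```python
import pandas as pd

a = pd.Series([1.0, 1.0], index=["a", "b"])
b = pd.Series([2.0, 3.0], index=["a", "c"])

combined = pd.concat([a, b])

print(combined.index.is_unique)  # False: "a" appears twice
print(combined.loc["a"])         # duplicated label: a Series with both rows
print(combined.loc["c"])         # unique label: a single value
```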

In [43]:
pd.concat([a_1, b_1], ignore_index=True)

0    1.0
1    1.0
2    1.0
3    NaN
4    1.0
5    NaN
6    1.0
7    NaN
dtype: float64

In [44]:
# at_time selects the rows at a particular time of day (shown here on a DataFrame; Series.at_time works the same way)
i = pd.date_range('2018-04-09', periods=4, freq='12H')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
ts

                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4


In [45]:
ts.at_time('12:00')

                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
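The example above uses a DataFrame, but the same method exists on a Series with a datetime index. A short sketch, building the index from explicit timestamps (the name `noon` is illustrative):

```python
import pandas as pd

idx = pd.to_datetime([
    "2018-04-09 00:00",
    "2018-04-09 12:00",
    "2018-04-10 00:00",
    "2018-04-10 12:00",
])
ser = pd.Series([1, 2, 3, 4], index=idx)

# Keep only the entries whose time of day is exactly 12:00.
noon = ser.at_time("12:00")
print(noon)
```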


In [46]:
# between(left, right) is equivalent to (left <= ser) & (ser <= right)
# Boundary values are included by default:

print(ser_1)
ser_1.between(2,4)

0    1
1    2
2    3
dtype: int64


0    False
1     True
2     True
dtype: bool

In [47]:
# With inclusive="neither", boundary values are excluded
# (pandas 1.3+; the older inclusive=False form was removed in pandas 2.0):
ser_1.between(2, 4, inclusive="neither")

0    False
1    False
2     True
dtype: bool
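To make the equivalence with comparison operators concrete, here is a small sketch. It assumes pandas 1.3 or later, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"` and `"right"`:

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

both = ser.between(2, 3)                         # default: both boundaries included
manual = (ser >= 2) & (ser <= 3)                 # the same test written by hand
left_only = ser.between(2, 3, inclusive="left")  # 2 <= x < 3

print(both.equals(manual))
print(left_only)

# The boolean result can be used as a mask to filter the Series.
print(ser[both])
```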

In [48]:
print(ts)
ts.between_time('0:00', '11:59')

                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4


            A
2018-04-09  1
2018-04-10  3


In [49]:
ser_convert = pd.Series(["a", "b", np.nan])
ser_convert

0      a
1      b
2    NaN
dtype: object

In [50]:
ser_convert.convert_dtypes()

0       a
1       b
2    <NA>
dtype: string
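`convert_dtypes()` is not limited to strings. As a sketch with made-up data: whole-number floats that were only floats because of a NaN are converted to the nullable `Int64` dtype, with the missing value shown as `<NA>`:

```python
import numpy as np
import pandas as pd

ser_float = pd.Series([1, 2, np.nan])
print(ser_float.dtype)  # float64, because NaN cannot live in an int64 array

converted = ser_float.convert_dtypes()
print(converted)        # nullable integers, the missing entry becomes <NA>
```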

In [51]:
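s = pd.Series([[1, 2], [3, 4]])
deep = s.copy()  # deep copy of the Series, but the nested lists are not copied recursively
s[0][0] = 10     # mutating a nested list in place is therefore visible in both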
s

0    [10, 2]
1     [3, 4]
dtype: object

In [52]:
deep

0    [10, 2]
1     [3, 4]
dtype: object
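As the output shows, `copy()` duplicates the Series but only copies references to the Python objects inside it, so both Series still point at the same lists. If you need fully independent elements, one possible approach (a sketch; `independent` is an illustrative name) is to deep-copy the elements with the standard library's `copy.deepcopy`:

```python
import copy
import pandas as pd

s = pd.Series([[1, 2], [3, 4]])

deep = s.copy()  # copies the Series, but each element still points to the same list
independent = pd.Series(copy.deepcopy(s.tolist()), index=s.index)  # duplicates the lists too

s.iloc[0][0] = 10  # mutate the first nested list in place

print(deep.iloc[0])         # [10, 2]: shared list, so the change shows up
print(independent.iloc[0])  # [1, 2]: separate list, unaffected
```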

In [53]:
s_cop = pd.Series([1,2], index=['a', 'b'])
s_copy = s_cop.copy()
s_copy

a    1
b    2
dtype: int64

In [54]:
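# A shallow copy shares its data with the original, so a change to either one
# shows up in both (unless Copy-on-Write is enabled in newer pandas versions).
shallow = s_cop.copy(deep=False)
s_cop['a'] = 3    # use labels: integer keys on a string index are no longer treated as positions
shallow['b'] = 4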
s_cop

a    3
b    4
dtype: int64

In [55]:
shallow

a    3
b    4
dtype: int64

In [57]:
s_copy

a    1
b    2
dtype: int64

Let's stop here for now and move on to DataFrames, which will expand on the concept of Series!
# Great Job!