# Pandas Series

This is a notebook for the Medium article [A Practical Introduction to Pandas Series](https://bindichen.medium.com/a-practical-introduction-to-pandas-series-9915521cdc69).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [48]:
import pandas as pd
import numpy as np

## 1. Creating a Series

### 1.1 from a Python list

In [2]:
companies = ['Google', 'Microsoft', 'Facebook', 'Apple']

pd.Series(companies)

0       Google
1    Microsoft
2     Facebook
3        Apple
dtype: object

In [3]:
# Custom index with integer labels
pd.Series(companies, index=[100, 101, 102, 103])

100       Google
101    Microsoft
102     Facebook
103        Apple
dtype: object

In [4]:
# Custom index with string labels
pd.Series(companies, index=['GOOGL', 'MSFT', 'FB', 'AAPL'])

GOOGL       Google
MSFT     Microsoft
FB        Facebook
AAPL         Apple
dtype: object

### 1.2 from a dict

In [5]:
companies = {
    'a': 'Google',
    'b': 'Microsoft',
    'c': 'Facebook',
    'd': 'Apple'
}
pd.Series(companies)

a       Google
b    Microsoft
c     Facebook
d        Apple
dtype: object

In [6]:
# When an index is specified, only the matching keys are kept, in the given order
pd.Series(
    companies, 
    index=['a', 'b', 'd']
)

a       Google
b    Microsoft
d        Apple
dtype: object
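The cell above only asks for labels that exist in the dict. As a minimal sketch of the other case, the label `'z'` below is a hypothetical key that is not in the dict; pandas keeps the label and fills its value with `NaN`.

```python
import pandas as pd

companies = {'a': 'Google', 'b': 'Microsoft', 'c': 'Facebook', 'd': 'Apple'}

# 'z' is a made-up label that is not a key in the dict,
# so the resulting Series holds NaN at that position
s = pd.Series(companies, index=['a', 'b', 'z'])
print(s)
```

Labels present in the dict but absent from `index` (here `'c'` and `'d'`) are simply left out.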

### 1.3 from a scalar

In [7]:
pd.Series(10, index=[100, 101, 102, 103])

100    10
101    10
102    10
103    10
dtype: int64

### 1.4 with `read_csv()`

By default, the Pandas `read_csv()` function will import data as a DataFrame.

In [8]:
pd.read_csv('data.csv')

Unnamed: 0,date,product,price,cost,profit
0,2019/1/1,A,10,5,1
1,2019/1/2,B,20,12,2
2,2019/1/3,C,30,20,3
3,2019/1/4,D,40,30,4


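To import the data as a Series instead of a DataFrame, we can pass the additional arguments `usecols` and `squeeze`. Setting `squeeze=True` converts a single-column DataFrame into a Series. Note that the `squeeze` argument of `read_csv()` was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, call `.squeeze("columns")` on the DataFrame returned by `read_csv()` instead.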

In [9]:
pd.read_csv('data.csv', usecols=['product'], squeeze=True)

0    A
1    B
2    C
3    D
Name: product, dtype: object
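On pandas versions where `read_csv()` no longer accepts `squeeze`, the same result can be obtained with `DataFrame.squeeze("columns")`. The sketch below is self-contained: the CSV text is a small made-up stand-in for `data.csv`, read from an in-memory buffer rather than from disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (only a few columns, two rows)
csv_text = "date,product,price\n2019/1/1,A,10\n2019/1/2,B,20\n"

# Read a single column, which still gives a one-column DataFrame
df = pd.read_csv(io.StringIO(csv_text), usecols=['product'])

# Collapse the one-column DataFrame into a Series
products = df.squeeze("columns")
print(type(products))
print(products)
```

With a real file, replace `io.StringIO(csv_text)` with the file path.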

## 2. Retrieving data

### 2.1 with position

In [10]:
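s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# The index holds strings, so the integer key is treated as a position.
# Newer pandas versions deprecate this fallback in favor of s.iloc[0].
s[0]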

1

In [11]:
s[:3]

a    1
b    2
c    3
dtype: int64

In [12]:
s[-3:]

c    3
d    4
e    5
dtype: int64

In [16]:
# Retrieving elements within a range
s[2:4]

c    3
d    4
dtype: int64

In [13]:
# Retrieving elements by step
s[::2]

a    1
c    3
e    5
dtype: int64

### 2.2 with index/label

In [17]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

In [18]:
s['a']

1

In [19]:
s[['b','c','d']]

b    2
c    3
d    4
dtype: int64
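Plain `[]` accepts both positions and labels, which can be ambiguous. A possible alternative, not used in the cells above, is to be explicit with `.iloc` for positions and `.loc` for labels. Note that a label slice includes its end point, while a positional slice does not.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Position-based access: always integer positions
first = s.iloc[0]
by_position = s.iloc[2:4]      # positions 2 and 3, end excluded

# Label-based access: always index labels
value_a = s.loc['a']
by_label = s.loc['b':'d']      # labels 'b', 'c' and 'd', end included

print(first, value_a)
print(by_position)
print(by_label)
```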

## 3. Attributes

In [14]:
companies = ['Google', 'Microsoft', 'Facebook', 'Apple']
s = pd.Series(companies)

### 3.1 Values and indexes

In [15]:
# The values attribute returns an array of all the values within the series.
s.values

array(['Google', 'Microsoft', 'Facebook', 'Apple'], dtype=object)

In [16]:
# The index attribute returns a RangeIndex object.
# We can see it starts at 0 and stops at 4 (the stop value itself is excluded).
# The last part is called step, and it tells us that the index increments by 1.
s.index

RangeIndex(start=0, stop=4, step=1)
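As a small sketch of how to read that output, the three parts of a `RangeIndex` are also available as attributes, and `stop` is exclusive in the same way as in Python's built-in `range`.

```python
import pandas as pd

s = pd.Series(['Google', 'Microsoft', 'Facebook', 'Apple'])

idx = s.index
print(idx.start, idx.stop, idx.step)

# stop is exclusive, so the labels are 0, 1, 2 and 3
labels = list(idx)
print(labels)
```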

In [17]:
# The is_unique attribute returns a boolean (True or False). 
# It is a really convenient way to check if every series value is unique or not.
s.is_unique

True

### 3.2 Data type and size

In [19]:
s.dtype

dtype('O')

In [20]:
s.size

4

In [21]:
s.shape

(4,)

In [22]:
s.ndim

1

## 4. Methods

A method, unlike an attribute, actually does something with the object. It may manipulate the object, add values to it, or perform a calculation with the object's values.

In [23]:
prices = [10, 5, 3, 2.5, 8, 11]

s = pd.Series(prices)

### 4.1 Showing rows

In [24]:
s.head()

0    10.0
1     5.0
2     3.0
3     2.5
4     8.0
dtype: float64

In [25]:
s.head(2)

0    10.0
1     5.0
dtype: float64

In [27]:
s.tail(2)

4     8.0
5    11.0
dtype: float64

### 4.2 Performing aggregations

In [28]:
s.mean()

6.583333333333333

In [29]:
s.sum()

39.5

In [30]:
s.product()

33000.0

In [31]:
s.agg(['mean','sum','product'])

mean           6.583333
sum           39.500000
product    33000.000000
dtype: float64

### 4.3 Counting values

In [32]:
s = pd.Series(['a','b','b','a','a'])

In [33]:
s.unique()

array(['a', 'b'], dtype=object)

In [34]:
s.nunique()

2

In [35]:
s.value_counts()

a    3
b    2
dtype: int64
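If relative frequencies are more useful than raw counts, `value_counts()` also accepts `normalize=True`. This is an optional extra, shown here on the same data as above.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'b', 'a', 'a'])

# Share of each distinct value instead of its count
shares = s.value_counts(normalize=True)
print(shares)
```

The shares sum to 1, so `'a'` accounts for 3 of 5 entries and `'b'` for 2 of 5.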

### 4.4 Sorting by values or index labels

In [38]:
prices = [10, 5, 3, 2.5, 8, 11]

s = pd.Series(prices)

In [39]:
# ascending by default
s.sort_values()

3     2.5
2     3.0
1     5.0
4     8.0
0    10.0
5    11.0
dtype: float64

In [40]:
# To sort it in descending order
s.sort_values(ascending=False)

5    11.0
0    10.0
4     8.0
1     5.0
2     3.0
3     2.5
dtype: float64

In [42]:
# To modify the original series
s.sort_values(inplace=True)
s

3     2.5
2     3.0
1     5.0
4     8.0
0    10.0
5    11.0
dtype: float64

In [43]:
# ascending by default
s.sort_index()

0    10.0
1     5.0
2     3.0
3     2.5
4     8.0
5    11.0
dtype: float64

In [44]:
# To sort it in descending order
s.sort_index(ascending=False)

5    11.0
4     8.0
3     2.5
2     3.0
1     5.0
0    10.0
dtype: float64

In [45]:
# To modify the original series
s.sort_index(inplace=True)
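An alternative to `inplace=True` is to assign the sorted result to a new name, which leaves the original Series untouched. The sketch below also uses `ignore_index=True` (available since pandas 1.0) to renumber the result from 0; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([10, 5, 3, 2.5, 8, 11])

# sort_values() returns a new Series; s itself keeps its original order
sorted_s = s.sort_values()

# ignore_index=True discards the old labels and renumbers from 0
renumbered = s.sort_values(ignore_index=True)

print(sorted_s)
print(renumbered)
```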

### 4.5 Working with missing values

In [49]:
s = pd.Series([1, 2, 3, np.nan, np.nan])

In [50]:
s.isna()

0    False
1    False
2    False
3     True
4     True
dtype: bool

In [51]:
s.isna().sum()

2

In [52]:
s.count()

3
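Once missing values have been detected, two common follow-ups are to replace them or to drop them. The cells above do not cover this; the sketch below uses `fillna()` and `dropna()` on the same data, with `0` as an arbitrary replacement value.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan])

# Replace every missing value with 0 (the choice of 0 is arbitrary)
filled = s.fillna(0)

# Or remove the missing values altogether
dropped = s.dropna()

print(filled)
print(dropped)
```

Both methods return a new Series, so `s` still contains its two missing values afterwards.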

### 4.6 Searching values

In [56]:
prices = [10, 5, 3, 2.5, 8, 11]

s = pd.Series(prices)

In [57]:
s.nlargest()

5    11.0
0    10.0
4     8.0
1     5.0
2     3.0
dtype: float64

In [58]:
s.nlargest(2)

5    11.0
0    10.0
dtype: float64

In [59]:
s.le(5, fill_value=0)

0    False
1     True
2     True
3     True
4    False
5    False
dtype: bool

In [60]:
s <= 5

0    False
1     True
2     True
3     True
4    False
5    False
dtype: bool
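The two cells above give the same result because the Series has no missing values, so `fill_value` has nothing to fill. The sketch below uses a different, made-up Series that contains a `NaN` to show where `le()` and `<=` differ, and how the boolean result can be used to filter the Series.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
s = pd.Series([10, 5, np.nan, 2.5])

# With the operator, a comparison against NaN is False
mask = s <= 5

# With le(), the NaN is replaced by fill_value before comparing
mask_filled = s.le(5, fill_value=0)

# A boolean Series can be used to select the matching rows
cheap = s[mask]

print(mask)
print(mask_filled)
print(cheap)
```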

## 5. Working with Python built-in functions

In [61]:
prices = [10, 5, 3, 2.5, 8, 11]

s = pd.Series(prices)

In [62]:
len(s)

6

In [63]:
type(s)

pandas.core.series.Series

In [64]:
dir(s)

['T',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_TO_AXIS_NUMBER',
 '_HANDLED_TYPES',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__

In [65]:
list(s)

[10.0, 5.0, 3.0, 2.5, 8.0, 11.0]

In [66]:
dict(s)

{0: 10.0, 1: 5.0, 2: 3.0, 3: 2.5, 4: 8.0, 5: 11.0}
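pandas also provides its own conversion methods, `to_list()` and `to_dict()`. For this Series they should give the same results as the built-in `list()` and `dict()` calls above; the sketch below is only meant to show the equivalent method calls.

```python
import pandas as pd

s = pd.Series([10, 5, 3, 2.5, 8, 11])

# Values as a plain Python list
as_list = s.to_list()

# Index labels mapped to values as a plain Python dict
as_dict = s.to_dict()

print(as_list)
print(as_dict)
```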

Python's `in` operator returns a boolean that tells you whether the value you provide exists in a collection. It returns `True` if the value is among the collection's elements and `False` if it is not.

In [69]:
# By default, pandas looks among the index labels, not the actual values within the Series.
2.5 in s

False

In [68]:
2.5 in s.values

True
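To make the difference concrete, the sketch below runs the membership test against a label, against the values, and through `isin()`, which is one possible way to test values without leaving pandas.

```python
import pandas as pd

s = pd.Series([10, 5, 3, 2.5, 8, 11])

# 0 is an index label, so this is True even though no value equals 0
label_hit = 0 in s

# 2.5 is a value but not an index label
value_as_label = 2.5 in s

# Checking the underlying array looks at the values
value_hit = 2.5 in s.values

# isin() gives an element-wise check; any() collapses it to one boolean
value_hit_isin = bool(s.isin([2.5]).any())

print(label_hit, value_as_label, value_hit, value_hit_isin)
```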