In [1]:
import numpy as np
import pandas as pd

# Series
A Pandas Series is the equivalent of a single column of data. This section covers their basic properties, creation, manipulation, and useful functions for analysis.


In [11]:
array = np.arange(5)
array

array([0, 1, 2, 3, 4])

In [24]:
pd.Series(array)

0    0
1    1
2    2
3    3
4    4
dtype: int32

In [18]:
# pd.Series(np.arange(6).reshape(3, 2), name="Test Array") -> raises a ValueError because the data must be 1-dimensional
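The failing call above can be reproduced safely with a `try`/`except`. This is a minimal sketch that reuses the same (3, 2) array; flattening it with NumPy's `.ravel()` first is one possible workaround chosen for illustration, not a step from the original notebook.

```python
import numpy as np
import pandas as pd

# A 2-D array with shape (3, 2)
data = np.arange(6).reshape(3, 2)

try:
    pd.Series(data, name="Test Array")
    raised = False
except ValueError as err:
    # pandas rejects data that is not 1-dimensional
    raised = True
    print("ValueError:", err)

# Flattening the array to 1-D makes it a valid Series input
flat = pd.Series(data.ravel(), name="Test Array")
print(flat)
```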

# Pandas Series Basics



- Series also contain an index and an optional name, in addition to the array of data
- They can be created from other data types, but are usually imported from external sources
- Two or more Series grouped together form a Pandas DataFrame
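To illustrate the last point, here is a small sketch that places two named Series side by side with `pd.concat(..., axis=1)`. The `Units` data is hypothetical; the example only shows that each Series name becomes a column label and that selecting a column returns a Series again.

```python
import pandas as pd

# Two Series that share the same default integer index (0, 1, 2)
sales = pd.Series([0, 5, 155], name="Sales")
units = pd.Series([0, 1, 31], name="Units")  # hypothetical data

# Combining Series side by side (axis=1) produces a DataFrame;
# each Series name becomes a column label
df = pd.concat([sales, units], axis=1)
print(df)

# Selecting a column by name gives back a Series
print(type(df["Sales"]))
```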

In [3]:
sales = [0, 5, 155, 0, 518, 0, 1827, 616, 317, 325]

sales_series = pd.Series(sales, name="Sales")

sales_series

0       0
1       5
2     155
3       0
4     518
5       0
6    1827
7     616
8     317
9     325
Name: Sales, dtype: int64

In [4]:
# The pd.Series() function converts Python lists and NumPy arrays into Pandas Series
# The index is an array of integers starting at 0 by default, but it can be modified

***Pandas Series have these key properties:***
- values - the data array in the Series
- index - the index array in the Series
- name - the optional name for the Series (useful for accessing columns in a DataFrame)
- dtype - the data type of the elements in the values array

In [5]:
sales_series.values

array([   0,    5,  155,    0,  518,    0, 1827,  616,  317,  325],
      dtype=int64)

In [6]:
sales_series.index

RangeIndex(start=0, stop=10, step=1)

In [7]:
sales_series.name

'Sales'

In [8]:
sales_series.dtype

dtype('int64')

In [26]:
# Create a Series from the NumPy array and store it in a variable
series = pd.Series(array)

In [27]:
# Accessing array values
series.values

array([0, 1, 2, 3, 4])

In [30]:
# Compute the mean of the Series' underlying values array
series.values.mean()

2.0

In [29]:
# Series have their own .mean() method, so accessing .values is not needed
series.mean()

2.0

In [31]:
# Change the name of the series
series.name = "special series"
series

0    0
1    1
2    2
3    3
4    4
Name: special series, dtype: int32

## Pandas Data Types

Pandas data types mostly expand on their base Python and NumPy equivalents

![image.png](attachment:35c36402-523e-4c36-9453-768a89e35151.png)
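As a hedged sketch of how pandas extends NumPy's types, the example below contrasts a plain integer Series containing a missing value with the nullable `"Int64"` type, and also shows the `"string"` and `"category"` types. These particular types are chosen for illustration and may not match the table above exactly.

```python
import pandas as pd

# A missing value (None) forces plain integers to float64 (stored as NaN)
plain = pd.Series([1, 2, None])

# The nullable "Int64" extension type keeps integers and stores <NA>
nullable = pd.Series([1, 2, None], dtype="Int64")

# Dedicated text and categorical types
text = pd.Series(["coffee", "tea"], dtype="string")
levels = pd.Series(["low", "high", "low"], dtype="category")

for s in (plain, nullable, text, levels):
    print(s.dtype)
```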

## Type Conversion

You can convert the data type in a Pandas Series by using the .astype() method and specifying the desired data type (if compatible)

In [33]:
sales_series

0       0
1       5
2     155
3       0
4     518
5       0
6    1827
7     616
8     317
9     325
Name: Sales, dtype: int64

In [34]:
# This converts sales series values to floats
sales_series.astype("float")

0       0.0
1       5.0
2     155.0
3       0.0
4     518.0
5       0.0
6    1827.0
7     616.0
8     317.0
9     325.0
Name: Sales, dtype: float64

In [35]:
# This converts sales series values to booleans (0 is False, others are True)
sales_series.astype("bool")

0    False
1     True
2     True
3    False
4     True
5    False
6     True
7     True
8     True
9     True
Name: Sales, dtype: bool

In [36]:
# We can also try to convert it into datetime, but the data has to be compatible
# sales_series.astype("datetime64[ns]")
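The "if compatible" caveat is easiest to see with data that cannot be converted. In this sketch the sample strings are made up; `.astype("float")` raises a `ValueError` on `"n/a"`, while `pd.to_numeric(..., errors="coerce")`, a separate pandas function swapped in here as an alternative, turns unconvertible entries into `NaN`. Date-like strings, by contrast, convert cleanly to `datetime64[ns]`.

```python
import pandas as pd

# Strings that look like numbers convert cleanly; "n/a" does not
raw = pd.Series(["3.99", "5.99", "n/a"])
try:
    raw.astype("float")
    failed = False
except ValueError as err:
    failed = True
    print("astype failed:", err)

# pd.to_numeric with errors="coerce" turns unconvertible entries into NaN
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# Date-like strings are compatible with a datetime64 conversion
dates = pd.Series(["2024-01-01", "2024-01-15"]).astype("datetime64[ns]")
print(dates.dtype)
```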

# Series Indexing
The index lets you easily access "rows" in a Pandas Series or DataFrame

In [4]:
# Here we use the default integer index, which is generally preferred
sales = [0, 5, 155, 0, 518]
sales_series = pd.Series(sales, name="Sales")
sales_series

0      0
1      5
2    155
3      0
4    518
Name: Sales, dtype: int64

In [5]:
# Indexing a Series with plain square brackets (the .iloc[] and .loc[] accessors are preferred)
sales_series[2]

155

In [6]:
# Slicing a Series with plain square brackets (the .iloc[] and .loc[] accessors are preferred)
sales_series[2:4]

2    155
3      0
Name: Sales, dtype: int64

## Custom Indices
In some cases it makes sense to use a custom index for accessing rows

In [8]:
sales = [0, 5, 155, 0, 518]
items = ["coffee", "bananas", "tea", "coconut", "sugar"]

sales_series = pd.Series(sales, index=items, name="Sales")
sales_series

coffee       0
bananas      5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [9]:
# Custom indices can be assigned when creating the series or afterwards by assignment
sales_series.index = ["coffee", "bananas", "tea", "coconut", "sugar"]
sales_series

coffee       0
bananas      5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [10]:
# Accessing values through custom indices
sales_series["tea"]

155

In [11]:
# Accessing values through a custom label slice (the stop label is inclusive)
sales_series["bananas":"coconut"]

bananas      5
tea        155
coconut      0
Name: Sales, dtype: int64

## The Iloc Method
The .iloc[] method is the preferred way to access values by their positional index

- This method works even when Series have a custom, non-integer index
- It is more explicit than plain bracket indexing and slicing, and is recommended by the pandas developers

**Syntax:** df.iloc[row position, column position]

**df** can be either a series or a dataframe to access values from

**row position** is the row position(s) for the value(s) you want to access

**column position** is the column position(s) for the value(s) you want to access

Examples:
- 0 (single row)
- [5, 9] (multiple rows)
- 0:11 (range of rows)
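The three kinds of row position listed above can be tried on a small Series. This sketch reuses the notebook's custom-index sales data; the DataFrame with a `Units` column is hypothetical and only there to show the `[row position, column position]` form.

```python
import pandas as pd

sales = pd.Series(
    [0, 5, 155, 0, 518],
    index=["coffee", "bananas", "tea", "coconut", "sugar"],
    name="Sales",
)

single = sales.iloc[0]         # one position -> a scalar value
several = sales.iloc[[1, 4]]   # a list of positions -> a Series
block = sales.iloc[1:3]        # a slice; stop position 3 is excluded

# On a DataFrame, the second position selects the column
df = pd.DataFrame(
    {"Sales": [0, 5, 155], "Units": [0, 1, 31]},  # "Units" is hypothetical
    index=["coffee", "bananas", "tea"],
)
cell = df.iloc[2, 0]           # row position 2, column position 0
```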

In [12]:
sales_series

coffee       0
bananas      5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [13]:
# This returns the value in the 3rd position even though the custom index for
# that value is "tea"
sales_series.iloc[2]

155

In [14]:
# This returns the values from the 3rd to the 4th position (stop is non-inclusive)
sales_series.iloc[2:4]

tea        155
coconut      0
Name: Sales, dtype: int64

## The Loc Method
The .loc[] method is the preferred way to access values by their custom labels

**Syntax:** df.loc[row label, column label]

**df** is the series or dataframe to access values from

**row label** is the custom row index for the value(s) you want to access

**column label** is the custom column index for the value(s) you want to access

**Examples:**
- "pizza" (single row)
- ["mike", "ike"] (multiple rows)
- "jan":"dec" (range of rows)
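The label-based counterparts of those examples look like this. The sketch uses the same sales data as the cells below; the `Units` column is hypothetical. Note that the label slice includes its stop label, unlike a positional slice.

```python
import pandas as pd

sales = pd.Series(
    [0, 5, 155, 0, 518],
    index=["coffee", "bananas", "tea", "coconut", "sugar"],
    name="Sales",
)

single = sales.loc["tea"]                 # one label -> a scalar value
several = sales.loc[["coffee", "sugar"]]  # a list of labels -> a Series
block = sales.loc["bananas":"coconut"]    # a label slice; the stop label IS included

df = pd.DataFrame(
    {"Sales": [0, 5, 155], "Units": [0, 1, 31]},  # "Units" is hypothetical
    index=["coffee", "bananas", "tea"],
)
cell = df.loc["tea", "Sales"]             # row label, column label
```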

In [15]:
sales_series

coffee       0
bananas      5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [16]:
sales_series.loc["tea"]

155

In [18]:
# Note that slices are inclusive when using custom labels
sales_series.loc["bananas":"coconut"]

bananas      5
tea        155
coconut      0
Name: Sales, dtype: int64

The **.loc[]** method works even when the indices are integers, but if they are custom integers not ordered from 0 to n-1, the rows will be returned based on the **labels** themselves and NOT their numeric position
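A minimal sketch of that distinction, using a made-up Series whose integer labels are not 0 to n-1: `.loc[0]` looks up the label `0`, while `.iloc[0]` returns whatever sits in the first position.

```python
import pandas as pd

# Integer labels that do not match the positions 0..n-1
s = pd.Series(["a", "b", "c"], index=[10, 0, 5])

by_label = s.loc[0]      # the row whose LABEL is 0
by_position = s.iloc[0]  # the row in POSITION 0

print(by_label, by_position)
```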

## Duplicate Index Values
It is possible to have duplicate index values in a Pandas Series or DataFrame

- Accessing these indices by their label using .loc[] returns all corresponding rows
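One practical consequence is that `.loc[]` returns a Series for a duplicated label but a scalar for a unique one, so code that expects a single value can break. This sketch uses a shortened version of the sales data and the `Index.is_unique` and `Index.duplicated()` helpers to detect repeats.

```python
import pandas as pd

sales = pd.Series([0, 5, 155], index=["coffee", "coffee", "tea"], name="Sales")

print(sales.index.is_unique)     # False: "coffee" appears twice
print(sales.index.duplicated())  # marks repeats after the first occurrence

dup = sales.loc["coffee"]   # duplicated label -> Series with both rows
single = sales.loc["tea"]   # unique label -> a single scalar value
```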

In [21]:
sales = [0, 5, 155, 0, 518]
items = ["coffee", "coffee", "tea", "coconut", "sugar"]

sales_series = pd.Series(sales, index=items, name="Sales")
sales_series

# Note that 'coffee' is used as an index value twice

coffee       0
coffee       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [22]:
# This returns both rows with the same label

sales_series.loc["coffee"]

coffee    0
coffee    5
Name: Sales, dtype: int64

## Resetting the Index
You can reset the index in a Pandas Series or DataFrame back to the default range of integers by using the .reset_index() method.

- By default, the existing index will become a new column in a DataFrame
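The sketch below shows both behaviours on a short version of the sales data. It also uses `rename_axis()`, an extra step not in the original notebook, to give the index a name so that the new column gets a descriptive label instead of the default `"index"`.

```python
import pandas as pd

sales = pd.Series([0, 5, 155], index=["coffee", "coffee", "tea"], name="Sales")

# With drop=False (the default), the old index becomes a column named "index"
default_df = sales.reset_index()

# Naming the index first gives that new column a more descriptive label
named_df = sales.rename_axis("Item").reset_index()

# With drop=True, the old labels are discarded and a Series is returned
renumbered = sales.reset_index(drop=True)
```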


In [24]:
sales_series

coffee       0
coffee       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [26]:
# This returns a DataFrame by default, with the previous index values stored as
# a new column so that no data is lost

sales_series.reset_index()

     index  Sales
0   coffee      0
1   coffee      5
2      tea    155
3  coconut      0
4    sugar    518


In [27]:
# Use drop=True when resetting the index if you want to discard the previous index
# values instead of keeping them as a column

sales_series.reset_index(drop=True)

0      0
1      5
2    155
3      0
4    518
Name: Sales, dtype: int64

# Sorting & Filtering Series

## Filtering Series

You can filter a Series by passing a logical test into the .loc[] accessor (similar to Boolean indexing with NumPy arrays)

In [28]:
sales_series

coffee       0
coffee       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64

In [30]:
# This returns all rows from sales_series with a value greater than 0

sales_series.loc[sales_series > 0]

coffee      5
tea       155
sugar     518
Name: Sales, dtype: int64

In [31]:
# This uses a mask to store complex logic and returns all rows from
# sales_series with a value greater than 0 and an index equal to "coffee"

mask = (sales_series > 0) & (sales_series.index == "coffee")

sales_series.loc[mask]

coffee    5
Name: Sales, dtype: int64

## Logical Operators & Methods

You can use these operators & methods to create Boolean filters for logical tests
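As a compact reference, this sketch checks that each comparison operator gives the same result as its Series method counterpart, then shows `.between()` and how `&`, `|` and `~` combine tests. Each individual test must be wrapped in parentheses because `&` binds more tightly than comparisons in Python.

```python
import pandas as pd

sales = pd.Series(
    [0, 5, 155, 0, 518],
    index=["coffee", "coffee", "tea", "coconut", "sugar"],
    name="Sales",
)

# Each comparison operator has an equivalent Series method
pairs = {
    "==": (sales == 5, sales.eq(5)),
    "!=": (sales != 5, sales.ne(5)),
    ">": (sales > 5, sales.gt(5)),
    ">=": (sales >= 5, sales.ge(5)),
    "<": (sales < 5, sales.lt(5)),
    "<=": (sales <= 5, sales.le(5)),
}
for op, (operator_result, method_result) in pairs.items():
    assert operator_result.equals(method_result), op

# between() checks a range; both ends are inclusive by default
in_range = sales.between(1, 200)

# Combine tests with & (and), | (or), ~ (not)
mask = (sales > 0) & ~sales.index.isin(["sugar"])
print(sales.loc[mask])
```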

In [32]:
# Python operator
sales_series == 5

coffee     False
coffee      True
tea        False
coconut    False
sugar      False
Name: Sales, dtype: bool

In [33]:
# Pandas method
sales_series.eq(5)

coffee     False
coffee      True
tea        False
coconut    False
sugar      False
Name: Sales, dtype: bool

In [34]:
sales_series.index.isin(["coffee","tea"])

array([ True,  True,  True, False, False])

In [36]:
# The tilde ~ inverts boolean values

~sales_series.index.isin(["coffee","tea"])

array([False, False, False,  True,  True])

### Exercises

In [37]:
my_series = pd.Series(
    [0,1,2,3,4], index=["day 0", "day 1", "day 2", "day 3", "day 4",]
)
my_series

day 0    0
day 1    1
day 2    2
day 3    3
day 4    4
dtype: int64

In [38]:
my_series.loc[~my_series.isin([1,2])]

day 0    0
day 3    3
day 4    4
dtype: int64

In [39]:
my_series.loc[my_series > 2]

day 3    3
day 4    4
dtype: int64

In [40]:
mask = (my_series.isin([1,2])) | (my_series > 2)
my_series.loc[mask]

day 1    1
day 2    2
day 3    3
day 4    4
dtype: int64

## Sorting Series
You can sort series by their values or their index

- The **.sort_values()** method sorts a series by its values in ascending order
- The **.sort_index()** method sorts a series by its index in ascending order

Both methods accept an ascending parameter; set ascending=False to sort in descending order
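A short sketch with the same `my_series` data as the next cell: both methods return a new, reordered Series and leave the original untouched unless `inplace=True` is passed.

```python
import pandas as pd

my_series = pd.Series(
    [4, 0, 2, 3, 1], index=["day 0", "day 1", "day 2", "day 3", "day 4"]
)

by_value = my_series.sort_values()                   # ascending by default
by_index_desc = my_series.sort_index(ascending=False)

# Both methods return a new Series; the original order is unchanged
print(my_series.tolist())
```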

In [45]:
my_series = pd.Series(
    [4,0,2,3,1], index=["day 0", "day 1", "day 2", "day 3", "day 4",]
)
my_series

day 0    4
day 1    0
day 2    2
day 3    3
day 4    1
dtype: int64

In [49]:
my_series.sort_values(ascending=False)

day 0    4
day 3    3
day 2    2
day 4    1
day 1    0
dtype: int64

In [54]:
my_series.sort_index(ascending=False)

## Arithmetic Operators & Methods

You can use these operators & methods to perform numeric operations on Series
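The sketch below pairs each arithmetic operator with its method equivalent, then shows why the methods matter: operations between two Series align on index labels, and only the method form accepts `fill_value`. The two small Series `a` and `b` are hypothetical data.

```python
import pandas as pd

monday_sales = pd.Series([0, 5, 155, 0, 518], index=[0, 1, 2, 3, 4])

# Each arithmetic operator has a method equivalent
assert (monday_sales + 2).equals(monday_sales.add(2))
assert (monday_sales - 2).equals(monday_sales.sub(2))
assert (monday_sales * 2).equals(monday_sales.mul(2))
assert (monday_sales / 2).equals(monday_sales.div(2))
assert (monday_sales // 2).equals(monday_sales.floordiv(2))
assert (monday_sales % 2).equals(monday_sales.mod(2))
assert (monday_sales ** 2).equals(monday_sales.pow(2))

# Operations between two Series align on index labels (hypothetical data)
a = pd.Series([1, 2, 3], index=["coffee", "tea", "sugar"])
b = pd.Series([10, 20], index=["coffee", "tea"])
total = a + b                          # "sugar" has no partner -> NaN
total_filled = a.add(b, fill_value=0)  # the missing side is treated as 0
print(total)
print(total_filled)
```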

In [56]:
monday_sales = pd.Series(
    [0,5,155,0,518], index=[0,1,2,3,4]
)
monday_sales

0      0
1      5
2    155
3      0
4    518
dtype: int64

In [59]:
# The + operator and the .add() method both add 2 to every row
monday_sales + 2

0      2
1      7
2    157
3      2
4    520
dtype: int64

In [58]:
monday_sales.add(2)

0      2
1      7
2    157
3      2
4    520
dtype: int64

In [62]:
# This converts the values to float to add decimals, then to string, and uses
# string concatenation to prepend a dollar sign
"$" + monday_sales.astype("float").astype("string")

0      $0.0
1      $5.0
2    $155.0
3      $0.0
4    $518.0
dtype: string

In [63]:
# Create a Series with a missing value (NaN)
my_series = pd.Series(
    [1, np.nan, 2, 3, 4], index=["day 0", "day 1", "day 2", "day 3", "day 4",]
)
my_series

day 0    1.0
day 1    NaN
day 2    2.0
day 3    3.0
day 4    4.0
dtype: float64

In [64]:
# The fill_value argument replaces missing values with the given value before adding
my_series.add(1, fill_value=0)

day 0    2.0
day 1    1.0
day 2    3.0
day 3    4.0
day 4    5.0
dtype: float64

## String Methods
The Pandas str accessor lets you access many string methods

- These methods return a Series (split returns a Series of lists, or a DataFrame when expand=True)
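A few more `.str` methods on a shortened version of the price data below, as a sketch: `len()`, a literal `replace()`, and `split()` with and without `expand=True`, which shows the difference between getting a Series of lists and getting a DataFrame.

```python
import pandas as pd

prices = pd.Series(["$3.99", "$5.99", "$22.99"])

lengths = prices.str.len()                            # characters per string
no_symbol = prices.str.replace("$", "", regex=False)  # literal replacement
lists = prices.str.split(".")                         # Series of lists
columns = prices.str.split(".", expand=True)          # DataFrame, one column per part

print(columns)
```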

In [65]:
prices = pd.Series(
    ["$3.99", "$5.99", "$22.99", "$7.99", "$33.99",], index=[0,1,2,3,4]
)
prices

0     $3.99
1     $5.99
2    $22.99
3     $7.99
4    $33.99
dtype: object

In [66]:
# The str accessor lets you access the string methods
prices.str.contains("3")

0     True
1    False
2    False
3    False
4     True
dtype: bool

In [68]:
# Remove the dollar sign, then convert to float
clean = prices.str.strip("$").astype("float")
clean

0     3.99
1     5.99
2    22.99
3     7.99
4    33.99
dtype: float64