**Table of contents**<a id='toc0_'></a>    
- [Import Statements](#toc1_1_)    
  - [Importing the data](#toc1_2_)    
- [Indexing Operations](#toc2_)    
  - [*Index Positions and Index Labels*](#toc2_1_)    
    - [Renaming Index Labels](#toc2_1_1_)    
    - [`.reindex(index)` is used for reindexing index labels with new indexes](#toc2_1_2_)    
    - [Filtering Index Labels with `.filter(items, like, regex)`](#toc2_1_3_)    
    - [Resetting Index Labels](#toc2_1_4_)    
  - [*Accessing elements with the `.loc[]` and `.iloc[]` methods*](#toc2_2_)    
    - [The `.loc[]` method ](#toc2_2_1_)    
    - [The `.iloc[]` method ](#toc2_2_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=5
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

**Read the official documentation on pandas Series objects @ https://pandas.pydata.org/pandas-docs/stable/reference/series.html**

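**`Note:`** Many Python built-ins work on pandas Series objects, e.g. **len, type, dir, in, sum, sorted, max, min**. `product` and `mean` are not Python built-ins; use the Series methods `.prod()` and `.mean()` instead. Also note that `in` tests membership in the index labels, not in the values.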

Also, the notion of **chaining functions/methods** in pandas is similar to python.
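As a small illustration, the sketch below applies a few built-ins and a short method chain to a toy Series. The values and labels are made up for this example and are not taken from the vehicles dataset.

```python
import pandas as pd

# toy Series, made up for illustration only
s = pd.Series([3, 1, 2], index=["a", "b", "c"])

n_items = len(s)       # number of elements
total = sum(s)         # the built-in sum iterates over the values
largest = max(s)
ordered = sorted(s)    # returns a plain Python list

# `in` checks the index labels, not the values
has_label_a = "a" in s
has_value_3 = 3 in s                  # 3 is a value, not a label
value_3_present = 3 in s.to_numpy()   # check the values explicitly

# method chaining: each call returns a new object that the next call acts on
chained = s.sort_values().head(2).sum()

print(n_items, total, largest, ordered)
print(has_label_a, has_value_3, value_3_present, chained)
```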

### <a id='toc1_1_'></a>[Import Statements](#toc0_)

In [1]:
# import statements
import numpy as np
import pandas as pd

### <a id='toc1_2_'></a>[Importing the data](#toc0_)

One of the many datasets we will use for our examples in this notebook is the `Data/vehicles.csv.zip` dataset.

In [2]:
# read the vehicles.csv dataset
df = pd.read_csv("Data/vehicles.csv.zip")


Columns of a DataFrame can be accessed in several ways. One of them is **dot notation**, i.e. `df.column_name`.

In [3]:
# the city08 and highway08 columns of the vehicles.csv dataset give the
# miles per gallon when driving in the city and on the highway, respectively.
city_mpg = df.city08
highway_mpg = df.highway08

In [4]:
# The make column in the vehicles dataset holds the manufacturer name (strings) and is stored as an object.
manufac = df.make
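Dot notation is convenient but does not work for every column name. The sketch below uses a small stand-in DataFrame (the `fuel type` and `count` columns are hypothetical, added only to show the limitation) and compares dot access with bracket access.

```python
import pandas as pd

# tiny stand-in for the vehicles data; values are illustrative only
toy = pd.DataFrame({
    "city08": [19, 9, 23],
    "fuel type": ["Regular", "Premium", "Regular"],  # hypothetical name with a space
    "count": [1, 2, 3],                              # hypothetical name that clashes with a method
})

# dot and bracket access return the same Series
same = toy.city08.equals(toy["city08"])

# brackets are needed when the name is not a valid Python identifier ...
fuel = toy["fuel type"]

# ... or when it clashes with an existing attribute: toy.count is the DataFrame method
count_col = toy["count"]
dot_count_is_method = callable(toy.count)

print(same, dot_count_is_method)
```

Bracket access works for any column name, so it is the safer default in reusable code.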

**Note:** The first thing we should do after loading a dataset is check the data type of each column and, where appropriate, cast it to a more suitable type. This can save memory and speed up execution.
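A minimal sketch of this check-and-cast step, using a toy DataFrame rather than the real file. The target types (`int16` for the mileage, `category` for the manufacturer) are assumptions that fit these toy values; the right choice depends on the actual range and cardinality of each column.

```python
import pandas as pd

# toy data repeated to make the memory difference visible; values are illustrative
toy = pd.DataFrame({
    "city08": [19, 9, 23] * 1000,
    "make": ["Alfa Romeo", "Ferrari", "Dodge"] * 1000,
})

print(toy.dtypes)
bytes_before = toy.memory_usage(deep=True).sum()

# assumed target types: small integers fit in int16, repeated strings suit category
slim = toy.astype({"city08": "int16", "make": "category"})
bytes_after = slim.memory_usage(deep=True).sum()

print(slim.dtypes)
print(bytes_before, bytes_after)
```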

----------------------

## <a id='toc2_'></a>[Indexing Operations](#toc0_)

---------------------

We will see later, when we discuss DataFrames, that most of what we learn here (indexing of Series objects) applies to DataFrame objects as well.

To view the index of a Series we can use the `.index` attribute.

In [5]:
city_mpg.index

RangeIndex(start=0, stop=41144, step=1)

### <a id='toc2_1_'></a>[*Index Positions and Index Labels*](#toc0_)

Many of the operations we will discuss here work on the index position, while others work on the index label. When both are integers this can be confusing, but the distinction becomes clearer when the index has string labels. So first, we will relabel the index with string values.

#### <a id='toc2_1_1_'></a>[Renaming Index Labels](#toc0_)

The `.rename(index)` method returns a new Series with the original values but new index labels. If you pass a scalar, it instead changes the `.name` attribute of the returned Series and leaves the index intact.

We can pass a dictionary that maps old index labels to new ones, or a function that takes an old label and returns a new one. A Series is also accepted and is treated like a mapping: where its index matches the existing labels, its values become the new labels.

In [6]:
# rename the index labels of the city_mpg series with manufacturer names
# to_dict() creates a dict of the form {index label: value}
city_rnm = city_mpg.rename(index=manufac.to_dict())
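The different kinds of argument to `.rename()` can be compared on a toy Series. This is a sketch with made-up values; the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series([19, 9, 23], name="city08")   # default integer index 0, 1, 2

# dict: only the labels present as keys are replaced
by_dict = s.rename({0: "Alfa Romeo", 1: "Ferrari"})

# function: called on every old label
by_func = s.rename(lambda label: f"row_{label}")

# Series: treated as a mapping from its index to its values
makes = pd.Series(["Alfa Romeo", "Ferrari", "Dodge"])
by_series = s.rename(makes)

# scalar: changes the name, leaves the index alone
by_scalar = s.rename("city_mpg")

print(list(by_dict.index))
print(list(by_func.index))
print(list(by_series.index))
print(by_scalar.name, list(by_scalar.index))
```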

**Note:** The `.index` attribute can be used on a Series/DataFrame object to get the index labels as a separate `Index` object.

In [7]:
city_rnm.index

Index(['Alfa Romeo', 'Ferrari', 'Dodge', 'Dodge', 'Subaru', 'Subaru', 'Subaru',
       'Toyota', 'Toyota', 'Toyota',
       ...
       'Saab', 'Saturn', 'Saturn', 'Saturn', 'Saturn', 'Subaru', 'Subaru',
       'Subaru', 'Subaru', 'Subaru'],
      dtype='object', length=41144)

#### <a id='toc2_1_2_'></a>[`.reindex(index)` is used for reindexing index labels with new indexes](#toc0_)

The `index` argument is an array-like that defines the new labels to conform to. Note that reindexing fails with a `ValueError` when the existing axis contains duplicate labels. New labels that are absent from the existing index get NA/NaN as their value; this behaviour can be changed by passing the optional `fill_value`.

In [8]:
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']

In [9]:
browser_http_codes = pd.Series(data=[200, 300, 400, 404, 308], name='http_codes', index=index)

In [10]:
browser_http_codes

Firefox      200
Chrome       300
Safari       400
IE10         404
Konqueror    308
Name: http_codes, dtype: int64

In [11]:
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

In [12]:
browser_http_codes.reindex(index=new_index, fill_value="missing")

Safari               400
Iceweasel        missing
Comodo Dragon    missing
IE10                 404
Chrome               300
Name: http_codes, dtype: object

**Note:** The `.reindex()` method can also be used with DataFrames, where it can reindex column labels as well.
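The effect of `fill_value` on the result's dtype, and the failure on duplicate labels, can be seen in a short sketch. The Series below is a cut-down version of the browser example, and `dupes` is a hypothetical Series added only to trigger the error.

```python
import pandas as pd

codes = pd.Series([200, 300, 400], index=["Firefox", "Chrome", "Safari"], name="http_codes")
wanted = ["Safari", "Iceweasel", "Chrome"]

# default: the missing label gets NaN, which forces a float dtype
with_nan = codes.reindex(wanted)

# a numeric fill_value keeps the integer dtype
with_zero = codes.reindex(wanted, fill_value=0)

# reindexing an axis that has duplicate labels raises ValueError
dupes = pd.Series([1, 2], index=["a", "a"])
try:
    dupes.reindex(["a", "b"])
    dup_error = None
except ValueError as err:
    dup_error = str(err)

print(with_nan.dtype, with_zero.dtype)
print(dup_error)
```

A string `fill_value` such as `"missing"` works too, but it turns the result into an object-dtype Series, as the output above shows.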

#### <a id='toc2_1_3_'></a>[Filtering Index Labels with `.filter(items, like, regex)`](#toc0_)

- **items** (passed as a list) is used for exact matches. Exact matching fails with a `ValueError` when the index contains duplicate labels, but a label that does not exist in the index is skipped without raising an error.
- **like** is used for substring matches.
- **regex** takes a regular expression to match against the index labels.

In [13]:
# items
try:
    city_rnm.filter(items=["Panos"])
except ValueError as err:
    print(err)

cannot reindex on an axis with duplicate labels


In [14]:
# like
city_rnm.filter(like="B")

BMW              14
BMW              14
BMW              11
Buick            21
Buick            17
                 ..
BMW              15
BMW              14
Buick            19
Buick            18
Mercedes-Benz    16
Name: city08, Length: 4344, dtype: int64

In [15]:
# regex for filtering labels that start with A, B or C
city_rnm.filter(regex="^[A-C].")

Alfa Romeo    19
Audi          17
Audi          17
BMW           14
BMW           14
              ..
Chevrolet     12
Chevrolet     11
Chevrolet     15
Chevrolet     16
Chevrolet     10
Name: city08, Length: 9742, dtype: int64
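Because `city_rnm` has duplicate labels, the `items` argument cannot be shown working on it. The sketch below uses a small Series with unique labels (made up for this example) to compare all three arguments.

```python
import pandas as pd

codes = pd.Series([200, 300, 400], index=["Firefox", "Chrome", "Safari"])

# items: exact matches; "Opera" is not in the index and is skipped silently
exact = codes.filter(items=["Chrome", "Opera"])

# like: substring match against each label
substring = codes.filter(like="fox")

# regex: labels that start with a letter from A to F
pattern = codes.filter(regex="^[A-F]")

print(list(exact.index))
print(list(substring.index))
print(list(pattern.index))
```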

#### <a id='toc2_1_4_'></a>[Resetting Index Labels](#toc0_)

Sometimes we need a unique index to perform an operation. To replace the index with unique, monotonically increasing integers starting at zero, use the `.reset_index()` method. By default it returns a DataFrame, moving the current index into a new column. To drop the current index and return a Series instead, set **drop=True**.

In [16]:
city_rnm.reset_index().head()

        index  city08
0  Alfa Romeo      19
1     Ferrari       9
2       Dodge      23
3       Dodge      10
4      Subaru      17
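The difference between the default behaviour and `drop=True` can be sketched on a toy Series (values copied from the first rows above, variable names hypothetical):

```python
import pandas as pd

s = pd.Series([19, 9, 23], index=["Alfa Romeo", "Ferrari", "Dodge"], name="city08")

# default: DataFrame, old index moved into a column named "index"
as_frame = s.reset_index()

# drop=True: Series, old index discarded
as_series = s.reset_index(drop=True)

print(type(as_frame).__name__, list(as_frame.columns))
print(type(as_series).__name__, list(as_series.index))
```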


### <a id='toc2_2_'></a>[*Accessing elements with the `.loc[]` and `.iloc[]` methods*](#toc0_) [&#8593;](#toc0_)

#### <a id='toc2_2_1_'></a>[The `.loc[]` method](#toc0_)  [&#8593;](#toc0_)

The **.loc** attribute is **primarily label based**, but may also be used with a boolean array.

Allowable inputs may be:
- **Scalar:** a single index label. Returns a Series if the label is duplicated and a scalar if the label is unique. To get a Series in all cases, pass the scalar inside a list.
- **Array like:** a list or array of labels. Returns a Series.
- **Slice object:** to slice a Series with duplicate index labels, first sort the index with **.sort_index()**. Unlike ordinary Python slicing, slicing with .loc includes both the start and the end label.
- **A boolean array:** of the same length as the series.
- An alignable pandas **Index object**.
- **A callable function:** that returns one of the above.

In [17]:
# scalar as input to .loc
city_rnm.loc["Ferrari"].sample(3)

Ferrari    11
Ferrari    11
Ferrari     9
Name: city08, dtype: int64

In [18]:
# array/list as input to .loc
city_rnm.loc[["Ferrari", "Honda", "Toyota"]].sample(4)

Honda     19
Toyota    26
Toyota    20
Honda     24
Name: city08, dtype: int64
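How the return type of `.loc` depends on the input can be checked with a toy Series that has one unique and one duplicated label (values are illustrative):

```python
import pandas as pd

s = pd.Series(
    [19, 9, 11, 23],
    index=["Alfa Romeo", "Ferrari", "Ferrari", "Dodge"],
    name="city08",
)

unique_scalar = s.loc["Dodge"]      # unique label: a scalar value
dup_scalar = s.loc["Ferrari"]       # duplicated label: a Series
always_series = s.loc[["Dodge"]]    # label wrapped in a list: always a Series

print(type(unique_scalar).__name__)
print(type(dup_scalar).__name__, len(dup_scalar))
print(type(always_series).__name__, len(always_series))
```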

In [19]:
# slice object as input to .loc
city_rnm.sort_index().loc["Federal":"Ferrari"]

Federal Coach    15
Federal Coach    13
Federal Coach    13
Federal Coach    14
Federal Coach    13
                 ..
Ferrari          13
Ferrari           8
Ferrari           9
Ferrari          13
Ferrari          10
Name: city08, Length: 243, dtype: int64

In [20]:
# slicing with partial strings
city_rnm.sort_index().loc["F":"G"]

Federal Coach    15
Federal Coach    13
Federal Coach    13
Federal Coach    14
Federal Coach    13
                 ..
Ford             13
Ford             17
Ford             12
Ford             17
Ford             15
Name: city08, Length: 3686, dtype: int64
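The need for `.sort_index()` and the inclusive end of a label slice can both be seen on a small Series. This is a sketch with made-up values; the duplicated `"Ford"` labels are deliberately non-adjacent so that the unsorted slice is ambiguous.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["Ford", "Audi", "Ford", "Fiat", "GMC"])

# unsorted index with a duplicated slice bound: pandas cannot locate the bound
try:
    s.loc["Fiat":"Ford"]
    unsorted_error = None
except KeyError as err:
    unsorted_error = str(err)

ordered = s.sort_index()

# both ends are included, so every "Ford" row is returned
exact = ordered.loc["Fiat":"Ford"]

# bounds need not exist in a sorted index; "GMC" sorts after "G" and is excluded
partial = ordered.loc["F":"G"]

print(unsorted_error)
print(list(exact.index))
print(list(partial.index))
```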

In [21]:
# Boolean array as input to .loc
boolean_mask = city_rnm > 120
city_rnm.loc[boolean_mask].sample(3)

Scion    138
smart    122
Fiat     121
Name: city08, dtype: int64

In [22]:
# Function as input to .loc

# say we estimate that, due to regulations, all vehicles will lose 10% of their
# current mileage in the coming year. We want to compute that mileage from our
# current data and see which cars will still have mpg > 120
city_rnm.mul(0.9).loc[lambda x: x > 120].sample(3)

BMW        123.3
Tesla      126.0
Hyundai    135.0
Name: city08, dtype: float64

**`Note:`** The callable passed to `.loc` is called once with the entire Series as its argument (here, the result of `.mul(0.9)`), not once per element. The comparison `x > 120` inside the lambda is therefore an ordinary vectorized operation that returns a boolean Series, which `.loc` then uses as a mask. This is what makes callables useful in method chains: they can refer to the intermediate result without first assigning it to a variable.
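One way to check what `.loc` hands to a callable is to record the type of each argument it receives. The sketch below does this on a toy Series; `above_120` and `received` are hypothetical names introduced for this example.

```python
import pandas as pd

s = pd.Series([100, 130, 150], index=["BMW", "Tesla", "Hyundai"], name="city08")

received = []  # records the type of every argument the callable is given

def above_120(x):
    received.append(type(x).__name__)
    return x > 120   # vectorized comparison on whatever was passed in

result = s.mul(0.9).loc[above_120]

print(received)
print(list(result.index))
```

If the callable were applied element by element, `received` would contain one scalar type name per row instead of a single `Series` entry.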

#### <a id='toc2_2_2_'></a>[The `.iloc[]` method](#toc0_)  [&#8593;](#toc0_)

The **.iloc** attribute operates on **integer positions and not index labels**. It can also be used with a boolean array.

Allowable inputs may be:
- **Scalar:** an integer position. Returns the value at that position.
- **Array like:** a list or array of integer positions. Returns a Series.
- **Slice object:** the end of the slice is exclusive, i.e. it works like list slicing.
- **A numpy array of booleans (or a Python list):** of the same length as the Series. Note that it must be a numpy array or a list, not a boolean pandas Series.
- **A callable function:** that returns one of the above.
- **A tuple:** applicable to DataFrame objects. A tuple of row and column positions.
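The positional inputs listed above can be sketched on a toy Series (values copied from the first rows of `city_rnm`, variable names hypothetical):

```python
import pandas as pd

s = pd.Series(
    [19, 9, 23, 10, 17],
    index=["Alfa Romeo", "Ferrari", "Dodge", "Dodge", "Subaru"],
    name="city08",
)

first = s.iloc[0]               # scalar position: the value itself
picked = s.iloc[[0, 2, -1]]     # list of positions; negatives count from the end
window = s.iloc[1:3]            # positions 1 and 2; the end is exclusive
via_callable = s.iloc[lambda x: [0, 1]]   # callable returning a list of positions

print(first)
print(picked.tolist())
print(window.tolist())
print(via_callable.tolist())
```

Duplicate labels such as `"Dodge"` cause no ambiguity here because `.iloc` never looks at the labels.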

In [23]:
mask = city_rnm > 120

In [24]:
mask

Alfa Romeo    False
Ferrari       False
Dodge         False
Dodge         False
Subaru        False
              ...  
Subaru        False
Subaru        False
Subaru        False
Subaru        False
Subaru        False
Name: city08, Length: 41144, dtype: bool

**Note:** a pandas Series of boolean values must first be converted to a numpy boolean array or a Python list before it can be used with `.iloc`.

In [25]:
city_rnm.iloc[mask.to_numpy()].sample()

Mitsubishi    126
Name: city08, dtype: int64
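To see why the conversion is needed, the sketch below passes a boolean Series straight to `.iloc` on a toy example and catches the resulting error. The values are made up; the exception type is caught broadly because pandas raises a different one depending on the mask's index type.

```python
import pandas as pd

s = pd.Series([19, 126, 23], index=["Alfa Romeo", "Mitsubishi", "Dodge"], name="city08")
mask = s > 120   # boolean Series carrying the same index as s

# a boolean Series is rejected by .iloc
try:
    s.iloc[mask]
    iloc_error = None
except (ValueError, NotImplementedError) as err:
    iloc_error = type(err).__name__

# a numpy array or a plain list of booleans is accepted
selected = s.iloc[mask.to_numpy()]
selected_list = s.iloc[mask.tolist()]

# .loc accepts the boolean Series directly and gives the same result
same_via_loc = s.loc[mask]

print(iloc_error)
print(selected.tolist(), selected_list.tolist(), same_via_loc.tolist())
```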