# Series

A Series is the primary building block of pandas. It is similar to a one-dimensional NumPy array, but whereas a NumPy array lets you access an element only by its integer position, a pandas Series also lets you access an element through a custom index of labels. For more on the difference between NumPy arrays and pandas Series, see [this Medium blog post](https://medium.com/@ericvanrees/pandas-series-objects-and-numpy-arrays-15dfe05919d7) by Eric van Rees.

Let's define the problem we are going to solve here using pandas Series.

## Problem 

You are given the heights (in inches) and weights (in lbs) of 5 people, as shown in the table below:

| Person | Height (inch) | Weight (lbs) |
|--------|---------------|--------------|
|    A   |     72       |      186    |
|    B   |     69       |      205    |
|    C   |     70       |      201    |
|    D   |     62       |      125    |
|    E   |     57       |      89     |

It is later found that the table is missing entries for persons F and G, whose heights are 65 inches and 60 inches respectively. The weight of F is 121 lbs, but G's weight is unavailable. In addition, every recorded height is found to be 5 inches less than the true value, and every recorded weight is found to be 5 lbs more than the true value. Find the correct Body Mass Index (BMI) for each person, where possible.

## Solution

First, let's represent the heights and weights for the persons using 2 dictionaries.

In [1]:
height_dictionary = {'A': 72, 'B': 69, 'C': 70, 'D': 62, 'E': 57}
weight_dictionary = {'A': 186, 'B': 205, 'C': 201, 'D': 125, 'E': 89}

Now, we can form pandas series using the two dictionaries above. Before that we need to import the pandas library.

In [2]:
import pandas as pd

In [3]:
height_series = pd.Series(height_dictionary)
height_series

A    72
B    69
C    70
D    62
E    57
dtype: int64

As we can see, the dictionary keys become the index labels for the corresponding values. A value can be accessed using its label.

In [4]:
height_series['B']

69

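A value can also be accessed by its integer position, as in NumPy. We use .iloc for this, because plain height_series[1] on a label-indexed series is deprecated in recent pandas releases.

In [5]:
height_series.iloc[1]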

69
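
The two access styles can be made explicit with the .loc and .iloc accessors. The following is a minimal sketch, assuming the same height data as above; the names by_label and by_position are illustrative only.

```python
import pandas as pd

# Same data as height_dictionary above
heights = pd.Series({'A': 72, 'B': 69, 'C': 70, 'D': 62, 'E': 57})

by_label = heights.loc['B']      # label-based lookup
by_position = heights.iloc[1]    # position-based lookup (0-indexed)

print(by_label, by_position)

# Slices behave differently: .loc includes the end label,
# .iloc excludes the end position
print(heights.loc['A':'C'].tolist())   # labels A, B and C
print(heights.iloc[0:2].tolist())      # positions 0 and 1 only
```

Using the accessors removes any ambiguity about whether a key is meant as a label or as a position.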

We do the same thing for the weight_dictionary. 

In [6]:
weight_series = pd.Series(weight_dictionary)
weight_series

A    186
B    205
C    201
D    125
E     89
dtype: int64

The table is missing the values for F and G. Let's create new series for the missing values. This time we will use Python lists instead of a dictionary, passing the data first and the index second.

In [7]:
missing_height = pd.Series([65,60],['F','G'])
missing_height

F    65
G    60
dtype: int64

We do the same for the missing weight. This time we pass the index before the data, which requires naming both arguments explicitly as keywords.

In [8]:
missing_weight = pd.Series(index=['F'],data=[121])
missing_weight

F    121
dtype: int64

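Now that we have the series for the missing values, let's add them to the end of the original series. Series.append was removed in pandas 2.0, so we use pd.concat, which takes a list of series and joins them in order.

In [9]:
updated_height_series = pd.concat([height_series, missing_height])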
updated_height_series

A    72
B    69
C    70
D    62
E    57
F    65
G    60
dtype: int64

In [10]:
# pd.concat replaces Series.append, which was removed in pandas 2.0
updated_weight_series = pd.concat([weight_series, missing_weight])
updated_weight_series

A    186
B    205
C    201
D    125
E     89
F    121
dtype: int64
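
When series are joined end to end, a label present in both inputs ends up duplicated in the result. As a hedged sketch, pd.concat accepts verify_integrity=True, which raises a ValueError in that case. The duplicate_entry series below is a made-up example and not part of the problem data.

```python
import pandas as pd

weights = pd.Series({'A': 186, 'B': 205})
new_entry = pd.Series({'F': 121})
duplicate_entry = pd.Series({'A': 190})  # hypothetical: label 'A' already exists

# No overlapping labels, so this succeeds
combined = pd.concat([weights, new_entry], verify_integrity=True)
print(combined)

try:
    pd.concat([weights, duplicate_entry], verify_integrity=True)
    raised = False
except ValueError:
    raised = True  # the overlapping label 'A' was detected
print(raised)
```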

The recorded height of each person is 5 inches too low and the recorded weight is 5 lbs too high. To correct this, let's first create a series with 7 entries, one per person, where each entry is 5. We will build the values with NumPy, so let's import NumPy and create the correction_series.

In [11]:
import numpy as np
correction_series = pd.Series(5*np.ones(7),index=['A','B','C','D','E','F','G'])
correction_series

A    5.0
B    5.0
C    5.0
D    5.0
E    5.0
F    5.0
G    5.0
dtype: float64

Now let's add 5 to the heights for each person.

In [12]:
corrected_height_series = updated_height_series + correction_series
corrected_height_series

A    77.0
B    74.0
C    75.0
D    67.0
E    62.0
F    70.0
G    65.0
dtype: float64

Upon adding, the data type of the series has changed from int64 to float64, because correction_series holds floats (np.ones returns a float64 array).
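
Since the same correction applies to every person, a scalar can also be used directly, and pandas applies it to each element. The following is a minimal sketch, assuming a shortened copy of the height data: adding the integer 5 keeps the int64 data type, whereas adding a series of floats converts the result to float64.

```python
import numpy as np
import pandas as pd

heights = pd.Series({'A': 72, 'B': 69, 'C': 70})

# Scalar addition: applied element-wise, the dtype stays integer
plus_scalar = heights + 5

# Addition of a float series: the result becomes float64
plus_series = heights + pd.Series(5 * np.ones(3), index=['A', 'B', 'C'])

print(plus_scalar.dtype, plus_series.dtype)
```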

And let's subtract 5 from the weights for each person.

In [13]:
corrected_weight_series = updated_weight_series - correction_series
corrected_weight_series

A    181.0
B    200.0
C    196.0
D    120.0
E     84.0
F    116.0
G      NaN
dtype: float64

As we can see above, pandas aligns the two series by label before subtracting. G has an entry in correction_series but none in updated_weight_series, so the result for G is NaN (Not a Number), which is how pandas marks a missing value.
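
The alignment behavior can be seen in isolation. This is a small sketch with shortened, illustrative series: a label present in only one operand yields NaN, and isna can be used to locate such entries.

```python
import pandas as pd

weights = pd.Series({'E': 89, 'F': 121})                 # no entry for G
correction = pd.Series({'E': 5.0, 'F': 5.0, 'G': 5.0})   # has an entry for G

# Operands are matched by label, not by position
result = weights - correction
print(result)

# Labels whose value is missing in the result
missing_labels = result[result.isna()].index.tolist()
print(missing_labels)
```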

Since the weight of G is not available, we cannot calculate G's BMI. So, let's remove G's entry from both the corrected_height_series and corrected_weight_series.

G's height is available, so to remove it we specify its label.

In [14]:
corrected_height_series.drop(['G'])

A    77.0
B    74.0
C    75.0
D    67.0
E    62.0
F    70.0
dtype: float64

We see that by calling drop with a list of labels, the entry appears to have been removed. But has it really? Let's check again.

In [15]:
corrected_height_series

A    77.0
B    74.0
C    75.0
D    67.0
E    62.0
F    70.0
G    65.0
dtype: float64

Using drop just created a new series with the entry removed; it didn't change the original series. To change the original series, we must pass inplace=True in the arguments. Let's do that.

In [16]:
corrected_height_series.drop(['G'],inplace=True)

Now, when we check the original series, corrected_height_series, the entry for G will have been removed.

In [17]:
corrected_height_series

A    77.0
B    74.0
C    75.0
D    67.0
E    62.0
F    70.0
dtype: float64

For the weights, G's value is NaN. To drop the entry for G, we simply use the dropna method, which removes all missing values. Again, we need to specify inplace=True; otherwise the original series is left unchanged.

In [18]:
corrected_weight_series.dropna(inplace=True)

In [19]:
corrected_weight_series

A    181.0
B    200.0
C    196.0
D    120.0
E     84.0
F    116.0
dtype: float64
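
An alternative that avoids inplace=True is to assign the returned series to a name and to keep only the labels that both series share. This is a minimal sketch with shortened data; the names common, heights_ok and weights_ok are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

heights = pd.Series({'E': 62.0, 'F': 70.0, 'G': 65.0})
weights = pd.Series({'E': 84.0, 'F': 116.0, 'G': np.nan})

# dropna returns a new series; weights itself is unchanged
weights_ok = weights.dropna()

# Keep only the people for whom both measurements exist
common = heights.index.intersection(weights_ok.index)
heights_ok = heights.loc[common]

print(heights_ok)
print(weights_ok)
```

With this approach the label to remove does not have to be named by hand; it follows from whichever entries are missing.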

Now that we have all the values correctly in place, it's time to find the Body Mass Index. The formula to calculate BMI is as follows:

\begin{equation*}
\mathrm{BMI} = \frac{\text{weight (kg)}}{[\text{height (m)}]^2}
\end{equation*}

To apply the above formula, the weights must first be converted from lbs to kg and the heights from inches to meters. The conversion factors are:

\begin{equation*}
\begin{aligned}
1\ \text{lb} &= 0.453592\ \text{kg}, \\
1\ \text{inch} &= 0.0254\ \text{m}
\end{aligned}
\end{equation*}
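
As a quick check of the formula and the conversion factors, the calculation can be done for a single person. The sketch below uses person A's corrected values (181 lbs, 77 inches); the function name bmi_from_imperial is hypothetical.

```python
LBS_TO_KG = 0.453592
INCH_TO_METER = 0.0254

def bmi_from_imperial(weight_lbs, height_inch):
    """Return the BMI for a weight in lbs and a height in inches."""
    weight_kg = weight_lbs * LBS_TO_KG
    height_m = height_inch * INCH_TO_METER
    return weight_kg / height_m ** 2

bmi_a = bmi_from_imperial(181, 77)
print(round(bmi_a, 2))  # 21.46
```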

In [20]:
lbs_to_kg_ratio = 0.453592
inch_to_meter_ratio = 0.0254

In [21]:
weights_in_kg = np.multiply(corrected_weight_series,lbs_to_kg_ratio)
weights_in_kg

A    82.100152
B    90.718400
C    88.904032
D    54.431040
E    38.101728
F    52.616672
dtype: float64

In [22]:
heights_in_m = np.multiply(corrected_height_series,inch_to_meter_ratio)
heights_in_m

A    1.9558
B    1.8796
C    1.9050
D    1.7018
E    1.5748
F    1.7780
dtype: float64

Finally, we can compute the BMI using the above formula.

In [23]:
BMI = np.divide(weights_in_kg,np.square(heights_in_m))
BMI

A    21.463230
B    25.678196
C    24.498049
D    18.794449
E    15.363631
F    16.644083
dtype: float64
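
The same result can also be obtained with plain arithmetic operators, which pandas applies element-wise and aligns by label. The following is a minimal sketch, assuming the corrected values from above are re-entered by hand.

```python
import numpy as np
import pandas as pd

labels = ['A', 'B', 'C', 'D', 'E', 'F']
weights_lbs = pd.Series([181.0, 200.0, 196.0, 120.0, 84.0, 116.0], index=labels)
heights_inch = pd.Series([77.0, 74.0, 75.0, 67.0, 62.0, 70.0], index=labels)

weights_kg = weights_lbs * 0.453592
heights_m = heights_inch * 0.0254

# Operator form and NumPy function form of the same calculation
bmi_operators = weights_kg / heights_m ** 2
bmi_numpy = np.divide(weights_kg, np.square(heights_m))

print(bmi_operators.round(2))
print(np.allclose(bmi_operators, bmi_numpy))
```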