# <div class="alert alert-success" >(2) Tidying Up Your Data
    
Data analysis typically flows through a processing pipeline that starts with retrieving data from one or more sources. When this data arrives, it is often in a raw form that is difficult to use for analysis. There are many possible reasons for this: the data was not recorded, it was lost, or it is simply in a different format from the one you require.
    
Therefore, one of the most common things you will do with pandas is tidying your data, which is the process of preparing raw data for analysis. 

    
## <div class= "alert alert-info"> What is tidying your data?
    
<div class= "alert alert-success">
    
Tidy data is a term that comes from what many consider a famous data science paper, "Tidy Data" by Hadley Wickham. I highly recommend reading it; it can be downloaded at http://vita.had.co.nz/papers/tidy-data.pdf. The paper covers many details of the process that he calls tidying data, the result of which is tidy data: data that is ready for analysis.
    
In this chapter, we'll introduce and briefly demonstrate many of the capabilities pandas offers for tidying data. First, here is a brief summary of why you need to tidy data and of the characteristics of tidy data, so that you know when you have completed the task and are ready to move on to analysis. Tidying of data is required for many reasons, including these:
    
- The names of the variables are different from what you require
    
- There is missing data
    
- Values are not in the units that you require
    
- The period of sampling of records is not what you need
    
- Variables are categorical and you need quantitative values
    
- There is noise in the data
    
- Information is of an incorrect type
    
- Data is organized around incorrect axes
    
- Data is at the wrong level of normalization
    
- Data is duplicated
    
Data that can be considered good, tidy, and ready for analysis has the following characteristics:
    
- Each variable is in one column
    
- Each observation of the variable is in a different row
    
- There should be one table for each kind of variable
    
- If multiple tables, they should be relatable
    
- Qualitative and categorical variables have mappings to values useful for analysis
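As a small illustration of the first two characteristics, the sketch below takes a hypothetical "wide" table in which the year variable is spread across the column headers and reshapes it with `DataFrame.melt()` so that each variable is in one column and each observation is in its own row. The table, its column names, and its values are invented for illustration only.

```python
import pandas as pd

# Hypothetical untidy table: the values of the "year" variable are column headers
wide = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "2019": [10, 20],
    "2020": [11, 21],
})

# melt() gathers the year columns into a single "year" column,
# producing one row per (city, year) observation
tidy = wide.melt(id_vars="city", var_name="year", value_name="sales")
print(tidy)
```

Note that the melted `year` values are strings, because they came from column labels; converting them to integers would be a further tidying step.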
    


<div class= "alert alert-success">
    
### <div class= "alert alert-warning">Setting up the IPython notebook
    
To run the examples in this chapter, we will need the following imports and settings:    

In [1]:
# import pandas, numpy and datetime

import numpy as np
import pandas as pd
import datetime

# Set some pandas options for controlling output
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 10)

## <div class= "alert alert-info"> Working with missing data
    
<div class= "alert alert-success">
    
Data is "missing" in pandas when it has a value of NaN (also seen as np.nan, the form from NumPy). A NaN value indicates that, in a particular Series, no value is specified for a particular index label. In pandas, there are a number of reasons why a value can be NaN:
    
- A join of two sets of data does not have matched values
    
- Data that you retrieved from an external source is incomplete
    
- The value is not known at a given point in time and will be filled in later
    
- There is a data collection error retrieving a value, but the event must still be recorded in the index 
    
- Reindexing of data has resulted in an index that does not have a value
    
- The shape of data has changed and there are now additional rows or columns, which at the time of reshaping could not be determined
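Two of the causes above, reindexing and an unmatched join, are easy to reproduce. The following is a minimal sketch using small hypothetical objects (`s`, `left`, and `right` are illustrative names, not part of the chapter's DataFrame):

```python
import pandas as pd

# Reindexing to a label that does not exist in the Series introduces NaN
s = pd.Series([1.0, 2.0], index=["a", "b"])
reindexed = s.reindex(["a", "b", "c"])   # label 'c' has no value, so it becomes NaN

# A left join keeps every key from the left side; keys without a match
# on the right side get NaN in the right-hand columns
left = pd.DataFrame({"key": ["k1", "k2"], "x": [1, 2]})
right = pd.DataFrame({"key": ["k1"], "y": [10]})
joined = left.merge(right, on="key", how="left")

print(reindexed)
print(joined)
```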
    
To demonstrate handling missing data, we will use the following DataFrame object, which exhibits various patterns of missing data:    

In [2]:
# create a DataFrame with 5 rows and 3 columns
df = pd.DataFrame(np.arange(0, 15).reshape(5, 3),index=['a', 'b', 'c', 'd', 'e'],columns=['c1', 'c2', 'c3'])
df

   c1  c2  c3
a   0   1   2
b   3   4   5
c   6   7   8
d   9  10  11
e  12  13  14

There is no missing data at this point, so let's add some:

In [3]:
df['c4'] = np.nan                   # add column c4 with NaN values
df.loc['f'] = np.arange(15, 19)     # add row 'f' with values 15 through 18
df.loc['g'] = np.nan                # add row 'g' with all NaN values
df['c5'] = np.nan                   # add column c5 with NaN values
df.loc['a', 'c4'] = 20              # set the value in column c4, row 'a' (avoids chained assignment)
df

     c1    c2    c3    c4  c5
a   0.0   1.0   2.0  20.0 NaN
b   3.0   4.0   5.0   NaN NaN
c   6.0   7.0   8.0   NaN NaN
d   9.0  10.0  11.0   NaN NaN
e  12.0  13.0  14.0   NaN NaN
f  15.0  16.0  17.0  18.0 NaN
g   NaN   NaN   NaN   NaN NaN

This DataFrame object exhibits the following characteristics that will support most of the examples that follow in this section:

- One row consisting only of NaN values

- One column consisting only of NaN values

- Several rows and columns consisting of both numeric values and NaN values

<div class= "alert alert-success">
    
### <div class= "alert alert-warning"> Determining NaN values in Series and DataFrame objects

The NaN values in a DataFrame object can be identified using the <span style="color:red">.isnull()</span> method.
Any True value means that the item is a NaN value:

In [4]:
df.isnull()         # which items are NaN?

      c1     c2     c3     c4    c5
a  False  False  False  False  True
b  False  False  False   True  True
c  False  False  False   True  True
d  False  False  False   True  True
e  False  False  False   True  True
f  False  False  False  False  True
g   True   True   True   True  True

In [5]:
df.isnull().sum()               # count the number of NaN values in each column

c1    1
c2    1
c3    1
c4    5
c5    7
dtype: int64

Applying .sum() again to the resulting Series gives the total number of NaN values in the original DataFrame object:

In [6]:
df.isnull().sum().sum()     # total count of NaN values

15

Another way to determine this is to use the <span style="color:red">.count()</span> method of Series and DataFrame objects. <mark>For a Series, this method returns the number of non-NaN values. For a DataFrame object, it counts the number of non-NaN values in each column:

In [7]:
df.count()      # number of non-NaN values in each column

c1    6
c2    6
c3    6
c4    2
c5    0
dtype: int64

To get the number of NaN values instead, subtract these counts from the number of rows and sum the result:

In [8]:
(len(df) - df.count()).sum()         # and this counts the number of NaN values too

15

We can also determine whether an item is not NaN using the <span style="color:red">.notnull()</span> method, which returns True if the value is not a NaN value, otherwise it returns False:

In [14]:
df.notnull()             # which items are not null?

      c1     c2     c3     c4     c5
a   True   True   True   True  False
b   True   True   True  False  False
c   True   True   True  False  False
d   True   True   True  False  False
e   True   True   True  False  False
f   True   True   True   True  False
g  False  False  False  False  False
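The same Boolean results can also be aggregated across rows instead of columns. This sketch uses a small hypothetical DataFrame and relies on `.isna()`/`.notna()`, which pandas provides as aliases of `.isnull()`/`.notnull()`:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with missing values in some rows
df = pd.DataFrame({"c1": [0.0, 3.0, np.nan], "c2": [1.0, np.nan, np.nan]},
                  index=["a", "b", "g"])

# .isna() returns the same result as .isnull()
same = df.isna().equals(df.isnull())

# axis=1 counts the NaN values in each row rather than in each column
nan_per_row = df.isna().sum(axis=1)

# labels of the rows that contain at least one NaN value
rows_with_nan = df.index[df.isna().any(axis=1)]

print(nan_per_row)
print(list(rows_with_nan))
```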

<div class= "alert alert-success">
    
### <div class= "alert alert-warning">Selecting out or dropping missing data
    
One technique for handling missing data is to simply remove it from your dataset. A scenario for this is data sampled at regular intervals where devices are sometimes offline and record no reading, but you only need the actual periodic values.
    
The pandas library makes this possible using several techniques. One is Boolean selection, using the results of .isnull() and .notnull() to retrieve the values that are NaN or not NaN from a Series object. To demonstrate, the following example selects all non-NaN values from the c4 column of the DataFrame:  

In [15]:
df.c4[df.c4.notnull()]    # select the non-NaN items in column c4

a    20.0
f    18.0
Name: c4, dtype: float64

pandas also provides a convenience method, <span style="color:red">.dropna()</span>, which drops the items in a Series whose value is NaN and involves less typing than the previous example:

In [16]:
df.c4.dropna()     # .dropna will also return non NaN values. This gets all non NaN items in column c4

a    20.0
f    18.0
Name: c4, dtype: float64

Note that .dropna() has actually returned a copy of the Series without the NaN rows. The original DataFrame is not changed:

In [17]:
df.c4   # dropna returns a copy with the values dropped, the source DataFrame / column is not changed

a    20.0
b     NaN
c     NaN
d     NaN
e     NaN
f    18.0
g     NaN
Name: c4, dtype: float64

When applied to a DataFrame object, .dropna() will drop all rows from a DataFrame object that have at least one NaN value. The following code demonstrates this in action, and since each row has at least one NaN value, there are no rows in the result:

In [19]:
df.dropna()  # on a DataFrame this will drop entire rows where there is at least one NaN in this case, that is all rows

Empty DataFrame
Columns: [c1, c2, c3, c4, c5]
Index: []

If you want to drop only the rows where all values are NaN, you can use the <span style="color:green"><b>how='all'</b></span> parameter. The following code drops only the g row, since all of its values are NaN:

In [20]:
df.dropna(how = 'all')      # using how='all', only rows that have all values as NaN will be dropped

     c1    c2    c3    c4  c5
a   0.0   1.0   2.0  20.0 NaN
b   3.0   4.0   5.0   NaN NaN
c   6.0   7.0   8.0   NaN NaN
d   9.0  10.0  11.0   NaN NaN
e  12.0  13.0  14.0   NaN NaN
f  15.0  16.0  17.0  18.0 NaN

This can also be applied to the columns instead of the rows, by changing the axis parameter to axis=1. The following code drops the c5 column as it is the only one with all NaN values:

In [21]:
df.dropna(how='all', axis=1) # say goodbye to c5  

     c1    c2    c3    c4
a   0.0   1.0   2.0  20.0
b   3.0   4.0   5.0   NaN
c   6.0   7.0   8.0   NaN
d   9.0  10.0  11.0   NaN
e  12.0  13.0  14.0   NaN
f  15.0  16.0  17.0  18.0
g   NaN   NaN   NaN   NaN

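We can also examine this process using a slightly different DataFrame object in which columns c1 and c3 contain no NaN values. Using how='any' with axis=1 drops every column that contains at least one NaN value, so in this case all columns except c1 and c3 are dropped:

In [23]:
df2 = df.copy()
df2.loc['g', ['c1', 'c3']] = 0      # give row g values in c1 and c3 so those columns have no NaN
df2.dropna(how='any', axis=1)       # now drop columns with any NaN values

     c1    c3
a   0.0   2.0
b   3.0   5.0
c   6.0   8.0
d   9.0  11.0
e  12.0  14.0
f  15.0  17.0
g   0.0   0.0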

The .dropna() method also has a parameter, <span style="color:green"> <b>thresh</b></span>, which, when given an integer value, specifies the minimum number of non-NaN values a row (or, with axis=1, a column) must have in order to be kept. The following code keeps only the columns with at least five non-NaN values, which drops the c4 and c5 columns:

In [24]:
df.dropna(thresh=5, axis=1)        # keep only columns with at least 5 non-NaN values

     c1    c2    c3
a   0.0   1.0   2.0
b   3.0   4.0   5.0
c   6.0   7.0   8.0
d   9.0  10.0  11.0
e  12.0  13.0  14.0
f  15.0  16.0  17.0
g   NaN   NaN   NaN
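Because thresh counts non-NaN values, it can be easy to misread. The sketch below applies it to rows of a small hypothetical DataFrame (the default axis=0), keeping only rows with at least two non-NaN values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row a has 3 non-NaN values, row b has 2, row c has 0
df = pd.DataFrame({"c1": [1.0, 2.0, np.nan],
                   "c2": [1.0, np.nan, np.nan],
                   "c3": [1.0, 2.0, np.nan]},
                  index=["a", "b", "c"])

# keep rows that have at least two non-NaN values; row c is dropped
kept = df.dropna(thresh=2)
print(kept)
```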

<mark><b>Note that the .dropna() method (and the Boolean selection) returns a copy of the DataFrame object, and the data is dropped from that copy. If you want to drop the data in the actual DataFrame, use the inplace=True parameter.
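The difference between the copy-returning form and inplace=True can be seen in a short sketch on a hypothetical one-column DataFrame. With inplace=True the method modifies the object and returns None, so its result should not be assigned back to a variable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1.0, np.nan, 3.0]})

cleaned = df.dropna()              # returns a new DataFrame; df is unchanged
rows_before = len(df)              # still 3 rows

result = df.dropna(inplace=True)   # modifies df itself and returns None
print(rows_before, len(cleaned), len(df), result)
```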

<div class= "alert alert-success">
    
### <div class= "alert alert-warning">How pandas handles NaN values in mathematical operations
    
The NaN values are handled differently in pandas than in NumPy, as the following example demonstrates:

In [27]:
a = np.array([1, 2, np.nan, 3])      # create a NumPy array with one NaN value
s = pd.Series(a)                     # create a Series from the array
a.mean(), s.mean()                   # the mean of each is different

(nan, 2.0)

<mark>Note that the mean of the preceding Series was calculated as (1+2+3)/3 = 2, not (1+2+3)/4 or (1+2+0+3)/4. This verifies that NaN is totally ignored and not even counted as an item in the Series.

NumPy functions, when encountering a NaN value, will return NaN. pandas functions will typically ignore the NaN values and continue processing as though those values were not part of the Series object. More specifically, the way that pandas handles NaN values is as follows:
 
- Summing of data treats NaN as 0

- If all values are NaN, .mean() returns NaN; in recent pandas versions .sum() returns 0 by default (pass min_count=1 to get NaN instead)

- Methods like .cumsum() and .cumprod() ignore NaN values, but preserve them in the resulting arrays
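These rules can be checked on a couple of tiny hypothetical Series. The sketch also shows two real parameters that change the default behavior: skipna=False makes NaN propagate, and min_count sets how many non-NaN values a sum needs before it returns a number. It assumes a recent pandas version, where the sum of an all-NaN Series is 0 by default.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
all_nan = pd.Series([np.nan, np.nan])

total = s.sum()                            # NaN skipped: 1 + 3
avg = s.mean()                             # NaN excluded from the count too: 4 / 2
strict_avg = s.mean(skipna=False)          # NaN propagates to the result
empty_sum = all_nan.sum()                  # 0.0 by default
empty_sum_min1 = all_nan.sum(min_count=1)  # NaN: needs at least one real value
empty_mean = all_nan.mean()                # NaN
running = s.cumsum()                       # NaN kept in place, skipped in the running total

print(total, avg, strict_avg, empty_sum, empty_sum_min1, empty_mean)
print(running)
```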

The following code demonstrates all of these concepts:

In [30]:
s = df.c4         # get one column to demonstrate how sum, mean and cumsum handle NaN
s

a    20.0
b     NaN
c     NaN
d     NaN
e     NaN
f    18.0
g     NaN
Name: c4, dtype: float64

In [31]:
s.sum()           # NaN values treated as 0

38.0

In [32]:
s.mean()         # NaN values are excluded from the count: (20 + 18) / 2

19.0

In [33]:

s.cumsum()      # NaN as 0 in the cumsum, but NaN values preserved in result Series

a    20.0
b     NaN
c     NaN
d     NaN
e     NaN
f    38.0
g     NaN
Name: c4, dtype: float64

When using traditional mathematical operators, NaN is propagated through to the result:

In [34]:
df.c4 + 1   # in arithmetic, a NaN value will result in NaN

a    21.0
b     NaN
c     NaN
d     NaN
e     NaN
f    19.0
g     NaN
Name: c4, dtype: float64

<div class= "alert alert-success">
    
### <div class= "alert alert-warning">Filling in missing data
    
If you prefer to replace the NaN values with a specific value, instead of having them propagated or ignored outright, you can use the <span style="color:red">.fillna()</span> method. The following code fills the NaN values with 0:

In [35]:
filled = df.fillna(0)       # return a new DataFrame with NaN values filled with 0
filled 

     c1    c2    c3    c4   c5
a   0.0   1.0   2.0  20.0  0.0
b   3.0   4.0   5.0   0.0  0.0
c   6.0   7.0   8.0   0.0  0.0
d   9.0  10.0  11.0   0.0  0.0
e  12.0  13.0  14.0   0.0  0.0
f  15.0  16.0  17.0  18.0  0.0
g   0.0   0.0   0.0   0.0  0.0

Be aware that this changes the results of calculations. As an example, the following code shows the result of applying the .mean() method to the DataFrame object with the NaN values, compared to the DataFrame that has its NaN values filled with 0:

In [36]:
df.mean()           # NaNs don't count as an item in calculating the means

c1     7.5
c2     8.5
c3     9.5
c4    19.0
c5     NaN
dtype: float64

In [37]:
filled.mean()    # having replaced NaN with 0 can make operations such as mean have different results

c1    6.428571
c2    7.285714
c3    8.142857
c4    5.428571
c5    0.000000
dtype: float64

It is also possible to limit the number of NaN values that will be filled using the limit parameter. When filling with a value, as in the following example, at most limit NaN values are filled in each column, starting from the top. When forward or backward filling, limit instead caps the number of consecutive NaN values filled in each gap.

In [38]:
df.fillna(0, limit=2)       # only fills the first two NaN values in each column with 0

     c1    c2    c3    c4   c5
a   0.0   1.0   2.0  20.0  0.0
b   3.0   4.0   5.0   0.0  0.0
c   6.0   7.0   8.0   0.0  NaN
d   9.0  10.0  11.0   NaN  NaN
e  12.0  13.0  14.0   NaN  NaN
f  15.0  16.0  17.0  18.0  NaN
g   0.0   0.0   0.0   NaN  NaN
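The two meanings of limit can be compared on one hypothetical Series. With a fill value, limit counts NaN entries along the whole Series; with a forward fill, it counts consecutive NaN entries within each gap. The sketch uses `.fillna(-1)` only so the results can be compared as plain lists.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan, np.nan])

# forward fill at most one NaN in each gap
ffilled = s.ffill(limit=1)

# fill at most two NaN values in the whole Series, starting from the top
value_filled = s.fillna(0, limit=2)

print(ffilled.tolist())
print(value_filled.tolist())
```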

<div class= "alert alert-danger">
    
### <div class= "alert alert-warning">Forward and backward filling of missing values
    
Gaps in data can be filled by propagating non-NaN values forward or backward along a Series. To demonstrate this, the following example will "fill forward" the c4 column of DataFrame:

In [40]:
df.c4.ffill()   # extract the c4 column and fill NaNs forward (i.e. fill NaN with the last non-NaN value encountered)

a    20.0
b    20.0
c    20.0
d    20.0
e    20.0
f    18.0
g    18.0
Name: c4, dtype: float64

#### When working with time series data, this technique of filling is often referred to as the "last known value".

The direction of the fill can be reversed using .bfill():

In [42]:
df.c4.bfill()        # perform a backward fill (i.e. fill NaN with the next non-NaN value encountered)

a    20.0
b    18.0
c    18.0
d    18.0
e    18.0
f    18.0
g     NaN
Name: c4, dtype: float64

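To save a little typing, Series and DataFrame objects also provide the .ffill() and .bfill() methods, which are equivalent to .fillna(method="ffill") and .fillna(method="bfill"). In recent pandas versions the method argument of .fillna() is deprecated, so .ffill() and .bfill() are the preferred forms.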
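A forward fill cannot fill NaN values that appear before the first real value, and a backward fill cannot fill those after the last one. A common pattern, sketched here on a hypothetical Series, is to chain the two so that every gap is filled:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward_only = s.ffill()       # the leading NaN remains, since nothing precedes it
both = s.ffill().bfill()       # the backward fill then covers the leading NaN

print(forward_only.tolist())
print(both.tolist())
```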

<div class= "alert alert-danger">
    
### <div class= "alert alert-warning">Filling using index labels
    
Data can be filled using the labels of a Series or keys of a Python dictionary. This
allows you to specify different fill values for different elements based upon the value
of the index label:

In [43]:
fill_values = pd.Series([100, 101, 102], index=['a', 'e', 'g']) # new series to fill NaN values where the index label matches
fill_values    

a    100
e    101
g    102
dtype: int64

In [44]:
df.c4.fillna(fill_values)   # fill NaNs in c4 from fill_values; only NaN entries with a matching label (e and g) are filled

a     20.0
b      NaN
c      NaN
d      NaN
e    101.0
f     18.0
g    102.0
Name: c4, dtype: float64

Only NaN values will be filled. Notice that the value with label a is not changed. Another common scenario is to fill all the NaN values in a column with the mean of that column:

In [45]:
df.fillna(df.mean())           # fill NaN values in each column with the mean of the values in that column

     c1    c2    c3    c4  c5
a   0.0   1.0   2.0  20.0 NaN
b   3.0   4.0   5.0  19.0 NaN
c   6.0   7.0   8.0  19.0 NaN
d   9.0  10.0  11.0  19.0 NaN
e  12.0  13.0  14.0  19.0 NaN
f  15.0  16.0  17.0  18.0 NaN
g   7.5   8.5   9.5  19.0 NaN
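A Python dictionary works as well as a Series for choosing fill values, but its keys are interpreted differently depending on the object: for a DataFrame the keys are column names, and for a Series they are index labels. A small sketch on a hypothetical frame shaped like part of the chapter's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c4": [20.0, np.nan, 18.0],
                   "c5": [np.nan, np.nan, np.nan]},
                  index=["a", "b", "f"])

# On a DataFrame, dictionary keys select columns: a different fill value per column
filled = df.fillna({"c4": 0.0, "c5": -1.0})

# On a Series, dictionary keys are index labels: only label 'b' is filled
s_filled = df["c4"].fillna({"b": 99.0})

print(filled)
print(s_filled)
```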

<div class= "alert alert-danger">
    
### <div class= "alert alert-warning">Interpolation of missing values
    
Both DataFrame and Series have an <span style="color:red">.interpolate()</span> method that will, by default, perform a linear interpolation of missing values:

In [46]:
s = pd.Series([1, np.nan, np.nan, np.nan, 2])
s.interpolate()    # linear interpolate the NaN values from 1 through 2

0    1.00
1    1.25
2    1.50
3    1.75
4    2.00
dtype: float64

The interpolated values are calculated by taking the values immediately before and after a sequence of NaN values, dividing their difference by the number of steps between them, and adding that increment successively to each NaN position. <mark><b>In this case, 1.0 and 2.0 are the surrounding values, four steps apart, resulting in (2.0 - 1.0)/(5 - 1) = 0.25, which is then added incrementally through all the NaN values.</b></mark>
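The arithmetic can be reproduced by hand and compared with `.interpolate()`. This sketch only covers the case shown here, where the known values sit at the first and last positions of the Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, np.nan, np.nan, 2])
interpolated = s.interpolate()

# the known values are at positions 0 and 4, so the difference is spread over 4 steps
step = (2.0 - 1.0) / (len(s) - 1)
manual = [1.0 + step * i for i in range(len(s))]

print(step, manual)
print(np.allclose(interpolated.to_numpy(), manual))
```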

The .interpolate() method also lets you specify the method of interpolation. One common choice is time-based interpolation. Consider the following Series of dates and values:

In [49]:
# create a time series, but missing one date in the Series
ts = pd.Series([1, np.nan, 2],
               index=[datetime.datetime(2014, 1, 1),datetime.datetime(2014, 2, 1),datetime.datetime(2014, 4, 1)])
ts

2014-01-01    1.0
2014-02-01    NaN
2014-04-01    2.0
dtype: float64

In [50]:
ts.interpolate()    # linear interpolate based on the number of items in the Series

2014-01-01    1.0
2014-02-01    1.5
2014-04-01    2.0
dtype: float64

The value for 2014-02-01 is calculated as 1.0 + (2.0-1.0)/2 = 1.5, since there is one NaN value between the values 1.0 and 2.0.
 
The important thing to note is that the Series is missing an entry for 2014-03-01. If we were expecting monthly values, there would be two values to calculate, one for 2014-02-01 and another for 2014-03-01, resulting in one more step in the denominator of the interpolation. This can be corrected by specifying the method of interpolation as "time":

In [51]:
ts.interpolate(method="time")  # this accounts for the fact that we don't have an entry for 2014-03-01

2014-01-01    1.000000
2014-02-01    1.344444
2014-04-01    2.000000
dtype: float64

This is the correct interpolation for 2014-02-01 based upon dates: 31 of the 90 days between 2014-01-01 and 2014-04-01 have elapsed, so the value is 1.0 + (2.0 - 1.0) * 31/90 ≈ 1.3444. Also note that the index label and value for 2014-03-01 are not added to the Series; the date is just factored into the interpolation. Interpolation can also be specified to calculate values relative to the index values when using numeric index labels. To demonstrate this, we will use the following Series:

In [52]:
s = pd.Series([0, np.nan, 100], index=[0, 1, 10])   # a Series to demonstrate index label based interpolation
s

0       0.0
1       NaN
10    100.0
dtype: float64

If we perform a linear interpolation, we get the following value for label 1, which is correct for a linear interpolation:

In [53]:
s.interpolate() # linear interpolate

0       0.0
1      50.0
10    100.0
dtype: float64

However, what if we want to interpolate the value to be relative to the index value? To do this, we can use method="values":

In [54]:
s.interpolate(method="values")  # interpolate based upon the values in the index

0       0.0
1      10.0
10    100.0
dtype: float64

Now, the value calculated for NaN is interpolated using relative positioning based upon the labels in the index. The NaN value has a label of 1, which is one tenth of the way between 0 and 10, so the interpolated value will be 0 + (100-0)/10, or 10.
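Both the time-based and the index-based results can be reproduced by hand. The sketch below rebuilds the two example Series and compares `.interpolate()` with an explicit elapsed-days calculation and with NumPy's `np.interp`, which performs linear interpolation at the given x positions:

```python
import numpy as np
import pandas as pd

# Time-based: weight by elapsed days rather than by row position
ts = pd.Series([1.0, np.nan, 2.0],
               index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-04-01"]))
elapsed = (ts.index[1] - ts.index[0]).days   # days from Jan 1 to Feb 1
span = (ts.index[2] - ts.index[0]).days      # days from Jan 1 to Apr 1
manual_time = 1.0 + (2.0 - 1.0) * elapsed / span
by_time = ts.interpolate(method="time")

# Index-based: label 1 lies one tenth of the way from label 0 to label 10
s = pd.Series([0.0, np.nan, 100.0], index=[0, 1, 10])
manual_values = np.interp(1, [0, 10], [0.0, 100.0])
by_values = s.interpolate(method="values")

print(by_time.iloc[1], manual_time)
print(by_values.loc[1], manual_values)
```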

## <div class= "alert alert-info"> Handling duplicate data
<div class= "alert alert-success">
The data in your sample can often contain duplicate rows. This is just a reality of dealing with data collected automatically, and it can even happen when data is collected manually. Often, it is considered best to err on the side of having duplicates instead of missing data, especially if the data can be considered <b>idempotent</b> (processing the same record more than once has the same effect as processing it once). However, duplicate data can increase the size of the dataset, and if it is not idempotent, then it would not be appropriate to process the duplicates. 
    
To facilitate finding duplicate data, pandas provides a <span style="color:red">.duplicated()</span> method that returns a Boolean Series where each entry indicates whether or not the row is a duplicate. A True value means that the specific row has appeared earlier in the DataFrame object with all column values identical. To demonstrate this, the following code creates a DataFrame object with duplicate rows:

In [67]:
data = pd.DataFrame({'a': ['x'] * 3 + ['y'] * 4,'b': [1, 1, 2, 3, 3, 4, 4]}) # a DataFrame with lots of duplicate data

data

   a  b
0  x  1
1  x  1
2  x  2
3  y  3
4  y  3
5  y  4
6  y  4

The duplicate rows in the DataFrame object created by the preceding code can be identified using the .duplicated() method. This method determines that a row is a duplicate if the values in all columns were already seen in an earlier row of the DataFrame object:

In [68]:
data.duplicated()    # reports which rows are duplicates based upon if the data in all columns was seen before

0    False
1     True
2    False
3    False
4     True
5    False
6     True
dtype: bool

Duplicate rows can be dropped from a DataFrame using the <span style="color:red"> .drop_duplicates()</span>
method. This method will return a copy of the DataFrame object with the duplicate rows removed. It is also possible to use the inplace=True parameter to remove the rows without
making a copy:

In [69]:
data.drop_duplicates()      # drop duplicate rows retaining first row of the duplicates

   a  b
0  x  1
2  x  2
3  y  3
5  y  4

Note that dropping duplicates affects which index labels remain. Duplicate records may have different index labels (labels are not taken into account when identifying a duplicate), so which row is kept determines the set of labels in the resulting DataFrame object. 

The default operation is to keep the first row of each set of duplicates, which is equivalent to passing keep='first'. If you want to keep the last row of duplicates instead, you can use the <span style="color:green"><b>keep='last'</b></span> parameter. The following code demonstrates how the results differ:

In [70]:
data.drop_duplicates(keep='first')    # drop duplicate rows, only keeping the first instance of any data


   a  b
0  x  1
2  x  2
3  y  3
5  y  4

In [71]:
data.drop_duplicates(keep='last')    # drop duplicate rows, only keeping the last instance of any data

   a  b
1  x  1
2  x  2
4  y  3
6  y  4
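The keep parameter also accepts False, which treats every member of a set of duplicates as a duplicate. This is useful for seeing all affected rows, or for keeping only rows that never repeat. A sketch recreating the chapter's `data` DataFrame:

```python
import pandas as pd

data = pd.DataFrame({'a': ['x'] * 3 + ['y'] * 4,
                     'b': [1, 1, 2, 3, 3, 4, 4]})

# keep=False marks every occurrence of a duplicated row, not just the repeats
all_dupes = data.duplicated(keep=False)

# dropping with keep=False leaves only rows that appear exactly once
unique_only = data.drop_duplicates(keep=False)

print(all_dupes)
print(unique_only)
```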

If you want to check for duplicates based on a smaller set of columns, you can specify a list of column names:

In [74]:
data['c'] = range(7)    # add a column c with values 0..6 this makes .duplicated() report no duplicate rows
data.duplicated()

0    False
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

In [75]:
data.drop_duplicates(['a', 'b']) # but if we specify duplicates to be dropped only in columns a & b they will be dropped

   a  b  c
0  x  1  0
2  x  2  2
3  y  3  3
5  y  4  5
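The list of columns is the subset parameter, and it can be passed by name to both .duplicated() and .drop_duplicates() and combined with keep. A sketch recreating the three-column `data` DataFrame:

```python
import pandas as pd

data = pd.DataFrame({'a': ['x'] * 3 + ['y'] * 4,
                     'b': [1, 1, 2, 3, 3, 4, 4],
                     'c': range(7)})

# duplicates judged only on columns a and b; column c is ignored
dupes_ab = data.duplicated(subset=['a', 'b'])

# keep the last row seen for each distinct value of column a
last_per_a = data.drop_duplicates(subset='a', keep='last')

print(dupes_ab)
print(last_per_a)
```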

## <div class= "alert alert-info"> Transforming Data
<div class= "alert alert-success">
Another part of tidying data involves transforming existing data into another presentation. This may be needed for the following reasons:
    
- Values are not in the correct units
    
- Values are qualitative and need to be converted to appropriate numeric values
    
- There is extraneous data that either wastes memory and processing time, or can affect results simply by being included
    
To address these situations, we can take one or more of the following actions:
    
- Map values to other values using a table lookup process
    
- Explicitly replace certain values with other values (or even another type of data)
    
- Apply methods to transform the values based on an algorithm
    
- Simply remove extraneous columns and rows
    
We have already seen how to delete rows and columns with several techniques, so we will not reiterate those here. We will cover the facilities provided by pandas for mapping, replacing, and applying functions to transform data based upon its content.    
    

<div class= "alert alert-success">
    
### <div class= "alert alert-warning"> Mapping
    
One of the basic tasks in data transformations is mapping of a set of values to another set. pandas provides a generic ability to map values using a lookup table (via a Python dictionary or a pandas Series) using the .map() method. This method
performs the mapping by matching the values of the outer Series with the index labels of the inner Series, and returning a new Series with the index labels of the outer Series but the values from the inner Series:

In [81]:
x = pd.Series({"one": 1, "two": 2, "three": 3})  # create two Series objects to demonstrate mapping
y = pd.Series({1: "a", 2: "b", 3: "c"})
x, y

(one      1
 two      2
 three    3
 dtype: int64,
 1    a
 2    b
 3    c
 dtype: object)
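Calling .map() on x with y as the lookup table then performs the mapping: each value of x is looked up among the index labels of y. The sketch below also shows a dictionary used as the lookup table, where a value with no matching key becomes NaN:

```python
import pandas as pd

x = pd.Series({"one": 1, "two": 2, "three": 3})
y = pd.Series({1: "a", 2: "b", 3: "c"})

# values of x (1, 2, 3) are matched against the index labels of y;
# the result keeps the index labels of x
mapped = x.map(y)

# a dictionary works the same way; 3 has no key, so 'three' maps to NaN
partial = x.map({1: "a", 2: "b"})

print(mapped)
print(partial)
```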

<div class= "alert alert-success">
    
### <div class= "alert alert-warning"> Replacing values
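As a brief sketch of explicitly replacing values, the `.replace()` method substitutes one value for another, or several at once using a dictionary. The sentinel value -999 below is a hypothetical placeholder for "not recorded", chosen only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, -999])

# replace a single value; here a hypothetical sentinel becomes NaN
cleaned = s.replace(-999, np.nan)

# replace several values at once using a dictionary of old -> new
multi = s.replace({0: 10, 1: 100})

print(cleaned)
print(multi)
```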

<div class= "alert alert-success">
    
### <div class= "alert alert-warning"> Applying functions to transform data
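As a brief sketch of applying functions, `.apply()` on a DataFrame calls a function once per column by default, or once per row with axis=1; on a Series it calls the function once per element. The DataFrame below is a small hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({"c1": [0, 3, 6], "c2": [1, 4, 7]})

# applied to each column: the range (max - min) of each column
col_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1 applies the function to each row instead
row_sums = df.apply(lambda row: row.sum(), axis=1)

# on a Series, the function is applied to each element
doubled = df["c1"].apply(lambda v: v * 2)

print(col_ranges)
print(row_sums)
print(doubled)
```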

### <div class= "alert alert-danger">