## ![logo](../../img/license_header_logo.png)
> **Copyright &copy; 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). <br>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0**

# <a name="top">05 - Exercise for Fundamentals of Pandas & Basic Functions</a>
Authored by: Scotrraaj Gopal - scotrraaj.gopal@certifai.ai

## <a name="description">Notebook Description</a>

Putting knowledge into practice is one of the best ways to consolidate it. This notebook contains exercises that test your understanding and help you identify which topics need more review.

By the end of this exercise, you will be able to:

1. Relate and apply the fundamental methods in `pandas`.
2. Explain and use the basic functions in `pandas`.

## Notebook Outline
Here's the outline for this tutorial:
1. [Notebook Description](#description)
2. [Notebook Configurations](#configuration)
3. [Section I - Fundamentals of `pandas`](#fundamentals)
4. [Section II - Basic Functions](#basic)
5. [Summary](#summary)
6. [Reference](#reference)

## <a name="configuration">Notebook Configurations</a>

**Task 0:** Import `pandas` with alias `pd`.

**Expected output:**
>![05-00](../../img/pandas/05-00.png)

In [None]:
### BEGIN SOLUTION
import pandas as pd
### END SOLUTION


dir(pd)

## <a name="fundamentals">Section I - Fundamentals of `pandas`</a>

**Task I-1:** Create a `DataFrame` object called `groceries` that looks like the expected output.

**Expected Output:**
>![05-01](../../img/pandas/05-01.png)

In [None]:
### BEGIN SOLUTION
groceries = pd.DataFrame({"Eggs": [12], "Milk":[3]})
### END SOLUTION


groceries
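
As a quick check on the result, the short sketch below inspects how the dictionary passed to `pd.DataFrame` maps to the table. The data mirrors the solution above; the inspection calls are only one way to verify the structure.

```python
import pandas as pd

# Each dictionary key becomes a column label; each list holds that column's values.
groceries = pd.DataFrame({"Eggs": [12], "Milk": [3]})

print(groceries.shape)            # (number of rows, number of columns)
print(list(groceries.columns))    # column labels taken from the dictionary keys
print(groceries.index.tolist())   # default integer index, starting at 0
```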

**Task I-2:** Create a `DataFrame` object called `grocery_sales` that looks like the expected output.

**Expected Output:**
>![05-02](../../img/pandas/05-02.png)

In [None]:
### BEGIN SOLUTION
grocery_sales = pd.DataFrame({"Eggs": [12], "Milk":[3]}, index = ["2019 Sales", "2020 Sales"])
### END SOLUTION


grocery_sales

**Task I-3:** Create a `Series` object called `recipe` that looks like the expected output.

**Expected Output:**
>![05-03](../../img/pandas/05-03.png)

In [None]:
### BEGIN SOLUTION
recipe = pd.Series({"Eggs": "4 Grade A's", 
                    "Milk":"1.5 cups", 
                    "Flour":"4 cups", 
                    "Baking Powder": "2 teaspoons"}, name="Plain Cake")
### END SOLUTION


recipe
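
A `Series` built from a dictionary uses the keys as its index, so individual entries can be looked up by label. A minimal sketch, using a shortened version of the recipe above:

```python
import pandas as pd

# Dictionary keys become the index labels; `name` labels the Series itself.
recipe = pd.Series({"Eggs": "4 Grade A's", "Milk": "1.5 cups"}, name="Plain Cake")

print(recipe.name)             # the Series name
print(recipe["Milk"])          # label-based lookup
print(recipe.index.tolist())   # index labels, in insertion order
```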

**Task I-4:** Load the CSV file at `../../Datasets/pandas/winemag-data-130k-v2.csv` into the `csv_data` variable.

**Expected Output:**
>![05-04](../../img/pandas/05-04.png)

In [None]:
### BEGIN SOLUTION
csv_data = pd.read_csv("../../Datasets/pandas/winemag-data-130k-v2.csv", index_col=0)
### END SOLUTION


csv_data
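
The `index_col=0` argument tells `read_csv` to use the first column of the file as the row index instead of creating a new one. The sketch below shows the difference on a tiny in-memory CSV; the two rows are made up for illustration and only imitate the layout of a file whose first column has an empty header.

```python
import io

import pandas as pd

# Hypothetical two-row CSV whose first column is an unnamed row counter.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))      # the counter column is used as the index
print(list(without_index.columns))   # the counter shows up as an extra column
```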

**Task I-5:** Save the `grocery_sales` variable to a CSV file named `sales.csv`, then open `sales.csv`.

**Expected Output:**
>![05-05](../../img/pandas/05-05.png)

In [None]:
### BEGIN SOLUTION
grocery_sales.to_csv("sales.csv")
### END SOLUTION
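
One way to confirm what `to_csv` writes is a round trip: serialize the table, read it back, and compare. The sketch below keeps everything in memory by calling `to_csv()` without a path, which returns the CSV text; the frame and its values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame with a labelled index.
frame = pd.DataFrame({"Eggs": [12, 15]}, index=["2019 Sales", "2020 Sales"])

csv_text = frame.to_csv()   # no path given, so the CSV text is returned
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(csv_text)
print(restored.equals(frame))
```

The index is written as the first column of the file, which is why `index_col=0` is needed when reading it back.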


## <a name="basic">Section II - Basic Functions</a>
**Task II-1:** Load the CSV file at `../../Datasets/pandas/winemag-data-130k-v2.csv` into the `wine` variable and show only the first three rows.

**Expected Output:**
>![05-04](../../img/pandas/05-04.png)

In [None]:
### BEGIN SOLUTION
wine = pd.read_csv("../../Datasets/pandas/winemag-data-130k-v2.csv", index_col=0)
wine.head(3)
### END SOLUTION
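
`head` returns the first five rows unless a different count is given, and `tail` does the same from the end. A small sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical frame with ten rows of scores.
reviews = pd.DataFrame({"points": list(range(80, 90))})

first_default = reviews.head()    # no argument: first 5 rows
first_three = reviews.head(3)     # explicit count: first 3 rows
last_two = reviews.tail(2)        # last 2 rows

print(first_three)
```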


**Task II-2:** Obtain the median of the `points` column in the `wine` object.

**Expected Output:**

> 88.0

In [None]:
### BEGIN SOLUTION
wine.points.median()
### END SOLUTION
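
The median is the middle value of the sorted data, so it is less sensitive to extreme values than the mean. The sketch below uses five hypothetical scores and also shows that attribute access (`df.points`) and bracket access (`df["points"]`) select the same column.

```python
import pandas as pd

# Hypothetical scores; 100 pulls the mean up but not the median.
df = pd.DataFrame({"points": [80, 85, 88, 90, 100]})

median_points = df.points.median()
mean_points = df["points"].mean()

print(median_points, mean_points)
```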


**Task II-3:** List out the countries in the dataset without any duplicates.

**Expected Output:**

>![05-06](../../img/pandas/05-06.png)

In [None]:
### BEGIN SOLUTION
wine.country.unique()
### END SOLUTION
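
`unique` returns the distinct values in order of first appearance, and a missing value counts as one of them. A minimal sketch on hypothetical data, with `nunique` shown for comparison since it skips missing values by default:

```python
import pandas as pd

# Hypothetical country column with one missing entry.
countries = pd.Series(["Italy", "US", "Italy", None, "US"])

distinct = countries.unique()                  # includes the missing value
distinct_present = countries.dropna().unique() # missing value removed first

print(distinct)
print(countries.nunique())                     # missing values are not counted
```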


**Task II-4:** Obtain the number of times that each country appears in the dataset.

**Expected Output:**

>![05-07](../../img/pandas/05-07.png)

In [None]:
### BEGIN SOLUTION
wine.country.value_counts()
### END SOLUTION
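
`value_counts` sorts the result from most to least frequent and leaves out missing values unless `dropna=False` is passed. The sketch below demonstrates both behaviours on hypothetical data.

```python
import pandas as pd

# Hypothetical country column with one missing entry.
countries = pd.Series(["US", "Italy", "US", None, "US"])

counts = countries.value_counts()                   # missing entry ignored
counts_with_missing = countries.value_counts(dropna=False)

print(counts)
print(counts_with_missing)
```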


**Task II-5:** Apply the centering transformation to the `price` column and save the result in the `centered_price` variable. Centering is done by **subtracting the mean of the column from every data point**.<br><br>


<text><center>*Centering is a common step when preprocessing data for machine learning applications.*</center></text>

**Expected Output:**

>![05-08](../../img/pandas/05-08.png)

In [None]:
### BEGIN SOLUTION
centered_price = wine.price - wine.price.mean()
centered_price.name = "centered_price"
### END SOLUTION


centered_price
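
After centering, the mean of the new column should be zero up to floating-point error. The sketch below checks this on four hypothetical prices, one of them missing, to show that `mean` skips missing values by default and that a missing price stays missing after the subtraction.

```python
import pandas as pd

# Hypothetical prices; None stands in for a missing price.
price = pd.Series([10.0, 20.0, None, 30.0], name="price")

mean_price = price.mean()         # computed over the three known prices
centered = price - mean_price     # the scalar is subtracted from every row
centered.name = "centered_price"

print(centered)
print(centered.mean())
```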

## <a name="summary">Summary</a>
To conclude, you should now be able to:

1. Relate and apply the fundamental methods in `pandas`.
2. Explain and use the basic functions in `pandas`.

Congratulations, you have completed this exercise!

## <a name="reference">Reference</a>
* [Dataset Source](https://www.kaggle.com/zynicide/wine-reviews)
* [Question Reference](https://www.kaggle.com/learn/pandas)

<font size=2>[Back to Top](#top)</font>