<div style="text-align: center;">
  <h1>To stay or not to stay — that is the question!</h1>
  <img src="data/img.jpg">
</div>

### Introduction

In previous lessons, we learned how to remove data and handle missing values.
In this exercise, we put those skills into practice by writing code!

This exercise is divided into two parts:
- The first part is focused on **preprocessing**.
- The second part involves **answering several analytical questions** based on the data.

The file `migration_rate.csv` contains statistics for **immigration to** and **emigration from** various countries from **1990 to 2020**.
The goal of this exercise is to analyze the **migration rate** for several countries listed in this file.

The table below describes the columns in this dataset.

| Column Name | Column Description |
|:-----------:|:------------------:|
| Country     | Name of the country |
| 1990        | Migration rate for each country in the year 1990 |
| 1995        | Migration rate for each country in the year 1995 |
| 2000        | Migration rate for each country in the year 2000 |
| 2005        | Migration rate for each country in the year 2005 |
| 2010        | Migration rate for each country in the year 2010 |
| 2015        | Migration rate for each country in the year 2015 |
| 2020        | Migration rate for each country in the year 2020 |


### Migration Rate Formula

The migration rate is net migration (immigration minus emigration) in a given year, divided by the country's population in that year and expressed per 1,000 people. See the formula below:

$$
\text{migration rate} = \frac{(\text{immigration} - \text{emigration}) \times 1000}{\text{population}}
$$

> 💬  This formula is not required for solving the problem, and is only provided for better understanding.

👺 **Is the migration rate always positive?**
No. The migration rate can be negative, zero, or positive.
A negative rate means that more people left the country than entered it (net emigration), while a positive rate means that more people entered than left (net immigration).
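
To make the sign of the rate concrete, here is a minimal sketch of the formula applied to two sets of figures. The function name `migration_rate` and all numbers are hypothetical and chosen only for illustration; they do not come from the dataset.

```python
def migration_rate(immigration: int, emigration: int, population: int) -> float:
    """Net migration per 1,000 people (hypothetical helper, not part of the exercise)."""
    return (immigration - emigration) * 1000 / population


# Invented figures: the same country size, with the two flows swapped.
rate_gain = migration_rate(immigration=50_000, emigration=20_000, population=10_000_000)
rate_loss = migration_rate(immigration=20_000, emigration=50_000, population=10_000_000)

print(rate_gain)  # 3.0  -> more arrivals than departures
print(rate_loss)  # -3.0 -> more departures than arrivals
```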

## Dataset

In [1]:
import numpy as np
import pandas as pd

In [None]:
# TODO: read csv file in df
df = None
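
As a rough illustration of how missing values appear after loading, the sketch below reads a small in-memory CSV instead of the real file. The country names and values are invented, and the layout of `migration_rate.csv` is assumed to match the column table above.

```python
import io

import pandas as pd

# Invented sample: empty fields stand in for missing migration rates.
sample_csv = io.StringIO(
    "Country,1990,1995\n"
    "Aland,1.5,\n"
    "Borduria,,\n"
)

sample = pd.read_csv(sample_csv)

# Empty fields are parsed as NaN, so they show up in the null counts.
print(sample.isna().sum())
```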





## Part One: Preprocessing

Before performing any analysis, you need to handle missing values in the dataset and remove some data.

Follow the steps below in the preprocessing stage:

- Permanently remove countries from the dataset if their migration rate is missing for all 7 years (i.e., no migration data is recorded for any year).

- For countries that are missing data for some years, fill the missing values with the **mean migration rate of that specific country**.
  For example, if a country's migration rates from 1995 to 2020 are:
  `-5.2`, `-3.4`, `-0.7`, `0.9`, `1.3`, `2.5`,
  then the missing value for 1990 should be filled with approximately `-0.77`, which is the average of the six available values.

### Step One of Preprocessing

Permanently remove countries from the dataset if their migration rate is missing for all 7 years (i.e., no migration data is recorded for any year).

💡 **Hint:** <br>
To remove rows that have no recorded data, use the `thresh` argument of the `dropna` method.

The value you pass to this argument is an integer that specifies the minimum number of non-null values a row must contain to be kept.
In other words, a row is dropped when it has fewer non-null values than this threshold.
Note that each row in this DataFrame contains at least one value: the country name.


In [None]:
# TODO: drop some rows

👺 **Why are rows with some non-null values still being dropped?**

One of the most common mistakes in this question involves the `thresh` argument of the `dropna` method.
`thresh` counts non-null values, not null ones: if it is set to 3, only rows with at least 3 non-null values are retained.

In this question, since each row also contains a country name,
you should set the `thresh` value such that all rows with **at least one valid data point** (in addition to the country name) are kept.
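
The behaviour of `thresh` can be checked on a small toy frame. This is only a sketch: the frame `toy`, its country labels, and its values are invented, and it has two year columns instead of seven.

```python
import numpy as np
import pandas as pd

# Invented data: row C has no migration data at all.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2015": [1.2, np.nan, np.nan],
    "2020": [0.4, -2.0, np.nan],
})

# thresh counts NON-null cells per row, and the Country name counts as one.
# Non-null cells per row: A has 3, B has 2, C has 1.
kept = toy.dropna(thresh=2)

print(kept["Country"].tolist())  # ['A', 'B']
```

Row C is dropped because its only non-null cell is the country name.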

### Step Two of Preprocessing

Fill in the missing values for countries that have only **some** migration data missing
by replacing them with the **mean migration rate for that specific country**.

For example, if a country's migration rates from 1995 to 2020 are:
`-5.2`, `-3.4`, `-0.7`, `0.9`, `1.3`, and `2.5`,
then the missing value for the year 1990 should be filled with the average of these six values: approximately `-0.77`.

> 💬 This ensures that no missing values remain for countries with at least some available data.

In [None]:
# TODO: fill null values of rows

👺 **Why do I get an error when calculating the mean?**

Note that the `Country` column contains string values,
and it should not be included when calculating the mean.
For example, you can use the following code:

```python
# Row-wise mean over the year columns only (Country is excluded)
df.drop(columns=['Country']).mean(axis=1)
```
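
One possible way to use such row-wise means for filling is sketched below on a single-row toy frame that reuses the example values given earlier. The names `toy` and `year_cols` are hypothetical, and this is one approach among several, not necessarily the intended solution.

```python
import numpy as np
import pandas as pd

# Toy frame: one invented country whose 1990 value is missing.
toy = pd.DataFrame({
    "Country": ["A"],
    "1990": [np.nan],
    "1995": [-5.2],
    "2000": [-3.4],
    "2005": [-0.7],
    "2010": [0.9],
    "2015": [1.3],
    "2020": [2.5],
})

year_cols = toy.columns.drop("Country")

# For each row, replace NaN with the mean of that row's available values.
toy[year_cols] = toy[year_cols].apply(lambda row: row.fillna(row.mean()), axis=1)

print(round(toy.loc[0, "1990"], 2))  # -0.77
```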

After completing both preprocessing steps,
the number of missing values in **all columns** should be **zero**.

In [None]:
df.isna().sum()

## Part Two: Analysis

After completing the two preprocessing steps, answer the following questions.

By responding to these questions, we aim to gain clear insights into the migration trends in various countries.

### Question 1

Find the **names** of the three countries with the highest migration rate in the year 2020.
Store your answer in the list `top_countries`.
The first, second, and third elements of the list should correspond to the countries ranked 1st, 2nd, and 3rd in migration rate, respectively.

In [None]:
# TODO: find the top 3 countries
top_countries = None
print(top_countries)
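
Ranking rows by one column can be sketched with `DataFrame.nlargest`. The frame below is invented and picks the top two rows of a four-row toy table, so it illustrates the technique without using the real data.

```python
import pandas as pd

# Invented values for four hypothetical countries.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "2020": [1.5, 7.0, -2.0, 3.2],
})

# nlargest returns the rows sorted from highest to lowest value.
top_two = toy.nlargest(2, "2020")["Country"].tolist()

print(top_two)  # ['B', 'D']
```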

### Question 2

Calculate the average migration rate of **Iran** over this 30-year period (the mean of its seven recorded values) and store the result in the variable `iran_mean`.
The type of this variable should be `numpy.float64`.

💬 **Note:** The name of Iran in the dataset is written as `Iran (Islamic Republic of)`.

In [None]:
# TODO: find iran_mean
iran_mean = None
print(iran_mean)
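
Selecting one country's row and averaging its year columns might look like the sketch below. The data and the names `toy`, `year_cols`, and `b_mean` are invented. `Series.mean()` on float data typically already returns a `numpy.float64`; the explicit `np.float64(...)` wrapper is an assumption-free way to guarantee the type.

```python
import numpy as np
import pandas as pd

# Invented values for two hypothetical countries.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "2015": [1.0, 4.0],
    "2020": [3.0, 6.0],
})

year_cols = toy.columns.drop("Country")

# Boolean mask picks the row for country B; iloc[0] turns it into a Series.
row = toy.loc[toy["Country"] == "B", year_cols].iloc[0]
b_mean = np.float64(row.mean())

print(b_mean)        # 5.0
print(type(b_mean))
```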

### Question 3

Store the **name** of the country that experienced the highest growth over this 30-year period in the variable `highest_growth`.
This variable should be of type `str`.
Here, growth means the migration rate in 2020 minus the migration rate in 1990.

In [None]:
# TODO: find highest_growth
highest_growth = None
print(highest_growth)
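
The definition of growth can be sketched as a column difference followed by `idxmax`. The frame and the names `toy`, `growth`, and `winner` are invented for illustration.

```python
import pandas as pd

# Invented values for three hypothetical countries.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "1990": [2.0, -4.0, 0.5],
    "2020": [3.0, 1.0, -1.0],
})

# Growth per row: A = 1.0, B = 5.0, C = -1.5
growth = toy["2020"] - toy["1990"]

# idxmax gives the index label of the largest difference.
winner = toy.loc[growth.idxmax(), "Country"]

print(winner)  # B
```

Note that a country can have the highest growth while still having a negative rate in 2020, since only the difference matters.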