**Import Libraries and Read Data**

* Imports necessary libraries: `NumPy`, `Pandas`, and `Plotly` Express.
* Sets a display option for `Pandas` to format floating-point numbers.
* Reads four CSV files into `Pandas` DataFrames (`expectancy`, `fertility`, `population`, and `metadata`).

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

# Set display option for Pandas
pd.set_option('display.float_format', lambda x: '%.2f' % x)

# Read Data
expectancy = pd.read_csv("/content/drive/MyDrive/Prepinsta winter internship/week 4/life_expectancy.csv")
fertility = pd.read_csv("/content/drive/MyDrive/Prepinsta winter internship/week 4/fertility_rate.csv")
population = pd.read_csv("/content/drive/MyDrive/Prepinsta winter internship/week 4/country_population.csv")
metadata = pd.read_csv("/content/drive/MyDrive/Prepinsta winter internship/week 4/Metadata_Country.csv")
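
The column names renamed in the later cleaning cells (for example `ï»¿"Country Name"`) look like a UTF-8 byte-order mark decoded as single-byte characters. The following is a minimal sketch, assuming that is the cause, of reading such a file with `encoding='utf-8-sig'` so that the header arrives clean. The in-memory bytes are made up for illustration and stand in for the real CSV files; if the stray characters are stored in the files themselves, this option will not remove them.

```python
import io

import pandas as pd

# Hypothetical sample: a UTF-8 BOM followed by a quoted header row,
# standing in for one of the real CSV files
raw = b'\xef\xbb\xbf"Country Name","Country Code","1960"\n"Aruba","ABW",54211\n'

# 'utf-8-sig' strips the byte-order mark before the header is parsed
sample = pd.read_csv(io.BytesIO(raw), encoding='utf-8-sig')
print(list(sample.columns))
```

If the real files are read this way, the `rename` calls in the cleaning cells simply find nothing to rename, because `DataFrame.rename` ignores missing keys by default.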

**Data Cleaning - Population DataFrame**

* Cleans the `population` DataFrame by removing unnecessary columns, renaming columns, filtering rows, standardizing column names, and replacing NaN values with row medians.

In [None]:
# Population DataFrame
# Drop unnecessary columns
columns_to_remove = ['Indicator Name', 'Indicator Code']
population.drop(columns=columns_to_remove, inplace=True)

# Rename the 'Country Name' column
population.rename(columns={'ï»¿"Country Name"': 'Country Name'}, inplace=True)

# Filter out rows with 'Country Name' as 'Not classified'
population = population[population['Country Name'] != 'Not classified']

# Standardize column names
population.columns = population.columns.str.lower().str.replace(' ', '_')

# Replace NaN values with the median of each row for columns starting from the third column
population.iloc[:, 2:] = population.iloc[:, 2:].apply(lambda row: row.fillna(row.median()), axis=1)

# Display the updated DataFrame
population.head()
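
The row-median fill can be checked on a small frame. This sketch uses a made-up `toy` DataFrame with the same layout as `population` (two identifier columns followed by one column per year) and applies the same row-wise `fillna` call.

```python
import numpy as np
import pandas as pd

# Made-up data: two id columns, then one column per year
toy = pd.DataFrame({
    'country_name': ['A', 'B'],
    'country_code': ['AAA', 'BBB'],
    '1960': [10.0, np.nan],
    '1961': [np.nan, 40.0],
    '1962': [30.0, 60.0],
})

year_cols = toy.columns[2:]

# For each row, fill missing years with the median of that row's observed years
toy[year_cols] = toy[year_cols].apply(lambda row: row.fillna(row.median()), axis=1)
print(toy)
```

Country `A` has observed values 10 and 30, so its gap is filled with 20; country `B` has 40 and 60, so its gap is filled with 50. A row with no observed values at all would stay `NaN`, since its median is undefined.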

**Data Cleaning - Fertility DataFrame**

* Cleans the `fertility` DataFrame by removing unnecessary columns, renaming the country name column, filtering out unclassified rows, and standardizing column names.

In [None]:
# Fertility DataFrame

fertility.drop(['Indicator Name', 'Indicator Code'], axis=1, inplace=True)
fertility.rename(columns={'ï»¿"Country Name"': 'Country Name'}, inplace=True)
fertility = fertility[fertility['Country Name'] != 'Not classified']
fertility.columns = fertility.columns.str.lower().str.replace(' ', '_')
fertility.head()

**Data Cleaning - Expectancy DataFrame**

* Cleans the `expectancy` DataFrame by removing unnecessary columns, renaming the country name column, filtering out unclassified rows, and standardizing column names.

In [None]:
# Expectancy DataFrame
expectancy.drop(['Indicator Name', 'Indicator Code'], axis=1, inplace=True)
expectancy.rename(columns={'ï»¿"Country Name"': 'Country Name'}, inplace=True)
expectancy = expectancy[expectancy['Country Name'] != 'Not classified']
expectancy.columns = expectancy.columns.str.lower().str.replace(' ', '_')
expectancy.head()
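
The `fertility` and `expectancy` cells repeat the same four steps, so they could be folded into one function. The sketch below is one way to do that; `clean_indicator_frame` is a hypothetical helper name, not part of the notebook, and the sample rows are made up.

```python
import pandas as pd


def clean_indicator_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the cleaning steps used in this notebook."""
    # Drop the indicator columns that carry no per-country information
    df = df.drop(columns=['Indicator Name', 'Indicator Code'])
    # Repair the header that was damaged by the byte-order mark
    df = df.rename(columns={'ï»¿"Country Name"': 'Country Name'})
    # Remove the placeholder row for unclassified entries
    df = df[df['Country Name'] != 'Not classified'].copy()
    # Standardize column names: lower case, underscores instead of spaces
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    return df


# Made-up rows with the same columns as the raw indicator files
raw = pd.DataFrame({
    'ï»¿"Country Name"': ['Aruba', 'Not classified'],
    'Country Code': ['ABW', 'INX'],
    'Indicator Name': ['x', 'x'],
    'Indicator Code': ['y', 'y'],
    '1960': [4.82, None],
})

cleaned = clean_indicator_frame(raw)
print(cleaned)
```

Because the helper returns a new DataFrame instead of modifying its input, the cell can be rerun without failing on columns that were already dropped.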

**Data Cleaning - Metadata DataFrame**

* Cleans the `metadata` DataFrame by removing unnecessary columns, renaming columns, shifting a column, and standardizing column names.

In [None]:
# Metadata DataFrame
columns_to_remove = ['SpecialNotes','Unnamed: 5']
metadata = metadata.drop(columns=columns_to_remove)
# Renaming columns
column_rename_dict = {
    'ï»¿"Country Code"': 'Country Code',
    'IncomeGroup': 'Income Group',
    'TableName': 'Country Name'
}
metadata.rename(columns=column_rename_dict, inplace=True)

# Shifting column
column_to_shift = metadata.pop('Country Name')
metadata.insert(0, 'Country Name', column_to_shift)

# Standardizing column names
metadata.columns = metadata.columns.str.lower().str.replace(' ', '_')

metadata.head()

**Merging DataFrames and Data Transformation**

* Merges the `population` DataFrame with selected columns from the `metadata` DataFrame based on `'country_name'`.
* Performs data transformation by melting and sorting for the `population` DataFrame.

In [None]:
# Columns to merge
columns_to_merge = ['country_name', 'region']

# Merging DataFrames
population_metadata = pd.merge(population, metadata[columns_to_merge], on='country_name')

# Move 'region' to the third position, right after 'country_code'
column_to_shift = population_metadata.pop('region')
population_metadata.insert(2, 'region', column_to_shift)

# Melting and Sorting Data
melted_data_population = population_metadata.melt(
    id_vars=['country_name', 'country_code', 'region'],
    var_name='year',
    value_name='population'
)

sorted_data_population = melted_data_population.sort_values(by=['country_name', 'year'])
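
The melt step turns one column per year into one row per country and year. A minimal sketch on made-up data, using the same `melt` and `sort_values` arguments as the cell above:

```python
import pandas as pd

# Made-up wide table: one row per country, one column per year
wide = pd.DataFrame({
    'country_name': ['A', 'B'],
    'country_code': ['AAA', 'BBB'],
    'region': ['R1', 'R2'],
    '1960': [10, 40],
    '1961': [20, 50],
})

# Wide to long: the year column names become values in a 'year' column
long = wide.melt(
    id_vars=['country_name', 'country_code', 'region'],
    var_name='year',
    value_name='population'
)

# Sort so that each country's years appear together, in order
long = long.sort_values(by=['country_name', 'year']).reset_index(drop=True)
print(long)
```

The `year` values are still strings at this point because they come from column names; the notebook converts them to integers in a later cell.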


**Merging DataFrames (Continued)**

* Merges the melted and sorted DataFrames for `population` and `fertility` based on `'country_name'` and `'year'`.

In [None]:
# Merging DataFrames (Continued)
melted_data_fertility = fertility.melt(id_vars=['country_name','country_code'], var_name='year', value_name='fertility')
sorted_data_fertility = melted_data_fertility.sort_values(by=['country_name','year'])
columns_to_merge = ['fertility','country_name','year']
population_fertility = sorted_data_population.merge(sorted_data_fertility[columns_to_merge], on=['country_name','year'])


**Merging DataFrames (Continued)**

* Merges the sorted DataFrames for `population_fertility` and `expectancy` based on `'country_name'` and `'year'`.

In [None]:
# Merging DataFrames (Continued)
melted_data_expectancy = expectancy.melt(id_vars=['country_name','country_code'], var_name='year', value_name='expectancy')
sorted_data_expectancy = melted_data_expectancy.sort_values(by=['country_name','year'])
columns_to_merge = ['expectancy','country_name','year']
final = population_fertility.merge(sorted_data_expectancy[columns_to_merge], on=['country_name','year'])
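
The merges above use the default inner join, so any country and year pair that is missing from either side is dropped without warning. The sketch below shows one way to inspect what would be lost, using an outer join with `indicator=True` on made-up frames; it is a diagnostic aid under those assumptions, not a step in the notebook.

```python
import pandas as pd

# Made-up long tables: country 'B' has no fertility data, country 'C' has no population data
left = pd.DataFrame({
    'country_name': ['A', 'A', 'B'],
    'year': ['1960', '1961', '1960'],
    'population': [10, 20, 40],
})
right = pd.DataFrame({
    'country_name': ['A', 'A', 'C'],
    'year': ['1960', '1961', '1960'],
    'fertility': [6.1, 6.0, 3.2],
})

# An outer join keeps every row; the '_merge' column records where each row came from
check = left.merge(right, on=['country_name', 'year'], how='outer', indicator=True)

# Rows that an inner join would have dropped
unmatched = check[check['_merge'] != 'both']
print(unmatched[['country_name', 'year', '_merge']])
```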


**Data Cleaning (Continued)**

* Continues data cleaning by dropping rows with `NaN` values in the `'region'` or `'population'` columns, rounding `'fertility'` and `'expectancy'` values to two decimals, and converting the `'year'` column to an integer.

In [None]:
# Data Cleaning (Continued)
# Drop rows that have no region or no population value
# (no zero-fill is needed afterwards, since no missing populations remain)
final = final.dropna(subset=['region', 'population'])
final['fertility'] = final['fertility'].round(decimals=2)
final['expectancy'] = final['expectancy'].round(decimals=2)
final['year'] = final['year'].astype(int)



**Data Visualization by Country Name**

1. **Data Visualization with Plotly Express:**
   - Uses Plotly Express (`px`) to create an animated scatter plot (`fig_fertility_expectation_region`) based on the `final` DataFrame.

2. **Scatter Plot Parameters:**
   - X-axis represents the fertility rate (`'fertility'`).
   - Y-axis represents the life expectancy at birth (`'expectancy'`).
   - Bubble size is determined by the population of each country, with a maximum bubble size set to 50.
   - Each point on the plot is associated with a country, and hovering over a point displays the country's name.

3. **Color and Animation:**
   - The plot is colored based on the 'region' column, allowing visual differentiation of regions.
   - Animation is applied over the 'year' column, showcasing changes in fertility rates and life expectancies over time.

4. **Plot Layout Customization:**
   - The template 'plotly_dark' is used for a dark background.
   - Customizes the plot layout with a title, x-axis title, and y-axis title.

5. **Displaying the Plot:**
   - Displays the animated scatter plot (`fig_fertility_expectation_region`) using the `show()` method.

In [None]:
# Data Visualization
fig_fertility_expectation_region = px.scatter(
    data_frame=final,
    x='fertility',
    y='expectancy',
    size='population',
    size_max=50,
    hover_name='country_name',
    color='region',
    animation_frame='year',
    animation_group='country_name',
    template='plotly_dark',
    range_x=[0, 10],
    range_y=[10, 90]
)

fig_fertility_expectation_region.update_layout(
    title='Fertility Rate vs. Life Expectancy',
    xaxis_title='Fertility Rate - Total [Births per Woman]',
    yaxis_title='Life Expectancy at Birth - Total'
)

fig_fertility_expectation_region.show()


**Data Visualization by Region**

1. **Import Plotly Express:**
   - Imports the Plotly Express library as `px`, a high-level interface for creating interactive visualizations.

2. **Grouping Data:**
   - Groups the `final` DataFrame by `'region'` and `'year'` and calculates the mean of each group with `groupby` and `mean`. The result is stored in a new DataFrame named `grouped_data`.
   - The index is then reset so that `'region'` and `'year'` become regular columns again.

3. **Creating the Scatter Plot:**
   - Uses Plotly Express to create an animated scatter plot (`fig_fertility_expectation_region_grouped`) based on the grouped data.
   - X-axis represents the mean fertility rate (`'fertility'`).
   - Y-axis represents the mean life expectancy at birth (`'expectancy'`).
   - Bubble size is determined by the mean population of each group, with a maximum bubble size set to 50.
   - The plot is colored and labeled by the 'region' column.
   - Animation is applied over the 'year' column, showing changes in fertility and life expectancy over time for each region.
   - The template 'plotly_dark' is used for a dark background.

4. **Setting the Plot Layout:**
   - Sets the plot title to 'Grouped Fertility Rate vs. Life Expectancy by Region'.
   - Defines custom labels for the x and y axes.

5. **Displaying the Plot:**
   - Displays the animated scatter plot (`fig_fertility_expectation_region_grouped`) using the `show()` method.

In [None]:
import plotly.express as px

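# Group by region and year, then average the numeric columns only
# (text columns such as country_name cannot be averaged)
grouped_data = final.groupby(['region', 'year'])[['population', 'fertility', 'expectancy']].mean().reset_index()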

# Create the scatter plot using Plotly Express with animation
fig_fertility_expectation_region_grouped = px.scatter(
    data_frame=grouped_data,
    x='fertility',
    y='expectancy',
    size='population',
    size_max=50,
    hover_name='region',
    color='region',
    animation_frame='year',  # Animate over the years
    animation_group='region',  # Animation grouping
    template='plotly_dark',
    range_x=[0, 10],
    range_y=[10, 90],
    title='Grouped Fertility Rate vs. Life Expectancy by Region',
    labels={'fertility': 'Fertility Rate - Total [Births per Woman]', 'expectancy': 'Life Expectancy at Birth - Total'}
)

fig_fertility_expectation_region_grouped.show()
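
The grouping step can be checked on a small frame. This sketch uses made-up data and selects the numeric columns before calling `mean`, because recent pandas versions raise an error when asked to average text columns such as `country_name`.

```python
import pandas as pd

# Made-up long table: two countries in one region, two years
toy = pd.DataFrame({
    'country_name': ['A', 'B', 'A', 'B'],
    'region': ['R1', 'R1', 'R1', 'R1'],
    'year': [1960, 1960, 1961, 1961],
    'population': [10.0, 30.0, 20.0, 40.0],
    'fertility': [6.0, 4.0, 5.0, 3.0],
})

# Average only the numeric columns within each (region, year) group
grouped = (
    toy.groupby(['region', 'year'])[['population', 'fertility']]
    .mean()
    .reset_index()
)
print(grouped)
```

Each row of `grouped` is an unweighted mean across countries, so a small country counts as much as a large one in the regional fertility figure.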
