<a href="https://www.kaggle.com/code/amirbaniasadi/world-air-pollution-analysis-folium?scriptVersionId=140009360" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# **Introduction**

Air pollution, produced by burning fossil fuels and by household sources, is one of the issues threatening Earth's atmosphere and especially human health. It is therefore worth scrutinizing air pollution data from different countries to inform preventive measures.

Using the air pollution data well also requires other data: geographical data for visualization, and population figures from demographic data to produce complete and meaningful insights.

Therefore, population data is downloaded from https://github.com/datasets/population/blob/main/data/population.csv, country locations from https://gist.github.com/tadast/8827699, and the GeoJSON file with country border polygons from https://github.com/datasets/geo-countries/tree/master/data.

**Now it's time to import the required libraries and files:**

In [None]:
#Import required libraries

import pandas as pd
import numpy as np
import json

# Libraries for creating and using maps
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster, FeatureGroupSubGroup
from folium import plugins

# To display stuff in notebook
from IPython.display import display, Markdown

In [None]:
# Importing files
death_factor = pd.read_csv('/kaggle/input/air-pollution/number-of-deaths-by-risk-factor.csv')
death_air_pollution = pd.read_csv('/kaggle/input/air-pollution/death-rates-from-air-pollution.csv')
share_death = pd.read_csv('/kaggle/input/air-pollution/share-deaths-air-pollution.csv')
country_population = pd.read_csv('/kaggle/input/countries-popolation/population.csv')
country_loc_extraction = pd.read_csv('/kaggle/input/d/amirbaniasadi/countries-location/countries_codes_and_coordinates.csv')

# Read the GeoJSON file with the countries' borders
with open('/kaggle/input/world-geojson/countries.geojson', 'r') as file:
    countries_borders = json.load(file)

# **EDA**

In [None]:
death_air_pollution.head()

In [None]:
death_air_pollution.Year.unique()

**Based on the output above, the data covers the period 1990 to 2017.**

In [None]:
#changing the columns' names
death_air_pollution.columns = ['Country', 'Code', 'Year',
       'Air pollution',
       'Household air pollution from solid fuels',
       'Ambient particulate matter pollution',
       'Ambient ozone pollution']

In [None]:
death_factor['Entity'].unique().shape,death_air_pollution['Country'].unique().shape

In [None]:
death_air_pollution[death_air_pollution.Code.notnull()].Country.unique().shape

In [None]:
death_air_pollution[death_air_pollution.Code.isnull()].Country.unique().shape

**There are roughly 196 countries in the world, so the country or entity column must also include other territory types. For instance, Western Europe appears in this column but is clearly not a country. There are 35 such regions and, as shown above, they have no country code.**
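
The split between coded and uncoded entities can be sketched on a small toy table. The frame `toy` and its values are made up for illustration and only mimic the shape of `death_air_pollution`. Note that an aggregate such as the whole world can still carry a code, so a missing code is a useful but incomplete test.

```python
import pandas as pd

# Toy stand-in for death_air_pollution; names and values are illustrative only.
toy = pd.DataFrame({
    "Country": ["France", "France", "Western Europe", "World"],
    "Code": ["FRA", "FRA", None, "OWID_WRL"],
    "Year": [1990, 1991, 1990, 1990],
})

# Rows without a code are aggregate regions rather than countries.
is_region = toy["Code"].isna()

# Count distinct entities on each side of the split.
n_regions = toy.loc[is_region, "Country"].nunique()
n_coded = toy.loc[~is_region, "Country"].nunique()

print(n_regions, n_coded)
```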

In [None]:
share_death.head()

In [None]:
share_death.Entity.unique().shape

Based on the dataset description, this table shows the share of deaths attributed to air pollution among various causes in different countries from 1990 to 2017.

# **Geographical Visualization**

## **1. Total deaths from total air pollution**

In [None]:
stations_map = folium.Map(location=[20,100], tiles='cartodbpositron', zoom_start=2)
stations_map

In [None]:
# Average deaths per year over the period 1990-2017, grouped by country code
death_avg = pd.DataFrame(death_air_pollution.loc[:, ['Code', 'Air pollution']].groupby('Code').mean().round())

In [None]:
death_avg.head()

In [None]:
country_loc_extraction

In [None]:
country_loc_extraction = country_loc_extraction.loc[:,['Country','Alpha-3 code', 'Latitude (average)', 'Longitude (average)']]

In [None]:
country_loc_extraction['Alpha-3 code'] = country_loc_extraction['Alpha-3 code'].str[2:-1]
country_loc_extraction['Latitude (average)'] = country_loc_extraction['Latitude (average)'].str[2:-1]
country_loc_extraction['Longitude (average)'] = country_loc_extraction['Longitude (average)'].str[2:-1]

In [None]:
country_loc = country_loc_extraction.copy()

In [None]:
country_loc = country_loc.rename(columns={'Alpha-3 code': 'Country Code',
                                          'Latitude (average)': 'Latitude',
                                          'Longitude (average)': 'Longitude'})

In [None]:
country_population.head()

In [None]:
country_population.Year.unique()

**Since our death rate data covers 1990 to 2017, the population data should be extracted for the same period.**
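
As a minimal sketch of this step, the year filter and the per-country average can be tried on a toy table first. The frame `toy_pop` and its numbers are invented for illustration, and the column names are assumed to match `population.csv`.

```python
import pandas as pd

# Toy stand-in for country_population; the numbers are illustrative only.
toy_pop = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB"],
    "Year": [1989, 1990, 2017, 2000],
    "Value": [90.0, 100.0, 200.0, 50.0],
})

# Series.between is inclusive on both ends by default, so 1990 and 2017 are kept.
in_range = toy_pop["Year"].between(1990, 2017)

# Average population per country over the selected years.
avg = toy_pop.loc[in_range].groupby("Country Code", as_index=False)["Value"].mean()
print(avg)
```

Passing `as_index=False` keeps `Country Code` as a regular column, which avoids a separate `reset_index()` call.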

In [None]:
pop_data = country_population[(country_population.Year>=1990)&(country_population.Year<=2017)]

In [None]:
avg_pop = pd.DataFrame(pop_data.loc[:,['Country Code','Value']].groupby('Country Code').mean())
avg_pop = avg_pop.reset_index()

In [None]:
avg_pop.head()

In [None]:
country_loc.head()

**To have population data alongside location data, the two tables can be merged into a single, more informative table that is then used to build the map markers.**
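
Before relying on such a merge, it can help to check which codes fail to match. The sketch below uses two toy frames, `loc` and `pop`, with made-up values, and the `how="outer"` and `indicator=True` options of `pd.merge` to list the unmatched codes. The default inner merge silently drops those rows.

```python
import pandas as pd

# Toy stand-ins for country_loc and avg_pop; values are illustrative only.
loc = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"],
    "Latitude": [10.0, 20.0, 30.0],
    "Longitude": [1.0, 2.0, 3.0],
})
pop = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "DDD"],
    "Value": [100.0, 200.0, 300.0],
})

# An outer merge with indicator=True adds a '_merge' column that records
# whether each key was found in the left table, the right table, or both.
checked = pd.merge(loc, pop, on="Country Code", how="outer", indicator=True)

unmatched = checked.loc[checked["_merge"] != "both", "Country Code"].tolist()
matched = checked.loc[checked["_merge"] == "both"].drop(columns="_merge")

print(sorted(unmatched))
print(matched)
```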

In [None]:
country_stats = pd.merge(country_loc,avg_pop, on = 'Country Code')

In [None]:
country_stats = country_stats.rename(columns = {'Value':'Population'})

In [None]:
country_stats.head()

In [None]:
Choropleth(
    geo_data=countries_borders,
    data=death_avg['Air pollution'],
    key_on="feature.properties.ISO_A3",
    fill_color="OrRd",
    nan_fill_color="black",
    legend_name='Average of death rate from 1990 to 2017'
).add_to(stations_map)

In [None]:
def plotDot(point):
    # Add one clickable marker per country; the popup shows its average population
    folium.CircleMarker(location=[point.Latitude, point.Longitude],
                        radius=5,
                        weight=2,
                        tooltip="click me!",
                        popup=folium.Popup('Average population: ' + str(round(point.Population))),
                        fill_color="#000000").add_to(stations_map)

In [None]:
country_stats.apply(plotDot,axis=1)
stations_map.fit_bounds(stations_map.get_bounds())
stations_map

## **2. Death rate based on population of each country**

In [None]:
country_population.head()

In [None]:
avg_pop.head()

In [None]:
avg_pop = avg_pop.set_index('Country Code')

In [None]:
avg_pop.index.shape

In [None]:
death_avg.index.shape

In [None]:
len(set(list(avg_pop.index)).intersection(death_avg.index))

Two country codes in the death data have no population data.

In [None]:
for i in death_avg.index:
  if i not in avg_pop.index:
    print(i)

**One of them is Taiwan (`TWN`), which has no entry in the population data, and the other (`OWID_WRL`) is not a country but the whole world, so both can be dropped.**
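
The same check can be written without an explicit loop. This is a minimal sketch on toy indexes, assuming both tables are indexed by country code. `Index.difference` returns the codes present in the first index but absent from the second, and `drop` accepts the result directly.

```python
import pandas as pd

# Toy stand-ins for death_avg and the index of avg_pop; values are illustrative only.
toy_death = pd.DataFrame(
    {"Air pollution": [10.0, 20.0, 999.0, 5.0]},
    index=pd.Index(["AAA", "BBB", "OWID_WRL", "TWN"], name="Code"),
)
pop_index = pd.Index(["AAA", "BBB", "CCC"], name="Country Code")

# Codes that appear in the death table but have no population entry.
missing = toy_death.index.difference(pop_index)

# Drop all of them in a single call.
cleaned = toy_death.drop(index=missing)

print(list(missing))
print(cleaned)
```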

In [None]:
death_avg = death_avg.drop('OWID_WRL')

In [None]:
death_avg = death_avg.drop('TWN')

In [None]:
death_avg.head()

**One way to compare the severity of the problem across countries is to divide the number of deaths by each country's population, which gives a more sensible measure named death_percapita.**
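
As a sketch of this division, pandas can compute the ratio for all countries at once by aligning the two tables on the country code. The frames `deaths` and `population` below are toy data invented for illustration. The notebook's own loop over the index produces the same kind of result row by row.

```python
import pandas as pd

# Toy stand-ins for death_avg and avg_pop, both indexed by country code.
deaths = pd.DataFrame(
    {"Air pollution": [50.0, 30.0]},
    index=pd.Index(["AAA", "BBB"], name="Code"),
)
population = pd.DataFrame(
    {"Value": [1000.0, 300.0, 500.0]},
    index=pd.Index(["BBB", "AAA", "CCC"], name="Country Code"),
)

# Reindex the population to the death table's codes so rows line up by code,
# not by position, then divide element-wise.
aligned_pop = population["Value"].reindex(deaths.index)
per_capita = deaths["Air pollution"] / aligned_pop

print(per_capita)
```

A code missing from `population` would yield `NaN` here instead of raising an error, so it is worth checking for missing values afterwards.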

In [None]:
death_percapita = death_avg.copy()

In [None]:
for i in death_percapita.index:
    death_percapita.loc[i] = (death_percapita.loc[i]['Air pollution']/avg_pop.loc[i].values)

In [None]:
death_percapita.head()

In [None]:
death_percapita[death_percapita['Air pollution'] == max(death_percapita['Air pollution'])]

In [None]:
tops = list(death_percapita.sort_values('Air pollution',ascending = False).head(10).index)
tops

In [None]:
max_countries = []
for i in tops:
    new = country_population[country_population['Country Code'] == i]['Country Name'].unique()[0]
    if new not in max_countries:
        max_countries.append(new)

In [None]:
max_countries

These are the countries with the highest death rates relative to population.

In [None]:
# This is a function generator that creates functions to say if a death rate is good, normal, bad or very bad.
def evaluation_generator(good, normal, bad):
    def death_rate_evaluator(value):
        if(pd.isnull(value) or value < 0):
            return np.nan
        elif(value <= good):
            return 1
        elif(value <= normal):
            return 2
        elif(value <= bad):
            return 3
        else:
            return 4
        
    return death_rate_evaluator

evaluator = evaluation_generator(death_percapita['Air pollution'].quantile(0.25), death_percapita['Air pollution'].quantile(0.5), death_percapita['Air pollution'].quantile(0.75))
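
A possible alternative to the hand-written evaluator is `pd.qcut`, which bins values by quantile in one call. The sketch below uses an invented toy series, not the notebook's data. Like the evaluator, it assigns 1 to the lowest quartile and 4 to the highest, with each upper bound inclusive.

```python
import pandas as pd

# Toy per-capita values; the numbers are illustrative only.
values = pd.Series(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    index=list("ABCDEFGH"),
)

# Split into four quantile-based bins and label them 1 (lowest) to 4 (highest).
status = pd.qcut(values, q=4, labels=[1, 2, 3, 4]).astype(int)

print(status.tolist())
```

Two caveats under this approach: `astype(int)` fails if the input contains missing values, and `pd.qcut` raises an error when many identical values make the bin edges coincide unless its `duplicates` parameter is set.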

In [None]:
death_percapita['status']= 0

In [None]:
for i in death_percapita.index:
    death_percapita.loc[i,'status'] = evaluator(death_percapita.loc[i,'Air pollution'])

In [None]:
death_percapita['Population']= 0

In [None]:
for i in death_percapita.index.values:
  death_percapita.loc[i,'Population'] = avg_pop.loc[i].values

In [None]:
death_percapita

In [None]:
station_map = folium.Map(location=[20,100], tiles='cartodbpositron', zoom_start=2)
Choropleth(
    geo_data=countries_borders,
    data=death_percapita['status'],
    key_on="feature.properties.ISO_A3",
    fill_color="OrRd",
    nan_fill_color="black",
    legend_name='Average of death rate per capita from 1990 to 2017'
).add_to(station_map)

In [None]:
def plotdot(point):
    # Add one clickable marker per country; the popup shows its average population
    folium.CircleMarker(location=[point.Latitude, point.Longitude],
                        radius=5,
                        weight=2,
                        tooltip="click me!",
                        popup=folium.Popup('Average population: ' + str(round(point.Population))),
                        fill_color="#000000").add_to(station_map)

In [None]:
country_stats.apply(plotdot,axis=1)
station_map.fit_bounds(station_map.get_bounds())
station_map

## **3. Creating a toggle map for death rate due to various causes**

In [None]:
death_factor.head()

In [None]:
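death_factor = death_factor.rename(columns={"Entity": "Country"})
# numeric_only=True skips the text 'Country' column, which newer pandas versions refuse to average.
# The groupby result is already indexed by 'Code', so no reset_index/set_index round trip is needed.
overview = death_factor.groupby('Code').mean(numeric_only=True).loc[:, 'Unsafe water source':]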

In [None]:
print(overview.shape)
overview.head()

**To compare countries with each other, we should omit regions such as World or Southeast Asia.**

In [None]:
len(countries_borders['features'])

In [None]:
# Collect the ISO alpha-3 code of every feature instead of hard-coding the feature count
countries = [feature['properties']['ISO_A3'] for feature in countries_borders['features']]

In [None]:
len(set(list(overview.index)).intersection(countries))

In [None]:
p = 0
regions = []
for i in list(overview.index):
    if i not in countries:
        p+=1
        regions.append(i)
print(p)

In [None]:
regions

As can be seen, the entries in overview that do not match a country code are regions, not countries. They should therefore be omitted so that the death numbers can be compared country by country.
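
The filtering described here can also be sketched with a boolean mask instead of a loop. The GeoJSON dictionary and the table below are toy stand-ins with invented values. Only the `features` / `properties` / `ISO_A3` layout is assumed to match the real file.

```python
import pandas as pd

# Toy stand-ins for countries_borders and overview; values are illustrative only.
toy_geojson = {
    "features": [
        {"properties": {"ISO_A3": "AAA"}},
        {"properties": {"ISO_A3": "BBB"}},
    ]
}
toy_overview = pd.DataFrame(
    {"Unsafe water source": [1.0, 2.0, 3.0]},
    index=pd.Index(["AAA", "BBB", "OWID_WRL"], name="Code"),
)

# Codes that have a polygon in the GeoJSON file.
country_codes = {f["properties"]["ISO_A3"] for f in toy_geojson["features"]}

# Boolean mask: True where the table's code is a known country.
is_country = toy_overview.index.isin(country_codes)

regions_only = toy_overview.index[~is_country].tolist()
countries_only = toy_overview[is_country]

print(regions_only)
print(countries_only)
```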

In [None]:
for i,row in overview.iterrows():
    if i in regions:
        overview = overview.drop(i)

In [None]:
# This is a function generator that creates functions to say if a death rate is good, normal, bad or very bad.
def evaluation_generator(good, normal, bad):
    def death_rate_evaluator(value):
        if(pd.isnull(value) or value < 0):
            return np.nan
        elif(value <= good):
            return 1
        elif(value <= normal):
            return 2
        elif(value <= bad):
            return 3
        else:
            return 4

    return death_rate_evaluator

# A dictionary that maps causes to functions that evaluate the death level.
evaluators = {
    i: evaluation_generator(overview[i].quantile(0.25), overview[i].quantile(0.5), overview[i].quantile(0.75))
    for i in list(overview)[1:]
}

In [None]:
overall = pd.DataFrame()
overall.head()

In [None]:
for idx, row in overview.iterrows():
    for i in row.index[1:]:
        overall.loc[idx, i] = evaluators[i](overview.loc[idx, i])

In [None]:
overall.head()

In [None]:
overall = overall.reset_index()
overall = overall.rename(columns = {'index':'Code'})

In [None]:
overall.columns

In [None]:
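# 'Code' is the identifier column, so it must not also be listed in value_vars;
# leaving value_vars out melts every remaining column.
toggle_map = pd.melt(overall, id_vars='Code')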
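
To illustrate what the melt step produces, here is a minimal sketch on a toy wide table. The frame `wide` and its values are invented. The `var_name` and `value_name` arguments are spelled out only to make the resulting column names explicit, and they match the pandas defaults.

```python
import pandas as pd

# Toy stand-in for the wide status table, one column per cause.
wide = pd.DataFrame({
    "Code": ["AAA", "BBB"],
    "Unsafe sanitation": [1.0, 4.0],
    "Smoking": [2.0, None],
})

# Melt to long format: one row per (country code, cause) pair.
long = pd.melt(wide, id_vars="Code", var_name="variable", value_name="value")

# Missing statuses become 0 so the column can be stored as integers.
long["value"] = long["value"].fillna(0).astype(int)

print(long)
```

The long format makes it straightforward to select one cause at a time with a filter on `variable`, which is what a per-cause map layer needs.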

In [None]:
toggle_map.head()

In [None]:
toggle_map = toggle_map.set_index('Code')
toggle_map.value = toggle_map.value.fillna(0.0).astype(int)
# toggle_map_new = toggle_map[(toggle_map.variable == 'Unsafe sanitation')|(toggle_map.variable == 'Diet low in nuts and seeds')]

In [None]:
# Use the melted per-cause status table built above as the data source
# (toggle_map_new is only defined in a commented-out line, so referencing it would raise a NameError)
death_data = toggle_map

# Create the base map
my_map = folium.Map(location=[20,100], tiles='cartodbpositron', zoom_start=2)
my_map

In [None]:
# Create one Choropleth layer per death cause
for cause in death_data['variable'].unique():
    subset = death_data[death_data['variable'] == cause]
    Choropleth(
        geo_data=countries_borders,
        data=subset['value'],
        key_on="feature.properties.ISO_A3",
        fill_color='OrRd',
        fill_opacity=0.7,
        line_opacity=0.5,
        name=f'{cause} Death Count',
        nan_fill_color="black",
    ).add_to(my_map)

In [None]:
# Add LayerControl to toggle between different causes
folium.LayerControl(collapsed=False).add_to(my_map)

In [None]:
my_map