# Indicator Analysis & Decisions

## 1. Connectivity Indicators

### 1.1 Internet Users & Cellular Subscriptions
<b>(No missing values)</b>: The dataset was formed using a weighted average and should therefore be used with caution. However, as far as we can see, the <a href="https://www.itu.int/net4/ITU-D/icteye/#/">original source</a> of data for both <a href="https://www.itu.int/net4/ITU-D/icteye/#/topics/2001">Internet Users</a> and <a href="https://www.itu.int/net4/ITU-D/icteye/#/topics/1002">Cellular Subscriptions</a> correlates with our data.

<em>&rarr; Decision: Keep</em>

The issue with a weighted average (WA) aggregation method is that it only allows for country comparison over a period of time (1960 to 2020), not for a single year. For example, in terms of internet users for Iraq (an outlier):
- Our WA data states that 1% of the population is using the internet.
- The original data states that 75% of the population was using the internet in 2017.

The same can be said of cellular subscriptions in the case of Kuwait (an outlier):
- Our WA data states that 57% of the population has cellular subscriptions.
- The original data states that 98% of the population had cellular subscriptions in 2017.
<br>

![](https://i.ibb.co/DrYnjWD/Boxplot-of-Internet-Cellular-Usage-in-Arabian-Peninsula-Edited.png)

## 2. Economic Indicators
As shown in figure 3, there are seven economic indicators. Because Syria, Yemen and West Bank and Gaza are experiencing civil war situations (<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/sy.html">CIA, 2020a</a>)(<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/ym.html">CIA, 2020b</a>)(<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/we.html">CIA, 2020c</a>)(<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/gz.html">CIA, 2020d</a>), their economic situation is challenging, resulting in a low to lower-middle income level, as can be seen in figure 3. The other countries in the region have upper-middle to high income levels and hence, presumably, a considerably more stable economy.

### 2.1 GDP per person employed  

<b>(6.67% missing values)</b>: GDP per person employed represents the labor productivity in each country and is estimated according to national account conventions to allow comparisons. These conventions differ significantly in how current they are: while Iraq, Jordan and Syria use systems from 1968, others updated their system in 1993 and some countries use a system from 2008. West Bank and Gaza does not have a system of national accounts at all, which explains why it is the only country missing a value for this indicator. Furthermore, the metadata notes that "there are still significant limitations on the availability of reliable data".

<em> &rarr; Decision: Drop</em>
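
The missing-value percentages quoted throughout this report are consistent with a region of 15 countries, since one missing country corresponds to 1/15 ≈ 6.67%. A minimal sketch of how such a share could be computed, assuming missing entries are stored as `None` or NaN; the function name and the sample column are hypothetical:

```python
import math


def missing_share(values):
    """Return the percentage of entries that are missing (None or NaN)."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * missing / len(values)


# Hypothetical indicator column: 15 countries, one without a value.
column = [1.0] * 14 + [None]
print(round(missing_share(column), 2))  # 6.67
```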

![](https://i.ibb.co/WF34zNv/e-Income-group.png)

### 2.2 GNI per capita

<b>(6.67% missing values)</b>: According to the metadata, GNI per capita is computed using the World Bank's national accounts data and OECD National Accounts data. The combination of both data sources gives confidence in the reliability, and the source data matches the given data set.

<em> &rarr; Decision: Keep</em>

The only country without this indicator is Syria (SYR). <a href="https://www.cia.gov/library/publications/the-world-factbook/geos/sy.html">CIA (2020a)</a> states that Syria's economy has deteriorated because of the civil war and the ongoing humanitarian crisis within the country. The missing value will be imputed with the mean of West Bank and Gaza's and Yemen's GNI per capita because those two countries face similar economic conditions. The imputed distribution does not deviate much from the original distribution, as shown in figure 4.
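
The peer-based imputation described above can be sketched as follows. This is an illustration only: the function name is hypothetical, the country codes are used as dictionary keys for readability, and the figures are made up rather than taken from the dataset.

```python
from statistics import mean


def impute_from_peers(values, target, peers):
    """Return a copy of values with target filled by the mean of its peers."""
    filled = dict(values)
    filled[target] = mean(values[p] for p in peers)
    return filled


# Hypothetical GNI per capita figures, for illustration only.
gni = {"PSE": 3000.0, "YEM": 1000.0, "SYR": None}
print(impute_from_peers(gni, "SYR", ["PSE", "YEM"])["SYR"])  # 2000.0
```

The original mapping is left untouched, so the imputed and original distributions can still be compared afterwards.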

As visualized in figure 5, Qatar is an upper outlier in the region. The massive oil and natural gas reserves in the country make it one of the richest countries in the world and lead to the highest GNI per capita in the region (<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/qa.html">CIA, 2020e</a>).

![](https://i.ibb.co/N67Kp6r/GNI.png)
<br>

### 2.3 International Trade

<b>(6.67% missing values)</b>: According to the metadata, International Trade is the sum of imports and exports measured as a share of GDP. It is computed using the same data sources as GNI per capita.

<em> &rarr; Decision: Keep</em>

As before, Syria is missing a value for this indicator. Again, this will be imputed with the mean of West Bank and Gaza's and Yemen's values due to the similar economic situations of the countries. The imputed distribution does not deviate much from the original distribution, as shown in figure 6. There are no outliers for this indicator according to the box plot. However, it is noticeable that Bahrain, Jordan and the United Arab Emirates (ARE) lie outside the statistically expected normal distribution, as they have indicator values above 130% of GDP.

![](https://i.ibb.co/BtkhzCN/international-trade.png)
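
Box plots conventionally flag points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule under the assumption that the report's box plots use this default; the quartile method (`"inclusive"`) and the sample values are likewise assumptions, not taken from the dataset.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the k * IQR whisker fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical trade shares (% of GDP), for illustration only.
trade = [50, 60, 70, 80, 90, 100, 135, 140, 150]
print(iqr_outliers(trade))  # []
```

With a wide spread like this, even values above 130% of GDP stay inside the fences, which is how high values can be noticeable without being marked as outliers.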

### 2.4 Parliament seats held by women

<b>(26.67% missing values)</b>: This indicator gives an indication of gender equality in the region. Obtaining this information can be difficult because of ad hoc changes such as replacements due to death or resignation, and the reliability can be questioned because the indicator is compiled from national information. However, the numbers from the dataset and the source seem to be aligned with the Political Empowerment section of the Global Gender Gap Report (<a href= "./outside_sources/WEF_GGGR_2020.pdf">World Economic Forum, 2020</a>). Furthermore, it is a Sustainable Development Goal indicator (<a href="https://unstats.un.org/sdgs/metadata/">United Nations, 2020</a>).

<em> &rarr; Decision: Keep</em>

Oman, Qatar and Saudi Arabia are missing values for this indicator although the data source actually provides data for them (<a href="https://www.ipu.org/parliament/OM">IPU, 2020a</a>)(<a href="https://www.ipu.org/parliament/QA">IPU, 2020b</a>)(<a href="https://www.ipu.org/parliament/SA">IPU, 2020c</a>). These data points will be used to impute the missing values. As the indicator is only compiled for countries with an existing national legislature, there is no data available for West Bank and Gaza. The Palestinian Legislative Council was dissolved in December 2018 and since then there has been no parliament in place (<a href="https://www.cia.gov/library/publications/the-world-factbook/geos/we.html">CIA, 2020c</a>). Hence, it is decided not to impute the null value for PSE.

The imputed distribution does not deviate much from the original distribution, as shown in figure 7, and there are no outliers.

<br><br><br>
![](https://i.ibb.co/Tmxh7V7/parlement-seats.png)


### 2.5 ODA per Capita

<b>(46.67% missing values)</b>: Official Development Assistance (ODA) can be received by countries which are on the DAC list of aid recipients (<a href = "./outside_sources/DAC-List-of-ODA-Recipients-for-reporting-2018-and-2019-flows.pdf">OECD, 2020</a>). This indicator displays to what extent a country received ODA. As it does not take into account how much aid was given by the recipient countries to other developing countries, some countries might be reflected as aid receivers while they are actually net donors. Due to the missing counterpart, this indicator can be misleading and is not considered meaningful.

<em> &rarr; Decision: Drop</em>

### 2.6 BOP income share

<b>(80% missing values)</b>: Poverty data like the income share held by the lowest 20% of the population are difficult to obtain. The indicator is retrieved from national household surveys, which are typically conducted every few years. Often, the insights from the household surveys are not comparable due to different computation methods and times. Furthermore, the indicator is only populated for three countries: Cyprus, Jordan and Turkey. All of these have a strong affiliation with the European Union, either as a member state, as a candidate or through a close relationship (<a href="https://reliefweb.int/report/jordan/report-eu-jordan-relations-framework-revised-european-neighbourhood-policy-june-2018">European Commission, 2019</a>).

<em> &rarr; Decision: Drop</em>

### 2.7 Poverty Line Gap

<b>(93.33% missing values)</b>: Like BOP income share, this indicator is retrieved from national household surveys and faces the same difficulties. In our region, only Jordan reports the indicator, while worldwide over 90% of the countries do not report it. Jordan has implemented a poverty reduction strategy, which is the reason it monitors the poverty line gap closely (<a href="http://www.undp.org/content/dam/jordan/docs/Poverty/Jordanpovertyreductionstrategy.pdf">UNDP, 2013</a>).

<em> &rarr; Decision: Drop</em>

## 3. Education Indicators

### 3.1 Enrollment Rate
- Adjusted net enrollment rate, primary
- School enrollment, primary

<b>(46.67% missing values)</b>: These indicators are considered as a pair because they share an identical definition and source; the data points of the two enrollment rates differ in value by an average of 1.5. Furthermore, 40% of the values across all regions are null. In addition, the dataset's <a href="http://data.uis.unesco.org">source</a> only provides "Adjusted net enrollment rate, *one year before the official primary entry age*", whereas according to our metadata the enrollment rates are the percentage of children *within the school-age group for primary education*. Therefore, the data cannot be verified, rendering it untrustworthy.

<em>&rarr; Decision: Drop</em>
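
The average gap between the two enrollment indicators could be computed along the following lines. This is a sketch only: it assumes the gap is measured as a mean absolute difference over countries where both values are present, and the function name and sample rates are hypothetical.

```python
def mean_abs_difference(a, b):
    """Mean absolute gap between two series, skipping pairs with a missing value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no overlapping observations")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Hypothetical enrollment rates (%), for illustration only.
adjusted_net = [95.0, 90.0, None, 88.0]
school_enrollment = [93.0, 89.0, 91.0, None]
print(mean_abs_difference(adjusted_net, school_enrollment))  # 1.5
```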

### 3.2 Completion Rate
<b>(40% missing values)</b>: The data from the <a href="http://data.uis.unesco.org/index.aspx?queryid=121#">original source</a> contradicts the data from the <a href="https://data.worldbank.org/indicator/SE.PRM.CMPT.ZS">World Bank</a>, which in turn contradicts the data provided to us. We assume the data for this indicator has been altered at least twice and is therefore untrustworthy.

<em>&rarr; Decision: Drop</em>

### 3.3 Literacy Rate
<b>(73.33% missing values)</b>: The metadata states that literacy rate is difficult to measure, and that its definition and methods of data collection differ across countries, so it should be used cautiously. Estimating these rates requires census measurements under controlled conditions. As such, we looked into the dates of the latest population census per country in the Arabian Peninsula (from the metadata) and found that:
- Years vary to a high degree
- Two outliers exist, Lebanon and Yemen with census reported in 1943 and 1997 respectively
- The average year of census reports for our region is 2010 (10 years old) (figure 8).

<em>&rarr; Decision: Drop</em>

These findings render the data untrustworthy.

![](https://i.ibb.co/wWGFf2X/census-data.png)


## 4. Employment

<b>(No missing values)</b>: The seven indicators in the employment category stand out with a 100% fill rate, that is, no missing values. In addition, the indicators share the same trustworthy source, the International Labour Organization (ILO). However, there are strong factors in the metadata indicating that the data is not usable for an international comparison as conducted in this report:

- ILO models the indicators with data drawn from labor force surveys and supplements them with estimates, resulting in different reporting standards regarding definitions, coverage and timelines.
- The metadata specifically states that these indicators have gender biases. Depending on demographic, social, legal and cultural trends and norms, whether women's activities are regarded as economic is determined differently.
- Two thirds of the countries in our region are within the last 25 ranks of the latest Global Gender Gap Report (<a href= "./outside_sources/WEF_GGGR_2020.pdf">World Economic Forum, 2020</a>) (50% within the last 15 ranks).

<em> &rarr; Decision: Drop</em>

## 5. Environment

### 5.1 Energy Usage per GDP & GDP per Energy
<b>(13.33% missing values)</b>: Firstly, both data sets date back to 2014. Secondly, although the World Bank displays the data for <a href="https://data.worldbank.org/indicator/EG.USE.COMM.GD.PP.KD">Energy Use</a> and 
<a href="https://data.worldbank.org/indicator/EG.GDP.PUSE.KO.PP.KD">GDP per Energy</a>, the link to their <a href="https://www.iea.org/stats/index.asp">source</a> is broken. After further research within the original source's <a href="https://www.iea.org/data-and-statistics?country=WORLD&fuel=Energy%20supply&indicator=TPESbySource">database</a>, we found that the specific indicators cannot be identified due to different labels and a lack of metadata on our end.

<em> &rarr; Decision: Drop</em>

### 5.2 Improved Water & Sanitation
<b>(No missing values)</b>: In 2015, the <a href="https://www.washdata.org/data/household#!/">Joint Monitoring Programme by WHO/UNICEF</a> segmented these two indicators into:
- People using safely managed sanitation services (% of population) (SH.STA.SMSS.ZS) 
- People using basic sanitation services (% of population) (SH.STA.BASS.ZS)
- People using safely managed drinking water services (% of population) (SH.H2O.SMDW.ZS) 
- People using basic drinking water services (% of population) (SH.H2O.BASW.ZS).

Therefore, our data is outdated by 5 years, and since we're not allowed to import new columns for our analysis, we cannot conduct an analysis for the new indicators.

<em> &rarr; Decision: Drop</em>

### 5.3 CO2 Emissions
<b>(No missing values)</b>: Although the data only goes up to 2014, it is accurate and verifiable at its <a href="https://cdiac.ess-dive.lbl.gov/trends/emis/top2014.cap">source</a>. However, it is important to note that there is a discrepancy of a factor of almost 3 between the numbers from the source and the numbers we have. This might be because the original data is expressed in metric tons of carbon, whereas our data does not specify whether it is expressed in metric tons of carbon or of carbon dioxide. Regardless of the discrepancy, the trend and ranking of countries in both data sets are similar: Qatar, Kuwait, Bahrain, and UAE are amongst the top 10 countries in the world with the highest CO2 emissions per capita (figure 9).
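
The unit explanation can be checked against the ratio of molar masses: one metric ton of carbon corresponds to roughly 3.66 metric tons of CO2. The sketch below is only a plausibility check under the assumption that the source reports carbon and our data reports CO2, which the metadata does not confirm. The observed factor of almost 3 is in the same range but not identical, so other differences may also play a role.

```python
# Molar masses in g/mol; their ratio converts a mass of carbon to a mass of CO2.
MOLAR_MASS_CO2 = 44.01
MOLAR_MASS_C = 12.011


def carbon_to_co2(tonnes_carbon):
    """Convert metric tons of carbon to metric tons of carbon dioxide."""
    return tonnes_carbon * MOLAR_MASS_CO2 / MOLAR_MASS_C


print(round(carbon_to_co2(1.0), 2))  # 3.66
```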

External research on why Qatar is top:
https://www.qscience.com/content/papers/10.5339/qfarc.2018.EEPD592#abstract_content

<em> &rarr; Decision: Keep</em>
<br>
![](https://i.ibb.co/VBPVnxg/co2-emisions.png)
<br><br>

## 6. Health

According to <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176927/">(Cammett, et al., 2014)</a>, the World Health Organization relies on government-reported information. Due to unstable governmental systems in the majority of the countries in the Arabian Peninsula, health organizations provide estimates or perceptions of the populations.

From 1980 to 2003, the countries in the region have been implementing systems for registering vital events such as deaths and births. Depending on when the system was implemented, missing data has been imputed with estimated ages
<a href="https://www.cdc.gov/nchs/isp/isp_iivrs.htm">(U.S. Department of Health & Human Services, 2015)</a>. Due to the lack of access to appropriate health care, many cases are not reported properly.

Religion also plays an important role in the lack of sex education and safety <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5303702/">(Dupont, 2017)</a>.

### 6.1 AIDS Deaths, ART Coverage, HIV Cases
<b>(66.67% missing values)</b>: A proportion of HIV cases results in deaths from AIDS because of missing treatment (or late discovery). ART coverage indicates the percentage of all people living with HIV who are receiving therapy <a href="https://archive.org/details/peninsulas0000nize/page/19/mode/2up">(HIV.gov, 2020)</a>.

The numbers in the sheets are estimates by UNAIDS. Nothing is known about the collection of data. 

If more people are aware of the problems through sex education (which is limited) or if more people receive treatment, the cases and deaths should decrease in relation to the total population <a href="https://www.avert.org/professionals/hiv-around-world/middle-east-north-africa-mena">(Avert, 2020)</a>. In our data there is no correlation between AIDS deaths, HIV cases, and treatment (as there should be), so the estimates are not an accurate representation <a href="https://www.thelancet.com/journals/lanhiv/article/PIIS2352-3018(16)30087-X/fulltext#%20">(GBD 2015 HIV Collaborators, 2016)</a>. Figure 10 also shows that there is no linear correlation.
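
A linear relationship between two indicators can be quantified with the Pearson correlation coefficient. The following is a minimal sketch, assuming complete pairs of observations; the variable names and values are hypothetical and do not come from the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
hiv_cases = [1.0, 2.0, 3.0, 4.0]
aids_deaths = [2.0, 4.0, 6.0, 8.0]
print(round(pearson_r(hiv_cases, aids_deaths), 2))  # 1.0
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates that no linear relationship is visible.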

Lastly, due to the relatively low number of people who get treated, it is also hard to determine whether a death was caused by AIDS or HIV.

This is why our team decided to drop these indicators. 

<em> &rarr; Decision: Drop </em>


![](https://i.ibb.co/dJ3n6Lq/correlation-aids.png)

<br>
<br>

### 6.2 Malaria Cases  
<b>(66.67% missing values)</b>: The data collection for the cases is unclear; however, the data does seem to be accurate in its proportions, looking at figure 11 <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176927/">(Zamani, et al., 2013)</a>.

The supplementary research study explains that the majority of the countries are not suitable for the spread of malaria. The missing values represent the countries that do not have suitable climates. Yemen is an outlier in this case, which is also shown in figure 13.

<em> &rarr; Decision: Keep & impute with 0 </em>

It is decided to impute the NaN values with 0, as cases in the other countries are very unlikely and it would not make much sense to impute them with the mean or median. As expected, the imputed distribution in figure 12 peaks much higher than the original distribution.
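
A zero-fill of this kind can be sketched as below, assuming missing values are represented as `None`; the function name, country labels and case numbers are hypothetical.

```python
def fill_missing_with_zero(cases):
    """Return a copy in which missing (None) case counts are replaced by 0."""
    return {name: 0.0 if value is None else value for name, value in cases.items()}


# Hypothetical malaria case rates, for illustration only.
malaria = {"country_a": 22.5, "country_b": None, "country_c": None}
print(fill_missing_with_zero(malaria))
# {'country_a': 22.5, 'country_b': 0.0, 'country_c': 0.0}
```

Because every missing entry lands on the same value, the imputed distribution gains a sharp peak at 0.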

![](https://i.ibb.co/yhbGgLG/malaria-environment.png)

![](https://i.ibb.co/N9zKnzr/malaria.png)

### 6.3 Undernourishment 

<b>(26.67% missing values)</b>: The indicator represents the percentage of the population whose food intake is insufficient to meet dietary energy requirements. The accuracy of the indicator is questionable and it has its limitations according to the metadata:

- Food insecurity can be a problem of inadequate access to food, regardless of availability.
- Average food available to each person, even corrected for possible effects of low income, is not a good predictor of food insecurity.
- Nutrition security is dependent on the quality of care of mothers and children as well as the household's health environment.

As the information on access to food, the average availability of food per person, and the quality of care is not available in our data as separate indicators, it is hard to interpret the actual meaning of the percentage.

<em> &rarr; Decision: Drop</em>

<br>

### 6.4 Tuberculosis Cases & Mortality
The report mentioned in the metadata states that new information on Tuberculosis (in combination with HIV) becomes more available every day. The report and country outlook at the original source show that the numbers (in the form of the weighted average) match the data reported in our data set. They also show that most countries in our region have received treatment support within the last five years <a href="https://www.who.int/tb/country/data/profiles/en/">(WHO, 2020)</a>.

<em> &rarr; Decision: Keep both </em>
<br>
<br>

<b><ins>6.4.1 Tuberculosis Cases (No missing values) </ins>  </b> <br>
The indicator is the estimated number of cases, expressed as the rate per 100,000 population. Figure 14 shows no extreme outliers; however, two countries stand out: United Arab Emirates (1.6) and West Bank and Gaza (1.3). It is important to keep those two in mind when comparing them with the other countries; therefore, a threshold of 5.0 has been set. These countries are supposed to have better support systems for Tuberculosis <a href="https://www.who.int/tb/country/data/profiles/en/">(WHO, 2020)</a>.
<br>
<br>
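
A manual threshold of this kind can be applied with a simple filter. The sketch assumes the threshold of 5.0 is meant to mark values below it; the two low values are the ones quoted in this section, while the remaining entries and the function name are hypothetical.

```python
def below_threshold(values, threshold):
    """Return the entries whose value is strictly below the threshold."""
    return {name: v for name, v in values.items() if v < threshold}


# ARE and PSE use the values quoted in the text; the others are hypothetical.
tb_cases = {"ARE": 1.6, "PSE": 1.3, "country_x": 12.0, "country_y": 48.0}
print(below_threshold(tb_cases, 5.0))  # {'ARE': 1.6, 'PSE': 1.3}
```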

<b><ins>6.4.2 Tuberculosis Mortality (6.67% missing values) </ins>  </b> <br>
The metadata describes that the Tuberculosis death rate covers tuberculosis deaths among HIV-negative people, expressed as the rate per 100,000 population. As only one value is missing, it is best to impute it with the median, looking at figures 15 and 16. <br>
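
Median imputation can be sketched as follows, assuming the missing value is stored as `None`; the function name and the mortality rates are hypothetical. The median is less sensitive to a single high value than the mean would be.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical mortality rates per 100,000, for illustration only.
mortality = [0.5, 1.0, None, 2.0, 30.0]
print(impute_median(mortality))  # [0.5, 1.0, 1.5, 2.0, 30.0]
```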

When looking at the outliers in figure 17, we can see that Yemen has a higher rate of Tuberculosis mortality. According to <a href="https://www.iamat.org/country/yemen/risk/tuberculosis#">(International Association for Medical Assistance to Travellers, 2020)</a>, Tuberculosis is highly endemic in Yemen.

<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/tccases_outliers.png" width="900" height="300" style="padding-bottom:0.5em;"></div>


<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/tcm_distribution.png" width="900" height="300" style="padding-bottom:0.5em;"></div>

<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/tcm_distribution_imp.png" width="900" height="300" style="padding-bottom:0.5em;"></div>

### 6.5 Measles Immunization
<b>(6.67% missing values)</b>: The metadata describes that the data from WHO was gathered from national censuses and nationally representative household surveys. <i> The last census data collection year varies within our region from 1943 to 2017 </i>. In this case, the outdated censuses do not give an accurate representation of the immunization situation. However, more recent data has been published on <a href="https://apps.who.int/immunization_monitoring/globalsummary/wucoveragecountrylist.html">(WHO, n.d.)</a>, showing that the reported numbers in our data set do reflect more recent reports from 2019. This means that the data still reflects the situation, regardless of the census collection year.

<em> &rarr; Decision: Keep </em>

With only one missing value, looking at figure 18 and 19, the <i> mean </i> would better represent the missing value. <i> Two outliers were found in figure 20, which are significantly lower than the rest: Yemen and Iraq </i>. As the box plot did not mark these as outliers, a threshold of 79 has been set.

<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/measles_distr.png" width="3000" height="1000" style="padding-bottom:0.5em;"></div>

<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/measles_out_im.png" width="3000" height="1000" style="padding-bottom:0.5em;"></div>

### 6.6 Life Expectancy & Fertility Rate

<b>(No missing values)</b>: According to the metadata, the annual data series are interpolated from 5-year period data. The data was taken from six different sources that complement each other:
1. United Nations Population Division. World Population Prospects: 2019 Revision. 
2. Census reports and other statistical publications from national statistical offices.
3. Eurostat: Demographic Statistics.
4. United Nations Statistical Division. Population and Vital Statistics Reports (various years).
5. U.S. Census Bureau: International Database.
6. Secretariat of the Pacific Community: Statistics and Demography Programme.
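
How annual values can be derived from 5-year data is sketched below. The metadata does not state which interpolation method was used, so linear interpolation is an assumption here, as is treating each 5-year figure as an anchor at a single year; the function name and values are hypothetical.

```python
def interpolate_annual(points):
    """Linearly interpolate {year: value} anchors to every year in between."""
    years = sorted(points)
    annual = {}
    for start, end in zip(years, years[1:]):
        step = (points[end] - points[start]) / (end - start)
        for year in range(start, end):
            annual[year] = points[start] + step * (year - start)
    # The last anchor is not covered by the loop above.
    annual[years[-1]] = points[years[-1]]
    return annual


# Hypothetical life expectancy anchors, for illustration only.
five_year = {2010: 70.0, 2015: 72.5}
print(interpolate_annual(five_year))
# {2010: 70.0, 2011: 70.5, 2012: 71.0, 2013: 71.5, 2014: 72.0, 2015: 72.5}
```

Interpolated years carry no independent measurement, which is worth keeping in mind when comparing single years across countries.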

In general, it is more difficult to measure the accuracy of the data due to the instability in some of the countries. Given this situation, different sources need to be combined in order to give some indication of the life expectancy and fertility rate.

<em> &rarr; Decision: Keep</em>

There are two outliers in figure 21, Yemen and Iraq, which have a lower life expectancy. There are no outliers for the fertility rate (figure 22).

<div style = "width:image width px; font-size:80%; text-align:center;"><img src="./PDF_Images/life_fertility_outliers.png" width="3000" height="1000" style="padding-bottom:0.5em;"></div>

### 6.7 Maternal Mortality 
<b>(No missing values)</b>: The metadata states that the ratios are generally of unknown reliability, and therefore it cannot be assumed that the provided ratios represent accurate estimates. Mortality also depends on other factors, which makes it difficult to measure. Therefore, our team cannot trust the accuracy of the data and has decided to drop this indicator.

<em> &rarr; Decision: Drop </em>

### 6.8 Prenatal Care, Delivery Care, and Infant Mortality
<b>Prenatal Care (80% missing values)<br> Delivery Care (46.67% missing values)<br>
Infant Mortality (No missing values)
</b>  <br>

Prenatal care refers to the percentage of pregnant women who were attended by skilled workers at least once during pregnancy. Delivery care, on the other hand, refers to the percentage of births attended by skilled staff. As described before and plotted in 3.3 Literacy Rate, the data for these indicators depends on census data.
As there are no other data substitutes, justifying the numbers and finding direct reasons for the missing values is difficult.

<em> &rarr; Decision: Drop </em>


### 6.9 Adolescent Fertility
<b>(No missing values)</b>: The source mentioned in the metadata does not specifically have data related to Adolescent Fertility. There are Excel data sheets with certain numbers per age; however, their weighted average does not match the data given in our dataset. Furthermore, there are no other sources or information on the data collection, so we cannot verify the accuracy.

<em> &rarr; Decision: Drop </em>