# Project 1
### Problem statement
For this project you must create a data set by simulating a real-world phenomenon of
your choosing. You may pick any phenomenon you wish – you might pick one that is
of interest to you in your personal or professional life. Then, rather than collect data
related to the phenomenon, you should model and synthesise such data using Python.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import io
import requests

Define a function that downloads a CSV file from a URL (the WHO website or GitHub) and loads it into a pandas DataFrame.

In [2]:
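def get_data(url):
    """Download the CSV file at `url` and return it as a pandas DataFrame."""
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    return pd.read_csv(io.StringIO(response.content.decode('utf-8')))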
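
To see what the decode-and-parse step inside `get_data` does without touching the network, the sketch below runs the same step on a hard-coded byte string. The helper name `csv_bytes_to_frame` and the sample bytes are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd


def csv_bytes_to_frame(raw: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Decode raw CSV bytes and parse them, mirroring what get_data does with a response body."""
    return pd.read_csv(io.StringIO(raw.decode(encoding)))


# Hypothetical stand-in for requests.get(url).content
sample = b"Country,Year,Life expectancy\nAfghanistan,2015,65.0\nAfghanistan,2014,59.9\n"
frame = csv_bytes_to_frame(sample)
print(frame)
```

Keeping the parsing separate from the download in this way would make it possible to check the parsing logic offline.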

For now the plan is to look at either HIV data (https://apps.who.int/gho/data/node.main.618?lang=en) or life expectancy data (https://github.com/Priyankkoul/Life-Expectancy-WHO---Data-Analytics/blob/master/DATASET.csv).

In [3]:
life_expectancy = get_data('https://github.com/Priyankkoul/Life-Expectancy-WHO---Data-Analytics/raw/master/DATASET.csv')
life_expectancy

Unnamed: 0,Country,Year,Status,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,...,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
0,Afghanistan,2015,Developing,65.0,263.0,62,0.01,71.279624,65.0,1154,...,6.0,8.16,65.0,0.1,584.259210,33736494.0,17.2,17.3,0.479,10.1
1,Afghanistan,2014,Developing,59.9,271.0,64,0.01,73.523582,62.0,492,...,58.0,8.18,62.0,0.1,612.696514,327582.0,17.5,17.5,0.476,10.0
2,Afghanistan,2013,Developing,59.9,268.0,66,0.01,73.219243,64.0,430,...,62.0,8.13,64.0,0.1,631.744976,31731688.0,17.7,17.7,0.470,9.9
3,Afghanistan,2012,Developing,59.5,272.0,69,0.01,78.184215,67.0,2787,...,67.0,8.52,67.0,0.1,669.959000,3696958.0,17.9,18.0,0.463,9.8
4,Afghanistan,2011,Developing,59.2,275.0,71,0.01,7.097109,68.0,3013,...,68.0,7.87,68.0,0.1,63.537231,2978599.0,18.2,18.2,0.454,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2933,Zimbabwe,2004,Developing,44.3,723.0,27,4.36,0.000000,68.0,31,...,67.0,7.13,65.0,33.6,454.366654,12777511.0,9.4,9.4,0.407,9.2
2934,Zimbabwe,2003,Developing,44.5,715.0,26,4.06,0.000000,7.0,998,...,7.0,6.52,68.0,36.7,453.351155,12633897.0,9.8,9.9,0.418,9.5
2935,Zimbabwe,2002,Developing,44.8,73.0,25,4.43,0.000000,73.0,304,...,73.0,6.53,71.0,39.8,57.348340,125525.0,1.2,1.3,0.427,10.0
2936,Zimbabwe,2001,Developing,45.3,686.0,25,1.72,0.000000,76.0,529,...,76.0,6.16,75.0,42.1,548.587312,12366165.0,1.6,1.7,0.427,9.8
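
Since the brief asks for synthesised rather than collected data, one possible next step is to summarise a real column and draw simulated values from a distribution fitted to it. The sketch below does this for the five Afghanistan life expectancy values shown above. The choice of a normal distribution, the sample size of 1000 and the seed are all assumptions made for illustration; the real column may well call for a different model.

```python
import numpy as np

# Life expectancy values for Afghanistan, 2011 to 2015, copied from the table above.
observed = np.array([65.0, 59.9, 59.9, 59.5, 59.2])

# Summarise the observed values.
mean = observed.mean()
std = observed.std(ddof=1)  # sample standard deviation

# Draw synthetic values from a normal distribution with the same mean and spread.
# Assumption: normality is used only as a simple starting point.
rng = np.random.default_rng(seed=42)
synthetic = rng.normal(loc=mean, scale=std, size=1000)

print(f"observed mean={mean:.2f}, std={std:.2f}")
print(f"synthetic mean={synthetic.mean():.2f}, std={synthetic.std(ddof=1):.2f}")
```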


We will look at a few different HIV variables to see whether they can be combined into a single dataset containing all the relevant data.

In [4]:

deaths = get_data('https://apps.who.int/gho/athena/data/xmart.csv?target=GHO/HIV_0000000006&profile=crosstable&filter=COUNTRY:-;REGION:*&ead=&x-sideaxis=REGION;YEAR&x-topaxis=GHO&x-collapse=true')
deaths

Unnamed: 0,WHO region; Year,Number of people dying of HIV-related causes
0,Global; 2022,630 000 [480 000-880 000]
1,Global; 2021,660 000 [500 000-920 000]
2,Global; 2020,690 000 [520 000-960 000]
3,Global; 2019,720 000 [550 000-1 000 000]
4,Global; 2018,760 000 [580 000-1 100 000]
...,...,...
226,Western Pacific; 1994,10 000 [6000-16 000]
227,Western Pacific; 1993,7500 [4400-12 000]
228,Western Pacific; 1992,5900 [3500-9400]
229,Western Pacific; 1991,4600 [2700-7300]
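
The values above are strings of the form `estimate [low-high]` with spaces as thousands separators, and region and year share one column, so they need parsing before any numerical work. The following is a minimal sketch of one way to do that on two rows copied from the output. The function name `parse_estimate` and the output column names are hypothetical.

```python
import re

import pandas as pd

# Matches "estimate [low-high]"; digits may be grouped with spaces.
VALUE_RE = re.compile(r"^\s*([\d ]+?)\s*\[\s*([\d ]+?)\s*-\s*([\d ]+?)\s*\]\s*$")


def parse_estimate(text):
    """Return (estimate, low, high) as ints from a string like '630 000 [480 000-880 000]'."""
    match = VALUE_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected value format: {text!r}")
    return tuple(int(group.replace(" ", "")) for group in match.groups())


sample = pd.DataFrame({
    "WHO region; Year": ["Global; 2022", "Western Pacific; 1994"],
    "Number of people dying of HIV-related causes": [
        "630 000 [480 000-880 000]",
        "10 000 [6000-16 000]",
    ],
})

# Split the combined "region; year" label into two columns.
region_year = sample["WHO region; Year"].str.split(";", expand=True)
labels = pd.DataFrame({
    "region": region_year[0].str.strip(),
    "year": region_year[1].str.strip().astype(int),
})

# Parse each value string into three numeric columns.
parsed = sample["Number of people dying of HIV-related causes"].map(parse_estimate)
estimates = pd.DataFrame(
    parsed.tolist(),
    columns=["deaths", "deaths_low", "deaths_high"],
    index=sample.index,
)

tidy = pd.concat([labels, estimates], axis=1)
print(tidy)
```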


In [5]:
new_infections = get_data('https://apps.who.int/gho/athena/data/xmart.csv?target=GHO/HIV_0000000026,SDGHIV&profile=crosstable&filter=COUNTRY:-;REGION:*&ead=&x-sideaxis=REGION;YEAR;&x-topaxis=GHO;SEX')
new_infections

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Number of new HIV infections,Number of new HIV infections.1,Number of new HIV infections.2,New HIV infections (per 1000 uninfected population),New HIV infections (per 1000 uninfected population).1,New HIV infections (per 1000 uninfected population).2
0,WHO region,Year,Both sexes,Male,Female,Both sexes,Male,Female
1,Global,2022,1 300 000 [1 000 000-1 700 000],710 000 [540 000 - 940 000],610 000 [450 000 - 830 000],0.17 [0.13-0.23],0.18 [0.14–0.24],0.16 [0.12–0.22]
2,Global,2021,1 400 000 [1 100 000-1 800 000],730 000 [560 000 - 970 000],660 000 [490 000 - 900 000],0.18 [0.14-0.24],0.19 [0.14–0.25],0.17 [0.13–0.24]
3,Global,2020,1 500 000 [1 100 000-1 900 000],760 000 [580 000 - 1 000 000],710 000 [530 000 - 960 000],0.19 [0.14-0.26],0.2 [0.15–0.26],0.19 [0.14–0.26]
4,Global,2019,1 500 000 [1 200 000-2 000 000],780 000 [600 000 - 1 000 000],750 000 [560 000 - 1 000 000],0.20 [0.15-0.27],0.21 [0.16–0.27],0.2 [0.15–0.28]
...,...,...,...,...,...,...,...,...
227,Western Pacific,1994,130 000 [91 000-170 000],93 000 [67 000 - 120 000],34 000 [23 000 - 46 000],0.08 [0.06-0.11],0.12 [0.08–0.15],0.04 [0.03–0.06]
228,Western Pacific,1993,42 000 [30 000-55 000],32 000 [23 000 - 42 000],10 000 [6800 - 14 000],0.03 [0.02-0.04],0.04 [0.03–0.05],0.01 [&lt;0.01–0.02]
229,Western Pacific,1992,34 000 [24 000-44 000],26 000 [19 000 - 34 000],7500 [5100 - 10 000],0.02 [0.02-0.03],0.03 [0.02–0.04],&lt;0.01 [&lt;0.01–0.01]
230,Western Pacific,1991,28 000 [20 000-36 000],22 000 [16 000 - 29 000],5700 [3900 - 7800],0.02 [0.01-0.02],0.03 [0.02–0.04],&lt;0.01 [&lt;0.01–0.01]
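
In this table the second header level (`WHO region`, `Year`, `Both sexes`, `Male`, `Female`) has been read as the first data row, and pandas has added `.1` and `.2` suffixes to the repeated column names. One way to repair that, sketched below on a small hand-copied sample, is to merge each column name with its sub-header and then drop the sub-header row. The helper `flatten_header` is a hypothetical name, and the sample keeps only the first five columns.

```python
import re

import pandas as pd

# Small sample mimicking the structure of new_infections (first five columns only).
raw = pd.DataFrame(
    [
        ["WHO region", "Year", "Both sexes", "Male", "Female"],
        ["Global", "2022", "1 300 000 [1 000 000-1 700 000]",
         "710 000 [540 000 - 940 000]", "610 000 [450 000 - 830 000]"],
        ["Western Pacific", "1993", "42 000 [30 000-55 000]",
         "32 000 [23 000 - 42 000]", "10 000 [6800 - 14 000]"],
    ],
    columns=[
        "Unnamed: 0",
        "Unnamed: 1",
        "Number of new HIV infections",
        "Number of new HIV infections.1",
        "Number of new HIV infections.2",
    ],
)


def flatten_header(frame):
    """Merge the pandas column names with the sub-header stored in row 0."""
    sub_header = frame.iloc[0]
    new_columns = []
    for name, sub in zip(frame.columns, sub_header):
        if name.startswith("Unnamed"):
            new_columns.append(sub)  # e.g. "WHO region", "Year"
        else:
            base = re.sub(r"\.\d+$", "", name)  # drop the ".1", ".2" suffixes
            new_columns.append(f"{base} ({sub})")
    body = frame.iloc[1:].reset_index(drop=True)
    body.columns = new_columns
    return body


clean = flatten_header(raw)
print(clean.columns.tolist())
```

An alternative, assuming the downloaded file really has exactly two header rows, would be to pass `header=[0, 1]` to `pd.read_csv` so that the columns are read as a two-level index from the start.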


In [6]:
# Note: this indicator covers adults aged 15 to 49, although the variable is named prevalence_18_45.
prevalence_18_45 = get_data('https://apps.who.int/gho/athena/data/xmart.csv?target=GHO/MDG_0000000029&profile=crosstable&filter=COUNTRY:-;REGION:*&ead=&x-sideaxis=REGION&x-topaxis=GHO;YEAR')
prevalence_18_45

Unnamed: 0.1,Unnamed: 0,Prevalence of HIV among adults aged 15 to 49 (%),Prevalence of HIV among adults aged 15 to 49 (%).1,Prevalence of HIV among adults aged 15 to 49 (%).2,Prevalence of HIV among adults aged 15 to 49 (%).3,Prevalence of HIV among adults aged 15 to 49 (%).4,Prevalence of HIV among adults aged 15 to 49 (%).5,Prevalence of HIV among adults aged 15 to 49 (%).6,Prevalence of HIV among adults aged 15 to 49 (%).7,Prevalence of HIV among adults aged 15 to 49 (%).8,...,Prevalence of HIV among adults aged 15 to 49 (%).23,Prevalence of HIV among adults aged 15 to 49 (%).24,Prevalence of HIV among adults aged 15 to 49 (%).25,Prevalence of HIV among adults aged 15 to 49 (%).26,Prevalence of HIV among adults aged 15 to 49 (%).27,Prevalence of HIV among adults aged 15 to 49 (%).28,Prevalence of HIV among adults aged 15 to 49 (%).29,Prevalence of HIV among adults aged 15 to 49 (%).30,Prevalence of HIV among adults aged 15 to 49 (%).31,Prevalence of HIV among adults aged 15 to 49 (%).32
0,WHO region,2022,2021,2020,2019,2018,2017,2016,2015,2014,...,1999,1998,1997,1996,1995,1994,1993,1992,1991,1990
1,Global,0.7 [0.6-0.8],0.7 [0.6-0.8],0.7 [0.6-0.8],0.7 [0.6-0.9],0.7 [0.6-0.9],0.7 [0.6-0.9],0.7 [0.6-0.9],0.7 [0.6-0.9],0.7 [0.6-0.8],...,0.7 [0.6-0.8],0.7 [0.6-0.8],0.6 [0.5-0.7],0.6 [0.5-0.7],0.6 [0.5-0.7],0.5 [0.4-0.6],0.5 [0.4-0.5],0.4 [0.3-0.5],0.4 [0.3-0.4],0.3 [0.2-0.3]
2,Africa,3.2 [2.7-3.7],3.3 [2.8-3.9],3.5 [2.9-4],3.6 [3-4.1],3.7 [3-4.3],3.8 [3.1-4.4],3.8 [3.2-4.5],3.9 [3.2-4.5],4 [3.3-4.6],...,4.7 [3.9-5.5],4.7 [3.9-5.5],4.6 [3.8-5.3],4.4 [3.7-5.2],4.2 [3.5-4.9],4 [3.3-4.6],3.7 [3-4.2],3.3 [2.7-3.8],2.9 [2.4-3.3],2.4 [2-2.8]
3,Americas,0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],...,0.4 [0.3-0.4],0.4 [0.3-0.4],0.3 [0.3-0.4],0.3 [0.3-0.4],0.3 [0.3-0.4],0.3 [0.3-0.4],0.3 [0.3-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3]
4,South-East Asia,0.2 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.3],0.3 [0.2-0.4],0.3 [0.3-0.4],0.3 [0.3-0.4],...,0.5 [0.4-0.6],0.5 [0.4-0.6],0.5 [0.4-0.6],0.4 [0.4-0.5],0.4 [0.3-0.5],0.3 [0.3-0.4],0.3 [0.2-0.3],0.2 [0.2-0.3],0.2 [0.1-0.2],0.1 [&lt;0.1-0.1]
5,Europe,0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.5 [0.4-0.5],0.4 [0.4-0.5],0.4 [0.4-0.5],0.4 [0.4-0.4],0.4 [0.3-0.4],0.4 [0.3-0.4],...,0.2 [0.1-0.2],0.1 [0.1-0.2],0.1 [0.1-0.1],0.1 [0.1-0.1],0.1 [&lt;0.1-0.1],0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1]
6,Eastern Mediterranean,0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],...,&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1]
7,Western Pacific,0.2 [0.1-0.2],0.2 [0.1-0.2],0.2 [0.1-0.2],0.2 [0.1-0.2],0.2 [0.1-0.2],0.2 [0.1-0.2],0.1 [0.1-0.2],0.1 [0.1-0.2],0.1 [&lt;0.1-0.2],...,&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1],&lt;0.1 [&lt;0.1-&lt;0.1]
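
The prevalence table is in wide format (one column per year, with the years stored in row 0), whereas the deaths and new infections tables have one row per region and year. A possible way to bring it into the same long shape is sketched below on a few values copied from the output. Treating censored values such as `&lt;0.1` as missing is an assumption made here for simplicity, and `point_estimate` is a hypothetical helper name.

```python
import html
import re

import pandas as pd

# Small sample mimicking the structure of the prevalence table.
raw = pd.DataFrame(
    [
        ["WHO region", "2022", "2021"],
        ["Global", "0.7 [0.6-0.8]", "0.7 [0.6-0.8]"],
        ["Eastern Mediterranean", "0.1 [&lt;0.1-0.1]", "&lt;0.1 [&lt;0.1-0.1]"],
    ],
    columns=[
        "Unnamed: 0",
        "Prevalence of HIV among adults aged 15 to 49 (%)",
        "Prevalence of HIV among adults aged 15 to 49 (%).1",
    ],
)

# Promote row 0 (the region label and the years) to the header.
wide = raw.iloc[1:].reset_index(drop=True)
wide.columns = raw.iloc[0].tolist()

# Reshape to one row per region and year.
tidy = wide.melt(id_vars="WHO region", var_name="year", value_name="prevalence_raw")
tidy["year"] = tidy["year"].astype(int)


def point_estimate(text):
    """Return the leading number of 'x [low-high]'; censored values like '<0.1' become NaN."""
    text = html.unescape(text).strip()  # turn "&lt;" back into "<"
    match = re.match(r"^(\d+(?:\.\d+)?)\s*\[", text)
    return float(match.group(1)) if match else float("nan")


tidy["prevalence_pct"] = tidy["prevalence_raw"].map(point_estimate)
print(tidy)
```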


In [7]:
infected = get_data('https://apps.who.int/gho/athena/data/xmart.csv?target=GHO/HIV_0000000001&profile=crosstable&filter=COUNTRY:-;REGION:*&ead=&x-sideaxis=REGION&x-topaxis=GHO;YEAR&x-collapse=true')
infected

Unnamed: 0,WHO region,Number of people (all ages) living with HIV; 2022,Number of people (all ages) living with HIV; 2021,Number of people (all ages) living with HIV; 2020,Number of people (all ages) living with HIV; 2019,Number of people (all ages) living with HIV; 2018,Number of people (all ages) living with HIV; 2017,Number of people (all ages) living with HIV; 2016,Number of people (all ages) living with HIV; 2015,Number of people (all ages) living with HIV; 2014,...,Number of people (all ages) living with HIV; 1999,Number of people (all ages) living with HIV; 1998,Number of people (all ages) living with HIV; 1997,Number of people (all ages) living with HIV; 1996,Number of people (all ages) living with HIV; 1995,Number of people (all ages) living with HIV; 1994,Number of people (all ages) living with HIV; 1993,Number of people (all ages) living with HIV; 1992,Number of people (all ages) living with HIV; 1991,Number of people (all ages) living with HIV; 1990
0,Global,39 000 000 [33 100 000-45 700 000],38 700 000 [32 800 000-45 200 000],38 300 000 [32 500 000-44 700 000],37 700 000 [32 000 000-44 100 000],37 100 000 [31 500 000-43 400 000],36 500 000 [31 000 000-42 700 000],35 800 000 [30 400 000-41 900 000],35 100 000 [29 800 000-41 100 000],34 400 000 [29 200 000-40 200 000],...,25 700 000 [21 800 000-30 100 000],24 600 000 [20 900 000-28 700 000],23 200 000 [19 700 000-27 100 000],21 500 000 [18 200 000-25 100 000],19 600 000 [16 700 000-23 000 000],17 600 000 [14 900 000-20 600 000],15 400 000 [13 100 000-18 100 000],13 300 000 [11 300 000-15 600 000],11 200 000 [9 500 000-13 100 000],9 200 000 [7 800 000-10 700 000]
1,Africa,25 600 000 [21 600 000-30 000 000],25 500 000 [21 600 000-30 000 000],25 400 000 [21 500 000-29 900 000],25 200 000 [21 400 000-29 600 000],24 900 000 [21 100 000-29 300 000],24 600 000 [20 900 000-28 900 000],24 300 000 [20 600 000-28 500 000],23 800 000 [20 200 000-28 000 000],23 400 000 [19 800 000-27 500 000],...,17 600 000 [14 900 000-20 700 000],16 900 000 [14 300 000-19 900 000],16 000 000 [13 600 000-18 800 000],15 000 000 [12 700 000-17 600 000],13 800 000 [11 700 000-16 200 000],12 500 000 [10 600 000-14 700 000],11 200 000 [9 400 000-13 100 000],9 700 000 [8 200 000-11 400 000],8 200 000 [7 000 000-9 700 000],6 800 000 [5 700 000-8 000 000]
2,Americas,3 800 000 [3 400 000-4 300 000],3 700 000 [3 300 000-4 200 000],3 600 000 [3 200 000-4 100 000],3 600 000 [3 100 000-4 000 000],3 500 000 [3 000 000-3 900 000],3 400 000 [3 000 000-3 800 000],3 300 000 [2 900 000-3 700 000],3 200 000 [2 800 000-3 600 000],3 100 000 [2 700 000-3 500 000],...,1 800 000 [1 600 000-2 000 000],1 700 000 [1 500 000-1 900 000],1 600 000 [1 400 000-1 800 000],1 500 000 [1 300 000-1 700 000],1 500 000 [1 300 000-1 600 000],1 400 000 [1 200 000-1 600 000],1 300 000 [1 200 000-1 500 000],1 200 000 [1 100 000-1 400 000],1 200 000 [1 000 000-1 300 000],1 100 000 [950 000-1 200 000]
3,South-East Asia,3 900 000 [3 400 000-4 600 000],3 900 000 [3 400 000-4 600 000],3 900 000 [3 400 000-4 600 000],3 900 000 [3 400 000-4 500 000],3 900 000 [3 400 000-4 500 000],3 900 000 [3 400 000-4 500 000],3 900 000 [3 400 000-4 600 000],3 900 000 [3 400 000-4 600 000],4 000 000 [3 400 000-4 600 000],...,4 700 000 [4 000 000-5 400 000],4 500 000 [3 900 000-5 200 000],4 200 000 [3 700 000-4 900 000],3 800 000 [3 300 000-4 400 000],3 400 000 [2 900 000-3 900 000],2 800 000 [2 500 000-3 300 000],2 300 000 [2 000 000-2 700 000],1 800 000 [1 500 000-2 100 000],1 300 000 [1 100 000-1 500 000],820 000 [710 000-950 000]
4,Europe,3 000 000 [2 600 000-3 300 000],2 900 000 [2 500 000-3 200 000],2 800 000 [2 400 000-3 100 000],2 600 000 [2 300 000-2 900 000],2 500 000 [2 200 000-2 800 000],2 400 000 [2 100 000-2 700 000],2 300 000 [2 000 000-2 500 000],2 100 000 [1 900 000-2 400 000],2 000 000 [1 800 000-2 200 000],...,780 000 [690 000-870 000],720 000 [630 000-800 000],650 000 [580 000-730 000],590 000 [520 000-660 000],540 000 [470 000-600 000],500 000 [440 000-550 000],460 000 [410 000-510 000],430 000 [380 000-480 000],400 000 [350 000-450 000],360 000 [320 000-410 000]
5,Eastern Mediterranean,490 000 [420 000-600 000],460 000 [390 000-560 000],430 000 [370 000-530 000],400 000 [340 000-490 000],370 000 [320 000-460 000],350 000 [300 000-430 000],330 000 [290 000-410 000],310 000 [270 000-390 000],300 000 [260 000-370 000],...,89 000 [76 000-110 000],80 000 [69 000-98 000],71 000 [61 000-88 000],63 000 [54 000-77 000],55 000 [47 000-68 000],48 000 [41 000-59 000],41 000 [35 000-51 000],34 000 [30 000-43 000],29 000 [25 000-36 000],25 000 [22 000-31 000]
6,Western Pacific,2 200 000 [1 700 000-2 800 000],2 200 000 [1 600 000-2 700 000],2 100 000 [1 600 000-2 600 000],2 000 000 [1 500 000-2 500 000],1 900 000 [1 400 000-2 400 000],1 800 000 [1 400 000-2 300 000],1 800 000 [1 300 000-2 200 000],1 700 000 [1 300 000-2 100 000],1 600 000 [1 200 000-2 000 000],...,750 000 [560 000-930 000],670 000 [500 000-830 000],580 000 [430 000-720 000],490 000 [370 000-620 000],400 000 [300 000-500 000],300 000 [230 000-370 000],180 000 [140 000-230 000],150 000 [110 000-190 000],120 000 [92 000-150 000],99 000 [74 000-120 000]
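
Here the year is embedded in each column name after a semicolon. To combine this table with the deaths data in a single dataset, one option is to reshape it to long format, extract the year, and merge on region and year. The sketch below uses a few values copied from the outputs above. The frame names, the column names `year`, `living_with_hiv` and `deaths`, and the choice of a left join are illustrative assumptions.

```python
import pandas as pd

# Two regions and two years copied from the "living with HIV" output.
infected_wide = pd.DataFrame({
    "WHO region": ["Global", "Africa"],
    "Number of people (all ages) living with HIV; 2022": [
        "39 000 000 [33 100 000-45 700 000]",
        "25 600 000 [21 600 000-30 000 000]",
    ],
    "Number of people (all ages) living with HIV; 2021": [
        "38 700 000 [32 800 000-45 200 000]",
        "25 500 000 [21 600 000-30 000 000]",
    ],
})

# Wide to long: one row per region and year.
infected_long = infected_wide.melt(
    id_vars="WHO region", var_name="indicator_year", value_name="living_with_hiv"
)
# The year is the text after the last semicolon in the original column name.
infected_long["year"] = (
    infected_long["indicator_year"].str.rsplit(";", n=1).str[-1].str.strip().astype(int)
)
infected_long = infected_long.drop(columns="indicator_year")

# Global deaths rows copied from the deaths output (already split into region and year).
deaths_long = pd.DataFrame({
    "WHO region": ["Global", "Global"],
    "year": [2022, 2021],
    "deaths": ["630 000 [480 000-880 000]", "660 000 [500 000-920 000]"],
})

# A left join keeps every region-year pair, with NaN where no deaths row is available.
combined = infected_long.merge(deaths_long, on=["WHO region", "year"], how="left")
print(combined)
```

The value columns are still strings at this point; they would need the same kind of numeric parsing as the other tables before any modelling.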
