# PFDA Big Project (working title)
## Author: Andre Hoarau

The columns in the global health statistics dataset are:

* Country: The name of the country where the health data was recorded.
* Year: The year in which the data was collected.
* Disease Name: The name of the disease or health condition tracked.
* Disease Category: The category of the disease (e.g., Infectious, Non-Communicable).
* Prevalence Rate (%): The percentage of the population affected by the disease.
* Incidence Rate (%): The percentage of new or newly diagnosed cases.
* Mortality Rate (%): The percentage of the affected population that dies from the disease.
* Age Group: The age range most affected by the disease.
* Gender: The gender(s) affected by the disease (Male, Female, Both).
* Population Affected: The total number of individuals affected by the disease.
* Healthcare Access (%): The percentage of the population with access to healthcare.
* Doctors per 1000: The number of doctors per 1000 people.
* Hospital Beds per 1000: The number of hospital beds available per 1000 people.
* Treatment Type: The primary treatment method for the disease (e.g., Medication, Surgery).
* Average Treatment Cost (USD): The average cost of treating the disease in USD.
* Availability of Vaccines/Treatment: Whether vaccines or treatments are available.
* Recovery Rate (%): The percentage of people who recover from the disease.
* DALYs: Disability-Adjusted Life Years, a measure of disease burden.
* Improvement in 5 Years (%): The improvement in disease outcomes over the last five years.
* Per Capita Income (USD): The average income per person in the country.
* Education Index: The average level of education in the country.
* Urbanization Rate (%): The percentage of the population living in urban areas.
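Later cells refer to these columns by their exact names, such as `"Mortality Rate (%)"`. It can therefore help to confirm that the loaded file actually contains them before plotting. The following is a minimal sketch. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names. The list copies the names from the dictionary above and assumes the CSV headers match them exactly.

```python
from typing import Iterable

# Hypothetical helper: names copied from the data dictionary, assuming the CSV uses these exact headers
EXPECTED_COLUMNS = [
    "Country",
    "Year",
    "Disease Name",
    "Disease Category",
    "Prevalence Rate (%)",
    "Incidence Rate (%)",
    "Mortality Rate (%)",
    "Age Group",
    "Gender",
    "Population Affected",
    "Healthcare Access (%)",
    "Doctors per 1000",
    "Hospital Beds per 1000",
    "Treatment Type",
    "Average Treatment Cost (USD)",
    "Availability of Vaccines/Treatment",
    "Recovery Rate (%)",
    "DALYs",
    "Improvement in 5 Years (%)",
    "Per Capita Income (USD)",
    "Education Index",
    "Urbanization Rate (%)",
]


def missing_columns(columns: Iterable[str]) -> list[str]:
    """Return the expected column names absent from `columns`, in dictionary order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

Once the data is loaded, `missing_columns(dfglobalhealth.columns)` should return an empty list if the headers match the dictionary.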

```python
# Imports that we will need
import sqlite3

import pandas as pd
import plotly.express as px
```


```python
# Read in our first data set
# Assumes the zipped CSV sits in a "data" folder next to this notebook
filepath = "./data/updatedglobalhealthstatistics.zip"
dfglobalhealth = pd.read_csv(filepath, compression="zip")
dfglobalhealth.head()
```
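A relative path such as `.data/...` is resolved against the notebook's working directory. As a result, a mistyped folder name only shows up as a `FileNotFoundError` when the file is read. The following is a minimal sketch of a loader that reports the fully resolved path instead. `load_health_data` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def load_health_data(filepath: Union[str, Path]) -> pd.DataFrame:
    """Read a zipped CSV, failing early with the absolute path that was tried."""
    path = Path(filepath)
    if not path.is_file():
        # Show the resolved path and working directory so folder-name typos are easy to spot
        raise FileNotFoundError(
            f"Data file not found: {path.resolve()} (working directory: {Path.cwd()})"
        )
    # compression="zip" expects exactly one data file inside the archive
    return pd.read_csv(path, compression="zip")
```

Call it with whatever path the archive actually lives at, for example `dfglobalhealth = load_health_data(filepath)`.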

```python
# Create a database file for efficient use of the data.
```

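The notebook already imports `sqlite3`, and pandas can write a DataFrame directly into a SQLite file with `DataFrame.to_sql`. The following is a minimal sketch of what this cell might contain. It assumes a hypothetical table name `global_health`. Indexing the `Country` column is an illustrative choice for queries that filter by country, and both functions are hypothetical helpers.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def write_to_sqlite(df: pd.DataFrame, db_path: str, table: str = "global_health") -> None:
    """Store `df` in a SQLite table, replacing any previous copy, and index it by country."""
    with closing(sqlite3.connect(db_path)) as conn:
        # if_exists="replace" drops and recreates the table on every run
        df.to_sql(table, conn, if_exists="replace", index=False)
        # Index the Country column so per-country lookups avoid a full table scan
        conn.execute(f'CREATE INDEX IF NOT EXISTS "idx_{table}_country" ON "{table}" ("Country")')
        conn.commit()


def query_country(db_path: str, country: str, table: str = "global_health") -> pd.DataFrame:
    """Return every row recorded for one country."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The country name is passed as a query parameter rather than pasted into the SQL text
        return pd.read_sql_query(
            f'SELECT * FROM "{table}" WHERE "Country" = ?', conn, params=(country,)
        )
```

For example, `write_to_sqlite(dfglobalhealth, "globalhealth.db")` would create the file (the name is hypothetical), and later cells could call `query_country("globalhealth.db", "India")` instead of reloading the zip.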

```python
# For each country, keep the single row (any year, any disease) with the highest mortality rate
deadliestdisease = dfglobalhealth.loc[
    dfglobalhealth.groupby("Country")["Mortality Rate (%)"].idxmax()
]

# Create a choropleth map
fig = px.choropleth(
    deadliestdisease,
    locations="Country",              # Use country names directly
    locationmode="country names",     # Match locations by country name rather than ISO code
    color="Mortality Rate (%)",       # Values to color by
    hover_name="Country",             # Display country name on hover
    hover_data={"Disease Name": True, "Mortality Rate (%)": True},  # Additional data on hover
    title="Deadliest Disease by Country",
    color_continuous_scale="Reds",    # Color scale for the map
)

fig.update_layout(geo=dict(showframe=False, showcoastlines=True))
fig.show()
```
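`groupby("Country")["Mortality Rate (%)"].idxmax()` returns, for each country, the index label of the row with the largest mortality rate. `.loc` then retrieves those whole rows. Because the data spans several years, the selected row can come from any year. The following is a minimal sketch of a variant that can optionally restrict the comparison to a single year first. `deadliest_by_country` and its `year` parameter are hypothetical names, not part of the original notebook.

```python
from typing import Optional

import pandas as pd


def deadliest_by_country(df: pd.DataFrame, year: Optional[int] = None) -> pd.DataFrame:
    """For each country, return the row with the highest "Mortality Rate (%)".

    If `year` is given, only rows from that year are compared.
    """
    if year is not None:
        df = df[df["Year"] == year]
    # Drop rows without a mortality value so every group has something to compare
    df = df.dropna(subset=["Mortality Rate (%)"])
    # idxmax gives the index label of each country's maximum; .loc retrieves the full rows
    best = df.groupby("Country")["Mortality Rate (%)"].idxmax()
    return df.loc[best].reset_index(drop=True)
```

The result can be passed to `px.choropleth` in place of `deadliestdisease`, for example `deadliest_by_country(dfglobalhealth, year=...)` with a year that exists in the data.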

# References
* MalaiarasuGRaj. (2024). *Global Health Statistics* [Data set]. Kaggle. <https://doi.org/10.34740/KAGGLE/DSV/10028650>. The global health statistics dataset used in this project.

# End