# Group Questions

> **Things to remember**:
> - The questions must be stated and justified, and at least one additional question beyond the minimum of two must be proposed, or at least outlined and then studied (sometimes these extra questions are not made explicit but are still studied).
> - We should perform an exploratory data analysis (EDA) for each question.
> - We must apply more than one advanced technique (statistical test or data mining algorithm), for example, one advanced technique per question.
> - Conclusions must be consistent with the results, understandable, and answer the questions raised. We also have to interpret them within the context of the dataset's domain (e.g., "therefore, treatment X is better than Y for treating fever, and the drug guidelines should be modified to improve the treatment of this type of symptom").
> - We should include more than one correct and coherent visualization suited to the type of information being presented, making proper use of visual elements. For a higher grade, we should use the advanced visualization features of the libraries we use.

First, we will import the necessary libraries.

```python
import polars as pl

# Types
from polars import DataFrame
```

Now, we will load the cleaned dataset generated in the `01_introduction_and_processing.ipynb` notebook and display its first few rows.

```python
cleaned_df: DataFrame = pl.read_csv(
    source="../data/cleaned/data.csv",
)
cleaned_df.head()
```

| Column | dtype | Row 0 | Row 1 | Row 2 | Row 3 | Row 4 |
|---|---|---|---|---|---|---|
| Country | str | Afghanistan | Albania | Algeria | Andorra | Angola |
| Density(P/Km2) | f64 | 60.0 | 105.0 | 18.0 | 164.0 | 26.0 |
| Abbreviation | str | AF | AL | DZ | AD | AO |
| Agricultural Land(%) | f64 | 58.1 | 43.1 | 17.4 | 40.0 | 47.5 |
| Land Area(Km2) | f64 | 652230.0 | 28748.0 | 2381741.0 | 468.0 | 1246700.0 |
| Armed Forces size | f64 | 323000.0 | 9000.0 | 317000.0 | 45891.309772 | 117000.0 |
| Birth Rate | f64 | 32.49 | 11.78 | 24.28 | 7.2 | 40.73 |
| Co2-Emissions | f64 | 8672.0 | 4536.0 | 150006.0 | 469.0 | 34693.0 |
| CPI | f64 | 149.9 | 119.05 | 151.36 | 144.893552 | 261.73 |
| CPI Change (%) | f64 | 2.3 | 1.4 | 2.0 | 8.814943 | 17.1 |
| Fertility Rate | f64 | 4.47 | 1.62 | 3.02 | 1.27 | 5.52 |
| Forested Area (%) | f64 | 2.1 | 28.1 | 0.8 | 34.0 | 46.3 |
| Gasoline Price | f64 | 0.7 | 1.36 | 0.28 | 1.51 | 0.97 |
| GDP | f64 | 19101000000.0 | 15278000000.0 | 169990000000.0 | 3154100000.0 | 94635000000.0 |
| Gross primary education enrollment (%) | f64 | 104.0 | 107.0 | 109.9 | 106.4 | 113.5 |
| Gross tertiary education enrollment (%) | f64 | 9.7 | 55.0 | 51.4 | 52.574324 | 9.3 |
| Infant mortality | f64 | 47.9 | 7.8 | 20.1 | 2.7 | 51.6 |
| Life expectancy | f64 | 64.5 | 78.5 | 76.7 | 84.04 | 60.8 |
| Maternal mortality ratio | f64 | 638.0 | 15.0 | 112.0 | -38.784641 | 241.0 |
| Out of pocket health expenditure | f64 | 78.4 | 56.9 | 28.1 | 36.4 | 33.4 |
| Physicians per thousand | f64 | 0.28 | 1.2 | 1.72 | 3.33 | 0.21 |
| Population | f64 | 38041754.0 | 2854191.0 | 43053054.0 | 77142.0 | 31825295.0 |
| Population: Labor force participation (%) | f64 | 48.9 | 55.7 | 41.2 | 62.618059 | 77.5 |
| Tax revenue (%) | f64 | 9.3 | 18.6 | 37.2 | 19.248218 | 9.2 |
| Total tax rate | f64 | 71.4 | 36.6 | 66.1 | 39.416895 | 49.1 |
| Unemployment rate | f64 | 11.12 | 12.33 | 11.7 | 6.042135 | 6.89 |
| Urban population | f64 | 9797273.0 | 1747593.0 | 31510100.0 | 67873.0 | 21061025.0 |
| Latitude | f64 | 33.93911 | 41.153332 | 28.033886 | 42.506285 | -11.202692 |
| Longitude | f64 | 67.709953 | 20.168331 | 1.659626 | 1.521801 | 17.873887 |
| Median Salary | f64 | 853.74 | 832.84 | 1148.84 | 3668.08 | 284.39 |
| Daily calorie supply per person from other commodities | f64 | 8.821558 | 53.712393 | 24.801911 | 56.997138 | 14.899377 |
| Daily calorie supply per person from alcoholic beverages | f64 | 0.0 | 50.965451 | 4.999601 | 208.266626 | 84.089373 |
| Daily calorie supply per person from sugar | f64 | 202.237697 | 223.651168 | 301.633378 | 407.11814 | 123.999842 |
| Daily calorie supply per person from oils and fats | f64 | 254.483138 | 421.348781 | 560.377034 | 873.135015 | 350.435213 |
| Daily calorie supply per person from meat | f64 | 50.166415 | 212.191398 | 101.906382 | 392.957022 | 124.822425 |
| Daily calorie supply per person from dairy and eggs | f64 | 102.460547 | 814.849454 | 326.188734 | 506.879572 | 18.054628 |
| Daily calorie supply per person from fruits and vegetables | f64 | 123.306214 | 713.243894 | 615.630856 | 211.999502 | 156.596191 |
| Daily calorie supply per person from starchy roots | f64 | 25.98626 | 90.512176 | 115.823008 | 120.999896 | 888.107249 |
| Daily calorie supply per person from pulses | f64 | 24.000058 | 51.973577 | 51.358187 | 12.999883 | 62.843695 |
| Daily calorie supply per person from cereals and grains | f64 | 1519.539312 | 942.069818 | 1611.030679 | 821.000324 | 875.956841 |
| Daily total caloric ingestion | f64 | 2311.001198 | 3574.518109 | 3713.74977 | 3612.353118 | 2699.804832 |


### Q1: What factors influence a country's life expectancy?

> - We can enrich this analysis with external data, such as climate data.
> - We can explore correlations between the variables and apply association rules to find patterns.

To answer the question *What factors influence a country's life expectancy?*, we will investigate which variables in the dataset are most strongly related to life expectancy across countries. Life expectancy is a key indicator of a nation's overall well-being, reflecting healthcare quality, economic conditions, and other factors.

Understanding the determinants of life expectancy is essential for policymakers, healthcare professionals, and economists seeking to improve public health strategies and resource allocation. By analyzing country-level data, this study aims to identify the most influential factors and their relationships with life expectancy, offering insights into how countries can improve public health outcomes.

#### Correlation Analysis

#### Association Rules

Association rules are a technique for discovering co-occurrence relationships between variables in large datasets. They are widely used in market basket analysis, where the goal is to find items that are frequently bought together. In this study, we will apply association rules to find patterns linking different variables to life expectancy, such as whether higher healthcare expenditure tends to co-occur with higher life expectancy, or whether specific economic conditions co-occur with lower life expectancy. Because association rules operate on categorical items rather than continuous values, the numeric variables must first be discretized (for example, into low, medium, and high levels).
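
To make the vocabulary concrete, here is a minimal, library-free sketch of how rules are scored: the *support* of an itemset is the fraction of transactions that contain it, the *confidence* of a rule A → B is support(A ∪ B) / support(A), and the *lift* is confidence / support(B), where a lift above 1 means A and B co-occur more often than they would if they were independent. To stay short, it only mines rules with a single item on each side (a simplified Apriori), and the five transactions, with items such as `physicians=high`, are invented toy data rather than values from the dataset. For full multi-item Apriori mining, a library such as mlxtend (`mlxtend.frequent_patterns`) is a common choice.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float


def mine_rules(transactions: list[set[str]], min_support: float = 0.5,
               min_confidence: float = 0.8) -> list[Rule]:
    """Simplified Apriori: mine rules A -> B with one item on each side."""
    n = len(transactions)

    def support_of(items: set[str]) -> float:
        # Fraction of transactions containing every item in `items`
        return sum(1 for t in transactions if items <= t) / n

    # Apriori pruning: only frequent single items can appear in a frequent pair
    frequent = [i for i in sorted(set().union(*transactions)) if support_of({i}) >= min_support]
    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support_of({a, b})
        if pair_support < min_support:
            continue
        # A frequent pair yields two candidate rules, a -> b and b -> a
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support_of({lhs})
            if confidence >= min_confidence:
                lift = confidence / support_of({rhs})
                rules.append(Rule(lhs, rhs, pair_support, confidence, lift))
    # Strongest associations (highest lift) first
    return sorted(rules, key=lambda r: r.lift, reverse=True)


# Hypothetical toy transactions: one set of discretized items per (invented) country
transactions = [
    {"physicians=high", "infant_mortality=low", "life_expectancy=high"},
    {"physicians=high", "infant_mortality=low", "life_expectancy=high"},
    {"physicians=low", "infant_mortality=high", "life_expectancy=low"},
    {"physicians=high", "infant_mortality=low", "life_expectancy=medium"},
    {"physicians=low", "infant_mortality=high", "life_expectancy=low"},
]
rules = mine_rules(transactions, min_support=0.4, min_confidence=0.8)
for rule in rules:
    print(f"{rule.antecedent} -> {rule.consequent} "
          f"(support={rule.support:.2f}, confidence={rule.confidence:.2f}, lift={rule.lift:.2f})")
```

With these toy transactions, `physicians=high -> life_expectancy=high` is discarded because its confidence is only 2/3 (one of the three `physicians=high` transactions has medium life expectancy), while the reverse rule `life_expectancy=high -> physicians=high` is kept with confidence 1.0. This asymmetry is why both directions of each frequent pair are checked.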

The first step is to select the columns to use in the analysis.

#### Conclusion

### Q2

> We can try clustering the countries (if clustering does not yield much information, we should revisit one of the other questions we had planned and adapt it so that an advanced technique can be applied).