## QTM 350: Data Science Computing

### Assignment 05 - Literate Programming with Quarto 

### Due Date: 11:59 PM on Wednesday, October 09, 2024

### Instructions

In this assignment, you will demonstrate your proficiency with Quarto by creating data science reports and presentations. You will analyse a sample of the [World Development Indicators dataset](https://databank.worldbank.org/source/world-development-indicators), focusing on one year (2022) and 14 variables. Your task involves performing data analysis, generating visualisations, and producing reproducible documents in multiple formats.

### Data

The sample dataset is provided in the file `wdi.csv`, which is available in [our GitHub repository](https://github.com/danilofreire/qtm350/tree/main/assignments/world_bank_data.csv). You can also create the dataset yourself by running the Python code below.

```python
# Install the necessary libraries
# pip install pandas
# pip install wbgapi

# Import the libraries
import pandas as pd
import wbgapi as wb
```

```python
# Define the indicators to download
indicators = {
    'gdp_per_capita': 'NY.GDP.PCAP.CD', # in current US dollars
    'gdp_growth_rate': 'NY.GDP.MKTP.KD.ZG', # in percent
    'inflation_rate': 'FP.CPI.TOTL.ZG', # in percent
    'unemployment_rate': 'SL.UEM.TOTL.ZS', # in percent
    'total_population': 'SP.POP.TOTL', # in persons
    'life_expectancy': 'SP.DYN.LE00.IN', # in years
    'adult_literacy_rate': 'SE.ADT.LITR.ZS', # in percent
    'income_inequality': 'SI.POV.GINI', # Gini index (100 = perfect inequality)
    'health_expenditure_gdp_share': 'SH.XPD.CHEX.GD.ZS', # in percent
    'measles_immunisation_rate': 'SH.IMM.MEAS', # in percent of children
    'education_expenditure_gdp_share': 'SE.XPD.TOTL.GD.ZS', # in percent
    'primary_school_enrolment_rate': 'SE.PRM.ENRR', # in percent of children
    'exports_gdp_share': 'NE.EXP.GNFS.ZS' # in percent
}

# Download data for all countries in 2022
df = wb.data.DataFrame(indicators.values(), time=2022, skipBlanks=True, labels=True).reset_index()

# Delete the 'economy' column
df = df.drop(columns=['economy'], errors='ignore')

# Create a reversed dictionary mapping indicator codes to names
code_to_name = {code: name for name, code in indicators.items()}

# Rename the columns and convert all names to lowercase
df = df.rename(columns=code_to_name)
df.columns = df.columns.str.lower()

# Display the number of rows and columns
print(df.shape)

# Display the first few rows of the data
print(df.head(3))

# Save the data to a CSV file
# df.to_csv('wdi.csv', index=False)
```

```text
(265, 14)
       country  inflation_rate  exports_gdp_share  gdp_growth_rate  \
0     Zimbabwe      104.705171          27.955246         6.522375   
1       Zambia       10.993204          40.193998         5.249622   
2  Yemen, Rep.             NaN                NaN              NaN   

   gdp_per_capita  adult_literacy_rate  primary_school_enrolment_rate  \
0     1676.821489            89.849998                      95.790001   
1     1456.901570                  NaN                            NaN   
2      698.850350                  NaN                            NaN   

   education_expenditure_gdp_share  measles_immunisation_rate  \
0                              NaN                       90.0   
1                            3.583                       90.0   
2                              NaN                       73.0   

   health_expenditure_gdp_share  income_inequality  unemployment_rate  \
0                           NaN                NaN             10.087   
1
```
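
The printed rows already contain several `NaN` entries, so it may help to check how complete each indicator is before choosing which ones to analyse. The following is a minimal sketch, assuming pandas is installed. The `sample` frame retypes three rows and four indicators from the output above so that the block runs without a download; the names `sample`, `missing_counts`, and `coverage` are illustrative and not part of the assignment.

```python
import numpy as np
import pandas as pd

# Three rows copied from the printed output; only four indicators are kept
sample = pd.DataFrame(
    {
        "country": ["Zimbabwe", "Zambia", "Yemen, Rep."],
        "inflation_rate": [104.705171, 10.993204, np.nan],
        "gdp_per_capita": [1676.821489, 1456.901570, 698.850350],
        "adult_literacy_rate": [89.849998, np.nan, np.nan],
        "measles_immunisation_rate": [90.0, 90.0, 73.0],
    }
)

# Keep only the numeric indicator columns
values = sample.drop(columns="country")

# Number of missing values per indicator, most incomplete first
missing_counts = values.isna().sum().sort_values(ascending=False)

# Share of countries with a reported value, in percent
coverage = (values.notna().mean() * 100).round(1)

print(missing_counts)
print(coverage)
```

On the full dataset, the same two lines can be applied to the frame returned by `pd.read_csv('wdi.csv')`; indicators with low coverage are probably poor candidates for the analysis.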

### Tasks

1. Please initialise a new `.qmd` file with an appropriate `YAML` header. Include metadata such as `title`, `author`, and `date`, and specify `HTML` and `PDF` as the output formats.
   
2. Load the dataset using your preferred programming language (R or Python). 
   
3. Conduct exploratory data analysis on at least three indicators of your choice. Summarise your findings in markdown sections. Show your code and results.
   
4. Create at least two different types of plots (e.g., bar chart, scatter plot) to represent your analysis. Use Quarto code chunks to embed these visualisations. Add a title and axis labels to each plot. Use Quarto to include a caption and a reference to the source of the data. Hide your code in the final document.
   
5. Construct a table that highlights some key statistics from your analysis. Ensure the table is well-formatted and included in the report.
   
6. Include cross-references to your figures and tables within the text. Demonstrate proper labelling and referencing techniques.
   
7. Add a bibliography using BibTeX (`.bib`). Cite at least two sources related to your analysis.
   
8. Create a new `.qmd` file configured for `revealjs` output. Include a title slide, a few content slides, and a concluding slide.
   
9. Incorporate your analysis and visualisations from the report into the presentation.
    
10. Customise the presentation theme and incorporate at least one transition effect between slides.
    
11. Render your report to HTML and PDF, and your presentation to Revealjs (HTML).
    
12. Use Git to manage your project and create a repository on GitHub. Submit the link to your repository on Canvas.
    
13. Set up GitHub Pages (preferably) or use GitHack to host your HTML report and presentation. 
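
The summary table, hidden code, and cross-reference requirements in the tasks above can be sketched as the body of a single Python code chunk. This is one possible approach and not the required solution. The `#|` lines are Quarto cell options, which Python treats as ordinary comments; the label `tbl-summary`, the caption text, and the choice of indicators are assumptions made for illustration. The inline rows stand in for `wdi.csv` so that the block runs on its own.

```python
#| label: tbl-summary
#| tbl-cap: "Summary statistics for three indicators. Source: World Development Indicators."
#| echo: false

import numpy as np
import pandas as pd

# In the report this would be: df = pd.read_csv("wdi.csv")
# Stand-in data: three rows taken from the sample output in the Data section
df = pd.DataFrame(
    {
        "country": ["Zimbabwe", "Zambia", "Yemen, Rep."],
        "gdp_per_capita": [1676.821489, 1456.901570, 698.850350],
        "inflation_rate": [104.705171, 10.993204, np.nan],
        "measles_immunisation_rate": [90.0, 90.0, 73.0],
    }
)

chosen = ["gdp_per_capita", "inflation_rate", "measles_immunisation_rate"]

# One row per indicator; missing values are skipped by each statistic
summary = df[chosen].agg(["count", "mean", "median", "min", "max"]).T.round(2)

# In a Quarto chunk, the last expression is what gets rendered as the table
summary
```

With the `tbl-` prefix on the label, the table can be referenced in the text as `@tbl-summary`. Figures work the same way with a `fig-` label and a `fig-cap` option, and `echo: false` keeps the code out of the rendered document.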

### Bonus Questions

14. Develop an interactive dashboard within your report using Quarto's dashboard features. Incorporate dynamic filters or widgets.
    
15. Configure automated rendering of your report using Quarto's command-line interface, possibly integrating with GitHub Actions for continuous integration.
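
For the automated rendering question, one option is a small Python wrapper around the `quarto render` command, which a GitHub Actions step or a local script could then call. This is a rough sketch under stated assumptions: the helper names `build_render_command` and `render` and the file name `report.qmd` are hypothetical, and the sketch assumes the `quarto` executable is installed and on the `PATH`.

```python
import shutil
import subprocess


def build_render_command(qmd_path: str, output_format: str) -> list[str]:
    """Return the argument list for `quarto render <file> --to <format>`."""
    return ["quarto", "render", qmd_path, "--to", output_format]


def render(qmd_path: str, formats: list[str]) -> None:
    """Render one document to each requested format, stopping at the first failure."""
    if shutil.which("quarto") is None:
        raise RuntimeError("The quarto executable was not found on PATH")
    for fmt in formats:
        # check=True raises CalledProcessError if quarto exits with an error
        subprocess.run(build_render_command(qmd_path, fmt), check=True)


# Example call (hypothetical file name), left commented out so that
# importing or testing this block does not start a render:
# render("report.qmd", ["html", "pdf"])
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to check the arguments without having Quarto installed.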