# INCIDENCE OF TUBERCULOSIS (TB) (10 mins)
# Instructions / Notes: Read these carefully
This **Python Jupyter Notebook** is split into the following sections:

1. **Initial section** with pre-filled cells that you should run to load the Python modules (packages), the dataset required for your task, and its variables into memory.
2. **Middle section** with a **description of a concrete task** associated with the dataset.
3. **Final section (with one or more empty cells)** where you can perform analyses with the loaded dataset (e.g., write a few lines of code if needed), answer the question posed, and describe your reasoning in words.

**Read and execute each cell in order, without skipping forward**. To execute any cell, press **Shift+Enter** on your keyboard. It might take a couple of seconds to receive an output.

Have fun!

In [1]:
# Run the following to import the necessary packages and load the dataset.
import pandas as pd
import numpy as np
import scipy as sp
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')

dA = "SC_IT assessment_TuberculosisFoulOdor.csv"

dfA = pd.read_csv(dA)

#Print first five lines of dataset A as a check to see if the dataset is loaded properly.
dfA.head(n=5)

|  | TB_incidence | Unpleasant_odor | Pathogen_concentration | WasteBurnt_OpenAir |
| --- | --- | --- | --- | --- |
| 0 | 101.06 | 0.74 | 88.29 | 102.9 |
| 1 | 67.07 | 0.4 | 4.3 | 66.48 |
| 2 | 97.46 | 0.64 | 88.89 | 89.32 |
| 3 | 138.25 | 0.72 | 160.09 | 58.58 |
| 4 | 111.34 | 0.75 | 135.79 | 100.6 |



# DATASET DESCRIPTION:
The dataset above contains some statistics for 200 cities in a canton:
1. **Incidence of tuberculosis (TB) across various cities in a canton (per 1000 people)**
2. **Unpleasant odors in these cities as reported by residents (e.g., those coming from industrial waste, medical waste and other household trash) (indexed on a scale ranging from 0 to 2)** 
3. **Pathogen concentration (per 1000 cubic meters)**
4. **Quantity of waste burnt in the open (per 1000 kg)** 
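Before reasoning with the data, it can help to compare the values against the units described above. The following is a minimal sketch, assuming only the five preview rows shown in the `dfA.head(n=5)` output; the names `preview`, `odor_in_range` and `ranges` are illustrative and not part of the original notebook.

```python
import pandas as pd

# The five preview rows from the dfA.head(n=5) output (illustrative copy, not the full dataset).
preview = pd.DataFrame(
    {
        "TB_incidence": [101.06, 67.07, 97.46, 138.25, 111.34],
        "Unpleasant_odor": [0.74, 0.40, 0.64, 0.72, 0.75],
        "Pathogen_concentration": [88.29, 4.30, 88.89, 160.09, 135.79],
        "WasteBurnt_OpenAir": [102.90, 66.48, 89.32, 58.58, 100.60],
    }
)

# The odor index is described as ranging from 0 to 2.
odor_in_range = preview["Unpleasant_odor"].between(0, 2).all()

# Minimum and maximum per column, to compare against the stated units.
ranges = preview.agg(["min", "max"]).round(2)

print(odor_in_range)
print(ranges)
```

With the full dataset loaded, the same two checks could be run on `dfA` instead of `preview`.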

Run the cell below to obtain the correlation table for the dataset.


In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
dfA.corr(method='pearson').round(2)

|  | TB_incidence | Unpleasant_odor | Pathogen_concentration | WasteBurnt_OpenAir |
| --- | --- | --- | --- | --- |
| TB_incidence | 1.0 | 0.72 | 0.89 | 0.16 |
| Unpleasant_odor | 0.72 | 1.0 | 0.74 | 0.77 |
| Pathogen_concentration | 0.89 | 0.74 | 1.0 | 0.39 |
| WasteBurnt_OpenAir | 0.16 | 0.77 | 0.39 | 1.0 |
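Pairwise coefficients do not show how two variables relate once a third is held fixed. One way to probe this is a first-order partial correlation, sketched below using only the rounded values from the correlation table above. The helper `partial_corr` is a hypothetical name introduced for illustration, and because the inputs are rounded to 2 decimals the results are approximate.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Rounded Pearson coefficients copied from the correlation table.
r_tb_odor = 0.72
r_tb_pathogen = 0.89
r_odor_pathogen = 0.74

# TB vs. odor, holding pathogen concentration fixed.
tb_odor_given_pathogen = partial_corr(r_tb_odor, r_tb_pathogen, r_odor_pathogen)

# TB vs. pathogen concentration, holding odor fixed.
tb_pathogen_given_odor = partial_corr(r_tb_pathogen, r_tb_odor, r_odor_pathogen)

print(round(tb_odor_given_pathogen, 2))
print(round(tb_pathogen_given_odor, 2))
```

Under these assumptions the TB and odor association drops to roughly 0.20 once pathogen concentration is controlled for, while the TB and pathogen association stays at roughly 0.77 when odor is controlled for. This is a descriptive check only and does not by itself establish causation.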


# TASK:
Officials of canton A have just received some raw medical data from the health department regarding the incidence of tuberculosis (TB) in various cities. Their **goal** is to figure out how to reduce the incidence of tuberculosis (TB) in the population.

To do so, they first performed a correlational analysis of the different data sources available to them from the canton. Some of these officials were particularly interested in the **high positive correlation** that existed between the "incidence of tuberculosis (TB)" across various cities in the canton (per 1000 people) and "the foul (unpleasant) odors in the city as reported by residents (e.g., those coming from industrial waste, medical waste and other household trash)" (measured on a scale of 0 to 2).

In order to **fulfill their goal of reducing incidence of TB**, these officials brainstormed the following ideas:
 
A) **Strategy A**: implementing good urban planning techniques focusing on better air-flow manipulation. 

B) **Strategy B**: launching programs to spray air fresheners with pleasant odors (e.g., lavender, fruity, BBQ, baked, etc.) in different areas of the city.

C) **Strategy C**: raising awareness regarding good hygiene practices in the city. 

D) **Strategy D**: installing indoor recycling stations to improve waste management in the city. 

E) **Strategy E**: improving availability of vaccination against TB in medical stores across the city.

F) **Strategy F**: launching programs for chemical treatment of air in different areas of the city. 


**Separate these strategies for reducing the incidence of tuberculosis (TB) into more appropriate and less/not appropriate categories**. 

Reason with the **dataset provided to you** and the **observations of canton officials about the high positive correlation between incidence of TB and foul odors in the city**.

1. Please **mark which strategies** are more appropriate or less/not appropriate.
2. Please provide a brief **reasoning** behind your answer (an explanation of **why** you took certain steps or performed certain calculations to get to the solution).
3. Please mark your **confidence** in your answer (on a scale of 1 to 5).

In [3]:
#NOTE: Round all your statistics to 2 decimal places before reasoning with them!! 

#REPORT YOUR ANSWER (e.g., mmlmmm, llllmm, lmlmml etc)
strategies_appropriateness = 'mmmmmm'
#Choose "m" for MORE APPROPRIATE CATEGORY and "l" for LESS/NOT APPROPRIATE CATEGORY for the 6 strategies in the following sequence:
#Strategy A --> Strategy B --> Strategy C --> Strategy D --> Strategy E --> Strategy F
print(strategies_appropriateness)

#REPORT YOUR REASONING
strategy_appropriateness_reasoning = 'All the strategies seem appropriate, even though wasteBurnt is not directly correlated to TB incidence it is correlated to the unpleasant odor which is correlated with TB incidence.'
print(strategy_appropriateness_reasoning)

#REPORT CONFIDENCE IN YOUR ANSWER
confidence_measure = '545355' 
#Choose among: 1 (low confidence), 2, 3 (medium confidence), 4, 5 (high confidence)
print(confidence_measure)


mmmmmm
All the strategies seem appropriate, even though wasteBurnt is not directly correlated to TB incidence it is correlated to the unpleasant odor which is correlated with TB incidence.
545355
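The reasoning above chains two correlations (waste burnt with odor, odor with TB). As a hedged illustration of how much such a chain constrains the third coefficient, the sketch below computes the range of values that a valid correlation matrix allows for it, using only the rounded coefficients from the correlation table. The helper `third_corr_bounds` is a hypothetical name introduced here, not part of the original notebook.

```python
import math


def third_corr_bounds(r_ab, r_bc):
    """Range of r_ac that keeps the 3x3 correlation matrix valid, given r_ab and r_bc."""
    centre = r_ab * r_bc
    half_width = math.sqrt((1 - r_ab**2) * (1 - r_bc**2))
    return centre - half_width, centre + half_width


# Rounded coefficients from the correlation table.
r_waste_odor = 0.77
r_odor_tb = 0.72
r_waste_tb_observed = 0.16

low, high = third_corr_bounds(r_waste_odor, r_odor_tb)

print(round(low, 2), round(high, 2))
print(low <= r_waste_tb_observed <= high)
```

Under these assumptions the waste and TB correlation could lie anywhere between roughly 0.11 and just under 1, so the observed 0.16 sits near the bottom of what the two stronger correlations permit. Two strong pairwise correlations do not force the third to be strong.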


In [None]:
#ONLY use this space below to write your code (if needed) for answering the task. DO NOT ERASE this code segment from the workbook.












#Your intuitive ideas are valuable! If you need syntax-related help in implementing your ideas, you can access the following documentation files (use the "Search" tab for queries) and/or summarized syntax sheets.

#a) Pandas library
#Documentation file: https://pandas.pydata.org/pandas-docs/stable/
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/fbc502d0-46b2-4e1b-b6b0-5402ff273251

#b) Numpy library
#Documentation file: https://docs.scipy.org/doc/numpy/user/index.html
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/e9f83f72-a81b-42c7-af44-4e35b48b20b7

#c) Matplotlib library
#Documentation file: https://matplotlib.org/contents.html
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/28b8210c-60cc-4f13-b0b4-5b4f2ad4790b

#d) Scipy library
#Documentation file: https://docs.scipy.org/doc/scipy/reference/
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/5710caa7-94d4-4248-be94-d23dea9e668f