BIOF309 Final Project

In [None]:
##This project is a proof-of-concept in interactive data presentation
##Using the Pew Global Attitudes Survey (2014-2016, inclusive), I looked at public opinion on healthcare (the years selected were the first three in which this data was available for all three countries examined)
##Because the countries surveyed, the questions asked, and which countries are asked which questions change from year to year, I focused on questions and countries consistent across all surveys
##Due to the massive number of questions asked, I decided to focus on how participants viewed the state of the economy and healthcare in their country, how optimistic they were about their country's economy, and the changes in those readings from the previous year
##Three countries were consistently asked all of these questions: Kenya, Nigeria, and South Africa
##Because of limited data and numerous potential confounds, this information is presented to users with correlations (R-values) and visuals, but not significance measures (p-values), as those would be inappropriate
##Only linear correlations are examined in this project (with a conceivable user request of only two or three points, it's hard to justify another model)
##Users are able to first request which health measure they want to see as the Y-axis
##The program will then ask which economic measure they wish to see run on the X-axis, and only options with the same number of entries (either direct results or change from previous survey results, depending upon healthcare dataset chosen) will be available
##Lastly, users will be able to select which of the countries they want displayed
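
The prompt-and-retry flow described above can be sketched as one small reusable helper. This is only a sketch: `prompt_choice`, its `read` parameter, and `scripted` are hypothetical names that do not appear in the project. `read` defaults to the built-in `input`, and the scripted stand-in lets the loop be exercised without a keyboard.

```python
def prompt_choice(message, valid, read=input):
    """Ask until the reply is one of the valid options, then return it.

    Hypothetical helper: `read` is any callable returning the next reply,
    and defaults to the built-in input().
    """
    print(message)
    while True:
        reply = read().strip()
        if reply in valid:
            return reply
        print("Invalid response, please try again: ")


def scripted(replies):
    """Return a stand-in for input() that replays canned replies (demonstration only)."""
    remaining = iter(replies)
    return lambda: next(remaining)


# One invalid entry (wrong capitalisation) followed by a valid one
health_choice = prompt_choice(
    "Type 'Current' or 'Change': ",
    ("Current", "Change"),
    read=scripted(["current", "Change"]),
)
```

In interactive use the `read` argument would simply be omitted, so the helper falls back to `input`.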

In [None]:
#Importing packages I plan on using
import numpy as np #For calculating linear regressions
import pandas as pd #For DataFrames
import matplotlib.pyplot as plt #For building scatter plot visuals

In [None]:
#Just a user intro
print("This program will generate a plot and correlations for the user based upon requested data.")
print("Specifically, this program uses Pew Global Attitudes Survey results from 2014-2016 for Kenya, Nigeria, and South Africa, and allows users to compare public opinion on health care with either opinions on the economy at the same point, or the future of the economy as perceived at that point.")
print("Users may select either survey results directly, or changes from the previous survey, for any combination of the three countries.")

In [None]:
####Building PewResults DataFrame

###Countries used
countries = ["Kenya", "Nigeria", "South Africa"]

###Survey results - All data in chronological order from left to right, for those curious; order not actually relevant for this project so long as it's consistent
##Direct survey results
#Percentage of respondents indicating that poor health care was a "very big problem" for the country
kenya_hcproblem = [53, 51, 75] 
nigeria_hcproblem = [60, 81, 85] 
safrica_hcproblem = [41, 57, 75]
hcproblem = [kenya_hcproblem, nigeria_hcproblem, safrica_hcproblem]

#Percentage of respondents indicating that they thought their country's economic situation was "very good"
kenya_econproblem = [10, 11, 6] 
nigeria_econproblem = [11, 26, 6]
safrica_econproblem = [14, 20, 10]
econproblem = [kenya_econproblem, nigeria_econproblem, safrica_econproblem]

#Percentage of respondents who expected improvement in their country's economy within 12 months of the survey
kenya_econoptimism = [46, 53, 56] 
nigeria_econoptimism = [72, 92, 86]
safrica_econoptimism = [51, 45, 62]
econoptimism = [kenya_econoptimism, nigeria_econoptimism, safrica_econoptimism]

##Changes between surveys
#Change in percentage of respondents indicating that poor health care was a "very big problem" for the country from the previous survey
kenya_hcproblemchange = [-2, 24] 
nigeria_hcproblemchange = [21, 4] 
safrica_hcproblemchange = [16, 18]
hcproblemchange = [kenya_hcproblemchange, nigeria_hcproblemchange, safrica_hcproblemchange]

#Change from the previous survey in percentage of respondents indicating that their country's economic situation was "very good"
kenya_econproblemchange = [1, -5] 
nigeria_econproblemchange = [15, -20]  
safrica_econproblemchange = [6, -10]
econproblemchange = [kenya_econproblemchange, nigeria_econproblemchange, safrica_econproblemchange]

#Change in percentage of respondents who expected improvement in their country's economy within 12 months of the survey between surveys
kenya_econoptimismchange = [7, 3] 
nigeria_econoptimismchange = [20, -6]
safrica_econoptimismchange = [-6, 17]
econoptimismchange = [kenya_econoptimismchange, nigeria_econoptimismchange, safrica_econoptimismchange]


###Pulling it all together
pewdict = {'Country':countries, #Named pewdict so that Python's built-in dict is not overwritten
        'Health Care Problem':hcproblem,
        'Econ Problem':econproblem,
        'Econ Optimism':econoptimism,
        'Health Care Problem Change':hcproblemchange,
        'Econ Problem Change':econproblemchange,
        'Econ Optimism Change':econoptimismchange
       }
pewdata = pd.DataFrame(pewdict) #Creates a DataFrame compiling all of the data from Pew
pewdata.index = countries
#print(pewdata) #Confirmation of successful creation, commented out in final version
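
The commented-out selection attempts later in this notebook index with `.iloc`, which accepts integer positions only. Because the index is set to the country names, label-based `.loc` is one way to pull a metric for chosen countries. A minimal sketch, assuming a cut-down copy of the table: `demo`, `wanted`, `subset`, and `kenya_values` are hypothetical names, and the values are the health care figures from the cell above.

```python
import pandas as pd

countries = ["Kenya", "Nigeria", "South Africa"]

# Cut-down stand-in for pewdata: one column, each cell holding a list of survey values
demo = pd.DataFrame(
    {"Health Care Problem": [[53, 51, 75], [60, 81, 85], [41, 57, 75]]},
    index=countries,
)

wanted = ["Kenya", "South Africa"]  # Row labels to keep
subset = demo.loc[wanted, "Health Care Problem"]  # Series indexed by country
kenya_values = subset["Kenya"]  # The list stored for Kenya
```

Selecting by label keeps the code independent of row order, at the cost of requiring the labels to match the index exactly.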

In [None]:
#Choosing data
y = "unselected" #For use as the y-axis - health care - in the scatter plot
hcverified = 0 #Used in loop to force user input until a valid choice is selected

print("Please choose what type of health care metric you wish to examine; type 'Current' to examine values from each survey, or 'Change' to look at the shifts between surveys: ")
while (hcverified < 1) : #Repeats until selection
    chosenhc = input() #User input
    if (chosenhc in ("Change", "Current")) : #Only these two entries are valid
        y = chosenhc
        hcverified = 2 #Successful input
    else :
        print("Invalid response, please try again: ")
    
#print(y) #Test for loop function, commented out in final program

x = "unselected" #For use as the x-axis - economy - in the scatter plot
econverified = 0 #Used in loop to force user input until a valid choice is selected

print("Please choose what type of economic metric you wish to examine; type 'Immediate' to examine opinions about the status of the economy at the time of survey, or 'Optimism' for opinions about its future: ")
while (econverified < 1) : #Repeats until selection
    chosenecon = input() #User input
    if (chosenecon in ("Immediate", "Optimism")) : #Only these two entries are valid
        #Change and Current have different numbers of values; pairing the economic choice with the health care choice ensures only matching lengths can be selected
        x = chosenecon + ", " + y #For example "Immediate, Change" or "Optimism, Current"
        econverified = 2 #Successful input
    else :
        print("Invalid response, please try again: ")
            
#print(x) #Testing loop, commented out in final version

kenya = 0 #Kenya not selected
nigeria = 0 #Nigeria not selected
safrica = 0 #South Africa not selected
countriesverified = 0 #Used in loop to force user input until a valid choice is selected

while (countriesverified < 1) : #Forces user input until successful input
    print("Type the first letter of all countries you would like to use in the analysis. Other characters will be ignored")
    chosencountries = input().lower() #They can type whatever they want here; lowercased so "K" and "k" are treated alike
    #Separate if statements (not elif) so that more than one country can be selected
    if ("k" in chosencountries) :
        kenya = 1
    if ("n" in chosencountries) :
        nigeria = 1
    if ("s" in chosencountries) :
        safrica = 1
    if (kenya + nigeria + safrica > 0) : #At least one country was recognized
        countriesverified = 2 #Successful input
    else :
        print("Invalid response, please try again")

#print(kenya + nigeria + safrica) #Testing loop, commented out in final version

#Compiling above to select datasets
countryset = [] #Country parameters to select in pewdata DataFrame
healthset = [] #Health care parameters to select in pewdata DataFrame
econset = [] #Economy parameters to select in pewdata DataFrame

#if (kenya == 1):
#    if (nigeria == 1):
#        if (safrica == 1):
#            pewdata.iloc = ["Country"] #All three
#        else :
#            pewdata.iloc = ["Country"[0, 2]] #Kenya and Nigeria
#    elif (safrica == 1):
#        pewdata.iloc = ["Country"[0, 3]] #Kenya and South Africa
#    else :
#        pewdata("Country"[0]) #Kenya alone
#elif (nigeria == 1):
#    if (safrica == 1):
#        pewdata.iloc = ["Country"[1, 2]] #Nigeria and South Africa
#    else :
#        pewdata.iloc = ["Country"[1]] #Nigeria alone
#else :
#    pewdata.iloc = ["Country"[2]] #South Africa alone

if (y == "Current"):
    healthset = hcproblem #Current
else :
    healthset = hcproblemchange #Change
    
if (x == "Immediate, Current"):
    econset = econproblem #Immediate, Current
elif (x == "Immediate, Change"): 
    econset = econproblemchange #Immediate, Change
elif (x == "Optimism, Current"):
    econset = econoptimism #Optimism, Current
else :
    econset = econoptimismchange #Optimism, Change
    
#entry = { #Dictionary for a new DateFrame
#    countryset:[healthset, econset]
#}
#entrydataframe = pd.DataFrame(entry) #Builds a new dataframe to call new columns/rows for the plot
#print(econset) #Testing, commented out
#print(healthset) #Testing, commented out
#print(countryset) #Testing, commented out

#ycat = pewdata.iloc["countryset", "healthset"] #Health care bracketed by country
#xcat = pewdata.iloc["countryset", "econset"] #Economics bracketed by country
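#Keeping only the countries the user selected (all lists are ordered Kenya, Nigeria, South Africa)
selected = [kenya == 1, nigeria == 1, safrica == 1]
countryset = [name for name, chosen in zip(countries, selected) if chosen]
ycat = [values for values, chosen in zip(healthset, selected) if chosen]
xcat = [values for values, chosen in zip(econset, selected) if chosen]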

#print(ycat) #Testing, commented out in final version
#print(xcat) #Testing, commented out in final version
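
A compact way to turn the typed letters into a list of countries is a lookup table keyed by first letter. The sketch below is illustrative only: `LETTERS` and `parse_countries` are hypothetical names, and it assumes, as the prompt states, that any other characters are ignored.

```python
# Hypothetical lookup table: first letter (lowercase) -> country name
LETTERS = {"k": "Kenya", "n": "Nigeria", "s": "South Africa"}


def parse_countries(text):
    """Return the countries whose first letter appears in text, in table order."""
    typed = text.lower()  # Treat "K" and "k" alike
    return [name for letter, name in LETTERS.items() if letter in typed]


both = parse_countries("S, k")        # Two countries, any order or separator
none_chosen = parse_countries("xyz")  # No recognized letters
```

An empty result signals that the prompt should be repeated. Note that a full name such as "Kenya" also contains an "n", so this kind of matching assumes users type initials only.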

In [None]:
#Labeling axes
hclabel = "%age of population considering poor health care a major issue" #Label if users choose to look at direct health care numbers
hcchangelabel = "Change in %age of population considering poor health care a major issue" #Label if users choose to look at change in health care numbers

econproblemlabel = "%age of population considering the economy very good" #Label if users choose to look at direct views on the economy
econproblemchangelabel = "Change in %age of population considering the economy very good" #Label if users choose to look at changes in views on the economy
econoptimismlabel = "%age of population optimistic the economy will improve" #Label if users choose to look at direct optimism about the economy
econoptimismchangelabel = "Change in %age of population optimistic the economy will improve" #Label if users choose to look at changes in optimism about the economy

ylab = "" #y-axis - health care - label
xlab = "" #x-axis - economic - label

if (y == "Current"): #Simple conditional to match labels to user requested data
    ylab = hclabel
    if (x.startswith("Immediate")): #x holds "Immediate, Current" or "Optimism, Current" here
        xlab = econproblemlabel
    else :
        xlab = econoptimismlabel
else :
    ylab = hcchangelabel
    if (x.startswith("Immediate")): #x holds "Immediate, Change" or "Optimism, Change" here
        xlab = econproblemchangelabel
    else :
        xlab = econoptimismchangelabel
        
#print (xlab + ylab) #Testing loop, commented out in final version

#Naming the chart
print("Please give your chart a title: ") #Much more straightforward than writing out every option, and satisfies user preference
title = input() #chart title

#Constructing visuals
plt.scatter(xcat, ycat)
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()
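
The R-values mentioned in the introduction can be computed with NumPy, which is imported at the top of the notebook. A minimal sketch, assuming Pearson's r is the intended correlation measure and using Kenya's three direct survey values from the data cell; `x_vals`, `y_vals`, `r`, `slope`, and `intercept` are hypothetical names.

```python
import numpy as np

x_vals = np.array([10, 11, 6])   # Kenya: % calling the economy "very good"
y_vals = np.array([53, 51, 75])  # Kenya: % calling poor health care a "very big problem"

# Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x_vals, y_vals)[0, 1]

# Least-squares straight line y = slope * x + intercept
slope, intercept = np.polyfit(x_vals, y_vals, 1)

print(f"r = {r:.2f}, fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

With only two or three points per country, r is very sensitive to any single value, which is consistent with the decision above to report it without a p-value.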