# Best States to Live In

##### In this project analysis, I set out to find the best states in the United States to live in. The rankings derive from the best-states.csv dataset provided by U.S. News. I also used a dataset from World Population Review that lists the states with the lowest cost of living. It summarizes 6 different categories combined into a single rank. From the best-states dataset, eight categories covering 71 metrics were used to find out which states are most accommodating to their residents. The category weights come from a survey of over 65,000 people taken in three years (2017, 2019, and 2021), and they reflect what residents prioritized in their states.
##### The U.S. News data includes the raw values for each state that determine the metric-level results. A z-score distribution was used to compare each state against the average for each metric. An index score was then created for each metric and state: the best-performing state received 100 points, the worst-performing state received none, and states in between were scaled between those two extremes. The index scores for each metric were averaged to produce subcategory scores, and those subcategory scores were in turn averaged to produce category scores.
##### With these scores, I created a pie chart to display the overall weight each category held. I then multiplied each state's category scores by the category weights, added them up, and divided the sum by 100 to get a weighted total for each state. Next, I took the mean of each state's cost rank and weighted total to get its final overall score, and I sorted the states by that score. The states with the lowest values rank as the top states, and states with larger values rank further down. With this information, I set out to answer three questions:
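The indexing step can be sketched as below. This is a minimal illustration under stated assumptions and not U.S. News's published code. `index_metric` is a hypothetical helper, the metric values are made up, and each metric is assumed to be flagged as higher-is-better or lower-is-better. A z-score is a linear transform of the raw values, so rescaling the z-scores to 0-100 gives the same index as rescaling the raw values directly. The z-scores are still useful for seeing how far a state sits from the average.

```python
import pandas as pd


def index_metric(raw: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Hypothetical helper: map raw metric values to a 0-100 index (best = 100, worst = 0)."""
    z = (raw - raw.mean()) / raw.std()   # z-score: distance from the average in standard deviations
    if not higher_is_better:
        z = -z                           # flip so that "better" is always the larger value
    return 100 * (z - z.min()) / (z.max() - z.min())


# Made-up raw values for two metrics belonging to one subcategory
raw = pd.DataFrame(
    {
        "metric_a": [80.0, 60.0, 70.0],  # assumed: higher is better
        "metric_b": [5.0, 15.0, 10.0],   # assumed: lower is better
    },
    index=["State X", "State Y", "State Z"],
)

indexed = pd.DataFrame({
    "metric_a": index_metric(raw["metric_a"]),
    "metric_b": index_metric(raw["metric_b"], higher_is_better=False),
})

# Subcategory score = average of its metric indexes; category scores repeat this step
subcategory_score = indexed.mean(axis=1)
print(subcategory_score)
```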
##### What are the top 5 and bottom 5 states to live in according to each factor?
##### What are the top 5 states to live in overall?
##### What are the top 5 states to live in according to the factors I find most important? (Health Care, Education, Crime, and Natural Environment)
##### Note:
#####    A lower category score means the state compares more favorably with other states on that factor.

#####    A higher score means residents were less pleased with that factor. The graphs follow the same convention.
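The scoring steps from the introduction reduce to a weighted average followed by a simple mean. The sketch below works through them for one hypothetical state with made-up ranks. `overall_score` is an illustrative name and not a function from this notebook. The eight weights add up to 100, so dividing by the sum of the weights is the same as dividing by 100.

```python
# Category weights used in this analysis (they sum to 100)
WEIGHTS = {
    "Health Care": 16, "Education": 16, "Economy": 13, "Infrastructure": 12,
    "Opportunity": 12, "Fiscal Stability": 11, "Crime & Corrections": 10,
    "Natural Environment": 10,
}


def overall_score(category_ranks: dict, cost_rank: float, weights: dict = WEIGHTS) -> float:
    """Illustrative helper: weighted average of category ranks, then averaged with the cost rank."""
    weighted_total = sum(category_ranks[c] * w for c, w in weights.items()) / sum(weights.values())
    return (weighted_total + cost_rank) / 2   # lower is better


# Hypothetical state that ranks 10th in every category and 20th in cost of living
ranks = {category: 10 for category in WEIGHTS}
print(overall_score(ranks, cost_rank=20))   # 15.0
```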

In [None]:
#dependencies and setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
#csv files to read (paths are relative to the notebook's folder; adjust them if the files live elsewhere)
best_states = "BestStates.csv"
cheapest_states = "CheapestStates.csv"

#read the best states file
best_states_df = pd.read_csv(best_states)
best_states_df

In [None]:
#we need to clean up the data and get rid of NaN
best_states_df = best_states_df.dropna()
best_states_df
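When `dropna()` is called with no arguments, it removes a row if any of its columns is missing. One blank cell in an unused column can therefore silently drop an entire state. The quick check below is sketched on a made-up frame, and the `Notes` column is hypothetical. It shows which rows would be removed and how `subset=` limits the check to the columns the analysis needs:

```python
import numpy as np
import pandas as pd

# Made-up frame: one state is missing a value in a column the analysis does not use
df = pd.DataFrame({
    "State": ["Alpha", "Beta", "Gamma"],
    "Health Care": [1, 2, 3],
    "Notes": ["ok", np.nan, "ok"],
})

# Rows that a plain dropna() would remove
dropped = df[df.isna().any(axis=1)]
print(dropped["State"].tolist())   # ['Beta']

# Only drop rows that are missing values in the columns that matter
kept = df.dropna(subset=["State", "Health Care"])
print(len(kept))                   # 3
```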

In [None]:
#read the next dataset
cheapest_states_df = pd.read_csv(cheapest_states)
cheapest_states_df

In [None]:
#list column names
cheapest_states_df.columns

In [None]:
#reorganize dataset
cheapest_states_cleaned = cheapest_states_df.drop(['costIndex','utilitiesCost', 'miscCost'], axis=1)
cheapest_states_renamed = cheapest_states_cleaned.rename(columns={"costRank":"Cost Rank", "groceryCost":"Grocery Cost", 
                                                                  "housingCost":"Housing Cost", 
                                                                  "transportationCost":"Transportation Cost"})
cheapest_states_renamed

In [None]:
#merge dataframes
combined_states_df = pd.merge(best_states_df, cheapest_states_renamed, how='inner', on='State')
combined_states_df
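An inner merge keeps only states whose names match exactly in both files. A spelling difference between the two sources therefore drops a state without any warning. One way to audit this is an outer merge with pandas' `indicator=True` option, sketched here with made-up frames and hypothetical spellings. The option adds a `_merge` column that marks each row as `both`, `left_only`, or `right_only`:

```python
import pandas as pd

# Made-up inputs: the two sources spell one state differently
best = pd.DataFrame({"State": ["Kansas", "Missouri", "District of Columbia"],
                     "Health Care": [20, 30, 40]})
cost = pd.DataFrame({"State": ["Kansas", "Missouri", "Washington DC"],
                     "Cost Rank": [10, 12, 50]})

# Outer merge keeps every row; _merge records which input(s) each row came from
audit = pd.merge(best, cost, how="outer", on="State", indicator=True)
unmatched = audit.loc[audit["_merge"] != "both", ["State", "_merge"]]
print(unmatched)
```

Any state listed in `unmatched` would be silently dropped by the `how='inner'` merge used above.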

In [None]:
#we need to find the weighted scores of the best states categories
#make a pie chart showing each category's weight
labels = 'Health Care', 'Education', 'Economy', 'Infrastructure', 'Opportunity', 'Fiscal Stability', 'Crime & Corrections', 'Natural Environment'
sizes = [16, 16, 13, 12, 12, 11, 10, 10]
colors = ["lightblue", "purple", "pink", "yellowgreen","lightskyblue", "lightcoral", "orange", "green"]
fig1, ax1 = plt.subplots(figsize = (8,8))
ax1.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140, colors=colors)
plt.title("% of Weight Per Category")
plt.axis("equal")

plt.show()

In [None]:
#create new columns for these totals
combined_states_df['Health Care Total'] = combined_states_df['Health Care']*16
combined_states_df['Education Total'] = combined_states_df['Education']*16
combined_states_df['Economy Total'] = combined_states_df['Economy']*13
combined_states_df['Infrastructure Total'] = combined_states_df['Infrastructure']*12
combined_states_df['Opportunity Total'] = combined_states_df['Opportunity']*12
combined_states_df['Fiscal Stability Total'] = combined_states_df['Fiscal Stability']*11
combined_states_df['Crime Total'] = combined_states_df['Crime & Corrections']*10
combined_states_df['Natural Environment Total'] = combined_states_df['Natural Environment']*10

#combine all of the weighted totals
#create a new column for these totals
combined_states_df['Weighted Totals'] = (combined_states_df.loc[:, 'Health Care Total':'Natural Environment Total'].sum(axis=1))/100

#find the average between cost rank and the weighted total
combined_states_df['Overall Totals'] = combined_states_df[['Cost Rank', 'Weighted Totals']].mean(axis=1)
combined_states_df.sort_values(by=['Overall Totals'])


# What are the top 5 and bottom 5 states to live in according to each factor?

### Health Care

In [None]:
#pull just the state name and the health care weighted total
health_care = combined_states_df[['State', 'Health Care Total']]
health_care_rank = health_care.sort_values('Health Care Total')
health_care_rank.head()

In [None]:
#plot the top 5 states in health care using a bar graph, taking the values from the ranked data
top_health = health_care_rank.head()
x_axis = np.arange(len(top_health))
plt.bar(x_axis, top_health['Health Care Total'], color="b", align="center")
plt.title("States With the Best Health Care")
plt.xlabel("States")
plt.ylabel("Health Care Scores")
plt.xticks(x_axis, top_health['State'])

plt.show()

In [None]:
health_care_rank.tail()
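If state names and scores are typed into a chart by hand, the chart can drift from the data whenever the inputs change. Below is a sketch of a reusable helper that plots straight from a ranked DataFrame. `plot_top_states` is a hypothetical name and the example frame is made up. As in this analysis, the helper assumes that lower values are better.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_states(ranked: pd.DataFrame, column: str, title: str, n: int = 5):
    """Hypothetical helper: bar-plot the n states with the lowest (best) values in `column`."""
    top = ranked.nsmallest(n, column)          # lowest values first
    fig, ax = plt.subplots()
    ax.bar(top["State"], top[column], color="b")
    ax.set_title(title)
    ax.set_xlabel("States")
    ax.set_ylabel(column)
    return ax


# Made-up example data
example = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E", "F"],
    "Health Care Total": [48, 16, 96, 32, 80, 64],
})
ax = plot_top_states(example, "Health Care Total", "States With the Best Health Care")
```

In the notebook it could be called as `plot_top_states(health_care_rank, 'Health Care Total', 'States With the Best Health Care')`.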

### Education

In [None]:
#pull just the state name and the education weighted total
education = combined_states_df[['State', 'Education Total']]
education_rank = education.sort_values('Education Total')
education_rank.head()

In [None]:
education_rank.tail()

In [None]:
#plot the top 5 states in education using a bar graph, taking the values from the ranked data
top_education = education_rank.head()
x_axis = np.arange(len(top_education))
plt.bar(x_axis, top_education['Education Total'], color="b", align="center")
plt.title("States With the Best Education")
plt.xlabel("States")
plt.ylabel("Education Scores")
plt.xticks(x_axis, top_education['State'])

plt.show()

### Economy

In [None]:
#pull just the state name and the economy weighted total
economy = combined_states_df[['State', 'Economy Total']]
economy_rank = economy.sort_values('Economy Total')
economy_rank.head()

In [None]:
economy_rank.tail()

### Opportunity

In [None]:
#pull just the state name and the opportunity weighted total
opportunity = combined_states_df[['State', 'Opportunity Total']]
opportunity_rank = opportunity.sort_values('Opportunity Total')
opportunity_rank.head()

In [None]:
opportunity_rank.tail()

### Fiscal Stability

In [None]:
#pull just the state name and the fiscal stability weighted total
fiscal_stability = combined_states_df[['State', 'Fiscal Stability Total']]
fiscal_stability_rank = fiscal_stability.sort_values('Fiscal Stability Total')
fiscal_stability_rank.head()

In [None]:
fiscal_stability_rank.tail()

### Crime & Corrections

In [None]:
#pull just the state name and the crime weighted total
crime_corrections = combined_states_df[['State', 'Crime Total']]
crime_corrections_rank = crime_corrections.sort_values('Crime Total')
crime_corrections_rank.head()

In [None]:
crime_corrections_rank.tail()

### Infrastructure

In [None]:
#pull just the state name and the infrastructure weighted total
infrastructure = combined_states_df[['State', 'Infrastructure Total']]
infrastructure_rank = infrastructure.sort_values('Infrastructure Total')
infrastructure_rank.head()

In [None]:
infrastructure_rank.tail()

### Natural Environment

In [None]:
#pull just the state name and the natural environment weighted total
natural_environment = combined_states_df[['State', 'Natural Environment Total']]
natural_environment_rank = natural_environment.sort_values('Natural Environment Total')
natural_environment_rank.head()

In [None]:
natural_environment_rank.tail()

### Cost of Living

In [None]:
#pull just the state name and the cost of living rank
cost_of_living = combined_states_df[['State', 'Cost Rank']]
cost_rank = cost_of_living.sort_values('Cost Rank')
cost_rank.head()

In [None]:
cost_rank.tail()
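Each category above repeats the same select, sort, `head()`, and `tail()` steps. A more compact alternative loops over the column names with pandas' `nsmallest` and `nlargest`, sketched here on made-up data. `top_and_bottom` is a hypothetical helper. Note that `nlargest` lists the worst state first, whereas `tail()` lists it last.

```python
import pandas as pd


def top_and_bottom(df: pd.DataFrame, columns: list, n: int = 5) -> dict:
    """Hypothetical helper: {column: (best n states, worst n states)}; lower values rank better."""
    results = {}
    for col in columns:
        best = df.nsmallest(n, col)["State"].tolist()    # best state first
        worst = df.nlargest(n, col)["State"].tolist()    # worst state first
        results[col] = (best, worst)
    return results


# Made-up scores for four states
example = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "Education Total": [32, 16, 64, 48],
    "Cost Rank": [3, 4, 1, 2],
})
summary = top_and_bottom(example, ["Education Total", "Cost Rank"], n=2)
print(summary)
```

With the real data, it could be called on `combined_states_df` with the category total columns (for example `'Health Care Total'`) and `'Cost Rank'`.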

# What are the top 5 states to live in overall?

In [None]:
#pull states and overall totals
overall_rank = combined_states_df[['State', 'Overall Totals']]
overall_states_rank = overall_rank.sort_values('Overall Totals')
overall_states_rank

In [None]:
overall_states_rank.tail()

# What are the top 5 states to live in according to the factors I find most important? (Health Care, Education, Crime, and Natural Environment)

In [None]:
combined_states_df.columns

In [None]:
#create a dataframe with only these four factors plus the cost rank
#(selecting the columns explicitly keeps Infrastructure out of the personal ranking)
personal_rank = combined_states_df[['State', 'Cost Rank', 'Health Care Total', 'Education Total',
                                    'Crime Total', 'Natural Environment Total']].copy()
personal_rank

In [None]:
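#combine the weighted totals of the four chosen factors (Health Care, Education, Crime, Natural Environment)
#divide by the sum of their weights (16 + 16 + 10 + 10 = 52) so the result is a weighted average,
#just as dividing by 100 does for all eight categories
personal_columns = ['Health Care Total', 'Education Total', 'Crime Total', 'Natural Environment Total']
personal_rank['Personal Weighted Totals'] = personal_rank[personal_columns].sum(axis=1) / 52
personal_rank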

In [None]:
#find the average between cost rank and the personal weighted total
personal_rank['Personal Overall Totals'] = personal_rank[['Cost Rank', 'Personal Weighted Totals']].mean(axis=1)
#keep the sorted result (sort_values returns a new dataframe)
personal_rank = personal_rank.sort_values(by=['Personal Overall Totals'])
personal_rank

In [None]:
#pull just the states and personal overall total
personal_overall = personal_rank[['State', 'Personal Overall Totals']]
personal_overall_rank = personal_overall.sort_values('Personal Overall Totals')
personal_overall_rank.head()
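The personal ranking is the same calculation with a different set of weights. Below is a sketch of a general helper that accepts any subset of categories. `weighted_average` and the example ranks are hypothetical. The helper divides by the sum of the chosen weights (52 for these four factors) rather than a fixed 100. That way the result stays a weighted average of the category scores and remains comparable to the cost rank when the two are averaged.

```python
import pandas as pd


def weighted_average(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Hypothetical helper: weighted average of the columns named in `weights`."""
    w = pd.Series(weights, dtype=float)
    # multiply each column by its weight (aligned on column names), sum across each row,
    # then normalize by the total weight
    return df[list(weights)].mul(w, axis="columns").sum(axis=1) / w.sum()


# Made-up category ranks (lower is better) for two states
example = pd.DataFrame({
    "State": ["A", "B"],
    "Health Care": [1, 50],
    "Education": [1, 50],
    "Crime & Corrections": [50, 1],
    "Natural Environment": [50, 1],
    "Cost Rank": [10, 10],
})
personal_weights = {"Health Care": 16, "Education": 16,
                    "Crime & Corrections": 10, "Natural Environment": 10}

example["Personal Weighted Totals"] = weighted_average(example, personal_weights)
example["Personal Overall Totals"] = example[["Cost Rank", "Personal Weighted Totals"]].mean(axis=1)
print(example.sort_values("Personal Overall Totals")[["State", "Personal Overall Totals"]])
```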