# 02.7 Overall Health by State - Rankings

1. Imports and display options
1. Data prep
1. Ranking calculations
1. Export
1. Conclusion

<div class="alert" style='background:#DCDCDC;color:#4D4D4D;padding:15px'>
    This notebook builds a system to rank each state's health in six categories (BMI, general health, doctor visits, mental health, dietary health, and physical activity), showing where each state is doing well and where it could improve. The rankings will feed visualizations based on the analysis conducted in the previous Overall Health by State Jupyter notebooks.
    </div>

# 1. Imports and display options

In [1]:
import pandas as pd
import os
import numpy as np

In [2]:
#Creating a path to the root directory for this project

path = r'C:\Users\theva\Documents\Data Analyses\BRFSS'

In [3]:
#Importing data

df = pd.read_pickle(os.path.join(path, '02 Data','Prepared Data', 'BRFSS Overall Health by State TABLEAU.pkl'))

In [4]:
#Checking df

df.shape

(385, 35)

In [5]:
#Checking df

df.head()

Unnamed: 0,location,year,BMI: Obese (BMI 30.0 - 99.8),BMI: Overweight (BMI 25.0-29.9),BMI: Normal Weight (BMI 18.5-24.9),BMI: Underweight (BMI 12.0-18.4),General health: Excellent,General health: Very good,General health: Good,General health: Fair,...,Vegetables eaten: Daily,Vegetables eaten: Not daily,Aerobic activity per week: Over 150 min,Aerobic activity per week: Under 150 min,Met physical activity guidelines: Yes,Met physical activity guidelines: No,Strength training: 2x+ per week,Strength training: 1x- per week,Physically active in last month: Yes,Physically active in last month: No
0,Alabama,2015,35.6,33.0,28.9,2.4,16.2,29.1,32.5,15.2,...,72.2,27.8,44.6,55.4,16.7,83.3,25.8,74.2,68.1,31.9
1,Alabama,2016,35.7,33.8,28.4,2.1,15.2,29.8,32.5,15.7,...,,,,,,,,,70.6,29.4
2,Alabama,2017,36.3,33.9,28.5,1.3,15.4,28.2,33.1,16.2,...,80.7,19.3,42.8,57.2,16.1,83.9,26.0,74.0,68.0,32.0
3,Alabama,2018,36.2,33.4,28.4,2.0,14.5,28.6,34.0,15.7,...,,,,,,,,,69.3,30.7
4,Alabama,2019,36.1,34.2,27.5,2.2,15.4,29.5,32.4,16.2,...,77.7,22.3,44.8,55.2,17.6,82.4,29.2,70.8,68.5,31.5


In [6]:
#Adjusting display settings

pd.set_option('display.max_rows', 200)
pd.set_option('display.max_colwidth', None)

%matplotlib inline

# 2. Data prep

In [7]:
# Getting column names

df.columns

Index(['location', 'year', 'BMI: Obese (BMI 30.0 - 99.8)',
       'BMI: Overweight (BMI 25.0-29.9)', 'BMI: Normal Weight (BMI 18.5-24.9)',
       'BMI: Underweight (BMI 12.0-18.4)', 'General health: Excellent',
       'General health: Very good', 'General health: Good',
       'General health: Fair', 'General health: Poor',
       'Health status: Good or Better', 'Health status: Fair or Poor',
       'Doc visit: Past year', 'Doc visit: Past 2 years',
       'Doc visit: Past 5 years', 'Doc visit: 5+ years ago',
       'Doc visit: Never', 'Concentrate, remember, decide: Difficult',
       'Concentrate, remember, decide: Not difficult',
       'Bad mental health days: 0', 'Bad mental health days: 1-13',
       'Bad mental health days: 14+', 'Fruit eaten: Daily',
       'Fruit eaten: Not daily', 'Vegetables eaten: Daily',
       'Vegetables eaten: Not daily',
       'Aerobic activity per week: Over 150 min',
       'Aerobic activity per week: Under 150 min',
       'Met physical activity g

In [8]:
# Dropping year, certain calculated variables, and negative response to binary question pairs, 
# i.e.: keeping 'Fruit eaten: Daily' and dropping 'Fruit eaten: Not daily'. 

df.drop(columns=['year',
                 'Health status: Good or Better',
                 'Health status: Fair or Poor',
                 'Concentrate, remember, decide: Not difficult',
                 'Fruit eaten: Not daily',
                 'Vegetables eaten: Not daily',
                 'Aerobic activity per week: Under 150 min',
                 'Met physical activity guidelines: Yes',
                 'Met physical activity guidelines: No',
                 'Strength training: 1x- per week',
                 'Physically active in last month: No'
                 ], inplace=True)

In [9]:
# Dropping the national median rows, as they would prevent a geospatial analysis

df = df.loc[(df['location'] != 'All States and DC (median) **') &
            (df['location'] != 'All States, DC and Territories (median) **')]
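
The same filter can also be written with `isin`, which may be easier to extend if more aggregate rows ever need to be excluded. This is a minimal sketch on made-up data: the `toy` frame, its `value` column, and the names `aggregate_rows` and `states_only` are assumptions for illustration; only the two aggregate labels come from the cell above.

```python
import pandas as pd

# Toy stand-in for df; the numeric values are illustrative only
toy = pd.DataFrame({
    'location': ['Alabama',
                 'All States and DC (median) **',
                 'All States, DC and Territories (median) **',
                 'Alaska'],
    'value': [1.0, 2.0, 3.0, 4.0],
})

# Labels of the aggregate rows to exclude (copied from the cell above)
aggregate_rows = ['All States and DC (median) **',
                  'All States, DC and Territories (median) **']

# Keep only rows whose location is not one of the aggregate labels
states_only = toy.loc[~toy['location'].isin(aggregate_rows)]
print(states_only['location'].tolist())
```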

In [10]:
#Checking changes

df.head()

Unnamed: 0,location,BMI: Obese (BMI 30.0 - 99.8),BMI: Overweight (BMI 25.0-29.9),BMI: Normal Weight (BMI 18.5-24.9),BMI: Underweight (BMI 12.0-18.4),General health: Excellent,General health: Very good,General health: Good,General health: Fair,General health: Poor,...,Doc visit: Never,"Concentrate, remember, decide: Difficult",Bad mental health days: 0,Bad mental health days: 1-13,Bad mental health days: 14+,Fruit eaten: Daily,Vegetables eaten: Daily,Aerobic activity per week: Over 150 min,Strength training: 2x+ per week,Physically active in last month: Yes
0,Alabama,35.6,33.0,28.9,2.4,16.2,29.1,32.5,15.2,7.0,...,1.3,15.0,,,,51.7,72.2,44.6,25.8,68.1
1,Alabama,35.7,33.8,28.4,2.1,15.2,29.8,32.5,15.7,6.9,...,1.1,14.3,,,,,,,,70.6
2,Alabama,36.3,33.9,28.5,1.3,15.4,28.2,33.1,16.2,7.1,...,1.2,15.8,,,,55.2,80.7,42.8,26.0,68.0
3,Alabama,36.2,33.4,28.4,2.0,14.5,28.6,34.0,15.7,7.2,...,0.5,15.8,,,,,,,,69.3
4,Alabama,36.1,34.2,27.5,2.2,15.4,29.5,32.4,16.2,6.5,...,0.7,16.7,58.0,24.5,17.5,53.9,77.7,44.8,29.2,68.5


In [11]:
# Making a dataframe of mean values per state

df = df.groupby('location').mean().reset_index()

# Checking dataframe
df

Unnamed: 0,location,BMI: Obese (BMI 30.0 - 99.8),BMI: Overweight (BMI 25.0-29.9),BMI: Normal Weight (BMI 18.5-24.9),BMI: Underweight (BMI 12.0-18.4),General health: Excellent,General health: Very good,General health: Good,General health: Fair,General health: Poor,...,Doc visit: Never,"Concentrate, remember, decide: Difficult",Bad mental health days: 0,Bad mental health days: 1-13,Bad mental health days: 14+,Fruit eaten: Daily,Vegetables eaten: Daily,Aerobic activity per week: Over 150 min,Strength training: 2x+ per week,Physically active in last month: Yes
0,Alabama,36.971429,33.414286,27.642857,1.942857,15.642857,29.428571,33.042857,15.542857,6.385714,...,0.9,15.242857,58.5,24.266667,17.233333,53.75,77.5,44.066667,27.0,69.171429
1,Alaska,31.542857,35.028571,32.0,1.45,20.057143,33.728571,31.442857,11.2,3.528571,...,1.44,9.242857,62.833333,25.166667,11.966667,59.35,81.15,57.766667,35.466667,79.342857
2,Arizona,30.0,35.4,32.385714,2.228571,19.657143,31.042857,31.171429,13.571429,4.528571,...,1.014286,11.028571,60.466667,25.1,14.433333,60.85,78.925,53.433333,33.266667,76.557143
3,Arkansas,36.385714,33.142857,28.5,1.942857,15.014286,28.928571,32.371429,16.528571,7.142857,...,1.342857,15.571429,58.133333,23.9,18.0,53.575,77.725,45.433333,27.4,68.314286
4,California,26.3,35.9,35.628571,2.185714,20.314286,30.171429,32.257143,13.4,3.842857,...,1.171429,10.314286,60.4,26.6,13.0,64.825,78.4,56.3,33.666667,79.257143
5,Colorado,23.028571,35.714286,39.085714,2.157143,21.071429,36.414286,29.0,10.542857,2.942857,...,1.1,8.8,57.7,29.166667,13.1,64.025,81.675,59.233333,36.733333,82.642857
6,Connecticut,27.757143,36.114286,34.242857,1.885714,21.942857,35.871429,28.414286,10.528571,3.214286,...,0.46,9.142857,61.933333,25.833333,12.233333,66.0,81.05,53.266667,34.733333,77.4
7,Delaware,32.928571,35.285714,30.0,1.785714,18.671429,32.8,31.785714,12.785714,3.942857,...,0.716667,10.942857,63.166667,23.966667,12.933333,62.425,80.325,48.633333,32.033333,72.528571
8,District of Columbia,23.585714,31.585714,42.8,2.014286,26.242857,36.371429,25.157143,9.4,2.828571,...,,9.714286,52.033333,34.533333,13.4,65.875,83.9,55.2,36.3,81.085714
9,Florida,28.116667,36.2,33.55,2.1,20.65,29.966667,30.633333,13.766667,4.983333,...,0.933333,12.3,65.75,21.2,13.05,62.833333,79.4,51.866667,33.333333,72.45
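
Because `mean()` skips missing values by default, a state's average for a question that was only asked in some years is taken over just those years. The sketch below shows this on made-up numbers; the `toy` frame and the `fruit_daily` column are hypothetical and not part of the BRFSS data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: state A has one missing year, state B has no answers at all
toy = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'fruit_daily': [50.0, np.nan, 54.0, np.nan, np.nan],
})

# NaN values are ignored, so A averages over its two available years
means = toy.groupby('location').mean().reset_index()

# count() reports how many non-missing values went into each mean
counts = toy.groupby('location').count().reset_index()

print(means)
print(counts)
```

A state with no answers at all keeps a missing mean, which is the kind of gap the median fill in the ranking cells handles.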


# 3. Ranking calculations

### 3.1 BMI ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% Obese * -1) + (% Overweight * -0.5) + (% Normal weight) + (% Underweight * -0.5)
   </div>

In [12]:
# Calculating BMI ranking 

# Making a dataframe of the variables to be used in this calculation
bmi_score = df[['location',
            'BMI: Obese (BMI 30.0 - 99.8)',
            'BMI: Overweight (BMI 25.0-29.9)',
            'BMI: Underweight (BMI 12.0-18.4)',
            'BMI: Normal Weight (BMI 18.5-24.9)'
            ]].copy()

# Applying weights to the variables
bmi_score['BMI: Obese (BMI 30.0 - 99.8)'] *= -1
bmi_score['BMI: Overweight (BMI 25.0-29.9)'] *= -0.5
bmi_score['BMI: Underweight (BMI 12.0-18.4)'] *= -0.5

# Filling missing values with each column's median (numeric columns only)
bmi_score = bmi_score.fillna(bmi_score.median(numeric_only=True))

# Calculating total_score as a sum of the weighted variables
bmi_score['total_score'] = (bmi_score['BMI: Obese (BMI 30.0 - 99.8)'] +
                            bmi_score['BMI: Overweight (BMI 25.0-29.9)'] +
                            bmi_score['BMI: Underweight (BMI 12.0-18.4)'] +
                            bmi_score['BMI: Normal Weight (BMI 18.5-24.9)']
                           )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
bmi_rank = bmi_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
bmi_rank['rank'] = bmi_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
bmi_rank['rank'] = bmi_rank['rank'].astype(int)

# Checking dataframe
bmi_rank


Unnamed: 0,location,total_score,rank
0,District of Columbia,2.414286,1
1,Colorado,-2.878571,2
2,Hawaii,-3.9,3
3,Massachusetts,-6.585714,4
4,Vermont,-7.885714,5
5,Utah,-9.614286,6
6,New York,-9.7,7
7,California,-9.714286,8
8,Montana,-11.55,9
9,New Jersey,-11.766667,10
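
As a sanity check, the BMI formula can be applied by hand to one state. The sketch below uses Colorado's mean percentages as printed in the per-state means table earlier in this notebook; the variable names are only for illustration, and the inputs are rounded, so the result matches the table only approximately.

```python
# Colorado's mean percentages, copied from the per-state means table
obese = 23.028571
overweight = 35.714286
normal = 39.085714
underweight = 2.157143

# (% Obese * -1) + (% Overweight * -0.5) + (% Normal weight) + (% Underweight * -0.5)
bmi_total = (obese * -1) + (overweight * -0.5) + normal + (underweight * -0.5)

# Should land close to Colorado's total_score of -2.878571 in the output above
print(bmi_total)
```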


### 3.2 General health ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% Excellent * 1.5) + (% Very good) + (% Good * 0.5) + (% Fair * 0) + (% Poor * -1)
   </div>

In [13]:
# Calculating General health ranking

# Making a dataframe of the variables to be used in this calculation
health_score = df[['location',
               'General health: Excellent',
               'General health: Very good',
               'General health: Good',
               'General health: Poor'
                ]].copy()

# Applying weights to the variables
health_score['General health: Excellent'] *= 1.5
health_score['General health: Good'] *= 0.5
health_score['General health: Poor'] *= -1

# Filling missing values with each column's median (numeric columns only)
health_score = health_score.fillna(health_score.median(numeric_only=True))

# Calculating total_score as a sum of the weighted variables
health_score['total_score'] = (health_score['General health: Excellent'] +
                                  health_score['General health: Very good'] +
                                  health_score['General health: Good'] +
                                  health_score['General health: Poor']
                           )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
health_rank = health_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
health_rank['rank'] = health_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
health_rank['rank'] = health_rank['rank'].astype(int)

# Checking dataframe
health_rank


Unnamed: 0,location,total_score,rank
0,District of Columbia,85.485714,1
1,Utah,80.321429,2
2,Connecticut,79.778571,3
3,Massachusetts,79.7,4
4,Colorado,79.578571,5
5,New Hampshire,79.521429,6
6,Minnesota,79.364286,7
7,Vermont,79.114286,8
8,South Dakota,77.478571,9
9,Maryland,77.2,10
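
Each category follows the same pattern (select columns, weight them, fill gaps, sum), so the calculation could be wrapped in one function. The sketch below is one possible way to do that, not the method used in this notebook: `weighted_score` and `general_weights` are hypothetical names. It fills gaps before weighting rather than after, which gives the same result because scaling a column scales its median by the same factor. The test row uses the District of Columbia means from the table earlier in the notebook.

```python
import pandas as pd

def weighted_score(frame, weights):
    """Return the weighted sum of the columns named in `weights` (hypothetical helper)."""
    values = frame[list(weights)]
    # Fill missing values with each column's median before weighting
    values = values.fillna(values.median())
    # Multiply each column by its weight, then sum across columns
    return values.mul(pd.Series(weights)).sum(axis=1)

# District of Columbia means, copied from the per-state means table
dc = pd.DataFrame({
    'location': ['District of Columbia'],
    'General health: Excellent': [26.242857],
    'General health: Very good': [36.371429],
    'General health: Good': [25.157143],
    'General health: Fair': [9.4],
    'General health: Poor': [2.828571],
})

# Weights from the general health ranking calculation
general_weights = {
    'General health: Excellent': 1.5,
    'General health: Very good': 1.0,
    'General health: Good': 0.5,
    'General health: Fair': 0.0,
    'General health: Poor': -1.0,
}

# Should land close to the 85.485714 shown for DC in the output above
dc_score = weighted_score(dc, general_weights).iloc[0]
print(dc_score)
```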


### 3.3 Doctor visits ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% Past year) + (% Past 2 years * 0) + (% Past 5 years * -0.5) + (% 5+ years ago * -1) + (% Never * -2)
   </div>

In [14]:
# Calculating Doctor visits ranking

# Making a dataframe of the variables to be used in this calculation
doc_score = df[['location',
                'Doc visit: Past year',
                'Doc visit: Past 5 years',
                'Doc visit: 5+ years ago',
                'Doc visit: Never'
                ]].copy()

# Applying weights to the variables
doc_score['Doc visit: Past 5 years'] *= -0.5
doc_score['Doc visit: 5+ years ago'] *= -1
doc_score['Doc visit: Never'] *= -2

# Filling missing values with each column's median (numeric columns only)
doc_score = doc_score.fillna(doc_score.median(numeric_only=True))

# Calculating total_score as a sum of the weighted variables
doc_score['total_score'] = (doc_score['Doc visit: Past year'] +
                            doc_score['Doc visit: Past 5 years'] +
                            doc_score['Doc visit: 5+ years ago'] +
                            doc_score['Doc visit: Never']
                           )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
doc_rank = doc_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
doc_rank['rank'] = doc_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
doc_rank['rank'] = doc_rank['rank'].astype(int)

# Checking dataframe
doc_rank


Unnamed: 0,location,total_score,rank
0,Rhode Island,75.42381,1
1,Massachusetts,72.066667,2
2,Puerto Rico,71.645238,3
3,Delaware,69.92381,4
4,Connecticut,69.78,5
5,New Jersey,69.7,6
6,New York,69.442857,7
7,West Virginia,69.37,8
8,District of Columbia,69.138095,9
9,Maryland,68.404762,10
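
The median fill matters in this category: the District of Columbia has no value for 'Doc visit: Never' in the per-state means, so it receives the column median instead. The sketch below shows the mechanism on made-up numbers (the `toy` frame is hypothetical). Passing `numeric_only=True` keeps the text column out of the median calculation, which newer pandas releases may otherwise reject.

```python
import numpy as np
import pandas as pd

# Hypothetical data: location B has no value for this question
toy = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'Doc visit: Never': [0.5, np.nan, 1.5, 1.0],
})

# Median of the numeric columns only; 'location' is ignored
medians = toy.median(numeric_only=True)

# Missing values are replaced column by column with the matching median
filled = toy.fillna(medians)
print(filled)
```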


### 3.4 Mental health ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% 0 bad mental health days * 1) + (% 1-13 bad mental health days * 0) + (% 14+ bad mental health days * -1) + (% Difficulty concentrating, deciding, remembering * -1)
   </div>

In [15]:
# Calculating Mental health ranking

# Making a dataframe of the variables to be used in this calculation
mhealth_score = df[['location',
                    'Concentrate, remember, decide: Difficult',
                    'Bad mental health days: 0',
                    'Bad mental health days: 14+'
                   ]].copy()

# Applying weights to the variables
mhealth_score['Bad mental health days: 14+'] *= -1
mhealth_score['Concentrate, remember, decide: Difficult'] *= -1

# Filling missing values with each column's median (numeric columns only)
mhealth_score = mhealth_score.fillna(mhealth_score.median(numeric_only=True))

# Calculating total_score as a sum of the weighted variables
mhealth_score['total_score'] = (mhealth_score['Concentrate, remember, decide: Difficult'] +
                                mhealth_score['Bad mental health days: 0'] +
                                mhealth_score['Bad mental health days: 14+']
                               )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
mhealth_rank = mhealth_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
mhealth_rank['rank'] = mhealth_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
mhealth_rank['rank'] = mhealth_rank['rank'].astype(int)

# Checking dataframe
mhealth_rank


Unnamed: 0,location,total_score,rank
0,Puerto Rico,51.619048,1
1,Hawaii,47.095238,2
2,South Dakota,46.22381,3
3,Guam,44.457143,4
4,Illinois,43.885714,5
5,North Dakota,43.819048,6
6,New York,42.019048,7
7,Virgin Islands,42.0,8
8,Alaska,41.62381,9
9,Nebraska,41.371429,10


### 3.5 Dietary health ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% Daily fruit eaters) + (% Daily vegetable eaters)
   </div>

In [16]:
# Calculating Dietary health ranking

# Making a dataframe of the variables to be used in this calculation
dhealth_score = df[['location',
                    'Fruit eaten: Daily',
                    'Vegetables eaten: Daily'
                   ]].copy()

# No weight applied

# Filling missing values with each column's median (numeric columns only)
dhealth_score = dhealth_score.fillna(dhealth_score.median(numeric_only=True))

# Calculating total_score as a sum of the unweighted variables
dhealth_score['total_score'] = (dhealth_score['Fruit eaten: Daily'] +
                                dhealth_score['Vegetables eaten: Daily']
                               )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
dhealth_rank = dhealth_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
dhealth_rank['rank'] = dhealth_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
dhealth_rank['rank'] = dhealth_rank['rank'].astype(int)

# Checking dataframe
dhealth_rank


Unnamed: 0,location,total_score,rank
0,Vermont,153.525,1
1,Maine,151.725,2
2,New Hampshire,150.825,3
3,District of Columbia,149.775,4
4,Massachusetts,149.45,5
5,Oregon,147.85,6
6,Washington,147.05,7
7,Connecticut,147.05,7
8,Colorado,145.7,9
9,Idaho,144.85,10
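
Washington and Connecticut tie at 7th and the next state is 9th because `rank(method='min')` gives tied scores the lowest rank of the group and then skips ahead. A small sketch using five of the scores above (ranks here are relative to this subset only); `method='dense'` is shown purely for comparison and is not used in this notebook.

```python
import pandas as pd

# Five dietary health scores taken from the output above
scores = pd.Series(
    [149.45, 147.85, 147.05, 147.05, 145.7],
    index=['Massachusetts', 'Oregon', 'Washington', 'Connecticut', 'Colorado'],
)

# 'min': tied values share the lowest rank and the next rank is skipped
min_ranks = scores.rank(method='min', ascending=False).astype(int)

# 'dense': tied values share a rank and no rank is skipped (comparison only)
dense_ranks = scores.rank(method='dense', ascending=False).astype(int)

print(min_ranks.tolist())    # [1, 2, 3, 3, 5]
print(dense_ranks.tolist())  # [1, 2, 3, 3, 4]
```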


### 3.6 Physical activity ranking

<div class='alert alert-block alert-warning'>
    <b>Ranking calculation:</b><br> 
(% Aerobically active 150 min per week * 1) + (% Strength train 2x per week * 1) + (% Physically active in last month * 0.5)
   </div>

In [17]:
# Calculating Physical activity ranking

# Making a dataframe of the variables to be used in this calculation
activity_score = df[['location',
                     'Aerobic activity per week: Over 150 min',
                     'Strength training: 2x+ per week',
                     'Physically active in last month: Yes'
                   ]].copy()

# Applying weights to the variables
activity_score['Physically active in last month: Yes'] *= 0.5

# Filling missing values with each column's median (numeric columns only)
activity_score = activity_score.fillna(activity_score.median(numeric_only=True))

# Calculating total_score as a sum of the weighted variables
activity_score['total_score'] = (activity_score['Aerobic activity per week: Over 150 min'] +
                                 activity_score['Strength training: 2x+ per week'] +
                                 activity_score['Physically active in last month: Yes']
                               )
                            
# Reducing dataframe to location and total_score, sorting by total_score in descending order
activity_rank = activity_score[['location', 'total_score']].sort_values(by = ['total_score'], ascending = False).reset_index(drop = True)

# Calculating rank
activity_rank['rank'] = activity_rank['total_score'].rank(method='min', ascending = False)

# Changing rank to an integer
activity_rank['rank'] = activity_rank['rank'].astype(int)

# Checking dataframe
activity_rank


Unnamed: 0,location,total_score,rank
0,Colorado,137.288095,1
1,Vermont,133.961905,2
2,Alaska,132.904762,3
3,Washington,132.742857,4
4,District of Columbia,132.042857,5
5,Montana,131.92619,6
6,Utah,131.211905,7
7,Hawaii,130.764286,8
8,Oregon,130.104762,9
9,California,129.595238,10


### 3.7 Total ranking
Ranking calculation:  
(BMI rank + General health rank + Doctor visits rank + Mental health rank + Dietary health rank + Physical activity rank) / 6, with the lowest mean rank ranked first

In [18]:
# Making a dataframe of BMI, general health, mental health, dietary health, doctor visits, and physical activity scores and ranks

# Creating dataframe from a copy of bmi_rank
total_rank = bmi_rank.copy()
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "bmi_score", "rank": "bmi_rank"}, inplace = True)

# Merging with health_rank
total_rank = pd.merge(total_rank, health_rank, how='left', left_on=['location'], right_on=['location'])
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "general_health_score", "rank": "general_health_rank"}, inplace = True)

# Merging with mhealth_rank
total_rank = pd.merge(total_rank, mhealth_rank, how='left', left_on=['location'], right_on=['location'])
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "mental_health_score", "rank": "mental_health_rank"}, inplace = True)

# Merging with dhealth_rank
total_rank = pd.merge(total_rank, dhealth_rank, how='left', left_on=['location'], right_on=['location'])
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "dietary_health_score", "rank": "dietary_health_rank"}, inplace = True)

# Merging with doc_rank
total_rank = pd.merge(total_rank, doc_rank, how='left', left_on=['location'], right_on=['location'])
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "doctor_visits_score", "rank": "doctor_visits_rank"}, inplace = True)

# Merging with activity_rank
total_rank = pd.merge(total_rank, activity_rank, how='left', left_on=['location'], right_on=['location'])
# Changing score and rank to reflect origin dataframe
total_rank.rename(columns={'total_score': "physical_activity_score", "rank": "physical_activity_rank"}, inplace = True)

# Checking dataframe
total_rank

Unnamed: 0,location,bmi_score,bmi_rank,general_health_score,general_health_rank,mental_health_score,mental_health_rank,dietary_health_score,dietary_health_rank,doctor_visits_score,doctor_visits_rank,physical_activity_score,physical_activity_rank
0,District of Columbia,2.414286,1,85.485714,1,28.919048,46,149.775,4,69.138095,9,132.042857,5
1,Colorado,-2.878571,2,79.578571,5,35.8,28,145.7,9,50.985714,49,137.288095,1
2,Hawaii,-3.9,3,75.207143,20,47.095238,2,138.15,33,64.9,18,130.764286,8
3,Massachusetts,-6.585714,4,79.7,4,37.447619,22,149.45,5,72.066667,2,122.292857,20
4,Vermont,-7.885714,5,79.114286,8,31.714286,42,153.525,1,61.878571,30,133.961905,2
5,Utah,-9.614286,6,80.321429,2,26.595238,49,144.35,12,48.885714,51,131.211905,7
6,New York,-9.7,7,74.364286,23,42.019048,7,143.5,17,69.442857,7,118.304762,32
7,California,-9.714286,8,72.928571,28,37.085714,24,143.225,18,55.342857,41,129.595238,10
8,Montana,-11.55,9,75.964286,14,35.566667,30,143.8,16,52.935714,46,131.92619,6
9,New Jersey,-11.766667,10,75.816667,15,40.283333,14,144.3,14,69.7,6,117.75,33
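
The repeated merge-and-rename steps above could also be written as a loop over named dataframes. The following is a sketch of that idea under stated assumptions, not the code that produced the table: `combine_ranks` is a hypothetical helper, and the two toy frames contain made-up values.

```python
from functools import reduce

import pandas as pd

def combine_ranks(named_frames):
    """Merge rank frames on 'location', prefixing score and rank columns (hypothetical helper)."""
    renamed = [
        frame.rename(columns={'total_score': f'{name}_score', 'rank': f'{name}_rank'})
        for name, frame in named_frames.items()
    ]
    # Left-merge each frame onto the running result, keeping the first frame's row order
    return reduce(
        lambda left, right: pd.merge(left, right, how='left', on='location'),
        renamed,
    )

# Toy frames with made-up values, shaped like bmi_rank and health_rank
toy_bmi = pd.DataFrame({'location': ['A', 'B'], 'total_score': [1.0, -2.0], 'rank': [1, 2]})
toy_health = pd.DataFrame({'location': ['B', 'A'], 'total_score': [70.0, 80.0], 'rank': [2, 1]})

combined = combine_ranks({'bmi': toy_bmi, 'general_health': toy_health})
print(combined)
```

Renaming before merging avoids the `_x` and `_y` suffixes pandas would otherwise add to the clashing `total_score` and `rank` columns.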


In [19]:
# Calculating total ranking across all variables

# Finding mean rank per state
total_rank['ave_rank'] = ((total_rank['bmi_rank'] +
                           total_rank['general_health_rank'] +
                           total_rank['mental_health_rank'] +
                           total_rank['dietary_health_rank'] +
                           total_rank['doctor_visits_rank'] +
                           total_rank['physical_activity_rank'])/6
                          )

# Calculating rank
total_rank['overall_health_rank'] = total_rank['ave_rank'].rank(method='min')

# Changing rank to an integer
total_rank['overall_health_rank'] = total_rank['overall_health_rank'].astype(int)

# Sorting by overall_health_rank
total_rank = total_rank.sort_values(by = ['overall_health_rank'], ascending = True).reset_index(drop = True)

# Checking dataframe
total_rank

Unnamed: 0,location,bmi_score,bmi_rank,general_health_score,general_health_rank,mental_health_score,mental_health_rank,dietary_health_score,dietary_health_rank,doctor_visits_score,doctor_visits_rank,physical_activity_score,physical_activity_rank,ave_rank,overall_health_rank
0,Connecticut,-12.514286,12,79.778571,3,40.557143,12,147.05,7,69.78,5,126.7,15,9.0,1
1,Massachusetts,-6.585714,4,79.7,4,37.447619,22,149.45,5,72.066667,2,122.292857,20,9.5,2
2,District of Columbia,2.414286,1,85.485714,1,28.919048,46,149.775,4,69.138095,9,132.042857,5,11.0,3
3,Hawaii,-3.9,3,75.207143,20,47.095238,2,138.15,33,64.9,18,130.764286,8,14.0,4
4,Vermont,-7.885714,5,79.114286,8,31.714286,42,153.525,1,61.878571,30,133.961905,2,14.666667,5
5,New Hampshire,-14.678571,16,79.521429,6,34.880952,34,150.825,3,64.895238,19,128.77381,11,14.833333,6
6,New Jersey,-11.766667,10,75.816667,15,40.283333,14,144.3,14,69.7,6,117.75,33,15.333333,7
7,New York,-9.7,7,74.364286,23,42.019048,7,143.5,17,69.442857,7,118.304762,32,15.5,8
8,Maryland,-17.671429,24,77.2,10,40.004762,15,143.825,15,68.404762,10,123.828571,19,15.5,8
9,Colorado,-2.878571,2,79.578571,5,35.8,28,145.7,9,50.985714,49,137.288095,1,15.666667,10
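
The overall rank can be checked by hand from the category ranks. The sketch below recomputes `ave_rank` for five states using the ranks printed above. Because only five states are included, the resulting overall ranks are relative to this subset, but the tie between New York and Maryland behaves the same way as in the full table. The `ranks` frame and `overall` column are names chosen for illustration.

```python
import pandas as pd

# Category ranks copied from the output above
ranks = pd.DataFrame({
    'location': ['Connecticut', 'Massachusetts', 'New York', 'Maryland', 'Colorado'],
    'bmi_rank': [12, 4, 7, 24, 2],
    'general_health_rank': [3, 4, 23, 10, 5],
    'mental_health_rank': [12, 22, 7, 15, 28],
    'dietary_health_rank': [7, 5, 17, 15, 9],
    'doctor_visits_rank': [5, 2, 7, 10, 49],
    'physical_activity_rank': [15, 20, 32, 19, 1],
})

# Mean of the six category ranks per state
rank_cols = [col for col in ranks.columns if col != 'location']
ranks['ave_rank'] = ranks[rank_cols].mean(axis=1)

# Lower mean rank is better, so the default ascending ranking is used
ranks['overall'] = ranks['ave_rank'].rank(method='min').astype(int)

print(ranks[['location', 'ave_rank', 'overall']])
```

Connecticut comes out at 9.0 and Massachusetts at 9.5, matching the table, while New York and Maryland both average 15.5 and share a rank.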


# 4. Export

In [20]:
# Exporting total_rank to csv

total_rank.to_csv(os.path.join(path, '02 Data','Prepared Data', 'BRFSS Overall Health by State - Rankings.csv'))
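
As written, `to_csv` also writes the dataframe index, which shows up as an `Unnamed: 0` column when the file is read back. If that column is not wanted downstream, `index=False` leaves it out. A minimal sketch using an in-memory buffer and a made-up `toy` frame instead of the real export path:

```python
import io

import pandas as pd

# Toy stand-in for total_rank; values are illustrative only
toy = pd.DataFrame({'location': ['A', 'B'], 'overall_health_rank': [1, 2]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the dataframe's own columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'location', 'overall_health_rank']
print(cols_without_index)  # ['location', 'overall_health_rank']
```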

# 5. Conclusion

<div class='alert alert-block alert-warning'>
    <b>Summary</b><br><br>
    <li>No state ranked highly in every category.
    <li>Every state has areas that could be improved upon.
    <li>Some states have fewer areas to improve upon than others. For example, the District of Columbia ranked in the top 5 for 4 categories, but 46th for mental health.
    <li>Fine-tuning the weight system for question responses may yield more accurate calculations.
</div>

<div class='alert alert-block alert-success'>
    <b>Next steps</b><br><br>
    <li><b>Make a dashboard for state health statistics and ranks</b> based on the findings of this and previous analyses.
    <li>Research methods to more accurately weigh question responses and health categories.
    <li>Start preparations for Overall Health by Demographic analysis.
   </div>