<a href="https://colab.research.google.com/github/Jlok17/2022MSDS/blob/main/Story_4_Data_608.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Story 4 Data 608

I have introduced the term "Data Practitioner" as a generic job descriptor because so many different job titles describe overlapping work, including Data Scientist, Data Engineer, Data Analyst, Business Analyst, Data Architect, etc. For this story we will answer the question, "How much do we get paid?" Your analysis and data visualizations must address the variation in average salary by role descriptor and by state.


#### Objectives:



1.   You will need to identify reliable sources for salary data and assemble the data sets that you will need.
2.   Your visualization(s) must show the most salient information (variation in average salary by role and by state).
3.   For this Story you must use a code library and code that you have written in R, Python or JavaScript (additional coding in other languages is allowed).
4.   Post-generation enhancements to your generated visualization are allowed (e.g., adding kickers and labels).


In [91]:
!pip install dash
!pip install plotly-geo

Collecting plotly-geo
  Downloading plotly_geo-1.0.0-py3-none-any.whl (23.7 MB)
Installing collected packages: plotly-geo
Successfully installed plotly-geo-1.0.0


In [93]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
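# dcc and html ship inside the dash package since Dash 2.0;
# the standalone dash_core_components / dash_html_components packages are deprecated
from dash import dcc, html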
from dash.dependencies import Input, Output
import dash
import numpy as np

In [94]:
# ZipRecruiter pages from which the salary tables were copied into the Google Sheet

jobs = ["Data-Scientist","Data-Engineer","Data-Analyst","Business-Analyst","Data-Architect"]
url_link = 'https://www.ziprecruiter.com/Salaries/What-Is-the-Average-{}-Salary-by-State'
for i in jobs:
  url = url_link.format(i)
  print(url)

https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Scientist-Salary-by-State
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Engineer-Salary-by-State
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Analyst-Salary-by-State
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Business-Analyst-Salary-by-State
https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Data-Architect-Salary-by-State


In [95]:
df = pd.read_csv("https://raw.githubusercontent.com/Jlok17/2022MSDS/main/Source/Job_State%20Salary%20-%20Sheet1.csv", sep = ",")

In [96]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   State          250 non-null    object 
 1   Annual Salary  250 non-null    object 
 2   Monthly Pay    250 non-null    object 
 3   Weekly Pay     250 non-null    object 
 4   Hourly Wage    250 non-null    float64
 5   Job            250 non-null    object 
dtypes: float64(1), object(5)
memory usage: 11.8+ KB


In [97]:
# Changing Data Type from Object to Float
df['Annual Salary'] = pd.to_numeric(df['Annual Salary'].str.replace(',', ''), errors='coerce')
df['Monthly Pay'] = pd.to_numeric(df['Monthly Pay'].str.replace(',', ''), errors='coerce')
df['Weekly Pay'] = pd.to_numeric(df['Weekly Pay'].str.replace(',', ''), errors='coerce')
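
The three conversions above follow the same pattern, so they could be folded into one helper. The sketch below is a minimal version, assuming the raw columns are strings with thousands separators and possibly a leading `$` (the dollar sign is an assumption; the cell above only strips commas). The name `to_number` is hypothetical. Counting the `NaN` values afterwards shows how many entries `errors='coerce'` silently dropped.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', ',' and whitespace, then convert to float (NaN on failure)."""
    cleaned = series.astype(str).str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Small illustrative sample; the last entry is deliberately malformed
sample = pd.Series(["136,172", "$133,828", " 131,441 ", "n/a"])
converted = to_number(sample)

print(converted.tolist())
print("values coerced to NaN:", int(converted.isna().sum()))
```

On the real frame this would be applied per column, for example `df[col] = to_number(df[col])` for each of the three pay columns.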

In [98]:
df.head()

        State  Annual Salary  Monthly Pay  Weekly Pay  Hourly Wage             Job
0    New York       136172.0      11347.0      2618.0        65.47  Data Scientist
1     Vermont       133828.0      11152.0      2573.0        64.34  Data Scientist
2  California       131441.0      10953.0      2527.0        63.19  Data Scientist
3       Maine       127644.0      10637.0      2454.0        61.37  Data Scientist
4       Idaho       126275.0      10522.0      2428.0        60.71  Data Scientist


In [99]:
# Box plot by Job
fig_job = px.box(df, x='Job', y='Annual Salary', title='Annual Salary Distribution by Job')

# Box Plot by State
fig_state = px.box(df, x='State', y='Annual Salary', title='Annual Salary Distribution by State')


fig_job.show()
fig_state.show()

In [100]:
df2 = df.copy()
# Calculate the average annual salary for each state
avg_salary_per_state = df.groupby('State')['Annual Salary'].mean().reset_index()

# Rename the column for clarity
avg_salary_per_state = avg_salary_per_state.rename(columns={'Annual Salary': 'Avg Annual Salary'})

# Print the resulting DataFrame
print(avg_salary_per_state)

             State  Avg Annual Salary
0          Alabama           103565.4
1           Alaska           121887.2
2          Arizona           106479.2
3         Arkansas            92122.4
4       California           116585.0
5         Colorado           116922.2
6      Connecticut           102900.4
7         Delaware           111996.4
8          Florida            85413.0
9          Georgia            96479.2
10          Hawaii           118263.0
11           Idaho           109211.4
12        Illinois           107957.2
13         Indiana           108727.2
14            Iowa           103908.2
15          Kansas            97589.2
16        Kentucky            94945.4
17       Louisiana            95055.2
18           Maine           113532.2
19        Maryland           111035.8
20   Massachusetts           122342.2
21        Michigan            96869.0
22       Minnesota           108872.0
23     Mississippi           103650.6
24        Missouri           101600.4
25         M
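
The state averages above pool all five roles, while the assignment also asks for variation by role. A possible complement, sketched here with placeholder numbers rather than the real data, is a mean per `Job` plus a `State` by `Job` pivot. It assumes the same column names as `df`; the names `avg_salary_per_job` and `salary_by_state_and_job` are hypothetical.

```python
import pandas as pd

# Placeholder rows standing in for df (values are made up for illustration)
toy = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "Job": ["Data Scientist", "Data Analyst", "Data Scientist", "Data Analyst"],
    "Annual Salary": [120.0, 80.0, 100.0, 60.0],
})

# Average salary per role, highest first
avg_salary_per_job = (
    toy.groupby("Job")["Annual Salary"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={"Annual Salary": "Avg Annual Salary"})
)

# One row per state, one column per role
salary_by_state_and_job = toy.pivot_table(
    index="State", columns="Job", values="Annual Salary", aggfunc="mean"
)

print(avg_salary_per_job)
print(salary_by_state_and_job)
```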

In [127]:
# 'USA-states' expects two-letter postal codes, so pair the alphabetically
# sorted state names with their codes (assumes exactly the 50 states are present)
state_codes = ("AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
               "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY").split()
state_names = sorted(avg_salary_per_state['State'])
assert len(state_names) == len(state_codes), "expected exactly 50 states"
state_to_code = dict(zip(state_names, state_codes))

# Build the choropleth once; the data is static, so no callback is needed
salary_map = px.choropleth(
    avg_salary_per_state,
    locations=avg_salary_per_state['State'].map(state_to_code),
    locationmode='USA-states',
    color='Avg Annual Salary',
    color_continuous_scale='Viridis',
    scope='usa',
    hover_name='State',
    labels={'Avg Annual Salary': 'Average Annual Salary ($)'},
    title='Average Annual Salaries in the US by State'
)

# Create the Dash app and its layout
app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id='salary-map', figure=salary_map)
])

# Run the app (app.run replaces the deprecated app.run_server)
if __name__ == '__main__':
    app.run(debug=True)

<IPython.core.display.Javascript object>

In [102]:
avg_salary_per_state['State'].unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia',
       'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype=object)



In [125]:
import plotly.graph_objects as go

# 'USA-states' expects two-letter postal codes, so pair the alphabetically
# sorted state names with their codes (assumes exactly the 50 states are present)
state_codes = ("AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
               "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY").split()
state_names = sorted(avg_salary_per_state['State'])
assert len(state_names) == len(state_codes), "expected exactly 50 states"
state_to_code = dict(zip(state_names, state_codes))

# Define the choropleth map using go.Figure and go.Choropleth
heat_map2 = go.Figure(go.Choropleth(
    locations = avg_salary_per_state['State'].map(state_to_code),  # Two-letter state codes
    z = avg_salary_per_state['Avg Annual Salary'],  # Specify the values for the color scale
    locationmode = 'USA-states',  # Set the location mode to USA-states
    colorscale = 'sunset',  # Set color scale
    colorbar_title = 'Average Annual Salary ($)',  # Set color bar title
    hoverinfo = 'text+z',  # Show the full state name and the salary on hover
    text = avg_salary_per_state['State'],  # Full state name displayed on hover
))

# Update the layout
heat_map2.update_layout(
    title = 'Average Annual Salaries in the US by State',
    geo = dict(
        scope = 'usa',
        projection = dict(type = 'albers usa'),
        showlakes = True,
        lakecolor = 'rgb(255, 255, 255)'
    ),
    annotations = [
        dict(
            x = 0.05,  # X-coordinate of the subtitle
            y = 1.00,  # Y-coordinate of the subtitle (adjusted for the subtitle)
            xref = 'paper',
            yref = 'paper',
            text = 'Mean across five data practitioner roles',  # Text of the subtitle
            showarrow = False,  # Set to True if you want an arrow pointing to the subtitle
            font = dict(size = 14)  # Adjust the font size as needed
        ),
        dict(
            x = 0.05,  # X-coordinate of the note
            y = 0.05,  # Y-coordinate of the note
            xref = 'paper',
            yref = 'paper',
            text = 'Source: ZipRecruiter average salary by state pages',  # Text of the note
            showarrow = False  # Set to True if you want an arrow pointing to the note
        )
    ]
)

# Display the heat map
heat_map2.show()
