# Makeover Monday | 2026 W4 Job Titles in Highest Demand - State by State

The data for this week's Makeover Monday challenge comes from the Lightcast article "[Job titles in highest-demand, state by state](https://lightcast.io/resources/blog/most-posted-for-jobs-in-each-us-state)" (published on Nov 19, 2025 by Hannah Grieser & JP Lespinasse). The article provides a high-level view of national and state-level hiring trends and includes an interactive US map that encourages users to explore the data for themselves. The data used in this notebook and in the final Tableau dashboard was prepared by [Makeover Monday](https://makeovermonday.co.uk/) for use in the Tableau Makeover Monday data visualization challenge.

**How Lightcast Collects their Data**

I was interested in understanding how this data was collected, so I took a closer look at Lightcast’s methodology. Lightcast is a labor market and economic data intelligence company that specializes in large-scale job posting and workforce data across the world.
Their data is compiled from millions of online job postings, employer career sites, and job boards, which are then cleaned and refined by AI and human experts to create a comprehensive database of job posting activity.

See the [About : How We Do It](https://lightcast.io/why-lightcast/about) section of their website for more information.



* **Article Link**: https://lightcast.io/resources/blog/most-posted-for-jobs-in-each-us-state

* **Compiled Data** (*published by Makeover Monday*): https://data.world/makeovermonday/2025w4-most-posted-us-jobs-by-state


In [46]:
import pandas as pd
import pandas_gbq
import numpy as np
import geopandas as gpd #you need this to use the spatial files

In [47]:
df = pd.read_csv('https://query.data.world/s/cexlh3qmmucu3oifgigaual53fahxm?dws=00000',sep=';')

## Exploratory Data Analysis

In [48]:
df.describe(include='all').T

Unnamed: 0,count,unique,top,freq
State Name,50,50,Alabama,1
1st Place,50,2,Registered Nurse,49
2nd Place,50,5,Retail Sales Associate,31
3rd Place,50,8,Tractor-Trailer Truck Driver,16


In [49]:
df

Unnamed: 0,State Name,1st Place,2nd Place,3rd Place
0,Alabama,Registered Nurse,Tractor-Trailer Truck Driver,Retail Sales Associate
1,Alaska,Registered Nurse,Retail Sales Associate,Physician
2,Arizona,Registered Nurse,Retail Sales Associate,Physician
3,Arkansas,Registered Nurse,Tractor-Trailer Truck Driver,Retail Sales Associate
4,California,Registered Nurse,Retail Sales Associate,Sales Representative
5,Colorado,Registered Nurse,Retail Sales Associate,Tractor-Trailer Truck Driver
6,Connecticut,Registered Nurse,Retail Sales Associate,Physician
7,Delaware,Registered Nurse,Software Developer / Engineer,Retail Sales Associate
8,Florida,Registered Nurse,Retail Sales Associate,Physician
9,Georgia,Registered Nurse,Retail Sales Associate,Tractor-Trailer Truck Driver


In [50]:
first_place = df['1st Place'].unique()
print(first_place)

['Registered Nurse' 'Retail Sales Associate']


In [51]:
second_place = df['2nd Place'].unique()
print(second_place)

['Tractor-Trailer Truck Driver' 'Retail Sales Associate'
 'Software Developer / Engineer' 'Physician' 'Registered Nurse']


In [52]:
third_place = df['3rd Place'].unique()
print(third_place)

['Retail Sales Associate' 'Physician' 'Sales Representative'
 'Tractor-Trailer Truck Driver' 'Customer Service Representative'
 'Retail Store Manager / Supervisor' 'Software Developer / Engineer'
 'Licensed Practical / Vocational Nurse']


# Plan for Tableau Viz: **Top Jobs vs Popular Jobs**

The plan for this visualization is to explore the relationship between top-ranked jobs (1st place by state) and overall job popularity across the US using a simple bar chart paired with a supporting hexmap.

**Have you ever heard of Ranked Choice Voting?**

This was the first thing that came to mind when I looked at this data, and I immediately thought: how can I show the interaction between the top job (1st place, the ranked choice) and the most popular jobs in the USA? Maybe they are the same, maybe they aren't.
There isn't a lot of data processing needed for this. The data is all here; we just need to reshape it to be "long", so my "melting technique" (which I use for everything) will apply, and then we just count!
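
As a rough illustration of the "melt, then count" idea, here is a minimal sketch on a three-state toy sample (rows copied from the table above). The names `toy`, `toy_long`, and `summary` are made up for this example and are not part of the notebook's pipeline.

```python
import pandas as pd

# Toy sample: three rows copied from the challenge data shown earlier
toy = pd.DataFrame({
    "State Name": ["Alabama", "Alaska", "Delaware"],
    "1st Place": ["Registered Nurse", "Registered Nurse", "Registered Nurse"],
    "2nd Place": ["Tractor-Trailer Truck Driver", "Retail Sales Associate",
                  "Software Developer / Engineer"],
    "3rd Place": ["Retail Sales Associate", "Physician", "Retail Sales Associate"],
})

# Wide -> long: one row per (state, rank)
toy_long = toy.melt(id_vars="State Name", var_name="rank_text", value_name="job_title")

# Count how often each job is ranked 1st versus how often it appears anywhere in the top 3
first_place = toy_long.loc[toy_long["rank_text"] == "1st Place", "job_title"].value_counts()
overall = toy_long["job_title"].value_counts()

summary = (
    pd.DataFrame({"first_place": first_place, "top_3_overall": overall})
    .fillna(0)          # jobs that never rank 1st have no first-place count
    .astype(int)
    .sort_values("top_3_overall", ascending=False)
)
print(summary)
```

Putting the two counts side by side is what lets a bar chart compare "wins" against overall popularity.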

**Hexmap to Highlight the Geospatial Relationship**

My favorite way to show US state data (when you don't really care about where in the state the data is coming from) is a hexmap, and I find myself going back to this saved YouTube video over and over:
[Tableau Hexagon Map Tutorial: How to Create Padded & Non-Padded Hex Maps](https://www.youtube.com/watch?v=f4teeqBkwLs).
The hexmap in my viz is not going to do the heavy lifting; it's going to be an aside in the dashboard. What I really want to show is the bar chart.

Here is where it gets complicated: the YouTube tutorial uses a relationship, which is great, but I don't want to do that. Instead, I will enrich my df_long with the spatial files from https://stanke.co/hexstatespadded/
I don't want to pull two different files into Tableau; I just want one.

## Data Processing

In [53]:
#clean up the column names to snake_case
df_1 = df.copy()
df_1.columns = (
    df.columns
      .astype(str)
      .str.strip()
      .str.replace(r"\s+", "_", regex=True)
      .str.lower()
)

df_1

Unnamed: 0,state_name,1st_place,2nd_place,3rd_place
0,Alabama,Registered Nurse,Tractor-Trailer Truck Driver,Retail Sales Associate
1,Alaska,Registered Nurse,Retail Sales Associate,Physician
2,Arizona,Registered Nurse,Retail Sales Associate,Physician
3,Arkansas,Registered Nurse,Tractor-Trailer Truck Driver,Retail Sales Associate
4,California,Registered Nurse,Retail Sales Associate,Sales Representative
5,Colorado,Registered Nurse,Retail Sales Associate,Tractor-Trailer Truck Driver
6,Connecticut,Registered Nurse,Retail Sales Associate,Physician
7,Delaware,Registered Nurse,Software Developer / Engineer,Retail Sales Associate
8,Florida,Registered Nurse,Retail Sales Associate,Physician
9,Georgia,Registered Nurse,Retail Sales Associate,Tractor-Trailer Truck Driver


In [54]:
# create df_long by unpivoting (melting) the rank columns into rows

df_1['count'] = 1

df_long = (
    df_1
      .melt(
        id_vars=["state_name"],
        value_vars=["1st_place", "2nd_place", "3rd_place"],  # explicitly only these
        var_name="rank_text",
        value_name="job_title"
      )
)

df_long["rank"] = (
    df_long["rank_text"]
    .str.extract(r"(\d+)", expand=False)  # returns a Series holding "1", "2", "3"
    .astype(int)
)

df_long["count"] = 1

df_long

Unnamed: 0,state_name,rank_text,job_title,rank,count
0,Alabama,1st_place,Registered Nurse,1,1
1,Alaska,1st_place,Registered Nurse,1,1
2,Arizona,1st_place,Registered Nurse,1,1
3,Arkansas,1st_place,Registered Nurse,1,1
4,California,1st_place,Registered Nurse,1,1
...,...,...,...,...,...
145,Virginia,3rd_place,Retail Sales Associate,3,1
146,Washington,3rd_place,Physician,3,1
147,West Virginia,3rd_place,Tractor-Trailer Truck Driver,3,1
148,Wisconsin,3rd_place,Retail Sales Associate,3,1


In [55]:
job_title_count = df_long['job_title'].nunique()
job_title = df_long['job_title'].unique()
print(f'{job_title_count} unique jobs')
print(job_title)

9 unique jobs
['Registered Nurse' 'Retail Sales Associate'
 'Tractor-Trailer Truck Driver' 'Software Developer / Engineer'
 'Physician' 'Sales Representative' 'Customer Service Representative'
 'Retail Store Manager / Supervisor'
 'Licensed Practical / Vocational Nurse']


### Enrich the Data with the Spatial Data

In [56]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [57]:
hex_gdf = gpd.read_file("/content/drive/MyDrive/Makeover Monday/HexStatesPadded/HexStatesPadded.shp")

### Inspect the HexStatesPadded Shapefile

We want to see:
 * state_name or state
 * state_abbr
 * geometry << this is the polygon object

This geometry column holds the hex shapes, which fit together so nicely and are arranged so perfectly. I am not going to try to recreate these, as someone else has already done the work!
See https://stanke.co/hexstatespadded/

In [69]:
hex_gdf.columns

Index(['State', 'State_Abbr', 'geometry'], dtype='object')

In [70]:
hex_gdf.head()

Unnamed: 0,State,State_Abbr,geometry
0,Alabama,AL,"POLYGON ((14.06 0.133, 14.06 1.156, 14.99 1.714, 15.92 1.156, 15.92 0.133, 14.99 -0.425, 14.06 0.133))"
1,Alaska,AK,"POLYGON ((0.06 10.333, 0.06 11.356, 0.99 11.914, 1.92 11.356, 1.92 10.333, 0.99 9.775, 0.06 10.333))"
2,Arizona,AZ,"POLYGON ((5.06 1.833, 5.06 2.856, 5.99 3.414, 6.92 2.856, 6.92 1.833, 5.99 1.275, 5.06 1.833))"
3,Arkansas,AR,"POLYGON ((11.06 1.833, 11.06 2.856, 11.99 3.414, 12.92 2.856, 12.92 1.833, 11.99 1.275, 11.06 1.833))"
4,California,CA,"POLYGON ((3.06 1.833, 3.06 2.856, 3.99 3.414, 4.92 2.856, 4.92 1.833, 3.99 1.275, 3.06 1.833))"


In [86]:
pd.set_option('display.max_colwidth', None)
hex_gdf[hex_gdf["State"]=="Alabama"]

Unnamed: 0,State,State_Abbr,geometry
0,Alabama,AL,"POLYGON ((14.06 0.133, 14.06 1.156, 14.99 1.714, 15.92 1.156, 15.92 0.133, 14.99 -0.425, 14.06 0.133))"


In [100]:
hex_gdf_1 = hex_gdf.copy()
hex_gdf_1.columns = (
    hex_gdf.columns
      .astype(str)
      .str.strip()
      .str.replace(r"\s+", "_", regex=True)
      .str.lower()
)

hex_gdf_1.head()

Unnamed: 0,state,state_abbr,geometry
0,Alabama,AL,"POLYGON ((14.06 0.133, 14.06 1.156, 14.99 1.714, 15.92 1.156, 15.92 0.133, 14.99 -0.425, 14.06 0.133))"
1,Alaska,AK,"POLYGON ((0.06 10.333, 0.06 11.356, 0.99 11.914, 1.92 11.356, 1.92 10.333, 0.99 9.775, 0.06 10.333))"
2,Arizona,AZ,"POLYGON ((5.06 1.833, 5.06 2.856, 5.99 3.414, 6.92 2.856, 6.92 1.833, 5.99 1.275, 5.06 1.833))"
3,Arkansas,AR,"POLYGON ((11.06 1.833, 11.06 2.856, 11.99 3.414, 12.92 2.856, 12.92 1.833, 11.99 1.275, 11.06 1.833))"
4,California,CA,"POLYGON ((3.06 1.833, 3.06 2.856, 3.99 3.414, 4.92 2.856, 4.92 1.833, 3.99 1.275, 3.06 1.833))"


You can see that the geometry field is a POLYGON. What we want are the **boundary coordinates**, i.e. the vertices that make up the hexagon shape. To get them, we treat the polygon's boundary as a line and pull out the points on that line. This can be done with **exterior.coords**, which exposes the coordinate points that define the outer boundary (exterior ring) of a polygon object.
We will spot-check Alabama to make sure the coordinates are flowing through.
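
To make the `exterior.coords` step concrete, here is a small sketch that parses Alabama's hexagon from the WKT string shown in the output above. It assumes `shapely` is installed (geopandas depends on it); the variable names are illustrative only.

```python
from shapely import wkt

# Alabama's hexagon, copied from the geometry column shown above
alabama = wkt.loads(
    "POLYGON ((14.06 0.133, 14.06 1.156, 14.99 1.714, 15.92 1.156, "
    "15.92 0.133, 14.99 -0.425, 14.06 0.133))"
)

# exterior.coords walks the outer ring; the ring is closed,
# so the first point is repeated at the end
ring = list(alabama.exterior.coords)
print(len(ring))            # 7 points: 6 vertices plus the repeated closing point
print(ring[0] == ring[-1])  # True

# Drop the closing point to keep just the six hexagon vertices
vertices = ring[:-1]
print(vertices)
```

This is why the loop in the next cell drops the last coordinate when it matches the first.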

In [98]:
rows = []

for _, r in hex_gdf_1.iterrows():
    state = r["state"]
    abbr = r["state_abbr"]
    poly = r.geometry

    # get 6 vertices (drop closing point if repeated)
    coords = list(poly.exterior.coords)
    if coords[0] == coords[-1]:
        coords = coords[:-1]

    # centroid
    cx, cy = poly.centroid.x, poly.centroid.y

    # sort vertices by angle around centroid
    pts = []
    for (x, y) in coords:
        angle = np.arctan2(y - cy, x - cx)
        pts.append((angle, x, y))
    pts.sort(key=lambda t: t[0])  # ascending angle = counter-clockwise order (y axis pointing up)

    for path, (_, x, y) in enumerate(pts):
        rows.append({
            "state": state,
            "state_abbr": abbr,
            "x": x,
            "y": y,
            "path": path,
        })

hex_points_clean = pd.DataFrame(rows)
hex_points_clean["path"] = hex_points_clean["path"].astype(int)
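
The angle sort can be checked in isolation with nothing but the standard library. The sketch below reuses Alabama's six vertices and a small shoelace-formula helper (`signed_area` is a name introduced here for illustration) to confirm which direction the sorted path runs.

```python
import math

# Alabama's six vertices in the order stored in the shapefile (closing point dropped)
vertices = [(14.06, 0.133), (14.06, 1.156), (14.99, 1.714),
            (15.92, 1.156), (15.92, 0.133), (14.99, -0.425)]

# For a regular hexagon the centroid equals the mean of the vertices
cx = sum(x for x, _ in vertices) / len(vertices)
cy = sum(y for _, y in vertices) / len(vertices)

# Sort by the angle of each vertex as seen from the centroid
ordered = sorted(vertices, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def signed_area(points):
    """Shoelace formula: positive for counter-clockwise order, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2

print(ordered)
print(signed_area(ordered) > 0)   # True -> sorted path runs counter-clockwise
print(signed_area(vertices) < 0)  # True -> the stored ring runs clockwise
```

Either direction draws the same hexagon; what matters is that consecutive `path` values are neighbouring vertices, so the outline never crosses itself.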

In [103]:
hex_points_clean.groupby("state").size().describe()

Unnamed: 0,0
count,51.0
mean,6.0
std,0.0
min,6.0
25%,6.0
50%,6.0
75%,6.0
max,6.0


In [104]:
# rename state to state_name so it matches the join key in df_long
hex_points_clean = hex_points_clean.rename(columns={"state": "state_name"})

Double-check Alabama again to make sure the coordinates are flowing through as expected, and also check the state_name renaming step.

In [105]:
hex_points_clean[hex_points_clean["state_name"]=="Alabama"].sort_values("path")

Unnamed: 0,state_name,state_abbr,x,y,path
0,Alabama,AL,14.06,0.133,0
1,Alabama,AL,14.99,-0.425,1
2,Alabama,AL,15.92,0.133,2
3,Alabama,AL,15.92,1.156,3
4,Alabama,AL,14.99,1.714,4
5,Alabama,AL,14.06,1.156,5


Now add the centroids (center points) so we can put some text in the middle of our hexagons.

In [108]:
import warnings

# silence the geographic-CRS centroid warning; the filter has to be registered before the centroid call
warnings.filterwarnings("ignore", message="Geometry is in a geographic CRS")

centroids = hex_gdf.copy()
centroids["x_center"] = centroids.geometry.centroid.x
centroids["y_center"] = centroids.geometry.centroid.y

hex_centers = centroids[["State", "x_center", "y_center"]]

In [76]:
hex_centers.shape

(51, 3)

In [77]:
hex_centers.head()

Unnamed: 0,State,x_center,y_center
0,Alabama,14.99,0.6445
1,Alaska,0.99,10.8445
2,Arizona,5.99,2.3445
3,Arkansas,11.99,2.3445
4,California,3.99,2.3445


Now join hex_points_clean to df_long for the final data and load that into Tableau for visualization!

In [109]:
df_final = pd.merge(
    df_long,
    hex_points_clean,
    on='state_name',
    how='left',
    validate="m:m"   # many-to-many: 3 rank rows x 6 vertex rows per state ("m:m" itself performs no check)
)

df_final.shape
##note - in df_long you have 3 rows per state, in hex_points_clean you have 6 rows per state

(900, 9)
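
Because the hex file holds 51 shapes while the job data has 50 states, it can be worth checking which keys fail to match rather than relying on the row count alone. The sketch below uses made-up toy frames (`toy_long`, `toy_hex`, and the placeholder "Extra State" are assumptions for illustration, not values from the real files) together with pandas' `indicator=True` option.

```python
import pandas as pd

# Toy long table: 2 states x 3 ranks
toy_long = pd.DataFrame({
    "state_name": ["Alabama"] * 3 + ["Alaska"] * 3,
    "rank": [1, 2, 3] * 2,
})

# Toy vertex table: 3 states x 6 vertices ("Extra State" is a placeholder with no job data)
toy_hex = pd.DataFrame({
    "state_name": ["Alabama"] * 6 + ["Alaska"] * 6 + ["Extra State"] * 6,
    "path": list(range(6)) * 3,
})

# Left join: every rank row is repeated once per vertex -> 6 rank rows x 6 vertices = 36
merged = toy_long.merge(toy_hex, on="state_name", how="left")
print(merged.shape)  # (36, 3)

# Outer join with indicator=True flags keys that exist on only one side
check = toy_long.merge(toy_hex, on="state_name", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["state_name", "_merge"]].drop_duplicates()
print(unmatched)
```

With `how='left'`, a shape that has no job data simply drops out of the result, which is consistent with 50 states x 3 ranks x 6 vertices = 900 rows.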

In [112]:
df_final = df_final.merge(
    hex_centers,
    left_on='state_name',
    right_on='State',
    how="left"
)

In [113]:
df_final.head()

Unnamed: 0,state_name,rank_text,job_title,rank,count,state_abbr,x,y,path,State,x_center,y_center
0,Alabama,1st_place,Registered Nurse,1,1,AL,14.06,0.133,0,Alabama,14.99,0.6445
1,Alabama,1st_place,Registered Nurse,1,1,AL,14.99,-0.425,1,Alabama,14.99,0.6445
2,Alabama,1st_place,Registered Nurse,1,1,AL,15.92,0.133,2,Alabama,14.99,0.6445
3,Alabama,1st_place,Registered Nurse,1,1,AL,15.92,1.156,3,Alabama,14.99,0.6445
4,Alabama,1st_place,Registered Nurse,1,1,AL,14.99,1.714,4,Alabama,14.99,0.6445


In [114]:
df_final["state_rank_key"] = (
    df_final["state_name"] + " | rank " + df_final["rank"].astype(str)
)

In [115]:
# check that we have exactly 1 vertex per (state, rank, path)
bad = (
    df_final.groupby(["state_name", "rank", "path"])
    .size()
    .reset_index(name="n")
    .query("n != 1")
)

bad.head(), bad.shape
# if bad.shape[0] is 0, your geometry is clean.

(Empty DataFrame
 Columns: [state_name, rank, path, n]
 Index: [],
 (0, 4))

# Save to BigQuery Table

Save the df_final to a dedicated table in my BigQuery warehouse which I will then use with Connected Sheets to connect to Tableau Public.

In [116]:
# write df_final to the target table in the BigQuery dataset
UPLOAD_TO_BQ = False # set True when you actually want to write tables

if UPLOAD_TO_BQ:
  project_id = 'your_project_id'
  destination_table = 'your_dataset.another_new_table'

  pandas_gbq.to_gbq(
      dataframe=df_final,
      destination_table=destination_table,
      project_id=project_id,
      if_exists='replace' ## 'if_exists' options: 'fail', 'replace', 'append'
  )

100%|██████████| 1/1 [00:00<00:00, 7869.24it/s]


# Tableau Dashboard

The [final dashboard](https://public.tableau.com/views/MoM2026_w4JobTitlesinHighestDemand/TopJobsacrosstheUnitedStatesin2025?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link) was published to Tableau Public and explores the top 3 ranked jobs posted in the United States in 2025.

![Top 3 Ranked Jobs posted in the United States in 2025](images/mom_2026_w4/tableaudashboard_mom_2026_w4_Topjobs.png)