- The 2008 financial crisis likely reduced investment in product development, which in turn is likely to have reduced patent applications overall. The U.S. was ground zero of the crisis and was probably the most heavily affected country. We therefore expect fewer patent applications in 2007 to 2009 than in the periods before and after the crisis.
- Datasets: g_location_disambiguated.tsv, g_application.tsv, g_inventor_disambiguated.tsv
- University dataset from https://hifld-geoplatform.hub.arcgis.com/datasets/geoplatform::colleges-and-universities/about
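
The before/during/after comparison in the hypothesis could be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the analysis itself. It assumes each application carries a filing date string in `YYYY-MM-DD` form (for example a `filing_date` column in g_application.tsv, which this notebook never loads, so the column name is an assumption). The function names, the three-year comparison windows, and the sample dates are all made up for illustration.

```python
from collections import Counter

def applications_per_year(filing_dates):
    """Count applications per filing year from 'YYYY-MM-DD' strings; skip blanks and non-strings."""
    return Counter(
        int(d[:4]) for d in filing_dates
        if isinstance(d, str) and len(d) >= 4 and d[:4].isdigit()
    )

def mean_per_year(per_year, start, end):
    """Mean yearly count over the inclusive window; years with no data count as 0."""
    years = range(start, end + 1)
    return sum(per_year.get(y, 0) for y in years) / len(years)

# Made-up filing dates, purely for illustration (not real data)
sample_dates = [
    "2005-03-01", "2005-07-15", "2006-01-20", "2006-11-02",
    "2008-05-05", "2010-02-02", "2011-09-09", "2011-10-10", "",
]

per_year = applications_per_year(sample_dates)
pre = mean_per_year(per_year, 2004, 2006)     # assumed pre-crisis window
crisis = mean_per_year(per_year, 2007, 2009)  # crisis window from the hypothesis
post = mean_per_year(per_year, 2010, 2012)    # assumed post-crisis window
print(pre, crisis, post)
```

With real data, the list of dates would come from the applications table, and the hypothesis would be supported if the crisis-window mean were lower than both neighbouring windows.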

In [1]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import json
import requests

In [2]:
loc_df = pd.read_csv("g_location_disambiguated.tsv", sep="\t", header=0)

In [3]:
inv_df = pd.read_csv("g_inventor_disambiguated.tsv", sep="\t", header=0)

In [4]:
univ_df = pd.read_csv("Colleges_and_Universities_-3122497483864735259.csv")

In [5]:
# https://carnegieclassifications.acenet.edu/carnegie-classification/classification-methodology/2025-institutional-classification/
research_df = pd.read_excel("2025-Public-Data-File.xlsx", sheet_name="data")

In [6]:
# Attach location data to the inventor table (merge code suggested by ChatGPT)
inv_loc_df = inv_df.merge(
    loc_df[['location_id', 'latitude', 'longitude', 'state_fips', 'county_fips', 'disambig_country']],
    on="location_id",
    how="left")

In [7]:
research_df = research_df[research_df['research2025'] == 1]
reseach_uni_df = research_df.merge(
    univ_df[["IPEDSID", 'COUNTYFIPS', 'LATITUDE', 'LONGITUDE']],
    left_on="unitid",
    right_on="IPEDSID",
    how="left"
)
reseach_uni_df = reseach_uni_df.drop(columns=["IPEDSID"])
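
A left merge keeps every research institution even when no row in the university table matches, which leaves missing coordinates and silently drops those institutions from the maps. A quick check for unmatched keys could look like the sketch below. The helper name and the sample IDs are hypothetical, and comparing keys as stripped strings is an assumption about how `unitid` and `IPEDSID` might differ in type.

```python
def unmatched_keys(left_keys, right_keys):
    """Return left keys with no counterpart on the right, comparing as stripped strings."""
    right = {str(k).strip() for k in right_keys}
    left = {str(k).strip() for k in left_keys}
    return sorted(left - right)

# Made-up IDs: integers on the left, strings on the right
missing = unmatched_keys([100654, 100663, 999999], ["100654", " 100663"])
print(missing)
```

In the notebook the two arguments would be the `unitid` and `IPEDSID` columns. pandas also offers `indicator=True` on `merge`, which adds a `_merge` column reporting the same information per row.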

In [8]:
us_inv_loc_df = inv_loc_df[inv_loc_df['disambig_country'] == 'US']
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html
loc_sample_df = us_inv_loc_df.sample(n=500000, random_state=42)
print(len(us_inv_loc_df))
print(len(loc_sample_df))

11296809
500000


In [9]:
fig = go.Figure()

# 🟦 Add individuals
fig.add_trace(go.Scattergeo(
    lon = loc_sample_df['longitude'],
    lat = loc_sample_df['latitude'],
    hovertext = loc_sample_df['location_id'],  # or individual ID
    mode = 'markers',
    marker=dict(
        size=4,
        color='blue',
        opacity=0.6
    ),
    name='Patent'
))

# 🟥 Add universities
fig.add_trace(go.Scattergeo(
    lon = reseach_uni_df['LONGITUDE'],
    lat = reseach_uni_df['LATITUDE'],
    hovertext = reseach_uni_df['instnm'],
    mode = 'markers',
    marker=dict(
        size=7,
        color='red',
        symbol='star'
    ),
    textposition='top center',
    name='Universities'
))

# 🌎 Map layout
fig.update_layout(
    title='Individuals and Universities on USA Map',
    geo=dict(
        scope='usa',
        showland=True,
        landcolor='lightgray',
        showlakes=True,
        lakecolor='lightblue',
    ),
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=0.01
    )
)

pio.renderers.default = 'browser'
fig.show()

In [10]:
# Drop rows with missing FIPS values; .copy() makes clean_df an independent
# DataFrame so the assignments below do not raise SettingWithCopyWarning
clean_df = inv_loc_df.dropna(subset=['state_fips', 'county_fips']).copy()

# Convert to integers (handle float like 17.0 → 17)
clean_df['state_fips'] = clean_df['state_fips'].astype(int)
clean_df['county_fips'] = clean_df['county_fips'].astype(int)

# Combine to full 5-digit county FIPS code (e.g., 17031 for Cook County, IL)
clean_df['county_fips'] = (
    clean_df['state_fips'].astype(str).str.zfill(2) +
    clean_df['county_fips'].astype(str).str.zfill(3)
)

# Count individuals per county
county_counts = clean_df['county_fips'].value_counts().reset_index()
county_counts.columns = ['county_fips', 'count']
county_counts[:5]



  county_fips   count
0        6085  962585
1       53033  377284
2        6073  356272
3       25017  297127
4        6037  296847
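
The padding step matters because county FIPS codes are identifiers, not numbers: a state code such as 6 must become `06` for the combined code to match zero-padded keys like `06085`. The standalone helper below sketches the same rule for a single pair of values. The function name is hypothetical, and it assumes a 2-digit state part and a 3-digit county part, as the cell above does.

```python
import math

def to_fips5(state_fips, county_fips):
    """Return a zero-padded 5-character county FIPS string, or None if either part is missing."""
    parts = []
    for value, width in ((state_fips, 2), (county_fips, 3)):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return None
        # int() drops a trailing .0 from float input such as 6.0
        parts.append(str(int(value)).zfill(width))
    return "".join(parts)

print(to_fips5(6.0, 85.0))
print(to_fips5(17, 31))
print(to_fips5(float("nan"), 31))
```

The same normalization could be applied to the university table's `COUNTYFIPS` column before comparing it with `county_fips`, in case one side is stored as integers and the other as strings.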


In [11]:
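# Download U.S. counties GeoJSON (simplified)
geojson_url = "https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json"
response = requests.get(geojson_url, timeout=30)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
counties_geo = response.json()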

In [12]:
# Choropleth for individual density
fig = px.choropleth(
    county_counts,
    geojson=counties_geo,
    locations='county_fips',
    color='count',
    color_continuous_scale="Blues",
   # range_color=(0, 12),
    scope="usa",
    labels={'count': 'Patents'},
    title="Patent Density per County with Universities"
)

# Add universities as points
fig.add_trace(go.Scattergeo(
    lon = reseach_uni_df['LONGITUDE'],
    lat = reseach_uni_df['LATITUDE'],
    hovertext = reseach_uni_df['instnm'],
    mode = 'markers',
    marker=dict(
        size=7,
        color='red',
        symbol='star'
    ),
    name='Universities'
))

# Update layout for map styling
#fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})

pio.renderers.default = 'browser'
fig.show()
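
County counts are heavily skewed (a handful of counties hold hundreds of thousands of records while most hold far fewer), so a linear color scale leaves most of the map nearly uniform. One common workaround, sketched here as an assumption rather than something the notebook does, is to color by the base-10 logarithm of the count. The helper name and sample values are illustrative.

```python
import math

def log10_counts(counts):
    """Map each positive count to log10(count); zero or negative counts map to None."""
    return [math.log10(c) if c > 0 else None for c in counts]

# Illustrative values spanning several orders of magnitude
example = [1, 10, 1000, 0]
scaled = log10_counts(example)
print(scaled)
```

In the notebook, the transformed values could be stored in a new column of `county_counts` and passed as `color=`, with the raw `count` kept in `hover_data` so the tooltip still shows the original number.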

In [13]:
# 1. Mean patent density for counties with a research university
#    - Sum (county patent count / total patent count) over university counties
#    - Divide by the number of counties that contain a university
# 2. Mean patent density for counties without a research university
# Note: the code below compares raw counts per county, not shares of the total.

univ_counties = set(reseach_uni_df['COUNTYFIPS'])

# Flag whether each county has a university
county_counts['has_university'] = county_counts['county_fips'].isin(univ_counties)

# Compare average density
summary = county_counts.groupby('has_university')['count'].describe()
print(summary)

                 count          mean           std    min     25%      50%  \
has_university                                                               
False           3022.0   1623.593977   7647.321405    1.0    31.0    124.0   
True             153.0  41576.032680  98580.299467  530.0  4368.0  13235.0   

                    75%       max  
has_university                     
False             544.5  213359.0  
True            35304.0  962585.0  
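
In the printed summary, the mean count for university counties is roughly 25.6 times that of the other counties (41576.03 / 1623.59). The comments at the top of this cell describe a slightly different quantity, the mean share of all records per county, which the cell does not compute. A minimal plain-Python sketch of that calculation is shown below. The function name and the sample counts are made up, and the inputs are assumed to be a mapping from county FIPS to count plus a set of university county FIPS codes.

```python
def mean_share_by_group(county_counts, univ_counties):
    """Mean share of all records per county, split by whether the county has a university."""
    total = sum(county_counts.values())
    groups = {True: [], False: []}
    for fips, n in county_counts.items():
        groups[fips in univ_counties].append(n / total)
    # None marks a group with no counties, which avoids a division by zero
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}

# Made-up counts, purely for illustration
sample_counts = {"06085": 60, "53033": 20, "01001": 10, "01003": 10}
sample_univ = {"06085", "53033"}
shares = mean_share_by_group(sample_counts, sample_univ)
print(shares)
```

Because every share is divided by the same total, the ratio between the two group means is the same as for the raw counts; the shares only change the units.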


In [14]:
summary

                 count          mean           std    min     25%      50%      75%       max
has_university
False           3022.0   1623.593977   7647.321405    1.0    31.0    124.0    544.5  213359.0
True             153.0  41576.032680  98580.299467  530.0  4368.0  13235.0  35304.0  962585.0
