# NSF Grant Directorate Analysis: 2025 vs 2015-2024 Comparison

This notebook analyzes NSF grants grouped by Directorate to compare funding patterns between 2025 and the 2015-2024 baseline period.

## Analysis Objectives

For each Directorate name (`org_dir_long_name`), we will compute the following, each stored as a fraction between 0 and 1:
1. The proportion of total 2025 funding awarded to grants in this Directorate
2. The share of all 2025 grants that belong to this Directorate
3. The proportion of total funding awarded to grants in this Directorate, averaged over the yearly proportions for 2015-2024
4. The share of all grants that belong to this Directorate, averaged over the yearly shares for 2015-2024
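
The quantities listed above can be illustrated on a toy table. This is only a sketch: the directorate labels `A` and `B`, the amounts, and the helper name `yearly_shares` are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy data (hypothetical values), one row per grant.
toy = pd.DataFrame({
    "year": [2024, 2024, 2024, 2025, 2025],
    "org_dir_long_name": ["A", "A", "B", "A", "B"],
    "awd_amount": [100.0, 300.0, 600.0, 50.0, 150.0],
})

def yearly_shares(df):
    """Return per-year funding and grant shares for each directorate."""
    grouped = df.groupby(["year", "org_dir_long_name"])["awd_amount"].agg(
        funding_total="sum", grant_count="count"
    )
    # Broadcast each year's totals back onto the (year, directorate) rows.
    year_totals = grouped.groupby(level="year").transform("sum")
    return pd.DataFrame({
        "funding_proportion": grouped["funding_total"] / year_totals["funding_total"],
        "grant_share": grouped["grant_count"] / year_totals["grant_count"],
    })

shares = yearly_shares(toy)
print(shares)
```

Within each year, both columns sum to 1 across directorates, which is a convenient sanity check on the real data as well.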

## Step 1: Setup and Data Loading

In [1]:
import pandas as pd
import numpy as np
import json
import glob
from tqdm import tqdm
import os

# Set display options for better readability:
# Shows all columns regardless of how many there are
pd.set_option('display.max_columns', None)
# Each row will be displayed on a single line regardless of length, preventing text wrapping
pd.set_option('display.width', None)
# Shows up to 100 characters per cell before truncating
pd.set_option('display.max_colwidth', 100)

In [2]:
def load_nsf_grants_from_year(year):
    """
    Load all NSF grant JSON files for a given year and extract relevant data.
    Returns a DataFrame with awd_amount and org_dir_long_name.
    """
    print(f"Loading NSF grants for {year}...")
    
    # Find all JSON files for this year
    json_pattern = f"./data/nsf/{year}/*.json"
    json_files = glob.glob(json_pattern)
    
    print(f"  - Found {len(json_files)} JSON files")
    
    grants_data = []
    
    for json_file in tqdm(json_files, desc=f"Processing {year}"):
        try:
            with open(json_file, 'r', encoding='utf-8') as f:
                grant_data = json.load(f)
            
            # Extract award amount
            awd_amount = grant_data.get('awd_amount', 0)
            if awd_amount is None:
                awd_amount = 0
            
            # Extract organizational directory long name
            org_dir_long_name = grant_data.get('org_dir_long_name', '')
            if org_dir_long_name:
                org_dir_long_name = org_dir_long_name.strip()
            
            grants_data.append({
                'awd_amount': float(awd_amount),
                'org_dir_long_name': org_dir_long_name,
                'file_path': json_file
            })
            
        except Exception as e:
            print(f"    Error processing {json_file}: {e}")
            continue
    
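    # Create DataFrame; naming the columns keeps the summary below working
    # even when no files were found or parsed for the year
    df = pd.DataFrame(grants_data, columns=['awd_amount', 'org_dir_long_name', 'file_path'])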
    
    print(f"  - Successfully processed {len(df)} grants")
    print(f"  - Total funding: ${df['awd_amount'].sum():,.0f}")
    
    return df
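
The per-record handling inside the loader can be isolated and checked without touching the file system. The sketch below mirrors the extraction logic above on a few in-memory records; `parse_grant_record` is a hypothetical helper name and the sample records are invented.

```python
def parse_grant_record(record):
    """Hypothetical helper: normalize one decoded grant JSON object."""
    amount = record.get("awd_amount", 0)
    directorate = record.get("org_dir_long_name", "")
    return {
        # A missing or null amount becomes 0.0, as in the loader.
        "awd_amount": float(amount) if amount is not None else 0.0,
        # Strip whitespace only when a non-empty name is present.
        "org_dir_long_name": directorate.strip() if directorate else directorate,
    }

samples = [
    {"awd_amount": 47267, "org_dir_long_name": " Directorate for STEM Education "},
    {"awd_amount": None, "org_dir_long_name": None},
    {},
]
parsed = [parse_grant_record(s) for s in samples]
for row in parsed:
    print(row)
```

One consequence worth noting: because null amounts are coerced to `0.0` at load time, a later `isna()` check on `awd_amount` would not be expected to match anything produced by this loader, while a null directorate name is kept as `None`.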

In [3]:
# Load data for all years
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]
dataframes_json_data = {}

for year in years:
    dataframes_json_data[year] = load_nsf_grants_from_year(year)
    print()

Loading NSF grants for 2015...
  - Found 12848 JSON files


Processing 2015: 100%|██████████| 12848/12848 [00:02<00:00, 5078.50it/s]


  - Successfully processed 12848 grants
  - Total funding: $6,056,396,906

Loading NSF grants for 2016...
  - Found 12787 JSON files


Processing 2016: 100%|██████████| 12787/12787 [00:02<00:00, 4999.84it/s]


  - Successfully processed 12787 grants
  - Total funding: $7,744,449,036

Loading NSF grants for 2017...
  - Found 12309 JSON files


Processing 2017: 100%|██████████| 12309/12309 [00:02<00:00, 5151.17it/s]


  - Successfully processed 12309 grants
  - Total funding: $6,847,583,927

Loading NSF grants for 2018...
  - Found 12684 JSON files


Processing 2018: 100%|██████████| 12684/12684 [00:02<00:00, 5086.45it/s]


  - Successfully processed 12684 grants
  - Total funding: $9,049,234,864

Loading NSF grants for 2019...
  - Found 12180 JSON files


Processing 2019: 100%|██████████| 12180/12180 [00:02<00:00, 5103.96it/s]


  - Successfully processed 12180 grants
  - Total funding: $7,120,008,713

Loading NSF grants for 2020...
  - Found 13041 JSON files


Processing 2020: 100%|██████████| 13041/13041 [00:02<00:00, 5015.31it/s]


  - Successfully processed 13041 grants
  - Total funding: $7,502,284,231

Loading NSF grants for 2021...
  - Found 12166 JSON files


Processing 2021: 100%|██████████| 12166/12166 [00:02<00:00, 5561.15it/s]


  - Successfully processed 12166 grants
  - Total funding: $8,193,757,121

Loading NSF grants for 2022...
  - Found 11912 JSON files


Processing 2022: 100%|██████████| 11912/11912 [00:02<00:00, 5579.25it/s]


  - Successfully processed 11912 grants
  - Total funding: $7,269,309,476

Loading NSF grants for 2023...
  - Found 12022 JSON files


Processing 2023: 100%|██████████| 12022/12022 [00:02<00:00, 5191.24it/s]


  - Successfully processed 12022 grants
  - Total funding: $7,348,114,730

Loading NSF grants for 2024...
  - Found 11687 JSON files


Processing 2024: 100%|██████████| 11687/11687 [00:01<00:00, 5996.80it/s]


  - Successfully processed 11687 grants
  - Total funding: $6,385,263,306

Loading NSF grants for 2025...
  - Found 9249 JSON files


Processing 2025: 100%|██████████| 9249/9249 [00:01<00:00, 5752.11it/s]


  - Successfully processed 9249 grants
  - Total funding: $4,772,095,689



## Step 2: Examine Data

Investigate grants with no award amount or no Directorate.
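
Both checks can also be folded into one helper that reports the counts side by side. This is a minimal sketch on invented rows; `missing_value_summary` is a hypothetical name, and the column names are the two extracted in Step 1.

```python
import pandas as pd

def missing_value_summary(df):
    """Count grants with no award amount or no directorate name."""
    no_amount = df["awd_amount"].isna() | (df["awd_amount"] == 0)
    no_directorate = df["org_dir_long_name"].isna() | (
        df["org_dir_long_name"].fillna("").str.strip() == ""
    )
    return {
        "no_amount": int(no_amount.sum()),
        "no_directorate": int(no_directorate.sum()),
        # Rows flagged by at least one check (overlaps counted once).
        "either": int((no_amount | no_directorate).sum()),
    }

toy = pd.DataFrame({
    "awd_amount": [0.0, 3900.0, 432000.0, 0.0],
    "org_dir_long_name": [
        "Directorate for Engineering",
        "",
        "Directorate for Geosciences",
        None,
    ],
})
summary = missing_value_summary(toy)
print(summary)
```

The `either` count is the number of rows a filter on both conditions would drop, which can differ from the sum of the two individual counts when a grant fails both checks.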

In [4]:
# Examine the structure of one dataframe
print("Sample dataframe structure (2015):")
print(f"Columns: {list(dataframes_json_data[2015].columns)}")
print("\nData types:")
print(dataframes_json_data[2015].dtypes)
print("\nSample organizational directory name:")
print(dataframes_json_data[2015]['org_dir_long_name'].iloc[0])

Sample dataframe structure (2015):
Columns: ['awd_amount', 'org_dir_long_name', 'file_path']

Data types:
awd_amount           float64
org_dir_long_name     object
file_path             object
dtype: object

Sample organizational directory name:
Office Of The Director


In [5]:
dataframes_json_data[2015].head()

Unnamed: 0,awd_amount,org_dir_long_name,file_path
0,70.0,Office Of The Director,./data/nsf/2015/1514971.json
1,47267.0,Directorate for STEM Education,./data/nsf/2015/1504588.json
2,5070.0,Office Of The Director,./data/nsf/2015/1515399.json
3,432000.0,Directorate for Computer and Information Science and Engineering,./data/nsf/2015/1544396.json
4,19702.0,Directorate for Biological Sciences,./data/nsf/2015/1501620.json


In [6]:
# Filter for rows where awd_amount is 0 or null/empty
grants_with_no_value_2025 = dataframes_json_data[2025][
    (dataframes_json_data[2025]['awd_amount'] == 0) | 
    (dataframes_json_data[2025]['awd_amount'].isna())
]

print(f"Found {len(grants_with_no_value_2025)} grants with zero or empty award amounts in 2025:")

Found 9 grants with zero or empty award amounts in 2025:


In [7]:
grants_with_no_value_2025

Unnamed: 0,awd_amount,org_dir_long_name,file_path
228,0.0,"Directorate for Social, Behavioral and Economic Sciences",./data/nsf/2025/2446802.json
1088,0.0,Directorate for Computer and Information Science and Engineering,./data/nsf/2025/2443948.json
1223,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2025/2514924.json
1666,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2025/2531191.json
2294,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2025/2513146.json
5992,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2025/2501581.json
6937,0.0,"Directorate for Social, Behavioral and Economic Sciences",./data/nsf/2025/2452034.json
8141,0.0,Directorate for Computer and Information Science and Engineering,./data/nsf/2025/2434764.json
8541,0.0,Directorate for Biological Sciences,./data/nsf/2025/2434499.json


In [8]:
# Filter for rows where awd_amount is 0 or null/empty
grants_with_no_value_2015 = dataframes_json_data[2015][
    (dataframes_json_data[2015]['awd_amount'] == 0) | 
    (dataframes_json_data[2015]['awd_amount'].isna())
]

print(f"Found {len(grants_with_no_value_2015)} grants with zero or empty award amounts in 2015:")

Found 10 grants with zero or empty award amounts in 2015:


In [9]:
grants_with_no_value_2015

Unnamed: 0,awd_amount,org_dir_long_name,file_path
3101,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2015/1521151.json
4347,0.0,Directorate for Engineering,./data/nsf/2015/1540003.json
4373,0.0,Directorate for Mathematical and Physical Sciences,./data/nsf/2015/1546092.json
9363,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2015/1462233.json
9915,0.0,Directorate for Mathematical and Physical Sciences,./data/nsf/2015/1508593.json
10255,0.0,Office Of The Director,./data/nsf/2015/1505270.json
10509,0.0,"Directorate for Social, Behavioral and Economic Sciences",./data/nsf/2015/1521719.json
10523,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2015/1447642.json
10781,0.0,Directorate for Mathematical and Physical Sciences,./data/nsf/2015/1515987.json
12140,0.0,"Directorate for Technology, Innovation, and Partnerships",./data/nsf/2015/1520607.json


In [10]:
# Filter for rows where the Directorate name is empty or null
grants_with_no_directorate_2025 = dataframes_json_data[2025][
    (dataframes_json_data[2025]['org_dir_long_name'] == '') | 
    (dataframes_json_data[2025]['org_dir_long_name'].isna())
]

print(f"Found {len(grants_with_no_directorate_2025)} grants with an empty or missing Directorate in 2025:")

Found 0 grants with an empty or missing Directorate in 2025:


In [12]:
# Filter for rows where the Directorate name is empty or null
grants_with_no_directorate_2015 = dataframes_json_data[2015][
    (dataframes_json_data[2015]['org_dir_long_name'] == '') | 
    (dataframes_json_data[2015]['org_dir_long_name'].isna())
]

print(f"Found {len(grants_with_no_directorate_2015)} grants with an empty or missing Directorate in 2015:")

Found 3 grants with an empty or missing Directorate in 2015:


In [13]:
grants_with_no_directorate_2015

Unnamed: 0,awd_amount,org_dir_long_name,file_path
612,3900.0,,./data/nsf/2015/1542281.json
10179,18512.0,,./data/nsf/2015/1503577.json
12679,15000.0,,./data/nsf/2015/1542282.json


In [11]:
# Count total rows with empty or null org_dir_long_name across all years
print("Counting grants with empty or null org_dir_long_name across all years:")
print("=" * 60)

total_empty_directorate_grants = 0
year_breakdown = {}

for year in sorted(dataframes_json_data.keys()):
    df = dataframes_json_data[year]
    empty_directorate_count = len(df[
        (df['org_dir_long_name'] == '') | 
        (df['org_dir_long_name'].isna())
    ])
    
    year_breakdown[year] = empty_directorate_count
    total_empty_directorate_grants += empty_directorate_count
    
    print(f"{year}: {empty_directorate_count:,} grants with empty/null directorate")

print("=" * 60)
print(f"TOTAL across all years: {total_empty_directorate_grants:,} grants with empty/null org_dir_long_name")

# Calculate total grants across all years for context
total_grants_all_years = sum(len(df) for df in dataframes_json_data.values())
percentage_empty = (total_empty_directorate_grants / total_grants_all_years) * 100

print(f"Total grants across all years: {total_grants_all_years:,}")
print(f"Percentage with empty/null directorate: {percentage_empty:.2f}%")

Counting grants with empty or null org_dir_long_name across all years:
2015: 3 grants with empty/null directorate
2016: 4 grants with empty/null directorate
2017: 4 grants with empty/null directorate
2018: 3 grants with empty/null directorate
2019: 2 grants with empty/null directorate
2020: 4 grants with empty/null directorate
2021: 7 grants with empty/null directorate
2022: 12 grants with empty/null directorate
2023: 2 grants with empty/null directorate
2024: 0 grants with empty/null directorate
2025: 0 grants with empty/null directorate
TOTAL across all years: 41 grants with empty/null org_dir_long_name
Total grants across all years: 132,885
Percentage with empty/null directorate: 0.03%
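
The same per-year count can be expressed without an explicit loop by stacking the yearly frames. The sketch below assumes a small stand-in dictionary named `frames` in place of `dataframes_json_data`; its contents are invented.

```python
import pandas as pd

# Hypothetical stand-in for dataframes_json_data: {year: DataFrame}.
frames = {
    2023: pd.DataFrame({"org_dir_long_name": ["Directorate for Engineering", "", None]}),
    2024: pd.DataFrame({"org_dir_long_name": ["Directorate for Geosciences",
                                              "Office Of The Director"]}),
}

# The dictionary keys become the outer index level, named "year".
combined = pd.concat(frames, names=["year", "row"])
is_empty = combined["org_dir_long_name"].isna() | (combined["org_dir_long_name"] == "")

# Summing booleans counts the True values within each year.
empty_by_year = is_empty.groupby(level="year").sum()
print(empty_by_year)
print(f"Share of all grants: {is_empty.mean():.2%}")
```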


## Step 3: Data Preprocessing

In [13]:
def preprocess_nsf_dataframe(df, year):
    """
    Clean and preprocess the NSF dataframe:
    1. Remove grants with a null, zero, or negative award amount
    2. Remove grants with a null or blank org_dir_long_name
    Returns a filtered copy; the input dataframe is not modified.
    """
    print(f"Preprocessing {year} NSF data...")
    
    # Make a copy to avoid modifying original
    df_clean = df.copy()
    
    initial_rows = len(df_clean)
    
    # Remove grants with zero or null award amount
    df_clean = df_clean[df_clean['awd_amount'].notna()]
    df_clean = df_clean[df_clean['awd_amount'] > 0]
    
    # Remove grants with no organizational directory
    df_clean = df_clean[df_clean['org_dir_long_name'].notna()]
    df_clean = df_clean[df_clean['org_dir_long_name'].str.strip() != '']
    
    final_rows = len(df_clean)
    rows_removed = initial_rows - final_rows
    
    print(f"  - Initial grants: {initial_rows}")
    print(f"  - Final grants: {final_rows}")
    print(f"  - Grants removed: {rows_removed}")
    print(f"  - Total funding: ${df_clean['awd_amount'].sum():,.0f}")
    
    return df_clean
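
The four sequential filters above are equivalent to a single combined boolean mask. A minimal sketch on invented rows follows; `preprocess_with_mask` is a hypothetical name, not a function from this notebook.

```python
import pandas as pd

def preprocess_with_mask(df):
    """Keep rows with a positive award amount and a non-blank directorate."""
    has_amount = df["awd_amount"] > 0  # NaN > 0 is False, so nulls drop too
    has_directorate = df["org_dir_long_name"].fillna("").str.strip() != ""
    return df[has_amount & has_directorate].copy()

toy = pd.DataFrame({
    "awd_amount": [70.0, 0.0, float("nan"), 15000.0, 19702.0],
    "org_dir_long_name": [
        "Office Of The Director",
        "Directorate for Engineering",
        "Directorate for Geosciences",
        "   ",
        "Directorate for Biological Sciences",
    ],
})
clean = preprocess_with_mask(toy)
print(len(toy) - len(clean), "rows removed")
```

Note that the `> 0` comparison also drops any negative amounts, not only zeros and nulls.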

In [14]:
# Preprocess all dataframes
clean_dataframes = {}
for year, df in dataframes_json_data.items():
    clean_dataframes[year] = preprocess_nsf_dataframe(df, year)
    print()

print("NSF data preprocessing completed!")

Preprocessing 2015 NSF data...
  - Initial grants: 12848
  - Final grants: 12835
  - Grants removed: 13
  - Total funding: $6,056,359,494

Preprocessing 2016 NSF data...
  - Initial grants: 12787
  - Final grants: 12764
  - Grants removed: 23
  - Total funding: $7,739,362,894

Preprocessing 2017 NSF data...
  - Initial grants: 12309
  - Final grants: 12291
  - Grants removed: 18
  - Total funding: $6,847,511,550

Preprocessing 2018 NSF data...
  - Initial grants: 12684
  - Final grants: 12653
  - Grants removed: 31
  - Total funding: $9,045,038,378

Preprocessing 2019 NSF data...
  - Initial grants: 12180
  - Final grants: 12160
  - Grants removed: 20
  - Total funding: $7,119,985,058

Preprocessing 2020 NSF data...
  - Initial grants: 13041
  - Final grants: 13005
  - Grants removed: 36
  - Total funding: $7,502,202,533

Preprocessing 2021 NSF data...
  - Initial grants: 12166
  - Final grants: 12131
  - Grants removed: 35
  - Total funding: $8,193,745,106

Preprocessing 2022 NSF data

## Step 3b: Directorate Extraction and Normalization

In [15]:
def extract_org_directories_from_dataframe(df, year):
    """
    Extract all unique organizational directory names from a dataframe.
    Organizational directories are single string values in org_dir_long_name column.
    """
    print(f"Extracting organizational directories from {year} data...")
    
    all_org_directories = set()
    
    for org_dir_name in df['org_dir_long_name']:
        if org_dir_name and org_dir_name.strip():
            # Normalize organizational directory name
            normalized_name = org_dir_name.strip()
            all_org_directories.add(normalized_name)
    
    print(f"  - Found {len(all_org_directories)} unique organizational directories")
    return all_org_directories
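
For comparison, the same set can be built with pandas string methods instead of a Python loop. This is a sketch on invented rows, and `unique_directorates` is a hypothetical name.

```python
import pandas as pd

def unique_directorates(df):
    """Return the set of non-blank, whitespace-trimmed directorate names."""
    names = df["org_dir_long_name"].dropna().str.strip()
    return set(names[names != ""].unique())

toy = pd.DataFrame({
    "org_dir_long_name": [
        "Directorate for Engineering",
        " Directorate for Engineering ",
        "Office Of The Director",
        "",
        None,
    ]
})
found = unique_directorates(toy)
print(sorted(found))
```

As in the loop version, the only normalization applied is trimming whitespace, so names that differ in spelling or capitalization would still count as distinct entries.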

In [16]:
# Extract organizational directories from each year
yearly_org_directories = {}
for year, df in clean_dataframes.items():
    yearly_org_directories[year] = extract_org_directories_from_dataframe(df, year)

Extracting organizational directories from 2015 data...
  - Found 12 unique organizational directories
Extracting organizational directories from 2016 data...
  - Found 12 unique organizational directories
Extracting organizational directories from 2017 data...
  - Found 11 unique organizational directories
Extracting organizational directories from 2018 data...
  - Found 13 unique organizational directories
Extracting organizational directories from 2019 data...
  - Found 14 unique organizational directories
Extracting organizational directories from 2020 data...
  - Found 11 unique organizational directories
Extracting organizational directories from 2021 data...
  - Found 12 unique organizational directories
Extracting organizational directories from 2022 data...
  - Found 12 unique organizational directories
Extracting organizational directories from 2023 data...
  - Found 9 unique organizational directories
Extracting organizational directories from 2024 data...
  - Found 9 unique

In [17]:
# Create master list of all unique organizational directories
all_unique_org_directories = set()
for directories in yearly_org_directories.values():
    all_unique_org_directories.update(directories)

In [18]:
print(f"\nTotal unique organizational directories across all years: {len(all_unique_org_directories)}")


Total unique organizational directories across all years: 14


In [19]:
# Display the list of organizational directorates
print("\nAll organizational directorates:")
for directory in all_unique_org_directories:
    print(f"  - {directory}")


All organizational directorates:
  - Directorate for Engineering
  - Directorate for Biological Sciences
  - Directorate for Technology, Innovation, and Partnerships
  - National Nanotechnology Coordinating Office
  - Office of Information & Resource Management
  - National Coordination Office
  - Office Of The Director
  - Directorate for STEM Education
  - Directorate for Mathematical and Physical Sciences
  - Directorate for Computer and Information Science and Engineering
  - Directorate for Social, Behavioral and Economic Sciences
  - Directorate for Geosciences
  - Office of the Chief Information Officer
  - Office of Budget, Finance, & Award Management


## Step 4: Calculate Annual Directorate Metrics

In [20]:
def calculate_org_directory_metrics_for_year(df, year, all_org_directories):
    """
    For each organizational directory, calculate:
    - Number of grants belonging to the organizational directory
    - Total funding for grants belonging to the organizational directory
    """
    print(f"Calculating metrics for {year}...")
    
    directory_metrics = {}
    total_grants = len(df)
    total_funding = df['awd_amount'].sum()
    
    print(f"  - Processing {len(all_org_directories)} organizational directories for {total_grants} grants")
    
    # Use tqdm for progress bar on large directory sets
    for directory in tqdm(all_org_directories, desc=f"Processing {year}"):
        # Find grants that belong to this organizational directory
        mask = df['org_dir_long_name'] == directory
        grants_with_directory = df[mask]
        
        grant_count = len(grants_with_directory)
        funding_total = grants_with_directory['awd_amount'].sum() if grant_count > 0 else 0
        
        directory_metrics[directory] = {
            'grant_count': grant_count,
            'funding_total': funding_total,
            'grant_percentage': grant_count / total_grants,
            'funding_proportion': funding_total / total_funding
        }
    
    return directory_metrics, total_grants, total_funding
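
The per-directorate loop can alternatively be written as one `groupby`, with `reindex` adding zero rows for directorates that have no grants in a given year. The sketch below uses invented labels `A`, `B`, `C` and a hypothetical function name `directorate_metrics`; it assumes the input has already been cleaned so that `awd_amount` contains no nulls.

```python
import pandas as pd

def directorate_metrics(df, all_directorates):
    """Grant counts, funding totals, and shares per directorate for one year."""
    grouped = df.groupby("org_dir_long_name")["awd_amount"].agg(
        grant_count="count", funding_total="sum"
    )
    # Add directorates that have no grants this year, filled with zeros.
    grouped = grouped.reindex(sorted(all_directorates), fill_value=0)
    grouped["grant_percentage"] = grouped["grant_count"] / len(df)
    grouped["funding_proportion"] = grouped["funding_total"] / df["awd_amount"].sum()
    return grouped

toy = pd.DataFrame({
    "org_dir_long_name": ["A", "A", "B"],
    "awd_amount": [100.0, 300.0, 600.0],
})
metrics = directorate_metrics(toy, {"A", "B", "C"})
print(metrics)
```

Despite its name, `grant_percentage` holds a fraction between 0 and 1 here, matching the values the notebook stores.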

In [21]:
# Calculate metrics for each year
annual_metrics = {}
annual_totals = {}

for year, df in clean_dataframes.items():
    metrics, total_grants, total_funding = calculate_org_directory_metrics_for_year(
        df, year, all_unique_org_directories
    )
    annual_metrics[year] = metrics
    annual_totals[year] = {
        'total_grants': total_grants,
        'total_funding': total_funding
    }
    print(f"  - Completed {year}: {total_grants} grants, ${total_funding:,.0f} total funding\n")

print("Annual metrics calculation completed!")

Calculating metrics for 2015...
  - Processing 14 organizational directories for 12835 grants


Processing 2015: 100%|██████████| 14/14 [00:00<00:00, 586.31it/s]


  - Completed 2015: 12835 grants, $6,056,359,494 total funding

Calculating metrics for 2016...
  - Processing 14 organizational directories for 12764 grants


Processing 2016: 100%|██████████| 14/14 [00:00<00:00, 873.70it/s]


  - Completed 2016: 12764 grants, $7,739,362,894 total funding

Calculating metrics for 2017...
  - Processing 14 organizational directories for 12291 grants


Processing 2017: 100%|██████████| 14/14 [00:00<00:00, 1591.12it/s]


  - Completed 2017: 12291 grants, $6,847,511,550 total funding

Calculating metrics for 2018...
  - Processing 14 organizational directories for 12653 grants


Processing 2018: 100%|██████████| 14/14 [00:00<00:00, 1638.95it/s]


  - Completed 2018: 12653 grants, $9,045,038,378 total funding

Calculating metrics for 2019...
  - Processing 14 organizational directories for 12160 grants


Processing 2019: 100%|██████████| 14/14 [00:00<00:00, 1749.76it/s]


  - Completed 2019: 12160 grants, $7,119,985,058 total funding

Calculating metrics for 2020...
  - Processing 14 organizational directories for 13005 grants


Processing 2020: 100%|██████████| 14/14 [00:00<00:00, 1686.10it/s]


  - Completed 2020: 13005 grants, $7,502,202,533 total funding

Calculating metrics for 2021...
  - Processing 14 organizational directories for 12131 grants


Processing 2021: 100%|██████████| 14/14 [00:00<00:00, 1797.16it/s]


  - Completed 2021: 12131 grants, $8,193,745,106 total funding

Calculating metrics for 2022...
  - Processing 14 organizational directories for 11870 grants


Processing 2022: 100%|██████████| 14/14 [00:00<00:00, 697.25it/s]


  - Completed 2022: 11870 grants, $7,268,924,136 total funding

Calculating metrics for 2023...
  - Processing 14 organizational directories for 11998 grants


Processing 2023: 100%|██████████| 14/14 [00:00<00:00, 945.50it/s]


  - Completed 2023: 11998 grants, $7,348,106,730 total funding

Calculating metrics for 2024...
  - Processing 14 organizational directories for 11662 grants


Processing 2024: 100%|██████████| 14/14 [00:00<00:00, 1112.59it/s]


  - Completed 2024: 11662 grants, $6,385,263,306 total funding

Calculating metrics for 2025...
  - Processing 14 organizational directories for 9240 grants


Processing 2025: 100%|██████████| 14/14 [00:00<00:00, 2473.89it/s]

  - Completed 2025: 9240 grants, $4,772,095,689 total funding

Annual metrics calculation completed!





In [29]:
# Save annual metrics and totals as CSV files
# Ensure export directory exists
os.makedirs('export/nsf', exist_ok=True)

# Convert annual_metrics to DataFrame for CSV export
metrics_data = []
for year, year_metrics in annual_metrics.items():
    for directory, metrics in year_metrics.items():
        metrics_data.append({
            'year': year,
            'org_directory': directory,
            'grant_count': metrics['grant_count'],
            'funding_total': metrics['funding_total'],
            'grant_percentage': metrics['grant_percentage'],
            'funding_proportion': metrics['funding_proportion']
        })

metrics_df = pd.DataFrame(metrics_data)
metrics_df.to_csv('export/nsf/annual_metrics.csv', index=False)

# Convert annual_totals to DataFrame for CSV export
totals_data = []
for year, totals in annual_totals.items():
    totals_data.append({
        'year': year,
        'total_grants': totals['total_grants'],
        'total_funding': totals['total_funding']
    })

totals_df = pd.DataFrame(totals_data)
totals_df.to_csv('export/nsf/annual_totals.csv', index=False)

print("Saved annual metrics and totals to CSV files:")
print("  - export/nsf/annual_metrics.csv")
print("  - export/nsf/annual_totals.csv")

Saved annual metrics and totals to CSV files:
  - export/nsf/annual_metrics.csv
  - export/nsf/annual_totals.csv
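
A quick check on an exported metrics file is that each year's shares sum to 1. The sketch below round-trips a small invented table through an in-memory buffer rather than the real `export/nsf/annual_metrics.csv`; the rows are hypothetical, and only the column layout follows the export cell above.

```python
import io
import pandas as pd

# Hypothetical rows in the same column layout as annual_metrics.csv.
metrics_df = pd.DataFrame({
    "year": [2024, 2024, 2025, 2025],
    "org_directory": ["A", "B", "A", "B"],
    "grant_count": [2, 1, 1, 1],
    "funding_total": [400.0, 600.0, 50.0, 150.0],
    "grant_percentage": [2 / 3, 1 / 3, 0.5, 0.5],
    "funding_proportion": [0.4, 0.6, 0.25, 0.75],
})

# Write to and read back from an in-memory CSV instead of a file on disk.
buffer = io.StringIO()
metrics_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Each year's shares should sum to 1 (up to floating-point rounding).
share_sums = reloaded.groupby("year")[["grant_percentage", "funding_proportion"]].sum()
assert ((share_sums - 1.0).abs() < 1e-9).all().all()
print(share_sums)
```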


## Step 5: Compute Proportional Comparisons

In [23]:
# Calculate 2015-2024 average metrics
print("Calculating 2015-2024 average metrics...")

baseline_years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
average_metrics = {}

print("Average period (2015-2024):")
for year in baseline_years:
    print(f"  - {year}: {annual_totals[year]['total_grants']:,} grants, ${annual_totals[year]['total_funding']:,.0f} funding")

Calculating 2015-2024 average metrics...
Average period (2015-2024):
  - 2015: 12,835 grants, $6,056,359,494 funding
  - 2016: 12,764 grants, $7,739,362,894 funding
  - 2017: 12,291 grants, $6,847,511,550 funding
  - 2018: 12,653 grants, $9,045,038,378 funding
  - 2019: 12,160 grants, $7,119,985,058 funding
  - 2020: 13,005 grants, $7,502,202,533 funding
  - 2021: 12,131 grants, $8,193,745,106 funding
  - 2022: 11,870 grants, $7,268,924,136 funding
  - 2023: 11,998 grants, $7,348,106,730 funding
  - 2024: 11,662 grants, $6,385,263,306 funding


In [24]:
# Calculate average metrics for each organizational directory across 2015-2024
for directory in tqdm(all_unique_org_directories, desc="Calculating average metrics"):
    # Get proportions and percentages for each year
    yearly_funding_props = [annual_metrics[year][directory]['funding_proportion'] for year in baseline_years]
    yearly_grant_pcts = [annual_metrics[year][directory]['grant_percentage'] for year in baseline_years]
    
    # Calculate averages
    avg_funding_proportion = sum(yearly_funding_props) / len(baseline_years)
    avg_grant_percentage = sum(yearly_grant_pcts) / len(baseline_years)
    
    average_metrics[directory] = {
        'funding_proportion': avg_funding_proportion,
        'grant_percentage': avg_grant_percentage
    }

print("\n2025 metrics:")
print(f"  - Total grants: {annual_totals[2025]['total_grants']:,}")
print(f"  - Total funding: ${annual_totals[2025]['total_funding']:,.0f}")

print("\nAverage calculations completed!")

Calculating average metrics: 100%|██████████| 14/14 [00:00<00:00, 116740.07it/s]


2025 metrics:
  - Total grants: 9,240
  - Total funding: $4,772,095,689

Average calculations completed!
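
The baseline computed above is the unweighted mean of ten yearly proportions, so every year counts equally regardless of its budget. That is not the same as pooling all dollars first. The toy numbers below are invented purely to make the difference visible.

```python
# Toy funding for one directorate and for all directorates, by year
# (hypothetical numbers chosen to make the difference visible).
directorate_funding = {2023: 100.0, 2024: 900.0}
total_funding = {2023: 1000.0, 2024: 3000.0}

years = sorted(total_funding)

# What the notebook computes: the unweighted mean of yearly proportions.
mean_of_proportions = sum(
    directorate_funding[y] / total_funding[y] for y in years
) / len(years)

# Alternative: pool the dollars first, then take a single proportion.
pooled_proportion = sum(directorate_funding.values()) / sum(total_funding.values())

print(f"mean of yearly proportions: {mean_of_proportions:.3f}")
print(f"pooled proportion:          {pooled_proportion:.3f}")
```

Neither definition is wrong; the mean of yearly proportions describes a typical year, while the pooled figure weights high-budget years more heavily.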





In [25]:
# Display sample average metrics
average_metrics_sample = dict(list(average_metrics.items())[:5])
print("\nSample average metrics (2015-2024):")
for element, metrics in average_metrics_sample.items():
    print(f"  - {element}: {metrics}")


Sample average metrics (2015-2024):
  - Directorate for Engineering: {'funding_proportion': np.float64(0.10720528619704965), 'grant_percentage': 0.14296019794794557}
  - Directorate for Biological Sciences: {'funding_proportion': np.float64(0.11145019620178627), 'grant_percentage': 0.09983362870966825}
  - Directorate for Technology, Innovation, and Partnerships: {'funding_proportion': np.float64(0.05232635832900551), 'grant_percentage': 0.06783494509102812}
  - National Nanotechnology Coordinating Office: {'funding_proportion': np.float64(2.6588267457741074e-05), 'grant_percentage': 2.3918144207156267e-05}
  - Office of Information & Resource Management: {'funding_proportion': np.float64(0.0019923444067739956), 'grant_percentage': 0.001121956481707278}


## Step 6: Compile Final Results

In [26]:
# Create final comparison dataframe
print("Compiling final results...")

results_data = []

for directory in all_unique_org_directories:
    # Get the 2025 metrics and the 2015-2024 baseline averages
    metrics_2025 = annual_metrics[2025][directory]
    metrics_average = average_metrics[directory]
    
    # Calculate changes
    funding_prop_change = metrics_2025['funding_proportion'] - metrics_average['funding_proportion']
    funding_relative_change = (funding_prop_change / metrics_average['funding_proportion'] * 100) if metrics_average['funding_proportion'] != 0 else np.nan
    grant_pct_change = metrics_2025['grant_percentage'] - metrics_average['grant_percentage']
    grant_relative_change = (grant_pct_change / metrics_average['grant_percentage'] * 100) if metrics_average['grant_percentage'] != 0 else np.nan
    
    results_data.append({
        'org_directory': directory,
        'funding_proportion_2025': metrics_2025['funding_proportion'],
        'funding_proportion_2015_2024_avg': metrics_average['funding_proportion'],
        'funding_relative_change': funding_relative_change,
        'funding_proportion_change': funding_prop_change,
        'grant_percentage_2025': metrics_2025['grant_percentage'],
        'grant_percentage_2015_2024_avg': metrics_average['grant_percentage'],
        'grant_percentage_change': grant_pct_change,
        'grant_relative_change': grant_relative_change,
        'grants_2025': metrics_2025['grant_count'],
        'funding_2025': metrics_2025['funding_total']
    })

# Create DataFrame (rows stay unsorted here; the summary cell ranks them with nlargest/nsmallest)
results_df = pd.DataFrame(results_data)

print(f"Results compiled for {len(results_df)} organizational directories")
print(f"Results dataframe shape: {results_df.shape}")

Compiling final results...
Results compiled for 14 organizational directories
Results dataframe shape: (14, 11)
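
The two change measures differ in units: the absolute change is a difference of shares (percentage points once multiplied by 100), while the relative change is a percent of the baseline. A vectorized sketch with invented shares follows; the column names and the labels `A`, `B`, `C` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical shares for three directorates; "C" has a zero baseline.
shares = pd.DataFrame(
    {"share_2025": [0.30, 0.10, 0.02], "share_baseline_avg": [0.20, 0.20, 0.0]},
    index=["A", "B", "C"],
)

shares["absolute_change"] = shares["share_2025"] - shares["share_baseline_avg"]
# Replace a zero baseline with NaN so the division yields NaN, not inf.
baseline = shares["share_baseline_avg"].replace(0, np.nan)
shares["relative_change_pct"] = shares["absolute_change"] / baseline * 100
print(shares)
```

A directorate with a small baseline share can show a very large relative change from a tiny absolute shift, so the two columns are best read together.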


In [27]:
# Display summary statistics
print("=== SUMMARY STATISTICS ===")
print(f"\nTotal unique organizational directories analyzed: {len(results_df):,}")
print("\n2025 vs 2015-2024 Average Comparison:")
print(f"  - 2025 total grants: {annual_totals[2025]['total_grants']:,}")
print(f"  - 2025 total funding: ${annual_totals[2025]['total_funding']:,.0f}")
print("\n2015-2024 Annual Averages:")
avg_grants = sum(annual_totals[year]['total_grants'] for year in baseline_years) / len(baseline_years)
avg_funding = sum(annual_totals[year]['total_funding'] for year in baseline_years) / len(baseline_years)
print(f"  - Average grants per year: {avg_grants:,.0f}")
print(f"  - Average funding per year: ${avg_funding:,.0f}")

print("\nOrganizational directories with highest funding in 2025:")
top_2025_funding = results_df.nlargest(10, 'funding_proportion_2025')[['org_directory', 'funding_proportion_2025', 'grant_percentage_2025']]
print(top_2025_funding.to_string(index=False))

print("\nOrganizational directories with largest funding proportion increases:")
top_increases = results_df.nlargest(10, 'funding_proportion_change')[['org_directory', 'funding_proportion_change', 'funding_proportion_2025', 'funding_proportion_2015_2024_avg']]
print(top_increases.to_string(index=False))

print("\nOrganizational directories with largest funding proportion decreases:")
top_decreases = results_df.nsmallest(10, 'funding_proportion_change')[['org_directory', 'funding_proportion_change', 'funding_proportion_2025', 'funding_proportion_2015_2024_avg']]
print(top_decreases.to_string(index=False))

=== SUMMARY STATISTICS ===

Total unique organizational directories analyzed: 14

2025 vs 2015-2024 Average Comparison:
  - 2025 total grants: 9,240
  - 2025 total funding: $4,772,095,689

2015-2024 Annual Averages:
  - Average grants per year: 12,337
  - Average funding per year: $7,350,649,918

Organizational directories with highest funding in 2025:
                                                   org_directory  funding_proportion_2025  grant_percentage_2025
              Directorate for Mathematical and Physical Sciences                 0.179450               0.234957
                                  Directorate for STEM Education                 0.152580               0.084740
Directorate for Computer and Information Science and Engineering                 0.146243               0.159307
                                     Directorate for Geosciences                 0.135573               0.109199
                                     Directorate for Engineering                

In [28]:
# Save results to CSV files
print("Saving results to CSV files...")

# Save complete results
results_df.to_csv('export/nsf/nsf_org_directory_analysis_complete.csv', index=False)
print(f"  - Complete results saved: nsf_org_directory_analysis_complete.csv ({len(results_df)} rows)")

print("\nNSF Organizational Directory Analysis complete! 🎉")

Saving results to CSV files...
  - Complete results saved: nsf_org_directory_analysis_complete.csv (14 rows)

NSF Organizational Directory Analysis complete! 🎉
