## Week 6 Assignment Solution

In this assignment, I use a genetic algorithm to select 8 Federally Qualified Health Centers (FQHCs) from a given dataset as sites for new specialized mental health services. After considering two possible approaches for scoring the fitness of a location, I chose to focus on population density, measured as the number of residents within a 30-mile radius of each FQHC.

Two approaches were considered for determining the fitness of the FQHC locations. The first approach, Average Closeness to All Residents, selects FQHCs based on their proximity to all residents, on the assumption that reducing the average distance between residents and FQHCs improves accessibility. While this method could optimize geographic distribution, it does not necessarily maximize the number of people served, which is a crucial factor when expanding access to specialized services like mental health care. The second approach, Population Density within 30 Miles, selects FQHCs with the highest number of residents within a 30-mile radius. I chose this approach because it aligns directly with the objective of maximizing the number of individuals who can access mental health services.

The rationale for choosing population density is to maximize the reach of the new services: the more residents who live within 30 miles of a selected FQHC, the more people the services can potentially reach. The fitness calculation therefore counts the residents within a 30-mile radius and treats a higher count as greater potential impact. The count weights every resident inside the radius equally, regardless of how far they live from the FQHC.

The fitness function evaluates each candidate selection of 8 FQHC locations. The metric is the number of residents within a 30-mile radius of each selected FQHC, and the fitness of a selection is the sum of these counts over its 8 FQHCs. Because the counts are summed per FQHC, a resident who lives within 30 miles of more than one selected FQHC is counted once for each of them.

In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
from deap import base, creator, tools, algorithms

In [2]:
CLINICAL_PATH = "MO_2018_Federally_Qualified_Health_Center_Locations.shp"
clinic_data = gpd.read_file(CLINICAL_PATH)  # Clinic locations
resident_data = pd.read_csv("Mo_pop_Sim.csv")  # Resident locations

In [3]:
# Extract latitude & longitude
clinic_coords = clinic_data[['Latitude', 'Longitude']].values
resident_coords = resident_data[['lat', 'long']].values

In [4]:
def compute_distance(lat1, lon1, lat2_arr, lon2_arr):
    """Vectorized function to compute Haversine distance in miles for multiple points at once."""
    radius_earth = 3959  # Earth radius in miles

    lat1, lon1 = np.radians(lat1), np.radians(lon1)
    lat2_arr, lon2_arr = np.radians(lat2_arr), np.radians(lon2_arr)

    delta_lat = lat2_arr - lat1
    delta_lon = lon2_arr - lon1

    a = np.sin(delta_lat / 2) ** 2 + np.cos(lat1) * np.cos(lat2_arr) * np.sin(delta_lon / 2) ** 2
    return 2 * radius_earth * np.arcsin(np.sqrt(a))
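As a quick sanity check on the Haversine formula, the distance along a meridian between two points one degree of latitude apart should be about 69.1 miles with a 3959-mile Earth radius. The sketch below is a scalar, standard-library version of the same formula; the name `haversine_miles` and the test coordinates are assumptions made for illustration and are not part of the assignment code.

```python
import math

EARTH_RADIUS_MILES = 3959  # same radius as compute_distance

def haversine_miles(lat1, lon1, lat2, lon2):
    """Scalar Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical test points: one degree of latitude apart on the same meridian
one_degree = haversine_miles(38.0, -92.0, 39.0, -92.0)
# Distance from a point to itself
same_point = haversine_miles(38.0, -92.0, 38.0, -92.0)
print(round(one_degree, 1), same_point)
```

Comparing a few such values against `compute_distance` is one way to catch mistakes such as swapped latitude and longitude columns.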

In [5]:
# Precompute distances between all clinics and residents
distance_matrix = np.zeros((len(clinic_coords), len(resident_coords)))
for idx, clinic in enumerate(clinic_coords):
    distance_matrix[idx] = compute_distance(clinic[0], clinic[1], resident_coords[:, 0], resident_coords[:, 1])

In [6]:
# Compute number of residents within 30 miles of a clinic
def residents_within_range(clinic_idx):
    """Compute number of residents within 30 miles of the given clinic index."""
    distances = distance_matrix[clinic_idx]  # Use precomputed distances
    return np.sum(distances <= 30)  # Count residents within 30 miles

In [7]:
# Fitness function (maximize resident coverage)
def fitness_function(candidate):
    """Evaluate fitness by summing residents covered by selected clinics."""
    return (sum(residents_within_range(clinic_idx) for clinic_idx in candidate),)
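Because this fitness adds per-clinic counts, two clinics that serve the same neighborhood both receive full credit for the same residents. The sketch below contrasts the summed count with a count of distinct residents covered. The toy distance table and the names `summed_coverage` and `unique_coverage` are hypothetical, chosen only to illustrate the difference; this is an alternative to consider, not the metric used in this assignment.

```python
# Hypothetical toy data: distances[c][r] is the distance in miles
# from clinic c to resident r (3 clinics, 5 residents).
distances = [
    [5.0, 12.0, 40.0, 55.0, 70.0],   # clinic 0
    [8.0, 10.0, 35.0, 60.0, 65.0],   # clinic 1, close to clinic 0
    [90.0, 80.0, 45.0, 20.0, 15.0],  # clinic 2, in a different area
]
RADIUS_MILES = 30

def summed_coverage(candidate):
    """Add up per-clinic counts, as the fitness function above does."""
    return sum(sum(d <= RADIUS_MILES for d in distances[c]) for c in candidate)

def unique_coverage(candidate):
    """Count distinct residents within range of at least one selected clinic."""
    covered = set()
    for c in candidate:
        covered.update(r for r, d in enumerate(distances[c]) if d <= RADIUS_MILES)
    return len(covered)

for candidate in ([0, 1], [0, 2]):
    print(candidate, summed_coverage(candidate), unique_coverage(candidate))
```

In this toy case both selections have the same summed score, but only the second one reaches residents in two different areas.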

In [8]:
# Genetic Algorithm setup
creator.create("MaxFitness", base.Fitness, weights=(1.0,))  # Maximize fitness
creator.create("Candidate", list, fitness=creator.MaxFitness)

In [9]:
# Create unique candidate (no duplicates in clinic selections)
def generate_candidate():
    """Generate a unique candidate solution with non-repeating clinic selections."""
    return list(np.random.choice(len(clinic_coords), 8, replace=False))

In [10]:
toolbox = base.Toolbox()
toolbox.register("candidate", tools.initIterate, creator.Candidate, generate_candidate)
toolbox.register("population", tools.initRepeat, list, toolbox.candidate)

toolbox.register("evaluate", fitness_function)
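# Caveat: cxTwoPoint swaps segments between parents, so a child can contain the same clinic twice.
toolbox.register("mate", tools.cxTwoPoint)
# Caveat: mutShuffleIndexes only reorders indices, so it never changes this order-independent fitness.
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.2)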
toolbox.register("select", tools.selTournament, tournsize=3)
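Two-point crossover can place the same clinic index in a child twice, and shuffling the order of indices does not change a fitness that ignores order. One possible remedy, sketched below under the assumption that there are more clinics than slots in a candidate, is a repair step that replaces repeated indices and a mutation that swaps a selected clinic for an unselected one. The functions `repair_duplicates` and `mutate_replace` are hypothetical helpers, not DEAP functions, and were not used to produce the results in this notebook.

```python
import random

def repair_duplicates(candidate, n_clinics, rng=random):
    """Replace repeated clinic indices in place with indices not yet selected."""
    selected = set(candidate)
    unused = [i for i in range(n_clinics) if i not in selected]
    rng.shuffle(unused)
    seen = set()
    for pos, clinic_idx in enumerate(candidate):
        if clinic_idx in seen:
            candidate[pos] = unused.pop()  # assumes enough unused clinics remain
        else:
            seen.add(clinic_idx)
    return candidate

def mutate_replace(candidate, n_clinics, indpb=0.2, rng=random):
    """With probability indpb per gene, swap it for a clinic not currently selected."""
    for pos in range(len(candidate)):
        if rng.random() < indpb:
            unused = [i for i in range(n_clinics) if i not in candidate]
            if unused:
                candidate[pos] = rng.choice(unused)
    return (candidate,)  # DEAP mutation operators return a one-element tuple

random.seed(0)
repaired = repair_duplicates([3, 3, 5, 5, 1, 2, 7, 7], n_clinics=20)
mutated, = mutate_replace(list(range(8)), n_clinics=20, indpb=1.0)
print(repaired, mutated)
```

If this approach were adopted, `mutate_replace` could be registered with `toolbox.register("mutate", mutate_replace, n_clinics=len(clinic_coords), indpb=0.2)`, and `repair_duplicates` could be applied to each child after crossover.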

In [11]:
# Run Genetic Algorithm
pop = toolbox.population(n=20)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=5, verbose=True)

In [12]:
# Get best solution
optimal_solution = tools.selBest(pop, k=1)[0]
best_clinics = clinic_coords[np.array(optimal_solution)]

# Print selected FQHC locations (latitude, longitude)
print("Selected FQHC Locations for Mental Health Services:")
print(best_clinics)

gen	nevals
0  	20    
1  	11    
2  	10    
3  	15    
4  	10    
5  	10    
Selected FQHC Locations for Mental Health Services:
[[ 39.083164 -94.507583]
 [ 38.66863  -90.272661]
 [ 38.435946 -90.554678]
 [ 38.668384 -90.209452]
 [ 38.677759 -90.230247]
 [ 37.241458 -90.968494]
 [ 39.257031 -94.451666]
 [ 39.035322 -94.539588]]


> The code implements a genetic algorithm that searches for a set of 8 FQHCs with high population coverage. Two datasets are loaded: one containing FQHC locations and another with simulated population data. A vectorized Haversine formula calculates the great-circle distance between FQHCs and residents, and a function counts the residents within a 30-mile radius of each FQHC using a precomputed distance matrix.

> The genetic algorithm is set up using the DEAP library, where each individual is a selection of 8 FQHC indices. Fitness is the summed number of residents within range of the selected FQHCs. Initial candidates are sampled without replacement, so they contain no duplicates; the two-point crossover does not preserve this guarantee, so later generations may contain a repeated FQHC. The algorithm runs for 5 generations on a population of 20 with a crossover probability of 0.5 and a mutation probability of 0.2, using tournament selection with a tournament size of 3. The best individual in the final population is reported as the result.


Result:
The genetic algorithm produced the following selected FQHC locations for the provision of specialized mental health services:

[[ 39.083164 -94.507583]

 [ 38.66863  -90.272661]

 [ 38.435946 -90.554678]

 [ 38.668384 -90.209452]

 [ 38.677759 -90.230247]

 [ 37.241458 -90.968494]

 [ 39.257031 -94.451666]

 [ 39.035322 -94.539588]]

These 8 FQHCs are the highest-scoring selection found by the genetic algorithm, where the score is the summed number of residents within a 30-mile radius of each selected location. I chose this population-density approach so that as many individuals as possible can access the new specialized mental health services in Missouri. Two caveats apply. First, with a population of 20 and only 5 generations, the result is the best selection found in this run rather than a guaranteed optimum. Second, several of the selected sites lie only a few miles apart (for example, the three near latitude 38.67, longitude -90.2), so their 30-mile radii overlap and the same residents contribute to the score more than once.
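One way to examine how much the selected sites overlap is to compute the pairwise distances between them. The sketch below uses the coordinates printed in the result and lists every pair of sites that lie within 30 miles of each other, meaning each site falls inside the other's service radius. The helper `haversine_miles` and the variable names are assumptions introduced for this check.

```python
import math
from itertools import combinations

# Coordinates copied from the printed result (latitude, longitude)
selected_sites = [
    (39.083164, -94.507583),
    (38.66863, -90.272661),
    (38.435946, -90.554678),
    (38.668384, -90.209452),
    (38.677759, -90.230247),
    (37.241458, -90.968494),
    (39.257031, -94.451666),
    (39.035322, -94.539588),
]

def haversine_miles(a, b, radius=3959):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(b[1] - a[1])
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))

# Pairs of site indices that lie within 30 miles of each other
close_pairs = [
    (i, j)
    for (i, a), (j, b) in combinations(enumerate(selected_sites), 2)
    if haversine_miles(a, b) <= 30
]

for i, j in close_pairs:
    print(i, j, round(haversine_miles(selected_sites[i], selected_sites[j]), 1))
```

A large number of close pairs suggests that a fitness based on distinct residents covered might spread the sites more widely.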