# USDA Plant Hardiness Zone Forecasting: Current Data & Future Projections

**Hardiness zones** are geographic areas defined by their average annual minimum temperature. The United States Department of Agriculture (USDA) Plant Hardiness Zone Map helps many farmers and gardeners decide which plants to grow. However, hardiness zones shift as the climate warms, so today's zones may no longer apply in 20 years.

This notebook takes the [current hardiness zone data published by USDA](https://prism.oregonstate.edu/projects/plant_hardiness_zones.php) and predicts the hardiness zones in the Southeastern United States for the early (2010–2039), mid (2040–2069), and late century (2070–2099), following USDA's research report on [climate change's projected effects on regional temperature](https://www.fs.usda.gov/nrs/pubs/rmap/rmap_nrs9.pdf).

We plan to overlay the forecast data on regional maps and native plant datasets, so gardeners and city planners can easily see how hardiness zones change over the century and choose native plants that will still survive in their zone in future years.

# 1. Prepare data

In [542]:
import pandas as pd
import os

# Mount file to drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [543]:
def prepare_data():
  # Read in the hardiness zone data (one row per ZIP code)
  df = pd.read_csv('/content/drive/My Drive/hardiness-zone-data/phzm_us_zipcode_2023.csv')

  # Create the output folder for processed data if it does not exist yet
  os.makedirs('/content/drive/My Drive/hardiness-zone-data/processed-data', exist_ok=True)

  # Drop the descriptive title column, then split the temperature range
  # (e.g. '5 to 10') into integer lower and upper bounds in degrees Fahrenheit
  df = df.drop(columns=['zonetitle'])
  df[['t_low', 't_high']] = df['trange'].str.split(' to ', expand=True).astype(int)

  return df

In [544]:
df = prepare_data()
df.head()

| | zipcode | zone | trange | t_low | t_high |
|---|---|---|---|---|---|
| 0 | 501 | 7b | 5 to 10 | 5 | 10 |
| 1 | 544 | 7b | 5 to 10 | 5 | 10 |
| 2 | 1001 | 6b | -5 to 0 | -5 | 0 |
| 3 | 1002 | 6a | -10 to -5 | -10 | -5 |
| 4 | 1003 | 6a | -10 to -5 | -10 | -5 |
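As a small, self-contained illustration of the parsing step in `prepare_data`, the sketch below splits a few `trange` strings into integer bounds. The rows are copied from the output above; the toy frame `sample` and the `zip5` and `width` columns are introduced here only for illustration. It also shows one possible way to restore the leading zeros that are lost when ZIP codes are read as integers.

```python
import pandas as pd

# Toy rows copied from the df.head() output above
sample = pd.DataFrame({
    'zipcode': [501, 1001, 1002],
    'trange': ['5 to 10', '-5 to 0', '-10 to -5'],
})

# Split on the literal ' to ' separator so minus signs stay attached to the numbers
sample[['t_low', 't_high']] = sample['trange'].str.split(' to ', expand=True).astype(int)

# Each range in these rows spans 5 degrees Fahrenheit
sample['width'] = sample['t_high'] - sample['t_low']

# Optional: rebuild the five-digit ZIP code string (hypothetical 'zip5' column)
sample['zip5'] = sample['zipcode'].astype(str).str.zfill(5)

print(sample)
```

Splitting on the full `' to '` separator, rather than on a bare hyphen or space, is what keeps negative bounds such as `-10` intact.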


# 2. Extract southeast region data

For the scale of this project, we decided to focus on the Southeast region, because cities like Miami are among the most affected by dying trees: residents lose the shade that lets them stroll around town instead of getting in a car!

In [545]:
def extract_zipcodes():
  # read in USPS zipcode database
  zipcodes = pd.read_csv('/content/drive/My Drive/hardiness-zone-data/zip_code_database.csv')

  # Southeastern states: North Carolina, South Carolina, Tennessee, Mississippi, Alabama, Georgia, Florida
  # State abbreviations: NC, SC, TN, MS, AL, GA, FL
  # Extract southeastern US zipcodes
  southeastern_zipcodes = zipcodes[zipcodes['state'].isin(['NC', 'SC', 'TN', 'MS', 'AL', 'GA', 'FL'])]['zip']

  return southeastern_zipcodes

In [548]:
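def extract_original_zones(df):
  # drop() returns a new frame, so keep the result instead of discarding it
  original_zones = df.drop(columns=['trange', 't_low', 't_high'])
  original_zones.to_csv('/content/drive/My Drive/hardiness-zone-data/original_zones.csv', index=False)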

In [547]:
southeastern_zipcodes = extract_zipcodes()

# Keep only southeastern states in the dataframe
southeastern_df = df[df['zipcode'].isin(southeastern_zipcodes)]

print(len(southeastern_df))
southeastern_df.head()

extract_original_zones(southeastern_df)

6091
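The two-step filter above (select ZIP codes by state, then keep matching rows of the zone table) can be tried on toy data. This is a minimal sketch: the column names `zip`, `state`, and `zipcode` follow the notebook, but the rows and zone values below are placeholders, not real lookups.

```python
import pandas as pd

# Placeholder rows standing in for the USPS ZIP code database
zip_db = pd.DataFrame({
    'zip': [27006, 10001, 33101, 90210],
    'state': ['NC', 'NY', 'FL', 'CA'],
})

# Placeholder rows standing in for the hardiness zone table (zone values are made up)
zones = pd.DataFrame({
    'zipcode': [27006, 10001, 33101, 90210],
    'zone': ['z1', 'z2', 'z3', 'z4'],
})

southeastern_states = ['NC', 'SC', 'TN', 'MS', 'AL', 'GA', 'FL']

# Step 1: ZIP codes that belong to a southeastern state
se_zips = zip_db.loc[zip_db['state'].isin(southeastern_states), 'zip']

# Step 2: keep only those ZIP codes; .copy() gives an independent frame to modify later
se_zones = zones[zones['zipcode'].isin(se_zips)].copy()

print(se_zones)
```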


# 3. Forecast temperature change -> hardiness zones

In [523]:
import numpy as np
import random

In [524]:
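# Each row is a warming amount in degrees Celsius; each period column gives the
# fraction of ZIP codes that will be sampled for that amount of warming
temp_changes_low = [
    ['temp_change', 'early_century', 'mid_century', 'late_century'],
    [0, 0.465, 0, 0],
    [1, 0.532, 0.043, 0],
    [2, 0.003, 0.777, 0.108],
    [3, 0, 0.18, 0.555],
    [4, 0, 0, 0.307],
    [5, 0, 0, 0.031]
]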

temp_changes_low_df = pd.DataFrame(temp_changes_low[1:], columns=temp_changes_low[0])

temp_changes_high = [
    ['temp_change', 'early_century','mid_century','late_century'],
    [0, 0.403, 0.009, 0.001],
    [1, 0.428, 0.328, 0.168],
    [2, 0.136, 0.434, 0.427],
    [3, 0.033, 0.162, 0.199],
    [4, 0, 0.058, 0.112],
    [5, 0, 0.009, 0.069],
    [6, 0, 0, 0.023],
    [7, 0, 0, 0.01]
]

temp_changes_high_df = pd.DataFrame(temp_changes_high[1:], columns=temp_changes_high[0])
temp_changes_high_df

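| | temp_change | early_century | mid_century | late_century |
|---|---|---|---|---|
| 0 | 0 | 0.403 | 0.009 | 0.001 |
| 1 | 1 | 0.428 | 0.328 | 0.168 |
| 2 | 2 | 0.136 | 0.434 | 0.427 |
| 3 | 3 | 0.033 | 0.162 | 0.199 |
| 4 | 4 | 0.0 | 0.058 | 0.112 |
| 5 | 5 | 0.0 | 0.009 | 0.069 |
| 6 | 6 | 0.0 | 0.0 | 0.023 |
| 7 | 7 | 0.0 | 0.0 | 0.01 |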
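Because each period column is later used as a set of sampling fractions, it is worth checking that every column sums to roughly 1. The sketch below rebuilds both tables from the values above and totals them. As transcribed here, the late-century columns come to 1.001 (low) and 1.009 (high), so the last lines show one possible way to rescale; whether to rescale at all is a judgment call, and the variable names are only illustrative.

```python
import pandas as pd

columns = ['temp_change', 'early_century', 'mid_century', 'late_century']
period_cols = columns[1:]

low = pd.DataFrame([
    [0, 0.465, 0, 0], [1, 0.532, 0.043, 0], [2, 0.003, 0.777, 0.108],
    [3, 0, 0.18, 0.555], [4, 0, 0, 0.307], [5, 0, 0, 0.031],
], columns=columns)

high = pd.DataFrame([
    [0, 0.403, 0.009, 0.001], [1, 0.428, 0.328, 0.168], [2, 0.136, 0.434, 0.427],
    [3, 0.033, 0.162, 0.199], [4, 0, 0.058, 0.112], [5, 0, 0.009, 0.069],
    [6, 0, 0, 0.023], [7, 0, 0, 0.01],
], columns=columns)

# Column totals; values far from 1 would point to a transcription error
low_totals = low[period_cols].sum().round(3)
high_totals = high[period_cols].sum().round(3)
print(low_totals)
print(high_totals)

# One option: rescale each column so its fractions sum to exactly 1
high_normalized = high[period_cols] / high[period_cols].sum()
```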


In [525]:
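def sample_zipcodes(southeastern_zipcodes, time_period, mode):
  # Pick the proportion table for the requested scenario
  if mode == 'low':
    proportions = list(temp_changes_low_df[time_period])
  elif mode == 'high':
    proportions = list(temp_changes_high_df[time_period])
  else:
    raise ValueError(f"mode must be 'low' or 'high', got {mode!r}")

  # One sample per warming amount: entry i holds the ZIP codes assigned i degrees Celsius
  sampled_zipcodes = []

  for prop in proportions:
    sampled_zipcodes.append(southeastern_zipcodes.sample(frac=prop, random_state=14, replace=False))

  return sampled_zipcodes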
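Each call to `sample` in `sample_zipcodes` draws from the full series on its own, so a ZIP code can end up in more than one group, and would then receive more than one temperature increase. If disjoint groups are what is intended, one alternative is to shuffle once and cut the shuffled series at the cumulative proportions. The function `partition_zipcodes` below is a hypothetical helper sketched under that assumption, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def partition_zipcodes(zipcodes, proportions, seed=14):
    """Split zipcodes into disjoint groups whose sizes follow proportions."""
    # Shuffle once so group membership is random but reproducible
    shuffled = zipcodes.sample(frac=1, random_state=seed).reset_index(drop=True)

    # Normalize in case the proportions do not sum to exactly 1
    props = np.asarray(proportions, dtype=float)
    props = props / props.sum()

    # Cumulative cut points, rounded to whole rows; the last one equals len(shuffled)
    cut_points = np.round(np.cumsum(props) * len(shuffled)).astype(int)
    starts = np.concatenate(([0], cut_points[:-1]))

    return [shuffled.iloc[start:end] for start, end in zip(starts, cut_points)]

# Usage with made-up ZIP codes and the low-scenario early-century fractions
toy_zips = pd.Series(range(1000))
groups = partition_zipcodes(toy_zips, [0.465, 0.532, 0.003])
print([len(g) for g in groups])
```

With this approach every ZIP code belongs to exactly one group, so the group index can be read directly as its warming in degrees Celsius.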

In [526]:
def update_temperatures(df, sampled_zipcodes):
  # Start from the current bounds; cast to float because the shifts are fractional
  df['new_t_low'] = df['t_low'].astype(float)
  df['new_t_high'] = df['t_high'].astype(float)

  # sampled_zipcodes[i] holds the ZIP codes assigned a warming of i degrees Celsius
  for temp_change, zipcode_list in enumerate(sampled_zipcodes):
    # A 1 degree Celsius increase equals a 1.8 degree Fahrenheit increase
    mask = df['zipcode'].isin(zipcode_list)
    df.loc[mask, 'new_t_low'] += temp_change * 1.8
    df.loc[mask, 'new_t_high'] += temp_change * 1.8

  return df
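The unit conversion in `update_temperatures` applies to a temperature difference, so only the factor 1.8 is needed and the usual +32 offset does not appear. As a worked example, the sketch below shifts the zone 7b range (5 to 10 °F) by 2 °C of warming. The helper name `shift_range_fahrenheit` is made up for this illustration.

```python
def shift_range_fahrenheit(t_low, t_high, warming_celsius):
    """Shift a Fahrenheit range by a warming amount given in degrees Celsius."""
    # A difference of 1 degree Celsius is a difference of 1.8 degrees Fahrenheit
    delta_f = warming_celsius * 1.8
    return t_low + delta_f, t_high + delta_f

# Zone 7b (5 to 10 F) under 2 C of warming: both bounds move up by 3.6 F
new_low, new_high = shift_range_fahrenheit(5, 10, 2)
print(round(new_low, 1), round(new_high, 1))
```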

In [527]:
hardiness_zones = [
    (-65,-60), (-60,-55), (-55,-50), (-50,-45), (-45,-40), (-40,-35), (-35,-30), (-30,-25), (-25,-20), (-20,-15),
    (-15,-10), (-10,-5), (-5,0), (0,5), (5,10), (10,15), (15,20), (20,25), (25,30), (30,35), (35,40),
    (40,45), (45,50), (50,55), (55,60), (60,65)
]

hardiness_zone_dict = {
    (-65, -60): "0b", (-60, -55): "1a", (-55, -50): "1b", (-50, -45): "2a", (-45, -40): "2b",
    (-40, -35): "3a", (-35, -30): "3b", (-30, -25): "4a", (-25, -20): "4b", (-20, -15): "5a",
    (-15, -10): "5b", (-10, -5): "6a", (-5, 0): "6b", (0, 5): "7a", (5, 10): "7b",
    (10, 15): "8a", (15, 20): "8b", (20, 25): "9a", (25, 30): "9b", (30, 35): "10a",
    (35, 40): "10b", (40, 45): "11a", (45, 50): "11b", (50, 55): "12a", (55, 60): "12b",
    (60, 65): "13a"
}
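The labels in `hardiness_zone_dict` follow a regular pattern: each half-zone spans 5 °F, zone 1a starts at -60 °F, and the letter alternates between `a` and `b`. As a cross-check on the dictionary, the label can be derived arithmetically. The function `zone_label` is a hypothetical helper and assumes its input is a lower bound that is a multiple of 5.

```python
def zone_label(t_low):
    """Derive the half-zone label from a lower bound in degrees Fahrenheit."""
    # Count 5-degree steps above -60 F, where zone 1a begins
    half_zone_index = (int(t_low) + 60) // 5
    # Two half-zones (a and b) make up one numbered zone
    number = half_zone_index // 2 + 1
    letter = 'a' if half_zone_index % 2 == 0 else 'b'
    return f"{number}{letter}"

# Spot checks against entries of hardiness_zone_dict
for low in (-65, -60, 5, 10, 60):
    print(low, zone_label(low))
```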

In [528]:
def update_hardiness_zones(df):
  df['new_trange'] = 'null'

  for index, row in df.iterrows():
    left_zone = None
    right_zone = None

    # Find the zone containing each end of the shifted temperature range
    for zone in hardiness_zones:
      left_bound, right_bound = zone

      if left_bound <= row['new_t_low'] <= right_bound:
        left_zone = zone

      if left_bound <= row['new_t_high'] <= right_bound:
        right_zone = zone

      # Both ends fall in the same zone, so the range has not left it
      if left_zone and right_zone and left_zone == right_zone:
        # Write to the frame passed in, not to the global southeastern_df
        df.at[index, 'new_trange'] = right_zone
        break

    # The range straddles two zones: keep the zone that covers more of it
    if left_zone and right_zone and left_zone != right_zone:
      left_diff = abs(row['new_t_low'] - left_zone[1])
      right_diff = abs(row['new_t_high'] - right_zone[0])

      if left_diff > right_diff:
        df.at[index, 'new_trange'] = left_zone
      else:
        df.at[index, 'new_trange'] = right_zone

  return df
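When a shifted range straddles two zones, `update_hardiness_zones` keeps the zone that covers more of the range. Under the assumption that every range is exactly 5 °F wide and zone edges fall on multiples of 5, this works out to picking the zone that contains the midpoint of the range, which can be computed without a row loop. The following is a sketch of that alternative on made-up rows, not a drop-in replacement tested against the full dataset.

```python
import numpy as np
import pandas as pd

# Made-up shifted ranges: 0, 1, 2 and 3 degrees Celsius of warming applied to 5-10 F
shifted = pd.DataFrame({
    'new_t_low': [5.0, 6.8, 8.6, 10.4],
    'new_t_high': [10.0, 11.8, 13.6, 15.4],
})

# The zone holding the midpoint is the zone that overlaps the range the most
midpoint = (shifted['new_t_low'] + shifted['new_t_high']) / 2

# Round the midpoint down to the nearest multiple of 5 to get the zone's lower edge
zone_low = (np.floor(midpoint / 5) * 5).astype(int)

# Store each zone as a (low, high) tuple, matching the keys of hardiness_zone_dict
shifted['new_trange'] = list(zip(zone_low.tolist(), (zone_low + 5).tolist()))
print(shifted)
```

On these rows the first two ranges stay in (5, 10) and the last two move to (10, 15), which matches what the overlap comparison in the loop would choose.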

In [529]:
def map_hardiness_zones(new_zone):
  return hardiness_zone_dict.get(new_zone, None)

# 4. Low

In [530]:
def post_process(df):
  df = df.drop(columns=['zone', 'trange', 't_low', 't_high', 'new_t_low', 'new_t_high', 'new_trange'])
  return df

In [535]:
def generate_dataset(southeastern_df, time_period):
  sampled_zipcodes = sample_zipcodes(southeastern_zipcodes,time_period, 'low')
  southeastern_df = update_temperatures(southeastern_df, sampled_zipcodes)
  updated_df = update_hardiness_zones(southeastern_df)
  updated_df['new_zone'] = updated_df['new_trange'].apply(map_hardiness_zones)
  return updated_df

low_early_century = generate_dataset(southeastern_df, 'early_century')
low_early_century = post_process(low_early_century)
low_early_century

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

| | zipcode | new_zone |
|---|---|---|
| 10190 | 27006 | 8a |
| 10191 | 27007 | 7b |
| 10192 | 27009 | 8a |
| 10193 | 27010 | 7b |
| 10194 | 27011 | 7b |
| ... | ... | ... |
| 16276 | 39877 | 8b |
| 16277 | 39885 | 8b |
| 16278 | 39886 | 8b |
| 16279 | 39897 | 9a |
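The message "A value is trying to be set on a copy of a slice from a DataFrame" in the output above is typically triggered when new columns are assigned to a frame that was created by filtering another frame. A common way to avoid it is to take an explicit `.copy()` right after filtering. The sketch below shows the pattern on toy rows; the frame names are illustrative and the temperature values are placeholders.

```python
import pandas as pd

# Toy stand-in for the full zone table
full = pd.DataFrame({
    'zipcode': [27006, 10001, 39897],
    't_low': [5, 0, 15],
})

# .copy() makes the filtered frame independent of the original
subset = full[full['zipcode'].isin([27006, 39897])].copy()

# Adding and updating columns now affects only the copy
subset['new_t_low'] = subset['t_low'].astype(float)
subset.loc[subset['zipcode'] == 27006, 'new_t_low'] += 1.8

print(subset)
```

The original frame is left untouched, which also makes it safe to rerun the forecast for several periods from the same starting data.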


In [None]:
low_early_century.to_csv('/content/drive/My Drive/hardiness-zone-data/low_early_century.csv', index=False)

In [536]:
low_mid_century = generate_dataset(southeastern_df, 'mid_century')
low_mid_century = post_process(low_mid_century)
low_mid_century.to_csv('/content/drive/My Drive/hardiness-zone-data/low_mid_century.csv', index=False)

In [537]:
low_late_century = generate_dataset(southeastern_df, 'late_century')
low_late_century = post_process(low_late_century)
low_late_century.to_csv('/content/drive/My Drive/hardiness-zone-data/low_late_century.csv', index=False)

# 5. High

In [538]:
def generate_dataset(southeastern_df, time_period):
  sampled_zipcodes = sample_zipcodes(southeastern_zipcodes,time_period, 'high')
  southeastern_df = update_temperatures(southeastern_df, sampled_zipcodes)
  updated_df = update_hardiness_zones(southeastern_df)
  updated_df['new_zone'] = updated_df['new_trange'].apply(map_hardiness_zones)
  return updated_df

In [539]:
high_early_century = generate_dataset(southeastern_df, 'early_century')
high_early_century = post_process(high_early_century)
high_early_century.to_csv('/content/drive/My Drive/hardiness-zone-data/high_early_century.csv', index=False)

In [540]:
high_mid_century = generate_dataset(southeastern_df, 'mid_century')
high_mid_century = post_process(high_mid_century)
high_mid_century.to_csv('/content/drive/My Drive/hardiness-zone-data/high_mid_century.csv', index=False)

In [541]:
high_late_century = generate_dataset(southeastern_df, 'late_century')
high_late_century = post_process(high_late_century)
high_late_century.to_csv('/content/drive/My Drive/hardiness-zone-data/high_late_century.csv', index=False)
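The six exports in sections 4 and 5 differ only in the scenario (`low` or `high`) and the time period. One way to enumerate the combinations and their output files is sketched below. `output_path` is a hypothetical helper, and the folder matches the paths used in the cells above.

```python
from itertools import product

modes = ['low', 'high']
periods = ['early_century', 'mid_century', 'late_century']

def output_path(mode, period, folder='/content/drive/My Drive/hardiness-zone-data'):
    """Build the CSV path for one scenario and period (hypothetical helper)."""
    return f"{folder}/{mode}_{period}.csv"

# All scenario and period combinations, in the order the notebook writes them
paths = [output_path(mode, period) for mode, period in product(modes, periods)]

for path in paths:
    print(path)
```

A loop like this could call a single `generate_dataset` that takes the scenario as an argument, instead of redefining the function for each scenario.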
