# Texas and Georgia NASS districts Cotton Mask

**Project:** Georgia and Texas Agriculture

**Date:** 02/10/2025

**Code Contact:** Henry Osei, henryoseipoku77@gmail.com

**Inputs:** A shapefile with the boundaries of the NASS districts of interest, and the 2016 to 2023 Crop Sequence Boundaries (CSB) shapefile for Texas and Georgia.

**Outputs:** Shapefiles/geoparquet files of one cotton mask per district. An interactive map that shows the cotton mask of one of the Georgia districts.

**Description:** This script creates a standardized cotton mask for the last decade (2015 to 2024) in the Georgia and Texas NASS districts of interest. First, the shapefile of all cultivated crop areas in Texas and Georgia from 2016 to 2023 is converted into a GeoParquet file. Then, for each year and NASS district, the areas planted with cotton are extracted. Finally, for each district, all areas that grew cotton in at least two growing seasons from 2020 to 2023 are selected to create one cotton mask that shows where cotton was usually planted over the last decade.

- NASS districts of interest: **Georgia**: GA-70, GA-80, GA-90; **Texas**: TX-12, TX-21, TX-22, TX-60, TX-70

In [1]:
# Import the necessary libraries
import geopandas as gpd
import os
import glob
import pandas as pd
import folium

# NB: This code uses the GeoParquet file format, so install pyarrow before reading or manipulating the GeoParquet files
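Since the GeoParquet steps depend on pyarrow, a quick check before running the rest of the notebook can save a failed run. The snippet below is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of the original script.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# warn early instead of failing later inside read_parquet / to_parquet
if not has_module("pyarrow"):
    print("pyarrow is not installed; install it before working with GeoParquet files")
```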

In [3]:
# base directory for all the input files
# NB: On your computer, change this to the directory that holds the input files
# os.chdir() returns None, so its result is not assigned to a variable
os.chdir('C:/TX_GA_CSB')

### PART A
- Convert the shapefile to geoparquet for fast and easy manipulation of data.
 
**NB:** Do not run this section if you already have the geoparquet file of the crop boundaries in TX and GA.

In [None]:
# read the Texas (TX) and Georgia (GA) crop field boundary shapefile into a geodataframe
all_crops= gpd.read_file('2016_to_2023_WGS84.shp')

# convert the geodataframe to a geoparquet
all_crops.to_parquet('2016_to_2023_WGS84.geoparquet')
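The "do not run if the geoparquet already exists" rule can also be checked in code rather than by hand. The following is a minimal sketch, assuming the file names used above; the helper `needs_conversion` is hypothetical and additionally treats a GeoParquet file older than the shapefile as stale.

```python
import os


def needs_conversion(shp_path: str, parquet_path: str) -> bool:
    """Return True when the GeoParquet copy is missing or older than the shapefile."""
    if not os.path.exists(parquet_path):
        return True   # no GeoParquet yet, so the conversion is needed
    if not os.path.exists(shp_path):
        return False  # nothing to convert from; keep the existing GeoParquet
    # reconvert only if the shapefile was modified after the GeoParquet was written
    return os.path.getmtime(shp_path) > os.path.getmtime(parquet_path)
```

With this in place, the conversion cell could be wrapped in `if needs_conversion('2016_to_2023_WGS84.shp', '2016_to_2023_WGS84.geoparquet'):` so that rerunning the whole notebook skips PART A when it is not needed.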

### PART B
- Extract a cotton mask for each district and visualize one of them on an interactive map.

In [4]:
# read the NASS districts shapefile and TXGA crop boundary data
NASS_districts= gpd.read_file('NASSDistrictsofInterest_GeorgiaTexas.shp')
TXGA_crops= gpd.read_parquet('2016_to_2023_WGS84.geoparquet')

In [5]:
# --------------------------------
# selects farm areas/polygons that cultivated cotton for at least two years from the 2020 to 2023 season
# --------------------------------

# NB: The crop data layer (CDL) code for cotton is 2
# create a function that assigns 1 if a farm/polygon cultivated cotton at least twice in the given timeframe, and 0 if not
def assign_cotton(row):
    total_count = sum([row['CDL2020'] == 2, row['CDL2021'] == 2, row['CDL2022'] == 2, row['CDL2023'] == 2])
    if total_count >= 2:
        return 1
    else:
        return 0


# apply the function to the TXGA_crops geodataframe and create a new column 'Cotton' to assign the cotton farms
TXGA_crops['Cotton']= TXGA_crops.apply(assign_cotton, axis= 1)
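Row-wise `apply` can be slow on a large table of field polygons. As an alternative, the same "cotton in at least two of the four seasons" rule can be written with column-wise pandas operations. This is a sketch on a small made-up table, not the CSB data; `flag_cotton`, `YEAR_COLS`, and `COTTON_CODE` are hypothetical names, and the column names are taken from the function above.

```python
import pandas as pd

YEAR_COLS = ["CDL2020", "CDL2021", "CDL2022", "CDL2023"]
COTTON_CODE = 2  # CDL code for cotton, as noted above


def flag_cotton(df, year_cols=YEAR_COLS, min_years=2):
    """Return 1 where cotton was grown in at least `min_years` of `year_cols`, else 0."""
    cotton_years = df[year_cols].eq(COTTON_CODE).sum(axis=1)  # cotton seasons per row
    return (cotton_years >= min_years).astype(int)


# toy table: one row per field, one CDL code per season (values are illustrative only)
toy = pd.DataFrame({
    "CDL2020": [2, 1, 2, 5],
    "CDL2021": [2, 2, 1, 5],
    "CDL2022": [1, 1, 2, 5],
    "CDL2023": [1, 1, 2, 2],
})
toy["Cotton"] = flag_cotton(toy)
print(toy["Cotton"].tolist())  # [1, 0, 1, 0]
```

On the real data this would be `TXGA_crops['Cotton'] = flag_cotton(TXGA_crops)`, which should give the same column as the `apply` version.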

In [34]:
# check the unique values in the 'Cotton' column to confirm if there are only 1's and 0's
TXGA_crops['Cotton'].unique()

In [7]:
# select the assigned cotton fields ('1')
cotton_fields= TXGA_crops[TXGA_crops['Cotton']==1].reset_index(drop= True)

In [8]:
# create a function that assigns the state name to each district. This will be used in naming the final files
# NB: the State Federal Information Processing Standard (FIPS) codes for TX and GA are 48 and 13, respectively
def assign_state(row):
    if row['STATEFP']== '48':
        return 'TX'
    else:
        return 'GA'
    
    
# apply the function to create a new column 'St_NAME' that assigns a state name to each district
NASS_districts['St_NAME']= NASS_districts.apply(assign_state, axis= 1)
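Note that `assign_state` labels every district whose code is not '48' as 'GA', which is fine for this two-state input but would hide an unexpected state. A stricter lookup is sketched below; `STATE_BY_FIPS` and `state_abbrev` are hypothetical names, and the zero-padding step assumes `STATEFP` may arrive either as a string or as an integer.

```python
# FIPS codes for the two states covered by this project
STATE_BY_FIPS = {"48": "TX", "13": "GA"}


def state_abbrev(statefp):
    """Map a state FIPS code to its abbreviation, failing loudly on unknown codes."""
    code = str(statefp).zfill(2)  # accept '48', 48, '13', 13
    try:
        return STATE_BY_FIPS[code]
    except KeyError:
        raise ValueError(f"unexpected STATEFP code: {statefp!r}") from None
```

It could be applied with `NASS_districts['STATEFP'].map(state_abbrev)` in place of the row-wise `apply`.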

In [None]:
# --------------------------------
# Selects cotton fields within each district and exports a cotton mask shapefile/geoparquet for each district
# --------------------------------


# we are about to do a spatial join, so
# ensure the same crs schema for both geodataframes
NASS_districts = NASS_districts.to_crs(cotton_fields.crs)

# create a folder to store the output files (no error if it already exists)
output_dir = "per_district"
os.makedirs(output_dir, exist_ok=True)

# spatial join: 
# match the cotton field polygons with the NASS districts they fall within, and append the district attributes to each matched field
districts_cotton = gpd.sjoin(cotton_fields, NASS_districts, how='inner', predicate='within')

# add a column to identify each district
districts_cotton['Dist_NAME'] = districts_cotton['St_NAME'] + "_" + districts_cotton['NASS Dis_1']

# select only the necessary columns
nec_cols = ['Dist_NAME', 'CSBACRES', 'geometry']
districts_cotton = districts_cotton[nec_cols]

# save individual district files
for district_name, group in districts_cotton.groupby('Dist_NAME'):
    out_path = os.path.join(output_dir, f"{district_name}_CottonMask")
    group.to_parquet(out_path + ".geoparquet")
    # group.to_file(out_path + ".shp")  # uncomment if you want to save as a shapefile

In [35]:
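# fetch cotton mask data for each district
# sorted() makes the order predictable, so the first file is always the same district
files= sorted(glob.glob('per_district/*.geoparquet'))
# files= sorted(glob.glob('per_district/*.shp'))  # uncomment this if you prefer using the shapefile (and read with gpd.read_file below)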

# read them into geodataframes
dist_cot_mask= [gpd.read_parquet(dist) for dist in files]
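Reading the masks into a plain list makes it hard to tell which district each geodataframe belongs to. One option is to recover the district name from the file name written in the export step. The helper below is a sketch; `district_from_path` is a hypothetical name, and the example value `GA_70` is only an assumed form of `Dist_NAME`, since the actual values of the `NASS Dis_1` column are not shown here.

```python
import os

SUFFIX = "_CottonMask"  # suffix added to every exported file name above


def district_from_path(path):
    """Return the district name encoded in a '<district>_CottonMask.<ext>' file path."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem[:-len(SUFFIX)] if stem.endswith(SUFFIX) else stem


print(district_from_path("per_district/GA_70_CottonMask.geoparquet"))  # GA_70
```

A dictionary keyed by district could then be built with `{district_from_path(f): gpd.read_parquet(f) for f in files}`.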

In [1]:
# --------------------------------
# create an interactive map that shows the cotton mask of the first geodataframe in the list
# --------------------------------


# use the centroid of the first polygon in the first GeoDataFrame as the map's center
centroid = dist_cot_mask[0].geometry.centroid.iloc[0]
map_center = [centroid.y, centroid.x]

# initialize the Folium map with the map_center and default OpenStreetMap tiles
m = folium.Map(location=map_center, zoom_start=10)


# overlay the cotton mask from the first GeoDataFrame onto the map
folium.GeoJson(dist_cot_mask[0]).add_to(m)

# add a layer control
folium.LayerControl().add_to(m)

# display the map
m
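The centroid of a single polygon may sit near the edge of a district, so the map can open off-center. One alternative is to center the map on the midpoint of the mask's bounding box. The helper below is a sketch; `bounds_center` is a hypothetical name, and it expects bounds in the `(minx, miny, maxx, maxy)` order returned by the geopandas `total_bounds` property.

```python
def bounds_center(bounds):
    """Return [lat, lon] at the middle of a (minx, miny, maxx, maxy) bounding box."""
    minx, miny, maxx, maxy = bounds
    return [(miny + maxy) / 2, (minx + maxx) / 2]  # folium expects [lat, lon]


# illustrative bounding box in degrees, not taken from the real data
print(bounds_center((-84.0, 31.0, -82.0, 33.0)))  # [32.0, -83.0]
```

In the cell above, this would be used as `folium.Map(location=bounds_center(dist_cot_mask[0].total_bounds), zoom_start=10)`.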