# The environmental footprint of data centers in the United States
Script for performing a sensitivity analysis of the geographical attribution method used to estimate the water footprint, carbon footprint, and water scarcity footprint of data centers. This script uses the State boundary as the geographical attribution boundary for consumed electricity. The State boundary can be replaced with any other commonly used attribution boundary to test how sensitive our environmental footprint estimates are to that choice.

In [1]:
#import necessary libraries
import warnings; warnings.simplefilter('ignore')
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data centers location and energy use

Ganeshalingam et al. [4] report the likely locations of in-house small and midsize data centers (DC). Detailed information on colocation and hyperscale data centers is derived from commercial compilations [19–21]. Floor-space-based electricity use is then matched with the 2018 estimate of servers by data center type [3]. The scaled server estimates are then spatially distributed to the US States in proportion to the current spatial distribution of installed server bases.


In [2]:
State_Energy_Use = pd.read_excel(r"\XLS_SI\Input data.xlsx", "Table 6", skiprows = 2)
State_Energy_Use.head()

Unnamed: 0,State,Water intensity (m3/MWh),Emission intensity (m3/MWh),Scaled Power Consumption (MWh)
0,AK,0.845317,0.4553,78105.96
1,AL,5.610878,0.434148,1258304.0
2,AR,1.971335,0.609572,269573.5
3,AZ,16.624722,0.48599,3804335.0
4,CA,5.176452,0.210322,7485019.0


# Electricity generation, water consumption, and GHG emission of power plants

Power plant-specific electricity generation and water consumption data come from the US Energy Information Administration (EIA). We assigned national average values of water consumption per unit of electricity generation by fuel type (i.e., water intensity; m3/MWh) to all power plants with unspecified water consumption. Operational water footprints of solar and wind power were taken from Macknick et al. [25]. Following Grubert [26], we assign all reservoir evaporation to the dam’s primary purpose (e.g., hydropower). We connected hydroelectric dams with their respective power plants using data from Grubert [27]. Reservoir specific evaporation comes from Reitz et al. [28]. The U.S. Environmental Protection Agency’s eGRID database [29] provided GHG emissions associated with each power plant.

In [3]:
powerplant = pd.read_excel(r"\XLS_SI\Input data.xlsx", "Table 2", skiprows = 2)
powerplant.head()

Unnamed: 0,Plant state,Plant name,Plant Id,Balancing Authority Name,Balancing Authority Code,PCA Generation (MWh),Latitude,Longitude,Plant primary fuel,Plant primary fuel code,Net generation (MWh),CO2-e Emission (tons),Water intensity (m3/MWh),Carbon intensity (Tons/MWh),Generation Ratio of Power plant,Water Consumption (m3),Subbasin,HUC8,HUC8 ID
0,AL,ABC Coke,56076,"Southern Company Services, Inc. - Trans",SOCO,255414500.0,33.582793,-86.779866,COG,COAL,5290.0,1552.988,1.847267,0.293571,2.1e-05,9772.043741,Locust,3160111,3160111
1,AL,Alabama Pine Pulp,54429,"Southern Company Services, Inc. - Trans",SOCO,255414500.0,31.5825,-87.4889,BLQ,BIOMASS,413079.77,10436.106,1.937552,0.025264,0.001617,800363.491644,Lower Alabama,3150204,3150204
2,AL,Alabama River Pulp,10216,"Southern Company Services, Inc. - Trans",SOCO,255414500.0,31.5825,-87.4889,BLQ,BIOMASS,308063.28,12986.077,1.937552,0.042154,0.001206,596888.592313,Lower Alabama,3150204,3150204
3,AL,AMEA Sylacauga Plant,56018,"Southern Company Services, Inc. - Trans",SOCO,255414500.0,33.1661,-86.2825,NG,GAS,34570.0,22786.699,0.797026,0.659147,0.000135,27553.194096,Lower Coosa,3150107,3150107
4,AL,ANAD Solar Array,60680,"Southern Company Services, Inc. - Trans",SOCO,255414500.0,33.626728,-85.969481,SUN,SOLAR,17168.0,0.0,0.0076,0.0,6.7e-05,130.4768,Middle Coosa,3150106,3150106


# Indirect water footprint of data centers

A data center (and all other utilities) is assumed to consume electricity only from the power plants located within the same State as the data center. The electricity supplied by each power plant is estimated as that plant's share of the total generation within the State. The indirect water footprint associated with the electricity a data center draws from each power plant is finally aggregated at the HUC8 level.


In [4]:
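#Estimate the indirect water supplied (IWS) from each power plant
#Total generation within each State (the string "sum" avoids the pandas warning raised for the builtin)
powerplant["State Generation (MWh)"] = powerplant.groupby(["Plant state"])["Net generation (MWh)"].transform("sum")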
IWF_State = powerplant.merge(State_Energy_Use[["State", "Scaled Power Consumption (MWh)"]], left_on = "Plant state", right_on= "State", how = "left")
IWF_State["Indirect water footprint (m3)"] = IWF_State["Scaled Power Consumption (MWh)"]*IWF_State["Net generation (MWh)"]/IWF_State["State Generation (MWh)"]*IWF_State["Water intensity (m3/MWh)"]

#Aggregate the indirect water footprint at HUC8 level
IWF_at_HUC8= IWF_State.groupby(["HUC8 ID", "Subbasin"])["Indirect water footprint (m3)"].sum().reset_index()
IWF_at_HUC8["HUC8 ID"] = IWF_at_HUC8["HUC8 ID"].astype(str)
IWF_at_HUC8.head()

Unnamed: 0,HUC8 ID,Subbasin,Indirect water footprint (m3)
0,1010001,Upper St. John,1586.656733
1,1010004,Aroostook,8133.203468
2,1010005,Meduxnekeag,4.935269
3,1020001,West Branch Penobscot,3462.473688
4,1020003,Mattawamkeag,18.280093
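
The attribution step above can be generalized so that the State boundary is swapped for another boundary column, which is the substitution the introduction describes. The following is a minimal sketch on toy data, not the authors' implementation: the function name `attribute_footprint`, the `Demand (MWh)` column, and all toy values are assumptions made for illustration.

```python
import pandas as pd

def attribute_footprint(plants, demand, boundary_col, intensity_col, out_col):
    """Allocate each boundary's electricity demand to its plants by generation share,
    then convert the supplied electricity to a footprint using a plant-level intensity.
    Hypothetical helper; `demand` must hold `boundary_col` and "Demand (MWh)"."""
    df = plants.copy()
    # Total generation inside each attribution boundary (State, PCA, ...)
    df["Boundary generation (MWh)"] = (
        df.groupby(boundary_col)["Net generation (MWh)"].transform("sum")
    )
    df = df.merge(demand, on=boundary_col, how="left")
    share = df["Net generation (MWh)"] / df["Boundary generation (MWh)"]
    df["Supplied electricity (MWh)"] = df["Demand (MWh)"] * share
    df[out_col] = df["Supplied electricity (MWh)"] * df[intensity_col]
    return df

# Toy inputs (made-up values): two plants in state AA, one in state BB
toy_plants = pd.DataFrame({
    "Plant state": ["AA", "AA", "BB"],
    "HUC8 ID": ["1", "2", "2"],
    "Net generation (MWh)": [300.0, 100.0, 50.0],
    "Water intensity (m3/MWh)": [2.0, 1.0, 4.0],
})
toy_demand = pd.DataFrame({"Plant state": ["AA", "BB"], "Demand (MWh)": [40.0, 10.0]})

result = attribute_footprint(
    toy_plants, toy_demand,
    boundary_col="Plant state",
    intensity_col="Water intensity (m3/MWh)",
    out_col="Indirect water footprint (m3)",
)
by_huc8 = result.groupby("HUC8 ID")["Indirect water footprint (m3)"].sum()
print(by_huc8)
```

Passing a different `boundary_col` (for example a balancing authority code, with demand tabulated on the same key) would reuse the same logic. Because the shares within a boundary sum to one, the supplied electricity always adds up to the boundary's demand, which makes a convenient sanity check.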


# Direct water footprint of data centers

Direct water consumption of a data center can be estimated from the heat generation capacity of a data center [42], which is related to the amount of electricity used [43]. Estimates of data center specific electricity demand were multiplied by the typical water cooling requirement of 1.8 m3/MWh [1] to estimate the direct water footprint of each data center. Using ArcGIS, the direct water consumption is assigned to the watershed from which the water utility supplying the data center withdraws its water.


In [6]:
DWF_at_HUC8 = pd.read_excel(r"\XLS_SI\Input data.xlsx", "Table 5", skiprows = 2, usecols = 'A,B,C')
DWF_at_HUC8.head()

Unnamed: 0,HUC8 ID,Subbasin,Scaled Power Consumption (MWh)
0,1010001,Upper St. John,887.732534
1,1010002,Allagash,0.0
2,1010003,Fish,1242.825547
3,1010004,Aroostook,1953.011574
4,1010005,Meduxnekeag,1597.918561
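
The HUC8 IDs above are read as integers, so codes in regions 01 to 09 lose their leading zero (`1010001` rather than `01010001`). The script casts both tables to unpadded strings, which joins consistently within this notebook. If the results are to be joined to external watershed datasets that store 8-character codes, a helper along the following lines may be useful. This is a sketch only; `format_huc8` is a hypothetical name.

```python
import pandas as pd

def format_huc8(ids):
    """Return HUC8 identifiers as 8-character, zero-padded strings (hypothetical helper)."""
    return pd.Series(ids).astype(int).astype(str).str.zfill(8)

padded = format_huc8([1010001, 18100204])
print(padded.tolist())  # ['01010001', '18100204']
```

If padding is applied, it has to be applied to every table before merging, otherwise the keys will no longer match.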


In [7]:
#Estimate the direct water supply from each subbasin
DWF_at_HUC8["Direct water footprint (m3)"] = DWF_at_HUC8["Scaled Power Consumption (MWh)"]*1.8
DWF_at_HUC8.head()

Unnamed: 0,HUC8 ID,Subbasin,Scaled Power Consumption (MWh),Direct water footprint (m3)
0,1010001,Upper St. John,887.732534,1597.918561
1,1010002,Allagash,0.0,0.0
2,1010003,Fish,1242.825547,2237.085985
3,1010004,Aroostook,1953.011574,3515.420833
4,1010005,Meduxnekeag,1597.918561,2876.253409


# Total water footprint of data centers

The total water footprint is the sum of direct water consumption and the indirect water consumption associated with the electricity used by data centers, the public water systems (PWS), and the wastewater treatment plants (WWTP) that service a data center. Indirect water use by PWS and WWTP can be estimated with the same approach described above for the indirect water supply from power plants to data centers. This script only accounts for the water and carbon footprint associated with electricity use at the data center facility.


In [8]:
#Blue water footprint (BWF) is the sum of direct and indirect water consumption
DWF_at_HUC8["HUC8 ID"] = DWF_at_HUC8["HUC8 ID"].astype(str)
BWF_DC = DWF_at_HUC8.merge(IWF_at_HUC8[["HUC8 ID", "Indirect water footprint (m3)"]], on = "HUC8 ID", how = "left")
BWF_DC.fillna(value={"Indirect water footprint (m3)":0}, inplace = True)
BWF_DC["Blue water footprint (m3)"] = BWF_DC["Direct water footprint (m3)"]+BWF_DC["Indirect water footprint (m3)"]
BWF_DC.head()

Unnamed: 0,HUC8 ID,Subbasin,Scaled Power Consumption (MWh),Direct water footprint (m3),Indirect water footprint (m3),Blue water footprint (m3)
0,1010001,Upper St. John,887.732534,1597.918561,1586.656733,3184.575294
1,1010002,Allagash,0.0,0.0,0.0,0.0
2,1010003,Fish,1242.825547,2237.085985,0.0,2237.085985
3,1010004,Aroostook,1953.011574,3515.420833,8133.203468,11648.624302
4,1010005,Meduxnekeag,1597.918561,2876.253409,4.935269,2881.188678


In [None]:
#Export the output to excel
BWF_DC.to_excel("BWF_DataCenter_State_attribution_method.xlsx")

# Carbon footprint of data centers

The electricity supplied by each power plant is estimated as that plant's share of the total generation within its State. The carbon footprint associated with the electricity a data center draws from each power plant is finally aggregated at the HUC8 level.


In [10]:
#Estimate the GHG emissions attributed to each power plant
IWF_State["Carbon Footprint (Tons CO2-eq)"] = IWF_State["Scaled Power Consumption (MWh)"]*IWF_State["Net generation (MWh)"]/IWF_State["State Generation (MWh)"]*IWF_State["Carbon intensity (Tons/MWh)"]

#Aggregate the carbon footprint at HUC8 level (stored as CF_DC so that the export cell below can find it)
CF_DC = IWF_State.groupby(["HUC8 ID", "Subbasin"])["Carbon Footprint (Tons CO2-eq)"].sum().reset_index()
CF_DC["HUC8 ID"] = CF_DC["HUC8 ID"].astype(str)
CF_DC.head()

Unnamed: 0,HUC8 ID,Subbasin,Carbon Footprint (Tons CO2-eq)
0,1010001,Upper St. John,0.0
1,1010004,Aroostook,137.375606
2,1010005,Meduxnekeag,0.0
3,1020001,West Branch Penobscot,0.0
4,1020003,Mattawamkeag,0.0
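
As a consistency check on the attribution, summing the plant-level carbon footprints within a State should equal the State's data center electricity demand multiplied by the State's generation-weighted average carbon intensity. The sketch below verifies this identity on toy numbers. The plant values and `state_demand_mwh` are made up for illustration.

```python
import math
import pandas as pd

# Toy State with one emitting plant and one zero-emission plant
plants = pd.DataFrame({
    "Plant state": ["AA", "AA"],
    "Net generation (MWh)": [300.0, 100.0],
    "Carbon intensity (Tons/MWh)": [0.5, 0.0],
})
state_demand_mwh = 40.0  # hypothetical data center demand in State AA

# Plant-level attribution: demand x generation share x plant intensity
generation = plants["Net generation (MWh)"]
share = generation / generation.sum()
plant_level = (state_demand_mwh * share * plants["Carbon intensity (Tons/MWh)"]).sum()

# State-level shortcut: demand x generation-weighted average intensity
weighted_intensity = (generation * plants["Carbon intensity (Tons/MWh)"]).sum() / generation.sum()
state_level = state_demand_mwh * weighted_intensity

print(plant_level, state_level)
```

The two totals agree, while the plant-level form keeps the spatial detail needed for the HUC8 aggregation. The zero-emission plant contributes nothing here, which is one way a subbasin can end up with a carbon footprint of 0.0.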


In [None]:
#Export the output to excel
CF_DC.to_excel("CF_DataCenter_State_Attribution_method.xlsx")

# Water scarcity footprint of data centers

We quantified the water scarcity footprint (WSF) of data centers using the AWARE method set forth by Boulay et al. [46] (see the Supporting Information for more details). Other societal and environmental water use data, as well as data on natural water availability within each US watershed, used to estimate the characterization factors of subbasins come from refs. [47–49].


In [12]:
Characterization_factors = pd.read_excel(r"\XLS_SI\Input data.xlsx", "Table 5", skiprows = 2, usecols = 'A,B,D')
Characterization_factors.tail()

Unnamed: 0,HUC8 ID,Subbasin,Characterization factor
2094,18100100,Southern Mojave,100.0
2095,18100201,Whitewater River,20.27964
2096,18100202,Carrizo Creek,46.577097
2097,18100203,San Felipe Creek,31.673298
2098,18100204,Salton Sea,100.0


In [28]:
#Multiply the water consumption from each subbasin by its characterization factor to get the water scarcity footprint of data centers
Characterization_factors["HUC8 ID"] = Characterization_factors['HUC8 ID'].astype(str)
HUC_WSF = BWF_DC.merge(Characterization_factors, left_on = ["HUC8 ID", "Subbasin"], right_on = ["HUC8 ID", "Subbasin"], how = "left")
HUC_WSF["Water Scarcity Footprint (m3-eq)"] = HUC_WSF["Blue water footprint (m3)"]*HUC_WSF["Characterization factor"]
DC_portfolio_at_HUC8 = HUC_WSF[["HUC8 ID", "Subbasin", "Scaled Power Consumption (MWh)", "Direct water footprint (m3)", "Indirect water footprint (m3)", "Blue water footprint (m3)", "Characterization factor", "Water Scarcity Footprint (m3-eq)"]]
DC_portfolio_at_HUC8.head()

Unnamed: 0,HUC8 ID,Subbasin,Scaled Power Consumption (MWh),Direct water footprint (m3),Indirect water footprint (m3),Blue water footprint (m3),Characterization factor,Water Scarcity Footprint (m3-eq)
0,1010001,Upper St. John,887.732534,1597.918561,1586.656733,3184.575294,0.240074,764.532751
1,1010002,Allagash,0.0,0.0,0.0,0.0,0.334706,0.0
2,1010003,Fish,1242.825547,2237.085985,0.0,2237.085985,0.311171,696.117157
3,1010004,Aroostook,1953.011574,3515.420833,8133.203468,11648.624302,0.309205,3601.816135
4,1010005,Meduxnekeag,1597.918561,2876.253409,4.935269,2881.188678,0.277446,799.373442
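
The characterization factors are attached with a left merge, so a subbasin without a factor would receive `NaN`, and pandas skips `NaN` when summing. The sketch below, on toy data with made-up values, shows how such rows could be flagged before totals are reported. The `validate` argument additionally guards against duplicate HUC8 IDs in the factor table.

```python
import pandas as pd

# Toy blue water footprints; the last subbasin has no characterization factor
blue = pd.DataFrame({
    "HUC8 ID": ["1010001", "1010003", "9999999"],
    "Blue water footprint (m3)": [3000.0, 2000.0, 500.0],
})
factors = pd.DataFrame({
    "HUC8 ID": ["1010001", "1010003"],
    "Characterization factor": [0.25, 0.5],
})

# validate raises MergeError if the factor table repeats a HUC8 ID
wsf = blue.merge(factors, on="HUC8 ID", how="left", validate="many_to_one")
wsf["Water Scarcity Footprint (m3-eq)"] = (
    wsf["Blue water footprint (m3)"] * wsf["Characterization factor"]
)

# Rows without a factor contribute nothing to the sum, so list them explicitly
missing_cf = wsf[wsf["Characterization factor"].isna()]
total_wsf = wsf["Water Scarcity Footprint (m3-eq)"].sum()
print(missing_cf[["HUC8 ID", "Blue water footprint (m3)"]])
print("Total WSF over subbasins with a factor (m3-eq):", total_wsf)
```

An empty `missing_cf` on the real tables would indicate that every subbasin with a blue water footprint has a characterization factor.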


In [12]:
DC_portfolio_at_HUC8.to_excel("WF_profile_HUC8_state.xlsx")

# Notes

The authors of this IPython notebook make this code available under the MIT license, 2021.
https://opensource.org/licenses/MIT

The full description of the methodology and the references cited in this script can be found in the main manuscript.

Siddik, M. A. B., Shehabi, A., & Marston, L. T. (2021). The environmental footprint of data centers in the United States. Environmental Research Letters. https://doi.org/10.1088/1748-9326/abfba1
