Data Catalog

Ryan Johnson edited this page Nov 1, 2018 · 72 revisions

This is where we're organizing information about the data. When we're done, there will be a section for each data set that lists:

  1. What the data is: a brief description, units, etc.;
  2. Its source (either one or more URLs or the name and email address of whoever sent it to us);
  3. A list of source columns;
  4. Any geographical hierarchy contained in the data;
  5. Any gotchas or other weirdnesses with the data.

Federal Revenue by Location

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Federal Revenue by Location | ONRR | all | states, counties, offshore | 2006-2017
  1. DOI Revenue data describes federal revenue from natural resource extraction on federal land and waters.
  • Revenues are measured in US dollars, and are collected on a calendar year basis.
  • They relate to both onshore (counties) and offshore areas.
  2. They come from the Department of the Interior's Office of Natural Resources Revenue, or ONRR. There are two files in CSV format:
  • County revenues (from which we can derive state revenues)
  • Offshore revenues (by region and planning area)
  3. See the README for information on source columns.
  4. Revenue is organized geographically:
  • Onshore by state and county
  • Offshore by region, planning area, and protraction
  5. Gotchas:
  • Revenue numbers are formatted in the source CSV with the dollar sign and commas, e.g. $ 9,480,892.00.
  • Revenues may be negative! ONRR refers to these as "adjustments", and they account for revenues from overpayment in previous years.
  • Negative revenues are formatted with parentheses, e.g.: $ (2,165.80)
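The formatting gotchas above mean revenue values can't be read as numbers directly. A minimal sketch of one way to parse them, assuming the only decorations are the dollar sign, commas, spaces, and parentheses for negatives (the function name `parse_revenue` is hypothetical, not part of our code):

```python
from decimal import Decimal


def parse_revenue(text: str) -> Decimal:
    """Convert a currency string such as "$ (2,165.80)" to a Decimal.

    Assumption: parentheses around the amount mark a negative value
    (an "adjustment"), as described in the gotchas above.
    """
    # Drop the dollar sign, thousands separators, and surrounding spaces.
    cleaned = text.strip().replace("$", "").replace(",", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1].strip()
    value = Decimal(cleaned)
    return -value if negative else value
```

Decimal is used here rather than float so that cents survive summation without rounding drift; that is a design choice of this sketch, not something the source data requires.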

Federal Revenues by Company

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Federal Revenue by Company | ONRR & IA | all for ONRR, none for IA | none | 2013-2017
  1. This is revenue data by company as received by the US government.
  2. We get this data from ONRR.
  3. The data on the beta site contained:
  • Company name
  • Revenue type (rent, bonus, royalty, other revenues)
  • Commodity
  • Revenue ($)
  4. The data on the next version will have:
  • Company name
  • Revenue type (rent, bonus, royalty, other revenues, offshore inspection fees, civil penalties, permit fees, aml fees)
  • Commodity
  • Government-reported Revenue ($)
  5. This data does not have geography.
  6. Gotchas:
  • As with federal revenue, this data contains some negative values or "adjustments".
  • ONRR expects to have another revenue type in updates to this dataset in the January/February timeframe: renewable energy collections from BLM. This will be in "megawatt capacity fees".
  • We had originally hoped to display company-reported revenues in this same interaction, but we didn't ask for the same columns and types of information in the company-reported data, so we can't combine the datasets. The company-reported information will be available as a download.
  • Federal corporate income tax payments will be shown elsewhere on the site.

Disbursements

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Disbursements | ONRR | n/a | Depends on fund, mostly by state | 2003-2018
  1. This is the amount of money earned from extraction on federal lands disbursed to various legislated funds.
  2. This is provided by ONRR.
  3. Source columns include:
  • Year (by government fiscal year)
  • Fund (historic preservation act, land and water conservation act, u.s. treasury, states, indian tribes, bureau of reclamation, other)
  • Source (onshore or offshore)
  • State (where fund === state)
  • Total ($)
  4. Gotchas:
  • Since the buckets of money are mostly legislated amounts, the difference between years is negligible and comparisons across years do not add much value.
  • One tricky point of policy is that for the Land and Water Conservation Act, the bucket amount is legislated, but must be approved for spending by Congress every year. It does not always get approved, so the bucket is not a true representation of dollars-to-conservation.
  5. Notes:
  • We have a more detailed breakdown of data for the Land and Water Conservation Act (amounts granted to specific projects in specific states from 2011 - 2016). See the README for more information.
  • We have a request out for a similar breakdown for Historic Preservation Act data, but we don't expect it in time for December launch.

Production on Federal Lands and Waters

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Federal Production | ONRR | all | states, counties, offshore | 2007-2016
  1. Production data details the actual commodity volumes produced by the companies that extract natural resources on federal land and waters.
  2. Production figures come from ONRR in one Excel file.
  3. Source files include the following columns:
  • Calendar year
  • Product (typically with units in parentheses, e.g. Fuel Gas (mcf) or Oil (bbl))
  • State/Offshore Region
  • FIPS Code
  • CPS/Planning Area
  • Production volume (in product-specific units)
  • Onshore/Offshore
  • Category
  4. Production is organized geographically:
  • Onshore by state and county
  • Offshore by region and planning area
  5. Gotchas:
  • Production volumes can't be summed up across products because they're each measured in different units and have different uses.
  • Some data is withheld at the county level (for example, where there is only one mine per county).
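Because volumes are only comparable within a product, any totals need to be keyed by product and unit. The sketch below shows one way to split a Product label such as "Oil (bbl)" into name and units and to total volumes per product. The function names and the assumption that units always appear in a single trailing pair of parentheses are ours, not guaranteed by the source file.

```python
import re
from collections import defaultdict

# Matches labels like "Fuel Gas (mcf)"; units are the trailing parenthetical.
PRODUCT_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<units>[^()]+)\)\s*$")


def split_product(label):
    """Return (name, units); units is None when no parenthetical is present."""
    match = PRODUCT_PATTERN.match(label)
    if match is None:
        return label.strip(), None
    return match.group("name"), match.group("units")


def total_by_product(rows):
    """Sum volumes per (name, units) pair; never across different products."""
    totals = defaultdict(float)
    for label, volume in rows:
        totals[split_product(label)] += volume
    return dict(totals)
```

For example, `total_by_product([("Oil (bbl)", 10.0), ("Oil (bbl)", 5.0)])` keeps oil in its own bucket, separate from any gas rows in the same input.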

Production on All Lands and Waters

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
All-lands Production | EIA | oil, gas, coal, renewables | states, counties (for coal) | 2007-2016
  1. This is historical production data for coal, oil, gas, and renewables by state (and by county for coal) and offshore area. It differs from the previous category in that it doesn't matter who owns the land, water, or resources, so it includes the previous category, Production on Federal Lands and Waters.
  2. The data goes down to the state level, with the exception of Coal, which goes down to the county level.
  3. It comes from the Energy Information Administration, or EIA.
  4. Original data comes from the EIA API. See the README and the configuration file for more information on which tables we read and how.
  5. Gotchas:
  • With the exception of Renewables, production volumes can't be summed up across products because they're each measured in different units and have different uses.
  • Renewables breaks down by natural resource type, but we don't have clear guidance from the MSG on the level of detail to show here. We have:
    • “All Other Renewables” and “Biomass (total)” are major categories
    • “All Other Renewables” includes Wind, Solar, Geothermal, and Biomass (total)
    • “Biomass (total)” includes Wood and wood-derived fuels and other biomass
    • The total production for a given state is the sum of “Conventional Hydroelectric” and “All Other Renewables”
  6. Web adaptations to data:
  • Added a combined “All Renewables” category, which totals the production of all renewables for a given state and year.
  • Omitted all states or counties without production values.
  • For Oil, units changed from “Thousand barrels” to “bbl”, and adjusted the production values accordingly.
  • For Renewables, units changed from “Kwh” to “Mwh”, and adjusted the production values accordingly.
  • For Coal, localities were not defined, so County, Parish, Census Area, and Borough were appended to be consistent with the rest of EITI Data.
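The unit changes and the renewables total described above can be sketched as follows. This is an illustration under stated assumptions, not our actual transformation code: the function names are hypothetical, and "Kwh" and "Mwh" are taken to mean kilowatt-hours and megawatt-hours.

```python
def thousand_barrels_to_bbl(value):
    """Convert a volume in thousand barrels to barrels (bbl)."""
    return value * 1_000


def kwh_to_mwh(value):
    """Convert kilowatt-hours to megawatt-hours (assumed meaning of Kwh/Mwh)."""
    return value / 1_000


def all_renewables_total(by_category):
    """Total renewables for one state and year.

    Only the two top-level categories are added; "Biomass (total)", Wind,
    Solar, and Geothermal are already inside "All Other Renewables", so
    adding them again would double count.
    """
    return (
        by_category.get("Conventional Hydroelectric", 0)
        + by_category.get("All Other Renewables", 0)
    )
```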

Exports

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Exports | Census | some | states | 2012-2015
  1. Exports data describes the amount of money earned in each state, expressed in terms of US dollars and each state's share of economic exports.
  2. The data comes from US Census in Excel format, and lists the top 25 exports for each state from 2011-2014 by HS6 code.
  3. The source data has one row per state and "rank" (0-25), each with:
  • The HS6 code
  • An abbreviation describing the commodity
  • The value ($) and share (%) of state exports for each year, expressed as a separate column (e.g. val2011, share14)
  • A column labeled "change", which appears to be the percent difference between the 2011 and 2014 numbers, but isn't.
  4. This data is only organized by state (no counties, no offshore areas).
  5. Gotchas:
  • We've had to determine which HS6 codes map to the commodities that we care about, which is difficult, and probably means that we've got gaps in our data. Here is the mapping we're using as of November 2016:
    • 260112: Iron
    • 2603: Copper
    • 261610: Silver
    • 261690, 710811, 710812: Gold
    • 2701: Coal
    • 2709, 2710, 2713, 2714, 2715: Oil
    • 2711: Gas
    • All others that begin with 25, 26, or 28 — Nonenergy minerals
    • We are not including HS codes in the 70s series
  • Because the data only lists the top 25 commodities, it's very likely that most states export many natural resources that fall outside the top 25, so most of that data will appear as zero.
  6. Related discussion in issues:
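The HS6 mapping in the gotchas above can be expressed as a prefix lookup. The sketch below is an assumption-laden illustration: it treats the four-digit entries as prefixes of six-digit codes, checks the specific entries before the 25/26/28 fallback, and keeps the listed 7108 gold codes while returning None for every other code outside the list. The names `HS_PREFIX_TO_COMMODITY` and `commodity_for_hs6` are hypothetical.

```python
# Specific prefixes from the mapping above, checked before the fallback.
HS_PREFIX_TO_COMMODITY = [
    ("260112", "Iron"),
    ("2603", "Copper"),
    ("261610", "Silver"),
    ("261690", "Gold"),
    ("710811", "Gold"),
    ("710812", "Gold"),
    ("2701", "Coal"),
    ("2709", "Oil"),
    ("2710", "Oil"),
    ("2713", "Oil"),
    ("2714", "Oil"),
    ("2715", "Oil"),
    ("2711", "Gas"),
]

# Any other code starting with these chapters counts as a nonenergy mineral.
FALLBACK_PREFIXES = ("25", "26", "28")


def commodity_for_hs6(code):
    """Return the commodity for an HS6 code, or None if we don't track it."""
    code = str(code).strip()
    for prefix, commodity in HS_PREFIX_TO_COMMODITY:
        if code.startswith(prefix):
            return commodity
    if code.startswith(FALLBACK_PREFIXES):
        return "Nonenergy minerals"
    return None
```

Codes should be passed as strings, since a numeric type would drop any leading zero.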

Wage and Salary Employment

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Wage & Salary Employment | BLS | NAICS code 21 ("extractives industries") | states (with commodities), counties (no commodities) | 2007-2016
  1. This data describes the number of people (full-time and part-time) employed in natural resource extraction that receive wages or salaries from companies.
  2. It comes from the Bureau of Labor Statistics, or BLS, specifically the Quarterly Census of Employment and Wages. The best way to get this data is to go to http://www.bls.gov/cew/datatoc.htm and download the single-file CSVs of annual averages (you will need to load titles).
  3. Use Annual Average Employment, not Quarterly numbers.
  4. Related discussion in issues:

Self-Employment

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
Self-Employment | BEA | one category called 'extractive industries' (21) that includes everything but renewables | states | 2007-2016
  1. This data describes the number of people (full-time and part-time) that are self-employed in natural resource extraction. Self-employed people don't receive wages/salary from a company because they own the company, either as a sole proprietor, in a partnership, or in a small business.
  2. It comes from the Bureau of Economic Analysis, or BEA, but must be calculated by subtracting SA27N Full- and Part-Time Wage and Salary Employment by NAICS Industry from SA25N Full- and Part-Time Employment by NAICS Industry. These tables are available here. Once on that page, click on "Annual State Personal Income and Employment" and use tables SA27N and SA25N within that section.
  3. At the state and county levels, the tables provide a total for NAICS 21 (Mining, the equivalent of extractive industries).
  4. Gotchas:
  • This data includes some double counting because people appear on multiple tax forms and the count for partnerships is a sample-based calculation. We should note that on the site.
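The subtraction described above (SA25N total employment minus SA27N wage and salary employment) can be sketched as below. The input shape, a dict keyed by (state, year), and the function name are assumptions made for illustration; the BEA tables themselves are laid out differently.

```python
def self_employment(total_employment, wage_salary_employment):
    """Return self-employment per (state, year) key.

    total_employment:        values from SA25N for NAICS 21
    wage_salary_employment:  values from SA27N for NAICS 21
    Keys missing from either table are skipped rather than guessed.
    """
    result = {}
    for key, total in total_employment.items():
        wage_salary = wage_salary_employment.get(key)
        if wage_salary is None:
            continue
        result[key] = total - wage_salary
    return result
```

Skipping keys that lack a counterpart avoids reporting total employment as if it were self-employment when one table has a gap.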

GDP

Note: see our data type overview for the level of data available for all data types.

Type | Source | Commodities | Geography | Years
GDP | BEA | one category called 'extractive industries' that includes everything but renewables | states | 2007-2016
  1. This data describes the contribution of natural resource extraction activities to national and state-level economic "size", or gross domestic product, measured in US dollars.
  2. The data comes from the US Bureau of Economic Analysis, or BEA, and is accessed via their API (user guide).
  • For national figures, we access the GDPbyIndustry dataset with the following parameters:
    • Industry=21 (NAICS code 21 describes "Mining, quarrying, and oil and gas extraction")

      Warning: The USEITI Report breaks NAICS code 21 down into its three constituent codes: 211, 212 and 213. We can do the same by passing Industry=211,212,213 and summing up the numbers to get the overall total.

    • TableID=5, which gives us "Value Added by Industry as a Percentage of GDP"

    • Frequency=A for annual frequency

  • For state figures, we access the RegionalProduct dataset with the following parameters:
    • GeoFips=STATE to get state-level data
    • KeyCode=GDP_SP gives us "GDP in current dollars (state annual product)"; see key-codes.json for possible values
    • In order to get the percentage of total GDP, we then request both IndustryId=1 (all industries) and IndustryId=6 (this is the equivalent of NAICS code 21), and calculate the percentage by dividing the latter by the former value.
  3. Source columns include:
  • Year
  • Industry code and description
  • GDP contribution ($)
  • For states, region name and FIPS code
  4. The API does have some quirks:
  • The IndustryDescription JSON key is misspelled as IndustrYDescription. Our JavaScript accounts for this and should continue working if they fix the typo.
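Two of the steps above lend themselves to a short illustration: the state-level percentage (the IndustryId=6 value divided by the IndustryId=1 value) and a lookup that tolerates the misspelled IndustrYDescription key. This is a Python sketch with hypothetical helper names; it is not the JavaScript the site uses, and it makes no calls to the BEA API.

```python
def get_case_insensitive(record, key):
    """Fetch a value by key, ignoring letter case.

    Works whether the API returns "IndustrYDescription" (the current
    misspelling) or "IndustryDescription" (if the typo is fixed).
    """
    wanted = key.lower()
    for name, value in record.items():
        if name.lower() == wanted:
            return value
    raise KeyError(key)


def extractive_share(all_industries_gdp, extractive_gdp):
    """Percentage of total GDP contributed by extractive industries."""
    return 100.0 * extractive_gdp / all_industries_gdp
```

A case-insensitive match is only one way to survive the typo; checking both spellings explicitly would work as well.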

Other information

Offshore Areas

Offshore areas are managed by the Bureau of Ocean Energy Management, or BOEM. Offshore areas are divided into four primary regions:

[Image: a map of BOEM offshore regions]

Each region is then subdivided into a number of smaller planning areas, each with a unique three-letter code:

  • Alaska
    • ALA: Aleutian Arc
    • ALB: Aleutian Basin
    • BFT: Beaufort Sea
    • BOW: Bowers Basin
    • CHU: Chukchi Sea
    • COK: Cook Inlet
    • GEO: St. George Basin
    • GOA: Gulf of Alaska
    • HOP: Hope Basin
    • KOD: Kodiak
    • MAT: St. Matthew-Hall
    • NAL: North Aleutian Basin
    • NAV: Navarin Basin
    • NOR: Norton Basin
    • SHU: Shumagin
  • Atlantic
    • FLS: Florida Straits
    • MDA: Mid Atlantic
    • NOA: North Atlantic
    • SOA: South Atlantic
  • Gulf
    • CGM: Central Gulf of Mexico
    • EGM: Eastern Gulf of Mexico
    • WGM: Western Gulf of Mexico
  • Pacific
    • CEC: Central California
    • NOC: Northern California
    • SOC: Southern California
    • WAO: Washington-Oregon

Planning areas are further subdivided into protraction areas, listed on BOEM's site.
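For joining offshore data back to its region, the planning-area list above can be kept as a lookup table. The sketch below copies the codes exactly as listed on this page; the names `PLANNING_AREAS`, `REGION_BY_CODE`, and `region_for` are hypothetical.

```python
# Region -> {planning area code: planning area name}, as listed above.
PLANNING_AREAS = {
    "Alaska": {
        "ALA": "Aleutian Arc", "ALB": "Aleutian Basin", "BFT": "Beaufort Sea",
        "BOW": "Bowers Basin", "CHU": "Chukchi Sea", "COK": "Cook Inlet",
        "GEO": "St. George Basin", "GOA": "Gulf of Alaska", "HOP": "Hope Basin",
        "KOD": "Kodiak", "MAT": "St. Matthew-Hall", "NAL": "North Aleutian Basin",
        "NAV": "Navarin Basin", "NOR": "Norton Basin", "SHU": "Shumagin",
    },
    "Atlantic": {
        "FLS": "Florida Straits", "MDA": "Mid Atlantic",
        "NOA": "North Atlantic", "SOA": "South Atlantic",
    },
    "Gulf": {
        "CGM": "Central Gulf of Mexico", "EGM": "Eastern Gulf of Mexico",
        "WGM": "Western Gulf of Mexico",
    },
    "Pacific": {
        "CEC": "Central California", "NOC": "Northern California",
        "SOC": "Southern California", "WAO": "Washington-Oregon",
    },
}

# Inverted index: planning area code -> region name.
REGION_BY_CODE = {
    code: region
    for region, areas in PLANNING_AREAS.items()
    for code in areas
}


def region_for(code):
    """Return the region for a three-letter planning area code, or None."""
    return REGION_BY_CODE.get(code.strip().upper())
```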
