This repository was archived by the owner on Mar 28, 2025. It is now read-only.

Detailed formatcheck.py documentation

echang97 edited this page Jul 17, 2019 · 23 revisions

Functions

add_item

Parameter(s):

  • key - Item to be added or modified
  • value - Unit of measurement to be associated with key
  • dct - The Dictionary that this is being applied to

If key is already in the dictionary, add value to the set associated with key.

Otherwise, associate the new key with a new set containing only value.

[Ex 1.] key = "Gas", value = "mcf" -> { "Gas": {"mcf"} }

[Ex 2.] key = "Geothermal - Electrical Generation", value = "Kilowatt Hours" 
        -> { "Geothermal - Electrical Generation": {"Kilowatt Hours"} }

        key = "Geothermal - Electrical Generation", value = "Thousands of Pounds" 
        -> { "Geothermal - Electrical Generation": {"Kilowatt Hours", "Thousands of Pounds"} }
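The behavior described above can be sketched as follows. This is a minimal illustration based only on the description and examples on this page, not the code from formatcheck.py; the `units` dictionary name is hypothetical.

```python
def add_item(key, value, dct):
    """Add value to the set stored under key, creating the set if needed."""
    if key in dct:
        # Key already present: extend its existing set of units.
        dct[key].add(value)
    else:
        # New key: start a set containing only this value.
        dct[key] = {value}


# Hypothetical dictionary used only for this illustration.
units = {}
add_item("Gas", "mcf", units)
add_item("Geothermal - Electrical Generation", "Kilowatt Hours", units)
add_item("Geothermal - Electrical Generation", "Thousands of Pounds", units)
```

Because the values are sets, adding the same unit twice for one key leaves the dictionary unchanged.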

get_com_pro

Parameter(s):

  • cols - Columns from Pandas DataFrame

Checks cols for "Commodity" or "Product"

Returns "n/a" if "Commodity" and "Product" are both present or both missing

Otherwise it returns whichever is present

[Ex 1.] cols = ["Commodity"] -> returns "Commodity"
[Ex 2.] cols = ["Product"] -> returns "Product"
[Ex 3.] cols = ["Commodity", "Product"] -> returns "n/a"
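One way to express this rule is shown below. It is a sketch written from the description above rather than the function's actual source, and it accepts any list-like collection of column names.

```python
def get_com_pro(cols):
    """Return "Commodity" or "Product", whichever one is present in cols.

    Returns "n/a" when both are present or both are missing.
    """
    has_commodity = "Commodity" in cols
    has_product = "Product" in cols
    if has_commodity == has_product:
        # Both present or both missing: no single answer.
        return "n/a"
    return "Commodity" if has_commodity else "Product"
```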

get_data_type

Parameter(s):

  • name - Name of the Excel file

Field(s):

  • lower - name in all lowercase letters
  • prefixes = ["cy","fy","monthly","company","federal","native","production","revenue","disbribution"]

Returns a String based on the Excel file given

If any entries from prefixes are found in name, they will be added to the final String

[Ex] name = "federal_production_CY03-18" -> returns "cyfederalproduction_"
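A sketch consistent with the example above follows. Two details are assumptions inferred from that example and not confirmed by the source: prefixes are matched as substrings of lower in list order, and a single "_" is appended at the end.

```python
# Copied verbatim from the Field(s) list above.
PREFIXES = ["cy", "fy", "monthly", "company", "federal", "native",
            "production", "revenue", "disbribution"]


def get_data_type(name):
    """Build a type String from the prefixes found in an Excel file name."""
    lower = name.lower()
    final = ""
    for prefix in PREFIXES:
        # Assumption: a prefix counts as found if it appears anywhere in the name.
        if prefix in lower:
            final += prefix
    # Assumption: the trailing underscore is inferred from the documented example.
    return final + "_"


print(get_data_type("federal_production_CY03-18"))  # cyfederalproduction_
```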

split_unit

Parameter(s):

  • string - String to be split

Returns a List of Strings separated either by the right-most opening parenthesis "(" or the left-most comma ","

[Ex 1.] string = "Gas (mcf)" -> ["Gas", "mcf"]
[Ex 2.] string = "Geothermal - Electrical Generation, Kilowatt Hours" 
        -> ["Geothermal - Electrical Generation", "Kilowatt Hours"]
[Ex 3.] string = "Geothermal - sulfur" -> ["Geothermal - sulfur", ""]
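The three cases can be reproduced with a short sketch like the one below. It is not the original implementation. In particular, checking for a parenthesis before a comma and stripping the closing ")" are assumptions made so that the examples above come out as documented.

```python
def split_unit(string):
    """Split "Item (unit)" or "Item, unit" into [item, unit]."""
    if "(" in string:
        # Split at the right-most "(" and drop the closing ")".
        item, unit = string.rsplit("(", 1)
        return [item.strip(), unit.strip().rstrip(")")]
    if "," in string:
        # Split at the left-most ",".
        item, unit = string.split(",", 1)
        return [item.strip(), unit.strip()]
    # No unit found: return the string with an empty unit.
    return [string, ""]
```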

Class: Setup

get_header

Parameter(s):

  • file - A Pandas DataFrame

Returns column names as a List

get_unit_dict

Returns a dictionary of items and their units

Class: FormatChecker

read_config

Parameter(s):

  • type - Prefix for config file represented by a String

Returns a dictionary based on the JSON file

get_w_count

Parameter(s):

  • file - A Pandas DataFrame

Returns a tuple of two counts: the number of "W"s found in Volume and the number of "Withheld"s found in State

Calendar Year  Land Category  Land Class  State     ...  Product                     Volume
2003           Onshore        Federal     CA        ...  Salt (tons)                 33,622
2003           Onshore        Federal     CA        ...  Soda Ash (tons)             W
2003           Onshore        Federal     CA        ...  Sodium Bi-Carbonate (tons)  W
2003           Onshore        Federal     CA        ...  Gas (mcf)                   4,885.6
2003           Onshore        Federal     Withheld  ...  Borate Products (tons)      31,124

Returns (2,1)
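The counting can be sketched as below. To keep the example dependency-free, the sample rows are held in a plain dict of column lists, and only the State and Volume columns are included. A Pandas DataFrame supports the same `file["Volume"]` lookup and iteration, so the same function body should work on one. The `count_matches` helper is hypothetical, and treating a missing column as zero is an assumption.

```python
def count_matches(file, column, marker):
    """Count cells in one column that equal marker (hypothetical helper)."""
    # Assumption: a missing column counts as zero matches.
    if column not in file:
        return 0
    return sum(1 for cell in file[column] if str(cell).strip() == marker)


def get_w_count(file):
    """Return (number of "W" in Volume, number of "Withheld" in State)."""
    return (count_matches(file, "Volume", "W"),
            count_matches(file, "State", "Withheld"))


# The State and Volume columns from the table above.
sample = {
    "State": ["CA", "CA", "CA", "CA", "Withheld"],
    "Volume": ["33,622", "W", "W", "4,885.6", "31,124"],
}
print(get_w_count(sample))  # (2, 1)
```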

check_header

Parameter(s):

  • file - A Pandas DataFrame

Iterates through the default header and checks whether each expected Field Name is present.

Prints a message if a Field Name is missing or in the wrong order.

Unexpected Field Names are printed separately.

[Ex] default = ["Month", "Calendar Year", "Land Class", "Land Category", "Commodity", "Volume"]
     columns = ["Moth", "Calendar Year", "Land Category", "Land Class", "Commodity", "Volume"]

-> "Month": Not Present, "Land Category": Unexpected Order, "Land Class": Unexpected Order
   New Cols: Moth
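The comparison in this example can be sketched as follows. The function name `compare_headers` and its two-list signature are hypothetical (the real method takes a DataFrame), results are returned, not printed, and treating "wrong order" as "found at a different position than in the default header" is an assumption.

```python
def compare_headers(default, columns):
    """Compare actual column names against the expected default header.

    Returns (issues, new_cols): issues maps each problematic default
    Field Name to a message, new_cols lists names not in the default.
    """
    issues = {}
    for position, name in enumerate(default):
        if name not in columns:
            issues[name] = "Not Present"
        elif columns.index(name) != position:
            # Assumption: order is judged by position in the header.
            issues[name] = "Unexpected Order"
    new_cols = [name for name in columns if name not in default]
    return issues, new_cols


default = ["Month", "Calendar Year", "Land Class", "Land Category", "Commodity", "Volume"]
columns = ["Moth", "Calendar Year", "Land Category", "Land Class", "Commodity", "Volume"]
issues, new_cols = compare_headers(default, columns)
```

With the inputs above, `issues` flags "Month" as Not Present and both "Land Class" and "Land Category" as Unexpected Order, while `new_cols` contains only "Moth".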

check_misc_cols

check_nan

check_unit_dict

read_config

Reads a pickle file
