pygcam is a Python package that provides classes, functions, and scripts for working with GCAM.
- Project workflow management framework that lets you define steps to run and run them all or run steps selectively.
- The main gt (GCAM tool) script, which provides numerous sub-commands and can be extended by writing plug-ins.
- The gt sub-commands facilitate key steps in working with GCAM, including:
  - Setting up experiments by modifying XML input files and configuration.xml
  - Running GCAM, locally or by queueing jobs on a Linux cluster
  - Querying the GCAM database to extract results to CSV files
  - Interpolating between time-steps and computing differences between baseline and policy cases
  - Plotting results
- Richard Plevin (rich@plevin.com)
- Updated system.cfg to include input files for GCAM 5.4 and 6.0.
- Note that pygcam has still not been tested against GCAM 6.0.
Updated default files in system.cfg, Darwin.cfg (macOS) and Windows.cfg for GCAM 5.4
Fixed error in "gt init" subcommand
Various updates to MCS runtime system:
- Added support for empirical distributions provided in a CSV file
- Allow user to override the SLURM "afterok" specifier, i.e., to use "after", meaning run policies even if baseline fails. Set using config parameter "SLURM.DependencyFlag".
- Added support for full-factorial simulations. All parameters must be defined using distributions with fixed boundaries, i.e., 'Constant', 'Binary', 'Integers', 'Grid', or 'Sequence'.
- Updated docs to add reference to MCS subcommands on "gcamtool" page.
- Added XML schema elements for distribution metadata for parameters.xml file.
- Added option to gensim subcommand to generate CSV with metadata and distributions
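The full-factorial design mentioned above amounts to running every combination of the parameters' discrete values. A minimal sketch of that enumeration follows; the parameter names and the `full_factorial` helper are hypothetical, and this is not pygcam's own code.

```python
from itertools import product


def full_factorial(param_values):
    """Return one dict per trial, covering every combination of values.

    param_values maps a parameter name to its finite list of values
    (as a 'Constant', 'Binary', 'Integers', 'Grid', or 'Sequence'
    distribution would supply).
    """
    names = list(param_values)
    combos = product(*(param_values[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical parameters, for illustration only.
params = {
    "biofuel_cost_multiplier": [0.8, 1.0, 1.2],  # e.g., a 'Grid'
    "policy_enabled": [0, 1],                    # e.g., a 'Binary'
}
trials = full_factorial(params)
print(len(trials))  # 3 * 2 combinations
```

The trial count is the product of the per-parameter value counts, which is why every parameter needs fixed boundaries.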
Updated executable name in Windows to "gcam.exe" and macOS to "Release/gcam".
Added support for custom GCAM wrapper filter function to process lines of output from GCAM, settable in config file as "GCAM.WrapperFilterFunction", which must be of the format: GCAM.WrapperFilterFunction = /path/to/moduleDirectory:module.functionName
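As an illustration of the "directory:module.function" format, the sketch below splits such a value and imports the named function. The helper names `parse_function_spec` and `load_function` are hypothetical and are not part of pygcam; the arguments pygcam passes to the filter function are not described here, so the sketch stops at loading it.

```python
import importlib
import sys


def parse_function_spec(spec):
    """Split '/path/to/moduleDirectory:module.functionName' into parts."""
    # Split on the last colon so Windows drive letters (C:/...) survive.
    directory, sep, dotted_name = spec.rpartition(":")
    module_name, dot, function_name = dotted_name.rpartition(".")
    if not (sep and dot and directory and module_name and function_name):
        raise ValueError("expected /path/to/dir:module.functionName, got %r" % spec)
    return directory, module_name, function_name


def load_function(spec):
    """Import the module from the given directory and return the function."""
    directory, module_name, function_name = parse_function_spec(spec)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


print(parse_function_spec("/path/to/moduleDirectory:module.functionName"))
```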
Added model_years() function to utils to retrieve list of model years defined in modeltime.xml, and modified setup step to use model_years()
Created new subcommand "gt batch" to run batch query files directly.
Added support for splitting LandLeaf and land_allocation columns into land_use, basin, irrigation, and soil type info.
Added quotes around file and directory arguments in example project.xml
- Updated default files in system.cfg for GCAM 5.3
- Fixed bug in query.runBatchQuery() reported by user huanglin6385. (Thanks!)
Fixed long-standing bug in "init" sub-command. Note that this also fixed a problem with the generated documentation at pygcam.readthedocs.org that prevented the sub-command docs from generating correctly.
Numerous internal improvements to Monte Carlo / cluster management subsystem
The XML <query> element now takes an optional "states" parameter to set the default scope of queries. The old default behavior is unchanged: the query is run on the 32 global GCAM regions.
- The full set of options is:
  'withGlobal'  # return states and global regions in one list
  'withUSA'     # return states and USA region only
  'only'        # return states only, excluding the 32 global regions
  'none'        # return only global regions, i.e., no states
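The four options can be read as a simple selection rule over two region lists. The sketch below encodes that reading; the function name `query_regions` and the ordering of the returned list are assumptions, not pygcam's implementation.

```python
def query_regions(states_option, global_regions, states):
    """Return the regions a query would cover for a given "states" value."""
    if states_option == "withGlobal":
        return list(states) + list(global_regions)
    if states_option == "withUSA":
        return list(states) + ["USA"]
    if states_option == "only":
        return list(states)
    if states_option == "none":
        return list(global_regions)
    raise ValueError("unknown states option: %r" % states_option)


# Abbreviated, illustrative region lists.
global_regions = ["USA", "Canada", "Brazil"]
states = ["CA", "NY"]
print(query_regions("withUSA", global_regions, states))
```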
New "region discovery" feature looks to the XML files in use to see if you're running GCAM-USA. This hasn't been tested on all recent versions of GCAM. To disable it, set the config variable:
GCAM.RegionDiscovery = False
User can now set location of JAVA home directory using config variable "GCAM.JavaHome"
Updated version of Dash and modified code as necessary for compatibility
- "transport" command and function "transportTechEfficiency"
- new "buildingElec" command for creating building electrification policies
- updates to "building" command
- numerous updates to callable functions setRegionalShareWeights and setInterpolationFunction
- modified setup step to remove stale local-xml files when setting up a scenario
- numerous updates to MCS subsystem
- improved support for using "restart" files
Many thanks to Robbie Orvis of Energy Innovation for funding the development of the three new policy-oriented features: "res", "building", and "transport".
Updated "res" sub-command to generate Renewable Energy Standards for 32 regions and for GCAM-USA.
Updated "init" sub-command to recognize recent GCAM versions and updated the default version to 5.1.3. (This will be updated to GCAM 5.2 after testing with that version is completed.)
Added support for making incremental improvements to building energy efficiencies, including:
- A new sub-command, "building", which creates a CSV template that can be modified to set the percentage improvement in building energy use by sector, subsector, technology, energy input, and year.
- A new callable method, "buildingTechEfficiency", that converts the CSV file to the required XML and adds an entry to the configuration.xml to load the generated file.
- GCAM regions are now read from the data system, if present. This supports use of other regionalizations.
- Added "callable" functions (callable from scenarios.xml) to:
  - Freeze population at any given year
  - Modify non-CO2 emission coefficients
  - Perform string replacement in generated config files (e.g., to change which "xml" dir to read from)
- Adjusted which files to copy/link on Windows
- Added "exe/restart" to list of files to copy
- MCS: xlabel on distribution plots is now set from units column in database "output" table
- Improved RF subplots
- Added ability to specify RES policy in a simple CSV file.
- Fix documentation build problems
- Added "res" sub-command, which reads a new XML file describing a set of renewable energy standards that can vary by region, year, and technology, and writes a GCAM XML input file that implements the policies.
- Added option to gt analyze to limit the number of variables displayed in tornado plots.
- Fixed another bug in gt init in setting Java home directory.
- Fixed error in new land-protection code.
- Version number is taken from gcam directory name (if possible) if executable doesn't accept --versionID flag.
- Corrected version number of tornado package in macOS YML file.
- Fixed bug preventing gt init from working properly in interactive mode.
- Updated YML files for creating pygcam-ready Anaconda environments for Python 2 and 3.
- Updated installation instructions to correspond with new YML files.
- Added string match functions to Constraint: startswith, endswith, contains are now supported.
- Fixed pathname bug that prevented multiple function calls on the same file (specified in scenarios.xml) to work correctly.
- Fixed detection of symbolic links on Windows
- Added support for suppressing "restart" files in v5.1.2 and later. (Set config variable GCAM.WriteRestartFiles = False)
- Fix for GCAM v5.1.2: create required 'restart' directory in sandbox 'exe' folder
- Pygcam now runs under Python versions 2.7 and 3.7.
- Updated example/tutorial project files to use GCAM 5.x query names
- Bug fixes in support of 5.1.1 on Windows
- Added option (-P/--asPercentChange) to diff sub-command to compute percent-change.
- Several revisions to Monte Carlo Simulation processing:
  - Made policy scenarios dependent on completion of baseline scenarios so that for any trial number, the baseline runs first, after which any policies can run. This affects only uses of gt runsim for which both a baseline and at least one policy scenario are specified.
  - Updated ipyparallel requirement to version 6.2.2 on macOS and Linux (not used on Windows).
  - Added new option (-E filename) to analyze sub-command to write all inputs and outputs to a single CSV file.
  - The default is now to shut down idle engines when there are no unallocated tasks. This can be disabled with the new -I/--dontShutdownIdle flag.
  - Added new distribution for logfactor Triangle: logfactor=3 means a triangle with min, mode, max = (1/3, 1, 3).
  - Added symlink from "output" to temporary directory if MCS.TempOutputDir is defined, allowing output to be placed, e.g., on an SSD drive local to a node.
  - The number of engines to run is now computed from the indicated trials, though you can still force a value using gt runsim -n XXX. The limit set by IPP.MaxEngines is respected in either case.
  - Created new pseudo-distribution that returns values from a discrete list, in order. Sequence is used to produce a repeating array of values in the order given. Use this to run an explicit set of parameter values. Example: <Sequence values="4, 6, 43.2"/>
  - Converted various dicts to OrderedDicts, allowing user to place write funcs in parameters.xml in an order that ensures needed files are saved before read by other writeFuncs.
  - Added two keywords to the <Result> element in results.xml:
    - percentage divides the difference (scenario - baseline) by baseline to convert the result into a percent change. (Use only with "diff" type results.)
    - cumulative sums values over the full time horizon.
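To make the logfactor Triangle and Sequence entries above concrete, here is a minimal sketch using only the Python standard library. It assumes the triangle is defined in linear space exactly as stated (min = 1/logfactor, mode = 1, max = logfactor); the function names are hypothetical and this is not pygcam's sampling code.

```python
import random
from itertools import cycle, islice


def logfactor_triangle_bounds(logfactor):
    """Return (min, mode, max) for a triangle spanning 1/logfactor to logfactor."""
    if logfactor <= 1:
        raise ValueError("logfactor must be greater than 1")
    return (1.0 / logfactor, 1.0, float(logfactor))


def sample_logfactor_triangle(logfactor, n, rng=None):
    """Draw n values from the triangular distribution described above."""
    rng = rng or random.Random()
    low, mode, high = logfactor_triangle_bounds(logfactor)
    # random.triangular takes (low, high, mode), in that order.
    return [rng.triangular(low, high, mode) for _ in range(n)]


def sequence_values(values, n):
    """Return n values by repeating the given list in order."""
    return list(islice(cycle(values), n))


print(logfactor_triangle_bounds(3))
print(sequence_values([4, 6, 43.2], 5))
```

With three values and five trials, the sequence wraps around after the third trial.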
- Corrected reading of GCAM's reported version number to use only the first 2 digits. That is, version "5.1.1" is now correctly recognized as "5.1".
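The truncation described above can be sketched in a few lines. The helper name `major_minor` is hypothetical and only illustrates the stated behavior.

```python
def major_minor(version):
    """Keep only the first two components of a dotted version string."""
    parts = version.strip().split(".")
    return ".".join(parts[:2])


print(major_minor("5.1.1"))
```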
- Support for GCAM v5.1
- Corrected bug in Windows defaults that had set GCAM.Temp = C:/tmp, which is not writable by non-admin users. The default is now %(Home)s/tmp.
- Updated approach to land protection to support new geographical land units
- Support for change in the location of model interface in 5.1
- Monte Carlo Simulation improvements:
- Added units to database and results.xml schema
- Added support for setting land protection based on reg and basin
- Added support for lowbound and highbound attributes in the <Distribution> element. Bounds are applied to values produced by add/multiply/replace. This can be used to ensure that the resulting values are, say, between 0 and 1.
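One way to picture the lowbound and highbound attributes is as a clamp applied after the drawn value has been combined with the original. The sketch below assumes clamping; the source does not say whether out-of-range values are clamped or handled differently, and the function name is hypothetical.

```python
def apply_with_bounds(base, draw, mode, lowbound=None, highbound=None):
    """Combine a base value with a random draw, then clamp to the bounds."""
    if mode == "add":
        value = base + draw
    elif mode == "multiply":
        value = base * draw
    elif mode == "replace":
        value = draw
    else:
        raise ValueError("mode must be 'add', 'multiply', or 'replace'")

    # Assumption: values outside the bounds are clamped to the nearest bound.
    if lowbound is not None:
        value = max(value, lowbound)
    if highbound is not None:
        value = min(value, highbound)
    return value


# A share of 0.9 plus a draw of 0.3 would exceed 1 without the upper bound.
print(apply_with_bounds(0.9, 0.3, "add", lowbound=0, highbound=1))
```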
- Numerous tweaks to Monte Carlo simulation subsystem to allow placement of output and temporary files in chosen directories. The model's memory footprint has grown substantially in v5.0, creating challenges for earlier approaches to running many GCAM instances on a cluster. These changes allow the XML database to be placed on a local tmp or SSD drive on a compute node while query output can be written to persistent storage.
- Preliminary support for GCAM v5.1 -- note that pygcam v1.1.3 does not yet work completely with GCAM 5.1, which has moved the XML input files to a new location. Stay tuned!
- Performance improvements in writing to the sqlite3 database holding MCS status and results.
- Updated support for Monte Carlo simulations on NERSC.gov.
- Added preliminary support for dockerizing GCAM and pygcam. See, for example, https://hub.docker.com/r/plevin/pygcam-v1.0.1. The idea is that a Docker container is pre-loaded with some version of GCAM and pygcam, and it can be run using a script that mounts host directories inside the container and maps host locations in .pygcam.cfg to locations in the Linux container. Let me know if you want to use this and I can share the work in progress.
- Corrected .yml files to put semver specification in correct section.
- Allow gt --version to run without having a .pygcam.cfg file in place.
- Updated instructions for running on Windows to include using the Anaconda prompt.
- Configuration variable GCAM.VersionNumber is set based on the GCAM executable's reported version.
- Added code to gcam sub-command to create link to java libs on macOS, as is done in the run-gcam.command script in the Mac distribution.
- A bug in the ModelInterface code in gcam-v4.4 prevented the pygcam query sub-command from working. Please install gcam-v4.4.1 (when available) or update your gcam-v4.4 installation, replacing the file .../input/gcam-data-system/_common/ModelInterface/src/ModelInterface.jar with the updated file, available here.
- Modified init sub-command to use prompt_toolkit to provide filename completion via the tab key. This works on Windows only from a standard command prompt, not from a Cygwin terminal. (The init sub-command works there, but without filename completion.)
- Added check that config variable GCAM.VersionNumber matches what the GCAM executable reports. If different, the config var is set as per the GCAM executable.
- Modified .yml installation files to deal with problem installing SALib.
- Much improved init sub-command and detection of missing configuration file, guiding user to run the init command. The init command now sets up the tutorial files by default.
- Improved tutorial to work with files provided by init, and improved documentation in general.
- Configuration defaults are now saved to ~/.pygcam.defaults rather than cluttering the ~/.pygcam.cfg configuration file with this information.
- Eliminated config vars GCAM.Root and GCAM.Current in favor of GCAM.RefWorkspace. Some users may have to adjust their config files.
- Revised installation procedure now uses Anaconda environments to ensure Python package compatibility. Dropped "pyinstaller" versions.
- Created "conditional XML" to allow portions of XML input files to be selected based on the value of configuration and/or environment variables.
- All environment variables are now available in the configuration system as $-prefixed names, as in Unix shells. That is, you can access, say, the USER environment variable as %($USER)s in the config file.
- Modified configuration of the logging system to allow Log Level to be set globally and/or by individual modules.
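The $-prefixed environment variable access described above can be emulated with the standard library's configparser, whose %(name)s interpolation accepts extra variables at lookup time. This is a sketch of the idea only, not pygcam's configuration code; the section name, the sandbox path, and the `read_with_env` helper are illustrative assumptions.

```python
import configparser

CONFIG_TEXT = """
[MyProject]
GCAM.SandboxRoot = /scratch/%($USER)s/sandbox
"""


def read_with_env(text, env):
    """Parse config text and expose env entries as '$'-prefixed variables."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    # Double any '%' so literal percent signs survive interpolation.
    env_vars = {"$" + name: value.replace("%", "%%") for name, value in env.items()}
    return parser, env_vars


# In real use, os.environ would be passed instead of this small dict.
parser, env_vars = read_with_env(CONFIG_TEXT, {"USER": "alice"})
print(parser.get("MyProject", "GCAM.SandboxRoot", vars=env_vars))
```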
- Created browser-based "MCS Explorer" to help analyze Monte Carlo results. Features include distributions of results, tornado plots of uncertainty importance, scatterplots of inputs vs outputs, and an interactive parallel-coordinate plot for exploring parameter interactions.
- Created browser-based GUI that provides interactive access to all features of the "gt" (gcamtool) command.
- Merged pygcam-mcs into pygcam. Use command gt mcs on to enable the Monte Carlo features. Note that MCS support is available only on Linux currently.
- Created sub-command ippsetup to configure ipython-parallel for the Slurm resource manager. Support for PBS and LSF is possible if users request it.
- Re-designed the MCS framework to use ipython-parallel. Workers now receive instructions from the ipyparallel controller and return results to the controller, which updates the database.
- Added "optional" attribute to the <step> element to allow some steps to be defined for occasional use. Elements marked optional="true" are run only if explicitly mentioned on the command-line (via the -s flag).
- The "query" sub-command now accepts arguments (+b and +B) to control processing of pre-formed batch query files.
- Modified all "global" single-letter arguments to use "+" prefix rather than "-" prefix, e.g., "gt +P my-project run" to specify the project to run. Long names retain the "--" prefix, e.g., "gt --projectName my-proj".
- No new features, just updates to get documentation building properly on ReadTheDocs.org.
- Created "init" command to interactively set key config variables
- Added config variables GCAM.LogFileFormat and GCAM.LogConsoleFormat to customize the messages produced by the logging system.
- Added setPriceElasticity function, callable from scenarios.xml scripts
- Improved GCAM installation script to work across all 3 GCAM platforms.
- Fixed home drive / home directory access on Windows
- Added "saveAs" attribute to query specification to allow a query to be rewritten (i.e., aggregated) different ways and saved to CSV files with different names.
- Fixed bugs in pyinstaller versions
- Changed default value of GCAM.SandboxRoot from {GCAM.Root}/ws to {GCAM.Root}/sandbox
- Added "mi" sub-command to invoke ModelInterface from the command-line after creating a model_interface.properties file that refers to the project's custom query file (if GCAM.MI.QueryFile is set) or to the reference query file.
- Various fixes for the "one-directory" version of pygcam installer
- Improved install-gcam.py script
- Addressed matplotlib issue on Macs
Added label to identify default scenario group in listing groups via "gt run -G"
Added function to carbonTax.py to link land-use change CO2 to carbon tax or cap policies:
genLinkedBioCarbonPolicyFile(filename, market='global', regions=None, forTax=True, forCap=False)
Also added function (bioCarbonTax) callable from XML setup file to access this feature.
Added initial support to integrate pygcam-mcs (coming soon!)
Made the <scenariosFile> element optional in project.xml, using the value of GCAM.ScenarioSetupFile by default.
Added function callable from setup XML, <protectionScenario name="xxx"/>, which indicates a protection scenario to use from the file defined by config variable GCAM.ProtectionXmlFile.
Reversed previous modification to handling of "gt config -e" (edit config file) which had placed quotes around the value of GCAM.TextEditor. This breaks commands like "emacs -nw" since this is now seen as the command name. Solution is for users with spaces within a command name to add the quotes in the config file, e.g.,
GCAM.TextEditor = "c:/Programs/Some Path With Spaces/someEditor.exe"
Added check to prevent deletion of files within reference workspace, which could happen under specific circumstances with symbolic links.
Added new "srcGroupDir" attribute to <scenario> element to identify a directory holding static XML files for a scenario, allowing related scenarios to share these files without requiring copying or symlinks.
- Minor adjustments to setup to label documentation with correct version and to allow symlink warning for Windows to be suppressed by setting config var GCAM.SymlinkWarning = False
- Fixed lingering symlink issues on Windows version.
- Fixed several problems with Windows version:
  - Whereas on Linux and OS X, the user's home directory is unambiguous, Windows has both HOMESHARE and HOMEPATH, at least one of which should be non-empty, but neither is guaranteed correct. Thus for Windows, the user can define PYGCAM_HOME to be the folder in which to create the .pygcam.cfg file. Pygcam looks for the first directory found searching in the order PYGCAM_HOME, HOMESHARE, and finally HOMEPATH.
  - Pygcam was attempting to symlink some files and failing if the Windows user didn't have symlink permission. This has been corrected to copy in all cases if symlinks fail.
  - When copying is required, pygcam was copying more than was required from the reference workspace. (With v4.3, the "input" folder holds much more than just XML files...) The copying is now limited to folders containing XML files. (But it's still best if you can arrange to have permission to create symbolic links, since that avoids all the copying.)
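The home-directory search order described above (PYGCAM_HOME, then HOMESHARE, then HOMEPATH) can be sketched as follows. The function name is hypothetical, and treating "found" as "the variable is set and names an existing directory" is an assumption about pygcam's behavior.

```python
import os


def find_windows_home(env=None):
    """Return the first usable home directory, or None if none is found."""
    env = os.environ if env is None else env
    for name in ("PYGCAM_HOME", "HOMESHARE", "HOMEPATH"):
        path = env.get(name)
        # Skip variables that are unset, empty, or point to a missing folder.
        if path and os.path.isdir(path):
            return path
    return None


print(find_windows_home({"PYGCAM_HOME": os.getcwd()}))
```

Passing a plain dict instead of os.environ makes the lookup easy to test without changing the real environment.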
If you were stymied by the installation process, you can try the new zipped all-in-one directory that bundles everything needed to run gcamtool (the "gt" command) without any additional downloads or installation steps other than setting your PATH variable. This works only for Mac and Windows. See http://pygcam.readthedocs.io/en/latest/install.html for details.
A new feature of the "run" sub-command lets you run a scenario group on a cluster with one command. The baseline is queued and all policy scenarios are queued with a dependency on completion of the baseline job. Just specify the -D option to the run sub-command.
You can run all scenarios for all scenario groups of a project this way by specifying the -D (or --distribute) and -a (or --allGroups) flags together. All baselines will start immediately with all policy scenarios queued as dependent on the corresponding baseline.
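On a Slurm cluster, this kind of baseline-then-policies queueing is typically expressed with sbatch's --dependency option. The sketch below only builds the command lines and leaves submission to a caller-supplied function; the script names and helper functions are hypothetical, and pygcam's actual queueing code may differ.

```python
def sbatch_command(script, depends_on=None, dependency_flag="afterok"):
    """Build an sbatch command line, optionally dependent on another job."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print just the job ID
    if depends_on is not None:
        cmd.append("--dependency=%s:%s" % (dependency_flag, depends_on))
    cmd.append(script)
    return cmd


def queue_group(baseline_script, policy_scripts, submit, dependency_flag="afterok"):
    """Queue the baseline first, then each policy dependent on it.

    submit is any callable that runs a command list and returns the job ID,
    e.g., a thin wrapper around subprocess.run.
    """
    baseline_id = submit(sbatch_command(baseline_script))
    return [
        submit(sbatch_command(script, baseline_id, dependency_flag))
        for script in policy_scripts
    ]


print(sbatch_command("policy.sh", depends_on="12345"))
```

Using "after" rather than "afterok" as the dependency flag would let policy jobs start even if the baseline job fails.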
The requirement to install xmlstarlet has been eliminated: all XML manipulation is now coded in Python, but it's still fast since it uses the same libxml2 library that xmlstarlet is based on.
All configuration variables have been updated with defaults appropriate for GCAM 4.3.
The "group" attribute of project <step> elements is now treated as a regular expression if an exact match is not found. So if you have, say, groups FuelShock-0.9 and FuelShock-1.0, you can declare a step like the following that applies to both groups:
<step name="plotCI" runFor="policy" group="FuelShock"> ... some command ... </step>
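The exact-match-then-regex rule for the "group" attribute can be sketched as below. The function name is hypothetical, and the use of re.match (anchored at the start of the group name) is an assumption; the source does not say which matching function pygcam uses.

```python
import re


def step_applies(step_group, active_group, known_groups):
    """Decide whether a step's group attribute covers the active group."""
    if step_group in known_groups:
        # An exact group name exists, so it is used literally.
        return step_group == active_group
    # Otherwise the attribute is treated as a regular expression.
    return re.match(step_group, active_group) is not None


groups = ["FuelShock-0.9", "FuelShock-1.0"]
print([g for g in groups if step_applies("FuelShock", g, groups)])
```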
Updated carbon tax generator. This can be called from a scenarios.xml file as follows (default values are shown):
<function name="taxCarbon">initialValue, startYear=2020, endYear=2100, timestep=5, rate=0.05, regions=GCAM_32_REGIONS, market='global'</function>
- The regions argument must be a list of regions in Python syntax, e.g., ["USA"] or ["USA", "EU27"].
- It creates the carbon tax policy in a file called carbon-tax-{market-name}.xml, which is added automatically to the current configuration file.
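To show how the taxCarbon arguments fit together, the sketch below generates a tax value for each model year. It assumes that rate is an annual growth rate compounded from startYear, which the text above does not state; the function names are hypothetical and this is not the pygcam generator itself.

```python
def carbon_tax_trajectory(initial_value, start_year=2020, end_year=2100,
                          timestep=5, rate=0.05):
    """Return (year, tax) pairs, growing the tax at `rate` per year.

    Assumption: compound annual growth starting from start_year.
    """
    return [
        (year, initial_value * (1 + rate) ** (year - start_year))
        for year in range(start_year, end_year + 1, timestep)
    ]


def carbon_tax_filename(market="global"):
    """File name pattern given in the text: carbon-tax-{market-name}.xml."""
    return "carbon-tax-%s.xml" % market


for year, tax in carbon_tax_trajectory(25.0)[:3]:
    print(year, round(tax, 2))
print(carbon_tax_filename())
```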