Merge ff642a5 into e1ab9a0

EducationalTestingService · Mar 13, 2020 · b5dec1a
2 parents e1ab9a0 + ff642a5
commit b5dec1a
Showing 29 changed files with 1,934 additions and 511 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -9,10 +9,11 @@ notifications:
 env:
   global:
   - COVERALLS_PARALLEL=true
+  - BINPATH=${HOME}/miniconda3/envs/rsmenv/bin
   matrix:
   - TESTFILES="tests/test_experiment_rsmtool_1.py"
   - TESTFILES="tests/test_comparer.py tests/test_configuration_parser.py tests/test_experiment_rsmtool_2.py"
-  - TESTFILES="tests/test_analyzer.py tests/test_experiment_rsmeval.py tests/test_fairness_utils.py tests/test_prmse_utils.py tests/test_container.py tests/test_test_utils.py"
+  - TESTFILES="tests/test_analyzer.py tests/test_experiment_rsmeval.py tests/test_fairness_utils.py tests/test_prmse_utils.py tests/test_container.py tests/test_test_utils.py tests/test_cli.py"
   - TESTFILES="tests/test_experiment_rsmcompare.py tests/test_experiment_rsmsummarize.py tests/test_modeler.py tests/test_preprocessor.py tests/test_writer.py tests/test_experiment_rsmtool_3.py"
   - TESTFILES="tests/test_experiment_rsmpredict.py tests/test_reader.py tests/test_reporter.py tests/test_transformer.py tests/test_utils.py tests/test_experiment_rsmtool_4.py"
 sudo: false

diff --git a/DistributeTests.ps1 b/DistributeTests.ps1
@@ -43,6 +43,7 @@ elseif ($agentNumber -eq 3) {
     $testsToRun = $testsToRun + "tests/test_fairness_utils.py"
     $testsToRun = $testsToRun + "tests/test_prmse_utils.py"
     $testsToRun = $testsToRun + "tests/test_test_utils.py"
+    $testsToRun = $testsToRun + "tests/test_cli.py"
 }
 elseif ($agentNumber -eq 4) {
     $testsToRun = $testsToRun + "tests/test_experiment_rsmcompare.py"

diff --git a/doc/api.rst b/doc/api.rst
@@ -177,6 +177,7 @@ From :py:mod:`~rsmtool.transformer` Module
 From :py:mod:`~rsmtool.utils` Module
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. autofunction:: rsmtool.utils.commandline.generate_configuration
 .. _agreement_api:
 .. autofunction:: rsmtool.utils.metrics.agreement
 .. _dsm_api:

diff --git a/doc/config_rsmcompare.rst b/doc/config_rsmcompare.rst
@@ -3,7 +3,7 @@
 Experiment configuration file
 """""""""""""""""""""""""""""
 
-This is a file in ``.json`` format that provides overall configuration options for an ``rsmcompare`` experiment. Here's an example configuration file for `rsmcompare <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmcompare/config_rsmcompare.json>`_.
+This is a file in ``.json`` format that provides overall configuration options for an ``rsmcompare`` experiment. Here's an `example configuration file <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmcompare/config_rsmcompare.json>`_ for ``rsmcompare``. To make it easy to get started with ``rsmcompare``, we provide a way to automatically generate a configuration file that you can then edit based on your data and your needs. To do so, simply run ``rsmcompare generate`` at the command line. If you have :ref:`subgroups <subgroups_rsmtool>` in your data that you want to include in your analyses, run ``rsmcompare generate --subgroups`` instead. Next, we describe all of the ``rsmcompare`` configuration fields in detail.
 
 There are seven required fields and the rest are all optional. We first describe the required fields and then the optional ones (sorted alphabetically).
 

diff --git a/doc/config_rsmeval.rst b/doc/config_rsmeval.rst
@@ -3,7 +3,7 @@
 Experiment configuration file
 """""""""""""""""""""""""""""
 
-This is a file in ``.json`` format that provides overall configuration options for an ``rsmeval`` experiment. Here's an example configuration file for `rsmeval <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmeval/config_rsmeval.json>`_.
+This is a file in ``.json`` format that provides overall configuration options for an ``rsmeval`` experiment. Here's an `example configuration file <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmeval/config_rsmeval.json>`_ for ``rsmeval``. To make it easy to get started with ``rsmeval``, we provide a way to automatically generate a configuration file that you can then edit based on your data and your needs. To do so, simply run ``rsmeval generate`` at the command line. If you have :ref:`subgroups <subgroups_eval>` in your data that you want to include in your analyses, run ``rsmeval generate --subgroups`` instead. Next, we describe all of the ``rsmeval`` configuration fields in detail.
 
 There are four required fields and the rest are all optional. We first describe the required fields and then the optional ones (sorted alphabetically).
 

diff --git a/doc/config_rsmpredict.rst b/doc/config_rsmpredict.rst
@@ -2,7 +2,7 @@
 
 Experiment configuration file
 """""""""""""""""""""""""""""
-This is a file in ``.json`` format that provides overall configuration options for an ``rsmpredict`` experiment. Here's an example configuration file for `rsmpredict <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmpredict/config_rsmpredict.json>`_.
+This is a file in ``.json`` format that provides overall configuration options for an ``rsmpredict`` experiment. Here's an `example configuration file <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmpredict/config_rsmpredict.json>`_ for ``rsmpredict``. To make it easy to get started with ``rsmpredict``, we provide a way to automatically generate a configuration file that you can then edit based on your data and your needs. To do so, simply run ``rsmpredict generate`` at the command line. Next, we describe all of the ``rsmpredict`` configuration fields in detail.
 
 There are three required fields and the rest are all optional. We first describe the required fields and then the optional ones (sorted alphabetically).
 

diff --git a/doc/config_rsmsummarize.rst b/doc/config_rsmsummarize.rst
@@ -3,7 +3,7 @@
 Experiment configuration file
 """""""""""""""""""""""""""""
 
-This is a file in ``.json`` format that provides overall configuration options for an ``rsmsummarize`` experiment. Here's an example configuration file for `rsmsummarize <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmsummarize/config_rsmsummarize.json>`_.
+This is a file in ``.json`` format that provides overall configuration options for an ``rsmsummarize`` experiment. Here's an `example configuration file <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmsummarize/config_rsmsummarize.json>`_ for ``rsmsummarize``. To make it easy to get started with ``rsmsummarize``, we provide a way to automatically generate a configuration file that you can then edit based on your data and your needs. To do so, simply run ``rsmsummarize generate`` at the command line. Next, we describe all of the ``rsmsummarize`` configuration fields in detail.
 
 There are two required fields and the rest are all optional. We first describe the required fields and then the optional ones (sorted alphabetically).
 

diff --git a/doc/config_rsmtool.rst b/doc/config_rsmtool.rst
@@ -3,7 +3,7 @@
 Experiment configuration file
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This is a file in ``.json`` format that provides overall configuration options for an ``rsmtool`` experiment. Here's an example configuration file for `rsmtool <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmtool/config_rsmtool.json>`_.
+This is a file in ``.json`` format that provides overall configuration options for an ``rsmtool`` experiment. Here's an `example configuration file <https://github.com/EducationalTestingService/rsmtool/blob/master/examples/rsmtool/config_rsmtool.json>`_ for ``rsmtool``. To make it easy to get started with ``rsmtool``, we provide a way to automatically generate a configuration file that you can then edit based on your data and your needs. To do so, simply run ``rsmtool generate`` at the command line. If you have :ref:`subgroups <subgroups_rsmtool>` in your data that you want to include in your analyses, run ``rsmtool generate --subgroups`` instead. Next, we describe all of the ``rsmtool`` configuration fields in detail.
 
 There are four required fields and the rest are all optional. We first describe the required fields and then the optional ones (sorted alphabetically).
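
The documentation changes above describe a ``generate`` sub-command with an optional ``--subgroups`` flag, and the ``rsmcompare.py`` hunk later in this commit keeps old-style invocations working by prepending ``run`` when no sub-command is given. The following is a minimal sketch of that pattern using plain ``argparse``. The names ``build_parser``, ``normalize_args`` and ``VALID_SUBCOMMANDS`` are hypothetical and only illustrate the idea; they are not rsmtool's ``setup_rsmcmd_parser`` or ``VALID_PARSER_SUBCOMMANDS``, whose exact contents are not shown in this diff.

```python
import argparse

# Assumed for illustration; the real constant lives in rsmtool.utils.constants.
VALID_SUBCOMMANDS = ['run', 'generate']


def build_parser(prog):
    """Build a parser with "run" and "generate" sub-commands (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog=prog)
    subparsers = parser.add_subparsers(dest='subcommand')

    # "run" takes a configuration file and an optional output directory
    run_parser = subparsers.add_parser('run')
    run_parser.add_argument('config_file')
    run_parser.add_argument('output_dir', nargs='?', default='.')

    # "generate" only takes the optional --subgroups flag
    generate_parser = subparsers.add_parser('generate')
    generate_parser.add_argument('--subgroups', action='store_true')
    return parser


def normalize_args(argv):
    """Prepend "run" unless the first argument is a sub-command or a help flag."""
    if argv and argv[0] not in VALID_SUBCOMMANDS + ['-h', '--help']:
        return ['run'] + argv
    return argv


parser = build_parser('rsmtool')

# old-style invocation: rsmtool config.json output
old_style = parser.parse_args(normalize_args(['config.json', 'output']))

# new-style invocation: rsmtool generate --subgroups
new_style = parser.parse_args(normalize_args(['generate', '--subgroups']))
```

Unlike the ``sys.argv[1]`` lookup in the commit, this sketch checks that the argument list is non-empty before indexing into it, so an invocation with no arguments reaches the parser instead of raising ``IndexError``.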
 

diff --git a/rsmtool/__init__.py b/rsmtool/__init__.py
@@ -13,7 +13,7 @@
 import warnings
 
 try:
-    import rsmextra
+    import rsmextra  # noqa
 except ImportError:
     HAS_RSMEXTRA = False
 else:
@@ -22,13 +22,13 @@
 from .version import __version__
 
 if HAS_RSMEXTRA:
-    from rsmextra.version import __version__ as rsmextra_version
+    from rsmextra.version import __version__ as rsmextra_version  # noqa
     VERSION_STRING = '%(prog)s {}; rsmextra {}'.format(__version__,
                                                        rsmextra_version)
 else:
     VERSION_STRING = '%(prog)s {}'.format(__version__)
 
-from .analyzer import Analyzer
+from .analyzer import Analyzer  # noqa
 
 from .convert_feature_json import convert_feature_json_file  # noqa
 

diff --git a/rsmtool/configuration_parser.py b/rsmtool/configuration_parser.py
@@ -282,15 +282,22 @@ def __len__(self):
 
     def __str__(self):
         """
-        Return string representation of the object keys
-        as comma-separated list.
+        Return a string representation of the underlying configuration
+        dictionary.
 
         Returns
         -------
-        config_names : str
-            A comma-separated list of names from the config dictionary.
+        config_string : str
+            A string representation of the underlying configuration
+            dictionary as encoded by ``json.dumps()``. It only
+            includes the configuration options that can be set by
+            the user.
         """
-        return ', '.join(self._config)
+        expected_fields = (CHECK_FIELDS[self._context]['required'] +
+                           CHECK_FIELDS[self._context]['optional'])
+
+        output_config = {k: v for k, v in self._config.items() if k in expected_fields}
+        return json.dumps(output_config, indent=4, separators=(',', ': '))
 
     def __iter__(self):
         """
@@ -563,12 +570,8 @@ def save(self, output_dir=None):
         context = self._context
         outjson = output_dir / f"{experiment_id}_{context}.json"
 
-        expected_fields = (CHECK_FIELDS[self._context]['required'] +
-                           CHECK_FIELDS[self._context]['optional'])
-
-        output_config = {k: v for k, v in self._config.items() if k in expected_fields}
         with outjson.open(mode='w') as outfile:
-            json.dump(output_config, outfile, indent=4, separators=(',', ': '))
+            outfile.write(str(self))
 
     def check_exclude_listwise(self):
         """
@@ -827,6 +830,8 @@ def __init__(self, pathlike):
 
         Raises
         ------
+        FileNotFoundError
+            If the given config file path does not exist.
         ValueError
             If the configuration file does not have a valid extension.
             Valid extensions are ``.json`` and ``.cfg``.
@@ -835,6 +840,11 @@ def __init__(self, pathlike):
         if isinstance(pathlike, str):
             pathlike = Path(pathlike)
 
+        # raise an exception if the file does not exist
+        if not pathlike.exists():
+            raise FileNotFoundError(f"The configuration file {pathlike} "
+                                    "was not found.")
+
         # make sure we have either a JSON or CFG configuration file
         extension = pathlike.suffix.lower()
         if extension not in ['.json', '.cfg']:
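
The ``__str__`` change above moves the field filtering out of ``save()`` so that printing a configuration and saving it produce the same JSON. The following is a minimal sketch of that idea on a toy class. ``MiniConfiguration`` and the shortened ``CHECK_FIELDS`` table are hypothetical stand-ins for illustration, not the real rsmtool objects.

```python
import json

# Hypothetical, shortened stand-in for rsmtool's CHECK_FIELDS table.
CHECK_FIELDS = {
    'rsmtool': {
        'required': ['experiment_id', 'model'],
        'optional': ['description'],
    }
}


class MiniConfiguration:
    """Toy configuration object that mirrors the filtering done in __str__."""

    def __init__(self, config, context='rsmtool'):
        self._config = config
        self._context = context

    def __str__(self):
        # keep only the fields that a user is allowed to set
        expected_fields = (CHECK_FIELDS[self._context]['required'] +
                           CHECK_FIELDS[self._context]['optional'])
        output_config = {k: v for k, v in self._config.items()
                         if k in expected_fields}
        return json.dumps(output_config, indent=4, separators=(',', ': '))


config = MiniConfiguration({'experiment_id': 'demo',
                            'model': 'LinearRegression',
                            'internal_flag': True})

# the string form is valid JSON and omits the internal-only key
round_tripped = json.loads(str(config))
```

Because the string is produced by ``json.dumps()``, writing ``str(config)`` to a file, as the updated ``save()`` does, yields a file that can be read back as a configuration.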

diff --git a/rsmtool/reader.py b/rsmtool/reader.py
@@ -227,7 +227,7 @@ def read_from_file(filename, converters=None, **kwargs):
         ------
         ValueError
             If the file has an extension that we do not support
-        pd.parser.CParserError
+        pandas.errors.ParserError
             If the file is badly formatted or corrupt.
 
         Note
@@ -263,10 +263,10 @@ def read_from_file(filename, converters=None, **kwargs):
             warnings.filterwarnings('ignore', category=pd.io.common.DtypeWarning)
             try:
                 df = do_read(filename, **kwargs)
-            except pd.parser.CParserError:
-                raise pd.parser.CParserError('Cannot read {}. Please check that it is '
-                                             'not corrupt or in an incompatible format. '
-                                             '(Try running dos2unix?)'.format(filename))
+            except pd.errors.ParserError:
+                raise pd.errors.ParserError('Cannot read {}. Please check that it is '
+                                            'not corrupt or in an incompatible format. '
+                                            '(Try running dos2unix?)'.format(filename))
         return df
 
     @staticmethod

diff --git a/rsmtool/rsmcompare.py b/rsmtool/rsmcompare.py
@@ -10,22 +10,17 @@
 :organization: ETS
 """
 
-
-import argparse
 import glob
 import logging
-import os
 import sys
 
-from os.path import (abspath,
-                     exists,
-                     join,
-                     normpath)
+from os.path import abspath, exists, join, normpath
 
-from . import VERSION_STRING
 from .configuration_parser import configure
 from .reader import DataReader
 from .reporter import Reporter
+from .utils.commandline import generate_configuration, setup_rsmcmd_parser
+from .utils.constants import VALID_PARSER_SUBCOMMANDS
 from .utils.logging import LogFormatter
 
 
@@ -84,8 +79,9 @@ def run_comparison(config_file_or_obj_or_dict, output_dir):
 
     Raises
     ------
-    ValueError
-        If any of the required fields are missing or ill-specified.
+    FileNotFoundError
+        If either of the two input directories in ``config_file_or_obj_or_dict``
+        does not exist, or if the directories do not contain rsmtool outputs at all.
     """
 
     logger = logging.getLogger(__name__)
@@ -180,44 +176,57 @@ def main():
     # set up the basic logging configuration
     formatter = LogFormatter()
 
-    handler = logging.StreamHandler(sys.stdout)
-    handler.setFormatter(formatter)
+    # we need two handlers, one that prints to stdout
+    # for the "run" command and one that prints to stderr
+    # for the "generate" command; the latter is necessary
+    # because we do not want the warning to show up in the
+    # generated configuration file
+    stdout_handler = logging.StreamHandler(sys.stdout)
+    stdout_handler.setFormatter(formatter)
 
-    logging.root.addHandler(handler)
-    logging.root.setLevel(logging.INFO)
+    stderr_handler = logging.StreamHandler(sys.stderr)
+    stderr_handler.setFormatter(formatter)
 
-    # get a logger
+    logging.root.setLevel(logging.INFO)
     logger = logging.getLogger(__name__)
 
-    # set up an argument parser
-    parser = argparse.ArgumentParser(prog='rsmcompare')
-
-    parser.add_argument('config_file', help="The JSON configuration file for "
-                                            "this comparison")
+    # set up an argument parser via our helper function
+    parser = setup_rsmcmd_parser('rsmcompare',
+                                 uses_output_directory=True,
+                                 uses_subgroups=True)
+
+    # if the first argument is not one of the valid sub-commands
+    # or one of the valid optional arguments, then assume that they
+    # are arguments for the "run" sub-command. This allows the
+    # old style command-line invocations to work without modification.
+    if sys.argv[1] not in VALID_PARSER_SUBCOMMANDS + ['-h', '--help',
+                                                      '-V', '--version']:
+        args_to_pass = ['run'] + sys.argv[1:]
+    else:
+        args_to_pass = sys.argv[1:]
+    args = parser.parse_args(args=args_to_pass)
 
-    parser.add_argument('output_dir', nargs='?', default=os.getcwd(),
-                        help="The output directory where the report "
-                             "files for this comparison will be stored")
+    # call the appropriate function based on which sub-command was run
+    if args.subcommand == 'run':
 
-    parser.add_argument('-V', '--version', action='version',
-                        version=VERSION_STRING)
+        # when running, log to stdout
+        logging.root.addHandler(stdout_handler)
 
-    # parse given command line arguments
-    args = parser.parse_args()
-    logger.info('Output directory: {}'.format(args.output_dir))
+        # run the experiment
+        logger.info('Output directory: {}'.format(args.output_dir))
+        run_comparison(abspath(args.config_file),
+                       abspath(args.output_dir))
 
-    # convert all paths to absolute to make sure
-    # all files can be found later
-    config_file = abspath(args.config_file)
-    output_dir = abspath(args.output_dir)
+    else:
 
-    # make sure that the given configuration file exists
-    if not exists(config_file):
-        raise FileNotFoundError("Main configuration file {} not "
-                                "found.".format(config_file))
+        # when generating, log to stderr
+        logging.root.addHandler(stderr_handler)
 
-    # generate a comparison report
-    run_comparison(config_file, output_dir)
+        # auto-generate an example configuration and print it to STDOUT
+        configuration = generate_configuration('rsmcompare',
+                                               use_subgroups=args.subgroups,
+                                               as_string=True)
+        print(configuration)
 
 
 if __name__ == '__main__':