Commit

Merge pull request #316 from EducationalTestingService/final-updates-to-documentation

Final updates to documentation & other changes
desilinguist committed Nov 21, 2019
2 parents 852649c + 860b4c7 commit 850459f
Showing 14 changed files with 47 additions and 60 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -13,3 +13,5 @@
*test_outputs/

__pycache__
/rsmtool.sublime-workspace
/rsmtool.sublime-project
2 changes: 1 addition & 1 deletion doc/config_rsmeval.rst
@@ -92,7 +92,7 @@ RSMTool provides pre-defined sections for ``rsmeval`` (listed below) and, by def

- ``evaluation by group``: Shows barplots with the main evaluation metrics by each of the subgroups specified in the configuration file.

- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_. The notebook shows:
- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The notebook shows:

- percentage of variance in squared error explained by subgroup membership
- percentage of variance in raw (signed) error explained by subgroup membership
2 changes: 1 addition & 1 deletion doc/config_rsmtool.rst
@@ -146,7 +146,7 @@ RSMTool provides pre-defined sections for ``rsmtool`` (listed below) and, by def

- ``evaluation_by_group``: Shows barplots with the main evaluation metrics by each of the subgroups specified in the configuration file.

- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_. The notebook shows:
- ``fairness_analyses``: Additional :ref:`fairness analyses <fairness_extra>` suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The notebook shows:

- percentage of variance in squared error explained by subgroup membership
- percentage of variance in raw (signed) error explained by subgroup membership
11 changes: 7 additions & 4 deletions doc/contributing.rst
@@ -10,13 +10,13 @@ To set up a local development environment, follow the steps below:

1. Pull the latest version of RSMTool from GitHub and switch to the ``master`` branch.

2. If you already have the ``conda`` package manager installed, skip to the next step. If you do not, follow the instructions on `this page <https://conda.io/docs/user-guide/install/index.html>`_ to install conda.
2. If you already have the ``conda`` package manager installed, skip to the next step. If you do not, follow the instructions on `this page <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_ to install conda.

3. Create a new conda environment (say, ``rsmtool``) and install the packages specified in the ``requirements.txt`` file by running::
3. Create a new conda environment (say, ``rsmdev``) and install the packages specified in the ``requirements.txt`` file by running::

conda create -n rsmtool -c defaults -c conda-forge -c desilinguist --file requirements.txt
conda create -n rsmdev -c conda-forge -c desilinguist --file requirements.txt

4. Activate the environment using ``source activate rsmtool`` (use ``activate rsmtool`` if you are on Windows).
4. Activate the environment using ``conda activate rsmdev``. [#]_

5. Run ``pip install -e .`` to install rsmtool into the environment in editable mode, which is what we need for development.

@@ -137,3 +137,6 @@ Here are some advanced tips and tricks when working with RSMTool tests.

3. In the rare case that you *do* need to create an entirely new ``tests/test_experiment_X.py`` file instead of using one of the existing ones, you can choose whether to exclude the tests contained in this file from updating their expected outputs when ``update_files.py`` is run by setting ``_AUTO_UPDATE=False`` at the top of the file. This should *only* be necessary if you are absolutely sure that your tests will never need updating.

.. rubric:: Footnotes

.. [#] For older versions of conda, you may have to do ``source activate rsmtool`` on Linux/macOS and ``activate rsmtool`` on Windows.
4 changes: 2 additions & 2 deletions doc/evaluation.rst
@@ -221,7 +221,7 @@ PRMSE is computed using :ref:`rsmtool.prmse_utils.compute_prmse <prmse_api>`.
Fairness
~~~~~~~~

Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al, 2017 <https://www.aclweb.org/anthology/papers/W/W17/W17-1605/>`_).
Fairness of automated scores is an important component of RSMTool evaluations (see `Madnani et al, 2017 <https://www.aclweb.org/anthology/W17-1605/>`_).

When defining an experiment, the RSMTool user has the option of specifying which subgroups should be considered for such evaluations using the :ref:`subgroups<subgroups_rsmtool>` field. These subgroups are then used in all fairness evaluations.
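A configuration fragment requesting subgroup-based fairness evaluations might look like the following (the ``subgroups`` field name comes from the text above; the experiment id and the column names are illustrative):

```json
{
    "experiment_id": "fairness_demo",
    "subgroups": ["L1", "gender"]
}
```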

@@ -267,7 +267,7 @@ DSM is computed using :ref:`rsmtool.utils.difference_of_standardized_means<dsm_a
Additional fairness evaluations
+++++++++++++++++++++++++++++++

Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.
Starting with v7.0, RSMTool includes additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_. The computed metrics from these analyses are available in :ref:`intermediate files<rsmtool_fairness_eval>` ``fairness_metrics_by_<SUBGROUP>``.

These include:

13 changes: 7 additions & 6 deletions doc/getting_started.rst
@@ -2,20 +2,20 @@

Installation
============
Note that RSMTool has only been tested with Python 3.6 and higher.
Note that RSMTool only works with Python >=3.6.

Installing with conda
----------------------

Currently, the recommended way to install RSMTool is by using the ``conda`` package manager. If you have already installed ``conda``, you can skip straight to Step 2.

1. To install ``conda``, follow the instructions on `this page <https://conda.io/docs/install/quick.html>`_.
1. To install ``conda``, follow the instructions on `this page <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.

2. Create a new conda environment (say, ``rsmtool``) and install the RSMTool conda package by running::

conda create -n rsmtool -c conda-forge -c desilinguist python=3.6 rsmtool

3. Activate this conda environment by running ``source activate rsmtool`` (``activate rsmtool`` on windows). You should now have all of the RSMTool command-line utilities in your path.
3. Activate this conda environment by running ``conda activate rsmtool``. You should now have all of the RSMTool command-line utilities in your path. [#]_

4. From now on, you will need to activate this conda environment whenever you want to use RSMTool. This will ensure that the packages required by RSMTool will not affect other projects.

@@ -33,6 +33,7 @@ Note that if you are on macOS, you will need to have the following line in your

export MPLBACKEND=Agg

.. note::

   Currently the ``statsmodels`` PyPI package seems to be broken on Windows, so ``pip`` installation on Windows may not work. If you are using Windows, use ``conda`` to install RSMTool by following the instructions above.

.. rubric:: Footnotes

.. [#] For older versions of conda, you may have to do ``source activate rsmtool`` on Linux/macOS and ``activate rsmtool`` on Windows.
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -15,7 +15,7 @@ Rater Scoring Modeling Tool (RSMTool)

.. image:: spacer.png

Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `Project Essay Grade <https://pegwriting.com/>`_ for written responses and `SpeechRater <https://www.ets.org/research/topics/as_nlp/speech/>`_ for spoken responses.
Automated scoring of written and spoken responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include `MI Write <https://measurementinc.com/miwrite>`_ for written responses and `SpeechRater <https://www.ets.org/research/topics/as_nlp/speech/>`_ for spoken responses.

RSMTool is a python package which automates and combines in a *single* :doc:`pipeline <pipeline>` multiple analyses that are commonly conducted when building and evaluating automated scoring models. The output of RSMTool is a comprehensive, customizable HTML statistical report that contains the outputs of these multiple analyses. While RSMTool does make it really simple to run this set of standard analyses using a single command, it is also fully customizable and allows users to easily exclude unneeded analyses, modify the standard analyses, and even include custom analyses in the report.

2 changes: 1 addition & 1 deletion doc/intermediate_files_rsmeval.rst
@@ -136,7 +136,7 @@ Evaluations based on test theory
Additional fairness analyses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_.
These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.

- ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_eval>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).
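These per-metric models regress an error quantity on subgroup membership; for a single grouping variable, the resulting R² (the percentage of variance explained) reduces to between-group variance over total variance. A minimal plain-Python sketch of that computation (the data values are illustrative; the serialized models themselves are fitted with statsmodels):

```python
import statistics

def variance_explained(errors, groups):
    """R^2 of regressing an error metric (e.g. squared error) on
    subgroup membership: between-group variance / total variance."""
    overall = statistics.fmean(errors)
    total = sum((e - overall) ** 2 for e in errors)
    by_group = {}
    for e, g in zip(errors, groups):
        by_group.setdefault(g, []).append(e)
    between = sum(len(members) * (statistics.fmean(members) - overall) ** 2
                  for members in by_group.values())
    return between / total

# Squared errors for two subgroups; group "b" is scored much less accurately,
# so group membership explains most of the variance.
errors = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
groups = ["a", "a", "a", "b", "b", "b"]
print(round(variance_explained(errors, groups), 3))  # → 0.99
```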

2 changes: 1 addition & 1 deletion doc/intermediate_files_rsmtool.rst
@@ -260,7 +260,7 @@ Evaluations based on test theory
Additional fairness analyses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_.
These files contain the results of additional fairness analyses suggested in `Loukina, Madnani, & Zechner, 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.

- ``<METRICS>_by_<SUBGROUP>.ols``: a serialized object of type ``pandas.stats.ols.OLS`` containing the fitted model for estimating the variance attributed to a given subgroup membership for a given metric. The subgroups are defined by the :ref:`configuration file<subgroups_rsmtool>`. The metrics are ``osa`` (overall score accuracy), ``osd`` (overall score difference), and ``csd`` (conditional score difference).

26 changes: 12 additions & 14 deletions doc/release_process.rst
@@ -3,37 +3,35 @@ RSMTool Release Process

This process is only meant for the project administrators, not users and developers.

1. Run ``tests/update_files.py`` to make sure that all test data in the new release have correct experiment ids and filenames. If any (non-model) files need to be changed this should be investigated before the branch is released.
1. Run the ``tests/update_files.py`` script with the appropriate arguments to make sure that all test data in the new release have correct experiment ids and filenames. If any (non-model) files need to be changed this should be investigated before the branch is released.

2. Create a release branch on GitHub.

3. In that release branch, update the version numbers in ``version.py``, update the conda-recipe, and update the README, if necessary. You should also run `make linkcheck` on the documentation to fix and update any broken/redirected links.

4. Upload source and wheel packages to PyPI using ``python setup.py sdist upload`` and ``python setup.py bdist_wheel upload``.

5. Build the new conda package locally on your mac using the following command (*Note*: you may have to replace the contents of the ``requirements()`` function in ``setup.py`` with a ``pass`` statement to get ``conda build`` to work)::
5. Build the new generic conda package locally on your mac using the following command::

conda build -c defaults -c conda-forge --python=3.6 --numpy=1.14 rsmtool
conda build -c conda-forge rsmtool

6. Convert the package for both linux and windows::
6. Upload the built package to anaconda.org using ``anaconda upload --user ets <package tarball>``.

conda convert -p win-64 -p linux-64 <mac package tarball>
7. Create pull requests on the `rsmtool-conda-tester <https://github.com/EducationalTestingService/rsmtool-conda-tester/>`_ and `rsmtool-pip-tester <https://github.com/EducationalTestingService/rsmtool-pip-tester/>`_ repositories to test the conda and PyPI packages on Linux and Windows.

7. Upload each of the packages to anaconda.org using ``anaconda upload <package tarball>``.
8. Draft a release on GitHub while the Linux and Windows builds are running.

8. Create pull requests on the `rsmtool-conda-tester <https://github.com/EducationalTestingService/rsmtool-conda-tester/>`_ and `rsmtool-pip-tester <https://github.com/EducationalTestingService/rsmtool-pip-tester/>`_ repositories to test the conda and PyPI packages on Linux and Windows.
9. Once both builds have passed, make a pull request with the release branch to be merged into ``master`` and request code review.

9. Draft a release on GitHub while the Linux and Windows builds are running.
10. Once the build for the PR passes and the reviewers approve, merge the release branch into ``master``.

10. Once both builds have passed, make a pull request with the release branch to be merged into ``master`` and request code review.
11. Make sure that the ReadTheDocs build for ``master`` passes.

11. Once the build for the PR passes and the reviewers approve, merge the release branch into ``master``.
12. Tag the latest commit in ``master`` with the appropriate release tag and publish the release on GitHub.

12. Make sure that the RTFD build for ``master`` passes.
13. Make another PR to merge ``master`` branch into ``stable`` so that the ``stable`` ReadTheDocs build always points to the latest release.

13. Tag the latest commit in ``master`` with the appropriate release tag and publish the release on GitHub.

14. Do an accompanying release of RSMExtra (only needed for ETS users).
14. Update the CI plan for RSMExtra (only needed for ETS users) to use this newly built RSMTool conda package. Do any other requisite changes for RSMExtra. Once everything is done, do a release of RSMExtra.

15. Update the RSMTool conda environment on the ETS linux servers with the latest packages for both RSMTool and RSMExtra.

2 changes: 1 addition & 1 deletion doc/tutorial.rst
@@ -130,4 +130,4 @@ Next, you should read the detailed documentation on :ref:`rsmtool <usage_rsmtool

.. rubric:: References

.. [#] Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V.2. Journal of Technology, Learning, and Assessment, 4(3). https://ejournals.bc.edu/ojs/index.php/jtla/article/download/1650/1492
.. [#] Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V.2. Journal of Technology, Learning, and Assessment, 4(3). https://ejournals.bc.edu/index.php/jtla/article/download/1650/1492
2 changes: 1 addition & 1 deletion doc/who.rst
@@ -5,7 +5,7 @@ Who is RSMTool for?

We expect the primary users of RSMTool to be researchers working on developing new automated scoring engines or on improving existing ones. Here's the most common scenario.

A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <http://cohmetrix.com/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://audeering.com/research/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.
A group of researchers already *has* a set of responses such as essays or recorded spoken responses which have already been assigned numeric scores by human graders. They have also processed these responses and extracted a set of (numeric) features using systems such as `Coh-Metrix <http://cohmetrix.com/>`_, `TextEvaluator <https://textevaluator.ets.org/TextEvaluator/>`_, `OpenSmile <https://www.audeering.com/opensmile/>`_, or using their own custom text/speech processing pipeline. They wish to understand how well the set of chosen features can predict the human score.

They can then run an RSMTool "experiment" to build a regression-based scoring model (using one of many available regressors) and produce a report. The report includes descriptive statistics for all their features, diagnostic information about the trained regression model, and a comprehensive evaluation of model performance on a held-out set of responses.

25 changes: 5 additions & 20 deletions environment.yml
@@ -1,25 +1,10 @@
channels:
- defaults
- conda-forge
- desilinguist
dependencies:
- python
- ipython=6.5.0
- jupyter=1.0.0
- joblib=0.11
- matplotlib=2.1.2
- nose=1.3.7
- notebook=5.7.2
- numpy
- pandas
- scipy
- seaborn
- python=3.6
- ipython=7.9.0
- notebook=6.0.1
- numpy=1.14.6
- skll=1.5.2
- statsmodels
- coverage
- openpyxl
- parameterized
- sphinx
- sphinx_rtd_theme
- xlrd
- xlwt
- statsmodels=0.10.1
12 changes: 5 additions & 7 deletions rsmtool/fairness_utils.py
@@ -8,17 +8,15 @@
:organization: ETS
"""

import pandas as pd
import pickle
import numpy as np

from os.path import join

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

from rsmtool.writer import DataWriter
from rsmtool.container import DataContainer
from rsmtool.writer import DataWriter
from statsmodels.stats.anova import anova_lm


def convert_to_ordered_category(group_values, base_group=None):
@@ -169,7 +167,7 @@ def get_fairness_analyses(df,
human_score_column='sc1',
base_group=None):
"""Compute fairness analyses described
in `Loukina et al. 2019 <https://aclweb.org/anthology/papers/W/W19/W19-4401/>`_.
in `Loukina et al. 2019 <https://www.aclweb.org/anthology/W19-4401/>`_.
The function computes how much variance group membership explains in
overall score accuracy (osa), overall score difference (osd),
and conditional score difference (csd).
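For reference, one plain-Python reading of the per-response quantities named in ``get_fairness_analyses`` above — this mapping of names to formulas is an assumption based on the surrounding documentation, and ``csd`` (which additionally conditions on the human score) is omitted:

```python
def fairness_error_terms(human, system):
    """Per-response error quantities whose variance is then partitioned
    by subgroup membership (name-to-formula mapping is an assumption)."""
    # overall score difference (osd): raw signed error
    osd = [s - h for h, s in zip(human, system)]
    # overall score accuracy (osa): squared error
    osa = [e * e for e in osd]
    return osa, osd

osa, osd = fairness_error_terms([3, 2], [4, 1])
# osd → [1, -1]; osa → [1, 1]
```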
