Skip to content

QuantGov/estimator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The QuantGov Estimator

Official QuantGov Estimators

This repository is for those who would like to create new datasets using the QuantGov platform. If you would like to find data that has been produced using the QuantGov platform, please visit http://www.quantgov.org/data.

This repository contains all official QuantGov estimators, with each estimator stored in its own branch.

The Generic Estimator

The master branch of this repository is the Generic Estimator, which evaluates and trains a Random Forests Classifier. By default, the create_labels.py script generates a random label of True or False for every document; you should modify this script to use the label or labels you are actually interested in.

This estimator uses a scikit-learn CountVectorizer to vectorize training documents as a preprocessing step. In many cases, it will be useful to modify the default parameters; see the Scikit-learn documentation for details. If vectorization will be include information about the final classes, it is necessary to move the vectorization step into the candidate model pipeline for correct cross-validation results.

Candidate models are defined in scripts\models.py. Parameters follow the naming convention for scikit-learn grid search; see the scikit-learn documentation for details.

The generic estimator will use the training corpus to exhaustively evaluate each combination of parameters for each candidate model, and output the results to data/model_evaluation.csv. The best scoring model will be suggested in the data/model.config file, but users can change the parameters or model based on the evaluation results (for example, using the one-standard-error rule).

Using this Estimator

To use or modify this estimator, clone it using git or download the archive from the QuantGov Site and unzip it on your computer.

Requirements

Using this estimator requires Python >= 3.4 and the make utility.

If you are using the Anaconda Python distribution (recommended), navigate to the estimator folder and use the command conda install --file conda-requirements.txt, then the command pip install -r requirements.txt. If you are on windows, also use the command conda install --file conda-requirements-windows.txt, which will install the make utility.

If you are not using Anaconda, use the command pip install requirements.txt. You must ensure that make is install separately.

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages