Test Suite Tools

Methodology

prix-fixe makes use of three TestSuite variants:

  • TestSuite - this is a suite with all of the cart fields removed. It serves as input to the natural language processor under evaluation.
  • ValidationSuite - this is a TestSuite that also includes the expected carts. It serves both as an answer key and as the format in which a natural language processor returns its results. ValidationSuites are sometimes used to provide training examples.
  • ScoredSuite - this is a ValidationSuite, marked up with scoring information. The scoring information comes from comparing a ValidationSuite of expected carts with another containing carts produced by a natural language processor.
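The relationship between the three variants can be sketched as TypeScript types. This is a minimal sketch, not the prix-fixe source. The filter-suite.js help text names only the turns, audio, transcription, and cart fields. Every other name here is hypothetical: the fields id, suites, comment, speaker, items, sku, and measures, and the helper toTestSuite.

```typescript
// Hypothetical shapes for the three suite variants. Only turns, audio,
// transcription, and cart are named by filter-suite.js; the rest are assumptions.
interface Turn {
  speaker: string;
  transcription?: string; // removed by `filter -t`
  audio?: string; // removed by `filter -a`
}

interface CartItem {
  quantity: number;
  name: string;
  sku: string;
  children: CartItem[]; // options such as "no foam" or "vanilla syrup"
}

interface Cart {
  items: CartItem[];
}

interface TestStep {
  turns: Turn[];
}

// A ValidationSuite step adds a cart: expected (author) or observed (candidate).
interface ValidationStep extends TestStep {
  cart: Cart;
}

// A ScoredSuite step adds scoring markup to a ValidationSuite step.
interface ScoredStep extends ValidationStep {
  measures: {
    perfect: boolean;
    complete: boolean;
    repairs: { cost: number; steps: string[] };
  };
}

interface TestCase<S extends TestStep> {
  id: number;
  suites: string;
  comment: string;
  steps: S[];
}

interface Suite<S extends TestStep> {
  tests: TestCase<S>[];
}

type TestSuite = Suite<TestStep>;
type ValidationSuite = Suite<ValidationStep>;
type ScoredSuite = Suite<ScoredStep>;

// Derive a TestSuite from a ValidationSuite by keeping only each step's turns,
// which drops every cart. The input suite is not modified.
function toTestSuite(suite: ValidationSuite): TestSuite {
  return {
    tests: suite.tests.map((test) => ({
      ...test,
      steps: test.steps.map((step) => ({ turns: step.turns })),
    })),
  };
}
```

Copying rather than deleting in place keeps the author's ValidationSuite intact, because it is still needed as the answer key during scoring.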

Scoring markup includes information about three measures:

  • perfect - whether the expected and observed carts match perfectly
  • complete - whether the expected and observed carts contain the same products, possibly in different arrangements
  • repair cost - the number of repair steps required to convert the observed cart into the expected cart, along with the steps themselves.
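A minimal sketch of the first two measures follows. It assumes carts are trees of items identified by SKU, as in the evaluate output in this document. It reads "different arrangements" as a different item order or the same product split across several lines. The names isPerfect, isComplete, and productTally are hypothetical, and evaluate.js may define these measures differently.

```typescript
interface CartItem {
  quantity: number;
  name: string;
  sku: string;
  children: CartItem[]; // options attached to the item
}

// Ordered signature: any difference in order, quantity, SKU, or options changes it.
function orderedSignature(items: CartItem[]): string {
  return items
    .map((i) => `${i.quantity}x${i.sku}(${orderedSignature(i.children)})`)
    .join(",");
}

function isPerfect(expected: CartItem[], observed: CartItem[]): boolean {
  return orderedSignature(expected) === orderedSignature(observed);
}

// Order-insensitive tally: identical products (same SKU and same options)
// are merged and their quantities summed, so reordered or split lines match.
function productTally(items: CartItem[]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const item of items) {
    const childTally = productTally(item.children);
    const options = Object.keys(childTally)
      .sort()
      .map((key) => `${childTally[key]}x${key}`)
      .join(",");
    const key = `${item.sku}(${options})`;
    tally[key] = (tally[key] ?? 0) + item.quantity;
  }
  return tally;
}

function isComplete(expected: CartItem[], observed: CartItem[]): boolean {
  const a = productTally(expected);
  const b = productTally(observed);
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => a[k] === b[k]);
}
```

Under these assumptions, every perfect cart is also complete. That agrees with the report below, where the complete count (3/9) is at least the perfect count (1/9).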

Please see Measures and Repair Cost for more information on scoring. See Test Suite Format for more information on the test suite file format.
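To show how repair steps add up to a cost, here is a deliberately simplified SKU-level diff. It is not the menu-based algorithm that evaluate.js reports using ("Repair algorithm: Menu-based repairs"). It matches items only by SKU at each level of the cart. So in the wrong-attribute case (case 7 in the output below), the tool emits one attribute change, but this sketch emits a delete and an insert. The function names are hypothetical. The assumption that an inserted item defaults to quantity 1 is inferred from the "make item(...) quantity 2" step in the sample output.

```typescript
interface CartItem {
  quantity: number;
  name: string;
  sku: string;
  children: CartItem[];
}

// Simplified repair steps for one level of a cart, recursing into options.
function repairSteps(expected: CartItem[], observed: CartItem[]): string[] {
  // Pair each expected item with the first unclaimed observed item with the same SKU.
  const unclaimed = observed.slice();
  const pairs = expected.map((want): [CartItem, CartItem | undefined] => {
    const index = unclaimed.findIndex((have) => have.sku === want.sku);
    return [want, index === -1 ? undefined : unclaimed.splice(index, 1)[0]];
  });

  // Observed items nobody claimed are deleted first (their options go with them).
  const steps = unclaimed.map((extra) => `delete item(${extra.name})`);

  for (const [want, have] of pairs) {
    if (have === undefined) {
      steps.push(...insertSteps(want));
      continue;
    }
    if (have.quantity !== want.quantity) {
      steps.push(`change item(${want.name}) quantity to ${want.quantity}`);
    }
    steps.push(...repairSteps(want.children, have.children));
  }
  return steps;
}

// Inserting costs one step, one more for a non-default quantity (assumed default 1),
// plus the cost of inserting each option.
function insertSteps(item: CartItem): string[] {
  const steps = [`insert default item(${item.name})`];
  if (item.quantity !== 1) {
    steps.push(`make item(${item.name}) quantity ${item.quantity}`);
  }
  for (const child of item.children) {
    steps.push(...insertSteps(child));
  }
  return steps;
}
```

In the wrong-option case (case 6), the sketch yields delete item(cinnamon syrup), then insert default item(vanilla syrup), then make item(vanilla syrup) quantity 2. These are the same three steps the tool prints, minus the ids. The repair cost is the length of the returned list.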

The testing workflow involves two personas:

  • Author - typically a data scientist who curates a collection of test cases in a ValidationSuite.
  • Candidate - the natural language processing system being evaluated.

The following diagram shows the testing workflow.

[Figure: Workflow]

  1. Test author produces a ValidationSuite that provides the inputs (either transcriptions or links to audio files) and the expected carts. This could be a hand-authored regression suite or it could be a set of cases curated from labeled data collected from real-world scenarios.
  2. Use the filter-suite.js tool to strip the carts from the ValidationSuite to produce a TestSuite.
  3. The Candidate system uses natural language processing to annotate the TestSuite with proposed carts, producing a new ValidationSuite.
  4. Use the evaluate.js tool to compare the original ValidationSuite, containing the expected carts, with the Candidate's ValidationSuite that contains observed carts. This process annotates the Candidate's suite with Measures, producing a ScoredSuite.

Filter Suite Tool

$ filter -h

Test suite filter

  This utility filters carts, transcriptions, audio, and entire test cases from 
  a supplied test suite.                                                        

Usage

  node filter-suite.js <input file> <output file> [...options] 

Options

  -a, --a               Remove the audio field from each turn.                  
  -c, --c               Remove the cart field from each step.                   
  -t, --t               Remove the transcription field from each turn.          
  -s, --s suiteFilter   Boolean expression of suites to retain. Can use suite   
                        names, !, &, |, and parentheses. Default is to retain   
                        all cases.                                              
  -h, --help            Print help message                                      


$ filter samples/tests/expected.yaml temp/test.yaml -c
Reading suite from samples/tests/expected.yaml
Removing cart field from each Step.
Writing filtered suite to temp/test.yaml
Filtering complete
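The -s option takes a boolean expression over suite names. A hypothetical evaluator for such expressions might be a small recursive-descent parser. The help text does not state operator precedence, so this sketch assumes the conventional order: ! binds tightest, then &, then |. It also represents a test case's suites as a set of names. Both choices, and the name makeSuiteFilter, are assumptions rather than the tool's implementation.

```typescript
type SuitePredicate = (suites: Set<string>) => boolean;

// Hypothetical evaluator for `filter -s` expressions such as "(a | b) & !c".
function makeSuiteFilter(expression: string): SuitePredicate {
  // Tokens are single operators/parentheses or runs of other non-space characters.
  const tokens: string[] = expression.match(/[()!&|]|[^\s()!&|]+/g) ?? [];
  let pos = 0;

  // or := and ("|" and)*
  function parseOr(): SuitePredicate {
    let left = parseAnd();
    while (tokens[pos] === "|") {
      pos++;
      const lhs = left;
      const rhs = parseAnd();
      left = (suites) => lhs(suites) || rhs(suites);
    }
    return left;
  }

  // and := unary ("&" unary)*
  function parseAnd(): SuitePredicate {
    let left = parseUnary();
    while (tokens[pos] === "&") {
      pos++;
      const lhs = left;
      const rhs = parseUnary();
      left = (suites) => lhs(suites) && rhs(suites);
    }
    return left;
  }

  // unary := "!" unary | "(" or ")" | suiteName
  function parseUnary(): SuitePredicate {
    const token: string | undefined = tokens[pos++];
    if (token === "!") {
      const inner = parseUnary();
      return (suites) => !inner(suites);
    }
    if (token === "(") {
      const inner = parseOr();
      if (tokens[pos++] !== ")") throw new Error("expected ')'");
      return inner;
    }
    if (token === undefined || "&|)".includes(token)) {
      throw new Error(`unexpected ${token ?? "end of expression"}`);
    }
    const suiteName = token;
    return (suites) => suites.has(suiteName);
  }

  const predicate = parseOr();
  if (pos !== tokens.length) throw new Error(`unexpected ${tokens[pos]}`);
  return predicate;
}
```

For example, makeSuiteFilter("(sample | regression) & !slow") keeps cases tagged sample or regression unless they are also tagged slow. The suite names regression and slow are illustrative.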

Evaluate Tool

$ evaluate samples/tests/expected.yaml samples/tests/observed.yaml -x -v
Comparing
  expected validation suite: samples/tests/expected.yaml
  observed validation suite: samples/tests/observed.yaml
 
Computing repair cost with menu files from samples/menu.
 
---------------------------------------
2: Product SKU is wrong because generic product is wrong.
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      1 tall mocha (801)                           801
        1 no foam (5200)                          5200
        2 vanilla syrup (2502)                    2502
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
    id(23): delete item(tall mocha)
    id(28): insert default item(grande latte)
    id(28): change item(grande latte) attribute "grande" to "tall"
      id(29): insert default item(vanilla syrup)
      id(29): make item(vanilla syrup) quantity 2
      id(30): insert default item(foam)
      id(30): change item(foam) attribute "regular" to "no"
 
---------------------------------------
3: Product SKU is wrong because one or more attributes are wrong.
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      1 iced venti latte (605)                     605
        1 no foam (5200)                          5200
        2 vanilla syrup (2502)                    2502
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
    id(33): change item(iced venti latte) attribute "iced" to "hot"
    id(33): change item(iced venti latte) attribute "venti" to "tall"
 
---------------------------------------
4: Product quantity is wrong
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      5 tall latte (601)                           601
        1 no foam (5200)                          5200
        2 vanilla syrup (2502)                    2502
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
    id(43): change item(tall latte) quantity to 1
 
---------------------------------------
6: Option SKU wrong because generic option is wrong.
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      1 tall latte (601)                           601
        1 no foam (5200)                          5200
        2 cinnamon syrup (1902)                   1902
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
      id(64): delete item(cinnamon syrup)
      id(69): insert default item(vanilla syrup)
      id(69): make item(vanilla syrup) quantity 2
 
---------------------------------------
7: Option SKU wrong because one or more attributes are wrong.
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      1 tall latte (601)                           601
        1 extra foam (5203)                       5203
        2 vanilla syrup (2502)                    2502
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
      id(75): change item(extra foam) attribute "extra" to "no"
 
---------------------------------------
8: Option quantity wrong.
  step 0: NEEDS REPAIRS
    employee: ok i've added a tall latte no foam with two pumps of vanilla and an apple bran muffin warmed
 
      1 tall latte (601)                           601
        1 no foam (5200)                          5200
        5 vanilla syrup (2502)                    2502
      1 apple bran muffin (10000)                10000
        1 warmed (200)                             200
 
      id(84): change item(vanilla syrup) quantity to 2
 
---------------------------------------
Repair algorithm: Menu-based repairs, createWorld
Total test cases: 9
Total steps: 9
Perfect carts: 1/9 (11.1%)
Complete carts: 3/9 (33.3%)
Repaired carts: 6/9 (66.7%)
Total repairs: 15
Repairs/Step: 1.67
 
Case pass rate by suite:
  sample: 3/9
  
  Total failed cases: 6
  Overall pass rate: 3/9 (0.333)
---------------------------------------


Scoring complete
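The summary block at the end of the report is simple arithmetic over per-step results. The sketch below recomputes it under three assumptions:

- A step counts as repaired when it needs at least one repair.
- Each case has a single step, as in this sample.
- A case passes when its cart is complete. This agrees with the 3/9 pass rate above but is an inference.

StepResult and summarize are hypothetical names.

```typescript
// Hypothetical per-step outcome from comparing an expected and an observed cart.
interface StepResult {
  perfect: boolean;
  complete: boolean;
  repairs: number; // number of repair steps needed
}

// Formats "label: count/total (pct%)" with one decimal place, as in the report.
function ratio(label: string, count: number, total: number): string {
  return `${label}: ${count}/${total} (${((count / total) * 100).toFixed(1)}%)`;
}

function summarize(results: StepResult[]): string[] {
  const total = results.length;
  const count = (test: (r: StepResult) => boolean) => results.filter(test).length;
  const perfect = count((r) => r.perfect);
  const complete = count((r) => r.complete);
  const repaired = count((r) => r.repairs > 0); // assumption: at least one repair
  const repairs = results.reduce((sum, r) => sum + r.repairs, 0);
  const passed = complete; // assumption: one step per case, passing when complete
  return [
    `Total steps: ${total}`,
    ratio("Perfect carts", perfect, total),
    ratio("Complete carts", complete, total),
    ratio("Repaired carts", repaired, total),
    `Total repairs: ${repairs}`,
    `Repairs/Step: ${(repairs / total).toFixed(2)}`,
    `Overall pass rate: ${passed}/${total} (${(passed / total).toFixed(3)})`,
  ];
}
```

The report lists repair counts of 7, 2, 1, 3, 1, and 1 for the six failing cases. Feeding those in, plus three complete steps with zero repairs (one of them perfect), reproduces the figures above. For example, 15 repairs over 9 steps gives 1.67 repairs per step.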