Update README.md

cpragadeesh committed Aug 26, 2017 (commit 1ac8f73, parent c004fac). 1 changed file: `src/rescore/lua/README.md` (14 additions, 6 deletions).
Run `corpus_test.lua` to test emails and generate a log. These logs can be used for …

### Example:

`rspamadm lua corpus_test.lua -a -a -a path/to/ham/dir -a -s -a path/to/spam/dir -a -o -a results.log`

Use the `-h` option (`rspamadm lua corpus_test.lua -a -h`) to get more information about usage.
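The repeated `-a` flags above are easy to misread. Assuming each `-a` hands the next token to the Lua script as one argument (consistent with every `rspamadm lua` command in this README), a script flag and its value each need their own `-a`, so the script's own `-a` (ham directory) option is written `-a -a`. This Python sketch builds such a command line; `rspamadm_lua_argv` is a hypothetical helper, not part of rspamd.

```python
import shlex


def rspamadm_lua_argv(script, script_args):
    """Build an `rspamadm lua` argv, prefixing every script argument with -a.

    Hypothetical helper: assumes each -a passes exactly one token to the script.
    """
    argv = ["rspamadm", "lua", script]
    for arg in script_args:
        # A script flag and its value are separate tokens, so each gets its own -a.
        argv.extend(["-a", arg])
    return argv


cmd = rspamadm_lua_argv(
    "corpus_test.lua",
    ["-a", "path/to/ham/dir", "-s", "path/to/spam/dir", "-o", "results.log"],
)
print(shlex.join(cmd))
# rspamadm lua corpus_test.lua -a -a -a path/to/ham/dir -a -s -a path/to/spam/dir -a -o -a results.log
```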

The log file contains one line per email. Each line holds the following fields, in this order: the actual email type (`email_type`), score, action and symbols.
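A minimal Python sketch of reading one log line back into those four fields. The exact on-disk layout is not specified here, so this assumes whitespace-separated fields, a single-token action, and that every remaining token is a symbol name; the sample line and symbol names are made up. An action name containing a space (rspamd's `no action`, for example) would break this naive split, so check a real log before relying on it.

```python
from typing import List, NamedTuple


class LogRecord(NamedTuple):
    email_type: str      # actual class of the message, e.g. "ham" or "spam"
    score: float
    action: str
    symbols: List[str]


def parse_log_line(line: str) -> LogRecord:
    # ASSUMPTION: whitespace-separated fields, a single-token action,
    # and every remaining token is a symbol name.
    email_type, score, action, *symbols = line.split()
    return LogRecord(email_type, float(score), action, symbols)


rec = parse_log_line("spam 17.5 reject SYM_A SYM_B")  # made-up sample line
print(rec.email_type, rec.score, rec.symbols)
# spam 17.5 ['SYM_A', 'SYM_B']
```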

# STATISTICS

Use `statistics.lua` to infer useful information from the log file generated in the previous step. Use the `-t` option to specify the spam threshold score.

### Example:

`rspamadm lua statistics.lua -a path/to/log/file -a -t -a 15`

Use `rspamadm lua statistics.lua -a -h` to get more information about usage.

The statistics output contains two kinds of information: corpus statistics and symbol statistics.

### Corpus Statistics

- **Number of emails**: Number of emails read from the log
- **Number of spam**: Number of spam emails read from the log
- **False negative rate**: Percentage of spam emails that were falsely classified as ham
- **Overall Accuracy**: Percentage of all emails that were classified correctly
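These rates can be reproduced from the log with a few lines of Python. This sketch assumes a message counts as spam when its score is greater than or equal to the `-t` threshold and that every record is either `ham` or `spam`; `corpus_stats` is a hypothetical helper, and its false positive rate (share of ham classified as spam) is defined symmetrically to the false negative rate.

```python
def corpus_stats(records, threshold):
    """records: iterable of (email_type, score); email_type is "ham" or "spam"."""
    n_ham = n_spam = false_pos = false_neg = 0
    for email_type, score in records:
        # ASSUMPTION: a message is classified as spam when score >= threshold.
        predicted_spam = score >= threshold
        if email_type == "spam":
            n_spam += 1
            if not predicted_spam:
                false_neg += 1   # spam that slipped through as ham
        else:
            n_ham += 1
            if predicted_spam:
                false_pos += 1   # ham wrongly classified as spam
    total = n_ham + n_spam
    return {
        "emails": total,
        "spam": n_spam,
        "ham": n_ham,
        # All rates are percentages; empty classes yield 0.0 instead of dividing by zero.
        "false_positive_rate": 100.0 * false_pos / n_ham if n_ham else 0.0,
        "false_negative_rate": 100.0 * false_neg / n_spam if n_spam else 0.0,
        "accuracy": 100.0 * (total - false_pos - false_neg) / total if total else 0.0,
    }


# Sample data: 1 of 3 spam is missed, 1 of 2 ham is flagged, 3 of 5 are correct.
stats = corpus_stats(
    [("spam", 20.0), ("spam", 30.0), ("spam", 3.0), ("ham", -1.0), ("ham", 16.0)],
    threshold=15,
)
print(stats)
```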

### Symbol Statistics

Each line presents statistics about a symbol read from the log.
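The per-symbol columns that `statistics.lua` prints are not shown in this excerpt. As an illustration only, this Python sketch computes one plausible pair of numbers for each symbol: the percentage of spam and of ham messages on which it fired. `symbol_stats` and its output layout are hypothetical.

```python
from collections import Counter


def symbol_stats(records):
    """records: iterable of (email_type, symbols) with email_type "spam" or "ham".

    Returns {symbol: (spam_hit_pct, ham_hit_pct)}: the percentage of spam and of
    ham messages on which each symbol fired. Illustrative columns only.
    """
    hits = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for email_type, symbols in records:
        totals[email_type] += 1
        hits[email_type].update(set(symbols))  # count a symbol once per message
    result = {}
    for sym in set(hits["spam"]) | set(hits["ham"]):
        spam_pct = 100.0 * hits["spam"][sym] / totals["spam"] if totals["spam"] else 0.0
        ham_pct = 100.0 * hits["ham"][sym] / totals["ham"] if totals["ham"] else 0.0
        result[sym] = (spam_pct, ham_pct)
    return result


sym_stats = symbol_stats(
    [("spam", ["SYM_A", "SYM_B"]), ("spam", ["SYM_A"]), ("ham", ["SYM_B"])]
)
for sym, (spam_pct, ham_pct) in sorted(sym_stats.items()):
    print(f"{sym}: fired on {spam_pct:.1f}% of spam, {ham_pct:.1f}% of ham")
# SYM_A: fired on 100.0% of spam, 0.0% of ham
# SYM_B: fired on 50.0% of spam, 100.0% of ham
```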


# Rescoring

Use `rescore.lua` on logs generated by `corpus_test.lua` to find optimal symbol scores using a perceptron. Use the `-o` option to dump the new scores in JSON format.
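For intuition, here is the textbook perceptron update applied to symbol scores, in Python. Each message is a set of fired symbols, its score is the sum of those symbols' weights, and a misclassified message nudges every fired symbol's weight toward the correct side of a fixed spam threshold. This is a sketch under those assumptions, not `rescore.lua`'s implementation: its learning rate, stopping rule, threshold handling and the JSON layout written by `-o` may all differ, and `train_perceptron` is a hypothetical name.

```python
import json


def train_perceptron(records, threshold, epochs=20, lr=1.0):
    """records: list of (email_type, symbols). Learns one score per symbol so that
    the summed scores of fired symbols reach `threshold` for spam and stay below it for ham.
    """
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for email_type, symbols in records:
            target = 1 if email_type == "spam" else -1
            total = sum(weights.get(s, 0.0) for s in set(symbols))
            predicted = 1 if total >= threshold else -1
            if predicted != target:
                mistakes += 1
                # Perceptron update: move each fired symbol's score toward the correct side.
                for s in set(symbols):
                    weights[s] = weights.get(s, 0.0) + lr * target
        if mistakes == 0:  # every message classified correctly: stop early
            break
    return weights


weights = train_perceptron(
    [("spam", ["SYM_A", "SYM_B"]), ("ham", ["SYM_B"])], threshold=1.0
)
print(json.dumps(weights, sort_keys=True))
# {"SYM_A": 1.0, "SYM_B": 0.0}
```

If no set of scores separates ham from spam, the loop never reaches zero mistakes and simply stops after `epochs` passes.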

### Example:

`rspamadm lua rescore.lua -a -l -a path/to/log/dir -a --diff -a -o -a new.scores`


# Example usage

1. Collect ham and spam messages and store them in /ham and /spam directories respectively.
2. Run `rspamadm lua corpus_test.lua -a -a -a path/to/ham/dir -a -s -a path/to/spam/dir -a -o -a results.log`
3. Make a directory for log files: `mkdir logs`
4. Move the log file into the logs directory: `mv results.log logs/`
5. Run `rspamadm lua rescore.lua -a -l -a logs -a --diff -a -o -a new.scores`


