Update README.md

cpragadeesh · Aug 26, 2017 · c004fac
1 parent 6f1c73d
Showing 1 changed file with 2 additions and 76 deletions.
diff --git a/src/rescore/README.md b/src/rescore/README.md
@@ -1,77 +1,3 @@
-# Requirements
+# Corpus testing and Symbol score generation module
 
-Python Dependencies:  
-       * Python version: 2.7+  
-       * requests  
-
-# CORPUS TESTING
-
-Run corpus-test.py to test emails and generate log.
-
-Example:
-
-	./corpus-test.py --ham test-ham --spam test-spam -o test.log
-
-Use ./corpus-test.py -h to get more info about usage.
-
-Log file consists of one line per email. It contains filename, actual email type, score, action, symbols in that order.
-
-
-### TESTING SETTINGS
-
-"test.conf" file is used as default config file corpus-testing. If you would like to use a custom config file, use -c option.
-
-NOTE: Enclose the value for settings in triple quotes
-
-Example 'test.conf' for disabled symbol group "encryption":
-```
-	{
-		"Settings" : '''{groups_disabled=["encryption"]}'''
-	}
-```
-
-
-NOTE: You might encounter shutdown exception at times. This is a known python 2.7 error. However, it doesnt affect the results.
-(http://bugs.python.org/issue14623)
-
-
-# STATISTICS
-
-Use statistics.py to infer useful information from the log file generated in previous step. For generating statistics specify spam threshold score using -t. Feed in log file using input redirection.
-
-### Example:
-
-	./statistics.py -t 10 < test.log > stats.log
-
-Use ./statistics.py -h to get more info about usage
-
-Statistics contains two different information - File stats and symbol stats.
-
-### File stats:
-
-**Number of emails**: Number of emails read from log  
-**Number of spam**: Number of spam emails read from log  
-**Number of ham**: Number of ham emails read from log  
-**Spam percentage**: Percentage of spam emails read from log  
-**Ham percentage**: Percentage of ham emails read from log  
-**False positive rate**: Percentage of ham emails that were falsely classified as spam  
-**False negative rate**: Percentage of spam emails that were falsely classified as ham  
-
-### Symbol stats:
-
-Each line presents statistics about a symbol read from the log.  
-
-**Overall**: % of emails hit by a symbol  
-**Spam**: % of spam emails hit by a symbol  
-**Ham**: % of ham emails hit by a symbol  
-**S/O**: % spam emails hit over all its hits  
-	   (i.e What is the probability that it hits a spam message when it is fired)  
-
-
-# Rescoring
-
-Use rescore.py on logs generated from corpus-test to find optimal symbol scores using perceptron.
-
-### Example:
-
-	./rescore.py -l logs/ -r 0.001 -e 500 -o scores.txt
+Use these scripts to generate the best symbol scores from your spam/ham corpus. The Lua version is written using Lua + Torch; the Python version is written using Python + Scikit. You can also use these scripts to generate statistics about the corpus and its symbols. See the README inside the python/ and lua/ folders for usage instructions.
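
For reference, the file and symbol statistics defined in the removed README text can be sketched in a few lines of Python. This is only an illustration of those definitions, not the code in statistics.py: the function names `file_stats` and `symbol_stats`, the record layout `(actual_type, score, symbols)`, and the rule that an email counts as spam when `score >= threshold` are all assumptions made for the example. Log parsing is left out because the README text does not state the field delimiter.

```python
from collections import Counter


def pct(part, whole):
    """Percentage helper that returns 0.0 instead of dividing by zero."""
    return 100.0 * part / whole if whole else 0.0


def file_stats(records, threshold):
    """records: list of (actual_type, score, symbols), actual_type in {'spam', 'ham'}.

    Assumption: an email is classified as spam when score >= threshold.
    """
    n_spam = sum(1 for kind, _, _ in records if kind == "spam")
    n_ham = len(records) - n_spam
    false_pos = sum(1 for kind, score, _ in records
                    if kind == "ham" and score >= threshold)
    false_neg = sum(1 for kind, score, _ in records
                    if kind == "spam" and score < threshold)
    return {
        "emails": len(records),
        "spam": n_spam,
        "ham": n_ham,
        "spam_pct": pct(n_spam, len(records)),
        "ham_pct": pct(n_ham, len(records)),
        "false_positive_rate": pct(false_pos, n_ham),    # ham flagged as spam
        "false_negative_rate": pct(false_neg, n_spam),   # spam passed as ham
    }


def symbol_stats(records):
    """Per-symbol hit percentages and S/O (share of a symbol's hits that are spam)."""
    n_spam = sum(1 for kind, _, _ in records if kind == "spam")
    n_ham = len(records) - n_spam
    spam_hits, ham_hits = Counter(), Counter()
    for kind, _, symbols in records:
        # set() so a symbol is counted at most once per email
        (spam_hits if kind == "spam" else ham_hits).update(set(symbols))
    stats = {}
    for sym in set(spam_hits) | set(ham_hits):
        hits = spam_hits[sym] + ham_hits[sym]
        stats[sym] = {
            "overall": pct(hits, len(records)),
            "spam": pct(spam_hits[sym], n_spam),
            "ham": pct(ham_hits[sym], n_ham),
            "s_o": spam_hits[sym] / hits,
        }
    return stats
```

With S/O defined this way, a symbol that fires only on spam gets a value of 1.0 and a symbol that fires only on ham gets 0.0, which matches the "probability that it hits a spam message when it is fired" reading in the old README.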