try this https://archive.ics.uci.edu/ml/datasets/Sentence+Classification
https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
try this one too https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
Check out the readme in the smspam project.
Grabbed this: UCI Irvine Machine Learning Repo data set for spam: SMS Spam Collection Data Set
go to smspam and run go run script/build_script.go(there is a readme there too)... also need to change a file path in
func init() {
//path from the root of text api...
//environ = ParseEnv("production", "config.toml")
environ = ParseEnv("production", "smspam/config.toml")
}from "smspam/config.toml" to "config.toml" in order to get it to work.. until i fix stupid config file path issues.
copied enron data http://www2.aueb.gr/users/ion/data/enron-spam/ to spam/not_spam:
cp spam/* $mywork/text_api/smspam/build_data/training/spam
cp ham/* $mywork/text_api/smspam/build_data/training/not_spamthen i put in in my x/training data. must ignore this data for github