tokenize.lua removes sentences containing special characters #316
When I use tokenize.lua to process a corpus, I get this result:
After checking, I found that a special character (�) caused the problem.
Please see the attached file s1.txt.
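
For anyone trying to reproduce this, the following is a minimal sketch of one way to locate the affected lines in a corpus before tokenizing. It is not part of tokenize.lua; the function name `find_suspect_lines` is hypothetical, and it assumes the corpus is meant to be UTF-8 and that the offending character is either U+FFFD (the replacement character shown above) or a byte sequence that is not valid UTF-8.

```python
def find_suspect_lines(data: bytes):
    """Return (line_number, reason) pairs for lines that may contain
    a problematic character. Assumes the input is intended to be UTF-8."""
    suspects = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            # Strict decoding raises on any byte sequence that is not valid UTF-8.
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            suspects.append((number, "invalid UTF-8"))
            continue
        # U+FFFD is the Unicode replacement character, rendered as "�".
        if "\ufffd" in text:
            suspects.append((number, "contains U+FFFD"))
    return suspects
```

To check the attached file, read it in binary mode, for example `find_suspect_lines(open("s1.txt", "rb").read())`, and compare the reported line numbers with the sentences missing from the tokenized output.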