MalletCRFStringOutcomeDataWriter ignore non-string value silently #407

Open
bethard opened this issue Apr 15, 2015 · 1 comment

Comments

@bethard
Contributor

bethard commented Apr 15, 2015

Original issue 409 created by ClearTK on 2014-09-10T21:20:41.000Z:

This affects all ClearTK versions.

MalletCRFStringOutcomeDataWriter does not write the numeric or boolean values of Features. I am referring to this piece of code in ClearTK's MalletCRFStringOutcomeDataWriter:

@Override
public void writeEncoded(List<NameNumber> features, String outcome) {
  // Only nameNumber.name is printed for each feature; nothing else from the NameNumber is written.
  for (NameNumber nameNumber : features) {
    this.trainingDataWriter.print(nameNumber.name);
    this.trainingDataWriter.print(" ");
  }

  this.trainingDataWriter.print(outcome);
  this.trainingDataWriter.println();
}

This is not obvious from the code alone. For Features of String type, the nameNumber.name field contains the feature name together with the encoded value, whereas for any other type (e.g. Boolean, Number) nameNumber.name contains only the Feature name and not the value.

I don't see a good reason for not encoding integer and boolean values. At a minimum, an exception should be thrown when a value of such a type is encountered.
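
To make the "throw an exception" suggestion concrete, here is a minimal sketch of a fail-fast check. It is not ClearTK code: SimpleFeature is a hypothetical stand-in for a feature (a name plus an arbitrary value), toTokens is a hypothetical helper, and the "_" separator between name and value is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StringOnlyFeatureCheck {

  /** Hypothetical stand-in for a feature: a name plus an arbitrary value. */
  static final class SimpleFeature {
    final String name;
    final Object value;

    SimpleFeature(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  /**
   * Builds one token per feature and fails fast on any non-String value,
   * instead of silently emitting only the feature name.
   * The "name_value" token layout is an assumption for this sketch.
   */
  static List<String> toTokens(List<SimpleFeature> features) {
    List<String> tokens = new ArrayList<>();
    for (SimpleFeature feature : features) {
      if (!(feature.value instanceof String)) {
        String type =
            feature.value == null ? "null" : feature.value.getClass().getSimpleName();
        throw new IllegalArgumentException(
            "Feature '" + feature.name + "' has unsupported value type " + type
                + "; only String values are supported");
      }
      tokens.add(feature.name + "_" + feature.value);
    }
    return tokens;
  }

  public static void main(String[] args) {
    // A String-valued feature is accepted.
    System.out.println(toTokens(Arrays.asList(new SimpleFeature("pos", "NN"))));

    // A numeric feature is rejected loudly rather than dropped.
    try {
      toTokens(Arrays.asList(new SimpleFeature("length", 3)));
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

The check inspects the original feature value rather than the encoded name, because the encoded name alone does not reveal whether a value was dropped.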

@bethard
Contributor Author

bethard commented Apr 15, 2015

Comment #1 originally posted by ClearTK on 2014-11-05T13:08:14.000Z:

If nothing else, MalletCRFStringOutcomeDataWriter should throw an exception to inform the user that non-String values aren't supported. An alternative would be to convert numbers into Strings and pass them on to Mallet, but I'm not confident that would do the sensible thing for, say, doubles.
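
One way to read the concern about doubles is sketched below. It assumes that each whitespace-separated token in the training file is treated as an independent symbolic feature. toToken is a hypothetical helper and the "_" separator is an assumption; neither is part of ClearTK or Mallet.

```java
class NaiveNumberToToken {

  /** Hypothetical helper: naive conversion of any value to a "name_value" token. */
  static String toToken(String name, Object value) {
    return name + "_" + String.valueOf(value);
  }

  public static void main(String[] args) {
    // Booleans and small integers map to a handful of distinct tokens.
    System.out.println(toToken("isCapitalized", true)); // isCapitalized_true
    System.out.println(toToken("length", 3));           // length_3

    // Nearby doubles become unrelated tokens: the numeric closeness of
    // 0.5 and 0.51 is lost once both are plain strings.
    System.out.println(toToken("score", 0.5));          // score_0.5
    System.out.println(toToken("score", 0.51));         // score_0.51
  }
}
```

Under that assumption, string conversion may be workable for booleans and small integer ranges, while continuous values would likely need binning or a different data writer.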

@bethard bethard modified the milestone: 2.1 Apr 16, 2015
@reckart reckart added the 🐛 Bug label and removed the Type-Defect label Nov 4, 2022
@reckart reckart modified the milestones: 2.1.0, 🐛 Bug backlog Nov 4, 2022