Sanitize filename and convert file content to UTF-8 of uploaded files #338

Closed
jerboaa opened this Issue May 7, 2011 · 22 comments

@jerboaa
Member

jerboaa commented May 7, 2011

We seem to have several problems related to bad filenames and messed up character encodings. I think it would be a good idea to implement the following:

  1. Filenames get sanitized: new_filename = old_filename.gsub(/[^a-zA-Z0-9._-]/,'') i.e. remove all characters which aren't ASCII alphanumerics, dash, dot or underscore. I think we are doing this already. Maybe we need to spell things out more explicitly in order to avoid accents, etc. in filenames. I'm not sure how Ruby's \w character class is defined, nor whether [a-z] would match an accented e.
  2. Convert the content of uploaded files to UTF-8 as they are uploaded (or at least make an attempt). I think we won't lose anything here, since we don't do any conversion at the moment, and with the conversion in place we'd catch a few more encoding-related errors.
  3. Make sure we update the JavaScript code of the student's file manager to account for the changed file sanitization.

For 2. we should probably use the cmess gem [1]. What do you think?

[1] http://prometheus.rubyforge.org/cmess/
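
A minimal Ruby sketch of the sanitizing rule from point 1, assuming an explicit ASCII whitelist instead of \w. The method name sanitize_filename is hypothetical and is not taken from the MarkUs code base.

```ruby
# Hypothetical helper (illustrative name, not MarkUs code).
# Removes every character that is not an ASCII letter, an ASCII digit,
# a dot, an underscore or a dash.
def sanitize_filename(name)
  name.gsub(/[^a-zA-Z0-9._-]/, '')
end

puts sanitize_filename("r\u00e9sum\u00e9 (final).txt")  # prints "rsumfinal.txt"
```

Because characters are dropped rather than transliterated, two different uploads can end up with the same name, and a name made only of non-ASCII characters becomes empty. A caller would probably need to handle both cases.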

@NelleV

Member

NelleV commented May 7, 2011

+1 on 1.

-1 on 2. Encoding can be tricky, and a wrong conversion may do nearly permanent damage to the file.
IMO, guessing the encoding is not an option, as it is highly unreliable (which rules out cmess).

For 3., well if we implement 1., we don't have much of a choice :)

@jerboaa

Member

jerboaa commented May 8, 2011

Nelle, there is no way to determine the character encoding and be 100% correct, unless the user knows exactly which encoding she is using and we'd ask prior to uploading. I'm pretty sure most of the time users won't know the encoding they are using to store the file.

The idea was that it could solve some more of our encoding problems without introducing new ones. I guess there is no good way to verify this, other than running a pilot or something like that. I might be wrong.

What are our other options? Just leave it as is? Do you have ideas?

On a related note, even if we converted encodings, that wouldn't help for setups where SVN is used for submitting files. Hrm.

@jerboaa

Member

jerboaa commented May 8, 2011

This isn't going to make it into 0.10.0, resetting milestone.

@NelleV

Member

NelleV commented May 8, 2011

Unfortunately, there is no way to determine the character encoding with 100% certainty. If the encoding is guessed incorrectly, the result is mojibake: http://en.wikipedia.org/wiki/Mojibake
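
To make the failure mode concrete, here is a small Ruby sketch (Ruby 2.0 or later is assumed for String#b) that decodes UTF-8 bytes with the wrong source encoding. The variable names are illustrative only.

```ruby
# "é" stored as UTF-8 is the two bytes 0xC3 0xA9.
utf8_bytes = "\u00e9".encode("UTF-8").b

# Wrongly treat those bytes as ISO-8859-1, then convert to UTF-8 for display.
garbled = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts garbled         # prints "Ã©" instead of "é"
puts garbled.length  # prints 2: one character has become two
```

No exception is raised here, which is why a wrong guess can go unnoticed until somebody reads the text.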

The most common encodings for files are (though it depends on the settings of the text editor):
Windows: a type of Unicode (UCS-2, I believe)
Mac OS X: utf-8
other: depends on the locale

When it comes to the encoding of filenames, I know Python uses sys.getfilesystemencoding(), which returns:
Windows: the codepage
Mac OS X: utf-8
Linux: the locale

In most cases, there should be a codepage somewhere in the file (Windows is not very reliable on that).

I was also afraid of the svn case. One potential solution would be to do the decoding/encoding when displaying the file. Python converts to Unicode pretty efficiently, so hopefully Ruby does too. That would allow us to implement some "options" for the corrector to manually choose the necessary encoding (and then save that encoding in the database), and it would not "damage" the file.

@jerboaa

Member

jerboaa commented May 8, 2011

I'm not sure if you are suggesting to convert encodings now. Previously it seemed you were against it. If we decide to do any kind of encoding conversion (Ruby has Iconv.iconv), the key is to pass the right source encoding. And that is the most difficult part. That's why I suggested using cmess in order to make an educated guess (as opposed to cooking our own stew).

I think cmess would guess the most common encodings correctly (as you describe them above). It would even do some cleanup if a file is already double UTF-8 encoded (redundant UTF-8 sequence). I understand this bears some risk, but at the moment we don't do anything in that regard and this causes problems as well (see #16 for example). In my opinion it would be worthwhile to implement this as an optional feature (configurable, so it can be turned on or off), so that we would be able to run a small proof of concept. Ideally, we'd have a pool of "problematic" files at development time in order to validate whether this makes sense at all. Would you be able to collect some of those files from Nantes?

Re: svn: We are not storing files (their content) in the database. Storage is the svn repo. If MarkUs is configured to allow svn submissions, it will refuse to write to the svn repository (i.e. writing to svn is mutually exclusive). That would mean we'd have to do the conversion every time the file is read. If we present the marker with options as to which encoding to use for conversion, this might become painful. Not sure about this. I'm leaning towards converting on-the-fly and keeping the raw download option as is in order for the grader to be able to get the "original" encoded file if they wish.
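
A hedged sketch of the on-the-fly idea: convert a copy for viewing and leave the stored bytes alone, so the raw download stays identical to what was submitted. It uses String#encode, which is built into Ruby 1.9 and later, rather than Iconv.iconv, which was removed from the standard library in Ruby 2.0. The helper name to_utf8_for_view is hypothetical.

```ruby
# Hypothetical helper: returns a UTF-8 copy of raw_bytes for display.
# source_encoding must be known or guessed by the caller; a wrong value
# may raise an EncodingError or silently produce garbled text.
def to_utf8_for_view(raw_bytes, source_encoding)
  raw_bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

stored = "caf\xE9".b                         # "café" as ISO-8859-1 bytes
shown  = to_utf8_for_view(stored, "ISO-8859-1")

puts shown                  # prints "café"
puts stored.bytes.inspect   # the stored bytes are unchanged
```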

@NelleV

Member

NelleV commented May 8, 2011

I'm not suggesting to convert the encodings of the original files (i.e. read the file, decode/encode it, then write it) but to do it at display time. This way, the grader can change the encoding himself, rather than leaving it to an algorithm to decide what to do.

Also, yes, indeed, cmess would probably guess "most" of the encodings we use correctly. But something that works in some cases and not in all cases is not something I suggest doing: it will lead to more complications in the future, and it just hides the real problem, which will be harder to spot and fix when we hit one of those edge cases.

Storage is indeed the svn repository: on-the-fly is the only acceptable solution. What I was suggesting is keeping the original encoding (or the encoding the grader chose by hand) in the database, not the converted content.
We could easily do a "try except" display, and allow the corrector to choose among a list of encodings to try to decode the page, perhaps with a default suggestion "guessed" by one algorithm or another.
Isn't there a way to read the codepage from a file? That would be "safer" than guessing the encoding (even though Windows does not deal with codepages reliably).

The Ruby gem chardet reports a confidence value alongside the guessed encoding.

Please take into account that we have a lot of foreign students at Centrale Nantes, which also means dealing with some pretty strange encodings if students work from home: I once had to deal with a computer using JIS encoding.
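
The "try except" display could look roughly like the sketch below. It is only an illustration under assumptions: the candidate list, the method name decode_for_display and the preferred argument (standing in for an encoding stored in the database after a grader picked it) are all hypothetical, and String#scrub needs Ruby 2.1 or later.

```ruby
# Hypothetical candidate list; a real deployment would choose its own.
CANDIDATE_ENCODINGS = ["UTF-8", "ISO-8859-1"].freeze

# Returns [utf8_text, encoding_used]. `preferred` is tried first when given.
def decode_for_display(raw_bytes, preferred = nil)
  ([preferred] + CANDIDATE_ENCODINGS).compact.each do |name|
    candidate = raw_bytes.dup.force_encoding(name)
    next unless candidate.valid_encoding?
    begin
      return [candidate.encode("UTF-8"), name]
    rescue EncodingError
      next  # bytes have no mapping in this encoding, try the next one
    end
  end
  # Last resort: show the text with invalid byte sequences replaced.
  [raw_bytes.dup.force_encoding("UTF-8").scrub("?"), "UTF-8"]
end

text, used = decode_for_display("caf\xE9".b)
puts "#{text} (decoded as #{used})"
```

Since every byte sequence is valid ISO-8859-1, that entry acts as a catch-all in this sketch, so the order of the list matters. The raw file is never rewritten; only the returned copy is converted.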

@benjaminvialle

Member

benjaminvialle commented May 8, 2011

I agree with Severin and Nelle on 1: "Filenames get sanitized". This will fix all the issues with shell commands (like the one converting the pdf/image to a good-sized one).

For the second point, I was first thinking about an on-the-fly conversion to UTF-8. All files going into SVN would be in UTF-8. It would close several bugs, in particular the one we have with SyntaxHighlighter and non-UTF-8 files. But it could bring some new bugs, in particular with the svn collision system. Students would probably get too many conflicts when uploading new files. I guess we need to do some tests. And what about images/pdf? And what about binary files or documents that the user does not want MarkUs to convert?

For conversions, I don't know gems like cmess. But I saw that Ruby has bindings for the iconv library, which I used to manually convert some students' files to UTF-8. The catch was that I already knew the encoding I was converting from. A gem could simplify things. If we choose to use a gem, just be sure that it is compatible with Rails 3 (and then, maybe, with Ruby 1.9, which will be my next game after Rails 3).

Edit: iconv ruby documentation: http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html

@benjaminvialle

Member

benjaminvialle commented May 8, 2011

Hmm, on reflection, on-the-fly conversion before insertion into SVN is not very reliable, since using SVN directly without the web interface would bypass the conversion… So maybe not the best solution.

@jerboaa

Member

jerboaa commented May 8, 2011

For what it's worth, we should already be doing 1. (sanitization of filenames). For some odd reason there seem to be bugs (either related to Ruby not handling strings correctly, related to direct SVN submission, or something else I'm missing).

Here is another good reference I found:
http://blog.grayproductions.net/articles/understanding_m17n

I also realized that we don't set the accept-charset attribute [1] on the HTML form tag in the file submissions form. We should probably set this in order to force the browser to send UTF-8 form data [2]. This seems to be the default in Rails 3 [3].

[1] http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
[2] http://blog.twodividedbyzero.net/2009/11/adding-accept-charset-attribute-to-html.html
[3] http://edgeguides.rubyonrails.org/form_helpers.html

@jerboaa

Member

jerboaa commented May 8, 2011

Nelle, the chardet gem seems to be doing the same thing as cmess is doing. Anyhow, those two would be candidates for suggesting some character encoding.

I like the idea of implementing something which eventually stores a source encoding in the DB to be used for on-view conversion. It's probably the least intrusive option.

@NelleV

Member

NelleV commented May 8, 2011

And the most painful to implement :p

@jerboaa

Member

jerboaa commented May 8, 2011

Reading more of [1] it looks like my hunch was right: in Ruby 1.8 running in Unicode mode (-KU), the regular expression character class \w matches Unicode characters such as 'ö', whereas [0-9a-zA-Z] does not. Reproducer:

$ ruby -vKUe 'p "ö".scan(/\w/m)'
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-linux]
["ö"]

$ ruby -vKUe 'p "ö".scan(/[0-9a-zA-Z]/m)'
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-linux]
[]

I'm pretty sure this is why we still have filenames containing accented characters on the server, although we do filename sanitization (but use the \w class, which is too broad in this case).

[1] http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18
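
For comparison, the sketch below runs the same kind of check on Ruby 1.9 or later, which is an assumption beyond the 1.8.7 session above. There \w is ASCII-only by default, while POSIX bracket classes such as [[:alpha:]] are Unicode-aware, so the behaviour of \w depends on the Ruby version and an explicit ASCII class is the portable choice for sanitizing.

```ruby
accented = "\u00f6"  # "ö"

p accented.scan(/\w/)           # \w: ASCII word characters only on Ruby 1.9+
p accented.scan(/[[:alpha:]]/)  # POSIX bracket class: Unicode-aware
p accented.scan(/[0-9a-zA-Z]/)  # explicit ASCII class: same on every version
```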

@NelleV

Member

NelleV commented May 10, 2011

One other point "against" converting files before committing them into the SVN repositories is that some programming languages expect an encoding declaration in the header of the file.

For example, Python and Ruby accept any ASCII-compatible encoding when it is declared in the header with:

# -*- coding: utf-8 -*-

If a developer chooses to encode their file in ISO-8859-1 while using non-ASCII characters, and we convert it to UTF-8, the header will not be changed, and the file will no longer be valid Python. This would be a problem when using the test framework.
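
Such a header can also be read as a hint about the source encoding instead of being guessed. The Ruby sketch below looks for a declaration on the first two lines, using a pattern adapted from Python's PEP 263. The constant and method names are hypothetical, and the declared name is returned as written: Python encoding names do not always match the names Ruby knows, so a caller would still need to validate it.

```ruby
# Pattern adapted from PEP 263 (Python source encoding declarations).
CODING_RE = /\A[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)/

# Returns the declared encoding name, or nil when the first two lines
# contain no declaration.
def declared_encoding(raw_bytes)
  raw_bytes.b.each_line.first(2).each do |line|
    match = CODING_RE.match(line)
    return match[1] if match
  end
  nil
end

source = "#!/usr/bin/env python\n# -*- coding: iso-8859-1 -*-\nprint('caf\xE9')\n".b
puts declared_encoding(source)  # prints "iso-8859-1"
```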

@hoboman313

hoboman313 Feb 6, 2012

I've read through the discussion, and the proposed solution seems to be:
1.) write the guessed file encoding to the db on file upload using the cmess/chardet gem
2.) convert the file to utf-8 when the file gets viewed in the graderview
3.) give the viewer the option to switch back to the source encoding in case cmess/chardet guessed wrong

I realize that some file sanitization is already implemented. Has anything been done on this in the last year, and is there still work required for #1/#3?

I would like to work on this issue.

@jerboaa

Member

jerboaa commented Feb 6, 2012

@hoboman313 It's great to see your ambition. However, it would be good to not work on too many issues at any single time. At the moment there are 5 open issues[1] already assigned to you. Please finish them off before moving on to the next. I'm happy to assign this to you once the others are closed. If you are blocked on any of the ones you are assigned to, let the team know. Thanks!

[1] https://github.com/MarkUsProject/Markus/issues?assignee=hoboman313&state=open

@hoboman313

hoboman313 Feb 11, 2012

3 tasks currently assigned to me are finished and waiting on review.

I have tried to work on this task and have made some improvements; however, it is hard for me to make a submission as a student and then see that submission in the grader view (after the due date has passed). I have logged #649 for this, but additional bug fixes may be required before the submission -> grading process works well enough for this task to be worked on.

@ghost ghost assigned hoboman313 Feb 11, 2012

@jerboaa

Member

jerboaa commented Feb 11, 2012

I've assigned you this issue. Looking forward to what you'll come up with :) Please coordinate with other folks in order to avoid duplicate work. I.e. if you discover a bug, make sure to send an email asking whether anybody is working on it already before you attempt to fix it yourself. It should be easy enough to pull in some changes somebody has made on a feature branch.

@hoboman313

hoboman313 Mar 7, 2012

I have uploaded my first set of changes here: http://review.markusproject.org/r/1210/ Please read the description of the review request and let me know if I am on the right track.

  1. I would like to add a check for encoding whenever a user submits a file, so that he/she can be warned early about his/her use of bad encoding
  2. I am currently testing with the files provided by Benjamin here: #197
    but they all have a confidence of lower than 90% ( they all are at about 80-86% ). I believe the confidence is that low because the files are too short, but I may be wrong.

I have not yet had the time to test syntax highlighting after conversion with large programming files. Can you guys recommend a program to easily convert and save documents in different encodings?

@benjaminvialle

Member

benjaminvialle commented Nov 11, 2012

Here is the patch from @hoboman313 from RB: https://gist.github.com/4055292

Description was:
Since this is a rather controversial feature, I will upload it in parts so I can get some early feedback.

Work Done:
So far I have added the gem to guess encoding. Whenever a grader chooses to grade a submission, my code will cache the encoding and convert the file to utf-8 before it is shown in the graderview.
The encoding will only be written to the database if the gem is at least 90% sure <--- this might be too low...thoughts?

Work Left to Do:

  • the graderview still doesn't work for unknown encodings...for some reason the .rjs file tries to treat them like binary files and the webpage hangs.
  • add a warning to the user if they upload a file with an unknown encoding
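
The threshold rule from the description can be sketched in a few lines of Ruby. This is not the code from the patch: the Guess struct stands in for whatever the detection gem returns, and the names and the 0.9 default are assumptions taken from the "at least 90% sure" wording above.

```ruby
# Stand-in for a detector result (hypothetical; not a gem's real return type).
Guess = Struct.new(:encoding, :confidence)

CONFIDENCE_THRESHOLD = 0.9  # assumed from "at least 90% sure"

# Returns the encoding name to persist, or nil when the guess is too weak.
def encoding_to_store(guess, threshold = CONFIDENCE_THRESHOLD)
  return nil if guess.nil? || guess.encoding.nil?
  guess.confidence >= threshold ? guess.encoding : nil
end

p encoding_to_store(Guess.new("ISO-8859-1", 0.86))  # nil: below the threshold
p encoding_to_store(Guess.new("UTF-8", 0.99))       # "UTF-8"
```

With the short test files mentioned earlier in the thread scoring about 80-86%, a 0.9 cut-off would store nothing for them, so the threshold and the fallback behaviour need to be chosen together.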

@ghost ghost assigned ghigt Jun 8, 2013

@ghigt

Member

ghigt commented Jun 28, 2013

I discovered this issue today after my commit #1131.

My first thought was to let graders choose the encoding with a select field, as in @NelleV's idea. But it's a bit difficult to send the value to coviewer.rjs, and I can't find out how to do that for now (if it's the best way, I'll work harder on it).

For the time being, I chose to fall back to converting from ISO-8859-1 to UTF-8 if iconv throws an exception when treating the file as UTF-8. It works for every file I tested (yours, @benjaminvialle).
The only problem is that when the encoding is neither ISO-8859-1 nor UTF-8, special characters are displayed incorrectly. But normal characters work well, and it no longer crashes the call to results/common/text_codeviewer, which is, for me, the main problem for now.
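
A rough Ruby sketch of this fallback and of its limitation follows. It is not the committed code: it checks validity with String#valid_encoding? and converts with String#encode instead of rescuing an iconv exception, and the method name utf8_or_latin1 is hypothetical.

```ruby
# Hypothetical helper: keep the bytes as UTF-8 when they are valid UTF-8,
# otherwise assume ISO-8859-1 and convert.
def utf8_or_latin1(raw_bytes)
  text = raw_bytes.dup.force_encoding("UTF-8")
  return text if text.valid_encoding?
  raw_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

puts utf8_or_latin1("caf\xE9".b)  # prints "café"

# Limitation: byte 0x80 is the euro sign in Windows-1252, but the fallback
# reads it as an ISO-8859-1 control character.
raw = "\x80".b
p utf8_or_latin1(raw) == "\u0080"                                   # true
p raw.dup.force_encoding("Windows-1252").encode("UTF-8") == "\u20ac"  # true
```

The fallback never raises for text files, which matches the goal of keeping the code viewer from crashing, at the cost of wrong special characters for any third encoding.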

@reidka

Member

reidka commented May 13, 2014

I thought maybe the bulk of this issue had been solved, but now I'm not so sure. If anyone has time to think about this more carefully, please leave a comment.

@david-yz-liu

Contributor

david-yz-liu commented Jan 17, 2015

(1) seems to work now. Closing until we hear a new complaint about file contents encoding.
