I had a CSV file with escaped whitespace characters in it. When it is read with CsvStyle.Escaped, each escape sequence always comes back as the bare letter after the escape rather than the whitespace character it represents (e.g. "\n" turns into 'n' instead of a newline).
```csharp
[Fact]
public void InlineEscapedWhitespaceCharacters()
{
    using var reader = new StringReader(@"_,a1 \a a2,_,b1 \b b2,_,f1 \f f2,_,n1 \n n2,_,r1 \r r2,_,t1 \t t2,_,v1 \v v2");
    using var csvReader = CsvDataReader.Create(reader, new CsvDataReaderOptions
    {
        CsvStyle = CsvStyle.Escaped,
        Escape = '\\',
        Delimiter = ',',
        HasHeaders = false,
    });

    csvReader.Read();
    var value00 = csvReader.GetString(0);
    var value01 = csvReader.GetString(1);
    var value02 = csvReader.GetString(2);
    var value03 = csvReader.GetString(3);
    var value04 = csvReader.GetString(4);
    var value05 = csvReader.GetString(5);
    var value06 = csvReader.GetString(6);
    var value07 = csvReader.GetString(7);
    var value08 = csvReader.GetString(8);
    var value09 = csvReader.GetString(9);
    var value10 = csvReader.GetString(10);
    var value11 = csvReader.GetString(11);
    var value12 = csvReader.GetString(12);
    var value13 = csvReader.GetString(13);

    Assert.Multiple(
        () => Assert.Equal("_", value00),
        () => Assert.Equal("a1 \a a2", value01), // This will fail; will be "a1 a a2"
        () => Assert.Equal("_", value02),
        () => Assert.Equal("b1 \b b2", value03), // This will fail; will be "b1 b b2"
        () => Assert.Equal("_", value04),
        () => Assert.Equal("f1 \f f2", value05), // This will fail; will be "f1 f f2"
        () => Assert.Equal("_", value06),
        () => Assert.Equal("n1 \n n2", value07), // This will fail; will be "n1 n n2"
        () => Assert.Equal("_", value08),
        () => Assert.Equal("r1 \r r2", value09), // This will fail; will be "r1 r r2"
        () => Assert.Equal("_", value10),
        () => Assert.Equal("t1 \t t2", value11), // This will fail; will be "t1 t t2"
        () => Assert.Equal("_", value12),
        () => Assert.Equal("v1 \v v2", value13)  // This will fail; will be "v1 v v2"
    );
}
```
The escape-handling block in CsvDataReader.PrepareField could be modified to cover this:
```csharp
if (c == escape)
{
    if (i < len)
    {
        c = buffer[offset + i++];
        if (c != quote && c != escape)
        {
            if (quote == escape)
            {
                // the escape we just saw was actually the closing quote
                // the remainder of the field will be added verbatim
                inQuote = false;
            }
            else if ('\\' == escape)
            {
                switch (c)
                {
                    case 'a': // bell
                        c = '\a';
                        break;
                    case 'b': // backspace
                        c = '\b';
                        break;
                    case 'f': // form feed
                        c = '\f';
                        break;
                    case 'n': // new line
                        c = '\n';
                        break;
                    case 'r': // carriage return
                        c = '\r';
                        break;
                    case 't': // horizontal tab
                        c = '\t';
                        break;
                    case 'v': // vertical tab
                        c = '\v';
                        break;
                }
            }
        }
    }
    else
    {
        // we should never get here. Invalid fields should always be
        // handled in ReadField and end up in PrepareInvalidField
        throw new CsvFormatException(rowNumber, -1);
    }
}
```
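To see the proposed mapping in isolation, here is a minimal sketch of it as a standalone helper. It is hypothetical and not part of Sylvan.Data.Csv: the class name `CStyleEscapes` and its methods are made up for illustration, quoting is ignored, and a dangling escape at the very end of the string is kept as-is, whereas PrepareField throws a CsvFormatException in that case.

```csharp
using System.Text;

// Hypothetical helper (not Sylvan code) mirroring the proposed C-style
// escape handling from the patch above.
public static class CStyleEscapes
{
    // Translate the letter after a backslash into its control character.
    public static char Map(char c) => c switch
    {
        'a' => '\a', // bell
        'b' => '\b', // backspace
        'f' => '\f', // form feed
        'n' => '\n', // new line
        'r' => '\r', // carriage return
        't' => '\t', // horizontal tab
        'v' => '\v', // vertical tab
        _ => c,      // delimiter, escape, or anything else: kept verbatim
    };

    public static string Unescape(string raw, char escape = '\\')
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c == escape && i + 1 < raw.Length)
            {
                char next = raw[++i]; // consume the escape, look at what follows
                // Like the patch, only translate letters when the escape is a backslash.
                sb.Append(escape == '\\' ? Map(next) : next);
            }
            else
            {
                // Ordinary character, or a dangling escape at the end of the string.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this mapping, `CStyleEscapes.Unescape(@"A\r\nB")` returns "A", a carriage return, a line feed, then "B", while escaped delimiters and escaped backslashes still come back as the literal character.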
All existing unit tests pass with this modification.
This is working as expected. The Escaped mode parser only expects delimiters, newlines and escape characters to be escaped. Yes, it is similar to C-style string literal escaping, but it is not exactly the same. The test cases that you provided are invalid: 'a' doesn't need to be escaped, so a \a sequence is invalid. I made the design decision to simply remove the unnecessary escape character. If you want a bell (\a) or a tab (\t) character, you can include that character in the output stream directly, without escaping it.
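The rule described here can be sketched as a standalone function. This is a hypothetical helper for illustration, not Sylvan's implementation (the real reader does this inside PrepareField while parsing, and this sketch ignores quoting): the escape character is dropped and whatever follows it is kept verbatim. That single rule covers escaped delimiters, escaped newlines and escaped escapes, and it is also why "\a" comes back as "a".

```csharp
using System.Text;

// Hypothetical sketch of Escaped-mode unescaping as described above
// (not Sylvan's code; quotes are ignored). An escape always makes the
// next character literal, so an unnecessary escape is simply removed.
public static class EscapedStyle
{
    public static string Unescape(string raw, char escape = '\\')
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            if (raw[i] == escape && i + 1 < raw.Length)
            {
                i++; // skip the escape itself; the next character is kept as-is
            }
            // A dangling escape at the very end is appended unchanged.
            sb.Append(raw[i]);
        }
        return sb.ToString();
    }
}
```

For example, `EscapedStyle.Unescape(@"A\r\nB")` returns "ArnB", whereas an escape followed by an actual line feed (written `"A\\\nB"` in C# source) returns "A", a newline, then "B".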
Agreed that characters like tab and bell don't need escaping for the format to be written correctly. Unfortunately, the designer of the application that creates the files I'm consuming decided to escape every character anyway, and isn't open to changing that behavior since the choice was made over 25 years ago.
That said, the newline escapes \r and \n were also not working in my tests. A CSV line like
Hello,A\r\nB
would be parsed into the
"Hello"
"ArnB"
field values, instead of
"Hello"
"A
B"
Newline parsing was the main issue I was running into. I don't expect any of this application's data to actually contain a bell character, but I included it to be comprehensive.
After having my coffee, I realize you said the design accommodates newlines by using the escape character followed by the actual newline character, not a C-style escape sequence. Sorry about that.
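Going the other way, a producer targeting this convention would prefix each special character with the escape instead of writing a C-style letter. A minimal hypothetical sketch (not Sylvan's CsvDataWriter; the class and method names are made up), assuming the characters that need escaping are the delimiter, the escape character itself, CR and LF:

```csharp
using System.Text;

// Hypothetical writer-side sketch of the Escaped convention: special
// characters are written as escape + the actual character, never as
// a C-style letter such as \n.
public static class EscapedStyleWriter
{
    public static string Escape(string value, char delimiter = ',', char escape = '\\')
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c == delimiter || c == escape || c == '\r' || c == '\n')
            {
                sb.Append(escape); // prefix the special character with the escape
            }
            sb.Append(c); // the character itself is always written unchanged
        }
        return sb.ToString();
    }
}
```

Under this assumption, a field containing "A", CR, LF, "B" is written as A, escape, CR, escape, LF, B, and a tab or bell is written as-is.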