[Track] bugs in csv input format parser #6314

Closed
Tracked by #6313
youngsofun opened this issue Jun 29, 2022 · 7 comments

Comments

@youngsofun
Member

Summary

@youngsofun
Member Author

@ZhiHanZ a, {'b": "c"}, d is not valid CSV, to my understanding. Can you provide more context for it?

@ZhiHanZ
Collaborator

ZhiHanZ commented Jun 29, 2022

@ZhiHanZ a, {'b": "c"}, d is not valid CSV, to my understanding. Can you provide more context for it?

Those were pseudo examples. Here is a line from a temp file generated by Airbyte, which could be helpful:

ee9e9984-083a-439b-8a1d-f9178181fb01,"{""id"":1,""topic"":""install"",""timestamp"":""2017-08-22T17:22:15Z"",""model"":""install"",""details"":""{\""name\"":\""Conferência Faturamento - Custo - Taxas - Margem - Resumo ano inicial até -2\"",\""description\"":null}""}",2021-07-08 00:58:59.889
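
For illustration, a line shaped like the sample above can be split under the common RFC 4180 style convention, where a field is wrapped in double quotes and an embedded quote is written as two quotes. The following is a minimal sketch in plain Rust, not Databend's parser: `split_csv_record` is a hypothetical helper, it handles a single line only, and it treats backslashes as ordinary characters.

```rust
/// Split one CSV record into fields (hypothetical helper, single line only).
/// Assumes RFC 4180 style quoting: a quoted field is wrapped in double quotes
/// and a literal quote inside it is written as two quotes ("").
fn split_csv_record(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            if c == '"' {
                if chars.peek() == Some(&'"') {
                    // Doubled quote inside a quoted field: emit one literal quote.
                    chars.next();
                    field.push('"');
                } else {
                    // A single quote closes the quoted field.
                    in_quotes = false;
                }
            } else {
                // Everything else, including ',' and '\', is kept literally.
                field.push(c);
            }
        } else {
            match c {
                // A quote at the start of a field opens a quoted field.
                '"' if field.is_empty() => in_quotes = true,
                // An unquoted comma ends the current field.
                ',' => fields.push(std::mem::take(&mut field)),
                _ => field.push(c),
            }
        }
    }
    fields.push(field);
    fields
}
```

Applied to a shortened version of the sample, the middle field comes out as a single JSON document whose inner `\"` escapes are left for the JSON parser to interpret, so the CSV layer never needs to look at `{` or `[`.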

@sundy-li
Member

sundy-li commented Jun 29, 2022

This needs to be modified:

fn de_text_csv<R: BufferRead>(
    &mut self,
    reader: &mut R,
    _format: &FormatSettings,
) -> Result<()> {
    self.buffer.clear();
    reader.read_quoted_text(&mut self.buffer, b'"')?;

    let val = serde_json::from_slice(self.buffer.as_slice())?;
    self.builder.append_value(val);
    Ok(())
}

If the first char is '{', we should consume input up to the matching '}'.

The same applies to '['.
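
A rough sketch of what such a bracket-matching scan might look like, in plain Rust. This is only an illustration of the idea, not the change that was eventually made: `find_matching_close` is a hypothetical name, and the sketch assumes the field holds JSON with double-quoted strings and backslash escapes, so that brackets and commas inside strings are skipped.

```rust
/// Return the index of the bracket that closes the one at `input[0]`.
/// Returns `None` if `input` does not start with '{' or '[', or if the
/// brackets are unbalanced or mismatched. (Hypothetical helper.)
fn find_matching_close(input: &[u8]) -> Option<usize> {
    match input.first() {
        Some(b'{') | Some(b'[') => {}
        _ => return None,
    }

    // Stack of the closing brackets we still expect to see.
    let mut expected: Vec<u8> = Vec::new();
    let mut in_string = false;
    let mut escaped = false;

    for (i, &b) in input.iter().enumerate() {
        if in_string {
            if escaped {
                // The byte after a backslash is never a string terminator.
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => expected.push(b'}'),
            b'[' => expected.push(b']'),
            b'}' | b']' => {
                if expected.pop() != Some(b) {
                    return None; // mismatched closing bracket
                }
                if expected.is_empty() {
                    return Some(i); // the outermost bracket is closed
                }
            }
            _ => {}
        }
    }
    None // ran out of input before the outermost bracket closed
}
```

With this approach the reader would consume `input[..=index]` as the JSON value and resume CSV parsing at the next delimiter. The cost is that the CSV reader has to understand part of the JSON grammar.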

@youngsofun
Member Author

youngsofun commented Jun 29, 2022

I think the CSV parser should know nothing about { or [.
Do we support " as the default escape character (instead of \), or does it need to be specified before parsing?

An embedded JSON field in CSV/TSV should be serialized to a string and then escaped according to the CSV/TSV rules.
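
On the writer side, that rule could be sketched as follows, assuming the quote-doubling convention for escaping. `quote_csv_field` is a hypothetical helper in plain Rust, not an existing Databend or Airbyte function.

```rust
/// Quote one CSV field (hypothetical helper): wrap it in double quotes
/// and double every embedded double quote.
fn quote_csv_field(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 2);
    out.push('"');
    for c in raw.chars() {
        if c == '"' {
            // Escape a literal quote by writing it twice.
            out.push('"');
        }
        out.push(c);
    }
    out.push('"');
    out
}
```

For example, the JSON text `{"b": "c"}` would be written as the field `"{""b"": ""c""}"`, so a record carrying it looks like `a,"{""b"": ""c""}",d` and can be split without the CSV layer inspecting the JSON.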

@wubx
Member

wubx commented Jun 30, 2022

track: #5586

@wubx
Member

wubx commented Jun 30, 2022

@ZhiHanZ a, {'b": "c"}, d is not valid CSV, to my understanding. Can you provide more context for it?

Those were pseudo examples. Here is a line from a temp file generated by Airbyte, which could be helpful:

ee9e9984-083a-439b-8a1d-f9178181fb01,"{""id"":1,""topic"":""install"",""timestamp"":""2017-08-22T17:22:15Z"",""model"":""install"",""details"":""{\""name\"":\""Conferência Faturamento - Custo - Taxas - Margem - Resumo ano inicial até -2\"",\""description\"":null}""}",2021-07-08 00:58:59.889

If this feature is supported, GitHub archive files will be very easy to load into Databend.

@youngsofun
Member Author

@wubx @ZhiHanZ this case is fixed by #6524 (comment)

@ZhiHanZ a, {'b": "c"}, d is not valid CSV, to my understanding. Can you provide more context for it?

Those were pseudo examples. Here is a line from a temp file generated by Airbyte, which could be helpful:

ee9e9984-083a-439b-8a1d-f9178181fb01,"{""id"":1,""topic"":""install"",""timestamp"":""2017-08-22T17:22:15Z"",""model"":""install"",""details"":""{\""name\"":\""Conferência Faturamento - Custo - Taxas - Margem - Resumo ano inicial até -2\"",\""description\"":null}""}",2021-07-08 00:58:59.889

If this feature is supported, GitHub archive files will be very easy to load into Databend.

BohuTANG closed this as completed on Jul 7, 2022
5 participants