Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding support for TEXT fields, remove non-standard ascii characters #2

Merged
merged 3 commits into from
Nov 7, 2016
Merged

adding support for TEXT fields, remove non-standard ascii characters #2

merged 3 commits into from
Nov 7, 2016

Conversation

odedelharar
Copy link

Added a very basic TEXT field support, while removing non-ascii characters from such a field.
Updated the date and datetime fields to support string representations of unix time.

@cognitom
Copy link
Owner

cognitom commented Nov 7, 2016

@odedelharar thank you for your PR! LGTM about TEXT type. I like it 👍

I assumed that DATE and DATETIME type are string like 2016-11-07 basically. Could you change the code to pass such values?

The code could be like this:

convert: val => {
  if (!val) return 'NULL'
  if (typeof val == 'number') val = moment(val).format('YYYY-MM-DD')
  if (typeof val == 'string') {
    if (/\d+/.test(val)) val = moment(parseInt(val)).format('YYYY-MM-DD')
    if (!/\d{4}-\d{2}-\d{2}/.test(val)) return 'NULL'
  }
  return `"${ val }"`
}

@cognitom
Copy link
Owner

cognitom commented Nov 7, 2016

Could you pls add documentation about TEXT to here, too? Thanks.

@odedelharar
Copy link
Author

Updated, now can allow: date format string (YYYY-MM-DD), unix time (number), unix time (string). Can make more options viable, maybe in a future change

@cognitom
Copy link
Owner

cognitom commented Nov 7, 2016

Great! Thank you.

@cognitom cognitom merged commit 1211199 into cognitom:master Nov 7, 2016
.toString()
.replace(/\\/g, '\\\\') // escape backslashs
.replace(/"/g, '\\"') // escape double quotations
.replace(/[^\x00-\x7F]/g, "") // remove non ascii characters
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I'll remove this line, later.
We need to care about Unicode chars equally.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fixing the code like this.

.replace(/[\x00-\x1F\x7F]/g, '')

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm..., we should keep \x09 and \x10.

.replace(/[\x00-\x08\x11-\x1F\x7F]/g, '')

@odedelharar could you explain the use case you thought?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had a problem with a field with a non-standard character. MySQL wouldn't allow inserting this line. If we want to keep this line (we should have some safeguard there), we should include all characters allowed in standard DB encoding (unless you have examples of non-standard encoding). So my code allows all common ASCII characters (0-127).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odedelharar thanks. I've checked the spec of MySQL: \0 \b \t \n \r \x1a supposed to be allowed. Actually sqlstring module care about those control chars.
https://github.com/mysqljs/sqlstring/blob/master/lib/SqlString.js#L2

At this point, I'd like to remove control chars except above, and escape them by sqlstring.

The meaning of Non-standard depends on what charset we use, I think.
If you need another custom filter, feel free to send a new PR 😉

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll find a way to remove the offending characters without removing anything else. In my case the character was \xEF\xBF\xBD or �.

cognitom added a commit that referenced this pull request Nov 8, 2016
@odedelharar
Copy link
Author

Maybe add an expression like: /[\x00-\x1F\x7F\uFFFD\xEF\xBF\xBD]/g so it also covers the cases I saw

@cognitom
Copy link
Owner

cognitom commented Nov 8, 2016

@odedelharar unfortunately, it seems not a good idea. These chars like \xBF \xBD are actually valid letters in some languages.

@odedelharar
Copy link
Author

OK, so maybe we can use /[\x00-\x1F\x7F\uFFFD]/g since \uFFFD (�) is not OK anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants