Psych::SyntaxError: (<unknown>): control characters are not allowed at line 1 column 1 #260

Closed
JamesChevalier opened this issue Feb 17, 2016 · 5 comments

Comments

@JamesChevalier

I'm hitting an issue with invisible characters. This is the YAML from the audited_changes field that triggers it:


---
user_id: 7
name: Banking in the Digital Era
description: Talks about a shift in the banks� approach...

(Now I wish I came here first! I've been working on this issue all morning, and this github preview is the first time I've actually seen a representation of that character at all. In all of my effort so far, it has been completely invisible. Anyway...)

This invisible character causes the following error whenever attempting to access the audit:

Psych::SyntaxError: (<unknown>): control characters are not allowed at line 1 column 1

Ideally for me, audited would Just Handle This ... I don't know how this would be fixed, though.

Can anyone offer me good direction on how to handle this? My current working idea is to try & strip those invisible characters out before saving my data, so they never make it into the database (or audited). I don't have a clear idea of how to accomplish that, yet.
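One way the "strip before saving" idea could look is sketched below. This is a minimal sketch, not anything audited or Psych provides: the `YAML_UNSAFE` constant and the `strip_yaml_unsafe` helper are hypothetical names, and the character class is an assumption based on the YAML 1.2 printable set (C0/C1 control characters and DEL, except tab, LF, CR and NEL). It also assumes the incoming strings are UTF-8.

```ruby
# Hypothetical helper, not part of audited or Psych.
# Matches C0/C1 control characters and DEL, but leaves tab (U+0009),
# LF (U+000A), CR (U+000D) and NEL (U+0085) alone, since YAML 1.2
# treats those as printable.
YAML_UNSAFE = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u0084\u0086-\u009F]/

def strip_yaml_unsafe(value)
  # Leave non-string attribute values (integers, nil, ...) untouched.
  return value unless value.is_a?(String)

  value.gsub(YAML_UNSAFE, '')
end

cleaned = strip_yaml_unsafe("Talks about a shift in the banks\u0092 approach\r\n")
```

In a Rails model this could presumably be called from a callback that runs before the record is saved, so the character never reaches the database or the audit row; that wiring is left out here because it depends on the application.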

Thanks for your time!

@danielmorrison
Member

It's possible your DB is doing something weird, so the data you read back doesn't contain the same character that was written. Maybe check your DB character set? Is this MySQL?

@JamesChevalier
Author

I'm using PostgreSQL in Amazon RDS.
I can reproduce this in the console, without involving audited or the database, though.

yml_broken = '---
user_id: 7
name: Banking in the Digital Era
description: "Talks about a shift in the banks� approach\r\n"'
yml_fixed = yml_broken.gsub(/\u0092/, '')

YAML.load(yml_broken) # raises Psych::SyntaxError
YAML.load(yml_fixed)  # returns the parsed YAML as a Hash
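Since the character is invisible in most tools, a small helper that reports the position and code point of anything outside the YAML printable set may make it easier to spot. This is only a sketch: `yaml_printable?` and `non_printable_chars` are hypothetical names, and the ranges reflect my reading of the YAML 1.2 spec rather than anything Psych exposes.

```ruby
# Hypothetical helpers, not part of Psych.
# The ranges follow the printable character set as described in YAML 1.2.
def yaml_printable?(char)
  cp = char.ord
  cp == 0x09 || cp == 0x0A || cp == 0x0D || cp == 0x85 ||
    (0x20..0x7E).cover?(cp) || (0xA0..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) || (0x10000..0x10FFFF).cover?(cp)
end

# Returns [index, "U+XXXX"] pairs for every character YAML would reject.
def non_printable_chars(text)
  text.each_char.with_index
      .reject { |char, _index| yaml_printable?(char) }
      .map { |char, index| [index, format('U+%04X', char.ord)] }
end

offenders = non_printable_chars("banks\u0092 approach")
p offenders # [[5, "U+0092"]]
```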

@JamesChevalier
Author

I took a look through the Issues in the psych gem repository. Based on this issue and this issue, my current opinion is that audited should be cleaning up text to adhere to the YAML 1.2 spec.

This comment seems to give a good outline of allowable characters, but I haven't been able to pick up this \u0092 character with it yet. I've just opened an issue with Psych to get their take on whether it should be allowed or not...

@JamesChevalier
Author

I had one of those half-wake-up-out-of-a-dream-with-an-idea-on-how-to-test-this experiences last night, and it actually worked. 😮

The data that I have been working with was migrated over from MySQL to PostgreSQL. I wondered if it was just legacy data that was experiencing this issue, and I was correct. The only reason this \u0092 character shows up in raw form is that it was migrated over from MySQL. New data that audited writes does not return this error - it's actually storing the character as \x92.
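The difference between legacy and new rows can be checked in the console. The sketch below assumes only Ruby's standard `yaml` library; the hash is made-up sample data. When the string goes through `YAML.dump`, the control character is written as an escape inside a quoted scalar rather than as a raw byte sequence, so loading it back works.

```ruby
require 'yaml'

# Made-up sample data containing the same control character (U+0092).
original = { 'description' => "banks\u0092 approach" }

dumped   = YAML.dump(original) # serialized by Psych, character is escaped
restored = YAML.load(dumped)   # parses without a Psych::SyntaxError

puts dumped
```

So the problem seems limited to YAML text that contains the raw character, which is what the migrated rows have.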

As far as I can tell now, there don't need to be any changes to audited. It's up to me to iterate over the old data and clean it up on my own.
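A rough sketch of that cleanup pass follows. It only touches rows whose YAML fails to parse, strips the disallowed control characters, and confirms the result loads. `LegacyRow` is a hypothetical stand-in for the real audit records (this is not audited's model), `repair_row` and `DISALLOWED_IN_YAML` are made-up names, and the character class is the same assumption about YAML 1.2's printable set for UTF-8 strings.

```ruby
require 'yaml'

# Assumed set of control characters that YAML rejects (tab, LF, CR, NEL kept).
DISALLOWED_IN_YAML = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u0084\u0086-\u009F]/

# Hypothetical stand-in for a legacy audit row; not audited's actual model.
LegacyRow = Struct.new(:id, :raw_yaml)

# Returns true if the row had to be repaired, false if it already parsed.
def repair_row(row)
  YAML.load(row.raw_yaml)
  false
rescue Psych::SyntaxError
  row.raw_yaml = row.raw_yaml.gsub(DISALLOWED_IN_YAML, '')
  YAML.load(row.raw_yaml) # raises again if something else is wrong
  true
end

rows = [
  LegacyRow.new(1, "---\nname: Banking in the Digital Era\n"),
  LegacyRow.new(2, "---\ndescription: \"banks\u0092 approach\"\n")
]

repaired_ids = rows.select { |row| repair_row(row) }.map(&:id)
```

With real data the loop would also need to write the cleaned text back to the database, which is omitted here. Stripping loses the character entirely; replacing it with an apostrophe might be closer to what the original text meant, but that is a judgment call about the data.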

Thanks for helping out, @danielmorrison - you had the right thread and all I had to do was follow it down.

@jasonfb

jasonfb commented Jun 26, 2017

I am on MySQL 5.6 and experiencing this problem. My audited_changes field is type TEXT with an encoding of "UTF-8 Unicode" and a collation of utf8_unicode_ci.
