Support for Redshift types like NUMERIC(20, 0) #39

Open
joeschmid opened this issue Nov 6, 2019 · 4 comments

Comments

@joeschmid

Thanks for the work on this project! We're just trying out Singer for moving data from MySQL to Redshift. In MySQL we have a column of type bigint(18) unsigned. Some values in this column don't fit in Redshift's bigint column type, and we get errors like Overflow (Long valid range -9223372036854775808 to 9223372036854775807).

Typically we declare a Redshift column as NUMERIC(20, 0) to hold these values. Is there a way to tell target-redshift to use that type for a particular Redshift column?
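To make the overflow concrete: MySQL's BIGINT UNSIGNED tops out at 2^64 - 1, while Redshift's BIGINT is a signed 64-bit integer, so any value above 2^63 - 1 hits the error above. The plain-Python sketch below (no target-redshift code involved; `fits_redshift_bigint` and `fits_numeric` are hypothetical helper names) checks both limits and shows why 20 digits of precision is enough while 19 is not.

```python
# Signed 64-bit range used by Redshift BIGINT (matches the error message above).
REDSHIFT_BIGINT_MIN = -(2**63)         # -9223372036854775808
REDSHIFT_BIGINT_MAX = 2**63 - 1        #  9223372036854775807

# Largest value a MySQL BIGINT UNSIGNED column can hold.
MYSQL_BIGINT_UNSIGNED_MAX = 2**64 - 1  # 18446744073709551615 (20 digits)


def fits_redshift_bigint(value: int) -> bool:
    """True if value fits in Redshift's signed 64-bit BIGINT."""
    return REDSHIFT_BIGINT_MIN <= value <= REDSHIFT_BIGINT_MAX


def fits_numeric(value: int, precision: int, scale: int = 0) -> bool:
    """True if an integer fits NUMERIC(precision, scale).

    With `scale` digits reserved after the decimal point, the integer part
    may use at most precision - scale digits.
    """
    digits = len(str(abs(value)))
    return digits <= precision - scale


for v in (REDSHIFT_BIGINT_MAX, REDSHIFT_BIGINT_MAX + 1, MYSQL_BIGINT_UNSIGNED_MAX):
    print(v, fits_redshift_bigint(v), fits_numeric(v, 20, 0))
# 9223372036854775807 True True
# 9223372036854775808 False True
# 18446744073709551615 False True
```

Every unsigned 64-bit value has at most 20 digits, so NUMERIC(20, 0) covers the whole MySQL range, whereas NUMERIC(19, 0) would still reject the largest values.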

@AlexanderMann
Collaborator

@joeschmid thanks for the kind words! We're always looking to make Target-Redshift better, so we really appreciate questions like this.

There is currently no supported way to do what you're asking. There have been conversations in the past about building tooling to detect data widths, so that we could use tighter column types inside Redshift and avoid the penalties of things like TEXT columns everywhere instead of, say, VARCHAR(20).
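As an illustration of what such width-detection tooling might do (a hypothetical sketch; nothing like this exists in target-redshift today), one could scan a sample of a column's values and pick the narrowest Redshift type that holds them all. SMALLINT, INTEGER and BIGINT are Redshift's signed 2-, 4- and 8-byte integers, NUMERIC precision is capped at 38, and VARCHAR(n) lengths are measured in bytes up to 65535. The function names are made up for this example.

```python
from typing import Iterable

# Redshift signed integer types, narrowest first: (type name, min, max).
INTEGER_TYPES = [
    ("SMALLINT", -(2**15), 2**15 - 1),
    ("INTEGER", -(2**31), 2**31 - 1),
    ("BIGINT", -(2**63), 2**63 - 1),
]
REDSHIFT_MAX_NUMERIC_PRECISION = 38
REDSHIFT_MAX_VARCHAR_BYTES = 65535


def narrowest_integer_type(values: Iterable[int]) -> str:
    """Pick the smallest Redshift type holding every observed integer.

    Raises ValueError on an empty sample (min() of an empty list).
    """
    values = list(values)
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    # Too wide for BIGINT: fall back to an exact NUMERIC with scale 0.
    digits = max(len(str(abs(v))) for v in values)
    if digits > REDSHIFT_MAX_NUMERIC_PRECISION:
        raise ValueError(f"{digits} digits exceeds NUMERIC precision limit")
    return f"NUMERIC({digits}, 0)"


def narrowest_varchar_type(values: Iterable[str]) -> str:
    """Size a VARCHAR to the longest observed value, in UTF-8 bytes."""
    longest = max((len(v.encode("utf-8")) for v in values), default=0)
    if longest > REDSHIFT_MAX_VARCHAR_BYTES:
        raise ValueError(f"{longest} bytes exceeds Redshift's VARCHAR limit")
    # Avoid a zero-length column if every observed value is empty.
    return f"VARCHAR({max(longest, 1)})"


print(narrowest_integer_type([1, 200, 40000]))  # INTEGER
print(narrowest_integer_type([0, 2**64 - 1]))   # NUMERIC(20, 0)
print(narrowest_varchar_type(["abc", "héllo"]))  # VARCHAR(6)
```

A sample-based guess can be wrong for rows that arrive later, so real tooling along these lines would also need a strategy for widening columns when a new value no longer fits.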

There is some work coming down the pipe which will make a number of these improvements simpler in the future, but when that "future" arrives is pretty up in the air.

Given this, I don't think waiting for this feature is the most expedient way for you to resolve your issue.

I'd be happy to walk you through the changes I'd expect you'd need to make to get things working, if that's useful to you.

@joeschmid
Author

@AlexanderMann thanks very much for the update and explanation. That all makes sense. If you wouldn't mind walking through the changes to get this scenario working, I'd appreciate it. (And maybe others who come across similar issues would see the explanation here and it would help them out.)

@AlexanderMann
Collaborator

@joeschmid no problem. So I will start by saying that the way to "get this working" is to fork this repo, and start trying to get what you're after working. I'm also not sure if it'll "work" or end up being a 🐰 🕳

Worth noting, Stitch also doesn't "support" this: https://www.stitchdata.com/docs/destinations/redshift/#data-limits

Integer range: -9223372036854775808 to 9223372036854775807
Integer values outside of this range will be rejected and logged in the _sdc_rejected table.

Easiest Option

Make all integers NUMERIC(20, 0)

Pros

Should be pretty straightforward and simple.

Cons

Column widths will balloon for all integers. Redshift (last I checked) allocates a column's full declared width to every value in it, whereas PostgreSQL only uses as much space as each row's data actually needs.
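To put a rough number on that ballooning, here is some back-of-the-envelope arithmetic. It assumes Redshift's documented storage for uncompressed numeric columns as we understand it (BIGINT takes 8 bytes; NUMERIC with precision 19 or less is stored in 8 bytes, precision 20 to 38 in 16 bytes) and ignores compression encodings, which change the on-disk numbers considerably. The row count is an arbitrary example.

```python
def numeric_storage_bytes(precision: int) -> int:
    """Uncompressed bytes per value for Redshift NUMERIC(precision, s).

    Assumption: precision <= 19 is stored as a 64-bit integer, and
    precision 20..38 as a 128-bit integer.
    """
    if not 1 <= precision <= 38:
        raise ValueError("Redshift NUMERIC precision must be between 1 and 38")
    return 8 if precision <= 19 else 16


BIGINT_BYTES = 8
rows = 100_000_000  # hypothetical row count for one integer column

bigint_total = rows * BIGINT_BYTES
numeric_total = rows * numeric_storage_bytes(20)
print(f"BIGINT:         {bigint_total / 1e9:.1f} GB")   # 0.8 GB
print(f"NUMERIC(20, 0): {numeric_total / 1e9:.1f} GB")  # 1.6 GB
```

Under these assumptions, switching every integer column from BIGINT to NUMERIC(20, 0) roughly doubles its uncompressed size, which is why applying it only to the columns that need it is attractive.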

Changes

In these lines, you're just going to map JSON Schema's integer type to Redshift's NUMERIC(20, 0): https://github.com/datamill-co/target-redshift/blob/master/target_redshift/redshift.py#L97-L118

For more examples of what that'd look like, check in here: https://github.com/datamill-co/target-postgres/blob/master/target_postgres/postgres.py#L806-L870
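A minimal sketch of that mapping, written as a standalone function rather than against target-redshift's actual classes. The function name `json_schema_to_redshift_type` and the non-integer entries in the table are assumptions for illustration; check the linked lines for what the target really does. Redshift's NUMERIC takes (precision, scale), so a 20-digit integer column is NUMERIC(20, 0). JSON Schema allows `type` to be a string or a list such as ["null", "integer"], so the sketch normalizes both.

```python
from typing import Any, Dict

# Hypothetical JSON Schema type -> Redshift column type table.
# Only the "integer" entry reflects the change discussed in this thread.
TYPE_MAP = {
    "integer": "NUMERIC(20, 0)",  # widened so unsigned 64-bit values fit
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "string": "VARCHAR(65535)",
}


def json_schema_to_redshift_type(schema: Dict[str, Any]) -> str:
    """Map a simple JSON Schema property to a Redshift column type."""
    json_type = schema.get("type")
    types = [json_type] if isinstance(json_type, str) else list(json_type or [])
    # Treat "null" as column nullability, not as a column type.
    non_null = [t for t in types if t != "null"]
    if len(non_null) != 1:
        raise ValueError(f"expected exactly one non-null type, got {types!r}")
    try:
        return TYPE_MAP[non_null[0]]
    except KeyError:
        raise ValueError(f"unsupported JSON Schema type {non_null[0]!r}") from None


print(json_schema_to_redshift_type({"type": ["null", "integer"]}))  # NUMERIC(20, 0)
```

This widens every integer column, which is exactly the trade-off listed under Cons. A per-column variant could instead look for a custom key on the property schema, but that would be a new convention rather than anything target-redshift reads today.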

@awm33
Member

awm33 commented May 16, 2020

@joeschmid I'm not sure if you resolved this, but a hack (for you and anyone else looking at this issue) would be to create a view where that column is a text/string type, then use a SQL transform to parse it into a custom numeric type after replication.
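A sketch of that workaround, with Python used only to assemble the SQL. It assumes your tap can replicate MySQL views and that the replicated Redshift table keeps the view's name; the table, view and column names (`orders`, `orders_export`, `order_ref`, `orders_clean`) are hypothetical, and the DDL should be checked against your MySQL and Redshift versions. The MySQL view exposes the column as text so it travels as a JSON Schema string, and the post-load Redshift statement casts it back to NUMERIC(20, 0).

```python
from typing import List


def mysql_text_view_sql(table: str, view: str, column: str,
                        other_columns: List[str]) -> str:
    """MySQL view that exposes `column` as text so it replicates as a string."""
    cols = ", ".join(other_columns + [f"CAST({column} AS CHAR) AS {column}"])
    return f"CREATE VIEW {view} AS SELECT {cols} FROM {table};"


def redshift_numeric_transform_sql(source: str, target: str, column: str,
                                   other_columns: List[str]) -> str:
    """Redshift CTAS that parses the replicated text back into NUMERIC(20, 0)."""
    cols = ", ".join(
        other_columns + [f"CAST({column} AS NUMERIC(20, 0)) AS {column}"]
    )
    return f"CREATE TABLE {target} AS SELECT {cols} FROM {source};"


# Identifiers are interpolated directly, so only pass trusted names here.
others = ["id", "created_at"]
print(mysql_text_view_sql("orders", "orders_export", "order_ref", others))
print(redshift_numeric_transform_sql("orders_export", "orders_clean", "order_ref", others))
```

The cost is an extra transform step after every load, but neither the tap nor target-redshift has to change.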
