Add sql server support #498
Awesome work, @maxispeicher!
I just want to highlight some issues with the new dependency on pymssql.
- It started a deprecation process last year that was later reverted. Even so, the main developer has left the project, and honestly I don't know how active the current maintainer(s) are.
- It's released under the LGPL license, which has faced some barriers to internal use in some companies.
- Microsoft explicitly endorses and recommends the use of pyodbc (Reference):
There are several python SQL drivers available. However, Microsoft places its testing efforts and its confidence in pyodbc driver.
So, I haven't dug too deep into this area yet, but it seems that pyodbc would be a better candidate. We would still need to figure out how to fit it into the library, though, because it has an external dependency on the ODBC driver, which must be installed independently, and the Lambda Layer / AWS Glue wheel will not support it out of the box.
But first, let's evaluate pyodbc as a possible dependency, and then we can invest some energy in understanding the best way to distribute it (probably just good documentation about the limitation and some references on how to install the driver).
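For reference, here is a minimal sketch of what a pyodbc connection to SQL Server involves. It assumes Microsoft's "ODBC Driver 17 for SQL Server" is installed on the host. The helpers `build_connection_string` and `connect` are hypothetical and not part of the library. The sketch shows why the driver is a separate, external dependency: pyodbc only loads whichever driver the connection string names, and that driver must already be present on the machine.

```python
from typing import Any

# Assumption: Microsoft's ODBC Driver 17 for SQL Server. pyodbc does not ship it;
# it has to be installed on the host separately.
DEFAULT_DRIVER = "ODBC Driver 17 for SQL Server"


def build_connection_string(
    server: str,
    database: str,
    user: str,
    password: str,
    port: int = 1433,
    driver: str = DEFAULT_DRIVER,
) -> str:
    """Build an ODBC connection string for SQL Server (hypothetical helper).

    SQL Server expects host and port separated by a comma. Values containing
    ';' or '}' would need escaping, which this sketch does not handle.
    """
    return (
        f"DRIVER={{{driver}}};SERVER={server},{port};"
        f"DATABASE={database};UID={user};PWD={password}"
    )


def connect(connection_string: str, driver: str = DEFAULT_DRIVER) -> Any:
    """Open a pyodbc connection after checking that the ODBC driver is installed."""
    import pyodbc  # imported lazily: pyodbc itself is an optional dependency

    # pyodbc.drivers() lists the ODBC drivers registered on this host.
    if driver not in pyodbc.drivers():
        raise RuntimeError(f"ODBC driver '{driver}' is not installed on this host.")
    return pyodbc.connect(connection_string)
```

Usage would look like `connect(build_connection_string("myhost", "mydb", "admin", "secret"))`. The driver check fails fast with a readable message instead of a low-level ODBC error when the driver is missing.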
What do you think?
@danielwo FYI
We should maybe add
I was also unsure about whether to use
Yeah, it sounds like the best plan for now. Let's wait for @danielwo's opinion too; he has more experience running in restricted environments than I do.
Sounds good to me. Meanwhile, I will do some local testing to see if everything works as intended when using
So
This new branch seems great. I only see one issue: we will probably need to import
Previously, AWS Data Wrangler had support for Apache Spark, and this was how we handled this kind of situation. P.S. Regardless of the specific solution, we must ensure that autocomplete works properly for
I've tried to implement a solution which basically always imports the
It's also pushed to the swap-to-pyodbc branch.
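The "always import, fail only at call time" idea can be sketched roughly as follows. This illustrates the pattern under stated assumptions and is not the code on the branch. The names `try_import`, `requires_module`, and `read_sql_query` here are hypothetical. The module imports cleanly whether or not pyodbc is installed, and `functools.wraps` keeps each function's name and docstring intact, which helps `help()` and editor autocomplete.

```python
import functools
import importlib
import importlib.util
from types import ModuleType
from typing import Any, Callable, Optional, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def try_import(name: str) -> Optional[ModuleType]:
    """Import an optional top-level dependency, or return None if it is missing."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


def requires_module(name: str) -> Callable[[F], F]:
    """Decorator: raise a clear error at call time if `name` is not installed."""

    def decorator(func: F) -> F:
        @functools.wraps(func)  # preserve __name__/__doc__ for help() and autocomplete
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if importlib.util.find_spec(name) is None:
                raise ModuleNotFoundError(
                    f"{func.__name__}() needs the optional package '{name}'; please install it."
                )
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


# Importing this module never fails, even when pyodbc is absent.
pyodbc = try_import("pyodbc")


@requires_module("pyodbc")
def read_sql_query(sql: str, connection_string: str) -> Any:
    """Hypothetical SQL Server reader that only works when pyodbc is installed."""
    con = pyodbc.connect(connection_string)
    try:
        # pyodbc's Cursor.execute returns the cursor, so calls can be chained.
        return con.cursor().execute(sql).fetchall()
    finally:
        con.close()
```

The trade-off is that a missing dependency surfaces only when a decorated function is called. In exchange, users who never touch SQL Server do not need pyodbc or the ODBC driver at all.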
Really cool. I like this approach!
For now, I've managed to adapt the build process of the Lambda Layers to include the needed ODBC driver and
@maxispeicher the Lambda layer is technically perfect, but we will not be able to distribute layers containing the proprietary MS driver. The best we can do here is to provide instructions (docs) so users can create the layer themselves. With this scenario in mind, I would prefer to instruct them how to create a smaller "sidecar" layer that contains only the driver and the config files. This way they can use our pre-built layer plus their own layer with the missing driver. But regardless of how we help users run this driver on Lambda/Glue, let's first merge a PR without it. The pure implementation with
What do you think?
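One of the config files such a "sidecar" layer would carry is unixODBC's `odbcinst.ini`, which maps a driver name to the driver's shared library. The following is a minimal sketch that renders it. It assumes the layer is extracted under `/opt`, where Lambda places layer contents, and that the function's `ODBCSYSINI` environment variable points at the directory holding the file. `render_odbcinst_ini` is a hypothetical helper, and the library path is a placeholder, not the real file name Microsoft ships.

```python
import configparser
import io

# Placeholder path: the real library file name depends on the driver version you package.
DRIVER_LIBRARY = "/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.so"


def render_odbcinst_ini(driver_name: str, driver_library_path: str) -> str:
    """Render an odbcinst.ini entry that registers one ODBC driver (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # type: ignore[assignment]  # keep "Driver" capitalized
    parser[driver_name] = {
        "Description": f"{driver_name} (sidecar layer)",
        "Driver": driver_library_path,
    }
    buffer = io.StringIO()
    # Write "key=value" lines without spaces around "=".
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the file content; in a layer build, save it as odbcinst.ini at the
    # layer root so that it ends up in /opt once deployed.
    print(render_odbcinst_ini("ODBC Driver 17 for SQL Server", DRIVER_LIBRARY))
```

Keeping only the driver and this file in the user-built layer means the pre-built library layer stays free of Microsoft's proprietary binaries.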
Yes, sounds good to me. That's what I already feared regarding the ODBC driver. I've tried to add a bit of documentation about Microsoft SQL Server and the required setup in the
Additionally, I've reverted all changes to the Lambda Layer build process. From my side, I think this PR is good to go in this state.
danielwo left a comment:
I think this looks good. I like the dynamic import + decorator.
igorborgest left a comment:
Just some tiny observations.
igorborgest left a comment:
Thank you, @maxispeicher!
Thanks for working on the feature, guys!
Issue #356:
Description of changes:
Adding support for Microsoft SQL Server, which should be equivalent to the existing PostgreSQL and MySQL support. The pymssql package is used to handle connections to SQL Server databases. Apart from the feature implementation itself, some adaptations were needed:
- `awswrangler/_databases.py` needs to be adapted to the SQL Server syntax for the database name
- `read_sql_query` in `awswrangler/_databases.py` needed to be adapted because the `pymssql.Cursor` object could not be handed over to a different function. Because of that, the cursor object is now created in the functions that directly query the database.
- `awswrangler/catalog/_get.py` to a `str` from `bytes` because `pymssql.connect()` can't handle a byte string as a password
- `double` column to `ddouble`, because `double` is a reserved keyword in SQL Server.
Additionally, the `raise` keyword was added for some exceptions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
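Two of the connection-related adaptations described above can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for pymssql, because the pattern matters here rather than the driver. `normalize_password` and `read_rows` are hypothetical helpers, not the functions changed in this PR. The sketch shows two things: the cursor is created and closed inside the function that runs the query instead of being passed in from elsewhere, and a password stored as `bytes` is decoded to `str` before it reaches the driver's `connect()`.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple, Union


def normalize_password(password: Union[str, bytes]) -> str:
    """Decode a bytes password to str (assumes UTF-8).

    Per the PR description, pymssql.connect() can't handle a byte-string password.
    """
    if isinstance(password, bytes):
        return password.decode("utf-8")
    return password


def read_rows(
    con: sqlite3.Connection, sql: str, params: Sequence[Any] = ()
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Run a query and return (column names, rows).

    The cursor is created here, in the function that talks to the database,
    instead of being created elsewhere and handed over as an argument.
    """
    cursor = con.cursor()
    try:
        cursor.execute(sql, params)
        # cursor.description holds one tuple per column; the first item is its name.
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows
```

Scoping the cursor to a single function keeps its lifetime obvious and avoids relying on a driver allowing cursor objects to be passed between functions.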