Add sql server support #498

maxispeicher · 2021-01-01T15:10:12Z

Issue #356:

Description of changes:

Adding support for Microsoft SQL Server, which should be equivalent to the existing PostgreSQL and MySQL support. Connections to SQL Server databases are handled with the pymssql package. Apart from the feature implementation itself, some adaptations were needed:

Parsing of the JDBC URL in awswrangler/_databases.py had to be adapted to the SQL Server syntax for the database name.
read_sql_query in awswrangler/_databases.py had to be adapted because the pymssql.Cursor object could not be handed over to a different function. Instead, the cursor object is now created in the functions that directly query the database.
Database passwords in awswrangler/catalog/_get.py are decoded from bytes to str, because pymssql.connect() can't handle a byte string as the password.
In the tests, the double column was renamed to ddouble, because double is a reserved keyword in SQL Server.
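To illustrate the first and third adaptations above, here is a minimal sketch of how the database name is located in the two JDBC URL styles, plus the bytes-to-str password normalization. The names `parse_jdbc_url` and `normalize_password` are hypothetical, not the actual code in awswrangler/_databases.py or awswrangler/catalog/_get.py; SQL Server named instances (`host\instance`) are deliberately not handled, and UTF-8 is an assumed password encoding.

```python
from typing import Tuple, Union
from urllib.parse import urlparse


def parse_jdbc_url(jdbc_url: str) -> Tuple[str, str, int, str]:
    """Split a JDBC URL into (engine, host, port, database).

    PostgreSQL and MySQL carry the database in the path:
        jdbc:postgresql://host:5432/mydb
    SQL Server carries it as a semicolon-separated property:
        jdbc:sqlserver://host:1433;databaseName=mydb
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"Not a JDBC URL: {jdbc_url}")
    rest = jdbc_url[len(prefix):]
    engine = rest.split(":", 1)[0]
    if engine == "sqlserver":
        # Address first, then optional key=value properties after the first ';'.
        address, _, props = rest.partition(";")
        parsed = urlparse(address)
        # Keys are lowercased so the lookup below is case-insensitive.
        properties = {
            key.strip().lower(): value
            for key, _, value in (p.partition("=") for p in props.split(";") if "=" in p)
        }
        database = properties.get("databasename") or properties.get("database", "")
    else:
        parsed = urlparse(rest)
        database = parsed.path.lstrip("/")
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"Missing host or port: {jdbc_url}")
    return engine, parsed.hostname, parsed.port, database


def normalize_password(password: Union[str, bytes]) -> str:
    """Return the password as str (assumes UTF-8 if it arrives as bytes)."""
    return password.decode("utf-8") if isinstance(password, bytes) else password
```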

Additionally, the missing raise keyword was added for some exceptions.
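The read_sql_query adaptation described in the list above comes down to a pattern where the function that runs the query also owns its cursor. Below is a minimal sketch using the standard-library `sqlite3` module as a stand-in DB-API 2.0 connection (it shares pyodbc's `?` placeholder style); the signature is illustrative and not awswrangler's actual API.

```python
import sqlite3
from contextlib import closing
from typing import Any, List, Optional, Sequence, Tuple


def read_sql_query(
    sql: str, con: Any, params: Optional[Sequence[Any]] = None
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Execute `sql` on a DB-API 2.0 connection and return (columns, rows).

    The cursor is created and closed here, in the function that actually runs
    the query, instead of being created by a caller and handed over.
    """
    with closing(con.cursor()) as cursor:
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        # cursor.description holds one sequence per column; item 0 is the name.
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    return columns, rows


# Usage with an in-memory SQLite database standing in for SQL Server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id INTEGER, ddouble REAL)")
con.executemany("INSERT INTO test VALUES (?, ?)", [(1, 1.5), (2, 2.5)])
columns, rows = read_sql_query("SELECT id, ddouble FROM test WHERE id > ?", con, [1])
print(columns, rows)  # ['id', 'ddouble'] [(2, 2.5)]
```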

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

igorborgest

Awesome work @maxispeicher !

I just want to highlight some issues with the new dependency on pymssql.

It started a deprecation process last year that was later reverted. Even so, the main developer left the project, and honestly I don't know how active the current developer(s) are.
It's under the LGPL license, which faces barriers to internal use at some companies.
Microsoft explicitly endorses and recommends the use of pyodbc (Reference):

There are several python SQL drivers available. However, Microsoft places its testing efforts and its confidence in pyodbc driver.

So, I haven't dug too deep into this area yet, but pyodbc seems to be a better candidate. However, we would still need to figure out how to fit it into the library, because it has an external dependency on the ODBC driver, which must be installed separately, and the Lambda Layer / AWS Glue whl will not support it out of the box.

But first, let's evaluate pyodbc as a possible dependency, and then invest some energy in understanding the best way to distribute it (probably just good documentation about the limitation and some references on how to install the driver).

What do you think?

@danielwo FYI

awswrangler/_config.py

igorborgest · 2021-01-02T14:56:23Z

We should maybe add pyodbc as an optional dependency...

maxispeicher · 2021-01-02T14:57:48Z

I also was unsure about whether to use pymssql or pyodbc. I've decided against pyodbc because of the additional dependency on the ODBC driver, but I also see the issues with pymssql.
Maybe we could add an extra to the package for SQL Server support that installs pyodbc.

igorborgest · 2021-01-02T15:13:42Z

Yeah, that sounds like the best plan for now. Pyodbc as an extra dependency will keep the same requirements for current users who don't need MS SQL Server. And those who want to use it already need to install the driver themselves, so an extra dependency during pip install would be no burden.

Let's wait for @danielwo's opinion too; he has more experience running in restricted environments than I do.

maxispeicher · 2021-01-02T15:16:28Z

Sounds good to me. I will do some local testing meanwhile, to see if everything works as intended when using pyodbc instead of pymssql.

maxispeicher · 2021-01-02T16:52:23Z

So pyodbc seems to work just fine as well (if you have the driver installed). I've implemented the swap on a new branch, if you want to have a look: https://github.com/maxispeicher/aws-data-wrangler/tree/swap-to-pyodbc.
However, it's still missing documentation, and I still need to check for leftovers from pymssql.

igorborgest · 2021-01-02T17:21:41Z

This new branch seems great. I only see one issue:

We will probably need to import pyodbc dynamically or find some other strategy to import it only when it is installed. Otherwise, as currently implemented, Wrangler will always try to import it whenever awswrangler is imported.

igorborgest · 2021-01-02T17:25:49Z

Previously, AWS Data Wrangler had support for Apache Spark, and this was the way we were handling this kind of situation.

P.S. Regardless of the specific solution, we must ensure that autocomplete works properly for wr.sqlserver.*.

maxispeicher · 2021-01-02T20:05:34Z

I've tried to implement a solution which basically always imports the sqlserver module, but only imports pyodbc when it is available. As soon as a public function from the sqlserver module is called and pyodbc is not installed a ModuleNotFoundError is raised with a hint that pyodbc or awswrangler with the sqlserver extra needs to be installed.

It's also pushed to the swap-to-pyodbc branch.
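The approach described above (always import the sqlserver module, import pyodbc only if it is available, and fail on call with an install hint) can be sketched as follows. `_try_import`, `requires_module`, and `connect` are hypothetical names, not the code on the swap-to-pyodbc branch. Because the public function is always defined, it stays visible to autocomplete for wr.sqlserver.* even when pyodbc is missing.

```python
import functools
import importlib
import importlib.util
from types import ModuleType
from typing import Any, Callable, Optional, TypeVar, cast

FuncT = TypeVar("FuncT", bound=Callable[..., Any])


def _try_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; return None instead of failing."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


# Module-level handle: None when pyodbc is missing, so importing this module
# (and therefore the package) never fails because of it.
pyodbc = _try_import("pyodbc")


def requires_module(module_name: str, extra: str) -> Callable[[FuncT], FuncT]:
    """Decorator: raise ModuleNotFoundError on call if `module_name` is missing."""
    # Availability is checked once, when the decorator is applied.
    available = importlib.util.find_spec(module_name) is not None

    def decorator(func: FuncT) -> FuncT:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not available:
                raise ModuleNotFoundError(
                    f"{func.__name__} requires {module_name}. Install it with "
                    f"'pip install {module_name}' or "
                    f"'pip install awswrangler[{extra}]'."
                )
            return func(*args, **kwargs)

        return cast(FuncT, wrapper)

    return decorator


@requires_module("pyodbc", extra="sqlserver")
def connect(connection_string: str) -> Any:
    """Hypothetical public function; defined whether or not pyodbc is installed."""
    return pyodbc.connect(connection_string)  # type: ignore[union-attr]
```

Checking availability once at decoration time keeps the per-call overhead to a boolean test, while functools.wraps preserves the name and docstring that autocomplete and help() display.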

igorborgest · 2021-01-02T21:47:52Z

Really cool. I like this approach!

maxispeicher · 2021-01-03T20:34:28Z

For now I've managed to adapt the build process of the Lambda Layers to include the needed ODBC driver and pyodbc. However I couldn't find a way to do it for the Glue part.

igorborgest · 2021-01-04T20:57:51Z

@maxispeicher the Lambda layer is technically perfect, but we will not be able to distribute these layers containing the proprietary MS driver. The best we can do here is provide instructions (docs) so users can create the layer themselves.

And with this scenario in mind I would prefer to instruct them how to create a smaller "sidecar" layer that will only contain the driver and the config files. This way they will be able to use our pre-built layer + their own layer with the missing driver.

But regardless of how we help users use this driver on Lambda/Glue, let's first merge a PR without it. The pure pyodbc implementation already has a lot of value for users running on other platforms. Then you could open a second PR just to implement/discuss driver distribution.

What do you think?

maxispeicher · 2021-01-04T22:00:52Z

Yes sounds good to me. That's what I already feared regarding the ODBC driver. I've tried to add a little bit of documentation regarding Microsoft SQL Server and the required setup in the Install section of the docs. Hopefully it helps a little bit.

Additionally I've reverted any changes to the Lambda Layer build process.

I think from my side this PR is good to go in this state.

danielwo

I think this looks good. I like the dynamic import + decorator.

igorborgest

Just some tiny observations.

awswrangler/sqlserver.py

docs/source/install.rst

igorborgest

Thank you @maxispeicher !

gtossou · 2021-01-05T13:29:48Z

Thanks for working on this feature, guys!

maxispeicher added 11 commits December 31, 2020 03:01

WIP: Add support for SQL Server

838634c

WIP: SQL Server feature complete

30c1f03

WIP: Adapt databases cfn template

0ecadbc

WIP: Add docstrings and formatting

0e02973

Fix raising of exceptions

2f749a9

Adapt README and documentation

e58b006

Decode password to string

ec4b7b1

WIP: Fix SQLServer tests

80afc70

WIP: Fix cfn template

1ce7c31

Fix tests for Linux

256e8c0

Add missing ;

eb18b55

igorborgest added the feature label Jan 2, 2021

igorborgest added this to the 2.3.0 milestone Jan 2, 2021

igorborgest self-requested a review January 2, 2021 12:39

Merge branch 'master' into add-sql-server-support

878ee71

igorborgest suggested changes Jan 2, 2021


awswrangler/_config.py (resolved)

igorborgest requested a review from danielwo January 2, 2021 15:19

Swap from pymssql to pyodbc

721baec

Dynamically import pyodbc

f1f3694

maxispeicher and others added 2 commits January 3, 2021 15:49

Merge branch 'master' into add-sql-server-support

c5ea2ef

Add pyodbc to Lambda layer

0480992

Merge branch 'master' into swap-to-pyodbc

ad3253e

maxispeicher added 2 commits January 3, 2021 22:01

Fix for 3.6 and 3.7

e892fc7

Fix for 3.8

6955b54

maxispeicher marked this pull request as draft January 4, 2021 16:18

danielwo marked this pull request as ready for review January 4, 2021 18:14

danielwo marked this pull request as draft January 4, 2021 18:17

maxispeicher added 3 commits January 4, 2021 22:45

Update documentation

4f7edd3

Revert changes to Lambda layer build

2bc1c8f

Merge branch 'swap-to-pyodbc' into add-sql-server-support

c2bde90

maxispeicher marked this pull request as ready for review January 4, 2021 22:01

maxispeicher requested a review from igorborgest January 4, 2021 22:01

Merge branch 'master' into add-sql-server-support

d4b1d89

danielwo approved these changes Jan 4, 2021


igorborgest suggested changes Jan 4, 2021


awswrangler/sqlserver.py (outdated, resolved)

awswrangler/sqlserver.py (outdated, resolved)

awswrangler/sqlserver.py (outdated, resolved)

docs/source/install.rst (resolved)

Fix formatting in docstrings

d869eeb

maxispeicher requested a review from igorborgest January 5, 2021 00:25

igorborgest self-assigned this Jan 5, 2021

igorborgest approved these changes Jan 5, 2021

View reviewed changes

igorborgest merged commit 6172b4c into aws:master Jan 5, 2021

maxispeicher deleted the add-sql-server-support branch January 5, 2021 17:10

maxispeicher mentioned this pull request Sep 16, 2021

Is it possible to use pymssql instead of pyodbc for Sql server Integration? #910

Closed


4 participants
