From a73a097f27c776052f084cf1b1fc3ff079ace40c Mon Sep 17 00:00:00 2001 From: Brooke White Date: Mon, 26 Oct 2020 17:20:19 -0700 Subject: [PATCH] docs: Add installation instructions and example usage --- README.md | 39 ------------ README.rst | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 184 insertions(+), 39 deletions(-) delete mode 100644 README.md create mode 100644 README.rst diff --git a/README.md b/README.md deleted file mode 100644 index d64258d..0000000 --- a/README.md +++ /dev/null @@ -1,39 +0,0 @@ -## Redshift Python Connector - -Redshift Python Connector requires Python3 and pip3. This connector has many Redshift specific features such as, - - * IAM authentication - * IDP authentication - * Redshift specific datatypes support - * API to explore database objects such as get_tables(), get_columns() etc. - * External schema support as part of get_tables() and get_columns() API - * Easy to integrate with Pandas and NumPy - * Pure Python Driver - - This connector supports Python Database API Specification v2.0. - - ## Build Driver - On Unix system run: - ``` - build.sh - ``` - It builds **redshift_connector-{version}-py3-none-any.whl** and **redshift_connector-{version}.tar.gz** files under **dist** directory. - The Wheel file is the Redshift Python Connector built distribution. - The zip file is the source archive of the Redshift Python Connector. - - ## Report Bugs - - See [CONTRIBUTING](CONTRIBUTING.md#Reporting-Bugs/Feature-Requests) for more information. - - ## Contributing Code Development - - See [CONTRIBUTING](CONTRIBUTING.md#Contributing-via-Pull-Requests) for more information. - -## Security - -See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. - -## License - -This project is licensed under the Apache-2.0 License. - diff --git a/README.rst b/README.rst new file mode 100644 index 0000000..12726cd --- /dev/null +++ b/README.rst @@ -0,0 +1,184 @@ +======================================================= +redshift_connector +======================================================= + +redshift_connector is the Amazon Web Services (AWS) Redshift connector for +Python. Easy integration with `pandas `_ and `numpy `_, as well as support for numerous Amazon Redshift specific features help you get the most out of your data + +Supported Amazon Redshift features include: + +- IAM authentication +- Identity provider (IdP) authentication +- Redshift specific data types + + +This pure Python connector implements Python Database API Specification 2.0. + + +Getting Started +--------------- +The easiest way to get started with redshift_connector is via `pip `_ + +``pip install redshift_connector`` + +Note: redshift_connector requires Python >= 3.5 + + +You can install from source by cloning this repository. Assuming that you have Python and ``virtualenv`` installed, set up your environment and install the required dependencies like this: + +.. code-block:: sh + + $ git clone https://github.com/aws/amazon-redshift-python-driver.git + $ cd redshift_connector + $ virtualenv venv + $ . venv/bin/activate + $ python -m pip install -r requirements.txt + $ python -m pip install -e . + $ python -m pip install redshift_connector + +Basic Example +~~~~~~~~~~~~~ +.. code-block:: python + + import redshift_connector + + # Connects to Redshift cluster using AWS credentials + conn = redshift_connector.connect( + host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com', + port=5439, + database='dev', + user='awsuser', + password='my_password' + ) + + cursor: redshift_connector.Cursor = conn.cursor() + cursor.execute("create Temp table book(bookname varchar,author‎ varchar)") + cursor.executemany("insert into book (bookname, author‎) values (%s, %s)", + [ + ('One Hundred Years of Solitude', 'Gabriel García Márquez'), + ('A Brief History of Time', 'Stephen Hawking') + ] + ) + cursor.execute("select * from book") + + result: tuple = cursor.fetchall() + print(result) + >> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking']) + + +Integration with pandas +~~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: python + + import pandas + cursor.execute("create Temp table book(bookname varchar,author‎ varchar)") + cursor.executemany("insert into book (bookname, author‎) values (%s, %s)", + [ + ('One Hundred Years of Solitude', 'Gabriel García Márquez'), + ('A Brief History of Time', 'Stephen Hawking') + + ]) + cursor.execute("select * from book") + result: pandas.DataFrame = cursor.fetch_dataframe() + print(result) + >> bookname author‎ + >> 0 One Hundred Years of Solitude Gabriel García Márquez + >> 1 A Brief History of Time Stephen Hawking + + +Integration with numpy +~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: python + + import numpy + cursor.execute("select * from book") + + result: numpy.ndarray = cursor.fetch_numpy_array() + print(result) + >> [['One Hundred Years of Solitude' 'Gabriel García Márquez'] + >> ['A Brief History of Time' 'Stephen Hawking']] + +Query using functions +~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: python + + cursor.execute("SELECT CURRENT_TIMESTAMP") + print(cursor.fetchone()) + >> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)] + + +Connection Parameters +~~~~~~~~~~~~~~~~~~~~~ ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| Name | Description | Default Value | Required | ++=========================+============================================================================================+===============+==========+ +| database | String. The name of the database to connect to | | Yes | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| user | String. The username to use for authentication | | Yes | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| password | String. The password to use for authentication | | Yes | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| host | String. The hostname of Amazon Redshift cluster | | Yes | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| port | Int. The port number of the Amazon Redshift cluster | 5439 | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| ssl | Bool. If SSL is enabled | True | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| iam | Bool. If IAM Authentication is enabled | False | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| sslmode | String. The security of the connection to Amazon Redshift. | 'verify-ca' | No | +| | 'verify-ca' and 'verify-full' are supported. | | | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| idp_response_timeout | Int. The timeout for retrieving SAML assertion from IdP | 120 | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| idp_port | Int. The listen port IdP will send the SAML assertion to | 7890 | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| log_level | Int. The level of logging enabled, increasing in granularity (values [0,4] are valid) | 0 | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| log_path | String. The file path to the log file | 'driver.log' | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| max_prepared_statements | Int. The maximum number of prepared statements that can be open at once | 1000 | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| idp_tenant | String. The IdP tenant | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| credential_provider | String. The IdP that will be used for authenticating with Amazon Redshift. | None | No | +| | 'OktaCredentialsProvider', 'AzureCredentialsProvider', 'BrowserAzureCredentialsProvider', | | | +| | 'PingCredentialsProvider', 'BrowserSamlCredentialsProvider', and 'AdfsCredentialsProvider' | | | +| | are supported. | | | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| cluster_identifier | String. The cluster identifier of the Amazon Redshift Cluster | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| db_user | String. The user ID to use with Amazon Redshift | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| login_url | String. The SSO Url for the IdP | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| preferred_role | String. The IAM role preferred for the current connection | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| client_secret | String. The client secret from Azure IdP | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| client_id | String. The client id from Azure IdP | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| region | String. The AWS region where the cluster is located | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ +| app_name | String. The name of the IdP application used for authentication. | None | No | ++-------------------------+--------------------------------------------------------------------------------------------+---------------+----------+ + + +Getting Help +~~~~~~~~~~~~ +- Ask a question on `Stack Overflow `_ and tag it with redshift_connector +- Open a support ticket with `AWS Support `_ +- If you may have found a bug, please `open an issue `_ + +Contributing +~~~~~~~~~~~~ +We look forward to collaborating with you! Please read through `CONTRIBUTING `_ before submitting any issues or pull requests. + +Running Tests +------------- +You can run tests by using ``pytest test/unit``. This will run all unit tests. Integration tests require providing credentials for an Amazon Redshift cluster as well as IdP attributes in ``test/config.ini``. + +Additional Resources +~~~~~~~~~~~~~~~~~~~~ +- `LICENSE `_