Merge branch 'develop' into master
terencezl committed Jul 8, 2020
2 parents aeba3c5 + d08f1ca commit ee80e4d
Showing 67 changed files with 7,246 additions and 1,442 deletions.
34 changes: 34 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -0,0 +1,34 @@
#### Problem description

Be sure your description clearly answers the following questions:

- What are you trying to achieve?
- What is the expected result?
- What are you seeing instead?

#### Steps/code to reproduce the problem

In order for us to be able to solve your problem, we have to be able to reproduce it on our end.
Without reproducing the problem, it is unlikely that we'll be able to help you.

Include full tracebacks, logs and datasets if necessary.
Please keep the examples minimal ([minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)).

#### Versions

Please provide the output of:

```python
import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)
```

#### Checklist

Before you create the issue, please make sure you have:

- [ ] Described the problem clearly
- [ ] Provided a minimal reproducible example, including any required data
- [ ] Provided the version numbers of the relevant software
54 changes: 54 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,54 @@
#### Title

Please **pick a concise, informative and complete title** for your PR.
The title is important because it will appear in [our change log](https://github.com/RaRe-Technologies/smart_open/blob/master/CHANGELOG.md).

#### Motivation

Please explain the motivation behind this PR in the description.

If you're fixing a bug, link to the issue number like so:

```
- Fixes #{issue_number}
```

If you're adding a new feature, then consider opening a ticket and discussing it with the maintainers before you actually do the hard work.

#### Tests

If you're fixing a bug, consider [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development):

1. Create a unit test that demonstrates the bug. The test should **fail**.
2. Implement your bug fix.
3. The test you created should now **pass**.

If you're implementing a new feature, include unit tests for it.

Make sure all existing unit tests pass.
You can run them locally using:

pytest smart_open

If there are any failures, please fix them before creating the PR (or mark it as WIP, see below).

#### Work in progress

If you're still working on your PR, include "WIP" in the title.
We'll skip reviewing it for the time being.
Once it's ready for review, remove the "WIP" from the title, and ping one of the maintainers (e.g. mpenkov).

#### Checklist

Before you create the PR, please make sure you have:

- [ ] Picked a concise, informative and complete title
- [ ] Clearly explained the motivation behind the PR
- [ ] Linked to any existing issues that your PR will be solving
- [ ] Included tests for any new functionality
- [ ] Checked that all unit tests pass

#### Workflow

Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult for the reviewer, because they are no longer able to distinguish between code that has already been reviewed and unreviewed code.
9 changes: 9 additions & 0 deletions .gitignore
@@ -56,3 +56,12 @@ target/
# vim
*.swp
*.swo

# PyCharm
.idea/

# VSCode
.vscode/

# env files
.env
58 changes: 17 additions & 41 deletions .travis.yml
@@ -1,64 +1,40 @@
language: python
dist: xenial
os: linux

env:
global:
- secure: "GH+DI7f9QenVtTZFEfn4B8wO5JJK65PkHDg8vY/npdW51y5PPAynTEf/++D0H4tjwqMuXwB5lrbWfYeskf29Xuq5MT5+FixajZFcbhscoSM9CGABeph2s2+Hm4kSIKmjnTlQLgJHVbwypnlU/W4sfMCbqeOmv2fYXCCt0GhtnWc="
- secure: "du3PQYEiDPw55TRzhk+Ocv1Gx1DuusbCSMVSQmccAoyOr7qxDqm+1jh2v13RmajNK7FnlKyC4xSTySVpSl70By2uhZlJT43EpenqIcpQWqUAm3nVr6etszdb1A6TfEGQrxZ8Y2j9KD6QAMNovsMZbl0bcDJDxTeFA4P/yU9UZcI="


matrix:
jobs:
include:
- python: '2.7'
env:
- SO_DISABLE_MOCKS: "1"
- SO_S3_URL: "s3://smart-open-py27-benchmark"
- SO_S3_RESULT_URL: "s3://smart-open-py27-benchmark-results"
- python: '3.7'
name: "flake8"
env: TOXENV="flake8"

- python: '3.5'
env: TOXENV="check_keys,py35-test,py35-integration"

- python: '3.6'
env:
- SO_DISABLE_MOCKS: "1"
- SO_S3_URL: "s3://smart-open-py36-benchmark"
- SO_S3_RESULT_URL: "s3://smart-open-py36-benchmark-results"
- BOTO_CONFIG: "/dev/null"
- TOXENV: "check_keys,py36-doctest,py36-test,py36-benchmark,py36-integration"

- python: '3.7'
env:
- SO_DISABLE_MOCKS: "1"
- SO_S3_URL: "s3://smart-open-py37-benchmark"
- SO_S3_RESULT_URL: "s3://smart-open-py37-benchmark-results"
- BOTO_CONFIG: "/dev/null"
dist: xenial

- SO_ENABLE_MOTO_SERVER: "1"
- TOXENV: "check_keys,py37-doctest,enable_moto_server,py37-test,py37-benchmark,py37-integration,disable_moto_server"
- python: '3.8'
env:
- BOTO_CONFIG: "/dev/null"
- TOXENV: "check_keys,py38-doctest,test_coverage,py38-integration"

install:
- pip install --upgrade setuptools
- pip install .[test]
- pip freeze

- pip install tox

script:
- if [[ ${TRAVIS_SECURE_ENV_VARS} = false ]]; then
echo "DISABLE INTEGRATION TESTING FOR S3";
unset SO_DISABLE_MOCKS;
unset SO_S3_URL;
unset SO_S3_RESULT_URL;
fi
- python setup.py test
- export SO_S3_URL=$SO_S3_URL/$(python -c 'from uuid import uuid4;print(uuid4())')
- pip install pytest
- py.test integration-tests/test_http.py
- if [[ ${SO_DISABLE_MOCKS} = "1" ]]; then
pip install pytest_benchmark awscli;
set -e;
py.test integration-tests/test_s3.py --benchmark-save=`git rev-parse HEAD`;
set +e;
aws s3 cp .benchmarks/*/*.json ${SO_S3_RESULT_URL};
aws s3 rm --recursive $SO_S3_URL;
fi
- pip install numpy
- py.test integration-tests/test_207.py

- tox -vv

cache:
directories:
75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,80 @@
# Unreleased

- Bypass unnecessary GCS storage.buckets.get permission (PR [#516](https://github.com/RaRe-Technologies/smart_open/pull/516), [@gelioz](https://github.com/gelioz))

# 2.1.0, 1 July 2020

- Azure storage blob support ([@nclsmitchell](https://github.com/nclsmitchell) and [@petedannemann](https://github.com/petedannemann))
- Correctly pass `newline` parameter to built-in `open` function (PR [#478](https://github.com/RaRe-Technologies/smart_open/pull/478), [@burkovae](https://github.com/burkovae))
- Ensure GCS objects always have a .name attribute (PR [#506](https://github.com/RaRe-Technologies/smart_open/pull/506), [@todor-markov](https://github.com/todor-markov))
- Use exception chaining to convey the original cause of the exception (PR [#508](https://github.com/RaRe-Technologies/smart_open/pull/508), [@cool-RR](https://github.com/cool-RR))

# 2.0.0, 27 April 2020, "Python 3"

- **This version supports Python 3 only** (3.5+).
- If you still need Python 2, install the smart_open==1.10.1 legacy release instead.
- Prevent smart_open from writing to logs on import (PR [#476](https://github.com/RaRe-Technologies/smart_open/pull/476), [@mpenkov](https://github.com/mpenkov))
- Modify setup.py to explicitly support only Py3.5 and above (PR [#471](https://github.com/RaRe-Technologies/smart_open/pull/471), [@Amertz08](https://github.com/Amertz08))
- Include all the test_data in setup.py (PR [#473](https://github.com/RaRe-Technologies/smart_open/pull/473), [@sikuan](https://github.com/sikuan))

# 1.10.1, 26 April 2020

- This is the last version to support Python 2.7. Versions 1.11 and above will support Python 3 only.
- Use only if you need Python 2.

# 1.11.1, 8 Apr 2020

- Add missing boto dependency (Issue [#468](https://github.com/RaRe-Technologies/smart_open/issues/468))

# 1.11.0, 8 Apr 2020

- Fix GCS multiple writes (PR [#421](https://github.com/RaRe-Technologies/smart_open/pull/421), [@petedannemann](https://github.com/petedannemann))
- Implemented efficient readline for ByteBuffer (PR [#426](https://github.com/RaRe-Technologies/smart_open/pull/426), [@mpenkov](https://github.com/mpenkov))
- Fix WebHDFS read method (PR [#433](https://github.com/RaRe-Technologies/smart_open/pull/433), [@mpenkov](https://github.com/mpenkov))
- Make S3 uploads more robust (PR [#434](https://github.com/RaRe-Technologies/smart_open/pull/434), [@mpenkov](https://github.com/mpenkov))
- Add pathlib monkeypatch with replacement of `pathlib.Path.open` (PR [#436](https://github.com/RaRe-Technologies/smart_open/pull/436), [@menshikh-iv](https://github.com/menshikh-iv))
- Fix error when calling str() or repr() on GCS SeekableBufferedInputBase (PR [#442](https://github.com/RaRe-Technologies/smart_open/pull/442), [@robcowie](https://github.com/robcowie))
- Move optional dependencies to extras (PR [#454](https://github.com/RaRe-Technologies/smart_open/pull/454), [@Amertz08](https://github.com/Amertz08))
- Correctly handle GCS paths that contain '?' char (PR [#460](https://github.com/RaRe-Technologies/smart_open/pull/460), [@chakruperitus](https://github.com/chakruperitus))
- Make our doctools submodule more robust (PR [#467](https://github.com/RaRe-Technologies/smart_open/pull/467), [@mpenkov](https://github.com/mpenkov))

Starting with this release, you will have to run:

pip install smart_open[gcs] to use the GCS transport.

In the future, all extra dependencies will be optional. If you want to continue installing all of them, use:

pip install smart_open[all]

See the README.rst for details.

# 1.10.0, 16 Mar 2020

- Various webhdfs improvements (PR [#383](https://github.com/RaRe-Technologies/smart_open/pull/383), [@mrk-its](https://github.com/mrk-its))
- Fixes "the connection was closed by the remote peer" error (PR [#389](https://github.com/RaRe-Technologies/smart_open/pull/389), [@Gapex](https://github.com/Gapex))
- allow use of S3 single part uploads (PR [#400](https://github.com/RaRe-Technologies/smart_open/pull/400), [@adrpar](https://github.com/adrpar))
- Add test data in package via MANIFEST.in (PR [#401](https://github.com/RaRe-Technologies/smart_open/pull/401), [@jayvdb](https://github.com/jayvdb))
- Google Cloud Storage (GCS) (PR [#404](https://github.com/RaRe-Technologies/smart_open/pull/404), [@petedannemann](https://github.com/petedannemann))
- Implement to_boto3 function for S3 I/O. (PR [#405](https://github.com/RaRe-Technologies/smart_open/pull/405), [@mpenkov](https://github.com/mpenkov))
- enable smart_open to operate without docstrings (PR [#406](https://github.com/RaRe-Technologies/smart_open/pull/406), [@mpenkov](https://github.com/mpenkov))
- Implement object_kwargs parameter (PR [#411](https://github.com/RaRe-Technologies/smart_open/pull/411), [@mpenkov](https://github.com/mpenkov))
- Remove dependency on old boto library (PR [#413](https://github.com/RaRe-Technologies/smart_open/pull/413), [@mpenkov](https://github.com/mpenkov))
- implemented efficient readline for ByteBuffer (PR [#426](https://github.com/RaRe-Technologies/smart_open/pull/426), [@mpenkov](https://github.com/mpenkov))
- improve buffering efficiency (PR [#427](https://github.com/RaRe-Technologies/smart_open/pull/427), [@mpenkov](https://github.com/mpenkov))
- fix WebHDFS read method (PR [#433](https://github.com/RaRe-Technologies/smart_open/pull/433), [@mpenkov](https://github.com/mpenkov))
- Make S3 uploads more robust (PR [#434](https://github.com/RaRe-Technologies/smart_open/pull/434), [@mpenkov](https://github.com/mpenkov))

# 1.9.0, 3 Nov 2019

- Add version_id transport parameter for fetching a specific S3 object version (PR [#325](https://github.com/RaRe-Technologies/smart_open/pull/325), [@interpolatio](https://github.com/interpolatio))
- Document passthrough use case (PR [#333](https://github.com/RaRe-Technologies/smart_open/pull/333), [@mpenkov](https://github.com/mpenkov))
- Support seeking over HTTP and HTTPS (PR [#339](https://github.com/RaRe-Technologies/smart_open/pull/339), [@interpolatio](https://github.com/interpolatio))
- Add support for rt, rt+, wt, wt+, at, at+ methods (PR [#342](https://github.com/RaRe-Technologies/smart_open/pull/342), [@interpolatio](https://github.com/interpolatio))
- Change VERSION to version.py (PR [#349](https://github.com/RaRe-Technologies/smart_open/pull/349), [@mpenkov](https://github.com/mpenkov))
- Adding howto guides (PR [#355](https://github.com/RaRe-Technologies/smart_open/pull/355), [@mpenkov](https://github.com/mpenkov))
- smart_open/s3: Initial implementations of str and repr (PR [#359](https://github.com/RaRe-Technologies/smart_open/pull/359), [@ZlatSic](https://github.com/ZlatSic))
- Support writing any bytes-like object to S3. (PR [#361](https://github.com/RaRe-Technologies/smart_open/pull/361), [@gilbsgilbs](https://github.com/gilbsgilbs))

# 1.8.4, 2 Jun 2019

- Don't use s3 bucket_head to check for bucket existence (PR [#315](https://github.com/RaRe-Technologies/smart_open/pull/315), [@caboteria](https://github.com/caboteria))
8 changes: 6 additions & 2 deletions MANIFEST.in
@@ -1,4 +1,8 @@
include LICENSE
include README.rst
include CHANGELOG.rst
include setup.py
include MIGRATING_FROM_OLDER_VERSIONS.rst
include CHANGELOG.md
recursive-include sampledata *.zip
recursive-include smart_open/tests/test_data *.txt *.xz *.gz
include integration-tests/README.md integration-tests/*.txt
recursive-include integration-tests *.py
104 changes: 104 additions & 0 deletions MIGRATING_FROM_OLDER_VERSIONS.rst
@@ -0,0 +1,104 @@
Migrating to the new ``open`` function
======================================

Since 1.8.1, there is a ``smart_open.open`` function that replaces ``smart_open.smart_open``.
The new function offers several advantages over the old one:

- 100% compatible with the built-in ``open`` function (aka ``io.open``): it accepts all
the parameters that the built-in ``open`` accepts.
- The default open mode is now "r", the same as for the built-in ``open``.
The default for the old ``smart_open.smart_open`` function used to be "rb".
- Fully documented keyword parameters (try ``help("smart_open.open")``)

The instructions below will help you migrate to the new function painlessly.

First, update your imports:

.. code-block:: python

  >>> from smart_open import smart_open  # before
  >>> from smart_open import open  # after

In general, ``smart_open`` uses ``io.open`` directly, where possible, so if your
code already uses ``open`` for local file I/O, then it will continue to work.
If you want to continue using the built-in ``open`` function for e.g. debugging,
then you can ``import smart_open`` and use ``smart_open.open``.

**The default read mode is now "r" (read text).**
If your code was implicitly relying on the default mode being "rb" (read
binary), you'll need to update it and pass "rb" explicitly.

Before:

.. code-block:: python

  >>> import smart_open
  >>> smart_open.smart_open('s3://commoncrawl/robots.txt').read(32)  # 'rb' used to be the default
  b'User-Agent: *\nDisallow: /'

After:

.. code-block:: python

  >>> import smart_open
  >>> smart_open.open('s3://commoncrawl/robots.txt', 'rb').read(32)
  b'User-Agent: *\nDisallow: /'

The ``ignore_extension`` keyword parameter is now called ``ignore_ext``.
It behaves identically otherwise.

The most significant change is in the handling of keyword parameters for the
transport layer, e.g. HTTP, S3, etc. The old function accepted these directly:

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> session = boto3.Session(profile_name='smart_open')
  >>> smart_open.smart_open(url, 'r', session=session).read(32)
  'first line\nsecond line\nthird lin'

The new function accepts a ``transport_params`` keyword argument. It's a dict.
Put your transport parameters in that dictionary.

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> params = {'session': boto3.Session(profile_name='smart_open')}
  >>> open(url, 'r', transport_params=params).read(32)
  'first line\nsecond line\nthird lin'

Renamed parameters:

- ``s3_upload`` -> ``multipart_upload_kwargs``
- ``s3_session`` -> ``session``

Removed parameters:

- ``profile_name``

**The profile_name parameter has been removed.**
Pass an entire ``boto3.Session`` object instead.

Before:

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> smart_open.smart_open(url, 'r', profile_name='smart_open').read(32)
  'first line\nsecond line\nthird lin'

After:

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> params = {'session': boto3.Session(profile_name='smart_open')}
  >>> open(url, 'r', transport_params=params).read(32)
  'first line\nsecond line\nthird lin'

See ``help("smart_open.open")`` for the full list of acceptable parameter names,
or view the help online `here <https://github.com/RaRe-Technologies/smart_open/blob/master/help.txt>`__.
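
If you have many call sites to update, the renames described above can be applied mechanically. The following is only a minimal sketch: ``migrate_kwargs`` and the three lookup tables are hypothetical names invented for illustration and are not part of ``smart_open``. It assumes that any keyword which the built-in ``open`` does not accept was meant for the transport layer.

```python
# Hypothetical helper, NOT part of smart_open: translate keyword arguments
# written for the old smart_open.smart_open() into the shape expected by
# the new smart_open.open().

# Old transport keyword -> new key inside transport_params (per this guide).
RENAMED_TRANSPORT_PARAMS = {
    "s3_upload": "multipart_upload_kwargs",
    "s3_session": "session",
}

# Old top-level keyword -> new top-level keyword (per this guide).
RENAMED_OPEN_KWARGS = {"ignore_extension": "ignore_ext"}

# Keywords that stay at the top level: the ones the built-in open() accepts,
# plus ignore_ext. Everything else is assumed to be a transport parameter.
OPEN_KWARGS = {
    "mode", "buffering", "encoding", "errors",
    "newline", "closefd", "opener", "ignore_ext",
}


def migrate_kwargs(old_kwargs):
    """Split old-style keyword arguments into (open_kwargs, transport_params)."""
    if "profile_name" in old_kwargs:
        # The guide says profile_name was removed; a Session must be passed.
        raise ValueError(
            "profile_name was removed; pass "
            "transport_params={'session': boto3.Session(profile_name=...)}"
        )
    open_kwargs, transport_params = {}, {}
    for name, value in old_kwargs.items():
        name = RENAMED_OPEN_KWARGS.get(name, name)
        if name in OPEN_KWARGS:
            open_kwargs[name] = value
        else:
            transport_params[RENAMED_TRANSPORT_PARAMS.get(name, name)] = value
    return open_kwargs, transport_params


# "my-session" is a placeholder standing in for a real boto3.Session object.
old = {"mode": "rb", "ignore_extension": True, "s3_session": "my-session"}
open_kwargs, params = migrate_kwargs(old)
print(open_kwargs)
print(params)
```

The two results would then be passed along as ``open(url, transport_params=params, **open_kwargs)``. Rejecting ``profile_name`` outright, rather than silently building a session, keeps the helper free of a ``boto3`` dependency.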

If you pass an invalid parameter name, the ``smart_open.open`` function will warn you about it.
Keep an eye on your logs for WARNING messages from ``smart_open``.
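
One way to avoid missing those warnings during a migration is to collect them with a dedicated handler. The sketch below assumes that the library's loggers live under the ``smart_open`` name in the logging hierarchy (the usual result of ``logging.getLogger(__name__)``). ``WarningCollector``, the child logger name and the message text are made up for illustration, and the warning is emitted by hand rather than by ``smart_open`` itself.

```python
import logging


class WarningCollector(logging.Handler):
    """Keep WARNING-and-above messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


collector = WarningCollector()
# Assumption: records from the library propagate up to the "smart_open" logger.
logging.getLogger("smart_open").addHandler(collector)

# Stand-in for a warning raised inside the library.
logging.getLogger("smart_open.example").warning(
    "ignoring unexpected parameter: %r", "sessoin"
)
print(collector.messages)
```

In a test suite, asserting that ``collector.messages`` is empty after exercising the migrated code would flag any misspelled transport parameter names.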
