Refactor threading/ray; add single-path distributed s3 select impl #1446

Merged
merged 4 commits into from
Jul 11, 2022

Conversation

kukushking
Contributor

Feature or Bugfix

  • Refactoring
  • Feature

Detail

  • add ray pool & thread pool executor wrapper
  • add single-path distributed s3 select impl
  • refactor distributed lakeformation impl

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kukushking kukushking added this to the 3.0.0 milestone Jul 8, 2022
@jaidisido jaidisido changed the title Rrefactor threading/ray; add single-path distributed s3 select impl Refactor threading/ray; add single-path distributed s3 select impl Jul 11, 2022
awswrangler/_utils.py (outdated, resolved)
awswrangler/_threading.py (outdated, resolved)
def __init__(self, processes: Optional[Union[bool, int]] = None):
    self._exec: Pool = Pool(processes=None if isinstance(processes, bool) else processes)

def map(self, func: Callable[..., List[str]], _: boto3.Session, *args: Any) -> List[Any]:
Contributor

Does this assume that the first argument after func must always be a boto3_session? Don't you think it's the kind of opaque knowledge that external/new contributors wouldn't know about?

Contributor Author

I basically followed the same convention we're using in WriteProxy and, to be honest, I prefer an explicit convention over any other trickery (e.g. picking out kwargs), but happy to discuss.
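
For readers following this thread, here is a minimal sketch of the convention under discussion: every executor wrapper exposes `map(func, session, *args)`, and the argument right after `func` is always the session. This is not the code merged in this PR. `FakeSession`, `ThreadedExecutor`, `PoolExecutor` and `describe` are hypothetical names, `FakeSession` stands in for `boto3.Session`, and the standard library `ThreadPool` stands in for the Ray pool so the example runs without extra dependencies. The assumption that the pool-backed variant drops the session and passes `None` is based only on the `_` parameter name in the snippet above.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, List, Optional


class FakeSession:
    """Hypothetical stand-in for boto3.Session (keeps the sketch dependency-free)."""

    def __init__(self, region_name: str) -> None:
        self.region_name = region_name


class ThreadedExecutor:
    """Hypothetical thread-based wrapper: forwards the session to every call."""

    def __init__(self, max_workers: int = 4) -> None:
        self._exec = ThreadPoolExecutor(max_workers=max_workers)

    def map(self, func: Callable[..., Any], session: FakeSession, *args: Any) -> List[Any]:
        # Convention: the first argument after func is always the session.
        # Each remaining argument is an iterable consumed element-wise.
        return list(self._exec.map(lambda *a: func(session, *a), *args))


class PoolExecutor:
    """Hypothetical pool-based wrapper with the same signature (assumed to drop the session)."""

    def __init__(self, processes: Optional[int] = None) -> None:
        # ThreadPool is a stand-in for a distributed pool with a compatible API.
        self._exec = ThreadPool(processes=processes)

    def map(self, func: Callable[..., Any], _: FakeSession, *args: Any) -> List[Any]:
        # The session is accepted for signature parity but not forwarded;
        # the worker receives None in its place.
        futures = [self._exec.apply_async(func, (None, *arg)) for arg in zip(*args)]
        return [future.get() for future in futures]


def describe(session: Optional[FakeSession], path: str) -> str:
    """Example worker: takes the session first, then the per-item argument."""
    region = session.region_name if session is not None else "no-session"
    return f"{region}:{path}"


paths = ["s3://bucket/a", "s3://bucket/b"]
threaded = ThreadedExecutor().map(describe, FakeSession("us-east-1"), paths)
pooled = PoolExecutor(processes=2).map(describe, FakeSession("us-east-1"), paths)
```

Because both wrappers share one positional signature, call sites can swap one for the other without inspecting keyword arguments. The cost, as the review comment points out, is that the position of the session is a convention contributors have to know.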

awswrangler/distributed/_pool.py (resolved)
awswrangler/_utils.py (outdated, resolved)
awswrangler/_threading.py (outdated, resolved)
@jaidisido jaidisido merged commit 69d74cc into release-3.0.0 Jul 11, 2022
@jaidisido jaidisido deleted the refact-wrap-pool branch July 11, 2022 13:38
@kukushking kukushking self-assigned this Nov 28, 2022