Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

Closed
dlpzx opened this issue May 17, 2024 · 2 comments
Labels: effort: large, priority: high, status: in-progress, type: newfeature

Comments


dlpzx commented May 17, 2024

Is your feature request related to a problem? Please describe.
It is difficult to add new sharing item types with the current implementation of the datasets_sharing module.

Describe the solution you'd like
If we want to add more types of sharing (e.g. Redshift dataset sharing, notebook sharing), a first step is to create a generic class that defines a data.all Shares abstraction that each particular sharing implementation can use.

Describe alternatives you've considered
These are rough design considerations that might change during implementation:


⚠️ WIP
This item is a continuation of #1123 that implements the top part of the diagram, the separation between datasets_base and s3_datasets. Now we need to do the same for shares.

Backend

image

1. shares_base module

With the generic code to create share objects and share items.

  • api - all APIs on ShareObject and the approval/reject/revoke workflow. List generic ShareItems
  • aws - nothing, it should be technology agnostic
  • db - generic ShareObject and ShareItem models. Operations on these models
    - in ShareObject: rename datasetUri as targetUri
    - in ShareItem: remove Glue and S3 details from the shareItem
    - repositories: all operations that do not involve details on S3/Glue
  • cdk - nothing, it does not involve any particular AWS IAM permissions or infra
  • handlers - ECS handler that handles shares tasks and executes sharing manager methods accordingly
  • services
    • sharing_service - interfaces to implement ShareProcessors
    • shares_enums - they are generic - ShareItemTypes will be used to know the items that can be shared
    • shares_permissions - they are generic
    • share_object_service
    • share_item_service
  • tasks
    • generic shareTask that loads all shareProcessors
    • generic shareVerifier that loads all shareProcessors
    • generic shareReapply that loads all shareProcessors
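
As a rough illustration of the generic db models described above, here is a minimal sketch using plain dataclasses. Only `targetUri`, `itemType` and `itemUri` come from this design; `ShareItemType` and its values, the remaining fields and `add_item` are hypothetical placeholders, and the real models are database-backed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ShareItemType(Enum):
    # Hypothetical values; each sharing module would contribute its own types.
    TABLE = 'Table'
    FOLDER = 'Folder'
    BUCKET = 'Bucket'


@dataclass
class ShareItem:
    shareUri: str
    itemUri: str  # points at the shared resource; no Glue/S3 details are stored here
    itemType: ShareItemType


@dataclass
class ShareObject:
    shareUri: str
    targetUri: str  # formerly datasetUri; generic reference to what is being shared
    items: List[ShareItem] = field(default_factory=list)

    def add_item(self, item_uri: str, item_type: ShareItemType) -> ShareItem:
        """Attach a generic item to the share and return it."""
        item = ShareItem(shareUri=self.shareUri, itemUri=item_uri, itemType=item_type)
        self.items.append(item)
        return item
```

The point of the sketch is that nothing in either class names a technology: a Redshift or notebook module would only add new `itemType` values.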

image

2. s3_datasets_shares module

With the specific code to share S3 dataset items.

  • api - TBD: if there is any API that is specific to s3 datasets it will remain here
  • aws: the same ones as before, lakeformation, s3, glue, kms...
  • db
    • Specific repositories to get share items with details of Tables, Folders, Buckets
  • cdk
    • pivot role and env role permissions
  • handlers: nothing
  • services: the specific services used to share S3 resources in the ECS task
    • processors
    • managers
    • other utilities
  • tasks: no sharing tasks! the actual sharing ECS task is generic, the specific logic is imported through services

Frontend

Most views related to shares are very generic. If anything, share item tables and share details (consumption details) might be s3-specific. Since the shares view is being modified in other workstreams this is paused at the moment.

Config.json

tbd


dlpzx commented May 17, 2024

Step-by-step implementation plan

  • Rename dataset_sharing to s3_datasets_shares and create shares_base module (PART1)
  • Move enums and permissions to shares_base(PART1)
  • Move ShareObject and ShareItem db models to shares_base (PART2)
  • Move Share state machines to shares_base into its own file (PART2)
  • Create Interface in shares_base to execute sharing tasks from s3_datasets_shares (PART3)
  • Same for Share-verify tasks (PART3)
  • Same for Share-reapply tasks (PART3)
  • Create base ShareItem model in shares_base. Use ShareItem to create S3ShareItems or GlueShareItems, S3BucketShareItem, S3AccessPointShareItem --> Instead of creating multiple ShareItem subtypes that carry item-specific information (Glue table...), we should use the itemType and itemUri to look up the table's technical data. The task is now: remove the usage of shareItem.GlueTableName, GlueDatabaseName and AccessPointName (PART4)
  • Move generic support files to shares_base: exceptions and notifications (PART5)
  • Split APIs into generic APIs and s3-specific APIs (PART6)
  • Make share object service independent from s3 datasets and move to shares_base (PART7)
  • Make share item service independent from s3 datasets and move to shares_base, create a second share_item service in s3 if needed (PART8)
  • Split db-repositories (PART9)
  • Remove any other S3 reference in shares_base (PART10)
  • Rename files and clean-up s3_datasets_shares (PART11)

The next items are enhancements

dlpzx added a commit that referenced this issue May 22, 2024
…art 1 (renaming, enums and permissions) (#1284)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

In this PR:
- Rename `dataset_sharing` as `s3_datasets_shares`
- Create `shares_base` and introduce dependency (`s3_datasets_shares`
depends on `shares_base`)
- Move generic enums to shares_base
- Move generic permissions to shares_base


### Relates
- #1283 
- #1123 
- #955 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 22, 2024
…tend) (#1292)

### Feature or Bugfix
- Feature

### Detail
As explained in the design for #1123 we are trying to implement a
generic `datasets_base` module that can be used by any type of datasets
in a generic way.

In this PR we:
- Create DatasetsBase module in frontend. Depends on S3_Datasets module
- Move DatasetsList view and DatasetListItem component to DatasetsBase
- Add CreateDataset modal that allows multiple types of datasets
creation
- Fix routes and redirects to point at /datasets/ or any other
/X-dataset/
- Move dataset_base services. In backend/datasets_base/api we define the
following queries that are good candidates to become part of the
DatasetsBase module:
  - listDatasets - only used in DatasetsBase/DatasetList view - it should
  be in DatasetsBase/services
  - listOwnedDatasets - only used in Shares/SharesBoxList view - it should
  be in Shares/services
  - listDatasetsCreatedInEnvironment - only used in
  Environments/EnvDataset tab - it should be in Environment/services

If we want to keep everything clean we could rename all "datasets" as
"s3_datasets" or equivalent in the S3_Datasets module. Because it is a
cosmetic change that would pollute the PR a lot I have decided not to
include it.

⚠️ UPDATE: Next steps
The Data Shared With You table in Environments>Datasets tab needs a
remake. It contains references to each type of item and it is very
coupled with s3-dataset-shares. For the moment I just made it work for
the changes of s3-datasets, but when completing the work in #1283 we
should fix this. Maybe in favor of DataGrid

### Relates
- #1123 
- #955 


dlpzx added a commit that referenced this issue May 23, 2024
…art 2 (db objects to shares_base) (#1294)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

In this PR:
- Move the share state machines into their own file and move them to the
`shares_base` module.
- Create new ShareObjectRepository and copy some generic methods to it
- Move db share objects to shares_base:
  - ShareObject has a field called datasetUri. For the scope of S3 and
  Redshift datasets we can leave it as datasetUri, but if we want to
  implement other kinds of sharing we should rethink it: we need to store
  the "approvers" of the share in some way. For the moment I am not going
  to go down that path until shares and s3 are uncoupled, then we can see
  how we would implement a completely generic sharing.
  - ShareObjectItem includes 3 fields related to the glue tables... we
  need to get rid of them in the backend code and look up the info with
  the itemType and itemUri. Left for part3 to keep the PR clean.
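
For readers unfamiliar with the state machines mentioned above, the following is a minimal sketch of the general idea: a transition table keyed by (state, action). The states, action names and the `ShareStateMachine` class are hypothetical and only loosely follow the approve/reject/revoke workflow; they are not the actual definitions moved in this PR.

```python
from enum import Enum


class ShareStatus(Enum):
    # Hypothetical states, for illustration only.
    DRAFT = 'Draft'
    SUBMITTED = 'Submitted'
    APPROVED = 'Approved'
    REJECTED = 'Rejected'
    REVOKED = 'Revoked'


class ShareStateMachine:
    # (current state, action) -> next state
    _transitions = {
        (ShareStatus.DRAFT, 'submit'): ShareStatus.SUBMITTED,
        (ShareStatus.SUBMITTED, 'approve'): ShareStatus.APPROVED,
        (ShareStatus.SUBMITTED, 'reject'): ShareStatus.REJECTED,
        (ShareStatus.APPROVED, 'revoke'): ShareStatus.REVOKED,
    }

    def __init__(self, state: ShareStatus = ShareStatus.DRAFT):
        self.state = state

    def run_transition(self, action: str) -> ShareStatus:
        """Apply an action, or raise if it is not allowed in the current state."""
        key = (self.state, action)
        if key not in self._transitions:
            raise ValueError(f'Action {action!r} is not allowed in state {self.state.value}')
        self.state = self._transitions[key]
        return self.state
```

Because the table contains no reference to tables, folders or buckets, logic of this kind can live in `shares_base` unchanged.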


### Relates
- #1283 
- #1123 
- #955 

@dlpzx dlpzx self-assigned this May 31, 2024
dlpzx added a commit that referenced this issue Jun 4, 2024
…art 3 (share processor and manager interfaces) (#1298)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

In this PR:
- Move ECS handler for sharing tasks to shares_base
- Move sharing tasks to shares_base
- Move DataSharingService to shares_base and rename it as
SharingService. Make SharingService generic and remove any reference to
specific items. In this process:
- attach and delete table and folder permissions are moved into the
specific share processor. delete permissions are processed item by item
and moved into the shareItemService
- Some methods are copied from ShareObjectRepository in
s3_datasets_shares to shares_base. They have not been removed from
s3_datasets_shares, instead there is a TODO marking those methods that
have been copied. The migration and clean-up of shareObjectRepository
will be done in a following PR
- Need to introduce `DatasetBaseRepository.get_dataset_by_uri(session,
share.datasetUri)` to avoid future circular dependencies: shares_base
depends only on datasets_base.
- Clean-up and consolidate methods: remove updates of the share items
outside of state machine transactions. Only re-used methods and dataset
lock handling in share_manager --> TODO: dataset_lock manager should be
its own service outside of sharing, but this is out of the scope of this
PR.
- Add updates of share_item statuses in except clauses for more robust
sharing
- Introduce ShareProcessor and ShareManager interfaces and use them in
the SharingService: instead of the processor inheriting the manager
class, the processor uses the ShareProcessor interface and constructs a
manager when needed.
- Introduce new load ImportMode `SHARES_TASK` and register
ShareProcessors in s3_datasets_shares
(`backend/dataall/modules/s3_datasets_shares/__init__.py`)
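
The composition described above (a processor that builds a manager when needed, rather than inheriting from it) can be sketched roughly as follows. This is an illustration under assumptions: the method names, `InMemoryTableManager`, `TableShareProcessor` and the status strings are hypothetical, not the interfaces introduced in this PR.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ShareManager(ABC):
    """Technology-specific operations on one item (hypothetical method name)."""

    @abstractmethod
    def grant_access(self, item_uri: str) -> None:
        ...


class ShareProcessor(ABC):
    """Entry point that the generic SharingService would call."""

    @abstractmethod
    def process_approved_shares(self, item_uris: List[str]) -> bool:
        ...


class InMemoryTableManager(ShareManager):
    def __init__(self) -> None:
        self.granted: List[str] = []

    def grant_access(self, item_uri: str) -> None:
        # Stand-in for the real AWS calls; URIs starting with "broken" simulate a failure.
        if item_uri.startswith('broken'):
            raise RuntimeError(f'could not share {item_uri}')
        self.granted.append(item_uri)


class TableShareProcessor(ShareProcessor):
    def __init__(self) -> None:
        self.statuses: Dict[str, str] = {}

    def process_approved_shares(self, item_uris: List[str]) -> bool:
        manager = InMemoryTableManager()  # constructed when needed, not inherited
        success = True
        for uri in item_uris:
            try:
                manager.grant_access(uri)
                self.statuses[uri] = 'Share_Succeeded'
            except Exception:
                # Record the failure on the item so the share never stays in limbo.
                self.statuses[uri] = 'Share_Failed'
                success = False
        return success
```

The `except` branch mirrors the bullet about updating share item statuses in except clauses: one failing item is marked as failed while the remaining items are still processed.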


![image](https://github.com/data-dot-all/dataall/assets/71252798/af4eafc3-c990-4532-ba25-62ef950791aa)

See full detail of SharingService design in #1283

### Next steps/Open questions
For failures I think we should roll back whatever actions were
performed. For example, if we are sharing a table and one step fails,
all the steps executed before it should be reverted. @petrkalos
@SofiaSazonova @noah-paige what do you think?
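
As a discussion aid only, one possible shape for such a rollback is to pair every step with an undo action and, on failure, run the undo actions of the completed steps in reverse order. `run_with_rollback` and the `(name, do, undo)` step tuple are hypothetical and do not exist in the codebase.

```python
from typing import Callable, List, Tuple

# (name, do, undo): each sharing step is paired with the action that reverts it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> bool:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    done: List[Step] = []
    for step in steps:
        _name, do, _undo = step
        try:
            do()
            done.append(step)
        except Exception:
            for _, _, undo_done in reversed(done):
                undo_done()
            return False
    return True
```

The failing step itself is not undone here, on the assumption that a step which raised left nothing behind; real AWS calls may partially succeed, so that assumption would need checking per step.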



### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 6, 2024
…art 4 (remove s3 info from shareItem db models) (#1311)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

In this PR:
- Remove the fields `GlueDatabaseName`, `GlueTableName` and
`S3AccessPointName` from the ShareObjectItem db model. The goal is to
have a generic ShareObjectItem and access the specific item information
through its itemUri and itemType.
- Migration script to drop those columns. 
- Disclaimer, there is still work to do on the gql types; but that is
out of the scope for this PR
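
To illustrate the lookup through `itemType` and `itemUri`, here is a minimal sketch that dispatches on the item type instead of reading Glue fields from the share item. The in-memory dictionaries stand in for the tables and folders db tables, and `get_item_details`, `ITEM_LOOKUPS` and the sample values are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for the tables and folders db tables.
TABLES = {'table-1': {'GlueDatabaseName': 'db_sales', 'GlueTableName': 'orders'}}
FOLDERS = {'folder-1': {'S3Prefix': 'raw/sales'}}

# One lookup function per item type; a new share type only adds an entry here.
ITEM_LOOKUPS: Dict[str, Callable[[str], dict]] = {
    'Table': lambda uri: TABLES[uri],
    'Folder': lambda uri: FOLDERS[uri],
}


def get_item_details(item_type: str, item_uri: str) -> dict:
    """Resolve the technical details of a share item from its type and URI."""
    try:
        lookup = ITEM_LOOKUPS[item_type]
    except KeyError:
        raise ValueError(f'Unsupported item type: {item_type}')
    return lookup(item_uri)
```

With this shape the share item row only needs the two generic columns, which is what makes dropping the three specific columns possible.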

### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 6, 2024
…art 5 (move exceptions and notifications to shares_base) (#1312)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

In this PR:
- Move share_exceptions to shares_base
- Move share_notification_service to shares_base


### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 17, 2024
…art 6 (Split APIs and graphql types) (#1320)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

- First, this PR splits the query used in Worksheet and Environments to
list glue databases and list datasets. They are pretty different
queries, the one used in Worksheets is only relevant for S3 datasets,
while the one in Environment is focused on the share items in general:
- Introduce new API `listS3DatasetsSharedWithEnvGroup` to list shared
glue databases in Worksheets view. It is part of the s3_datasets_shares
module. This new API replaces the usage of `searchEnvironmentDataItems`
in Worksheets frontend.
- Remove Glue-parameters from `searchEnvironmentDataItems` API, this API
belongs to shares_base. It is only used in the Environment view >
Datasets tab, so I moved the API in frontend to modules/Environment.
- remove unused parameters (`tables`, `locations`) from statistics in
`api/types/ShareObjectStatistics`. Now the statistics are only generic.

- Introduce new API `getS3ConsumptionData` in s3_datasets_shares. This
new API call gets the details of gluedatabase/table, s3accesspoint that
were previously part of ShareObject. This way the graphql ShareObject
does not contain specific S3 info.

- The rest of the APIs have been split in `shares_base` and
`s3_datasets_shares`. In general, all the share lifecycle (create, add
items, approve...) is part of shares_base. listDatasetShareObjects,
verifyDatasetShareObjects used in the S3-Dataset UI are part of
s3_datasets_shares

- TODO: Review tests and create new tests for get_consumption_data.
Currently tests for shares and datasets are placed in the same folder. I
will open a separate PR to order the tests a bit before this

### Relates
- #1283 
- #1123 
- #955 
- #1277 ---> This PR needs
to be merged and then I will introduce some changes in the ShareView.

dlpzx added a commit that referenced this issue Jun 19, 2024
…art 7 (share_object_service) (#1340)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

The goal of this PR is to move the `share_object_service` to
`shares_base` and refactor any dependency to S3 in the service.

- Move file and fix imports of ShareObjectService
- Use DatasetsBase and DatasetsBaseRepository instead of the S3
equivalents
- ⚠️ Avoid Dashboard check logic in
`ShareObjectService.submit_share_object` see below
- ⚠️ Avoid SharePolicyService logic in
`ShareObjectService.create_share_object` see below
- Create ShareLogsService for logs
- Remove unused methods
- I also copied share_item_service to shares_base (it will be used in
next PR)


#### Avoid Dashboard check logic in
`ShareObjectService.submit_share_object`
Currently, whenever a share request is submitted, we check if the
REQUESTER environment has dashboards enabled and if there are shared
tables we verify that the Quicksight subscription is active.

Alternative: perform this check in the share processor of tables. It
solves the issue, but it gives a poorer user experience as it is
difficult to figure out for the requester why the share failed. This can
be solved holistically as requested in
#1168.

Decision: move the logic to the processor and make the table share fail.

#### Avoid SharePolicyService logic in
`ShareObjectService.create_share_object`

When a share request is first created, we perform a series of operations
to ensure that an IAM policy for the share requester principal IAM role
is created.

Alternative 1: move this logic inside the share processor. Not sure if
it is possible. It would be the ideal solution, but the
SharePolicyService throws errors in the create share object API if the
policy is not attached.

Alternative 2: implement interface to define share_policies (similar to
the dataset-actions that use share logic in
`backend/dataall/modules/s3_datasets_shares/services/dataset_sharing_service.py`).

Decision: we want to preserve the user experience of having the IAM
policy created before the share request is processed. Plus, it is not an
uncommon pattern that could get extended by other dataset types, for
example redshift sharing might need additional share policies. For this
reason this PR implements alternative 2
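
A rough sketch of what alternative 2 could look like: `shares_base` owns an interface and a registry, and each sharing module registers its own policy implementation. All names here (`SharePolicyInterface`, `SharePolicyRegistry`, `S3SharePolicy`, `ensure_policy`) are hypothetical and only illustrate the pattern; the actual interface in the PR may differ.

```python
from abc import ABC, abstractmethod
from typing import List


class SharePolicyInterface(ABC):
    """Hypothetical hook a sharing module implements to prepare requester policies."""

    @abstractmethod
    def ensure_policy(self, principal_role_name: str) -> str:
        """Create the policy if needed and return its name."""


class SharePolicyRegistry:
    """Lives in the generic module; knows nothing about S3 or Redshift."""

    _policies: List[SharePolicyInterface] = []

    @classmethod
    def register(cls, policy: SharePolicyInterface) -> None:
        cls._policies.append(policy)

    @classmethod
    def ensure_all(cls, principal_role_name: str) -> List[str]:
        return [p.ensure_policy(principal_role_name) for p in cls._policies]


class S3SharePolicy(SharePolicyInterface):
    def ensure_policy(self, principal_role_name: str) -> str:
        # A real implementation would call IAM; this sketch only builds a name.
        return f'share-policy-s3-{principal_role_name}'
```

In this sketch the create-share path would call `SharePolicyRegistry.ensure_all(...)` before the request is processed, which preserves the user experience described above without importing S3 code into `shares_base`.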


### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 19, 2024
…art 8 (sharei_item_service) (#1350)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

The goal of this PR is to split the `share_item_service` logic into a
generic service in `shares_base` and a specific service in
s3_datasets_shares.
- `ShareItemService` in shares_base only has shareItem logic without
references to S3 or Glue.
- `S3ShareItemService` in s3_datasets_shares has logic for share items
that are tables and folders.

The file names are a bit messy but I don't want to pollute this PR
with more changes. I'll do a review of the file names in part 10.

### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 21, 2024
…art 9 (share db repositories) (#1351)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

This PR includes:
- Split the ShareObjectRepository from s3_datasets_shares into:
  - `ShareObjectRepository` (shares_base) - generic db operations on share
  objects - no references to S3, Glue
  - `ShareStatusRepository` (shares_base) - db operations related to the
  sharing state machine states - a way to split the db operations into
  smaller chunks
  - `S3ShareObjectRepository` (s3_datasets_shares) - db operations on s3
  share objects - used only in s3_datasets_shares. They might contain
  references to DatasetTables, S3Datasets... They are used in the clean-up
  activities and to count resources in environment.

- Adapt `S3ShareObjectRepository` to S3 objects. For some queries,
filters on the type of share items retrieved had to be added, so that
the code still makes sense if anyone adds a new share type in the future.
To make the intent clearer, some functions are renamed to point out
that they are s3 functions or to describe what they do.

- Make `ShareObjectRepository` completely generic. The following queries
needed extra work:
  - `ShareObjectRepository.get_share_item` - renamed as
  `get_share_item_details`
  - `list_shareable_items` - split in 2 parts,
  `list_shareable_items_of_type` + `paginated_list_shareable_items`: the
  first function is invoked for each share processor in the list;
  instead of querying the DatasetTable, DatasetStorageLocation and
  DatasetBucket we query the shareable_type. The challenge is that all
  fields must come from the db Resource object that all of these models
  are built upon. In particular the field `itemName` does not match the
  BucketName (in bucket) or the S3Prefix (in folders). For this reason I
  added a migration script to backfill DatasetBucket.name with
  DatasetBucket.S3BucketName and DatasetStorageLocation.name with
  DatasetStorageLocation.S3Prefix. `paginated_list_shareable_items` joins
  the list of subqueries, filters and paginates.
  - In verify_dataset_share_objects, list_shareable_items is replaced by
  `ShareObjectRepository.get_all_share_items_in_share`, which does not
  need tables or storage, avoiding the whole S3 logic and
  avoiding unnecessary queries
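
The split of `list_shareable_items` can be pictured with the following minimal sketch, which uses in-memory lists where the PR uses db subqueries. Only the two function names come from the description above; the `RESOURCES` data, the signatures and the shape of the returned page are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical rows per shareable type; in the PR these come from db subqueries.
RESOURCES: Dict[str, List[dict]] = {
    'Table': [
        {'itemUri': 't1', 'itemName': 'orders'},
        {'itemUri': 't2', 'itemName': 'customers'},
    ],
    'Folder': [{'itemUri': 'f1', 'itemName': 'raw/sales'}],
    'Bucket': [{'itemUri': 'b1', 'itemName': 'sales-bucket'}],
}


def list_shareable_items_of_type(item_type: str) -> List[dict]:
    """Return the generic fields of every resource of one shareable type."""
    return [dict(row, itemType=item_type) for row in RESOURCES.get(item_type, [])]


def paginated_list_shareable_items(
    item_types: List[str],
    term: Optional[str] = None,
    page: int = 1,
    page_size: int = 2,
) -> dict:
    """Join the per-type lists, filter by name and return one page."""
    rows: List[dict] = []
    for item_type in item_types:
        rows.extend(list_shareable_items_of_type(item_type))
    if term:
        rows = [r for r in rows if term.lower() in r['itemName'].lower()]
    start = (page - 1) * page_size
    return {'count': len(rows), 'page': page, 'nodes': rows[start:start + page_size]}
```

The sketch also shows why the backfill matters: filtering and display rely on a single generic `itemName` field, so buckets and folders need a meaningful value in it.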

- Remove S3 references from shares_base.api.resolvers. Use DatasetBase
and DatasetBaseRepository instead. Remove unused `ShareableObject`.

- I had some problems with circular dependencies so I created the
`ShareProcessorManager` in shares_base for the registration of
processors. The SharingService uses the manager to get all processors.
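
The registration pattern that avoids the circular dependency can be sketched as below: the generic module owns the registry, and specific modules push their processors into it at import time, so `shares_base` never imports them. The method names (`register_processor`, `all_processors`, `process`, `approve_share`) and the `TableProcessor` class are hypothetical.

```python
from typing import Dict, List


class ShareProcessorManager:
    """Registry owned by the generic module; it never imports specific modules."""

    _processors: Dict[str, object] = {}

    @classmethod
    def register_processor(cls, item_type: str, processor: object) -> None:
        cls._processors[item_type] = processor

    @classmethod
    def all_processors(cls) -> List[object]:
        return list(cls._processors.values())


class SharingService:
    @staticmethod
    def approve_share(share_uri: str) -> List[str]:
        # The service only talks to the registry, not to S3-specific code.
        return [p.process(share_uri) for p in ShareProcessorManager.all_processors()]


class TableProcessor:
    """Would live in the specific module and be registered when that module loads."""

    def process(self, share_uri: str) -> str:
        return f'tables of {share_uri} processed'


ShareProcessorManager.register_processor('Table', TableProcessor())
```

The dependency arrow points one way only: the specific module imports the manager to register itself, and the service discovers processors at run time.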

Missing items for Part10:
- Lake Formation cross region table references in
shares_base/services/share_item_service.py:add_shared_item
- remove table references in
shares_base/services/share_item_service.py:remove_shared_item
- remove s3_prefix references
shares_base/services/share_notification_service:notify_new_data_available_from_owners
- RENAMING! Right now the names are a bit misleading

### Relates
- #1283 
- #1123 
- #955 


dlpzx added a commit that referenced this issue Jun 25, 2024
…art 10 (other s3 references in shares_base) (#1357)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

This PR:
- Removes the delete_resource_policy conditional for Tables in
`backend/dataall/modules/shares_base/services/share_item_service.py`.
Permissions to the Table in data.all are granted once the share has
succeeded, so the conditional that checks for share_failed tables should
not exist.
- Removes an unnecessary check in share_item_service: in add_share_item,
if the item is a table, we check whether it is a cross-region share. This
check is redundant because we already check for cross-region shares when
the share request object is created.
- Uses `get_share_item_details` in add_share_item: we want to check that
the table, folder or bucket exists, so we need to query those tables.
- Moves s3_prefix notifications to the subscription task.
- Fixes an error in the query in
`backend/dataall/modules/shares_base/db/share_state_machines_repositories.py`


### Relates
- #1283 
- #1123 
- #955 

dlpzx added a commit that referenced this issue Jun 27, 2024
…art 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

This is one of the last PRs focused on renaming files and cleaning up
the s3_datasets_shares module. The first step is a consolidation of the
file and class names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` -->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`.
- The part that implements the `DatasetServiceInterface` has been moved
to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
as `services.s3_share_service.py`, and the methods for the folder/table
permissions have also been added to the S3ShareService (from
share_item_service)

Lastly, there is one method previously in share_item_service that has
been moved to the GlueClient directly as
`get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 

@dlpzx dlpzx closed this as completed Jun 28, 2024
@dlpzx
Contributor Author

dlpzx commented Jun 28, 2024

The core functionality has been completed. Follow-up enhancements related to shares will be tackled in separate GitHub issues.

noah-paige added a commit that referenced this issue Aug 30, 2024
commit 22a6f6ef 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
When FE and BE are both deployed in us-east-1, the new lambda env_key
alias is the same for TriggerFunctionCognitoUrlsConfig in FE and for
TriggerFunctionCognitoConfig in BE. The CICD then fails in the FE stack
because the alias already exists.

This PR changes the name of both aliases to avoid this conflict. It also
adds envname to avoid issues with other deployment environments or
tooling accounts in the future.


commit e5923a9 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for the internet-facing frontend.
Outside of that if, for the vpc-facing architecture, the KMS key does not
exist and the CICD pipeline fails. This PR moves the creation of the KMS
key outside of the if.
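
The failure mode can be reproduced in plain Python. The sketch below is a toy stand-in for the stack code, not the actual CDK construct: the function names, resource keys and the key value are all hypothetical.

```python
def build_stack_buggy(internet_facing: bool) -> dict:
    """The key only exists when the internet-facing branch runs."""
    resources = {}
    if internet_facing:
        lambda_env_key = "lambda-env-key"  # hypothetical stand-in for the KMS key
        resources["internet_facing_trigger"] = lambda_env_key
    # For vpc-facing deployments this line fails: the name was never bound.
    resources["cognito_config_trigger"] = lambda_env_key
    return resources


def build_stack_fixed(internet_facing: bool) -> dict:
    """The key is created unconditionally, so both code paths can use it."""
    resources = {}
    lambda_env_key = "lambda-env-key"  # hypothetical stand-in for the KMS key
    if internet_facing:
        resources["internet_facing_trigger"] = lambda_env_key
    resources["cognito_config_trigger"] = lambda_env_key
    return resources
```

Calling `build_stack_buggy(False)` raises `UnboundLocalError`, while `build_stack_fixed(False)` returns the trigger configuration with the key attached.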


commit 3ccacfc 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
- Feature

### Detail
- Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS
- TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI
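
As a rough sketch of the idea, assuming the catalog index and the RDS contents can both be viewed as mappings keyed by a resource URI: after upserting everything found in RDS, any indexed document whose key is no longer in RDS is stale and gets deleted. `reindex_catalog` and the dict-based index are hypothetical stand-ins, not the real indexer task or search client.

```python
from typing import Dict, List


def reindex_catalog(index: Dict[str, dict], rds_items: Dict[str, dict]) -> List[str]:
    """Upsert all RDS items into the index, then delete docs missing from RDS.

    Returns the URIs of the deleted (stale) documents, sorted for stable output.
    """
    # Upsert: every item currently in RDS is written to the index.
    for uri, doc in rds_items.items():
        index[uri] = doc

    # Anything indexed but absent from RDS is stale.
    stale_uris = sorted(set(index) - set(rds_items))
    for uri in stale_uris:
        del index[uri]
    return stale_uris


# Usage: "table-2" was deleted from RDS but is still indexed.
index = {"table-1": {"name": "orders"}, "table-2": {"name": "old"}}
rds = {"table-1": {"name": "orders_v2"}, "table-3": {"name": "customers"}}
deleted = reindex_catalog(index, rds)
```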

### Relates
- #1078


commit e2817a1 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
- Bugfix


### Detail
- Add back `status` to Glossary GQL Object for GQL Operations
(getGlossary, listGlossaries)
- Fix `listOrganizationGroupPermissions`: enforce non-null on FE


### Relates



commit c3c58bd 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks

### Relates
#1220 


commit e913d48 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever it was necessary instead of checking
groupOptions length > 0
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo`,
which means that users CAN specify options that are not on the
list. I would like the reviewer to confirm this is what we want. In the
end, stewardship is a delegation of permissions, so it makes sense that
delegation can go to other teams. Also changed DatasetCreateForm
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freesolo` because invited teams might not be the user teams.
Same reason why there is no check for groupOptions == 0, if there are no
options there is still the free text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 


commit ee71d7b 
Author: Tejas Rajopadhye <71188245+TejasRGitHub@users.noreply.github.com> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)? N/A
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization? N/A
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features? N/A
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users? N/A
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: trajopadhye <tejas.rajopadhye@yahooinc.com>

commit 27f1ad7 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets
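
One way to picture a generic lock, as a sketch under assumptions: locks are keyed by a `(resource_uri, resource_type)` pair rather than by dataset, and a share acquires every resource it touches (dataset and principal) in one all-or-nothing step. `ResourceLockTable` is a hypothetical in-memory stand-in; the real mechanism is database-backed and this sketch is not thread-safe.

```python
from contextlib import contextmanager
from typing import Iterable, Iterator, Set, Tuple

Resource = Tuple[str, str]  # (resource_uri, resource_type)


class ResourceLockTable:
    """In-memory illustration of locks keyed by any resource, not just datasets."""

    def __init__(self) -> None:
        self._locked: Set[Resource] = set()

    def acquire(self, resources: Iterable[Resource]) -> bool:
        wanted = set(resources)
        # All-or-nothing: refuse if any requested resource is already locked.
        if wanted & self._locked:
            return False
        self._locked |= wanted
        return True

    def release(self, resources: Iterable[Resource]) -> None:
        self._locked -= set(resources)

    @contextmanager
    def locked(self, resources: Iterable[Resource]) -> Iterator[None]:
        resources = list(resources)
        if not self.acquire(resources):
            raise RuntimeError("one or more resources are already locked")
        try:
            yield
        finally:
            self.release(resources)


# Usage: a share locks both the dataset and the principal it grants access to.
locks = ResourceLockTable()
share_a = [("dataset-1", "Dataset"), ("group-1", "EnvironmentGroup")]
share_b = [("dataset-2", "Dataset"), ("group-1", "EnvironmentGroup")]
```

In this sketch `share_b` cannot start while `share_a` holds `group-1`, even though the two shares target different datasets.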


### Relates
- #1093 


---------

Co-authored-by: dlpzx <dlpzx@amazon.com>

commit e3b8658 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)


commit 2e80de4 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)


commit 1c09015 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions



commit 976ec6b 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012.
This PR implements it for createPipelines

### Relates
- #1012 


commit 6c909a3 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
- Bugfix

### Detail
- Ensure the migration script does not need RequestContext; otherwise it
fails in the migration trigger lambda because the context info is not set
or available


### Relates
- #1306


commit 90835fb 
Author: Anushka Singh <anushka.singh.2511@gmail.com> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Emails are sent every Monday at 9am UTC. The schedule can be changed in
the cron expression in container.py
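
To make the schedule concrete, here is a small sketch that computes when the next reminder would fire for a "every Monday at 09:00 UTC" rule. `next_monday_9am_utc` is a hypothetical helper for illustration only; it does not read or parse the actual cron expression in container.py.

```python
from datetime import datetime, timedelta, timezone


def next_monday_9am_utc(now: datetime) -> datetime:
    """Return the first Monday 09:00 UTC strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, so this is the number of days until Monday.
    candidate += timedelta(days=(0 - now.weekday()) % 7)
    if candidate <= now:
        # Already at or past this week's slot: move to next week.
        candidate += timedelta(days=7)
    return candidate


# Usage: a share request created on Wednesday 26 June 2024 at 11:17 UTC.
created = datetime(2024, 6, 26, 11, 17, tzinfo=timezone.utc)
first_reminder = next_monday_9am_utc(created)
```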

### Relates
- #1248


---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Anushka Singh <anushka.singh@yahooinc.com>
Co-authored-by: trajopadhye <tejas.rajopadhye@yahooinc.com>
Co-authored-by: Mohit Arora <marora@yahooinc.com>
Co-authored-by: rbernota <rbernota@yahooinc.com>
Co-authored-by: Rick Bernotas <rbernota@verizonmedia.com>
Co-authored-by: Raj Chopde <rchopde@yahooinc.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: dlpzx <71252798+dlpzx@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <jaidisido@gmail.com>
Co-authored-by: dlpzx <dlpzx@amazon.com>
Co-authored-by: mourya-33 <134511711+mourya-33@users.noreply.github.com>
Co-authored-by: nikpodsh <124577300+nikpodsh@users.noreply.github.com>
Co-authored-by: MK <manjula_kasturi@hotmail.com>
Co-authored-by: Manjula <manjula.kasturi@gmail.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@gmail.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@yahooinc.com>
Co-authored-by: Daniel Lorch <98748454+lorchda@users.noreply.github.com>
Co-authored-by: Tejas Rajopadhye <71188245+TejasRGitHub@users.noreply.github.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@oath.com>
Co-authored-by: Sofia Sazonova <sofia-s@304.ru>
Co-authored-by: Sofia Sazonova <sazonova@amazon.co.uk>

commit e477bdf 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add `String!` to ensure non null input argument on FE if defined as
such on backend GQL operation for `listS3DatasetsSharedWithEnvGroup`


### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit d6b59b3 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Need to register processors in init for s3 dataset shares API module


### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit bd3698c 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Details
For more details about the issue read #1353 
In this PR we solve the problem by splitting the Cognito configuration
into two parts:
* First part (cognito_users_config.py) is setting up the required groups
and users and runs after UserPool deployment
* Second part (cognito_urls_config.py) is setting up Cognito's
callback/logout urls and runs after the CloudFront deployment

We chose to split the functionality because we need to have the
users/groups setup for the integration tests which are run after the
backend deployment.

The other alternative is to keep the config functionality as one but
make the integ tests run after the CloudFront stage.

### Relates
- Solves #1353 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige added a commit that referenced this issue Aug 30, 2024
commit 4425e756 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:57:31 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 4cd2bf77 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:56:38 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 22a6f6ef 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <noahpaig@amazon.com> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
For the case in which we deploy FE and BE in us-east-1 the new lambda
env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE
and for TriggerFunctionCognitoConfig in BE, which results in a failure
of the CICD in the FE stack because the alias already exists.

This PR changes the name of both aliases to avoid this conflict. It also
adds envname to avoid issues with other deployment environments/tooling
account in the future

### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e5923a9 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for internet facing frontend.
Outside of that if, for the vpc-facing architecture, the KMS key does not
exist and the CICD pipeline fails. This PR moves the creation of the KMS
key outside of the if.
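
The scoping problem described above can be reproduced in plain Python. This is a minimal sketch, not the actual CDK stack: the function names and the dictionary standing in for the KMS key are hypothetical, and only the control flow (a value created inside an if-clause but used after it) reflects the bug.

```python
def build_stack_buggy(internet_facing: bool) -> dict:
    """Mimics the bug: the key only exists on the internet-facing path."""
    if internet_facing:
        lambda_env_key = {"alias": "hypothetical-env-key"}
    # Raises UnboundLocalError when internet_facing is False.
    return {"trigger_function_env_key": lambda_env_key}


def build_stack_fixed(internet_facing: bool) -> dict:
    """Mimics the fix: the key is created before the branch, so both paths see it."""
    lambda_env_key = {"alias": "hypothetical-env-key"}
    resources = {"trigger_function_env_key": lambda_env_key}
    if internet_facing:
        resources["internet_facing_only"] = True
    return resources
```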

### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 3ccacfc 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
<!-- please choose -->
- Feature

### Detail
- Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS
- TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI
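
A minimal sketch of the clean-up idea, assuming the task can obtain the set of document identifiers currently in the index and the set of identifiers still present in RDS. The function name `find_stale_doc_ids` is hypothetical and no real indexer or database call is shown.

```python
from typing import Iterable, List


def find_stale_doc_ids(indexed_ids: Iterable[str], rds_ids: Iterable[str]) -> List[str]:
    """Return ids that are in the catalog index but no longer exist in RDS."""
    # Anything indexed that the database no longer knows about should be deleted.
    return sorted(set(indexed_ids) - set(rds_ids))
```

The returned list would then be passed to whatever delete operation the catalog index exposes.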

### Relates
- #1078

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e2817a1 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add back `status` to Glossary GQL Object for GQL Operations
(getGlossary, listGlossaries)
- Fix  `listOrganizationGroupPermissions` enforce non null on FE


### Relates


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit c3c58bd 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks
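
To illustrate what a generalised polling function might look like, here is a small sketch. It is not the helper added in this PR: `poll_until`, its parameters, and the status strings in the usage note are assumptions chosen for the example.

```python
import time
from typing import Callable, Collection


def poll_until(
    fetch_status: Callable[[], str],
    done_statuses: Collection[str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` until it returns a status in `done_statuses` or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last status was {status!r}")
        sleep(interval_s)
```

A test could pass any callable that returns the current stack status, for example `poll_until(lambda: get_status(stack_uri), {"CREATE_COMPLETE"})`, where `get_status` is whatever status lookup the test suite provides.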

### Relates
#1220 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e913d48 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever it was necessary instead of checking
groupOptions length > 0
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo` -
what that means is that users CAN specify options that are not on the
list. I would like the reviewer to confirm this is what we want. In the
end, stewardship is a delegation of permissions, so it makes sense that
delegation happens to other teams. Also changed DatasetCreateForm
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freesolo` because invited teams might not be the user teams.
Same reason why there is no check for groupOptions == 0, if there are no
options there is still the free text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit ee71d7b 
Author: Tejas Rajopadhye <71188245+TejasRGitHub@users.noreply.github.com> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)? N/A
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization? N/A
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features? N/A
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users? N/A
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: trajopadhye <tejas.rajopadhye@yahooinc.com>

commit 27f1ad7 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
<!-- please choose -->
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets
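
The idea of a lock keyed by resource rather than by dataset can be sketched as follows. This is an in-memory illustration only, under the assumption that a lock is identified by a (resource URI, resource type) pair; the class name `ResourceLockRegistry` and the type strings are hypothetical and do not describe the actual data.all implementation.

```python
from typing import Iterable, Set, Tuple

Resource = Tuple[str, str]  # (resource_uri, resource_type), assumed shape


class ResourceLockRegistry:
    """All-or-nothing locking over an arbitrary set of resources."""

    def __init__(self) -> None:
        self._locked: Set[Resource] = set()

    def acquire(self, resources: Iterable[Resource]) -> bool:
        wanted = set(resources)
        # Refuse the whole request if any resource is already locked.
        if wanted & self._locked:
            return False
        self._locked |= wanted
        return True

    def release(self, resources: Iterable[Resource]) -> None:
        self._locked -= set(resources)
```

With this shape, a share could lock its dataset and its principal together, for example `registry.acquire([("ds-1", "dataset"), ("group-1", "environment_group")])`, and a second share touching either resource would have to wait.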


### Relates
- #1093 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: dlpzx <dlpzx@amazon.com>

commit e3b8658 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 2e80de4 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283 we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
used by any type of datasets and by any type of shareable object in a
generic way.

This is one of the last PRs focused on renaming files and cleaning-up
the s3_datasets_shares module. The first step is a consolidation of the
file and classes names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` --->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`.
- The part that implements the `DatasetServiceInterface` has been moved
to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
as `services.s3_share_service.py` and the methods for the folder/table
permissions are also added to the S3ShareService (from
share_item_service)

Lastly, there is one method previously in share_item_service that has
been moved to the GlueClient directly as
`get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 1c09015 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions


### Relates

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 976ec6b 
Author: dlpzx <71252798+dlpzx@users.noreply.github.com> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012.
This PR implements it for createPipelines

### Relates
- #1012 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 6c909a3 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Ensure migration script does not need RequestContext - otherwise fails
in migration trigger lambda as context info not set / available


### Relates
- #1306

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 90835fb 
Author: Anushka Singh <anushka.singh.2511@gmail.com> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Email will be sent every Monday at 9am UTC. Schedule can be changed in
cron expression in container.py

### Relates
- #1248

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Anushka Singh <anushka.singh@yahooinc.com>
Co-authored-by: trajopadhye <tejas.rajopadhye@yahooinc.com>
Co-authored-by: Mohit Arora <marora@yahooinc.com>
Co-authored-by: rbernota <rbernota@yahooinc.com>
Co-authored-by: Rick Bernotas <rbernota@verizonmedia.com>
Co-authored-by: Raj Chopde <rchopde@yahooinc.com>
Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: dlpzx <71252798+dlpzx@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <jaidisido@gmail.com>
Co-authored-by: dlpzx <dlpzx@amazon.com>
Co-authored-by: mourya-33 <134511711+mourya-33@users.noreply.github.com>
Co-authored-by: nikpodsh <124577300+nikpodsh@users.noreply.github.com>
Co-authored-by: MK <manjula_kasturi@hotmail.com>
Co-authored-by: Manjula <manjula.kasturi@gmail.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@gmail.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@yahooinc.com>
Co-authored-by: Daniel Lorch <98748454+lorchda@users.noreply.github.com>
Co-authored-by: Tejas Rajopadhye <71188245+TejasRGitHub@users.noreply.github.com>
Co-authored-by: Zilvinas Saltys <zilvinas.saltys@oath.com>
Co-authored-by: Sofia Sazonova <sofia-s@304.ru>
Co-authored-by: Sofia Sazonova <sazonova@amazon.co.uk>

commit e477bdf 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
- Bugfix


### Detail
- Declare the input argument as `String!` in the frontend query string for
`listS3DatasetsSharedWithEnvGroup` so that it is non-null on the FE,
matching the non-null definition of the backend GQL operation
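
To illustrate the kind of mismatch this fix addresses, here is a small sketch that scans a GraphQL query string for variables declared without a trailing `!`. The argument names (`environmentUri`, `groupUri`), the selected field, and the helper `nullable_variables` are assumptions made for the example, not taken from the actual query.

```python
import re

# Hypothetical query document; argument and field names are assumptions.
LIST_QUERY = """
query listS3DatasetsSharedWithEnvGroup($environmentUri: String!, $groupUri: String!) {
  listS3DatasetsSharedWithEnvGroup(environmentUri: $environmentUri, groupUri: $groupUri) {
    datasetUri
  }
}
"""


def nullable_variables(query: str) -> list[str]:
    """Return the names of declared variables whose type lacks a trailing `!`."""
    # Matches declarations such as `$groupUri: String!`; usages like
    # `groupUri: $groupUri` are skipped because no colon follows the variable.
    declared = re.findall(r'\$(\w+)\s*:\s*([\[\]\w!]+)', query)
    return [name for name, gql_type in declared if not gql_type.endswith('!')]
```

A check like this returns an empty list for the query above and would report `groupUri` if its declaration were written as `String` instead of `String!`.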


### Relates



By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit d6b59b3 
Author: Noah Paige <69586985+noah-paige@users.noreply.github.com> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
- Bugfix

### Detail
- Processors need to be registered in the init of the S3 dataset shares API module
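
The registration pattern behind this fix can be sketched roughly as follows. All names here (`ProcessorDefinition`, `ProcessorRegistry`, `init_s3_shares_module`, and the item types) are hypothetical stand-ins, not the project's actual classes; the point is only that a lookup fails until the module's init has registered its processors.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ProcessorDefinition:
    """Hypothetical description of how one share item type is processed."""
    item_type: str
    process: Callable[[str], str]


class ProcessorRegistry:
    """Hypothetical registry that generic share code consults at run time."""
    _processors: Dict[str, ProcessorDefinition] = {}

    @classmethod
    def register(cls, definition: ProcessorDefinition) -> None:
        cls._processors[definition.item_type] = definition

    @classmethod
    def get(cls, item_type: str) -> ProcessorDefinition:
        if item_type not in cls._processors:
            raise LookupError(f'No processor registered for {item_type!r}')
        return cls._processors[item_type]


def init_s3_shares_module() -> None:
    """Stand-in for the module's init hook, where registration has to happen."""
    ProcessorRegistry.register(ProcessorDefinition('Table', lambda uri: f'table:{uri}'))
    ProcessorRegistry.register(ProcessorDefinition('S3Bucket', lambda uri: f'bucket:{uri}'))
```

If the init hook never runs in a given entry point (for example the API module), `ProcessorRegistry.get` raises, which is the kind of failure a missing registration would produce.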


### Relates



By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit bd3698c 
Author: Petros Kalos <kalosp@amazon.com> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Details
For more details about the issue, read #1353.
In this PR we solve the problem by splitting the configuration of
Cognito into two parts:
* The first part (cognito_users_config.py) sets up the required groups
and users and runs after the UserPool deployment
* The second part (cognito_urls_config.py) sets up Cognito's
callback/logout URLs and runs after the CloudFront deployment

We chose to split the functionality because we need to have the
users/groups setup for the integration tests which are run after the
backend deployment.

The alternative is to keep the config functionality as one step but
make the integ tests run after the CloudFront stage.
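
The ordering constraints described above can be sketched as a small dependency graph. The step names and the exact edges are assumptions for illustration (in particular, that the backend and CloudFront stages depend on each other in this order); only the two Cognito config steps and their triggers come from the description.

```python
from graphlib import TopologicalSorter

# Each key lists the steps that must finish before it can run.
# Step names are hypothetical labels, not actual pipeline stage identifiers.
DEPENDENCIES = {
    'cognito_users_config': {'user_pool'},
    'backend': {'user_pool'},
    'integration_tests': {'backend', 'cognito_users_config'},
    'cloudfront': {'backend'},
    'cognito_urls_config': {'cloudfront'},
}


def deployment_order() -> list[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())
```

With the split, `integration_tests` depends only on the users/groups setup and never on `cloudfront`, so the tests do not have to wait for the CloudFront stage.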

### Relates
- Solves #1353 



By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.