Refactor AppendOperator #439

Merged
kaxil merged 1 commit into main from append-refactor
Jun 8, 2022

Collaborator

@kaxil kaxil commented Jun 7, 2022

This PR refactors the Append operator to use the new interfaces.

Breaking Changes that I would love to get some feedback on:

  • Removed casting functionality
  • Changed the interface to use source_to_target_columns_map instead of columns, from:
    def append(
        self,
        main_table: Table,
        columns: List[str],
        casted_columns: dict,
        append_table: Table,
    ):

to

def append(
    source_table: Table,
    target_table: Table,
    source_to_target_columns_map: Optional[Dict[str, str]] = None,
    **kwargs,
):
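
To illustrate what the mapping in the new signature expresses, here is a minimal sketch of how a source-to-target column mapping could be turned into an `INSERT ... SELECT` statement. `build_append_sql` and the table names are hypothetical and not part of this PR; the real operator delegates to the database classes, and identifiers here are interpolated without quoting.

```python
from typing import Dict, Optional


def build_append_sql(
    source_table: str,
    target_table: str,
    source_to_target_columns_map: Optional[Dict[str, str]] = None,
) -> str:
    """Return an INSERT ... SELECT statement that appends source rows to target.

    Hypothetical helper for illustration only. Names are interpolated
    without quoting or escaping, so it is only suitable for trusted identifiers.
    """
    if not source_to_target_columns_map:
        # No mapping given: assume both tables share the same column layout.
        return f"INSERT INTO {target_table} SELECT * FROM {source_table}"

    # Keys are source column names, values are the matching target column names.
    source_columns = ", ".join(source_to_target_columns_map.keys())
    target_columns = ", ".join(source_to_target_columns_map.values())
    return (
        f"INSERT INTO {target_table} ({target_columns}) "
        f"SELECT {source_columns} FROM {source_table}"
    )


print(build_append_sql("new_homes", "homes", {"sell": "sell", "living": "living"}))
# INSERT INTO homes (sell, living) SELECT sell, living FROM new_homes
```

Because dictionaries preserve insertion order, the source and target column lists stay aligned position by position.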

This PR intends to keep the current logic and only switch to the new interfaces, so that we can delete all the files belonging to the old interfaces. Whether we should have a single function is out of scope here and is tracked in #383.

closes #343
closes #335

Co-Authored-By: Utkarsh Sharma utkarsharma2@gmail.com

@kaxil kaxil force-pushed the append-refactor branch 3 times, most recently from 3fdae0f to 1a9caab Compare June 7, 2022 21:52
@kaxil kaxil marked this pull request as ready for review June 7, 2022 22:18
@kaxil kaxil mentioned this pull request Jun 7, 2022
@kaxil kaxil requested a review from sunank200 June 7, 2022 22:36
src/astro/databases/base.py (outdated, resolved)
columns=["sell", "living"],
main_table=load_main,
append_table=load_append,
source_to_target_columns_map={"sell": "sell", "living": "living"},

Collaborator

@tatiana tatiana Jun 8, 2022

Although using a dictionary gives us more flexibility, it does look odd in the cases when the columns in the source and target tables are exactly the same. It would be great to support just a tuple with the column names, as opposed to always enforcing a dictionary. What are your thoughts, @kaxil?

Collaborator Author

I agree, it feels tedious, and most of the time the column names would probably be the same. Thoughts on the name of the parameter? source_to_target_columns_map doesn't seem appropriate. How about just columns, which would accept either a) a list or tuple of column names or b) a dict containing the source-to-target mapping?

If you agree, should we do the same for merge too?

Collaborator

Both suggestions look great, @kaxil (interface + adjust merge)!
BTW: I'm fine with us doing this change in a separate PR if you prefer.

Collaborator Author

Yes I will tackle them separately, thanks.
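
The interface proposed in this thread (a single `columns` argument that accepts a list, a tuple, or a dict) could be normalized along the following lines. This is only a sketch of the idea: `normalize_columns` and the `ColumnsArg` alias are hypothetical names, not the implementation that was later merged.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical alias for the accepted argument shapes.
ColumnsArg = Union[List[str], Tuple[str, ...], Dict[str, str]]


def normalize_columns(columns: Optional[ColumnsArg] = None) -> Dict[str, str]:
    """Convert any accepted `columns` value into a source-to-target dict."""
    if columns is None:
        # Nothing specified: the caller wants all columns, so no mapping.
        return {}
    if isinstance(columns, dict):
        # Already a source-to-target mapping; copy it to avoid aliasing.
        return dict(columns)
    if isinstance(columns, (list, tuple)):
        # Same names on both sides: map every column to itself.
        return {name: name for name in columns}
    raise TypeError(
        f"columns must be a list, tuple or dict, got {type(columns).__name__}"
    )


print(normalize_columns(["sell", "living"]))
# {'sell': 'sell', 'living': 'living'}
print(normalize_columns({"sell": "sale_price"}))
# {'sell': 'sale_price'}
```

Normalizing to a dict at the boundary means the rest of the code only has to handle one shape, while callers with identical column names on both sides can pass the shorter list form.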

Collaborator

@tatiana tatiana left a comment

@kaxil I just noticed coverage doesn't seem to be running for this PR - this issue was not introduced in this PR, but we probably want to restore that before merging further PRs.

Contributor

@sunank200 sunank200 left a comment

Overall LGTM. But certain tests seem to be failing.

source_table: Table,
target_table: Table,
source_to_target_columns_map: Optional[Dict[str, str]] = None,
task_id: str = "",

Contributor

Is task_id required here, given that this operator inherits from BaseOperator?

Collaborator Author

Mainly because of the following line:

task_id = task_id or get_unique_task_id("append_table")
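
The fallback in that line can be sketched in isolation. The empty-string default makes `task_id` optional for the caller, and `or` substitutes a generated ID whenever the value is falsy. `get_unique_task_id_stub` below is a hypothetical stand-in that uses a simple counter; the project's real `get_unique_task_id` is not shown in this thread and may work differently.

```python
import itertools

# Module-level counter used only by this illustrative stub.
_counter = itertools.count(1)


def get_unique_task_id_stub(prefix: str) -> str:
    """Hypothetical stand-in: return the prefix with a running number."""
    return f"{prefix}__{next(_counter)}"


def resolve_task_id(task_id: str = "") -> str:
    """Keep an explicit task_id, otherwise generate one."""
    # An empty string is falsy, so `or` falls through to the generated ID.
    return task_id or get_unique_task_id_stub("append_table")


print(resolve_task_id("my_append"))  # my_append
print(resolve_task_id())             # a generated ID starting with "append_table"
```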

src/astro/databases/base.py (outdated, resolved)
This PR refactors the Append operator to use the new interfaces.

Breaking Changes:
- Removed casting functionality
- Changed the interface to use `source_to_target_columns_map` instead of `columns`, from:

```python
def append(
    self,
    main_table: Table,
    columns: List[str],
    casted_columns: dict,
    append_table: Table,
):
```

to

```python
def append(
    source_table: Table,
    target_table: Table,
    source_to_target_columns_map: Optional[Dict[str, str]] = None,
    **kwargs,
):
```

This PR intends to keep the current logic and only switch to the new interfaces, so that we can delete all the files belonging to the old interfaces. Whether we should have a single function is out of scope here and is tracked in #383.

closes #343
closes #335

Co-Authored-By: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Co-Authored-By: Utkarsh Sharma <utkarsharma2@gmail.com>
@kaxil kaxil merged commit c80b5a2 into main Jun 8, 2022
@kaxil kaxil deleted the append-refactor branch June 8, 2022 13:22
kaxil added a commit that referenced this pull request Jun 10, 2022
addresses #439 (comment)

It feels tedious to always pass a dictionary, and most of the time the column names would be the same. This commit renames the `source_to_target_columns_map` param in the Append and Merge operators to `columns` and allows passing either a) a list or tuple of column names or b) a dict containing the source-to-target mapping.

Before:

```python
append(
    source_table=filtered_data,
    target_table=Table(name="homes_reporting", conn_id=SNOWFLAKE_CONN_ID),
    source_to_target_columns_map={
        "sell": "sell",
        "list": "list",
        "variable": "variable",
        "value": "value",
    },
)
```

After:

```python
append(
    source_table=filtered_data,
    target_table=Table(name="homes_reporting", conn_id=SNOWFLAKE_CONN_ID),
    columns=["sell", "list", "variable", "value"],
)
```
kaxil added a commit that referenced this pull request Jun 17, 2022
kaxil added a commit that referenced this pull request Jun 17, 2022

Successfully merging this pull request may close these issues.

  • Refactor the append/merge operator
  • Implement append_table in PostgresDatabase