feat: A few helper-functions

### Is your feature request related to a problem?

_No response_

### What is the motivation behind your request?

_No response_

### Describe the solution you'd like

I have written a few helper functions that are simple and, IMO, quite useful:
```python
from ibis import _


def is_empty(df):
    """Return True if the table `df` has no rows."""
    return df.count().execute() == 0


def to_list(df):
    """
    Return the table `df` as a list of dictionaries, with each dictionary representing a row.
    """
    return df.to_pyarrow().to_pylist()


def distinct_rows(df, on=None):
    """Return the rows of `df` whose values in the columns `on` occur exactly once."""
    if on is None:
        on = df.columns
    df_with_count = df.group_by(on).aggregate(n_rows=_.count())
    return df.semi_join(df_with_count.filter(_.n_rows == 1), predicates=on)


def duplicated_rows(df, on=None):
    """Return the rows of `df` whose values in the columns `on` occur more than once."""
    if on is None:
        on = df.columns
    df_with_count = df.group_by(on).aggregate(n_rows=_.count())
    return df.semi_join(df_with_count.filter(_.n_rows > 1), predicates=on)
```

I think each of them could quite naturally be implemented as a method on `ibis.Table`. Is it okay to ask someone else to implement this functionality? Alternatively, I would appreciate (a) a response on whether this is wanted/okay, and (b) a rough outline of how to implement it: mainly which files to change and what requirements I should keep in mind.

The implementation of the proposed `ibis.Table.to_list` would closely match the `Column.to_list` function implemented in #10498.
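
Because the name `distinct_rows` could be mistaken for deduplication, here is a minimal plain-Python sketch of the semantics intended above, operating on a list of row dictionaries (the shape that `to_list` returns). The names `rows_occurring_once` and `rows_occurring_repeatedly` are hypothetical and not part of ibis. The sketch treats `None` like any other value, which may differ from how a SQL backend matches nulls in the `semi_join`.

```python
from collections import Counter


def _key(row, on):
    # Project a row onto the key columns as a hashable tuple.
    return tuple(row[col] for col in on)


def rows_occurring_once(rows, on=None):
    """Rows whose `on` values appear exactly once (models `distinct_rows`)."""
    if not rows:
        return []
    on = list(rows[0]) if on is None else on
    counts = Counter(_key(row, on) for row in rows)
    return [row for row in rows if counts[_key(row, on)] == 1]


def rows_occurring_repeatedly(rows, on=None):
    """Rows whose `on` values appear more than once (models `duplicated_rows`)."""
    if not rows:
        return []
    on = list(rows[0]) if on is None else on
    counts = Counter(_key(row, on) for row in rows)
    return [row for row in rows if counts[_key(row, on)] > 1]


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "c"},
]
once = rows_occurring_once(rows, on=["id"])
repeated = rows_occurring_repeatedly(rows, on=["id"])
```

Here `once` holds only the `id == 1` row, while `repeated` holds both `id == 2` rows. Every input row ends up in exactly one of the two results, and neither result collapses duplicates into a single row.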


## Related but not central to the issue
Also, I used `duplicated_rows` to implement the following primary-key check:
```python
def assert_pk(df, on, err=True):
    """
    Check that the column(s) `on` form a primary key of `df`: no nulls and no duplicates.

    If `err` is true, an `AssertionError` is raised on violation. If not, the error messages are returned as a list of strings (empty if the check passes). This allows this function to be used for internal checks in other functions with modified error messages.
    """
    df_subset = df.select(on)
    n_rows_original = df_subset.count().execute()
    n_rows_non_null = df_subset.drop_null().count().execute()
    error_messages = []
    if n_rows_original != n_rows_non_null:
        error_messages.append(f"Found {n_rows_original - n_rows_non_null} null rows for the given column(s) `{on}`. This violates the properties of a primary key.")
    n_duplicated_rows = duplicated_rows(df, on).count().execute()
    if n_duplicated_rows != 0:
        error_messages.append(f"Found {n_duplicated_rows} duplicated rows for the given column(s) `{on}`. This violates the properties of a primary key.")
    if error_messages and err:
        raise AssertionError("\n                ".join(error_messages))
    # Always return a list so callers can rely on a consistent return type.
    return error_messages
```
If such a check is within scope for ibis, I would also love to have it implemented as a method on `ibis.Table`. This would go a long way toward alleviating the underlying problem behind #11356.
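
As a rough illustration of the collect-then-report pattern behind the `err` flag, here is a plain-Python sketch of the same two checks on a list of row dictionaries. The function name `pk_violations` is hypothetical, and counting duplicates only among null-free keys is an assumption of this sketch, not necessarily what every backend does.

```python
from collections import Counter


def pk_violations(rows, on):
    """Return a list of messages; an empty list means `on` is a valid primary key."""
    keys = [tuple(row[col] for col in on) for row in rows]
    messages = []

    # A primary key must not contain nulls in any of its columns.
    n_null = sum(any(value is None for value in key) for key in keys)
    if n_null:
        messages.append(f"Found {n_null} rows with nulls in column(s) {on}.")

    # A primary key must be unique; count every row that shares its key.
    counts = Counter(key for key in keys if all(v is not None for v in key))
    n_duplicated = sum(n for n in counts.values() if n > 1)
    if n_duplicated:
        messages.append(f"Found {n_duplicated} duplicated rows in column(s) {on}.")

    return messages


bad = pk_violations([{"id": 1}, {"id": 1}, {"id": None}], on=["id"])
good = pk_violations([{"id": 1}, {"id": 2}], on=["id"])
```

Returning the messages rather than raising lets a caller decide whether to raise immediately or fold them into a larger report.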

I added the option to return the error messages as a list of strings instead of raising, because I needed it for a foreign-key check. The foreign-key check is likely beyond the scope of ibis, but I include it here for completeness:
```python
import ibis.expr.types as ir
from IPython.display import display  # built in when running in a notebook


def assert_fk(df_left, df_right, on_left, on_right, err=True):
    """
    Assert that `on_left` is a foreign key in `df_left`, and that it
    links to the primary key `on_right` in `df_right`.

    If `err` is false, the error messages are returned as a list of strings instead of being raised.
    """
    # Accept a single column (name or column expression) as well as a list.
    if isinstance(on_left, (str, ir.Column)):
        on_left = [on_left]
    if isinstance(on_right, (str, ir.Column)):
        on_right = [on_right]
    if len(on_left) != len(on_right):
        raise AssertionError(f"Expected as many left columns as right columns. Instead, got left columns `{on_left}` and right columns `{on_right}`")
    error_messages = []
    error_messages_pk_right = assert_pk(df_right, on_right, err=False)
    if error_messages_pk_right:
        error_messages.append(f"Attempting to assert primary key for the column(s) `{on_right}` in `df_right` resulted in the following assertion errors:\n" + "\n                ".join(error_messages_pk_right))

    # Rows of `df_left` whose key has no match in `df_right`.
    df_FKs_not_in_PK = df_left.anti_join(df_right, [df_left[left_col] == df_right[right_col] for (left_col, right_col) in zip(on_left, on_right)])
    n_missing = df_FKs_not_in_PK.count().execute()  # execute the count only once
    if n_missing > 0:
        print("Foreign keys in `df_left` not present among primary keys in `df_right`:", end="")
        display(df_FKs_not_in_PK.select(on_left))
        error_messages.append(f"Found {n_missing} rows (printed at the top) in `df_left` with values in columns `{on_left}` not present in the columns `{on_right}` from `df_right`.")

    if error_messages and err:
        raise AssertionError("\n                ".join(error_messages))
    return error_messages
```
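
The core of the foreign-key check is the anti-join. As a minimal sketch of what it selects, here is a plain-Python model on lists of row dictionaries. The function name `rows_without_match` is hypothetical, and the sketch compares `None` values by equality, which may not match SQL null semantics.

```python
def rows_without_match(left_rows, right_rows, on_left, on_right):
    """Rows of `left_rows` whose key has no counterpart in `right_rows` (an anti-join)."""
    if len(on_left) != len(on_right):
        raise ValueError("on_left and on_right must have the same length")
    # Collect every key present on the right-hand side.
    right_keys = {tuple(row[col] for col in on_right) for row in right_rows}
    # Keep left rows whose key is absent from that set.
    return [
        row
        for row in left_rows
        if tuple(row[col] for col in on_left) not in right_keys
    ]


orders = [{"customer_id": 1}, {"customer_id": 3}]
customers = [{"id": 1}, {"id": 2}]
orphans = rows_without_match(orders, customers, ["customer_id"], ["id"])
```

An empty result means every foreign-key value in the left rows resolves to a row on the right. In the example, `orphans` contains only the order with `customer_id == 3`.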

### What version of ibis are you running?

None

### What backend(s) are you using, if any?

_No response_

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
