Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type checker errors for libraries such as Polars where Altair uses dataframe interchange protocol #3297

Merged
merged 10 commits into from
Dec 25, 2023

Conversation

binste
Copy link
Contributor

@binste binste commented Dec 23, 2023

Closes #3294 . As discussed in that issue, I switched to Protocols over ABC classes as other libraries such as Polars of course don't use our own classes but instead have their own implementations -> ABC classes won't type check but Protocols work.

After that change Polars dataframes still did not satisfy mypy so I digged a bit into the interchange protocol. There are minor differences in how Pandas, Polars, etc. have implemented type hints for the protocol and sometimes even the protocol itself. For example, in Polars CategoricalDescription.is_dictionary has type Literal[True] as it is indeed always True in the case of Polars. However, the interchange protocol only requires bool. As TypedDicts are invariant, this will not satisfy a type checker as the types would need to be exactly the same. As an example:

from typing import TypedDict


class PolarsDict(TypedDict):
    y: int


class AltairDict(TypedDict):
    y: int


class AltairDict2(TypedDict):
    y: int | None


z: AltairDict = PolarsDict(x=1, y=2)  # this is fine
t: AltairDict2 = PolarsDict(x=1, y=2)  # this is not as the TypedDict is invariant

To not continuously have to deal with such minor differences, I removed all type hints including methods and attributes from the interchange protocols which are not relevant for Altair, e.g. methods which we do not use anywhere in our code base.

With the changes here, all examples in #3294 now pass mypy. If you see errors while testing, try deleting the .mypy_cache folder before running mypy.

…e parts of the Protocol which we need for our codebase to make the type hint more flexible
… keep the ones which are relevant for our codebase. get_chunks is not directly used in Altair codebase but is relevant for pi.from_dataframe
@binste
Copy link
Contributor Author

binste commented Dec 23, 2023

Hmm, pyright is not yet happy. I'll see if I can figure out why.

…n as Pyright doesn't like that in Polars it is specified as a property which is semantically not the same as an attribute (the later can be set, properties are read-only by default)
@binste
Copy link
Contributor Author

binste commented Dec 23, 2023

Now, pyright is happy as well :) The last issue I fixed comes actually more from the Polars codebase where they define DataFrame.version as a property in an underlying Protocol but then in PolarsDataFrame define it as an attribute. Pyright still sees it as a property which does not work as we define it as an attribute in our own codebase as specified by the interchange protocol. See microsoft/pyright#2204 on why properties and attributes are not the same. In short, I removed version from the DataFrame protocol as we don't need it anyway.

I also had to add a mroe explicit signature for __dataframe__, see microsoft/pyright#2204 for more context if needed.

Copy link
Contributor

@jonmmease jonmmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. Thanks for digging through this!

@mattijn
Copy link
Contributor

mattijn commented Dec 24, 2023

Thanks @binste, can we add a note in the release notes? Not sure if it counts as an enhancement or bug fix.

@binste
Copy link
Contributor Author

binste commented Dec 25, 2023

Sounds good. I'd say it's a bug fix as the expectation was that the previous type hints would work for Polars.


For reference, here the code I used to test the type hints:

test_types.py:

import polars as pl
import numpy as np
import altair as alt
import pyarrow as pa
import pandas as pd

content = {
    "a": range(5),
    "b": np.random.rand(5),
}
df_pl = pl.DataFrame(content)
df_pd = pd.DataFrame(content)
df_pa = pa.Table.from_pydict(content)

alt.Chart(df_pl).mark_point().encode(
    x="a",
    y="b",
)
alt.Chart(df_pd).mark_point().encode(
    x="a",
    y="b",
)
alt.Chart(df_pa).mark_point().encode(
    x="a",
    y="b",
)
mypy test_types.py
pyright test_types.py

@binste binste merged commit 50f4748 into vega:main Dec 25, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing basic polars dataframes to alt.Chart results in Pylance errors
3 participants