Enable ruff's pandas-vet (PD) rules #2830

Merged: seisman merged 4 commits into main from ruff/pandas-vet on Nov 27, 2023

seisman (Member) commented Nov 25, 2023

Below are the errors reported after enabling ruff's pandas-vet rules. The suggested changes look good to me, but maybe we should ignore PD901?

examples/gallery/3d_plots/scatter3d.py:22:1: PD901 Avoid using the generic variable name `df` for DataFrames
   |
21 | # Load sample iris data
22 | df = pd.read_csv("https://github.com/mwaskom/seaborn-data/raw/master/iris.csv")
   | ^^ PD901
23 | 
24 | # Convert 'species' column to categorical dtype
   |

examples/gallery/histograms/blockm.py:28:1: PD901 Avoid using the generic variable name `df` for DataFrames
   |
26 | # Calculate mean depth in kilometers from all events within
27 | # 150x150 arc-minute bins using blockmean
28 | df = pygmt.blockmean(data=data, region=region, spacing=spacing)
   | ^^ PD901
29 | # Convert to grid
30 | grd = pygmt.xyz2grd(data=df, region=region, spacing=spacing)
   |

examples/gallery/histograms/blockm.py:48:1: PD901 Avoid using the generic variable name `df` for DataFrames
   |
46 | # Calculate number of total locations within 150x150 arc-minute bins
47 | # with blockmean's summary parameter
48 | df = pygmt.blockmean(data=data, region=region, spacing=spacing, summary="n")
   | ^^ PD901
49 | grd = pygmt.xyz2grd(data=df, region=region, spacing=spacing)
   |

examples/gallery/seismology/velo_arrow_ellipse.py:18:1: PD901 Avoid using the generic variable name `df` for DataFrames
   |
17 | fig = pygmt.Figure()
18 | df = pd.DataFrame(
   | ^^ PD901
19 |     data={
20 |         "x": [0, -8, 0, -5, 5, 0],
   |

examples/gallery/symbols/points_categorical.py:20:1: PD901 Avoid using the generic variable name `df` for DataFrames
   |
19 | # Load sample penguins data
20 | df = pd.read_csv("https://github.com/mwaskom/seaborn-data/raw/master/penguins.csv")
   | ^^ PD901
21 | 
22 | # Convert 'species' column to categorical dtype
   |

examples/tutorials/advanced/date_time_charts.py:268:1: PD901 Avoid using the generic variable name `df` for DataFrames
    |
266 |     ["20200729", 1634],
267 | ]
268 | df = pd.DataFrame(data, columns=["Date", "Score"])
    | ^^ PD901
269 | df.Date = pd.to_datetime(df["Date"], format="%Y%m%d")
    |

pygmt/clib/conversion.py:95:17: PD011 Use `.to_numpy()` instead of `.values`
   |
93 |     # East-West, North-South.
94 |     for dim in grid.dims[::-1]:
95 |         coord = grid.coords[dim].values
   |                 ^^^^^^^^^^^^^^^^^^^^^^^ PD011
96 |         coord_incs = coord[1:] - coord[0:-1]
97 |         coord_inc = coord_incs[0]
   |

pygmt/tests/test_blockmedian.py:39:13: PD011 Use `.to_numpy()` instead of `.values`
   |
37 |     a matrix.
38 |     """
39 |     table = dataframe.values
   |             ^^^^^^^^^^^^^^^^ PD011
40 |     output = blockmedian(data=table, spacing="5m", region=[245, 255, 20, 30])
41 |     assert isinstance(output, pd.DataFrame)
   |

pygmt/tests/test_datasets_earth_vertical_gravity_gradient.py:24:12: PD003 `.isna` is preferred to `.isnull`; functionality is equivalent
   |
22 |     npt.assert_allclose(data.min(), -137.125, atol=1 / 32)
23 |     npt.assert_allclose(data.max(), 104.59375, atol=1 / 32)
24 |     assert data[1, 1].isnull()
   |            ^^^^^^^^^^^^^^^^^ PD003
   |

pygmt/tests/test_datasets_samples.py:192:12: PD003 `.isna` is preferred to `.isnull`; functionality is equivalent
    |
190 |     npt.assert_allclose(grid.min(), -4929.5)
191 |     # Test for the NaN values in the remote file
192 |     assert grid[2, 21].isnull()
    |            ^^^^^^^^^^^^^^^^^^ PD003
    |

pygmt/tests/test_select.py:40:12: PD011 Use `.to_numpy()` instead of `.values`
   |
38 |     Also testing the reverse (I) alias.
39 |     """
40 |     data = dataframe.values
   |            ^^^^^^^^^^^^^^^^ PD011
41 |     output = select(data=data, region=[245.5, 254.5, 20.5, 29.5], reverse="r")
42 |     assert isinstance(output, pd.DataFrame)
   |

pygmt/tests/test_surface.py:96:12: PD011 Use `.to_numpy()` instead of `.values`
   |
94 |     Run surface by passing in a numpy array into data.
95 |     """
96 |     data = data.values  # convert pandas.DataFrame to numpy.ndarray
   |            ^^^^^^^^^^^ PD011
97 |     output = surface(
98 |         data=data,
   |

pygmt/tests/test_surface.py:135:12: PD011 Use `.to_numpy()` instead of `.values`
    |
133 |     Run surface with the -Goutputfile.nc parameter.
134 |     """
135 |     data = data.values  # convert pandas.DataFrame to numpy.ndarray
    |            ^^^^^^^^^^^ PD011
136 |     with GMTTempFile(suffix=".nc") as tmpfile:
137 |         output = surface(
    |

Found 13 errors.
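
For reference, the PD011 and PD003 suggestions above amount to the following on pandas objects. This is only a minimal sketch: the `dataframe` contents and column names are made up for illustration and are not taken from the PyGMT test suite.

```python
import numpy as np
import pandas as pd

# Hypothetical sample table; values and column names are illustrative only.
dataframe = pd.DataFrame(
    {"x": [245.0, 250.0], "y": [20.0, 25.0], "z": [-1.0, np.nan]}
)

# PD011: prefer .to_numpy() over .values to get a numpy.ndarray.
table_old = dataframe.values
table_new = dataframe.to_numpy()

# PD003: prefer .isna() over .isnull(); on pandas objects they are aliases.
mask_old = dataframe["z"].isnull()
mask_new = dataframe["z"].isna()
```

On pandas objects both spellings give the same result, so these changes should be purely stylistic there.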

Related to #2741

@seisman seisman marked this pull request as draft November 25, 2023 14:15
@seisman seisman changed the title Enable ruff's pandas-vet (PD) rules RFC: Enable ruff's pandas-vet (PD) rules Nov 25, 2023
weiji14 (Member) commented Nov 26, 2023

> maybe we should ignore PD901

Yeah, let's ignore PD901. I feel like `df` is quite a common name for pandas DataFrames.

weiji14 previously approved these changes Nov 27, 2023
@weiji14 weiji14 dismissed their stale review November 27, 2023 01:27

Some tests failing

weiji14 (Member) left a comment

We need to change `isna()` back to `isnull()` for the checks on `xarray.DataArray`. Seems like a bug on ruff not actually checking that the object is from pandas instead of xarray.
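
To illustrate why this kind of false positive can happen, here is a rough sketch of a purely name-based check. It is not ruff's actual implementation; `find_isnull_calls` and `SOURCE` are hypothetical names, and the assumption is simply that a linter without type information sees only the attribute name, not the object's type.

```python
import ast

# Two calls that look identical syntactically: the first could be on a
# pandas.Series, the second on an xarray.DataArray. The names are never
# resolved, so the snippet only needs to parse, not run.
SOURCE = """
assert series.isnull().any()
assert grid[2, 21].isnull()
"""


def find_isnull_calls(source: str) -> list[int]:
    """Return the line numbers of every `<expr>.isnull(...)` call."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "isnull"
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Both calls are flagged, because the check never learns the object's type.
flagged = find_isnull_calls(SOURCE)
```

A check like this cannot tell the pandas call from the xarray one, which would explain why the `isnull()` calls on `xarray.DataArray` were reported as PD003.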

Comment thread pygmt/tests/test_datasets_earth_vertical_gravity_gradient.py Outdated
Comment thread pygmt/tests/test_datasets_samples.py Outdated
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@seisman seisman added the maintenance Boring but important stuff for the core devs label Nov 27, 2023
@seisman seisman added this to the 0.11.0 milestone Nov 27, 2023
@seisman seisman added the needs review This PR has higher priority and needs review. label Nov 27, 2023
@seisman seisman marked this pull request as ready for review November 27, 2023 02:13
seisman (Member, Author) commented Nov 27, 2023

> Seems like a bug on ruff not actually checking that the object is from pandas instead of xarray.

I have reported the bug upstream at astral-sh/ruff#8846.

@seisman seisman changed the title RFC: Enable ruff's pandas-vet (PD) rules Enable ruff's pandas-vet (PD) rules Nov 27, 2023
@seisman seisman removed the needs review This PR has higher priority and needs review. label Nov 27, 2023
@seisman seisman merged commit 15a0642 into main Nov 27, 2023
@seisman seisman deleted the ruff/pandas-vet branch November 27, 2023 03:45