From 001fba56af3db4936bb35af34b0377a6abfee9dc Mon Sep 17 00:00:00 2001
From: Francesco Calcavecchia
Date: Fri, 29 Sep 2023 17:42:20 +0200
Subject: [PATCH 1/3] add markdown-link-check to pre-commit

---
 .pre-commit-config.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index bae09bb..233c012 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -33,3 +33,8 @@ repos:
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
+  - repo: https://github.com/tcort/markdown-link-check
+    rev: 'v3.11.2'
+    hooks:
+      - id: markdown-link-check
+        args: [-q]

From 3e3fc3b919dac5177b8e99043958bf1eb1cb6745 Mon Sep 17 00:00:00 2001
From: Francesco Calcavecchia
Date: Fri, 29 Sep 2023 17:54:36 +0200
Subject: [PATCH 2/3] fix broken links and rename pandera schema model -> pandera dataframe model

---
 docs/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index eeae357..c961ba2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -25,7 +25,7 @@ data = load()

 Depending on how the data was prepared, load may return a [pandas](https://pandas.pydata.org/), [dask](https://www.dask.org/), [pyspark](https://spark.apache.org/docs/latest/api/python/index.html), or [modin](https://github.com/modin-project/modin) dataframe. The limited choice is due to the fact that it must be supported by [pandera](https://pandera.readthedocs.io/en/stable/).

-Not only accessing data will be this easy, but you will also have the [pandera schema model](https://pandera.readthedocs.io/en/stable/schema_models.html) associated with the data. How?
+Not only accessing data will be this easy, but you will also have the [pandera DataFrame Model](https://pandera.readthedocs.io/en/stable/dataframe_models.html) associated with the data. How?
 ```python
 from demo_data import Schema
 ```
@@ -46,7 +46,7 @@ and use the command `dac pack` (run `dac pack --help` for detailed instructions)
 On a high level, the most important elements you must provide are:

 * python code to load the data. It should as a DataFrame in one of the supported libraries: [pandas](https://pandas.pydata.org/), [dask](https://www.dask.org/), [pyspark](https://spark.apache.org/docs/latest/api/python/index.html), or [modin](https://github.com/modin-project/modin)
-* a [pandera ModelSchema](https://pandera.readthedocs.io/en/stable/schema_models.html) fitting the data that can be loaded
+* a [pandera DataFrame Model](https://pandera.readthedocs.io/en/stable/dataframe_models.html) fitting the data that can be loaded
 * python dependencies

 ## What are the advantages of distributing data in this way?
@@ -54,7 +54,7 @@ On a high level, the most important elements you must provide are:
 * The code needed to load the data, the data source, and locations are abstracted away from the user. This mean that the data engineer can start from local files, transition to SQL database, cloud file storage, or kafka topic, without having the user to notice it or need to adapt its code.
-* Column names are passed to the user, and can be abstracted from the data source leveraging on the pandera [`Field.alias`](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.model_components.Field.html#pandera.model_components.Field). In this way, the user code will not contain hard-coded column names, and changes in data source column names won't impact the user.
+* Column names are passed to the user, and can be abstracted from the data source leveraging on the pandera [`Field.alias`](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.api.pandas.model_components.Field.html). In this way, the user code will not contain hard-coded column names, and changes in data source column names won't impact the user.
 * Users can build robust code by [writing unit testing for their functions](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html) effortlessly.

From 53e7d4360dc9c5e7980403cedb9fd5cab38091c9 Mon Sep 17 00:00:00 2001
From: Francesco Calcavecchia
Date: Fri, 29 Sep 2023 17:55:19 +0200
Subject: [PATCH 3/3] specify intentional mistake in wrong_syntax.py for ruff

---
 test/data/schema/wrong_syntax.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/data/schema/wrong_syntax.py b/test/data/schema/wrong_syntax.py
index 9a2c0e8..f2d79d0 100644
--- a/test/data/schema/wrong_syntax.py
+++ b/test/data/schema/wrong_syntax.py
@@ -1,4 +1,4 @@
-iport pandera as pa
+iport pandera as pa # noqa: E999
 from pandera.typing import Series