Skip to content

Commit

Permalink
Add pre-commits preventing accidental API changes in common.sql (#27962)
Browse files Browse the repository at this point in the history
We had recently some mishaps when it comes to changes to the
common.sql provider that accidentally changed API of the common.sql
providers and they passed both tests and reviews.

Common.sql is pretty special provider because it used by other
providers and we have to make sure the API it epxoses is backwards
compatible, otherwise upgrading common.sql provider might break
those providers.

This had already happened:

* The #26761 refactored common.sql provided and removed some
  methods that were actually used by Google Provider

* This caused a problem described in #27838 and forced us ot
  yank common.sql 1.3.0 and release 1.3.1 with the methods
  added back in #27843

This change introduces tools and process to flag common.sql API
changes that are likely backwards compatible.

They will require a deliberate action from the contributor who
introduces a breaking API to common.sql, it will also help to
flag to reviewer when such a change is made.

The idea is based on how Android OS handles flagging API changes
for their public APIs paired with the tooling that MyPy provides
with generating stubs and verifying usage of packages in the
stubs.
  • Loading branch information
potiuk committed Dec 5, 2022
1 parent c4224e2 commit 6852f3f
Show file tree
Hide file tree
Showing 13 changed files with 967 additions and 101 deletions.
10 changes: 9 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ repos:
- --fuzzy-match-generates-todo
files: >
\.cfg$|\.conf$|\.ini$|\.ldif$|\.properties$|\.readthedocs$|\.service$|\.tf$|Dockerfile.*$
# Keep version of black in sync wit blacken-docs and pre-commit-hook-names
# Keep version of black in sync wit blacken-docs, pre-commit-hook-names, update-common-sql-api-stubs
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
Expand Down Expand Up @@ -396,6 +396,14 @@ repos:
language: python
files: ^docs
pass_filenames: false
- id: update-common-sql-api-stubs
name: Check and update common.sql API stubs
entry: ./scripts/ci/pre_commit/pre_commit_update_common_sql_api_stubs.py
language: python
files: ^scripts/ci/pre_commit/pre_commit_update_common_sql_api\.py|^airflow/providers/common/sql/.*\.py[i]$
additional_dependencies: ['rich>=12.4.4', 'mypy==0.971', 'black==22.3.0']
pass_filenames: false
require_serial: true
- id: check-pydevd-left-in-code
language: pygrep
name: Check for pydevd debug statements accidentally left
Expand Down
2 changes: 2 additions & 0 deletions STATIC_CODE_CHECKS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -307,6 +307,8 @@ require Breeze Docker image to be build locally.
+-----------------------------------------------------------+------------------------------------------------------------------+---------+
| update-breeze-readme-config-hash | Update Breeze README.md with config files hash | |
+-----------------------------------------------------------+------------------------------------------------------------------+---------+
| update-common-sql-api-stubs | Check and update common.sql API stubs | |
+-----------------------------------------------------------+------------------------------------------------------------------+---------+
| update-er-diagram | Update ER diagram | * |
+-----------------------------------------------------------+------------------------------------------------------------------+---------+
| update-extras | Update extras in documentation | |
Expand Down
103 changes: 103 additions & 0 deletions airflow/providers/common/sql/README_API.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Keeping the API of common.sql provider backwards compatible

The API of the `common.sql` provider should be kept backwards compatible with previously released versions.
The reason is - there are already released providers that use `common.sql` provider and rely on the API and
behaviour of the `common.sql` provider, and any updates in the API or behaviour of it, might potentially
break those released providers.

Therefore, we should keep an extra care when changing the APIs.

The approach we take is similar to one that has been applied by Android OS team to keep their API in check,
and it is based on storing the current version of API and flagging changes that are potentially breaking.
This is done by comparing the previous API (store in stub files) and the upcoming API from the PR.
The upcoming API is automatically extracted from `common.sql` Python files using `update-common-sql-api-stubs`
pre-commit using mypy `stubgen` and stored as `.pyi` files in the `airflow.providers.common.sql` package.
We also post-process the `.pyi` files to add some historically exposed methods that should be also
considered as public API.

If the comparison determines that the change is potentially breaking, the contributor is asked
to review the changes and manually regenerate the stub files.

The details of the workflow are as follows:

1) The previous API is stored in the (committed to repository) stub files.
2) Every time when common.sql Python files are modified the `update-common-sql-api-stubs` pre-commit
regenerates the stubs (including post-processing it) and looks for potentially breaking changes
(removals or updates of the existing classes/methods).
3) If the check reveals there are no changes to the API, nothing happens, pre-commit succeeds.
4) If there are only additions, the pre-commit automatically updates the stub files,
asks the contributor to commit resulting updates and fails the pre-commit. This is very similar to
other static checks that automatically modify/fix source code.
5) If the pre-commit detects potentially breaking changes, the process is a bit more involved for the
contributor. The pre-commit flags such changes to the contributor by failing the pre-commit and
asks the contributor to review the change looking specifically for breaking compatibility with previous
providers (and fix any backwards compatibility). Once this is completed, the contributor is asked to
manually and explicitly regenerate and commit the new version of the stubs by running the pre-commit
with manually added environment variable:

```shell
UPDATE_COMMON_SQL_API=1 pre-commit run update-common-sql-api-stubs
```

# Verifying other providers to use only public API of the `common.sql` provider

MyPy automatically checks if only the methods available in the stubs are used. This gives enough protection
for most "static" usages of the `common.sql` provider by any other provider.

Of course, this is never 100% fool-proof, due to the dynamic Python behaviours and due to possible internal
changes in behaviours of the common.sql APIs, but at the very least any of "interface" changes will be
flagged in any change to contributor making a change and also reviewers will be notified whenever any
API change happens.

# External usage, tracking API changes

The `api.txt` file might be used by any external provider developer to make sure that only official
APIs are used, which will also help those external providers to use the `common.sql` provider
functionality without the fear of using methods and classes that are deemed non-public.

Since this file is stored in the repository, we can automatically generate the API differences between
future released version of `common.sql` provider, and we will be able to easily reason about changes of the
API between versions.

# Possible future work

This mechanism providing the "official" API of the `common.sql` provider is just the first attempt to
standardise APIs provided by various Airflow's components. In the future, we might decide to extend
after experimenting and testing its behaviour for the `common.sql`.

## Other providers

Keeping a formal, automatically generated and maintained stubs is a good thing. We might simply decide
to generate the stub files for all our providers - so that our users and cross-provider dependencies might
start relying on stability of the APIs in the provider. This will be just a simple extension of the current
approach - we will not limit stub generation to `common.sql` but we will add it to all providers.

## Airflow Core

We should be able to use very similar mechanism for exposing and flagging the public API of Airflow's core
and flag any misuse of methods in the Community providers

As a follow-up of those two - we should be able to produce more "formal" API that providers written
externally could utilise MyPy as "compatibility tool" that will be able to verify if the provider
written externally is compatible (on API level) with certain version of Airflow, or in case of the
cross-provider dependencies whether it is compatible (on API level) with certain versions of the released
providers.
95 changes: 95 additions & 0 deletions airflow/providers/common/sql/hooks/sql.pyi
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This is automatically generated stub for the `common.sql` provider
#
# This file is generated automatically by the `update-common-sql-api stubs` pre-commit
# and the .pyi file represents part of the the "public" API that the
# `common.sql` provider exposes to other providers.
#
# Any, potentially breaking change in the stubs will require deliberate manual action from the contributor
# making a change to the `common.sql` provider. Those stubs are also used by MyPy automatically when checking
# if only public API of the common.sql provider is used by all the other providers.
#
# You can read more in the README_API.md file
#
from _typeshed import Incomplete
from airflow.hooks.dbapi import DbApiHook as BaseForDbApiHook
from typing import Any, Callable, Iterable, Mapping, Sequence
from typing_extensions import Protocol

def return_single_query_results(
sql: Union[str, Iterable[str]], return_last: bool, split_statements: bool
): ...
def fetch_all_handler(cursor) -> Union[list[tuple], None]: ...
def fetch_one_handler(cursor) -> Union[list[tuple], None]: ...

class ConnectorProtocol(Protocol):
def connect(self, host: str, port: int, username: str, schema: str) -> Any: ...

class DbApiHook(BaseForDbApiHook):
conn_name_attr: str
default_conn_name: str
supports_autocommit: bool
connector: Union[ConnectorProtocol, None]
placeholder: str
log_sql: Incomplete
descriptions: Incomplete
def __init__(self, *args, schema: Union[str, None] = ..., log_sql: bool = ..., **kwargs) -> None: ...
def get_conn(self): ...
def get_uri(self) -> str: ...
def get_sqlalchemy_engine(self, engine_kwargs: Incomplete | None = ...): ...
def get_pandas_df(self, sql, parameters: Incomplete | None = ..., **kwargs): ...
def get_pandas_df_by_chunks(
self, sql, parameters: Incomplete | None = ..., *, chunksize, **kwargs
) -> None: ...
def get_records(
self, sql: Union[str, list[str]], parameters: Union[Iterable, Mapping, None] = ...
) -> Any: ...
def get_first(
self, sql: Union[str, list[str]], parameters: Union[Iterable, Mapping, None] = ...
) -> Any: ...
@staticmethod
def strip_sql_string(sql: str) -> str: ...
@staticmethod
def split_sql_string(sql: str) -> list[str]: ...
@property
def last_description(self) -> Union[Sequence[Sequence], None]: ...
def run(
self,
sql: Union[str, Iterable[str]],
autocommit: bool = ...,
parameters: Union[Iterable, Mapping, None] = ...,
handler: Union[Callable, None] = ...,
split_statements: bool = ...,
return_last: bool = ...,
) -> Union[Any, list[Any], None]: ...
def set_autocommit(self, conn, autocommit) -> None: ...
def get_autocommit(self, conn) -> bool: ...
def get_cursor(self): ...
def insert_rows(
self,
table,
rows,
target_fields: Incomplete | None = ...,
commit_every: int = ...,
replace: bool = ...,
**kwargs,
) -> None: ...
def bulk_dump(self, table, tmp_file) -> None: ...
def bulk_load(self, table, tmp_file) -> None: ...
def test_connection(self): ...
Loading

0 comments on commit 6852f3f

Please sign in to comment.