add newer_than parameter to SFTP sensor #21811

Closed
AdamPaslawski wants to merge 282 commits into apache:main from AdamPaslawski:sftp_sensor_newer_than

Conversation

@AdamPaslawski
Contributor

closes: #21655
Add a newer_than parameter to the SFTP sensor, specifying a datetime that the file on the SFTP site must be newer than.

Comment on lines 66 to 69
if self.newer_than:
    _mod_time = datetime.strptime(mod_time, '%Y%m%d%H%M%S')
    _newer_than = make_naive(self.newer_than) if not is_naive(self.newer_than) else self.newer_than
    return _newer_than <= _mod_time
Contributor Author

I decided to assume a naive comparison and to allow the sensor to be configured with either an offset-aware or an offset-naive newer_than parameter. Thoughts on this, @eladkal?

Member

I think you should convert all dates to UTC. It's surprisingly hard to find out what the timezone of st_mtime is, but all signs point to UTC.

Contributor Author

I agree. Assuming the SFTP site is always in the same time zone as the Airflow instance is a weaker assumption than assuming UTC.

I'll make that change.
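
A minimal sketch of the UTC-based comparison discussed above, assuming (as the thread does) that the modification time arrives as a '%Y%m%d%H%M%S' string expressed in UTC. The function name is_newer_than is hypothetical, and treating a naive newer_than as UTC is an assumption that the thread does not settle. The sketch uses only the standard library rather than Airflow's timezone helpers.

```python
from datetime import datetime, timedelta, timezone


def is_newer_than(mod_time: str, newer_than: datetime) -> bool:
    """Return True if the file's modification time is at or after newer_than.

    mod_time is assumed to be a '%Y%m%d%H%M%S' string in UTC.
    """
    # Parse the string and mark it explicitly as UTC.
    mod_dt = datetime.strptime(mod_time, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

    if newer_than.tzinfo is None or newer_than.utcoffset() is None:
        # Assumption: a naive newer_than is interpreted as UTC.
        newer_than = newer_than.replace(tzinfo=timezone.utc)
    else:
        # An offset-aware newer_than is converted to UTC before comparing.
        newer_than = newer_than.astimezone(timezone.utc)

    return newer_than <= mod_dt


# 13:00 at UTC+2 is 11:00 UTC, which is before a 12:00 UTC modification time.
aware = datetime(2022, 3, 1, 13, 0, 0, tzinfo=timezone(timedelta(hours=2)))
file_is_new_enough = is_newer_than("20220301120000", aware)
```

Comparing two aware datetimes in UTC avoids depending on the local time zone of either the SFTP server or the Airflow instance.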

jbampton and others added 28 commits February 27, 2022 20:00
TableConfig requires models.ImportError, while the try/except
on line 128 requires the builtin ImportError

This was causing errors when running with k8 scheduler,
without having celery packages installed.
Co-authored-by: huan.15 <huan.15@kakaocorp.com>
…pache#21546)

We are preparing to switch from Buster to Bullseye and this is
the second change that is needed (following apache#21522). This change
allows to choose whether we want to use Buster or Bullseye images
as a base. We need to be able to choose, because:

1) we want to keep backwards compatibility and allow our
   users to continue building Buster-based images
2) we cannot yet fully switch to Bullseye because MSSQL's ODBC
   driver does not yet support Bullseye, and we reached out to
   the MSSQL maintainers to learn about their plans so that we
   can decide when and how we are going to support Bullseye and
   MSSQL.

   Details of this discussion are in:
   https://github.com/MicrosoftDocs/sql-docs/issues/7255#issuecomment-1037097131

This PR adds the capability of choosing the DEBIAN_VERSION in
Breeze when building images but does not yet switch from Buster to
Bullseye
)

Debian 11 Bullseye was released some time ago as the new
LTS Debian release, and all our dependencies (including
MySQL and MSSQL ODBC drivers) have already caught up with it,
so we can finally migrate to it.

This change switches base images to Bullseye for our Dockerfiles
as well as for the Redis image we are using in CI.

The relevant packages have been updated to include that
and the documentation has been updated.

Our examples are also updated to use "bullseye" rather than
"buster".

Closes: apache#18190
Closes: apache#18279
…pache#21378)" (apache#21874)

This reverts commit 5d89dea.

The issue is not a random IO timeout -- it's a problem with the file in the repo.

Reverting this right now as all PRs are failing :(
* Expand mapped tasks in the Scheduler

Technically this is done inside
DagRun.task_instance_scheduling_decisions, but the only place that is
currently called is the Scheduler

The way we are getting `upstream_ti` to pass to expand_mapped_task is
all sorts of wrong and will need fixing. I think the interface for that
method is wrong and the mapped task should be responsible for finding
the right upstream TI itself.

* make UI and tree work with mapped tasks

* add graph tooltip and map count

* simplify node label redraw logic

* add utils.js and map_index to /taskInstances

* use TaskInstanceState instead of strings

* move map_index on /taskinstance to separate PR

* check to use Task or Tasks

* remove `no_status` and use TaskInstanceState

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
The method outlined in the current doc results in a 403 error, which can be avoided by following Databricks' documentation on this topic: https://kb.databricks.com/dev-tools/invalid-access-token-airflow.html. The changes suggested here reflect the Databricks doc.
`sql` accepts `Union[str, Iterable[str]],` not `Optional[Union[Dict, Iterable]],`
* document airflow version in each alembic migration module and use this to autogen the doc
* update each migration module to have the same description used in migration ref (so it can be used in autogen)
Fix walking through wildcarded directory in `FileSensor.poke` method
* log celery task id to correlate logs

* add celery.task_timeout_error metric
need to add packages in setup.py, otherwise the package can not be found
chenglongyan and others added 16 commits March 16, 2022 09:31
…he#20759)

A workaround was added (apache#5731) to handle the refreshing of EKS tokens.  It was necessary because of an upstream bug.  It has since been fixed (kubernetes-client/python-base@70b78cd) and released in v21.7.0 (https://github.com/kubernetes-client/python/blob/master/CHANGELOG.md#v2170).
When no value was found with `get_conn_value`, the warning was being triggered even though `get_conn_value` was implemented and simply returned no value (because there wasn't one).

Now we make the logic a little tighter and only raise the deprecation warning when `get_conn_value` is not implemented, which is what we intended to do in the first place.
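
The tightened logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Airflow code: only `get_conn_value` comes from the commit message, while the class names, `legacy_lookup`, and `lookup` are hypothetical.

```python
import warnings
from typing import Optional


class SketchBackend:
    """Hypothetical secrets backend; only get_conn_value is named in the text."""

    def get_conn_value(self, conn_id: str) -> Optional[str]:
        # Backends that have not been migrated leave this unimplemented.
        raise NotImplementedError

    def legacy_lookup(self, conn_id: str) -> Optional[str]:
        # Hypothetical stand-in for whatever older method the backend provides.
        return None

    def lookup(self, conn_id: str) -> Optional[str]:
        try:
            # An implemented get_conn_value that returns None is a valid
            # "nothing found" answer and must not trigger the warning.
            return self.get_conn_value(conn_id)
        except NotImplementedError:
            # Warn only when the new method is genuinely missing.
            warnings.warn(
                "get_conn_value is not implemented; falling back to the legacy lookup",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.legacy_lookup(conn_id)


class MigratedBackend(SketchBackend):
    """Hypothetical backend that implements get_conn_value."""

    def get_conn_value(self, conn_id: str) -> Optional[str]:
        return None  # implemented, but there is no such connection
```

Keying the warning on `NotImplementedError` rather than on an empty result is what separates "method missing" from "method present, nothing found".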
* Stronger language about Docker Compose customizability

Despite our warnings, our users continue treating the Docker
Compose file that we exposed as something that should be easy to
extend and customize for their own needs, yet they continue
to struggle with some basic behaviour of containers, Docker Compose
and how they interact. This results in a vast space of potential
problems, as Docker Compose gives the user a false impression of
something that "just works", whereas it requires quite a deep
understanding of how it works.

When you get things wrong with Docker Compose, you often end up
with extremely confusing messages that might suggest that the
problem is with Airflow, but really the problem is with how users
interact with their custom Docker images, registries, pulling,
networking, mounting volumes and plenty of other things.

While this is the same with Kubernetes and the Helm Chart, the Helm Chart makes
it infinitely easier to customize in a declarative way (this is what
our values.yaml does), and anything that has not been foreseen by the Helm
Chart developers is "hard" by definition.

Docker Compose makes no such distinction. You really can't make Docker
Compose customizable by configuration: any customization in it
requires modifying the compose file, which, for people who do not know
what they are doing, will eventually lead to errors that they are not
able to diagnose. This leads to the creation of "Airflow issues" that
should instead be raised as "Docker Compose" issues.

An example of that is here: apache#22301
where there are at least two issues that are not reproducible without
knowing in detail what the user has done, how the image was built
and distributed, and how the docker-compose installation interacted
with them. This leads to a terrible distraction for those supporting
users of Airflow, as the issues are really Docker Compose issues and
Airflow maintainers should not be involved in solving those.

This PR adds somewhat stronger language and a statement about the scope
and customizability of our Quick Start Docker Compose. It not
only mentions "Lack of Production Readiness" but also the
responsibility of the user to understand and diagnose Docker Compose
errors on their own, and sets the expectation that issues with running
Docker Compose should be directed elsewhere.

* Update docs/apache-airflow/start/docker.rst

* Update docs/apache-airflow/start/docker.rst

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

* Update docs/apache-airflow/start/docker.rst

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
We had disabled this previously in (apache#22254) but now the website is up on a different domain as listed in celery/celeryproject#51 (comment)
The Celery documentation has been moved from https://docs.celeryproject.org/ to https://docs.celeryq.dev/. The old links now lead to a 404 error page; the new links lead to the actual documentation.
Prior to SQLAlchemy 1.4 the correct scheme for Postgres was `postgres+psycopg2`, but as of 1.4 it is `postgresql`. Airflow 2.3 updates SQLAlchemy to 1.4, so unless we patch the config for users (or they update their URIs), upgrading to 2.3 will break.
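
A minimal sketch of the kind of URI rewrite described above; it is not Airflow's actual patch. The helper name `normalize_postgres_scheme` is hypothetical, and the sketch assumes that only the dialect part of the scheme needs to change while any driver suffix (such as `+psycopg2`) is kept.

```python
def normalize_postgres_scheme(uri: str) -> str:
    """Rewrite a legacy `postgres` dialect name to `postgresql`.

    Hypothetical helper: URIs without a scheme or with another dialect
    are returned unchanged.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No "://" separator, so there is no scheme to rewrite.
        return uri

    # Split "dialect+driver" into its parts; plus and driver may be empty.
    dialect, plus, driver = scheme.partition("+")
    if dialect == "postgres":
        return f"postgresql{plus}{driver}://{rest}"
    return uri


legacy = "postgres+psycopg2://user:secret@localhost:5432/airflow"
patched = normalize_postgres_scheme(legacy)
```

Rewriting only the dialect keeps credentials, host, and database name untouched, so a URI that already uses `postgresql` passes through as-is.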
…ingOperator, S3GetBucketTaggingOperator, S3DeleteBucketTaggingOperator, S3DeleteBucketOperator (apache#22312)
Development

Successfully merging this pull request may close these issues.

SFTP Sensor - Allow Use of Modified Time