Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ElasticserachPythonHook - ES Hook With The Python Client #24895

Merged
merged 18 commits into from
Jul 8, 2022

Conversation

alexkruc
Copy link
Contributor

@alexkruc alexkruc commented Jul 7, 2022

Adding a new hook for work with Elasticsearch, using the official Elasticsearch Python client.
Currently, I've implemented only one function (search), but if any more will be needed, we can add even more.

This hook was also tested with a local ES docker image, to make sure that everything works :)

closes: #21929

@@ -0,0 +1,56 @@
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have two ways to interact with elastic one is Python based with the SDK and one is SQL based by using sql query syntax.

i think we should rename this file as python.py or something similar?
and the current ElasticsearchHook in hooks/elasticsearch.py

class ElasticsearchHook(DbApiHook):
should be hooks/sql.py with ElasticsearchSqlHook

Alternatively, we can have both classes ElasticsearchSqlHook + ElasticsearchPythonHook in the same hooks/elasticsearch.py ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add them to the same files, as I think having a file named python.py is a bit confusing..
I don't know how it will play with the documentation files, but I'll try to make it work :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

from airflow.models import Connection
from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook
from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook, ElasticsearchPythonHook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the new class name on the import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done- the old hook tests are now using the new name.
I've also added a test that checks the generation of a DepracationWarning for ElasticsearchHook :)

Comment on lines 120 to 123
:param hosts: list: A list of a single or many Elasticsearch instances. Example: ["http://localhost:9200"]
:param es_conn_args: dict: Additional arguments you might need to enter to connect to Elasticsearch.
Example: {"ca_cert":"/path/to/http_ca.crt", "basic_auth": "(user, pass)"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we need also a connection.rst to explain this additional settings for the connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new document for both hooks under hooks/elasticsearch.rst

@@ -0,0 +1,58 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the file example_elasticsearch_hook.py ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged the new example into the old examples file :)
Now it should give the example to both hooks

@potiuk
Copy link
Member

potiuk commented Jul 7, 2022

Docs now :(

@alexkruc
Copy link
Contributor Author

alexkruc commented Jul 8, 2022

@potiuk Is it possible to get your assistance on the failing tests?
I have 2 failing tests:

  • The test TestElasticsearchProviderProjectStructure is failing. From what I understand, it looks for all the available hooks, and looks that they have a matching example in the examples folder actual test.
    The thing is, that it's failing on the ElasticsearchHook which I deprecated (because I renamed it) in this PR. I don't want to add an example for a deprecated hook, because it kinda beets the purpose.. Do you have any tips?

  • The build docs test is failing, on the new documentation that I've added - hooks/elasticsearch.rst.rst:37:non-whitespace stripped by dedent. I compared the dedent clause with several other rst files, and it looked fine to me.. Also, the pre-commit hooks are passing on this file without any changes/failures.. Is there any possibility you can help me with this?

Sorry for the trouble :( I'm just kinda stuck and need an external opinion 🙏

@potiuk potiuk force-pushed the adding_elasticsearch_python_hook branch from 5100406 to fa9d058 Compare July 8, 2022 09:01
@potiuk
Copy link
Member

potiuk commented Jul 8, 2022

  1. Rebase (I just did).
  2. Let's see.

@potiuk
Copy link
Member

potiuk commented Jul 8, 2022

The test TestElasticsearchProviderProjectStructure is failing. From what I understand, it looks for all the available hooks, and looks that they have a matching example in the examples folder actual test.
The thing is, that it's failing on the ElasticsearchHook which I deprecated (because I renamed it) in this PR. I don't want to add an example for a deprecated hook, because it kinda beets the purpose.. Do you have any tips?

Look at the tests - there some other deprecated classes there mentioned in the exclusion list.

@alexkruc
Copy link
Contributor Author

alexkruc commented Jul 8, 2022

The test TestElasticsearchProviderProjectStructure is failing. From what I understand, it looks for all the available hooks, and looks that they have a matching example in the examples folder actual test.
The thing is, that it's failing on the ElasticsearchHook which I deprecated (because I renamed it) in this PR. I don't want to add an example for a deprecated hook, because it kinda beets the purpose.. Do you have any tips?

Look at the tests - there some other deprecated classes there mentioned in the exclusion list.

Wow I completely missed the DEPRECATED_CLASSES somehow. Thanks for the tip! :) I'll try that out after the build (that will probably fail, but I want to see if the build docs part succeeds after the rebase) :)

@alexkruc
Copy link
Contributor Author

alexkruc commented Jul 8, 2022

@potiuk Sorry to trouble you again.. I followed your tips, and now the integration tests are passing! But the build docs are still failing on the same thing - /hooks/elasticsearch.rst.rst:37:non-whitespace stripped by dedent.
Do you have a tip regarding what's wrong there? Breeze static-checks and pre-commit are passing, and the dedent clause is the same as in other files (at least from what my IDE sees)..

Really appreciate your help 🙏

@potiuk
Copy link
Member

potiuk commented Jul 8, 2022

Well. Not sure - seems like really "cryptic" error from Sphinx (which is known from rather cryptic messages). Look at the actual output of the sphix build (Above the summary) and you will find the warning there.

Also there is apparently a bug (@mik-laj?) that in this case our error parsing added second .rst to display the line of .rst file that was actually causing it.

However I made an intelligent guess and looked closely. From what I see, I think the lines below are all indented ONE SPACE MORE than all the others. so maybe fixing that will help.

    Connection types <connections/elasticsearch>
    Logging for Tasks <logging/index>
    Elasticsearch hook with native Python client <hooks/elasticsearch>

Also few other tips - and ways I deal with similar cases (been there, done that many times). You can easily iterate locally with it (and this is what I do when I have similar issues with docs):

breeze build-docs --package-filter apache-airlfow-provider-elasticsearch --docs-only

This will run much faster than CI circle and allows you to also take a look at sphinx-generated intermediate artifacts which usually helps with debugging it.

If all else fails. I usually fall-back to just looking at teh source code of Sphins and finding what the error message actually means (i.e. when it will be produced). In this case: WARNING: non-whitespace stripped by dedent - my intelligent guess is that Sphinx took the first indentation depth and stripped the same identation from the next block (which lead to stripping some non-white space character) - that's why the guess that one-space-more of indentation in the first block might be the reason. But I have not looked it up in the sources yet.

@alexkruc
Copy link
Contributor Author

alexkruc commented Jul 8, 2022

Finally, all the tests are passing! :D
To recap, the Sphinx error messages are ambiguous and don't point to the actual issues..
The issue with the dedent was real, but it referred to the indent of the example code. As I'm passing the Python block without any indents, I had to just remove the dedent clause from the rst file.

Later, I had issues with a "typo" - but Sphinx just set it as a generic typo, and did not always point to the correct file with the error..

In any case, at least now I know how to handle these issues - the more I know, the more I can help other people in the community :)

@alexkruc alexkruc requested a review from eladkal July 8, 2022 21:43
@potiuk
Copy link
Member

potiuk commented Jul 8, 2022

To recap, the Sphinx error messages are ambiguous and don't point to the actual issues..

Yeah. quite often the message are rather unhelpful, I must agree.

@potiuk potiuk merged commit 2ddc100 into apache:main Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch hook support DSL
3 participants