-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Referencing Connections in tutorial doc #19053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b291e16 to
cdaa140
Compare
- added logging to tutorial example - used streaming ingest in tutorial example
7eed1b1 to
690eb3f
Compare
690eb3f to
f418268
Compare
| want to add a docker-appropriate postgres connection (the following creates one that matches postgres as | ||
| configured in ``docker-compose.yml``): | ||
|
|
||
| .. code-block:: bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be nested within :note above; unclear to me if that's supported, but it was easier to debug other syntax issues with it separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was easier to debug other syntax issues with it separate
What's wrong with
.. note::
Airflow manages databases using ...
.. code-block:: bash
airflow connections ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably nothing now; something was wrong and it was too confusing to debug. I'll put it back and re-try.
| for row in response.text.split("\n"): | ||
| if row: | ||
| file.write(row + "\n") | ||
| with requests.get(url, stream=True) as req: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streams the content now instead of loading into memory and splitting by line (then re-adding lines)
| conn.commit() | ||
| return 0 | ||
| except Exception as e: | ||
| logging.error(f"Failed to merge data: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to
- Use
logger = logging.getLogger(__name__)instead of logging directly to the root logger. - Use
logger.exception("Failed to merge data")since it keeps the traceback. See documentation of the logging module for more information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had point 1 https://github.com/apache/airflow/pull/19053/files#diff-5d885a5057f9caeddb7c96e09d905011e8bb58024009d1f9badaa37dee9ab283R504 but I think I keep losing context during copy-paste between the 'full code' and smaller examples. I'll take another pass.
Thanks, re point 2. I'm new to python and thought I was doing that correctly, but will remove.
| req.raise_for_status() | ||
| with open("/opt/airflow/dags/files/employees.csv", "wb") as file: | ||
| for chunk in req.iter_content(chunk_size=1024): | ||
| if chunk: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this if check is needed? (Not a big deal though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear to me. It was a big deal previous when strings were used and a trailing newline at the end of the file was causing issues. I can remove but was also thinking "meh, works fine"
| file.write(row + "\n") | ||
| with requests.get(url, stream=True) as req: | ||
| req.raise_for_status() | ||
| with open("/opt/airflow/dags/files/employees.csv", "wb") as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with open("/opt/airflow/dags/files/employees.csv", "wb") as file: | |
| with open(data_path, "wb") as file: |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
|
Instead of repeating the example code, we should probably move the example to |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
closes: #18950
Connectionin tutorial exampleQuestions/alternatives
A simpler approach for a new engineer here would be if the default airflow docker postgres configuration matched the default postgres connection defined in
airflow/utils/db.py. That is,postgres_defaultdefines username (er "login")postgres, while the airflowdocker-compose.ymlfile uses loginairflow. If these matched, the tutorial could skip the preamble and would work out of the box with a hard-coded connection id ofpostgres_default.I've chosen to update the documentation to match the current realities here, as I'm new to airflow and am not clear on what impact changing
docker-composeor the default postgres connection would have.