@drobert
Contributor

@drobert drobert commented Oct 18, 2021

closes: #18950

  • added notes on configuring a Connection in tutorial example
  • added logging to tutorial example
  • used streaming ingest in tutorial example

Questions/alternatives

A simpler approach for a new engineer would be for the default Airflow Docker Postgres configuration to match the default Postgres connection defined in airflow/utils/db.py. As it stands, postgres_default defines the username (the "login") as postgres, while the Airflow docker-compose.yml file uses the login airflow. If these matched, the tutorial could skip the preamble and would work out of the box with a hard-coded connection id of postgres_default.

I've chosen to update the documentation to match the current realities here, as I'm new to airflow and am not clear on what impact changing docker-compose or the default postgres connection would have.
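
A minimal sketch of one way to sidestep the mismatch described above: Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` that holds a connection URI. Only the login `airflow` comes from the discussion above; the connection id `tutorial_pg_conn`, the password, host, port, and database name are illustrative assumptions, and the helper `airflow_conn_uri` is hypothetical.

```python
import os
from urllib.parse import quote


def airflow_conn_uri(conn_type, login, password, host, port, schema):
    """Build an Airflow-style connection URI, percent-encoding the credentials."""
    return (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(schema, safe='')}"
    )


# Assumed values: only the login "airflow" is stated in the PR description.
uri = airflow_conn_uri("postgres", "airflow", "airflow", "postgres", 5432, "airflow")

# Hypothetical connection id "tutorial_pg_conn"; the variable name is its upper-cased form.
os.environ["AIRFLOW_CONN_TUTORIAL_PG_CONN"] = uri
print(uri)
```

Setting the variable from Python only affects the current process; in a Docker setup it would more likely be declared in the container environment.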

@drobert drobert force-pushed the 18950-tutorial-docs-connection branch 3 times, most recently from b291e16 to cdaa140, on October 18, 2021 19:48
@drobert drobert force-pushed the 18950-tutorial-docs-connection branch 2 times, most recently from 7eed1b1 to 690eb3f, on October 19, 2021 14:46
@drobert drobert force-pushed the 18950-tutorial-docs-connection branch from 690eb3f to f418268 on October 19, 2021 15:55
want to add a docker-appropriate postgres connection (the following creates one that matches postgres as
configured in ``docker-compose.yml``):

.. code-block:: bash
Contributor Author

Ideally this would be nested within the ``.. note::`` directive above. It's unclear to me whether that's supported, but it was easier to debug other syntax issues with it separate

Member

it was easier to debug other syntax issues with it separate

What's wrong with

.. note::

    Airflow manages databases using ...

    .. code-block:: bash

        airflow connections ...

?

Contributor Author

Probably nothing now; something was wrong and it was too confusing to debug. I'll put it back and re-try.

for row in response.text.split("\n"):
    if row:
        file.write(row + "\n")
with requests.get(url, stream=True) as req:
Contributor Author

Streams the content now instead of loading into memory and splitting by line (then re-adding lines)
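
The streaming approach described here can be sketched as follows. To keep the example runnable offline, it takes any object that behaves like a `requests.Response` opened with `stream=True`; the helper `save_stream` and the stand-in class `FakeResponse` are hypothetical names introduced for illustration, not part of the PR.

```python
import os
import tempfile


def save_stream(response, path, chunk_size=1024):
    """Write a streamed HTTP response to `path` one chunk at a time.

    `response` is assumed to offer raise_for_status() and iter_content(),
    as a requests.Response does.
    """
    response.raise_for_status()
    written = 0
    with open(path, "wb") as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += file.write(chunk)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response so the sketch needs no network."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # a real response would raise on 4xx/5xx status codes

    def iter_content(self, chunk_size=1024):
        for start in range(0, len(self._payload), chunk_size):
            yield self._payload[start:start + chunk_size]


payload = b"id,name\n1,Ada\n2,Grace\n"
target = os.path.join(tempfile.mkdtemp(), "employees.csv")
bytes_written = save_stream(FakeResponse(payload), target, chunk_size=8)
```

With the real library, the call would presumably look like `with requests.get(url, stream=True) as req: save_stream(req, data_path)`, so only one chunk is held in memory at a time.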

    conn.commit()
    return 0
except Exception as e:
    logging.error(f"Failed to merge data: {e}")
Contributor Author

Added some logging

Member

Better to

  1. Use logger = logging.getLogger(__name__) instead of logging directly to the root logger.
  2. Use logger.exception("Failed to merge data") since it keeps the traceback. See documentation of the logging module for more information.
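
A minimal sketch of the two suggestions above, using only the standard `logging` module. The function `merge_data` is a hypothetical stand-in for the tutorial's merge step and always fails here, and the in-memory handler exists only so the output can be inspected.

```python
import io
import logging

# Module-level logger instead of the root logger (point 1).
logger = logging.getLogger(__name__)

# Capture output in memory so the result can be inspected in this sketch.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))
logger.setLevel(logging.INFO)


def merge_data():
    """Hypothetical stand-in for the merge step; always raises in this sketch."""
    raise ValueError("duplicate key")


def run_merge():
    try:
        merge_data()
        return 0
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback (point 2).
        logger.exception("Failed to merge data")
        return 1


status = run_merge()
log_output = buffer.getvalue()
```

Because the traceback already carries the exception type and message, the log message itself does not need to interpolate `e`.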

Contributor Author

I had point 1 in place (https://github.com/apache/airflow/pull/19053/files#diff-5d885a5057f9caeddb7c96e09d905011e8bb58024009d1f9badaa37dee9ab283R504), but I think I keep losing context while copy-pasting between the 'full code' and the smaller examples. I'll take another pass.

Thanks for point 2. I'm new to Python and thought I was doing that correctly, but will remove.

@drobert drobert marked this pull request as ready for review October 19, 2021 17:27
@drobert drobert requested a review from kaxil as a code owner October 19, 2021 17:27
    req.raise_for_status()
    with open("/opt/airflow/dags/files/employees.csv", "wb") as file:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:
Member

I don't think this if check is needed? (Not a big deal though)

Contributor Author

Unclear to me. It was a big deal previously, when strings were used and a trailing newline at the end of the file was causing issues. I can remove it, but I was also thinking "meh, works fine".
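
A small illustration of why the emptiness check mattered in the earlier string-based version: splitting text that ends in a newline produces a trailing empty string. The sample data is made up for the example, and the sketch makes no claim about whether the byte chunks in the streaming version can ever be empty.

```python
# Made-up sample data ending in a newline, as a CSV file typically does.
text = "id,name\n1,Ada\n"

# str.split keeps the empty piece after the final separator.
rows = text.split("\n")

# Filtering on truthiness drops that trailing empty row.
non_empty = [row for row in rows if row]

print(rows)
print(non_empty)
```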

        file.write(row + "\n")
with requests.get(url, stream=True) as req:
    req.raise_for_status()
    with open("/opt/airflow/dags/files/employees.csv", "wb") as file:
Member

Suggested change
- with open("/opt/airflow/dags/files/employees.csv", "wb") as file:
+ with open(data_path, "wb") as file:

?

Contributor Author

Good catch, thanks.

@uranusjr
Member

Rather than repeating the example code, we should probably move the example to airflow/example_dags and use exampleinclude (search the documentation for existing usages).

@github-actions

github-actions bot commented Dec 4, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Dec 4, 2021
@github-actions github-actions bot closed this Dec 9, 2021
Labels

kind:documentation stale Stale PRs per the .github/workflows/stale.yml policy file

Development

Successfully merging this pull request may close these issues.

Basic tutorial example has many bugs

3 participants