Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Update to postgres 13 and apache-airflow 1.10.15 #54

Merged
merged 4 commits into from Apr 28, 2021
Merged

Update to postgres 13 and apache-airflow 1.10.15 #54

merged 4 commits into from Apr 28, 2021

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Apr 23, 2021

This PR is based on top of #52 and #53 to fix build problems.

It updates Postgres version from 10 to 13. It also does the second step on the way of updating apache-airflow to version >2.0.0: updates the package to the bridge version 1.10.15.

This updates makes the pandas workaround from #52 unnecessary: the new version of airflow doesn't require the old version of pandas that required Cython and numpy before installation.

The tests were failing due to an == comparison returning false for a difference of around 0.00000000001, so I added pytest.approx function to fix them.

The log of the cc_catalog_airflow_postgres_1 container has this first line:
PostgreSQL Database directory appears to contain a database; Skipping initialization
I'm not sure if this is expected or not.

P.S. I couldn't checkout #53, probably because it's a draft PR (I get an error: couldn't find remote ref pull/53/head), so I created a new one.

Fix 2 failing tests, remove pandas version to ensure that newer version is installed
Signed-off-by: Olga Bulat <obulat@gmail.com>
@dhruvkb
Copy link
Member

dhruvkb commented Apr 23, 2021

Btw @obulat you can un-draft PRs as a maintainer by clicking the "Ready for review" button just above the comment box.

@zackkrida
Copy link
Member

zackkrida commented Apr 23, 2021

@obulat The test fixes are very appreciated. To you and @dhruvkb, since you have a bit more python experience than me: is pandas even being used? Maybe it's a dependency of another package? I tried searching for it with ripgrep (both searching for pandas and pd.) and I don't see it imported or accessed in any files:

ccsearch-catalog on  postgres-13 [$] via 🐍 v2.7.16 on ☁️  (us-east-2)
❯ rg 'pandas' -l
src/cc_catalog_airflow/requirements_prod.txt

ccsearch-catalog on  postgres-13 [$] via 🐍 v2.7.16 on ☁️  (us-east-2)
❯ rg 'pd\.' -l

The only result is the requirements file. Any ideas?

@obulat
Copy link
Contributor Author

obulat commented Apr 23, 2021

I had the same question :) So, on @dhruvkb 's recommendation, I ran pipdeptree on the docker container as and I saw that pandas is being used by apache airflow.

@zackkrida zackkrida mentioned this pull request Apr 23, 2021
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much knowledge of this repo but I checked it out and ran the tests. All 636 tests passed (albeit with 6 deprecation warnings) so it LGTM!

Signed-off-by: Olga Bulat <obulat@gmail.com>
@obulat
Copy link
Contributor Author

obulat commented Apr 26, 2021

Previously, I've seen a deprecation notice from airflow, something like airflow initdb is deprecated in airflow versions >2, please use airflow db init instead. After I changed it, database stopped initializing, so I reverted it back to airflow initdb. We are still on version 1 (1.10.15), so using the old syntax is correct.

I had several problems running the catalog:

  • My local postgres installation with the same credentials was running, so it was interfering with process of creating database in Docker
  • After stopping local postgres, I can see db being created, but the tables are empty.
  • local /tmp/docker_postgres_data is not being created. When I create this folder manually, the files from Docker ARE written into it, though.
  • I run the tests and they pass, but I think I need to do more to test if the code works correctly.
Some of the errors in the postgres container
2021-04-26 11:03:54.508 UTC [6798] ERROR: "image_view" is not a table or materialized view

2021-04-26 11:03:54.508 UTC [6798] STATEMENT: REFRESH MATERIALIZED VIEW CONCURRENTLY image_view;

2021-04-26 11:04:12.364 UTC [6917] ERROR: cannot refresh materialized view "public.image_popularity_constants" concurrently

2021-04-26 11:04:12.364 UTC [6917] HINT: Create a unique index with no WHERE clause on one or more columns of the materialized view.

2021-04-26 11:04:12.364 UTC [6917] STATEMENT: REFRESH MATERIALIZED VIEW CONCURRENTLY image_popularity_constants;

2021-04-26 11:05:23.356 UTC [7201] ERROR: table "provider_image_dataoverwrite20210426t110200" does not exist

2021-04-26 11:05:23.356 UTC [7201] STATEMENT: DROP TABLE provider_image_dataoverwrite20210426T110200;

@obulat
Copy link
Contributor Author

obulat commented Apr 26, 2021

I also updated the Dockerfiles to use python3.8-buster images, and they build without errors, and tests pass. Should we update the python version, too?

@zackkrida
Copy link
Member

@obulat sure, updating the python images sounds fine then.

For your other problem, I would suggest the following. Delete and manually recreate the tmp/docker_postgres_data table, make sure your local postgres isn't running, and then try docker-compose up --force-recreate --build.

This should create the db with the tables and views you would expect to see.

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work all around. What a milestone!

@zackkrida zackkrida merged commit aefd1e3 into WordPress:master Apr 28, 2021
@zackkrida zackkrida deleted the update_dependencies branch April 28, 2021 02:00
@zackkrida zackkrida added ✨ goal: improvement Improvement to an existing user-facing feature 🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users and removed ✨ goal: improvement Improvement to an existing user-facing feature labels Aug 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants