Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Regression] Forced psycopg2 dependency on Linux causes sudden build failures #96

Closed
2 tasks done
mateusz opened this issue May 15, 2024 · 9 comments · Fixed by #113, #117 or dbt-labs/docs.getdbt.com#5707
Closed
2 tasks done

Comments

@mateusz
Copy link

mateusz commented May 15, 2024

Is this a regression?

  • I believe this is a regression in functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

My existing build system now breaks due to introduction of hard psycopg2 dependency into pyproject.toml in 1.8.0 in #60 , which forces psycopg2 compilation from source on Linux, which hasn't got the dependencies installed in my docker image, which breaks the build. My desired production setup is psycopg2-binary (for consistency), and I do not wish to build psycopg2 from source (because I don't want the dev dependencies).

Expected/Previous Behavior

I expect a minor release to not break an existing build system, and I expect the same psycopg2 variant to be used across dev and prod for quality reasons (replicable build).

Steps To Reproduce

  1. On an existing psycopg2-binary setup, update dbt-postgres to 1.8.0.

Relevant log output

#0 21.05   ChefBuildError
#0 21.05 
#0 21.05   Backend subprocess exited when trying to invoke get_requires_for_build_wheel
#0 21.05   
#0 21.05   running egg_info
#0 21.05   writing psycopg2.egg-info/PKG-INFO
#0 21.05   writing dependency_links to psycopg2.egg-info/dependency_links.txt
#0 21.05   writing top-level names to psycopg2.egg-info/top_level.txt
#0 21.05   
#0 21.05   Error: pg_config executable not found.
#0 21.05   
#0 21.05   pg_config is required to build psycopg2 from source.  Please add the directory
#0 21.05   containing pg_config to the $PATH or specify the full executable path with the
#0 21.05   option:
#0 21.05   
#0 21.05       python setup.py build_ext --pg-config /path/to/pg_config build ...
#0 21.05   
#0 21.05   or with the pg_config option in 'setup.cfg'.
#0 21.05   
#0 21.05   If you prefer to avoid building psycopg2 from source, please install the PyPI
#0 21.05   'psycopg2-binary' package instead.
#0 21.05   
#0 21.05   For further information please check the 'doc/src/install.rst' file (also at
#0 21.05   <https://www.psycopg.org/docs/install.html>).

Environment

- OS: Mac
- Python: 3.9
- dbt-postgres (working version): 1.7.14
- dbt-postgres (regression version): 1.8.0

Additional Context

There is also a comment here from @andy-clapson challenging Linux=prod assumption.

@mateusz
Copy link
Author

mateusz commented May 15, 2024

Is there a workaround that would let me force psycopg2-binary? I have to currently resort to editing my poetry.lock which isn't too nice 😁

@andy-clapson
Copy link

andy-clapson commented May 15, 2024

second this regression - this fix should be reverted and the use case handled differently.
This in particular is right on the money:

I expect a minor release to not break an existing build system, and I expect the same psycopg2 variant to be used across dev and prod for quality reasons (replicable build).

@padbk
Copy link

padbk commented May 15, 2024

I agree. I posted here: #93

If the use of dbt-postgres is only as a dependency of dbt-redshift, psycopg2 is not needed at all.

My outstanding questions:

  1. Is Psycopg2 really needed as a depedency of dbt-postgres?
  2. If it is needed, is there a way to skip installation of Psycopg2 when using dbt-redshift only for example?
  3. If it is not possible to skip installation, is there a way to specify to use Psycopg2-binary instead of having to build from source when using Linux?

@jtcohen6
Copy link
Contributor

@mateusz @andy-clapson @padbk Thanks for opening the issue, and for the detailed writeups. I want to apologize for the surprise this caused. While this isn't a change in the functionality within dbt-postgres, it was a breaking change for users on Linux distributions who are missing the build dependencies for psycopg2 and previously installed psycogp2-binary by default.

Thanks @mikealfare for already responding over here:

The previous behavior was to check an environment variable (DBT_PSYCOPG2_NAME) during pip install to determine whether to install psycopg2 or psycopg2-binary. We needed to change this behavior because:

  • We are migrating all our codebases from setup.py to pyproject.toml, following the tide of the Python packaging ecosystem.
  • Our previous approach to dynamically injecting build dependencies based on an environment variable doesn't have a clear analogue in pyproject.toml. (This also goes against general guidance.)

What you can do:

  • If you're on Linux, you can install the psycopg2 build dependencies, which will also have the effect of making psycopg2 faster (like what @padbk showed in Psycopg2 dependency introduced in 1.8.0 #93
  • The problem is, if you're on Linux and you want psycopg2-binary, and you do not want to install the build dependencies, there is no good way to get this working.

What we can look into doing:

  1. Try restoring the previous conditional behavior, via a Hatch hook instead of dynamic code in setup.py. We made an initial attempt at this (revert default psycopg2 back to psycopg2-binary #41), but it was not working correctly ([Regression] Installing psycopg2-binary instead of psycopg2 #50) and so we removed it in Install psycopg2 based on platform #60.
  2. If we cannot get (1) working, we should make psycopg2-binary the default across the board, even for Linux. For production pipelines, we would then recommend that users add pip uninstall psycopg2-binary && pip install psycopg2 (and hope all transitive dependencies work). This is ugly, but it replicates the previous "opt-in" behavior, via two additional pip commands instead of setting DBT_PSYCOPG2_NAME=psycopg2.

Rejected mitigations:

  1. For Linux users who are missing psycopg2 build dependencies, try installing these build dependencies automatically (via Hatch hook). We could support this for a handful of the most popular Linux distributions... but this feels like a dangerous path to start heading down, and I don't think it should be within the scope of dbt-postgres.
  2. Use extras as a way to conditionally support either psycopg2 or psycopg2-binary for installation. However, pip install dbt-postgres would stop working — everyone would need to replace with either pip install dbt-postgres[psycogp2] or pip install dbt-postgres[psycopg2-binary]. (This has long been the case for dbt-spark, which requires totally unrelated dependencies for different connection methods — but it's a convention we established many years ago and before the v1.0 release.)
  3. See if we could figure out how to catch the pip --no-binary flag and choose psycopg2 / psycopg-binary accordingly. This is the more-standard pattern for source vs. dist, but several years ago the psycopg2 maintainers decided to instead split into two separate named packages. There's quite a lot of history here... (psycopg2-binary: Why? psycopg/psycopg2#674)

@jtcohen6 jtcohen6 removed the triage label May 15, 2024
@mateusz
Copy link
Author

mateusz commented May 15, 2024

Thank you for your in-depth answer and understanding. I had a quick look at providing optional deps via extras, but didn't find anything over what you already provided.

Philosophically speaking, 1.8.0 is already backwards-incompatible and so perhaps there is a case here to embrace this, explain to the community, and switch to the most pythonic approach that's best for the long run?

@farovictor
Copy link

farovictor commented May 16, 2024

To anyone that needs a working solution and doesn't care about using a bin version of Postgres connector. I have forked this repo and updated the readme with instructions so you can build your package. Here is the link: https://github.com/farovictor/dbt-postgres

If you do not want to pull the fork you still can do it by yourself, pull official repo and do this:

  1. Change the dependency definition for psycopg2 in pyproject.toml, to look like this:
dependencies = [
    "psycopg2-binary>=2.9,<3.0",
    ...

Remove any other psycopg2 definition in file, they are splited by platform. Just remove it and add the one above.

  1. Install hatch on your system or environment:
pip install hatch
  1. Build the package
hatch build
  1. Install dbt-postgres from your wheel file located in the dist folder, created during the build process.

Be happy.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 6, 2024

To follow up here, after some more investigation and discussion, this is the approach we've landed on:

we should make psycopg2-binary the default across the board, even for Linux. For production pipelines, we would then recommend that users add pip uninstall psycopg2-binary && pip install psycopg2 (and hope all transitive dependencies work). This is ugly, but it replicates the previous "opt-in" behavior, via two additional pip commands instead of setting DBT_PSYCOPG2_NAME=psycopg2.

@georgek
Copy link

georgek commented Jun 6, 2024

I think that's a reasonable compromise. Just out of interest, would use of psycopg (v3) make things easier at all?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 6, 2024

@georgek Fair question. While psycopg (3) makes it possible to install either psycopg (source) or psycopg[binary] (binary) by specifying the "extra," for us to pass that conditional behavior through to end users, I think it would still work the same as the second "rejected mitigation" above:

Use extras as a way to conditionally support either psycopg2 or psycopg2-binary for installation. However, pip install dbt-postgres would stop working — everyone would need to replace with either pip install dbt-postgres[psycogp2] or pip install dbt-postgres[psycopg2-binary]. (This has long been the case for dbt-spark, which requires totally unrelated dependencies for different connection methods — but it's a convention we established many years ago and before the v1.0 release.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment