Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: relax pandas version requirements for databuilder #1858

Merged
merged 1 commit into from
Jun 11, 2022

Conversation

henridwyer
Copy link
Contributor

Summary of Changes

Relax the pandas version requirements for the databuilder package, which is currently limited to a version released over 2 year ago, to allow for more recent versions. This will make it possible to easily include the package in repos that use modern versions of pandas.

Tests

Tested on pandas 1.4.2 (the current version).

Pandas usage was already tested in unit tests so no new tests are necessary.

Documentation

Very little of the pandas API is used, essentially just read_csv, which is a very stable part of the API.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

@henridwyer henridwyer requested a review from a team as a code owner May 12, 2022 14:56
@boring-cyborg boring-cyborg bot added the area:databuilder From databuilder folder label May 12, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented May 12, 2022

Congratulations on your first Pull Request and welcome to Amundsen community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/amundsen-io/amundsen/blob/main/CONTRIBUTING.md)

@MrwanBaghdad
Copy link
Contributor

Very little of the pandas API is used, essentially just read_csv, which is a very stable part of the API.

You raise a very good point, which begs the question couldn't this part be refactored to use the csv.reader, which is a standard module in python?

@henridwyer
Copy link
Contributor Author

There is a unit test error, which seems to be related to fastapi imports:

/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/starlette/concurrency.py:10: in <module>
    from typing_extensions import ParamSpec
E   ImportError: cannot import name 'ParamSpec' from 'typing_extensions' (/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/typing_extensions.py)

I saw it in other builds such as https://github.com/amundsen-io/amundsen/runs/6338504223, so I suspect it's not related

@feng-tao
Copy link
Member

ROR collecting tests/unit/extractor/test_feast_extractor.py _________
ImportError while importing test module '/home/runner/work/amundsen/amundsen/databuilder/tests/unit/extractor/test_feast_extractor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/_pytest/python.py:608: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/_pytest/pathlib.py:533: in import_path
    importlib.import_module(module_name)
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:[16](https://github.com/amundsen-io/amundsen/runs/6409081024?check_suite_focus=true#step:5:16)8: in exec_module
    exec(co, module.__dict__)
tests/unit/extractor/test_feast_extractor.py:13: in <module>
    from databuilder.extractor.feast_extractor import FeastExtractor
databuilder/extractor/feast_extractor.py:7: in <module>
    from feast import FeatureStore, FeatureView
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/feast/__init__.py:13: in <module>
    from .feature_store import FeatureStore
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/feast/feature_store.py:38: in <module>
    from feast import feature_server, flags, flags_helper, utils
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/feast/feature_server.py:3: in <module>
    from fastapi import FastAPI, HTTPException, Request
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/fastapi/__init__.py:7: in <module>
    from .applications import FastAPI as FastAPI
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/fastapi/applications.py:4: in <module>
    from fastapi import routing
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/fastapi/routing.py:[22](https://github.com/amundsen-io/amundsen/runs/6409081024?check_suite_focus=true#step:5:22): in <module>
    from fastapi.datastructures import Default, DefaultPlaceholder
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/fastapi/datastructures.py:3: in <module>
    from starlette.datastructures import URL as URL  # noqa: F[40](https://github.com/amundsen-io/amundsen/runs/6409081024?check_suite_focus=true#step:5:40)1
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/starlette/datastructures.py:8: in <module>
    from starlette.concurrency import run_in_threadpool
/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/starlette/concurrency.py:10: in <module>
    from typing_extensions import ParamSpec
E   ImportError: cannot import name 'ParamSpec' from 'typing_extensions'

@chonyy
Copy link
Contributor

chonyy commented May 30, 2022

@feng-tao This is a unit test import error that does not relate to our change. Any way we could get past it or get our PR merged?

@henridwyer henridwyer changed the title fix - relax pandas version requirements for databuilder fix: relax pandas version requirements for databuilder Jun 9, 2022
Signed-off-by: Henri Dwyer <dwyerh@gene.com>
@henridwyer
Copy link
Contributor Author

I saw a fix for the fastapi bug was merged. So I rebased this branch @feng-tao

@feng-tao
Copy link
Member

thanks @henridwyer

@feng-tao feng-tao merged commit 2d8f969 into amundsen-io:main Jun 11, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 11, 2022

Awesome work, congrats on your first merged pull request!

@henridwyer henridwyer deleted the databuilder_pandas_version branch June 13, 2022 20:41
@HaoXuAI HaoXuAI mentioned this pull request Jul 1, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:databuilder From databuilder folder
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants