Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider splitting out taxcalc features #39

Closed
MaxGhenis opened this issue Oct 15, 2019 · 5 comments · Fixed by #52
Closed

Consider splitting out taxcalc features #39

MaxGhenis opened this issue Oct 15, 2019 · 5 comments · Fixed by #52

Comments

@MaxGhenis
Copy link
Collaborator

MaxGhenis commented Oct 15, 2019

microdf supports data and tasks that include but are not limited to taxcalc, and it's increasingly general. Given taxcalc's size and large dependency set, ideally it would be an optional dependency. While pip supports optional dependencies, they're not yet implemented in conda (conda/conda#7502).

Another option is creating a new package like microdf-taxcalc which includes both microdf and taxcalc as dependencies, and includes functions like calc_df.

It's OK for now but placeholder for future consideration.

@hdoupe
Copy link

hdoupe commented Nov 22, 2019

Pandas handles its optional/soft fastparquet dependency like this:

In [1]: import pandas as pd                                                                                                                                                                   

In [2]: df = pd.read_parquet("matchups/statcast2018.parquet", engine="fastparquet")                                                                                                           
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-284dbcb9c5fc> in <module>
----> 1 df = pd.read_parquet("matchups/statcast2018.parquet", engine="fastparquet")

~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
    293     """
    294 
--> 295     impl = get_engine(engine)
    296     return impl.read(path, columns=columns, **kwargs)

~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in get_engine(engine)
     42         return PyArrowImpl()
     43     elif engine == "fastparquet":
---> 44         return FastParquetImpl()
     45 
     46 

~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in __init__(self)
    139         # we need to import on first use
    140         fastparquet = import_optional_dependency(
--> 141             "fastparquet", extra="fastparquet is required for parquet support."
    142         )
    143         self.api = fastparquet

~/miniconda3/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
     91     except ImportError:
     92         if raise_on_missing:
---> 93             raise ImportError(message.format(name=name, extra=extra)) from None
     94         else:
     95             return None

ImportError: Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

In [3]:                                                                                                                                                                                       

If you wanted to do something like this, then you could add a note to the install instructions telling people that taxcalc should be installed separately.

One thing to note is that Pandas can still read parquet files without other packages like fastpaquet being installed.

@MaxGhenis
Copy link
Collaborator Author

MaxGhenis commented Dec 5, 2019

Interesting, thanks @hdoupe. Based on this and some more poking around the code, looks like this is how pandas treats fastparquet:

  • Include as a dep in environment.yml
  • Don't include in setup.py
  • Use an import_optional_dependency function in pandas.compat._optional, which is used elsewhere internally (this also specifies the minimum version for optional deps)
  • Call import_optional_dependency in to_parquet and read_parquet in parquet.py (through a couple intermediate steps)
  • In test_parquet.py, check if it's installed with a try statement and skip tests if it isn't

I'll mirror this approach.

@MaxGhenis
Copy link
Collaborator Author

For completeness, here's the current error when importing microdf without having taxcalc installed:

import microdf
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-15236528f5d7> in <module>()
----> 1 import microdf

1 frames
/usr/local/lib/python3.6/dist-packages/microdf/taxcalc.py in <module>()
      1 import microdf as mdf
----> 2 import taxcalc as tc
      3 
      4 
      5 def static_baseline_calc(recs, year):

ModuleNotFoundError: No module named 'taxcalc'

@MaxGhenis
Copy link
Collaborator Author

Importing microdf now works when taxcalc isn't installed.

@MaxGhenis
Copy link
Collaborator Author

Thanks again, @hdoupe for the guidance here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants