TypeError: sequence item 0: expected str instance, NoneType found when running python setup.py java from source #127

Closed
sauloal opened this issue Jan 30, 2021 · 14 comments


@sauloal

sauloal commented Jan 30, 2021

$ git clone https://github.com/nils-braun/dask-sql.git

$ cd dask-sql

$ pytest tests
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov --cov-config=.coveragerc tests
  inifile: /mnt/d/Programs/dask/dask-sql/pytest.ini
  rootdir: /mnt/d/Programs/dask/dask-sql


$ python setup.py java
running java
Traceback (most recent call last):
  File "setup.py", line 93, in <module>
    command_options={"build_sphinx": {"source_dir": ("setup.py", "docs"),}},
  File "/home/saulo/anaconda3/lib/python3.7/site-packages/setuptools/__init__.py", line 165, in setup
    return distutils.core.setup(**attrs)
  File "/home/saulo/anaconda3/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/saulo/anaconda3/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/saulo/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 30, in run
    self.announce(f"Running command: {' '.join(command)}", level=distutils.log.INFO)
TypeError: sequence item 0: expected str instance, NoneType found

$ python dask-sql-test.py
Traceback (most recent call last):
  File "dask-sql-test.py", line 1, in <module>
    from dask_sql import Context
  File "/mnt/d/Programs/dask/dask-sql/dask_sql/__init__.py", line 1, in <module>
    from .context import Context
  File "/mnt/d/Programs/dask/dask-sql/dask_sql/context.py", line 9, in <module>
    from dask_sql.java import (
  File "/mnt/d/Programs/dask/dask-sql/dask_sql/java.py", line 88, in <module>
    DaskTable = com.dask.sql.schema.DaskTable
AttributeError: Java package 'com' has no attribute 'dask'
$ python -V
Python 3.7.6
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
$ java -version
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 14.0.2+12-Ubuntu-120.04, mixed mode, sharing)
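For anyone hitting the same `TypeError`: it is raised by `str.join`, which only accepts strings. Presumably the first element of `command` in setup.py is the path to the `mvn` executable, and that lookup returned `None` because Maven was not installed (the later `AttributeError` is then most likely a knock-on effect of the Java part never being built). The snippet below reproduces the message under that assumption; the `command` list and the use of `shutil.which` are illustrative, not copied from setup.py.

```python
import shutil

# Assumption for illustration: the build looks up the Maven executable on PATH.
# shutil.which() returns None when the program cannot be found.
mvn = shutil.which("mvn-not-installed-example")
command = [mvn, "clean", "package"]  # hypothetical Maven arguments

message = None
try:
    # Mirrors the announce() call from the traceback: ' '.join(command)
    " ".join(command)
except TypeError as err:
    message = str(err)

print(message)  # sequence item 0: expected str instance, NoneType found
```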

@nils-braun
Collaborator

Thanks again!
To run those commands, you still need to install two additional packages:

  • For the tests, you need pytest-cov, which measures test coverage automatically (it can be installed via pip).
  • For the Java setup, you need to install mvn (Maven). How to do this depends on your system, but you should be able to just download it (it is not a Python package).
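A quick way to check both prerequisites before running the commands above is sketched below. It only uses the standard library; `pytest_cov` is the import name of the pytest-cov plugin and `mvn` is the Maven executable. The function name `check_dev_prerequisites` is hypothetical, not part of dask-sql.

```python
import importlib.util
import shutil


def check_dev_prerequisites():
    """Return a list of human-readable problems; an empty list means all found."""
    problems = []
    # pytest-cov provides the --cov options that pytest.ini passes to pytest.
    if importlib.util.find_spec("pytest_cov") is None:
        problems.append("pytest-cov is missing: pip install pytest-cov")
    # Maven is a system tool, not a Python package, so look for it on PATH.
    if shutil.which("mvn") is None:
        problems.append("mvn (Maven) is not on PATH: install it with your OS package manager")
    return problems


if __name__ == "__main__":
    for problem in check_dev_prerequisites():
        print(problem)
```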

Thanks for testing all that out. As you can see, dask-sql does not (yet) have very good support for non-conda installations.
I will make the two additional installation steps clearer in the documentation.
If people install dask-sql directly via pip (for production use rather than development), these steps are not needed. You can have a look at conda.yaml to see what is needed for development.

@sauloal
Author

sauloal commented Jan 31, 2021

Hello @nils-braun
I would install with pip, but the changes for Python 3.7 are not on pip yet and I was curious to try it out.
I'll try installing the requirements.
Regards

@nils-braun
Copy link
Collaborator

I am planning to do a patch release soon, possibly within the next few days. Hopefully it will be easier after that!

@nils-braun
Collaborator

nils-braun commented Jan 31, 2021

Thanks again for continuing to test even after running into all those blockers. In the related PR, I have added a better error message for the error you mentioned, plus some more documentation. After the PR is merged, you should be able to install the development requirements with

pip install -e ".[dev]"
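`pip install -e ".[dev]"` installs the checkout in editable mode plus its `dev` extra, which setup.py declares via the `extras_require` argument of `setup()`. A sketch of such a declaration follows, with the caveat that the package list is an assumption based on the dependencies that came up in this issue (pytest, pytest-cov, dask-ml), not the project's real list. Maven still has to be installed separately, since it is not a Python package.

```python
# Hypothetical setup.py excerpt; the dev list is illustrative only.
EXTRAS_REQUIRE = {
    # Installed together with the package by: pip install -e ".[dev]"
    "dev": [
        "pytest",
        "pytest-cov",  # provides the --cov options used in pytest.ini
        "dask-ml",     # imported by tests/integration/test_model.py
    ],
}

# In setup.py this dict would be passed along, e.g.:
# setup(name=..., install_requires=[...], extras_require=EXTRAS_REQUIRE)
```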

nils-braun added a commit that referenced this issue Jan 31, 2021
* Change conda.yaml to conda.txt and make pip installation more clear

* Give better error message on missing maven (see #127)

* Missing change in Dockerfile

* Added dev requirements
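The "better error message on missing maven" change itself is not shown here. A minimal sketch of that kind of guard, assuming the build step looks up `mvn` on `PATH` before running it; the function name `find_maven` and the message text are hypothetical:

```python
import shutil


def find_maven(executable="mvn"):
    """Return the full path to Maven, or raise an error that says what to do."""
    path = shutil.which(executable)
    if path is None:
        # Fail early with a readable message instead of letting a None
        # reach ' '.join(command) and surface as a confusing TypeError.
        raise RuntimeError(
            f"Could not find '{executable}' on PATH. Maven is needed to build "
            "the Java part of dask-sql. Install it (for example with "
            "'sudo apt install maven') and run 'python setup.py java' again."
        )
    return path
```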

@sauloal
Author

sauloal commented Jan 31, 2021

It got much further, but some tests still fail:

$ git clone https://github.com/nils-braun/dask-sql.git

$ cd dask-sql

$ sudo apt install maven

$ pip install pytest-cov

$ python setup.py java

$ pytest tests

=============================================== short test summary info ================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword ar...
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (col...
FAILED tests/integration/test_model.py::test_training_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
FAILED tests/integration/test_model.py::test_clustering_and_prediction - ValueError: Can not import model dask_ml.clu...
FAILED tests/integration/test_model.py::test_iterative_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
================================ 5 failed, 119 passed, 3 skipped, 8 warnings in 47.94s =================================

@sauloal
Author

sauloal commented Jan 31, 2021

dask-ml is also a requirement

$ pip install "dask[complete]"
$ pip install dask-ml
$ pytest tests

================================================================= short test summary info ==================================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword argument 'atol'
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="c") are different
================================================== 2 failed, 122 passed, 3 skipped, 6 warnings in 55.26s ===================================================

The first error seems to be caused by the pandas version:

$ python
Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
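For context, the `rtol`/`atol` keywords of `pandas.testing.assert_frame_equal` were added in pandas 1.1.0, so 1.0.5 rejects `atol`. If a test suite needs to run on both, one option is feature detection with `inspect.signature`. A generic sketch: `supports_kwarg` is a hypothetical helper, and the two stand-in functions (simplified, not the real pandas signatures) let it run without pandas installed.

```python
import inspect


def supports_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs catch-all accepts any keyword
        if param.name == name and param.kind is not inspect.Parameter.POSITIONAL_ONLY:
            return True
    return False


# Simplified stand-ins for the old and new signatures (not the real pandas ones).
def assert_frame_equal_old(left, right, check_dtype=True, check_less_precise=False):
    pass


def assert_frame_equal_new(left, right, check_dtype=True, rtol=1e-5, atol=1e-8):
    pass


print(supports_kwarg(assert_frame_equal_old, "atol"))  # False
print(supports_kwarg(assert_frame_equal_new, "atol"))  # True
```

With pandas installed, `supports_kwarg(pandas.testing.assert_frame_equal, "atol")` would be the real check, although simply requiring pandas >= 1.1, as done later in this thread, is the simpler fix.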

@nils-braun
Collaborator

Right! Just for your information, I have added pip install -e ".[dev]" in the newest version, so you do not need to figure out all the dev requirements on your own :-)
Are you still running pandas version 1.1.5? (If you like, could you post your pip list?)

@nils-braun
Collaborator

Ah, you edited your answer at the same moment that I did. Very good! So it seems there is some incompatibility. Could you try with 1.1.5 again, if possible?

@sauloal
Author

sauloal commented Jan 31, 2021

Just updated:

$ pip install --upgrade pandas
Collecting pandas
  Downloading pandas-1.2.1-cp37-cp37m-manylinux1_x86_64.whl (9.9 MB)
     |████████████████████████████████| 9.9 MB 5.5 MB/s
Requirement already satisfied, skipping upgrade: pytz>=2017.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2020.1)
Requirement already satisfied, skipping upgrade: numpy>=1.16.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (1.19.4)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied, skipping upgrade: six>=1.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
ERROR: dask-sql 0.3.0 has requirement pandas<1.2.0, but you'll have pandas 1.2.1 which is incompatible.
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.5
    Uninstalling pandas-1.0.5:
      Successfully uninstalled pandas-1.0.5
Successfully installed pandas-1.2.1

Despite the error, the tests passed, with some warnings:

===================================================================== warnings summary =====================================================================
tests/integration/test_rex.py::test_like
tests/integration/test_rex.py::test_date_functions
  /mnt/d/Programs/dask/dask-sql/dask_sql/context.py:201: DeprecationWarning: register_dask_table is deprecated, use the more general create_table instead.
    DeprecationWarning,

tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
  /home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arcsin
    result = getattr(ufunc, method)(*inputs, **kwargs)

tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
  /home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arccos
    result = getattr(ufunc, method)(*inputs, **kwargs)

tests/integration/test_rex.py::test_date_functions
  /home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:88: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.
    if callable(getattr(self._meta, key)):

tests/integration/test_rex.py::test_date_functions
tests/integration/test_rex.py::test_date_functions
  /home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:43: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.
    out = getattr(getattr(obj, accessor, obj), attr)

-- Docs: https://docs.pytest.org/en/latest/warnings.html

======================================================= 124 passed, 3 skipped, 9 warnings in 51.46s ========================================================

@nils-braun
Collaborator

The warnings are fine (they are deprecations in the pandas <-> dask interface). Thank you so much for testing this all out. I will increase the minimum required pandas version to 1.1.0 (which works, I just tested it).
Thanks for testing 1.2.1. I tested with 1.2.0 some time ago and it failed, but I will repeat the tests.

@sauloal
Author

sauloal commented Jan 31, 2021

Always a pleasure to help and to add a great tool to the toolbox :)
Cheers

@sauloal
Author

sauloal commented Jan 31, 2021

After running pip install -e ".[dev]", everything works perfectly:

$ python dask-sql-test.py
       name    id         x
0       Tim  1017  0.999988
1   Norbert   994  0.999949
0     Frank   983  0.999970
0    Oliver   990  0.999987
1       Dan   979  0.999991
0   Michael  1021  0.999992
0     Quinn  1012  0.999973
1    Xavier   986  0.999925
0   Charlie   961  0.999986
1     Alice  1003  0.999981
0    Ingrid  1030  0.999991
0     Zelda  1050  0.999923
0     Sarah  1084  0.999987
0     Edith  1013  0.999989
0    Ursula  1015  0.999990
1  Patricia   974  0.999999
0     Jerry   994  0.999999
0     Wendy  1000  0.999990
0     Laura  1014  0.999986
0       Ray   975  0.999939
1    Hannah   940  0.999986
0    Yvonne  1033  0.999981
0       Bob   976  0.999978
0    George  1026  0.999993
1     Kevin   992  0.999981
0    Victor   983  0.999997
0.9999788188689883

@nils-braun
Collaborator

I have added a PR in #129 that fixes the tests for pandas 1.0 and 1.1. Now the requirement is only >=1.0 and <1.2 (we still need the upper bound due to dask/dask#7156).
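For readers comparing versions by hand: the constraint described here, `pandas>=1.0,<1.2`, is why pip complained about pandas 1.2.1 earlier in this thread. Below is a stdlib-only sketch of that check for plain numeric versions; the helper names are hypothetical, and real tools such as pip rely on the `packaging` library, which also handles pre-releases and other PEP 440 details this sketch ignores.

```python
def parse_version(version):
    """Turn '1.2.1' into (1, 2, 1); non-numeric suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def pandas_version_supported(version, minimum="1.0", upper="1.2"):
    """Mirror the constraint pandas>=1.0,<1.2 for plain numeric versions."""
    v = parse_version(version)
    # Tuples compare element by element, and a prefix sorts first,
    # so (1, 2) < (1, 2, 1) and "1.2.1" correctly falls outside the range.
    return parse_version(minimum) <= v < parse_version(upper)


for candidate in ["1.0.5", "1.1.5", "1.2.1"]:
    print(candidate, pandas_version_supported(candidate))
```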

@nils-braun
Copy link
Collaborator

The upper pandas version requirement is gone now - the problem in dask is fixed.
