Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding input and format arguments to cli tool #6

Merged
merged 3 commits into from
Jul 1, 2020

Conversation

IKnowLogic
Copy link

Reason for pull request:

This pull request adds an <input/dir> argument to the cli tool. This enables the user to specify the location of the ystafdb data, instead of being a static path. The pull request also adds checks, to verify csv. file paths exists, before using them. This, along with better installation documentation, should resolve Issue 1 and Issue 4. To have consistent cli options, the regenerate option is renamed to -o for output.

The pull request also adds documentation for how to use an existing virtual environment with this software, which solves Issue 2

File overview

README.md:

  • Adding documentation for the usage of an existing virtual environment.
  • Changing the Usage section to reflect new cli options. Among other the -i and -f options.

setup.py:

  • We don't have to package the data anymore, since we search for it locally.
  • Python v3.8 added.

ystafdb/filesystem.py:

  • Adding cli option -f to choose output format. Currently .nt, .ttl and .xml are functional.

@IKnowLogic IKnowLogic added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers labels Jun 30, 2020
Copy link
Contributor

@kuzeko kuzeko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check the commented points.

ystafdb/config_parser.py Outdated Show resolved Hide resolved
ystafdb/config_parser.py Outdated Show resolved Hide resolved
ystafdb/config_parser.py Outdated Show resolved Hide resolved
ystafdb/foaf.py Show resolved Hide resolved
Copy link
Contributor

@tngTUDOR tngTUDOR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, look at the following complaints from cd ystafdb; pylama -i E501 (pylama ignoring the max 80 chars per line, since the repo does not have a config for this:

ystafdb_metadata.py:1:1: W0611 '.data_dir' imported but unused [pyflakes]
ystafdb_metadata.py:23:1: C901 'generate_ystafdb_metadata_uris' is too complex (18) [mccabe]
ystafdb_metadata.py:38:5: E303 too many blank lines (2) [pycodestyle]
ystafdb_metadata.py:60:1: W0612 local variable 'author' is assigned to but never used [pyflakes]
__init__.py:10:1: E402 module level import not at top of file [pycodestyle]
__init__.py:11:1: E402 module level import not at top of file [pycodestyle]
__init__.py:16:41: W292 no newline at end of file [pycodestyle]
graph_common.py:6:1: E402 module level import not at top of file [pycodestyle]
graph_common.py:7:1: E402 module level import not at top of file [pycodestyle]
config_parser.py:5:1: E302 expected 2 blank lines, found 1 [pycodestyle]
config_parser.py:5:1: C901 'get_config_data' is too complex (14) [mccabe]
config_parser.py:60:5: E303 too many blank lines (2) [pycodestyle]
config_parser.py:119:30: W292 no newline at end of file [pycodestyle]
provenance_uris.py:1:1: W0611 '.filesystem.write_graph' imported but unused [pyflakes]
provenance_uris.py:6:1: W0611 'pathlib.Path' imported but unused [pyflakes]
provenance_uris.py:13:1: W0612 local variable 'purl' is assigned to but never used [pyflakes]
provenance_uris.py:41:1: W0612 local variable 'purl' is assigned to but never used [pyflakes]
provenance_uris.py:113:13: W292 no newline at end of file [pycodestyle]
foaf.py:17:1: W0612 local variable 'purl' is assigned to but never used [pyflakes]
bin/ystafdb.py:14:1: W0611 'docopt.docopt' imported but unused [pyflakes]
bin/ystafdb.py:35:38: E231 missing whitespace after ',' [pycodestyle]
bin/ystafdb.py:35:44: E231 missing whitespace after ',' [pycodestyle]

$ cd ystafdb
```
##### Download Base Data
Before progressing the installation, the base ystafdb data must be downloaded, and placed in a folder of your choosing, inside the repo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inside the repo <- it can be "anywhere", not necessarily inside the repo.


Options:
-h --help Show this screen.
--version Show version.

"""
import argparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docopt was used at the beginning ... let's keep it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comment was not clear.
I meant to say that I would have liked to use docopt instead of argparse. But we should remove the import of docopt if we are not using it anymore.

@@ -1,2 +1,7 @@
appdirs
docopt
pandas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move the dependencies to the requirements.txt file, an install that uses python setup.py without pip install -r requirements.txt won't pull the dependencies anymore.

I personally prefer to have things managed along the setup.py [or even better, setup.cfg]. Please take a look at https://packaging.python.org/discussions/install-requires-vs-requirements/#install-requires-vs-requirements-files and decide which one you prefer. I find it makes life easier to have it only in the setup.[py,cfg]


Options:
-h --help Show this screen.
--version Show version.

"""
import argparse
from docopt import docopt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you won't use docopt anymore for the CLI, remove this import.
docopt is fun: it allows to describe the cli options at the same time as the docstrings of the function.
Do you have a reason why you prefer argparse ?

@tngTUDOR tngTUDOR self-requested a review June 30, 2020 13:44
@tngTUDOR
Copy link
Contributor

tngTUDOR commented Jun 30, 2020

Update to summarize comment:

  • If the config.json is supposed to be part of the package, it must be found by the cli when run from anywhere in the filesystem.
  • There are still some hardcoded paths or an assumption that the cli is run from ystafdb directory.

Also, before merging, The following scenario does not work:

(I install the cli, and use it outside of the repo in the filesystem)

git clone repo
cd repo
pip install -r requirements; python setup.py install
cd ..
mkdir input_data
cp YSTAFDB_CSV_files.zip input_data
unzip YSTAFDB_CSV_files.zip
cd ..
mkdir my_dir
cd my_dir
ystafdb-cli -i ../input_data -f ttl

Here is the error:

➜ ystafdb-cli -i ../input_data
No config.json file

And if I run it from the repo dir

commit a46b58f9a63b2539e0e312cbdff101bd05d73935 (HEAD -> cli-argument-improvement, origin/cli-argument-improvement)
Author: Emil Riis Hansen <emil3507@gmail.com>
Date:   Tue Jun 30 15:09:38 2020 +0200

    proper exiting when error arises and removed duplicate names

Traceback (most recent call last):
  File "<frozen zipimport>", line 177, in get_data
KeyError: 'ystafdb/../input_data/publications.csv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/bin/ystafdb-cli", line 33, in <module>
    sys.exit(load_entry_point('ystafdb==0.4', 'console_scripts', 'ystafdb-cli')())
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/ystafdb-0.4-py3.8.egg/ystafdb/bin/ystafdb.py", line 42, in main
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/ystafdb-0.4-py3.8.egg/ystafdb/__init__.py", line 16, in generate_ystafdb
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/ystafdb-0.4-py3.8.egg/ystafdb/ystafdb_metadata.py", line 45, in generate_ystafdb_metadata_uris
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1150, in resource_stream
    return get_provider(package_or_requirement).get_resource_stream(
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1399, in get_resource_stream
    return io.BytesIO(self.get_resource_string(manager, resource_name))
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1402, in get_resource_string
    return self._get(self._fn(self.module_path, resource_name))
  File "/home/gannascus/miniconda3/envs/ystafdb-pr6/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1570, in _get
    return self.loader.get_data(path)
  File "<frozen zipimport>", line 179, in get_data
OSError: [Errno 0] : 'ystafdb/../input_data/publications.csv'

Copy link
Contributor

@tngTUDOR tngTUDOR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if the config.json file is totally specific to the yale db we're parsing here, there is little need for the user to know about this file.
I suggest we keep the config.json as a "package" resource, and load it from there, so that the cli can be used from anywhere in the filesystem.
From my understanding of the usage of this repo / package, it is mostly for the cli that it exists. Cli is first class citizen, and keeping the config.json inside the package seems better suited because you wouldn't have to clone the repo to use it. Of course, if you think that the user should be able to provide his/her own config.json, then we should open another issue to add this feature as an option to the cli.

Here is a little patch that implements what I suggest (to keep the config.json inside the package).
If you agree @IKnowLogic @kuzeko , I can commit it later.
edit: I removed the patch from the message contents because I pushed a commit (815e98c)

@tngTUDOR
Copy link
Contributor

For the scenario I mention in my previous comment I think that we are trying to forcefully load the data as a "package resource", but this is not the case anymore if we allow the user to supply a directory with the data.

Here are my working assumptions regarding ystafdb:

  • the CSV files from The Yale Stocks and Flows Database are published elsewhere and we don't want to re-distribute them. This is why we don't have them inside the "data" directory of the package.
  • installing the package without the actual repository is the best possible user experience: pip install ystafdb; ystafdb-cli -i input. That is to me the shortest and simplest approach to say I can use the tool (module the name of the options / commands of the cli)

That is why I insist on the location of the config.json and the flow .

@tngTUDOR
Copy link
Contributor

tngTUDOR commented Jun 30, 2020

The following patch can be used to load files from the path, instead of trying to load them as resources.
(it might not be possible to apply it as is. If you are OK with it, I can commit this changes)
edit I removed patch from comment because I pushed a commit (815e98c)

@IKnowLogic
Copy link
Author

IKnowLogic commented Jul 1, 2020

Hi @tngTUDOR , @kuzeko

I agree with request 1. Actually, since we have a repository for each dataset, the config file is not really needed at all, but let's keep it for now.

For request 2 I think you are correct. The only reason we don't have the csv files in the data directory is that we don't want to redistribute the ystafdb dataset. Therefore we might as well omit to load them as a package resource.

If @kuzeko also agrees, you are very welcome to push the changes.

@kuzeko
Copy link
Contributor

kuzeko commented Jul 1, 2020

Looks fine to me. Please proceed :)
Great work you two 👍

@kuzeko kuzeko merged commit e3b86bb into main Jul 1, 2020
@kuzeko kuzeko deleted the cli-argument-improvement branch July 1, 2020 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants