Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pending] pyDKB: common/hdfs.py documentation #178

Open
wants to merge 6 commits into
base: pyDKB-master-merge
Choose a base branch
from

Conversation

anastasiakaida
Copy link
Contributor

@anastasiakaida anastasiakaida commented Sep 20, 2018

Changed according to the template.

Note: there are still some empty :return: and :rtype: lines when
a function don't return anything explicitly.


Waits for #165

mgolosova and others added 6 commits September 4, 2018 13:01
 Conflicts:
	Utils/Dataflow/091_datasetsRucio/datasets_processing.py
Added:
parameters' names and types in each function
return type and object description
Note: for some functions :return: and :rtype: parameters
are empty (no return).

It can be fixed with one more commit if needed.
@mgolosova mgolosova changed the title pyDKB: common/hdfs.py documentation [pending] pyDKB: common/hdfs.py documentation Nov 13, 2018
@mgolosova
Copy link
Collaborator

Not reviewed yet as #165 should go first.

Copy link
Collaborator

@mgolosova mgolosova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments:

  1. Would be good to add information about exceptions that this or that function rises (see :raises: here). As some functions return nothing, it is the only way their results should be checked.
  2. To see what a function returns, if it returns nothing, you could try following:
    >>> def f(): pass
    ... 
    >>> print f()
    None
    >>> if f() is None: print "None it is."
    ... 
    None it is.
    
  3. I believe git client(s) have settings at least to detect trailing whitespaces in commits; and you can always run git diff --staged before git commit to make sure you`re commiting what was meant to be commited. It is a good practice and, in fact, item 3 of this guide clearly states: "Review changes before every commit". It was not written for my own graphimaniac pleasure, trust me ;)

@@ -1,5 +1,6 @@
"""
Utils to interact with HDFS.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a new line?

:param max_lines: maximum quantity of lines
:type max_lines: number (int)

:return: subprocess return code, set by poll()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return code is not set by poll() -- it is returned by the subprocess, and can be accessed by poll().
I believe it will be Ok to simply say that it is a "subprocess return code".


Return value: file name (without directory)

:param fname: file name of a downloaded file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I`d rather say "path to file", then "file name".

:param fname: file name of a downloaded file
:type fname: string

:return: file name without its directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Without path"?


Return value: open file object (TemporaryFile).

:param fname: name of a temporary file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahem.. please check again ;)

:param fname: name of a temporary file
:type fname: string

:return: open file object (TemporaryFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or None in case of error.

:param mode: type of a list of files
:type mode: string

:return: a list of files' names/subdirectories inside HDFS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in HDFS directory.
"Inside HDFS" looks a bit too broad and can make one to believe it is about all files/subdirectories.

""" Return file name without path.

:param path: path to a file
:type path: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If path is None, it is taken as empty string -- I believe it should be mentioned somewhere...

""" Return dirname without filename.

:param path: path to a file
:type path: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above about None.

:param path: path to a file
:type path: string
:param filename: file name
:type filename: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above about None.

Create directory and check status of process
with subprocess.Popen.

:param dirname: a name of a created directory
Copy link
Contributor

@Evildoor Evildoor Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "name of the directory" or "directory name" or something else?

  1. "a" before "name" - similar instances of "name" in this file come without "a".
  2. "a" before "directory" - it's not some directory, it's a specific directory which was already mentioned in lines 52 and 54. By the way, in those lines it can be "create a directory".
  3. "created" - we are talking about arguments, so nothing was created yet. "to create" or "to be created" can be used if necessary.

Get file from HDFS and check status of process with
subprocess.Popen.

:param fname: path of a downloaded file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Path to the file (to download/to be downloaded)". Apart from the past tense in function arguments (see comment to line 57, point 3), it is possible to read this as "path to the local copy of the downloaded file" - which is incorrect, if I understand correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants