Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added unique naming for output files #22

Merged
merged 9 commits into from
May 6, 2020
Merged

Added unique naming for output files #22

merged 9 commits into from
May 6, 2020

Conversation

salmannotkhan
Copy link
Contributor

used @jawrainey create output directory function using pathlib
using same pathlib to generate unique names now based on file existance in directory
now -n name option is compulsary
just bit more cleaned code using docstring comments thanks @jawrainey for suggestion

now can use custom naming using -n  parameter
-n [column_name]
Renamed "csv2docx" to "src"
now code looks clean
Using @jawrainey's pathlib solution for checking and creating directory
Changed logic to name files now it is based on availability of file
Now file name is required
Overall Cleanup
Merge branch 'master' of github.com:salmannotkhan/csv2docx
now using relative import in cli
@salmannotkhan
Copy link
Contributor Author

salmannotkhan commented May 5, 2020

Sorry for messy commits. I didn't knew how to clean up commits :)

@jawrainey
Copy link
Collaborator

jawrainey commented May 5, 2020

Please address my comments in the code review and the following issues identified by flake8, then we can discuss it further:

Install flake8 locally (DO NOT COMMIT THIS!!):

poetry add --dev flake8

Use the python virtual environment

poetry shell

Run flake8 from the command line:

flake8

You'll see the following issues identified:

./csv2docx/csv2docx.py:5:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:17:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:29:62: E202 whitespace before ')'
./csv2docx/csv2docx.py:33:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:34:10: E211 whitespace before '('
./csv2docx/csv2docx.py:39:25: E711 comparison to None should be 'if cond is not None:'
./csv2docx/csv2docx.py:40:17: E117 over-indented
./csv2docx/csv2docx.py:51:18: E211 whitespace before '('
./csv2docx/csv2docx.py:60:35: E203 whitespace before ':'
./csv2docx/csv2docx.py:64:33: W292 no newline at end of file
./csv2docx/cli.py:27:1: W191 indentation contains tabs
./csv2docx/cli.py:27:1: E101 indentation contains mixed spaces and tabs
./csv2docx/cli.py:27:2: E117 over-indented

Rather than me write an in-depth code review, the above are all important concerns that you should address, especially E711.

Create one more commit addressing the above and give it the message "Fixed Flake8 issues."

print ("Getting .docx template and .csv data files ...")

with open(data, 'rt') as csvfile:
csvdict = csv.DictReader(csvfile, delimiter=delimiter)
csv_headers = csvdict.fieldnames

if (custom_name != None and custom_name not in csv_headers):
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@salmannotkhan - This is E711 as identified by flake8. You can write custom_name instead of custom_name != None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we made custom_name required, we don't need that condition.

Copy link
Collaborator

@jawrainey jawrainey May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point -- the IF statement needs to stay of course due to checking if the argument exists in the csv 👍

Comment on lines 18 to 24
"""Checks whether file with same name exists or not
Args:
filename: the file we want to check
path: the directory we want to check in
Returns:
An unique full path with directory.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good documentation -- well done

Comment on lines 25 to 31
filename += ".docx"
checkpath = path / filename
if checkpath.exists():
# Count available files with same name
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
return f"{path}/{checkpath.stem}_{counter}.docx"
return checkpath
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very tidy and excellent solution, especially the use of both glob and pathlib.

I wonder if instead of using two return statements that you could assign checkpath to the f-string in line 30? e.g.

    if checkpath.exists():
        # Count available files with same name
        counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
        checkpath = f"{path}/{checkpath.stem}_{counter}.docx"
    return checkpath

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and other code review suggestions should be in a separate commit alongside the flake8 changes.

csv2docx/cli.py Outdated
@click.option(
'--name', '-n',
required=True,
help='naming scheme for output files.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you capitalise naming -> Naming?

@jawrainey
Copy link
Collaborator

jawrainey commented May 5, 2020

Couple of points for future PRs (we should probably make a standard guideline for PRs), but generally you should:

  1. State what you changed and (most importantly) why;
  2. Explain how we (the reviewers) can test and review your code. I want step-by-step instructions so anyone reading this PR can also test it. In this case, I had to run convert with -n and without -n to ensure it worked in both conditions. Also to change the output parameter.
  3. Reference any other issues or PRs that this PR addresses.

It's recommended to use titles/subtitles to document these 3 areas. Worth referring to this in future PRs as it makes code reviewing easier 👍

def convert(data, template, delimiter=";"):
from pathlib import Path

def create_path_if_not_exists(path: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to annotate the return type, e.g. adding pathlib.Path:

def create_path_if_not_exists(path: str) -> pathlib.Path:

path.mkdir(parents=True, exist_ok=True)
return path

def generate_name(filename, path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add type annotations to both the paramters and return type, e.g.

def generate_name(str: filename, pathlib.Path: path) -> pathlib.Path:

# TODO: write to user-defined subfolder
docx.write(f"{counter}.docx")
# Striping every name to remove extra spaces
filename = generate_name(row[custom_name].strip(), path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call .strip() inside generate_name as imho it's cleaner that way since we can more easily understand which params are being sent

Comment on lines 20 to 21
filename: the file we want to check
path: the directory we want to check in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the docstring args to say:

filename: the name of the file to create
path: the path where the file is stored

An unique full path with directory.
"""
filename += ".docx"
checkpath = path / filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename checkpath to filepath because this method returns the path of the file to create

@jawrainey
Copy link
Collaborator

Once you've made the above changes I will have another review and discuss with @davidverweij.

For future reference: code changes made in a PR should not draw from other PRs, e.g. you using the absolute_path method I created in in #18. My reasoning is two-fold: (i) you're attributing your name to my commit/contribution; and (ii) if we want to accept this PR and merge it into master then #18 will have to rebase which is a pain.

You're basing this PR on multiple other issues and PRs (closes #16 #13, #18, #20) which are work in progresses and INDEPENDENT branches. By bringing together all the code from across these independent PRs you have made it: (i) more difficult to test and review; and (ii) muddied the specific issues. Each issue and PR must isolate changes to make testing easier.

Having said that, this PR is an elegant and simple solution to address #20, so from that regard, good job @salmannotkhan. For future reference: if you have an alternative solution to a PR then present it in that PR via comment or gist and explain how it addresses the issue. This will save us opening more PRs for the same issue.

This solution will also close #13 and #18 and requires that we create a new issue for Dave to implement the testing infrastructure.

@jawrainey jawrainey changed the title Improved Unique naming function Added unique naming for output files May 5, 2020
checkpath = path / filename
if checkpath.exists():
# Count available files with same name
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update this to use an f-string instead, e.g.

 counter = len(list(path.glob(f"{checkpath.stem}*docx"))) + 1

path.mkdir(parents=True, exist_ok=True)
return path

def generate_name(filename, path):
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename generate_name to create_unique_name

csv2docx/cli.py Outdated
Comment on lines 26 to 27
def main(data, template, delimiter, name, path):
csv2docx.convert(data, template, delimiter, name, path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change below requires this to be reordered as:

data, template, path, delimiter, name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have put it in order like:
data, template, custom_name, path, delimiter
Because we made custom_name required that doesn't have any default argument
and all non-default argument comes first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good point: name should be renamed to custom_name to mirror the library.

def convert(data, template, delimiter=";"):
from pathlib import Path

def create_path_if_not_exists(path: str):
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename create_path_if_not_exists to: create_output_folder

return f"{path}/{checkpath.stem}_{counter}.docx"
return checkpath

def convert(data, template, delimiter=";", custom_name=None, path="output"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom_name is required so should not have a default argument of None.

Once you update this, the program will not run as custom_name must come before delimiter as it's a required argument.

Update the CLI method to reflect these changes

@salmannotkhan
Copy link
Contributor Author

Thank you for all the suggestion and sorry for all that mess. if you don't know this is my first contribution to a project that's why i don't know things about PR and repo management

@salmannotkhan
Copy link
Contributor Author

Please address my comments in the code review and the following issues identified by flake8, then we can discuss it further:

Install flake8 locally (DO NOT COMMIT THIS!!):

poetry add --dev flake8

Use the python virtual environment

poetry shell

Run flake8 from the command line:

flake8

How can I ignore this in commit?

@jawrainey
Copy link
Collaborator

Thank you for all the suggestion and sorry for all that mess. if you don't know this is my first contribution to a project that's why i don't know things about PR and repo management

No worries. It's great that you're contributing and learning as Dave and I have setup this project both because we would like to use the library but also as a learning experience 👍

How can I ignore this in commit?

Make sure that you do not commit the changes to the pyproject.toml/poetry.lock files once you add the dependencies. If you use the terminal with git then checkout those files or do it through a visual git editor (e.g. Github Desktop) if that's what you use.

If you make the above changes I can review them this evening (am in GMT+1 timezone).

and other improvment from review by @jawrainey
Ignoring vscode config folder
@salmannotkhan
Copy link
Contributor Author

I'm done with PR if your guys think of any new feature let me know please

csv2docx/cli.py Outdated
Comment on lines 26 to 27
def main(data, template, name, delimiter, path):
csv2docx.convert(data, template, name, path, delimiter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change 'main' to accept the same order of parameters as convert? e.g. switch delimiter and path

def convert(data, template, name, path="output", delimiter=";"):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't noticed that

filename: the name of file to create
path: the path where the file is stored
Returns:
An unique full path with directory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the doc of 'Returns' to read:

  • The absolute path to the filename.



def create_unique_name(filename: str, path: Path) -> Path:
"""Checks whether file with same name exists or not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this text (Checks whether ...) to: Creates a filename if it does not exist in the specified path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be more clear I guess
"Creates an unique file name for specified path"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good 👍

docx.merge_templates([single_document], separator='page_break')
# TODO: write to user-defined subfolder
docx.write(f"{counter}.docx")
# Striping every name to remove extra spaces
Copy link
Collaborator

@jawrainey jawrainey May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this comment (# Striping every ... ) as it's left from previous commits 👍

help='Delimiter used in your csv. Default is \';\'')
@click.option(
'--name', '-n',
required=True,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making --name required is a reasonable approach, though would argue we need to allow flexibility of use. I imagine the scenario of generating un-named tickets, numbered flyers, etc - in which case the name of the files does not need to have a relation to the data. I'll expand on this in a separate issue.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add, some columns might be incomplete - which might provide a result the user is not expected. We can have a discussion in the issue #25 on what approach might make more sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point about incomplete cells 👍

@click.option(
'--name', '-n',
required=True,
help='Naming scheme for output files.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can expand on the help by adding:
Specific column name to be used in the naming scheme for output files. '

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - is this case sensitive?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, yes. Let's discuss in #25

Copy link
Owner

@davidverweij davidverweij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @salmannotkhan, I've added a few comments - but these are more related to future Issues. I'll merge these into the master - after which our focus will be to set up unittesting, linting and automating those processes.

@davidverweij davidverweij merged commit 8c658fe into davidverweij:master May 6, 2020
@davidverweij
Copy link
Owner

I forgot to mention, I updated the README to reflect the changes before merging this PR, see 202fdf9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants