Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added unique naming for output files #22

Merged
merged 9 commits into from
May 6, 2020
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -128,10 +128,12 @@ dmypy.json
# Pyre type checker
.pyre/


# output by script, except the example files
*.docx
!example.docx
!example.csv


# vim files
*.swp
14 changes: 11 additions & 3 deletions csv2docx/cli.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import click
import csv2docx.csv2docx as c2d
from . import csv2docx


@click.command()
Expand All @@ -15,5 +15,13 @@
'--delimiter', '-d',
default=";",
help='delimiter used in your csv. Default is \';\'')
def main(data, template, delimiter):
c2d.convert(data, template, delimiter)
@click.option(
'--name', '-n',
required=True,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making --name required is a reasonable approach, though would argue we need to allow flexibility of use. I imagine the scenario of generating un-named tickets, numbered flyers, etc - in which case the name of the files does not need to have a relation to the data. I'll expand on this in a separate issue.

Copy link
Owner

@davidverweij davidverweij May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add, some columns might be incomplete - which might provide a result the user is not expected. We can have a discussion in the issue #25 on what approach might make more sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point about incomplete cells 👍

help='naming scheme for output files.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you capitalise naming -> Naming?

@click.option(
'--path', '-p',
default="output",
help='The location to store the files.')
def main(data, template, delimiter, name, path):
csv2docx.convert(data, template, delimiter, name, path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change below requires this to be reordered as:

data, template, path, delimiter, name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have put it in order like:
data, template, custom_name, path, delimiter
Because we made custom_name required that doesn't have any default argument
and all non-default argument comes first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good point: name should be renamed to custom_name to mirror the library.

44 changes: 39 additions & 5 deletions csv2docx/csv2docx.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,44 @@
import csv
from mailmerge import MailMerge
def convert(data, template, delimiter=";"):
from pathlib import Path

def create_path_if_not_exists(path: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to annotate the return type, e.g. adding pathlib.Path:

def create_path_if_not_exists(path: str) -> pathlib.Path:

Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename create_path_if_not_exists to: create_output_folder

"""Creates a path to store output data if it does not exists.
Args:
path: the path from user in any format (relative, absolute, etc.)
Returns:
A path to store output data.
"""
path = Path(path)
if not path.exists():
path.mkdir(parents=True, exist_ok=True)
return path

def generate_name(filename, path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add type annotations to both the paramters and return type, e.g.

def generate_name(str: filename, pathlib.Path: path) -> pathlib.Path:

Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename generate_name to create_unique_name

"""Checks whether file with same name exists or not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this text (Checks whether ...) to: Creates a filename if it does not exist in the specified path.

Copy link
Contributor Author

@salmannotkhan salmannotkhan May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be more clear I guess
"Creates an unique file name for specified path"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good 👍

Args:
filename: the file we want to check
path: the directory we want to check in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the docstring args to say:

filename: the name of the file to create
path: the path where the file is stored

Returns:
An unique full path with directory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the doc of 'Returns' to read:

  • The absolute path to the filename.

"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good documentation -- well done

filename += ".docx"
checkpath = path / filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename checkpath to filepath because this method returns the path of the file to create

if checkpath.exists():
# Count available files with same name
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update this to use an f-string instead, e.g.

 counter = len(list(path.glob(f"{checkpath.stem}*docx"))) + 1

return f"{path}/{checkpath.stem}_{counter}.docx"
return checkpath
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very tidy and excellent solution, especially the use of both glob and pathlib.

I wonder if instead of using two return statements that you could assign checkpath to the f-string in line 30? e.g.

    if checkpath.exists():
        # Count available files with same name
        counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
        checkpath = f"{path}/{checkpath.stem}_{counter}.docx"
    return checkpath

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and other code review suggestions should be in a separate commit alongside the flake8 changes.


def convert(data, template, delimiter=";", custom_name=None, path="output"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom_name is required so should not have a default argument of None.

Once you update this, the program will not run as custom_name must come before delimiter as it's a required argument.

Update the CLI method to reflect these changes

print ("Getting .docx template and .csv data files ...")

with open(data, 'rt') as csvfile:
csvdict = csv.DictReader(csvfile, delimiter=delimiter)
csv_headers = csvdict.fieldnames

if (custom_name != None and custom_name not in csv_headers):
Copy link
Collaborator

@jawrainey jawrainey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@salmannotkhan - This is E711 as identified by flake8. You can write custom_name instead of custom_name != None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we made custom_name required, we don't need that condition.

Copy link
Collaborator

@jawrainey jawrainey May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point -- the IF statement needs to stay of course due to checking if the argument exists in the csv 👍

print("Column name not found. Please enter valid column name")
exit()
docx = MailMerge(template)
docx_mergefields = docx.get_merge_fields()

Expand All @@ -20,11 +52,13 @@ def convert(data, template, delimiter=";"):
return

print("All fields are present in your csv. Generating Word docs ...")
path = create_path_if_not_exists(path)

for counter, row in enumerate(csvdict):
for row in csvdict:
# Must create a new MailMerge for each file
docx = MailMerge(template)
single_document = {key : row[key] for key in docx_mergefields}
docx.merge_templates([single_document], separator='page_break')
# TODO: write to user-defined subfolder
docx.write(f"{counter}.docx")
# Striping every name to remove extra spaces
Copy link
Collaborator

@jawrainey jawrainey May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this comment (# Striping every ... ) as it's left from previous commits 👍

filename = generate_name(row[custom_name].strip(), path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call .strip() inside generate_name as imho it's cleaner that way since we can more easily understand which params are being sent

docx.write(filename)