Added unique naming for output files #22

Merged
merged 9 commits into from
May 6, 2020
5 changes: 5 additions & 0 deletions .gitignore
@@ -128,10 +128,15 @@ dmypy.json
# Pyre type checker
.pyre/


# output by script, except the example files
*.docx
!example.docx
!example.csv


# vim files
*.swp

# vscode files
.vscode
16 changes: 12 additions & 4 deletions csv2docx/cli.py
@@ -1,5 +1,5 @@
import click
import csv2docx.csv2docx as c2d
from . import csv2docx


@click.command()
@@ -14,6 +14,14 @@
@click.option(
'--delimiter', '-d',
default=";",
help='delimiter used in your csv. Default is \';\'')
def main(data, template, delimiter):
c2d.convert(data, template, delimiter)
help='Delimiter used in your csv. Default is \';\'')
@click.option(
'--name', '-n',
required=True,
Owner

I think making --name required is a reasonable approach, though I would argue we need to allow for flexible use. Imagine generating unnamed tickets, numbered flyers, etc., where the file names need not relate to the data. I'll expand on this in a separate issue.

Owner

@davidverweij, May 6, 2020

Just to add: some columns might be incomplete, which might produce a result the user does not expect. We can discuss in issue #25 which approach makes more sense.

Collaborator

Great point about incomplete cells 👍

help='Naming scheme for output files.')
Owner

Perhaps we can expand on the help text by adding:
'Specific column name to be used in the naming scheme for output files.'

Owner

Also - is this case sensitive?

Collaborator

At the moment, yes. Let's discuss in #25

@click.option(
'--path', '-p',
default="output",
help='The location to store the files.')
def main(data, template, name, path, delimiter):
csv2docx.convert(data, template, name, path, delimiter)
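
The review threads above raise two open questions about `--name`: the lookup is currently case sensitive, and some cells in the chosen column may be empty. The following is a minimal sketch of one way both could be handled, not the approach agreed in #25. The helpers `resolve_column` and `output_stem`, and the `document_<index>` fallback, are hypothetical names introduced only for illustration.

```python
from typing import Optional, Sequence


def resolve_column(name: str, headers: Sequence[str]) -> Optional[str]:
    """Return the header that matches `name`, ignoring case and outer whitespace.

    Hypothetical helper: returns None when no header matches.
    """
    wanted = name.strip().lower()
    for header in headers:
        if header.strip().lower() == wanted:
            # Return the header exactly as it appears in the csv,
            # so it can be used as a key into each row.
            return header
    return None


def output_stem(row: dict, column: str, index: int) -> str:
    """Return the cell value as a file stem, or a numbered fallback if it is empty.

    The "document_<index>" fallback is an assumption, not project behaviour.
    """
    value = (row.get(column) or "").strip()
    return value if value else f"document_{index}"
```

With this shape, `convert` could resolve the column once before the loop and then call `output_stem` per row, so an incomplete cell yields a predictable name instead of a file called `.docx`.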
52 changes: 44 additions & 8 deletions csv2docx/csv2docx.py
@@ -1,12 +1,47 @@
import csv
from mailmerge import MailMerge
def convert(data, template, delimiter=";"):
print ("Getting .docx template and .csv data files ...")
from pathlib import Path


def create_output_folder(path: str) -> Path:
    """Creates a path to store output data if it does not exist.
    Args:
        path: the path from user in any format (relative, absolute, etc.)
    Returns:
        A path to store output data.
    """
    path = Path(path)
    if not path.exists():
        path.mkdir(parents=True, exist_ok=True)
    return path


def create_unique_name(filename: str, path: Path) -> Path:
    """Creates a unique filename for specified path.
    Args:
        filename: the name of file to create
        path: the path where the file is stored
    Returns:
        The path of the output file inside the given directory.
    """
    filename = filename.strip() + ".docx"
    filepath = path / filename
    if filepath.exists():
        # Count available files with same name
        counter = len(list(path.glob(f"{filepath.stem}*docx"))) + 1
        # Build the result with Path so the return type matches the annotation
        filepath = path / f"{filepath.stem}_{counter}.docx"
    return filepath
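
A note on the counting approach in `create_unique_name`: the number of files matching the glob can drift from the suffixes already in use, for example when an unrelated file shares the same prefix or an earlier numbered file was deleted. Below is a minimal alternative sketch that probes candidate names until one is free. The function name `next_free_path` and its `suffix` parameter are hypothetical and not part of this PR.

```python
from pathlib import Path


def next_free_path(stem: str, folder: Path, suffix: str = ".docx") -> Path:
    """Return a path in `folder` that does not exist yet.

    Tries "<stem><suffix>" first, then "<stem>_1<suffix>", "<stem>_2<suffix>", ...
    """
    stem = stem.strip()
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    # Probe the filesystem instead of counting glob matches, so files that
    # merely share a prefix do not influence the chosen number.
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

This trades one `glob` call for a few `exists` checks. It is still not safe against two processes writing to the same folder at the same time.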


def convert(data, template, name, path="output", delimiter=";"):
print("Getting .docx template and .csv data files ...")

with open(data, 'rt') as csvfile:
csvdict = csv.DictReader(csvfile, delimiter=delimiter)
csv_headers = csvdict.fieldnames

if (name not in csv_headers):
print("Column name not found. Please enter valid column name")
exit()
docx = MailMerge(template)
docx_mergefields = docx.get_merge_fields()

@@ -16,15 +51,16 @@ def convert(data, template, delimiter=";"):
# see if all fields are accounted for in the .csv header
column_in_data = set(docx_mergefields) - set(csv_headers)
if len(column_in_data) > 0:
print (f"{column_in_data} is in the word document, but not csv.")
print(f"{column_in_data} is in the word document, but not csv.")
return

print("All fields are present in your csv. Generating Word docs ...")
path = create_output_folder(path)

for counter, row in enumerate(csvdict):
for row in csvdict:
# Must create a new MailMerge for each file
docx = MailMerge(template)
single_document = {key : row[key] for key in docx_mergefields}
single_document = {key: row[key] for key in docx_mergefields}
docx.merge_templates([single_document], separator='page_break')
# TODO: write to user-defined subfolder
docx.write(f"{counter}.docx")
filename = create_unique_name(row[name], path)
docx.write(filename)
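
In the diff, `convert` handles a missing `--name` column with `exit()` and missing merge fields with a bare `return`, so a caller cannot tell the two failures apart. As a hedged sketch only, the two checks could live in one helper that raises an exception instead. `check_headers` is a hypothetical name, and using `ValueError` is an assumption rather than a project decision.

```python
import csv
import io


def check_headers(csvfile, name, merge_fields, delimiter=";"):
    """Return a csv.DictReader, or raise ValueError if required columns are missing.

    Hypothetical helper mirroring the two checks in `convert`.
    """
    reader = csv.DictReader(csvfile, delimiter=delimiter)
    headers = reader.fieldnames or []  # fieldnames is None for an empty file
    if name not in headers:
        raise ValueError(f"Column {name!r} not found in csv headers {headers}")
    missing = sorted(set(merge_fields) - set(headers))
    if missing:
        raise ValueError(f"Merge fields missing from csv: {missing}")
    return reader


if __name__ == "__main__":
    # In-memory csv standing in for the user's data file.
    sample = io.StringIO("Name;City\nAda;London\n")
    for row in check_headers(sample, "Name", {"Name", "City"}):
        print(row["Name"], row["City"])
```

The CLI entry point could then catch the exception and print the message, which keeps `convert` usable as a library function.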