-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added unique naming for output files #22
Conversation
now can use custom naming using -n parameter -n [column_name]
Renamed "csv2docx" to "src" now code looks clean
Using @jawrainey's pathlib solution for checking and creating directory Changed logic to name files now it is based on availability of file Now file name is required Overall Cleanup
Merge branch 'master' of github.com:salmannotkhan/csv2docx
now using relative import in cli
Sorry for messy commits. I didn't knew how to clean up commits :) |
Please address my comments in the code review and the following issues identified by Install flake8 locally (DO NOT COMMIT THIS!!):
Use the python virtual environment
Run flake8 from the command line:
You'll see the following issues identified:
Rather than me write an in-depth code review, the above are all important concerns that you should address, especially E711. Create one more commit addressing the above and give it the message "Fixed Flake8 issues." |
csv2docx/csv2docx.py
Outdated
print ("Getting .docx template and .csv data files ...") | ||
|
||
with open(data, 'rt') as csvfile: | ||
csvdict = csv.DictReader(csvfile, delimiter=delimiter) | ||
csv_headers = csvdict.fieldnames | ||
|
||
if (custom_name != None and custom_name not in csv_headers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salmannotkhan - This is E711 as identified by flake8. You can write custom_name
instead of custom_name != None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that we made custom_name required, we don't need that condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point -- the IF statement needs to stay of course due to checking if the argument exists in the csv 👍
csv2docx/csv2docx.py
Outdated
"""Checks whether file with same name exists or not | ||
Args: | ||
filename: the file we want to check | ||
path: the directory we want to check in | ||
Returns: | ||
An unique full path with directory. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good documentation -- well done
csv2docx/csv2docx.py
Outdated
filename += ".docx" | ||
checkpath = path / filename | ||
if checkpath.exists(): | ||
# Count available files with same name | ||
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1 | ||
return f"{path}/{checkpath.stem}_{counter}.docx" | ||
return checkpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very tidy and excellent solution, especially the use of both glob
and pathlib
.
I wonder if instead of using two return statements that you could assign checkpath to the f-string in line 30? e.g.
if checkpath.exists():
# Count available files with same name
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
checkpath = f"{path}/{checkpath.stem}_{counter}.docx"
return checkpath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and other code review suggestions should be in a separate commit alongside the flake8 changes.
csv2docx/cli.py
Outdated
@click.option( | ||
'--name', '-n', | ||
required=True, | ||
help='naming scheme for output files.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you capitalise naming -> Naming?
Couple of points for future PRs (we should probably make a standard guideline for PRs), but generally you should:
It's recommended to use titles/subtitles to document these 3 areas. Worth referring to this in future PRs as it makes code reviewing easier 👍 |
csv2docx/csv2docx.py
Outdated
def convert(data, template, delimiter=";"): | ||
from pathlib import Path | ||
|
||
def create_path_if_not_exists(path: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to annotate the return type, e.g. adding pathlib.Path
:
def create_path_if_not_exists(path: str) -> pathlib.Path:
csv2docx/csv2docx.py
Outdated
path.mkdir(parents=True, exist_ok=True) | ||
return path | ||
|
||
def generate_name(filename, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add type annotations to both the paramters and return type, e.g.
def generate_name(str: filename, pathlib.Path: path) -> pathlib.Path:
csv2docx/csv2docx.py
Outdated
# TODO: write to user-defined subfolder | ||
docx.write(f"{counter}.docx") | ||
# Striping every name to remove extra spaces | ||
filename = generate_name(row[custom_name].strip(), path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call .strip()
inside generate_name
as imho it's cleaner that way since we can more easily understand which params are being sent
csv2docx/csv2docx.py
Outdated
filename: the file we want to check | ||
path: the directory we want to check in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the docstring args to say:
filename: the name of the file to create
path: the path where the file is stored
csv2docx/csv2docx.py
Outdated
An unique full path with directory. | ||
""" | ||
filename += ".docx" | ||
checkpath = path / filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename checkpath
to filepath
because this method returns the path of the file to create
Once you've made the above changes I will have another review and discuss with @davidverweij. For future reference: code changes made in a PR should not draw from other PRs, e.g. you using the You're basing this PR on multiple other issues and PRs (closes #16 #13, #18, #20) which are work in progresses and INDEPENDENT branches. By bringing together all the code from across these independent PRs you have made it: (i) more difficult to test and review; and (ii) muddied the specific issues. Each issue and PR must isolate changes to make testing easier. Having said that, this PR is an elegant and simple solution to address #20, so from that regard, good job @salmannotkhan. For future reference: if you have an alternative solution to a PR then present it in that PR via comment or gist and explain how it addresses the issue. This will save us opening more PRs for the same issue. This solution will also close #13 and #18 and requires that we create a new issue for Dave to implement the testing infrastructure. |
csv2docx/csv2docx.py
Outdated
checkpath = path / filename | ||
if checkpath.exists(): | ||
# Count available files with same name | ||
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also update this to use an f-string instead, e.g.
counter = len(list(path.glob(f"{checkpath.stem}*docx"))) + 1
csv2docx/csv2docx.py
Outdated
path.mkdir(parents=True, exist_ok=True) | ||
return path | ||
|
||
def generate_name(filename, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename generate_name
to create_unique_name
csv2docx/cli.py
Outdated
def main(data, template, delimiter, name, path): | ||
csv2docx.convert(data, template, delimiter, name, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed change below requires this to be reordered as:
data, template, path, delimiter, name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have put it in order like:
data, template, custom_name, path, delimiter
Because we made custom_name required that doesn't have any default argument
and all non-default argument comes first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good point: name
should be renamed to custom_name
to mirror the library.
csv2docx/csv2docx.py
Outdated
def convert(data, template, delimiter=";"): | ||
from pathlib import Path | ||
|
||
def create_path_if_not_exists(path: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also rename create_path_if_not_exists
to: create_output_folder
csv2docx/csv2docx.py
Outdated
return f"{path}/{checkpath.stem}_{counter}.docx" | ||
return checkpath | ||
|
||
def convert(data, template, delimiter=";", custom_name=None, path="output"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
custom_name is required so should not have a default argument of None.
Once you update this, the program will not run as custom_name must come before delimiter as it's a required argument.
Update the CLI method to reflect these changes
Thank you for all the suggestion and sorry for all that mess. if you don't know this is my first contribution to a project that's why i don't know things about PR and repo management |
How can I ignore this in commit? |
No worries. It's great that you're contributing and learning as Dave and I have setup this project both because we would like to use the library but also as a learning experience 👍
Make sure that you do not commit the changes to the If you make the above changes I can review them this evening (am in GMT+1 timezone). |
and other improvment from review by @jawrainey Ignoring vscode config folder
I'm done with PR if your guys think of any new feature let me know please |
csv2docx/cli.py
Outdated
def main(data, template, name, delimiter, path): | ||
csv2docx.convert(data, template, name, path, delimiter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change 'main' to accept the same order of parameters as convert
? e.g. switch delimiter and path
def convert(data, template, name, path="output", delimiter=";"):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't noticed that
csv2docx/csv2docx.py
Outdated
filename: the name of file to create | ||
path: the path where the file is stored | ||
Returns: | ||
An unique full path with directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the doc of 'Returns' to read:
The absolute path to the filename.
csv2docx/csv2docx.py
Outdated
|
||
|
||
def create_unique_name(filename: str, path: Path) -> Path: | ||
"""Checks whether file with same name exists or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this text (Checks whether ...
) to: Creates a filename if it does not exist in the specified path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be more clear I guess
"Creates an unique file name for specified path"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds good 👍
csv2docx/csv2docx.py
Outdated
docx.merge_templates([single_document], separator='page_break') | ||
# TODO: write to user-defined subfolder | ||
docx.write(f"{counter}.docx") | ||
# Striping every name to remove extra spaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment (# Striping every ...
) as it's left from previous commits 👍
help='Delimiter used in your csv. Default is \';\'') | ||
@click.option( | ||
'--name', '-n', | ||
required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making --name
required is a reasonable approach, though would argue we need to allow flexibility of use. I imagine the scenario of generating un-named tickets, numbered flyers, etc - in which case the name of the files does not need to have a relation to the data. I'll expand on this in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to add, some columns might be incomplete - which might provide a result the user is not expected. We can have a discussion in the issue #25 on what approach might make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point about incomplete cells 👍
@click.option( | ||
'--name', '-n', | ||
required=True, | ||
help='Naming scheme for output files.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can expand on the help by adding:
Specific column name to be used in the naming scheme for output files. '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - is this case sensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, yes. Let's discuss in #25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @salmannotkhan, I've added a few comments - but these are more related to future Issues. I'll merge these into the master - after which our focus will be to set up unittesting, linting and automating those processes.
I forgot to mention, I updated the README to reflect the changes before merging this PR, see 202fdf9. |
used @jawrainey create output directory function using pathlib
using same pathlib to generate unique names now based on file existance in directory
now -n name option is compulsary
just bit more cleaned code using docstring comments thanks @jawrainey for suggestion