-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unpushed updates #36
Unpushed updates #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff!
|
||
for test in legal_tests: | ||
if test["origins"]["citation"] in case_citations: | ||
return [test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this if does not hit? then the function returns None right? so that means you probably need to update your return type hint. Are you running mypy to catch this?
for key, value in firac.items(): | ||
if key == "heading": | ||
continue | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could skip this else after a continue
sentences = list(doc.sents) | ||
|
||
new_file_path = file_path.split(".") | ||
del new_file_path[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to delete this element? what about file_path.split(".")[:-1]
like you did below?
|
||
new_file_path = file_path.split(".") | ||
del new_file_path[-1] | ||
new_file_path.append("sent.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just replacing the last item? new_file_path[-1] = "sent.txt"
?
|
||
with open(new_file_path, "w", encoding="utf-8") as file: | ||
for sentence in sentences: | ||
file.write(sentence.text + "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also use .writelines()
# Each FIRAC key contains a list of sentences. Go through each list and | ||
# remove \n characters. Then, join the sentences into a single string. | ||
for key in firac: | ||
firac[key] = " ".join([sentence.replace("\n", " ") for sentence in firac[key]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the inner []
, join() is "lazy" :)
|
||
# Check to see if a file copy exists. If not, create one. | ||
if not os.path.exists(file_path): | ||
with open(file_path, "w", encoding="utf-8") as file: | ||
if not os.path.exists(new_file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often prefer an early return so you could do
if os.path.exists(new_file_path):
print and return
|
||
# Write a local copy of the text file | ||
file_name = os.path.basename(file) | ||
file_name = os.path.basename(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now, but pathlib is nicer :)
if total_tokens < min_length: | ||
percentage = 1 | ||
elif total_tokens > max_length: | ||
percentage = max_length / total_tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the if on L118 discards a zero division exception here right?
text: str, | ||
percentage: float = 0.2, | ||
min_length: int = 500, | ||
max_length: int = 1500, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put those "magic numbers" in constants at the top of the module
No description provided.