Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpushed updates #36

Merged
merged 1 commit into from
Mar 18, 2023
Merged

Unpushed updates #36

merged 1 commit into from
Mar 18, 2023

Conversation

646e62
Copy link
Owner

@646e62 646e62 commented Mar 18, 2023

No description provided.

@646e62 646e62 merged commit 6b8250c into main Mar 18, 2023
@646e62 646e62 deleted the analytic-functions branch March 18, 2023 00:09
Copy link
Collaborator

@bbelderbos bbelderbos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff!


for test in legal_tests:
if test["origins"]["citation"] in case_citations:
return [test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this if does not hit? then the function returns None right? so that means you probably need to update your return type hint. Are you running mypy to catch this?

for key, value in firac.items():
if key == "heading":
continue
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could skip this else after a continue

sentences = list(doc.sents)

new_file_path = file_path.split(".")
del new_file_path[-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to delete this element? what about file_path.split(".")[:-1] like you did below?


new_file_path = file_path.split(".")
del new_file_path[-1]
new_file_path.append("sent.txt")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just replacing the last item? new_file_path[-1] = "sent.txt"?


with open(new_file_path, "w", encoding="utf-8") as file:
for sentence in sentences:
file.write(sentence.text + "\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also use .writelines()

# Each FIRAC key contains a list of sentences. Go through each list and
# remove \n characters. Then, join the sentences into a single string.
for key in firac:
firac[key] = " ".join([sentence.replace("\n", " ") for sentence in firac[key]])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the inner [], join() is "lazy" :)


# Check to see if a file copy exists. If not, create one.
if not os.path.exists(file_path):
with open(file_path, "w", encoding="utf-8") as file:
if not os.path.exists(new_file_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often prefer an early return so you could do

if os.path.exists(new_file_path):
    print and return


# Write a local copy of the text file
file_name = os.path.basename(file)
file_name = os.path.basename(file_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now, but pathlib is nicer :)

if total_tokens < min_length:
percentage = 1
elif total_tokens > max_length:
percentage = max_length / total_tokens
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if on L118 discards a zero division exception here right?

text: str,
percentage: float = 0.2,
min_length: int = 500,
max_length: int = 1500,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put those "magic numbers" in constants at the top of the module

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants