Bit of a different way to filter out non-sentences from get_average_sentence_length #2

Open
jmcrey opened this issue Dec 18, 2018 · 0 comments


jmcrey commented Dec 18, 2018

def get_average_sentence_length(text):
    #standardize the sentence endings, to facilitate turning them into strings
    remove_exclamations = text.replace("!", ".")
    remove_questions = remove_exclamations.replace("?", ".")
    
    #splits each sentence into a string
    text_in_strings = remove_questions.split(".")
    
    #count the number of sentences in each statement, subtracting one as the final period counts as an extra sentence
    num_sentences = len(text_in_strings)
    num_sentences = (num_sentences-1)
    
    #split the sentences into words
    words_in_strings = text.split(" ")
    num_words = len(words_in_strings)
    
    #find the average number of words per sentence
    average_words_per_sentence = num_words / num_sentences
    print(average_words_per_sentence)
    return average_words_per_sentence

Great job with this function! It returns the exact output we want, and it corrects for the bug in the code where there is a trailing space at the end of the sentence. However, the correction we have (always subtracting one from our sentence count) does not handle two cases: 1) if there are multiple sentences that have a trailing space at the end, and 2) if there is a sentence that does not have a trailing space. In both of those cases, our function will no longer calculate the correct count -- it will either overcompensate or undercompensate.
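To make those failure modes concrete, here is a small sketch that isolates the counting step from the function above. The helper name count_by_subtracting_one and the sample inputs are my own assumptions for illustration; they are not part of the project.

```python
def count_by_subtracting_one(text):
    # Hypothetical helper: same counting logic as the original function,
    # pulled out on its own so the sentence count is easy to inspect.
    normalized = text.replace("!", ".").replace("?", ".")
    return len(normalized.split(".")) - 1


# Ends with a single period: one empty piece, so subtracting one works.
print(count_by_subtracting_one("One. Two."))      # 2

# No final period: there is nothing extra to subtract, so we undercount.
print(count_by_subtracting_one("One. Two"))       # 1

# Repeated punctuation: several empty pieces, so we overcount.
print(count_by_subtracting_one("Wait... What?"))  # 4
```

The last two inputs each contain two sentences, but the fixed "minus one" adjustment reports 1 and 4.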

So, I just wanted to show a slightly different way to achieve the same goal:

stripped = [sentence for sentence in text_in_strings if sentence.strip()]
num_sentences = len(stripped)

Okay, let's break this down. So, the line [sentence for sentence in text_in_strings if sentence.strip()] can be rewritten like so:

stripped = list()
for sentence in text_in_strings:
    if sentence.strip():
        stripped.append(sentence)

The way I have written it above is called a list comprehension -- it is simply a way to write a for loop in a single line that builds a list from any iterable. There are a few benefits to using list comprehensions; but, for all intents and purposes, it is simply a single-line for loop.

Basically, all the list comprehension is doing is getting rid of any string that does not contain characters. It does this by combining an if statement with strip(). Note that strip() will remove any extra white space at the beginning and end of any string; thus, if the string only has white space, then strip() will return an empty string, which Python treats as False in an if statement. As such, we can use an if statement to test if sentence really has characters.
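If you want to check for yourself what strip() returns for a whitespace-only string and how an if statement treats that value, a quick experiment like this one should do it (the sample strings are just made up for illustration):

```python
# strip() on a whitespace-only string leaves an empty string behind.
blank = "   ".strip()
print(repr(blank))   # ''

# An empty string is falsy, so `if blank:` would skip its body.
print(bool(blank))   # False

# A string with real characters is still non-empty after stripping.
kept = " Hello ".strip()
print(repr(kept))    # 'Hello'
print(bool(kept))    # True
```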

An advantage of implementing it this way is that it will catch any sentence that has a trailing white space. Also, this method allows us to further condense our implementation by combining the len function with the list comprehension:

num_sentences = len([sentence for sentence in text_in_strings if sentence.strip()])
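Putting the pieces together, the whole function might look something like the sketch below. Two things here are my own assumptions rather than part of the original: I count words with text.split() (no argument) so that repeated or trailing spaces do not produce empty "words", and I return 0.0 when no sentences are found so we never divide by zero.

```python
def get_average_sentence_length(text):
    # Standardize the sentence endings so we can split on a single character.
    normalized = text.replace("!", ".").replace("?", ".")

    # Keep only the pieces that still contain characters after stripping.
    sentences = [s for s in normalized.split(".") if s.strip()]

    # Assumption: text with no sentences has an average length of 0.0.
    if not sentences:
        return 0.0

    # Assumption: split() with no argument splits on any run of white space.
    num_words = len(text.split())
    return num_words / len(sentences)


print(get_average_sentence_length("One two. Three four."))  # 2.0
print(get_average_sentence_length("One two. Three four"))   # 2.0
```

Both calls now agree, whether or not the text ends with a period.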

In any case, for the purposes of this project, the original implementation is perfectly fine! In fact, it is absolutely perfect for our use case. This is merely a suggestion and an introduction to list comprehension.

P.S. If you want to learn more about list comprehension, I recommend reading this article.

@jmcrey jmcrey mentioned this issue Dec 18, 2018