Bit of a different way to filter out non-sentences from get_average_sentence_length #2

jmcrey · 2018-12-18T01:13:21Z

def get_average_sentence_length(text):
    #standardize the sentence endings, to facilitate turning them into strings
    remove_exclamations = text.replace("!", ".")
    remove_questions = remove_exclamations.replace("?", ".")
    
    #splits each sentence into a string
    text_in_strings = remove_questions.split(".")
    
    #count the number of sentences in each statement, subtracting one as the final period counts as an extra sentence
    num_sentences = len(text_in_strings)
    num_sentences = (num_sentences-1)
    
    #split the sentences into words
    words_in_strings = text.split(" ")
    num_words = len(words_in_strings)
    
    #find the average number of words per sentence
    average_words_per_sentence = num_words / num_sentences
    print(average_words_per_sentence)
    return average_words_per_sentence

Great job with this function! It returns the exact output we want, and it corrects for the bug where splitting on "." leaves an extra empty string after the final period. However, the correction we have (always subtracting one from our sentence count) does not handle two cases: 1) if the split produces more than one empty or whitespace-only string, and 2) if it produces none, as when the text does not end with punctuation. In both of those cases, our function will no longer calculate the correct count -- it will either overcount or undercount the sentences.
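To make those two cases concrete, here is a small sketch that applies the subtract-one count to a few sample texts. The helper name count_by_subtracting_one and the sample strings are made up for illustration; the counting logic mirrors the function above.

```python
def count_by_subtracting_one(text):
    # Same counting logic as the function above:
    # normalize the endings, split on ".", then subtract one.
    normalized = text.replace("!", ".").replace("?", ".")
    return len(normalized.split(".")) - 1

# Text ending in a single period: the count is right.
print(count_by_subtracting_one("I ran. I slept."))   # 2

# No final period: the last sentence is dropped (undercount).
print(count_by_subtracting_one("I ran. I slept"))    # 1

# Doubled punctuation creates an extra empty piece (overcount).
print(count_by_subtracting_one("I ran?! I slept."))  # 3
```

All three texts contain two sentences, but only the first one is counted correctly.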

So, I just wanted to show a slightly different way to achieve the same goal:

stripped = [sentence for sentence in text_in_strings if sentence.strip()]
num_sentences = len(stripped)

Okay, let's break this down. So, the line [sentence for sentence in text_in_strings if sentence.strip()] can be rewritten like so:

stripped = list()
for sentence in text_in_strings:
    if sentence.strip():
        stripped.append(sentence)

The way I have written it above is called a list comprehension -- it is simply a way to write a for loop in a single line to generate a list (the loop itself can run over any iterable). There are a few benefits to using list comprehensions; but, for all intents and purposes, it is simply a single-line for loop.

Basically, all the list comprehension is doing is getting rid of any string that contains nothing but white space. It does this by combining an if statement with strip(). Note that strip() removes any white space at the beginning and end of a string; thus, if the string only has white space, strip() returns an empty string, which Python treats as False. As such, we can use an if statement to test whether sentence really has characters.
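A quick way to check how strip() behaves is to try it on a few strings. The sample list pieces below is made up for illustration.

```python
# strip() always returns a string; for whitespace-only input
# that string is empty.
print(repr("   ".strip()))       # ''
print(repr("  Hello ".strip()))  # 'Hello'

# An empty string is falsy, so it fails the `if` test.
print(bool("   ".strip()))       # False
print(bool("  Hello ".strip()))  # True

# Hypothetical result of a split: two real sentences and two non-sentences.
pieces = ["I ran", "  ", "", " I slept"]
kept = [piece for piece in pieces if piece.strip()]
print(kept)                      # ['I ran', ' I slept']
```

Note that the comprehension keeps the original strings, so " I slept" still has its leading space; strip() is only used for the test.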

An advantage of implementing it this way is that it will catch every string that is empty or contains only white space, no matter how many there are. Also, this method allows us to shorten our implementation by combining the len function with the list comprehension:

num_sentences = len([sentence for sentence in text_in_strings if sentence.strip()])
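Putting it together, here is a rough sketch of how the whole function might look with the filter in place. This is only one possible version: the name get_average_sentence_length_filtered is hypothetical, and the use of split() with no argument and the guard against zero sentences are assumptions of mine, not part of the original function.

```python
def get_average_sentence_length_filtered(text):
    # Standardize the sentence endings so a single split works.
    normalized = text.replace("!", ".").replace("?", ".")

    # Keep only the pieces that contain something other than white space.
    num_sentences = len([s for s in normalized.split(".") if s.strip()])

    # Assumption: split() with no argument, so repeated spaces
    # do not produce empty "words".
    num_words = len(text.split())

    # Assumption: return 0 for text with no sentences
    # instead of dividing by zero.
    if num_sentences == 0:
        return 0
    return num_words / num_sentences

print(get_average_sentence_length_filtered("I ran. I slept"))    # 2.0
print(get_average_sentence_length_filtered("I ran?! I slept."))  # 2.0
```

Both sample texts have four words and two sentences, so each call gives 2.0, whether or not the text ends with a period or doubles up its punctuation.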

In any case, for the purposes of this project, the original implementation is perfectly fine! In fact, it is absolutely perfect for our use case. This is merely a suggestion and an introduction to list comprehension.

P.S. If you want to learn more about list comprehension, I recommend reading this article.

jmcrey mentioned this issue Dec 18, 2018

Summary #3

Open