# Filtering by Average ASCII Value

If you've got docs that mix human-readable text with less readable text (warning codes, for example), and you want to quickly strip out the less readable parts, this is a neat trick. Every ASCII character has a numeric value (you can see the full table at http://www.asciitable.com/). Calling `ord(x)` on a single character returns its code point, which for ASCII characters is the ASCII value.

![image.png](attachment:image.png)
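As a quick illustration of what `ord` returns, here is a minimal sketch that looks up a handful of characters. The sample characters and the `codes` name are arbitrary choices for this example, not part of the original notebook.

```python
# ord() maps a single character to its integer code point.
# For plain ASCII characters, that is the ASCII value.
samples = ["A", "Z", "a", "z", "0", "9", " ", ","]
codes = {ch: ord(ch) for ch in samples}

for ch, code in codes.items():
    print(repr(ch), code)
```

Note that spaces, digits, and punctuation all sit below the uppercase letters, which in turn sit below the lowercase letters.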

In [1]:
text1 = "WARNING 8375"  # Example of the kind of thing we'll filter
text2 = "You sly fool, you"  # Example of easily human-readable text, which we want to keep

In [2]:
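def AsciiVal(text):
    """Return the average ASCII value (code point) of the characters in text."""
    if not text:  # Avoid dividing by zero on an empty string
        return 0.0
    count = 0
    for i in text:
        count += ord(i)
    return count / len(text)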

In [3]:
AsciiVal(text1)  # Low average ASCII value: uppercase letters, digits, and a space pull it down

65.08333333333333

In [4]:
AsciiVal(text2)  # Higher average ASCII value: mostly lowercase letters, typical of narrative text

93.05882352941177
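The gap between the two scores comes from where each character class sits in the ASCII table. The sketch below averages each class on its own. `average_code` is a hypothetical helper that repeats the same calculation as `AsciiVal`, so the block runs by itself.

```python
import string

def average_code(text):
    """Mean code point of the characters in text (same idea as AsciiVal)."""
    return sum(ord(ch) for ch in text) / len(text)

# Average value of each character class taken on its own
class_averages = {
    "digits": average_code(string.digits),              # '0'..'9' are 48..57
    "uppercase": average_code(string.ascii_uppercase),  # 'A'..'Z' are 65..90
    "lowercase": average_code(string.ascii_lowercase),  # 'a'..'z' are 97..122
}

for name, value in class_averages.items():
    print(name, value)
```

Text made mostly of lowercase letters lands well above text made of capitals and digits, even after spaces (value 32) drag the average down.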

Pretty simple, right? Now let's use it on some real text.

In [63]:
# Setting up some imports to get toy data. Not necessary for using this function
from nltk.tokenize import sent_tokenize
from boilerpy3 import extractors
extractor = extractors.ArticleExtractor()

In [64]:
content = extractor.get_content_from_url('https://www.bbc.com/news/world-middle-east-56615521')
content = content.replace("\n", " ").replace("\\", "")
contentList = sent_tokenize(content)

In [65]:
# This is the function to adapt to your own body of text
def triageText(body, minLen=15, minVal=71.5):
    # A set drops duplicate sentences, but it does not preserve their original order
    output = set()
    # Split your input to suit its type before calling this. Can you split on newlines? Do you need NLTK's sentence tokenizer?
    for sent in body:
        # Keep only sentences that are long enough and score at or above the threshold
        if len(sent) >= minLen and AsciiVal(sent) >= minVal:
            output.add(sent)
    return "\n".join(output)  # Newline-delimiting the output lets you split on it later, without having to run NLTK

In [66]:
triageText(contentList)

'Additional reporting by Soha Ibrahim, BBC Arabic Related Topics\nNext month Marwa Elselehdar will be taking her final exam to attain a full rank of captain, and hopes she can continue to be a role model for women in the industry.\n"This fake article was in English so it spread in other countries," says Ms Elselehdar.\n"It was challenging to go through this alone and be able to overcome it without affecting my mental health."\nAt the time, she was the youngest and first female Egyptian captain to cross the waterway.\nThough the academy only accepted men at the time, she applied anyway and was granted permission to join after a legal review by Egypt\'s then-President Hosni Mubarak.\n"My message to females who want to be in the maritime field is fight for what you love and not let any negativity to affect you," says Marwa.\n"Onboard, they were all older men with different mentalities, so it was difficult not to be able to find like-minded people to communicate with," she says.\nBut she s
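If you want to try the filter without fetching a page or installing NLTK, here is a minimal sketch on made-up input. The `toy` lines are invented for illustration, and `ascii_val` and `triage_lines` are hypothetical stand-ins for `AsciiVal` and `triageText`. Unlike the set-based original, this variant keeps the lines in their input order, which is a design choice of the sketch.

```python
def ascii_val(text):
    """Average code point of the characters in text (mirrors AsciiVal)."""
    return sum(ord(ch) for ch in text) / len(text)

def triage_lines(lines, min_len=15, min_val=71.5):
    """Keep long-enough, high-scoring lines, de-duplicated, in input order."""
    kept = []
    for line in lines:
        # The length check runs first, so ascii_val never sees an empty string
        if len(line) >= min_len and ascii_val(line) >= min_val and line not in kept:
            kept.append(line)
    return "\n".join(kept)

# Made-up input: one code-like line, one sentence (twice), one very short line
toy = [
    "ERROR 500: 0x7F3A 2021-04-02",
    "She applied anyway and was granted permission to join.",
    "OK",
    "She applied anyway and was granted permission to join.",
]

result = triage_lines(toy)
sentences = result.split("\n")  # Splitting on newlines recovers the kept lines
print(sentences)
```

Only the narrative sentence survives: the error line scores below the threshold, "OK" is shorter than `min_len`, and the repeated sentence is dropped as a duplicate.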