1 file changed: +2 -31 lines changed
@@ -1,34 +1,5 @@
  '''
- Text pre-processing module with functions:
-
- - convert_html_entities
- - returns string with converted character references to unicode characters
- - convert_ligatures
- - returns string with converted ligature character references to unicode characters
- - correct_spelling
- - returns spelling corrected string
- - create_sentence_list
- - returns list of sentences
- - keyword_tokenize
- - returns string with only non-stopword terms of a word length greater than 3
- - lowercase
- - returns string in lowercase format
- - preprocess_text
- - returns string with an order of preprocessing functions applied to it
- - remove_esc_chars
- - returns string stripped of escape characters
- - remove_numbers
- - returns string stripped of numbers represented as integer or float values
- - remove_number_words
- - returns string stripped of numbers represented as words (one, two, three, etc.)
- - remove_time_words
- - returns string stripped of words associated to time (day, week, month, etc.)
- - remove_unbound_punct
- - returns string stripped of punctuation unattached to a non-whitespace character
- - remove_urls
- - returns string stripped of URLs
- - remove_whitespace
- - returns string stripped of whitespace
+ Text pre-processing module:
  '''


@@ -350,7 +321,7 @@ def remove_whitespace(text_string):

  Exceptions raised:

- -InputErrorL occurs should a string or NoneType not be passed as an argument
+ - InputError: occurs should a string or NoneType not be passed as an argument
  '''
  if text_string is None or text_string == "" :
      return ""
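
For readers who want to see the contract in the corrected docstring in action, here is a minimal sketch of the input handling it describes. Only the `None`/empty-string guard appears in the diff; the `InputError` class defined here is a hypothetical stand-in for the module's own exception, and the whitespace-collapsing behaviour is an assumption about what "stripped of whitespace" means, not the module's confirmed implementation.

```python
class InputError(Exception):
    """Hypothetical stand-in for the module's InputError exception."""


def remove_whitespace(text_string):
    """Sketch: return text_string with runs of whitespace collapsed.

    Raises InputError if the argument is neither a string nor None.
    """
    # Guard shown in the diff: None and "" both yield an empty string.
    if text_string is None or text_string == "":
        return ""
    # Docstring contract: anything other than a string (or None) is an error.
    if not isinstance(text_string, str):
        raise InputError("string not passed as argument")
    # Assumption: collapse whitespace runs to single spaces and trim the ends.
    return " ".join(text_string.split())
```

Under these assumptions, `remove_whitespace(None)` returns an empty string and a non-string argument such as `42` raises `InputError`.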