Sentence Normalizer

Derek Jones edited this page Jul 5, 2012 · 3 revisions

This is a simple function for normalizing sentences. Features include:

1- Removes duplicated question marks, exclamation marks and periods.
2- Capitalizes the first letter of each sentence and lowercases the rest.
3- Splits sentences not only on "." but also on "?" and "!".
4- Puts a white space at the end of each sentence.
5- Retains newlines.

--removed from original function-- Understands the meaning of "¡" and "¿" in languages like Spanish. Understands the HTML entity versions of these symbols. --removed from original function--

Credit: A modified sentenceNormalizer by gregomm

The original script can be found here:

The CodeIgniter Form Validation class allows for custom validation functions (callbacks). Learn more about this in CodeIgniter's documentation on Form Validation. To implement the Sentence Normalizer, place the following function in the controller that uses the Form Validation class.

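function sentence_normalizer($str) {
        // Collapse repeated "!", "?" and "." into a single character.
        $str = preg_replace(array('/[!]+/','/[?]+/','/[.]+/'),
                            array('!','?','.'), $str);
        // Split on sentence terminators and newlines, keeping the delimiters.
        $textbad = preg_split("/(\!|\.|\?|\n)/", $str,-1,PREG_SPLIT_DELIM_CAPTURE);
        $newtext = array();
        foreach($textbad as $key => $string) {
            if (!empty($string)) {
                $text = trim($string, ' ');
                $size = strlen($text);
                if ($size > 1) {
                    // Sentence body: lowercase it, then capitalize the first letter.
                    $newtext[] = ucfirst(strtolower($text));
                } elseif ($size == 1) {
                    // Delimiter: keep a newline as-is, add a space after punctuation.
                    $newtext[] = ($text == "\n") ? $text : $text . ' ';
                }
            }
        }
        return implode($newtext);
}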
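To see how the steps fit together, here is a minimal standalone sketch that runs the same logic outside CodeIgniter and returns the intermediate values. The helper name trace_sentence_normalizer and the sample input are assumptions made for illustration; they are not part of the original script.

```php
<?php
// Hypothetical helper: same logic as sentence_normalizer(), but it also
// returns the intermediate values so each step can be inspected.
function trace_sentence_normalizer($str) {
    // Step 1: collapse runs of "!", "?" and "." into a single character.
    $collapsed = preg_replace(
        array('/[!]+/', '/[?]+/', '/[.]+/'),
        array('!', '?', '.'),
        $str
    );

    // Step 2: split on terminators and newlines. PREG_SPLIT_DELIM_CAPTURE
    // keeps each captured delimiter as its own array element.
    $parts = preg_split("/(\!|\.|\?|\n)/", $collapsed, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Step 3: rebuild the text piece by piece.
    $rebuilt = array();
    foreach ($parts as $part) {
        if (!empty($part)) {
            $text = trim($part, ' ');
            $size = strlen($text);
            if ($size > 1) {
                // Sentence body: lowercase, then capitalize the first letter.
                $rebuilt[] = ucfirst(strtolower($text));
            } elseif ($size == 1) {
                // Delimiter: newline stays as-is, punctuation gets a space.
                $rebuilt[] = ($text == "\n") ? $text : $text . ' ';
            }
        }
    }

    return array(
        'collapsed' => $collapsed,
        'parts'     => $parts,
        'result'    => implode('', $rebuilt),
    );
}

// Hypothetical sample input.
$trace = trace_sentence_normalizer("hello world!!! how ARE you??\nfine...");
print_r($trace['parts']);
// json_encode makes the newline and trailing space visible.
echo json_encode($trace['result']), "\n";
```

For this input the result is "Hello world! How are you? " followed by a newline and "Fine. ". Each punctuation mark gets a trailing space, so a space is left before the retained newline. A fragment that is only one character long takes the delimiter branch, so it gets a trailing space but is not capitalized.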

Then add the callback's name, prefixed with callback_, to the field's validation rules (callback_sentence_normalizer). For example:

$this->form_validation->set_rules('about', 'About', 'required|xss_clean|prep_for_form|strip_tags|callback_sentence_normalizer');

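The above code sets validation rules for a form field named "about". The rules include several standard CodeIgniter form validation rules, followed by callback_sentence_normalizer, which calls the sentence normalizer function. Because the function returns a string rather than TRUE or FALSE, Form Validation treats the returned text as the newly processed field value.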

That's it!