From 3884d0623153bdfdfdc95d87ff4f89b07d9d5d47 Mon Sep 17 00:00:00 2001
From: "@aphillips"
Date: Sat, 20 Feb 2016 15:19:49 +0530
Subject: [PATCH] Issue #65: introduction/problem statement

- Addressed @kleinsin's comment on wording related to regex.
- Expanded section slightly to provide a better introduction to the next sections.
---
 index.html | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/index.html b/index.html
index e3cdaba..29e2488 100644
--- a/index.html
+++ b/index.html
@@ -462,13 +462,17 @@

The String Matching Problem

The Web is primarily made up of document formats and protocols based on character data. These formats or protocols can be viewed as a set of text files (resources) that include some form
- of structural markup or syntactic content. Processing such syntactic content or document data requires
- string-based operations such as matching, indexing, searching, sorting,
- regular expression matching, and so forth. As a result, the Web is
- sensitive to the different ways in which text might be represented in a
- document. Failing to consider the different ways in which the same text
- can be represented can confuse users or cause unexpected or frustrating
- results.

+ of structural markup or syntactic content. Processing such syntactic content or document data requires
+ string-based operations such as matching (including regular expressions), indexing, searching, sorting,
+ and so forth.

+ Users, particularly implementers, sometimes have naïve expectations regarding the matching or non-matching
+ of similar strings or of the efficacy of different transformations they might apply to text, particularly to
+ syntactic content, but including many types of text processing on the Web.

+ Because fundamentally the Web is sensitive to the different ways in which text might be represented in a
+ document, failing to consider the different ways in which the same text can be represented can confuse
+ users or cause unexpected or frustrating results. In the sections below, this document examines the different
+ types of text variation that affect both user perception of text on the Web and the string processing on which
+ the Web relies.

Case Folding

Some scripts and writing systems make a distinction between UPPER,
@@ -1825,6 +1829,7 @@

Considerations for Matching Natural Language Content

What about the character "A" followed by U+0300 (a combining grave accent)? What about writing systems, such as Devanagari, which use combining marks to suppress or express certain vowels?

+ Issue #78: Point out that the presence or absence of Arabic/Hebrew short vowels can interfere with searching.
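Reviewer note (illustrative only, not part of the patch): the two questions in this hunk can be made concrete with a short sketch. It uses Python's standard `unicodedata` module; the helper name `strip_marks` and the sample strings are assumptions chosen for illustration, and removing every nonspacing mark is just one possible loosening of a match, not something this document prescribes.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# "A" followed by U+0300 versus the precomposed U+00C0.
precomposed = "\u00C0"   # LATIN CAPITAL LETTER A WITH GRAVE
combining = "A\u0300"    # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT

# Code point comparison treats the two sequences as different strings.
raw_equal = precomposed == combining

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", combining)
)

# A Hebrew word written with and without vowel points (sample data).
pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
unpointed = "\u05E9\u05DC\u05D5\u05DD"

# The points sit between the base letters, so a plain substring search fails.
naive_hit = unpointed in pointed

# Dropping the marks on the searched text lets the unpointed query match.
loose_hit = unpointed in strip_marks(pointed)

print(raw_equal, nfc_equal, naive_hit, loose_hit)
```

Stripping marks is lossy: in scripts such as Devanagari it can change which word is being matched, so whether to apply it depends on the language and the use case.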