Key Ideas in IO: Free-Text Searching and Natural Language
=========================================================

In a variety of digital platforms and systems, indicating _aboutness_ can be important or desirable. In social media, users can apply any term they like to describe topicality or _aboutness_. There are no "rules," which is one of the things that makes social media fun and allows for creativity.

Social media, however, does not easily support retrieval. This is because social media relies on the text of resources, and it does not have rules for how mechanisms like hashtags are applied. Searching, as presented in this OER, can yield results with reasonably good precision, but recall will probably be poor: many relevant resources will never be retrieved.
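
To make precision and recall concrete, here is a minimal sketch in Python. The post identifiers and counts are invented for illustration; they are assumptions, not data from any real platform. Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items versus truly relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # items that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scenario: ten posts are truly about a topic,
# but the search retrieves only four posts, three of which are relevant.
relevant = {f"post{i}" for i in range(1, 11)}
retrieved = {"post1", "post2", "post3", "post99"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.30
```

In this made-up case, most of what comes back is relevant (good precision), yet seven of the ten relevant posts are never found (poor recall).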

**Free-Text Searching**
-----------------------

One problem with queries in social media is that they are not structured. This makes them easy to input (e.g., have you ever tried searching [Flickr](https://www.flickr.com/)?), but it makes high-quality, comprehensive results difficult to generate. Queries in Twitter, for example, are free-text searches. According to the ODLIS (which will always focus on perspectives relating to the information professions), [free-text search](https://products.abc-clio.com/ODLIS/odlis_f.aspx#freetextsearch) means the following:

> A [search](https://products.abc-clio.com/ODLIS/odlis_s.aspx#search) of a [bibliographic database](https://products.abc-clio.com/ODLIS/odlis_b.aspx#bibdatabase) in which [natural language](https://products.abc-clio.com/ODLIS/odlis_n.aspx#naturallang) words and [phrases](https://products.abc-clio.com/ODLIS/odlis_p.aspx#phrase) appearing in the [text](https://products.abc-clio.com/ODLIS/odlis_t.aspx#text) of the [documents](https://products.abc-clio.com/ODLIS/odlis_d.aspx#document) [indexed](https://products.abc-clio.com/ODLIS/odlis_i.aspx#indexing), or in their [bibliographic descriptions](https://products.abc-clio.com/ODLIS/odlis_b.aspx#bibdescrip), are used as [search terms](https://products.abc-clio.com/ODLIS/odlis_s.aspx#searchterm), rather than terms selected from a list of [controlled vocabulary](https://products.abc-clio.com/ODLIS/odlis_c.aspx#controlled) (authorized [subject headings](https://products.abc-clio.com/ODLIS/odlis_s.aspx#subjectheading) or [descriptors](https://products.abc-clio.com/ODLIS/odlis_d.aspx#descriptor)). Compare with [full-text search](https://products.abc-clio.com/ODLIS/odlis_f.aspx#fulltextsearch). **_See also_**: [keyword(s)](https://products.abc-clio.com/ODLIS/odlis_jk.aspx#keywords).

This means that when users search Twitter for the hashtag from a conference, they get results from everyone using that hashtag, relatively few of whom might be tweeting about the conference. This is because anyone can use a hashtag and apply it to any tweet on any topic, making for a lot of confusion and difficulty with narrowing down results. **This is a problem of lack of disambiguation.**

Users can also search the [full text](https://products.abc-clio.com/ODLIS/odlis_f.aspx#fulltextsearch) of tweets in Twitter. What does this mean? Essentially, it means that users can search not only a surrogate for the information resource, but also the actual words composing the resource. Open the link to the definition in the ODLIS to read more about [full text search](https://products.abc-clio.com/ODLIS/odlis_f.aspx#fulltextsearch).

Again, given the problem of many words meaning the same thing (i.e., being synonymous), users searching the full text of tweets may not know which terms to input into the query. It can be nearly impossible to guess which words a person sending a tweet might include, which typos they might make, or how they are using language. Even if the person searching Twitter guesses a word that was used and how it was spelled, it may be impossible to identify every way that word is being applied throughout the twitterverse. This is the problem of a single word meaning many different things.
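
The two problems described above (many words for one meaning, and many meanings for one word) can be sketched with a toy keyword search. This is an illustration under stated assumptions: the posts are invented, and the simple whole-word matching shown here is not a description of how Twitter's search actually works.

```python
import re

# Hypothetical posts, invented for illustration only.
posts = {
    1: "Just bought a new car, love it",
    2: "My automobile broke down again",
    3: "Saw a jaguar at the zoo today",
    4: "Test drove the new Jaguar this weekend",
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(term):
    """Return the ids of posts that contain the term as a whole word."""
    term = term.lower()
    return sorted(pid for pid, text in posts.items() if term in tokenize(text))


# Synonymy: the searcher types "car" and misses the post that says "automobile".
print(search("car"))     # prints: [1]

# One word, many meanings: "jaguar" retrieves the animal and the vehicle alike.
print(search("jaguar"))  # prints: [3, 4]
```

The first search loses a relevant post, which hurts recall. The second search mixes unrelated topics, which hurts precision.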

#### ![](https://missouri.instructure.com/courses/10640/files/7506582/download)  
**Self-Study Interlude**

Take three minutes to search Twitter ([https://twitter.com/](https://twitter.com/)) using Twitter's search box (see Figure 1). 

**Figure 1 Twitter's ([https://twitter.com/](https://twitter.com/)) Search Box**

[![twitter-search.png](https://missouri.instructure.com/courses/49361/files/8633239/preview)](https://twitter.com/)

Then, compare how that goes with "Exploring" using hashtags (see Figure 2). 

**Figure 2 Twitter's ([https://twitter.com/](https://twitter.com/)) Explore Functionality**

[![twitter-explore-circle.png](https://missouri.instructure.com/courses/49361/files/8633248/preview)](https://twitter.com/)

In terms of measures of retrieval success, what can you say about precision and recall in the search environment? For both experiences, what do you notice about the challenges? How easy and satisfying is this process? 

**Natural Language Queries**
----------------------------

In short, the language that authors employ spontaneously is powerful, unique, and allows for creativity. It can also be incredibly difficult to search if a system is not designed to handle it.

From the ODLIS, the definition of [natural language](https://products.abc-clio.com/ODLIS/odlis_n.aspx#naturallang) is the following: 

> A human [language](https://products.abc-clio.com/ODLIS/odlis_l.aspx#language) in which the structure and rules have evolved from [usage](https://products.abc-clio.com/ODLIS/odlis_u.aspx#usage), usually over an extended period time, as opposed to an [artificial language](https://products.abc-clio.com/ODLIS/odlis_a.aspx#artificiallang) based on rules prescribed prior to its development and use, as in a computer language. In [search software](https://products.abc-clio.com/ODLIS/odlis_s.aspx#searchsoftware) designed to handle [input](https://products.abc-clio.com/ODLIS/odlis_i.aspx#input) expressed in natural language, the user may enter the [query](https://products.abc-clio.com/ODLIS/odlis_q.aspx#query) in the same form in which it would be spoken or written ("Where can I find information about Frederick Douglass?" as opposed to the [search statement](https://products.abc-clio.com/ODLIS/odlis_s.aspx#searchstatement) "frederick douglass" or "su:douglass"). [_Ask_](http://www.ask.com/) is an example of a natural-language [Internet](https://products.abc-clio.com/ODLIS/odlis_i.aspx#internet) [search engine](https://products.abc-clio.com/ODLIS/odlis_s.aspx#searchengine). Compare with [controlled vocabulary](https://products.abc-clio.com/ODLIS/odlis_c.aspx#controlled).

#### ![](https://missouri.instructure.com/courses/10640/files/7506582/download)  
**Self-Study Interlude**

Have you ever used Google ([https://www.google.com/](https://www.google.com/))? Have you ever thought about how you use web search engines, versus what you hope to retrieve? Do you input searches using natural language? Take a moment to try the Frederick Douglass searches mentioned in the definition above: 

*   the natural language approach, where you type a question into the search interface that is exactly the same as you would ask a human interlocutor;
*   the full-text approach, where you input words you think are in the full text of the document, and hope you are correct.

Why are the results different? Which set of results is better? How are these methods potentially different from searching [ERIC](https://eric.ed.gov/)?

**_Free-Text_ Searching Versus _Full-Text_ Databases?**
-------------------------------------------------------

“Free text” is a way of searching. Most library catalogs allow you to do free-text searches of the surrogate record, but the information in the surrogate is limited to what the cataloger supplied.

“Full text” refers to the content in the system that is searchable. In a full-text database, the end-user can search not just the surrogate, but the entire text of the document being described.
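
The distinction can be sketched in a few lines of Python. Everything here is hypothetical: the `Record` class, the sample record, and the two search functions are assumptions made for illustration and do not represent any real catalog system. The same free-text query is run first against the surrogate (title and cataloger-supplied subjects) and then against the full text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical catalog entry: a surrogate plus the document's full text."""
    title: str
    subjects: list = field(default_factory=list)  # terms supplied by the cataloger
    full_text: str = ""


def words(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_surrogate(records, term):
    """Free-text search limited to the surrogate (title and subjects)."""
    term = term.lower()
    return [r.title for r in records
            if term in words(r.title + " " + " ".join(r.subjects))]


def search_full_text(records, term):
    """Free-text search over the entire text of each document."""
    term = term.lower()
    return [r.title for r in records if term in words(r.full_text)]


# An invented record, for illustration only.
records = [
    Record(
        title="Voices of Abolition",
        subjects=["Abolitionists", "Speeches"],
        full_text="This essay discusses speeches by Frederick Douglass and others.",
    ),
]

print(search_surrogate(records, "douglass"))       # prints: []
print(search_full_text(records, "douglass"))       # prints: ['Voices of Abolition']
print(search_surrogate(records, "abolitionists"))  # prints: ['Voices of Abolition']
print(search_full_text(records, "abolitionists"))  # prints: []
```

In this sketch, the surrogate search finds only what the cataloger chose to record, while the full-text search finds only the words the author happened to use. Neither approach retrieves everything.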

#### **Next**

_The next page presents an alternative to natural language searching: controlled vocabularies._