## 11.1 Linear search (again)

[Section&nbsp;4.4](../04_Iteration/04_4_search.ipynb#4.4-Linear-search) introduced linear search as 'an algorithm
that goes systematically through the sequence and checks each element'.
This section provides a more general treatment.

### 11.1.1 Basic search

Linear search is a special case of exhaustive search
in which the collection of candidates is given:
generating candidates is simply iterating over the collection.
As with any generate-and-test algorithm, the test part of linear search
may involve one or more candidates. For example, to find the best solution,
the test involves comparing two candidates:
the current one and the best candidate found so far.

At this point you may wish to skim again the algorithmic patterns in
[Section&nbsp;5.2](../05_TMA01-1/05_2_algorithms.ipynb#5.2.1-Linear-search) for finding
all solutions, some solutions and the best solution.

Linear searches can be done on any sequence, bag, set or map of candidates.
A linear search over a stack or a priority queue destroys it,
because we must remove items one by one to iterate over the collection.
That's not an issue if the input collection isn't needed after the search.
However, in general it's best to avoid modifying the input and so
we'd need to search a copy of the input.
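As a minimal sketch of searching a copy, assume a stack is represented by a Python list that we only access through `append` and `pop`. The function name `stack_contains` is hypothetical, chosen for this illustration.

```python
def stack_contains(stack: list, target: object) -> bool:
    """Return True if target is in the stack, without modifying the stack."""
    copy = list(stack)  # shallow copy, so the input stack is preserved
    while copy:  # the search empties the copy, not the input
        if copy.pop() == target:  # pop removes and returns the top item
            return True
    return False


plates = [3, 1, 4, 1, 5]
found = stack_contains(plates, 4)
```

After the call, `plates` still has its five items, because only the copy was emptied. The price is the extra time and memory needed to make the copy.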

#### Exercise 11.1.1

We can do a linear search over a queue or deque without making a copy of
the input collection, but not over a priority queue. Why?

_Write your answer here._

[Hint](../31_Hints/Hints_11_1_01.ipynb)
[Answer](../32_Answers/Answers_11_1_01.ipynb)

#### Exercise 11.1.2

Linear search has that name because it always has linear complexity in
the worst case. True or false?

_Write your answer here._

[Answer](../32_Answers/Answers_11_1_02.ipynb)

#### Exercise 11.1.3

Here's an algorithmic pattern for finding all solutions.
It's formulated in more general terms than the corresponding one in Section&nbsp;5.2.

1. let _solutions_ be an empty collection
2. for each _candidate_ in _candidates_:
   1. if _candidate_ satisfies all search conditions:
      1. add _candidate_ to _solutions_

A pattern abstracts several similar algorithms.
Here, step&nbsp;1 doesn't indicate the type of the output collection,
step&nbsp;2 doesn't specify how each candidate is obtained,
step&nbsp;2.1 doesn't detail the test, and step&nbsp;2.1.1 doesn't say which operation is used to add a candidate to the solutions.

Make one or more steps more detailed, to obtain a pattern for
linear searches that output solutions in the order they're found.

[Hint](../31_Hints/Hints_11_1_03.ipynb)
[Answer](../32_Answers/Answers_11_1_03.ipynb)

#### Exercise 11.1.4

Here's the algorithmic pattern once more.

1. let _solutions_ be an empty collection
2. for each _candidate_ in _candidates_:
   1. if _candidate_ satisfies all search conditions:
      1. add _candidate_ to _solutions_

Modify one or more steps to get a pattern for linear searches
that don't output repeated solutions.
Solutions don't have to be in the order they're found.

[Hint](../31_Hints/Hints_11_1_04.ipynb)
[Answer](../32_Answers/Answers_11_1_04.ipynb)

#### Exercise 11.1.5

For this exercise, assume the collection of candidates isn't empty.
The next pattern finds _one_ best solution.

1. let _best_ be the first of _candidates_
2. for each _candidate_ in _candidates_:
   1. if _candidate_ is better than _best_:
      1. let _best_ be _candidate_

Modify or add steps to obtain a pattern for a linear search that finds
_all_ equally best solutions. You can write '... is as good as ...'
to check if two candidates are equally good.
The equally best solutions may be in any order and may be duplicated,
so don't specify the type of the output collection.

[Hint](../31_Hints/Hints_11_1_05.ipynb)
[Answer](../32_Answers/Answers_11_1_05.ipynb)

### 11.1.2 Simultaneous and successive searches

You may remember the decision problem of checking if a string represents a
[valid password](../04_Iteration/04_4_search.ipynb#4.4.2-Valid-password). It was solved with
two simultaneous linear searches over the candidates (the string's characters):
a search for a lowercase letter and a search for a digit.
Each search uses a Boolean to record whether it was successful.
When both Booleans become true, the simultaneous search stops:
the string is a valid password.

Simultaneous linear searches allow us to check if the input collection as a whole,
rather than an individual item, satisfies the conditions. In this example,
no single character can be both a lowercase letter and a digit,
so only the string as a whole can satisfy both conditions.
The algorithmic pattern for simultaneous linear searches is also in
[Section&nbsp;5.2](../05_TMA01-1/05_2_algorithms.ipynb#5.2.1-Linear-search).
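The following sketch illustrates two simultaneous searches in a single pass. It isn't necessarily the code of Section&nbsp;4.4: the function and variable names are assumptions, and a password is taken to be valid if it has at least one lowercase letter and at least one digit.

```python
def is_valid_password(text: str) -> bool:
    """Return True if text has a lowercase letter and a digit."""
    has_lowercase = False  # records the outcome of the first search
    has_digit = False  # records the outcome of the second search
    for character in text:
        if character.islower():
            has_lowercase = True
        elif character.isdigit():
            has_digit = True
        if has_lowercase and has_digit:
            return True  # both searches succeeded: stop early
    return False
```

Each character is tested for both conditions in the same iteration, so the string is traversed at most once.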

We can also search for each condition separately instead of simultaneously,
i.e. do separate searches for a lowercase letter and for a digit.
Doing one linear search for each condition is less efficient than
doing a single pass over the input collection, but has other advantages.
First, each linear search becomes simpler.
Second, the searches can be allocated to different CPUs and executed in
parallel, so that
the time spent waiting for a result may not be much longer than for a single pass.
Third, separate linear search functions for general conditions can be reused,
making future search problems easier to solve.
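To illustrate the first and third advantages, here's a sketch of a general search function that is reused for both conditions. The names `contains` and `has_letter_and_digit` are hypothetical.

```python
from typing import Callable, Iterable


def contains(candidates: Iterable, condition: Callable) -> bool:
    """Return True if at least one candidate satisfies the condition."""
    for candidate in candidates:
        if condition(candidate):
            return True
    return False


def has_letter_and_digit(text: str) -> bool:
    """Return True if text has a lowercase letter and a digit."""
    # two separate searches: the string may be traversed twice
    return contains(text, str.islower) and contains(text, str.isdigit)
```

Each search is a simple loop with one test, and `contains` can be reused with any other condition that takes a candidate and returns a Boolean.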

If we design each linear search so that the input and output collections,
i.e. _candidates_ and _solutions_ in the above pattern, are of the same type,
then we can find the candidates that satisfy all conditions with
successive separate searches, in which the solutions of one search are
the candidates of the next search.
This approach views searching as **filtering** the candidate collection:
the conditions filter away those candidates that don't satisfy them,
while the other candidates 'pass through' to the solutions collection.

<div class="alert alert-info">
<strong>Info:</strong> TM112 Block&nbsp;2 Sections 2.1.3 and 2.3 introduce filtering and searching,
respectively, and their connection. The algorithmic patterns in both sections
are less general versions of the linear search patterns in M269.
</div>

For example, to find all white t-shirts costing less than £20 in a store,
we can use three filters, i.e. three separate general linear searches:
find all products of a given colour, find all products of a given kind, and
find all products below a given price. Each filter takes and produces
a collection of products, so they can be applied one after the other.

The order of the filters doesn't affect the correctness of the result, but
we should apply them so that they **prune** the search space quickly,
leaving subsequent filters as few candidates as possible to go through.
For example, a store is likely to have many more cheap products and
white products than t-shirts, so searching first for t-shirts will lead to
a small collection to search for white products costing less than £20.

Let's assume the filters are implemented with functions
named 'colour', 'kind' and 'price'.
Besides the input collection of products, they have a string or integer
argument to indicate which colour, kind or price to filter for.
The successive filtering algorithm for the above example is:

1. let _shirts_ be kind(_store_, 't-shirt')
2. let _white shirts_ be colour(_shirts_, 'white')
3. let _cheap white shirts_ be price(_white shirts_, 20)

Note how the output collection of a linear search is the input of the next one.
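The algorithm could be sketched in Python as follows. The representation of products as dictionaries with keys `'kind'`, `'colour'` and `'price'` is an assumption made for this illustration, as is the sample data.

```python
def kind(products: list, wanted: str) -> list:
    """Return the products of the wanted kind."""
    return [product for product in products if product['kind'] == wanted]


def colour(products: list, wanted: str) -> list:
    """Return the products of the wanted colour."""
    return [product for product in products if product['colour'] == wanted]


def price(products: list, limit: int) -> list:
    """Return the products that cost less than the limit."""
    return [product for product in products if product['price'] < limit]


store = [
    {'kind': 't-shirt', 'colour': 'white', 'price': 15},
    {'kind': 't-shirt', 'colour': 'black', 'price': 12},
    {'kind': 'shoes', 'colour': 'white', 'price': 18},
    {'kind': 't-shirt', 'colour': 'white', 'price': 25},
]
shirts = kind(store, 't-shirt')  # 3 products left
white_shirts = colour(shirts, 'white')  # 2 products left
cheap_white_shirts = price(white_shirts, 20)  # 1 product left
```

Each function takes and returns a list, which is what allows the calls to be chained in any order.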

<div class="alert alert-warning">
<strong>Note:</strong> The order in which you test conditions doesn't matter for correctness but
may have a great impact on efficiency.
</div>

### 11.1.3 Sorted candidates

Consider again finding all white t-shirts under £20 in a store,
using the basic linear search algorithm pattern at the start of this section,
which checks each candidate against all conditions.
What are the best- and worst-case scenarios?
(Their complexities may be the same.)

___

The best case, when the linear search does the least work,
is for step&nbsp;2.1.1 of the pattern (adding the current candidate to the solutions)
to never execute. This happens when _no_ candidate satisfies the conditions,
i.e. the store has no white t-shirts under £20.
In the worst case, step&nbsp;2.1.1 is always executed.
This happens when _all_ candidates satisfy the conditions,
i.e. the store has nothing but white t-shirts under £20.

In both cases, the search goes through all candidates:
it can't stop early, as there might be more solutions ahead.
However, if the candidates are comparable, we can sort them
to know when no further solutions are possible.
For example, if the products are sorted by ascending price, then
as soon as the current candidate costs £20 or more, we can stop searching
because any remaining candidates cost at least as much and hence won't be solutions.
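Here's a sketch of such an early stop, assuming that products are dictionaries with keys `'kind'`, `'colour'` and `'price'` and that the input list is already sorted by ascending price. The function name is hypothetical.

```python
def white_tshirts_under(products: list, limit: int) -> list:
    """Return the white t-shirts that cost less than the limit.

    Assumes the products are sorted by ascending price.
    """
    solutions = []
    for product in products:
        if product['price'] >= limit:
            break  # this product and all remaining ones are too expensive
        if product['kind'] == 't-shirt' and product['colour'] == 'white':
            solutions.append(product)
    return solutions
```

The loop stops at the first product that isn't under the limit, so the more expensive products are never tested for kind and colour.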

#### Exercise 11.1.6

What are the best- and worst-case scenarios for finding all white t-shirts
under £20, if the candidates (store products) are in ascending price order?

_Write your answer here._

[Answer](../32_Answers/Answers_11_1_06.ipynb)

Best and worst cases rarely happen with real data, but
sorting can nevertheless reduce the average run-time.
We must sort and search the candidate collection so that the solutions
(if there are any) appear early. For example, to find white products,
we can sort products by ascending or descending colour names.
If the order is ascending, then 'white' will appear towards the end of the
sequence, so we must search backwards from the highest to the lowest index.
If the order is descending, then white products appear towards the start of the
sequence and we must search from lowest to highest index.
For both orders, we search colours from 'z' to 'a' and stop as soon as
the colour comes alphabetically before 'white', e.g. 'violet'.
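The backward search over an ascending order could be sketched like this. It assumes products are dictionaries with a `'colour'` key holding a lowercase colour name, so that Python's string comparison matches alphabetical order. The function name is hypothetical.

```python
def white_products(products: list) -> list:
    """Return the white products, searching from the end.

    Assumes the products are sorted by ascending colour name.
    """
    solutions = []
    for index in range(len(products) - 1, -1, -1):  # highest to lowest index
        product = products[index]
        if product['colour'] < 'white':
            break  # e.g. 'violet': no white products further back
        if product['colour'] == 'white':
            solutions.append(product)
    return solutions
```

Colours that come after 'white', like 'yellow', are skipped without stopping the search.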

Sorting can also help find the best candidate if we can sort the candidates
by the criterion that is being optimised.
For example, if we want to find the cheapest white t-shirt, and the products
are in ascending price order, then we can stop as soon as we find a white t-shirt,
as it must be the cheapest of them all.
If candidates were unsorted, we'd always have to search the whole collection.
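A sketch of this search follows, again assuming products are dictionaries with keys `'kind'`, `'colour'` and `'price'`, sorted by ascending price. The function name and the choice of returning `None` when there's no white t-shirt are assumptions.

```python
def cheapest_white_tshirt(products: list):
    """Return a cheapest white t-shirt, or None if there isn't one.

    Assumes the products are sorted by ascending price.
    """
    for product in products:
        if product['kind'] == 't-shirt' and product['colour'] == 'white':
            return product  # no later white t-shirt can be cheaper
    return None
```

No comparison with a best candidate found so far is needed: the sorting has done that work in advance.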

We can combine sorting and successive filtering.
If we want all cheapest white t-shirts,
we can first filter by kind and second by colour, then sort by price,
and finally select all products that have the same price as the first one.
Why is it more efficient to filter before sorting than the other way around?

___

We're only interested in the prices of white t-shirts,
of which there are only a few compared to all products in the store.
Sorting all products by price before filtering them would be a waste of time,
especially if sorting takes quadratic time.

Since sorting takes longer than a linear search, it's best
to avoid sorting even for this example: a linear search for all cheapest products
after filtering for white t-shirts will do the job in linear time.
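One possible way to do this without sorting is with two linear passes: the first finds the lowest price and the second collects the products with that price. This is only a sketch, assuming products are dictionaries with a `'price'` key. The function name is hypothetical.

```python
def all_cheapest(products: list) -> list:
    """Return all products that have the lowest price."""
    if not products:
        return []  # no candidates, no solutions
    lowest = min(product['price'] for product in products)  # first pass
    # second pass: keep the products with the lowest price
    return [product for product in products if product['price'] == lowest]
```

Two passes over the filtered products still take linear time in total.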

However, if the same optimisation criterion is used over and over again, like
finding the cheapest black shoes, the cheapest blue dresses, etc., then
it's worth sorting all products by that criterion before any searches.
The aim is again to amortise the cost of an operation
over multiple other operations.
Previously we amortised the cost of copying an array over the cost of appending
items; here we amortise the cost of sorting over the cost of linear searches.

#### Optional exercises

The Kattis Guide includes some [linear search problems](https://mwermelinger.github.io/kattis-guide/exhaustive.html#linear-search).
