# Randomness and Sampling

Whether modeling a real-world example like rolling dice or tossing a coin, or randomly selecting a subset of a population for a survey, the process of selecting a choice from a list of options at random is very useful! This chapter will focus on the foundations of random selection, including how to implement a random choice in Python, and extend this random sampling to DataFrames.

Additionally, we introduce control statements, which allow us to choose which statements run and to repeat a statement or process.

We'll utilize these ideas to *simulate* experiments, or imitate real-world examples using code.

To start, suppose we have a list of choices that are equally likely to occur and we want to choose one.

In Python, the function –

```python
random.choice([…])
```

– will output exactly one item from the input sequence, selecting from it randomly.

NumPy provides this function as well, and we'll use that here instead.
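As a rough sketch of the two versions side by side, the block below calls both the standard-library function and the NumPy function on the same list. The list `die` and the variable names are hypothetical, chosen only for illustration.

```python
import random

import numpy as np

# Hypothetical list of equally likely options: the faces of a six-sided die.
die = [1, 2, 3, 4, 5, 6]

# Standard-library version: returns exactly one element of the sequence.
stdlib_roll = random.choice(die)

# NumPy version: also returns exactly one element when no size is given.
numpy_roll = np.random.choice(die)

print(stdlib_roll, numpy_roll)
```

Both calls pick a single face at random, so the printed values will generally differ from run to run.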

To illustrate, suppose we toss a coin and want to know the outcome. Since we expect a random output of heads or tails, we create a list titled `coin` with those options and call the `random.choice` function on `coin` to give us exactly one result from the list.

But first – we import `numpy`.

In [1]:
import numpy as np

coin = ['heads', 'tails']

flip = np.random.choice(coin)

flip

'heads'

The random choice function does not have a fixed output and running it multiple times will eventually produce a different result.

In fact, if we want to run this experiment more than once it might be useful to keep track of the results in an array. 

## Investigation with Arrays

We can store information in arrays as seen in Chapter (chapter number here). Below we create an array that contains our first experiment result – our first flip – and then append our next outcome.

In [2]:
first = np.array([flip])

first

array(['heads'], dtype='<U5')

Let's toss the coin again by choosing randomly from our list of possible outcomes.

In [3]:
flip_2 = np.random.choice(coin)

flip_2

'tails'

And now let's add this outcome to our array `first`, using NumPy's `append` function.

This function takes as input an array and the elements to be appended, and returns a new array whose elements are those of the input array followed by the appended elements.

In [4]:
np.append(first, flip_2)

array(['heads', 'tails'], dtype='<U5')

Since elements are added to a copy of the input array, `append` does not change the input array. For example, `first` gives us the original array with one coin flip result.

In [5]:
first

array(['heads'], dtype='<U5')

To remedy this, we must use assignment. We can either assign the new array back to the name `first` or give it a new name entirely.

In [6]:
first = np.append(first, flip_2)

first

array(['heads', 'tails'], dtype='<U5')
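The same reassignment pattern works when several results are recorded at once, since `np.append` accepts a sequence of values as well as a single value. The following is a minimal sketch using a hypothetical array `results` with fixed values, so that it behaves the same on every run.

```python
import numpy as np

# Hypothetical starting array holding one recorded flip.
results = np.array(['heads'])

# np.append returns a new array; `results` itself is not modified here.
extended = np.append(results, ['tails', 'heads'])

print(len(results))    # 1
print(len(extended))   # 3

# Reassigning the name keeps the longer array for later use.
results = extended
print(results)
```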

Appending items to lists and arrays is an important tool when working in Python. However, the random choice function accepts an additional argument giving the number of times to repeat the experiment, with the results returned as an array. In fact, we can repeat the coin flip experiment as many times as we want. Here we repeat it 7 times:

In [7]:
outcomes = np.random.choice(coin, 7)

outcomes

array(['tails', 'heads', 'heads', 'tails', 'heads', 'heads', 'tails'],
      dtype='<U5')
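The number of repetitions can also be passed by keyword, as `size`. The sketch below uses a hypothetical variable name, `many_flips`, and checks that the result has one entry per flip.

```python
import numpy as np

coin = ['heads', 'tails']

# Passing the count by keyword is equivalent to passing it positionally.
many_flips = np.random.choice(coin, size=7)

print(len(many_flips))   # 7
print(many_flips)        # contents vary from run to run
```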

Since our experiment is small, we can easily count how many 'heads' or 'tails' we have. For a large experiment, counting by hand would be tedious and error-prone. We can instead find all instances where our `outcomes` array is equal to "heads" and sum over these instances.


To do this, we use the expression `outcomes == 'heads'`, which performs an element-wise comparison of the array `outcomes` with the string `'heads'`, and returns an array of Booleans reflecting that comparison. That is, each element in `outcomes` is compared to `'heads'` and the truth value of each comparison is returned in an array.

In [8]:
outcomes == 'heads'

array([False,  True,  True, False,  True,  True, False])
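To make the element-wise behavior easier to check by eye, here is a small sketch on a fixed (non-random) array. The names `sample` and `is_heads` are hypothetical.

```python
import numpy as np

# Fixed values, so the result is the same on every run.
sample = np.array(['tails', 'heads', 'heads'])

# Each element is compared with 'heads'; the result has one Boolean per element.
is_heads = sample == 'heads'

print(is_heads)                         # [False  True  True]
print(is_heads.dtype)                   # bool
print(is_heads.shape == sample.shape)   # True
```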

To count the number of "heads" in our array of `outcomes`, we can just sum over the above Boolean array.

For arithmetic operations, `True` is treated as `1` and `False` is treated as `0`, so the sum of a Boolean array counts all instances of `True` and disregards all instances of `False`, giving exactly the number of "heads" in this example.

In [9]:
sum(outcomes == 'heads')

4
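The treatment of `True` as `1` can be seen directly with a short sketch. The array `flags` is hypothetical. `np.sum` and `np.count_nonzero` are NumPy alternatives to the built-in `sum` and give the same count.

```python
import numpy as np

# Booleans behave like the integers 1 and 0 in arithmetic.
print(True + True)    # 2
print(False + True)   # 1

# Hypothetical Boolean array with three True values.
flags = np.array([False, True, True, False, True])

print(sum(flags))                # 3
print(np.sum(flags))             # 3
print(np.count_nonzero(flags))   # 3
```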

We can do the same with "tails":

In [10]:
sum(outcomes == 'tails')

3

Since summing over a Boolean array counts all instances of `True`, another way to count the number of "tails" is to sum over the expression `outcomes != 'heads'` instead.

Here the Boolean array we create contains instances of `True` wherever our coin flip landed on "tails", and `False` when "heads" was the result of our flip.

In [11]:
outcomes != 'heads'

array([ True, False, False,  True, False, False,  True])

Summing over this array also counts the number of tails!

In [12]:
sum(outcomes != 'heads')

3
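As a quick consistency check, the two counts should add up to the total number of flips. The sketch below uses a fixed array, `outcomes_fixed` (a hypothetical name), holding the same seven values printed for `outcomes` above. Note that `!= 'heads'` counts tails only because the coin has exactly two possible outcomes.

```python
import numpy as np

# Same seven values as the `outcomes` array shown earlier, fixed for repeatability.
outcomes_fixed = np.array(
    ['tails', 'heads', 'heads', 'tails', 'heads', 'heads', 'tails']
)

n_heads = np.sum(outcomes_fixed == 'heads')
n_tails = np.sum(outcomes_fixed != 'heads')

print(n_heads, n_tails)                           # 4 3
print(n_heads + n_tails == len(outcomes_fixed))   # True
```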