Apply + Lambda
---

I want to briefly show you a decent idiom for doing more complicated work on a `Series` object.

This is a contrived example, but it shows the utility of `apply` + `lambda`. What if we wanted to figure out whether all of the letters A-Z appear in the names of the states? First, we could create a `set` of the characters in each state's name:

In [None]:
# don't use the names of the states as the index!
states = pd.read_csv("states.csv")

def set_of_chars(s):
    return set(s.lower())

series_of_sets = states.State.apply(lambda s: set_of_chars(s))
series_of_sets

Reminder: Lambdas
---

As a reminder, a _lambda_ creates a short-lived, unnamed (anonymous) function, as opposed to the named function `set_of_chars` above. The point is that the `apply` method takes a function and calls it on each element of the `Series`. We could have written the following instead:

```
series_of_sets = states.State.apply(lambda s: set(s.lower()))
```

Or, simply:

```
series_of_sets = states.State.apply(set_of_chars)
```
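
To make the equivalence concrete, here is a minimal sketch that runs without `states.csv`. The three-name `sample_states` Series is a hypothetical stand-in for `states.State`, and the helper is written with the shorter `set(s.lower())`, which builds the same set as `set(list(s.lower()))`.

```python
import pandas as pd

# Hypothetical stand-in for the State column of states.csv
sample_states = pd.Series(["Ohio", "Iowa", "New York"])

def set_of_chars(s):
    """Return the set of unique lowercase characters in s."""
    return set(s.lower())

# Pass the named function directly
by_name = sample_states.apply(set_of_chars)

# Pass an equivalent anonymous function
by_lambda = sample_states.apply(lambda s: set(s.lower()))

# Both calls build the same sets, element by element
print(list(by_name) == list(by_lambda))
print(by_name[0])
```

Either spelling works; a named function is usually easier to reuse and test, while a lambda keeps a one-off transformation next to the call that uses it.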

Getting Back to the Problem
---

Now we have a `Series` of `set`s, each containing the unique characters in one state's name. Next, we need to combine all of these sets into a single one!

- First, an example of combining sets

In [None]:
a = {1, 2, 3}
b = {2, 4}
a.union(b)
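
As a small aside, `union` returns a new set and leaves both inputs untouched, and the `|` operator is an equivalent spelling. A quick sketch with the same `a` and `b` (the names `combined` and `also_combined` are only for illustration):

```python
a = {1, 2, 3}
b = {2, 4}

combined = a.union(b)   # builds a new set; a and b are not modified
also_combined = a | b   # operator form of the same operation

print(combined)
print(a)
```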

Now, we are going to __reduce__ the `Series` of `set`s by taking the union of all of its entries. Done step by step, that looks like:

```python
_tmp = series_of_sets.iloc[0].union(series_of_sets.iloc[1])
_tmp = _tmp.union(series_of_sets.iloc[2])
_tmp = _tmp.union(series_of_sets.iloc[3])
...  # and so on, one line per remaining row
_tmp = _tmp.union(series_of_sets.iloc[-1])
```

Imagine if we had a million rows! Luckily, Python includes a function for this: `reduce`, from the `functools` module.
All we need to do is provide a function that combines two elements; `reduce` applies it cumulatively, from left to right, until only one value remains.
Try the cell below:

In [None]:
from functools import reduce
chars_used_in_states_name = reduce(lambda x, y: x.union(y), series_of_sets)
chars_used_in_states_name
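
To see what `reduce` is doing, the sketch below compares it with an explicit loop. It uses a plain list of sets as a hypothetical stand-in for `series_of_sets` (`reduce` accepts any iterable). The starting value `set()` passed as the third argument is an addition that is not in the cell above; it keeps the call from raising an error on an empty input.

```python
from functools import reduce

# Hypothetical stand-in for series_of_sets
sets = [set("ohio"), set("iowa"), set("new york")]

# Explicit loop: fold each set into a running union
running = set()
for s in sets:
    running = running.union(s)

# reduce performs the same left-to-right fold
reduced = reduce(lambda x, y: x.union(y), sets, set())

# set.union also accepts several sets at once
all_at_once = set().union(*sets)

print(running == reduced == all_at_once)
```

All three spellings produce the same set, so the choice between them is mostly a matter of taste.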

Lastly, we need to remove any characters that are not letters, such as spaces

- `ascii_lowercase` from `string` is simply a string of all of the lowercase ASCII letters, `a` through `z`
    - We can test whether a character is part of this string by using the `in` operator; try the cell below:

In [None]:
from string import ascii_lowercase
print(" " in ascii_lowercase) # Should print `False`
print("a" in ascii_lowercase) # Should print `True`

- We can use a set comprehension to filter out the characters that are not lowercase ASCII letters

In [None]:
chars_used_in_states_name = {x for x in chars_used_in_states_name if x in ascii_lowercase}
chars_used_in_states_name
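
The same filtering can also be written as a set intersection. A minimal sketch, where `chars` is a hypothetical stand-in for the unfiltered `chars_used_in_states_name`:

```python
from string import ascii_lowercase

# Hypothetical stand-in for the unfiltered set of characters
chars = set("new york") | set("d.c.")

# Set comprehension, as in the cell above
letters_by_comprehension = {c for c in chars if c in ascii_lowercase}

# Equivalent: keep only the members that are also lowercase letters
letters_by_intersection = chars & set(ascii_lowercase)

print(letters_by_comprehension == letters_by_intersection)
print(" " in letters_by_intersection)
```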

- Now we can answer our question!

Are all of the letters used in the states' names? The cell below lists any letters that are missing:

In [None]:
alphabet_set = set(ascii_lowercase)
alphabet_set.difference(chars_used_in_states_name)
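
If a plain yes/no answer is preferred over a list of missing letters, a subset test does the job. The sketch below uses a made-up `chars_used` set (a pangram with one letter deliberately removed), not the real state data:

```python
from string import ascii_lowercase

# Made-up stand-in for the filtered chars_used_in_states_name:
# every letter of a pangram, with "q" removed on purpose
chars_used = set("thequickbrownfoxjumpsoverlazydog") - {"q"}

alphabet_set = set(ascii_lowercase)

missing = alphabet_set - chars_used      # operator form of .difference()
all_used = alphabet_set <= chars_used    # operator form of .issubset()

print(sorted(missing))  # ['q']
print(all_used)         # False
```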

The concepts of reductions and anonymous functions can be very useful when doing data analysis! Often you can use comprehensions to do something similar, but I personally enjoy the `reduce` style. There are no tasks for this section, but I would suggest prodding the above code to make sure you understand it!
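
For comparison, here is a rough sketch of the comprehension style next to the `reduce` style. The `names` list is a hypothetical stand-in for the state names:

```python
from functools import reduce

# Hypothetical stand-in for the state names
names = ["Ohio", "Iowa", "New York"]

# reduce style: build one set per name, then fold them together
via_reduce = reduce(lambda x, y: x | y, (set(n.lower()) for n in names))

# comprehension style: one nested set comprehension over every character
via_comprehension = {ch for n in names for ch in n.lower()}

print(via_reduce == via_comprehension)
```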