# Steelers, Cowboys, and Bears
## Using Sets to Explore American Football Teams
### How Many Players Have Been on Each Team?

In [None]:
%matplotlib inline
# IPython magic: render matplotlib figures inline in the notebook

In [None]:
import pickle
from matplotlib_venn import venn2_circles, venn2, venn3
import matplotlib.pyplot as plt

In [None]:
with open("./football.pickle","rb") as f0:
    teams = pickle.load(f0)

#### ``teams`` is a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) with

* **keys** equal to the names of football teams
* **values** equal to a [**list**](https://docs.python.org/3.4/tutorial/introduction.html#lists) of all the players that have played on those teams

In [None]:
type(teams)

In [None]:
teams.keys()

In [None]:
type(teams['steelers'])

### Let's look at the list of Steelers players

### Size of a collection

Every Python collection has a size (length): the number of objects in the collection. It is accessed through the **``len()``** function. (Remember our definition of a function: it takes something in, in this case a collection, and returns something, in this case the length of the collection.)

In [None]:
print(len(teams))
print(len(teams['steelers']))
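
As a small illustration of ``len()`` on different kinds of collections, here is a sketch using a made-up dictionary (``toy_teams`` and its names are hypothetical and are not part of the football data):

```python
# Hypothetical toy data standing in for the real `teams` dictionary
toy_teams = {
    "red": ["Ann", "Bob", "Ann"],
    "blue": ["Cy"],
}

n_teams = len(toy_teams)            # a dictionary's length is its number of keys
n_red = len(toy_teams["red"])       # a list's length counts every item, duplicates included
n_chars = len("Ann")                # a string is a collection of characters

print(n_teams, n_red, n_chars)
```

Note that the list length counts the repeated name twice, which matters once we start converting lists to sets.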

### We can create a **set** from the **list** of Steelers players

#### Python has a ``set()`` function that takes another type of collection (e.g. a list) and creates a set

In [None]:
steelers_set = set(teams['steelers'])
len(steelers_set)

### Why are the sizes (lengths) of the list and the set different?

* Lists do not have to have unique elements, but sets do (definition of a set). Two names each appear twice in the list of Steelers players, so the set is smaller than the list.
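
The effect is easy to reproduce on a tiny example. This is only a sketch with placeholder names (``roster`` is hypothetical, not the real Steelers list):

```python
# Hypothetical roster in which one name occurs twice
roster = ["Player A", "Player B", "Player A", "Player C"]

unique_players = set(roster)    # duplicates collapse into a single element

list_size = len(roster)
set_size = len(unique_players)
n_repeats = list_size - set_size   # how many entries were repeats

print(list_size, set_size, n_repeats)
```

The difference between the two lengths tells us how many repeated entries there are, but not which names they belong to.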

#### Python Aside: finding the duplicate players

We're focusing on Python [**sets**](https://docs.python.org/3.4/tutorial/datastructures.html#sets), but here is a way to find the names using a Python [**Counter**](https://docs.python.org/3/library/collections.html#collections.Counter). Quite simply, Counters count things. They can then return the most commonly occurring items that they counted.

In [None]:
from collections import Counter

steelers_count = Counter(teams['steelers'])
steelers_count.most_common(10)

#### So there have been two Ralph Wenzels (who would have thought!) and two Mike Adamses (not so surprising). Or perhaps each was on the team at two separate times.
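
``most_common(10)`` shows the ten most frequent names whether or not they repeat. If we only want the names that occur more than once, one option is to filter the counts. A minimal sketch on a made-up list (``roster`` and its names are hypothetical):

```python
from collections import Counter

# Hypothetical roster; two of the names appear twice
roster = ["A. Smith", "B. Jones", "A. Smith", "C. Lee", "B. Jones", "D. Kim"]

counts = Counter(roster)

# Keep only the names whose count is greater than one
duplicates = {name: n for name, n in counts.items() if n > 1}

print(duplicates)
```

The same filter could be applied to ``steelers_count`` from the cell above.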

### Now let's make sets for the other two teams

In [None]:
bears_set = set(teams['bears'])
cowboys_set = set(teams['cowboys'])

len(bears_set),len(cowboys_set)

### Let's look at the sets with a Venn diagram

In [None]:
venn3([steelers_set, cowboys_set, bears_set],
      ("Steelers","Cowboys","Bears"))
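
The numbers printed in a three-set Venn diagram are the sizes of its seven regions. As a cross-check, those sizes can be computed directly with set difference (``-``) and intersection (``&``). The sketch below uses small made-up sets ``a``, ``b``, and ``c`` (hypothetical stand-ins for the three team sets) and does not call ``matplotlib_venn`` at all:

```python
# Hypothetical small sets standing in for the three team sets
a = {1, 2, 3, 4}
b = {3, 4, 5}
c = {4, 5, 6, 7}

regions = {
    "only a": a - b - c,
    "only b": b - a - c,
    "a and b only": (a & b) - c,
    "only c": c - a - b,
    "a and c only": (a & c) - b,
    "b and c only": (b & c) - a,
    "all three": a & b & c,
}

for name, members in regions.items():
    print(name, len(members))

# The seven regions do not overlap, so their sizes add up to the size of the union
total = sum(len(members) for members in regions.values())
print(total == len(a | b | c))
```

Replacing ``a``, ``b``, and ``c`` with ``steelers_set``, ``cowboys_set``, and ``bears_set`` should give counts that can be compared with the diagram.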

### Anything Seem Suspicious About These Data?

* The [Chicago Bears](https://en.wikipedia.org/wiki/Chicago_Bears) are a very old football team, dating back to 1919. Do we really believe that there have been eight times as many [Steelers](https://en.wikipedia.org/wiki/Pittsburgh_Steelers) (founded 1933) and over four times as many [Cowboys](https://en.wikipedia.org/wiki/Dallas_Cowboys) (founded 1960)?
    * Maybe something problematic with our Wikipedia data? 
    * Check against another source?

#### Which Steelers have also been Cowboys or Bears?

This question involves 

* three sets (Steelers, Cowboys, and Bears)
* two set operations
    * "have also been" $\rightarrow$ **AND** $\rightarrow$ **INTERSECTION**
    * "or" $\rightarrow$ **UNION**

##### We can write this out in set notation

$ \text{Steelers} \cap (\text{Cowboys} \cup \text{Bears})$

In [None]:
steelers_on_other_teams = steelers_set.intersection(
                               bears_set.union(cowboys_set))
print("There have been %d Steelers who have also played for the Cowboys or Bears."
      % len(steelers_on_other_teams))
print("These Steelers are")
print(steelers_on_other_teams)
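
Python sets also support the operators ``&`` (intersection) and ``|`` (union), which read a little closer to the set notation. A sketch on made-up sets (the ``toy_*`` names and player labels are hypothetical) showing that the method form, the operator form, and the distributed form $(S \cap C) \cup (S \cap B)$ all agree:

```python
# Hypothetical toy sets; "P1", "P2", ... are placeholder player names
toy_steelers = {"P1", "P2", "P3", "P4"}
toy_cowboys = {"P2", "P5"}
toy_bears = {"P3", "P6"}

# Method form, as used in the cell above
with_methods = toy_steelers.intersection(toy_cowboys.union(toy_bears))

# Operator form: & is intersection, | is union
with_operators = toy_steelers & (toy_cowboys | toy_bears)

# Intersection distributes over union
distributed = (toy_steelers & toy_cowboys) | (toy_steelers & toy_bears)

print(sorted(with_operators))
print(with_methods == with_operators == distributed)
```

The parentheses matter in the operator form: ``&`` binds more tightly than ``|``, so without them the expression would be read as $(S \cap C) \cup B$.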


####  What players have played on all three teams?

This question involves three sets (Steelers, Cowboys, Bears) and one set operation applied twice: **intersection**.

##### Writing this in set notation

$ \text{Steelers} \cap \text{Cowboys} \cap \text{Bears}$
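
Because intersection is associative and commutative, the grouping and order of the three sets do not change the result. The sketch below, on made-up sets (``s``, ``c``, ``b`` and the player labels are hypothetical), compares the nested call with two flatter spellings; ``set.intersection`` accepts more than one argument:

```python
# Hypothetical toy sets; "P1", "P2", ... are placeholder player names
s = {"P1", "P2", "P3"}
c = {"P2", "P3", "P4"}
b = {"P3", "P5"}

nested = s.intersection(b.intersection(c))   # one intersection inside another
flat = s.intersection(c, b)                  # several sets in a single call
with_operator = s & c & b                    # operator form

print(nested)
print(nested == flat == with_operator)
```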


In [None]:
steelers_set.intersection(bears_set.intersection(cowboys_set))

#### Note that this is consistent with the Venn diagram above