## Choosing Congressional Candidates

We're going to work with congressional candidates this semester. One thing we'll need to do is collect the websites (and other covariates) for the candidates. Let's divide and conquer.

In [1]:
# best practice is to do all imports at the beginning. 
from random import sample, choices, seed
from collections import Counter

file_list_of_districts = "district_list.txt"
output_file_name = "district_assignments.txt"

class_members = ["alex","anna","tony","john",
                 "kailey","kaixuan","michelle",
                 "natalie","patrick","bobby",
                 "thomas","will"]

Let's read this into a list. Note the use of `next` in the cell below--let's talk about what it does. 

In [2]:
district_list = list()
state_list = list()

with open(file_list_of_districts) as f :
    next(f)
    for item in f :
        district_list.append(item.strip()) # Question: what does strip do?
        state_list.append(item[:2])        # Question: what is happening with "item[:2]"?
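
As a small illustration of the three pieces in the cell above (`next`, `strip`, and slicing), here is a sketch that uses an in-memory stand-in for the file. The contents of `fake_file` are made up, and the presence of a header line is an assumption based on the call to `next(f)`; the real `district_list.txt` may look different.

```python
from io import StringIO

# Hypothetical stand-in for district_list.txt: a header line,
# then one district code per line. The real file may differ.
fake_file = StringIO("district\nLA04\nNC06\nAL02\n")

# next() pulls one line off the file iterator, so the loop
# below starts at the second line.
header = next(fake_file)

districts = []
states = []
for line in fake_file:
    districts.append(line.strip())  # strip() drops the trailing newline and any surrounding whitespace
    states.append(line[:2])         # the first two characters are the state abbreviation
```

Here `header` holds the skipped line, `districts` holds the cleaned codes, and `states` holds only the two-letter prefixes.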

In [7]:
state_list

['LA',
 'NC',
 'AL',
 'CA',
 'GA',
 'MI',
 'NV',
 'TX',
 'TX',
 'NE',
 'IN',
 'PA',
 'KY',
 'CA',
 'TX',
 'CA',
 'OH',
 'CA',
 'CA',
 'MI',
 'VA',
 'AZ',
 'FL',
 'MI',
 'UT',
 'GA',
 'TN',
 'TN',
 'IA',
 'OR',
 'DE',
 'OR',
 'GU',
 'IL',
 'PA',
 'TX',
 'PA',
 'VA',
 'OK',
 'AL',
 'IN',
 'MD',
 'CA',
 'FL',
 'CO',
 'IN',
 'NC',
 'TX',
 'IL',
 'NC',
 'AL',
 'CA',
 'MA',
 'CA',
 'CA',
 'IN',
 'GA',
 'TX',
 'PA',
 'FL',
 'TX',
 'OH',
 'UT',
 'WY',
 'CA',
 'RI',
 'MA',
 'NY',
 'MO',
 'MO',
 'SC',
 'CO',
 'TN',
 'OK',
 'NY',
 'GA',
 'KY',
 'VA',
 'TX',
 'VA',
 'MI',
 'CA',
 'TN',
 'CA',
 'CA',
 'PA',
 'CT',
 'ND',
 'AR',
 'FL',
 'NY',
 'TX',
 'TX',
 'MD',
 'FL',
 'OH',
 'IL',
 'IL',
 'CA',
 'OR',
 'CO',
 'MD',
 'CT',
 'WA',
 'FL',
 'CA',
 'PA',
 'FL',
 'CA',
 'TN',
 'FL',
 'FL',
 'MI',
 'TX',
 'NY',
 'PA',
 'WI',
 'SC',
 'TN',
 'FL',
 'MN',
 'MN',
 'NY',
 'CA',
 'NY',
 'CT',
 'PA',
 'TX',
 'NY',
 'GA',
 'PA',
 'TN',
 'TX',
 'NE',
 'IL',
 'NC',
 'FL',
 'AZ',
 'NJ',
 'OH',
 'HI',
 'FL',
 'WI',

The `Counter` collection gives us an easy way to count how many times each item appears in a list. The result behaves like a dictionary that maps each item to its count.

In [8]:
state_count = Counter(state_list)

In [9]:
state_count.most_common(10)

[('CA', 53),
 ('TX', 36),
 ('FL', 27),
 ('NY', 27),
 ('PA', 18),
 ('IL', 18),
 ('OH', 16),
 ('GA', 14),
 ('MI', 14),
 ('NC', 13)]
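
To see what `Counter` is doing on a list small enough to check by hand, here is a minimal sketch. The list `toy_states` is made up for illustration.

```python
from collections import Counter

# A made-up list of state abbreviations, short enough to count by eye.
toy_states = ["CA", "TX", "CA", "FL", "TX", "CA"]

toy_count = Counter(toy_states)

# most_common(n) returns the n most frequent items as (item, count) pairs,
# ordered from most to least frequent.
top_two = toy_count.most_common(2)

# Looking up an item that never appeared gives 0 instead of raising KeyError.
missing = toy_count["NY"]
```

The counts always add up to the length of the original list, which is a handy sanity check.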

Now we want to give everyone (roughly) the same number of representatives. Let's figure out how many each person gets. Replace the ?? with code to get the answer.

In [12]:
num_students = len(class_members)
num_districts = len(district_list)

num_per_member = num_districts/num_students

In [14]:
2*num_per_member

73.5
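
Since the per-person number is not a whole number, nobody can get exactly that many districts. One way to think about an even split, sketched below, is integer division with a remainder. The figure of 441 districts is inferred from the output above (36.75 per person times 12 class members), not read from the file.

```python
num_students = 12     # len(class_members) in the notebook
num_districts = 441   # inferred: 36.75 districts per person * 12 people

# divmod returns the whole-number quotient and the remainder in one call.
base, extra = divmod(num_districts, num_students)

# 'extra' people take one more district than the rest.
loads = [base + 1] * extra + [base] * (num_students - extra)
```

With these numbers, nine people would take 37 districts and three would take 36, so the loads never differ by more than one.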

Now that we've got that, let's look at what `sample` does. 

In [16]:
sorted(sample(k=round(num_per_member),population=district_list))

['AL02',
 'AL06',
 'AS00',
 'AZ04',
 'CA20',
 'CA43',
 'CA49',
 'CO07',
 'CT03',
 'FL05',
 'FL23',
 'FL24',
 'GA02',
 'GA08',
 'GU00',
 'HI01',
 'IA03',
 'IN02',
 'KY01',
 'KY03',
 'LA04',
 'MD02',
 'NC06',
 'NJ12',
 'NV04',
 'NY24',
 'OH01',
 'PA08',
 'PA11',
 'PA14',
 'SC02',
 'TN01',
 'TN07',
 'TN09',
 'TX05',
 'TX06',
 'WI01']
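
The key property of `sample` is that it draws _without_ replacement, so no district can show up twice. A small sketch on a made-up list follows; the seed value is arbitrary and only there to make the run repeatable.

```python
from random import sample, seed

seed(42)  # arbitrary seed so repeated runs give the same draw

# A made-up population of district codes.
toy_districts = ["AL01", "AL02", "CA01", "CA02", "TX01", "TX02"]

# Draw four distinct districts.
picked = sample(toy_districts, k=4)

# Asking for more items than the population holds is an error,
# because there is nothing left to draw from.
too_many = False
try:
    sample(toy_districts, k=10)
except ValueError:
    too_many = True
```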

---

Now we'd like to allocate people to districts. Your goal is to come up with a list of people that you can associate with the districts so everyone knows which districts they have to pull. Here are some hints: 

* A dictionary is a good way to create pairs of objects. Note that the key has to be unique. 
* The function `choices` does sampling _with_ replacement.
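
Both hints can be checked on toy data. The sketch below uses made-up names and district codes, with an arbitrary seed, to show that assigning to an existing dictionary key overwrites the old value and that `choices` can return the same item many times.

```python
from random import choices, seed
from collections import Counter

# Hint 1: keys are unique, so a second assignment to the same key
# replaces the first value instead of adding a new pair.
pairs = {}
pairs["alex"] = "CA01"
pairs["alex"] = "TX02"   # overwrites "CA01"

# Hint 2: choices() samples with replacement, so k can be larger
# than the population and names will repeat.
seed(0)  # arbitrary seed so repeated runs give the same draw
people = ["alex", "anna", "tony"]
draws = choices(people, k=10)
tally = Counter(draws)
```

Because ten draws come from only three names, at least one name has to repeat, and `tally` shows how uneven the split turned out.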

In [17]:
?choices

In [18]:
choices(k=num_districts,population=class_members)

['john',
 'john',
 'kailey',
 'kailey',
 'kaixuan',
 'kailey',
 'will',
 'thomas',
 'john',
 'bobby',
 'kailey',
 'alex',
 'michelle',
 'bobby',
 'john',
 'kaixuan',
 'tony',
 'tony',
 'anna',
 'bobby',
 'tony',
 'kaixuan',
 'tony',
 'natalie',
 'will',
 'tony',
 'kaixuan',
 'will',
 'tony',
 'bobby',
 'michelle',
 'natalie',
 'kaixuan',
 'tony',
 'anna',
 'thomas',
 'michelle',
 'tony',
 'patrick',
 'anna',
 'alex',
 'thomas',
 'john',
 'kailey',
 'patrick',
 'john',
 'kailey',
 'patrick',
 'bobby',
 'michelle',
 'will',
 'will',
 'john',
 'patrick',
 'kailey',
 'natalie',
 'kaixuan',
 'alex',
 'kailey',
 'patrick',
 'kailey',
 'will',
 'will',
 'patrick',
 'patrick',
 'natalie',
 'patrick',
 'tony',
 'tony',
 'will',
 'bobby',
 'natalie',
 'michelle',
 'john',
 'alex',
 'kaixuan',
 'patrick',
 'alex',
 'michelle',
 'tony',
 'michelle',
 'alex',
 'tony',
 'kailey',
 'anna',
 'thomas',
 'tony',
 'anna',
 'bobby',
 'anna',
 'kailey',
 'alex',
 'john',
 'anna',
 'alex',
 'tony',
 'natali

In [19]:
# Here's a place for you to work. 

num_districts = len(district_list)
district_allocation = dict()

# This next step is the tricky one. See if you can figure 
# out why I'm doing it and why it works.
class_member_list = choices(k=num_districts,population=class_members)

for idx, dist in enumerate(district_list) :
    district_allocation[dist] = class_member_list[idx]
    # Why does dist *have* to be the key of this dictionary instead of using person?
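
Once the district-to-person dictionary exists, it is often handy to flip it around so each person can see their own list. This is one possible sketch, not part of the original assignment; `toy_allocation` reuses a few pairs from the output of this notebook, and the names `districts_by_person` and `load` are introduced here for illustration.

```python
from collections import defaultdict

# A few district -> person pairs in the same shape as district_allocation.
toy_allocation = {
    "AK00": "natalie",
    "AL01": "patrick",
    "AL02": "bobby",
    "AL05": "natalie",
}

# Each person can hold many districts, so the flipped dictionary
# maps a person to a *list* of districts.
districts_by_person = defaultdict(list)
for dist, person in toy_allocation.items():
    districts_by_person[person].append(dist)

# How many districts each person ended up with.
load = {person: len(dists) for person, dists in districts_by_person.items()}
```

Using the person as a plain key with a single district as the value would keep only the last district written for each person, which is why the flipped version needs a list.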


In [20]:
district_allocation

{'AK00': 'natalie',
 'AL01': 'patrick',
 'AL02': 'bobby',
 'AL03': 'alex',
 'AL04': 'tony',
 'AL05': 'natalie',
 'AL06': 'michelle',
 'AL07': 'patrick',
 'AR01': 'natalie',
 'AR02': 'will',
 'AR03': 'thomas',
 'AR04': 'will',
 'AS00': 'thomas',
 'AZ01': 'kailey',
 'AZ02': 'kaixuan',
 'AZ03': 'alex',
 'AZ04': 'natalie',
 'AZ05': 'will',
 'AZ06': 'alex',
 'AZ07': 'bobby',
 'AZ08': 'tony',
 'AZ09': 'alex',
 'CA01': 'anna',
 'CA02': 'will',
 'CA03': 'kailey',
 'CA04': 'anna',
 'CA05': 'tony',
 'CA06': 'thomas',
 'CA07': 'bobby',
 'CA08': 'patrick',
 'CA09': 'alex',
 'CA10': 'bobby',
 'CA11': 'patrick',
 'CA12': 'will',
 'CA13': 'alex',
 'CA14': 'alex',
 'CA15': 'natalie',
 'CA16': 'thomas',
 'CA17': 'tony',
 'CA18': 'will',
 'CA19': 'kailey',
 'CA20': 'natalie',
 'CA21': 'natalie',
 'CA22': 'natalie',
 'CA23': 'kaixuan',
 'CA24': 'alex',
 'CA25': 'alex',
 'CA26': 'will',
 'CA27': 'kailey',
 'CA28': 'will',
 'CA29': 'kailey',
 'CA30': 'thomas',
 'CA31': 'tony',
 'CA32': 'thomas',
 'CA33': '

Once you get to this point, write out your list for practice. We'll use mine for the final assignments.

In [22]:
# cell to write out your list.

with open("district_allocation.txt",'w') as outfile :
    for dist, person in district_allocation.items() :
        outfile.write("\t".join([dist,person])+"\n")
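
A quick way to check the file is to read it back and compare it with the dictionary. The sketch below does this on a toy allocation written to a temporary directory, so it does not touch the real output file; the file name `toy_allocation.txt` is made up.

```python
import os
import tempfile

toy_allocation = {"AK00": "natalie", "AL01": "patrick"}

# Write to a throwaway location rather than the real output file.
path = os.path.join(tempfile.mkdtemp(), "toy_allocation.txt")

with open(path, "w") as outfile:
    for dist, person in toy_allocation.items():
        outfile.write("\t".join([dist, person]) + "\n")

# Read the tab-separated lines back into a dictionary.
read_back = {}
with open(path) as infile:
    for line in infile:
        dist, person = line.rstrip("\n").split("\t")
        read_back[dist] = person
```

If `read_back` equals the dictionary you started with, the write step worked as intended.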
