## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.
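
As a quick sanity check on this property, the sketch below uses Python's standard `datetime` module (which is not part of the Doomsday algorithm) to confirm that the listed dates share a weekday in 2016 and in 2017. The names `DOOMSDAY_DATES`, `WEEKDAY_NAMES`, and `doomsday_dates_weekdays` are made up for illustration.

```python
from datetime import date

# The even-month doomsday dates listed above: 4/4, 6/6, 8/8, 10/10, 12/12
DOOMSDAY_DATES = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12)]

# date.weekday() numbers the days starting from Monday = 0
WEEKDAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

def doomsday_dates_weekdays(year):
    """Return the set of weekday names that the dates above fall on in `year`."""
    return {WEEKDAY_NAMES[date(year, m, d).weekday()] for m, d in DOOMSDAY_DATES}

print(doomsday_dates_weekdays(2016))  # {'Monday'}
print(doomsday_dates_weekdays(2017))  # {'Tuesday'}
```

A set with a single element means all five dates landed on the same day of the week.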

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.
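
In Python the modulo operation is written `%`. The short sketch below repeats the two examples from the note and adds one behavior that is useful later: with a positive divisor, `%` never returns a negative number, which is convenient when counting backward from a doomsday.

```python
# The two examples from the note above
print(12 % 3)   # 0
print(11 % 7)   # 4

# With a positive divisor, Python's % never returns a negative number,
# so an offset of -3 days maps to the same weekday as +4 days.
print(-3 % 7)   # 4

# divmod returns the quotient and the remainder together
print(divmod(11, 7))  # (1, 4)
```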

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.
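
Because the formula depends only on $c \bmod 4$, anchor days repeat every four centuries. A minimal sketch that tabulates them for a few centuries (the names `century_anchor` and `DAY_NAMES` are illustrative and take the century number directly rather than a year):

```python
DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday']

def century_anchor(c):
    """Anchor day (0 = Sunday ... 6 = Saturday) for century c, e.g. c = 19 for 1900-1999."""
    return (5 * (c % 4) + 2) % 7

for c in range(17, 23):
    print(c, DAY_NAMES[century_anchor(c)])
# 17 Sunday
# 18 Friday
# 19 Wednesday
# 20 Tuesday
# 21 Sunday
# 22 Friday
```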

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and functions in the `math` module may be useful. Document your function with a docstring and test your function for a few different years.  Do this in a new cell below this one.

In [1]:
def anchor_day(year):
    """Compute the anchor day for the century that contains the given year.

    The anchor day for a century c is a = (5 * (c mod 4) + 2) mod 7.
    The result corresponds to a day of the week, from 0 for Sunday to 6 for Saturday.
    """
    return (5 * ((year//100) % 4) + 2) % 7

# Test the function on years from two different centuries
print (anchor_day(1954)) # 3 (Wednesday)
print (anchor_day(2017)) # 2 (Tuesday)


3
2


### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.
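
In Python, `math.floor` implements this operation, and for integers the floor-division operator `//` divides and rounds down in one step. A small sketch:

```python
import math

# The two examples from the note above
print(math.floor(3.1))  # 3
print(math.floor(3.8))  # 3

# For integers, // divides and rounds down in one step
print(54 // 4)             # 13
print(math.floor(54 / 4))  # 13

# Floor rounds toward negative infinity, unlike int(), which truncates toward zero
print(math.floor(-3.1))  # -4
print(int(-3.1))         # -3
```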

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.
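
The same two steps reproduce the doomsdays quoted earlier for 2016 and 2017. Both years belong to century $c = 20$, so the anchor day is
$$a = \bigl( 5 (20 \bmod 4) + 2 \bigr) \bmod 7 = 2,$$
which is Tuesday. With $y = 16$ and $y = 17$ the doomsday formula gives
$$
d_{2016} = \left(16 + \left\lfloor\frac{16}{4}\right\rfloor + 2\right) \bmod 7 = 22 \bmod 7 = 1,
\qquad
d_{2017} = \left(17 + \left\lfloor\frac{17}{4}\right\rfloor + 2\right) \bmod 7 = 23 \bmod 7 = 2,
$$
that is, Monday for 2016 and Tuesday for 2017.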

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [2]:
def doomsday(year):
    """Compute the doomsday for the given year.

    With y the last two digits of the year and a the century's anchor day,
    the doomsday is d = (y + floor(y / 4) + a) mod 7, from 0 for Sunday to 6 for Saturday.
    """
    return (year%100 + (year%100)//4 + anchor_day(year))%7

print (doomsday(1954)) # 0 Sunday
print (doomsday(2016)) # 1 Monday
print (doomsday(2017)) # 2 Tuesday

0
1
2


### The Day of Week

The final step in the Doomsday algorithm is to count the number of days from a nearby doomsday to the target date, modulo 7. Adding this offset to the year's doomsday gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.
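
As a second worked example, the three steps can be applied to July 4, 1776, the date mentioned at the start of this part. The century is $c = 17$ and $y = 76$, so
$$
a = \bigl( 5 (17 \bmod 4) + 2 \bigr) \bmod 7 = 7 \bmod 7 = 0,
\qquad
d = \left(76 + \left\lfloor\frac{76}{4}\right\rfloor + 0\right) \bmod 7 = 95 \bmod 7 = 4.
$$
The doomsday for 1776 is therefore Thursday. The nearby doomsday 7/11 is exactly 7 days after 7/4, and $(4 - 11) \bmod 7 = 0$, so July 4, 1776 also fell on a Thursday.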

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.
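
The leap-year caveat matters because the January and February doomsdays shift by one day in leap years. A minimal sketch of the Gregorian leap-year rule, checked against the standard library's `calendar.isleap` (the function name `is_leap_year` is illustrative):

```python
import calendar

def is_leap_year(year):
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for year in (1900, 1954, 1992, 2000, 2016):
    print(year, is_leap_year(year))
# 1900 False
# 1954 False
# 1992 True
# 2000 True
# 2016 True

# The rule agrees with the standard library over a wide range of years
assert all(is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401))
```

Century years are the easy case to get wrong: 1900 was not a leap year, but 2000 was.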

In [3]:
def cal_day(month,day,year):
    """Return the day of the week, such as "Thursday", for the given month, day, and year."""

    # Names of the days of the week, indexed from 0 (Sunday) to 6 (Saturday)
    days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

    # Day of the month of one doomsday in each month, January through December
    #    for regular years
    dmdays_reg = [10,28,21,4,9,6,11,8,5,10,7,12]
    #    for leap years
    dmdays_leap = [11,29,21,4,9,6,11,8,5,10,7,12]

    # Leap years are divisible by 4, except century years that are not divisible by 400
    if year%4 == 0 and (year%100 != 0 or year%400 == 0):
        dmdays = dmdays_leap
    else:
        dmdays = dmdays_reg

    # Offset from the month's doomsday (mod 7), added to the doomsday for the year
    index = ((day - dmdays[month-1])%7 + doomsday(year))%7

    return days[index]
    
print (cal_day(1,16,2017)) # Monday
print (cal_day(7,21,1954)) # Wednesday
print (cal_day(2,11,1992)) # Tuesday 

Monday
Wednesday
Tuesday
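
One way to test more than a handful of dates is to compare against Python's `datetime` module. The sketch below is written to run on its own, so it re-implements the three steps compactly under the hypothetical name `doomsday_weekday`; it is not the `cal_day` function above, and it assumes Gregorian dates.

```python
from datetime import date, timedelta

DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday']
# One doomsday per month (January..December) for regular years
DOOMSDAYS = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]

def doomsday_weekday(month, day, year):
    """Day-of-week name for a Gregorian date, following the three steps above."""
    anchor = (5 * ((year // 100) % 4) + 2) % 7
    y = year % 100
    dooms = (y + y // 4 + anchor) % 7
    ref = DOOMSDAYS[month - 1]
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if leap and month <= 2:
        ref += 1  # 1/11 and 2/29 in leap years
    return DAY_NAMES[(dooms + day - ref) % 7]

def check_range(start, end):
    """Return the dates in [start, end] where the result disagrees with datetime."""
    mismatches = []
    d = start
    while d <= end:
        # date.weekday() uses Monday = 0, so shift to Sunday = 0
        expected = DAY_NAMES[(d.weekday() + 1) % 7]
        if doomsday_weekday(d.month, d.day, d.year) != expected:
            mismatches.append(d)
        d += timedelta(days=1)
    return mismatches

print(check_range(date(1900, 1, 1), date(2100, 12, 31)))  # an empty list means no disagreements
```

Choosing a range that includes both 1900 and 2000 exercises the two century-year cases of the leap-year rule.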


__Exercise 1.4.__ How many times did Friday the 13th occur in the years 1900-1999? Does this number seem to be similar to other centuries?

In [6]:
def count_day(dow, day, begin_year, end_year):
    """Count how often a day of the month falls on a given day of the week.

    dow is the name of the target day of the week, day is the day of the month,
    and the years from begin_year through end_year (inclusive) are searched.
    """
    count = 0
    
    for year in range(begin_year,end_year+1):
        for month in range(1,13):
            if cal_day(month, day, year) == dow:
                count += 1
                
    return count
                
count_day('Friday', 13, 1900, 1999)  # 172

172
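
To address the second half of the question, one option is to repeat the count for neighboring centuries. The sketch below uses `datetime` rather than the functions above so that it runs on its own; `friday_13_count` is a hypothetical helper name. Since one day in seven is a Friday, the 1200 thirteenths in a century should give a count near $1200 / 7 \approx 171$.

```python
from datetime import date

def friday_13_count(begin_year, end_year):
    """Count the 13ths that fall on a Friday in begin_year..end_year inclusive."""
    return sum(
        1
        for year in range(begin_year, end_year + 1)
        for month in range(1, 13)
        if date(year, month, 13).weekday() == 4  # Monday = 0, so Friday = 4
    )

# Compare several centuries side by side
for start in (1700, 1800, 1900, 2000):
    print(start, friday_13_count(start, start + 99))
```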

__Exercise 1.5.__ How many times did Friday the 13th occur between the year 2000 and today?

In [7]:
# The number of times the 13th fell on a Friday between the year 2000 and today is
#    the count for 2000 through 2016, plus one because Jan 13th of 2017 is a Friday
cal_day(1,13,2017) # Friday
count_day('Friday', 13, 2000, 2016) + 1 # 30

30

## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the list format

```Python
[month, day, year, count]
```
The elements of this list should be integers, not strings. The function `read_birthdays` provided below will help you load the file.

In [8]:
def read_birthdays(file_path):
    """Read the contents of the birthdays file into a string.
    
    Arguments:
        file_path (string): The path to the birthdays file.
        
    Returns:
        string: The contents of the birthdays file.
    """
    with open(file_path) as file:
        return file.read()
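

A minimal sketch of the conversion, assuming each data line looks like `month/day/year<TAB>count` (the layout implied by the target list format). The two sample lines are made up for illustration and are not taken from `birthdays.txt`, and the function name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Convert 'month/day/year<TAB>count' into [month, day, year, count] as integers."""
    date_part, count_part = line.strip().split('\t')
    return [int(s) for s in date_part.split('/')] + [int(count_part)]

# Made-up sample text in the assumed layout
sample = "1/1/78\t1000\n1/2/78\t1100\n"

# Skip blank lines, then parse each remaining line
rows = [parse_line(line) for line in sample.split('\n') if line.strip()]
print(rows)  # [[1, 1, 78, 1000], [1, 2, 78, 1100]]
```

A real file may also contain header or footer lines, which would need to be filtered out before parsing.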

__Exercise 2.2.__ Which month had the most births in 1978? Which day of the week had the most births? Which day of the week had the fewest? What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.
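
As a hedged illustration of the `Counter` suggestion, the sketch below totals counts by month from rows in the `[month, day, year, count]` format. The rows are made-up placeholders, not values from the data file.

```python
from collections import Counter

# Made-up rows in [month, day, year, count] format
rows = [[1, 1, 78, 100], [1, 2, 78, 150], [2, 1, 78, 120], [2, 2, 78, 90]]

births_by_month = Counter()
for month, day, year, count in rows:
    births_by_month[month] += count

# most_common() lists the keys from the largest total to the smallest
print(births_by_month.most_common())   # [(1, 250), (2, 210)]
print(births_by_month.most_common(1))  # [(1, 250)]
```

The same pattern works for days of the week by using the weekday name as the key instead of the month.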

In [9]:
#==============================================================================
# Extract the number of births in the United States for each day in 1978
#==============================================================================

# Load the file
text = read_birthdays('/Users/CJ/Dropbox/Course Winter 2017/STA 141B/HW/hw1/birthdays.txt')

# Generate a list with each element representing each line of the original text (empty lines removed)
text_lines = [elem for elem in text.split('\n') if elem != '' and elem != '\t']
# Only extract the lines where birthday information exists
text_lines = text_lines[4:401]

# generate a list with date and counts
date_num = [elm.split('\t') for elm in text_lines]

# Write a function to reformat the result
def get_birth_info(fields):
    # Convert a ['month/day/year', 'count'] pair into the format
    #   [month, day, year, count], where every element is an integer
    ans = [int(s) for s in fields[0].split('/')]
    ans.append(int(fields[1]))
    return ans

result = [get_birth_info(elem) for elem in date_num]    

In [11]:
#==============================================================================
# Which month had the most births in 1978?
#==============================================================================

# Create a list called result_month with one element per month;
#     each element is a list of the daily birth counts for that month
result_month = []
for i in range(12):
    result_month.append( [row[3] for row in result if row[0] == i+1] ) 

# Add up the daily counts to get the total number of births in each month
births_month = [sum(counts) for counts in result_month]
    
# Pair each month number with its total and sort by the total (the last element of each pair)
def last(pair):
    return pair[-1]
sorted_births_month = sorted(zip(range(1,13),births_month), key = last)
sorted_births_month   

[(2, 249875),
 (4, 254577),
 (1, 270695),
 (6, 270756),
 (5, 270812),
 (11, 274671),
 (3, 276584),
 (12, 284927),
 (10, 288955),
 (9, 293891),
 (7, 294701),
 (8, 302795)]

In [12]:
print('The month of 1978 which has the most births is the %s(th) month' % sorted_births_month[11][0])

The month of 1978 which has the most births is the 8(th) month


So the month of 1978 with the most births is **August**.

In [13]:
#==============================================================================
# Which day of the week had the most births? Which day of the week had the fewest? 
# What conclusions can you draw?
#==============================================================================

# Compute the day of the week for a row in [month, day, year, count] format
#     and append it to the end of that row (the year is stored as two digits)
def add_dow(row):
    row.append(cal_day(row[0],row[1],1900+row[2]))

# Apply the add_dow function to every day of the year
for row in result:
    add_dow(row)

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# result_dow has 7 lists, one for each day of the week.
#    Each list contains the number of births on every date that fell on that day.
result_dow = []
for day in days:
    result_dow.append( [row[3] for row in result if row[4] == day] ) 

# Sum the births for each day of the week
births_dow = [sum(counts) for counts in result_dow]

# Sort the result to find the days with the most and the fewest births
sorted_births_dow = sorted(zip(days, births_dow), key = last)
sorted_births_dow

[('Sunday', 421400),
 ('Saturday', 432085),
 ('Monday', 487309),
 ('Thursday', 493149),
 ('Wednesday', 493897),
 ('Friday', 500541),
 ('Tuesday', 504858)]

In [14]:
#==============================================================================
# Just try to display the result in a table
#==============================================================================

import tabletext

print ( tabletext.to_text([list(a) for a in zip(days, births_dow)]) )

┌───────────┬────────┐
│ Sunday    │ 421400 │
├───────────┼────────┤
│ Monday    │ 487309 │
├───────────┼────────┤
│ Tuesday   │ 504858 │
├───────────┼────────┤
│ Wednesday │ 493897 │
├───────────┼────────┤
│ Thursday  │ 493149 │
├───────────┼────────┤
│ Friday    │ 500541 │
├───────────┼────────┤
│ Saturday  │ 432085 │
└───────────┴────────┘


In [15]:
print('The day of the week with the most births is %s' % sorted_births_dow[-1][0])  
print('The day of the week with the fewest births is %s' % sorted_births_dow[0][0])

The day of the week with the most births is Tuesday
The day of the week with the fewest births is Sunday


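<p>The day of the week with the most births is <strong>Tuesday</strong>.</p>
<p>The day of the week with the fewest births is <strong>Sunday</strong>.</p>
<p>Both weekend days have noticeably fewer births than any weekday: 421,400 on Sundays and 432,085 on Saturdays, compared with 487,309 to 504,858 on each weekday.</p>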

__Exercise 2.3.__ What would be an effective way to present the information in exercise 2.2? You don't need to write any code for this exercise, just discuss what you would do.

**Answer:**
<p>One effective way to present this information is a table with one column for the month and another for the number of births in that month, along with a similar table for the births on each day of the week.</p>
<p>Another good option is a bar chart with one bar per month, or one bar per day of the week, showing the number of births in each.</p>
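
A chart of this kind can be prototyped without a plotting library. The sketch below prints a rough text bar chart of the weekday totals reported in exercise 2.2; the scale of one `#` per 20,000 births and the name `text_bar_chart` are arbitrary choices for illustration.

```python
# Weekday totals copied from the exercise 2.2 results above
births_dow = [('Sunday', 421400), ('Monday', 487309), ('Tuesday', 504858),
              ('Wednesday', 493897), ('Thursday', 493149),
              ('Friday', 500541), ('Saturday', 432085)]

def text_bar_chart(pairs, per_mark=20000):
    """Return one line per (label, value) pair, with one '#' per `per_mark` units."""
    width = max(len(label) for label, _ in pairs)
    return [label.ljust(width) + ' ' + '#' * (value // per_mark) + ' ' + str(value)
            for label, value in pairs]

print('\n'.join(text_bar_chart(births_dow)))
```

Keeping the days in calendar order, rather than sorted by count, makes the weekend dip easy to see.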