# STA 141B: Homework 1

Fall 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9 digit student ID number below.

First Name: Kevin

Last Name: Chu

Student ID: 913077890

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax

## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.
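
In Python, the `%` operator performs this operation. The short sketch below reproduces the two examples from the note and adds one case with a negative operand, which becomes relevant when a target date falls before the doomsday it is compared with.

```python
# The two examples from the note above.
print(12 % 3)   # 0
print(11 % 7)   # 4

# With a positive divisor, Python's % never returns a negative result,
# so counting backwards from a reference day still lands in 0..6.
print(-3 % 7)   # 4

# divmod returns the quotient and the remainder together.
quotient, remainder = divmod(11, 7)
print(quotient, remainder)  # 1 4
```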

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and functions in the `math` module may be useful. Document your function with a docstring and test your function for a few different years.

In [1]:
def anchor_fun(year):
    """Computes the anchor day for that year's century
    
    Parameters
    ----------
    year : integer
    
    Returns
    ----------
    anchorday : integer from 0, 1, ..., 6 representing Sunday, Monday, ..., Saturday
    """
    century = year // 100
    anchorday = (5 * (century % 4) + 2) % 7
    return(anchorday)

print("1954: ", anchor_fun(1954), "\n2005: ", anchor_fun(2005), "\n1785: ", anchor_fun(1785))

1954:  3 
2005:  2 
1785:  0
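
One way to gain confidence in these results is to compare the formula with Python's `datetime` module. The sketch below is self-contained, so it re-implements the formula under the hypothetical name `anchor_day`; the helper `sunday_based_weekday` is also an assumption introduced here, needed because `date.weekday()` numbers the days from Monday = 0. It relies on the fact that 4/4 is always a doomsday and that the doomsday of a century's first year is the anchor day.

```python
import datetime

def anchor_day(year):
    """Anchor day (0 = Sunday, ..., 6 = Saturday) for the century containing `year`."""
    century = year // 100
    return (5 * (century % 4) + 2) % 7

def sunday_based_weekday(date):
    """Convert datetime's Monday = 0 numbering to the Sunday = 0 numbering used here."""
    return (date.weekday() + 1) % 7

# Check the first year of each century from 1600 through 2400.
for first_year in range(1600, 2500, 100):
    expected = sunday_based_weekday(datetime.date(first_year, 4, 4))
    assert anchor_day(first_year) == expected, first_year

print("anchor days agree with datetime for 1600-2400")
```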


### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.
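
In Python the floor is available as `math.floor`, and for integers the floor-division operator `//` gives the same result without an intermediate float. A brief illustration:

```python
import math

# The two examples from the note above.
print(math.floor(3.1), math.floor(3.8))  # 3 3

# Floor of y / 4 for y = 54, computed two equivalent ways.
print(math.floor(54 / 4))  # 13
print(54 // 4)             # 13
```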

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [2]:
import math

def doomsday_fun(year):
    """Computes the day of week for the doomsday given a year
    
    Parameters
    ----------
    year : integer
    
    Returns
    ----------
    doomsday : integer from 0, 1, ..., 6 representing Sunday, Monday, ..., Saturday
    """
    anchorday = anchor_fun(year)
    doomyear = year % 100  # last two digits of the year
    doomsday = (doomyear + doomyear // 4 + anchorday) % 7
    return(doomsday)
    
print("1954: ", doomsday_fun(1954), "\n2005: ", doomsday_fun(2005), "\n1785: ", doomsday_fun(1785))

1954:  0 
2005:  1 
1785:  1


### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.
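
The arithmetic in this example can be sketched in a few lines. Here `datetime` is used only to count the days between the two dates; the list `DAY_NAMES` and the variable names are assumptions made for illustration.

```python
import datetime

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

doomsday = 0  # Sunday, the doomsday for 1954 from the example above
target = datetime.date(1954, 7, 21)
nearby_doomsday = datetime.date(1954, 7, 11)

offset = (target - nearby_doomsday).days  # 10 days
weekday = (doomsday + offset) % 7         # (0 + 10) % 7 = 3
print(offset, DAY_NAMES[weekday])         # 10 Wednesday
```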

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [3]:
import datetime

def dayOfWeek(year, month, day):
    """Computes the day of the week given a year, month, and day. 
    First, day_table is created: a dictionary mapping 0, ..., 6 to Sunday, ..., Saturday.
    Next, doomsday_table is created: a dictionary giving the doomsday date in each month.
    The function then checks whether the given year is a leap year and updates doomsday_table accordingly.
    doomsday_fun from 1.2 is then used to find which weekday is the doomsday of the year.
    days_between is the number of days between the given date and the doomsday in the same month.
    days_left is days_between modulo 7, the number of days after the doomsday.
    weekday is the weekday of the given date.
    
    Parameters
    ----------
    year : integer
    month : integer
    day : integer
    
    Returns
    ----------
    weekday : string
    """
    day_table = {0: "Sunday", 1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday", 5: "Friday", 6: "Saturday"}
    doomsday_table = {3: 21, 4: 4, 5: 9, 6: 6, 7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}
    
    if ((year % 400 == 0) or ((year % 4 == 0) and (year % 100 != 0))):
        doomsday_table.update({1:11, 2:29})
    else:
        doomsday_table.update({1:10, 2:28})
        
    doomsday = doomsday_fun(year)
    days_between = day - doomsday_table[month]
    days_left = days_between % 7
    weekday = day_table[(doomsday + days_left) % 7]
    return(weekday)
    
    
w = dayOfWeek(1954, 7, 21)
x = dayOfWeek(2000, 1, 10)
y = dayOfWeek(2000, 1, 11)
z = dayOfWeek(2000, 1, 12)

print(w, x, y, z)

Wednesday Monday Tuesday Wednesday
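
Four test dates leave most of the calendar unchecked. As a broader check, the sketch below compares a compact re-implementation of the three steps against `datetime` for every date from 1900 through 2099. The names `day_of_week`, `is_leap`, `DAY_NAMES`, and `DOOMSDAY_DATES` are introduced here so that the block runs on its own; they are not the functions defined above.

```python
import datetime

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# Doomsday dates by month for regular years; in leap years the
# January and February dates are one day later (1/11 and 2/29).
DOOMSDAY_DATES = {1: 10, 2: 28, 3: 21, 4: 4, 5: 9, 6: 6,
                  7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}

def is_leap(year):
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)

def day_of_week(year, month, day):
    """Weekday name for a Gregorian date, computed with the Doomsday algorithm."""
    anchor = (5 * ((year // 100) % 4) + 2) % 7
    y = year % 100
    doomsday = (y + y // 4 + anchor) % 7
    reference = DOOMSDAY_DATES[month]
    if is_leap(year) and month in (1, 2):
        reference += 1
    return DAY_NAMES[(doomsday + day - reference) % 7]

# Walk through every date in the range and count disagreements with datetime.
date = datetime.date(1900, 1, 1)
end = datetime.date(2100, 1, 1)
mismatches = 0
while date < end:
    # date.weekday() uses Monday = 0, so shift to Sunday = 0.
    expected = DAY_NAMES[(date.weekday() + 1) % 7]
    if day_of_week(date.year, date.month, date.day) != expected:
        mismatches += 1
    date += datetime.timedelta(days=1)

print("mismatches:", mismatches)
```

A result of zero mismatches over this range also exercises the leap-year branch, including the century years 1900 (not a leap year) and 2000 (a leap year).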


__Exercise 1.4.__ Davis picks up yard waste on the first Monday of the month.  How many times did the 1st of the month (first day of the month) fall on a Monday in the years 2000-2016 (including 2016)?

In [4]:
import datetime

def mondayCount(year1, year2):
    """Computes the number of times the 1st day of the month is a Monday between 2 separate years
    
    Parameters
    ----------
    year1 : integer
    year2 : integer
    
    Returns
    ----------
    count : integer representing the number of times the first day of the month is a Monday from year1 through year2, inclusive

    Notes
    ----------
    Discussed the problem with Bailey Wang, who is also currently taking this course.
    """
    count = 0
    
    for year in range(year1, year2 + 1):
        for month in range(1, 12 + 1):
            # date.weekday() numbers the days Monday = 0, ..., Sunday = 6
            if datetime.date(year, month, 1).weekday() == 0:
                count = count + 1
    
    return(count)
        
mondayCount(2000, 2016)

28
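
The same count can be written more compactly with the standard `calendar` module. This is an alternative sketch, not the approach used above; the function name `count_first_mondays` is hypothetical.

```python
import calendar

def count_first_mondays(first_year, last_year):
    """Count months in first_year..last_year (inclusive) whose 1st is a Monday."""
    return sum(
        calendar.weekday(year, month, 1) == calendar.MONDAY
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    )

print(count_first_mondays(2000, 2016))  # 28
```

`calendar.weekday` uses the same Monday = 0 numbering as `date.weekday()`, so the two versions should agree for any range of years.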

## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the list format

```Python
[month, day, year, count]
```
The elements of this list should be integers, not strings. The function `read_birthdays` provided below will help you load the file.

In [5]:
# Alternative approach using pandas (the exercise itself asks for iterators and list comprehensions)

import pandas as pd

column_names = ['month', 'day', 'year']
data = pd.read_table('birthdays.txt', sep = '/', skiprows = 6, names = column_names)
data[['year','count']] = data['year'].str.split('\t',expand=True)
data.head()

|   | month | day | year | count |
|---|-------|-----|------|-------|
| 0 | 1     | 1   | 78   | 7701  |
| 1 | 1     | 2   | 78   | 7527  |
| 2 | 1     | 3   | 78   | 8825  |
| 3 | 1     | 4   | 78   | 8859  |
| 4 | 1     | 5   | 78   | 9043  |


In [6]:
def read_birthdays(file_directory):
    """Reads the birthdays.txt file for the assignment, skipping the first 6 lines.
    The for loop after this function splits each line into four fields:
    month, day, year, birthday count
    
    Parameters
    ----------
    file_directory : string, location of the file
    
    Returns
    ----------
    birthdays : list of strings, one stripped line of the file per element
    """
    
    # The with statement closes the file once the lines have been read
    with open(file_directory) as file:
        birthdays = [line.strip() for i,line in enumerate(file) if i > 5]
    return birthdays

birthdays = read_birthdays('birthdays.txt')

for i in range(0, 365):
    birthdays[i] = birthdays[i].replace("/", "\t")
    birthdays[i] = birthdays[i].split("\t")
    
birthdays

[['1', '1', '78', '7701'],
 ['1', '2', '78', '7527'],
 ['1', '3', '78', '8825'],
 ['1', '4', '78', '8859'],
 ['1', '5', '78', '9043'],
 ['1', '6', '78', '9208'],
 ['1', '7', '78', '8084'],
 ['1', '8', '78', '7611'],
 ['1', '9', '78', '9172'],
 ['1', '10', '78', '9089'],
 ['1', '11', '78', '9210'],
 ['1', '12', '78', '9259'],
 ['1', '13', '78', '9138'],
 ['1', '14', '78', '8299'],
 ['1', '15', '78', '7771'],
 ['1', '16', '78', '9458'],
 ['1', '17', '78', '9339'],
 ['1', '18', '78', '9120'],
 ['1', '19', '78', '9226'],
 ['1', '20', '78', '9305'],
 ['1', '21', '78', '7954'],
 ['1', '22', '78', '7560'],
 ['1', '23', '78', '9252'],
 ['1', '24', '78', '9416'],
 ['1', '25', '78', '9090'],
 ['1', '26', '78', '9387'],
 ['1', '27', '78', '8983'],
 ['1', '28', '78', '7946'],
 ['1', '29', '78', '7527'],
 ['1', '30', '78', '9184'],
 ['1', '31', '78', '9152'],
 ['2', '1', '78', '9159'],
 ['2', '2', '78', '9218'],
 ['2', '3', '78', '9167'],
 ['2', '4', '78', '8065'],
 ['2', '5', '78', '7804'],
 ['2',
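
The list above holds strings, while the exercise asks for integers. A minimal sketch of the conversion with `strip()`, `split()`, and a list comprehension follows. It assumes each line has the form `month/day/year<TAB>count`, as the parsing code above implies; the function name `parse_line` and the in-memory `sample_lines` are hypothetical stand-ins for the real file.

```python
def parse_line(line):
    """Convert one 'month/day/year<TAB>count' line to [month, day, year, count]."""
    date_part, count = line.strip().split("\t")
    return [int(field) for field in date_part.split("/")] + [int(count)]

# Sample lines built from the first rows shown above (assumed file format).
sample_lines = ["1/1/78\t7701\n", "1/2/78\t7527\n", "1/3/78\t8825\n"]

rows = [parse_line(line) for line in sample_lines]
print(rows)
# [[1, 1, 78, 7701], [1, 2, 78, 7527], [1, 3, 78, 8825]]
```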

__Exercise 2.2.__ 

1. Count the number of birthdays by the month (number of birthdays per month).
2. Count the number of birthdays by the day of the week. 

What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.

In [7]:
def month_count(month):
    """Computes the number of birthdays per month in the given birthdays.txt file
    The function goes through each row of the birthdays.txt data and checks the month of the row; if
    it matches the given month, the row's birth count is added to the running total for that month
    
    Parameters
    ----------
    month : integer from 1, 2, ..., 12 representing January, February, ..., December
    
    Returns
    ----------
    monthCount : integer count of how many birthdays there were in the given month
    """
    
    monthCount = 0
    
    for i in range(0, 365):
        if int(birthdays[i][0]) == month:
            monthCount = monthCount + int(birthdays[i][3])
            
    return monthCount
            
for i in range(1, 13):
    print(i, ": ", month_count(i))

1 :  270695
2 :  249875
3 :  276584
4 :  254577
5 :  270812
6 :  270756
7 :  294701
8 :  302795
9 :  293891
10 :  288955
11 :  274671
12 :  284927
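
In the totals above, July through September have the largest counts and February the smallest, although February also has the fewest days. The exercise mentions `Counter`, which can accumulate all twelve totals in a single pass instead of one pass per month. The sketch below shows the idea on four rows taken from the parsed list earlier in this notebook, with the fields already converted to integers; the names `rows` and `births_by_month` are assumptions.

```python
from collections import Counter

# A few rows in [month, day, year, count] form.
rows = [[1, 1, 78, 7701], [1, 2, 78, 7527],
        [2, 1, 78, 9159], [2, 2, 78, 9218]]

births_by_month = Counter()
for month, day, year, count in rows:
    births_by_month[month] += count

for month in sorted(births_by_month):
    print(month, ": ", births_by_month[month])
```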


In [8]:
def weekday_count(test_day):
    """Computes the number of birthdays per weekday in the given birthdays.txt file
    The function goes through each row of the birthdays.txt file and calculates the weekday of that 
    specific date. Afterwards, it checks to see if that weekday is equal to the test_day passed as an
    argument. If the days are the same, it will add the birthday counts to a counter of birthdays for 
    that specific weekday.
    
    Parameters
    ----------
    test_day : string of the weekday being tested
    
    Returns
    ----------
    dayCount : integer count of how many birthdays there were on the given weekday
    """
    
    dayCount = 0
        
    for i in range(0, 365):
        year = int(birthdays[i][2]) + 1900
        month = int(birthdays[i][0])
        day = int(birthdays[i][1])
        weekday = dayOfWeek(year, month, day)
        
        if weekday == test_day:
            dayCount = dayCount + int(birthdays[i][3])
    
    return dayCount
        
print(weekday_count("Sunday"))
print(weekday_count("Monday"))
print(weekday_count("Tuesday"))
print(weekday_count("Wednesday"))
print(weekday_count("Thursday"))
print(weekday_count("Friday"))
print(weekday_count("Saturday"))

421400
487309
504858
493897
493149
500541
432085
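
The totals above, printed in order from Sunday to Saturday, are lowest for Sunday and Saturday. As with the monthly counts, a `Counter` can collect all seven totals in one pass. The sketch below uses `datetime` for the weekday rather than the Doomsday function, and runs on four rows taken from the parsed list earlier in this notebook; the names `rows`, `WEEKDAY_NAMES`, and `births_by_weekday` are assumptions.

```python
import datetime
from collections import Counter

# date.weekday() numbers the days Monday = 0, ..., Sunday = 6.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# A few rows in [month, day, year, count] form, with two-digit years.
rows = [[1, 1, 78, 7701], [1, 2, 78, 7527],
        [1, 7, 78, 8084], [1, 8, 78, 7611]]

births_by_weekday = Counter()
for month, day, year, count in rows:
    date = datetime.date(1900 + year, month, day)
    births_by_weekday[WEEKDAY_NAMES[date.weekday()]] += count

for name, total in births_by_weekday.items():
    print(name, total)
```

Because January 1, 1978 was a Sunday and the year has 365 days, Sunday occurs 53 times in 1978 and every other weekday 52 times, so the low Sunday total is not explained by having fewer Sundays.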
