### STA 141B: Homework 1
Winter 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9-digit student ID number below.

First Name: Mitchell

Last Name: Layton

Student ID: 912307956

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax


## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.
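In Python the `%` operator gives this remainder. For a positive divisor it never returns a negative value, which helps when counting backwards from a doomsday. A small sketch (the variable names are illustrative):

```python
import math

# The note's examples
print(12 % 3)   # 0
print(11 % 7)   # 4

# Counting backwards wraps around instead of going negative:
# 7 days before a doomsday is the same weekday, 1 day before is 6 steps ahead.
seven_days_before = (4 - 11) % 7    # 0
one_day_before = (20 - 21) % 7      # 6

# math.fmod keeps the sign of the dividend (like C), so it gives -1.0 here
fmod_result = math.fmod(20 - 21, 7)

print(seven_days_before, one_day_before, fmod_result)
```

This is why a day-of-week calculation in Python can subtract dates in either direction and still land on a valid weekday number from 0 to 6.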

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.
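A minimal stand-alone sketch of the anchor-day formula, taking the year as an `int` and using integer division to get $c$; `anchor_day` and `WEEKDAYS` are illustrative names:

```python
# Weekday names in the algorithm's numbering (0 = Sunday, ..., 6 = Saturday)
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def anchor_day(year: int) -> int:
    """Return a = (5 * (c mod 4) + 2) mod 7 for the century c of `year`."""
    c = year // 100                 # first two digits, e.g. 19 for 1954
    return (5 * (c % 4) + 2) % 7


for year in (1776, 1954, 2018):
    print(year, WEEKDAYS[anchor_day(year)])
# 1776 Sunday
# 1954 Wednesday
# 2018 Tuesday
```

Here $c$ is the leading digits of the year (19 for 1954), matching the example above, not the ordinal "20th century".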

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and functions in the `math` module may be useful. Document your function with a docstring and test your function for a few different years.  Do this in a new cell below this one.

```python
# Days wraps a year ("1954") or a M/D/YY(YY) date ("7/21/1954") and provides
# the anchor day, doomsday, and day-of-week steps of the Doomsday algorithm.

import math


class Days:

    # Class variable: weekday names keyed by the algorithm's numbering (0 = Sunday)
    days_variables = {
        "0": "Sunday",
        "1": "Monday",
        "2": "Tuesday",
        "3": "Wednesday",
        "4": "Thursday",
        "5": "Friday",
        "6": "Saturday",
    }

    # Doomsday date in each month (month -> day of month) for common years;
    # in leap years January and February move to 1/11 and 2/29.
    doom_dates_common = {1: 10, 2: 28, 3: 21, 4: 4, 5: 9, 6: 6,
                         7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}
    doom_dates_leap = {**doom_dates_common, 1: 11, 2: 29}

    # Constructor
    def __init__(self, year):
        self.year = year

    def full_year(self):
        """Return the target year as an int.

        Accepts "YYYY", "YY", or a M/D/YY(YY) date string. Two-digit years are
        assumed to be in the 1900s, as in birthdays.txt.
        """
        year = self.year.strip().split("/")[-1]
        return 1900 + int(year) if len(year) == 2 else int(year)

    def anchor_day(self):
        """Return (message, a), where a = (5 * (c mod 4) + 2) mod 7 is the anchor
        day (0 = Sunday) for the target year's century c."""
        century = self.full_year() // 100
        a = (5 * (century % 4) + 2) % 7
        day = type(self).days_variables[str(a)]
        return ("\nThe anchor day for %d00-%d99 is %s." % (century, century, day), a)

    def doomsday(self, a):
        """Return (message, d), where d = (y + floor(y / 4) + a) mod 7 is the
        doomsday as a string "0"-"6" and y is the last two digits of the year."""
        year = self.full_year()
        y = year % 100
        d = str((y + math.floor(y / 4) + a) % 7)
        day = type(self).days_variables[d]

        # Return d as well as the message so day_of_week() can reuse it
        return ("The doomsday for %d is %s.\n" % (year, day), d)

    def day_of_week(self):
        """Return the weekday name (e.g. "Thursday") for a M/D/YY(YY) date.

        All doomsday dates in a year share one weekday, so the target date falls
        (day - doomsday date of its month) mod 7 days after the doomsday.
        Python's % never returns a negative number for a positive divisor,
        so dates before the month's doomsday need no special case.
        """
        month, day = (int(part) for part in self.year.split("/")[:2])
        year = self.full_year()

        # Steps 1 and 2: anchor day, then doomsday for the target year
        a = self.anchor_day()[1]
        d = int(self.doomsday(a)[1])

        # The Gregorian leap-year rule decides the January/February doomsdays
        is_leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        doom_dates = type(self).doom_dates_leap if is_leap else type(self).doom_dates_common

        # Step 3: count from this month's doomsday
        offset = (day - doom_dates[month]) % 7
        return type(self).days_variables[str((d + offset) % 7)]


def main():
    y = Days(input("\nEnter year: "))
    ANCH = y.anchor_day()
    print(ANCH[0])

# Main execution
if __name__ == "__main__":
    main()
```


Enter year: 1900

The anchor day for 1900-1999 is Wednesday.


### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.
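The doomsday formula builds directly on the anchor day. A sketch under the same conventions (0 = Sunday), with `y // 4` standing in for the floor since $y$ is never negative; the function names are illustrative:

```python
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def anchor_day(year: int) -> int:
    """Anchor day for the century of `year`: (5 * (c mod 4) + 2) mod 7."""
    return (5 * ((year // 100) % 4) + 2) % 7


def doomsday(year: int) -> int:
    """Doomsday for `year`: (y + floor(y / 4) + a) mod 7, y = last two digits."""
    y = year % 100
    a = anchor_day(year)
    return (y + y // 4 + a) % 7     # y // 4 equals floor(y / 4) for y >= 0


for year in (1954, 2016, 2017):
    print(year, WEEKDAYS[doomsday(year)])
# 1954 Sunday
# 2016 Monday
# 2017 Tuesday
```

These agree with the examples in the text: Sunday for 1954, Monday for 2016, and Tuesday for 2017.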

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

```python
def main():
    y = Days(input("\nEnter year: "))
    ANCH = y.anchor_day()

    # Pass the anchor day number (second item of the tuple) to doomsday()
    anch_num = ANCH[1]
    DOOM = y.doomsday(anch_num)
    print(DOOM[0])

# Call main(), which uses the Days class to compute the doomsday
if __name__ == "__main__":
    main()
```


Enter year: 9/19/78
The doomsday for 9/78 is Tuesday.



### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12
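The list above depends only on the month and on whether the year is a leap year. A sketch of that lookup using the Gregorian leap-year rule (divisible by 4, except century years not divisible by 400); `is_leap` and `month_doomsday` are illustrative names:

```python
# Doomsday dates for March through December, identical every year
FIXED_DOOMSDAYS = {3: 21, 4: 4, 5: 9, 6: 6, 7: 11, 8: 8,
                   9: 5, 10: 10, 11: 7, 12: 12}


def is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def month_doomsday(year: int, month: int) -> int:
    """Day of the month that falls on the doomsday in the given month and year."""
    if month == 1:
        return 11 if is_leap(year) else 10
    if month == 2:
        return 29 if is_leap(year) else 28
    return FIXED_DOOMSDAYS[month]


print(is_leap(1900), is_leap(2000), is_leap(2016))        # False True True
print(month_doomsday(2017, 1), month_doomsday(2016, 2))    # 10 29
```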

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.
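Putting the three steps together gives a compact day-of-week function. This sketch counts from the doomsday in the target month itself rather than the nearest one: every doomsday shares a weekday, and Python's `%` handles negative differences, so picking the nearest doomsday only matters for mental arithmetic. The names are illustrative, and the Gregorian calendar is assumed throughout:

```python
import datetime

WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]
FIXED_DOOMSDAYS = {3: 21, 4: 4, 5: 9, 6: 6, 7: 11, 8: 8,
                   9: 5, 10: 10, 11: 7, 12: 12}


def day_of_week(year: int, month: int, day: int) -> str:
    """Return the weekday name for a Gregorian date via the Doomsday algorithm."""
    # Step 1: anchor day for the century
    a = (5 * ((year // 100) % 4) + 2) % 7
    # Step 2: doomsday for the year
    y = year % 100
    d = (y + y // 4 + a) % 7
    # Step 3: offset from the doomsday date in the target month
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if month == 1:
        doom = 11 if leap else 10
    elif month == 2:
        doom = 29 if leap else 28
    else:
        doom = FIXED_DOOMSDAYS[month]
    return WEEKDAYS[(d + day - doom) % 7]


print(day_of_week(1776, 7, 4))    # Thursday
print(day_of_week(1954, 7, 21))   # Wednesday

# Cross-check against datetime, whose weekday() counts Monday as 0
sample = datetime.date(1978, 4, 30)
assert day_of_week(1978, 4, 30) == WEEKDAYS[(sample.weekday() + 1) % 7]
```

Comparing against `datetime.date.weekday()` over many dates is a quick way to test any hand-written implementation of the algorithm.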

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

```python
def main():
    date = Days(input("Enter a date: "))
    DOW = date.day_of_week()
    print("\n%s" % DOW)


# Call main(), which uses the day-of-week method of the Days class
if __name__ == "__main__":
    main()
```


Enter a date: 9/19/78

Tuesday


__Exercise 1.4.__ How many times did Friday the 13th occur in the years 1900-1999? Does this number seem to be similar to other centuries?

```python
from datetime import date, timedelta

def F_the_13th(y):
    """
    Yield every Friday the 13th in year y.

    Uses the datetime module to step through the year one day at a time.
    """

    # start at January 1 and stop after December 31 of year y
    day = date(y, 1, 1)
    end = date(y, 12, 31)

    # one day as a timedelta, used to step forward
    one_day = timedelta(days=1)

    while day <= end:
        # weekday() numbers Monday as 0, so Friday is 4; .day is the day of the month
        if day.weekday() == 4 and day.day == 13:
            yield day
        day += one_day

# Checking 1900-1999 as well as the three preceding centuries.
# range() excludes its stop value, so range(start, start + 100) covers all 100 years.
for start in range(1600, 2000, 100):
    occurrences = sum(1 for y in range(start, start + 100) for d in F_the_13th(y))
    print("Friday the 13th occurred %s times between the years %s-%s." % (occurrences, start, start + 99))

print("\nThe count is identical in each of these centuries. Because the Gregorian calendar repeats every 400 years, "
      "any other century repeats the calendar of one of these four and has the same count.")
```

Friday the 13th occurred 172 times between the years 1600-1699.
Friday the 13th occurred 172 times between the years 1700-1799.
Friday the 13th occurred 172 times between the years 1800-1899.
Friday the 13th occurred 172 times between the years 1900-1999.

The count is identical in each of these centuries. Because the Gregorian calendar repeats every 400 years, any other century repeats the calendar of one of these four and has the same count.
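Only the 13th of each month can be a Friday the 13th, so checking twelve dates per year is enough and there is no need to step through every day. A sketch of that shortcut with an inclusive year range (`count_friday_13ths` is an illustrative name):

```python
from datetime import date


def count_friday_13ths(first_year: int, last_year: int) -> int:
    """Count Friday the 13ths from first_year through last_year, inclusive."""
    return sum(
        1
        for year in range(first_year, last_year + 1)   # + 1 because range() excludes its stop
        for month in range(1, 13)
        if date(year, month, 13).weekday() == 4        # weekday(): Monday = 0, Friday = 4
    )


for start in (1600, 1700, 1800, 1900):
    print("%d-%d: %d" % (start, start + 99, count_friday_13ths(start, start + 99)))
```

Over the full 400-year Gregorian cycle (for example 1600 through 1999) the total is 688, which averages to 172 per century.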


__Exercise 1.5.__ How many times did Friday the 13th occur between the year 2000 and today?

```python
# Reuse F_the_13th() from Exercise 1.4 for the years since 2000

in_2018_only = [str(d) for d in F_the_13th(2018)]

print("%s \n%s" % (in_2018_only[0], in_2018_only[1]))
print("\n*Notice 2 days in 2018 will be Friday the 13th but have not yet occurred.")
print("*Therefore we can just count all from 2000-2017.")

# range() excludes its stop value, so range(2000, 2018) covers 2000 through 2017
occurrences = sum(1 for y in range(2000, 2018) for d in F_the_13th(y))
print("\nFriday the 13th occurred %s times between 2000 and (as of) 1/18/2018." % occurrences)
```

2018-04-13 
2018-07-13

*Notice 2 days in 2018 will be Friday the 13th but have not yet occurred.
*Therefore we can just count all from 2000-2017.

Friday the 13th occurred 31 times between 2000 and (as of) 1/18/2018.
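Checking only the 13th of each month and filtering by date also handles an arbitrary cutoff instead of whole years. A sketch; `friday_13ths_between` is an illustrative name, and passing `date.today()` as the end would count up to the current date:

```python
from datetime import date


def friday_13ths_between(start: date, end: date) -> list:
    """Return every Friday the 13th d with start <= d <= end, in order."""
    found = []
    for year in range(start.year, end.year + 1):
        for month in range(1, 13):
            d = date(year, month, 13)
            if start <= d <= end and d.weekday() == 4:   # Friday is weekday 4
                found.append(d)
    return found


hits = friday_13ths_between(date(2000, 1, 1), date(2018, 1, 18))
print(len(hits), hits[0], hits[-1])
```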


## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to convert each line of data to the list format

```Python
[month, day, year, count]
```
The elements of this list should be integers, not strings. The function `read_birthdays` provided below will help you load the file.

```python
def read_birthdays(file_path):
    """Read the contents of the birthdays file into a string.
    
    Arguments:
        file_path (string): The path to the birthdays file.
        
    Returns:
        string: The contents of the birthdays file.
    """
    with open(file_path) as file:
        return file.read()
    
in_file = read_birthdays("birthdays.txt")

def list_format(read_file):
    """Convert the raw birthdays text into [month, day, year, count] lists of ints.

    Strips whitespace and newlines and splits on the tab-separated columns.
    The splitting below loses the first date and the last count, so those
    two values are added back by hand.

    Returns:
        list: one sub-list per line of the text file, e.g. [1, 1, 78, 7701]
    """
    # Skip the header text (first 127 characters) and split on the tabs between columns
    txt = read_file.strip("\n").strip(" ")
    txt = txt[127:].split("\t")
    # Skip the first piece (the first date, added back below)
    txt = txt[1:]

    # Each remaining piece looks like "count\nnext date"; drop the trailing footer pieces
    temp = [string.split("\n") for string in txt]
    temp = temp[0:(len(temp) - 4)]

    counts = [int(i[0]) for i in temp]
    dates = [i[1].split("/") for i in temp]

    # Splitting on "\t" left out the first date and gave its count to the second,
    # and the last count in the right column was dropped as well
    missing_count = [8028]
    counts = counts + missing_count
    missing_date = ['1', '1', '78']
    dates = [missing_date] + dates

    new_dates = [[int(k) for k in i] for i in dates]

    # Pair each date with its count, building new lists instead of mutating
    final = [d + [c] for d, c in zip(new_dates, counts)]
    return final

num_of_birthdays = list_format(in_file)
print(num_of_birthdays)
```



[[1, 1, 78, 7701], [1, 2, 78, 7527], [1, 3, 78, 8825], [1, 4, 78, 8859], [1, 5, 78, 9043], [1, 6, 78, 9208], [1, 7, 78, 8084], [1, 8, 78, 7611], [1, 9, 78, 9172], [1, 10, 78, 9089], [1, 11, 78, 9210], [1, 12, 78, 9259], [1, 13, 78, 9138], [1, 14, 78, 8299], [1, 15, 78, 7771], [1, 16, 78, 9458], [1, 17, 78, 9339], [1, 18, 78, 9120], [1, 19, 78, 9226], [1, 20, 78, 9305], [1, 21, 78, 7954], [1, 22, 78, 7560], [1, 23, 78, 9252], [1, 24, 78, 9416], [1, 25, 78, 9090], [1, 26, 78, 9387], [1, 27, 78, 8983], [1, 28, 78, 7946], [1, 29, 78, 7527], [1, 30, 78, 9184], [1, 31, 78, 9152], [2, 1, 78, 9159], [2, 2, 78, 9218], [2, 3, 78, 9167], [2, 4, 78, 8065], [2, 5, 78, 7804], [2, 6, 78, 9225], [2, 7, 78, 9328], [2, 8, 78, 9139], [2, 9, 78, 9247], [2, 10, 78, 9527], [2, 11, 78, 8144], [2, 12, 78, 7950], [2, 13, 78, 8966], [2, 14, 78, 9859], [2, 15, 78, 9285], [2, 16, 78, 9103], [2, 17, 78, 9238], [2, 18, 78, 8167], [2, 19, 78, 7695], [2, 20, 78, 9021], [2, 21, 78, 9252], [2, 22, 78, 9335], [2, 23, 78
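A less position-dependent variant filters lines by their shape instead of relying on character offsets and hard-coded values. This sketch assumes each data line holds one `M/D/YY` date and one count separated by a tab, which is inferred from the output above rather than checked against the file; `parse_birthdays` and `looks_like_row` are illustrative names:

```python
def looks_like_row(fields: list) -> bool:
    """True if fields is ["M/D/YY", "count"] with all-digit parts."""
    if len(fields) != 2:
        return False
    parts = fields[0].split("/")
    return len(parts) == 3 and all(p.isdigit() for p in parts) and fields[1].isdigit()


def parse_birthdays(text: str) -> list:
    """Turn "M/D/YY<TAB>count" lines into [month, day, year, count] lists of ints.

    Lines with any other shape (headers, footers, blank lines) are skipped.
    """
    # Generator over the non-empty, stripped tab-separated fields of each line
    lines = ([f.strip() for f in line.split("\t") if f.strip()]
             for line in text.strip().split("\n"))
    return [[int(p) for p in fields[0].split("/")] + [int(fields[1])]
            for fields in lines if looks_like_row(fields)]


# Made-up header line plus the first two rows shown in the output above
sample = "date\tbirths\n1/1/78\t7701\n1/2/78\t7527\n"
print(parse_birthdays(sample))   # [[1, 1, 78, 7701], [1, 2, 78, 7527]]
```

Because `strip()` is applied to every field, trailing tabs and Windows line endings do not change the result.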


__Exercise 2.2.__ Which month had the most births in 1978? Which day of the week had the most births? Which day of the week had the fewest? What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.

```python
from collections import Counter
from datetime import date

# Total births per month; each row of num_of_birthdays is [month, day, year, count]
month_totals = Counter()
for month, day, year, count in num_of_birthdays:
    month_totals[month] += count

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
top_month, top_total = month_totals.most_common(1)[0]
print("The month with the most births in 1978 was %s with %s births.\n" % (months[top_month - 1], top_total))

# Dates with the most and fewest births (max/min keep the first row on ties)
maxy = max(num_of_birthdays, key=lambda row: row[3])
mini = min(num_of_birthdays, key=lambda row: row[3])

# Weekday names in datetime's order (weekday() returns 0 for Monday)
weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def describe(row):
    """Return (M/D/YY label, weekday name) for a [month, day, year, count] row,
    assuming a two-digit year in the 1900s."""
    month, day, year, _ = row
    label = "%d/%d/%d" % (month, day, year)
    return label, weekday_names[date(1900 + year, month, day).weekday()]

print("The date with the most births, %s, was a %s.\n" % describe(maxy))
print("The date with the fewest births, %s, was a %s.\n" % describe(mini))

print("Interpretation below.")
```

The month with the most births in 1978 was August with 302795 births.

The date with the most births, 9/19/78, was a Tuesday.

The date with the fewest births, 4/30/78, was a Sunday.

Interpretation below.


A single busiest date and a single quietest date are not enough to draw conclusions about the day of the week; summing the births for each weekday over the whole year would give a fairer comparison. What matters more is the time of year these births happen. August had the most births, and counting back roughly nine months places the corresponding conceptions in late fall of 1977, so holidays around the end of the year may influence when people try to have a baby.
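The question asks which day of the week had the most births overall, which calls for totals per weekday rather than the weekday of the single busiest date. A sketch using `Counter`, assuming rows in the `[month, day, year, count]` format from Exercise 2.1 with two-digit years in the 1900s; `births_by_weekday` is an illustrative name:

```python
from collections import Counter
from datetime import date

# Weekday names in datetime's order: date.weekday() returns 0 for Monday
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def births_by_weekday(rows, base_year=1900):
    """Sum the counts of [month, day, yy, count] rows per weekday name.

    base_year is added to the two-digit year (an assumption matching birthdays.txt).
    """
    totals = Counter()
    for month, day, yy, count in rows:
        totals[WEEKDAY_NAMES[date(base_year + yy, month, day).weekday()]] += count
    return totals


# With the 1978 data from Exercise 2.1:
# totals = births_by_weekday(num_of_birthdays)
# print(totals.most_common())   # busiest weekday first, quietest last
```

`Counter.most_common()` sorts the seven totals, so the first and last entries answer the "most" and "fewest" questions directly.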

__Exercise 2.3.__ What would be an effective way to present the information in exercise 2.2? You don't need to write any code for this exercise, just discuss what you would do.

An effective and interesting way to present the information in the previous question would be a visualization similar to the one we saw in class, which displays the data on a circular calendar organized by month. One could feed in many years of birth data and use that website's template to build an attractive and reasonably effective visualization. As we discussed in class, that approach has limitations, since it might not be the clearest or most straightforward way to read the numbers, but it is certainly appealing to readers and viewers.

While I believe that would be a great method, I am not yet experienced enough to give a well-grounded answer about data visualizations. Hopefully we will learn more about the methods and types of visualizations throughout the quarter.