# STA 141B: Homework 1

Fall 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9 digit student ID number below.

First Name: Bailey

Last Name: Wang

Student ID: 914955801

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax

## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and integer division `\\` will be useful. Document your function with a docstring and test your function for a few different years.

In [2]:
import math
import datetime
import numpy as np
import pandas as pd

"""
    Create an input variable, then change the variable from a string to an integer
    Within the function, there is a variable created to pull the first two numbers from century 
    Compute the anchor day for that year's century

Parameters:
century_dumb : int

Returns
-----
int
    

"""

#century math.floor(year/ 100)

def anchor_function(century_dumb):
    century_dumb = int(century_dumb)
    century_int = int(str(century_dumb)[:2])
    #print(century_int)

    anchor = (5*(century_int % 4)+2) % 7
    return(anchor)

#func(27, 7, 1874) == "Saturday"

In [3]:
anchor_function(1988)

3

### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [4]:
"""
    Create an input variable, then change the variable from a string to an integer
    Within the function, it calls the previous function (anchor_function)
    There is a variable to pull the last two numbers from century
    Compute the doomsday for that year's century

Parameters:
century_dumb : int

Returns
-----
int
    
"""


def doomsday_function(century_dumb):
    anchor=anchor_function(century_dumb)
    century_dumb = int(century_dumb)
    century_int_2 = int(str(century_dumb)[2:])
    #print(century_int_2)
    doom = (century_int_2 + math.floor(century_int_2 / 4) + anchor) % 7
    #print(doom)
    #print('Thus the doomsday for {} is {}.'.format(century_dumb, table[doom]))
    return(doom)

In [5]:
doomsday_function(1954)

0

### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [6]:
"""
    Create three input variables, then change the variable from a string to an integer
    Within the function, there are two dictonaries created, one to change the integer into a string and another table 
    which updates to include the leap year or regular year through an if-else statement
    Theres two equations the first one to find the days inbetween the doomsday and the input day, while the other
    finds how many days are left before the doomsday
    
Parameters:
month_dumb : int
day_dumb : int
year_dumb : int

Returns
-----
str
    
"""

def dayofweek_function(month_dumb, day_dumb, year_dumb):
    
    month_dumb = int(month_dumb)
    day_dumb = int(day_dumb)
    year_dumb = int(year_dumb)
    
              
    table  = {-6 : 'Monday',-5 : 'Tuesday',-4 : 'Wednesday',-3 : 'Thursday',-2 : 'Friday',-1 : 'Saturday',
          0 : 'Sunday',1 : 'Monday',2 : 'Tuesday',3 : 'Wednesday',4 : 'Thursday',5 : 'Friday',6 : 'Saturday'}
    #for anchor_int, day_dif in table.items():
    #    print(f'{anchor_int} is {day_dif}')
    
    table_doom = {3 : 21, 4 : 4, 5 : 9, 6 : 6, 7 : 11, 8 : 8, 9 : 5, 10 : 10, 11 : 7, 12 : 12}
    
    if ((year_dumb % 4 == 0) or ((year_dumb % 400 == 0) and year_dumb % 100 != 0)):
        table_doom.update({1 : 11, 2: 29})
        #for month_int, day_int_2 in table_leap.items():
            #print(f'Month {month_int} will have the doomsday on {day_int_2}')

    else:
        table_doom.update({1 : 10, 2 : 29})
        #for month_int, day_int_2 in table_reg.items():
            #print(f'Month {month_int} will have the doomsday on {day_int_2}')
       
    doomsday_diff = day_dumb - table_doom[month_dumb]
    doomsday_year = doomsday_function(year_dumb)
    doomsday_diff_int = (doomsday_diff + doomsday_year) % 7
    
    #doomsday_diff=int(day_dumb) - int(table_doom[month_dumb])
    #doomsday_diff_int = doomsday_diff % 7       
       
#-(-(-input)%7)        

    #print('The date {}/{}/{} falls {} days after a {}, on a {}.'
    #    .format(month_dumb, day_dumb, year_dumb, doomsday_diff_int, table[day_dumb % 7], table[doomsday_diff_int]))
    return(table[doomsday_diff_int])

print(dayofweek_function(1,10,2000), dayofweek_function(1,11,2000), dayofweek_function(1,12,2000))
print(dayofweek_function(8,22,1988),dayofweek_function(8,23,1988),dayofweek_function(8,24,1988))
print(dayofweek_function(7,21,1954),dayofweek_function(7,22,1954),dayofweek_function(7,23,1954))
print(dayofweek_function(1,1,1978),dayofweek_function(1,2,1978),dayofweek_function(1,3,1978))

Monday Tuesday Wednesday
Monday Tuesday Wednesday
Wednesday Thursday Friday
Sunday Monday Tuesday


In [7]:
year_begin = 1978
year_end = 1978
for i in range(year_begin,year_end+1):
        for j in range(1,13):
            day_count = datetime.date(i,j,1)
            day_count_2 = day_count.weekday()
        
print(day_count_2)

4


__Exercise 1.4.__ Davis picks up yard waste on the first Monday of the month.  How many times did the 1st of the month (first day of the month) fall on a Monday in the years 2000-2016 (including 2016)?

In [8]:
import datetime

"""
    Create two input variables, then change the variable from a string to an integer
    Create a for loop within the range of the two input variables and a for loop inside it within the 12 months
    Create two variables one to create the date and another to count
    Using an if statement, whenever the counter hits 0, it will add a count to the counter
    
Parameters:
year_begin : int
year_end : int

Returns
-----
int

"""

#Collaborated together with Kevin Chu
def countmonday_function(year_begin, year_end):
    count = 0
    year_begin = int(year_begin)
    year_end = int(year_end)
    ##28
    for i in range(year_begin,year_end+1):
        for j in range(1,13):
            day_count = datetime.date(i,j,1)
            day_count_2 = day_count.weekday()
            ##In this method, Monday == 0 instead of 1
            if day_count_2 == 0:
                count = count + 1
    return(count)

In [9]:
countmonday_function(2000,2016)

28

## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the tuple format

```Python
(month, day, year, count)
```
The elements of this list should be integers, not strings.  Read in the data and create this list of tuples.

In [11]:
#Code assisted by Tiffany Chen
birthday_list = pd.read_csv("birthdays.txt", sep='\t', skiprows=5, header=None)
#Using Pandas package, we can read the data file and format the file using sep function to split the data into a list, 
# skiprows to ignore the header and specify afterwards there is no header so the 0th index is part of the data

birthday_list = birthday_list.dropna()
birthday_list = birthday_list.rename(columns = {0 : "Date", 1 : "Births"})
#Rename the column names from 0 and 1 to specific the column meaning

birthday_list["Births"] = birthday_list["Births"].astype(int)
#Change the elements of the list into integers

location_df = birthday_list['Date'].apply(lambda x: pd.Series(str(x).split('/')))
#https://stackoverflow.com/questions/23317342/pandas-dataframe-split-column-into-multiple-columns-right-align-inconsistent-c
#Applying conditions to easily split the data up

location_df=location_df.rename(columns = {0 : "Month", 1 : "Day", 2: "Year"})
location_df["Month"] = location_df["Month"].astype(int)
location_df["Day"] = location_df["Day"].astype(int)
location_df["Year"] = location_df["Year"].astype(int)
location_df["Year"] = 1900+location_df["Year"]
#Due to the way Dayofweek_function works which takes a 4-integer year, we add 1900 to the year

#print(location_df)
#location_df.dtypes
#birthday_list
#birthday_list.dtypes

birthday_new=location_df.join(birthday_list)
birthday_new=birthday_new.drop(columns='Date')
#To make it so that the table returns 4 columns of (Month,Day,Year,Births)

#birthday_new.dtypes
birthday_new

#birthday_list.dtypes


Unnamed: 0,Month,Day,Year,Births
0,1,1,1978,7701
1,1,2,1978,7527
2,1,3,1978,8825
3,1,4,1978,8859
4,1,5,1978,9043
5,1,6,1978,9208
6,1,7,1978,8084
7,1,8,1978,7611
8,1,9,1978,9172
9,1,10,1978,9089


__Exercise 2.2.__ 

1. Count the number of birthdays by the month (number of birthdays per month).
2. Count the number of birthdays by the day of the week. 

What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.

In [12]:
birthday_new.groupby(['Month'])['Births'].sum().reset_index()
#Call the data by the month column and find the sum of the birth
#Use reset_index to create a nicer looking table

Unnamed: 0,Month,Births
0,1,270695
1,2,249875
2,3,276584
3,4,254577
4,5,270812
5,6,270756
6,7,294701
7,8,302795
8,9,293891
9,10,288955


In [13]:
birthday_zip = list(zip(birthday_new['Month'],birthday_new['Day'],birthday_new['Year']))
#Create a tuple with the data to read into the dayofweek_function

#birthday_zip
for row in range(len(birthday_zip)):
    birthday_new.loc[row,"Weekday"] = dayofweek_function(birthday_zip[row][0],birthday_zip[row][1],birthday_zip[row][2])
birthday_new

Unnamed: 0,Month,Day,Year,Births,Weekday
0,1,1,1978,7701,Sunday
1,1,2,1978,7527,Monday
2,1,3,1978,8825,Tuesday
3,1,4,1978,8859,Wednesday
4,1,5,1978,9043,Thursday
5,1,6,1978,9208,Friday
6,1,7,1978,8084,Saturday
7,1,8,1978,7611,Sunday
8,1,9,1978,9172,Monday
9,1,10,1978,9089,Tuesday


In [14]:
birthday_new
birthday_new.groupby(['Weekday'])['Births'].sum().reset_index()
#Similar to before, we call the weekday column and find the sum of the birth column

Unnamed: 0,Weekday,Births
0,Friday,495746
1,Monday,488979
2,Saturday,430726
3,Sunday,426544
4,Thursday,493797
5,Tuesday,503632
6,Wednesday,493815


Part A
From our studies, we find that the most amount of births per month is in 'month 8' meaning August. We can conclude that majority of the couples had intercourse during the holidays. The birth rates range from 240000 to 300000 births per month. Comparing the data, it appears that there is not a heavy skewed.

Part B
From our studies, the weekday with the most births was on Tuesday and the least was on Sunday. If however these dates are wrong, and there exists an indexing problem. The birth rates range from 420000 to 500000 births per weekday. Looking at the data, there does not seem to be an outlier within the data set.