# Data
You can obtain the dataset by running the following code:

In [1]:
library(tidyverse)
bike <- read.csv('https://raw.githubusercontent.com/IAA-Faculty/statistical_foundations/master/bike.csv')

-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr     1.1.4     v readr     2.1.5
v forcats   1.0.0     v stringr   1.5.1
v ggplot2   3.5.1     v tibble    3.2.1
v lubridate 1.9.3     v tidyr     1.3.1
v purrr     1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


# Questions
1. Run the following code to get the training and test split:

In [2]:
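set.seed(123)  # fix the random seed so the split is reproducible
bike <- bike %>% mutate(id = row_number())  # row id used to separate train from test
train <- bike %>% sample_frac(0.7)  # 70% random sample for training
test <- anti_join(bike, train, by = 'id')  # remaining rows form the test set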
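
One way to sanity-check a split built this way is to confirm that the two pieces are disjoint and together cover every row. The following is a minimal sketch on a made-up data frame; `toy`, `toy_train`, and `toy_test` are hypothetical names and are not part of the assignment.

```r
library(dplyr)

set.seed(123)

# Hypothetical 100-row data frame standing in for `bike`
toy <- data.frame(x = rnorm(100)) %>% mutate(id = row_number())

# Same pattern as the cell above: sample 70%, keep the rest by anti-joining on id
toy_train <- toy %>% sample_frac(0.7)
toy_test <- anti_join(toy, toy_train, by = "id")

# Every row lands in exactly one of the two sets
n_overlap <- length(intersect(toy_train$id, toy_test$id))
n_covered <- nrow(toy_train) + nrow(toy_test)
cat("overlap:", n_overlap, " covered:", n_covered, "of", nrow(toy), "\n")
```

The same two checks can be run on `train` and `test` directly once the split above has been created.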

2. There are abnormal times where the number of casual users is greater than or equal to the
number of registered users. You can use the following code to create a variable casual_high that
captures this:

In [3]:
train$casual_high <- train$casual >= train$registered

3. You want to know if the occurrence of these times is related to the season of the year. Even
though season is ordinal, you want to test for a general association rather than a linear one, so
use the Pearson Chi-square test. What do you find at a significance level of 0.001? If you were to
perform a Mantel-Haenszel Chi-square test, would you reach the same conclusion?

In [4]:
## Pearson Chi-Square Test
chisq.test(table(train$casual_high, train$season))


	Pearson's Chi-squared test

data:  table(train$casual_high, train$season)
X-squared = 29.74, df = 3, p-value = 1.565e-06


The Pearson Chi-square test indicates a significant association between season and casual_high at the 0.001 level (p-value = 1.565e-06).
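
For reference, the statistic reported above is the standard Pearson Chi-square. The notation below ($O_{ij}$ for observed counts, $E_{ij}$ for expected counts, $n_{i\cdot}$ and $n_{\cdot j}$ for row and column totals) is introduced here for illustration and does not appear in the original notebook. With 2 levels of casual_high and 4 seasons, the degrees of freedom match the `df = 3` in the output.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n},
\qquad
df = (r-1)(c-1) = (2-1)(4-1) = 3
```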

In [6]:
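## Mantel-Haenszel Chi-square test
library(vcdExtra)
# Row 1 of the CMHtest table is the "cor" (nonzero correlation) test, i.e. the Mantel-Haenszel statistic
CMHtest(table(train$casual_high, train$season))$table[1,]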

Loading required package: vcd
Loading required package: grid
Loading required package: gnm

Attaching package: 'vcdExtra'

The following object is masked from 'package:dplyr':

    summarise

The Mantel-Haenszel Chi-square test does not show a significant association between season and casual_high at the 0.001 level, so it does not lead to the same conclusion as the Pearson test.
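
The two tests can disagree because the Mantel-Haenszel statistic only detects a linear trend across the ordered categories, while the Pearson test detects any departure from independence. The sketch below illustrates this on a made-up 2 x 4 table (not the bike data), using base R only. It assumes equally spaced integer scores (1, 2 for rows and 1 to 4 for columns) and the usual form of the statistic, $(n - 1) r^2$ on 1 degree of freedom; the names `tab`, `m2`, and `p_mh` are illustrative.

```r
# Made-up 2 x 4 table: the TRUE rate is low in seasons 1 and 4, high in 2 and 3
tab <- matrix(c(40, 10, 10, 40,
                10, 40, 40, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(casual_high = c("FALSE", "TRUE"),
                              season = c("1", "2", "3", "4")))

# General association: Pearson Chi-square with (2 - 1) * (4 - 1) = 3 df
pearson <- chisq.test(tab)

# Linear association: expand the table to one (row score, column score) pair
# per observation, then compute M^2 = (n - 1) * r^2 on 1 df
row_score <- rep(as.vector(row(tab)), times = as.vector(tab))
col_score <- rep(as.vector(col(tab)), times = as.vector(tab))
n <- sum(tab)
r <- cor(row_score, col_score)
m2 <- (n - 1) * r^2
p_mh <- pchisq(m2, df = 1, lower.tail = FALSE)

cat("Pearson X-squared:", unname(pearson$statistic),
    " p-value:", pearson$p.value, "\n")
cat("Mantel-Haenszel M^2:", m2, " p-value:", p_mh, "\n")
```

In this toy table the pattern across seasons is symmetric, so the correlation between the scores is essentially zero and the linear test finds nothing, even though the Pearson statistic is large.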

4. You also want to know if the occurrence of these times is related to whether the day is a
holiday or not. Perform the appropriate chi-squared test to test this association, and compute an
odds ratio as a measure of strength. What do you find at a significance level of 0.001? Interpret
the odds ratio.

In [7]:
chisq.test(table(train$casual_high, train$holiday))


	Pearson's Chi-squared test with Yates' continuity correction

data:  table(train$casual_high, train$holiday)
X-squared = 36.156, df = 1, p-value = 1.822e-09


In [10]:
library(DescTools) 
OddsRatio(table(train$casual_high, train$holiday))
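
As a hand check on the odds ratio, it can be computed directly from the four cell counts that appear in the `CrossTable` output of this notebook (11544, 338, 257, 26). This is a sketch in base R; the names `tab` and `odds_ratio` are illustrative.

```r
# Cell counts copied from the CrossTable output (rows: casual_high, cols: holiday)
tab <- matrix(c(11544, 338,
                  257,  26),
              nrow = 2, byrow = TRUE,
              dimnames = list(casual_high = c("FALSE", "TRUE"),
                              holiday = c("0", "1")))

# Cross-product ratio: (a * d) / (b * c)
odds_ratio <- (tab["FALSE", "0"] * tab["TRUE", "1"]) /
  (tab["FALSE", "1"] * tab["TRUE", "0"])

print(round(odds_ratio, 2))  # 3.46
```

Read this way, the odds that casual users match or outnumber registered users in the training data are about 3.5 times as high on holidays as on non-holidays.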

In [11]:
library(gmodels)
CrossTable(train$casual_high, train$holiday)

Registered S3 method overwritten by 'gdata':
  method         from     
  reorder.factor DescTools




 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  12165 

 
                  | train$holiday 
train$casual_high |         0 |         1 | Row Total | 
------------------|-----------|-----------|-----------|
            FALSE |     11544 |       338 |     11882 | 
                  |     0.027 |     0.865 |           | 
                  |     0.972 |     0.028 |     0.977 | 
                  |     0.978 |     0.929 |           | 
                  |     0.949 |     0.028 |           | 
------------------|-----------|-----------|-----------|
             TRUE |       257 |        26 |       283 | 
                  |     1.120 |    36.299 |           | 
                  |     0.908 |     0.092 |     0.023 | 
                  |     0.022 |     0.071 |           | 
                  |     