## R Code

### Import the dataset

```R
library(dplyr) # Load dplyr, which provides sample_n() for drawing the sample

hepatitis_dataset <- read.csv("report_data/hep_clean.csv") # Read the dataset (saved as a csv) to R.
```

### Get a Simple Random Sample
```R
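# Select a random sample of n = 36 counties from the dataset (N = 3142 counties).
# Note: replace=TRUE samples with replacement, so the same county can be drawn more than once.
sample_36 <- sample_n(hepatitis_dataset, 36, replace=TRUE) # Named sample_36 to avoid masking base R's sample()

write.csv(sample_36, "report_data/sample_df.csv") # Save the sample to a CSV file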
```
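
Because `replace=TRUE` allows the same county to be drawn twice, a reader who wants a sample without repeats, and one that can be reproduced later, might do something like the sketch below. It uses base R's `sample()` on a toy data frame; the seed value, the `toy_counties` data frame and its `county_id` column are illustrative assumptions, not part of the original analysis.

```r
# Sketch: reproducible simple random sample WITHOUT replacement (base R only).
# The seed (2024) and the toy data frame are assumptions for illustration.
set.seed(2024)

toy_counties <- data.frame(county_id = 1:3142) # stand-in for the N = 3142 counties

# Pick 36 distinct row indices, then keep those rows
picked_rows <- sample(nrow(toy_counties), size = 36, replace = FALSE)
srs <- toy_counties[picked_rows, , drop = FALSE]

nrow(srs)                    # number of sampled rows
anyDuplicated(srs$county_id) # 0 means no county was drawn twice
```

Fixing the seed before sampling means the same 36 rows are drawn on every run, which makes the downstream statistics repeatable.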

### Re-Import the Sample Dataset
```R
sample_df <- read.csv("report_data/sample_df.csv")
```

### Define the Number and Width of the Classes for the Histogram
```R
# Use Sturges' rule: K = 1 + 3.322 * log10(n)
K <- 1 + 3.322 * log(36, base=10)
# K = 6.17003690754893
K <- ceiling(K) # Round UP: 6.17 -> 7 classes
```
Using Sturges' rule, for a sample of n = 36 counties, the number of classes (K) is 7.
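
As a cross-check, Sturges' rule is usually stated as K = 1 + log2(n); since 3.322 is approximately 1/log10(2), the two forms agree. The sketch below compares a hand-written version with `nclass.Sturges()` from the `grDevices` package that ships with R. The helper name `sturges_k` is hypothetical.

```r
# Sketch: Sturges' rule computed two ways for n = 36.
# sturges_k is an illustrative helper, not part of the original analysis.
sturges_k <- function(n) ceiling(1 + log2(n))

k_manual <- sturges_k(36)

# nclass.Sturges() only uses the length of its argument,
# so a placeholder vector of 36 zeros is enough here.
k_builtin <- grDevices::nclass.Sturges(numeric(36))

c(manual = k_manual, builtin = k_builtin)
```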


```R
# Now define the width of each class: width = range of the data (max(x) - min(x)) / K

# Range of the distribution:
data_range <- max(sample_df$ChangeMortality_Rate) - min(sample_df$ChangeMortality_Rate)
# data_range = 55.34

# Class width:
class_width <- data_range / 7
# class_width = 7.90571428571429
```
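
K classes of equal width are delimited by K + 1 break points. The following is a minimal sketch of how those boundaries can be generated with `seq()`, using the sample minimum (-61.92) and maximum (-6.58) from this report; the variable names are illustrative.

```r
# Sketch: class boundaries for K = 7 equal-width classes.
k <- 7
x_min <- -61.92 # sample minimum reported in this document
x_max <- -6.58  # sample maximum reported in this document

# K classes need K + 1 equally spaced break points
class_breaks <- seq(x_min, x_max, length.out = k + 1)

round(class_breaks, 2) # the 8 class boundaries
diff(class_breaks)     # every gap equals the class width, (x_max - x_min) / k
```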

### Plot the Percentage Frequency Density Histogram
```R
x <- sample_df$ChangeMortality_Rate # Variable of interest

x_min <- min(x) # Named x_min / x_max to avoid masking base R's min() and max()
# x_min = -61.92
x_max <- max(x)
# x_max = -6.58

## Percent frequency
# K = 7 classes need K + 1 = 8 equally spaced break points
h <- hist(x, breaks = seq(x_min, x_max, length.out = 7 + 1), plot=FALSE)
h$density <- h$counts/sum(h$counts)*100 # Overwrite density with percent frequency so the y-axis reads in %
labs <- paste(round(h$density), "%", sep="")

plot(h, freq = FALSE, labels = labs, col = "pink",
     xlab = 'Change in Mortality Rate (%)',
     main = 'Percent Frequency Density Histogram,\nU.S. Hepatitis Mortality Change from 1980 - 2014 (%)'
)
```

![Percent frequency density histogram of the sampled change in mortality rate](./report_data/hist.png)


### Sample Statistics
```R
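min(x) # Sample minimum
max(x) # Sample maximum
mean(x) # Sample mean
median(x) # Sample median
table(x)[table(x) > 1] # Repeated values. Base R's mode() returns the storage type ("numeric"), not the statistical mode.
# The only repeat (-40.24) is the county that was drawn twice, so no mode is reported.
sd(x) # Sample standard deviation
IQR(x) # Interquartile range
quantile(x) # Minimum, Q1, Q2 (median), Q3, maximum
max(x) - min(x) # Range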
```

#### Measures Of Center
  
|Mean (%)|Median (%)|Mode (%)|
|:------:|:--------:|:------:|
| -39.91 | -40.36 | None |

#### Measures Of Dispersion
  
|Standard Deviation (%)|Interquartile Range (%)|Range (%)|
|:------:|:--------:|:------:|
| 12.66 | 15.64 | 55.34 |

#### Five Number Summary

|Minimum (%)|Q1 (%)|Q2 (%)|Q3 (%)|Maximum (%)|
|:------:|:--------:|:--------:|:--------:|:------:|
| -61.92 | -48.50 | -40.36 | -32.86 | -6.58 |

#### Boxplot
```R
boxplot(x,
        main = "Boxplot\nChange in U.S. Hepatitis Mortality Between 1980 - 2014 (%)",
        xlab = "%",
        ylab = "Hepatitis Mortality",
        col = "pink",
        border = "brown",
        horizontal = TRUE,
        notch = TRUE
)
```
![Boxplot of the sampled change in mortality rate](./report_data/box.png)

### Identify Outliers
```R
# Using the 1.5*IQR rule, values outside the two fences below are flagged as outliers.
# Lower fence: Q1 - (1.5*IQR)
Low_Outliers <- -48.5 - (1.5*15.64)
# Low_Outliers = -71.96
# Upper fence: Q3 + (1.5*IQR)
High_Outliers <- -32.86 + (1.5*15.64)
# High_Outliers = -9.4
```
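
To see whether any observation actually falls outside these fences, it is enough to compare the sample minimum and maximum against them. The following is a minimal sketch using the rounded quartiles from the five number summary; the variable names are illustrative.

```r
# Sketch: apply the 1.5*IQR rule to the sample extremes.
q1 <- -48.50
q3 <- -32.86
iqr <- q3 - q1 # 15.64

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Sample minimum and maximum from the five number summary
extremes <- c(minimum = -61.92, maximum = -6.58)

# Keep only the extremes that lie beyond a fence
flagged <- extremes[extremes < lower_fence | extremes > upper_fence]
flagged
```

Only the maximum (-6.58, Curry County, New Mexico) lies beyond a fence; the minimum is inside the lower fence.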

### Estimating the Margin of Error at the 99% Confidence Level
```R
# Using Gosset's (Student's) t-distribution
    ## Method 1
library(distributions3) # Provides StudentsT() and its quantile() method
n <- 36 # Define the sample size
T_35 <- StudentsT(df = n - 1) # t-distribution with n - 1 = 35 degrees of freedom

# Margin of error: critical t-value * standard error
quantile(T_35, .01 / 2) * sd(x) / sqrt(n) # -5.7468 (lower tail)
quantile(T_35, 1 - .01 / 2) * sd(x) / sqrt(n) # 5.7468 (upper tail)

Low_ME <- mean(x) + quantile(T_35, .01 / 2) * sd(x) / sqrt(n) # Lower confidence limit
# Low_ME = -45.6587774064437
High_ME <- mean(x) + quantile(T_35, 1 - .01 / 2) * sd(x) / sqrt(n) # Upper confidence limit
# High_ME = -34.1651114824452

    ## Method 2
t.test(x, conf.level = 0.99) # t.test() is part of base R's stats package; no extra library is needed
```
![Output of t.test(x, conf.level = 0.99)](./report_data/ttest.png)

With 99% confidence, the population mean change in mortality rate lies in the interval [-45.66, -34.17].
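
The same interval can be approximated with base R alone, using `qt()` for the critical value. The sketch below is a rough check that starts from the rounded summary statistics reported in this document (mean -39.91, standard deviation 12.66), so its result may differ from the values above in the second decimal place. The variable names are illustrative.

```r
# Sketch: 99% t-interval from rounded summary statistics (base R only).
x_bar <- -39.91 # sample mean, rounded
s <- 12.66      # sample standard deviation, rounded
n <- 36         # sample size

conf_level <- 0.99
alpha <- 1 - conf_level

# Two-sided critical value with n - 1 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Margin of error = critical value * standard error
margin <- t_crit * s / sqrt(n)

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(margin, 2)
round(ci, 2)
```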

### Hypothesis Testing
Based on the sample data, I will use Gosset's (Student's) one-sample t-test to test the null hypothesis that the population mean change in the hepatitis mortality rate is -25% against the alternative that it is less than -25%. The test is conducted at the 5% significance level.

<br>

H_0: μ = -25%
<br>
H_A: μ < -25%
<br>
α = 5%

```R
    ## Method 1
H0_mu <- -25
t_stat <- (mean(x) - H0_mu) / (sd(x)/sqrt(n)) # t-test statistic
# t_stat = -7.0677602589667
p_value <- pt(t_stat, n-1) # Lower-tail p-value, matching the "less than" alternative
# p_value = 1.56220704105566e-08

    ## Method 2
t.test(x, mu= H0_mu, alternative="less", conf.level=0.95)
```
![Output of the one-sided t.test() call](./report_data/ttest2.png)

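Since p ≈ 1.56e-08 is less than .05, we reject the null hypothesis at the 5% significance level: the sample provides strong evidence that the population mean change in mortality rate is below -25%.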
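
An equivalent way to reach this decision is the critical-value approach: reject the null hypothesis when the test statistic falls below the lower-tail critical value of the t-distribution. The sketch below works from the rounded summary statistics in this document, so the statistic differs slightly from the unrounded value computed above; the variable names are illustrative.

```r
# Sketch: one-sided (lower-tail) t-test via the critical-value approach.
x_bar <- -39.91 # sample mean, rounded
s <- 12.66      # sample standard deviation, rounded
n <- 36         # sample size
mu_0 <- -25     # hypothesized population mean under H_0
alpha <- 0.05   # significance level

# Test statistic: distance of the sample mean from mu_0 in standard errors
t_stat <- (x_bar - mu_0) / (s / sqrt(n))

# Lower-tail critical value with n - 1 degrees of freedom
t_crit <- qt(alpha, df = n - 1)

reject_h0 <- t_stat < t_crit          # TRUE means H_0 is rejected
p_value <- pt(t_stat, df = n - 1)     # lower-tail p-value for comparison

c(t_stat = t_stat, t_crit = t_crit)
reject_h0
```

Both routes agree by construction: the statistic is below the critical value exactly when the lower-tail p-value is below α.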

## Sample Dataset

| X.1|    X|Location                          |  FIPS| CI_Lower_Boundary| CI_Upper_Boundary| ChangeMortality_Rate|
|---:|----:|:---------------------------------|-----:|-----------------:|-----------------:|--------------------:|
|   1| 1601|Broadwater County, Montana        | 30007|            -57.01|             -4.32|               -34.51|
|   2| 2313|Newport County, Rhode Island      | 44005|            -71.51|            -38.98|               -57.65|
|   3| 1083|Nicholas County, Kentucky         | 21181|            -61.20|             -9.50|               -40.24|
|   4|  519|Taylor County, Georgia            | 13269|            -65.73|            -27.94|               -48.83|
|   5|   45|Marengo County, Alabama           |  1091|            -73.62|            -46.20|               -61.92|
|   6|  943|Lyon County, Kansas               | 20111|            -62.76|            -17.62|               -44.99|
|   7| 2048|Auglaize County, Ohio             | 39011|            -64.09|            -20.95|               -45.26|
|   8| 1800|Curry County, New Mexico          | 35009|            -32.27|             31.75|                -6.58|
|   9| 2291|Northampton County, Pennsylvania  | 42095|            -68.30|            -36.56|               -55.03|
|  10| 1136|Iberville Parish, Louisiana       | 22047|            -64.32|            -34.55|               -51.41|
|  11|  574|Gooding County, Idaho             | 16047|            -54.78|             -3.78|               -33.47|
|  12| 1156|Saint Bernard Parish, Louisiana   | 22087|            -53.83|            -18.60|               -38.69|
|  13| 2788|Juab County, Utah                 | 49023|            -50.13|              3.11|               -27.74|
|  14|  274|Jefferson County, Colorado        |  8059|            -44.40|             -5.57|               -27.11|
|  15| 1286|Midland County, Michigan          | 26111|            -54.98|             -5.66|               -33.27|
|  16|  392|Banks County, Georgia             | 13011|            -52.81|             -3.31|               -33.03|
|  17| 2952|York County, Virginia             | 51199|            -56.68|            -19.61|               -40.47|
|  18| 2043|Adams County, Ohio                | 39001|            -50.16|             10.87|               -25.63|
|  19| 1083|Nicholas County, Kentucky         | 21181|            -61.20|             -9.50|               -40.24|
|  20| 2364|Bennett County, South Dakota      | 46007|            -64.17|            -24.51|               -47.03|
|  21| 1896|Beaufort County, North Carolina   | 37013|            -57.79|            -15.48|               -40.76|
|  22| 1527|Howard County, Missouri           | 29089|            -59.92|            -13.06|               -40.61|
|  23|  773|Sullivan County, Indiana          | 18153|            -54.57|             -1.33|               -33.09|
|  24| 1179|Cumberland County, Maine          | 23005|            -62.92|            -19.13|               -44.94|
|  25|   26|Escambia County, Alabama          |  1053|            -61.64|            -29.40|               -48.39|
|  26|  892|Barton County, Kansas             | 20009|            -63.44|            -27.69|               -48.29|
|  27| 2716|Red River County, Texas           | 48387|            -46.60|             -3.73|               -27.33|
|  28| 1335|Faribault County, Minnesota       | 27043|            -71.94|            -40.28|               -58.62|
|  29| 1177|Androscoggin County, Maine        | 23001|            -49.65|              8.09|               -26.79|
|  30|  365|Okaloosa County, Florida          | 12091|            -56.63|            -12.25|               -36.66|
|  31|  452|Greene County, Georgia            | 13133|            -69.29|            -33.81|               -54.31|
|  32| 1959|Pasquotank County, North Carolina | 37139|            -52.26|              4.38|               -28.48|
|  33| 2956|Chelan County, Washington         | 53007|            -49.32|             -7.97|               -32.36|
|  34| 1958|Pamlico County, North Carolina    | 37137|            -68.05|            -19.40|               -48.86|
|  35|  384|Wakulla County, Florida           | 12129|            -41.86|             26.73|               -13.92|
|  36| 3109|Vernon County, Wisconsin          | 55123|            -72.58|            -43.12|               -60.32|