**Requirement**

Deliverable 2: Project Proposal
Each group is expected to prepare a written proposal within **500 words (about 1 page)** that identifies the dataset they plan to work on, as well as the question they would like to answer using that dataset for their group project. The proposal should be done in a Jupyter notebook, and then submitted both as an .html file (File → Download As → HTML) and an .ipynb file that is reproducible (i.e. works and runs without any additional files).

Only one member of your team needs to submit. You must submit **two files**:

- the source **Jupyter notebook (.ipynb file)**
- the rendered **final document (.html file)**

Each proposal should include the following sections:

**Title**

**Introduction**

Begin by providing some relevant background information on the topic so that someone unfamiliar with it will be prepared to understand the rest of your proposal.

Clearly state the question you will try to answer with your project. Your question should involve one or more random variables of interest, spread across two or more categories that are interesting to compare. For example, you could consider the annual maximum river flow at two different locations along a river, or perhaps gender diversity at different universities. For the response variable, **identify one location parameter (mean, median, quantile, etc.) and one scale parameter (standard deviation, inter-quartile range, etc.)** that would be useful in answering your question. Justify your choices.

UPDATE (Mar 1, 2022): If it doesn’t make sense to infer a scale parameter, you can choose another parameter, or choose a second variable altogether. Ultimately, we’re looking for a **comprehensive inference analysis on one parameter spread across 2+ groups (with at least one hypothesis test)**, plus a bit more (such as an investigation of the variance, a quantile, or a different variable). In total, you should use both **bootstrapping and asymptotics** somewhere in your report **at least once each**. Also, your hypothesis test(s) need not be significant: it is perfectly fine to write a report claiming no significant findings (i.e. your p-value is large).

Identify and describe the dataset that will be used to answer the question. Remember, this dataset is allowed to contain more variables than you need – feel free to drop them!

Also, be sure to frame your question/objectives in terms of what is already known in the literature. Be sure to include **at least two scientific publications** that can help frame your study (you will need to include these in the **References** section). We have no specific citation style requirements, but be consistent.

**Preliminary Results**

In this section, you will:

- Demonstrate that the dataset can be read from the web into R.
- Clean and wrangle your data into a tidy format.
- Plot the relevant raw data, tailoring your plot in a way that addresses your question.
- Compute **estimates of the parameter** you identified across your groups. Present this in a table. If relevant, include these estimates in your plot.
- Be sure not to print output that takes up a lot of screen space.

**Methods: Plan**

The previous sections will carry over to your final report (you’ll be allowed to improve them based on feedback you get). Begin this Methods section with a brief description of “the good things” about this report – specifically, **in what ways is this report trustworthy?**

Continue by explaining why the plot(s) and estimates that you produced are not enough to give to a stakeholder, and what you should provide in addition to address this gap. Make sure your plans **include at least one hypothesis test and one confidence interval. If possible, compare both the bootstrapping and asymptotics methods.**
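One way the bootstrap and asymptotic approaches could be compared is sketched below. It is only an illustration under assumptions: `toy_scores` holds simulated integer scores between 0 and 10 for two made-up groups labelled Jazz and Rock, not survey data, and the difference in means is used as the parameter. The `infer` calls follow that package's specify/generate/calculate workflow, and the asymptotic interval comes from base R's Welch `t.test()`.

```r
library(dplyr)
library(infer)

set.seed(2022)  # make the resampling reproducible

# Hypothetical data: simulated scores for two made-up groups (NOT the survey data).
toy_scores <- tibble(
  genre   = factor(rep(c("Jazz", "Rock"), each = 40)),
  Anxiety = c(sample(0:10, 40, replace = TRUE),
              sample(0:10, 40, replace = TRUE))
)

# Bootstrap: resample rows, recompute the difference in means (Jazz - Rock),
# and take the percentile interval of the bootstrap distribution.
boot_ci <- toy_scores %>%
  specify(Anxiety ~ genre) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "diff in means", order = c("Jazz", "Rock")) %>%
  get_confidence_interval(level = 0.95, type = "percentile")

# Asymptotics: CLT-based Welch two-sample t-test and interval for Jazz - Rock.
t_result <- t.test(Anxiety ~ genre, data = toy_scores)
asym_ci  <- t_result$conf.int

boot_ci
asym_ci
t_result$p.value
```

If the two intervals roughly agree, that suggests the sample is large enough for the asymptotic approximation to be reasonable; a clear disagreement would be worth discussing in the report.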

Finish this section by reflecting on how your final report might play out:

- **What do you expect to find?**
- **What impact could such findings have?**
- **What future questions could this lead to?**

**References**

**At least two citations of literature relevant to the project**. The citation format is your choice – just be consistent. Make sure to cite the source of your data as well.

**Jason: here is a useful website!!!**

**https://www.citationmachine.net/**


In [35]:
# Run this cell before continuing.
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)  # also attaches dplyr, so no separate library(dplyr) call is needed
library(broom)
library(testthat)


In [36]:
# Read the survey responses; the CSV must sit in the notebook's working directory.
mxmh_survey_results <- read_csv("mxmh_survey_results.csv")
head(mxmh_survey_results)

Rows: 736 Columns: 33
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (26): Timestamp, Primary streaming service, While working, Instrumentali...
dbl  (7): Age, Hours per day, BPM, Anxiety, Depression, Insomnia, OCD

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Timestamp,Age,Primary streaming service,Hours per day,While working,Instrumentalist,Composer,Fav genre,Exploratory,Foreign languages,⋯,Frequency [R&B],Frequency [Rap],Frequency [Rock],Frequency [Video game music],Anxiety,Depression,Insomnia,OCD,Music effects,Permissions
<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
8/27/2022 19:29:02,18,Spotify,3.0,Yes,Yes,Yes,Latin,Yes,Yes,⋯,Sometimes,Very frequently,Never,Sometimes,3,0,1,0,,I understand.
8/27/2022 19:57:31,63,Pandora,1.5,Yes,No,No,Rock,Yes,No,⋯,Sometimes,Rarely,Very frequently,Rarely,7,2,2,1,,I understand.
8/27/2022 21:28:18,18,Spotify,4.0,No,No,No,Video game music,No,Yes,⋯,Never,Rarely,Rarely,Very frequently,7,7,10,2,No effect,I understand.
8/27/2022 21:40:40,61,YouTube Music,2.5,Yes,No,Yes,Jazz,Yes,Yes,⋯,Sometimes,Never,Never,Never,9,7,3,3,Improve,I understand.
8/27/2022 21:54:47,18,Spotify,4.0,Yes,No,No,R&B,Yes,No,⋯,Very frequently,Very frequently,Never,Rarely,7,2,5,9,Improve,I understand.
8/27/2022 21:56:50,18,Spotify,5.0,Yes,Yes,Yes,Jazz,Yes,Yes,⋯,Very frequently,Very frequently,Very frequently,Never,8,8,7,7,Improve,I understand.


In [40]:
# Count how many respondents chose each favorite genre.
all_of_fav_genre <- mxmh_survey_results %>%
                    rename(Fav_Genre = "Fav genre") %>%
                    group_by(Fav_Genre) %>%
                    summarise(n = n())

all_of_fav_genre

Fav_Genre,n
<chr>,<int>
Classical,53
Country,25
EDM,37
Folk,30
Gospel,6
Hip hop,35
Jazz,20
K pop,26
Latin,3
Lofi,10


In [5]:
# Keep only the favorite genre and the Anxiety, Depression, and Insomnia columns.
mxmh_survey_results_filtered <- mxmh_survey_results %>%
                                select(`Fav genre`, Anxiety, Depression, Insomnia)
head(mxmh_survey_results_filtered)


Fav genre,Anxiety,Depression,Insomnia
<chr>,<dbl>,<dbl>,<dbl>
Latin,3,0,1
Rock,7,2,2
Video game music,7,7,10
Jazz,9,7,3
R&B,7,2,5
Jazz,8,8,7
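
A possible next step toward the estimates table is sketched here, under stated assumptions. The six preview rows above are re-entered by hand as `genre_sample` (a name introduced only for this example) so the code runs without the CSV. Anxiety is used as the response, with the mean as the location parameter and the standard deviation as the scale parameter; that choice is illustrative, not a decision the proposal has made. On the full data the same pipeline would presumably start from `mxmh_survey_results_filtered`.

```r
library(dplyr)
library(tibble)

# The six rows shown in the preview above, typed in by hand for a self-contained sketch.
genre_sample <- tribble(
  ~`Fav genre`,        ~Anxiety, ~Depression, ~Insomnia,
  "Latin",             3,        0,           1,
  "Rock",              7,        2,           2,
  "Video game music",  7,        7,           10,
  "Jazz",              9,        7,           3,
  "R&B",               7,        2,           5,
  "Jazz",              8,        8,           7
)

# Point estimates of a location (mean) and a scale (sd) parameter per genre.
anxiety_estimates <- genre_sample %>%
  group_by(`Fav genre`) %>%
  summarise(
    n            = n(),
    mean_anxiety = mean(Anxiety),
    sd_anxiety   = sd(Anxiety)
  )

anxiety_estimates
```

Genres that appear only once in this six-row sample get `NA` for the standard deviation, since a sample standard deviation needs at least two observations.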
