---
title: "Final Data Project"
author: "DataTrail Team"
output: html_notebook
---
# Final Data Project
## Your objectives!
To complete this project there are a few requirements you will need to fulfill. Remember that you are not on your own for this project! Data science is done best as a community, so please ask others (and instructors) questions you have when you get stuck!
1. Clearly state the data science question and goal for the analysis you are embarking on.
2. This project should be completely uploaded and up to date on GitHub. Follow the steps in the `Pushing and Pulling Changes` chapter to git add, commit, and push your changes.
3. Follow good organization principles -- you should have at least 2 folders: a `results` folder and a `data` folder.
4. You should also have a README.
5. Make a resulting plot that you save to a file.
6. Write up your final observations regarding your original question. Note that some data science projects end with "This isn't what I thought it would be" or "that's strange" or "I think this is leading into another question I would need to investigate". Whatever your observations may be, write them up in your main R Markdown.
7. When you feel your analysis is ready for review, send your instructor the GitHub link to your project so they can review it.
8. Pat yourself on the back for all this work! You are a data scientist!
## Data Sources
For this project you will use whatever data you choose.
Refer back to our [Finding Data chapter](https://datatrail-jhu.github.io/DataTrail/finding-data.html) for more info on finding data.
Some options for places to find data are:
- [Kaggle](https://datatrail-jhu.github.io/DataTrail/finding-data.html#kaggle)
- [FiveThirtyEight](https://datatrail-jhu.github.io/DataTrail/finding-data.html#fivethirtyeight-data)
- You can see datasets you already have in R by running this command: `ls("package:datasets")`.
You are not limited to these options for finding your data.
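As a quick illustration of the built-in option above, the sketch below lists a few of the datasets that ship with R and previews one of them. The choice of `airquality` is only an example, not a recommendation for your project.

```{r}
# List the first few objects in the built-in datasets package
head(ls("package:datasets"))

# Preview an example dataset (airquality is an arbitrary choice)
head(datasets::airquality)

# Check how many rows and columns it has
dim(datasets::airquality)
```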
<Write where you got your data and provide the link if applicable.>
<Describe how the data was originally created. If this data is part of `datasets`, you can use `?`, like so: `?datasets::AirPassengers`, to see information about the dataset. Otherwise, provide a summary based on the source of the data.>
## The goal of this analysis
<Write here what the goal of this analysis is. What question are we trying to answer?>
## Set up
Load packages you will need for this analysis.
```{r}
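## You can add more or change these; they are suggestions.
## Note: library(tidyverse) already attaches readr, dplyr, ggplot2, and tidyr,
## so the individual calls below are only needed if you skip tidyverse.
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)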
```
## Set up directories
Set up the directories you will need.
```{r}
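# Create the data folder if it does not exist yet
if (!dir.exists("data")) {
  dir.create("data")
}
# Create the results folder if it does not exist yet
if (!dir.exists("results")) {
  dir.create("results")
}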
```
## Get the data
<Write here a bit about this dataset and put a link to where it is from>
```{r}
# Read in your dataset(s)
```
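If your data is a CSV file, reading it in might look like the sketch below. The file name `data/airquality.csv` is hypothetical; the built-in `airquality` dataset is written to the `data` folder first only so that the example has a file to read.

```{r}
library(readr)

# Make sure the data folder exists (it is created in "Set up directories")
if (!dir.exists("data")) {
  dir.create("data")
}

# Hypothetical file: save a built-in dataset so there is a CSV to read
write_csv(datasets::airquality, file.path("data", "airquality.csv"))

# Read the CSV back in as a tibble
air_df <- read_csv(file.path("data", "airquality.csv"), show_col_types = FALSE)
```

With your own data, you would skip the `write_csv()` step and point `read_csv()` at the file you downloaded.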
Explore your data here
```{r}
```
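A first look at a dataset often covers its structure, a numeric summary, and a count of missing values. The sketch below does this with base R on the built-in `airquality` dataset, which stands in for your own data.

```{r}
# airquality is a stand-in for your own data frame
explore_df <- datasets::airquality

# Column names, types, and the first few values
str(explore_df)

# Min, max, quartiles, and NA counts for each column
summary(explore_df)

# Number of missing values per column
missing_counts <- colSums(is.na(explore_df))
missing_counts
```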
## Cleaning the data
```{r}
```
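What cleaning involves depends on your data. As one hedged example, the sketch below uses `dplyr` on the built-in `airquality` dataset to drop rows with missing ozone readings, rename a column, and add a Celsius temperature column. The new names `solar_radiation` and `temp_celsius` are made up for illustration.

```{r}
library(dplyr)

clean_df <- datasets::airquality %>%
  # Keep only rows where Ozone was recorded
  filter(!is.na(Ozone)) %>%
  # Give a column a more descriptive name (new_name = old_name)
  rename(solar_radiation = Solar.R) %>%
  # Convert temperature from Fahrenheit to Celsius
  mutate(temp_celsius = (Temp - 32) * 5 / 9)
```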
## Plot the data!
```{r}
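# Replace DATA, X_VARIABLE, and Y_VARIABLE with your data frame and column names
ggplot(DATA, aes(x = X_VARIABLE, y = Y_VARIABLE)) +
  geom_point()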
```
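One of the objectives is to save your plot to a file. A minimal sketch of that, assuming the built-in `airquality` dataset and a hypothetical file name `results/ozone_vs_temp.png`, could look like this:

```{r}
library(ggplot2)

# Make sure the results folder exists
if (!dir.exists("results")) {
  dir.create("results")
}

# Drop rows with missing ozone so geom_point() does not warn about them
plot_df <- subset(datasets::airquality, !is.na(Ozone))

# Scatter plot of ozone against temperature
ozone_plot <- ggplot(plot_df, aes(x = Temp, y = Ozone)) +
  geom_point() +
  labs(x = "Temperature (F)", y = "Ozone (ppb)")

# Save the plot; the file name is hypothetical
ggsave(file.path("results", "ozone_vs_temp.png"), plot = ozone_plot,
       width = 6, height = 4)
```

Storing the plot in a variable before calling `ggsave()` makes it explicit which plot is written to disk.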
## Get the stats
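Which statistics make sense depends on your question. As one illustration, the sketch below uses base R on the built-in `airquality` dataset to compute a mean per group and a correlation between two variables. Neither is required for your project.

```{r}
# airquality is a stand-in for your own data
stats_df <- datasets::airquality

# Mean temperature for each month
mean_temp_by_month <- aggregate(Temp ~ Month, data = stats_df, FUN = mean)
mean_temp_by_month

# Pearson correlation between ozone and temperature,
# using only rows where both values are present
ozone_temp_cor <- cor(stats_df$Ozone, stats_df$Temp, use = "complete.obs")
ozone_temp_cor
```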
## Conclusion
Write up your thoughts about this data science project here and answer the following questions:
- What did you find out regarding your original question?
- What exceptions or caveats apply to your analysis?
- What follow-up questions do you have?
## Print out session info
Session info is a good thing to print out at the end of your notebooks so that you (and other folks) referencing your notebooks know what software versions and libraries you used to run the notebook.
```{r}
sessionInfo()
```