- Data Exploration
- Data Transformation
- Data Analysis with Descriptive Statistics
- Create a folder named `WS01_studentid` that contains 2 files (Sample Output):
  - An R file named `WS01_63130500xxx.R` for your code (see sample file)
  - A Markdown file named `WS01_63130500xxx.md` describing each step with your code and results (copy the template files to your files)
- Read information about SAT Scores dataset.
- Explore the data and find descriptive statistics. (You can add more questions beyond the given ones.)
- Push your folder `WS01_studentid` into your group Git repository, in the folder assignment
(This dataset is referenced from NYC Open Data.)
The most recent school-level SAT results for New York City. Results are available at the school level for the graduating seniors of 2012. Records contain the mean SAT scores of 2012 college-bound seniors, taken during school year (SY) 2012. In this dataset, some schools did not submit average scores and some records contain incorrect data.
About SAT
The SAT is a standardized test widely used for college admissions in the United States. The test is divided into three parts: Critical Reading, Math, and Writing. (Ref: Wikipedia) Each section is scored on a scale of 200 to 800, and 200 is the minimum even if no questions are answered. The total SAT score therefore ranges from 600 to 2400.
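The score ranges above can be written down as a small check. This is only a sketch: the helper name `is_valid_section` and the constant names are made up for illustration and are not part of the workshop files.

```r
# Bounds taken from the text: each of the three sections is scored 200 to 800.
section_min <- 200
section_max <- 800
n_sections  <- 3

# The total bounds follow from summing the three sections.
total_min <- n_sections * section_min
total_max <- n_sections * section_max

# Hypothetical helper: TRUE when a section score lies inside the valid range.
# Missing scores (NA) are reported as not valid.
is_valid_section <- function(score) {
  !is.na(score) & score >= section_min & score <= section_max
}

is_valid_section(c(150, 200, 545, 800, 900, NA))
```

The same bounds are what a range check during data cleaning would compare against.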
- How many observations are in this dataset (before cleaning)?
- Are there duplicate records? (If there are, list the duplicated records.)
- How many distinct schools are in this dataset? (Count after dropping duplicate records.)
- What are the min, max, average, and quartiles of each SAT section?
- What are the min, max, average, and quartiles of the total SAT score?
- Which school has the highest SAT score?
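One possible way to approach these questions is sketched below on a small made-up table, assuming the `dplyr` package is installed. The data and the column names (`school_name`, `reading`, `math`, `writing`) are hypothetical; the real SAT file may use different names and will need cleaning first.

```r
library(dplyr)

# Toy data with hypothetical column names; not the real SAT dataset.
sat_toy <- tibble(
  school_name = c("School A", "School B", "School B", "School C"),
  reading     = c(400, 520, 520, 610),
  math        = c(420, 540, 540, 650),
  writing     = c(390, 500, 500, 600)
)

# Number of observations before any cleaning
n_obs <- nrow(sat_toy)

# Rows that exactly repeat an earlier row
dup_rows <- sat_toy[duplicated(sat_toy), ]

# Drop exact duplicates, then count distinct schools
sat_unique <- distinct(sat_toy)
n_schools  <- n_distinct(sat_unique$school_name)

# Total score per school
sat_unique <- mutate(sat_unique, total = reading + math + writing)

# summary() reports min, quartiles, mean, and max for each numeric column
summary(select(sat_unique, reading, math, writing, total))

# School with the highest total score
top_school <- slice_max(sat_unique, total, n = 1)
top_school
```

Here duplicates are defined as rows that match on every column; matching on a school identifier alone would be a different (and equally reasonable) choice.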
- Load the dataset from `SAT_original.csv` and assign it to a variable named `sat_score`
- Observe the data and answer the questions:
  - How many observations are there before cleaning?
  - List the variable names
- Change the types of the values
- List the duplicate records and count them.
- Remove the duplicate records and reassign the result to the `sat_score` variable
- Check the value range of each score by using `filter`
- Handle out-of-range values by replacing them with NA
- Calculate the total score and assign it to the `sum_score` column
- Find descriptive statistics
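The cleaning steps above could look roughly like the following sketch, assuming `dplyr` (version 1.0 or later) is installed. It builds a toy stand-in instead of reading `SAT_original.csv`, so the column names (`math_score`, `reading_score`, `writing_score`) and the `"s"` placeholder for a missing score are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-in for the raw file. Column names and the "s" placeholder
# are assumptions, not taken from the real dataset.
sat_score <- tibble(
  school_name   = c("School A", "School B", "School B", "School C", "School D"),
  math_score    = c("420", "540", "540", "s", "9000"),
  reading_score = c("400", "520", "520", "s", "480"),
  writing_score = c("390", "500", "500", "s", "470")
)

score_cols <- c("math_score", "reading_score", "writing_score")

# Change types: text that is not a number becomes NA
# (the coercion warning is expected, so it is suppressed here).
sat_score <- sat_score %>%
  mutate(across(all_of(score_cols), ~ suppressWarnings(as.numeric(.x))))

# Count exact duplicate rows, then remove them
n_dup <- sum(duplicated(sat_score))
sat_score <- distinct(sat_score)

# Check the range of one score with filter (valid range is 200 to 800)
out_of_range <- sat_score %>%
  filter(math_score < 200 | math_score > 800)

# Replace out-of-range values with NA in every score column
sat_score <- sat_score %>%
  mutate(across(all_of(score_cols),
                ~ if_else(.x < 200 | .x > 800, NA_real_, .x)))

# Total score; it is NA whenever any section is NA
sat_score <- sat_score %>%
  mutate(sum_score = math_score + reading_score + writing_score)

# Descriptive statistics of the total score
summary(sat_score$sum_score)
```

Replacing bad values with NA, rather than dropping the rows, keeps every school in the table while excluding the invalid scores from later statistics.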
Try saving your cleaned data to a CSV file:
write_csv(sat_score, file = "SAT_clean.csv")
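To confirm that the file was written as expected, it can be read back in. The sketch below assumes the `readr` package is installed, uses a made-up two-row table, and writes to a temporary directory so that it does not touch your assignment folder.

```r
library(readr)

# Toy cleaned data with hypothetical columns
sat_clean <- data.frame(
  school_name = c("School A", "School B"),
  sum_score   = c(1210, NA)
)

# Write to a temporary location instead of the working directory
out_path <- file.path(tempdir(), "SAT_clean.csv")
write_csv(sat_clean, file = out_path)

# By default write_csv writes missing values as "NA",
# and read_csv parses "NA" back into a missing value.
sat_back <- read_csv(out_path, col_types = cols())
sat_back
```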
R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code, like the document below. (Read more: rmarkdown)
- See Sample Rmd Files
- See Sample HTML output
Try it yourself
- Create a new R Markdown file named `WS01_studentid.Rmd`
- Copy the code from Sample Rmd Files into your file
- Click the Knit button to save it as HTML (it is on the same toolbar as the Run button)
- Try replacing the content with the content of your `WS01_63130500xxx.md`
- Click the Knit button to save the HTML file again
- Open the most recent HTML file in a browser
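The Knit step can also be run from the R console. The sketch below writes a minimal R Markdown document to a temporary directory and renders it with `rmarkdown::render()`. The document content is made up for illustration, and rendering is skipped when the `rmarkdown` package or pandoc is not available.

```r
# Hypothetical file name, following the naming used in the steps above
rmd_path <- file.path(tempdir(), "WS01_studentid.Rmd")

# Three backticks, built with strrep() so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, one line of prose, one R chunk
writeLines(c(
  "---",
  "title: \"WS01\"",
  "output: html_document",
  "---",
  "",
  "Summary of a built-in dataset:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render to HTML only when the needed tools are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)
}
```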
You will see that an Rmd file can display the output of your code. This way, it is not necessary to copy the output into an md file. You can use an Rmd file instead to show the output.
Study more: Datacamp
GitHub cannot display the rendered output of an Rmd or HTML file. You can use GitHub Pages to host it directly from your GitHub repository. Just edit, push, and your changes are live.
- Go to the repository's Settings > Pages
- Choose 'Main' as the Source, then save (setup finished!)
- Push your files to your repository
- Link to your HTML file from README.md