06 Workshop I: Data Analysis with Descriptive Statistics

Let's Do Exploratory Data Analysis (EDA)

Data Exploration
Data Transformation
Data Analysis with Descriptive Statistics

Instruction

Create a folder named WS01_studentid containing 2 files (see Sample Output):
- An R file named WS01_63130500xxx.R for your code (see the sample file)
- A Markdown file named WS01_63130500xxx.md describing each step with your code and results (copy the template files into your own files)
Read the information about the SAT Scores dataset.
Explore the data and find descriptive statistics. (You can add more questions to the given ones.)
Push your folder WS01_studentid into the assignment folder of your group's Git repository.

SAT Scores dataset

(This dataset is sourced from NYC Open Data.)

The most recent school-level SAT results for New York City. Results are available at the school level for the graduating seniors of 2012. Records contain the mean SAT scores of 2012 college-bound seniors, taken during SY 2012. In this dataset, some schools did not report average scores and some recorded incorrect data.

About SAT

The SAT is a standardized test widely used for college admissions in the United States. The test is divided into 3 parts: Critical Reading, Math, and Writing. (Ref: Wikipedia) Each section score is reported on a scale of 200 to 800, and 200 is the minimum even if no questions are answered. The total SAT score, the sum of the 3 sections, therefore ranges from 600 to 2400.

Define a question

How many observations are in this dataset (before cleaning)?
Are there duplicate records? (If so, list the duplicated records.)
How many distinct schools are in this dataset? (Answer this after dropping duplicates.)
What are the min, max, average, and quartiles of each SAT section?
What are the min, max, average, and quartiles of the total SAT score?
Which school has the highest SAT score?
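As a rough illustration of how these questions could be answered once the data is clean, here is a minimal sketch on a made-up data frame. The data frame `sat_toy`, the school names, and the column names (`school_name`, `math_score`, `reading_score`, `writing_score`) are assumptions for illustration only; the real columns in SAT_original.csv may be named differently.

```r
library(dplyr)

# Hypothetical toy data; the real column names in SAT_original.csv may differ.
sat_toy <- tibble(
  school_name   = c("School A", "School B", "School C", "School D"),
  math_score    = c(400, 520, 610, 450),
  reading_score = c(380, 500, 600, 470),
  writing_score = c(390, 510, 590, 440)
) %>%
  mutate(sum_score = math_score + reading_score + writing_score)

# Number of observations and number of distinct schools
n_obs     <- nrow(sat_toy)
n_schools <- n_distinct(sat_toy$school_name)

# Min, quartiles, mean, and max of each section and of the total
summary(select(sat_toy, math_score, reading_score, writing_score, sum_score))

# Quartiles of the total score on their own
total_quartiles <- quantile(sat_toy$sum_score,
                            probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# School with the highest total score
top_school <- sat_toy %>% slice_max(sum_score, n = 1)
top_school
```

`summary()` already reports min, quartiles, mean, and max in one call, so it may be enough for the two statistics questions. `na.rm = TRUE` matters on the real data, where some scores will be NA after cleaning.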

Guideline Steps to do:

Load the dataset from SAT_original.csv and assign it to a variable named sat_score
Observe the data and answer the questions:
- How many observations are there before cleaning?
- List the variable names
Change the types of the values
List the duplicate records and count them
Remove the duplicate records and reassign the result to the sat_score variable
Check the value range of each score using filter
Handle out-of-range values by replacing them with NA
Calculate the total score and store it in a sum_score column
Find descriptive statistics
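The steps above can be sketched as follows. This is only one possible approach, not the reference solution. So that it runs without the real file, it first writes a tiny stand-in CSV; the column names (`DBN`, `school_name`, `math_score`, `reading_score`, `writing_score`), the sample rows, and the helper `in_range_or_na` are all assumptions, so adjust them to match SAT_original.csv.

```r
library(readr)
library(dplyr)

# Stand-in for SAT_original.csv with hypothetical column names: one duplicated
# row, one non-numeric score ("s"), and one out-of-range score (999).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "DBN,school_name,math_score,reading_score,writing_score",
  "01A,School A,400,380,390",
  "01B,School B,520,500,510",
  "01B,School B,520,500,510",
  "01C,School C,s,600,590",
  "01D,School D,450,470,999"
), csv_path)

# Load: read every column as text so bad values stay visible
sat_score <- read_csv(csv_path, col_types = cols(.default = col_character()))

# Observe: number of observations before cleaning, and variable names
n_before <- nrow(sat_score)
names(sat_score)

# Change types: non-numeric entries such as "s" become NA
sat_score <- sat_score %>%
  mutate(across(ends_with("_score"), ~ suppressWarnings(as.numeric(.x))))

# List and count duplicated rows
dup_rows <- sat_score %>% filter(duplicated(sat_score))
n_dup <- nrow(dup_rows)

# Remove duplicates and reassign
sat_score <- sat_score %>% distinct()

# Check the range of each score with filter (valid range is 200 to 800)
out_of_range <- sat_score %>%
  filter(math_score < 200 | math_score > 800 |
         reading_score < 200 | reading_score > 800 |
         writing_score < 200 | writing_score > 800)

# Replace out-of-range values with NA (hypothetical helper)
in_range_or_na <- function(x) ifelse(x >= 200 & x <= 800, x, NA_real_)
sat_score <- sat_score %>%
  mutate(across(ends_with("_score"), in_range_or_na))

# Total score; it is NA whenever any section is NA
sat_score <- sat_score %>%
  mutate(sum_score = math_score + reading_score + writing_score)

# Descriptive statistics of the total
summary(sat_score$sum_score)
```

Reading everything as text first is a design choice: it keeps wrongly recorded values visible before they are converted, instead of letting the reader guess column types silently.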

Extra Knowledge (Optional)

1. Write CSV file

Try saving your cleaned data to a CSV file:

library(readr)  # provides write_csv()
write_csv(sat_score, file = "SAT_clean.csv")
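To confirm what the write_csv call above actually saved, you can read the file back. The sketch below is self-contained: the two-row `sat_score` data frame is made up for illustration, and it writes to a temporary folder rather than your working directory.

```r
library(readr)

# Hypothetical cleaned data standing in for the real sat_score
sat_score <- data.frame(
  school_name = c("School A", "School B"),
  sum_score   = c(1170, NA)
)

# Write to a temporary location so the example leaves no files behind
out_path <- file.path(tempdir(), "SAT_clean.csv")
write_csv(sat_score, file = out_path)

# Read the file back; missing values are written as the text "NA" by default
# and are parsed as NA again on reading
check <- read_csv(out_path, col_types = cols())
check
```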

2. R Markdown files

R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in Markdown (an easy-to-write plain text format) and contains chunks of embedded R code, like the document below. (Read more: rmarkdown)

See Sample Rmd Files
See Sample HTML output

Try it yourself

Create a new R Markdown file named WS01_studentid.Rmd
Copy the code from the Sample Rmd Files into your file
Click the Knit button to save it as HTML (in the same tab as the Run button)
Try replacing the content with the content of WS01_63130500xxx.md
Click the Knit button again to save the HTML file
Open the most recent HTML file in a browser

You will see that an Rmd file can display the output of your code. This way, you do not need to copy output into the md file; you can use the Rmd file to show the output instead.
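If you prefer to see the same idea from the R console, the sketch below writes a very small Rmd file and knits it with `rmarkdown::render()`, which is roughly what the Knit button does. The file name `WS01_demo.Rmd` and its contents are made up for illustration, and the knit step only runs when the rmarkdown package and pandoc are available.

```r
# Build the chunk fence programmatically (three backticks)
fence <- strrep("`", 3)

# A minimal Rmd document: YAML header, one line of Markdown, one R chunk
rmd_lines <- c(
  "---",
  'title: "WS01 demo"',
  "output: html_document",
  "---",
  "",
  "Text written in Markdown goes here.",
  "",
  paste0(fence, "{r}"),
  "summary(c(400, 520, 610))",
  fence
)

# Hypothetical file name, written to a temporary folder
rmd_path <- file.path(tempdir(), "WS01_demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit to HTML only when rmarkdown and pandoc are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

The resulting HTML contains both the code in the chunk and its printed output, which is why no output needs to be pasted in by hand.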

Study more: Datacamp

3. GitHub Pages

GitHub cannot display the rendered output of an Rmd or HTML file. You can use GitHub Pages to host it directly from your GitHub repository. Just edit, push, and your changes are live.

Go to the repository's Settings > Pages
Choose 'main' as the Source, then save (setup finished!)
Push your files to your repository
Link to your HTML file from README.md
