dot plot, stem-leaf plot, box-whiskers plot, outliers by CrashCourse Statistics #9

Open
EmbraceLife opened this issue Jul 30, 2018 · 0 comments
Labels
Basic (basic statistics and probability), Crash Statistics (statistics by crashcourse on youtube)

@EmbraceLife
Owner

Plots, Outliers, and Justin Timberlake - Data visualization part 2

Key words

dot plots, stem-leaf plots, box-whiskers plots, outliers, cumulative frequency plots

Video links

Bilibili

YouTube

Key Questions

what other ways are there to represent data?

how do these other visualizations differ from each other in perspective and use? (brilliant and intuitive effort!)

Interesting points

dot plots

replace a histogram's bars with dots; each dot stands for one individual data point

frequency = height of a bar = number of dots stacked at that bar's position
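A minimal text-only sketch of this idea, assuming integer data grouped into histogram-style bins of width 10. The function name `dot_plot`, the bin width and the sample values are illustrative, not from the video. Rows are drawn horizontally instead of as vertical stacks, but the count is the same: the number of `o` characters in a row equals that bin's frequency, i.e. the height the histogram bar would have.

```python
from collections import Counter


def dot_plot(values, bin_width=10):
    """Return one text row per bin; each 'o' is one data point.

    Assumes integer values. Values are grouped into histogram-style bins of
    width `bin_width`, so the number of dots in a row equals that bin's frequency.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    rows = []
    # Walk every bin from the lowest to the highest, including empty ones.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        rows.append(f"{label} | " + "o" * counts[start])
    return rows


# Hypothetical sample data
for row in dot_plot([3, 7, 12, 15, 18, 25]):
    print(row)
```

The 10s bin gets three dots (12, 15, 18), but the plot no longer says which three values they were.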

see the visualization of a dot plot

[image: dot plot]

but a dot plot still hides the exact values of the individual data points

Stem-Leaf Plot

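replace bar positions with stems: the tens place, i.e. 0s, 10s, 20s, ..., 80s

replace dots with leaves: the ones digit of each value, ordered from 0 (lowest) to 9 (highest)

both the overall frequency (the number of leaves on each stem) and the individual data values are shown on the graph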
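A small sketch of how such a plot can be built, assuming non-negative two-digit integers so that the stem is the tens place and the leaf is the ones digit. The function name `stem_and_leaf` and the sample data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers below 100.

    Stem = tens place (0s, 10s, ..., 80s), leaf = ones digit (0 to 9).
    """
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):  # sorting puts the leaves of each stem in 0..9 order
        leaves_by_stem[v // 10].append(v % 10)
    rows = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem[stem])
        rows.append(f"{stem * 10:>2}s | {leaves}")
    return rows


for row in stem_and_leaf([82, 5, 13, 17, 11, 85]):
    print(row)
```

The row `10s | 137` shows both the frequency (three leaves) and the exact values 11, 13 and 17.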

[image: stem-leaf plot]

Box-Whiskers plot

summarize the whole dataset with the median and the spread around it, measured by the interquartile range (IQR)

center and spread

  • median = middle point
  • Q1 = median of the lower half
  • Q3 = median of the upper half
  • IQR = Q3 - Q1 = the range covered by the middle half of the data

lower fence

  • Q1 - 1.5 × IQR: from Q1, extend 1.5 IQR to the left

upper fence

  • Q3 + 1.5 × IQR: from Q3, extend 1.5 IQR to the right

minimum

  • the smallest actual data point at or above the lower fence (where the left whisker ends)

maximum

  • the largest actual data point at or below the upper fence (where the right whisker ends)

outliers

  • actual data points below the lower fence
  • actual data points above the upper fence
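The rules in this list can be sketched in Python as follows. This is an illustration, not the video's code: the function name `box_plot_summary` and the data are hypothetical, and the quartiles follow the "median of each half" rule, leaving the overall median out of both halves when the count is odd (one common convention among several). The fences use the usual 1.5 × IQR rule.

```python
from statistics import median


def box_plot_summary(values):
    """Quartiles, 1.5 x IQR fences, whisker ends and outliers.

    Quartiles use the "median of each half" rule; when n is odd the overall
    median is left out of both halves (one common convention).
    Assumes at least 2 values.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2 :]
    q1, med, q3 = median(lower_half), median(data), median(upper_half)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "min": min(inside),  # end of the left whisker
        "max": max(inside),  # end of the right whisker
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical data where one value looks like a typo (500 instead of 5)
print(box_plot_summary([4, 5, 5, 6, 6, 7, 7, 8, 500]))
```

Here Q1 = 5, Q3 = 7.5 and IQR = 2.5, so the upper fence is 7.5 + 3.75 = 11.25: the whisker stops at 8 and 500 is reported as an outlier.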

[image: box-whiskers plot]

how densely the data are distributed

  • from Q1 to the median: a quarter of the data points, spread over a wide range (sparse)
  • from the median to Q3: the same number of data points, packed into a narrower range than on the left (dense)
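Because each of these two segments holds about a quarter of the data, the comparison can be written as points per unit of width. The notation here is introduced for illustration: $n$ is the number of data points and $\tilde{x}$ is the median.

```latex
\text{density}(Q_1 \to \tilde{x}) \approx \frac{n/4}{\tilde{x} - Q_1}
\qquad
\text{density}(\tilde{x} \to Q_3) \approx \frac{n/4}{Q_3 - \tilde{x}}
```

So the shorter segment is the denser one. With hypothetical numbers $n = 40$, $\tilde{x} - Q_1 = 20$ and $Q_3 - \tilde{x} = 5$, the left segment has about $10/20 = 0.5$ points per unit and the right segment about $10/5 = 2$ points per unit.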

outliers are much less frequent than other values, but they do occur

[image]

when to keep or throw away an outlier

keep the ones you are interested in

  • e.g., a few very high rents in the neighborhood you are interested in

throw away the ones that don't belong

  • e.g., a football star who sneaks onto your local neighborhood team
  • or a typo made when the data were collected (e.g., 5 entered as 500)

if unsure, use a pre-existing rule to decide

  • e.g., data points below the lower fence or above the upper fence are treated as outliers and removed

When not to use a box-whiskers plot

[image]

this one does not tell you anything useful or meaningful

cumulative frequency plot

imagine it yourself :)
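As a starting point for imagining it: a cumulative frequency plot shows, for each value, how many data points are less than or equal to that value, so the line only rises and ends at the total count. A minimal sketch of the numbers behind such a plot; the function name `cumulative_frequency` and the data are hypothetical, and drawing the pairs as a step line would give the chart itself.

```python
from collections import Counter
from itertools import accumulate


def cumulative_frequency(values):
    """Return (value, number of data points <= value) for each distinct value."""
    counts = Counter(values)
    xs = sorted(counts)
    running_totals = accumulate(counts[x] for x in xs)
    return list(zip(xs, running_totals))


print(cumulative_frequency([3, 1, 2, 2, 5]))  # [(1, 1), (2, 3), (3, 4), (5, 5)]
```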

Be critical of charts

stats visualizations are everywhere

visualizations are only as good as the data behind them

many of them may be mistaken or misleading

always ask questions about them
