Introduction to advanced data science with R, Python and SQL, Lyon College, spring 2024


birkenkrahe/ds205


Introduction to advanced data science

./img/cover.png

What’s in this repo?

  • Repository for DSC 205 - Introduction to Advanced Data Science
  • Note: Emacs Org-mode files are rendered as Markdown
  • For a TOC, open the bullet point list menu
  • If you need native Markdown, create an *.md file yourself

Agenda for every session

Syllabus for a course overview

How to use GitHub

  • Register and get added as a collaborator to this repo
  • Complete the GitHub starter course
  • Fork this repo to your own GitHub account
  • File access: on desktop/laptop, open .org files
  • Mobile file access: use the GitHub mobile app (you need .md files, which you might have to create yourself)
  • Check regularly (or set up notifications) for changes
  • Commit changes to your fork

History

First offered at Lyon College: spring 2022 (undergraduate). To be offered every spring.

Credo

“Getting it right is crucial when people’s lives are affected.” -Jonathan Steinhart

“It is clear from the rising interest in statistics over calculus that ‘data wrangling’ is a growing field. But what happens when people just feed giant piles of data into fancy programs that those same people don’t intimately understand? One possibility is that they generate interesting-looking but meaningless or incorrect results. For example, a recent study […] showed that one-fifth of published genetics papers have errors due to improper spreadsheet usage. Getting it right is crucial when people’s lives are affected.”[fn:1] (Steinhart, 2019).

References

Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS One 4(5): e5738. Published online 2009 May 29. doi:10.1371/journal.pone.0005738

Steinhart, J. (2019). The Secret Life of Programs. No Starch Press. URL: nostarch.com.

Ziemann, M., Eren, Y., and El-Osta, A. (2016). Gene Name Errors Are Widespread in the Scientific Literature. Genome Biol 17, 177. doi:10.1186/s13059-016-1044-7

Footnotes

[fn:1] A timely comment! The problem could be systemic:

“The image of scientists as objective seekers of truth is periodically jeopardized by the discovery of a major scientific fraud. […] Scientific results can be distorted in several ways, which can often be very subtle and/or elude researchers’ conscious control. Data, for example, can be “cooked” (a process which mathematician Charles Babbage in 1830 defined as “an art of various forms, the object of which is to give to ordinary observations the appearance and character of those of the highest degree of accuracy”); it can be “mined” to find a statistically significant relationship that is then presented as the original target of the study; it can be selectively published only when it supports one’s expectations; it can conceal conflicts of interest, etc.” (Fanelli, 2009).
