Introduction to advanced data science

What’s in this repo?

Repository for DSC 205 - Introduction to Advanced Data Science
Note: Emacs Org-mode files are rendered as Markdown
For a table of contents, open the bullet-point list menu
If you need native Markdown, create an *.md file yourself

Agenda for every session

Syllabus for a course overview

How to use GitHub

Register and get added as a collaborator to this repo
Complete the GitHub starter course
Fork this repo to your own GitHub account
Desktop/laptop file access: open the .org files directly
Mobile file access: use the GitHub mobile app (you need .md files, which you might have to create yourself)
Check regularly (or set up notifications) for changes
Commit changes to your fork

History

First offered @Lyon: Spring 2022 (Undergrad). To be offered: every spring.

Credo

“Getting it right is crucial when people’s lives are affected.” -Jonathan Steinhart

“It is clear from the rising interest in statistics over calculus that ‘data wrangling’ is a growing field. But what happens when people just feed giant piles of data into fancy programs that those same people don’t intimately understand? One possibility is that they generate interesting-looking but meaningless or incorrect results. For example, a recent study […] showed that one-fifth of published genetics papers have errors due to improper spreadsheet usage. Getting it right is crucial when people’s lives are affected.”[fn:1] (Steinhart, 2019).

References

Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS One 4(5): e5738. Published online 2009 May 29. doi:10.1371/journal.pone.0005738

Steinhart, J. (2019). The Secret Life of Programs. No Starch Press. URL: nostarch.com.

Ziemann, M., Eren, Y., and El-Osta, A. (2016). Gene Name Errors Are Widespread in the Scientific Literature. Genome Biol 17, 177. doi:10.1186/s13059-016-1044-7

Footnotes

[fn:1] A timely comment! The problem could be systemic:

“The image of scientists as objective seekers of truth is periodically jeopardized by the discovery of a major scientific fraud. […] Scientific results can be distorted in several ways, which can often be very subtle and/or elude researchers’ conscious control. Data, for example, can be “cooked” (a process which mathematician Charles Babbage in 1830 defined as “an art of various forms, the object of which is to give to ordinary observations the appearance and character of those of the highest degree of accuracy”); it can be “mined” to find a statistically significant relationship that is then presented as the original target of the study; it can be selectively published only when it supports one’s expectations; it can conceal conflicts of interest, etc.” (Fanelli, 2009).

birkenkrahe/ds205
