This repository has been archived by the owner on Feb 13, 2020. It is now read-only.

Interesting Datasets #39

Closed
dchiu911 opened this issue Oct 8, 2014 · 14 comments

Comments

@dchiu911
Member

dchiu911 commented Oct 8, 2014

Maybe we can share data sets here that interest us, and see how much they interest others as well! Also it would be good to comment on whether the data size is too big or too small.

I'm kind of a basketball fanatic so I'd love to analyze certain statistics. Of course, this is only one man's opinion. The 2013-2014 NBA season totals of some basic stats can be seen here as an example.

@daattali
Member

daattali commented Oct 8, 2014

That's a great dataset!

One that I found interesting when I took this course was the Global Terrorism Database, which has tons of information about over 100,000 terrorist acts worldwide since 1970.

I hope more students share cool datasets they find. Good idea starting this issue.

@jkbooth

jkbooth commented Oct 8, 2014

Here is some excellent (and already reasonably tidy) NHL player stats data. There is some fun to be had combining the data for each player, which is currently spread across several Excel workbooks.

@andresesanch

I think a good data set to look into is the airline on-time performance data, available by year here. Considering that, for example, the data for 2007 alone is over 100MB, it would help illustrate the problems that arise when exploring larger data sets.
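One way to cope with a file that size is to stream it row by row and keep only running totals, rather than loading everything into memory. The following is a minimal sketch using only the Python standard library. The column names `UniqueCarrier` and `ArrDelay`, the `"NA"` missing-value marker, and the function name are assumptions made for illustration and should be checked against the actual file header.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_carrier(lines, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream CSV rows one at a time, keeping only a running sum and count per carrier."""
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [sum of delays, number of rows]
    for row in csv.DictReader(lines):
        value = row[delay_col]
        if value in ("", "NA"):  # skip rows with a missing delay (assumed markers)
            continue
        acc = totals[row[carrier_col]]
        acc[0] += float(value)
        acc[1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}


# Tiny made-up sample standing in for a real yearly file.
sample = io.StringIO("UniqueCarrier,ArrDelay\nWN,10\nWN,-4\nAA,NA\nAA,20\n")
result = mean_delay_by_carrier(sample)
print(result)
```

For a real file, the same function could be passed an open file handle, for example `open("2007.csv", newline="")` (a hypothetical local file name), so that only one row is held in memory at a time.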

@ChiaraDG

An interesting dataset can be found on the MovieLens website. The dataset collects movie preferences and ratings. Three datasets of different sizes are available (100K, 1M, 10M ratings). There will be some merging and cleaning to do, since the data are saved in separate files.
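The merging step mentioned above could look roughly like the sketch below, which joins a ratings table to a movie table on a shared ID. The column names (`user_id`, `movie_id`, `rating`, `title`), the comma-separated layout, and the sample rows are all made up for illustration. The real MovieLens files use their own delimiters and column order, so the reader setup would need adjusting.

```python
import csv
import io


def join_ratings_with_titles(ratings_lines, movies_lines):
    """Attach a movie title to each rating, dropping ratings with no matching movie."""
    # Build a lookup table from movie ID to title.
    titles = {row["movie_id"]: row["title"] for row in csv.DictReader(movies_lines)}
    joined = []
    for row in csv.DictReader(ratings_lines):
        title = titles.get(row["movie_id"])
        if title is None:
            continue  # cleaning step: rating refers to an unknown movie
        joined.append(
            {"user_id": row["user_id"], "title": title, "rating": int(row["rating"])}
        )
    return joined


# Made-up sample data, not taken from MovieLens.
movies = io.StringIO("movie_id,title\n1,Movie A\n2,Movie B\n")
ratings = io.StringIO("user_id,movie_id,rating\n10,1,5\n11,2,3\n12,99,4\n")
joined = join_ratings_with_titles(ratings, movies)
print(joined)
```

Keeping the movie table as a dictionary works here because it is much smaller than the ratings table, which is only streamed once.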

@BernhardKonrad
Member

Here is a dataset on the current Ebola outbreak.

@jennybc
Member

jennybc commented Oct 14, 2014

Here's a massive list of datasets. I've got others like this up my sleeve that I need to produce here.

NYU Health Sciences Library Data Catalog

@jennybc
Member

jennybc commented Oct 15, 2014

A bitty bundle of research quality datasets by Hilary Mason

https://bitly.com/bundles/hmason/1

@jennybc
Member

jennybc commented Oct 15, 2014

100+ Interesting Data Sets for Statistics

http://rs.io/100-interesting-data-sets-for-statistics/

@jennybc
Member

jennybc commented Oct 15, 2014

The home of the U.S. Government’s open data

https://www.data.gov

@jennybc
Member

jennybc commented Oct 15, 2014

1001 DATASETS AND DATA REPOSITORIES ( LIST OF LISTS OF LISTS )

https://dreamtolearn.com/doc/2HDNJH3XJU6CVGKZ7SDM4MCSW

@jennybc
Member

jennybc commented Oct 15, 2014

this blog announces obsessively-detailed instructions to analyze publicly-available survey data with free tools - the r language, the survey package, and (for big data) sqlsurvey + monetdb.

http://www.asdfree.com/p/about-faq.html

@jennybc
Member

jennybc commented Oct 15, 2014

See the "data repositories" section of the visualization design resources curated by the InfoVis Group in UBC Computer Science

@jennybc
Member

jennybc commented Oct 19, 2014

Some public data sources gathered by the Data Incubator:

http://blog.thedataincubator.com/2014/10/data-sources-for-cool-data-science-projects-part-1/

@jennybc
Member

jennybc commented Oct 20, 2014

You can get data on the Ebola outbreak from DataMarket:

https://blog.datamarket.com/2014/10/15/ebola-data-on-datamarket/


7 participants