moving postgres line into database-tech.md
mminar committed May 22, 2014
2 parents 58d401f + 3c6f549 commit 2110920
Showing 10 changed files with 166 additions and 7 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -43,6 +43,8 @@ Classic academic conduits aren't providing Data Scientists -- this talent gap wi
Start here.
* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
* *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
* **Harvard CS 109 Data Science** [Video Archive](http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml) [Class Webpage](http://cs109.org)
* *Topics:* Data wrangling, data management, exploratory data analysis to generate hypotheses and intuition, prediction based on statistical methods such as regression and classification, communication of results through visualization, stories, and summaries.

* Data Science with Open Source Tools [Book](http://it-ebooks.info/book/624/)
* *Topics:* Visualizing Data, Estimation, Models from Scaling Arguments, Arguments from Probability Models, What you Really Need to Know about Classical Statistics, Data Mining, Clustering, PCA, Map/Reduce, Predictive Analytics
@@ -54,6 +56,7 @@ Start here.
* Linear Programming (Math 407) [University of Washington / Course](http://www.math.washington.edu/~burke/crs/407/lectures/)

* **Statistics**
* Statistics One [Princeton / Coursera](https://www.coursera.org/course/stats1)
* Stats in a Nutshell [Book ```$29```](http://amzn.to/1iMnx2X)
* Think Stats: Probability and Statistics for Programmers [Digital](http://greenteapress.com/thinkstats/) & [Book ```$25```](http://amzn.to/RcVnTf)
* Think Bayes [Allen Downey / Book](http://www.greenteapress.com/thinkbayes/)
@@ -129,7 +132,7 @@ _OSDSM Specialization: [Data Journalism](https://github.com/datasciencemasters/g
* Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
* Network Modeling & Viz [networkx](http://networkx.github.io/)
* Natural Language Toolkit [NLTK](http://nltk.org/)
* Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [PostgreSQL](https://pypi.python.org/pypi/psycopg2) [AWS](https://boto.readthedocs.org/en/latest/)
* Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [AWS](https://boto.readthedocs.org/en/latest/)

### R resources are now [here](https://github.com/datasciencemasters/go/blob/master/r-resources.md)

@@ -179,6 +182,6 @@ Please Share and Contribute Your Ideas -- **it's Open Source!**

Here's [my transcript](https://github.com/datasciencemasters/go/wiki/%5BTranscript%5D-Clare-Corthell).

Please **showcase your own specialization & transcript** by submitting a markdown file pull request with your name! eg ```clare-corthell-transcript.md```
Please **showcase your own specialization & transcript** by submitting a markdown file pull request in the ```/transcripts``` directory with your name! eg ```clare-corthell-2014.md```

[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)
10 changes: 10 additions & 0 deletions analysis-technologies.md
@@ -0,0 +1,10 @@
#### **Weka (Java Framework)**

* [Weka (MOOC)](http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/) for Data Mining

#### **Lua** (Libraries)
* [Torch7](http://torch.ch/) scientific computing framework with wide support for machine learning algorithms

#### **R** [here](r-resources.md)

NB: The core curriculum centers on python-based techniques and technologies
3 changes: 3 additions & 0 deletions basic-programming.md
@@ -6,6 +6,9 @@ _[I'm adding this section due to the great materials centering on applied method

* [Codecademy](http://www.codecademy.com/)

#### **Startups and Programming**
* Startup Engineering [Stanford / Coursera](https://class.coursera.org/startup-001) _NB: This is a full-stack class that explains development from conception to deployment. A great granular, stepwise course on how to build an application from scratch._

#### **GIT** (Source control)

* Git tutorial [Tutorial](http://gitimmersion.com/lab_01.html)
10 changes: 10 additions & 0 deletions blogs-n-media.md
@@ -0,0 +1,10 @@
### Aggregate Sources

* [DataTau](http://www.datatau.com/) - [Hacker News](https://news.ycombinator.com/) for data scientists

### Blogs

* [Data Science Weekly](http://www.datascienceweekly.org/blog)
* [FastML](http://fastml.com/)
* [Shape of Data](http://shapeofdata.wordpress.com/)
* [yhat](http://blog.yhathq.com/)
8 changes: 8 additions & 0 deletions database-tech.md
@@ -0,0 +1,8 @@
### Database Technologies & Management

#### MongoDB

* Data Wrangling with MongoDB [Udacity Course](https://www.udacity.com/course/ud032)

#### PostgreSQL
* [PostgreSQL](https://pypi.python.org/pypi/psycopg2)
10 changes: 10 additions & 0 deletions datasets.md
@@ -3,6 +3,16 @@
#### Machine Learning

* [UCI Machine Learning Dataset Repository](https://archive.ics.uci.edu/ml/datasets.html)
* [Machine Learning Dataset Repository](http://mldata.org/)

#### Deep Learning

* [Deep Learning Datasets](http://deeplearning.net/datasets/) for benchmarking deep learning algorithms

#### Clean Sample Data (for Learning New Techniques)

* [Scikit-learn sample datasets](http://scikit-learn.org/stable/datasets/index.html)
* [Statsmodels datasets](http://statsmodels.sourceforge.net/devel/datasets/index.html)

#### Raw Dataz

5 changes: 0 additions & 5 deletions nosql-tech.md

This file was deleted.

8 changes: 8 additions & 0 deletions r-resources.md
@@ -44,3 +44,11 @@ _[Note: The core of The Open Source Data Science Masters focuses on programmatic
* Kernel Method [kernlab](http://cran.r-project.org/web/packages/kernlab/index.html)
* Chinese Language Processing [Rwordseg](http://jliblog.com/app/rwordseg)
* Chinese Weibo Analysis [Rweibo](http://jliblog.com/app/rweibo)

#### R Datasets

* [Rdatasets](http://vincentarelbundock.github.io/Rdatasets/)

#### R Blogs & Media

* [R-bloggers](http://www.r-bloggers.com/) R news and tutorials contributed by (452) R bloggers
8 changes: 8 additions & 0 deletions specializations.md
@@ -5,11 +5,15 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
#### Machine Learning

* Neural Networks for Machine Learning [U Toronto / Coursera](https://www.coursera.org/course/neuralnets)
* [Building Machine Learning Systems with Python](http://www.packtpub.com/building-machine-learning-systems-with-python/book) [source code](https://github.com/luispedro/BuildingMachineLearningSystemsWithPython)

#### Deep Learning

[Wikipedia Definition](http://en.wikipedia.org/wiki/Deep_learning)

* Deep Learning [Tutorials](http://deeplearning.net/tutorial/)
* Deep Learning Course [Stanford / OpenClassroom](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=DeepLearning)

#### Web Scraping & Crawling

* Introduction to Web APIs including Twitter, YouTube, Bitly, Sunlight Foundation [Codecademy](http://www.codecademy.com/tracks/apis)
@@ -18,6 +22,10 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
* Web scraping [NewCoder / Tutorial](http://newcoder.io/scrape/)
* Working with Web APIs [NewCoder / Tutorial](http://newcoder.io/api/)

#### Visualization

* [D3.js Tutorial](https://www.dashingd3js.com/table-of-contents)

#### Social Network Analysis

#### Data Journalism
104 changes: 104 additions & 0 deletions transcripts/clare-corthell-2013.md
@@ -0,0 +1,104 @@
### The Open-Source Masters

I couldn't wait to go back to grad school. Literally. So I designed my own grad school and spent 5 months learning & hacking in great delight!

### My Background ([linkedin](http://bit.ly/clarecorthell))

I'm a Stanford-educated Engineer, previously a Front-End Developer and UX Designer on early-stage products. I'm always in hot pursuit of deeper insight to social questions!

### Goals & Motivations of the Open Source M.S.

Data Science is an ideal marriage of my technical capacities, social research inquisitions, and my geekish-freakish love of statistics.

### Next Steps?

I'm now a Data Scientist with an incredible team at [Mattermark](http://www.mattermark.com)!

***

## The Data Science Curriculum / April-August 2013

* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
* *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.

### Math
* Linear Algebra / Levandosky [Stanford / Book](http://www.amazon.com/Linear-Algebra-Steven-Levandosky/dp/0536667470/ref=sr_1_1?ie=UTF8&qid=1376546498&sr=8-1&keywords=linear+algebra+levandosky#)
* Statistics [Stats in a Nutshell / Book](http://shop.oreilly.com/product/9780596510497.do)
* Problem-Solving Heuristics "How To Solve It" [Polya / Book](http://en.wikipedia.org/wiki/How_to_Solve_It)

### Computing
* **Algorithms**
* Algorithms Design & Analysis I [Stanford / Coursera](https://www.coursera.org/course/algo)
* Algorithm Design [Kleinberg & Tardos / Book](http://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358/ref=sr_1_1?ie=UTF8&qid=1376702127&sr=8-1&keywords=kleinberg+algorithms)

* **Databases**
* Introduction to Databases [Stanford / Coursera](https://www.coursera.org/course/db)

* **Data Mining**
* Mining Massive Data Sets [Stanford / Book](http://i.stanford.edu/~ullman/mmds.html)
* Mining The Social Web [O'Reilly / Book](http://shop.oreilly.com/product/0636920010203.do)
* Introduction to Information Retrieval [Stanford / Book](http://nlp.stanford.edu/IR-book/information-retrieval-book.html)

* **Machine Learning**
* Machine Learning / Ng [Stanford / Coursera](https://www.coursera.org/course/ml)
* Programming Collective Intelligence [O'Reilly / Book](http://shop.oreilly.com/product/9780596529321.do)
* Statistics [The Elements of Statistical Learning / Book](http://www-stat.stanford.edu/~tibs/ElemStatLearn/) ** *en process*

* **Probabilistic Graphical Models**
* Probabilistic Programming and Bayesian Methods for Hackers [Github / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)
* PGMs / Koller [Stanford / Coursera](https://www.coursera.org/course/pgm) ** *en process*

* **Natural Language Processing**
* NLP with Python [O'Reilly / Book](http://shop.oreilly.com/product/9780596516499.do)

* **Analysis**
* Python for Data Analysis [O'Reilly / Book](http://www.kqzyfj.com/click-7040302-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920023784.do&cjsku=0636920023784)
* Big Data Analysis with Twitter [UC Berkeley / Lectures](http://blogs.ischool.berkeley.edu/i290-abdt-s12/)
* Social and Economic Networks: Models and Analysis / [Stanford / Coursera](https://www.coursera.org/course/networksonline)
* Information Visualization ["Envisioning Information" Tufte / Book](http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/ref=sr_1_8?ie=UTF8&qid=1376709039&sr=8-8&keywords=information+design)

* **Python** (Learning)
* New To Python: [Learn Python the Hard Way](http://learnpythonthehardway.org/), [Google's Python Class](http://code.google.com/edu/languages/google-python-class/)

* **Python** (Libraries)
* Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPython](http://www.lowindata.com/2013/installing-scientific-python-on-mac-os-x/)
* Bayesian Inference | [pymc](https://github.com/pymc-devs/pymc)
* Labeled data structures objects, statistical functions, etc [pandas](https://github.com/pydata/pandas) (See: Python for Data Analysis)
* Python wrapper for the Twitter API [twython](https://github.com/ryanmcgrath/twython)
* Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
* Network Modeling & Viz [networkx](http://networkx.github.io/)
* Natural Language Toolkit [NLTK](http://nltk.org/)

### Projects
* Coursework
* Sentiment analysis, trending topics, and friendship mapping with Twitter API
* Joins and Matrix Manipulation in MapReduce (AWS EC2)
* In-database Text analysis (SQL)
* Sentiment analysis of movie tweets (Python)


***
### A Note on Tools

This degree is brought to you by: "THE INTERNET".

Information is more democratized^ now than it was at any point in history. Given a little initiative and interest, you can tailor and excel in an education of your own design. The connective web made me what I am today, growing from the child obsessed with [Number Munchers](http://en.wikipedia.org/wiki/Munchers#Number_Munchers) to an adult jaw-dropping over [DBSCAN](http://en.wikipedia.org/wiki/DBSCAN).

The most valuable resources I used were:
* [Coursera](http://coursera.org)
* [Khan Academy](https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/term-life-insurance-and-death-probability)
* [Wolfram Alpha](http://www.wolframalpha.com/input/?i=torus)
* [Wikipedia](http://en.wikipedia.org/wiki/List_of_cognitive_biases)
* [Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science)
* **Kindle .mobis** (carrying textbooks is so 90s.)
* PopSci Read: [The Signal and The Noise](http://www.amazon.com/Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=8-1&qid=1376699450) Nate Silver
* **Friends & Family** (Impossible without their support! Special Thanks to N.S.)

*^ given internet access - an issue near and dear to me.*

***


### I "Forked" this into the [Open Source Data Science Masters](http://datasciencemasters.org) Curriculum.

[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)
