@butlermh

Adding links to Coursera and Udacity Data Science specializations, recent EdX and Coursera courses, Data Journalism and Web scraping, and some other good introductory Python resources.

I have worked with many of these resources.

…Git, introductory links to Web APIs such as Twitter and web crawling, data journalism
@clarecorthell
Member

This is awesome. Huge thanks for contributing!

Questions (asked of all PRs)

  • Did you take these courses or use these resources yourself? To what degree did you find them useful?
  • Would you recommend each resource over any other resource on the same topic? Are any duplicative of resources already in the curriculum?

Because Data Science is such a nebulous and undefinable discipline, maintaining strict standards for the core curriculum ensures its value as a complete curricular resource. However, your PR tipped the scale and catalyzed the creation of a few other very useful documents.

Comments

  • The Intro to Computational Thinking course is a fantastic discovery! I took a class in college I fondly referred to as "Exceltastic" that involved the same modeling topics using Excel, and I've been looking for its essence ever since. This course ignited my interest in the field at large, and I hope it'll do the same for others. (+1)
  • Intro to Statistics and Data Analysis and Statistics have overlapping content. Which would you recommend as the better or more effective source? (Yes, this is subject to opinion bias.)
  • Exploratory Data Analysis does not meet the standard of quality and specificity for the Data Science core curriculum. (rm)
  • I will add notes to Algorithms I and II about what is addressed in each. These are both essential (at least in my mind) and they do deserve more mindshare. (+1)
  • If you believe Algos I/II Princeton are superior to the Stanford courses, make your case. Roughgarden is the best teacher I've seen yet. I hold his courses in superior regard.
  • No one technology has conquered the NoSQL game, hence specific technologies shall be referenced in nosql-tech.md. We can safely include My/SQL/ite in the core because an understanding of classic relational databases is essential and foundational. (rm Data Wrangling with MongoDB)
  • The Analytics Edge [MIT / Edx] has been added to r-resources.md (mv)
  • Core Concepts in Data Analysis appears to use Matlab as the technology of analysis. This core curriculum focuses on Python methods of analysis, in part because Python is a respected standard in the software industry and in part because it is free. Matlab is not accessible (read: free) and does not qualify as open source. If the course is amply agnostic to the technology of analysis and is worth taking, please protest. (rm)
  • I love the newcoder tutorials. I'll admit I'd neglected to add them! (+1)
  • Are the Machine Learning I/II/III courses more valuable than any other wrt the topics at hand?
  • The Data Journalism Handbook is interesting, but I'd argue off-topic. I've added that to specializations.md (+1/mv)

General notes:

  • I welcome challenge to any and all amendments made here.
  • Where a course or resource is exclusively paid, I'll include a "$" note. The intent is to include primarily open sources, except where paid sources are exceptional or an open equivalent is lacking. I've detailed these more carefully.
  • Specializations now have a dedicated home in specializations.md
  • R resources now have a dedicated home in r-resources.md
  • Resources on programming generally and specific tools have a dedicated home in basic-programming.md
  • I will link specialization spurs in the main curriculum.
  • If you see the opportunity to add another .md that would be valuable, please do!

Thank you again! It's people like you who take the time to contribute that make this a truly valuable resource. It's new to feel such a responsibility to a living, breathing thing which is neither animal nor vegetable, but a convergence of many sharp and considerate minds. Collaboration is at the heart of (almost) all things true and good. I'd love to cordially invite you to a coffee in the Mission.

Cheers!

@butlermh
Author

Hi Clare, sorry about the delay in replying - I am travelling in Taiwan at the moment - and also that this message is in a comment, which doesn't feel quite the right place to put it. Yes, I am taking many of these courses - see http://uk.linkedin.com/in/butlermh/ and please send me an invite.

The reason for my submission is that several people on the first course of the Coursera Data Science Specialization, Data Scientist's Toolbox, were discussing this resource and using it.

This first course has been quite disappointing from my point of view, especially compared to the previous course, "Data Analysis", that it replaced. But I guess there are a lot of people who want to study data science right from the beginning (for example, they don't know how to use a shell or git, which I would class as basic computer literacy), and this course is aimed at them (although it doesn't really cover these in much depth, and there are much better intros if people need those types of skills).

The problem is that several of the MOOC providers are starting to commercialize, which, to be honest, is lowering academic standards. That is a shame, because many courses used to be equivalent to the courses actually offered by the universities.

Now to specific questions:

If I had to pick one, I would recommend Data Analysis and Statistical Inference over intro to Stats, purely because the latter is offered by Princeton, so you don't get a certificate. That said, the latter is a very good course (apologies to Andrew Conway here, who gave a wonderful course that was a masterclass in good lecturing technique).

I am just starting the Roughgarden Algorithms course, so I don't know how they compare, but from what I have seen the Sedgewick course is much more practical, with a focus on implementation and pretty tight bounds on efficiency. I actually like doing more than one course and hearing the information more than one way. One of my psychologist friends once told me the funny thing about memory is that the more you know, the easier it is to remember.

Data Journalism came up because I actually failed the Udacity Intro to Data Science course: they said the report I produced wasn't what they wanted, and that they wanted something based on Data Journalism. So for some people this is important ...

I don't use Matlab myself; I work with Octave. Often the courses have people who only want to use one tool, e.g. Python or R. It's great to be able to use a few. So I wouldn't use the choice of tool in itself as a reason for not recommending a course like Core Concepts in Data Analysis (Andrew Ng's course, for example, uses Octave). It's only just starting and it's not one of my favourites ... but I note others have very different views from me ... so I guess in my submission I was trying to avoid being subjective.

Yes, sure, there are many NoSQL databases ... but not so many courses. I haven't done this course, so I don't know its quality ... however, I have seen that even seasoned pros could do with a course or book on NoSQL, because it means thinking in quite a different way ...

Anyway, I am afraid I have got to go ... Maybe more later ... best wishes!

@rjolicoe

Sorry, I meant to type a message and accidentally just sent it. I appreciate this thread as I'm new to studying data science. I'm taking the Coursera Data Scientist's Toolbox and find it to be pretty light, but I plan to take the remaining courses in the specialization. I look forward to learning more from the Open Source Data Science Masters, and I appreciate the time and energy that was put in to create it.

Thank you,

Sincerely
Ryan Jolicoeur

On Monday, May 12, 2014, Ryan Jolicoeur rjolicoeur82@gmail.com wrote:

On Sunday, May 11, 2014, Mark H. Butler notifications@github.com wrote:

@butlermh
Author

Hi Ryan,
Yes, those are my thoughts too. Jeff Leek's previous course, "Data Analysis", was much better. Really what I wanted was a Data Analysis II rather than an intro to it. But hopefully the later courses will be better.
I really recommend edX's Analytics Edge to you. Unfortunately it is just about to finish, but it has been a very good course.

@rjolicoe

Hi Mark,

Thank you for that information; I will certainly have to check that out when it becomes available again. I'm new to the field: my programming knowledge is very rudimentary, but I have a strong background in statistics and probability. I'm currently self-studying Python and hope to analyze some data sets once I get a little more comfortable with the language.

On Monday, May 12, 2014, Mark H. Butler notifications@github.com wrote:

@butlermh
Author

OK, I've finally got back to address the rest:

I don't know about the Udacity Exploratory Data Analysis course; I haven't done it. I would hope it is OK, as it's done by people from the Facebook Data Science team (but the Udacity "Intro to Data Science" course was done by someone from industry too, and that was rather disappointing). One of my study friends has done it; he said it was OK, but the assignment was a little frustrating. The three machine learning courses have just come out, so neither my friend nor I have done them.

Also, regarding "$": most of these courses have a free version; you just don't get a certificate. You might want to add an extra dollar sign for Udacity, as the fees tend to be double Coursera's and you have to pay on a monthly basis. They do this because they employ tutors to help you (which is good for some people), but it also makes them seem expensive for people who just want a certificate and might not need that level of support. I found having to discuss my "study goals" with my tutor before starting the course a bit frustrating, really; he was a nice guy, but it just didn't seem necessary. But I am lapsing into subjective comment ...

Anyway, it sounds like you have applied some editorial judgement; that is fine, I guess, if you think it is appropriate.

@butlermh
Author

Hi Ryan,
Feel free to add me on LinkedIn (see above) or email me (my email address should be on my GitHub account) if I can answer any questions for you about the courses I have done or my recommendations. It would be good to keep in touch with you too as you do more courses. Very best!

@rjolicoe

Hi Mark,

Thank you for adding me on LinkedIn. I have several questions that I would love to discuss with you. I greatly appreciate your time; thank you.

On Monday, May 12, 2014, Mark H. Butler notifications@github.com wrote:

@butlermh butlermh closed this Mar 24, 2020