What's Next?

Here is my best advice for getting better at data science: Find "the thing" that motivates you to practice what you've learned and to learn more, and then do that thing. That could be personal data science projects, Kaggle competitions, online courses, reading books, reading blogs, attending meetups or conferences, or something else.

If you create your own data science projects, I'd encourage you to share them on GitHub and include writeups. That will help to show others that you know how to do proper data science.

Kaggle competitions are a great way to practice data science without coming up with the problem yourself. Don't worry about how high you place; just focus on learning something new with every competition. Spend as much time as possible reading the forums, because you'll learn a lot, but don't spend time in the forums at the expense of working on the competition yourself. Also, keep in mind that you won't be practicing some important parts of the data science workflow, namely generating questions, gathering data, and communicating results.

There are many online courses to consider, and new ones are being created all the time:

  • Coursera's Data Science Specialization consists of nine courses plus a capstone project. There is a lot of overlap with General Assembly's course, and course quality varies, but you would definitely learn a lot of R.
  • Coursera's Machine Learning is Andrew Ng's highly regarded course. It goes deeper into many topics we covered, and covers many topics we didn't. Keep in mind that it focuses only on machine learning (not the entire data science workflow), the programming assignments use MATLAB/Octave, and it requires some understanding of linear algebra. Browse these lecture notes (compiled by a student) for a preview of the course.
  • Stanford's Statistical Learning also covers some topics that we did not. It focuses on teaching machine learning at a conceptual (rather than mathematical) level, when possible. The course may be offered again in 2016, but the real gem from the course is the book and videos (linked below).
  • Caltech's Learning from Data teaches machine learning at a theoretical and conceptual level. The lectures and slides are excellent. The homework assignments are not interactive, and the course does not use a specific programming language.
  • Udacity's Data Analyst Nanodegree looks promising, but I don't know anyone who has done it.
  • Thinkful's Data Science in Python course or SlideRule's Data Science Intensive may be a good way to practice our course material with guidance from an expert mentor.
  • Dataquest is an online platform rather than a traditional course, and allows you to learn and practice data science through interactive exercises. Not all of the lessons are free, but new lessons are frequently being added.
  • edX's Introduction to Computer Science and Programming Using Python is apparently an excellent course if you want to get better at programming in Python.
  • Coursera recently added many other data science-related specializations and courses, most of which I am not familiar with. However, CourseTalk is useful for reading reviews of online courses.
  • Some additional courses are listed in the Additional Resources section of the main README.
  • I will also be teaching my own online courses, which will range in level from beginner to advanced. (Subscribe to my email newsletter to be notified when courses are announced.)

Here is just a tiny selection of books:

There is an overwhelming number of data science blogs and newsletters. If you want to read just one site, DataTau is the best aggregator. Data Elixir is the best newsletter, though the O'Reilly Data Newsletter and Python Weekly are also good. Other notable blogs include: No Free Hunch (Kaggle's blog), The Yhat blog (lots of Python and R content), Practical Business Python (accessible Python content), Simply Statistics (a bit more academic), FastML (machine learning content), Win-Vector blog (great data science advice), FiveThirtyEight (data journalism), and Data School (my blog).

If you prefer podcasts, I don't have any personal recommendations, though this list gives a nice summary of seven data science podcasts that the author recommends.

There are tons of data-related meetups in DC, and most of them are organized by Data Community DC. Check out the calendar or just subscribe to their weekly newsletter. District Data Labs also offers data science workshops and project opportunities in DC.

Some notable data science conferences are KDD, Strata, PyCon, PyData, and SciPy.

If you want to go full-time with your data science education, read this guide to data science bootcamps, and this other guide which also includes part-time and online programs. Or, check out this massive list of colleges and universities with data science-related degrees.

Finally, Dataquest's blog post on How to actually learn data science has some additional advice that may be useful to you.