dron-dronych/self-made-data-scientist

Precise plan on becoming a data scientist

After reading tons of literature and forum questions, I've settled on a path to follow, which is pretty much what most of those answers have in common. The plan covers three parts: math, computer science, and practical knowledge (e.g. of tools). The theoretical part focuses on gaining math skills and knowledge of probability, statistics, optimization, and machine learning theory. The practical part, on the other hand, focuses on programming skills and getting familiar with analysis tools. Finally, the computer science part is geared towards skills that help in working as a practicing data scientist, for example implementing an algorithm efficiently. Unlike most resources and guides, this one aims to be very concise about literature and resources, the abundance of which is overwhelming and confusing.

If there is one thing I have learned in the last few years of following this plan, it is practice, practice, and practice. Practice until you know how to apply the theory you have learned; theory without practice is worthless. Fine-tune pretrained models, dive deep into a specific area such as NLP, and see how you can use the most recent research in your predictions. Whatever it is, as long as you practice it, you are taking the thought process you need to the next level.

You can always learn the tools to help you in your DS journey, but building that thinking process and developing an experimental mindset is what I bet on most. You can often find a few tricks here and there, so instead of continuously learning them, isn't it better to learn how to generate them yourself?


Take time to review what you have learned, and step away from studying by giving yourself a little break every once in a while. As I recently read in an article on self-development, look back on the knowledge you have gained and where you have succeeded, not on where you have come up short, which is what we are naturally inclined to do.

Completed items are marked with ✔️.

Contents

Math

Items to be followed in the order provided

  1. ✔️ Single variable calculus Course page
  2. ✔️ Multi variable calculus Course page
  3. Linear algebra Course page
  4. ✔️ Introduction to Combinatorics Andrey Raygorodskiy on Coursera (to be taken in parallel with multi-variable calculus)
  5. ✔️ Introduction to Probability Andrey Raygorodskiy on Coursera
  6. Probability and Statistics Course page
  7. Selected topics from the Probability and Random Variables course MIT Spring 2015
  8. Introduction to Stochastic Processes MIT Spring 2015 ## TODO find materials on Markov Models
  9. Matrix Methods in Data Analysis, Signal Processing, and Machine Learning Course by Gilbert Strang
  10. Introduction to Graph Theory Andrey Raygorodskiy on Coursera
  11. Mathematics for Computer Science MIT / Fall 2010 or MIT / Spring 2015
  12. Differential Equations Spring 2010
  13. Logic
  14. Introduction to Numerical Analysis MIT's 18.330
  15. Analytical geometry (determine the resources)
  16. Algebra I MIT's 18.701
  17. Algebra II MIT's 18.702
  18. Number Theory MIT's 18.781
  19. Analysis I MIT's 18.100B
  20. Analysis II MIT's 18.101
  21. Introduction to Functional Analysis MIT's 18.102
  22. Convex Optimization Stanford Course by Stephen Boyd

Applied part

  1. ✔️ Acquaintance with Numpy Numpy Tutorial on Scipy-Lectures
  2. ✔️ Pandas tutorial Official tutorial
  3. ✔️ Matplotlib Intro to Matplotlib

Machine Learning

  1. ✔️ Machine Learning Open Course (https://github.com/Yorko/mlcourse.ai)
  2. More hard core machine learning math from Yandex (https://academy.yandex.ru/handbook/ml/)
  3. CS231n: Convolutional Neural Networks for Visual Recognition (http://cs231n.github.io/)
  4. Fast.ai - sequence of 4 practice-oriented courses Courses page
  5. Good theoretical overview of ML fundamentals (in Russian)
  6. Machine Learning lectures by K. Vorontsov videos in Russian

Deep Learning

  1. ✔️ Deeplearning.ai (specialization on Coursera https://www.deeplearning.ai/)

Natural Language Processing

  1. Best NLP competitions on Kaggle to learn from: video by Abhishek Thakur
  2. Understanding Unicode and Charsets: bare minimum
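
As a small practice exercise for the Unicode item above, here is a minimal sketch using only the Python standard library. The sample string and variable names are my own illustration and are not taken from the linked material. It shows three things worth verifying yourself: a string is a sequence of code points, bytes only exist after encoding, and the same visible character can be stored in more than one way.

```python
import unicodedata

# Illustrative sample text ("naïve café"), written with escapes so the
# exact code points are unambiguous.
text = "na\u00efve caf\u00e9"

# A str holds code points; bytes appear only once an encoding is chosen.
utf8_bytes = text.encode("utf-8")
code_point_count = len(text)
byte_count = len(utf8_bytes)

# Decoding with the wrong charset does not raise here: Latin-1 maps every
# byte to some character, so the result is silently garbled text.
mojibake = utf8_bytes.decode("latin-1")

# The same visible character can be one code point (composed form)
# or a base letter plus a combining accent (decomposed form).
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")

print(code_point_count, byte_count)
print(mojibake)
print(len(composed), len(decomposed), composed == decomposed)
```

Normalizing text to a single form before comparing or tokenizing is one common way to avoid treating visually identical strings as different.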

Computer Science and Software Engineering

Here I refer to jwasham's coding-interview-university collection.
Special courses to take are listed separately:

  1. Introduction to Computational Thinking and Data Science MIT's 6.0002
  2. Algorithms: Design and Analysis Stanford Course
  3. Machine Learning Stanford Course by Andrew Ng
  4. Data Structures and Algorithms Specialization (a sequence of 6 courses; specialization on Coursera https://www.coursera.org/specializations/data-structures-algorithms)
  5. Object-Oriented Programming and Design Patterns in Python (https://www.coursera.org/learn/oop-patterns-python/)
  6. C++ learning by doing
    I have found these sources to be useful, especially when used in conjunction with each other:
  7. Java
    • Cool in-depth coverage of Java Core: Golovach Courses (in Russian)
    • Effective Java (3rd ed.) - Joshua Bloch
    • Clean Code - Robert C. Martin
    • The Clean Coder - Robert C. Martin
    • Test-Driven Development - Kent Beck
    • The Art of Unit Testing - Roy Osherove
    • Optimizing Java: Practical Techniques for Improving JVM Application Performance - Benjamin J. Evans, Chris Newland, James Gough
  8. Learn to work with the command line
  9. Shell scripting lessons

System Design

  1. System design - engineering approach very cool collection
  2. Real world systems explained by those who build them Architecturenotes
  3. Infrastructure explained blogpost

Other interesting courses

Some of the courses with the useful material to get a grasp of:

  1. Topics in Mathematics with Applications in Finance (topics include stochastic calculus, stochastic differential equations, time series analysis, and direct applications to finance) MIT's 18.S096

Useful Resources

Books

Interview Questions & Brain Teasers
