This is an open-source Data Science curriculum that uses a top-down approach to guide you towards becoming a cloud-native Data Scientist! It will help you master industry-standard ML libraries such as Fast.ai, TensorFlow, Microsoft's Cognitive Toolkit and many more!
This repo is my attempt to share my curriculum for becoming a Data Scientist. One of the biggest problems when learning a new profession is that few clear paths take people's financial situation into account. This is a major stumbling block when first starting out in tech.
Personally, I feel universities are becoming obsolete because they are not geared towards equipping students with the skills and workflows needed to contribute to the Data Science community. Some educators do teach Data Science properly at universities, but there are too few of them!
Just because 100% of the course content is free does not mean the quality is poor. The majority of the content comes from Kaggle, Microsoft, Fast.ai and Google. All of these organisations have made an effort to share their knowledge of implementing production-grade ML, putting its power in everyone's hands and improving the quality of ML available to the public. This means that anyone who can program can build impactful ML solutions without needing expert knowledge!
Data Science is a broad term for a field made up of many overlapping disciplines. This content focuses primarily on Machine Learning (ML).
ML uses data to understand events and teaches computers to recognize patterns, much as our brains do.
However, I feel it is important to get to know the other disciplines that overlap with it:
Data Analyst: Responsible for gathering insights from large datasets for reporting and business intelligence. A typical question they might need to answer is: find the second-highest salary of an employee. The role involves a lot of extracting data from different database types, plus data warehousing/ETL where necessary.
Data Engineer: The equivalent of a regular back-end engineer in the field of Data Science. Their responsibility is to ensure that the infrastructure needed to run ML in services is available. They usually deploy the ML engineer's models.
ML Engineer/Data Scientist: Their job is to understand business problems and then conduct experiments to determine how ML can help derive business value. This usually means setting up data cleaning pipelines, since a model is only as good as the data it receives. This is the skill you will develop most while going through the various courses.
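The Data Analyst's example question, finding the second-highest salary, is a classic SQL exercise. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `Employee` table, its columns and the sample rows are hypothetical and only there for illustration.

```python
import sqlite3


def second_highest_salary(conn):
    """Return the second-highest distinct salary, or None if there isn't one."""
    # DISTINCT collapses ties at the top; OFFSET 1 skips the highest salary.
    row = conn.execute(
        "SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 1"
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee (name, salary) VALUES (?, ?)",
    [("Ann", 300), ("Ben", 200), ("Cat", 300), ("Dan", 100)],
)
print(second_highest_salary(conn))  # 200
```

Using `DISTINCT` matters here: without it, the two employees tied at 300 would make the query return 300 as the "second-highest" salary.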
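The ML Engineer's data cleaning pipeline can be pictured as a chain of small, testable steps applied to raw records before they reach a model. Below is a minimal sketch in plain Python, assuming records arrive as dictionaries; the field names (`age`, `income`), the valid age range and the cleaning rules are hypothetical examples rather than anything taken from the courses.

```python
from functools import partial
from statistics import median


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def drop_invalid(rows, field, low, high):
    """Drop records whose field is missing or outside [low, high]."""
    return [r for r in rows if r.get(field) is not None and low <= r[field] <= high]


def impute_median(rows, field):
    """Fill missing values of a field with the median of the observed values."""
    observed = [r[field] for r in rows if r.get(field) is not None]
    fill = median(observed) if observed else None
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]


def run_pipeline(rows, steps):
    """Apply each cleaning step in order, feeding its output to the next."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical raw records with typical problems.
raw = [
    {"age": 34, "income": 52000},
    {"age": 34, "income": 52000},  # duplicate
    {"age": 29, "income": None},   # missing income
    {"age": -1, "income": 40000},  # invalid age
    {"age": 45, "income": 61000},
]

steps = [
    drop_duplicates,
    partial(drop_invalid, field="age", low=0, high=120),
    partial(impute_median, field="income"),
]
clean = run_pipeline(raw, steps)
print(clean)
```

Step order is a design choice: filtering out the invalid age before imputing keeps that bad record from skewing the median. In practice, libraries such as pandas or scikit-learn's `Pipeline` are the usual tools for this kind of work.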
Here is the work-in-progress (WIP) curriculum as a Jupyter Notebook.
It provides links to all the courses you need to complete.
A few tips on how to set most things up are here.
There are two ways to approach this, depending on whether you prefer a top-down or a bottom-up approach. Top-down puts you in the driver's seat of an F1 car, where you learn what research and production machine learning workflows look like and how they can be applied to solve business problems. Bottom-up starts you off with the bare essentials and teaches all the separate pieces of the puzzle before they fit together.
The latter approach is how most education systems are structured, and it is the main reason why I did not go back to university to do an Honours degree. I feel there are more efficient ways to learn and apply a skill. However, if you have less than a year's worth of programming experience, please start right at the bottom of the list and work your way up. Similarly, if you want to brush up on the fundamentals or figure out how a specific library writes its Python functions, pick the individual skills you need as the need arises.
You will need a computer with a browser (ideally Chrome or Firefox) and a stable internet connection. The only things you will need to pay for are your internet and electricity bills. The courses can be done at your own pace!
You need to document your progress and do a lot of your own work. Make sure you apply what you have learnt to use cases beyond those covered in the courses. Eventually you will have an extensive portfolio that tracks tangible progress.
For example, if you do the Crash Course for TensorFlow, keep a Google Doc, Dropbox Paper or similar note-taking app open and summarise the concepts being taught. Once you have completed the course, apply those skills to a different dataset and document your experience and results. This is the only way people can see proof that you understand a particular topic.
The duration depends on how consistently you can understand, implement and document your work. Courses come with their own recommended completion times; make sure you can finish within that time, and budget for the time it will take to reimplement what you've learnt on your own problem and document the results.
For most people this can take a week, or several days a month, depending on your available skills, time and responsibilities. However, you should aim to spend a full year on this to get to grips with creating reproducible work!
Ultimately, within a few months you should have completed at least 5 experiments that you can form an opinion on, explaining why your work is valid and what its shortcomings are.
Full-time occupation: Convince your employer to grant study leave, or study on weekends and after your full-time job.
Studying: