A list of things I've finished so far while teaching myself Machine Learning and Data Science.
- My raw notes: rawnote.dinhanhthi.com (ideas captured quickly from the courses).
- My main notes: note.dinhanhthi.com (well-written notes, not only for me).
- My learning log.
- Setting up a café in Ho Chi Minh City — find the best place to set up a new business — article — source.
- Titanic: Machine Learning from Disaster (from Kaggle) — predict which passengers survived the Titanic shipwreck — source.
- "Blue Book for Bulldozers" Kaggle competition.
I also did some mini-projects to understand the concepts. The HTML files (exported from the corresponding Jupyter Notebook files) and "Open in Colab" links for the mini-projects below are here.
- Image compression using K-Means — source — Open in Colab.
- Example to understand the idea of PCA — source — Open in Colab.
- Image compression using PCA — source — Open in Colab.
- PCA without scikit-learn — source — Open in Colab.
- Face Recognition using SVM — source — Open in Colab.
- XOR problem using SVM to see the effect of gamma and C with an RBF kernel — source — Open in Colab.
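For readers curious what "PCA without scikit-learn" involves, here is a minimal sketch using only NumPy's SVD. It is not the code from the notebook above; the function name `pca` and the toy data are hypothetical, chosen for illustration. The idea: center the data, take the singular value decomposition, and keep the top right-singular vectors as principal directions.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration. Returns
    (projected data, components, explained variance ratio).
    """
    # PCA is defined on mean-centered data, so subtract each feature's mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction equals S**2 / (n_samples - 1).
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    # Coordinates of every sample in the new (reduced) basis.
    X_projected = X_centered @ components.T
    return X_projected, components, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 2-D data stretched along the line y = x, plus a little noise.
    t = rng.normal(size=200)
    X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])
    Z, comps, ratio = pca(X, n_components=1)
    print(Z.shape, comps, ratio)
```

Because almost all the variance lies along y = x, a single component keeps nearly all of it. Using the SVD of the centered data, rather than forming the covariance matrix explicitly, is a common choice for numerical stability.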
- Convolutional Neural Network (CNN).
- Decision Tree — my note.
- Density-based Clustering.
- Gaussian Naive Bayes.
- Hierarchical Clustering.
- K-Means Clustering.
- K-Nearest Neighbors (KNN).
- Linear Regression / Logistic Regression.
- Neural Networks.
- Principal Component Analysis (PCA) — my note.
- Random Forest — my note.
- Recurrent Neural Network (RNN).
- Singular Value Decomposition (SVD).
- Stochastic Gradient Descent (SGD).
- Support Vector Machine (SVM) — my note.
- Activation functions.
- Active learning (ML).
- Cost function.
- Confusion matrix.
- Cross Validation (K-folds).
- Decision boundary.
- Gradient Descent.
- Functions: Sigmoid, ReLU.
- F-test, p-value, f1-score, t-value, z-score.
- Forward/Backward propagation.
- Overfitting (High variance) / Underfitting (High bias).
- Plots / Charts: box plots, heat maps, line plots, area plots, bar charts, choropleth maps, waffle charts, factorplot.
- Regular Expressions (RegEx).
- Supervised Learning / Unsupervised Learning.
- Train / Dev / Test sets.
- Data Visualization.
- Data Wrangling.
- Model evaluation.
- Preprocessing (texts, images, dates & times, structured data).
- Web Scraping.
🐍 Programming Languages
- GraphQL — an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data.
- Python — an interpreted, high-level, general-purpose programming language — my note.
- R — a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
- Scala — a general-purpose programming language providing support for functional programming and a strong static type system.
- SQL — a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.
⚙️ Frameworks & Platforms
- Docker — a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
- Google Colab — a free cloud service based on Jupyter Notebooks for machine-learning education and research — my note.
- Kaggle — an online community of data scientists and machine learners, owned by Google.
- Hadoop — a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
- PostgreSQL (Postgres) — a free and open-source relational database management system emphasizing extensibility and technical standards compliance.
- Spark — an open-source distributed general-purpose cluster-computing framework.
- Git — a distributed version-control system for tracking changes in source code during software development — my note.
- Markdown — a lightweight markup language with plain text formatting syntax — my note.
- Jupyter Notebook — an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text — my note.
- Trello — a web-based Kanban-style list-making application.
A "ticked" library doesn't mean I know everything about it, but I can use it comfortably with the help of its documentation!
- Keras — an open-source neural-network library written in Python.
- Matplotlib — a plotting library for the Python programming language and its numerical mathematics extension NumPy.
- NumPy — a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- OpenCV — a library of programming functions mainly aimed at real-time computer vision.
- pandas — a software library written for the Python programming language for data manipulation and analysis.
- Seaborn — a Python data visualization library based on matplotlib.
- scikit-learn — a free software machine learning library for the Python programming language.
- TensorFlow — a free and open-source software library for dataflow and differentiable programming across a range of tasks.
The "unchecked" courses are still in progress!
- Machine Learning by Andrew Ng on Coursera. It introduces the general ideas of ML and some commonly used algorithms — my note — my certificate.
- IBM Data Science Professional Certificate on Coursera. It contains 9 sub-courses covering fundamental knowledge about data science — my note — my certificate.
- Learn Python 3 on Codecademy — my note — my certificate.
- Learn SQL on Codecademy — my certificate.
- Introduction to Statistics with NumPy on Codecademy — my certificate.
- Data Scientist path & Data Engineer path on Dataquest. Both contain many sub-courses covering the main topics of Data Science and Data Engineering — my note — my certificate.
- Data Science Path on Codecademy. It contains 27 sub-courses covering the core knowledge of data science — my certificate.
- Deep Learning Specialization by Andrew Ng on Coursera. It contains 5 courses covering the foundations of Deep Learning (CNN, RNN, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization,...) and includes many case-study projects.
- fast.ai's courses for Machine Learning and Deep Learning.
- Machine Learning Crash Course by Google.
The "unchecked" books are still in progress!
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
- Deep Learning with Python by François Chollet.
- Dive into Deep Learning — an interactive deep learning book with code, math, and discussions, based on the NumPy interface — GitHub.
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition) by Aurélien Géron.
- Machine Learning Yearning by Andrew Ng.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
🤖 GitHub repositories
- Awesome lists:
  - Awesome Data Engineering — A curated list of data engineering tools for software developers.
  - Awesome Deep Learning — A curated list of awesome Deep Learning tutorials, projects and communities.
  - Awesome Deep learning papers and other resources — Deep Learning and deep reinforcement learning research papers and some code.
  - Awesome Public Datasets — A topic-centric list of HQ open datasets.
  - Awesome Machine Learning — A curated list of awesome Machine Learning frameworks, libraries and software.
  - Awesome Big Data — A curated list of awesome big data frameworks, resources and other awesomeness.
- 120 Data Science Interview Questions — Answers to 120 commonly asked data science interview questions.
- A Machine Learning Course with Python — Machine Learning Course with Python; refer to the course page for step-by-step explanations.
- Python Data Science Handbook — the full text of the book in Jupyter Notebooks.
- Homemade Machine Learning — Python examples of popular machine learning algorithms with interactive Jupyter demos and the math explained.
- TensorFlow-Course — Simple and ready-to-use tutorials for TensorFlow.
- Machine Learning & Deep Learning Tutorials — ML and DL tutorials, articles and other resources.
- Data science blogs — A curated list of data science blogs.
- data-science-ipython-notebooks — DS Python notebooks: DL (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
🌏 Other resources
- Papers With Code — a free and open resource with Machine Learning papers, code and evaluation tables.
- Chris Albon's notes — Notes On Using Data Science & Artificial Intelligence To Fight For Something That Matters.
- Seeing Theory — A visual introduction to probabilities and statistics.
- Collection of useful articles for understanding concepts in ML, AI and DS.
The descriptions of terms on this site are borrowed from Wikipedia.