ByteMonk-GCECT/Hello-Robot

Hello Robot


👩‍💻 LEARNING PATH 👨‍💻


֍ Prerequisites for starting your Machine Learning journey:

֍ Basics:

  • Matrices and Linear Algebra Fundamentals
  • Database Basics:-
    • Relational vs Non Relational Databases
    • SQL + Joins
    • NoSQL
  • Tabular Data
  • DataFrames and Series
  • Extract, Transform, Load (ETL)
  • Reporting vs BI vs Analytics
  • Data Formats:-
    • JSON
    • XML
    • CSV
  • Regular Expressions (RegEx)

֍ Python Programming:

  • Python Basics:-
    • Expressions
    • Variables
    • Data Structures
    • Functions
    • Install packages (via pip, conda, etc.)
  • Important Libraries:-
    • NumPy
    • Pandas
  • Virtual Environments
  • Jupyter Notebooks

֍ Data Sources:

  • Data Mining
  • Web Scraping
  • Public Datasets
  • Kaggle

֍ Exploratory Data Analysis/Data Munging/Wrangling:-

  • Principal Component Analysis (PCA)
  • Dimensionality and Numerosity Reduction
  • Normalisation
  • Data Scrubbing, Handling Missing Values
  • Unbiased Estimators
  • Binning Sparse Values
  • Feature Extraction
  • Denoising
  • Sampling

֍ Choose your path:-

֍ Data Science:-

֍ Data Engineer:-

֍ Statistics:

  • Probability Theory
    • Randomness, random variables and random sample
    • Probability distribution
    • Conditional probability and Bayes' Theorem
    • Statistical Independence
    • Cumulative distribution function (cdf)
    • Probability density function (pdf)
    • Probability mass function (pmf)
  • Continuous Distributions (pdfs)
    • Normal/Gaussian
    • Uniform (continuous)
    • Beta
    • Dirichlet
    • Exponential
    • Chi-squared
  • Discrete Distributions (pmfs)
    • Uniform (discrete)
    • Binomial
    • Multinomial
    • Hypergeometric
    • Poisson
    • Geometric
  • Summary statistics
    • Expectation and mean
    • Variance and standard deviation
    • Covariance and Correlation
    • Median, quartile
    • Interquartile range
    • Percentile/quantile
    • Mode
  • Important Laws
    • Law of large numbers
    • Central limit theorem
  • Estimation
    • Maximum Likelihood estimation
    • Kernel Density Estimation
  • Hypothesis Testing
    • p-value
    • Chi-squared test
    • F-test
    • t-test
  • Confidence Interval
  • Monte Carlo Method

֍ Visualisation:

  • Chart suggestions thought starter
  • Python
    • Matplotlib
    • plotnine
    • Bokeh
    • seaborn
    • ipyvolume
  • Web
    • Vega-Lite
    • D3.js
  • Dashboards
    • Dash
  • BI
    • Tableau
    • Power BI

֍ General:

  • Concepts, Inputs and Attributes
    • Categorical Variables
    • Ordinal Variables
    • Numerical Variables
  • Cost functions and gradient descent
  • Overfitting/Underfitting
  • Training, validation and test data
  • Precision vs Recall
  • Bias and Variance
  • Lift

֍ Methods:

  • Supervised Learning
    • Regression
      • Linear Regression
      • Poisson Regression
    • Classification
      • Classification Rate
      • Decision Trees
      • Logistic Regression
      • Naive Bayes Classifiers
      • K Nearest Neighbour
      • Support Vector Machines
      • Gaussian Mixture Models
  • Unsupervised Learning
    • Clustering
      • Hierarchical Clustering
      • K Means Clustering
      • DBSCAN
      • HDBSCAN
      • Fuzzy C Means
      • Mean Shift
      • Agglomerative
      • OPTICS
    • Association Rule Learning
      • Apriori Algorithm
      • ECLAT Algorithm
      • FP Trees
    • Dimensionality Reduction
      • Principal Component Analysis
      • Random Projection
      • NMF
      • t-SNE
      • UMAP
  • Ensemble Learning
    • Boosting
    • Bagging
    • Stacking
  • Reinforcement Learning
    • Q Learning

֍ Use Cases:

  • Sentiment Analysis
  • Collaborative Filtering
  • Tagging
  • Prediction

֍ Tools:

  • Important libraries
    • scikit-learn
    • spaCy (NLP)

֍ Papers:

  • Read DL Papers with concepts
  • Read DL Papers with code

֍ Neural Networks:

  • Understanding Neural Networks
  • Loss functions
  • Activation functions
  • Weight initialisation
  • Vanishing/Exploding Gradient Problem

֍ Architectures:

  • Feedforward Neural Network
  • Autoencoder
  • Convolutional Neural Network
    • Pooling
  • Recurrent Neural Network
    • LSTM
    • GRU
  • Transformer
    • Encoder
    • Decoder
    • Attention
  • Siamese Network
  • Generative Adversarial Network (GAN)
  • Evolving Architectures/NEAT
  • Residual Connections

֍ Training:

  • Optimizers
    • SGD
    • Momentum
    • AdaGrad
    • AdaDelta
    • Nadam
    • RMSProp
  • Learning Rate Schedule
  • Batch Normalisation
  • Batch Size Effects
  • Regularisation
    • Early Stopping
    • Dropout
    • Parameter Penalties
    • Data Augmentation
    • Adversarial Training
  • Multitask Learning
  • Transfer Learning
  • Curriculum Learning

֍ Tools:

  • Important Libraries
    • Awesome Deep Learning
    • Huggingface Transformers
  • TensorFlow
  • PyTorch
  • TensorBoard
  • MLflow

֍ Model Optimisation:

  • Distillation
  • Quantization
  • Neural Architecture Search (NAS)

֍ Data Engineer:

  • Summary of Data Formats
  • Data Discovery
  • Data Source and Acquisition
  • Data Integration
  • Data Fusion
  • Transformation and Enrichment
  • Data Survey
  • OpenRefine
  • How much Data
  • Using ETL
  • Data Lake vs Data Warehouse
  • Dockerize your Python Application

֍ Big Data Architecture:

  • Architectural Patterns and Best Practices

֍ Principles:

  • Horizontal vs Vertical Scaling
  • MapReduce
  • Data replication
  • Name and Data Nodes
  • Job and Task Tracker

֍ Tools:

  • Check the awesome big data list
  • Hadoop (large data)
    • HDFS
    • Loading data with Sqoop and Pig
    • Storm: Hadoop Realtime
  • Spark (in memory)
  • RAPIDS (on GPU)
  • Flume, Scribe: for unstructured data
  • Data Warehouse with Hive
  • Elastic (ELK) Stack
    • Ingest data (e.g. logs), then search, analyse and visualise it in real time
  • Avro
  • Flink
  • Dask
  • Numba
  • ONNX
  • OpenVINO
  • MLflow
  • Kafka and KSQL
  • Databases
    • Cassandra
    • MongoDB, Neo4j
  • Scalability
    • ZooKeeper
    • Kubernetes
  • Cloud Services
    • AWS SageMaker
    • Google ML Engine
    • Microsoft Azure Machine Learning Studio
  • Awesome Production ML

About

An Introductory Roadmap to Data Science
