- Matrices and Linear Algebra Fundamentals
- Database Basics:-
- Relational vs Non Relational Databases
- SQL + Joins
- NoSQL
- Tabular Data
- DataFrames and Series
- Extract, Transform, Load (ETL)
- Reporting vs BI vs Analytics
- Data Formats:-
- JSON
- XML
- CSV
- Regular Expressions (RegEx)
- Python Basics:-
- Expressions
- Variables
- Data Structures
- Functions
- Install packages (via pip, conda, etc.)
- Important Libraries:-
- NumPy
- Pandas
- Virtual Environments
- Jupyter Notebooks
- Data Mining
- Web Scraping
- Public Datasets
- Kaggle
- Principal Component Analysis (PCA)
- Dimensionality and Numerosity Reduction
- Normalisation
- Data Scrubbing, Handling Missing Values
- Unbiased Estimators
- Binning Sparse Values
- Feature Extraction
- Denoising
- Sampling
- Probability Theory
- Randomness, random variables and random samples
- Probability distribution
- Conditional probability and Bayes' Theorem
- Statistical Independence
- Cumulative distribution function (cdf)
- Probability density function (pdf)
- Probability mass function (pmf)
- Continuous Distributions (pdf's)
- Normal/Gaussian
- Uniform (continuous)
- Beta
- Dirichlet
- Exponential
- Chi-squared
- Discrete Distributions (pmf's)
- Uniform (discrete)
- Binomial
- Multinomial
- Hypergeometric
- Poisson
- Geometric
- Summary statistics
- Expectation and mean
- Variance and standard deviation
- Covariance and Correlation
- Median, quartile
- Interquartile range
- Percentile/quantile
- Mode
- Important Laws
- Law of large numbers
- Central limit theorem
- Estimation
- Maximum Likelihood Estimation
- Kernel Density Estimation
- Hypothesis Testing
- p-Value
- chi-square test
- F test
- t test
- Confidence Interval
- Monte Carlo Method
- Chart suggestions thought starter
- Python
- Matplotlib
- plotnine
- Bokeh
- seaborn
- ipyvolume
- Web
- Vega-Lite
- D3.js
- Dashboards
- Dash
- BI
- Tableau
- Power BI
- Concepts, Inputs and Attributes
- Categorical Variables
- Ordinal Variables
- Numerical Variables
- Cost functions and gradient descent
- Overfitting/Underfitting
- Training, validation and test data
- Precision vs Recall
- Bias and Variance
- Lift
- Supervised Learning
- Regression
- Linear Regression
- Poisson Regression
- Classification
- Classification Rate
- Decision Trees
- Logistic Regression
- Naive Bayes Classifiers
- K Nearest Neighbour
- Support Vector Machines
- Gaussian Mixture Models
- Unsupervised Learning
- Clustering
- Hierarchical Clustering
- K Means Clustering
- DBSCAN
- HDBSCAN
- Fuzzy C Means
- Mean Shift
- Agglomerative
- OPTICS
- Association Rule Learning
- Apriori Algorithm
- ECLAT Algorithm
- FP Trees
- Dimensionality Reduction
- Principal Component Analysis
- Random Projection
- NMF
- t-SNE
- UMAP
- Ensemble Learning
- Boosting
- Bagging
- Stacking
- Reinforcement Learning
- Q Learning
- Sentiment Analysis
- Collaborative Filtering
- Tagging
- Prediction
- Important libraries
- scikit-learn
- spaCy (NLP)
- Read DL Papers with concepts
- Read DL Papers with code
- Understanding Neural Networks
- Loss functions
- Activation functions
- Weight initialisation
- Vanishing/Exploding gradient Problem
- Feedforward Neural Network
- Autoencoder
- Convolutional Neural Network
- Pooling
- Recurrent Neural Network
- LSTM
- GRU
- Transformer
- Encoder
- Decoder
- Attention
- Siamese Network
- Generative Adversarial Network (GAN)
- Evolving Architectures/NEAT
- Residual Connections
- Optimizers
- SGD
- Momentum
- AdaGrad
- AdaDelta
- Nadam
- RMSProp
- Learning Rate Schedule
- Batch Normalisation
- Batch Size Effects
- Regularisation
- Early Stopping
- Dropout
- Parameter Penalties
- Data Augmentation
- Adversarial Training
- Multitask Learning
- Transfer Learning
- Curriculum Learning
- Important Libraries
- Awesome Deep Learning
- Hugging Face Transformers
- TensorFlow
- PyTorch
- TensorBoard
- MLflow
- Distillation
- Quantization
- Neural Architecture Search (NAS)
- Summary of Data Formats
- Data Discovery
- Data Source and Acquisition
- Data Integration
- Data Fusion
- Transformation and Enrichment
- Data Survey
- OpenRefine
- How much Data
- Using ETL
- Data Lake vs Data Warehouse
- Dockerize your Python Application
- Architectural Patterns and Best Practices
- Horizontal vs Vertical Scaling
- MapReduce
- Data replication
- Name and Data Nodes
- Job and Task Tracker
- Check the awesome big data list
- Hadoop (large data)
- HDFS
- Loading data with Sqoop and Pig
- Storm: Hadoop Realtime
- Spark (in memory)
- RAPIDS (on GPU)
- Flume, Scribe: for unstructured data
- Data Warehouse with Hive
- Elastic (ELK) Stack
- Collect data (e.g. logs), then search, analyze and visualise it in real time
- Avro
- Flink
- Dask
- Numba
- ONNX
- OpenVINO
- MLflow
- Kafka and KSQL
- Databases
- Cassandra
- MongoDB, Neo4j
- Scalability
- ZooKeeper
- Kubernetes
- Cloud Services
- AWS SageMaker
- Google ML Engine
- Microsoft Azure Machine Learning Studio
- Awesome Production ML