Today I Learned (TIL)

Caution: This timeline is tailored for @EshbanTheLearner and might not be suitable for everyone.

Part - II (22 September 2020 - Continued)

Day 52 | November 12, 2020 | Thursday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 7 - Vanishing Gradient, Fancy RNNs
    • Vanishing Gradient
      • Intuition

Day 51 | November 11, 2020 | Wednesday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 6 - Language Models and RNNs
    • n-gram Language Model
      • Sparsity Problem
    • Neural Language Model
      • Window-based Neural Model
    • Recurrent Neural Networks
      • Advantages and Disadvantages
    • Training RNN Language Model
    • Backpropagation for RNNs
    • Evaluating Language Models
      • Perplexity
    • Applications
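
As an illustration of the perplexity metric listed above, here is a minimal sketch, assuming the language model has already assigned a probability to each actual next token. The function name `perplexity` and its input format are illustrative assumptions, not taken from the lecture.

```python
import math


def perplexity(token_probs):
    """Perplexity from the model's probability for each actual next token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood (cross-entropy in nats) over the tokens.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of the average cross-entropy.
    return math.exp(avg_nll)
```

A model that always spreads its probability uniformly over four candidates gives every token probability 0.25, so its perplexity is 4; lower is better.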

Day 50 | November 10, 2020 | Tuesday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 5 - Dependency Parsing
    • Syntactic Structure: Constituency and Dependency
      • Context-Free Grammars (CFGs)
      • Prepositional Phrase Attachment Ambiguity
      • Coordination Scope Ambiguity
      • Adjectival Modifier Ambiguity
      • Verb Phrase Attachment Ambiguity
    • Dependency Grammar
    • Universal Dependencies Treebanks
    • Dependency Conditioning Preferences
      • Bilexical Affinities
      • Dependency Distance
      • Intervening Material
      • Valency of Heads
    • Transition Based Dependency Parsing
      • MaltParser - Nivre and Hall 2005
    • Neural Dependency Parsing
      • Model Architecture

Day 49 | November 9, 2020 | Monday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 4 - Backpropagation
    • Matrix Gradients for Simple Neural Network
    • Computation Graphs
    • Backpropagation
      • Upstream Gradient
      • Local Gradient
      • Downstream Gradient
    • Automatic Differentiation
    • Regularization
    • Overfitting
    • Vectorization
    • Nonlinearities
    • Initialization
    • Optimizers
    • Learning Rates

Day 48 | November 8, 2020 | Sunday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 3 - Neural Networks
    • Classification Review/Introduction
      • Softmax Classifier
      • Cross-Entropy Loss
    • Neural Network Introduction
      • Intuition
      • Classification Difference with Word Vectors
      • Matrix Notations for a Layer
      • Non-Linearities
    • Named Entity Recognition
      • NER on Word Sequences
    • Binary Word Window Classification
    • Computing Gradients
      • Jacobian Matrix
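
The softmax classifier and cross-entropy loss listed above can be sketched in a few lines. This is a plain-Python illustration under my own naming (`softmax`, `cross_entropy`), not code from the course.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])
```

With two equal scores, each class gets probability 0.5 and the loss is ln 2.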

Day 47 | November 7, 2020 | Saturday

Today's Progress: Today I continued the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 2 - Word Vectors and Word Senses
    • Optimization Basics
      • Gradient Descent
      • Stochastic Gradient Descent
    • Word2Vec Review
      • Skip-Grams
      • Continuous Bag of Words
      • Negative Sampling
      • Unigram Distribution
    • Can we capture this essence more effectively by counting?
      • Co-Occurrence Matrix
        • Window
        • Full Document
      • Problems and Solutions
      • Singular Value Decomposition (SVD)
    • The GloVe Model of Word Vectors
      • Count Based vs Direct Prediction
      • Encoding Meaning in Vector Differences
      • Log-Bilinear Model with Vector Differences
      • GloVe
    • Evaluating Word Vectors
      • Intrinsic
        • Word Vector Analogies
        • Human Judgement
        • Correlation Judgement
      • Extrinsic

Day 46 | November 6, 2020 | Friday

Today's Progress: Today I started the Stanford CS224N: NLP with Deep Learning course by Christopher Manning on YouTube.

Description: This includes the following:

  • Lecture 1 - Introduction and Word Vectors
    • Human Language
    • Word Meaning
      • Denotational Semantics
        • WordNet - Advantages and Disadvantages
      • Distributional Semantics
        • Word2Vec
    • Word2Vec Introduction
      • Skip-Grams
      • Continuous Bag of Words
    • Word2Vec Objective Function Gradients
    • Optimization Basics
      • Gradient Descent
      • Stochastic Gradient Descent
    • Looking at Word Vectors

YouTube | Stanford CS224N: NLP with Deep Learning

Day 45 | November 5, 2020 | Thursday

Today's Progress: Today I concluded the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Calculating Expected Loss
    • Data Prep
      • Train-Test Split
      • Dummy Variables
    • Estimate Recovery Rate
    • Total Expected Loss on Portfolio Level
  • Summary

Day 44 | November 4, 2020 | Wednesday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Exposure at Default (EAD) Model
    • Data Prep
      • Train-Test Split
      • Selecting Reference Categories
    • Model Training
      • Linear Regression
    • EAD Model Validation

Day 43 | November 3, 2020 | Tuesday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Loss Given Default (LGD) Model
    • Data Prep
      • Train-Test Split
      • Preparing Inputs
    • Training the LGD Model
      • Logistic Regression
    • Testing the LGD Model
      • Accuracy of LGD Model
      • ROC-AUC of LGD Model
    • Saving the LGD Model
    • Stage 2 - Multiple Linear Regression
      • Data Prep
      • Training
      • Testing
        • Correlation
        • Mean Squared Error
    • Combining Stage 1 and Stage 2 Model

Day 42 | November 2, 2020 | Monday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Applying PD Model for Decision Making
    • Calculating Probability of Default for a single Customer
    • Creating a Scorecard
    • Calculating Credit Score
    • From Credit Score to PD
    • Setting Cut-Offs
  • Loss Given Default (LGD) and Exposure at Default (EAD) Models
    • Independent Variables
    • Dependent Variables
    • Distribution of Recovery Rates and Credit Conversion Factors
      • Beta Distribution
      • Beta Regression

Day 41 | November 1, 2020 | Sunday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • PD Model Validation
    • Out of Sample Validation
    • Model Evaluation
      • Accuracy
      • Recall and Precision
      • ROC-AUC
    • Gini Coefficient and Kolmogorov-Smirnov Statistic

Day 40 | October 31, 2020 | Saturday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Probability of Default (PD) Model Estimation
    • Logistic Regression with Dummy Variables
      • Logistic vs Linear
      • Odds
      • Interpreting Coefficients of Logistic Regression
    • Logistic Regression with P-Values
    • Interpreting PD Model Coefficients

Day 39 | October 30, 2020 | Friday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Probability of Default (PD) Model Data Prep
    • Data Prep
    • Preprocessing Continuous Variables - II
      • Creating Dummy Variables

Relevant Codebase

Day 38 | October 29, 2020 | Thursday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Probability of Default (PD) Model Data Prep
    • Data Prep
    • Preprocessing Continuous Variables - I
      • Creating Dummy Variables

Relevant Codebase

Day 37 | October 28, 2020 | Wednesday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Probability of Default (PD) Model Data Prep
    • Data Prep
      • Weights of Evidence
      • Information Value
    • Automating Calculations
    • Visualizing Results
    • Visualizing and Interpreting Weight of Evidence
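
A rough sketch of the Weight of Evidence and Information Value calculations, assuming the common convention WoE = ln(share of goods / share of bads) per category and that every category has at least one good and one bad borrower. The function name and the input layout are hypothetical, not the course's code.

```python
import math


def woe_and_iv(bins):
    """bins maps category -> (n_good, n_bad); returns (WoE per category, IV)."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for name, (good, bad) in bins.items():
        good_share = good / total_good  # share of all good borrowers in this bin
        bad_share = bad / total_bad     # share of all bad borrowers in this bin
        woe[name] = math.log(good_share / bad_share)
        # Each bin contributes (good share - bad share) * WoE to the IV.
        iv += (good_share - bad_share) * woe[name]
    return woe, iv
```

For example, `woe_and_iv({"A": (80, 20), "B": (20, 80)})` gives category A a positive WoE (more goods than bads) and category B the mirror-image negative value.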

Relevant Codebase

Day 36 | October 27, 2020 | Tuesday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Probability of Default (PD) Model Data Prep
    • Intro to PD Model
      • Dependent Variables
      • Default Definition
      • Logistic Regression
      • Interpretability
    • Dependent Variable
    • Information Value
      • Weight of Evidence
    • Data Prep
      • Train-Test Split

Relevant Codebase

Day 35 | October 26, 2020 | Monday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • General Preprocessing
    • Basic EDA
    • Dealing with Continuous Variables
    • Dealing with Discrete Variables
    • Dealing with Missing Values

Relevant Codebase

Day 34 | October 25, 2020 | Sunday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Setting up the Working Environment
  • Data Description
    • Lending Club Loan Data
    • Dependent and Independent Variables
      • Discrete and Continuous Variables
      • Fine and Coarse Classing

Day 33 | October 24, 2020 | Saturday

Today's Progress: Today I continued with the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Introduction
    • Capital Adequacy / Regulatory Capital
    • Capital Adequacy Ratio
    • Basel II Accord
      • Minimum Capital Requirements
        • Credit Risk
          • Standardized Approach
          • Internal Ratings Based Approaches
            • Foundation Internal Ratings-Based Approach (F-IRB)
            • Advanced Internal Ratings-Based Approach (A-IRB)
        • Operational Risk
        • Market Risk
      • Supervisory Review
      • Market Discipline
    • Different Facility Types
    • Credit Risk Modeling Approaches

Day 32 | October 23, 2020 | Friday

Today's Progress: Today I enrolled in the Credit Risk Modeling in Python 2020 course on Udemy.

Description: This includes the following:

  • Introduction
    • What is Credit Risk?
      • Creditor
      • Debtor
      • Credit Limit
      • Interest
      • Home Ownership and Asset Financing
      • Credit Risk
      • Default Event
      • The Global Financial Crisis 2008
    • Expected Loss and its Components
      • Types of Factors for Expected Loss
        • Borrower-Specific Factors
        • The Economic Environment
      • Expected Credit Loss
      • Probability of Default
      • Loss Given Default
      • Exposure at Default
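
The three components above combine into expected loss as EL = PD × LGD × EAD. A minimal sketch, with hypothetical function names and made-up loan figures in the usage note:

```python
def expected_loss(pd, lgd, ead):
    """Expected loss for one exposure: PD * LGD * EAD."""
    for name, value in (("pd", pd), ("lgd", lgd)):
        # PD is a probability and LGD a fraction of exposure, so both lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return pd * lgd * ead


def portfolio_expected_loss(loans):
    """Sum of expected losses over an iterable of (pd, lgd, ead) tuples."""
    return sum(expected_loss(pd, lgd, ead) for pd, lgd, ead in loans)
```

For instance, a loan with a 5% probability of default, a 40% loss given default, and 10,000 of exposure has an expected loss of about 200.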

Udemy | Credit Risk Modeling in Python 2020

Day 31 | October 22, 2020 | Thursday

Today's Progress: Today I concluded the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • Siamese Networks
    • Computing the Cost - I
    • Computing the Cost - II
      • Mean Negative
      • Closest Negative
      • Hard Negative Mining
    • One Shot Learning
    • Training/Testing
    • Programming Assignment
      • Question Duplicates

Certificate | Natural Language Processing with Sequence Models

Day 30 | October 21, 2020 | Wednesday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • Siamese Networks
    • Introduction
    • Architecture
      • Identical Subnetworks
      • Cosine Similarity
    • Cost Function
      • Anchor
      • Positive
      • Negative
    • Triplets
      • Simple Loss
      • Non-Linearity
      • Alpha Margin
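
The triplet pieces listed above (anchor, positive, negative, cosine similarity, alpha margin) fit together roughly as follows. This is a plain-Python sketch of the simple triplet loss with a margin; the names and the default `alpha` are assumptions, and the course's own implementation may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, alpha=0.25):
    """max(0, sim(A, N) - sim(A, P) + alpha)."""
    diff = cosine_similarity(anchor, negative) - cosine_similarity(anchor, positive)
    # The max with 0 is the non-linearity: well-separated triplets cost nothing.
    return max(0.0, diff + alpha)
```

The loss is zero once the anchor is more similar to the positive than to the negative by at least the margin `alpha`.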

Day 29 | October 20, 2020 | Tuesday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • LSTMs and Named Entity Recognition
    • Introduction to Named Entity Recognition
      • Applications of NER
      • Search Engine Efficiency
      • Recommendation Engines
      • Customer Service
      • Automatic Trading
    • Training NERs: Data Processing
    • Computing Accuracy
    • Programming Assignment
      • Named Entity Recognition (NER)

Day 28 | October 19, 2020 | Monday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • LSTMs and Named Entity Recognition
    • RNNs and Vanishing Gradients
      • Advantages vs Disadvantages
      • Exploding Gradients
      • Identity RNN with ReLU Activation
      • Gradient Clipping
      • Skip Connections
    • Introduction to LSTMs
      • Basic LSTM Structure
      • Applications of LSTMs
    • Understanding LSTMs
    • LSTM Architecture
      • The Forget Gate
      • The Input Gate
      • The Output Gate

Day 27 | October 18, 2020 | Sunday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • N-Grams vs Sequence Models
    • Cost Function for RNNs
      • Cross Entropy Loss
    • Implementation Notes
      • Abstraction in Frameworks
      • tf.scan() function
    • Gated Recurrent Units
      • Reset Hidden Gates
      • Update Hidden Gates
      • Vanilla RNNs vs GRUs
    • Deep and Bi-Directional RNNs
    • Programming Assignments
      • Deep N-grams

Day 26 | October 17, 2020 | Saturday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • N-Grams vs Sequence Models
    • Traditional Language Models
      • Large Corpus Requirements
      • Large Space and RAM Requirements
    • Recurrent Neural Networks
      • Basic Structure
      • Advantages
    • Applications of RNNs
      • One to One
      • One to Many
      • Many to One
      • Many to Many
    • Math in Simple RNNs

Day 25 | October 16, 2020 | Friday

Today's Progress: Today I continued with the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • Neural Networks for Sentiment Analysis
    • Trax: Layers
      • Classes
      • Subclasses
      • Instances
    • Dense and ReLU Layers
    • Serial Layer
    • Other Layers
      • Embedding Layer
      • Mean Layer
    • Training
      • Gradients by grad()
    • Programming Assignment
      • Sentiment Analysis with Deep Neural Networks

Day 24 | October 15, 2020 | Thursday

Today's Progress: Today I enrolled in the Natural Language Processing with Sequence Models course from the Natural Language Processing Specialization.

Description: This includes the following:

  • Neural Networks for Sentiment Analysis
    • Introduction
    • NN for Sentiment Analysis
      • Neural Network Structure
      • Forward Propagation
      • Initial Representation
    • Trax: Neural Networks
      • Trax Highlights
      • Advantages of Trax
    • Why Trax?
      • Makes programmers efficient
      • Runs code fast

Coursera | Natural Language Processing with Sequence Models

Day 23 | October 14, 2020 | Wednesday

Today's Progress: Today I completed the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conversion of Different Problems to SDP - II
    • Conic Quadratic Programming via SDP
      • Schur Complement
    • Barrier Method for Conic Programming
      • Barrier Aggregate
      • Examples of Barriers
    • Matrix Functions
      • Eigenvalue Decomposition
    • Gradient of Trace of Matrix Function
    • Gradient of log det A
    • Gradient of log det Barrier

Day 22 | October 13, 2020 | Tuesday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conversion of Different Problems to SDP - I
    • Lemma of Schur Complement
    • Minimize Maximal Eigenvalue of Symmetric Matrix
    • Linear Matrix Approximation
    • Expression of Linear Programming via SDP

Day 21 | October 12, 2020 | Monday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conic Programming I
    • Duality in Conic Programming
    • Dual Conic Problem
      • Weak Duality Theorem
    • Strong Conic Duality and Complementary Slackness
    • Example
      • Dual SDP Problem
    • Minimax Problem
    • Chebyshev Approximation
    • Complex Valued Chebyshev Approximation

Day 20 | October 11, 2020 | Sunday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conic Programming I
    • Examples of Cones
      • R^n+
      • Lorentz Cone
      • Cone of Positive Semidefinite Matrices
    • Conic Programming Problems
    • Semidefinite Programming
    • Dual Cone
      • Self-Dual

Day 19 | October 10, 2020 | Saturday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Minimax Theorem, Game Theory and Lagrange Duality
    • Game Interpretation of Minimax
    • Saddle Point Theorem
    • Minimax of Lagrangian
      • Weak Duality
    • Dual Problem and Weak Duality
    • Strong Duality
    • Slater Condition for Strong Duality
    • Examples of Dual Problems
      • Quadratic Program
      • Linear Program

Day 18 | October 09, 2020 | Friday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Lagrange Multipliers via Penalty Method
    • Active vs Non-Active Constraints
    • Penalty Function for Equality Constraints
      • First Order Necessary Optimal Conditions
    • Barrier Method
    • Augmented Lagrangian Method
      • Penalty-Multiplier Function
      • Algorithm
    • Augmented Lagrangian for Equality Constraints

Day 17 | October 08, 2020 | Thursday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Summary of Unconstrained Optimization
    • 1-D Methods
      • Golden Section - No Derivatives
      • Bisection
      • Quadratic/Cubic Interpolation Methods
      • Inexact Line Search
        • Backtracking Method
        • Armijo Rule
    • Multidimensional Optimizations
      • Steepest Descent
      • Newton
        • Gauss-Newton
      • Conjugate Gradient
      • Truncated Newton
      • Quasi-Newton
        • BFGS
      • Sequential Subspace Optimization
      • Nelder-Mead Simplex Method
  • Constrained Optimization
    • Lagrangian
  • Karush-Kuhn-Tucker First Order
    • Necessary Optimality Conditions
  • Penalty Function Method
    • Penalty Aggregate
    • Ideal Penalty Aggregate
    • Algorithm

Day 16 | October 07, 2020 | Wednesday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Sequential Subspace Optimization
    • Fast Subspace Optimization
  • Quasi-Newton Method
    • Approximate Newton Direction
  • Approximating Hessian
    • Secant Equation
  • Sherman-Morrison Formula
  • Broyden Family of Quasi-Newton
    • BFGS - Broyden, Fletcher, Goldfarb, Shanno
    • DFP - Davidon, Fletcher, Powell
  • Initialization and Convergence Properties

Day 15 | October 06, 2020 | Tuesday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conjugate Gradient Method - Part 2
    • Derivation of Conjugate Gradient Method
      • Exact Line Search
      • Gram-Schmidt for Q-Orthogonality
    • Properties
      • Simplification
      • Polak-Ribiere Method
      • Fletcher-Reeves Method
    • Conjugate Gradient Method Summary
    • Convergence Rate of Conjugate Gradient Method
    • Preconditioning
    • Truncated Newton's Method
      • Newton System
    • Compute Resource Analysis for product of
      • Function
      • Gradient
      • Hessian Vector

Day 14 | October 05, 2020 | Monday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Conjugate Gradient Method - Part 1
    • Scalar Product
    • Gram-Schmidt Orthogonalization
    • Q-Conjugate/Q-Orthogonal Directions
    • Minimization of a Quadratic Function
      • Conjugate Direction Method
    • Expanding Manifold Property
      • Affine Subspace

Day 13 | October 04, 2020 | Sunday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Newton's Method
    • For Non-linear Equations
    • Modified Newton's Method
      • Enforcing Descent Direction
    • Solving Symmetric System of Equations
      • Cholesky Factorization
      • Modified Cholesky Decomposition
    • Least Squares Problem
      • Gauss-Newton Method
      • Levenberg-Marquardt Method

Day 12 | October 03, 2020 | Saturday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Multidimensional Unconstrained Optimization Methods
    • Line Search Methods
      • Directional Derivative
    • Choice of Step Size
      • Exact Line Search
      • Inexact Line Search - Armijo Rule
      • Constant Step Size
      • Diminishing Step Size
    • Steepest/Gradient Descent
      • Linear Convergence Rate
    • Newton's Method
      • Derivation
      • Trust Region
      • Asymptotic Quadratic Convergence
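
To make the inexact line search concrete, here is a small sketch of steepest descent with Armijo backtracking. The parameter names and defaults (`alpha0`, `beta`, `sigma`, `max_halvings`) are illustrative choices, not values from the lectures.

```python
def armijo_step(f, x, g, alpha0=1.0, beta=0.5, sigma=1e-4, max_halvings=60):
    """Backtrack along -g until the Armijo sufficient-decrease test holds."""
    fx = f(x)
    # Directional derivative of f along the steepest descent direction -g.
    slope = -sum(gi * gi for gi in g)
    alpha = alpha0
    for _ in range(max_halvings):
        trial = [xi - alpha * gi for xi, gi in zip(x, g)]
        if f(trial) <= fx + sigma * alpha * slope:
            break
        alpha *= beta  # shrink the step and try again
    return alpha


def gradient_descent(f, grad, x0, iters=200, tol=1e-12):
    """Steepest descent with a backtracking step size at every iteration."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        if all(abs(gi) < tol for gi in g):
            break  # gradient is numerically zero
        alpha = armijo_step(f, x, g)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x
```

On a poorly scaled quadratic such as f(x) = x₀² + 10·x₁², a constant step of 1 would diverge, while the backtracking rule shrinks the step until the objective decreases sufficiently.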

Day 11 | October 02, 2020 | Friday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Optimality Conditions
    • Convex vs Non-Convex Functions
    • Sufficient Optimality Conditions
    • One-Dimensional Optimization
      • Bisection Method
      • Golden Section Method
    • Quadratic Interpolation
      • Superlinear Convergence
    • Cubic Interpolation

Day 10 | October 01, 2020 | Thursday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Local and Global Minima
    • Definition
    • Norms
    • Convex Functions and Minima
      • Proof by Contradiction
    • Optimality Conditions
      • Proof by Gradient Inequality
    • Non-Convex Functions and Minima
    • Sufficient Optimality Condition

Day 09 | September 30, 2020 | Wednesday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Convex Sets & Functions
    • Definition of Set and Function
    • Properties of Convex Sets
    • Properties of Convex Functions
    • Extended Value Convex Functions
    • Epigraph
    • Properties of Epigraph
    • Convex Combination
    • Convex Hull
    • Jensen Inequality
    • Gradient Inequality
    • Second Derivative

Day 08 | September 29, 2020 | Tuesday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Derivatives of Multivariate Functions
    • Total Differential
      • Gradient
      • External Definition of Gradient
    • Directional Derivative
    • Hessian
    • Second Directional Derivative
    • Example - Gradient and Hessian of:
      • Linear Operator
      • Quadratic Functions
    • Taylor Expansion
    • Function of Matrices
    • Gradient of Function of a Matrix
    • Example
      • Gradient of a Neural Network

Day 07 | September 28, 2020 | Monday

Today's Progress: Today I continued with the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Linear Algebra Refresh
    • N-Dimensional Euclidean Space
    • Linear Subspace
    • Affine Subspace
    • Vector Norm
    • Euclidean Norm
    • Matrix Norm
      • Frobenius Norm
      • Induced Matrix Norm
    • Inner Product
    • Eigenvalue Decomposition
    • Matrix Polynomials and Functions
    • Positive (Semi) Definite Symmetric Matrices

Day 06 | September 27, 2020 | Sunday

Today's Progress: Today I started the Algorithms for Non-Linear Optimization course by Michael Zibulevsky.

Description: This includes the following:

  • Unconstrained Optimization
    • Continuous
    • Smooth
  • Constrained Optimization
    • Inequality and Equality Constraints
  • Optimality Conditions
  • Numerical Iterative Methods
  • Parametric Regression
    • Linear Regression
    • Nonlinear Regression
      • Neural Networks

YouTube | Introduction to Optimization

Day 05 | September 26, 2020 | Saturday

Today's Progress: Today I concluded the Bayesian Machine Learning in Python: A/B Testing course on Udemy.

Description: This includes the following:

  • Bayesian A/B Testing
    • Thompson Sampling
    • Online Nature of Bayesian A/B Testing
    • Finding Threshold without p-Value
  • Summary
    • Classical A/B Testing and Drawbacks
    • Bernoulli Distributed Data
    • Methods that adapt to Data Collected so far
    • How to Solve Explore-Exploit
  • Exercises

Day 04 | September 25, 2020 | Friday

Today's Progress: Today I continued with the Bayesian Machine Learning in Python: A/B Testing course on Udemy.

Description: This includes the following:

  • Bayesian A/B Testing
    • Exploit vs Explore Dilemma
      • Multi-Armed Bandit
    • Reinforcement Learning
    • Epsilon-Greedy
    • UCB1
      • Chernoff-Hoeffding Bound
      • Upper Confidence Bound
    • Conjugate Priors
    • Beta Mean, Variance
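
The conjugate prior and Beta mean and variance items above can be illustrated with a Beta-Bernoulli update. This is a sketch assuming a Beta(prior_a, prior_b) prior on a click or conversion rate; the function name and the uniform default prior are my own choices.

```python
def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) for a Bernoulli rate, plus its mean and variance."""
    # Conjugacy: the posterior is again a Beta, with the counts added to the prior.
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance
```

After 3 successes and 1 failure under a uniform prior, the posterior is Beta(4, 2) with mean 2/3; the variance shrinks as more data arrives.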

Day 03 | September 24, 2020 | Thursday

Today's Progress: Today I continued with the Bayesian Machine Learning in Python: A/B Testing course on Udemy.

Description: This includes the following:

  • Traditional A/B Testing
    • Problem Setup
      • Hypotheses
        • Null Hypothesis
        • Alternative Hypothesis (One-Sided)
        • Alternative Hypothesis (Two-Sided)
    • Test Statistic
    • p-Value
    • Testing Characteristics
    • Pooled Standard Deviation
      • Welch's t-Statistic
    • Non-Parametric Tests
      • Kolmogorov-Smirnov Test
      • Kruskal-Wallis Test
      • Mann-Whitney U test
    • Chi-Square Test Statistic
      • Yates Correction
      • Fisher's Exact Test
    • Bonferroni Correction
    • Pairwise Testing
    • One-vs-Rest Test
    • Post Hoc Testing
    • Statistical Power
    • Pitfalls of Traditional A/B Testing

Day 02 | September 23, 2020 | Wednesday

Today's Progress: Today I continued with the Bayesian Machine Learning in Python: A/B Testing course on Udemy.

Description: This includes the following:

  • Bayes Rule and Probability Review
    • Marginal Distributions
    • Joint Distribution
    • Conditional Distribution
    • Discrete vs Continuous Random Variables
    • Bayes' Rule
    • Independence
    • The Gambler's Fallacy
    • The Monty Hall Problem
    • Maximum Likelihood Estimation
      • The Bernoulli Distribution
      • The Gaussian Distribution
      • Unbiased Estimate of the Covariance Matrix
    • Confidence Intervals
      • Cumulative Distribution Function
      • Confidence Interval Approximation
      • Bernoulli Confidence Approximation
    • The Bayesian Paradigm

Day 01 | September 22, 2020 | Tuesday

Today's Progress: Today I enrolled in the Bayesian Machine Learning in Python: A/B Testing course on Udemy.

Description: This includes the following:

  • Introduction and Course Outline
  • Real World Examples of A/B Testing
    • Medicine
    • Website
    • Local Business Flyers
  • What is Bayesian Machine Learning
    • Bayesian vs Frequentist Approach
    • Sampling
    • Bayesian Networks
    • Latent Dirichlet Allocation (LDA) Algorithm

Important Links

Udemy | Bayesian Machine Learning in Python: A/B Testing

Part - I (13 January 2020 to 22 April 2020)

Day 100 | April 22, 2020 | Wednesday

Today's Progress: Today I completed the Unsupervised Machine Learning Hidden Markov Models in Python course on Udemy.

Description: This includes the following:

  • Discrete HMMs using Deep Learning Libraries
    • Gradient Descent
    • Discrete HMM in Tensorflow
  • HMMs for Continuous Observations
    • Gaussian Mixture Models with Hidden Markov Models
    • Generating Data from a Real-Valued HMM
    • Continuous HMM in Tensorflow
  • HMMs for Classification
    • Generative vs Discriminative Classifiers
    • HMM Classification on Poetry Data
  • Parts-of-Speech Tagging
    • PoS Tagging Concepts
    • PoS Tagging with an HMM

Relevant Codebase

Day 99 | April 21, 2020 | Tuesday

Today's Progress: Today I continued the Unsupervised Machine Learning Hidden Markov Models in Python course on Udemy.

Description: This includes the following:

  • Hidden Markov Models for Discrete Observations
    • From Markov Models to Hidden Markov Models
      • Latent Variables
    • HMMs are Doubly Embedded
    • How to choose the number of Hidden States
      • K-Fold Cross-Validation
    • The Forward-Backward Algorithm
    • The Viterbi Algorithm
    • The Baum-Welch Algorithm
      • The Expectation-Maximization Algorithm
      • Lagrange Multipliers
    • Baum-Welch Updates for Multiple Observations
    • The Underflow Problem and its Solution
      • Viterbi (Applying Log)
      • Scaling Forward
      • Scaling Backward
    • Implementation of Discrete HMM in Python

Relevant Codebase

Day 98 | April 20, 2020 | Monday

Today's Progress: Today I enrolled in the Unsupervised Machine Learning Hidden Markov Models in Python course on Udemy.

Description: This includes the following:

  • Introduction
    • Intro to Hidden Markov Models
    • Common Use Cases for HMMs
    • Unsupervised vs Supervised
  • Markov Models
    • The Markov Property
    • Markov Models
      • Transition Probabilities
      • Initial State Distribution
      • Maximum Likelihood
      • Smooth Estimates
    • The Math of Markov Chains
      • Stationary Distributions
      • Limiting Distribution
  • Markov Models: Examples, Problems and Applications
    • Sick vs Healthy
    • SEO and Bounce Rate Optimization
    • 2nd-Order Language Model
      • Python Implementation
      • Eminem Style Rap Generation
    • Google's Page Rank Algorithm
      • Perron-Frobenius Theorem
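
For the stationary and limiting distribution topics above, here is a small power-iteration sketch. The two-state "healthy vs sick" transition probabilities in the usage note are made up for illustration, and the iteration is only expected to converge when a limiting distribution exists (for example, an irreducible, aperiodic chain).

```python
def stationary_distribution(P, iters=1000):
    """Approximate pi with pi = pi P by repeatedly applying the transition matrix.

    P[i][j] is the probability of moving from state i to state j,
    so every row of P must sum to 1.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

With the hypothetical matrix `[[0.8, 0.2], [0.5, 0.5]]` (state 0 healthy, state 1 sick), the chain spends about 5/7 of its time healthy in the long run.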

Relevant Codebase

Day 97 | April 19, 2020 | Sunday

Today's Progress: Today I completed the Smart Analytics, Machine Learning, and AI on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 1
    • Productionizing Custom ML Models
      • Phases of ML Projects
      • Ways to do Custom ML on GCP
      • Kubeflow
      • AI Hub
    • Lab 3: Running AI Models on Kubeflow
      • Setting up Kubeflow on a Kubernetes Engine Cluster
      • Packaging a Tensorflow Program in a Container and Uploading it to Google Container Registry
      • Submitting a TF-train Job and Saving the Resulting Model to Google Cloud Storage
      • Serving and Interacting with a Trained Model
  • Week 2: Module 2
    • BigQuery ML
      • BigQuery ML for Quick Model Building
      • Classification, Regression, and Recommender Models
      • Unsupervised ML with Clustering Models
    • Lab 4: Predict Bike Trip Duration with a Regression Model in BQML
      • Querying and Exploring the London Bicycles Dataset for Feature Engineering
      • Creating a Linear Regression Model in BQML
      • Evaluate the Performance of your ML Model
      • Extracting your Model Weights
    • Lab 5: Movie Recommendations in BigQuery ML
      • Training a Recommendation Model in BigQuery
      • Making Product Predictions for Both Single Users and Batch Users
  • Week 2: Module 3
    • Cloud AutoML
      • Why AutoML?
      • AutoML Vision
      • AutoML NLP
      • AutoML Tables

Certificate | Smart Analytics, Machine Learning, and AI on GCP

Day 96 | April 18, 2020 | Saturday

Today's Progress: Today I enrolled in the Smart Analytics, Machine Learning, and AI on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 1
    • Introduction
    • Analytics and AI
      • What is ML?
      • Machine Learning and AI
      • ML Options on GCP
      • Reviewing Key ML Concepts
    • Prebuilt ML Model APIs
      • Unstructured Data
    • Lab 1: Using NL API to Classify Unstructured Text
      • Creating a Natural Language API Request
      • Calling the API with curl
      • Using the NL API's Text Classification Feature
      • Using Text Classification to Understand a Dataset of News Articles
  • Week 1: Module 2
    • Cloud AI Platform Notebooks
      • What's a Notebook
      • BigQuery Magic and Ties to Pandas
    • Lab 2: BigQuery in Jupyter Labs on AI Platform
      • Instantiating a Jupyter Notebook on AI Platform
      • Executing a BigQuery Query from within a Jupyter Notebook and Processing the Output using Pandas

Coursera | Data Engineering with Google Cloud Professional Certificate

Coursera | Smart Analytics, Machine Learning, and AI on GCP

Day 95 | April 17, 2020 | Friday

Today's Progress: Today I completed the Building Resilient Streaming Analytics Systems on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 3
    • BigQuery: Advanced Functionality
      • GIS Functions
      • WITH Clauses vs Permanent Tables
      • Analytical Window Functions
      • Ranking Functions + ARRAYs
    • Lab 5: Optimizing BigQuery Queries for Performance
      • Using BigQuery to:
        • Minimize I/O
        • Cache Results of Previous Queries
        • Avoid Overwhelming Single Workers
        • Use Approximate Aggregation Functions
  • Week 2: Module 4
    • Performance Considerations
      • I/O
      • Shuffle
      • Grouping
      • Materialization
      • Functions and UDFs
    • Lab 6: Creating Date-Partitioned Tables in BigQuery
      • Querying a Partitioned Dataset
      • Creating Dataset Partitions to Improve Query Performance and Reduce Cost

Certificate | Building Resilient Streaming Analytics Systems on GCP

Day 94 | April 16, 2020 | Thursday

Today's Progress: Today I continued with the Building Resilient Streaming Analytics Systems on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 1
    • Streaming into BigQuery
      • Streaming
      • Visualizing Results
    • Lab 3: Streaming Analytics and Dashboards
      • Connecting to a BigQuery data source
      • Creating reports and charts to visualize BigQuery data
  • Week 2: Module 2
    • Streaming into Cloud Bigtable
      • High-Throughput Streaming with Cloud Bigtable
      • Optimizing Cloud Bigtable Performance
    • Lab 4: Streaming Data Pipelines into Bigtable
      • Launching Dataflow pipeline to read from Pub/Sub and writing into Bigtable
      • Opening an HBase shell to query the Bigtable database

Day 93 | April 15, 2020 | Wednesday

Today's Progress: Today I continued with the Building Resilient Streaming Analytics Systems on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 2
    • Cloud Dataflow Capabilities for Streaming Data
      • Streaming Data Challenges
      • Cloud Dataflow Windowing
    • Lab 2: Streaming Data Pipelines
      • Launching Dataflow and Running a Dataflow Job
      • Understanding how Data Elements Flow through the Transformations of a Dataflow Pipeline
      • Connecting Dataflow to Pub/Sub and BigQuery
      • Observing and Understanding how Dataflow Autoscaling adjusts Compute Resources to Process Input Data Optimally
      • Learning Where to find Logging Information Created by Dataflow
      • Explore Metrics and Create Alerts and Dashboards with Stackdriver Monitoring
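
To make the windowing idea above concrete, here is a plain-Python sketch of fixed (tumbling) windows: each event is assigned to the window that contains its event timestamp. This is only an analogy for the concept, not Dataflow or Apache Beam code, and the function name and event layout are assumptions.

```python
from collections import defaultdict

def fixed_windows(events, size_s):
    """Group (event_time_seconds, value) pairs into fixed windows.

    Returns a dict mapping each window's start time to the values that
    fall in [start, start + size_s).
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // size_s) * size_s
        windows[window_start].append(value)
    return dict(windows)

# Hypothetical sensor readings with timestamps in seconds.
events = [(1, "a"), (59, "b"), (60, "c"), (125, "d")]
per_minute = fixed_windows(events, size_s=60)
```

Grouping by event time rather than arrival time is what lets late or out-of-order data still land in the right window.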

Day 92 | April 14, 2020 | Tuesday

Today's Progress: Today I enrolled in the Building Resilient Streaming Analytics Systems on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 1
    • Introduction
    • Processing Streaming Data
    • Cloud Pub/Sub
      • Introduction to Pub/Sub
      • Cloud Pub/Sub Push vs Pull
      • Publishing with Pub/Sub Code
    • Lab 1: Publish Streaming Data into Pub/Sub
      • Creating a Pub/Sub Topic and Subscription
      • Simulating your Traffic Sensor Data into Pub/Sub

Coursera | Data Engineering with Google Cloud Professional Certificate

Coursera | Building Resilient Streaming Analytics Systems on GCP

Day 91 | April 13, 2020 | Monday

Today's Progress: Today I completed the Building Batch Data Pipelines on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 4
    • Aggregating with GroupByKey and Combine
    • Lab 4: MapReduce in Cloud Dataflow
      • Identifying Map and Reduce Operations
      • Executing the Pipeline
      • Using Command Line Parameters
  • Week 2: Module 5
    • Side Inputs and Windows of Data
    • Lab 5: Practicing Pipeline Side Inputs
      • Trying out a BigQuery query
      • Exploring the pipeline code
      • Executing the pipeline
  • Week 2: Module 6
    • Cloud Dataflow Templates and SQL
      • Creating and Reusing Pipeline Templates
      • Cloud Dataflow SQL Pipelines
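
A plain-Python analogy for the map, GroupByKey, and Combine steps listed above, using word count as the example. It does not use the Apache Beam API; the function names are hypothetical and only mirror the shape of the operations.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def group_by_key(pairs):
    """GroupByKey: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def combine_per_key(pairs, combine_fn=sum):
    """Combine: reduce each key's values to a single result."""
    return {key: combine_fn(values)
            for key, values in group_by_key(pairs).items()}

word_counts = combine_per_key(map_phase(["a b a", "b c"]))
```

In a distributed runner, a combine step with an associative function such as `sum` can be partially applied on each worker before the shuffle, which is one reason it tends to scale better than grouping every value first.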

Certificate | Building Batch Data Pipelines on GCP

Day 90 | April 12, 2020 | Sunday

Today's Progress: Today I continued with the Building Batch Data Pipelines on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 3
    • Running Batch Processing Pipelines on Cloud Dataflow
      • Cloud Dataflow
      • Why Customers value Dataflow
      • Building Cloud Dataflow Pipelines in Code
      • Key Considerations with Designing Pipelines
      • Transforming Data with PTransforms
    • Lab 3: Dataflow Pipeline
      • Setting up a Python Dataflow project using Apache Beam
      • Writing a simple pipeline in Python
      • Executing the query on the local machine
      • Executing the query on the cloud

Day 89 | April 11, 2020 | Saturday

Today's Progress: Today I continued with the Building Batch Data Pipelines on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 2
    • Cloud Composer
      • Orchestrating work b/w GCP Services with Cloud Composer
      • Apache Airflow
      • DAGs and Operators
      • Workflow Scheduling
      • Monitoring and Logging
    • Lab 3: Cloud Composer
      • Using GCP Console to create the Cloud Composer environment
      • Viewing and running the DAG (Directed Acyclic Graph) in the Airflow web interface
      • Viewing the results of the wordcount job in storage.
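
The DAG idea above can be sketched with the standard library alone: a workflow is a set of tasks plus "runs after" dependencies, and a scheduler may only run a task once everything it depends on has finished. The task names below are hypothetical, and this is not Airflow code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dag = {
    "create_cluster": set(),
    "run_wordcount": {"create_cluster"},
    "delete_cluster": {"run_wordcount"},
}

# A valid execution order respects every dependency edge.
execution_order = list(TopologicalSorter(dag).static_order())
```

Because the graph must be acyclic, `TopologicalSorter` raises `graphlib.CycleError` if two tasks depend on each other, which mirrors why a workflow with a cycle could never be scheduled.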

Day 88 | April 10, 2020 | Friday

Today's Progress: Today I continued with the Building Batch Data Pipelines on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 1
    • Cloud Data Fusion
      • Introduction
      • Components of Data Fusion
      • Building a Pipeline
      • Exploring Data using Wrangler
    • Lab 2: Cloud Data Fusion
      • Connecting Cloud Data Fusion to a couple of data sources
      • Applying basic transformations
      • Joining two data sources
      • Writing data to a sink

Day 87 | April 9, 2020 | Thursday

Today's Progress: Today I enrolled in the Building Batch Data Pipelines on GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 1
    • EL, ELT, ETL
      • Refresher
    • Quality Considerations
      • Validity
      • Accuracy
      • Completeness
      • Consistency
      • Uniformity
    • BigQuery for ELT
    • Shortcomings of ELT
    • ETL for Data Quality Issues
      • Dataproc
      • Dataflow
      • Data Fusion
      • Data Catalog
  • Week 1: Module 2
    • Executing Spark on Cloud Dataproc
      • The Hadoop Ecosystem
      • Running Hadoop on Cloud Dataproc
      • GCS instead of HDFS
      • Optimizing Dataproc
      • Optimizing Dataproc Storage
      • Optimizing Dataproc Templates and Autoscaling
      • Optimizing Dataproc Monitoring
    • Lab 1: Running Apache Spark Jobs on Cloud Dataproc
      • Migrating Existing Spark Jobs to Cloud Dataproc
      • Modifying Spark Jobs to use Cloud Storage instead of HDFS
      • Optimizing Spark Jobs to run on Job-Specific Clusters

Coursera | Data Engineering with Google Cloud Professional Certificate

Coursera | Building Batch Data Pipelines on GCP

Day 86 | April 8, 2020 | Wednesday

Today's Progress: Today I completed the Data Science: From Prediction to Production course on Udemy.

Description: This includes the following:

  • The Importance of Well Written Code
    • Respect Your Code
    • Coding Standards
      • Production vs Research Code
    • Meet Uncle Bob
      • Clean Code by Robert C. Martin
    • Data Science Clean Code
  • Advanced Topics in Predictive Modeling
    • Three Factors which Impact Accuracy
      • Reduce Noise Variance
        • Better Features
        • Better Representation
      • Reduce Estimators Variance
        • Increase Sample
        • Increase Predictors Variance
        • Decrease Predictors Correlation
    • The Best Black-Box Model
    • How to Use Dummies the Right Way
    • The Price of Wrong Feature Set
      • Omitting Relevant Variable
      • Including Irrelevant Variable
    • The Impact of Measurements Errors
      • Errors that Create Bias
      • Errors that don't Create Bias
    • Heteroskedasticity Illness

Day 85 | April 7, 2020 | Tuesday

Today's Progress: Today I continued with the Data Science: From Prediction to Production course on Udemy.

Description: This includes the following:

  • How to Plan the Development
    • The Path from Concept to Production
      • Theoretical Framework
      • High Level Design
      • Iteration 0
      • Dry Runs
      • Live Tests
    • How to Measure your Progress
      • Breakthroughs
        • Solid Theoretical Framework
        • Start Live Tests
        • Successful Live Tests
    • Scrum or Kanban
    • How to Make Time Assessments
      • Complexity Risk
      • Technology Risk
      • Tasks Risk

Day 84 | April 6, 2020 | Monday

Today's Progress: Today I continued with the Data Science: From Prediction to Production course on Udemy.

Description: This includes the following:

  • What Makes You Professional
    • Your Main Responsibility
      • Building Applications
      • How to Deliver?
      • Learning from Software Developers
    • Skills You Must Have
      • Data Science Skills
        • Analytical Skills
        • Business Acumen
        • Modeling Skills
        • Statistics
        • Machine Learning
        • Good Sense about Data
        • Common Sense
        • Familiarity with Data Science Tools and Technologies
      • Delivering Skills
        • Deliver Fast
        • Respond Quickly to Changes
        • Build Robust Models
    • Developer or Researcher
  • Guidelines for Delivering Fast Results
    • Mistakes as a Junior
    • Six Important Principles
      1. Satisfy the Customer
      2. Welcome Changing Requirements
      3. Deliver Working Software Frequently
      4. Promote Sustainable Development
      5. Technical Excellence
      6. Simplicity
    • Satisfy the Customer
      1. Understand the Customer's Business
      2. Deliver Early Valuable Software
      3. Open the Black Box
      4. Communicate Frequently
      5. Do a Soft Launch
    • Simplicity is Essential
      • "Start Lean, Thicker Later"
      • "We Build Applications, Not Fancy Models"
    • Practical Perspective about Scale

Day 83 | April 5, 2020 | Sunday

Today's Progress: Today I started the Data Science: From Prediction to Production course on Udemy.

Description: This includes the following:

  • Practical Perspective about Predictions
    • Why Prediction is the Wrong Term
      1. Single Point Prediction is Useless
      2. Prediction is about Knowing the Distribution of the Future
      3. Models without the Appropriate Evaluation are Useless
    • The Characteristics of a Good Prediction
      • Different
        • Model Type
        • Feature List
        • Expressiveness results in Different Error Distribution
      • Properties of Good Prediction
        1. Stable Variance
        2. Light Tailed Variance
        3. Symmetry
        4. Known Distribution
  • Guidelines for Selecting Models
    • Why Linear Models are Great
      1. Stable Variance
      2. Simple
      3. Efficient
      4. Easy to Interpret
      5. Suitable for First Iteration
    • When to Use Nonlinear Models
      1. Direct Relation is Clearly not Linear
      2. Many Features
      3. Structural Phenomenon
    • The Risk of Nonlinear Modeling
      1. Overfitting
      2. Variance Instability
      3. Heavy Tailed Distribution
      4. Corner Solutions
      5. Terrible Performance
    • Risk Management in Nonlinear Modeling
      1. Ensemble Methods
      2. Narrowing the Feature Space
      3. Specific Tuning Parameters

Important Links:

Udemy | Data Science: From Prediction to Production

Day 82 | April 4, 2020 | Saturday

Today's Progress: Today I studied Reinforcement Learning for Stock Trading.

Description: This includes the following:

  • Intro to Reinforcement Learning
  • Deep Q-Learning
  • Defining States
    • Time Series (Fixed Window)
    • Cash in Hand
    • Buying Current Stocks to Buy Better Stocks
  • Defining Actions
    • Buy/Sell/Hold Stock
    • For N stocks, 3^N possibilities
    • How many stocks to sell/buy?
  • Simplified Actions
    • Ignore Transaction Costs
    • Avoid Knapsack Problem
    • Buy Multiple Stocks in Round Robin Fashion
    • Sell before Buy
  • Defining Rewards
    • Portfolio Value
  • Minimal Trading Bot Implementation in Tensorflow 2.0
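
A small sketch of the simplified action scheme outlined above, under my own reading of the bullets: each stock gets one of sell/hold/buy (hence 3^N joint actions), "sell" liquidates all shares of that stock, sells happen before buys, buying proceeds one share at a time in round-robin order, and transaction costs are ignored. Function names and numbers are hypothetical, and prices are assumed to be positive.

```python
from itertools import product

ACTIONS = ("sell", "hold", "buy")

def action_space(n_stocks):
    """All joint actions: one of sell/hold/buy per stock, 3**n_stocks total."""
    return list(product(ACTIONS, repeat=n_stocks))

def apply_action(action, prices, shares, cash):
    """Sell first, then buy round robin until no flagged stock is affordable."""
    shares = list(shares)
    for i, choice in enumerate(action):
        if choice == "sell":
            cash += shares[i] * prices[i]  # liquidate the whole position
            shares[i] = 0
    to_buy = [i for i, choice in enumerate(action) if choice == "buy"]
    bought_something = bool(to_buy)
    while bought_something:
        bought_something = False
        for i in to_buy:  # one share per stock per pass
            if cash >= prices[i]:
                shares[i] += 1
                cash -= prices[i]
                bought_something = True
    return shares, cash

def portfolio_value(prices, shares, cash):
    """Cash plus the market value of every held share."""
    return cash + sum(p * s for p, s in zip(prices, shares))

prices = [10, 4]
new_shares, new_cash = apply_action(("sell", "buy"), prices, [2, 0], cash=1)
```

With no transaction costs, an action only reshuffles cash and shares, so the portfolio value is unchanged until prices move; the reward then comes from the change in value between steps.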

Relevant Codebase

Day 81 | April 3, 2020 | Friday

Today's Progress: Today I studied Recommendation Systems.

Description: This includes the following:

  • Introduction
  • Content Based Methods
  • Collaborative Filtering Methods
    • Model Based
      • Matrix Factorization
    • Memory Based
      • Item Centered Bayesian Classifier
      • User Centered Linear Regression
  • Hybrid Methods
  • Evaluation of Recommendation Systems
    • Metric Based Evaluation
    • Human Based Evaluation
  • Deep Learning for Recommendation Systems
    • Embeddings
  • Simple Movie Recommender System (TensorFlow 2.0)
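
A plain-Python stand-in for the matrix factorization idea above: each user and item gets a short embedding vector, and a rating is predicted as their dot product. Stochastic gradient descent with L2 regularization is one common way to fit such a model; it is an assumption here, as are the toy ratings and hyperparameters, and it is not the TensorFlow implementation from the codebase.

```python
import random

def predict(U, V, user, item):
    """Predicted rating = dot product of user and item embeddings."""
    return sum(a * b for a, b in zip(U[user], V[item]))

def factorize(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02,
              epochs=1000, seed=0):
    """Fit embeddings to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = rating - predict(U, V, user, item)
            for k in range(rank):
                u_k, v_k = U[user][k], V[item][k]
                U[user][k] += lr * (err * v_k - reg * u_k)
                V[item][k] += lr * (err * u_k - reg * v_k)
    return U, V

def rmse(ratings, U, V):
    """Root-mean-square error over the observed ratings."""
    squared = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5

# Hypothetical observed ratings: (user_id, item_id, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
U, V = factorize(ratings, n_users=3, n_items=3)
```

Once trained, `predict(U, V, user, item)` can also be called for pairs that were never rated, which is how the model produces recommendations.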

Important Links:

Article 1 | Introduction to Recommender Systems

Article 2 | Recommender Systems in Practice

Article 3 | Recommender Systems with Deep Learning Architectures

Article 4 | RecSys Series Part 2: The 10 Categories of Deep Recommendation Systems That Academic Researchers Should Pay Attention To

Relevant Codebase

Day 80 | April 2, 2020 | Thursday

Today's Progress: Today I completed the Bash Scripting, Linux and Shell Programming Complete Guide on Udemy.

Description: This includes the following:

  • Bash Scripting
    • Bash File Structure
    • Echo Command
    • Comments
    • Variables
    • Strings
    • Loops
      • While
      • For
      • Until
      • Break & Continue
    • User Input
    • Conditional Statements
    • Case Statements
    • Command Line Arguments
    • Functions
      • Global vs Local Variables
    • Arrays
    • Shell & Environment Variables
    • Scheduled Automation
    • Aliases
    • Wildcards
    • Multiple Commands

Relevant Codebase

Day 79 | April 1, 2020 | Wednesday

Today's Progress: Today I continued with the Bash Scripting, Linux and Shell Programming Complete Guide on Udemy.

Description: This includes the following:

  • Users
    • Run Commands as Superuser
    • Change User
    • Show Effective User and Group IDs
  • Killing Programs and Logging Out
    • Kill A Running Command
    • Kill All Processes By a Name
    • Logging Out of Bash
  • Shortcuts
    • No More Input
    • Clear Screen
    • Zoom In
    • Zoom Out
    • Moving the Cursor
    • Deleting Text
    • Fixing Typos
    • Cutting and Pasting
    • Character Capitalization

Day 78 | March 31, 2020 | Tuesday

Today's Progress: Today I continued with the Bash Scripting, Linux and Shell Programming Complete Guide on Udemy.

Description: This includes the following:

  • Getting Help
    • Show Manual Description
    • Search Manual
    • Reference Manuals
  • Working with Files/Folders
    • Creating a Folder
    • Creating a File
    • Copy Files/Folders
    • Move and Rename File/Folders
    • Delete Files/Folders
    • Delete Empty Folders
    • Change File Permission
  • Text Files
    • File Concatenation
    • File Perusal Filter
    • Terminal Based Text Editor

Day 77 | March 30, 2020 | Monday

Today's Progress: Today I started the Bash Scripting, Linux and Shell Programming Complete Guide on Udemy.

Description: This includes the following:

  • Introduction
    • Bash/Shell
    • Terminal
    • Shell
    • Console
  • Navigation
    • Listing Folder Contents
    • Print Current Folder
    • Change Folder
    • Using a Stack to Push Folders
    • Check File Type
    • Find File By Name & Update Locate Database
    • Find a Command
    • Show Command History

Important Links: Udemy | Bash Scripting, Linux and Shell Programming Complete Guide

Day 76 | March 29, 2020 | Sunday

Today's Progress: Today I completed the Modernizing Data Lakes and Data Warehouses with GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 2
    • BigQuery as a Data Warehousing Solution
      • Exploring Schemas
      • Schema Design
      • Nested and Repeated Fields
    • Lab 4 (BigQuery: JSON and Array Data):
      • Loading semi-structured JSON into BigQuery
      • Creating and querying arrays
      • Creating and querying structs
      • Querying nested and repeated fields
    • Partitioning and Clustering in BigQuery
      • Optimizing with Partitioning and Clustering
      • Creating Partitioned Tables
      • Partitioning and Clustering
      • Transforming Batch and Streaming Data

Certificate | Modernizing Data Lakes and Data Warehouses with GCP

Day 75 | March 28, 2020 | Saturday

Today's Progress: Today I continued with the Modernizing Data Lakes and Data Warehouses with GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 1
    • Building a Data Warehouse
      • The Modern Data Warehouse
      • Intro to BigQuery
      • Querying TBs of Data in Seconds
      • Loading Data
    • Lab 3 (BigQuery):
      • Loading Data into BigQuery

Day 74 | March 27, 2020 | Friday

Today's Progress: Today I continued with the Modernizing Data Lakes and Data Warehouses with GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 2
    • Building a Data Lake
      • Intro to Data Lakes
      • Data Storage and ETL options on GCP
      • Optimizing Cost with Google Cloud Storage classes and Cloud Functions
      • Securing Cloud Storage
      • Storing All Sorts of Data Types
      • Running Federated Queries on Parquet and ORC files in BigQuery
      • Storing Relational Data in the Cloud
      • Cloud SQL as a Relational Data Lake
    • Lab 2 (Cloud SQL):
      • Loading Data into Cloud SQL

Day 73 | March 26, 2020 | Thursday

Today's Progress: Today I enrolled in the Modernizing Data Lakes and Data Warehouses with GCP course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 1
    • Role of Data Engineer
    • Data Engineering Challenges
    • Intro to BigQuery
    • Data Lakes and Data Warehouses
    • Transactional Databases vs Data Warehouses
    • Effective Partnership with Other Data Teams
      • Manage Data Access and Governance
      • Build Production-Ready Pipelines
    • GCP Customer Case Study
      • Ocado
    • Lab 1 (Analysis with BigQuery):
      • Analyze 2 different public datasets
      • Run queries on them, to derive interesting insights
        • Separately
        • Combined

Coursera | Data Engineering with Google Cloud Professional Certificate

Coursera | Modernizing Data Lakes and Data Warehouses with GCP

Day 72 | March 25, 2020 | Wednesday

Today's Progress: Today I continued with the Google Cloud Platform Big Data and Machine Learning Fundamentals course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 2
    • ML Driving Business Value
    • ML on Unstructured Data
    • Choosing the Right ML Approach
      • Pre-Built AI Building Blocks
      • Using Pre-Built AI to Create a Chatbot
      • Customizing Pre-Built Models with AutoML
      • Building a Custom Model
    • Lab 5 (AutoML)
      • Setup API key for ML Vision API
      • Invoke the pretrained ML Vision API to classify images
      • Review label predictions from Vision API
      • Train and evaluate custom AutoML Vision image classification model
      • Predict with AutoML on new image

Certificate | Google Cloud Platform Big Data and Machine Learning Fundamentals

Day 71 | March 24, 2020 | Tuesday

Today's Progress: Today I continued with the Google Cloud Platform Big Data and Machine Learning Fundamentals course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 2: Module 1
    • Modern Data Pipeline Challenges
    • Message-Oriented Architectures
    • Serverless Data Pipelines
      • Designing Streaming Pipelines with Apache Beam
      • Implementing Streaming Pipelines on Cloud Dataflow
    • Data Visualization with Data Studio
      • Building Collaborative Dashboards
      • Tips and Tricks to create Charts with the Data Studio UI
    • Lab 4 (Data Streaming Pipeline)
      • Connect to a streaming data Topic in Cloud Pub/Sub
      • Ingest streaming data with Cloud Dataflow
      • Load streaming data into BigQuery
      • Analyze and visualize the results

Day 70 | March 23, 2020 | Monday

Today's Progress: Today I continued with the Google Cloud Platform Big Data and Machine Learning Fundamentals course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 3
    • Introduction to BigQuery
      • Fast SQL Query Engine
      • Managed Storage for Datasets
    • Insights from Geographic Data
    • Machine Learning on Structured Data
      • Choosing the right model type
      • Scenario: Predicting Customer Lifetime Value
    • Creating ML Models with SQL
      • Introduction to BigQuery ML
      • ML Projects Phases
      • Key Features Walkthrough
    • Lab 3 (BigQuery ML)
      • Use BigQuery to find public datasets
      • Query and explore the ecommerce dataset
      • Create a training and evaluation dataset to be used for batch prediction
      • Create a classification (logistic regression) model in BQML
      • Evaluate the performance of your machine learning model
      • Predict and rank the probability that a visitor will make a purchase

Day 69 | March 22, 2020 | Sunday

Today's Progress: Today I continued with the Google Cloud Platform Big Data and Machine Learning Fundamentals course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 2
    • Recommendation Systems
      • Data
      • Model
      • Training/Serving Infrastructure
    • Google Storage Systems
      • Cloud Storage
      • Cloud SQL
      • Cloud Spanner
      • DataStore
      • BigTable
      • BigQuery
    • Hadoop Ecosystem
      • Hadoop
      • Hive
      • Pig
      • Spark
    • Lab 2 (Recommendation System)
      • Create Cloud SQL instance
      • Create database tables by importing .sql files from Cloud Storage
      • Populate the tables by importing .csv files from Cloud Storage
      • Allow access to Cloud SQL
      • Explore the rentals data using SQL statements from CloudShell

Day 68 | March 21, 2020 | Saturday

Today's Progress: Today I enrolled in the Google Cloud Platform Big Data and Machine Learning Fundamentals course of Data Engineering with Google Cloud Professional Certificate on Coursera.

Description: This includes the following:

  • Week 1: Module 1
    • Google Cloud Architecture
      • Security
        • Google IAM
      • Compute Power
        • On Demand VMs
      • Storage
        • Multiregional
        • Regional
        • Nearline
        • Coldline
      • Networking
        • Edge Computing/Node/PoP
      • Big Data/ ML Products
        • GFS
        • MapReduce
        • BigTable
        • Dremel
        • Pub/Sub
        • Tensorflow etc
    • Lab 1 (BigQuery)
      • Query a public dataset
      • Create a custom table
      • Load data into a table
      • Query a table
    • GCP Approaches
      • Compute Engine
      • Google Kubernetes Engine (GKE)
      • App Engine
      • Cloud Functions
    • Business Use Cases w/ GCP

Important Links: Coursera | Data Engineering with Google Cloud Professional Certificate

Coursera | Google Cloud Platform Big Data and Machine Learning Fundamentals

Day 67 | March 20, 2020 | Friday

Today's Progress: Today I finished and reviewed the Time Series Analysis in Python 2020 course on Udemy.

Description: I learned the following things:

  • Differentiate between time series data and cross-sectional data.
  • Understand the fundamental assumptions of time series data and how to take advantage of them.
  • Transform a data set into a time series.
  • Start coding in Python and learn how to use it for statistical analysis.
  • Carry out time-series analysis in Python and interpret the results, based on the data in question.
  • Examine the crucial differences between related series like prices and returns.
  • Comprehend the need to normalize data when comparing different time series.
  • Encounter special types of time series like White Noise and Random Walks.
  • Learn about "autocorrelation" and how to account for it.
  • Learn about accounting for "unexpected shocks" via moving averages.
  • Discuss model selection in time series and the role residuals play in it.
  • Comprehend stationarity and how to test for its existence.
  • Acknowledge the notion of integration and understand when, why and how to properly use it.
  • Realize the importance of volatility and how we can measure it.
  • Forecast the future based on patterns observed in the past.
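
Two of the points above, prices versus returns and normalizing series before comparing them, can be sketched in a few lines. The function names and the price values are illustrative assumptions.

```python
def simple_returns(prices):
    """Percentage change between consecutive prices."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(prices, prices[1:])]

def normalize(series):
    """Rescale a series so its first value is 100 (a common benchmark base)."""
    base = series[0]
    return [value / base * 100 for value in series]

# Hypothetical price series on very different scales.
index_a = [100, 110, 99]
index_b = [50, 75, 100]

returns_a = simple_returns(index_a)
normalized_b = normalize(index_b)
```

Returns remove the price level, and normalization puts series with different magnitudes on a common starting point, so their relative movements can be compared directly.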

Day 66 | March 19, 2020 | Thursday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • Business Case: Automobile Industry
    • Analysing the data leading up to the Volkswagen buyout of Porsche
    • In Retrospect Approach
    • The Dieselgate Scandal
    • Auto ARIMA
    • Predictions using Exogenous Variables
    • Measuring Volatility
    • GARCH

Relevant Codebase

Day 65 | March 18, 2020 | Wednesday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • Forecasting
    • Forecast vs Prediction
    • Forecasting Time Series:
      • ARMA
      • ARIMA
      • ARIMAX
      • SARIMA
      • SARIMAX
      • GARCH
    • Pitfalls of Forecasting
    • Multivariate Forecasting

Relevant Codebase

Day 64 | March 17, 2020 | Tuesday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • The Auto ARIMA Model
    • The Auto AutoRegressive Integrated Moving Average (ARIMA) Model
      • Manual vs Automatic Empirical Analysis
      • Pros & Cons
    • Fitting Default Best Fit Auto ARIMA Model
    • Auto ARIMA with Custom Arguments
    • Implementation in Python

Relevant Codebase

Day 63 | March 16, 2020 | Monday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • The ARCH Model
    • The AutoRegressive Conditional Heteroskedasticity (ARCH) Model
    • Volatility
      • EGARCH
    • Fitting a Simple ARCH Model
    • Fitting Higher Lag ARCH Model
    • Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) Model
      • Volatility Clustering
    • Implementation in Python
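
A minimal simulation of the GARCH(1,1) variance recursion, to show how volatility clustering arises: today's conditional variance depends on yesterday's squared shock and yesterday's variance. The parameter values are arbitrary assumptions; setting `beta = 0` reduces the recursion to ARCH(1).

```python
import math
import random

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate returns with sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)  # start at the long-run variance
    returns, variances = [], []
    for _ in range(n):
        shock = math.sqrt(sigma2) * rng.gauss(0, 1)
        returns.append(shock)
        variances.append(sigma2)
        # A large shock raises the next period's variance, so big moves cluster.
        sigma2 = omega + alpha * shock ** 2 + beta * sigma2
    return returns, variances

sim_returns, sim_variances = simulate_garch11(omega=0.1, alpha=0.1, beta=0.8, n=500)
```

Plotting `sim_returns` would show calm and turbulent stretches alternating, even though every shock is drawn from the same standard normal distribution.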

Relevant Codebase

Day 62 | March 15, 2020 | Sunday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • The ARIMA Model
    • The AutoRegressive Integrated Moving Average (ARIMA) Model
    • Fitting a Simple ARIMA Model
    • Fitting Higher-Lag ARIMA Model
    • Higher Levels of Integration
    • Outside Factors
      • Exogenous Variables
      • ARMAX
      • ARIMAX
    • Seasonal Models
      • SARMA
      • SARIMA
      • SARIMAX
    • Predicting Stability
      • Volatility & Variance
    • Implementation in Python
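
The "Integrated" part of ARIMA refers to differencing the series before fitting an ARMA model. A short sketch of differencing and its inverse follows; the helper names are my own.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' in ARIMA, order d)."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

def undifference(first_value, diffs):
    """Invert one round of differencing by cumulative summation."""
    restored = [first_value]
    for delta in diffs:
        restored.append(restored[-1] + delta)
    return restored

levels = [1, 3, 6, 10]
first_diff = difference(levels)        # changes between consecutive values
second_diff = difference(levels, d=2)  # changes of the changes
```

Each round of differencing shortens the series by one observation, and forecasts made on the differenced scale have to be summed back up to return to the original level.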

Relevant Codebase

Day 61 | March 14, 2020 | Saturday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • The ARMA Model
    • The AutoRegressive Moving Average (ARMA) Model
    • Fitting a Simple ARMA Model
    • Fitting Higher-Lag ARMA Model
    • Examining the ARMA Model Residuals
    • ARMA Models and Non-Stationary Data
    • Implementation in Python

Relevant Codebase

Day 60 | March 13, 2020 | Friday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on Udemy.

Description: This includes the following:

  • Adjusting to Shocks: The MA Model
    • Moving Average (MA) Model
    • Fitting the MA Model
    • Fitting Higher-Lag MA Model
    • Examining the MA Model Residuals
    • Model Selection for Normalized Returns
    • Past Values and Past Errors
    • Implementation in Python

Relevant Codebase

Day 59 | March 12, 2020 | Thursday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on udemy.

Description: This includes the following:

  • Modelling AutoRegression
    • AR Model
    • ACF & PACF
    • Dickey-Fuller Test
    • LLR Test
    • Error Analysis w/ Residuals
    • Implementation in Python

Relevant Codebase

Day 58 | March 11, 2020 | Wednesday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on udemy.

Description: This includes the following:

  • Picking the Correct Model
    • Significant Coefficients
    • Parsimonious
      • Log-Likelihood Ratio Test
      • AIC & BIC
    • Residuals
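
The model-selection criteria listed above can be sketched in a few lines. This is a minimal illustration using the standard textbook formulas; the log-likelihood values are hypothetical and do not come from the course notebooks.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def llr_statistic(ll_simple, ll_complex):
    """Log-likelihood ratio statistic for two nested models."""
    return 2 * (ll_complex - ll_simple)

# Hypothetical log-likelihoods of a simpler and a more complex nested model.
ll_small, ll_big = -1200.0, -1195.0
print("AIC:", aic(ll_small, 2), aic(ll_big, 3))
print("LLR statistic:", llr_statistic(ll_small, ll_big))
```

The LLR statistic is usually compared against a chi-squared distribution whose degrees of freedom equal the number of extra parameters; that step is omitted here to keep the sketch dependency-free.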

Day 57 | March 10, 2020 | Tuesday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on udemy.

Description: This includes the following:

  • Working with Time Series in Python
    • White Noise
      1. Constant Mean
      2. Constant Variance
      3. No Autocorrelation
    • Random Walk
      • Market Efficiency
      • Arbitrage
    • Stationarity
      • Covariance Stationary
        1. Constant Mean
        2. Constant Variance
        3. Consistent Covariance b/w different Time Periods
      • Determining Weak-Form Stationarity
        1. Dickey-Fuller Test
    • Seasonality
      • Decomposition
        • Trend
        • Seasonal
        • Residual
      • Naive Decomposition
        1. Additive
        2. Multiplicative
    • Autocorrelation
      • AutoCorrelation Function (ACF)
      • Partial AutoCorrelation Function (PACF)
    • Python Implementation
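
As a rough illustration of the autocorrelation idea above, the sample ACF at a given lag can be computed directly. This is a plain-Python sketch, not the library routine used in the course; the alternating series is a made-up example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    # Total variation around the mean (the lag-0 autocovariance times n).
    denom = sum((x - mean) ** 2 for x in series)
    # Co-variation between the series and its lagged copy.
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom

# Hypothetical series that flips sign every step: strong negative lag-1 ACF.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
for lag in range(3):
    print(lag, autocorrelation(alternating, lag))
```

White noise would show values near zero for every lag above 0, which is the "No Autocorrelation" property listed above.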

Relevant Codebase

Day 56 | March 9, 2020 | Monday

Today's Progress: Today I continued the Time Series Analysis in Python 2020 course on udemy.

Description: This includes the following:

  • Creating a Time Series Object
    • Transforming String Inputs into DateTime Values
    • Using Date as an Index
    • Setting the Frequency
    • Filling Missing Values
    • Adding and Removing Columns in DataFrame
    • Splitting Up the Data
    • Implementation in Python

Relevant Codebase

Day 55 | March 8, 2020 | Sunday

Today's Progress: Today I enrolled in Time Series Analysis in Python 2020 course on udemy.

Description: This includes the following:

  • Introduction
  • Setting Up the Environment
  • Introduction to Time Series in Python
    • Time Periods
    • Frequency
    • Pattern Persistence
    • Notations in Time-Series
    • Peculiarities of Time Series
    • Implementation in Python
      • Loading Data
      • Exploring Data
      • Visualizing Data
      • QQ Plot

Important Links: Udemy | Time Series Analysis in Python 2020

Relevant Codebase

Day 54 | March 7, 2020 | Saturday

Today's Progress: Today I finished and reviewed the Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Rate of return of stocks
  • Risk of stocks
  • Rate of return of stock portfolios
  • Risk of stock portfolios
  • Correlation between stocks
  • Covariance
  • Diversifiable and non-diversifiable risk
  • Regression analysis
  • Alpha and Beta coefficients
  • Measuring a regression’s explanatory power with R^2
  • Markowitz Efficient frontier calculation
  • Capital asset pricing model
  • Sharpe ratio
  • Multivariate regression analysis
  • Monte Carlo simulations
  • Using Monte Carlo in a Corporate Finance context
  • Derivatives and type of derivatives
  • Applying the Black Scholes formula
  • Using Monte Carlo for options pricing
  • Using Monte Carlo for stock pricing

Day 53 | March 6, 2020 | Friday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 17: Monte Carlo Simulations as a decision-making Tool
    • Monte Carlo Simulations
    • Monte Carlo in Corporate Finance Setting
      • Revenues
      • Cost of Goods Sold
      • Gross Profit
      • COGS & OPEX
    • Asset Pricing with Monte Carlo
      • Brownian Motion
        1. Drift
        2. Volatility
    • Derivative Contracts
      • Assets
        1. Stocks
        2. Bonds
        3. Interest Rates
        4. Commodities
        5. Exchange Rates
      • Groups of Derivatives
        1. Hedging
        2. Speculating
        3. Arbitrageurs
      • Types of Derivatives
        1. Forwards
        2. Futures
        3. Swaps
        4. Options
        • Call Options vs Put Options
    • The Black Scholes Formula
      • Efficient Markets
      • Transaction Costs
      • No Dividend Payments
      • Known Volatility & Risk-Free Rate
    • Implementation in Python
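
The Black-Scholes formula above can be sketched with only the standard library. This assumes a European option on a non-dividend-paying stock with constant volatility and risk-free rate (the assumptions listed above); the parameter names and input values are illustrative.

```python
import math
from statistics import NormalDist

def black_scholes(spot, strike, rate, sigma, maturity):
    """Return (call, put) prices for a European option, no dividends."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * sqrt_t
    )
    d2 = d1 - sigma * sqrt_t
    cdf = NormalDist().cdf  # standard normal CDF
    discounted_strike = strike * math.exp(-rate * maturity)
    call = spot * cdf(d1) - discounted_strike * cdf(d2)
    put = discounted_strike * cdf(-d2) - spot * cdf(-d1)
    return call, put

# Hypothetical inputs: at-the-money option, 5% rate, 20% volatility, 1 year.
call, put = black_scholes(spot=100, strike=100, rate=0.05, sigma=0.2, maturity=1)
print(round(call, 4), round(put, 4))
```

The two prices satisfy put-call parity, `call - put = spot - strike * exp(-rate * maturity)`, which is a handy sanity check on any implementation.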

Relevant Codebase

Day 52 | March 5, 2020 | Thursday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 16: Multivariate Regression Analysis
    • Fundamentals of Multivariate Regression
      • Higher Dimensions
      • R-Squared
      • P-Value
      • Beta Coefficients
    • Implementation in Python

Relevant Codebase

Day 51 | March 4, 2020 | Wednesday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 15: The Capital Asset Pricing Model
    • The Capital Asset Pricing Model
      • Market Portfolio
      • Risk Free Asset
      • Beta Coefficient
      • Capital Market Line
      • Market Risk Premium
    • Sharpe Ratio
    • Achieving Alpha
    • Types of Investment Strategies
      • Passive Investing
      • Active Investing
      • Arbitrage Trading
      • Value Investing
    • Implementation in Python
      • Calculating Beta of a Stock
      • Calculating CAPM of a Stock
      • Calculating Sharpe Ratio
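
The three calculations above follow directly from their definitions. The sketch below uses plain Python and made-up return series; the function names are illustrative, not the ones used in the course notebooks.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market)."""
    return covariance(stock_returns, market_returns) / covariance(
        market_returns, market_returns
    )

def capm_expected_return(risk_free, stock_beta, market_return):
    """CAPM: r_f + beta * (r_m - r_f)."""
    return risk_free + stock_beta * (market_return - risk_free)

def sharpe_ratio(expected_return, risk_free, volatility):
    """Excess return per unit of risk."""
    return (expected_return - risk_free) / volatility

# Hypothetical returns: the stock moves exactly twice as much as the market.
market = [0.01, 0.02, -0.01, 0.03]
stock = [2 * r for r in market]
print(beta(stock, market))
print(capm_expected_return(0.02, 1.5, 0.08))
```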

Relevant Codebase

Day 50 | March 3, 2020 | Tuesday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 14: Markowitz Portfolio Optimization
    • Markowitz Portfolio Theory
      • Single Investment vs Diversified Portfolio
      • Markowitz Efficient Frontier
    • Implementation in Python
      • Calculating Expected Portfolio Return
      • Calculating Expected Portfolio Variance
      • Calculating Expected Portfolio Volatility
      • Calculating Markowitz Efficient Frontier
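
For a small portfolio, the expected return, variance, and volatility above reduce to a few sums. A minimal sketch, assuming a two-asset portfolio with a hypothetical covariance matrix:

```python
import math

def portfolio_return(weights, expected_returns):
    """Weighted average of the assets' expected returns."""
    return sum(w * r for w, r in zip(weights, expected_returns))

def portfolio_variance(weights, cov):
    """Quadratic form w' * Cov * w, written out as a double sum."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )

def portfolio_volatility(weights, cov):
    """Standard deviation of portfolio returns."""
    return math.sqrt(portfolio_variance(weights, cov))

# Hypothetical inputs: equal weights, annualized returns and covariances.
weights = [0.5, 0.5]
expected = [0.10, 0.20]
cov = [[0.04, 0.01], [0.01, 0.09]]
print(portfolio_return(weights, expected))
print(portfolio_variance(weights, cov), portfolio_volatility(weights, cov))
```

Repeating this for many random weight vectors and plotting volatility against return is one common way to trace out the efficient frontier.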

Relevant Codebase

Day 49 | March 2, 2020 | Monday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 13: Using Regression for Financial Analysis
    • Fundamentals of Simple Regression
      1. Univariate Regression
      2. Multivariate Regression
    • Good vs Bad Regression
      • R-squared
    • Implementation in Python
      • Running Regression model for House Price Data
      • Calculating:
        1. Slope (beta)
        2. Intercept (alpha)
        3. R Value
        4. R-squared Value
        5. P Value
        6. Standard Error

Relevant Codebase

Day 48 | March 1, 2020 | Sunday

Today's Progress: Today I continued with Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Implementation in Python
    • Calculating Simple Rate of Return
    • Calculating Log Rate of Return
    • Calculating Rate of Return of Indices
    • Calculating Risk of a Security
      • Variance
      • Correlation
      • Volatility
    • Calculating Risk of an Investment Portfolio
      • Calculating Diversifiable and Non-Diversifiable Risk
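
The two return definitions above differ only in how consecutive prices are compared. A small sketch with a made-up price series:

```python
import math

def simple_returns(prices):
    """Simple return per period: P_t / P_(t-1) - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Log return per period: ln(P_t / P_(t-1))."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices rising 10% per period.
prices = [100, 110, 121]
print(simple_returns(prices))
print(log_returns(prices))
```

Log returns are additive over time: their sum equals the log of the total price ratio, which is one reason they are convenient for a single security over many periods.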

Relevant Codebase

Day 47 | February 29, 2020 | Saturday

Today's Progress: Today I enrolled in Python for Finance: Investment Fundamentals & Data Analytics course on udemy.

Description: This includes the following:

  • Part 1-10: Python Refresher
    • Skipped
  • Part 11: Calculating and Comparing Rates of Return
    • Rate of Return of Stocks
      • Simple Returns
      • Log Returns
    • Rate of Return of Stock Portfolios
      • Market Indices
        1. S&P500
        2. Dow Jones Industrial Average
        3. NASDAQ
  • Part 12: Measuring Investment Risk
    • Risk of Stocks
      • Variability
        1. Variance
        2. Standard Deviation
    • Risk of Stock Portfolios
      • Portfolio Diversification
      • Covariance
      • Correlation
      • Un-diversifiable (Systematic) Risk vs Diversifiable (Idiosyncratic) Risk

Important Links: Udemy | Python for Finance: Investment Fundamentals & Data Analytics

Day 46 | February 28, 2020 | Friday

Today's Progress: Today I ended and reviewed the Artificial Intelligence for Business course on Udemy.

  • OPTIMIZE BUSINESS PROCESSES
    • Implement Q-Learning
    • Build an Optimization Model
    • Maximize Efficiency
  • MINIMIZE COSTS
    • Implement Deep Q-Learning
    • Build an AI Environment from scratch
    • Build an Artificial Brain
    • Master the General AI Framework
    • Save and Load a model
    • Implement Early Stopping
  • MAXIMIZE REVENUES
    • Implement Thompson Sampling
    • Leverage AI to make the best decision
    • Implement Online Learning
    • Implement Regret Analysis

Day 45 | February 27, 2020 | Thursday

Today's Progress: Today I continued with the Artificial Intelligence for Business course on udemy.

Description: This includes the following:

  • Part-3: Maximizing Revenue
    • AI Solution
      • The Multi-Armed Bandit Problem
      • Thompson Sampling
    • Implementation in Python
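
Thompson Sampling for the multi-armed bandit problem can be sketched with a Beta posterior per arm. This is a generic Bernoulli-bandit version with made-up conversion rates, not the course's exact case-study code.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; return how often each arm was played."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # observed rewards of 1 per arm
    losses = [0] * n_arms  # observed rewards of 0 per arm
    for _ in range(n_rounds):
        # Draw one sample from each arm's Beta(wins + 1, losses + 1) posterior.
        samples = [
            rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Play the arm with the highest sample and record the outcome.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

# Hypothetical conversion rates for three strategies.
pulls = thompson_sampling([0.1, 0.3, 0.8], n_rounds=2000)
print(pulls)
```

Early rounds explore all arms because the posteriors are wide; as evidence accumulates, the best arm is sampled highest almost every time.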

Day 44 | February 26, 2020 | Wednesday

Today's Progress: Today I continued with the Artificial Intelligence for Business course on udemy.

Description: This includes the following:

  • Part-3: Maximizing Revenue
    • Case Study: Maximizing Revenue of an Online Retail Business
      • Problem to Solve
      • Environment to Define
        1. Defining States
        2. Defining Actions
        3. Defining Reward Function

Day 43 | February 25, 2020 | Tuesday

Today's Progress: Today I continued with the Artificial Intelligence for Business course on udemy.

Description: This includes the following:

  • Part-2: Minimizing Costs
    • Case Study: Minimizing Costs in Energy Consumption of a Data Center
      • Problem to Solve
      • Environment to Define
        1. Defining States
        2. Defining Actions
        3. Defining Reward Function
    • AI Solution
      • Deep Q-Learning: Intuition
      • Deep Q-Learning: Action
      • Experience Replay
      • Action Selection Policies
    • Implementation in Python

Day 42 | February 24, 2020 | Monday

Today's Progress: Today I enrolled in Artificial Intelligence for Business course on udemy.

Description: This includes the following:

  • Introduction to Course
  • Part-1: Optimizing Business Process
    • Case Study: Optimizing the Flows in an E-Commerce Warehouse
      • Problem to Solve
      • Environment to Define
        1. Defining States
        2. Defining Actions
        3. Defining Reward Function
    • AI Solution
      • Intro to Reinforcement Learning
      • The Bellman Equation
      • The "Plan"
      • Markov Decision Process (MDP)
        1. Deterministic Search
        2. Non-Deterministic Search
      • Policy vs Plan
      • Adding a "Living Penalty"
      • Q-Learning: Intuition
      • Temporal Difference
      • Q-Learning: Visualization
    • Implementation in Python
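
The temporal-difference update at the heart of Q-learning fits in one small function. The sketch below assumes a tabular Q stored as a list of lists; the state and action indices and the hyperparameter values are illustrative.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step in place and return the TD error."""
    # Bellman target: immediate reward plus discounted best future value.
    td_target = reward + gamma * max(q[next_state])
    td_error = td_target - q[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * td_error
    return td_error

# Hypothetical environment with 2 states and 2 actions, Q initialized to 0.
q_table = [[0.0, 0.0], [0.0, 0.0]]
print(q_update(q_table, state=0, action=1, reward=1.0, next_state=1))
print(q_table)
```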

Important Links: Udemy | Artificial Intelligence for Business

Relevant Codebase

Day 41 | February 23, 2020 | Sunday

Today's Progress: Today I ended and reviewed the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: I learned following skills in this course:

  • Use DataFrames and Structured Streaming in Spark 3
  • Frame big data analysis problems as Spark problems
  • Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN
  • Install and run Apache Spark on a desktop computer or on a cluster
  • Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs
  • Implement iterative algorithms such as breadth-first-search using Spark
  • Use the MLLib machine learning library to answer common data mining questions
  • Understand how Spark SQL lets you work with structured data
  • Understand how Spark Streaming lets you process continuous streams of data in real time
  • Tune and troubleshoot large jobs running on a cluster
  • Share information between nodes on a Spark cluster using broadcast variables and accumulators
  • Understand how the GraphX library helps with network analysis problems

Day 40 | February 22, 2020 | Saturday

Today's Progress: Today I continued the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: This includes the following:

  • Other Spark Technologies
  • Intro to MLLib
  • Using DataFrames with MLLib
  • Spark Streaming
  • GraphX

Day 39 | February 21, 2020 | Friday

Today's Progress: Today I continued the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: This includes the following:

  • Intro to SparkSQL
  • DataFrame:
    • Executing SQL commands
    • SQL-style function
  • Using DataFrame instead of RDDs

Day 38 | February 20, 2020 | Thursday

Today's Progress: Today I continued the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: This includes the following:

  • Elastic MapReduce
  • Partitioning
  • Troubleshooting Spark on a Cluster
  • Managing Dependencies

Day 37 | February 19, 2020 | Wednesday

Today's Progress: Today I continued the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: This includes the following:

  • Advanced Examples of Spark Programs
    • Activities
      • Finding Most Popular Movie
      • Using Broadcast Variables
  • Using Graphs
    • Superhero Degrees of Separation
      • Intro to Breadth-First Search
      • Accumulators, and Implementing BFS in Spark
  • Item-Based Collaborative Filtering in Spark
    • cache() and persist()
    • Activity
      • Similar Movie Script using Spark's Cluster Manager
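
The breadth-first search behind the degrees-of-separation exercise can be shown on a single machine first. This is a plain-Python sketch on a tiny made-up graph; the Spark version distributes the same idea across iterations and uses an accumulator to signal that the target was reached.

```python
from collections import deque

def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:  # visit each node only once
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None

# Hypothetical co-appearance graph (adjacency lists).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],
}
print(degrees_of_separation(graph, "A", "E"))
```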

Day 36 | February 18, 2020 | Tuesday

Today's Progress: Today I continued the Taming Big Data with Apache Spark and Python - Hands On! course on Udemy.

Description: This includes the following:

  • Spark Basics and Simple Examples
    • Intro to Spark
    • Resilient Distributed Dataset (RDD)
    • Key/Value RDDs
      • Example: Average Friends by Age
    • Filtering RDDs
      • Example: Minimum Temperature by Location
      • Example: Maximum Temperature by Location
    • Map vs FlatMap
      • Example: Word Count
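
The difference between the two operations is easiest to see on a couple of lines of text. The sketch below is a plain-Python analogue, not PySpark code: a list comprehension stands in for `map` and `itertools.chain.from_iterable` stands in for `flatMap`.

```python
from collections import Counter
from itertools import chain

lines = ["to be or", "not to be"]

# map-like: exactly one output element per input line (a list of words).
mapped = [line.split() for line in lines]

# flatMap-like: each line yields many elements, flattened into one sequence.
flat = list(chain.from_iterable(line.split() for line in lines))

# Word count over the flattened words.
word_counts = Counter(flat)

print(mapped)
print(flat)
print(word_counts.most_common(2))
```

With `map` the result keeps one entry per line, so counting words would require another flattening step; `flatMap` produces the word-level collection directly.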

Day 35 | February 17, 2020 | Monday

Today's Progress: Today I enrolled in Taming Big Data with Apache Spark and Python - Hands On! course on udemy.

Description: This includes the following:

  • Getting Started with Spark
    • Introduction to course
    • Setting up the Environment
    • Running First Spark Program
      • Ratings Histogram for MovieLens Movie Rating Dataset

Important Links:

Day 34 | February 16, 2020 | Sunday

Today's Progress: Today I learned to use the normal distribution as an approximation of the binomial distribution, when appropriate.

Description: This includes the following:

  • Normal Random Variables
    • Applications
      • Approximation to Binomial
      • Continuity Correction
  • Wrap-Up: Random Variable
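
The normal approximation with continuity correction can be checked against the exact binomial probability. A minimal sketch using only the standard library; the values of n, p, and k are arbitrary examples.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with a normal curve and continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    # The +0.5 covers the whole histogram bar for the integer value k.
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.5, 55
print(binomial_cdf(k, n, p), normal_approx_cdf(k, n, p))
```

The approximation is usually considered appropriate when both `n * p` and `n * (1 - p)` are reasonably large (a common rule of thumb is at least 10).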

Day 33 | February 15, 2020 | Saturday

Today's Progress: Today I learned to explain how a density function is used to find probabilities involving continuous random variables. I also learned to find probabilities associated with the normal distribution.

Description: This includes the following:

  • Continuous Random Variables
    • Probability Distribution
      • Discrete Random Variable
  • Normal Random Variables
    • Standard Deviation Rule
    • Standardizing Values
    • Standard Normal Table
      • Introduction
      • Finding z value
      • Working with Non-standard Normal Values
      • Finding X value
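
Standardizing values and looking up probabilities can be done in code instead of with a printed table. The sketch below uses `statistics.NormalDist`; the mean of 100 and standard deviation of 15 are made-up example parameters.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()  # mean 0, standard deviation 1

# Standard Deviation Rule: share of values within 1 and 2 SDs of the mean.
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_two_sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Finding a z value for a given probability, then converting back to X.
z_975 = std_normal.inv_cdf(0.975)
x_975 = 100 + z_975 * 15  # hypothetical non-standard normal: mu=100, sigma=15

print(z_score(130, 100, 15), within_one_sd, within_two_sd, z_975, x_975)
```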

Day 32 | February 14, 2020 | Friday

Today's Progress: Today I learned to fit the binomial model when appropriate, and use it to perform simple calculations.

Description: This includes the following:

  • Binomial Random Variables
    • Binomial Experiment
    • Probability Distribution
    • Mean and Standard Deviation
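
The binomial probability distribution, mean, and standard deviation above can be written out directly. A small sketch with illustrative values of n and p:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean(n, p):
    return n * p

def binomial_sd(n, p):
    return math.sqrt(n * p * (1 - p))

# Example: number of heads in 4 fair coin flips.
n, p = 4, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(distribution)
print(binomial_mean(n, p), binomial_sd(n, p))
```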

Day 31 | February 13, 2020 | Thursday

Today's Progress: Today I learned how to find the mean and variance of a discrete random variable, and apply these concepts to solve real-world problems. I also learned to apply the rules of means and variances to find the mean and variance of a linear transformation of a random variable and the sum of two independent random variables.

Description: This includes the following:

  • Discrete Random Variables
    • Mean and Variance
      • Introduction
      • Applications
    • Standard Deviation
    • Rules for Mean and Variances
      • Add, Subtract, Multiplication by Constant
      • Linear Transformation
      • Sum of Two Variables
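
The rules above can be verified on a fair die. The sketch uses exact fractions so the results match the hand calculations; the transformation 2X + 3 is an arbitrary example.

```python
from fractions import Fraction

def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Probability-weighted average squared distance from the mean."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}

# Linear transformation Y = 2X + 3: mean becomes 2*E[X] + 3, variance 4*Var[X].
transformed = {2 * x + 3: p for x, p in die.items()}

# Sum of two independent dice: means add and variances add.
two_dice = {}
for x, px in die.items():
    for y, py in die.items():
        two_dice[x + y] = two_dice.get(x + y, 0) + px * py

print(mean(die), variance(die))
print(mean(transformed), variance(transformed))
print(mean(two_dice), variance(two_dice))
```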

Day 30 | February 12, 2020 | Wednesday

Today's Progress: Today I continued on the first course of the Tensorflow: Data and Deployment Specialization.

Description: This includes the following:

  • Week 4
    • Train a model in your web browser by using images captured via a webcam
    • Apply transfer learning to train a model to recognize hand gestures of rock, paper, and scissors
    • Apply transfer learning to train a model to recognize hand gestures of rock, paper, scissors, lizard, and spock

Day 29 | February 11, 2020 | Tuesday

Today's Progress: Today I continued on the first course of the Tensorflow: Data and Deployment Specialization.

Description: This includes the following:

  • Week 3
    • Use a toxicity model to determine if a phrase is toxic in a number of categories
    • Use Mobilenet to detect objects in images
    • Use the tensorflow.js converter to convert a Keras model to JSON format

Day 28 | February 10, 2020 | Monday

Today's Progress: Today I started the first course of the Tensorflow: Data and Deployment Specialization.

Description: This includes the following:

  • Week 1
    • Use TensorFlow.js to build and train simple machine learning models in JavaScript
    • Use Web Server for Chrome to serve web pages from a local folder over the network using HTTP.
    • Describe the key characteristics of one-hot encoding
    • Use TensorFlow.js to load data from a CSV file
  • Week 2
    • Use tf-vis to visualize the output of callbacks
    • Use a convolutional neural network to build a handwriting classifier
    • Use a sprite sheet to train a classifier

Important Links:

Day 27 | February 9, 2020 | Sunday

Today's Progress: Today I learned to distinguish between discrete and continuous random variables. Also learned to find the probability distribution of discrete random variables, and use it to find the probability of events of interest.

Description: This includes the following:

  • Discrete Random Variables
    • Random Variables
      • Introduction
      • Count vs Measure
  • Probability Distribution
    • Table of Outcomes
    • Probability Histograms
    • Applications
    • Using Conditional Probability

Day 26 | February 8, 2020 | Saturday

Today's Progress: Today I learned to use the General Multiplication Rule to find the probability that two events occur (P(A and B)) and to use probability trees as a tool for finding probabilities.

Description: This includes the following:

  • Multiplication Rule
    • General Multiplication Rule
      • Definition
      • Applications
  • Probability Trees
    • Definition
    • Applications
    • Other Methods
  • Wrap-Up: Conditional Probability and Independence

Day 25 | February 7, 2020 | Friday

Today's Progress: Today I learned about the reasoning behind conditional probability, and how this reasoning is expressed by the definition of conditional probability. Also learned to find conditional probabilities and interpret them, and determine whether two events are independent or not.

Description: This includes the following:

  • Conditional Probability
    • Reasoning
    • Definition
  • Independence Check
    • Compare P(B | A) and P(B)
    • Other Methods
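
The independence check above, comparing P(B | A) with P(B), can be illustrated with a single roll of a fair die. The events are made-up examples; exact fractions avoid rounding issues in the comparison.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(b, a):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)

even = {2, 4, 6}
greater_than_4 = {5, 6}
at_least_4 = {4, 5, 6}

# P(B | A) equals P(B): the events are independent.
print(conditional(greater_than_4, even), prob(greater_than_4))
# P(B | A) differs from P(B): the events are dependent.
print(conditional(at_least_4, even), prob(at_least_4))
```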

Day 24 | February 6, 2020 | Thursday

Today's Progress: Today I learned how to apply probability rules in order to find the likelihood of an event. I also learned to use tools such as Venn diagrams or probability tables as aids for finding probabilities, when appropriate.

Description: This includes the following:

  • Probability Rules
    • Range and Sum Rules
    • Complement Rule
    • Disjoint Events
    • Addition Rule for Disjoint Events
    • P(A and B) for Independent Events
    • Multiplication Rule for Independent Events
    • Extensions
    • At Least One of...
    • General Addition Rule
    • Probability Tables
      • Solving Problems
  • Wrap-Up: Finding Probability of Events

Day 23 | February 5, 2020 | Wednesday

Today's Progress: Today I learned how to determine the sample space of a given random experiment. I also learned to find the probability of events in the case in which all outcomes are equally likely.

Description: This includes the following:

  • Probability of Events
  • Sample Spaces
    • Random Experiments
  • Events of Interest
  • Equally Likely Outcomes
    • Overview
    • Examples

Day 22 | February 4, 2020 | Tuesday

Today's Progress: Today I learned how to relate the probability of an event to the likelihood of this event occurring. I also learned how relative frequency can be used to estimate the probability of an event.

Description: This includes the following:

  • Empirical Methods for Determining Probability
  • Verifying Classical Probability
  • Relative Frequency
    • Definition
    • Law of Large Numbers
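
The Law of Large Numbers can be seen in a quick simulation: the relative frequency of heads settles near the true probability as the number of flips grows. A minimal sketch with a seeded generator so that runs are repeatable; the seed and flip counts are arbitrary.

```python
import random

def relative_frequency(n_flips, p_heads=0.5, seed=42):
    """Fraction of simulated coin flips that come up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```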

Day 21 | February 3, 2020 | Monday

Today's Progress: Today I learned how to relate the probability of an event to the likelihood of this event occurring.

Description: This includes the following:

  • Probability
    • Introduction
      • The Bigger Picture
      • Intuition
      • Formal Definition
  • Determining Probability
    • Theoretical/Classical
    • Empirical/Observational

Day 20 | February 2, 2020 | Sunday

Today's Progress: Today I learned how to identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind etc.).

Description: This includes the following:

  • Experiments: More than One Explanatory Variable
    • Modification to Randomization
  • Wrap-Up: Designing Studies
  • Summary: Producing Data

Day 19 | February 1, 2020 | Saturday

Today's Progress: Today I learned how to identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind etc.).

Description: This includes the following:

  • Experiments: One Explanatory Variable
    • Causation and Experiments
      • Randomized Controlled Experiments
      • Inclusion of a Control Group
    • Blind and Double-Blind Experiments
    • Pitfalls

Day 18 | January 31, 2020 | Friday

Today's Progress: Today I learned how the study design impacts the types of conclusions that can be drawn. Also learned to determine how the features of a survey impact the collected data and the accuracy of the data.

Description: This includes the following:

  • Observational Studies
    • Causation and Observational Studies
      • Lurking Variables
      • Other Pitfalls
    • Design Issues
    • Summary

Day 17 | January 30, 2020 | Thursday

Today's Progress: Today I learned to use Convolutions on top of DNNs and RNNs and then put it all together using a real-world data series -- one which measures sunspot activity over hundreds of years.

Description: This includes the following:

  • Week 4: Real-World Time Series Data
    • Convolutions
    • Bi-Directional LSTMs
    • Batch Sizing
    • Training and Tuning
    • Prediction

Important Links:

Day 16 | January 29, 2020 | Wednesday

Today's Progress: Having explored time series and some of their common attributes, such as trend and seasonality, and having used statistical methods for projection, today I learned to use neural networks to recognize and predict on time series. I also learned that Recurrent Neural Networks and Long Short-Term Memory networks are really useful for classifying and predicting on sequential data.

Description: This includes the following:

  • Week 2: Deep Neural Networks for Time Series
    • Data Preparation
    • Sequence Bias
    • Feeding Windowed Data to Neural Network
    • Prediction
  • Week 3: Recurrent Neural Network for Time Series
    • Lambda Layers
    • Dynamically adjusting Learning Rate
    • Huber Loss
    • RNN
    • LSTM
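
The Huber loss mentioned above is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than squared error. A scalar sketch of the standard definition; the `delta` threshold values are illustrative.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss for a single prediction."""
    error = abs(y_true - y_pred)
    if error <= delta:
        return 0.5 * error**2              # quadratic region, like MSE
    return delta * (error - 0.5 * delta)   # linear region, like MAE

print(huber_loss(0.0, 0.5))             # small error
print(huber_loss(0.0, 3.0))             # large error, default delta
print(huber_loss(0.0, 3.0, delta=2.0))  # large error, wider quadratic region
```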

Day 15 | January 28, 2020 | Tuesday

Today's Progress: Today I learned about the nature of time series data and saw some of its more common attributes, including seasonality and trend. I also looked at some statistical methods for predicting time series data.

Description: This includes the following:

  • Week 1: Sequences and Prediction
    • Introduction
    • Common Patterns
      • Trend
      • Seasonality
      • White Noise
      • Autocorrelation
      • Impulses
    • Metrics for Evaluation
      • MSE
      • RMSE
      • MAE
    • Moving Average and Differencing
    • Trailing vs Centered Windows
    • Forecasting
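
The evaluation metrics and the trailing moving average above are short enough to write by hand. A plain-Python sketch with made-up numbers:

```python
import math

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def trailing_moving_average(series, window):
    """Average of the last `window` values, one result per full window."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

actual, predicted = [3, 5, 7], [2, 5, 9]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
print(trailing_moving_average([1, 2, 3, 4, 5], window=3))
```

MSE and RMSE penalize large errors more heavily than MAE because the errors are squared before averaging.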

Important Links:

Day 14 | January 27, 2020 | Monday

Today's Progress: Today I learned to use TensorFlow for various Natural Language Processing problems.

Description: This includes the following:

  • Sentiment in Text
    • Text to Sequence
    • Tokenizer
    • Padding
  • Word Embeddings
    • Introduction
    • Vectors
    • Loss Function
    • Pre-Tokenized Datasets
  • Sequence Models
    • LSTMs
    • Accuracy and Loss
    • Convolutional Networks
  • Sequence Models and Literature
    • Subword Tokenization
    • Text Generation
    • Shakespearean Poetry Generation

Important Links:

Day 13 | January 26, 2020 | Sunday

Today's Progress: Today I learned to identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind etc.).

Description: This includes the following:

  • Producing Data: Designing Studies
    • Introduction
    • Types of Studies
      • Experimental Studies
      • Observational Studies
        • Prospective
        • Retrospective

Day 12 | January 25, 2020 | Saturday

Today's Progress: Today I learned various techniques by which one can choose a sample of individuals from an entire population to collect data from. This is seemingly a simple step in the big picture of statistics, but it turns out that it has a crucial effect on the conclusions we can draw from the sample about the entire population.

Description: This includes the following:

  • Producing Data: Sampling
  • Types of Samples
    • Volunteer Sample
    • Convenience Sample
    • Sampling Frame
    • Systematic Sampling
  • Probability Sampling Plans
    • Simple Random Sampling
    • Cluster Sampling
    • Stratified Sampling
  • Wrap-Up: Sampling

Day 11 | January 24, 2020 | Friday

Today's Progress: Today I summarized how to explore the relationship between the explanatory and response variables using visual displays and numerical measures, and how to choose what kind of measure to use based on the role-type classification of the two variables. I also emphasized how important it is to interpret any observed association in the context of the problem, but NOT to be tempted to interpret association as causation, due to the possible presence of lurking variables.

Description: This includes the following:

  • Wrap-Up: Examining Relationships
  • Summary: EDA

Day 10 | January 23, 2020 | Thursday

Today's Progress: Today I learned how to recognize the distinction between association and causation, and identify potential lurking variables for explaining an observed relationship. Association does not imply causation!

Description: This includes the following:

  • Causation and Lurking Variables
    • Introduction
    • Confounds
    • Simpson's Paradox

Day 09 | January 22, 2020 | Wednesday

Today's Progress:

Description: This includes the following:

  • AWS Builders Online Series
    • Introductory Guide to AWS Cost Management and Efficiency
    • Move Fast & Be Secure on AWS Cloud
    • AWS Purpose-Built Database Strategy: The Right Tool for The Right Job
    • Host your Static Website on Amazon Simple Storage Service (S3)
    • Building Serverless Applications that Scale
  • Project Management Professional Certification: Introduction

Day 08 | January 21, 2020 | Tuesday

Today's Progress: Today I learned that a special case of the relationship between two quantitative variables is the linear relationship. In this case, a straight line simply and adequately summarizes the relationship. When the scatterplot displays a linear relationship, we supplement it with the correlation coefficient (r). The least-squares regression line has the smallest sum of squared vertical deviations of the data points from the line. Extrapolation is the prediction of the response variable for values of the explanatory variable that fall outside the range of the data.

Description: Following topics were covered:

  • Case Q → Q: Linear Relationships
    • Introduction
    • Correlation
      • R - Coefficient of Correlation
      • Properties of R
    • Regression
    • Least Squares Regression
    • Intercept and Slope
    • Predictions
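
The correlation coefficient and the least-squares line above can be computed from three sums. A small sketch with made-up data that lies exactly on the line y = 2x + 1:

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept, r = least_squares(xs, ys)
print(slope, intercept, r)
print("prediction at x = 2.5:", intercept + slope * 2.5)
```

Predicting at x = 2.5 stays inside the observed range of x; using the same line at, say, x = 40 would be extrapolation.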

Day 07 | January 20, 2020 | Monday

Today's Progress: Today I learned how to graphically display the relationship between two quantitative variables and describe: a) the overall pattern and b) striking deviations from the pattern.

Description: Following topics were covered:

  • Case Q → Q
    • Two Quantitative Variables
    • Scatterplots
      • Introduction
      • Interpretation
      • Examples
      • Labeled
      • Exercises

Day 06 | January 19, 2020 | Sunday

Today's Progress: Today I learned about the C → C relationship between two categorical variables. I also learned to build a two-way table and interpret the information stored in it about the association between two categorical variables by comparing conditional percentages.

Description: Following topics were covered:

  • Case C → C
    • Two Categorical Variables
    • Conditional Percents
    • Exercises

Day 05 | January 18, 2020 | Saturday

Today's Progress: Today I learned about how to examine relationships between 2 variables using visual displays and numerical summaries.

Description: This includes the following topics:

  • EDA: Examining Relationships
  • Exploring Two Variables: Explanatory and Response
  • Role-Type Classification
  • Case C → Q
    • Introduction
    • Applications

Day 04 | January 17, 2020 | Friday

Today's Progress: Today I learned that the range covered by the data is the most intuitive measure of spread and is exactly the distance between the smallest data point (min) and the largest one (Max). Another measure of spread is the inter-quartile range (IQR), which is the range covered by the middle 50% of the data. The IQR can be used to detect outliers using the 1.5(IQR) criterion. Outliers are observations that fall below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). The five-number summary of distribution consists of the median (M), the two quartiles (Q1, Q3) and the extremes (Min, Max). The standard deviation measures the spread by reporting a typical (average) distance between the data points and their average.

Description: This includes the following topics:

  • One Quantitative Variable: Measure of Spread
    • Range
    • Inter-Quartile Range
      • Using IQR to detect outliers
    • Outliers
      • Identification
      • Understanding
      • Handling
    • Boxplots
    • Standard Deviation
      • Idea
      • Notation
      • Calculation
      • Properties
      • Standard Deviation Rule
  • WrapUp: EDA
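
The 1.5(IQR) criterion described above can be sketched as follows. Quartile conventions differ between textbooks and libraries, so the choice of `method="inclusive"` is an assumption and results near the fences may not match a hand calculation exactly; the data are made up.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(data))
```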

Day 03 | January 16, 2020 | Thursday

Today's Progress: Learned how to quantify the center and spread of distribution with various numerical measures, some of the properties of those numerical measures; and how to choose the appropriate numerical measures of center and spread to supplement the histogram.

Description: This includes the following:

  • One Quantitative Variable: Measures of Center
    • Introduction
    • Mode
    • Median
    • Mean
    • Comparison b/w Mean and Median

Day 02 | January 15, 2020 | Wednesday

Today's Progress: Learned about a single quantitative variable and how to represent it using a histogram and a stemplot. I also learned how to interpret these graphs for further insights.

Description: This includes the following:

  • One Quantitative Variable: Graphs
    • Introduction
    • Histogram
      • Intervals
      • Shape
      • Center, Spread, and Outliers
    • Stemplot

Day 01 | January 14, 2020 | Tuesday

Today's Progress: Got a formal intro to statistics, learned about Exploratory Data Analysis and One Categorical Variable

Description: This includes the following:

  • Introduction to Statistics
  • Exploratory Data Analysis Overview
    • Data and Variables
    • Scales of Measurement
    • Examining Distributions
  • One Categorical Variable
    • Frequency Distributions
    • Pie and Bar Charts
    • Pictograms

Pilot | January 13, 2020 | Monday

Today's Progress:

  • Setup Repository for #thepersonalmsds
  • Created template for Social Media
  • Enrolled in Stanford University's Probability & Statistics Course

Description: None

Important Links: Stanford | Probability & Statistics