
Linear Regression ⭐⭐

Directory Structure 📁

│   collinear_dataset.py     
│   compare_time.py
│   contour_plot.gif
│   degreevstheta.py
│   gif1.gif
│   gif2.gif
│   linear_regression_test.py
│   line_plot.gif
│   Makefile
│   metrics.py
│   Normal_regression.py     
│   plot_contour.py
│   poly_features_test.py    
│   README.md
│   surface_plot.gif
│
├───images
│       q5plot.png
│       q6plot.png
│       q8features.png       
│       q8samples.png
│
├───linearRegression
│   │   linearRegression.py
│   │   __init__.py
│   │
│   └───__pycache__
│           linearRegression.cpython-37.pyc
│           __init__.cpython-37.pyc
│
├───preprocessing
│   │   polynomial_features.py
│   │   __init__.py
│   │
│   └───__pycache__
│           polynomial_features.cpython-37.pyc
│           __init__.cpython-37.pyc
│
├───temp_images
└───__pycache__
        metrics.cpython-37.pyc

Instructions to run 🏃

make help
make regression
make polynomial_features
make normal_regression
make poly_theta
make contour
make compare_time
make collinear

Stochastic GD (Batch size = 1) ☝️

  • Learning rate type = constant → RMSE: 0.9119624181584616, MAE: 0.7126923090787688

  • Learning rate type = inverse → RMSE: 0.9049599308106121, MAE: 0.7098334683036919

Vanilla GD (Batch size = N) ✋

  • Learning rate type = constant → RMSE: 0.9069295672718122, MAE: 0.7108301179089876

  • Learning rate type = inverse → RMSE: 0.9607329070540364, MAE: 0.7641616657610887

Mini Batch GD (Batch size between 1 and N; here 5) 🤘

  • Learning rate type = constant → RMSE: 0.9046502501334435, MAE: 0.7102161700019564

  • Learning rate type = inverse → RMSE: 0.9268357442221973, MAE: 0.7309246821952116
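
The three variants above differ only in how many samples are used for each parameter update: 1 for stochastic GD, all N for vanilla GD, and a small batch (here 5) for mini-batch GD. A minimal NumPy sketch of this idea (illustrative only; the repository's linearRegression.py may use a different interface) is:

```python
import numpy as np

def fit_gd(X, y, batch_size, n_iter=100, lr=0.01, lr_type="constant"):
    """(Mini-)batch gradient descent for linear regression.

    batch_size = 1 gives stochastic GD, batch_size = len(X) gives vanilla GD.
    lr_type 'inverse' decays the learning rate as lr / t over updates.
    """
    X = np.column_stack([np.ones(len(X)), X])   # prepend a bias column
    theta = np.zeros(X.shape[1])
    t = 1
    for _ in range(n_iter):
        order = np.random.permutation(len(X))   # reshuffle samples every pass
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ theta - yb) / len(batch)   # MSE gradient
            step = lr / t if lr_type == "inverse" else lr
            theta -= step * grad
            t += 1
    return theta
```

With batch_size=1, batch_size=len(X), or batch_size=5 this corresponds to the three settings reported above.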

Polynomial Feature Transformation 🔰

  • The output for [[1, 2]] is [[1, 1, 2, 1, 2, 4]]

  • The output for [[1, 2, 3]] is [[1, 1, 2, 3, 1, 2, 3, 4, 6, 9]]

  • The outputs are similar to those of sklearn's PolynomialFeatures fit_transform (see the sketch below)
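
The transformation maps each sample to every monomial of its features up to the given degree, including the constant bias term. A self-contained sketch of this behaviour (illustrative only; the repository's preprocessing/polynomial_features.py may be implemented differently):

```python
from itertools import combinations_with_replacement
import numpy as np

def polynomial_features(X, degree=2):
    """Return all monomials of the input features up to `degree`,
    including the constant 1 (bias) term."""
    X = np.asarray(X)
    n_samples, n_features = X.shape
    columns = []
    for deg in range(degree + 1):
        for combo in combinations_with_replacement(range(n_features), deg):
            # empty combo (deg = 0) produces the constant column of ones
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)

print(polynomial_features([[1, 2]], degree=2))      # [[1. 1. 2. 1. 2. 4.]]
print(polynomial_features([[1, 2, 3]], degree=2))   # [[1. 1. 2. 3. 1. 2. 3. 4. 6. 9.]]
```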

Theta vs degree 📈

(figure: norm of theta vs polynomial degree)

  • Conclusion - As the degree of the polynomial increases, the norm of theta increases because of overfitting.

L2 Norm of Theta vs Degree of Polynomial for varying Sample size 📈

(figure: L2 norm of theta vs polynomial degree for varying sample sizes)

Conclusion

  • As the degree increases, the magnitude of theta increases due to overfitting of the data.
  • But at the same degree, the magnitude of theta decreases as the number of samples increases, because more samples reduce the overfitting to some extent (a small experiment below illustrates both trends).
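
Both trends can be reproduced with a quick, self-contained experiment. The snippet below uses np.polyfit as a stand-in for the repository's own polynomial-regression pipeline and a made-up noisy quadratic dataset, so the exact numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# L2 norm of the fitted coefficients for increasing degree and sample size
for n_samples in (20, 100, 500):
    for degree in (1, 3, 5, 7, 9):
        x = rng.uniform(-1, 1, n_samples)
        y = x**2 + 0.1 * rng.normal(size=n_samples)   # noisy quadratic data
        theta = np.polyfit(x, y, degree)               # least-squares polynomial fit
        print(n_samples, degree, np.linalg.norm(theta))
```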

Linear Regression line fit 🔥

(figure: regression line fit)

Linear Regression Surface plot 🔥

(figure: fitted regression surface)

Linear Regression Contour plot 🔥

(figure: contour plot of the fit)

Time Complexities ⏳

  • Theoretical time complexity of the normal equation solution is O(D^2·N + D^3) (see the sketch below)
  • Theoretical time complexity of gradient descent is O((t + N)·D^2), where t is the number of iterations
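
The D^3 term comes from solving the D×D system X^T X θ = X^T y, and the D^2·N term from forming X^T X. A minimal sketch of the closed-form solve (the repository's Normal_regression.py may differ in details such as using a pseudo-inverse):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^(-1) X^T y."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend a bias column
    XtX = X.T @ X                                # O(N * D^2)
    return np.linalg.solve(XtX, X.T @ y)         # O(D^3)
```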

Time vs Number of Features ⏳📊

(figure: fitting time vs number of features)

When the number of samples is kept constant, the normal-equation solution takes more time because it has a D^3 factor, whereas gradient descent only has a D^2 factor in its time complexity.

Time vs Number of Samples ⏳📊

(figure: fitting time vs number of samples)

When the number of features is kept constant and the number of samples is varied, the time for the normal equation is still higher than for gradient descent because of its greater computational cost (a rough timing sketch follows below).
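
Both timing experiments can be approximated with a small self-contained script; the function name and random data here are illustrative and not taken from the repository's compare_time.py:

```python
import time
import numpy as np

def time_fit(n_samples, n_features, n_iter=100, lr=0.01):
    """Rough timing of the normal equation vs. plain gradient descent."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_samples, n_features))
    y = rng.normal(size=n_samples)

    t0 = time.perf_counter()
    np.linalg.solve(X.T @ X, X.T @ y)             # normal equation
    t_normal = time.perf_counter() - t0

    theta = np.zeros(n_features)
    t0 = time.perf_counter()
    for _ in range(n_iter):                       # vanilla gradient descent
        theta -= lr * 2 * X.T @ (X @ theta - y) / n_samples
    t_gd = time.perf_counter() - t0
    return t_normal, t_gd

print(time_fit(n_samples=1000, n_features=500))   # vary either argument to reproduce the plots
```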

Multicollinearity in Dataset ❗ ❗

  • The gradient descent implementation still works in the presence of multicollinearity (see the sketch below).
  • But as the multiplication factor increases, the RMSE and MAE values shoot up sharply.
  • It also reduces the precision of the estimated coefficients.
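
A collinear dataset of this kind can be built by appending a column that is a multiple of an existing feature; this mirrors what collinear_dataset.py presumably does, but the snippet below is only an illustrative sketch with made-up coefficients. As the multiplication factor grows, X^T X becomes increasingly ill-conditioned, which is why the coefficients lose precision and the errors blow up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, factor = 100, 1000                          # increase the factor to see the effect
X = rng.normal(size=(N, 3))
X = np.column_stack([X, factor * X[:, 0]])     # new column collinear with column 0
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=N)

# The condition number of X^T X grows with the factor, degrading coefficient precision
print(np.linalg.cond(X.T @ X))
```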