Hello, World! I am the DataByte. This repo hosts the R scripts that I walk through in my linear regression series on YouTube.
Check out my video Linear Regression with Excel, Python & R to hear more about these topics:
In my YouTube video, I give a short mathematical explanation of how linear regression trendlines are calculated with the least squares method.
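As a rough sketch of the least squares idea, the slope and intercept of a trendline can be computed directly from the sample means. The x and y vectors here are made-up illustration data, not the data from the video.

```r
# Hypothetical example data (not from the video)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares slope: sum of cross-deviations divided by
# the sum of squared deviations of x
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# The fitted line always passes through (mean(x), mean(y)),
# which pins down the intercept
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)
```

These are the same two coefficients that a fitted linear model reports for a single predictor.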
The R² value (the coefficient of determination) helps you determine whether your data exhibits linear behavior. R² values range between 0 and 1, with 1 indicating a perfect linear trend. Values between 0.7 and 1 indicate that your data has a linear trend; the closer to 1, the better. You probably don't want to use linear trendlines with R² values less than 0.7 to make predictions.
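To make the definition concrete, here is a minimal sketch that computes R² by hand as one minus the ratio of the residual sum of squares to the total sum of squares. It uses R's built-in lm() to get the fitted line, and the x and y vectors are made-up illustration data.

```r
# Hypothetical example data (not from the video)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit the least squares line
fit <- lm(y ~ x)

# Residual sum of squares: variation the line does not explain
ss_res <- sum(residuals(fit)^2)

# Total sum of squares: variation of y around its mean
ss_tot <- sum((y - mean(y))^2)

# Coefficient of determination
r_squared <- 1 - ss_res / ss_tot
r_squared
```

The result should match the value R reports in summary(fit)$r.squared.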
Linear regression in R is easy using the lm() function.
linearmodel <- lm(y ~ x)
Calling summary() will give you a description of your model.
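Here is a slightly fuller sketch of that workflow, assuming the data lives in a data frame. The data frame df and its values are hypothetical; swap in your own columns.

```r
# Hypothetical example data (not from the video)
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Fit y as a linear function of x
linearmodel <- lm(y ~ x, data = df)

# Full description: coefficients, residuals, R-squared, and more
model_summary <- summary(linearmodel)
print(model_summary)

# Pull out individual pieces when you need them
coefficients <- coef(linearmodel)      # intercept and slope
r_squared <- model_summary$r.squared   # coefficient of determination

# Predict y for a new x value using the fitted line
prediction <- predict(linearmodel, newdata = data.frame(x = 6))
```

Passing data = df keeps the formula readable and lets predict() match new values to the x column by name.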
Linear regression can help you capture the general trend of your data, but if the R² value is less than 0.7, it's not going to be much good for making predictions.
Sometimes, you may be able to find a subset of your dataset that still exhibits strongly linear characteristics, like Tesla stock's closing prices between March 14th and April 1st, 2022. It's an interesting observation, but if you told anyone you were trading off a linear model, they'd probably think you were crazy!
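One way to explore this in R is the subset argument of lm(), which fits the model on just a window of rows. The sketch below uses synthetic numbers built for illustration (not real Tesla prices), and the prices data frame, its column names, and the day range are all hypothetical.

```r
# Synthetic illustration data: a wavy series with a steady climb on days 11 to 20
day <- 1:30
price <- 100 + 10 * sin(day)
in_window <- day >= 11 & day <= 20
price[in_window] <- 100 + 2 * day[in_window] + 0.5 * (-1)^day[in_window]
prices <- data.frame(day = day, price = price)

# Fit on the whole series
fit_full <- lm(price ~ day, data = prices)

# Fit only on the window that looks linear
fit_window <- lm(price ~ day, data = prices, subset = day >= 11 & day <= 20)

# Compare how well a straight line describes each
r2_full <- summary(fit_full)$r.squared
r2_window <- summary(fit_window)$r.squared
c(full = r2_full, window = r2_window)
```

A high R² on a hand-picked window says the line describes that window well, not that the trend will continue.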
It is easy to find sources online to learn more about linear regression. Here are some great references that I used.
Miller, I., Freund, J. E., & Johnson, R. A. (2005). Miller and Freund's Probability and statistics for engineers. Upper Saddle River, NJ: Prentice Hall.
If you want to hear more from me, check out my Hello, World! video on YouTube and subscribe! I'm constantly working on new content :)