practical-time-series-analysis_COURSE_NOTES.txt

Linear regression error assumptions:
	1- normally distributed
	2- mean of zero
	3- all errors have the same variance (they are homoscedastic)
	4- unrelated to each other (they are independent across observations) (we could not predict the sign of the next error given the sign of the previous error (amir))

residuals = resid(lm_model)
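
A minimal sketch of where lm_model and its residuals could come from, plus two quick checks on the assumptions listed above. The data are simulated and the names x and y are placeholders for illustration only; they are not from the course.

```r
# Simulated data: x, y and the coefficients are assumptions for illustration.
set.seed(1)
x <- 1:100
y <- 2 + 0.5 * x + rnorm(100, sd = 3)   # errors are iid normal by construction

lm_model <- lm(y ~ x)
residuals <- resid(lm_model)

# With an intercept in the model, least squares forces the residual mean to zero.
mean_resid <- mean(residuals)

# Rough check of equal variance: compare residual spread in the two halves of x.
sd_first_half  <- sd(residuals[x <= 50])
sd_second_half <- sd(residuals[x > 50])

# Residuals vs fitted values: a funnel shape would suggest unequal variance.
plot(fitted(lm_model), residuals, main = "Residuals vs fitted")
abline(h = 0, lty = 2)
```

A residual mean of zero is guaranteed by the fit itself, so it says nothing about the true errors; the spread comparison and the plot are the informative parts.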

When we look at a set of plots of the residuals, we'll be able to assess normality. If you have an abundance of data, say hundreds of data points, a histogram is a valid approach for looking at the structure of your data. If you only have 10 or 15 data points, a histogram is not the best way to go; we could probably do better. In particular, if we're assessing normality, we can do what's called a normal probability plot (qqnorm(resid(lm_model)); qqline(resid(lm_model))). If our residuals are normally distributed, we would expect to see most of our data looking essentially linear, like a straight line you could fit on this plot. Here, I see some systematic departures in the lower and upper tails, and so it lets me question the normal assumption a bit.

Digging a little deeper, what a normal probability plot does is ask: if I have a data set with a certain number of points, and especially if I standardize (subtract off the mean and divide by the standard deviation), where would I expect to see the first (smallest) residual? Where would I expect to see the second residual, and so on through the last one, in a data set of that size? I can pair that with where I actually have my first residual, the first data point that I'm assessing normality on. And where really was the second: is it where you expected it if it was coming from a normal distribution, or are you systematically away? We would expect to see a little scatter in [INAUDIBLE], but here this seems more systematic than random. So the residuals here seem to be roughly, but not exactly, normally distributed. Some of our procedures are robust to violations of that assumption, and so one might not worry too much in a large data set with a plot like this.

Our data points are systematically above the straight line on the left and on the right; that's the curvature we were talking about. It's hard on such a small, tight plot to get a sense of the oscillatory nature of our data.
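
The pairing of "where I expected each residual" with "where it actually is" can be sketched by hand. This is only an illustration on simulated residuals (res is a made-up sample, not the course data); it uses ppoints(), which is what qqnorm() uses internally for the plotting positions.

```r
set.seed(2)
res <- rnorm(15)                    # hypothetical residuals, small sample

n <- length(res)
z <- (res - mean(res)) / sd(res)    # standardize: subtract mean, divide by sd
observed <- sort(z)                 # first, second, ..., last residual
expected <- qnorm(ppoints(n))       # where a normal sample of size n should fall

# Points close to the reference line support the normality assumption.
plot(expected, observed, main = "Normal probability plot by hand")
abline(0, 1)

# qqnorm() computes the same pairs (returned in the original data order).
qq <- qqnorm(z, plot.it = FALSE)
```

Systematic bending away from the line in both tails, rather than random scatter around it, is the pattern that would make one question normality.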
So what I've done on the next plot is to zoom in on our residuals. Instead of looking across a few decades, now we're just looking across a few years (plot(residuals ~ time(dependent_var), xlim=c(1960, 1963), main="Zoomed Residuals")). At this point, it would be very hard to convince anybody that your residuals were independent. There's an apparent time structure in your residuals. From a linear regression standpoint, that might not be desirable, but we're plotting time series data. For us, the structure of these residuals is actually quite interesting.
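
Besides zooming in, the time structure in residuals can be checked numerically. A rough sketch on a simulated monthly series (the series, its trend and its yearly cycle are assumptions for illustration, not the course data): fit a straight line, then look at how well one residual predicts the next.

```r
set.seed(3)
t_idx <- 1:120
# Hypothetical monthly series: linear trend + yearly cycle + noise.
series <- ts(0.1 * t_idx + 2 * sin(2 * pi * t_idx / 12) + rnorm(120, sd = 0.3),
             start = c(1960, 1), frequency = 12)

fit <- lm(series ~ time(series))
res <- as.numeric(resid(fit))

# Correlation between each residual and the previous one (lag 1).
lag1_cor <- cor(res[-1], res[-length(res)])

# Share of consecutive residuals with the same sign; about 0.5 if independent.
same_sign <- mean(sign(res[-1]) == sign(res[-length(res)]))

# Autocorrelation at all lags; a periodic pattern reveals the leftover cycle.
acf(res, main = "ACF of residuals")
```

If the sign of the next residual can be guessed from the current one much better than a coin flip, the independence assumption is not credible.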

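# df: 10 observations; extra: numeric column; group: discrete variable (1, 2)
# paired=TRUE requires both vectors to have the same length, in the same subject order
extra.1 = df$extra[df$group == 1]
extra.2 = df$extra[df$group == 2]
t.test(extra.1, extra.2, paired = TRUE, alternative = "two.sided")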
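
A paired t-test is a one-sample t-test on the within-pair differences. A small sketch with made-up numbers (the values below are hypothetical, chosen only to illustrate the equivalence):

```r
# Hypothetical measurements on the same six subjects under two conditions.
extra.1 <- c(1.2, 0.4, 2.1, 0.9, 1.5, 0.3)
extra.2 <- c(1.9, 1.1, 2.0, 1.8, 2.4, 0.9)

paired     <- t.test(extra.1, extra.2, paired = TRUE, alternative = "two.sided")
one_sample <- t.test(extra.1 - extra.2, mu = 0)

# Same statistic by hand: mean difference over its standard error.
d <- extra.1 - extra.2
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))
```

All three routes give the same t statistic, which is why the pairing (same subjects, same order) matters.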

alpha in confidence interval:
	P(Type I error); usually 0.05 (the confidence level is 1 - alpha, so usually 95%)
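
How alpha enters a confidence interval for a mean can be sketched by hand. The sample x below is hypothetical; the interval uses the t distribution with n - 1 degrees of freedom, which is also what t.test() reports.

```r
x <- c(5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.7, 4.8)   # hypothetical sample
alpha <- 0.05
n <- length(x)

# alpha is split between the two tails, hence 1 - alpha / 2.
t_crit <- qt(1 - alpha / 2, df = n - 1)
half_width <- t_crit * sd(x) / sqrt(n)
ci_by_hand <- c(mean(x) - half_width, mean(x) + half_width)

# Built-in version for comparison; conf.level is 1 - alpha.
ci_builtin <- t.test(x, conf.level = 1 - alpha)$conf.int
```

A smaller alpha gives a larger t_crit and therefore a wider interval.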

COV:
	mean(
		(xi - mean(x)) * (yi - mean(y))
		)
	# this divides by n (population covariance); R's cov() divides by n - 1
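
The covariance formula above, written out in R on made-up vectors (x and y are hypothetical) and compared with the built-in cov():

```r
x <- c(2, 4, 6, 8, 10)   # hypothetical data
y <- c(1, 3, 2, 5, 4)
n <- length(x)

# Mean of the products of deviations: divides by n.
cov_pop <- mean((x - mean(x)) * (y - mean(y)))

# Sample version: divides by n - 1, which is what cov() does.
cov_sample <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
```

The two versions differ only by the factor (n - 1) / n, which becomes negligible for large n.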

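COR:
	mean(
		((xi - mean(x)) / sd(x)) * ((yi - mean(y)) / sd(y))
		)

	mean(z_score(x) * z_score(y)) # for this to equal cor(x, y) exactly, z_score must divide by the population sd (divisor n), not the sample sd() (divisor n - 1); with the sample sd the result is cor(x, y) * (n - 1) / n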
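
A sketch of correlation as an average product of z-scores, on made-up vectors. The helpers z_pop and z_sample are hypothetical names introduced here; they differ only in which standard deviation they divide by.

```r
x <- c(2, 4, 6, 8, 10)   # hypothetical data
y <- c(1, 3, 2, 5, 4)
n <- length(x)

# z-scores using the population sd (divisor n): average the products with mean().
z_pop <- function(v) (v - mean(v)) / sqrt(mean((v - mean(v))^2))
cor_pop_z <- mean(z_pop(x) * z_pop(y))

# z-scores using R's sample sd() (divisor n - 1): divide the sum by n - 1.
z_sample <- function(v) (v - mean(v)) / sd(v)
cor_sample_z <- sum(z_sample(x) * z_sample(y)) / (n - 1)

# Mixing the two (sample sd with mean()) shrinks the result by (n - 1) / n.
cor_mixed <- mean(z_sample(x) * z_sample(y))
```

Either consistent pairing reproduces cor(x, y); only the mixed version is off, and only by a factor that vanishes as n grows.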