Multicollinearity occurs when two or more independent variables are strongly correlated, which makes it difficult to tell which of them actually explains the dependent variable. It inflates the standard errors and the variance of the estimated coefficients. Common remedies are to drop one of the correlated variables, to increase the number of observations, or to combine the correlated variables into a composite index.
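
The effect on the standard errors can be illustrated with a small simulation. This is only a sketch under assumed values: the variable names, sample size and coefficients are made up for illustration, and `x2` is deliberately built as a noisy copy of `x1`.

```r
# Simulated data (illustrative values, not from the text).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # strongly, but not perfectly, correlated with x1
y  <- 1 + 2 * x1 + rnorm(n)

# Standard error of the x1 coefficient without and with the correlated regressor.
se_alone     <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]
se_collinear <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]

c(alone = se_alone, collinear = se_collinear)
```

In this setup the standard error of `x1` should be several times larger once `x2` enters the model, even though `x2` adds almost no new information.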

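Example of perfect multicollinearity. Consider the model
\begin{equation}\label{decompose}
Y=B_0+B_1X_1+B_2X_2+B_3X_3+\epsilon
\end{equation}

and suppose that $X_1$ is an exact linear combination of $X_2$ and $X_3$:

\begin{equation}\label{decompose}
X_1=a_1X_2+a_2X_3
\end{equation}

Substituting this into the model gives

\begin{equation}\label{decompose}
Y=B_0+(B_2+B_1a_1)X_2+(B_3+B_1a_2)X_3+\epsilon
\end{equation}

so only the combinations $B_2+B_1a_1$ and $B_3+B_1a_2$ can be estimated; $B_1$, $B_2$ and $B_3$ are not separately identified.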
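
A minimal R sketch of this case, using simulated data with assumed values $a_1=2$ and $a_2=3$ (the names and numbers are illustrative only). When one regressor is an exact linear combination of the others, `lm()` cannot estimate all the coefficients and reports one of them as `NA`.

```r
# Simulated regressors; x1 is an exact linear combination of x2 and x3.
set.seed(1)
n  <- 50
x2 <- rnorm(n)
x3 <- rnorm(n)
x1 <- 2 * x2 + 3 * x3            # a_1 = 2, a_2 = 3 (assumed for illustration)
y  <- 1 + 0.5 * x1 + x2 - x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

coef(fit)                        # one coefficient is NA (aliased)
n_aliased <- sum(is.na(coef(fit)))
fit$rank                         # rank of the design matrix, below its 4 columns
```

Which coefficient is dropped depends on the order of the regressors in the formula, which is another way of seeing that the individual coefficients are not identified.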

Examples of multicollinearity:

\begin{equation}\label{decompose}
\mathrm{Lwage}=B_0+B_1\,\mathrm{Gender}+B_2\,\mathrm{Female}+B_3\,\mathrm{Male}+B_4\,\mathrm{Education}+B_5\,\mathrm{Profession}+\epsilon
\end{equation}


\begin{equation}\label{decompose}
\mathrm{Quality\ of\ life}=B_0+B_1\,\mathrm{GDP}+B_2\,\mathrm{Population}+B_3\,\mathrm{Wage}+B_4\,\mathrm{Profession}+\epsilon
\end{equation}


\begin{equation}\label{decompose}
\mathrm{Health}=B_0+B_1\,\mathrm{Age}+B_2\,\mathrm{Height}+B_3\,\mathrm{Weight}+B_4\,\mathrm{Profession}+\epsilon
\end{equation}



In the wage equation, the Gender, Female and Male variables are perfectly multicollinear, because Gender is a linear combination of the other two.
In the second equation, GDP and population are strongly correlated, so it is advisable to replace the two of them with a single variable, GDP per capita. In the last example, age, height and weight are strongly correlated.
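
The wage example can be checked numerically by comparing the rank of the design matrix with its number of columns. The sketch below uses hypothetical toy data and assumes that Gender is coded 1 for women and 0 for men, so it duplicates the Female dummy; none of these values come from a real data set.

```r
# Hypothetical toy data (assumed coding: gender = 1 for female, 0 for male).
female <- c(1, 0, 1, 1, 0, 0)
male   <- 1 - female
gender <- female
educ   <- c(12, 16, 14, 10, 18, 11)

# Design matrix with the redundant columns.
X <- cbind(intercept = 1, gender, female, male, educ)
rank_full <- qr(X)$rank          # smaller than ncol(X): redundant columns

# Keeping a single dummy removes the redundancy.
X_fixed    <- cbind(intercept = 1, female, educ)
rank_fixed <- qr(X_fixed)$rank   # equals ncol(X_fixed)

c(rank_full = rank_full, columns = ncol(X))
```

A rank below the number of columns means that at least one regressor is an exact linear combination of the others.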


In [1]:
set.seed(1234)
x=matrix(rnorm(100,0,1),10,10)
x

0,1,2,3,4,5,6,7,8,9
-1.2070657,-0.4771927,0.1340882,1.1022975,1.4494963,-1.8060313,0.656588464,0.006892838,-0.17779,-0.05315882
0.2774292,-0.99838644,-0.4906859,-0.4755931,-1.0686427,-0.5820759,2.548991071,-0.455468738,-0.1699941,0.255196
1.0844412,-0.77625389,-0.4405479,-0.70944,-0.8553646,-1.1088896,-0.03476039,-0.366523933,-1.3723019,1.70596401
-2.3456977,0.06445882,0.4595894,-0.5012581,-0.280623,-1.014962,-0.66963358,0.648286568,-0.1737872,1.00151325
0.4291247,0.95949406,-0.6937202,-1.6290935,-0.9943401,-0.1623095,-0.007604756,2.070270861,0.8502323,-0.49558344
0.5060559,-0.11028549,-1.4482049,-1.1676193,-0.9685143,0.5630558,1.777084448,-0.153398412,0.6976087,0.3555503
-0.57474,-0.51100951,0.5747557,-2.1800396,-1.1073182,1.6478175,-1.138607737,-1.390700947,0.5499974,-1.13460804
-0.5466319,-0.91119542,-1.0236557,-1.3409932,-1.2519859,-0.7733534,1.367827179,-0.723581777,-0.402732,0.87820363
-0.564452,-0.83717168,-0.0151383,-0.2942939,-0.5238281,1.6059096,1.329564791,0.258261762,-0.1915938,0.97291675
-0.8900378,2.41583518,-0.9359486,-0.4658975,-0.49685,-1.1578085,0.336472797,-0.317059115,-1.1945279,2.12111711


In [2]:
# make the last column an exact linear combination of columns 7, 8 and 9
x[,10]=2*x[,8]+3*x[,9]+x[,7]
x


0,1,2,3,4,5,6,7,8,9
-1.2070657,-0.4771927,0.1340882,1.1022975,1.4494963,-1.8060313,0.656588464,0.006892838,-0.17779,0.1370043
0.2774292,-0.99838644,-0.4906859,-0.4755931,-1.0686427,-0.5820759,2.548991071,-0.455468738,-0.1699941,1.1280714
1.0844412,-0.77625389,-0.4405479,-0.70944,-0.8553646,-1.1088896,-0.03476039,-0.366523933,-1.3723019,-4.8847139
-2.3456977,0.06445882,0.4595894,-0.5012581,-0.280623,-1.014962,-0.66963358,0.648286568,-0.1737872,0.105578
0.4291247,0.95949406,-0.6937202,-1.6290935,-0.9943401,-0.1623095,-0.007604756,2.070270861,0.8502323,6.6836337
0.5060559,-0.11028549,-1.4482049,-1.1676193,-0.9685143,0.5630558,1.777084448,-0.153398412,0.6976087,3.5631138
-0.57474,-0.51100951,0.5747557,-2.1800396,-1.1073182,1.6478175,-1.138607737,-1.390700947,0.5499974,-2.2700176
-0.5466319,-0.91119542,-1.0236557,-1.3409932,-1.2519859,-0.7733534,1.367827179,-0.723581777,-0.402732,-1.2875323
-0.564452,-0.83717168,-0.0151383,-0.2942939,-0.5238281,1.6059096,1.329564791,0.258261762,-0.1915938,1.271307
-0.8900378,2.41583518,-0.9359486,-0.4658975,-0.49685,-1.1578085,0.336472797,-0.317059115,-1.1945279,-3.8812291


In [3]:
# try to invert the matrix
# this fails: x is not invertible because its last column is a linear combination of three other columns
inv=solve(x)
inv

ERROR: Error in solve.default(x): system is computationally singular: reciprocal condition number = 3.84535e-18


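Random draws from a normal distribution are stored as floating-point numbers, which are finite-precision binary approximations of real numbers. Because of rounding, the matrix is singular only up to numerical error: its reciprocal condition number is about 3.8e-18 rather than exactly zero.
R's solve() refuses to invert a matrix whose reciprocal condition number falls below its tolerance, which defaults to .Machine$double.eps (about 2.220446e-16). Python (for example NumPy's numpy.linalg.inv) does not apply this check, so it returns an inverse for a matrix like this one. R does the same if the tolerance is lowered below the reciprocal condition number, but the result is numerically unreliable, as the very large entries in the output show.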
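
The check described above can be reproduced by hand. This sketch rebuilds the same matrix and compares base R's `rcond()` estimate with `.Machine$double.eps`; the names `rc`, `is_singular` and `matrix_rank` are introduced here only for illustration.

```r
# Rebuild the matrix from the cells above.
set.seed(1234)
x <- matrix(rnorm(100, 0, 1), 10, 10)
x[, 10] <- 2 * x[, 8] + 3 * x[, 9] + x[, 7]

rc          <- rcond(x)                    # estimated reciprocal condition number
is_singular <- rc < .Machine$double.eps    # the comparison against the default tolerance
matrix_rank <- qr(x)$rank                  # numerical rank of the 10 x 10 matrix

list(rcond = rc, is_singular = is_singular, rank = matrix_rank)
```

A numerical rank below 10 confirms that one column carries no independent information.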

In [4]:
# machine epsilon, the default tolerance used by solve()
.Machine$double.eps

In [5]:
# lower the tolerance below the reciprocal condition number so that solve() returns a result
inv_xx=solve(x,tol=3.84535e-18)
inv_xx

0,1,2,3,4,5,6,7,8,9
0.05595286,0.165333,0.3159078,-0.6362506,0.2302402,-0.7939428,0.1124223,0.05176454,-0.06169122,-0.01260246
-0.01744075,0.25126,-0.06572327,-0.1434216,-0.05380856,-0.2621085,0.126567,-0.1079283,-0.09582258,0.3112659
0.2073708,0.7821743,0.02492586,-0.5029457,0.09970346,-2.121492,0.4489588,-0.1191929,0.05741051,0.1158974
-0.759723,0.1627978,-0.2170637,1.250969,-0.8682548,1.73651,-0.5969252,-1.362892,-0.3255956,-0.203862
1.045195,-0.4920562,-0.05577138,-1.361519,0.8384009,-0.7812178,0.4188475,1.149448,0.426137,0.1821296
-0.05636937,-0.2737941,-0.3221107,0.1932664,-0.1932664,0.3221107,-0.0483166,0.1932664,0.346269,0.03221107
-104683800000000.0,-199727100000000.0,-747808700000000.0,-855811200000000.0,423874500000000.0,-1286743000000000.0,-117641200000000.0,1473516000000000.0,86332330000000.0,13048020000000.0
-209367600000000.0,-399454300000000.0,-1495617000000000.0,-1711622000000000.0,847749000000000.0,-2573486000000000.0,-235282400000000.0,2947032000000000.0,172664700000000.0,26096040000000.0
-314051500000000.0,-599181400000000.0,-2243426000000000.0,-2567434000000000.0,1271623000000000.0,-3860228000000000.0,-352923600000000.0,4420547000000000.0,258997000000000.0,39144070000000.0
104683800000000.0,199727100000000.0,747808700000000.0,855811200000000.0,-423874500000000.0,1286743000000000.0,117641200000000.0,-1473516000000000.0,-86332330000000.0,-13048020000000.0
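
One way to judge whether such a result is usable is to multiply it by the original matrix and compare the product with the identity. The sketch below is an assumption-laden check rather than part of the original analysis: it uses `tol = 1e-20` (any value below the reported reciprocal condition number should pass the check), and `max_error` and `largest_entry` are illustrative names.

```r
# Rebuild the matrix from the cells above.
set.seed(1234)
x <- matrix(rnorm(100, 0, 1), 10, 10)
x[, 10] <- 2 * x[, 8] + 3 * x[, 9] + x[, 7]

# Tolerance lowered so that solve() does not stop with an error.
inv_xx <- solve(x, tol = 1e-20)

# For a trustworthy inverse, x %*% inv_xx is the identity up to rounding.
max_error     <- max(abs(x %*% inv_xx - diag(10)))
largest_entry <- max(abs(inv_xx))

c(max_error = max_error, largest_entry = largest_entry)
```

Entries of very large magnitude, together with a product that departs visibly from the identity, indicate that the computed matrix should not be used as an inverse.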
