Multicollinearity arises when two or more independent variables are strongly correlated, which makes it difficult to tell which of them actually explains the dependent variable. It inflates the variance, and therefore the standard errors, of the estimated coefficients. Common remedies are to drop one of the correlated variables, to increase the number of observations, or to combine the correlated variables into a synthetic index.
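
One way to quantify how much a correlated regressor inflates coefficient variance is the variance inflation factor (VIF), a standard diagnostic that is brought in here as an assumption and is not part of the original notebook. The sketch below computes it by hand with NumPy on simulated data; the helper name vif, the sample size and the noise level are all illustrative choices.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to x1 and x2
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

Under these assumptions the first two factors come out far above the common rule-of-thumb threshold of 10, while the third stays close to 1.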

Example of perfect multicollinearity: 
\begin{equation}\label{decompose}
\begin{aligned}
Y &= B_o + B_1X_1 + B_2X_2 + B_3X_3 + \epsilon\\
X_1 &= a_1X_2 + a_2X_3\\
Y &= B_o + (B_2 + B_1a_1)X_2 + (B_3 + a_2B_1)X_3 + \epsilon
\end{aligned}
\end{equation}
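
The substitution above can be checked numerically. The following is a minimal sketch on simulated data: the values chosen for a_1, a_2 and the B coefficients are illustrative assumptions, not taken from the text. It shows that the full design matrix loses one rank, and that regressing on X_2 and X_3 alone recovers only the combined coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
a1, a2 = 2.0, -1.0                      # illustrative values
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
X1 = a1 * X2 + a2 * X3                  # exact linear combination of X2 and X3

B0, B1, B2, B3 = 1.0, 0.5, 1.5, -2.0    # illustrative "true" coefficients
Y = B0 + B1 * X1 + B2 * X2 + B3 * X3 + 0.1 * rng.normal(size=n)

full = np.column_stack([np.ones(n), X1, X2, X3])
reduced = np.column_stack([np.ones(n), X2, X3])

# four columns, but one of them is redundant
rank_full = np.linalg.matrix_rank(full)
print(rank_full)

# only B0, B2 + B1*a1 and B3 + a2*B1 can be estimated
coef, *_ = np.linalg.lstsq(reduced, Y, rcond=None)
print(coef)
print([B0, B2 + B1 * a1, B3 + a2 * B1])
```

The individual values of B_1, B_2 and B_3 cannot be separated from these estimates, which is the identification problem the equations describe.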


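Examples of multicollinearity:
\begin{equation}\label{decompose}
\begin{aligned}
\text{Lwage} &= B_o + B_1\,\text{Gender} + B_2\,\text{Female} + B_3\,\text{Male} + B_4\,\text{Education} + B_5\,\text{Profession} + \epsilon\\
\text{Quality of life} &= B_o + B_1\,\text{GDP} + B_2\,\text{Population} + B_3\,\text{Wage} + B_4\,\text{Profession} + \epsilon\\
\text{Health} &= B_o + B_1\,\text{Age} + B_2\,\text{Height} + B_3\,\text{Weight} + B_4\,\text{Profession} + \epsilon
\end{aligned}
\end{equation}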

In the wage equation, the Gender, Female and Male variables are perfectly multicollinear, because the gender variable is a linear combination of the other two.
In the second equation, GDP and population are strongly correlated, so it is advisable to use GDP per capita in place of the two variables. In the last example, age, height and weight are strongly correlated.
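
The wage example can be sketched with simulated data. The coding used here is an assumption made for illustration: Gender is taken to be 1 for women and 0 for men, so it coincides with the Female dummy, and Female plus Male always equals the intercept column. The Education values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
female = rng.integers(0, 2, size=n).astype(float)
male = 1.0 - female
gender = female.copy()                 # hypothetical coding: 1 = woman, 0 = man
education = rng.normal(12, 3, size=n)  # made-up years of schooling
const = np.ones(n)

# Gender, Female and Male together with the intercept: redundant columns
trap = np.column_stack([const, gender, female, male, education])
# keep a single indicator for sex
fixed = np.column_stack([const, female, education])

rank_trap = np.linalg.matrix_rank(trap)
rank_fixed = np.linalg.matrix_rank(fixed)
print(trap.shape[1], rank_trap)    # number of columns vs. rank
print(fixed.shape[1], rank_fixed)
```

The first design has more columns than its rank, so its coefficients are not uniquely determined; keeping one indicator restores full column rank.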

In [1]:
import numpy as np
import pandas as pd


In [9]:
# 10x10 matrix of draws from a standard normal distribution
np.random.seed(123)
x=np.random.normal(0,1,size=(10,10))
x

array([[-1.0856306 ,  0.99734545,  0.2829785 , -1.50629471, -0.57860025,
         1.65143654, -2.42667924, -0.42891263,  1.26593626, -0.8667404 ],
       [-0.67888615, -0.09470897,  1.49138963, -0.638902  , -0.44398196,
        -0.43435128,  2.20593008,  2.18678609,  1.0040539 ,  0.3861864 ],
       [ 0.73736858,  1.49073203, -0.93583387,  1.17582904, -1.25388067,
        -0.6377515 ,  0.9071052 , -1.4286807 , -0.14006872, -0.8617549 ],
       [-0.25561937, -2.79858911, -1.7715331 , -0.69987723,  0.92746243,
        -0.17363568,  0.00284592,  0.68822271, -0.87953634,  0.28362732],
       [-0.80536652, -1.72766949, -0.39089979,  0.57380586,  0.33858905,
        -0.01183049,  2.39236527,  0.41291216,  0.97873601,  2.23814334],
       [-1.29408532, -1.03878821,  1.74371223, -0.79806274,  0.02968323,
         1.06931597,  0.89070639,  1.75488618,  1.49564414,  1.06939267],
       [-0.77270871,  0.79486267,  0.31427199, -1.32626546,  1.41729905,
         0.80723653,  0.04549008, -0.23309206

In [10]:
# make the last column (index 9) the sum of columns 6, 7 and 8 (0-indexed)
x[:,9]=x[:,6]+x[:,7]+x[:,8]
x=pd.DataFrame(x)


In [11]:
# attempt to invert the matrix, which is singular by construction
inv_x=pd.DataFrame(np.linalg.inv(x))
inv_x

               0              1              2              3              4              5              6              7              8              9
0   3.381746e-01   2.014572e+00  -1.321740e+00   1.178865e+00   1.575252e-01   7.678071e-01   4.017992e-01  -2.711926e-01  -1.426106e-01   1.487734e+00
1   3.497570e-01   4.502479e-01  -3.221589e-01  -2.548177e-01   2.540231e-01  -7.210575e-01   3.191653e-02  -9.346311e-02  -2.306959e-01   2.826954e-01
2  -3.900009e-01  -4.898282e-01  -3.620898e-01  -4.494761e-01   1.047738e-01   1.757375e-01  -1.062789e-02   2.044664e-01   1.905147e-01  -3.677224e-01
3  -3.893641e-01  -3.203823e-01  -7.346305e-01  -9.612160e-01   2.759960e-01  -7.802691e-01  -5.923144e-01   4.417899e-02  -2.039036e-01  -2.691014e-01
4   9.986760e-02   3.420350e-01  -2.119686e-01   1.309255e-02   1.009191e-01  -2.119686e-01   3.105699e-01  -3.027573e-01  -1.003420e-02   3.790910e-01
5   2.890671e-01   6.306919e-01   1.051153e-01   1.576730e-01   5.255766e-02   1.051153e-01   2.102306e-01  -1.576730e-01  -3.284854e-01   8.277832e-01
6  -6.844664e+14  -1.138601e+15   2.251800e+15   1.231803e+15  -1.740069e+15   3.008989e+15   3.306592e+14  -3.812161e+14   3.985147e+14  -3.784092e+14
7  -6.844664e+14  -1.138601e+15   2.251800e+15   1.231803e+15  -1.740069e+15   3.008989e+15   3.306592e+14  -3.812161e+14   3.985147e+14  -3.784092e+14
8  -6.844664e+14  -1.138601e+15   2.251800e+15   1.231803e+15  -1.740069e+15   3.008989e+15   3.306592e+14  -3.812161e+14   3.985147e+14  -3.784092e+14
9   6.844664e+14   1.138601e+15  -2.251800e+15  -1.231803e+15   1.740069e+15  -3.008989e+15  -3.306592e+14   3.812161e+14  -3.985147e+14   3.784092e+14


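Drawing from a normal distribution gives floating-point numbers, which are finite-precision binary approximations of real numbers. The matrix above is singular by construction, but because of rounding error NumPy does not encounter an exactly zero pivot, so np.linalg.inv returns a result anyway. That result is not a usable inverse: the entries of order 1e14 to 1e15 in rows 6 to 9 are the symptom of a numerically singular matrix. R behaves differently because solve() compares the reciprocal condition number of the matrix with a tolerance, which defaults to machine epsilon (about 2.22e-16), and reports the matrix as computationally singular when the value falls below it. Lowering that tolerance in R makes it return a similarly unreliable inverse.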
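
Since NumPy returns a result even for this matrix, it can help to inspect the rank and the condition number before trusting an inverse. The sketch below rebuilds the same matrix with the same seed and applies np.linalg.matrix_rank and np.linalg.cond; treating a very large condition number as a warning sign is a common rule of thumb and not something the notebook itself prescribes.

```python
import numpy as np

# rebuild the matrix from the cells above
np.random.seed(123)
x = np.random.normal(0, 1, size=(10, 10))
x[:, 9] = x[:, 6] + x[:, 7] + x[:, 8]   # last column is a sum of three others

rank = np.linalg.matrix_rank(x)   # numerical rank, based on singular values
cond = np.linalg.cond(x)          # ratio of largest to smallest singular value

print(rank, x.shape[1])
print(cond)

if rank < x.shape[1]:
    print("matrix is numerically singular; its computed inverse is not reliable")
```

A rank below the number of columns, or a condition number near the reciprocal of machine epsilon, indicates that the inverse computed above should not be used.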