This is a nice way to 'eyeball' the range that eigenvalues can be in.  It also has a nice picture associated with it.  I'd finally add that this gives an immediate way to understand diagonally dominant matrices, and a clean way to prove that certain symmetric (or in complex space, Hermitian) matrices are positive (semi) definite.

Indeed we can use these Gerschgorin discs to make especially strong claims about matrices with special structure.  

At first this post was going to be my adaptation of the proof from Kuttler's "Linear Algebra" (page 177).  However, I ultimately decided that a more indirect route -- via Levy-Desplanques -- was a lot more intuitive.  


While I generally prefer proofs other than proofs by contradiction, this proof of Levy-Desplanques is elementary, short and sweet, and immediately leads to Gerschgorin discs in a very intuitive way.

The first part of this is, essentially, a direct lift from 
https://shreevatsa.wordpress.com/2007/10/03/a-nice-theorem-and-trying-to-invert-a-function/ .  However, I added extra steps in here to make the derivation a bit more deliberate.  

We say that $\mathbf A \in \mathbb C^{n \times n}$ is (strictly) diagonally dominant if for every row $i$, $\big \vert a_{i,i}\big \vert \gt \Sigma_{j \neq i}\big \vert a_{i,j}\big \vert$.

**claim:** if (square) matrix $\mathbf A$ is diagonally dominant, then $det(\mathbf A) \neq 0$

The contradiction comes from assuming $det(\mathbf A) =0$, i.e. that $\mathbf A$ is not invertible.  If this is the case, there must be some $\mathbf x \neq \mathbf 0$ where $\mathbf A \mathbf x = \mathbf 0$.  

Consider the maximal coordinate of $\mathbf x$, $x_k$, where $\big \vert x_k \big \vert \geq \big \vert x_j\big \vert$ for all $j \in \{1, 2, ..., n\}$.

Since $\mathbf x \neq \mathbf 0$, at least one coordinate is non-zero, hence $x_k \neq 0$, and we can say that $\big \vert x_k \big \vert \gt 0$.

Now look at the kth row of $\mathbf Ax$.  We have 

$a_{k, 1} x_1  + a_{k, 2} x_2 + ... + a_{k,n} x_n = \Sigma_{j=1}^{n} a_{k,j} x_j = a_{k,k}x_k + \Sigma_{j \neq k} a_{k,j} x_j$

now recall that we are considering the case where $\mathbf {Ax} = \mathbf 0$ for some non-zero $\mathbf x$, hence

$a_{k,k}x_k + \Sigma_{j \neq k} a_{k,j} x_j = 0$

or 

$a_{k,k}x_k = - \Sigma_{j \neq k} a_{k,j} x_j $

now take the magnitude of both sides

$\big \vert a_{k,k}x_k \big \vert = \big \vert a_{k,k}\big \vert \big \vert x_k \big \vert = \big \vert \Sigma_{j \neq k} a_{k,j} x_j \big \vert \leq  \Sigma_{j \neq k} \big \vert a_{k,j}\big \vert \big \vert x_j \big \vert \leq \Sigma_{j \neq k} \big \vert a_{k,j}\big \vert \big \vert x_k \big \vert = \big \vert x_k \big \vert\big( \Sigma_{j \neq k} \big \vert a_{k,j}\big \vert \big)$


From here the contradiction is apparent, but we can further distill this to:

$\big \vert a_{k,k}\big \vert \big \vert x_k \big \vert \leq \big \vert x_k \big \vert\big( \Sigma_{j \neq k} \big \vert a_{k,j}\big \vert \big)$

thus, because $\big \vert x_k \big \vert \gt 0$, we can divide it out of the above and since it is positive, the inequality sign does not change.  

$\big \vert a_{k,k}\big \vert \leq \Sigma_{j \neq k} \big \vert a_{k,j}\big \vert$

yet this contradicts our (strict) definition of diagonal dominance, because we said our matrix satisfied:

$\big \vert a_{k,k}\big \vert \gt \Sigma_{j \neq k}\big \vert a_{k,j}\big \vert$

hence we know $\mathbf A \mathbf x = \mathbf 0$ if and only if $\mathbf x = \mathbf 0$, thus $det\big(\mathbf A\big) \neq 0$.
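As a quick numerical sanity check of the claim, here is a minimal sketch using NumPy. The helper name `is_strictly_diagonally_dominant` and the example matrix are assumptions made up for illustration; they are not from the post or from any library.

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Return True if |a_ii| > sum_{j != i} |a_ij| holds for every row i."""
    A = np.asarray(A)
    diag = np.abs(np.diag(A))
    # Row sums of magnitudes, minus the diagonal magnitude, give the off-diagonal sums.
    off_diag = np.sum(np.abs(A), axis=1) - diag
    return bool(np.all(diag > off_diag))

# Illustrative matrix (hypothetical): each diagonal entry dominates its row.
A = np.array([[4.0, -1.0, 2.0],
              [1.0, 5.0, -3.0],
              [0.5, 0.5, -2.0]])

dominant = is_strictly_diagonally_dominant(A)
det_A = np.linalg.det(A)
print("strictly diagonally dominant:", dominant)
print("determinant:", det_A)
```

The check only confirms the theorem on one example; the proof above is what covers every strictly diagonally dominant matrix.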

# Now the good part:

For each eigenvalue $\lambda$ of $\mathbf A$, consider the matrix

$\big(\mathbf A - \lambda \mathbf I\big)$.  It is not invertible, so per Levy-Desplanques it cannot be strictly diagonally dominant.  Since its off-diagonal entries are the same as those of $\mathbf A$, there must be some row $i$ where 

$\big \vert a_{i,i} - \lambda \big \vert \ngtr \Sigma_{j\neq i} \big \vert a_{i,j}\big \vert$

we can restate this as:

$\big \vert a_{i,i} - \lambda \big \vert \leq \Sigma_{j\neq i} \big \vert a_{i,j}\big \vert$

and this is the Gerschgorin disc formula.  To be clear, the above does not tell us which diagonal entry gives us the range for a given eigenvalue -- it just tells us that any given eigenvalue must be located in a disc associated with one of these diagonal entries.  
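The disc formula can be sketched numerically as follows. This is a minimal illustration, assuming NumPy; the function names `gershgorin_discs` and `in_some_disc` and the test matrix are hypothetical choices for this example.

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centers, radii): center a_ii and radius sum_{j != i} |a_ij| per row."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

def in_some_disc(lam, centers, radii, tol=1e-9):
    """True if lam lies in at least one disc (with a small numerical tolerance)."""
    return bool(np.any(np.abs(lam - centers) <= radii + tol))

# Illustrative matrix (hypothetical), with one complex diagonal entry.
A = np.array([[10.0, 1.0, 0.5],
              [0.2, -3.0, 1.0],
              [1.0, 1.0, 2.0j]])

centers, radii = gershgorin_discs(A)
eigenvalues = np.linalg.eigvals(A)
all_covered = all(in_some_disc(lam, centers, radii) for lam in eigenvalues)
print("centers:", centers)
print("radii:", radii)
print("every eigenvalue lies in some disc:", all_covered)
```

Note that the code only checks membership in the union of the discs, which is exactly what the formula promises; it does not pair a particular eigenvalue with a particular row.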

It is perhaps worth noting that we can also apply this to the conjugate transpose of $\mathbf A$ -- hence we could instead interpret this formula in terms of the columns of $\mathbf A$.  
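A minimal sketch of the column version, assuming NumPy. It uses the plain transpose, which has the same eigenvalues as $\mathbf A$ (the conjugate transpose gives conjugated eigenvalues and conjugated discs, which leads to the same column radii). The helper `disc_radii` and the 2x2 matrix are hypothetical.

```python
import numpy as np

def disc_radii(A, axis):
    """Off-diagonal magnitude sums: axis=1 gives row radii, axis=0 gives column radii."""
    A = np.asarray(A, dtype=complex)
    return np.sum(np.abs(A), axis=axis) - np.abs(np.diag(A))

# Illustrative matrix (hypothetical) whose row and column radii differ.
A = np.array([[1.0, 4.0],
              [0.1, 2.0]])

centers = np.diag(A)
row_radii = disc_radii(A, axis=1)
col_radii = disc_radii(A, axis=0)
eigenvalues = np.linalg.eigvals(A)

def covered(radii):
    # Every eigenvalue must lie in at least one disc of the given family.
    return all(np.any(np.abs(lam - centers) <= radii + 1e-9) for lam in eigenvalues)

print("row radii:", row_radii, "covers spectrum:", covered(row_radii))
print("column radii:", col_radii, "covers spectrum:", covered(col_radii))
```

Since both families must contain every eigenvalue, one can use whichever gives the tighter picture for the matrix at hand.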

# Why might this be useful?

Gerschgorin discs are a very nice tool for identifying things like whether a Hermitian --Symmetric, in reals-- matrix is positive (semi) definite.  To be clear, the test associated with the discs will generally tell us yes it is positive semi-definite or it will say unclear.  In some cases it may allow us to reject the hypothesis as well, but we already have other tools at our disposal, like -- (a) are any of the diagonal entries negative, (b) if any of the diagonal entries are zero, then you need the entire column and row associated with that diagonal to be zero, and (c) we also have the inequality $N * trace\big(\mathbf A \big) \geq sum \big(\mathbf{A}\big)$ -- proved over reals in my posting "SPDSP_Trace_Inequality" which must be true for a Hermitian or Symmetric matrix that is positive semi-definite.  Notably all of these other tools allow us to reject whether a matrix is positive semi-definite -- they don't allow us to confirm that it is.  However, in certain cases, Gerschgorin discs do allow us to confirm this-- the calculation involved is quite simple, and their proof is quite simple as well.   
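The "yes or unclear" nature of the disc test can be sketched as follows, assuming NumPy. The function name `gershgorin_psd_test` and both example matrices are hypothetical. For a Hermitian matrix the eigenvalues are real, so if every disc's left edge $a_{i,i} - \Sigma_{j\neq i} \vert a_{i,j} \vert$ is non-negative, no eigenvalue can be negative.

```python
import numpy as np

def gershgorin_psd_test(A, tol=1e-12):
    """Sufficient (not necessary) test: returns 'positive semi-definite' or 'unclear'."""
    A = np.asarray(A)
    if not np.allclose(A, A.conj().T):
        raise ValueError("expected a Hermitian (or real symmetric) matrix")
    diag = np.real(np.diag(A))
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    # Left edge of each disc on the real line; all >= 0 rules out negative eigenvalues.
    if np.all(diag - radii >= -tol):
        return "positive semi-definite"
    return "unclear"

# Hypothetical example where the discs confirm positive semi-definiteness.
confirmed = np.array([[3.0, 1.0, -1.0],
                      [1.0, 2.0, 0.5],
                      [-1.0, 0.5, 4.0]])

# Hypothetical example that IS positive definite, but the discs cannot show it.
undecided = np.array([[1.0, 0.9, 0.9],
                      [0.9, 1.0, 0.9],
                      [0.9, 0.9, 1.0]])

verdict_confirmed = gershgorin_psd_test(confirmed)
verdict_undecided = gershgorin_psd_test(undecided)
print(verdict_confirmed, np.linalg.eigvalsh(confirmed))
print(verdict_undecided, np.linalg.eigvalsh(undecided))
```

The second matrix shows why "unclear" is not a rejection: its eigenvalues are all positive even though its discs reach below zero.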

Now, if we are doing something like a second derivative test, using a Hessian matrix, we would in fact just use numeric values at or around a critical point.  There are times, however, where we may want to evaluate our Hessian symbolically over a large range of values.  

for example consider the function, f, where 

$f(x,y,z) = x^{2} y^{2} z^{4} + z^{2}$

which has the following Hessian:

$\left[\begin{matrix}2 y^{2} z^{4} & 4 x y z^{4} & 8 x y^{2} z^{3}\\4 x y z^{4} & 2 x^{2} z^{4} & 8 x^{2} y z^{3}\\8 x y^{2} z^{3} & 8 x^{2} y z^{3} & 12 x^{2} y^{2} z^{2} + 2\end{matrix}\right]$






This matrix is small enough (3x3, so the characteristic polynomial is a cubic) that we can solve symbolically for the eigenvalues exactly, but those eigenvalues -- printed by the code cell further down -- are not so easy to interpret.  This gets increasingly difficult for much larger matrices.  As is, we can simply look at (1,1,1) and see that the trace inequality is violated there ($3 \cdot 18 = 54 \lt 58$), hence the Hessian is not positive semi-definite at (1,1,1) and the function is thus not convex.  (We can of course also note that the bottom-right diagonal entry, $12 x^{2} y^{2} z^{2} + 2$, is always positive, so the Hessian is never negative semi-definite and the function is *not concave* either.)  
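The check at (1,1,1) can be reproduced with a short sympy sketch. The variable names below are chosen for this example; `sp.hessian` is sympy's built-in helper for building the matrix of second partials. The vector `v` is an extra witness, added here as an assumption-free arithmetic check, that gives a negative quadratic form.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y**2 * z**4 + z**2

# Symbolic Hessian, then evaluated at the point (1, 1, 1).
H = sp.hessian(f, [x, y, z])
H_at_point = H.subs({x: 1, y: 1, z: 1})

n = H_at_point.shape[0]
trace_side = n * H_at_point.trace()   # N * trace(A)
entry_sum = sum(H_at_point)           # sum of all entries of A

# A direct witness that the matrix is not positive semi-definite: v^T H v < 0.
v = sp.Matrix([1, -1, 0])
quadratic_form = (v.T * H_at_point * v)[0, 0]

print(H_at_point)
print("N * trace:", trace_side, " sum of entries:", entry_sum)
print("v^T H v:", quadratic_form)
```

Both checks reject positive semi-definiteness without computing a single eigenvalue.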


There are *many* applications where we may want to make claims about the eigenvalues / singularity of a matrix without looking at specific numerical values.


# Application: Markov Chains

Using Gerschgorin discs may in some cases tell us things we already know, but in a simpler form.  Consider a Markov chain matrix -- i.e. **a stochastic matrix -- i.e. where each row sums to one, and of course all entries are real valued and non-negative**.  Thus $\mathbf {A1} = \lambda_r \mathbf 1 = \mathbf 1$.  This tells us that $\lambda_r = 1$, and using Gerschgorin discs we can be certain that the biggest possible magnitude of an eigenvalue is 1.  There are of course many other arguments for proving this, but Gerschgorin discs provide a simple, direct, and visual way of determining this bound.  

Start by considering the case where the diagonal element $a_{i,i} = 0$.  The disc for row $i$ then says $\big \vert a_{i,i} - \lambda \big \vert = \big \vert 0 - \lambda \big \vert = \big \vert \lambda \big \vert \leq \Sigma_{j\neq i} \big \vert a_{i,j}\big \vert = \Sigma_{j\neq i} a_{i,j} = 1$, because the matrix is stochastic.  If we allocate any positive amount to $a_{i,i}$ it necessarily tightens this radius while at the same time moving the 'launch point' (the center of the disc) for the eigenvalue by the same amount.  E.g. if $a_{i,i} = 0.4$, then $\Sigma_{j\neq i} \big \vert a_{i,j} \big \vert = \Sigma_{j\neq i} a_{i,j} = 0.6$, so we know that the difference between $ a_{i,i}$ and an eigenvalue in that disc is at most 0.6, hence $\big \vert \lambda \big \vert \leq 0.4 + 0.6 = 1$.  This again tells us that the maximal eigenvalue magnitude is 1, which occurs when $\lambda = 1$.  

It is perhaps interesting to remark that we can get even *further* information from this.  We have already put an upper bound of 1 on eigenvalue magnitudes.  In general, eigenvalues of a stochastic matrix can sit on the unit circle at points other than 1 (for example, the two-state chain that always switches state has the eigenvalue $-1$).   
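The bound can be checked on a small example. This is a minimal sketch, assuming NumPy; the transition matrix `P` is a hypothetical row-stochastic matrix made up for illustration.

```python
import numpy as np

# Hypothetical row-stochastic matrix: non-negative entries, each row sums to one.
P = np.array([[0.4, 0.6, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.7, 0.3]])

centers = np.diag(P)
radii = P.sum(axis=1) - centers      # off-diagonal row sums, i.e. 1 - a_ii

# The point of each disc farthest from the origin is center + radius.
farthest_reach = centers + radii

eigenvalues = np.linalg.eigvals(P)
spectral_radius = np.max(np.abs(eigenvalues))

print("disc centers:", centers)
print("disc radii:", radii)
print("farthest reach of each disc:", farthest_reach)
print("largest eigenvalue magnitude:", spectral_radius)
```

Every disc reaches exactly as far as 1 and no farther, which is the visual version of the bound: raising $a_{i,i}$ shifts the center right and shrinks the radius by the same amount.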

Eigenvalues come in the form of $\lambda = \alpha + \beta i$.  If we consider the case of a non-zero diagonal element, where $a_{i,i} = \epsilon$ for $0 \lt \epsilon \leq 1$, Gerschgorin discs tell us that **the maximal eigenvalue, which is on the unit circle**, associated with this satisfies (with the magnitude squared for convenience) $\big\vert \epsilon - \lambda \big \vert^2 =\big((\epsilon - \alpha) - \beta i \big)\big((\epsilon -\alpha) + \beta i\big) = (\alpha-\epsilon)^2 + \beta^2 = \big(\Sigma_{j\neq i} a_{i,j}\big)^2 = (1-\epsilon)^2 $. 

Re-arranging terms we see that: $\beta^2 = (1-\epsilon)^2 - (\alpha - \epsilon)^2$.  Hence the squared length of this eigenvalue is given by:  

$ \big \vert \lambda \big \vert^2 = \alpha^2 + \beta^2 = \alpha^2 -  (\alpha - \epsilon)^2 +  (1-\epsilon)^2 = 1 + 2\epsilon(\alpha - 1)$
- - - - 
*Note that the above value appears to suggest that magnitudes can go negative but that is an illusion.  The writeup here is explicitly dealing with a maximal magnitude eigenvalue and dropped the inequality associated with the Gerschgorin disc accordingly.  Hence any calculated magnitude* $\neq 1$ * is by definition not a maximum eigenvalue, and hence the above equation does not apply.*
- - - - 

When we consider that $\alpha \leq 1$ (because we know the upper bound of an eigenvalue's real component is 1) then we recognize that $2\epsilon(\alpha - 1) \leq 0$ with equality if and only if $\alpha = 1$ for all $0 \lt \epsilon \leq 1$.  Thus we can observe the expression $ 1 + 2\epsilon(\alpha -1)$ is maximized only when instead of subtracting a positive number, we subtract zero, which occurs when $\alpha = 1$.  Plugging this back into our formula we see $\beta^2 = (1 - \epsilon)^2 - ((1) - \epsilon)^2 = 0$. 
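The algebra above is easy to double check symbolically. The sketch below uses sympy; the symbol names mirror the ones in the text and everything else is an assumption of this example.

```python
import sympy as sp

alpha, eps = sp.symbols('alpha epsilon', real=True)

# beta^2 from the disc boundary: (alpha - eps)^2 + beta^2 = (1 - eps)^2
beta_sq = (1 - eps)**2 - (alpha - eps)**2

# Squared magnitude alpha^2 + beta^2, compared with the claimed closed form.
magnitude_sq = sp.expand(alpha**2 + beta_sq)
claimed = sp.expand(1 + 2 * eps * (alpha - 1))
identity_holds = sp.simplify(magnitude_sq - claimed) == 0

# At alpha = 1 the imaginary part must vanish.
beta_sq_at_one = sp.simplify(beta_sq.subs(alpha, 1))

print("identity holds:", identity_holds)
print("beta^2 at alpha = 1:", beta_sq_at_one)
```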

To close out the argument, what this tells us is that if our stochastic matrix $\mathbf A$ has all non-zero diagonal entries, then it can have no eigenvalues with magnitude = 1, *except* in the case where the eigenvalue is real and positive -- i.e. $\lambda = 1$.  All other cases would involve an $\alpha < 1$, which would mean that the squared magnitude, given by $1 + 2\epsilon(\alpha - 1)$, is $\lt 1$.  Notice, however, if $\epsilon = 0$, we have no real restrictions on whether or not some $\lambda$ with magnitude = 1 is complex or real (and if real, possibly negative).  What this tells us then, is that we can rule out periodic behavior of the graph, if we ensure that the diagonal elements of the matrix are all $\gt 0$.  
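To see the contrast between a zero and a positive diagonal, here is a minimal sketch assuming NumPy. The 3-state cycle and its 'lazy' version (half the weight moved onto self loops) are hypothetical examples, and `unit_circle_eigenvalues` is a helper name invented for this illustration.

```python
import numpy as np

def unit_circle_eigenvalues(P, tol=1e-9):
    """Return the eigenvalues of P whose magnitude is 1 (up to tol)."""
    eig = np.linalg.eigvals(P)
    return eig[np.abs(np.abs(eig) - 1.0) < tol]

# Deterministic 3-cycle: zero diagonal, period 3.
cycle = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])

# Same chain, but each state keeps half of its weight on a self loop.
lazy = 0.5 * np.eye(3) + 0.5 * cycle

on_circle_cycle = unit_circle_eigenvalues(cycle)
on_circle_lazy = unit_circle_eigenvalues(lazy)

print("cycle, eigenvalues on the unit circle:", on_circle_cycle)
print("lazy,  eigenvalues on the unit circle:", on_circle_lazy)
```

With a zero diagonal, all three cube roots of unity show up on the unit circle; once every diagonal entry is positive, only $\lambda = 1$ remains there.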

(In the case where $\mathbf A$ has multiple eigenvalues = +1, the associated eigenvectors must be linearly independent -- i.e. geometric multiplicity = algebraic multiplicity.  We can use a Jordan Form argument or we can look to the underlying graph and recognize that the eigenvectors associated with $\lambda = 1$ must in fact be orthogonal, as there cannot be communication between two recurrent classes -- if there was, those two would in fact only be one recurrent class.)

In a nutshell, this gives us the proof that for a stochastic matrix $\mathbf A$ where all diagonal elements $a_{i,i} >0$, the graph is aperiodic -- i.e. has nice limiting behavior.  If it is a connected graph, then it is ergodic and thus the entire graph is one aperiodic recurrent class.

From here, we know that for an ergodic graph, the algebraic multiplicity = geometric multiplicity = 1 for $\lambda = 1$, because there is only one recurrent class and hence only one eigenvector associated with $\lambda = 1$.  All other eigenvalues cannot be equal to one, and hence their magnitude must be less than one -- because $a_{i,i} >0$.  

Thus we say $\lambda_1 = 1 \gt \big \vert \lambda_2 \big \vert \geq \big \vert \lambda_3 \big \vert \geq ... \geq \big \vert \lambda_n \big \vert \geq 0$.  This is a proof for the sole steady state solution of Page Rank, for instance.  Normally people import considerably heavier duty machinery to prove this (specifically Perron-Frobenius), yet we were able to do this almost entirely using Gerschgorin discs.  
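The convergence to a single steady state can be illustrated with plain repeated multiplication. This is a minimal sketch assuming NumPy; the transition matrix `P`, the helper `iterate_distribution`, and the step count are hypothetical choices, and distributions are treated as row vectors to match the row-stochastic convention used above.

```python
import numpy as np

# Hypothetical ergodic chain with positive diagonal (every state has a self loop).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

def iterate_distribution(P, start, steps=200):
    """Push a starting distribution (row vector) through the chain `steps` times."""
    dist = np.asarray(start, dtype=float)
    for _ in range(steps):
        dist = dist @ P
    return dist

# Two very different starting points end up at the same place.
from_first_state = iterate_distribution(P, [1.0, 0.0, 0.0])
from_last_state = iterate_distribution(P, [0.0, 0.0, 1.0])

# Eigenvalue magnitudes in decreasing order: the gap below 1 drives the convergence.
magnitudes = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]

print("limit from state 1:", from_first_state)
print("limit from state 3:", from_last_state)
print("eigenvalue magnitudes:", magnitudes)
```

Because $\big \vert \lambda_2 \big \vert \lt 1$, the contribution of every other eigenvector decays geometrically, leaving only the steady state.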

The fact that having all diagonal elements of a transition matrix $\mathbf A$ have a weight $\gt 0$ guarantees there will not be periodic behavior should feel quite intuitive.  From a graph theoretic standpoint, what we are doing is ensuring that each node has a self loop.  This intuitively means that no matter where we start, after $n$ iterations, we will have a 'presence' on each (directly or indirectly) connected node, and this presence will then begin to converge to the steady state which also has a presence on all nodes.  


# Other Applications


And again, whether we think of this in terms of eigenvalues with Gerschgorin discs, or Levy-Desplanques, we also have a way of knowing whether or not certain matrices are invertible without going through the full calculation -- i.e. if they are diagonally dominant, their invertibility should just jump off the page at you. 

Thus the ability to bound eigenvalues via the simple and intuitive Gerschgorin Discs gives us a new way of interpreting special structure in matrices.  

There are of course also numeric applications in engineering where matrices with special structures are used.  In these cases it may be nice to prove that the matrices are symmetric positive semi-definite in general, irrespective of their size -- whether they are 5x5 or 1,000 x 1,000 or more generally $n$ x $n$.  There may be alternative approaches that involve importing machinery like Cauchy Interlacing and then doing induction on $n$, but using Gerschgorin Discs gives a simple, direct and very visual way to prove this.  


For instance consider the matrix given on page number 34 (35 of 42 according to PDF viewer) here: https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-920j-numerical-methods-for-partial-differential-equations-sma-5212-spring-2003/lecture-notes/lec15.pdf

Just by looking at it, we can tell it is real valued and symmetric, so we know its eigenvalues are all real.  We can use Gerschgorin discs to determine that the minimum possible eigenvalue is zero.  Hence we know the matrix is at least Symmetric Positive Semi-Definite.  We can also easily look through the implicit Gram-Schmidt and determine that the first n - 1 columns must be linearly independent, and hence the rank of this $n$ x $n$ matrix is at least n - 1.  With a small bit more work, we can then determine that the final column must be linearly independent as well, and thus the matrix is Symmetric Positive Definite.  But the main point is that by simply eyeballing the symmetry, and knowing about Gerschgorin discs, we were able to have a deep and general understanding of the spectrum underlying this matrix for any finite $n$ x $n$ dimension that it may take on.  
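The sketch below does not reproduce the matrix from the linked notes. It assumes, purely for illustration, the standard tridiagonal second-difference matrix (2 on the diagonal, -1 on the two neighbouring diagonals), which has the same flavour: symmetric, with discs whose left edge touches zero. The function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def second_difference_matrix(n):
    """Assumed example matrix: 2 on the diagonal, -1 on the sub- and super-diagonal."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def gershgorin_interval(A):
    """For a real symmetric A, the union of discs lies within [min(c - r), max(c + r)]."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return float(np.min(centers - radii)), float(np.max(centers + radii))

results = {}
for n in (5, 50, 200):
    A = second_difference_matrix(n)
    bounds = gershgorin_interval(A)
    smallest = float(np.linalg.eigvalsh(A).min())
    results[n] = (bounds, smallest)
    print(n, "disc bounds:", bounds, "smallest eigenvalue:", smallest)
```

The disc bounds are the same for every size, which is the point: the semi-definiteness argument does not depend on $n$. The discs alone only give a lower bound of zero; showing the smallest eigenvalue is strictly positive needs the extra rank argument described above.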

Finally, Levy-Desplanques and Gerschgorin Discs are also used in numerical linear algebra for evaluating conditioning, making claims on whether pivoting is needed in Gaussian Elimination, and so on.

In [24]:
import sympy as sp

# symbols for the three variables of f
x, y, z = sp.symbols('x y z')

myfunc = x**2*y**2*z**4 + z**2

mylist = [x, y, z]

# first partial derivatives, then all second partials (one Hessian row per first partial)
gradient = [myfunc.diff(variable) for variable in mylist]
hessian = [[partial1.diff(variable) for variable in mylist] for partial1 in gradient]

hessianmatrix = sp.Matrix(hessian)

# exact symbolic eigenvalues (roots of the cubic characteristic polynomial)
print(hessianmatrix.eigenvals())
# these are NOT easy to interpret and it is a very small Hessian!
# use a different tool!
    

{4*x**2*y**2*z**2 + 2*x**2*z**4/3 + 2*y**2*z**4/3 - (-1/2 - sqrt(3)*I/2)*(-1512*x**4*y**4*z**10 + 324*x**2*y**2*z**8 + sqrt((-3024*x**4*y**4*z**10 + 648*x**2*y**2*z**8 - (-108*x**2*y**2*z**2 - 18*x**2*z**4 - 18*y**2*z**4 - 18)*(-40*x**4*y**2*z**6 - 40*x**2*y**4*z**6 - 12*x**2*y**2*z**8 + 4*x**2*z**4 + 4*y**2*z**4) + 2*(-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**3)**2 - 4*(120*x**4*y**2*z**6 + 120*x**2*y**4*z**6 + 36*x**2*y**2*z**8 - 12*x**2*z**4 - 12*y**2*z**4 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**2)**3)/2 - (-108*x**2*y**2*z**2 - 18*x**2*z**4 - 18*y**2*z**4 - 18)*(-40*x**4*y**2*z**6 - 40*x**2*y**4*z**6 - 12*x**2*y**2*z**8 + 4*x**2*z**4 + 4*y**2*z**4)/2 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**3)**(1/3)/3 + 2/3 - (120*x**4*y**2*z**6 + 120*x**2*y**4*z**6 + 36*x**2*y**2*z**8 - 12*x**2*z**4 - 12*y**2*z**4 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**2)/(3*(-1/2 - sqrt(3)*I/2)*(-1512*x**4*y**4*z**10 + 324*x**2*y**2*z**8 + sqrt((-3024*x*