This is a nice way to 'eyeball' the range that eigenvalues can be in.  It also has a nice picture associated with it.  Finally, it leads to an immediate understanding of diagonally dominant matrices, and to a clean way to prove that certain symmetric (or in complex space, Hermitian) matrices are positive (semi) definite.

Indeed we can use these Gerschgorin discs to make especially strong claims about matrices with special structure.  

At first this post was going to be my adaptation of the proof from Kuttler's "Linear Algebra" (page 177).  However I ultimately decided that a more indirect route -- via Levy-Desplanques -- was a lot more intuitive.  


While I generally prefer proofs other than proofs by contradiction, this proof of Levy-Desplanques is elementary, short and sweet, and immediately leads to Gerschgorin discs in a very intuitive way.

The first part of this is, essentially, a direct lift from 
https://shreevatsa.wordpress.com/2007/10/03/a-nice-theorem-and-trying-to-invert-a-function/ .  However, I added extra steps in here to make the derivation a bit more deliberate.  

We say that $\mathbf A \in \mathbb C^{n \times n}$ is (strictly) diagonally dominant if for every row $i$, $\big \vert a_{i,i}\big \vert \gt \sum_{j \neq i}\big \vert a_{i,j}\big \vert$.

**claim:** if (square) matrix $\mathbf A$ is strictly diagonally dominant, then $\det(\mathbf A) \neq 0$

The contradiction comes from assuming $\det(\mathbf A) =0$, i.e. that $\mathbf A$ is not invertible.  If this is the case, there must be some $\mathbf x \neq \mathbf 0$ where $\mathbf A \mathbf x = \mathbf 0$.  

Consider a coordinate of $\mathbf x$ with maximal magnitude, $x_k$, where $\big \vert x_k \big \vert \geq \big \vert x_j\big \vert$ for all $j \in \{1, 2, ..., n\}$

since $\mathbf x \neq \mathbf 0$, we have $x_k \neq 0$, and we can say that $\big \vert x_k \big \vert \gt 0$

Now look at the kth row of $\mathbf {Ax}$.  We have 

$a_{k, 1} x_1  + a_{k, 2} x_2 + ... + a_{k,n} x_n = \sum_{j=1}^{n} a_{k,j} x_j = a_{k,k}x_k + \sum_{j \neq k} a_{k,j} x_j$

now recall that we are considering the case where $\mathbf {Ax} = \mathbf 0$ for some non-zero $\mathbf x$, hence

$a_{k,k}x_k + \sum_{j \neq k} a_{k,j} x_j = 0$

or 

$a_{k,k}x_k = - \sum_{j \neq k} a_{k,j} x_j $

now take the magnitude of both sides

$\big \vert a_{k,k}x_k \big \vert = \big \vert a_{k,k}\big \vert \big \vert x_k \big \vert = \big \vert \sum_{j \neq k} a_{k,j} x_j \big \vert \leq  \sum_{j \neq k} \big \vert a_{k,j}\big \vert \big \vert x_j \big \vert \leq \sum_{j \neq k} \big \vert a_{k,j}\big \vert \big \vert x_k \big \vert = \big \vert x_k \big \vert\big( \sum_{j \neq k} \big \vert a_{k,j}\big \vert \big)$


From here the contradiction is apparent, but we can further distill this to:

$\big \vert a_{k,k}\big \vert \big \vert x_k \big \vert \leq \big \vert x_k \big \vert\big( \sum_{j \neq k} \big \vert a_{k,j}\big \vert \big)$

thus, because $\big \vert x_k \big \vert \gt 0$, we can divide it out of the above and since it is positive, the inequality sign does not change.  

$\big \vert a_{k,k}\big \vert \leq \sum_{j \neq k} \big \vert a_{k,j}\big \vert$

yet this contradicts our (strict) definition of diagonal dominance, because we said our matrix satisfied:

$\big \vert a_{k,k}\big \vert \gt \sum_{j \neq k}\big \vert a_{k,j}\big \vert$

hence we know $\mathbf A \mathbf x = \mathbf 0$ if and only if $\mathbf x = \mathbf 0$, thus $\det\big(\mathbf A\big) \neq 0$.
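As a quick numerical sanity check of the claim, here is a minimal sketch using NumPy (an assumption; the notebook itself only uses sympy). The helper name `is_strictly_diagonally_dominant` and both example matrices are hypothetical illustrations, not taken from any reference.

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Return True if |a_ii| > sum_{j != i} |a_ij| holds for every row i."""
    A = np.asarray(A)
    diag = np.abs(np.diag(A))
    off_diag_sums = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

# Illustrative matrix (hypothetical): every row is strictly dominant.
A_example = np.array([[4.0, 1.0, -2.0],
                      [1.0, 5.0, 2.0],
                      [0.0, -1.0, 3.0]])

dominant = is_strictly_diagonally_dominant(A_example)
det_value = np.linalg.det(A_example)
print(dominant, det_value)

# Contrapositive of the claim: a singular matrix can never pass the test.
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
print(is_strictly_diagonally_dominant(singular))
```

Note that the test is only sufficient: plenty of invertible matrices are not diagonally dominant.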

# Now the good part:

for each eigenvalue $\lambda$ of $\mathbf A$, we can say that when we consider the matrix

$\big(\mathbf A - \lambda \mathbf I\big)$, it is not invertible, so per Levy-Desplanques  there must be some diagonal entry on $\mathbf A$ where 

$\big \vert a_{i,i} - \lambda \big \vert \ngtr \sum_{j\neq i} \big \vert a_{i,j}\big \vert$

we can restate this as:

$\big \vert a_{i,i} - \lambda \big \vert \leq \sum_{j\neq i} \big \vert a_{i,j}\big \vert$

and this is the Gershgorin disc formula.  To be clear, the above does not tell us which diagonal entry gives us the range for a given eigenvalue -- it just tells us that any given eigenvalue must be located in a disc associated with one of these diagonal entries.  

It is perhaps worth noting that we can also apply this to the conjugate transpose of $\mathbf A$ -- hence we could instead interpret this formula in terms of the columns of $\mathbf A$.  
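The disc formula is easy to try out numerically. The following is a rough sketch, assuming NumPy is available; `gershgorin_discs`, `in_some_disc` and the example matrix are hypothetical names and data chosen only for illustration. For the column version it uses the plain transpose, which has the same eigenvalues as $\mathbf A$.

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centers, radii) of the row discs |z - a_ii| <= sum_{j != i} |a_ij|."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

def in_some_disc(lam, centers, radii, tol=1e-9):
    """True if lam lies in at least one disc (up to a small numerical tolerance)."""
    return bool(np.any(np.abs(lam - centers) <= radii + tol))

# Illustrative complex matrix (hypothetical values).
A_example = np.array([[5.0, 1.0, 0.5],
                      [0.2, -3.0, 1.0],
                      [1.0, 0.3, 1.0 + 2.0j]])

eigenvalues = np.linalg.eigvals(A_example)
row_centers, row_radii = gershgorin_discs(A_example)
col_centers, col_radii = gershgorin_discs(A_example.T)  # column discs

rows_ok = all(in_some_disc(lam, row_centers, row_radii) for lam in eigenvalues)
cols_ok = all(in_some_disc(lam, col_centers, col_radii) for lam in eigenvalues)
print(rows_ok, cols_ok)
```

Since every eigenvalue must sit in both the union of row discs and the union of column discs, intersecting the two unions can give a tighter region than either alone.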

# Why might this be useful?

Gerschgorin discs are a very nice tool for identifying things like whether a Hermitian (symmetric, over the reals) matrix is positive (semi) definite.  To be clear, the test associated with the discs will generally tell us either "yes, it is positive semi-definite" or "unclear".  In some cases it may allow us to reject the hypothesis as well, but we already have other tools at our disposal for that: (a) are any of the diagonal entries negative, (b) if any of the diagonal entries are zero, then the entire row and column associated with that diagonal entry must be zero, and (c) the inequality $n \cdot \text{trace}\big(\mathbf A \big) \geq \text{sum}\big(\mathbf{A}\big)$, where $\text{sum}$ adds up all entries of the matrix (proved over the reals in my posting "SPDSP_Trace_Inequality"), which must hold for any Hermitian or symmetric matrix that is positive semi-definite.  Notably, all of these other tools only allow us to reject the claim that a matrix is positive semi-definite; they don't allow us to confirm it.  In certain cases, however, Gerschgorin discs do allow us to confirm this. The calculation involved is quite simple, and the proof is quite simple as well.   
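To make the "reject / confirm / unclear" logic concrete, here is a small sketch for real symmetric matrices, assuming NumPy. The function name `psd_screen`, its return strings and the three example matrices are hypothetical; the checks are just (a), (b), (c) from the paragraph above followed by the Gerschgorin test.

```python
import numpy as np

def psd_screen(A, tol=1e-12):
    """Cheap screening of a real symmetric matrix: 'not psd', 'psd', or 'unclear'."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    diag = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(diag)

    # (a) a negative diagonal entry rules out positive semi-definiteness
    if np.any(diag < -tol):
        return "not psd"
    # (b) a zero diagonal entry needs a zero row (and, by symmetry, a zero column)
    if np.any((np.abs(diag) <= tol) & (radii > tol)):
        return "not psd"
    # (c) trace inequality: n * trace(A) >= sum of all entries
    if n * np.trace(A) < A.sum() - tol:
        return "not psd"
    # Gerschgorin: every real interval [a_ii - r_i, a_ii + r_i] lies in [0, inf)
    if np.all(diag - radii >= -tol):
        return "psd"
    return "unclear"

# Hypothetical examples, one per outcome.
reject_example = np.array([[1.0, 3.0], [3.0, 1.0]])     # fails the trace inequality
confirm_example = np.array([[2.0, -1.0], [-1.0, 2.0]])  # discs sit in [1, 3]
unclear_example = np.array([[1.0, 2.0], [2.0, 5.0]])    # a disc dips below zero

for M in (reject_example, confirm_example, unclear_example):
    print(psd_screen(M))
```

The third example is in fact positive definite, which illustrates that "unclear" really does mean unclear: the discs are a sufficient test, not a necessary one.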

Now, if we are doing something like a second derivative test, using a Hessian Matrix, we would in fact just use numeric values at or around a critical point.  There are times, however where we may want to evaluate our Hessian symbolically over a large range of values.  

for example consider the function, f, where 

$f(x,y,z) = x^{2} y^{2} z^{4} + z^{2}$

which has the following Hessian:

$\left[\begin{matrix}2 y^{2} z^{4} & 4 x y z^{4} & 8 x y^{2} z^{3}\\4 x y z^{4} & 2 x^{2} z^{4} & 8 x^{2} y z^{3}\\8 x y^{2} z^{3} & 8 x^{2} y z^{3} & 12 x^{2} y^{2} z^{2} + 2\end{matrix}\right]$




**Taussky's Refinement**  
reference pages 59 - 61 of Brualdi's *The Mutually Beneficial Relationship of Graphs and Matrices*  

A *weakly* diagonally dominant square matrix is one where   
$\big \vert a_{i,i} \big\vert \geq \sum_{j\neq i} \big \vert a_{i,j}\big \vert = r_i$  (i.e. radius on row i)  
with the inequality strict for at least one row $i$.  


**claim:**  
if $\mathbf A$ is irreducible and weakly diagonally dominant, then  
$\det\big(\mathbf A\big) \neq 0$  
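Both hypotheses can be checked mechanically. Below is a minimal sketch, assuming NumPy and taking "irreducible" to mean that the directed graph with an edge $i \to j$ whenever $a_{i,j} \neq 0$ is strongly connected. The helper names and the two example matrices are hypothetical.

```python
import numpy as np

def is_weakly_diagonally_dominant(A):
    """|a_ii| >= r_i for every row, with strict inequality in at least one row."""
    A = np.asarray(A)
    diag = np.abs(np.diag(A))
    radii = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag >= radii) and np.any(diag > radii))

def is_irreducible(A):
    """True if the graph with an edge i -> j whenever a_ij != 0 (i != j) is strongly connected."""
    A = np.asarray(A)
    n = A.shape[0]
    adjacency = (A != 0) & ~np.eye(n, dtype=bool)

    def reachable_from(start, adj):
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    # strongly connected <=> node 0 reaches every node in the graph and in its reverse
    return (len(reachable_from(0, adjacency)) == n
            and len(reachable_from(0, adjacency.T)) == n)

# Irreducible and weakly dominant (the middle row holds with equality): nonsingular.
T = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
# Weakly dominant but reducible: the claim does not apply, and this one is singular.
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

for M in (T, R):
    print(is_weakly_diagonally_dominant(M), is_irreducible(M), np.linalg.det(M))
```

The second matrix shows why irreducibility cannot simply be dropped from the claim.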

*your author's approach:*  
remark: being irreducible implies that there are no zero rows, which together with weak diagonal dominance implies each $a_{i,i} \neq 0$  

For this proof, *we can assume WLOG that each $a_{i,i} = -1$*  

- - - -  
Why? If this isn't the case we can consider 

$\mathbf A = \mathbf {DZ}$  

where $\mathbf D$ is a normalizing diagonal matrix such that $z_{i,i} = -1$.  As such $\det \big( \mathbf D\big) \neq 0$, so we need to determine whether $\det\big(\mathbf Z\big) = 0$ to determine the singularity of $\mathbf A$.  
- - - -  

Hence if $\det\big(\mathbf A\big) = 0$ then there is some $\mathbf x \neq \mathbf 0$ such that 
$\mathbf {Ax} = \mathbf 0$  

Equivalently if $\mathbf A$ is singular, then 

$\big(\mathbf A + \mathbf I\big)\mathbf x = \mathbf I \mathbf x = \mathbf x$  

so we define 

$\mathbf B := \big(\mathbf A + \mathbf I\big)$  

Now, by repeated application of triangle inequality, we know  

$\big \vert \mathbf x \big \vert =  \big \vert \big(\mathbf B \mathbf x\big) \big \vert \leq  \big(\big \vert\mathbf B\big \vert \cdot \big \vert \mathbf x\big \vert\big) $  



where the magnitude / absolute value is understood to be applied component-wise and the inequality is evaluated component-wise.  (Notationally this is similar to that used in Brualdi and elsewhere in discussions of Perron-Frobenius Theory).  

equivalently, if we look at the scalars in row $i$, this reads  
$\big \vert x_i\big \vert = \big \vert \sum_{j=1}^n b_{i,j}x_j\big \vert \leq \sum_{j=1}^n \big \vert b_{i,j}x_j\big \vert = \sum_{j=1}^n \big \vert b_{i,j}\big \vert \cdot \big \vert x_j\big \vert$  

and by further application of triangle inequality, we have 

$\big \vert \mathbf x \big \vert = \big \vert\big(\mathbf B^2 \mathbf x\big) \big \vert \leq \big \vert\mathbf B\big \vert \cdot \big \vert \big(\mathbf B \mathbf x\big) \big \vert \leq  \big(\big \vert\mathbf B\big \vert \big \vert\mathbf B\big \vert \cdot \big \vert \mathbf x\big \vert\big) = \big \vert\mathbf B\big \vert^2 \cdot \big \vert \mathbf x\big \vert $  

and by induction we have  

$\big \vert \mathbf x \big \vert = \big \vert\big(\mathbf B^k \mathbf x\big)\big \vert \leq \big \vert\mathbf B\big \vert^k \cdot \big \vert \mathbf x\big \vert = \mathbf P^k \cdot \big \vert \mathbf x\big \vert$  

for all natural numbers $k$, where $\mathbf P :=  \big \vert\mathbf B\big \vert$  

Based on our construction of weak diagonal dominance and magnitude one on the diagonal, we see that $\mathbf P$ is a substochastic matrix associated with an irreducible Markov chain.  To bring the point home, we can embed $\mathbf P$ in an absorbing chain, where we've inserted a state 0 as the absorbing state

$\mathbf M = \begin{bmatrix} 
1 & 0\\ 
* & \mathbf P\\ 
\end{bmatrix}$


where at least one starred component is positive so that the matrix is row stochastic, $\mathbf {M1} = \mathbf 1$.  That means that at least one state in $\mathbf P$ can reach the absorbing state -- so said state is transient.  But since the underlying graph in $\mathbf P$ is irreducible, and transience is a class property, *all states are transient*.  We can also, of course, multiply this in blocked form:  

$\mathbf M^k = \begin{bmatrix} 
1 & 0\\ 
* & \mathbf P^k\\ 
\end{bmatrix}$

The end result, is that we have   

$\big \vert \mathbf x \big \vert = \lim_{k\to \infty} \big \vert \mathbf x \big \vert \leq \lim_{k\to \infty}  \mathbf P^k \cdot \big \vert \mathbf x\big \vert = \mathbf 0$ 

or, if the reader prefers, with 

$\mathbf v := \begin{bmatrix} 
0\\
\big \vert \mathbf x\big \vert
\end{bmatrix}$  


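$\mathbf v = \lim_{k\to \infty} \mathbf v \leq \lim_{k\to \infty}  \mathbf M^k \cdot \big \vert \mathbf v\big \vert = \mathbf 1 \mathbf e_0^T \mathbf v = \mathbf 0$  
(every row of $\mathbf M^k$ tends to $\mathbf e_0^T$, and the component of $\mathbf v$ belonging to state 0 is zero)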

which is a contradiction.  
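The construction can be traced numerically on a small case. This is only a sketch under stated assumptions: NumPy is used, and the tridiagonal example matrix is a hypothetical choice that happens to be irreducible and weakly diagonally dominant. The names `A`, `D`, `Z`, `B`, `P` follow the text.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Normalize so the diagonal is -1:  A = D Z  with  D = diag(-a_ii).
D = np.diag(-np.diag(A))
Z = np.linalg.solve(D, A)   # Z = D^{-1} A
B = Z + np.eye(3)           # zero diagonal
P = np.abs(B)               # component-wise magnitudes: a substochastic matrix

row_sums = P.sum(axis=1)
P_200 = np.linalg.matrix_power(P, 200)
print(row_sums)
print(P_200.max())
```

Every row sum is at most one and at least one is strictly smaller, so probability mass leaks out at every step and the powers of $\mathbf P$ shrink toward the zero matrix, which is what forces $\big \vert \mathbf x \big \vert = \mathbf 0$ in the argument above.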

- - - - -
The above are standard Markov chain results.  Based on zero-one laws, if a node in a communicating class is transient, then all nodes in that class are transient.  Since a renewal does not occur with probability one, the expected number of visits to said state is finite, which means that the number of visits after time $t$ tends to zero by selecting large enough $t$ (either via Borel-Cantelli or Markov's Inequality).  Thus all diagonal components of $\mathbf P^k$ tend to zero. For avoidance of doubt, this *also* implies that the off-diagonal components tend to zero.  Consider:  

for $j, i \geq 1$  

$\sum_{k=1}^\infty P_{i,j}^{(k)} = E\big[N\big] = E\Big[E\big[N \,\big \vert\, \text{whether state } j \text{ is ever visited}\big]\Big] = (1-p)\cdot 0 + p \cdot \big(1 + \sum_{k=1}^\infty P_{j,j}^{(k)}\big) \leq 1 + \sum_{k=1}^\infty P_{j,j}^{(k)} \lt \infty$  

with $N$ being the number of visits to state $j$ when starting from state $i$, and $p$ being the total probability of ever reaching state $j$ from state $i$.  Being a probability, we know $p \in [0,1]$.  

this implies that 
$P_{i,j}^{(k)} \to 0$ 
as $k$ grows large (again by Borel-Cantelli; more simply, the terms of a convergent series tend to zero).  Equivalently, this is a delayed defective renewal process, where the 'real renewal' occurs at state $j$ but the transition probability to there from $i$ may be defective, and we *know* that the renewal process from $j\to j$ is defective.   

*remark:*  
Some of the ideas here closely follow pages 400 - 402 of Feller Vol 1 (3rd edition).  

*Brualdi's Approach*  

The standard approach in Brualdi is much shorter and does not rely on probability theory; instead it relies on a nested contradiction.  

As before $\mathbf A$ is an $n \times n$ matrix that is irreducible and weakly diagonally dominant. If $\det\big(\mathbf A\big) = 0$ then there is some $\mathbf x \neq \mathbf 0$ such that $\mathbf {A x} = \mathbf 0$.  

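Now, since the inequality is strict for at least one row of $\mathbf A$, we can see that the component magnitudes of $\mathbf x$ are not all equal, i.e. $\big \vert \mathbf x \big \vert \not\propto \mathbf 1$ (otherwise the strictly dominant row would give the same contradiction as in the Levy-Desplanques proof).  This means there is some maximal magnitude component of our vector, called $x_k$, as well as at least one $\big \vert x_i\big \vert \lt \big \vert x_k \big \vert $.  So we create a bipartition -- the set $U$ has all indices $j$ where $\big \vert x_j \big \vert = \big \vert x_k\big \vert$, and $U^C$ has all other indices.  Since the underlying graph is irreducible, there must be a $p \in U$ and $q \in U^C$ such that $a_{p,q} \neq 0$ (if every such entry were zero, no edge would lead from $U$ to $U^C$ and the graph would not be strongly connected).  Since $p \in U$, row $p$ of $\mathbf{Ax} = \mathbf 0$ gives $\big \vert a_{p,p}\big \vert \big \vert x_k \big \vert \leq \sum_{j \neq p} \big \vert a_{p,j}\big \vert \big \vert x_j \big \vert \leq \big \vert x_k \big \vert \sum_{j \neq p} \big \vert a_{p,j}\big \vert \leq \big \vert a_{p,p}\big \vert \big \vert x_k \big \vert$, so every inequality holds with equality and $\big \vert x_j \big \vert = \big \vert x_k \big \vert$ whenever $a_{p,j} \neq 0$.  In particular we can infer that $\big \vert x_q\big \vert = \big \vert x_k\big \vert$, which contradicts $q \in U^C$. 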

**(needs cleaned up and finished)**  







The immediate corollary is:  

if $\mathbf A$ is irreducible, then if we revisit our Gershgorin Disc formula:  

$\big \vert a_{i,i} - \lambda \big \vert \leq \sum_{j\neq i} \big \vert a_{i,j}\big \vert$

we find that an eigenvalue $\lambda$ can lie on the boundary of the union of discs **only if** it is a boundary point of all of the circular discs  


Returning to the Hessian of $f$ above: this matrix is small enough (3x3, so the characteristic polynomial is only a cubic) that we can solve symbolically for the eigenvalues exactly, but those eigenvalues, shown in the code cell output further below, are not so easy to interpret.  This gets increasingly difficult for much larger matrices.  As is, we can simply look at (1,1,1) and see that the trace inequality is not being observed ($3 \cdot 18 = 54 \lt 58$), hence the Hessian is not positive semi-definite at (1,1,1) and the function is thus not convex.  (We can of course also note that the diagonal elements are never negative, and the bottom right one is always positive, so the Hessian is never negative semi-definite and the function is *not concave*.)  
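The check at (1,1,1) can be reproduced with a few lines of sympy. This is a small sketch rather than part of the original notebook: it uses `sp.hessian` to build the Hessian of $f(x,y,z) = x^{2} y^{2} z^{4} + z^{2}$, and the vector `v` is a hand-picked witness (an illustrative choice) showing an explicit negative eigenvalue.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y**2 * z**4 + z**2

H = sp.hessian(f, (x, y, z))
H_at = H.subs({x: 1, y: 1, z: 1})   # numeric Hessian at the point (1, 1, 1)

n = H_at.shape[0]
trace_side = n * H_at.trace()
sum_side = sum(H_at[i, j] for i in range(n) for j in range(n))
print(H_at)
print(trace_side, sum_side)

# Witness: v is an eigenvector of H_at with eigenvalue -2.
v = sp.Matrix([1, -1, 0])
print(H_at * v == -2 * v)
```

The trace test rejects positive semi-definiteness without computing any eigenvalue; the witness vector just confirms the rejection independently.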

There are *many* applications where we may want to make claims about the eigenvalues / singularity of a matrix without looking at specific numerical values.


# Application: Graph Laplacian

By construction we know that the Graph Laplacian is symmetric and real.  Thus we know all eigenvalues are real.  Further, we know that the diagonal entries are positive, and all off diagonal entries are either zero or negative.  An example is shown below.

$\mathbf L = 
\begin{bmatrix}
3 & -1 & 0 & -1 &  -1 & 0\\ 
-1 & 3 & -1 & 0 & -1 & 0\\ 
0 & -1 & 2 & 0 & -1 &0 \\ 
 -1& 0 & 0 & 2 & -1 & 0\\ 
-1 & -1 & -1 & -1 & 5 & -1\\ 
0 & 0 & 0 &0  & -1 & 1
\end{bmatrix}$


The minus ones correspond to edges and the diagonal values represent the degree of a given node (recall that the graph is undirected).  Thus for any Laplacian, we know:

$\mathbf{L1} = \mathbf 0$

This combined with the aforementioned structure (symmetric, positives on diagonal, non-positives off diagonal) tells us that the Laplacian is singular, and using Gershgorin's discs we can observe that $\big \vert l_{i,i} - \lambda \big \vert \leq l_{i,i}$, which tells us that the smallest an eigenvalue can be is 0.  (If an eigenvalue were less than zero, its distance from the strictly positively valued $l_{i,i}$ would necessarily be more than $l_{i,i}$.)  Hence we observe that the graph Laplacian is Symmetric Positive Semi-Definite.  There are other ways to prove this fact, of course, but the Gerschgorin disc approach is extremely quick and intuitive.  
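For the example Laplacian above, the argument can be confirmed numerically. A minimal sketch, assuming NumPy; the variable names are illustrative only.

```python
import numpy as np

L = np.array([[ 3, -1,  0, -1, -1,  0],
              [-1,  3, -1,  0, -1,  0],
              [ 0, -1,  2,  0, -1,  0],
              [-1,  0,  0,  2, -1,  0],
              [-1, -1, -1, -1,  5, -1],
              [ 0,  0,  0,  0, -1,  1]], dtype=float)

ones = np.ones(6)
degrees = np.diag(L)
radii = np.abs(L).sum(axis=1) - np.abs(degrees)

# Each disc is the real interval [l_ii - r_i, l_ii + r_i] = [0, 2 * l_ii].
lower_bounds = degrees - radii
eigenvalues = np.linalg.eigvalsh(L)   # symmetric input, so eigenvalues are real

print(L @ ones)        # the all-ones vector is in the null space
print(lower_bounds)    # every disc starts at zero
print(eigenvalues)
```

The same discs also give an upper bound for free: no eigenvalue can exceed twice the largest degree.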

- - - -


# Other Applications


And again, whether we think of this in terms of eigenvalues with Gerschgorin discs, or Levy-Desplanques, we also have a way of knowing whether or not certain matrices are invertible without going through the full calculation -- i.e. if they are diagonally dominant, their invertibility should just jump off the page at you. 

Thus the ability to bound eigenvalues via the simple and intuitive Gerschgorin Discs gives us a new way of interpreting special structure in matrices.  

There are of course also numeric applications in engineering where matrices with special structures are used.  In these cases it may be nice to prove that the matrices are symmetric positive semi-definite in general, irrespective of their size -- whether they are 5x5 or 1,000 x 1,000 or more generally $n$ x $n$.  There may be alternative approaches that involve importing machinery like Cauchy Interlacing and then doing induction on $n$, but using Gerschgorin Discs gives a simple, direct and very visual way to prove this.  


For instance consider the matrix given on page number 34 (35 of 42 according to PDF viewer) here: https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-920j-numerical-methods-for-partial-differential-equations-sma-5212-spring-2003/lecture-notes/lec15.pdf

Just by looking at it, we can tell it is real valued and symmetric, so we know its eigenvalues are all real.  We can use Gerschgorin discs to determine that the minimum possible eigenvalue is zero.  Hence we know the matrix is at least Symmetric Positive Semi-Definite.  We can also easily look through the implicit Gram-Schmidt and determine that the first n - 1 columns must be linearly independent, and hence the rank of this $n$ x $n$ matrix is at least n - 1.  With a small bit more work, we can then determine that the final column must be linearly independent as well, and thus the matrix is Symmetric Positive Definite.  But the main point is that by simply eyeballing the symmetry, and knowing about Gerschgorin discs, we were able to have a deep and general understanding of the spectrum underlying this matrix for any finite $n$ x $n$ dimension that it may take on.  

Finally, Levy-Desplanques and Gerschgorin Discs are also used in numerical linear algebra for evaluating conditioning, making claims on whether pivoting is needed in Gaussian Elimination, and so on.

Note: with respect to time-homogeneous finite state Markov chains, while there are more powerful approaches using the greatest common divisor (which generalize to countable state Markov chains), Gerschgorin discs (with a strictness refinement due to Taussky) immediately tell us that a graph with a single communicating class (irreducible) and even one self-loop cannot have periodic behavior. The reason is that the only point on the unit circle touching/inside of *all* Gerschgorin discs is the value 1 (this is the Taussky refinement), hence the only eigenvalue with magnitude 1 is an eigenvalue of one. All other eigenvalues have magnitude less than 1, and their powers may be made arbitrarily small after a large enough number of iterations.  The fact that the eigenvalue of 1 is simple (i.e. algebraic multiplicity of one) is of course given by Perron-Frobenius Theory or standard Markov chain results from Kolmogorov.  
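A tiny illustration of the self-loop remark, assuming NumPy. The 3-state transition matrix is a hypothetical example: a directed cycle in which state 0 also stays put with probability one half. The sampling of the unit circle is only a crude numerical stand-in for the geometric statement.

```python
import numpy as np

# Row-stochastic, irreducible, with a single self-loop at state 0 (hypothetical example).
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(moduli)

# Disc of row 0: |z - 0.5| <= 0.5.  Sample the unit circle and keep the points inside it.
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
unit_circle = np.exp(1j * angles)
touching = unit_circle[np.abs(unit_circle - 0.5) <= 0.5 + 1e-12]
print(touching)
```

The disc belonging to the self-loop row meets the unit circle only at the point 1, so any eigenvalue of magnitude one has to be 1 itself, and the remaining eigenvalues come out strictly inside the unit circle.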

Of interest: it is also implied directly by the elementary renewal theorem with a delayed start -- or perhaps better, we could formulate this as a renewal rewards problem where a reward of one is given each time we visit any state in the graph, over a cycle starting and finishing at some arbitrary node $i$.  The renewal reward theorem (and perhaps common sense) tells us that we have a time averaged reward of 1 -- but this is equivalent to 

$1  =  \lim_{t \to \infty}\frac{E[r(t)]}{t}  =\lim_{t \to \infty} \frac{1}{t}\sum_{k=1}^t \text{trace}\big(\mathbf A^k\big)$  

However if the algebraic multiplicity of eigenvalue $1$ is larger than 1 (e.g. 2 or 3 or...), then the time averaged trace must be at least $2$, which is a contradiction -- this proves that the eigenvalue of one is simple. (Note that this *also* proves simplicity of eigenvalue 1 for any irreducible time-homogeneous finite state Markov chain -- including periodic chains.  Consider the matrix $\mathbf A$ and its eigenvalues.  Now consider the convex combination $\mathbf B := \frac{1}{2}\big(\mathbf A + \mathbf I\big)$. Here the graph in $\mathbf B$ is connected but has (many) self-loops and hence is aperiodic -- and the above tells us that $\mathbf B$ has a simple eigenvalue of $1$ and all others with magnitude less than $1$.  Yet each eigenvalue of $\mathbf B$ is the average of an eigenvalue of $\mathbf A$ and $1$.  Thus if $\mathbf A$ had multiple eigenvalues of $1$, so would $\mathbf B$, which tells us that $\mathbf A$ has a simple eigenvalue of $1$.)  
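Both points can be seen on the simplest periodic chain. A sketch assuming NumPy, with a hypothetical example: the deterministic 3-cycle, whose eigenvalues are the three cube roots of unity. The horizon `t = 300` is an arbitrary choice (a multiple of the period, so the average comes out clean).

```python
import numpy as np

# Deterministic 3-cycle: irreducible but periodic (hypothetical example).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Time-averaged trace (1/t) * sum_{k=1}^{t} trace(A^k).
t = 300
power = np.eye(3)
trace_total = 0.0
for _ in range(t):
    power = power @ A
    trace_total += np.trace(power)
time_averaged_trace = trace_total / t

# Lazy version of the chain: same graph plus a self-loop at every state.
B = 0.5 * (A + np.eye(3))
moduli_A = np.sort(np.abs(np.linalg.eigvals(A)))
moduli_B = np.sort(np.abs(np.linalg.eigvals(B)))

print(time_averaged_trace)
print(moduli_A)
print(moduli_B)
```

All three eigenvalues of the periodic chain sit on the unit circle, yet the time-averaged trace still counts only the eigenvalue 1, and after averaging with the identity only that eigenvalue keeps magnitude one.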


In [1]:
import sympy as sp

x, y, z = sp.symbols('x y z')

myfunc = x**2*y**2*z**4 + z**2

mylist = [x, y, z]

# sp.hessian builds the matrix of second partial derivatives directly
hessianmatrix = sp.hessian(myfunc, mylist)

print(hessianmatrix.eigenvals())
# these are NOT easy to interpret and it is a very small Hessian!
# use a different tool!
    

{4*x**2*y**2*z**2 + 2*x**2*z**4/3 + 2*y**2*z**4/3 - (-1512*x**4*y**4*z**10 + 324*x**2*y**2*z**8 + sqrt((-3024*x**4*y**4*z**10 + 648*x**2*y**2*z**8 - (-108*x**2*y**2*z**2 - 18*x**2*z**4 - 18*y**2*z**4 - 18)*(-40*x**4*y**2*z**6 - 40*x**2*y**4*z**6 - 12*x**2*y**2*z**8 + 4*x**2*z**4 + 4*y**2*z**4) + 2*(-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**3)**2 - 4*(120*x**4*y**2*z**6 + 120*x**2*y**4*z**6 + 36*x**2*y**2*z**8 - 12*x**2*z**4 - 12*y**2*z**4 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**2)**3)/2 - (-108*x**2*y**2*z**2 - 18*x**2*z**4 - 18*y**2*z**4 - 18)*(-40*x**4*y**2*z**6 - 40*x**2*y**4*z**6 - 12*x**2*y**2*z**8 + 4*x**2*z**4 + 4*y**2*z**4)/2 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**3)**(1/3)/3 + 2/3 - (120*x**4*y**2*z**6 + 120*x**2*y**4*z**6 + 36*x**2*y**2*z**8 - 12*x**2*z**4 - 12*y**2*z**4 + (-12*x**2*y**2*z**2 - 2*x**2*z**4 - 2*y**2*z**4 - 2)**2)/(3*(-1512*x**4*y**4*z**10 + 324*x**2*y**2*z**8 + sqrt((-3024*x**4*y**4*z**10 + 648*x**2*y**2*z**8 - (-108

# extension: include Cassini Disks

http://bwlewis.github.io/cassini/#br1

or better: work through the Cassini ovals stated (and then extended with graph properties), in this file: 

'CasiniOvals_extension.pdf'

located in Linear Algebra folder ...

also this seems to be quite good

http://planetmath.org/sites/default/files/texpdf/37503.pdf

also this: 

http://www.math.kent.edu/~varga/pub/paper_232.pdf



