# What to Monitor


## Driver and Executor Processes
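You need to monitor the driver because that is where all of your application's state lives, so you have to make sure it is running stably. For executors, you want to understand the state of each individual executor, such as its memory use and whether its tasks are failing.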
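Driver and executor health can also be checked programmatically through Spark's monitoring REST API, which is served by the same web server as the Spark UI under `/api/v1`. The sketch below assumes the UI is reachable at `http://localhost:4040` and that executor records carry the `id`, `isActive`, `memoryUsed`, `maxMemory` and `failedTasks` fields described in the Spark monitoring documentation. Check those names against your Spark version. `fetch_executors` and `summarize_executors` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_executors(ui_url: str, app_id: str) -> list[dict]:
    """GET the executor list (driver included) for one application.

    Assumes ui_url points at a live Spark UI, e.g. "http://localhost:4040".
    """
    with urlopen(f"{ui_url}/api/v1/applications/{app_id}/executors") as response:
        return json.load(response)


def summarize_executors(executors: list[dict]) -> list[dict]:
    """Reduce each executor record to a few health indicators."""
    summary = []
    for ex in executors:
        max_mem = ex.get("maxMemory", 0)  # storage memory available to this executor
        used = ex.get("memoryUsed", 0)    # storage memory currently in use
        summary.append({
            "id": ex["id"],  # "driver" for the driver process
            "active": ex.get("isActive", True),
            "memory_fraction": used / max_mem if max_mem else 0.0,
            "failed_tasks": ex.get("failedTasks", 0),
        })
    return summary
```

For example, `summarize_executors(fetch_executors("http://localhost:4040", spark.sparkContext.applicationId))` returns one row per process, which makes it easy to spot an executor that is running out of memory or accumulating failed tasks.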

## Queries, Jobs, Stages and Tasks
Sometimes we need to debug what's going on at the level of a specific query, or of the jobs, stages and tasks it is broken into. Tracking these lets us see exactly what is running on the cluster at a given time.
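The same REST API exposes jobs and stages. As a minimal sketch, assuming job records shaped like those returned by `/api/v1/applications/<app-id>/jobs` (with `jobId`, `status`, `numTasks`, `numCompletedTasks` and `stageIds` fields), the hypothetical helpers below fetch the job list and report progress for jobs that are still running:

```python
import json
from urllib.request import urlopen


def fetch_jobs(ui_url: str, app_id: str) -> list[dict]:
    """GET the job list for one application from the Spark REST API."""
    with urlopen(f"{ui_url}/api/v1/applications/{app_id}/jobs") as response:
        return json.load(response)


def running_job_progress(jobs: list[dict]) -> list[dict]:
    """Return the fraction of completed tasks for every RUNNING job."""
    report = []
    for job in jobs:
        if job.get("status") != "RUNNING":
            continue  # skip SUCCEEDED, FAILED and UNKNOWN jobs
        total = job.get("numTasks", 0)
        done = job.get("numCompletedTasks", 0)
        report.append({
            "jobId": job["jobId"],
            "stageIds": job.get("stageIds", []),  # stages this job is made of
            "progress": done / total if total else 0.0,
        })
    return report
```

Polling `running_job_progress(fetch_jobs("http://localhost:4040", spark.sparkContext.applicationId))` while a long query runs shows which jobs are active; each `stageId` can then be looked up under `/applications/<app-id>/stages/<stage-id>` for task-level detail.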

# Spark Logs
One of the most detailed ways to monitor Spark is through its log files. One challenge, however, is that Python cannot integrate directly with Spark's Java-based logging library. Using Python's `logging` module (which writes to standard error by default) or even simple `print` statements (which write to standard output) will still get your messages into the driver's console output, where they are easy to find.
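
A minimal sketch of the `logging` approach: the hypothetical `get_driver_logger` helper below attaches a single handler that writes to standard error, so messages show up alongside Spark's own console output. It only covers code running in the driver; on a cluster, output from code that runs on executors (for example inside a UDF) lands in the executor logs instead.

```python
import logging
import sys


def get_driver_logger(name: str = "my_spark_app", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to standard error next to Spark's console output."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines if the root logger is configured too
    if not logger.handlers:   # re-running a notebook cell must not add a second handler
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = get_driver_logger()
log.info("Starting the retail job")
```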

To change Spark's log level, run the following:

```
spark.sparkContext.setLogLevel('INFO')
```

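Setting the level to `INFO` makes Spark emit more detailed log messages, which appear in the driver's console output. The accepted levels are `ALL`, `DEBUG`, `ERROR`, `FATAL`, `INFO`, `OFF`, `TRACE` and `WARN`.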
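`INFO` output can be noisy, so it is often useful to raise verbosity only around the query being debugged. The sketch below is a hypothetical context manager built on `spark.sparkContext.setLogLevel`. PySpark's `SparkContext` does not, to our knowledge, offer a getter for the current level, so the level to restore afterwards (`WARN` here) is an assumption the caller should adjust.

```python
from contextlib import contextmanager


@contextmanager
def temporary_log_level(spark, level: str, restore: str = "WARN"):
    """Use `level` inside the with-block, then switch back to `restore`.

    `restore` is an assumed baseline, since the current level cannot be read back.
    """
    spark.sparkContext.setLogLevel(level)
    try:
        yield spark
    finally:
        # Runs even if the block raises, so the verbose level never leaks out
        spark.sparkContext.setLogLevel(restore)
```

Usage would look like `with temporary_log_level(spark, "INFO"): df.count()`, where `df` stands for whatever DataFrame you are investigating.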

```python
from pyspark.sql import SparkSession

# Create the SparkSession used by the examples below (or reuse an active one)
spark = SparkSession.builder.getOrCreate()
```

# The Spark UI
The Spark UI provides a visual way to monitor applications while they are running, as well as metrics about your Spark workload at the Spark and JVM level. Every running SparkContext launches a web UI, by default on port 4040, that displays useful information about the application.
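From code, PySpark exposes the actual UI address as `spark.sparkContext.uiWebUrl`. When several applications run on the same host, Spark binds each additional UI to the next port (4041, 4042, and so on). The sketch below, with hypothetical helper names, builds those candidate URLs and asks each one's `/api/v1/applications` REST endpoint which application it serves. It assumes the default `spark.ui.port` of 4040 and no authentication in front of the UI.

```python
import json
from urllib.request import urlopen

DEFAULT_UI_PORT = 4040  # Spark's default spark.ui.port


def candidate_ui_urls(host: str = "localhost", max_apps: int = 3) -> list[str]:
    """UI URLs to try when several SparkContexts share one host."""
    return [f"http://{host}:{DEFAULT_UI_PORT + i}" for i in range(max_apps)]


def find_running_apps(host: str = "localhost", max_apps: int = 3) -> dict[str, list[str]]:
    """Map each responding UI URL to the application names it reports."""
    found = {}
    for url in candidate_ui_urls(host, max_apps):
        try:
            with urlopen(f"{url}/api/v1/applications", timeout=2) as response:
                apps = json.load(response)
        except (OSError, ValueError):
            continue  # nothing listening on this port, or not a Spark UI
        found[url] = [app["name"] for app in apps]
    return found
```

Ports with nothing listening are skipped, so `find_running_apps()` only lists UIs that actually answered.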

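The job below reads the retail dataset, flags descriptions containing "GLASS", and counts each group, which gives the Spark UI jobs and stages to display while the application is running.

```python
# Local path from the original setup; point it at your copy of the dataset
(
    spark.read
    .option('header', 'true')
    .csv('/home/kevin/Desktop/Big-Data-with-Pyspark/data/retail-data/all/online-retail-dataset.csv')
    .repartition(2)  # shuffle the data into 2 partitions
    .selectExpr("instr(Description, 'GLASS') >= 1 AS is_glass")  # null where Description is null
    .groupBy('is_glass')  # another shuffle, so the job spans several stages
    .count()
    .collect()
)
```

Output:

```
[Row(is_glass=None, count=1454),
 Row(is_glass=True, count=12861),
 Row(is_glass=False, count=527594)]
```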
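The `is_glass=None` row comes from rows whose `Description` is null: Spark SQL's `instr` returns null for a null input (and 0 when the substring is absent), and comparing null with `>= 1` yields null rather than `False`. The plain-Python sketch below mimics that three-valued logic on a few made-up descriptions. It illustrates the semantics only and is not how Spark evaluates the expression.

```python
from collections import Counter
from typing import Optional


def instr(s: Optional[str], substr: str) -> Optional[int]:
    """Mimic Spark SQL instr: 1-based position, 0 if absent, None for a null string."""
    if s is None:
        return None
    return s.find(substr) + 1  # str.find gives -1 when absent, so this yields 0


def is_glass(description: Optional[str]) -> Optional[bool]:
    """Mimic `instr(Description, 'GLASS') >= 1`, propagating null."""
    pos = instr(description, "GLASS")
    return None if pos is None else pos >= 1


# Made-up sample values standing in for the Description column
descriptions = ["GLASS JAR", "PAPER BAG", None]
counts = Counter(is_glass(d) for d in descriptions)
print(counts)
```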

```python
# Stop the session; the Spark UI on port 4040 shuts down with it
spark.stop()
```
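Once the application stops, its UI is no longer available by default. To inspect a finished application, Spark can write an event log that the history server replays. The sketch below applies the two relevant settings, `spark.eventLog.enabled` and `spark.eventLog.dir`, to a `SparkSession.builder`. The helper name is hypothetical, and `file:///tmp/spark-events` is an assumed directory that must exist before the application starts. The settings have to be in place before the session is created; setting them on an already-running session will not turn event logging on.

```python
# Settings that make Spark write an event log for the history server.
# The directory is an assumption; create it before starting the application.
EVENT_LOG_CONF = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "file:///tmp/spark-events",
}


def with_event_log(builder, conf=EVENT_LOG_CONF):
    """Apply event-log settings to a SparkSession.builder and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)  # builder.config returns the builder
    return builder
```

For example, `spark = with_event_log(SparkSession.builder).getOrCreate()`. After the application finishes, `$SPARK_HOME/sbin/start-history-server.sh` serves the recorded UI, by default on port 18080, provided `spark.history.fs.logDirectory` points at the same directory.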