

# Analyze Batch Jobs 

Running batch jobs is a critical operation on the mainframe. Every day, 10000 to 60000 jobs run around the clock. On the first or last day of a month, quarter, or year, the workload may reach twice that of a regular day.

Machine learning analytics on large volumes of batch job logs can help administrators in 4 ways:

* Understand batch job seasonality and workload trends
* Gain insight into the factors that most affect batch job elapsed time
* Predict the elapsed time of long-running batch jobs
* Identify candidate abnormal batch job instances

In this sample project, we show some analytics samples for the Master Batch Job of a fictional bank, BankABC, using Python notebooks and modeler flows, based on customized sample data.
   
   >  BankABC wants to analyze the elapsed time of its Master Batch Job (MBJ). MBJ is a large application that runs every midnight, includes over 10000 jobs, and covers various types of business transactions. <br>
   BankABC wants to know how the MBJ elapsed time changes, which factors affect it most, and how daily business transaction volumes correlate with elapsed time. <br>
   Additionally, BankABC wants to know whether the MBJ elapsed time is predictable. MBJ usually runs for 2 to 5 hours starting at midnight, and BankABC wants to make sure it completes before office hours the next morning. A reasonable prediction will help schedule MBJ and maintenance jobs productively, and is useful for anomaly detection. <br>

When you have completed this sample project, you will understand how to:

* Extract batch job log data from SMF Type 30 records on the mainframe
* Explore log data to gain insight into batch job elapsed time
* Use several algorithms to predict batch job elapsed time
* Identify candidate anomalies in batch job instances and business transaction volumes


   
## Architecture

<img src="https://raw.githubusercontent.com/IBM/analyze-batch-job-z/master/Image/architecture.png">


### Highlights
1. You can work with WMLz (IBM Watson Machine Learning for z/OS) through a web browser.
2. WMLz provides Jupyter Notebook for coding in Python, Scala, and R.
3. WMLz provides Modeler Flow for exploring data and training models on a canvas by drag and drop.
4. You can read mainframe native files, e.g. SMF Type 30 records, from a Python notebook through the included mainframe data service.


## What is included?
   **There are 3 important parts in the project**
   
   ### Dataset
    1. df_smf.csv
       Sample output of 1_BatchJob_SMF30Extract.ipynb. It contains batch job run-time metrics, the most important input for the later batch job analytics.
       For z/OS operation logs, SMF provides a common interface to extract system operation measurements. SMF Type 30 includes records of batch job operation.
       In real client environments, such data can also be collected by third-party software.

    2. MasterBatchJob.csv
       Elapsed time of MBJ over one year; sample data simulated for the demo.
       The elapsed time of MBJ is the number of minutes between the start time of the first job in MBJ and the end time of the last job in MBJ. It can be calculated from df_smf.csv.

    3. TxnVolume.csv
       Transaction volumes of various business types over one year; sample data simulated for the demo.

    4. calendar_join.csv
       Calendar data with calendar elements such as weekday, day, and month.

    5. widetable_MBJ.csv
       A joined wide table of MBJ elapsed time, transaction volumes, and calendar data, ready for model training in 3_BatchJob_MBJ_Prediction.ipynb.
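
As a rough illustration of how the MBJ elapsed time described above could be derived from job-level records, here is a minimal sketch that uses only the Python standard library. The column names (`run_date`, `job_name`, `start_time`, `end_time`) and the sample rows are assumptions made up for this example; they are not the actual layout of df_smf.csv, so adjust them to the columns your extract produces.

```python
import csv
import io
from datetime import datetime

# Hypothetical job-level extract. The column names and values are
# illustrative only and are not taken from df_smf.csv.
SAMPLE_CSV = """run_date,job_name,start_time,end_time
2019-01-01,JOBA,2019-01-01 00:05:00,2019-01-01 00:40:00
2019-01-01,JOBB,2019-01-01 00:30:00,2019-01-01 02:35:00
2019-01-02,JOBA,2019-01-02 00:10:00,2019-01-02 01:00:00
2019-01-02,JOBB,2019-01-02 00:45:00,2019-01-02 03:10:00
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def mbj_elapsed_minutes(rows):
    """Return {run_date: minutes from the earliest job start to the latest job end}."""
    windows = {}
    for row in rows:
        start = datetime.strptime(row["start_time"], TIME_FORMAT)
        end = datetime.strptime(row["end_time"], TIME_FORMAT)
        # Keep the earliest start and the latest end seen so far for this run.
        first, last = windows.get(row["run_date"], (start, end))
        windows[row["run_date"]] = (min(first, start), max(last, end))
    return {
        day: (last - first).total_seconds() / 60
        for day, (first, last) in windows.items()
    }


elapsed = mbj_elapsed_minutes(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for day, minutes in sorted(elapsed.items()):
    print(day, minutes)
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and match the column names and time format to your data.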
    

   
   ### Notebook 
    It includes the source code of 4 Python notebooks. You can open them in the Jupyter web environment and run them one by one, following the instructions inside each notebook.
   
    0_readme.ipynb
      Overview of this sample.

    1_BatchJob_SMF30Extract.ipynb
      It extracts batch job run-time operation data from SMF Type 30 records.
      Refer to the IBM Knowledge Center documentation on the SMF Type 30 record to learn more about the mainframe job metric definitions for your z/OS version.
      Because SMF Type 30 log records are usually large, they are not included in the project package. A sample output of this notebook, df_smf.csv, is included for your reference.

    2_BatchJob_MBJ_DataExploration.ipynb
      This notebook explores MBJ's elapsed time to gain insight into its trend over time, its correlation with daytime business volumes, and its periodicity by week, day, and month.

    3_BatchJob_MBJ_Prediction.ipynb
      This notebook applies 3 methods to predict MBJ elapsed time based on historical data, calendar information, and business transaction volume data, and ensembles them into the final result.

    Note:
        When a notebook cell reads a local dataset on Watson Machine Learning for z/OS, it needs a token for authentication. The token's expiration period is configured by the administrator. If a cell fails with an error message saying that the token has expired, click the button at the top right to get a new token and replace the old project context.
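
To make the idea of ensembling several predictions more concrete, here is a minimal sketch of one simple option: averaging the outputs of three models night by night and comparing the mean absolute error. The model names and all numbers are made up for illustration, and plain averaging is an assumption of this sketch. The prediction notebook may combine its methods differently.

```python
from statistics import mean

# Made-up numbers for illustration only: actual MBJ elapsed minutes for five
# nights and the predictions of three hypothetical models.
actual = [150, 180, 165, 240, 155]
predictions = {
    "model_a": [140, 190, 160, 220, 150],
    "model_b": [160, 170, 175, 250, 165],
    "model_c": [155, 185, 160, 235, 150],
}


def average_ensemble(model_predictions):
    """Average the predictions of all models, night by night."""
    return [mean(values) for values in zip(*model_predictions.values())]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between actual and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


ensemble = average_ensemble(predictions)
for name, values in predictions.items():
    print(name, mean_absolute_error(actual, values))
print("ensemble", mean_absolute_error(actual, ensemble))
```

With this made-up data the individual errors partly cancel out, so the averaged prediction has a lower error than any single model. That is not guaranteed in general, so it is worth checking each model and the ensemble on held-out data.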
  
  ### Flow
    It includes 2 flows. You can open them in the canvas and use drag and drop to run experiments.
    
    4_BatchJob_MBJ_TSPredict.str
      This flow applies a time series algorithm to predict elapsed time.
      
    5_BatchJob_MBJ_AnomalyDetect.str
      This flow detects anomalies in elapsed time and business transaction volumes.
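
For readers who want a feel for anomaly detection on elapsed time outside the canvas, here is a minimal sketch in Python. It flags nights whose elapsed time is far from the median, using the modified z-score based on the median absolute deviation (MAD). This is a stand-in technique chosen for its simplicity, not the algorithm used by the flow, and the sample values are made up.

```python
from statistics import median

# Made-up nightly MBJ elapsed times in minutes, for illustration only.
elapsed_minutes = [150, 155, 148, 160, 152, 310, 151, 149, 158, 153]


def mad_outliers(values, threshold=3.5):
    """Return the indexes whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be ranked as unusual
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


for index in mad_outliers(elapsed_minutes):
    print("candidate anomaly at night", index, "with", elapsed_minutes[index], "minutes")
```

The threshold of 3.5 is a common rule of thumb. Because workload is seasonal (for example, heavier at month end), a real check would likely compare each night against similar calendar days rather than against the whole series.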
      

# Related links
<a href="https://developer.ibm.com/patterns/analyze-batch-job-with-watson-machine-learning-for-zos"> Overview: Analyze Batch Jobs on IBM mainframe with machine learning</a><p>
<a href="https://github.com/IBM/analyze-batch-job-z"> GitHub: Analyze Batch Jobs via Watson Machine Learning on z/OS </a><p>

## Background context

<a href="https://www.ibm.com/us-en/marketplace/machine-learning-for-zos">IBM Watson Machine Learning for z/OS </a><p>
<a href="http://www.redbooks.ibm.com/abstracts/sg248421.html?Open">Turning Data into Insight with IBM Machine Learning for z/OS </a><p>
<a href="https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zmainframe/zconc_batchproc.htm">Mainframes working after hours: Batch processing </a><p>
<a href="https://en.wikipedia.org/wiki/IBM_System_Management_Facilities">IBM System Management Facilities </a><p>
<a href="https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.ieag200/rec30.htm">IBM SMF Type 30 record </a><p>
