# Incident Resolution and Recommendation System (IRRS) on Xpresso.AI

## Supervised ML Classification System to identify incidents similar to the one reported



**STEP 1: FETCH THE DATA**

Xpresso.ai provides data connectivity libraries for fetching data from various databases and file systems

Databases currently supported: *MySQL, MS SQL Server, Mongo and Cassandra*

File Systems currently supported: *NFS, HDFS* (Support for AWS, GCP and Azure under development)

In [64]:
import xpresso.ai.core.data.connector.data_conn as dc

# create a data connector
xpr_conn = dc.DataConnector()

# set parameters for importing a CSV file
irrs_data_desc = {"type": "FS", 
                  "file_name":"xpresso_platform_dev/k8/irrs/data/dataxlsx/IT-Incidents-Distinct January and February.csv"}

# import data from a CSV file into a data frame
irrs_data_frame = xpr_conn.import_data(irrs_data_desc)
irrs_data_frame.head()

Unnamed: 0,INCIDENT_ID,TITLE,STATUS,IMPACT,PRIORITY,URGENCY,AREA,SUB_AREA,SITE_CATEGORY,OPENED_DT,...,UPDATED_DT,ESCALATED_(Y/N),UPDATED_BY,VENDOR,VENDOR_TICKET_STATUS,VENDOR_RELEASE_DT,CORRECTIVE_ACTION_DT,CI_SUBTYPE,CI_DEVELOPMENT_DIRECTOR,RETAIL_STORE
0,IM3922544,Remote site containing node RTGANORBB01 is unr...,Closed,4-User,4-Low,4-Low,MONITORING,FAULT,REMOTE,13-02-2017 02:08,...,13-02-2017 02:33,,,,,,,Router,,
1,IM3937797,G5S_Down Triggered on Application : APP-G5S-PR...,Closed,4-User,3-Average,1-Critical,MONITORING,FAULT,,23-02-2017 01:21,...,23-02-2017 01:59,,,,,,,BusinessApplication,"BIGNOTTI, ENRICO C.",
2,IM3933249,G5S_Down Triggered on Application : APP-G5S-PR...,Closed,4-User,3-Average,1-Critical,MONITORING,FAULT,,20-02-2017 03:09,...,20-02-2017 03:47,,,,,,,BusinessApplication,"BIGNOTTI, ENRICO C.",
3,IM3932312,G5S_Down Triggered on Application : APP-G5S-PR...,Closed,4-User,3-Average,1-Critical,MONITORING,FAULT,,19-02-2017 00:22,...,19-02-2017 07:11,,,,,,,BusinessApplication,"BIGNOTTI, ENRICO C.",
4,IM3917008,G5S_Down Triggered on Application : APP-G5S-PR...,Closed,4-User,3-Average,1-Critical,MONITORING,FAULT,,08-02-2017 02:10,...,08-02-2017 02:40,,,,,,,BusinessApplication,"BIGNOTTI, ENRICO C.",
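The `OPENED_DT` values above are written day-first (for example `13-02-2017`), while the Date Analysis in Step 4 reports a maximum of `2017-12-02` for a file described as covering January and February. One possible explanation, not confirmed by the source, is that ambiguous dates are being read month-first. The sketch below uses only the Python standard library and a hypothetical timestamp to show how the same string yields two different dates depending on the format assumed.

```python
from datetime import datetime

# Hypothetical timestamp in the same day-first layout as OPENED_DT.
raw = "12-02-2017 23:43"

# Parsing with an explicit day-first format gives 12 February.
day_first = datetime.strptime(raw, "%d-%m-%Y %H:%M")

# Parsing the same string month-first gives 2 December.
month_first = datetime.strptime(raw, "%m-%d-%Y %H:%M")

print(day_first.date())    # 2017-02-12
print(month_first.date())  # 2017-12-02
```

Stating the format explicitly when parsing date columns avoids this ambiguity.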


In [65]:
trimmed_irrs_data_frame = irrs_data_frame.drop(irrs_data_frame.columns[30:-1], axis=1)
trimmed_irrs_data_frame.shape

(2000, 31)
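The slice `columns[30:-1]` selects positions 30 up to, but not including, the last column, so the drop keeps the first 30 columns plus the final one. That is consistent with the 31 columns reported above. A minimal sketch with a plain list, assuming a hypothetical width of 40 columns (the real column count is not shown):

```python
# Hypothetical column names; the real file's column count is not shown above.
columns = [f"col_{i}" for i in range(40)]

# Positions 30 up to (but excluding) the last column are dropped.
to_drop = set(columns[30:-1])
kept = [c for c in columns if c not in to_drop]

print(len(kept))   # 31
print(kept[-2:])   # ['col_29', 'col_39']
```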

**STEP 2: CREATE A DATASET**

A **Dataset** is at the core of Xpresso's data analytics capability

It can be instantiated as a **StructuredDataset** or an **UnstructuredDataset**

**StructuredDataset** objects reflect data contained in a single Excel sheet / database table / CSV file

**UnstructuredDataset** objects reflect data contained in a set of binary files (e.g., images, videos, etc.)


In [66]:
from xpresso.ai.core.data.xdm.structured_dataset import StructuredDataset
# create a structured dataset to store the data
irrs_dataset = StructuredDataset()
irrs_dataset.name = "IRRS_Data"
irrs_dataset.project = "IRRS"
irrs_dataset.creation_by = "naveen.sinha"
irrs_dataset.data = trimmed_irrs_data_frame
print(irrs_dataset.data.shape)


(2000, 31)


**STEP 3: STORE THE DATA VERSION IN A DATA REPOSITORY**

A good habit is to store the original data (and any changed versions thereafter) in a **Data Repository**

Xpresso provides Data Repositories for every project

In [60]:
from xpresso.ai.core.data.versioning.controller_factory import VersionControllerFactory
import pprint
pp = pprint.PrettyPrinter(indent=4)

dvc_factory = VersionControllerFactory(uid="pi", pwd="myawesomepassword", env="dev")
# Create a Repo Manager object
repo_manager = dvc_factory.get_version_controller()

In [61]:
repo_manager.list_repo()

[{'repo_name': 'pi_2project', 'Date of creation': '06/09/19'}]

In [None]:
# create a repo - Xpresso.ai does this automatically when a project is created
repo_manager.create_repo({
    "repo_name": "irrs_repo"
})


In [48]:

# create a branch within the repo
repo_manager.create_branch({
    "repo_name": "irrs_repo",
    "branch_name": "release_1_0"
})

In [49]:
repo_manager.push_dataset(repo_name="irrs_repo", branch_name="release_1_0",  
                          dataset=irrs_dataset, description="Original Customer Data")

**Data repository libraries have methods to list and pull datasets**

In [117]:
pp.pprint(repo_manager.list_dataset(repo_name="irrs_repo", branch_name="release_1_0", path="/dataset/IRRS_Data"))

{   'commit': {   'branch_name': 'release_1_0',
                  'commit_id': '196b2813b3154296a1030451f85c5604',
                  'description': 'Original Customer Data',
                  'repo_name': 'irrs_repo'},
    'dataset': [   {   'file_name': 'IRRS_Data_dataset__00000.pkl',
                       'path': '/dataset/IRRS_Data/IRRS_Data_dataset__00000.pkl',
                       'size_in_bytes': 886682,
                       'type': 'File'}]}


In [118]:
irrs_dataset = repo_manager.pull_dataset(repo_name="irrs_repo", branch_name="release_1_0", path="/dataset/IRRS_Data")
irrs_dataset.data.shape

100%|██████████| 1/1 [00:03<00:00,  3.55s/it]


(2000, 31)
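The push, list and pull cycle above can be pictured with a small in-memory stand-in. This is only an illustrative sketch and not how the Xpresso repository is implemented: the `ToyRepo` class, its method names and the sample row are all hypothetical. It borrows two details visible in the listing above, a pickled payload and a hex commit id.

```python
import pickle
import uuid


class ToyRepo:
    """Hypothetical in-memory stand-in for a versioned dataset store."""

    def __init__(self):
        # (repo, branch, path) -> list of commits, oldest first
        self._commits = {}

    def push(self, repo, branch, path, obj, description):
        """Serialize obj and record it as a new commit."""
        payload = pickle.dumps(obj)
        commit = {
            "commit_id": uuid.uuid4().hex,
            "description": description,
            "size_in_bytes": len(payload),
            "payload": payload,
        }
        self._commits.setdefault((repo, branch, path), []).append(commit)
        return commit["commit_id"]

    def list_commits(self, repo, branch, path):
        """Return commit metadata without the stored payload."""
        return [
            {k: v for k, v in c.items() if k != "payload"}
            for c in self._commits.get((repo, branch, path), [])
        ]

    def pull(self, repo, branch, path):
        """Deserialize and return the most recent commit."""
        latest = self._commits[(repo, branch, path)][-1]
        return pickle.loads(latest["payload"])


store = ToyRepo()
rows = [{"INCIDENT_ID": "IM0000001", "STATUS": "Closed"}]  # hypothetical row
key = ("irrs_repo", "release_1_0", "/dataset/IRRS_Data")

store.push(*key, rows, "Original Customer Data")
restored = store.pull(*key)
print(restored == rows)  # True
```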

**STEP 4: EXPLORE THE DATA**

Xpresso provides libraries for Data Exploration and Visualization

Data Exploration proceeds in three steps:

1. *Understand* the data - identify the types of attributes in the data

2. Perform *univariate* analysis

3. Perform *multivariate* analysis

In [67]:
from xpresso.ai.core.data.exploration.dataset_explorer import Explorer

# create a data explorer
explorer = Explorer(irrs_dataset)

In [68]:

# Understand the data
explorer.understand(verbose=True)


  3%|▎         | 1/31 [00:00<00:04,  7.31it/s]


Starting Data Understanding:



100%|██████████| 31/31 [00:02<00:00, 11.55it/s]


| Attribute | Datatype |
|---|---|
| INCIDENT_ID | string |
| TITLE | text |
| STATUS | nominal |
| IMPACT | nominal |
| PRIORITY | nominal |
| URGENCY | nominal |
| AREA | nominal |
| SUB_AREA | nominal |
| SITE_CATEGORY | nominal |
| OPENED_DT | date |
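The source does not describe how `understand` assigns these types. As a rough illustration only, a heuristic of the following kind could separate dates, nominal attributes, free text and identifier-like strings. The function name, thresholds and sample titles are assumptions, not Xpresso's actual rules.

```python
from datetime import datetime


def infer_type(values, nominal_max_unique=10, text_min_words=4):
    """Guess an attribute type from its non-empty string values (heuristic)."""
    values = [v for v in values if v]
    if not values:
        return "string"
    # Every value parses as a day-first timestamp -> date.
    try:
        for v in values:
            datetime.strptime(v, "%d-%m-%Y %H:%M")
        return "date"
    except ValueError:
        pass
    # Few distinct values that repeat -> nominal (categorical).
    unique = set(values)
    if len(unique) <= nominal_max_unique and len(unique) < len(values):
        return "nominal"
    # Long multi-word values -> text, otherwise an identifier-like string.
    avg_words = sum(len(v.split()) for v in values) / len(values)
    return "text" if avg_words >= text_min_words else "string"


print(infer_type(["Closed", "Closed", "Open"]))                  # nominal
print(infer_type(["13-02-2017 02:08", "23-02-2017 01:21"]))      # date
print(infer_type(["IM3922544", "IM3937797", "IM3933249"]))       # string
# Hypothetical titles
print(infer_type(["node unreachable at remote site",
                  "application down alert triggered on host"]))  # text
```

When a guess is wrong, the type can be overridden explicitly, which is what the `change_type` call in the next cell does for `RETAIL_STORE`.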


In [69]:
%%time
# Change the type of an attribute
irrs_dataset.change_type(attribute_name="RETAIL_STORE",new_type="nominal")

CPU times: user 35 µs, sys: 1 µs, total: 36 µs
Wall time: 42 µs


In [70]:
%%time
# Perform univariate analysis
explorer.explore_univariate(verbose=True,to_excel=True)

  0%|          | 0/31 [00:00<?, ?it/s]


Starting UniVariate Exploration:



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Unable to find mode for ADDITIONAL_INFORMATION_1.
Unable to find mode for ADDITIONAL_INFORMATION_2.
Unable to find mode for ADDITIONAL_INFORMATION_3.
Unable to find mode for ADDITIONAL_INFORMATION_4.
Unable to find mode for ADDITIONAL_INFORMATION_5.


100%|██████████| 31/31 [00:01<00:00, 27.32it/s]

Unable to find mode for OUTAGE_DURATION_HOURS.
Unable to find mode for OUTAGE_DURATION_MINUTES.

Categorical Analysis:





| Attribute | NA Count | type | Unique | NA % | Mode |
|---|---|---|---|---|---|
| STATUS | 0 | nominal | [Closed, Open] | 0.0 | Closed |
| IMPACT | 0 | nominal | [4-User, 3-Multiple Users, 1-Enterprise] | 0.0 | 4-User |
| PRIORITY | 0 | nominal | [4-Low, 3-Average, 1-Critical] | 0.0 | 4-Low |
| URGENCY | 0 | nominal | [3-Average, 1-Critical, 4-Low, 2-High] | 0.0 | 3-Average |
| AREA | 0 | nominal | [MONITORING, CLICKIT, RETAIL NETWORK] | 0.0 | MONITORING |
| SUB_AREA | 0 | nominal | [FAULT, INCIDENT] | 0.0 | FAULT |
| SITE_CATEGORY | 1883 | nominal | [RETAIL, REMOTE, METRO, CALL CENTER] | 94.15 | RETAIL |
| OPENED_BY | 0 | nominal | [bsm_integration, clickit] | 0.0 | bsm_integration |
| OPEN_GROUP | 1974 | nominal | [NETOPS-DATA, HDWSRV-WNTL, HDWSRV-NSO-UNIX, HD... | 98.7 | NETOPS-DATA |
| OPEN_GROUP_MANAGER | 1974 | nominal | [WILLIS, GARY D., ROSS, DANIEL L., HART, KYLE ... | 98.7 | WILLIS, GARY D. |



Date Analysis:


Unnamed: 0,Max,Min,Missing Dates,NA %,NA Count,type
OPENED_DT,2017-12-02 23:43:00,2017-01-01 04:29:00,"[(2017-01-02, 2016-12-31), (2017-01-02, 2016-1...",0,0,date



String Analysis:


Unnamed: 0,INCIDENT_ID,INCIDENT_ID_count,AFFECTED_CI,AFFECTED_CI_count,CI_LOCATION,CI_LOCATION_count
0,IM3907284,1,PVMX6819,96.0,MOLESB,894.0
1,IM3869377,1,PVMX1235,72.0,VARESB,77.0
2,IM3880022,1,DVMX6741,47.0,TXARLF0101,16.0
3,IM3902301,1,PVMX0490,42.0,TXIVGB,16.0
4,IM3939187,1,PVMX6265,42.0,ILNILD0101,14.0
...,...,...,...,...,...,...
1995,IM3897416,1,,,,
1996,IM3944459,1,,,,
1997,IM3922518,1,,,,
1998,IM3901696,1,,,,



Text Analysis:


Unnamed: 0,TITLE_unigram,TITLE_unigram_count,TITLE_bigram,TITLE_bigram_count,TITLE_trigram,TITLE_trigram_count,DESCRIPTION_unigram,DESCRIPTION_unigram_count,DESCRIPTION_bigram,DESCRIPTION_bigram_count,DESCRIPTION_trigram,DESCRIPTION_trigram_count
0,"(remote,)",336,"(remote, site)",336,"(remote, site, containing)",336,"(remote,)",336,"(remote, site)",336,"(remote, site, containing)",336
1,"(site,)",336,"(site, containing)",336,"(site, containing, node)",336,"(site,)",336,"(site, containing)",336,"(site, containing, node)",336
2,"(containing,)",336,"(containing, node)",336,"(containing, node, rtganorbb01)",1,"(containing,)",336,"(containing, node)",336,"(containing, node, rtganorbb01)",1
3,"(node,)",955,"(node, rtganorbb01)",1,"(node, rtganorbb01, unreachable)",1,"(node,)",2772,"(node, rtganorbb01)",1,"(node, rtganorbb01, unreachable)",1
4,"(rtganorbb01,)",1,"(rtganorbb01, unreachable)",1,"(rtganorbb01, unreachable, g5s_down)",1,"(rtganorbb01,)",1,"(rtganorbb01, unreachable)",1,"(rtganorbb01, unreachable, object)",1
...,...,...,...,...,...,...,...,...,...,...,...,...
922,"(sanjmltbr01,)",2,"(99, node)",1,"((, 502.07, mb)",1,"(330261,)",1,"(island-6030, node)",1,"(', :, '101)",31
923,"(128.16,)",1,"(node, svm)",1,"(502.07, mb, ))",1,"(app-6wn-prod,)",5,"(:, samospebr01.net.sprint)",1,"(:, '101, ')",31
924,"(/tools/jdk/jdk1.8.0_92/bin/java,)",29,"(svm, metadevice)",1,"(., (, 498.57)",1,"(6wn_down,)",5,"(samospebr01.net.sprint, 10.216.15.2)",1,"('101, ', ,)",31
925,"(-dmule.home=/tools/mule/lvo/apigw211_lvo_,)",29,"(metadevice, problem)",1,"((, 498.57, mb)",1,"(6wn_wlnp,)",20,"(10.216.15.2, process)",1,"(', ,, 'units)",40


CPU times: user 2.01 s, sys: 24.4 ms, total: 2.04 s
Wall time: 2.12 s
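The Text Analysis tables above list unigram, bigram and trigram counts as tuples of tokens. A minimal standard-library sketch of that kind of count follows. It tokenizes on whitespace, whereas the log lines above suggest the library uses NLTK tokenization and stopword lists, so the counts would not match exactly. The titles are hypothetical.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count n-grams (as tuples of lowercase tokens) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip over n shifted copies of the token list yields sliding windows.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical incident titles
titles = [
    "Remote site containing node A",
    "Remote site containing node B",
]

unigrams = ngram_counts(titles, 1)
bigrams = ngram_counts(titles, 2)
trigrams = ngram_counts(titles, 3)

print(unigrams[("node",)])                         # 2
print(bigrams[("remote", "site")])                 # 2
print(trigrams[("site", "containing", "node")])    # 2
print(bigrams[("node", "a")])                      # 1
```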


In [72]:
%%time
# Find the top k unique values for a categorical attribute
irrs_dataset.unique(attr_name="CI_APPL_ID", top=3)

| Value | Count |
|---|---|
| LVO | 30 |
| 9SS | 29 |
| E1L | 29 |


CPU times: user 5.73 ms, sys: 42 µs, total: 5.77 ms
Wall time: 5.36 ms
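A top-k frequency lookup like the one above can be approximated in plain Python with `collections.Counter`. This is a sketch of the idea rather than the library's implementation, and the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical values for a categorical attribute
values = ["LVO", "9SS", "LVO", "E1L", "LVO", "9SS"]

# most_common(k) returns the k most frequent values with their counts.
top = Counter(values).most_common(2)
print(top)  # [('LVO', 3), ('9SS', 2)]
```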


In [71]:
%%time
# Perform multivariate analysis
explorer.explore_multivariate(verbose=True,to_excel=True)

  1%|          | 5/625 [00:00<00:13, 46.56it/s]

Starting Multivariate Exploration:

24769


100%|██████████| 625/625 [00:15<00:00, 40.54it/s]

chi_square Analysis






Unnamed: 0,ADDITIONAL_INFORMATION_1,ADDITIONAL_INFORMATION_2,ADDITIONAL_INFORMATION_3,ADDITIONAL_INFORMATION_4,ADDITIONAL_INFORMATION_5,AREA,ASSIGNMENT_GROUP,ASSIGNMENT_GROUP_DIRECTOR,ASSIGNMENT_GROUP_MANAGER,CI_APPL_ID,...,OPEN_GROUP_MANAGER,OUTAGE_DURATION_HOURS,OUTAGE_DURATION_MINUTES,PRIORITY,RETAIL_STORE,SITE_CATEGORY,SOLUTION,STATUS,SUB_AREA,URGENCY
ADDITIONAL_INFORMATION_1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
ADDITIONAL_INFORMATION_2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
ADDITIONAL_INFORMATION_3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
ADDITIONAL_INFORMATION_4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
ADDITIONAL_INFORMATION_5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
AREA,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.47,1.0,0.0,0.0,0.0,0.0,0.02
ASSIGNMENT_GROUP,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
ASSIGNMENT_GROUP_DIRECTOR,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
ASSIGNMENT_GROUP_MANAGER,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
CI_APPL_ID,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0


CPU times: user 15.3 s, sys: 144 ms, total: 15.4 s
Wall time: 15.5 s
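The matrix entries lie between 0 and 1, so they may be p-values from pairwise chi-square tests of independence, although the source does not say so. Under that assumption, the sketch below computes the Pearson chi-square statistic for one pair of categorical attributes from a contingency table of counts. The counts are hypothetical, and the closed-form p-value shown is valid only for one degree of freedom.

```python
import math


def chi_square(observed):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical 2x2 counts for two binary attributes
table = [[10, 20], [20, 10]]
stat, dof = chi_square(table)

# For dof == 1 only, the chi-square survival function has this closed form.
p_value = math.erfc(math.sqrt(stat / 2))

print(round(stat, 3), dof)  # 6.667 1
print(p_value < 0.05)       # True
```

A small p-value suggests the two attributes are associated, while a value near 1.0 gives no evidence against independence.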


Xpresso.ai provides **Visualization** utilities as well

In [73]:
from xpresso.ai.core.data.visualization.visualize import Visualization

visualize = Visualization(irrs_dataset)

In [74]:
%%time
visualize.render_all(report=True,
                     output_path="./irrs_report/")

  0%|          | 0/31 [00:00<?, ?it/s]


Performing Univariate Report Generation



100%|██████████| 31/31 [00:13<00:00,  2.34it/s]
  0%|          | 0/29 [00:00<?, ?it/s]


Performing Multivariate Report Generation



100%|██████████| 29/29 [00:00<00:00, 55.61it/s]

CPU times: user 23.6 s, sys: 248 ms, total: 23.8 s
Wall time: 32.8 s





In [76]:
%%time
visualize.render_univariate(attr_name="CI_BUILDING",plot_type="bar")

CPU times: user 746 ms, sys: 56 ms, total: 802 ms
Wall time: 818 ms


In [36]:
visualize.render_multivariate(report=False,
                              output_path="./diabetic/")

**STEP 5: CREATE, TEST AND DEPLOY MODELS**

Xpresso provides a Command Line Interface (CLI) tool called *xprctl* to enable developers to create, build and deploy projects

*Important Commands*

*Login*
xprctl login -w <workspace> -u <uid>
    
*Create a project*
xprctl create_project -f [project definition JSON file]
    
 - Project definition involves architecting the project in terms of components, viz., jobs, services, databases and pipelines
 - Xpresso automatically creates a Bitbucket repository with a standard folder structure for each component, starter build scripts, a Jenkins pipeline, a shared folder on the NFS drive and a Data Repository for the project

(The developer then clones the repository and adds the code for the model)

*Build a project*
xprctl build_project -f [build instructions JSON file]
    
 - Developer can specify components to be built
 - Jenkins pipelines will be run for selected components, resulting in Docker images in the Docker Repository
    

*Deploy a project*
xprctl deploy_project -f [deploy instructions JSON file]

 - Deploys the selected project components (jobs, services and databases) on Kubernetes within the project namespace
 - Pipelines are deployed on Kubeflow
 - Developer can specify the number of replicas to be run on Kubernetes