# Jira Ticketing Process App - Tutorial

This process app requires a Jira account.

We are going to use the following packages:
- requests to query data with the Jira REST API
- pandas to manipulate data frames (the tables used to build the event log)
- json to manipulate query configurations
- datetime to manipulate and transform dates

In [None]:
import requests
from requests.auth import HTTPBasicAuth
import json
import pandas as pd
import datetime as dt
from datetime import datetime


## Simple Process App Python Code
Upon start, the empty Python code looks like this:
- The execute() function is the function called by IBM Process Mining when the process app is executed.
- The main() part (the `if __name__ == "__main__":` block) is not used by IBM Process Mining, but it is very useful to develop and debug the process app as a standalone Python program, or to run the extractor on its own from any machine to generate an event log as a CSV file.

IBM Process Mining calls the execute() function with context, a JSON dictionary that contains:
- config: the user inputs of the process app
- fileUploadName: the name of the ZIP file that the user can upload when the data source is a series of files compressed in this ZIP file.

In this short example, we retrieve the user inputs of the process app and create a list of 4 events, which is used to build the data frame returned by execute().
The user inputs are defined in the process app builder, and can be either mandatory or optional.
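The user inputs arrive in context['config']. As a minimal sketch, assuming that an optional input left blank may arrive either absent or as an empty string, a hypothetical helper can check the mandatory inputs and drop the empty optional ones before they reach a query. The input names (url, user, token, from_date, to_date) are assumptions borrowed from the keys that the functions later in this tutorial read from my_config.json.

```python
def read_user_inputs(context):
    """Hypothetical helper: validate mandatory inputs and keep only non-empty optional ones."""
    config = context['config']
    mandatory = ['url', 'user', 'token']   # assumed input names
    optional = ['from_date', 'to_date']    # assumed input names

    missing = [name for name in mandatory if not config.get(name)]
    if missing:
        raise ValueError('Missing mandatory input(s): %s' % ', '.join(missing))

    inputs = {name: config[name] for name in mandatory}
    for name in optional:
        # An absent optional input and an empty string are treated the same way (assumption)
        if config.get(name):
            inputs[name] = config[name]
    return inputs


context = {'config': {'url': 'https://example.atlassian.net/rest/api/2/',
                      'user': 'me@example.com', 'token': 'xxx', 'from_date': ''}}
print(read_user_inputs(context))
```

With this check, an empty from_date never ends up as a dangling `created >= ` clause in a Jira query.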

The main() part creates the context that IBM Process Mining normally passes, runs the execute() function, and generates a CSV file from the data frame returned by execute().


In [None]:
import pandas as pd

def execute(context):
    # User inputs defined in the process app builder
    my_config = context['config']
    url = my_config['url']          # not used in this toy example
    account = my_config['account']  # not used in this toy example

    # Each event needs at least a case id, an activity and a timestamp
    aFewEvents = [
        {'processid':'p1', 'activity':'analyze request', 'startdate':'2023-01-01'},
        {'processid':'p1', 'activity':'approve request', 'startdate':'2023-01-02'},
        {'processid':'p2', 'activity':'approve request', 'startdate':'2023-01-02'},
        {'processid':'p2', 'activity':'reject request', 'startdate':'2023-01-04'},
    ]
    df = pd.DataFrame(aFewEvents)
    return df

if __name__ == "__main__":
    # Simulate the context that IBM Process Mining passes to execute()
    context = {'config': {'url':'https://aURL.com', 'account':'myaccount'}}
    df = execute(context)
    # Write the event log without the data frame index column
    df.to_csv('eventlog.csv', index=False)

## Jira Issue Table and Change History Table
When creating a process app, we need to understand the location and the schema of the original data that we want to transform.

Jira tickets are called issues.

Jira issues are stored in a Jira table that can be accessed through the Jira REST API.
Each issue contains many common fields, such as the creation date, the creator and the project. It also includes a long series of custom fields that each company can use as desired.
Since this connector is generic, we ignore the custom fields, but they could easily be added.

For more information about the issue table: https://developer.atlassian.com/server/jira/platform/database-issue-fields/
In this connector, we are not going to directly query the table. We will get the data via REST APIs.

The issue table is not enough to recreate the issue lifecycle. We also need the change history (changelog) table, which keeps track of all the changes: any issue field can be changed, and both the previous and the new values are kept in the change history table.
For process mining, we consider that the interesting changes are the issue status changes and the assignee changes. We could track other changes, such as priority changes.
For more information about the change history table: https://developer.atlassian.com/server/jira/platform/database-change-history/

## Jira REST APIs, Accounts and API Tokens
For more documentation about the Jira REST APIs: https://developer.atlassian.com/server/jira/platform/rest-apis/

Depending on the Jira version you are connected to, you can use v2 or v3 of the REST API. The APIs used in this connector are identical in both versions: https://developer.atlassian.com/cloud/jira/platform/rest/v3/intro/#about

You need a Jira account, and an API token to call the Jira REST API. Follow the instructions from this documentation:
https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/

To execute this tutorial, copy [./my_config_template.json](./my_config_template.json) as ./my_config.json, and replace <YOUR JIRA>, <YOUR ACCOUNT>, and <YOUR TOKEN> with your values.
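The functions below read their settings from this file and build endpoint URLs by appending a name such as 'project' or 'search' to config['url'], so the URL must be the REST API base and end with '/'. A minimal loading sketch follows; load_config and REQUIRED_KEYS are hypothetical, the example base URL is an assumption, and we assume the template also defines maxResults and startAt because ws_get_tickets reads them.

```python
import json

# Keys read by ws_get_projects / ws_count_tickets / ws_get_tickets (assumed to be in the template)
REQUIRED_KEYS = ['url', 'user', 'token', 'maxResults', 'startAt']


def load_config(path='./my_config.json'):
    """Hypothetical helper: load my_config.json and check the keys the connector needs."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError('my_config.json is missing: %s' % ', '.join(missing))
    # Endpoint names are appended directly to the URL, e.g.
    # 'https://<your-site>.atlassian.net/rest/api/2/' + 'project' (assumed base URL shape)
    if not config['url'].endswith('/'):
        config['url'] += '/'
    return config
```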


## Getting Issues with Jira REST APIs
### Jira Projects and Issues
An issue belongs to a project. We need to specify the projects from which we want to retrieve the issues, or we can retrieve the issues from all the projects.

When we mine issues from all the projects, we can discard the projects that have too few issues, such as testing or experimental projects.
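Once we know how many issues each project has (the total field returned by the search API, described below), discarding small projects is a simple filter. In this sketch, select_projects is hypothetical and the default threshold of 50 issues is an arbitrary choice to tune for your Jira instance.

```python
def select_projects(issue_counts, min_issues=50):
    """Keep the keys of the projects that have at least min_issues issues.

    issue_counts maps a project key to its number of issues.
    min_issues=50 is an arbitrary, hypothetical threshold.
    """
    return [key for key, total in issue_counts.items() if total >= min_issues]


counts = {'SUP': 1240, 'TEST': 3, 'DEV': 87}
print(select_projects(counts))  # → ['SUP', 'DEV']
```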

The code below requests the list of Jira projects.

In [None]:
import requests
from requests.auth import HTTPBasicAuth
import json
import pandas as pd
import datetime as dt
from datetime import datetime

def ws_get_projects(config):
    """Return the list of Jira projects visible to the account, or [] on error."""
    auth = HTTPBasicAuth(config['user'], config['token'])

    headers = {
        "Accept": "application/json"
    }
    # config['url'] is the REST API base URL and must end with '/'
    url = config['url'] + 'project'
    response = requests.request(
        "GET",
        url,
        headers=headers,
        auth=auth)
    if response.status_code == 200:
        return response.json()
    else:
        print("error get project %s " % response.status_code)
        # Return an empty list, like a successful call that finds no project
        return []

# Test the function
if __name__ == "__main__":

    # load the 'my_config.json' that includes your configuration
    try:
        with open('./my_config.json') as f:
            my_config = json.load(f)
    except Exception as e:
        print("*** WARNING: %s. Create ./my_config.json file from ./my_config_template.json" % e)
        raise

    projects = ws_get_projects(my_config)
    print(" %s projects found" % len(projects))
    for project in projects:
        print(project['key'])

### Count the number of issues for each project
For each project, we would like to count the number of issues. As mentioned above, we want to discard the projects that have too few issues, such as testing or experimental projects.

We also want the process app to be able to scope the issues to an optional time period. The time period is set with two user inputs: from_date and to_date.
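The two optional dates become extra JQL clauses on the created field. The sketch below is a hypothetical variant of the query building done in ws_count_tickets and ws_get_tickets: it quotes the values and skips empty inputs. Keep in mind that in JQL a date without a time is read as the start of that day, so `created <= "2023-12-31"` may leave out issues created later on December 31.

```python
def build_jql(project_key, from_date=None, to_date=None):
    """Hypothetical helper: JQL query for one project, optionally scoped to a creation period."""
    clauses = ['project = "%s"' % project_key]
    # Empty strings are treated like missing inputs
    if from_date:
        clauses.append('created >= "%s"' % from_date)
    if to_date:
        clauses.append('created <= "%s"' % to_date)
    return ' AND '.join(clauses)


print(build_jql('SUP', from_date='2023-01-01', to_date=''))
# → project = "SUP" AND created >= "2023-01-01"
```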

ws_count_tickets uses a trick: the search REST API can return all the issues of a project, but at this stage we don't need to receive all the issues and their details, because we may discard some projects.

Jira supports paging: we can specify the maximum number of issues to retrieve in one call, and the API also returns the total number of issues that match the query. This way, we can retrieve, for example, 100 issues, call the same API again starting at issue 100, and so on until we have retrieved all the issues.
- maxResults: maximum number of issues retrieved in one call
- startAt: index of the first issue to retrieve (the first issue has index 0)
- total: returned field that contains the total number of issues matching the query
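The paging loop can be sketched independently of the HTTP call. Here fetch_all_issues is hypothetical and fetch_page is any callable that takes (startAt, maxResults) and returns the parsed search result (a dict with issues and total), or None on error. The loop advances startAt by the number of issues actually received, in case the server returns fewer than maxResults, and stops on an error or an empty page so that it cannot loop forever.

```python
def fetch_all_issues(fetch_page, page_size=100):
    """Call fetch_page(start_at, max_results) until all the issues have been retrieved."""
    all_issues = []
    start_at = 0
    while True:
        page = fetch_page(start_at, page_size)
        if not page or not page['issues']:
            break  # error or empty page: stop instead of looping forever
        all_issues.extend(page['issues'])
        start_at += len(page['issues'])
        if start_at >= page['total']:
            break
    return all_issues


# Usage with a fake data source of 250 issues (3 pages of at most 100)
fake_issues = [{'key': 'DEMO-%d' % i} for i in range(1, 251)]

def fake_fetch(start_at, max_results):
    return {'issues': fake_issues[start_at:start_at + max_results], 'total': len(fake_issues)}

print(len(fetch_all_issues(fake_fetch)))  # → 250
```

With the connector functions, fetch_page could for example be `lambda s, m: ws_get_tickets({**my_config, 'startAt': s, 'maxResults': m})`.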

In ws_count_tickets, we only request 1 issue per project (maxResults=1) and read the total field.

In [None]:
# Utility function to count the number of issues per project
def ws_count_tickets(config):
    """Return the search result for one project (a single issue), or None on error.

    The 'total' field of the result is the number of issues matching the query.
    """
    auth = HTTPBasicAuth(config['user'], config['token'])

    # JQL query scoped to the project and, optionally, to a creation period
    query = 'project = ' + config['project_key']
    if config.get('from_date'):
        query = query + ' AND created >= ' + config['from_date']
    if config.get('to_date'):
        query = query + ' AND created <= ' + config['to_date']

    params = {
        'jql': query,
        'fields': 'created',   # we only need 'total', so request one small field
        'maxResults': 1,
        'startAt': 0
    }

    headers = {
        "Accept": "application/json"
    }
    url = config['url'] + 'search'
    response = requests.request(
        "GET",
        url,
        headers=headers,
        params=params,
        auth=auth)
    if response.status_code == 200:
        return response.json()
    else:
        print("error count ticket %s " % response.status_code)
        return None

# Test the function
if __name__ == "__main__":

    # load the 'my_config.json' that includes your configuration
    try:
        with open('./my_config.json') as f:
            my_config = json.load(f)
    except Exception as e:
        print("*** WARNING: %s. Create ./my_config.json file from ./my_config_template.json" % e)
        raise

    projects = ws_get_projects(my_config)
    print(" %s projects found" % len(projects))
    for project in projects:
        my_config['project_key'] = project['key']
        result = ws_count_tickets(my_config)
        if result is not None:
            print('project: %s : %s issues' % (project['key'], result['total']))

### Get the issues
We need a function that returns the ticket details we need, and the history of each ticket.

The search REST API can query the issues of a project by using JQL (Jira Query Language).

Note that we do not need another query for the historical changes: with the `expand=changelog` parameter, this API returns the changelog of each issue in the same call.

We don't need all the fields available for an issue, as there are many custom fields and fields that are not relevant for process mining. The standard_fields variable lists the fields that we want to retrieve; feel free to add or remove some.

Finally, we need to use the paging capability. Our configuration file limits the number of issues retrieved per call to 100, so we loop until we have retrieved all the issues of the project.

In [None]:
# Standard fields that can be used in all Jira environments
standard_fields = 'created,creator,project,resolution,resolutiondate,updated,duedate,timespent,timeestimate,timeoriginalestimate,status,issuetype,reporter,priority,assignee'


def ws_get_tickets(config):
    """Return one page of issues (with their changelog) for config['project_key'], or None on error."""
    auth = HTTPBasicAuth(config['user'], config['token'])

    query = 'project = ' + config['project_key']
    if config.get('from_date'):
        query = query + ' AND created >= ' + config['from_date']
    if config.get('to_date'):
        query = query + ' AND created <= ' + config['to_date']
    params = {
        'jql': query,
        'fields': standard_fields,
        'expand': 'changelog',                 # return the change history in the same call
        'maxResults': config['maxResults'],    # page size
        'startAt': config['startAt']           # index of the first issue of the page
    }

    headers = {
        "Accept": "application/json"
    }
    url = config['url'] + 'search'
    response = requests.request(
        "GET",
        url,
        headers=headers,
        params=params,
        auth=auth)
    if response.status_code == 200:
        return response.json()
    else:
        print("error get ticket %s " % response.status_code)
        return None

# Test the function
if __name__ == "__main__":

    # load the 'my_config.json' that includes your configuration
    try:
        with open('./my_config.json') as f:
            my_config = json.load(f)
    except Exception as e:
        print("*** WARNING: %s. Create ./my_config.json file from ./my_config_template.json" % e)
        raise

    projects = ws_get_projects(my_config)
    print(" %s projects found" % len(projects))
    # Test the function with the first project
    # For this test, we limit the number of issues to 3 and we don't loop to get all of them
    project = projects[0]
    my_config['maxResults'] = 3
    my_config['startAt'] = 0
    my_config['project_key'] = project['key']
    result = ws_get_tickets(my_config)
    print('project: %s : %s issues' % (project['key'], result['total']))
    print(json.dumps(result['issues'], indent=2))

## Transforming Jira data into a Process Mining event log
We have seen the main Jira REST API calls that we are going to use in this connector.

Let's now focus on what we are doing with the data returned from Jira.

### Selecting Process Mining relevant data for each ticket
We need a function that extracts the data we want from an issue, and that returns a JSON object for each ticket.

The list of ticket JSON objects is used to create a pandas data frame (a table), which we then use to create our event log.

Each issue is identified with a key that we keep as a ticket_id.

The issue fields contain the data we need. Fields can be plain values (e.g. issue['fields']['created'] is the creation date) or JSON objects (e.g. issue['fields']['issuetype'], from which we only need the name issue['fields']['issuetype']['name']).
Some fields might be empty (null), so we need to check them before accessing their keys.
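The difference between plain-value fields and dict fields, and the null check, can be seen on a trimmed-down issue. The issue below is hypothetical sample data, and dict_field is a hypothetical helper, not part of the Jira API.

```python
# Hypothetical, trimmed-down issue as returned by the search API
issue = {
    'key': 'DEMO-1',
    'fields': {
        'created': '2023-01-02T09:00:00.000+0000',   # plain value
        'issuetype': {'name': 'Bug'},                 # dict: we only need 'name'
        'assignee': None,                             # unassigned issue: the field is null
    },
}


def dict_field(fields, name, key):
    """Return fields[name][key], or None when the field is absent or null."""
    value = fields.get(name)
    return value.get(key) if value else None


fields = issue['fields']
print(fields['created'])                              # → 2023-01-02T09:00:00.000+0000
print(dict_field(fields, 'issuetype', 'name'))        # → Bug
print(dict_field(fields, 'assignee', 'displayName'))  # → None
```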


In [None]:
def create_ticket(issue):
    """Flatten the process-mining-relevant data of a Jira issue into a dict."""
    issue_fields = issue['fields']

    def sub_field(name, key):
        # Dict fields can be missing or null: pick a value in the dict only if it exists
        value = issue_fields.get(name)
        return value[key] if value else None

    ticket = {}
    ticket['ticket_id'] = issue['key']
    ticket['created'] = issue_fields['created']
    # We could anonymize the names (same for the changelog)
    # use creator.displayName for the full name, or creator.key (which often contains the full name too)
    ticket['creator'] = sub_field('creator', 'displayName')
    ticket['resolutiondate'] = issue_fields.get('resolutiondate')
    ticket['duedate'] = issue_fields.get('duedate')
    ticket['timespent'] = issue_fields.get('timespent')
    ticket['timeestimate'] = issue_fields.get('timeestimate')
    ticket['timeoriginalestimate'] = issue_fields.get('timeoriginalestimate')
    ticket['project_key'] = issue_fields['project']['key']

    # Fields that are dicts. Pick a value in the dict
    ticket['type'] = sub_field('issuetype', 'name')
    ticket['resolution'] = sub_field('resolution', 'name')
    ticket['reporter'] = sub_field('reporter', 'displayName')
    ticket['priority'] = sub_field('priority', 'name')
    ticket['assignee'] = sub_field('assignee', 'displayName')

    return ticket

# Test the function
if __name__ == "__main__":

    # load the 'my_config.json' that includes your configuration
    try:
        with open('./my_config.json') as f:
            my_config = json.load(f)
    except Exception as e:
        print("*** WARNING: %s. Create ./my_config.json file from ./my_config_template.json" % e)
        raise

    projects = ws_get_projects(my_config)
    print(" %s projects found" % len(projects))

    # Test the function with the third project
    # For this test, we limit the number of issues to 10 and we don't loop to get all of them
    # Change the project index until you find a project with issues that contain a changelog
    project = projects[2]
    my_config['maxResults'] = 10
    my_config['startAt'] = 0
    my_config['project_key'] = project['key']
    result = ws_get_tickets(my_config)

    print('project: %s : %s issues' % (project['key'], result['total']))
    issues = result['issues']
    # Store each ticket in all_tickets
    all_tickets = []
    for issue in issues:
        ticket = create_ticket(issue)
        all_tickets.append(ticket)

    all_tickets_df = pd.DataFrame(all_tickets)
    print(all_tickets_df)

### Processing the change log 1/2
Each issue returned by the REST API can contain a changelog.

The changelog can be a complex structure that contains several changes made at several dates.

We want to create a list of changelogs, each one tagged with its ticket_id.

WARNING: some projects do not store the changelog. Try several projects until one returns non-empty changelogs.

Each changelog contains a histories field with one or several histories, and each history lists one or several changed items. The changelog embedded in the search result is itself paged (see its maxResults and total fields), so it may be truncated for issues with a long history.
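To detect a truncated changelog, we can compare the number of histories received with the changelog's own total. The sketch below assumes the embedded changelog carries the startAt, maxResults, total and histories keys; changelog_is_complete and the sample data are hypothetical. For an incomplete changelog, the remaining histories would have to be fetched separately, for example with the per-issue {ticketid}/changelog API mentioned in the code comments of the next step, on Jira versions that expose it.

```python
def changelog_is_complete(changelog):
    """Return True when the embedded changelog contains all the histories of the issue."""
    return len(changelog.get('histories', [])) >= changelog.get('total', 0)


# Hypothetical embedded changelog of one issue: 3 histories exist, only 2 were returned
changelog = {
    'startAt': 0, 'maxResults': 2, 'total': 3,
    'histories': [
        {'created': '2023-01-02T09:00:00.000+0000',
         'items': [{'field': 'status', 'fromString': 'To Do', 'toString': 'In Progress'}]},
        {'created': '2023-01-03T10:00:00.000+0000',
         'items': [{'field': 'assignee', 'fromString': None, 'toString': 'Bob'}]},
    ],
}
print(changelog_is_complete(changelog))  # → False
```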


In [None]:
if __name__ == "__main__":

    # load the 'my_config.json' that includes your configuration
    try:
        with open('./my_config.json') as f:
            my_config = json.load(f)
    except Exception as e:
        print("*** WARNING: %s. Create ./my_config.json file from ./my_config_template.json" % e)
        raise

    projects = ws_get_projects(my_config)
    print(" %s projects found" % len(projects))

    # Test with the third project
    # For this test, we limit the number of issues to 10 and we don't loop to get all of them
    # Change the project index until you find a project with issues that contain a changelog
    project = projects[2]
    my_config['maxResults'] = 10
    my_config['startAt'] = 0
    my_config['project_key'] = project['key']
    result = ws_get_tickets(my_config)

    print('project: %s : %s issues' % (project['key'], result['total']))
    issues = result['issues']

    # Store all the changelogs in changelog_list, tagged with their ticket_id
    changelog_list = []
    for issue in issues:
        # Some issues come without a changelog: use an empty one
        changelog = issue.get('changelog', {'histories': []})
        changelog['ticket_id'] = issue['key']
        # We could add other case-level data
        changelog_list.append(changelog)

    print(json.dumps(changelog_list, indent=2))

### Processing the change log 2/2
We process the changelog_list to extract each individual change.
Then we create a change JSON object that we store in a list, which is used to create a pandas data frame (table).
We don't need to keep all the changes: we only create a change object when the status or the assignee is changed.
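The nested loops below can also be expressed with `pandas.json_normalize`, which flattens one row per changed item and copies the history-level data onto each row. This is a swapped-in alternative, not the connector's approach; the sample changelog_list is hypothetical, and we assume the author exposes a displayName (errors='ignore' turns a missing author into NaN instead of raising).

```python
import pandas as pd

# Hypothetical changelog_list entry, shaped like the ones built in the previous step
changelog_list = [
    {
        'ticket_id': 'DEMO-1',
        'histories': [
            {
                'author': {'displayName': 'Alice'},
                'created': '2023-01-02T09:00:00.000+0000',
                'items': [
                    {'field': 'status', 'fromString': 'To Do', 'toString': 'In Progress'},
                    {'field': 'priority', 'fromString': 'Low', 'toString': 'High'},
                ],
            },
        ],
    },
]

changes_df = pd.json_normalize(
    changelog_list,
    record_path=['histories', 'items'],   # one row per changed item
    meta=['ticket_id', ['histories', 'created'], ['histories', 'author', 'displayName']],
    errors='ignore',                       # missing meta keys become NaN
)
changes_df = changes_df.rename(columns={'histories.created': 'created',
                                        'histories.author.displayName': 'author'})
# Keep only the status and assignee changes
changes_df = changes_df[changes_df['field'].isin(['status', 'assignee'])]
print(changes_df[['ticket_id', 'author', 'created', 'field', 'toString']])
```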

In [None]:
def create_ticket_changelog(ticket_id, author, datestr, item):
    """Return a dict describing one field change, or None if the field is not tracked."""
    # Only keep changes of status and assignee (you can change this)
    field_changes_to_keep = ['status', 'assignee']
    if item['field'] not in field_changes_to_keep:
        return None

    ticket_change = {}
    ticket_change['ticket_id'] = ticket_id
    ticket_change['author'] = author
    ticket_change['created'] = datestr
    ticket_change['field'] = item['field']
    ticket_change['from'] = item['from']
    ticket_change['fromString'] = item['fromString']
    ticket_change['to'] = item['to']
    ticket_change['toString'] = item['toString']
    return ticket_change

ticket_changes_list = []
for changelog in changelog_list:
    # there could be several histories in each change log (histories==values when calling the {ticketid}/changelog API)
    for history in changelog['histories']:
        # The author may be missing; use its display name, like the ticket 'creator',
        # so that the 'user' column of the event log is consistent
        author = history.get('author') or {}
        # there could be several item changes in each history
        for item in history['items']:
            ticket_changelog = create_ticket_changelog(changelog['ticket_id'], author.get('displayName'), history['created'], item)
            if ticket_changelog:
                ticket_changes_list.append(ticket_changelog)

print("Total number of changelogs: %s" % len(ticket_changes_list))
# Set the columns explicitly so that the filters below still work when no change is found
all_ticket_changes_df = pd.DataFrame(ticket_changes_list, columns=['ticket_id', 'author', 'created', 'field',
                                                                   'from', 'fromString', 'to', 'toString'])
all_ticket_changes_df

## Using Pandas to create an event log from the ticket and the change log data frames
### Ticket Creation events
The Ticket Created events are created from the tickets data frame.

In [None]:
# Ticket creation events
ticket_created_df = all_tickets_df.copy()
# Add a field 'activity'
ticket_created_df['activity'] = 'Ticket Created'
# Replace the fieldname creator by user
ticket_created_df.rename(columns={'creator':'user'}, inplace=True)
# Add a field 'start_date' from the field 'created' (we could have renamed it)
ticket_created_df['start_date'] = ticket_created_df['created']
# Re-order the fields
ticket_created_df = ticket_created_df[['ticket_id','activity','start_date','user', 'project_key','type', 'priority', 'resolutiondate', 'duedate',
    'timespent', 'timeestimate', 'timeoriginalestimate', 'resolution', 'reporter', 'assignee']]
print("Total number of tickets created: %s" % len(ticket_created_df))
ticket_created_df

### Status changes events
For process mining we are interested in tracking all the ticket status changes. The changelog provides this information.
Each time the field 'status' is changed, we create an activity named 'Ticket ' followed by the name of the new status (for example, 'Ticket In Progress'). We also store the date and the author of the change.

In [None]:
# Create an event log from the ticket status changes
status_changes_df = all_ticket_changes_df[all_ticket_changes_df['field']=='status'].copy()
status_changes_df['activity'] = 'Ticket ' + status_changes_df['toString']
status_changes_df.rename(columns={'author':'user'}, inplace=True)
status_changes_df['start_date'] = status_changes_df['created']
status_changes_df = status_changes_df[['ticket_id', 'activity', 'start_date','user']]
print("Total number of status changes in changelog: %s" % len(status_changes_df))
status_changes_df

### Assignee changes events
Changing the ticket assignee is potentially an important event for process mining, so we create a table of these changes.

In [None]:
# Create event log from ticket assignee changes
assignee_changes_df = all_ticket_changes_df[all_ticket_changes_df['field']=='assignee'].copy()
assignee_changes_df['activity'] = 'Ticket Assigned'
assignee_changes_df['start_date'] = assignee_changes_df['created']
assignee_changes_df.rename(columns={'author':'user', 'toString':'assignee'}, inplace=True)
assignee_changes_df = assignee_changes_df[['ticket_id', 'activity', 'start_date','user','assignee']]
print("Total number of assignee changes in changelog: %s" % len( assignee_changes_df))
assignee_changes_df

### Final Process Mining Event Log
For this connector, we only need to concatenate the 3 dataframes to create our final event log.
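A small example shows what the concatenation produces: columns missing from a table (here assignee) are filled with NaN, which is why the code replaces NaN with blanks afterwards. The miniature tables are hypothetical, and the sort is an optional addition for readability; we assume IBM Process Mining orders events by timestamp itself. Sorting on start_date strings is only chronological when all dates share one format and one UTC offset.

```python
import pandas as pd

# Hypothetical miniature versions of the three event tables
created = pd.DataFrame({'ticket_id': ['DEMO-2', 'DEMO-1'], 'activity': 'Ticket Created',
                        'start_date': ['2023-01-03T08:00:00.000+00:00', '2023-01-01T08:00:00.000+00:00']})
assigned = pd.DataFrame({'ticket_id': ['DEMO-1'], 'activity': ['Ticket Assigned'],
                         'start_date': ['2023-01-01T09:00:00.000+00:00'], 'assignee': ['Bob']})
status = pd.DataFrame({'ticket_id': ['DEMO-1'], 'activity': ['Ticket In Progress'],
                       'start_date': ['2023-01-02T10:00:00.000+00:00']})

# ignore_index=True gives each event its own row index instead of repeated 0, 1, ...
eventlog = pd.concat([created, assigned, status], ignore_index=True)
# Optional: group the events of each ticket and order them by date
eventlog = eventlog.sort_values(['ticket_id', 'start_date']).reset_index(drop=True)
print(eventlog)
```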

To simplify the mapping in IBM Process Mining, we reformat the Jira dates into ISO 8601 dates so that they are easily detected during the mapping.
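Jira timestamps carry their own UTC offset (for example +0100), and XformToIso keeps each date's offset as-is. If the event log mixes offsets, a variant that converts every date to UTC keeps the strings directly comparable. jira_date_to_utc_iso is a hypothetical variant, not the connector's function; it also treats None like an empty string.

```python
from datetime import datetime, timezone


def jira_date_to_utc_iso(d):
    """Convert a Jira timestamp such as '2023-01-02T09:15:30.123+0100' to ISO 8601 in UTC."""
    if not d:
        return ''
    parsed = datetime.strptime(d, '%Y-%m-%dT%H:%M:%S.%f%z')
    return parsed.astimezone(timezone.utc).isoformat(sep='T', timespec='milliseconds')


print(jira_date_to_utc_iso('2023-01-02T09:15:30.123+0100'))  # → 2023-01-02T08:15:30.123+00:00
```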

In [None]:
def XformToIso(d):
    """Convert a Jira timestamp (e.g. 2023-01-02T09:15:30.123+0100) to ISO 8601; keep '' as is."""
    if d == '':
        return ''
    aDate = datetime.strptime(d, '%Y-%m-%dT%H:%M:%S.%f%z')
    return aDate.isoformat(sep='T', timespec='milliseconds')

# CREATE THE FINAL EVENT LOG
# ignore_index=True gives each event a unique row index
eventlog_df = pd.concat([ticket_created_df, assignee_changes_df, status_changes_df], ignore_index=True)
# Replace NaN and None by blank ''
eventlog_df.fillna('', inplace=True)
# Change the date format such that it is automatically understood in Process Mining
eventlog_df['start_date'] = eventlog_df['start_date'].apply(XformToIso)
eventlog_df['resolutiondate'] = eventlog_df['resolutiondate'].apply(XformToIso)
# Force the type to strings (astype returns a new data frame, so we assign it)
eventlog_df = eventlog_df.astype(str)
eventlog_df

## Conclusion

In this tutorial, we have covered the main steps required to get data from a Jira server, select the data we want to keep for process mining, transform this data into a format compatible with IBM Process Mining, and finally generate the event log as a pandas data frame.

The complete python program that you upload in the Process App is located [here](./JiraConnector.py).