# Map priority levels to a common class.
This notebook maps the different priority levels used across projects to a common set of classes.
Projects with 5 different priority levels are selected.

In [1]:
import pandas as pd
# Import dataset from csv file
df = pd.read_csv('test_sets_projects.csv')

In [2]:
# rename columns using rename function
df.rename(columns={'fields.priority.name': 'priority', 'fields.description': 'description', 'fields.project.name': 'project', 'fields.issuetype.name': 'issuetype', 'fields.labels': 'labels'}, inplace=True)
df

Unnamed: 0,priority,description,project,labels,issuetype,collection
0,Low,some errors show up as shown in the screenshot...,Sourcetree for Windows,[],Bug,Jira
1,Low,I have been using Sourcetree 3.4.4. We use cu...,Sourcetree for Windows,[],Bug,Jira
2,Low,After installing SourceTree for Windows 10 64b...,Sourcetree for Windows,[],Bug,Jira
3,Low,"On windows, Sourcetree.exe will start ""git.exe...",Sourcetree for Windows,[],Bug,Jira
4,Low,"Hello,\r\n\r\nSourceTree 3.4.7.\r\n\r\nOS: Win...",Sourcetree for Windows,[],Bug,Jira
...,...,...,...,...,...,...
386200,1 - Blocker,I am attempting to to follow the guide found h...,Artifactory Binary Repository,[],Bug,JFrog
386201,4 - Normal,"In binarystore.xml, maxCacheSize is in bytes b...",Artifactory Binary Repository,[],Bug,JFrog
386202,4 - Normal,{color:#000000}We are using an artifact(folder...,Artifactory Binary Repository,[],New Feature,JFrog
386203,4 - Normal,Remote repositories created with the repo name...,Artifactory Binary Repository,[],Bug,JFrog


In [3]:
# drop priority rows with NaN
df = df.dropna(subset=['priority'])
# reset index
df = df.reset_index(drop=True)
df

Unnamed: 0,priority,description,project,labels,issuetype,collection
0,Low,some errors show up as shown in the screenshot...,Sourcetree for Windows,[],Bug,Jira
1,Low,I have been using Sourcetree 3.4.4. We use cu...,Sourcetree for Windows,[],Bug,Jira
2,Low,After installing SourceTree for Windows 10 64b...,Sourcetree for Windows,[],Bug,Jira
3,Low,"On windows, Sourcetree.exe will start ""git.exe...",Sourcetree for Windows,[],Bug,Jira
4,Low,"Hello,\r\n\r\nSourceTree 3.4.7.\r\n\r\nOS: Win...",Sourcetree for Windows,[],Bug,Jira
...,...,...,...,...,...,...
386200,1 - Blocker,I am attempting to to follow the guide found h...,Artifactory Binary Repository,[],Bug,JFrog
386201,4 - Normal,"In binarystore.xml, maxCacheSize is in bytes b...",Artifactory Binary Repository,[],Bug,JFrog
386202,4 - Normal,{color:#000000}We are using an artifact(folder...,Artifactory Binary Repository,[],New Feature,JFrog
386203,4 - Normal,Remote repositories created with the repo name...,Artifactory Binary Repository,[],Bug,JFrog


In [4]:
# count priority
df['priority'].value_counts().to_frame()[:50]


Unnamed: 0_level_0,count
priority,Unnamed: 1_level_1
Major - P3,109573
Low,76547
P2: Important,46926
Medium,43672
P3: Somewhat important,28075
P1: Critical,20750
Minor - P4,12964
4 - Normal,11231
High,10283
P4: Low,7375


In [5]:
# Remove issues with no priority level set.
unset_labels = ['Unset', 'TBD', 'Undefined', 'Unprioritized', 'Not Evaluated', 'Unknown']
df = df[~df['priority'].isin(unset_labels)]

# reset index
df = df.reset_index(drop=True)
df

Unnamed: 0,priority,description,project,labels,issuetype,collection
0,Low,some errors show up as shown in the screenshot...,Sourcetree for Windows,[],Bug,Jira
1,Low,I have been using Sourcetree 3.4.4. We use cu...,Sourcetree for Windows,[],Bug,Jira
2,Low,After installing SourceTree for Windows 10 64b...,Sourcetree for Windows,[],Bug,Jira
3,Low,"On windows, Sourcetree.exe will start ""git.exe...",Sourcetree for Windows,[],Bug,Jira
4,Low,"Hello,\r\n\r\nSourceTree 3.4.7.\r\n\r\nOS: Win...",Sourcetree for Windows,[],Bug,Jira
...,...,...,...,...,...,...
386200,1 - Blocker,I am attempting to to follow the guide found h...,Artifactory Binary Repository,[],Bug,JFrog
386201,4 - Normal,"In binarystore.xml, maxCacheSize is in bytes b...",Artifactory Binary Repository,[],Bug,JFrog
386202,4 - Normal,{color:#000000}We are using an artifact(folder...,Artifactory Binary Repository,[],New Feature,JFrog
386203,4 - Normal,Remote repositories created with the repo name...,Artifactory Binary Repository,[],Bug,JFrog


In [6]:

df['priority'].value_counts().to_frame()[:50]

Unnamed: 0_level_0,count
priority,Unnamed: 1_level_1
Major - P3,109573
Low,76547
P2: Important,46926
Medium,43672
P3: Somewhat important,28075
P1: Critical,20750
Minor - P4,12964
4 - Normal,11231
High,10283
P4: Low,7375


## Create a dataset from projects with 5 different priority levels.
The class names follow the definitions in the Jira docs:
https://support.atlassian.com/jira-service-management-cloud/docs/what-are-priority-levels-in-jira-service-management/

Example:
* 4: Trivial = Lowest
* 3: Minor = Low
* 2: Major  = Medium
* 1: Critical = High
* 0: Blocker = Highest
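
The example above can be written as a small lookup table. This is only a sketch on a toy series: the name `EXAMPLE_LEVELS` is hypothetical and is not used elsewhere in this notebook.

```python
import pandas as pd

# The five-level example as a lookup: raw label -> (rank, common class).
# EXAMPLE_LEVELS is a hypothetical name used only for this illustration.
EXAMPLE_LEVELS = {
    'Blocker': (0, 'Highest'),
    'Critical': (1, 'High'),
    'Major': (2, 'Medium'),
    'Minor': (3, 'Low'),
    'Trivial': (4, 'Lowest'),
}

# Toy data standing in for the real 'priority' column.
raw = pd.Series(['Minor', 'Blocker', 'Major'], name='priority')

# Look up the rank and the common class for every raw label.
ranks = raw.map(lambda label: EXAMPLE_LEVELS[label][0])
classes = raw.map(lambda label: EXAMPLE_LEVELS[label][1])

print(pd.DataFrame({'priority': raw, 'rank': ranks, 'class': classes}))
```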


In [7]:
# Unique collection in the training set
df['collection'].unique()

array(['Jira', 'MongoDB', 'Qt', 'JFrog'], dtype=object)

In [11]:
# Count of priority per collection
df.groupby('collection')['priority'].value_counts().to_frame()[:50]

Unnamed: 0_level_0,Unnamed: 1_level_0,count
collection,priority,Unnamed: 2_level_1
JFrog,4 - Normal,11231
JFrog,3 - High,2509
JFrog,2 - Critical,1145
JFrog,1 - Blocker,498
JFrog,5 - Minor,37
JFrog,6 - Trivial,8
Jira,Low,76547
Jira,Medium,43672
Jira,High,10283
Jira,Highest,5022


In [14]:
priority_mapping = {
    # Highest priority
    'Blocker': 'Highest',
    'P0': 'Highest',
    'Urgent': 'Highest',
    'Showstopper': 'Highest',
    'P0: Blocker': 'Highest',  # Note: some projects use P0 as the highest priority level,
    'P1-Urgent': 'Highest',  # others use P1 as the highest priority level
    'Highest': 'Highest',
    'Blocker - P1': 'Highest',
    # High
    'High': 'High',
    'Critical': 'High',
    'P1': 'High',
    'Severe': 'High',
    'P2-High': 'High',
    'P1: Critical': 'High',
    'Critical - P2': 'High',
    # Medium
    'Medium': 'Medium',
    'Major': 'Medium',
    'P2': 'Medium',
    'P3-Medium': 'Medium', 
    'Important': 'Medium',
    'P2: Important': 'Medium',
    'Major - P3': 'Medium',
    
    # Low
    'Low': 'Low',
    'Minor': 'Low',
    'P3': 'Low',
    'Normal': 'Low',
    'P4-Low': 'Low',
    'P3: Somewhat important': 'Low',
    'Minor - P4': 'Low',
    '4 - Normal': 'Low',
    # Lowest
    'Lowest': 'Lowest',
    'P4': 'Lowest',
    'Trivial': 'Lowest',
    'P5-Trivial': 'Lowest',
    'P4: Low': 'Lowest',
    'Trivial - P5': 'Lowest',
    'P5: Not important': 'Lowest',
    '6 - Trivial': 'Lowest',
    '5 - Minor': 'Lowest',


}
# Apply mapping
df['class'] = df['priority'].map(priority_mapping)


In [15]:
# Find the priorities that were not mapped to any class by checking for nulls in 'class'
unmapped_priorities = df[df['class'].isnull()]['priority'].unique()

print("Priorities not mapped to a new class:")
for priority in unmapped_priorities:
    print(f"- {priority}")



Priorities not mapped to a new class:
- 2 - Critical
- 3 - High
- 1 - Blocker
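
Labels that are missing from the mapping end up with a null class. Before deciding whether to extend the mapping or drop those rows, it may help to see how many rows each unmapped label covers. The sketch below uses a toy frame and a toy mapping; `unmapped_counts` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd

def unmapped_counts(frame, mapping):
    """Return row counts for priority labels that have no entry in `mapping`."""
    missing = ~frame['priority'].isin(list(mapping))
    return frame.loc[missing, 'priority'].value_counts()

# Toy data and a deliberately incomplete toy mapping (assumptions for illustration).
toy = pd.DataFrame({'priority': ['Low', '3 - High', '3 - High', '1 - Blocker', 'Major']})
toy_mapping = {'Low': 'Low', 'Major': 'Medium'}

counts = unmapped_counts(toy, toy_mapping)
print(counts)
```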


In [18]:
# Value counts of class
df['class'].value_counts().to_frame()[:50]

Unnamed: 0_level_0,count
class,Unnamed: 1_level_1
Medium,1118149
Low,299765
High,89629
Highest,61766
Lowest,36706


In [19]:
df

Unnamed: 0,priority,description,project,labels,issuetype,collection,class
0,Blocker,We tried upgrading from Spring Boot 2.0.6 to S...,Spring XD,[],Bug,Spring,Highest
1,Major,The jobs that appear under Executions section ...,Spring XD,[],Bug,Spring,Medium
2,Trivial,Working with Spring-XD version 1.3.2.RELEASE\n...,Spring XD,[],Bug,Spring,Lowest
3,Major,My project 7 node cluster and in that 2 node a...,Spring XD,"['Spring', 'xd']",Bug,Spring,Medium
4,Minor,See https://github.com/spring-projects/spring-...,Spring XD,[],Story,Spring,Low
...,...,...,...,...,...,...,...
2018905,Major,it is very beautiful.,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
2018906,Major,library,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
2018907,Major,What is reactive-gremlin\r\n\r\nreactive-greml...,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
2018908,Major,"Android view for a swipeable, weekly calendar.",Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium


In [20]:
# Show count issuetype
df['issuetype'].value_counts().to_frame()[:50]

Unnamed: 0_level_0,count
issuetype,Unnamed: 1_level_1
Bug,759733
Improvement,268685
Task,157186
Sub-task,118987
New Feature,71440
New Project,65487
Feature Request,42457
Story,28386
Enhancement,28333
Test,10147


In [21]:
# Make new df with only issuetype: Technical Debt
technical_debt = df[df['issuetype'] == 'Technical Debt']
# To csv
technical_debt.to_csv('priority_with_TD.csv', index=False)
technical_debt

Unnamed: 0,priority,description,project,labels,issuetype,collection,class
457429,Major,Currently SourceClear scan for UPM is defined ...,Universal Plugin Manager,[],Technical Debt,JiraEcosystem,Medium
471740,Minor,Consider moving away from Promise and start us...,JIRA REST Java Client Library,[],Technical Debt,JiraEcosystem,Low
494010,Blocker,We are waiting for Jira to unlocked after Impo...,Atlassian Connect in Jira Cloud,[],Technical Debt,JiraEcosystem,Highest
494190,Major,We are creating a SAML 2.0 plugin that will be...,Atlassian Connect in Jira Cloud,[],Technical Debt,JiraEcosystem,Medium
494422,Major,This issue was raised following comments on th...,Atlassian Connect in Jira Cloud,[],Technical Debt,JiraEcosystem,Medium
...,...,...,...,...,...,...,...
1967155,Minor,Load a stock Nexus and analyze the Central rep...,Dev - Nexus Repo,[],Technical Debt,Sonatype,Low
1967252,Major,"When an exception is thrown inside nexus, ther...",Dev - Nexus Repo,[],Technical Debt,Sonatype,Medium
1967283,Major,Decouple shadow repository actions from their ...,Dev - Nexus Repo,[],Technical Debt,Sonatype,Medium
1967321,Major,,Dev - Nexus Repo,[],Technical Debt,Sonatype,Medium


In [22]:
technical_debt['class'].value_counts().to_frame()[:50]

Unnamed: 0_level_0,count
class,Unnamed: 1_level_1
Medium,115
Low,68
Highest,12
High,10
Lowest,3


In [23]:
# Remove issues with Technical Debt issue type
df = df[df['issuetype'] != 'Technical Debt']
df = df.reset_index(drop=True)
df

Unnamed: 0,priority,description,project,labels,issuetype,collection,class
0,Blocker,We tried upgrading from Spring Boot 2.0.6 to S...,Spring XD,[],Bug,Spring,Highest
1,Major,The jobs that appear under Executions section ...,Spring XD,[],Bug,Spring,Medium
2,Trivial,Working with Spring-XD version 1.3.2.RELEASE\n...,Spring XD,[],Bug,Spring,Lowest
3,Major,My project 7 node cluster and in that 2 node a...,Spring XD,"['Spring', 'xd']",Bug,Spring,Medium
4,Minor,See https://github.com/spring-projects/spring-...,Spring XD,[],Story,Spring,Low
...,...,...,...,...,...,...,...
1611180,Major,it is very beautiful.,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
1611181,Major,library,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
1611182,Major,What is reactive-gremlin\r\n\r\nreactive-greml...,Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium
1611183,Major,"Android view for a swipeable, weekly calendar.",Community Support - Open Source Project Reposi...,[],New Project,Sonatype,Medium


In [24]:
df["class"].value_counts().to_frame()[:50]

Unnamed: 0_level_0,count
class,Unnamed: 1_level_1
Medium,1118034
Low,299697
High,89619
Highest,61754
Lowest,36703


In [25]:
# Save to csv
df.to_csv('all_priority_group_in_classes.csv', index=False)

In [26]:
# Read csv to check if file is saved correctly
df = pd.read_csv('all_priority_group_in_classes.csv')

In [27]:
import os
priority_levels = ['Highest', 'High', 'Medium', 'Low', 'Lowest']

# Saves each class to a separate csv file
for level in priority_levels:
    try:
        # Make dir with level
        os.makedirs(f'{level}', exist_ok=True)
        # df with level class
        df_level = df[df['class'] == level]
        # Save to csv
        df_level.to_csv(f'{level}/{level}.csv', index=False)
        print(f"Saved {level}.csv")
    except Exception as e:
        print(f"An error occurred for level {level}: {str(e)}")


Saved Highest.csv
Saved High.csv
Saved Medium.csv
Saved Low.csv
Saved Lowest.csv


In [28]:
# Read csv to check if file is saved correctly
for level in priority_levels:
    try:
        df = pd.read_csv(f'{level}/{level}.csv')
        print(f"Read {level}.csv")
    except Exception as e:
        print(f"An error occurred while reading {level}.csv: {str(e)}")


Read Highest.csv
Read High.csv
Read Medium.csv
Read Low.csv
Read Lowest.csv


## Definition of each priority level according to the Atlassian documentation.
* Lowest - Trivial problem with little or no impact on progress. Color: Light grey.
* Low - Minor problem or easily worked around. Color: Dark grey.
* Medium - Has the potential to affect progress. Color: Yellow.
* High - Serious problem that could block progress. Color: Orange.
* Highest - The problem will block progress. Color: A dark red.
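
The definitions above imply an ordering, Lowest < Low < Medium < High < Highest. One way to make pandas aware of that ordering is an ordered categorical. This is a sketch on toy data; `LEVELS` is a hypothetical name.

```python
import pandas as pd

# Class labels from least to most urgent (hypothetical constant name).
LEVELS = ['Lowest', 'Low', 'Medium', 'High', 'Highest']

# Toy data standing in for the 'class' column.
classes = pd.Series(['High', 'Lowest', 'Medium', 'Highest'])

# Convert to an ordered categorical so comparisons and sorting follow LEVELS.
ordered = classes.astype(pd.CategoricalDtype(categories=LEVELS, ordered=True))

# Keep only issues that are at least 'High'.
at_least_high = ordered[ordered >= 'High']
print(at_least_high.tolist())
```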

#### Compared to GitHub

* High = High and Highest
* Medium = Medium
* Low = Low and Lowest
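
If the five classes ever need to be compared against a three-level scheme like the one above, the collapse could be sketched as follows. The names `THREE_LEVEL` and `class_3` are hypothetical, and the frame is toy data.

```python
import pandas as pd

# Five classes collapsed to three, following the comparison above.
# THREE_LEVEL and the 'class_3' column are hypothetical names.
THREE_LEVEL = {
    'Highest': 'High',
    'High': 'High',
    'Medium': 'Medium',
    'Low': 'Low',
    'Lowest': 'Low',
}

toy = pd.DataFrame({'class': ['Highest', 'High', 'Medium', 'Low', 'Lowest']})
toy['class_3'] = toy['class'].map(THREE_LEVEL)
print(toy)
```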

# Ranking of priority levels.
The collections in this dataset use 4 to 6 priority levels; 5 levels is the most common.
The most commonly used schemes are listed here, each ordered from lowest to highest priority.
## 5 priority levels
### Sonatype, MongoDB, Apache, RedHat, Spring, Sakai (no Trivial level), JiraEcosystem
* Trivial
* Minor
* Major
* Critical
* Blocker
### Apache
* P4
* P3
* P2
* P1
* P0

### Hyperledger, Mindville
* Lowest
* Low 
* Medium
* High
* Highest

### RedHat
* Low
* Normal
* Medium
* High
* Urgent

### IntelDAOS
* P5-Trivial
* P4-Low
* P3-Medium
* P2-High
* P1-Urgent

### SecondLife
* Trivial
* Minor
* Major
* Severe
* Showstopper

### Mojang
* Low
* Normal
* Important
* Critical
* Blocker

## 6 levels
### Qt
* P5: Not important (removing this)
* P4: Low
* P3: Somewhat important
* P2: Important
* P1: Critical
* P0: Blocker

### JFrog
* Trivial
* Minor
* Normal
* High
* Critical
* Blocker


## 4 levels
### Apache
* Low
* Normal
* High
* Urgent

### Jira (the org)
* Low
* Medium
* High
* Highest
### Mindville (few issues, can be ignored)
* Level 4
* Level 3
* Level 2
* Level 1
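
To check how many distinct priority labels each collection actually uses, a count along these lines could be run on the data. This is a sketch on toy data; it groups by `collection`, which is an assumption, and grouping by `project` instead may be more appropriate where one collection contains several schemes (as Apache does above).

```python
import pandas as pd

# Toy data standing in for the real 'collection' and 'priority' columns.
toy = pd.DataFrame({
    'collection': ['Jira', 'Jira', 'Jira', 'Jira', 'Qt', 'Qt', 'Qt'],
    'priority': ['Low', 'Medium', 'High', 'Low',
                 'P1: Critical', 'P2: Important', 'P1: Critical'],
})

# Number of distinct priority labels per collection.
levels_per_collection = toy.groupby('collection')['priority'].nunique()

# How many collections use each number of levels.
scheme_sizes = levels_per_collection.value_counts()

print(levels_per_collection)
print(scheme_sizes)
```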