## Reminder
Please run this code with Python 3.9 or Python 3.10; otherwise the pd.json_normalize function may not work.

## Data collection idea
* Use the "Board" dataset as a bridge that links the "Sprint" and "Issue" URLs and data tables.
* For the sprint and issue tables, create a new column that stores the board id, so that we know which board each record comes from.
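
To illustrate the bridge idea, here is a minimal sketch with made-up rows: once the sprint and issue tables both carry a `board_id` column, each of them can be joined back to the board table. All sample values and the column name `board_name` are assumptions for illustration only.

```python
import pandas as pd

# Toy board table (made-up values); 'id' is the board id used as the bridge.
boards = pd.DataFrame({"id": [37, 52], "name": ["Board A", "Board B"]})

# Toy sprint and issue tables, each carrying the board_id it was fetched from.
sprints = pd.DataFrame({"id": [7, 11, 90], "board_id": [37, 37, 52]})
issues = pd.DataFrame({"key": ["X-1", "X-2"], "board_id": [37, 52]})

# Rename the board columns first so they do not collide with the sprint 'id'.
board_lookup = boards.rename(columns={"id": "board_id", "name": "board_name"})

# A left join keeps every sprint/issue row and adds the board information.
sprints_with_board = sprints.merge(board_lookup, on="board_id", how="left")
issues_with_board = issues.merge(board_lookup, on="board_id", how="left")
print(sprints_with_board)
```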

### Steps: 
#### Board: 
1. Retrieve the boards from the API URL: {Base URL of your project}/rest/agile/1.0/board?projectKeyOrId={projectKey}
   * Base URL: check the excel file on OneDrive.
   * Project key: search for the project name on the corresponding platform (RedHat or Apache), then look under the "Project" category.
2. Request the data with the GET method and get the JSON output.
3. Transform the data from JSON format into a pandas dataframe.
4. Filter by board type (type == 'scrum') and reset the index.
5. Store board_id for future analysis.
6. Save the result as a csv file so that we don't have to rerun the code every time.
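
The URL pattern in step 1 can be sketched as two small helpers, so that the base URL and project key are set in one place. The function names `board_list_url` and `board_resource_url` are hypothetical; the paths are the ones used in the cells of this notebook.

```python
def board_list_url(base_url: str, project_key: str) -> str:
    """URL that lists the boards of one project."""
    return f"{base_url.rstrip('/')}/rest/agile/1.0/board?projectKeyOrId={project_key}"


def board_resource_url(base_url: str, board_id: int, resource: str) -> str:
    """URL of a per-board resource; resource is 'sprint' or 'issue'."""
    return f"{base_url.rstrip('/')}/rest/agile/1.0/board/{board_id}/{resource}"


# Base URL and project key of the Apache example used in this notebook.
BASE_URL = "https://issues.apache.org/jira"
print(board_list_url(BASE_URL, "AURORA"))
print(board_resource_url(BASE_URL, 37, "sprint"))
```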

#### Sprint: 
1. Loop over the stored board_id values to get all sprints and append them to an initially empty list (appending to a list is much cheaper than growing a pandas dataframe).
2. Normalize the list to get a dataframe (called df_initalResult) whose records are still nested.
3. Flatten each row into ordinary data records. To do so, create a new empty dataframe (df_finalResult), then loop through df_initalResult by index, store the flattened records of each row in a temporary dataframe (df_temp), and concatenate df_temp onto df_finalResult. The result is a complete dataset that contains all records.
4. Save as csv file.
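
The flattening in steps 1 to 3 can be sketched offline with made-up records. This is a variation rather than the exact code of the notebook: it normalizes each board's list separately and concatenates once at the end, which avoids rebuilding the result dataframe in every iteration. The sample records and the names `records_per_board` and `board_ids` are assumptions.

```python
import pandas as pd

# Toy stand-in for the per-board lists collected from the API (made-up records).
records_per_board = [
    [{"id": 7, "state": "closed"}, {"id": 11, "state": "closed"}],  # first board
    [{"id": 90, "state": "active"}],                                # second board
]
board_ids = [37, 52]  # same order as records_per_board

frames = []
for board_id_value, records in zip(board_ids, records_per_board):
    if not records:  # a board without sprints contributes no rows
        continue
    df_temp = pd.json_normalize(records)   # one row per record
    df_temp["board_id"] = board_id_value   # remember where the rows came from
    frames.append(df_temp)

# Concatenate once at the end and renumber the rows.
df_finalResult = pd.concat(frames, ignore_index=True)
print(df_finalResult)
```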

#### Issues:
1. Same method as for Sprint, except that we only want the issues that have story points. Story points are stored in a custom field (check the column name in the excel file), so after retrieving all issues we keep only those whose story point field is not empty.
2. Save as csv file.
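
A minimal sketch of the story point filter on made-up rows. The column name is the one used for the Apache example later in this notebook; other Jira instances will most likely use a different custom field id.

```python
import pandas as pd

# Story point column of the Apache example in this notebook; the id differs per Jira instance.
STORY_POINT_FIELD = "fields.customfield_12310293"

# Made-up issues: X-2 has no story points.
issues = pd.DataFrame({
    "key": ["X-1", "X-2", "X-3"],
    STORY_POINT_FIELD: [3.0, None, 5.0],
})

# Keep only the rows whose story point field is filled in.
with_points = issues[issues[STORY_POINT_FIELD].notna()].reset_index(drop=True)
print(with_points)
```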

In [1]:
import requests
from requests.auth import HTTPBasicAuth
import json
import base64
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

## Your Project Name

In [2]:
# For RedHat, replace <YOUR_TOKEN> with your own token and keep the "Bearer" prefix
auth = "Bearer <YOUR_TOKEN>"
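
To keep the token out of the notebook file, one option is to read it from an environment variable. This is only a sketch: `JIRA_TOKEN` is a hypothetical variable name that you would set in your shell before starting the notebook.

```python
import os

# JIRA_TOKEN is a hypothetical variable name; set it before starting the notebook.
token = os.environ.get("JIRA_TOKEN", "")
if not token:
    print("JIRA_TOKEN is not set; requests that need authentication will fail.")

# Keep the "Bearer " prefix, as in the cell above.
auth = f"Bearer {token}"
```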

### Board

In [3]:
# Replace BaseURL and project key with your own project
board_url = 'https://issues.apache.org/jira/rest/agile/1.0/board?projectKeyOrId=AURORA'

headers = {
   "Content-Type": "application/json",
   "Authorization": auth,  
}

payload = {
   "maxResults":50 #100, 20
}

# Request data by using GET method and directly get the json output
board = requests.request("GET",board_url,headers=headers, params=payload).json()

# We are only interested in the "values" entry of the response, so we normalize only board['values']
board_df = pd.json_normalize(board['values'])

# Keep only the scrum boards; the result is still a pandas dataframe
board_df = board_df[board_df['type']=='scrum']

# Reset the index for future convenience
board_df = board_df.reset_index()

# Store the board id into variable board_id
board_id = board_df['id']

#Save the df as csv file. Use your project name to define the file name
board_df.to_csv('Aurora_board.csv') 
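
Since the csv file exists to avoid rerunning the request, a later session can load it instead of calling the API again. The helper below is a sketch; `load_or_build` and the demo file name are hypothetical, and the demo writes to a temporary directory so that it does not touch your project files.

```python
import os
import tempfile

import pandas as pd


def load_or_build(csv_path, build):
    """Return the dataframe stored at csv_path, or build and store it if the file is missing."""
    if os.path.exists(csv_path):
        # index_col=0 restores the index column that to_csv writes by default
        return pd.read_csv(csv_path, index_col=0)
    df = build()
    df.to_csv(csv_path)
    return df


# Demo with made-up data in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_board.csv")
first = load_or_build(demo_path, lambda: pd.DataFrame({"id": [37], "type": ["scrum"]}))
# The file exists now, so the second build function is never called.
second = load_or_build(demo_path, lambda: pd.DataFrame())
print(second)
```

In the notebook, `build` would be a function that wraps the request and filtering code of the cell above.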

### Sprint

In [4]:
headers = {
   "Content-Type": "application/json",
   "Authorization": auth,  
}


payload_sprint = {
'maxResults':2000  #Retrieve at most 2000 records, can be modified based on your own project
}

list1 = [] 

# Loop over the scrum boards and append each board's sprint records to list1
for i in board_id:
   sprint_url = f'https://issues.apache.org/jira/rest/agile/1.0/board/{i}/sprint'
   sprint = requests.request("GET", sprint_url, headers=headers, params=payload_sprint).json()
   # A response without 'values' (e.g. an error message) is stored as an empty list, so that
   # entry i of list1 always belongs to board_df['id'][i]
   list1.append(sprint.get('values', []))
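
A single request may not return everything: the issue output further down shows 1000 columns although 3000 records were requested, which suggests that the server caps the page size. The sketch below collects all pages, assuming the endpoints follow Jira's usual `startAt`/`maxResults` paging and report either an `isLast` flag or a `total` count. `fetch_all` and `fake_fetch_page` are hypothetical names, and the demo uses a fake server so that it runs offline.

```python
def fetch_all(fetch_page, key, page_size=50):
    """Collect every record of a paginated response.

    fetch_page(start_at, max_results) must return the decoded JSON of one page;
    key is 'values' for sprints and 'issues' for issues.
    """
    records = []
    start_at = 0
    while True:
        page = fetch_page(start_at, page_size)
        batch = page.get(key, [])
        records.extend(batch)
        start_at += len(batch)
        # Stop on an empty page, on the 'isLast' flag, or once 'total' is reached.
        if not batch or page.get("isLast", False):
            break
        if "total" in page and start_at >= page["total"]:
            break
    return records


# Offline demonstration: a fake server that holds 5 records and caps pages at 2.
def fake_fetch_page(start_at, max_results):
    data = [{"id": n} for n in range(5)]
    size = min(max_results, 2)
    return {"values": data[start_at:start_at + size],
            "isLast": start_at + size >= len(data)}


print(len(fetch_all(fake_fetch_page, "values")))  # 5
```

With the real API, `fetch_page` could be a small function that calls `requests.get(sprint_url, headers=headers, params={"startAt": start_at, "maxResults": max_results}).json()`.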

In [5]:
# Normalize the list to get the initial dataframe, whose records are still nested.
sprint_df = pd.json_normalize(list1)
sprint_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,30,31,32,33,34,35,36,37,38,39
0,"{'id': 7, 'self': 'https://issues.apache.org/j...","{'id': 11, 'self': 'https://issues.apache.org/...","{'id': 13, 'self': 'https://issues.apache.org/...","{'id': 15, 'self': 'https://issues.apache.org/...","{'id': 23, 'self': 'https://issues.apache.org/...","{'id': 26, 'self': 'https://issues.apache.org/...","{'id': 39, 'self': 'https://issues.apache.org/...","{'id': 42, 'self': 'https://issues.apache.org/...","{'id': 46, 'self': 'https://issues.apache.org/...","{'id': 52, 'self': 'https://issues.apache.org/...",...,"{'id': 190, 'self': 'https://issues.apache.org...","{'id': 191, 'self': 'https://issues.apache.org...","{'id': 197, 'self': 'https://issues.apache.org...","{'id': 203, 'self': 'https://issues.apache.org...","{'id': 211, 'self': 'https://issues.apache.org...","{'id': 212, 'self': 'https://issues.apache.org...","{'id': 220, 'self': 'https://issues.apache.org...","{'id': 223, 'self': 'https://issues.apache.org...","{'id': 243, 'self': 'https://issues.apache.org...","{'id': 250, 'self': 'https://issues.apache.org..."


In [6]:
# Create an empty dataframe that will collect the flattened records
sprint_result = pd.DataFrame()
sprint_result.head()

# sprint_df.index holds the row numbers, one row per board. Flatten the records of each row
# and append them to sprint_result, so that we end up with a single dataframe.
for i in sprint_df.index:
   # Flatten the records of row i: each nested record becomes one row
   sprint_df_1 = sprint_df.iloc[i].apply(pd.Series) 
   # Attach the corresponding board id
   sprint_df_1['board_id'] = board_df['id'][i]
   # Put the two dataframes to concatenate (sprint_result and sprint_df_1) in a list
   frame = [sprint_result, sprint_df_1]
   # Concatenate the two dataframes
   sprint_result = pd.concat(frame)

In [7]:
# Drop the rows that are entirely empty (rows without a sprint id)
sprint_result = sprint_result[sprint_result['id'].notna()]
sprint_result.head()

Unnamed: 0,id,self,state,name,startDate,endDate,completeDate,activatedDate,originBoardId,board_id
0,7,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 1,2014-05-12T21:26:22.322Z,2014-05-17T00:00:00.000Z,2014-05-19T19:11:51.658Z,2014-05-12T21:26:22.322Z,37,37
1,11,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 2,2014-05-19T19:00:10.084Z,2014-05-24T00:00:00.000Z,2014-06-02T19:39:27.844Z,2014-05-19T19:00:10.084Z,37,37
2,13,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 3,2014-05-31T00:00:30.253Z,2014-06-12T00:00:00.000Z,2014-06-13T23:05:05.267Z,2014-05-31T00:00:30.253Z,35,37
3,15,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 3,2014-06-02T19:00:46.818Z,2014-06-07T00:00:00.000Z,2014-07-14T16:51:27.048Z,2014-06-02T19:00:46.818Z,37,37
4,23,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q3 Sprint 1,2014-07-15T05:43:31.920Z,2014-07-19T00:00:00.000Z,2014-07-25T16:28:42.789Z,2014-07-15T05:43:31.920Z,37,37


In [8]:
# Reset the index
sprint_result = sprint_result.reset_index()
# Drop the original index column
sprint_result = sprint_result.drop(columns=['index'])
sprint_result.head()

Unnamed: 0,id,self,state,name,startDate,endDate,completeDate,activatedDate,originBoardId,board_id
0,7,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 1,2014-05-12T21:26:22.322Z,2014-05-17T00:00:00.000Z,2014-05-19T19:11:51.658Z,2014-05-12T21:26:22.322Z,37,37
1,11,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 2,2014-05-19T19:00:10.084Z,2014-05-24T00:00:00.000Z,2014-06-02T19:39:27.844Z,2014-05-19T19:00:10.084Z,37,37
2,13,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 3,2014-05-31T00:00:30.253Z,2014-06-12T00:00:00.000Z,2014-06-13T23:05:05.267Z,2014-05-31T00:00:30.253Z,35,37
3,15,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q2 Sprint 3,2014-06-02T19:00:46.818Z,2014-06-07T00:00:00.000Z,2014-07-14T16:51:27.048Z,2014-06-02T19:00:46.818Z,37,37
4,23,https://issues.apache.org/jira/rest/agile/1.0/...,closed,Q3 Sprint 1,2014-07-15T05:43:31.920Z,2014-07-19T00:00:00.000Z,2014-07-25T16:28:42.789Z,2014-07-15T05:43:31.920Z,37,37


In [9]:
# Save as csv file
sprint_result.to_csv('Aurora_sprint.csv')

### Issues


In [10]:
headers = {
   "Content-Type": "application/json",
   "Authorization": auth,  
}

payload_issues = {
'maxResults':3000
}

list1 = []
# Loop over the scrum boards and append each board's issue records to list1
for i in board_id:
   issues_url = f'https://issues.apache.org/jira/rest/agile/1.0/board/{i}/issue'
   issues = requests.request("GET", issues_url, headers=headers, params=payload_issues).json()
   # A response without 'issues' (e.g. an error message) is stored as an empty list, so that
   # entry i of list1 always belongs to board_df['id'][i]
   list1.append(issues.get('issues', []))

In [11]:
# Normalize the list to get the initial dataframe, whose records are still nested.
issue_df = pd.json_normalize(list1)
issue_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,990,991,992,993,994,995,996,997,998,999
0,"{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...",...,"{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation...","{'expand': 'operations,versionedRepresentation..."


In [12]:
#Check a single record content
issue_df.iloc[0][0]

{'expand': 'operations,versionedRepresentations,editmeta,changelog,renderedFields',
 'id': '12992393',
 'self': 'https://issues.apache.org/jira/rest/agile/1.0/issue/12992393',
 'key': 'AURORA-1740',
 'fields.fixVersions': [{'self': 'https://issues.apache.org/jira/rest/api/2/version/12335572',
   'id': '12335572',
   'name': '0.16.0',
   'archived': False,
   'released': True,
   'releaseDate': '2016-09-28'}],
 'fields.resolution.self': 'https://issues.apache.org/jira/rest/api/2/resolution/1',
 'fields.resolution.id': '1',
 'fields.resolution.description': 'A fix for this issue is checked into the tree and tested.',
 'fields.resolution.name': 'Fixed',
 'fields.customfield_12312322': None,
 'fields.customfield_12312323': None,
 'fields.customfield_12310420': '9223372036854775807',
 'fields.customfield_12312320': None,
 'fields.customfield_12312321': None,
 'fields.customfield_12312328': None,
 'fields.customfield_12312329': None,
 'fields.customfield_12312326': None,
 'fields.customfield

In [13]:
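# Create an empty dataframe that will collect the flattened issues
issue_result = pd.DataFrame()

for i in issue_df.index:
   # Flatten the records of row i and attach the corresponding board id
   issue_df_1 = issue_df.iloc[i].apply(pd.Series)
   issue_df_1['board_id'] = board_df['id'][i]
   # Concatenate the new records onto the result
   frame = [issue_result, issue_df_1]
   issue_result = pd.concat(frame)

# Reset the index and drop the original index column
issue_result = issue_result.reset_index()
issue_result = issue_result.drop(columns=['index'])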

In [15]:
# Keep only the issues that have story points (stored in customfield_12310293 for this project)
issue_result = issue_result[issue_result['fields.customfield_12310293'].notna()]
issue_result.head()

Unnamed: 0,expand,id,self,key,fields.fixVersions,fields.resolution.self,fields.resolution.id,fields.resolution.description,fields.resolution.name,fields.customfield_12312322,...,fields.sprint.originBoardId,fields.resolution,fields.assignee,fields.aggregateprogress.percent,fields.progress.percent,fields.timetracking.originalEstimate,fields.timetracking.remainingEstimate,fields.timetracking.originalEstimateSeconds,fields.timetracking.remainingEstimateSeconds,board_id
5,"operations,versionedRepresentations,editmeta,c...",12819021,https://issues.apache.org/jira/rest/agile/1.0/...,AURORA-1258,[{'self': 'https://issues.apache.org/jira/rest...,https://issues.apache.org/jira/rest/api/2/reso...,1,A fix for this issue is checked into the tree ...,Fixed,,...,,,,,,,,,,37
9,"operations,versionedRepresentations,editmeta,c...",12901444,https://issues.apache.org/jira/rest/agile/1.0/...,AURORA-1506,[{'self': 'https://issues.apache.org/jira/rest...,https://issues.apache.org/jira/rest/api/2/reso...,1,A fix for this issue is checked into the tree ...,Fixed,,...,,,,,,,,,,37
11,"operations,versionedRepresentations,editmeta,c...",12818763,https://issues.apache.org/jira/rest/agile/1.0/...,AURORA-1250,[{'self': 'https://issues.apache.org/jira/rest...,https://issues.apache.org/jira/rest/api/2/reso...,1,A fix for this issue is checked into the tree ...,Fixed,,...,,,,,,,,,,37
12,"operations,versionedRepresentations,editmeta,c...",12901982,https://issues.apache.org/jira/rest/agile/1.0/...,AURORA-1511,[{'self': 'https://issues.apache.org/jira/rest...,https://issues.apache.org/jira/rest/api/2/reso...,1,A fix for this issue is checked into the tree ...,Fixed,,...,,,,,,,,,,37
13,"operations,versionedRepresentations,editmeta,c...",12839636,https://issues.apache.org/jira/rest/agile/1.0/...,AURORA-1364,[{'self': 'https://issues.apache.org/jira/rest...,https://issues.apache.org/jira/rest/api/2/reso...,1,A fix for this issue is checked into the tree ...,Fixed,,...,,,,,,,,,,37


In [41]:
#Check the dataframe shape
issue_result.shape

(2610, 336)

In [42]:
#Save as csv file
issue_result.to_csv('YourProjectName_issues.csv')