# Working with Different Software Data Groups

The ScrumSaga system extracts more than 100 data fields and applies transformation algorithms to derive many more.  This can seem unmanageable, but it becomes intuitive once you learn the categories and sub-categories of metric Data Groups.

This guide describes a few of the basic Data Groups and how they are related.  Although any of the data can be represented in any of the categories, for instructional purposes we categorize the groups by their structure and typical use:

* Metric (timeseries)
* Hierarchical (parent-child)
* Entity-Relation (graph/network)
* Descriptive

We also provide the _processing data_ for those who are interested.
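
The grouping used in the rest of this guide can be written down as a simple lookup. This is only a sketch: `DATA_GROUP_CATEGORIES` and `categories_for` are illustrative names and are not part of the ScrumSaga wrapper. The group names follow the section headings of this guide.

```python
# Illustrative mapping (not part of the ScrumSaga API): category -> Data Groups,
# following the section headings of this guide.
DATA_GROUP_CATEGORIES = {
    "metric": ["project", "size", "complexity"],
    "hierarchical": ["entity_structure", "tag"],
    "entity_relation": ["entity_structure", "relation"],
    "descriptive": ["entity_characteristic", "quality", "error", "author"],
    "processing": ["process_log"],
}


def categories_for(group):
    """Return every category a Data Group is listed under."""
    return [cat for cat, groups in DATA_GROUP_CATEGORIES.items() if group in groups]


print(categories_for("entity_structure"))  # → ['hierarchical', 'entity_relation']
```

Note that `entity_structure` appears under two categories, which reflects the point above that the same data can be viewed through more than one structure.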

### Preparations
_Set-up Environment_

In [1]:
# Ensure API Wrapper is available and load it
! ls ./ScrumSaga

Account.py
Portfolio.py
Project.py
README.md
Repo.py
__init__.py
__pycache__


In [3]:
import sys
path = r'C:\Users\Jason\Documents\IPython Notebooks\SS-Reports\ScrumSaga'
sys.path.append(path)
import ScrumSaga as saga

In [4]:
# Account information (can only be changed on the website: scrumsaga.com)
SAGA_ACCT = {"email":"dev.team@mgmt-tech.org","password":"IMTorgTestUserPassword"}

Acct = saga.Account(acct_email=SAGA_ACCT['email'], acct_password=SAGA_ACCT['password'])
Acct.login()

passwords match


_Check Available Repo Data_

In [5]:
Acct.view_data()

['IMTorgTestCode--testprj_Java_aSimple']


### Simple Java Project

In [76]:
# create project
JavaRepo = saga.Repo('IMTorgTestProj','information@mgmt-tech.org','demoprj_Java_HumanResourceApp')
jHrApp = saga.Project(Acct, JavaRepo)

In [77]:
jHrApp.extract()

It appears there are no contributors


<Response [400]>

_Load Data from Repo_ 

In [7]:
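# load all metric groups
# (JSimple appears to be the saga.Project for the testprj_Java_aSimple repo listed
#  by Acct.view_data(); the cell that creates it is not shown in this guide)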
JSimple.load_all()

PROJECT group records:  27
 -elapsed time: 0.233480
ENTITY_STRUCTURE group records:  10
 -elapsed time: 0.456476
ENTITY_CHARACTERISTIC group records:  25
 -elapsed time: 4.469988
SIZE group records:  14
 -elapsed time: 0.230188
TAG group records:  10
 -elapsed time: 0.234169
RELATION group records:  8
 -elapsed time: 0.231337
ERROR group records:  0
 -elapsed time: 0.239268
QUALITY group records:  0
 -elapsed time: 0.226874
COMPLEXITY group records:  14
 -elapsed time: 0.662828
AUTHOR group records:  6
 -elapsed time: 0.233922
PROCESS_LOG group records:  0
 -elapsed time: 0.228385
Loading completed with no errors
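
Several groups in the log above come back with zero records, and it can help to spot those before analysis. The following is a minimal sketch that assumes the loaded groups are available as a plain dictionary of pandas DataFrames; `summarize_groups` and the toy frames are hypothetical and not part of the ScrumSaga wrapper.

```python
import pandas as pd


def summarize_groups(groups):
    """Return a Series of row counts per group, smallest first."""
    counts = pd.Series({name: len(df) for name, df in groups.items()}, name="records")
    return counts.sort_values(kind="stable")


# Toy stand-ins for loaded groups (hypothetical data, not real ScrumSaga output)
groups = {
    "size": pd.DataFrame({"hash": ["a1", "b2"], "loc_total": [10, 12]}),
    "quality": pd.DataFrame(),
    "author": pd.DataFrame({"author_name": ["x"]}),
}

summary = summarize_groups(groups)
print(summary)
print("empty groups:", list(summary[summary == 0].index))
```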


### Metric Data

project, size, complexity

In [29]:
import pandas
import numpy as np

# per commit
print( JSimple['project'].columns ) 
print( JSimple['size'].columns )

# per commit, per entity
print( JSimple['complexity'].columns) 

Index(['author_add', 'author_commits_count', 'author_del', 'author_files_size',
       'author_id', 'author_modified_count', 'author_original_count',
       'author_paths_count', 'author_total', 'authors_count', 'hash', 'id',
       'prj_id', 'project', 'release_count', 'reviewer_add',
       'reviewer_commits_count', 'reviewer_del', 'reviewer_files_size',
       'reviewer_modified_count', 'reviewer_name', 'reviewer_original_count',
       'reviewer_paths_count', 'reviewer_total', 'stamp', 'stamp_author',
       'subject'],
      dtype='object')
Index(['count', 'files_count', 'files_size', 'hash', 'id', 'loc_add',
       'loc_del', 'loc_total', 'modified_file_count', 'original_file_count',
       'prj_id', 'project', 'stamp', 'tag_count'],
      dtype='object')
Index(['bugs', 'calculated_length', 'cyclomatic_complexity', 'difficulty',
       'effort', 'entity_id', 'hash', 'id', 'n1', 'n2', 'nn1', 'nn2', 'time',
       'volume'],
      dtype='object')


In [47]:
JSimple['complexity']['volume'] = JSimple['complexity']['volume'].astype('float')

In [67]:
# SQL equivalent: SELECT hash, SUM(volume) FROM complexity GROUP BY hash;
cmplx = JSimple['complexity'].groupby('hash').agg({'volume': 'sum'})

In [73]:
cmplx = cmplx.reset_index()  # turn the 'hash' index into a regular column for merging

In [74]:
m1 = pandas.merge(JSimple['project'], JSimple['size'], on='hash', how='left')
metric = pandas.merge(m1, cmplx, on='hash', how='left')

In [75]:
metric[['stamp_x','loc_total','volume']]

|   | stamp_x | loc_total | volume |
|---|---|---|---|
| 0 | 2015-12-10 13:58:11.000000 | 7976 | |
| 1 | 2015-12-10 14:43:48.000000 | 8279 | |
| 2 | 2015-12-10 14:46:29.000000 | 8291 | 342.36 |
| 3 | 2015-12-10 14:51:57.000000 | 9192 | 342.36 |
| 4 | 2015-12-10 14:52:40.000000 | 9201 | 1393.2 |
| 5 | 2015-12-10 15:33:23.000000 | 9326 | 3187.98 |
| 6 | 2015-12-10 16:15:33.000000 | 9695 | 7651.37 |
| 7 | 2015-12-10 16:24:03.000000 | 9842 | 7832.81 |
| 8 | 2015-12-10 16:30:12.000000 | 9944 | 10056.64 |
| 9 | 2016-02-15 13:52:37.000000 | 9948 | 10056.64 |
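
The aggregate-then-merge pattern used in this section can be tried without a ScrumSaga account. The sketch below uses small hand-made frames with hypothetical values; only the column names (`hash`, `stamp`, `loc_total`, `entity_id`, `volume`) are taken from this guide. It shows why commits without complexity records end up with an empty `volume` after a left merge.

```python
import pandas as pd

# Toy data (hypothetical values) using column names from this guide
project = pd.DataFrame({
    "hash": ["a1", "b2", "c3"],
    "stamp": ["2015-12-10 13:58:11", "2015-12-10 14:43:48", "2015-12-10 14:46:29"],
})
size = pd.DataFrame({"hash": ["a1", "b2", "c3"], "loc_total": [100, 120, 150]})
complexity = pd.DataFrame({
    "hash": ["c3", "c3"],
    "entity_id": [1, 2],
    "volume": ["10.5", "20.0"],  # stored as text, as in the guide
})

# Convert the text column to numbers before summing
complexity["volume"] = complexity["volume"].astype(float)

# One row per commit: total volume over all entities in that commit
per_commit = complexity.groupby("hash", as_index=False)["volume"].sum()

# Left merges keep every commit, even those with no complexity records
metric = (
    project.merge(size, on="hash", how="left")
           .merge(per_commit, on="hash", how="left")
)
print(metric[["stamp", "loc_total", "volume"]])
```

Using `as_index=False` keeps `hash` as an ordinary column, so the aggregated frame can be merged directly.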


### Hierarchical Data

entity_structure, tag

In [80]:
print( JSimple['entity_structure'].columns )
print( JSimple['tag'].columns )

Index(['child_of', 'child_of_id', 'created_hash', 'entity_name', 'entity_type',
       'ext', 'id', 'last_before_removed_hash', 'prj_id', 'type'],
      dtype='object')
Index(['class_name', 'file_path', 'func_id', 'hash', 'id', 'project',
       'tag_key', 'tag_value', 'user', 'var_name'],
      dtype='object')


In [102]:
tmp = JSimple['entity_structure']
# keep only entities that have not been removed
tmp = tmp[tmp['last_before_removed_hash'] == '']
# count entities per type (group the filtered frame, not the full one)
tmp = tmp.groupby('entity_type').agg({'entity_type': np.size})
tmp.reindex(['project','directory','file','class','method','param','variable'])

| entity_type | count |
|---|---|
| project | 1 |
| directory | 79 |
| file | 154 |
| class | 4 |
| method | 7 |
| param | 5 |
| variable | 9 |


In [85]:
JSimple['tag'].tail()

|   | class_name | file_path | func_id | hash | id | project | tag_key | tag_value | user | var_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | aSimple::Animal | | | 4fc5084ad116ea2bcc1953bf816f0ae6af34a979 | 18 | testprj_Java_aSimple | thisIs | aClassDeclaration | IMTorgTestCode | |
| 18 | | | | 4fc5084ad116ea2bcc1953bf816f0ae6af34a979 | 19 | testprj_Java_aSimple | thisIs | classVariable | IMTorgTestCode | NumOfAnimals |
| 19 | | | 225.0 | 400da368b4a92ea4ed33d6847d5c4deaaf388a16 | 20 | testprj_Java_aSimple | thisIs | classMethod | IMTorgTestCode | |
| 20 | aSimple::Animal | | | 400da368b4a92ea4ed33d6847d5c4deaaf388a16 | 21 | testprj_Java_aSimple | thisIs | aClassDeclaration | IMTorgTestCode | |
| 21 | | | | 400da368b4a92ea4ed33d6847d5c4deaaf388a16 | 22 | testprj_Java_aSimple | thisIs | classVariable | IMTorgTestCode | NumOfAnimals |
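
Because each `entity_structure` row stores its parent in `child_of_id`, the hierarchy can be walked from any entity up to its root. The following is a minimal sketch on toy rows: the ids, the use of `0` for "no parent", and the `ancestry` helper are assumptions made for illustration, and the real data may mark roots differently.

```python
import pandas as pd

# Toy rows with the entity_structure columns used here (hypothetical ids;
# child_of_id = 0 is assumed to mean "no parent")
entities = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "entity_name": ["aSimple", "Cat.java", "aSimple::Cat", "Cat", "name"],
    "entity_type": ["directory", "file", "class", "method", "param"],
    "child_of_id": [0, 1, 2, 3, 4],
})


def ancestry(entities, entity_id):
    """Return entity names from the root down to entity_id."""
    by_id = entities.set_index("id")
    path, seen = [], set()
    current = entity_id
    # Stop when the parent id is unknown, or when a cycle is detected
    while current in by_id.index and current not in seen:
        seen.add(current)
        path.append(by_id.at[current, "entity_name"])
        current = by_id.at[current, "child_of_id"]
    return path[::-1]


print(" > ".join(ancestry(entities, 5)))
```

The `seen` set is a guard against malformed data in which an entity would be listed as its own ancestor.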


### Entity-Relation Data

entity_structure, relation

In [95]:
JSimple['relation'].shape

(13, 8)
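
The columns of `relation` are not shown in this guide, so the sketch below builds a graph view only from the `id` and `child_of_id` pairs of `entity_structure`, treating each pair as a parent-to-child edge. The toy rows, the `adjacency` helper, and the use of `0` for "no parent" are assumptions for illustration.

```python
from collections import defaultdict

import pandas as pd

# Toy entity_structure rows (hypothetical ids and names)
entities = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "entity_name": ["Animal.java", "aSimple::Animal", "NumOfAnimals",
                    "speak", "Cat.java", "aSimple::Cat"],
    "child_of_id": [0, 1, 2, 2, 0, 5],
})


def adjacency(entities):
    """Build a parent -> children adjacency list keyed by entity name."""
    names = dict(zip(entities["id"], entities["entity_name"]))
    graph = defaultdict(list)
    for child, parent in zip(entities["id"], entities["child_of_id"]):
        if parent in names:  # skip roots and unknown parents
            graph[names[parent]].append(names[child])
    return dict(graph)


graph = adjacency(entities)
for parent, children in graph.items():
    print(parent, "->", children)
```

Names are used as keys here only for readability. With real data, where different entities can share a name, keying the graph by `id` would be the safer choice.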

### Descriptive Data

entity_characteristic, quality, error, author

In [123]:
print( JSimple['author'].columns )
print( JSimple['entity_characteristic'].columns )
print( JSimple['quality'].columns )
print( JSimple['error'].columns )

Index(['author_domain', 'author_email', 'author_name', 'date_author_join_prj',
       'id', 'prj_id'],
      dtype='object')
Index(['blank', 'brief_desc', 'code', 'comment', 'detailed_desc', 'end_line',
       'entity_id', 'hash', 'id', 'inbody_desc', 'last_modification_hash',
       'last_modification_loc_added', 'last_modification_loc_changes',
       'last_modification_loc_removed', 'last_modification_user', 'loc_add',
       'loc_del', 'loc_total', 'location', 'modifications', 'reimplements_id',
       'start_line', 'total_loc_added', 'total_loc_removed',
       'total_references'],
      dtype='object')
Index([], dtype='object')
Index([], dtype='object')


In [106]:
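tmp = JSimple['entity_structure'][['id', 'entity_name']]
# join on entity_characteristic's 'entity_id' (assumed to reference entity_structure 'id')
m1 = pandas.merge(tmp, JSimple['entity_characteristic'], left_on='id', right_on='entity_id', how='left')
# 'quality' has no columns for this repo, so merge it only when 'hash' is present
quality = JSimple['quality']
m2 = pandas.merge(m1, quality, on='hash', how='left') if 'hash' in quality.columns else m1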

|   | child_of | child_of_id | created_hash | entity_name | entity_type | ext | id | last_before_removed_hash | prj_id | type |
|---|---|---|---|---|---|---|---|---|---|---|
| 254 | aSimple/bin/aSimple | 127 | 7405846a24596c8fdcadec8be1f392783d1517fc | Cat.class | file | .class | 255 | | 1 | |
| 255 | aSimple/src/aSimple | 130 | 7405846a24596c8fdcadec8be1f392783d1517fc | Cat.java | file | .java | 256 | | 1 | |
| 256 | Cat.java | 256 | 7405846a24596c8fdcadec8be1f392783d1517fc | aSimple::Cat | class | .java | 257 | | 1 | |
| 257 | aSimple::Cat | 257 | 7405846a24596c8fdcadec8be1f392783d1517fc | Cat | method | .java | 258 | | 1 | |
| 258 | Cat | 258 | 7405846a24596c8fdcadec8be1f392783d1517fc | name | param | .java | 259 | | 1 | String |


|   | author_domain | author_email | author_name | date_author_join_prj | id | prj_id |
|---|---|---|---|---|---|---|
| 0 | mgmt-tech.org | jason.beach@mgmt-tech.org | IMTorg | 2015-12-10 13:58:11.000000 | 1 | 1 |
| 1 | gmx.com | claytonk@gmx.com | clayton | 2016-02-15 13:52:37.000000 | 2 | 1 |
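
In the sample `author` rows, `author_domain` matches the part of `author_email` after the `@`. Assuming that relationship holds in general, the sketch below derives the domain and counts authors per domain. The names and addresses are placeholders, not real ScrumSaga data.

```python
import pandas as pd

# Placeholder authors (hypothetical names and addresses)
author = pd.DataFrame({
    "author_name": ["alice", "bob", "carol"],
    "author_email": ["alice@example.org", "bob@example.com", "carol@example.org"],
})

# Derive the domain as everything after the last '@'
author["author_domain"] = author["author_email"].str.split("@").str[-1]

# Number of authors contributing from each domain
per_domain = author.groupby("author_domain")["author_name"].count()
print(per_domain)
```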


### Processing - Related

process_log

In [124]:
JSimple['process_log'].info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame

### Conclusion

This overview of the raw data collected provides a basis for building workflows and for understanding the more advanced, calculated data.  You can learn more in the follow-on [guides](http://guides.scrumsaga.com/).