# Domain Champion

This analysis is based on the "Domain Champion" pattern described in the Pluralsight [book](https://www.pluralsight.com/content/dam/pluralsight2/landing-pages/offers/flow/pdf/Pluralsight_20Patterns_ebook.pdf).

A Domain Champion (DC) is an expert in a particular portion of the code base who contributes mainly to that portion of the project, while relatively few others contribute there. The positive aspect of this pattern is that the DC can be very productive in the short term. However, because others may not know that domain as well, code reviews are unlikely to result in good feedback. Longer term, this pattern can lead to stagnation, which in turn can lead to attrition. Moreover, if the DC has to leave for any reason, others may not be able to maintain the DC's domain effectively due to lack of expertise.

When a project has developers who can be identified as domain champions, one action item is to encourage them to contribute to other parts of the project, if appropriate, or to ensure that more people become familiar with their portion of the code.

In an HPC context, DC is not necessarily a pattern to be avoided, since there can be naturally distinct software components that fall within one contributor's area of expertise, e.g., an interface to another package. Nevertheless, projects must consider how they will maintain or extend the domain if that developer departs the project for any reason.
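
To make the definition above concrete, here is a minimal sketch of how a domain champion might be flagged from change records. It is not the method used by the `patterns` package: the function name `find_domain_champions`, the thresholds `min_share` and `max_contributors`, the choice of the top-level directory as the "domain", and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def find_domain_champions(changes, min_share=0.8, max_contributors=3):
    """Return {domain: (author, share)} for domains dominated by one author.

    changes: iterable of (author, filepath, lines_changed) tuples.
    The "domain" is taken to be the top-level directory of the path
    (an assumption; a file at the repository root is its own domain).
    """
    totals = defaultdict(lambda: defaultdict(int))  # domain -> author -> lines
    for author, filepath, lines_changed in changes:
        domain = filepath.split("/", 1)[0]
        totals[domain][author] += lines_changed

    champions = {}
    for domain, per_author in totals.items():
        domain_total = sum(per_author.values())
        if domain_total == 0:
            continue  # nothing changed here, so no share can be computed
        top_author, top_lines = max(per_author.items(), key=lambda kv: kv[1])
        share = top_lines / domain_total
        # One author holds most of the changes and few others contribute.
        if share >= min_share and len(per_author) <= max_contributors:
            champions[domain] = (top_author, share)
    return champions


# Made-up change records, not taken from any real project.
changes = [
    ("alice", "solver/asp.py", 90),
    ("alice", "solver/core.py", 60),
    ("bob", "solver/asp.py", 10),
    ("bob", "docs/index.rst", 20),
    ("carol", "docs/index.rst", 25),
    ("dave", "docs/usage.rst", 15),
]
champions = find_domain_champions(changes)
print(champions)
```

In this toy data, `alice` accounts for 150 of the 160 changed lines under `solver/`, so she is flagged there, while `docs/` has no dominant contributor. Suitable thresholds depend on the project and would need tuning.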

In [6]:
import sys, os, getpass, warnings
warnings.filterwarnings('ignore')
from patterns.patterns import Patterns
from patterns.visualizer import Visualizer

## Load data for project(s)

First, we will load Git commit data from the MySQL database. Note that this can take a very long time since we are loading all available data.

In [2]:
vis = Visualizer(project_name='spack', db_pwd='cabbage')  # or: db_pwd=getpass.getpass(prompt='Database password:')
vis.get_data()
all_commits = vis.commit_data
all_commits.head()

datetime,sha,branch,author,message,filepath,diff,year,month,day,doy,dow,locc,locc-,locc+,change-size
2021-02-23 15:46:37,93ed1a410c4a202eab3a68769fd8c0d4ff8b1c8e,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,Updates to support clingo-cffi (#20657)\n \...,.github/workflows/linux_unit_tests.yaml,+ clingo-cffi:\n+ # Test for the clingo ba...,2021,2,23,54,Tuesday,51,0,51,1.0
2021-02-23 15:46:37,93ed1a410c4a202eab3a68769fd8c0d4ff8b1c8e,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,Updates to support clingo-cffi (#20657)\n \...,lib/spack/spack/solver/asp.py,+ # There may be a better way to detect thi...,2021,2,23,54,Tuesday,2,0,2,1.0
2021-02-03 09:39:57,61641ecff2d9aad26340a7f3ab80c29f0d8ba690,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,sundials: expose monitoring build option (#21429),var/spack/repos/builtin/packages/sundials/pack...,"+ # Monitoring\n+ variant('monitoring', ...",2021,2,3,34,Wednesday,7,0,7,1.0
2021-02-24 08:35:27,d65002a6761cd2d745b76eae0592a50056255b7f,* develop\n remotes/origin/HEAD -> origin/dev...,kuramoto-fj,n2p2: Add new package (#21709)\n \n * n2...,var/spack/repos/builtin/packages/n2p2/interfac...,+--- a/src/interface/makefile\n++++ b/src/inte...,2021,2,24,55,Wednesday,6,0,6,1.0
2021-02-24 08:35:27,d65002a6761cd2d745b76eae0592a50056255b7f,* develop\n remotes/origin/HEAD -> origin/dev...,kuramoto-fj,n2p2: Add new package (#21709)\n \n * n2...,var/spack/repos/builtin/packages/n2p2/interfac...,+--- a/src/interface/makefile\n++++ b/src/inte...,2021,2,24,55,Wednesday,6,0,6,1.0


In [3]:
print("Table size: ", all_commits.shape)
all_commits.describe()

Table size:  (68828, 15)


statistic,year,month,day,doy,locc,locc-,locc+,change-size
count,68828.0,68828.0,68828.0,68828.0,68828.0,68828.0,68828.0,68828.0
mean,2018.26004,6.984163,16.230909,197.753356,7.786351,1.709159,6.077192,0.7772164
std,1.755194,3.954377,10.202866,124.156478,34.150429,12.037671,30.970787,0.3482191
min,2013.0,1.0,1.0,1.0,0.0,0.0,0.0,-2.220446e-16
25%,2017.0,3.0,7.0,83.0,0.0,0.0,0.0,0.4625991
50%,2018.0,8.0,16.0,218.0,2.0,0.0,1.0,1.0
75%,2020.0,10.0,25.0,300.0,6.0,1.0,4.0,1.0
max,2021.0,12.0,31.0,366.0,1864.0,1383.0,1864.0,1.0


Next, we remove non-code files while preserving the original full data in the `all_commits` dataframe. Files are classified as code by matching common source-file suffixes, supplemented by a manual check of a sample of ECP projects for the suffixes they use for anything that can be labeled as code (as opposed to simulation input data, documentation, or generated files).
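
A suffix-based classification of this kind could look roughly like the sketch below. The suffix set `CODE_SUFFIXES` and the helper `is_code` are hypothetical and are not the list or the logic used by `remove_noncode()`; they only illustrate the idea of keeping paths whose extension appears in an allow-list.

```python
import os

# Hypothetical allow-list of suffixes treated as code (an assumption,
# not the actual list used by the patterns package).
CODE_SUFFIXES = {".py", ".c", ".cpp", ".h", ".f90", ".sh", ".yaml"}


def is_code(filepath, code_suffixes=CODE_SUFFIXES):
    """Return True if the file's extension is in the allow-list."""
    _, ext = os.path.splitext(filepath)
    return ext.lower() in code_suffixes


paths = [
    "lib/spack/spack/solver/asp.py",
    ".github/workflows/linux_unit_tests.yaml",
    "README.md",
    "docs/figure.png",
]
code_paths = [p for p in paths if is_code(p)]
print(code_paths)
```

An allow-list is conservative: a file with an unknown suffix is dropped rather than counted as code, which is why a manual check of the suffixes found in real projects matters.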

## Analysis and visualizations

First, we can exclude some portions of the data. For example, we may wish to filter out documentation, which may be automatically generated.

In [4]:
vis.remove_noncode()
print("Removed %d noncode files!" % (all_commits.shape[0] - vis.commit_data.shape[0]))
vis.commit_data.head()

Removed 3919 noncode files!


datetime,sha,branch,author,message,filepath,diff,year,month,day,doy,dow,locc,locc-,locc+,change-size
2021-02-23 15:46:37,93ed1a410c4a202eab3a68769fd8c0d4ff8b1c8e,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,Updates to support clingo-cffi (#20657)\n \...,.github/workflows/linux_unit_tests.yaml,+ clingo-cffi:\n+ # Test for the clingo ba...,2021,2,23,54,Tuesday,51,0,51,1.0
2021-02-23 15:46:37,93ed1a410c4a202eab3a68769fd8c0d4ff8b1c8e,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,Updates to support clingo-cffi (#20657)\n \...,lib/spack/spack/solver/asp.py,+ # There may be a better way to detect thi...,2021,2,23,54,Tuesday,2,0,2,1.0
2021-02-03 09:39:57,61641ecff2d9aad26340a7f3ab80c29f0d8ba690,* develop\n remotes/origin/HEAD -> origin/dev...,Josh Essman,sundials: expose monitoring build option (#21429),var/spack/repos/builtin/packages/sundials/pack...,"+ # Monitoring\n+ variant('monitoring', ...",2021,2,3,34,Wednesday,7,0,7,1.0
2021-02-24 08:35:27,d65002a6761cd2d745b76eae0592a50056255b7f,* develop\n remotes/origin/HEAD -> origin/dev...,kuramoto-fj,n2p2: Add new package (#21709)\n \n * n2...,var/spack/repos/builtin/packages/n2p2/package.py,+# Copyright 2013-2021 Lawrence Livermore Nati...,2021,2,24,55,Wednesday,111,0,111,1.0
2021-02-22 01:04:29,8e8c599299bdd49e9260b65a6fc897b81de886c1,* develop\n remotes/origin/HEAD -> origin/dev...,kuramoto-fj,pfapack: forbid building in parallel (#21826)\...,var/spack/repos/builtin/packages/pfapack/packa...,+ parallel = False\n+,2021,2,22,53,Monday,1,0,1,1.0


In [5]:
files_vs_dev = vis.make_file_developer_df(value_column='locc')
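
The internals of `make_file_developer_df` are not shown here. As a rough illustration of what a file-by-developer table of `locc` values might contain, the sketch below builds one with a plain pandas pivot over made-up data. The dataframe `commits` and its values are invented, and the real method may aggregate differently.

```python
import pandas as pd

# Made-up change records using the same column names as the commit data.
commits = pd.DataFrame(
    {
        "author": ["alice", "alice", "bob", "carol"],
        "filepath": ["a.py", "a.py", "a.py", "b.py"],
        "locc": [10, 5, 2, 7],
    }
)

# One row per file, one column per developer, cells hold total lines changed.
files_by_dev = commits.pivot_table(
    index="filepath",
    columns="author",
    values="locc",
    aggfunc="sum",
    fill_value=0,  # a developer who never touched a file gets 0
)
print(files_by_dev)
```

In a table of this shape, a row where one column holds nearly all of the total is a candidate for a domain champion.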