# BLOCKER and CRITICAL Maven issues analysis

## Loading data

Query used to fetch every BLOCKER and CRITICAL issue together with the `ncloc` and `complexity` measures of the file it belongs to, plus the connection string for the Postgres database:

In [1]:
query_issues = '''
    select
        i.kee as uuid,
        i.severity,
        i.message as message,
        i.line as line,
        p.name as file_name,
        m.name as metric,
        l.value as value
    from
        issues i 
        inner join projects p on i.component_uuid = p.uuid
        inner join live_measures l on i.component_uuid = l.component_uuid
        inner join metrics m on l.metric_id = m.id
    where
        -- technical debt issues with BLOCKER or CRITICAL severity
        i.severity in ('BLOCKER', 'CRITICAL')
        and l.metric_id in (3, 18) -- metrics to extract for the file in question'''

connection_url = 'postgresql://sonar:sonar@localhost/sonar'

Importing analysis libraries:

In [2]:
import pandas as pd

Loading the results into a DataFrame:

In [3]:
df_issues = pd.read_sql(query_issues, connection_url)
df_issues.head()

|   | uuid | severity | message | line | file_name | metric | value |
|---|------|----------|---------|------|-----------|--------|-------|
| 0 | AWrwqUzAm0KLequXiiF4 | CRITICAL | Refactor this method to reduce its Cognitive C... | 51 | SystemPropertyProfileActivator.java | ncloc | 78.0 |
| 1 | AWrwqUzAm0KLequXiiF4 | CRITICAL | Refactor this method to reduce its Cognitive C... | 51 | SystemPropertyProfileActivator.java | complexity | 11.0 |
| 2 | AWrwqUzJm0KLequXiiF5 | CRITICAL | Move constants to a class or enum. | 28 | ProfileActivator.java | ncloc | 10.0 |
| 3 | AWrwqUzJm0KLequXiiF5 | CRITICAL | Move constants to a class or enum. | 28 | ProfileActivator.java | complexity | 0.0 |
| 4 | AWrwqUzYm0KLequXiiF- | CRITICAL | Move constants to a class or enum. | 31 | MavenProfilesBuilder.java | ncloc | 11.0 |
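
The query returns the data in long format: each issue appears once per metric. As a rough sketch (not part of the original notebook), the rows can be pivoted so that each issue occupies a single row with one column per metric. The `sample` frame below is a hand-built stand-in for `df_issues`, using values copied from the output above with the `message` and `line` columns left out for brevity.

```python
import pandas as pd

# Stand-in for df_issues: values copied from the head() output above.
sample = pd.DataFrame(
    {
        "uuid": [
            "AWrwqUzAm0KLequXiiF4", "AWrwqUzAm0KLequXiiF4",
            "AWrwqUzJm0KLequXiiF5", "AWrwqUzJm0KLequXiiF5",
        ],
        "severity": ["CRITICAL"] * 4,
        "file_name": ["SystemPropertyProfileActivator.java"] * 2
        + ["ProfileActivator.java"] * 2,
        "metric": ["ncloc", "complexity", "ncloc", "complexity"],
        "value": [78.0, 11.0, 10.0, 0.0],
    }
)

# Pivot from long to wide: one row per issue, one column per metric.
wide = sample.pivot_table(
    index=["uuid", "severity", "file_name"],
    columns="metric",
    values="value",
    aggfunc="first",
).reset_index()
wide.columns.name = None  # drop the leftover "metric" axis label

print(wide)
```

In the wide layout, `wide["ncloc"]` and `wide["complexity"]` can be compared row by row without filtering on `metric` first.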


## Insights

Size of the DataFrame

In [4]:
df_issues.shape

(1066, 7)

Issue count per severity

In [5]:
df_issues.drop_duplicates('uuid').groupby('severity').count().uuid

severity
BLOCKER      15
CRITICAL    518
Name: uuid, dtype: int64

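Descriptive statistics per metric type (complexity and ncloc). Because the query joins each issue to the measures of its file, the values are counted once per issue, so a file with several issues contributes several times.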

In [6]:
df_issues.loc[df_issues['metric'] == 'complexity', 'value'].describe()

count    533.000000
mean      68.324578
std       99.771608
min        0.000000
25%       10.000000
50%       37.000000
75%       69.000000
max      664.000000
Name: value, dtype: float64

In [7]:
df_issues.loc[df_issues['metric'] == 'ncloc', 'value'].describe()

count     533.000000
mean      369.288931
std       466.194592
min         6.000000
25%        62.000000
50%       211.000000
75%       443.000000
max      2693.000000
Name: value, dtype: float64
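
The two `describe()` calls above can also be collapsed into a single grouped call. The snippet is a minimal sketch on a hand-built `sample` frame holding the five (metric, value) pairs visible in the `df_issues.head()` output; on the real data the same expression would be applied to `df_issues`.

```python
import pandas as pd

# The five (metric, value) pairs shown in the df_issues.head() output.
sample = pd.DataFrame(
    {
        "metric": ["ncloc", "complexity", "ncloc", "complexity", "ncloc"],
        "value": [78.0, 11.0, 10.0, 0.0, 11.0],
    }
)

# One row of descriptive statistics per metric.
stats = sample.groupby("metric")["value"].describe()
print(stats)
```

The result has one row per metric and one column per statistic, which makes the two distributions easier to read side by side.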

The same analysis, but without the _test_ files
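
The notebook does not show how the test files were excluded. One possible approach, sketched here under the assumption that test classes can be recognised by common Maven naming conventions (`Test*.java`, `*Test.java`, `*Tests.java`, `*TestCase.java`), is to filter on `file_name` before computing the statistics. The `ProfileActivatorTest.java` entry and its values are hypothetical and exist only to illustrate the filter.

```python
import pandas as pd

# Stand-in for df_issues. The non-test rows come from the output above;
# ProfileActivatorTest.java and its values are hypothetical.
sample = pd.DataFrame(
    {
        "file_name": [
            "ProfileActivator.java", "ProfileActivator.java",
            "MavenProfilesBuilder.java",
            "ProfileActivatorTest.java", "ProfileActivatorTest.java",
        ],
        "metric": ["ncloc", "complexity", "ncloc", "ncloc", "complexity"],
        "value": [10.0, 0.0, 11.0, 40.0, 3.0],
    }
)

# Assumption: test classes follow common Maven naming conventions.
TEST_NAME_PATTERN = r"^(?:Test.*|.*Tests?|.*TestCase)\.java$"

is_test = sample["file_name"].str.match(TEST_NAME_PATTERN)
non_test = sample[~is_test]

# Same descriptive statistics as before, restricted to non-test files.
stats = non_test.groupby("metric")["value"].describe()
print(stats)
```

A name-based filter is only a heuristic. If the database records whether a file is a test file, filtering on that information in the query would be more reliable.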

# Same analysis, but with all the files in the project for comparison

## Loading data

Query used to get the same two metrics (`ncloc` and `complexity`) for every file in the project, whether or not it has issues. The connection string defined earlier is reused:

In [8]:
query_all = """
    select
        p.name as file_name,
        m.name as metric,
        l.value as value
    from 
        projects p
        inner join live_measures l on p.uuid = l.component_uuid
        inner join metrics m on l.metric_id = m.id
    where
        l.metric_id in (3, 18) -- metrics to extract for the file in question
        and p."scope" = 'FIL' and p.qualifier = 'FIL'"""

Loading the results into a DataFrame:

In [9]:
df_all = pd.read_sql(query_all, connection_url)
df_all.head()

|   | file_name | metric | value |
|---|-----------|--------|-------|
| 0 | DefaultMavenProfilesBuilder.java | complexity | 2.0 |
| 1 | DefaultMavenProfilesBuilder.java | ncloc | 54.0 |
| 2 | ProfileManager.java | complexity | 0.0 |
| 3 | ProfileManager.java | ncloc | 23.0 |
| 4 | DefaultProfileManager.java | complexity | 27.0 |


## Insights

Size of the DataFrame

In [10]:
df_all.shape

(1415, 3)

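Descriptive statistics per metric type (complexity and ncloc), this time with one value per file. The counts differ (700 vs. 715) because 15 files have an `ncloc` measure but no `complexity` measure.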

In [12]:
df_all.loc[df_all['metric'] == 'complexity', 'value'].describe()

count    700.000000
mean      14.334286
std       35.735593
min        0.000000
25%        0.000000
50%        5.000000
75%       15.000000
max      664.000000
Name: value, dtype: float64

In [13]:
df_all.loc[df_all['metric'] == 'ncloc', 'value'].describe()

count     715.000000
mean       88.925874
std       172.153021
min         1.000000
25%        14.500000
50%        40.000000
75%        95.000000
max      2693.000000
Name: value, dtype: float64
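
To make the comparison explicit, the means and medians reported by the `describe()` outputs in both parts can be placed side by side and divided. This is a small arithmetic sketch using only figures printed in this notebook. The column names are illustrative, and the issue-side figures are weighted per issue rather than per file, so the ratios are indicative only.

```python
import pandas as pd

# Means and medians copied from the describe() outputs in this notebook.
summary = pd.DataFrame(
    {
        "issues_mean": [68.324578, 369.288931],
        "all_files_mean": [14.334286, 88.925874],
        "issues_median": [37.0, 211.0],
        "all_files_median": [5.0, 40.0],
    },
    index=["complexity", "ncloc"],
)

# How much larger the values are for files with BLOCKER/CRITICAL issues.
summary["mean_ratio"] = summary["issues_mean"] / summary["all_files_mean"]
summary["median_ratio"] = summary["issues_median"] / summary["all_files_median"]

print(summary[["mean_ratio", "median_ratio"]].round(2))
```

On these figures, the median complexity attached to a BLOCKER or CRITICAL issue is 7.4 times the median over all files (37 vs. 5), and the median `ncloc` is a little over 5 times larger (211 vs. 40).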