# Requirements Documentation and Notes

# SQL Samples


2. Monthly commits per author and repository (example: November 2019, America/Chicago time)
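```sql
-- Commits per author, per repository, per month, with timestamps converted to Chicago time.
-- commits stores one row per file changed, so count distinct commit hashes, not rows.
SELECT
    date_trunc('month', commits.cmt_author_timestamp AT TIME ZONE 'America/Chicago') AS commit_month,
    repo.repo_name,
    repo_groups.rg_name,
    commits.cmt_author_name,
    commits.cmt_author_email,
    COUNT(DISTINCT commits.cmt_commit_hash) AS commit_count
FROM
    commits,
    repo,
    repo_groups
WHERE
    commits.repo_id = repo.repo_id
    AND repo.repo_group_id = repo_groups.repo_group_id
    -- Half-open range: BETWEEN ... AND '2019-11-30' would stop at midnight on 30 November.
    AND commits.cmt_author_timestamp AT TIME ZONE 'America/Chicago' >= '2019-11-01'
    AND commits.cmt_author_timestamp AT TIME ZONE 'America/Chicago' < '2019-12-01'
GROUP BY
    commit_month,
    repo.repo_name,
    repo_groups.rg_name,
    commits.cmt_author_name,
    commits.cmt_author_email
ORDER BY
    commit_month,
    commits.cmt_author_name,
    commits.cmt_author_email;
```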
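Two details of the monthly query are easy to miss. A closed range such as `BETWEEN '2019-11-01' AND '2019-11-30'` ends at midnight at the start of 30 November, so commits later that day are dropped. And, assuming Augur's `commits` table stores one row per file changed (as its facade schema suggests), counting rows, as `COUNT(cmt_author_email)` does, counts files rather than commits. The sketch below illustrates both points on hypothetical toy rows with Python's built-in `sqlite3` standing in for PostgreSQL. SQLite has no `date_trunc` or `AT TIME ZONE`, so `strftime('%Y-%m', ...)` stands in for the month bucket, the timestamps are assumed to be local time already, and the column names `hash`, `email` and `ts` are made up for the example.

```python
import sqlite3

# Hypothetical toy rows: (commit hash, author email, author timestamp as local-time text).
ROWS = [
    ("a1", "ann@example.com", "2019-11-01 09:00:00"),
    ("a1", "ann@example.com", "2019-11-01 09:00:00"),  # same commit, second file
    ("b2", "bob@example.com", "2019-11-30 15:30:00"),  # afternoon of the last day
    ("c3", "bob@example.com", "2019-12-01 00:00:00"),  # first instant of next month
]


def monthly_commits(conn, where_clause):
    # where_clause is a fixed demo string here; never interpolate user input into SQL.
    sql = f"""
        SELECT strftime('%Y-%m', ts) AS commit_month,
               email,
               COUNT(DISTINCT hash) AS commit_count
        FROM commits
        WHERE {where_clause}
        GROUP BY commit_month, email
        ORDER BY commit_month, email
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (hash TEXT, email TEXT, ts TEXT)")
conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", ROWS)

# ISO-8601 text compares like the timestamps it encodes, so
# '2019-11-30 15:30:00' sorts after '2019-11-30' and falls outside the closed range.
closed = monthly_commits(conn, "ts BETWEEN '2019-11-01' AND '2019-11-30'")
half_open = monthly_commits(conn, "ts >= '2019-11-01' AND ts < '2019-12-01'")

print(closed)     # [('2019-11', 'ann@example.com', 1)]
print(half_open)  # [('2019-11', 'ann@example.com', 1), ('2019-11', 'bob@example.com', 1)]
```

Ann's two file rows collapse to one commit because of `COUNT(DISTINCT hash)`, and only the half-open range keeps Bob's commit from 30 November.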

### Metrics: Lines of Code and Commit Summaries by Week, Month and Year
There are six summary tables, covering individual repositories and repository groups at annual, monthly and weekly granularity:
1. dm_repo_annual
2. dm_repo_monthly
3. dm_repo_weekly
4. dm_repo_group_annual
5. dm_repo_group_monthly
6. dm_repo_group_weekly

```sql
-- Annual lines-of-code and commit totals per repository, from dm_repo_annual.
SELECT
    repo.repo_id,
    repo.repo_name,
    repo_groups.rg_name,
    dm_repo_annual.year,
    SUM(dm_repo_annual.added) AS lines_added,
    SUM(dm_repo_annual.whitespace) AS whitespace_added,
    SUM(dm_repo_annual.removed) AS lines_removed,
    SUM(dm_repo_annual.files) AS files,
    SUM(dm_repo_annual.patches) AS commits
FROM
    dm_repo_annual,
    repo,
    repo_groups
WHERE
    dm_repo_annual.repo_id = repo.repo_id
    AND repo.repo_group_id = repo_groups.repo_group_id
GROUP BY
    repo.repo_id,
    repo.repo_name,
    repo_groups.rg_name,
    dm_repo_annual.year
ORDER BY
    dm_repo_annual.year,
    repo_groups.rg_name,
    repo.repo_name;
```
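The `SUM`s are needed because the summary tables appear to hold several rows per repository and period (presumably one per contributor email). The sketch below assumes that layout and runs the same join and aggregation in an in-memory SQLite database, using explicit `JOIN` syntax. The rows, the `email` column and the trimmed column set are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with only the columns this example reads.
# dm_repo_annual is assumed to hold one row per contributor per repository per year.
conn.executescript("""
    CREATE TABLE repo_groups (repo_group_id INTEGER, rg_name TEXT);
    CREATE TABLE repo (repo_id INTEGER, repo_name TEXT, repo_group_id INTEGER);
    CREATE TABLE dm_repo_annual (repo_id INTEGER, email TEXT, year INTEGER,
                                 added INTEGER, removed INTEGER, patches INTEGER);
    INSERT INTO repo_groups VALUES (1, 'demo-group');
    INSERT INTO repo VALUES (10, 'alpha', 1), (11, 'beta', 1);
    INSERT INTO dm_repo_annual VALUES
        (10, 'ann@example.com', 2019, 100, 20, 2),
        (10, 'bob@example.com', 2019,  50, 10, 1),
        (11, 'ann@example.com', 2019,  30,  0, 1);
""")

# Collapse the per-contributor rows into one row per repository and year.
rows = conn.execute("""
    SELECT repo.repo_id, repo.repo_name, repo_groups.rg_name, dm_repo_annual.year,
           SUM(dm_repo_annual.added)   AS lines_added,
           SUM(dm_repo_annual.removed) AS lines_removed,
           SUM(dm_repo_annual.patches) AS commits
    FROM dm_repo_annual
    JOIN repo        ON dm_repo_annual.repo_id = repo.repo_id
    JOIN repo_groups ON repo.repo_group_id = repo_groups.repo_group_id
    GROUP BY repo.repo_id, repo.repo_name, repo_groups.rg_name, dm_repo_annual.year
    ORDER BY dm_repo_annual.year, repo_groups.rg_name, repo.repo_name
""").fetchall()

print(rows)
# [(10, 'alpha', 'demo-group', 2019, 150, 30, 3), (11, 'beta', 'demo-group', 2019, 30, 0, 1)]
```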


### Metrics: Value / Labor / Lines of Code (Total, NOT Commits)
1. Total lines in a repository by language and line type (code, comment, blank). This treats software as an asset: it measures the lines of code present at a point in time, not the lines changed by commits.
```sql
-- Snapshot of lines by type and average complexity, per repository and language.
SELECT
    repo.repo_id,
    repo.repo_name,
    repo_labor.programming_language,
    SUM(repo_labor.total_lines) AS repo_total_lines,
    SUM(repo_labor.code_lines) AS repo_code_lines,
    SUM(repo_labor.comment_lines) AS repo_comment_lines,
    SUM(repo_labor.blank_lines) AS repo_blank_lines,
    AVG(repo_labor.code_complexity) AS repo_lang_avg_code_complexity
FROM
    repo_labor,
    repo,
    repo_groups
WHERE
    repo.repo_group_id = repo_groups.repo_group_id
    AND repo.repo_id = repo_labor.repo_id
GROUP BY
    repo.repo_id,
    repo.repo_name,
    repo_labor.programming_language
ORDER BY
    repo.repo_id,
    repo_labor.programming_language;
```
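To see how the rows roll up, the sketch below runs the same aggregation on hypothetical data in an in-memory SQLite database. It assumes `repo_labor` holds one row per source file tagged with its language; only the columns the query reads are created, and `repo_groups` is left out because nothing is selected from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-file rows; total_lines = code_lines + comment_lines + blank_lines.
conn.executescript("""
    CREATE TABLE repo (repo_id INTEGER, repo_name TEXT);
    CREATE TABLE repo_labor (repo_id INTEGER, programming_language TEXT,
                             total_lines INTEGER, code_lines INTEGER,
                             comment_lines INTEGER, blank_lines INTEGER,
                             code_complexity REAL);
    INSERT INTO repo VALUES (10, 'alpha');
    INSERT INTO repo_labor VALUES
        (10, 'Python', 120, 100, 10, 10, 4.0),
        (10, 'Python',  60,  40, 10, 10, 2.0),
        (10, 'SQL',     30,  25,  3,  2, 1.0);
""")

# Sum line counts per (repository, language); complexity is averaged over files.
rows = conn.execute("""
    SELECT repo.repo_id, repo.repo_name, repo_labor.programming_language,
           SUM(total_lines), SUM(code_lines), SUM(comment_lines), SUM(blank_lines),
           AVG(code_complexity)
    FROM repo_labor JOIN repo ON repo.repo_id = repo_labor.repo_id
    GROUP BY repo.repo_id, repo.repo_name, repo_labor.programming_language
    ORDER BY repo.repo_id, repo_labor.programming_language
""").fetchall()

print(rows)
# [(10, 'alpha', 'Python', 180, 140, 20, 20, 3.0), (10, 'alpha', 'SQL', 30, 25, 3, 2, 1.0)]
```

Note that `AVG(code_complexity)` is an unweighted per-file average: the 40-line Python file counts as much as the 100-line one.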

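#### Estimated Labor Hours by Repository

For each repository and language the query estimates `AVG(code_complexity) * SUM(code_lines) + 20` hours, then sums those estimates per repository.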
```sql
-- Estimated labor hours per repository: one estimate per (repository, language),
-- then the per-language estimates are added together.
SELECT
    c.repo_id,
    c.repo_name,
    SUM(c.estimated_labor_hours) AS estimated_labor_hours
FROM
    (
    SELECT
        a.repo_id,
        b.repo_name,
        a.programming_language,
        SUM(a.total_lines) AS repo_total_lines,
        SUM(a.code_lines) AS repo_code_lines,
        SUM(a.comment_lines) AS repo_comment_lines,
        SUM(a.blank_lines) AS repo_blank_lines,
        AVG(a.code_complexity) AS repo_lang_avg_code_complexity,
        AVG(a.code_complexity) * SUM(a.code_lines) + 20 AS estimated_labor_hours
    FROM
        repo_labor a,
        repo b
    WHERE
        a.repo_id = b.repo_id
    GROUP BY
        a.repo_id,
        b.repo_name,
        a.programming_language
    ) c
GROUP BY
    c.repo_id,
    c.repo_name
ORDER BY
    c.repo_name;
```
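The arithmetic can be traced in plain Python. The sketch below mirrors the two levels of the query on hypothetical per-file complexities and code-line counts; `language_hours` and `repo_hours` are illustrative names, not part of Augur.

```python
from collections import defaultdict

# Hypothetical input: (repo, language) -> (per-file complexities, per-file code lines).
FILES = {
    ("alpha", "Python"): ([4.0, 2.0], [100, 40]),
    ("alpha", "SQL"): ([1.0], [25]),
}


def language_hours(complexities, code_lines):
    """Inner query: AVG(code_complexity) * SUM(code_lines) + 20."""
    avg_complexity = sum(complexities) / len(complexities)
    return avg_complexity * sum(code_lines) + 20


def repo_hours(files):
    """Outer query: add up the per-language estimates for each repository."""
    totals = defaultdict(float)
    for (repo, _language), (complexities, code_lines) in files.items():
        totals[repo] += language_hours(complexities, code_lines)
    return dict(totals)


print(repo_hours(FILES))  # {'alpha': 485.0}
```

Python contributes 3.0 × 140 + 20 = 440 hours and SQL 1.0 × 25 + 20 = 45 hours. Because the constant 20 is added once per language, a repository written in three languages receives 60 fixed hours before any code is counted.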

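#### Estimated Labor Hours by Repository and Language

The same estimate as in the previous query, reported per repository and language instead of summed per repository.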
```sql
-- Estimated labor hours per repository and language. The GROUP BY already yields one
-- row per (repository, language), so no outer SUM over a subquery is needed.
SELECT
    a.repo_id,
    b.repo_name,
    a.programming_language,
    AVG(a.code_complexity) * SUM(a.code_lines) + 20 AS estimated_labor_hours
FROM
    repo_labor a,
    repo b
WHERE
    a.repo_id = b.repo_id
GROUP BY
    a.repo_id,
    b.repo_name,
    a.programming_language
ORDER BY
    a.programming_language;
```
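The two labor-hour queries are related: summing the per-(repository, language) estimates for each repository gives the per-repository totals. The sketch below checks this on hypothetical toy rows with Python's `sqlite3`, keeping only the columns the estimate uses.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical per-file rows for two repositories.
conn.executescript("""
    CREATE TABLE repo (repo_id INTEGER, repo_name TEXT);
    CREATE TABLE repo_labor (repo_id INTEGER, programming_language TEXT,
                             code_lines INTEGER, code_complexity REAL);
    INSERT INTO repo VALUES (10, 'alpha'), (11, 'beta');
    INSERT INTO repo_labor VALUES
        (10, 'Python', 100, 4.0), (10, 'Python', 40, 2.0),
        (10, 'SQL', 25, 1.0), (11, 'Python', 10, 1.0);
""")

# One estimate per (repository, language).
PER_LANGUAGE = """
    SELECT a.repo_id AS repo_id, b.repo_name AS repo_name,
           a.programming_language AS programming_language,
           AVG(a.code_complexity) * SUM(a.code_lines) + 20 AS estimated_labor_hours
    FROM repo_labor a JOIN repo b ON a.repo_id = b.repo_id
    GROUP BY a.repo_id, b.repo_name, a.programming_language
"""
by_language = conn.execute(
    PER_LANGUAGE + " ORDER BY programming_language, repo_id"
).fetchall()

# Per-repository totals: the same estimates summed per repository.
by_repo = conn.execute(f"""
    SELECT repo_id, repo_name, SUM(estimated_labor_hours)
    FROM ({PER_LANGUAGE}) c
    GROUP BY repo_id, repo_name
    ORDER BY repo_id
""").fetchall()

# Cross-check: adding up the by-language rows in Python reproduces by_repo.
totals = defaultdict(float)
for repo_id, repo_name, _language, hours in by_language:
    totals[(repo_id, repo_name)] += hours
assert sorted((k[0], k[1], v) for k, v in totals.items()) == by_repo

print(by_language)  # [(10, 'alpha', 'Python', 440.0), (11, 'beta', 'Python', 30.0), (10, 'alpha', 'SQL', 45.0)]
print(by_repo)      # [(10, 'alpha', 485.0), (11, 'beta', 30.0)]
```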



## Issues
### Issue Collection Status
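1. At the time of writing, issue collection was 100% complete. The query compares the issue count in each repository's most recent `repo_info` snapshot with the number of issues (excluding pull requests) stored in `issues`.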
```sql
-- Compare the issue count from each repository's latest repo_info snapshot with the
-- issues actually collected (rows in issues that are not pull requests).
SELECT
    a.repo_id,
    a.repo_name,
    a.repo_git,
    b.issues_count,
    d.repo_id AS issue_repo_id,
    e.last_collected,
    COUNT(*) AS issues_collected_count,
    b.issues_count - COUNT(*) AS issues_missing,
    -- NULLIF returns NULL instead of raising a division-by-zero error when issues_count is 0.
    ABS(CAST(COUNT(*) AS DOUBLE PRECISION) / NULLIF(CAST(b.issues_count AS DOUBLE PRECISION), 0)) AS ratio_abs,
    CAST(COUNT(*) AS DOUBLE PRECISION) / NULLIF(CAST(b.issues_count AS DOUBLE PRECISION), 0) AS ratio_issues
FROM
    augur_data.repo a,
    augur_data.issues d,
    augur_data.repo_info b,
    (
        SELECT repo_id, MAX(data_collection_date) AS last_collected
        FROM augur_data.repo_info
        GROUP BY repo_id
    ) e
WHERE
    a.repo_id = b.repo_id
    AND a.repo_id = d.repo_id
    AND e.repo_id = a.repo_id
    AND b.data_collection_date = e.last_collected
    AND d.pull_request_id IS NULL
GROUP BY
    a.repo_id,
    a.repo_name,
    a.repo_git,
    d.repo_id,
    b.issues_count,
    e.last_collected
ORDER BY
    a.repo_name,
    a.repo_id,
    ratio_abs;
```
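The status columns reduce to a small calculation per repository. The sketch below restates it in Python; `issue_collection_status` is a hypothetical helper, `reported` stands in for `repo_info.issues_count` from the latest snapshot and `collected` for the number of non-pull-request rows in `issues`. Dividing by `issues_count` directly fails when it is 0, so the sketch returns `None` for the ratio in that case, which is what wrapping the SQL denominator in `NULLIF(..., 0)` would do.

```python
from typing import Optional, Tuple


def issue_collection_status(reported: int, collected: int) -> Tuple[int, Optional[float]]:
    """Return (issues_missing, ratio_issues) for one repository.

    A negative issues_missing means more issues were collected than the snapshot reports.
    A reported count of 0 gives a ratio of None rather than a division-by-zero error.
    """
    missing = reported - collected
    ratio = None if reported == 0 else collected / reported
    return missing, ratio


print(issue_collection_status(200, 150))  # (50, 0.75)
print(issue_collection_status(0, 3))      # (-3, None)
```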

### Repositories with GitHub Issue Tracking
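```sql
-- Repositories with at least one repo_info snapshot reporting issues.
-- COUNT(*) counts those snapshots, not issues.
SELECT
    repo_id,
    COUNT(*) AS snapshots_with_issues
FROM
    repo_info
WHERE
    issues_count > 0
GROUP BY
    repo_id;
```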
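Because this query looks at every `repo_info` snapshot, a repository that reported issues in an older snapshot but none in its latest one still appears. If only the current state matters, the latest-snapshot join from the collection-status query can be reused. The sketch below contrasts the two on hypothetical snapshot rows in SQLite, with ISO date strings so that `MAX` picks the most recent date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical snapshots: repo 1 has issues in both, repo 2 only in the older one, repo 3 never.
conn.executescript("""
    CREATE TABLE repo_info (repo_id INTEGER, issues_count INTEGER, data_collection_date TEXT);
    INSERT INTO repo_info VALUES
        (1, 5, '2019-11-01'), (1, 7, '2019-12-01'),
        (2, 2, '2019-11-01'), (2, 0, '2019-12-01'),
        (3, 0, '2019-12-01');
""")

# Any snapshot with issues: counts matching snapshots per repository.
any_snapshot = conn.execute(
    "SELECT repo_id, COUNT(*) FROM repo_info WHERE issues_count > 0 "
    "GROUP BY repo_id ORDER BY repo_id"
).fetchall()

# Latest snapshot only: join each repository to its MAX(data_collection_date) row.
latest_snapshot = conn.execute("""
    SELECT r.repo_id, r.issues_count
    FROM repo_info r
    JOIN (SELECT repo_id, MAX(data_collection_date) AS last_collected
          FROM repo_info GROUP BY repo_id) e
      ON r.repo_id = e.repo_id AND r.data_collection_date = e.last_collected
    WHERE r.issues_count > 0
    ORDER BY r.repo_id
""").fetchall()

print(any_snapshot)     # [(1, 2), (2, 1)]
print(latest_snapshot)  # [(1, 7)]
```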