# Gerrit exporter - Chapter 4

This notebook shows how to export Gerrit data to a CSV file. The CSV file can then be imported into a database or used to train a model in a machine learning pipeline.

This version of the notebook also shows how to record the results of information quality checks in a log file.

## Configuration of the information quality checks

First, we configure the information quality log.

In [3]:
# now we add the logging to note whether something has actually gone wrong
import logging

# create a logging file
# including the format of the log messages
logging.basicConfig(filename='./information_quality_gerrit.log', 
                    filemode='w',
                    format='%(asctime)s;%(name)s;%(levelname)s;%(message)s',
                    level=logging.DEBUG)

# specifying the name of the logger,
# which will tell us that the message comes from this program
# and not from any other modules or components imported
logger = logging.getLogger('Gerrit export flow')

# the first log message to indicate the start of the execution
# it is important to add this, since the same log-file can be re-used
# the re-use can be done by other components to provide one single point of logging
logger.info('Configuration started')
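Every record in the log follows the `asctime;name;levelname;message` format configured above, so a later information quality step can read the file back and count the problems that were recorded. Below is a minimal sketch that assumes this format. The helper name `summarise_log_levels` and the sample lines are made up for illustration and are not part of the original notebook.

```python
from collections import Counter

def summarise_log_levels(lines):
    """Count log records per level (INFO, ERROR, ...) in lines written with the
    assumed '%(asctime)s;%(name)s;%(levelname)s;%(message)s' format."""
    counts = Counter()
    for line in lines:
        # split at most three times so that semicolons inside the message survive
        parts = line.rstrip('\n').split(';', 3)
        if len(parts) == 4:
            counts[parts[2]] += 1
    return counts

# made-up lines with the same shape as information_quality_gerrit.log
sample = [
    "2023-01-15 10:00:00,001;Gerrit export flow;INFO;Configuration started\n",
    "2023-01-15 10:00:01,002;Gerrit export flow;ERROR;LIBRARIES: No module named 'pygerrit2'\n",
    "2023-01-15 10:00:02,003;Gerrit export flow;INFO;Importing complete\n",
]
print(summarise_log_levels(sample))
```

Iterating over a file yields its lines, so the same function also works on the real log. For example, `with open('./information_quality_gerrit.log') as f: print(summarise_log_levels(f))`.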

## Configuration

The imports needed to communicate with Gerrit.

In [4]:
# in this example, we check that all the libraries are available
# if they are, we do not log anything
# but if they are not, we log an error message
# in the information quality program, we will look for these kinds of problems
try:
    from requests.auth import HTTPDigestAuth
    from pygerrit2 import GerritRestAPI, HTTPBasicAuth
    from IPython.display import clear_output
    import requests
    import urllib
    # to simulate this error, we can use the following line (uncomment it)
    # import library.that.does.not.exist
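except ImportError as e:
    # ImportError also covers ModuleNotFoundError; str(e) names the missing module
    # (a generic Exception has no .msg attribute, so e.msg could itself fail)
    logger.error(f'LIBRARIES: {e}')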

# debug
import pprint

logger.info('Importing complete')
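The try/except cell above jumps to the except branch at the first import that fails, so each run logs only one missing library. To log every missing library, we can use a sketch like the following. It imports each module by name with `importlib` and logs each failure separately, using the same 'Gerrit export flow' logger. The helper `check_libraries` is hypothetical and not part of the original notebook.

```python
import importlib
import logging

# same logger name as in the notebook, so messages end up in the same log
logger = logging.getLogger('Gerrit export flow')

def check_libraries(module_names):
    """Try to import each module by name; log and return the ones that fail."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as e:
            # ModuleNotFoundError is a subclass of ImportError
            logger.error(f'LIBRARIES: {e}')
            missing.append(name)
    return missing

missing = check_libraries(['requests', 'pygerrit2', 'IPython.display', 'urllib'])
print(missing)
```

This only checks that the modules can be imported. Names such as `GerritRestAPI` still have to be imported explicitly, as in the cell above.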



## Exporting the data about changes

The first entry point we need when extracting data from Gerrit is the changes endpoint.

It accepts a query and returns the matching changes as a JSON list, with one dictionary per change.

We can then print one of these dictionaries to see what it contains.
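The query in the next cell is written by hand as one string with repeated `o=` options. As a sketch, the same relative path can be built with `urllib.parse.urlencode`. Because `urlencode` accepts a list of pairs, a parameter such as `o` can appear several times. The function name `build_changes_query` is our own and is not part of pygerrit2.

```python
from urllib.parse import urlencode

def build_changes_query(query='status:merged', options=(), start=0):
    """Build the relative path for the /changes/ endpoint.

    The 'o' parameter can be repeated, so parameters are kept as a list of pairs.
    """
    params = [('q', query)]
    params += [('o', option) for option in options]
    params.append(('start', start))
    # keep ':' readable in the query instead of encoding it as %3A
    return '/changes/?' + urlencode(params, safe=':')

path = build_changes_query(options=['ALL_FILES', 'ALL_REVISIONS', 'DETAILED_LABELS'], start=0)
print(path)
```

The returned path can be passed to `rest.get(...)` instead of the hand-written string.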

In [5]:
# A bit of config - repo
gerrit_url = "https://gerrit.onap.org/r"
fileName = "./gerrit_reviews.csv"

# since we use a public oss repository, we don't need to authenticate
auth = None

# this line sets up the client for Gerrit's REST API
rest = GerritRestAPI(url=gerrit_url, auth=auth)

logger.info('REST API set-up complete')

# parameters for the JSON API, which returns changes in batches of 500
start = 0                       # number of changes to skip before the first result - usually 0

logger.info('Connecting to Gerrit server and accessing changes')

try: 
    # the main query, where we ask the endpoint for the list and details of all merged changes
    # each change is essentially a review that has been submitted to the repository
    changes = rest.get("/changes/?q=status:merged&o=ALL_FILES&o=ALL_REVISIONS&o=DETAILED_LABELS&start={}".format(start), 
                       headers={'Content-Type': 'application/json'})
except Exception as e:
    logger.error('ENTITY ACCESS - Error retrieving changes: {}'.format(e))
else:
    # log success only when the request actually succeeded
    logger.info('Changes retrieved')
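The cell above retrieves only one batch, starting at offset `start`. According to the Gerrit REST API documentation, when more changes match the query than a single response returns, the last change in that response has the field `_more_changes` set to true. The following sketch shows a paging loop that relies on that field. `fetch_all_changes` and its `fetch_page` callback are hypothetical names. The callback keeps the loop independent of the network, so it can be tried on fake data.

```python
def fetch_all_changes(fetch_page, max_pages=None):
    """Collect changes page by page.

    fetch_page(start) must return the list of change dicts for one page.
    max_pages optionally caps the number of requests.
    """
    all_changes = []
    start = 0
    pages = 0
    while True:
        batch = fetch_page(start)
        all_changes.extend(batch)
        pages += 1
        # the last change of a page carries '_more_changes': True if more results exist
        more = bool(batch) and batch[-1].get('_more_changes', False)
        if not more or (max_pages is not None and pages >= max_pages):
            break
        # the next page starts right after the changes already received
        start += len(batch)
    return all_changes

# fake pages keyed by offset, standing in for the Gerrit server
fake_pages = {
    0: [{'_number': 1}, {'_number': 2, '_more_changes': True}],
    2: [{'_number': 3}],
}
print([c['_number'] for c in fetch_all_changes(lambda s: fake_pages[s])])

# with the notebook's client this could look like (not executed here):
# changes = fetch_all_changes(
#     lambda s: rest.get("/changes/?q=status:merged&o=ALL_FILES&o=ALL_REVISIONS"
#                        "&o=DETAILED_LABELS&start={}".format(s)),
#     max_pages=2)
```

With `ALL_FILES` and `ALL_REVISIONS`, each response can be large. Setting `max_pages` is a simple way to limit the number of requests sent to a public server.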


In [6]:
# pretty printer for json files that the Gerrit API returns
pp = pprint.PrettyPrinter(indent=1)

pp.pprint(changes[0])

logger.info('Execution complete')

{'_number': 132912,
 'branch': 'master',
 'change_id': 'I418bd9e44304d617c0eb875008f8af5826692cfa',
 'created': '2023-01-13 12:26:53.000000000',
 'current_revision': 'b1125ecc1a211121d65b812dc85f028ad4f5fb31',
 'deletions': 2,
 'has_review_started': True,
 'hashtags': [],
 'id': 'dcaegen2%2Fcollectors%2Fdatafile~master~I418bd9e44304d617c0eb875008f8af5826692cfa',
 'insertions': 5,
 'labels': {'Code-Review': {'all': [{'_account_id': 105,
                                     'date': '2023-01-14 03:21:23.000000000',
                                     'permitted_voting_range': {'max': 2,
                                                                'min': 2},
                                     'value': 2},
                                    {'_account_id': 459, 'value': 0}],
                            'default_value': 0,
                            'values': {' 0': 'No score',
                                       '+1': 'Looks good to me, but someone '
                             

## What we see

The output above shows the first change returned by the query. The list itself does not contain every change in the project, only the first batch of up to 500 merged changes. We need to parse these changes to extract the review comments and similar information.
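As a starting point for that parsing, here is a sketch of how the fields visible in the printed change could be flattened into rows for the CSV file named in `fileName`. The column names, the one-row-per-Code-Review-vote layout, and the helpers `change_to_rows` and `write_changes_csv` are our own assumptions, not the author's export format. The sample change reuses values from the output above.

```python
import csv
import io

# assumed CSV columns (not the author's format)
FIELDS = ['number', 'change_id', 'branch', 'created', 'insertions', 'deletions',
          'reviewer_account_id', 'code_review_value']

def change_to_rows(change):
    """One row per Code-Review vote; a change without votes still gives one row."""
    base = {
        'number': change.get('_number'),
        'change_id': change.get('change_id'),
        'branch': change.get('branch'),
        'created': change.get('created'),
        'insertions': change.get('insertions'),
        'deletions': change.get('deletions'),
    }
    votes = change.get('labels', {}).get('Code-Review', {}).get('all', [])
    if not votes:
        return [dict(base, reviewer_account_id=None, code_review_value=None)]
    return [dict(base,
                 reviewer_account_id=vote.get('_account_id'),
                 code_review_value=vote.get('value'))
            for vote in votes]

def write_changes_csv(changes, out):
    """Write all changes to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for change in changes:
        writer.writerows(change_to_rows(change))

# values taken from the printed change above
sample_change = {
    '_number': 132912,
    'branch': 'master',
    'change_id': 'I418bd9e44304d617c0eb875008f8af5826692cfa',
    'created': '2023-01-13 12:26:53.000000000',
    'deletions': 2,
    'insertions': 5,
    'labels': {'Code-Review': {'all': [{'_account_id': 105, 'value': 2},
                                       {'_account_id': 459, 'value': 0}]}},
}

buffer = io.StringIO()
write_changes_csv([sample_change], buffer)
print(buffer.getvalue())
```

For the real data, `with open(fileName, 'w', newline='') as f: write_changes_csv(changes, f)` would write the file. The csv module documentation recommends `newline=''` for files used with csv writers.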

Please take a look at the notebook gerrit_exporter_loop_comments to see how to extract the comments from the changes.