Repository for UC Santa Cruz's work on Libresoft's CVSAnalY




The CVSAnalY tool extracts information from source code repository logs and stores it in a database.

Quick installation

  1. Get pip: sudo easy_install pip
  2. Use pip: pip install ""

Slower installation


Note for upgraders: CVSAnalY now uses setuptools for installation. Depending on your PYTHONPATH, the old CVSAnalY might not be removed (or, worse, might override this release). Please check for and remove old installations before installing this version.
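To help with that check, here is a minimal Python sketch that lists every copy of the CVSAnalY package visible on a search path (sys.path by default). It assumes the package directory is named pycvsanaly2 and only detects unpacked package directories, not zipped eggs; find_installations is a hypothetical helper, not part of CVSAnalY. Run it with the same Python interpreter you use for CVSAnalY.

```python
import os
import sys


def find_installations(package="pycvsanaly2", search_path=None):
    """Return every <entry>/<package> directory found on search_path.

    package defaults to "pycvsanaly2", assumed to be CVSAnalY's package name.
    Only unpacked package directories are detected; zipped eggs are not.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty string on sys.path stands for the current directory.
        candidate = os.path.join(entry or os.getcwd(), package)
        if os.path.isdir(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for path in find_installations():
        print(path)
```

Python searches sys.path in order, so if more than one path is printed, the first one is the copy that actually gets imported and the others are candidates for removal.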

CVSAnalY has the following dependencies:

  • Python 2.5 or higher

  • RepositoryHandler (this needs to be placed in your PYTHONPATH)

    git clone

  • Guilty (optional. Required for the Blame or HunkBlame extensions. It also needs to be discoverable in your PYTHONPATH)

    git clone

  • CVS (optional. Required for CVS support. Make sure to read the "SCM Support" section.)

  • Subversion (optional. Required for SVN support. Make sure to read the "SCM Support" section.)

  • Git (optional. Required for Git support. Must be >= 1.7.4 for the HunkBlame extension to work)

  • Python MySQLdb (optional, but of course required if you wish to use MySQL as your database engine!)
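The dependency list above can be checked with a short script. The following is a sketch, not part of CVSAnalY: the import names repositoryhandler, guilty and MySQLdb are assumptions about how those packages are named on your system, and it needs Python 2.7 or later because it uses subprocess.check_output. It also parses `git --version` to test the 1.7.4 minimum for HunkBlame.

```python
import re
import subprocess

# Assumed import names (verify against your installation):
# RepositoryHandler -> repositoryhandler, Guilty -> guilty, Python MySQLdb -> MySQLdb.
PYTHON_DEPENDENCIES = [
    ("repositoryhandler", "required"),
    ("guilty", "optional: Blame/HunkBlame extensions"),
    ("MySQLdb", "optional: MySQL database engine"),
]

MIN_GIT_FOR_HUNKBLAME = (1, 7, 4)


def parse_git_version(output):
    """Turn 'git version 1.7.4.1' into (1, 7, 4, 1); None if no version found."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def git_supports_hunkblame(version):
    # Tuples compare element by element: (1, 7, 4, 1) >= (1, 7, 4) is True,
    # (1, 7, 3, 5) >= (1, 7, 4) is False.
    return version is not None and version >= MIN_GIT_FOR_HUNKBLAME


def is_importable(module):
    try:
        __import__(module)
    except ImportError:
        return False
    return True


def report():
    for module, role in PYTHON_DEPENDENCIES:
        status = "found" if is_importable(module) else "MISSING"
        print("%-18s %-8s %s" % (module, status, role))
    try:
        output = subprocess.check_output(["git", "--version"]).decode("ascii", "replace")
    except (OSError, subprocess.CalledProcessError):
        print("git not found (only needed for Git support)")
        return
    version = parse_git_version(output)
    print("git %s, new enough for HunkBlame: %s"
          % (".".join(map(str, version or ())), git_supports_hunkblame(version)))


if __name__ == "__main__":
    report()
```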


You don't need to do anything if you are happy running CVSAnalY from the directory you downloaded it to. This is the easiest option if you intend to stay up to date with our releases from our Git repositories. You can also move the directory wherever you wish.

If you want to install it to a system location, you can do this by running the script:

python install

If you do this, remember to rerun it every time you get a new release.

If you don't have root privileges, you can just add CVSAnalY to your $PATH (cvsanalydir is the directory where CVSAnalY is installed):

export PATH=$PATH:cvsanalydir

CVSAnalY needs RepositoryHandler. If it is not installed in the usual path for Python packages, PKG_CONFIG_PATH should include the directory where it is installed (repohandlerdir is the path where RepositoryHandler is installed):

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:repohandlerdir

You are now ready to use CVSAnalY!

Running CVSAnalY if you installed it

Just check out your repository (from Git/SVN/CVS) to obtain a local copy, and then run cvsanaly2 from inside it. Here's an example using Voldemort:

$ git clone git:// ~/Downloads/voldemort
$ cd ~/Downloads/voldemort
~/Downloads/voldemort$ cvsanaly2

More options, and more detailed information about them, can be found by running cvsanaly2 --help.

Running CVSAnalY from its directory

Just check out your repository (from Git/SVN/CVS) to obtain a local copy, and then run cvsanaly2 from the CVSAnalY directory, passing it the path to that copy. Here's an example using Voldemort:

$ git clone git:// ~/Downloads/voldemort
$ cd [where you downloaded CVSAnalY to]
[CVSAnalY directory]$ ./cvsanaly2 ~/Downloads/voldemort 

More options, and more detailed information about them, can be found by running ./cvsanaly2 --help.
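If you drive CVSAnalY from your own scripts, the two ways of running it described above can be sketched in Python as follows. build_cvsanaly_command and run_cvsanaly are hypothetical helpers; they only reproduce the invocations shown in the examples (no extra options), so anything else you need from cvsanaly2 --help has to be appended to argv yourself.

```python
import os
import subprocess


def build_cvsanaly_command(repo_dir, cvsanaly_dir=None):
    """Return (argv, cwd) for analysing the checkout at repo_dir.

    cvsanaly_dir=None: cvsanaly2 is assumed to be installed (or on $PATH), so
    run it with no arguments from inside the checkout.
    Otherwise: run cvsanaly_dir/cvsanaly2 and pass the checkout path to it.
    """
    repo_dir = os.path.abspath(os.path.expanduser(repo_dir))
    if cvsanaly_dir is None:
        return ["cvsanaly2"], repo_dir
    cvsanaly_dir = os.path.abspath(os.path.expanduser(cvsanaly_dir))
    return [os.path.join(cvsanaly_dir, "cvsanaly2"), repo_dir], cvsanaly_dir


def run_cvsanaly(repo_dir, cvsanaly_dir=None):
    argv, cwd = build_cvsanaly_command(repo_dir, cvsanaly_dir)
    # check_call raises CalledProcessError if cvsanaly2 exits with a non-zero status.
    subprocess.check_call(argv, cwd=cwd)
```

For the Voldemort example, run_cvsanaly("~/Downloads/voldemort") matches the installed case, and run_cvsanaly("~/Downloads/voldemort", "~/src/cvsanaly") matches running from a CVSAnalY directory at the (hypothetical) path ~/src/cvsanaly.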

SCM Support

At this point in time, only Git is fully tested and supported across all of CVSAnalY and its extensions. SVN is supported on a "best effort" basis: things shouldn't break using SVN, but if they do, you're unlikely to get a response to a bug tracker issue unless it comes with a patch in a pull request.

CVSAnalY was originally created to support CVS and SVN. Git support appeared later, and Bazaar support was started but abandoned. As development has continued, it has become clear that Git offers the best possibilities for data mining source code repositories. Because Git downloads the entire source history to local storage, CVSAnalY runs orders of magnitude faster on it. For example, the Content extension retrieves every revision of a file. With CVS and SVN, each retrieval means sending a request to the central server, waiting for the server to (slowly) process it, and then receiving the content back. We've found that operations which take hours on Git can take weeks with SVN.

If you have an SVN repository that you want to mine, but you can't find a Git mirror for it, we've had good success with svn2git.

If you're having problems

Packet bigger than max_allowed_packet

Sometimes a lot of data passes between CVSAnalY and MySQL, and MySQL's packet size limit (max_allowed_packet) is set too small for it.

Follow the instructions here.
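One common remedy, which may or may not be what the linked instructions describe, is to raise the server's max_allowed_packet. The Python sketch below only builds the MySQL statement; max_allowed_packet_sql and the 64 MB figure are illustrative choices, not CVSAnalY settings, and the sketch deliberately accepts only whole megabytes from 1 to 1024 (MySQL's maximum is 1 GB).

```python
MAX_ALLOWED_PACKET_CEILING_MB = 1024  # MySQL's documented maximum is 1 GB


def max_allowed_packet_sql(megabytes):
    """Build a statement that raises the server-wide max_allowed_packet."""
    if not 1 <= megabytes <= MAX_ALLOWED_PACKET_CEILING_MB:
        raise ValueError("expected 1..%d MB, got %r"
                         % (MAX_ALLOWED_PACKET_CEILING_MB, megabytes))
    return "SET GLOBAL max_allowed_packet = %d" % (megabytes * 1024 * 1024)


if __name__ == "__main__":
    print(max_allowed_packet_sql(64))

# Hypothetical usage through MySQLdb. SET GLOBAL needs an administrative
# account, only affects connections opened afterwards, and is lost when the
# server restarts unless the value is also set in the server's option file.
# import MySQLdb
# conn = MySQLdb.connect(user="root", passwd="...")
# conn.cursor().execute(max_allowed_packet_sql(64))
```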

UnicodeEncodeError: 'ascii' codec can't encode character

This happens because Python is trying to print out a Unicode string to a terminal that has told Python it only supports ASCII. You can coerce Python into printing Unicode by setting up your
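To see what your terminal is telling Python, a small diagnostic like the following can help. It is a sketch, not part of CVSAnalY: can_encode is a hypothetical helper, and the sample string is only an example of a non-ASCII name that could appear in a repository log. If the stdout encoding it prints is ASCII, the environment is what needs fixing, as described above.

```python
import locale
import sys


def can_encode(text, encoding):
    """True if text can be written to a stream that uses `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The encoding Python will use for the terminal. "ascii" (or
    # "ANSI_X3.4-1968", glibc's name for ASCII) leads to the error above.
    print("stdout encoding: %s" % sys.stdout.encoding)
    print("locale preferred encoding: %s" % locale.getpreferredencoding())
    sample = u"M\u00f3stoles"  # a non-ASCII name such as might appear in a log
    print("ascii can encode sample: %s" % can_encode(sample, "ascii"))
    print("utf-8 can encode sample: %s" % can_encode(sample, "utf-8"))
```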


CVSAnalY is developed by the GSyC/LibreSoft group at the Universidad Rey Juan Carlos in Móstoles, near Madrid (Spain). It is part of a wider research effort on libre software engineering, aimed at gaining knowledge about how libre software is developed and maintained.

CVSAnalY is actively contributed to by the Software Introspection Lab at the University of California, Santa Cruz, which hosts Git mirrors at . UCSC can review pull requests and bug reports using GitHub's systems. This ecosystem is currently more active than the official LibreSoft repository ecosystem, so your issue may be more likely to be reviewed here.

More information

Main authors of CVSAnalY