Database Schema

Alvaro del Castillo edited this page Oct 16, 2013 · 3 revisions
Clone this wiki locally

Schema

This schema is a bit out of date, but this is the best we have so far. This figure was obtained from the deliverable 3.1 in the FLOSSMetrics project.

CVSAnalY Database Schema

Details per table

  • scmlog This table contains general information about the commits. Every commit in the repository is represented by a record in the scmlog table
Name Data Type Description Key
id INTEGER Commit unique identifier PK
repository id INTEGER Repository identifier FK
author id INTEGER Author identifier. Some source control management systems,differentiate the person who did the commit from the person who actually made the changes. When not supported by the repository, this field will be NULL FK
committer id INTEGER Committer identifier. It is the identifier in the database of the person who did the commit It’s the revision identifier in the repository. It’s always unique in every repository. FK
rev MEDIUMTEXT Revision number assigned by the SCM (CVS, SVN,GIT,...)
date DATETIME Date and time of the commit
message LONGTEXT General comment about the commit
composed rev BOOL It’s a boolean to indicate whether the rev field is composed or not. This is needed because the rev field must be unique in every repository which is not possible in CVS since it uses revision numbers per file. The combination of a file path and its revision is what make a commit unique in a system like CVS. For this particular case the rev field is represented by the concatenation of the revision number, the pipe character and the file path.
  • file types This table contains a register for each kind of file that may be found in the repository, such as documentation source code, images, etc.
Name Data Type Description Key
id INTEGER File type unique identifier PK
file_id INTEGER File identifier FK
type_id MEDIUMTEXT File type are: code:source code files (C/C++,Python, Java, etc.), build:files used to build and configure the source code (Make-file, configure, cvsignore, etc.), ui:files containing graphical user interface definitions (glade, gtkbuilder, ui files, etc.), i18n: translation files (.po, .mo, etc.), documentation:documentation files devel doc:documentation for developers (HACK-ING, ChangeLog, etc.), package:package files (.tar.gz, .deb, .rmp, etc.), image:icons and files (.png, .jpeg, etc.), multimedia: audio and video files (.ogg, .avi, .mp3, etc.), unknown: files with an unknown type, generally files that don’t have extension
  • actions This table contains the different actions performed in a commit. In sys- tems like CVS, where the commit is limited to a single file, there will be only one record in the actions table for every commit. However, most of the version control systems support atomic commits, where several actions are carried out on several files@footnote
Name Data Type Description Key
id INTEGER Action unique identifier PK
commit_id INTEGER Commit identifier where the action was performed FK
file_id INTEGER File identifier FK
branch_id INTEGER Branch identifier. FK
type VARCHAR(1) Action type (A:Added, M:Modified, D:Deleted, V:Renamed, C:copied, R:Replaced, ). Not all of the action types are always supported, for example, for CVS repositories only A, M and D actions are supported.
  • branches This table contains the distinct branches of a repository.
Name Data Type Description Key
id INTEGER Branches unique identifier PK
name VARCHAR(255) Branches name FK
  • metrics This table contains distinct metrics obtained from a file. This extension provides simple source code metrics for every revision of every single file found in the repository. Since this extension is about source code, it uses the FileTypes extension to get only source code files. Note: a full descrip- tion of this metrics can be found in the appendixes of this document
Name Data Type Description Key
id INTEGER Metric unique identifier PK
file_id INTEGER File identifier FK
commit_id INTEGER Commit identifier FK
lang TINYTEXT Programming language
sloc INTEGER Number lines of code
loc INTEGER NUmber lines of all the file
ncomment INTEGER Number of comments
lcomment INTEGER Number lines of the comments
lblank INTEGER Number of blank lines
mccabe_min INTEGER Minimun Mccabe complexity of the functions that exists in the file
nfunctions INTEGER Number of functions
mccabe_max INTEGER Maximum Mccabe complexity of the functions that exists in the file
mccabe_sum INTEGER Sum Mccabe complexity of the functions that exists in the file
mccabe_mean INTEGER Mean Mccabe complexity of the functions that exists in the file
mccabe_median INTEGER Median Mccabe complexity of the functions that exists in the file
halstead_length INTEGER Halstead length in the file
halstead_vol INTEGER Halstead volumen in the file
halstead_level DOUBLE Halstead level in the file
halstead_md INTEGER Halstead mental discrimination
  • people This table contains registers about people have worked in the repository. Contains the name and email (when available) of the people involved in the repository.
Name Data Type Description Key
people_id INTEGER People unique identifier PK
name VARCHAR(255) People name
mail VARCHAR(255) People mail
  • repositories This table contains uris to the analysed repositories
Name Data Type Description Key
id INTEGER Repository unique identifier PK
uri VARCHAR(255) URI of the repository
name VARCHAR(255) Repository name
type_2 VARCHAR(30) Repository type (cvs, svn, git)
  • commits lines This table contains information about the line’s number added and re- moved by commit.
Name Data Type Description Key
id INTEGER Commit line unique identifier PK
commit_id INTEGER Commit identifier FK
added_name INTEGER Number lines added
removed_name INTEGER Number lines removed
  • file copies This table contains general information about the file copies. It is used to store additional information about actions that involve more than one file. Copies, moves, renames and replacements are actions performed over two or more files. The file id field of the actions table refers always to the file that is the object of the action.
Name Data Type Description Key
id INTEGER File copies unique identifier PK
from_id INTEGER Source file identifier. Identifier of the file that is the source of the action. In a move or copy operation this is the file id of the file from which the move or copy is done. It’s a foreign key that references the ’id’ field of the ’files’ table FK
from_commit_id INTEGER Commit source identifier. The source file contents are taken from the revision associated to this commit identifier. It’s a foreign key that references the ’id’ field of the ’scmlog’ table FK
to_id INTEGER Target file identifier. Identifier of the file that is the destination of the action. In a move or copy operation, this field is the same than the 'file id’ in the ’actions’ table. However, when a file is replaced, the ’file id’ stored in the ’actions’ table is the existing file being replaced, and this field contains the ’file id’ of the new file that replaces the existing one. It’s a foreign key that references the ’id’ field of the ’files’ table FK
action_id INTEGER Action identifier FK
new_file_name MEDIUMTEXT Contains the new name of the file for rename actions or ’NULL’ for other actions
  • files This table contains general information about the files found in the repos- itory
Name Data Type Description Key
id INTEGER File unique identifier PK
repository_id INTEGER Repository identifier FK
file_name VARCHAR(255) FIle or directory name
  • files links This table contains general information about the topology between files. The relationship between two files is always parent - child.
Name Data Type Description Key
id INTEGER File links unique identifier PK
file_id INTEGER File identifier FK
parent_id INTEGER Parent file identifier or -1 if the file is in the root of the repository. It’s a foreign key that references the id field of the files table FK
commit_id INTEGER Commit identifier FK
  • tag revisions This table contains general information about the list of revisions pointing to every tag
Name Data Type Description Key
id INTEGER Tag revision unique identifier PK
datasource_id INTEGER Datasource identifier FK
commit_id INTEGER Commit identifier FK
tag_id INTEGER Tag identifier FK
  • tags This table contains general information about the name of the tags.
Name Data Type Description Key
people_id INTEGER Tag unique identifier PK
name VARCHAR(255) Tag name