Database Schema

Dawn Foster edited this page May 29, 2015 · 6 revisions
Clone this wiki locally

Details per table

compressed_files: This table contains a register for each archive file that has been downloaded, or tried to download.

Name Data Type Description Key
url VARCHAR (255) URL of the file PK
mailing_list_url VARCHAR (255) URL of the web archives of the mailing list where this file belongs to FK
status ENUM Either visited, new or failed
last_analysis DATETIME Date and time of the last analysis of this time

mailing_lists: This table contains a register for each different mailing list analysed.

Name Data Type Description Key
mailing_list_url VARCHAR(255) URL of the archives web page PK
mailing_list_name VARCHAR(255) Name of the mailing list, as it appears in the headers of the messages
project_name VARCHAR(255) Name of the software project were this list belongs to. Taken from the email address of the mailing list
last_analysis DATETIME Date and time of the last analysis performed on this mailing list

mailing_lists_people: This table joins the table mailing lists and people, making possible to search for people grouping by different mailing lists.

Name Data Type Description Key
email_address VARCHAR(255) People unique identifier by email address PK
mailing_list_url VARCHAR(255) URL of the mailing list archives web page PK

messages: This table contains a register for each message in the mailing list archives. It contains all the information in the headers plus the message itself.

Name Data Type Description Key
message_id VARCHAR(255) Unique identifier assigned by the mailing list manager PK
mailing_list_url VARCHAR(255) URL of the archives web page of the mailing list PK
mailing_list VARCHAR(255) Name and address of the mailing list
first_date DATETIME Local date written in the message by the original sender
first_date_tz DECIMAL(11,0) Time zone of the above date
arrival_date DATETIME Local time of the server that received the message
arrival_date_tz DECIMAL(11,0) Time zone of the above date
subject VARCHAR(1024) Subject of the message
message_body MEDIUMTEXT Main text of the message
mail_path TEXT Mail path
is_response_of VARCHAR(255) If this message is a reply of another, this is the id of the original message FK

messages_people: This is a table establish the relationship between email addresses and messages.

Name Data Type Description Key
message_id VARCHAR(255) Id of the message where that person appears PK
email_address VARCHAR(255) People unique identifier by email address PK
type_of_recipient ENUM Either From, To, or Cc PK
mailing_list_url VARCHAR(255) URL of the mailing list archives web page PK

people: This table contains a register for each one of the people who has written a message to the mailing list, or at least appears as destination in a message that has been sent to the mailing list.

Name Data Type Description Key
email_address VARCHAR(255) People unique identifier by email address PK
name VARCHAR(255) Name (if appears in the header)
username VARCHAR(255) The first part (before the @) of the email address
domain_name VARCHAR(255) The second part (after the @) of the email address
top_level_domain VARCHAR(255) Top level domain of the email address (.com, .org, .es, etc)

Schema

This schema is very out of date, but this is the best we have so far. This figure was obtained from the deliverable 3.1 in the FLOSSMetrics project.

MLStats Database Schema