Skip to content

daviddavis/pulp_repo_version_test

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 

Motivation

In order to decide between two different designs in how to handle versioned repositories (one being a direct relation between repo version and content and the other involving storing the differences between two versions), we decided to evaluate how PostgreSQL copes with tables that contain billions of records). We wanted to test what would happen if we had 1 billion associations between repo versions and content units. How would our PostgreSQL database handle that? Would it be slow? Would it be large?

Setup

To get an idea of how a large table of 1 billion records affected performance, I used the existing code base and instead of dealing with repo versions, I just used repositories. I imagine that the RepositoryContent model will be replaced with a RepositoryVersionContent model that has the same fields except repository_version_id instead of repository_id.

We also decided that large repositories like RHEL 7 typically contained about 10,000 units. So to seed the data, we create 10,000 Content records. With 10,000 Content records, it was necessarily to create 100,000 repositories since 100,000 times 10,000 is 1 billion. This setup should give us the same performance as if we had repository versions--we're simply substituting repositories for repo versions.

Implementation

I started with first benchmarking some intermediate numbers of repositories before getting to 100,000 repositories (0, 100, ...). For metrics, I looked at how long it took to insert a new repository with 10,000 repositories, how long it took to read all the content units for an existing repository, how large the table got, and how big the indexes became (pk, repository_id, content_id, and repository_id + content_id).

Hardware

  • Vagrant VM with 2 cores
  • HDD ~110 MB/s (tested using dd if=/dev/zero of=speedtest bs=64k count=3200 conv=fdatasync)
  • 12 GB of RAM
  • Intel Xeon 4-core E5606 processor

Results

Repo Versions Write (s) Read (s) Table (MB) Index (MB)
0 2.291756 0.008121 2.3 6.5
100 2.182993 0.007838 74 183
1000 2.725601 0.005223 735 1720
10000 55.44191 0.006683 7309 14691
50000
100000

Note: There are 10,000 repository-version to content associations for each repo version.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages