
Change blacklist to whitelist
This changes a blacklist in gitaggregate to a whitelist.

Doing this makes explicit which stats files are actually needed, and avoids calculating some rather large files that cause out-of-memory (OOM) errors and take a long time to generate.
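A minimal sketch of why the switch matters, assuming stats files are identified by name (as `k` is in `gitaggregate.py`). The function names and the stat `some_new_large_stat` are hypothetical: that stat stands in for a large file added after the list was written. A blacklist aggregates it by default until someone notices; a whitelist skips it by default.

```python
# Trimmed, illustrative lists: the real names come from the commit's diff.
BLACKLIST = {"codelist_values", "iati_identifiers"}
WHITELIST = {"activities", "publishers"}


def keep_with_blacklist(name):
    """Deny-list: aggregate everything except the listed names."""
    return name not in BLACKLIST


def keep_with_whitelist(name):
    """Allow-list: aggregate only the listed names."""
    return name in WHITELIST


# 'some_new_large_stat' is hypothetical, standing in for a newly added stat.
stats = ["activities", "publishers", "codelist_values", "some_new_large_stat"]

print([s for s in stats if keep_with_blacklist(s)])
# ['activities', 'publishers', 'some_new_large_stat']
print([s for s in stats if keep_with_whitelist(s)])
# ['activities', 'publishers']
```

The trade-off is that a whitelist must be updated whenever a genuinely needed stat is added, in exchange for never aggregating an unexpected large one.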
hayfield authored Mar 23, 2017
1 parent 4f63d2a commit 342bc6a
Showing 1 changed file with 16 additions and 10 deletions.
statsrunner/gitaggregate.py (26 changes: 16 additions, 10 deletions)
@@ -16,15 +16,21 @@
 # Exclude some json stats files from being aggregated
 # These are typically the largest stats files that would consume large amounts of
 # memory/disk space if aggregated over time
-blacklisted_stats_files = [
-    'codelist_values',
-    'iati_identifiers',
-    'duplicate_identifiers',
-    'publisher_duplicate_identifiers',
-    'participating_orgs_text',
-    'transaction_dates',
-    'comprehensiveness_current_activities',
-    'forwardlooking_excluded_activities'
+whitelisted_stats_files = [
+    'activities',
+    'activity_files',
+    'file_size_bins',
+    'file_size',
+    'invalidxml',
+    'nonstandardroots',
+    'organisation_files',
+    'publisher_has_org_file',
+    'publishers_per_version',
+    'publishers',
+    'publishers_validation',
+    'unique_identifiers',
+    'validation',
+    'versions'
 ]

 # Load the reference of commits to dates
@@ -50,7 +56,7 @@

 k = fname[:-5] # remove '.json' from the filename
 # Ignore certain files
-if k in blacklisted_stats_files:
+if k not in whitelisted_stats_files:
     continue

 print 'Adding to {} for file: {}'.format('gitaggregate-dated' if dated else 'gitaggregate', fname)
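The filtering step in the second hunk can be sketched as a standalone function. This is a minimal sketch, not the code in `gitaggregate.py`: `files_to_aggregate` and `WHITELISTED_STATS_FILES` are hypothetical names, the whitelist is trimmed to four of the commit's entries, and the `.json` suffix check is an assumption about how non-JSON files are handled (the diff only shows the `fname[:-5]` slice).

```python
# Trimmed copy of the commit's whitelist. A frozenset is used here for
# constant-time membership tests; the commit itself uses a plain list.
WHITELISTED_STATS_FILES = frozenset([
    "activities",
    "publishers",
    "validation",
    "versions",
])


def files_to_aggregate(filenames):
    """Return the .json filenames whose stem is in the whitelist.

    Mirrors the diff: strip '.json' to get the stats key `k`, then
    skip any key that is not whitelisted.
    """
    selected = []
    for fname in filenames:
        # Assumption: anything that is not a .json file is ignored.
        if not fname.endswith(".json"):
            continue
        k = fname[:-5]  # remove '.json' from the filename
        if k not in WHITELISTED_STATS_FILES:
            continue  # not whitelisted, so it is never aggregated
        selected.append(fname)
    return selected


print(files_to_aggregate(
    ["activities.json", "codelist_values.json", "versions.json", "README"]
))
# ['activities.json', 'versions.json']
```

With only about a dozen names a list works just as well; the set mainly signals that order does not matter and only membership is tested.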
