Fork for tracking CNCF projects

CNCF gitdm

This is the Cloud Native Computing Foundation's fork of Jon Corbet and Greg KH's gitdm tool for calculating contributions based on developers and their companies. Companies and developers can check if they are correctly attributed at the following links:

Company Developers list

Developers affiliations list

Adding/Updating affiliation

If you find any errors or missing affiliations in those lists, please submit a pull request with edits to the developer affiliations file.

Only the Developers affiliations list should be edited manually.

The Company Developers list is computed from the Developers affiliations list.
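
To illustrate how one list can be computed from the other, here is a minimal Python sketch that inverts a developer -> companies mapping into a company -> developers mapping and reports when a stored copy is out of sync. The in-memory dict shape and the function names (invert_affiliations, out_of_sync) are hypothetical; the real files are text files whose exact format is not described here.

```python
from collections import defaultdict


def invert_affiliations(dev_to_companies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Derive a company -> developers view from a developer -> companies mapping."""
    company_to_devs = defaultdict(set)
    for dev, companies in dev_to_companies.items():
        for company in companies:
            company_to_devs[company].add(dev)
    # Sort companies and developers so the derived list is stable between runs.
    return {company: sorted(devs) for company, devs in sorted(company_to_devs.items())}


def out_of_sync(dev_to_companies: dict[str, list[str]],
                company_to_devs: dict[str, list[str]]) -> bool:
    """True if a stored company list no longer matches the one derived from affiliations."""
    return invert_affiliations(dev_to_companies) != company_to_devs


affiliations = {"alice": ["Example Corp"], "bob": ["Example Corp", "Other Inc"]}
print(invert_affiliations(affiliations))
# -> {'Example Corp': ['alice', 'bob'], 'Other Inc': ['bob']}
```

Because only the developer-side list is edited by hand, regenerating the company-side list (rather than editing both) keeps the two from drifting apart.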

Other files used for affiliations are the email map file and the GitHub users file.

Removing affiliations

If you do not want your email listed here, please read how to remove your email.

Testing changes

You can test any changes locally by cloning this repository and regenerating all data by running ./rerun_data.sh.

Then generate config files by running: ./import_affs.sh.

If the Developers affiliations and Company Developers lists are out of sync, the tool will notify you.

This tool will generate a new email-map file.

Check that your changes were processed properly, then move the generated file to cncf-config/email-map, replacing the existing one.

Sync workflow

Please follow the instructions from SYNC.md.

Running

Use the *.sh scripts to run analytics (all*.sh for a full analysis and rels*.sh for per-release stats).

This program assumes that gitdm resides in ~/dev/cncf/gitdm/ and that kubernetes is in ~/dev/go/src/k8s.io/kubernetes/.

Output files are placed in the kubernetes directory.

To regenerate all statistics just run: ./rerun_data.sh

This is an iterative process: run any of the scripts, review its output in the kubernetes directory, adjust the mappings to handle more authors, and repeat.

You can also run ./debug.sh to halt in the debugger and review the hackers structure and the developers who were not found (see DebugUnknowns in cncfdm.py).

Final report:

Data

Report

Contributing

Pull requests are welcome.

Our mapping is never complete; please see the config files in Config files.

The email-map file is a direct mapping from email addresses to employers.
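
A minimal Python sketch of loading such a mapping, assuming one "email employer name" entry per line with # comments. This line format is an assumption for illustration and is not taken from cncf-config/email-map; real entries may carry extra information (such as date ranges) that this sketch does not handle. The function names are hypothetical.

```python
def parse_email_map(text: str) -> dict[str, str]:
    """Parse assumed 'email employer name' lines into an email -> employer dict."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # this sketch ignores an email with no employer
        email, employer = parts
        # Lower-casing the key makes lookups case-insensitive (a choice of this sketch).
        mapping[email.lower()] = employer.strip()
    return mapping


def employer_for(email: str, mapping: dict[str, str], default: str = "(Unknown)") -> str:
    """Look up an employer, falling back to the '(Unknown)' bucket."""
    return mapping.get(email.lower(), default)


sample = "# comment\njane@example.com Example Corp\n\nbob@example.org Other Inc\n"
emap = parse_email_map(sample)
print(employer_for("JANE@example.com", emap))  # -> Example Corp
```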

There is also a long list of unknown emails; to see it, scroll to the section called "Developers with unknown affiliation:" in all.txt.

All of them were searched for in various sources, but we were not able to find their affiliations.

Detailed Description

Regenerating all data with ./rerun_data.sh means:

  • Data for the kubernetes/kubernetes repository (all time) with 3 mappings of unknown developers: no mapping (list them with their email and name), mapping to their email domain (user@gmail.com --> 'Gmail *'), and mapping all of them to '(Unknown)'. This is done by running ./all.sh, ./all_no_map.sh and ./all_with_map.sh. Output goes to the kubernetes/all_time/ directory.
  • Data for the kubernetes/kubernetes repository divided into releases v1.0.0, v1.1.0, ..., v1.7.0 (with the 3 types of mappings described above). This is done via ./rels.sh, ./rels_strict.sh and ./rels_no_map.sh. Output goes to the kubernetes/v1.X.0-v1.Y.0/ directories, where X=0,1,2,3,4,5,6 and Y=X+1.
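
The three ways of labelling an unknown developer described above can be sketched in Python as follows. The mode names ("none", "domain", "unknown"), the "Name <email>" label used for the no-mapping case and the first-label rule for the domain bucket are all assumptions of this sketch, not cncfdm.py's actual code.

```python
def map_unknown(email: str, name: str, mode: str) -> str:
    """Label for a developer whose employer is not in the mapping."""
    if mode == "none":
        # No mapping: keep the developer identifiable by name and email.
        return f"{name} <{email}>"
    if mode == "domain":
        # user@gmail.com -> 'Gmail *'; the asterisk marks an email-domain bucket.
        domain = email.rsplit("@", 1)[-1].lower()
        return domain.split(".")[0].capitalize() + " *"
    if mode == "unknown":
        # Everyone unmapped collapses into a single bucket.
        return "(Unknown)"
    raise ValueError(f"unsupported mode: {mode!r}")


for mode in ("none", "domain", "unknown"):
    print(mode, "->", map_unknown("user@gmail.com", "Some User", mode))
```

Taking only the first domain label is a simplification: it yields "Qq *" for qq.com, matching the special names listed later, but would produce a misleading bucket for multi-level domains such as mail.example.co.uk.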

After generating the all-time and per-release data, the cncfdm.py output needs to be analysed. This is done by calling ./analysis_all.sh (analyses all-time results) and then ./analysis_rels.sh (for per-release data).

Data for all 68 repositories (at the time of writing) that make up the entire Kubernetes project is generated by the ./kubernetes_repos.sh script.

The final files generated by the first 2 steps (for the single kubernetes/kubernetes repository) are in kubernetes/all_time/*.txt and ./kubernetes/v1.X.0-v1.Y.0/*.txt.

All scripts are configured to ignore commits related to files in the vendor and Godeps directories. External sources are placed there, and many commits just add external libraries; accounting for them would make the results less accurate.
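
A minimal Python sketch of such a path filter. It assumes the decision is made per changed file (a file is dropped if any parent directory is named vendor or Godeps) and that a commit counts only if it still touches at least one other file; the real scripts may apply the exclusion differently, for example only at the repository root. The function names are hypothetical.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"vendor", "Godeps"}


def is_excluded(path: str) -> bool:
    """True if a changed file lives under a vendor/ or Godeps/ directory at any depth."""
    # parts[:-1] skips the file name itself, so only directories are checked.
    return any(part in EXCLUDED_DIRS for part in PurePosixPath(path).parts[:-1])


def counted_files(paths: list[str]) -> list[str]:
    """The changed files that remain after dropping vendored code."""
    return [p for p in paths if not is_excluded(p)]


def commit_counts(paths: list[str]) -> bool:
    """A commit is counted only if it touches at least one non-excluded file."""
    return bool(counted_files(paths))


print(commit_counts(["vendor/github.com/foo/bar.go", "Godeps/Godeps.json"]))  # -> False
```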

Each of these scripts pipes a git log call with specific arguments into cncfdm.py with specific parameters.

See ./run.sh for an example. All other calls use the same git log and cncfdm.py commands with different parameters.

For a list of cncfdm.py parameters, see the comments inside cncfdm.py, which describe all possible options.

For more details about how the cncfdm.py tool works, refer to its source and the other *.py files.

The output files are analysed by ./analysis_all.sh and ./analysis_rels.sh.

The first one calls: ruby analysis.rb all kubernetes/all_time/first_run_patch.txt kubernetes/all_time/run_no_map_patch.txt kubernetes/all_time/run_with_map_patch.txt

The second calls: ruby analysis.rb v1.0_v1.1 kubernetes/*/output_strict_patch.txt kubernetes/*/output_patch.txt kubernetes/*/output_no_map_patch.txt

This Ruby tool expects 3 files: one with no mapping of unknown developers, a 2nd with mapping to a domain name, and a 3rd with mapping to (Unknown).

The output of the analysis.rb tool goes to project/<prefix>_<key>_<type>.csv files. <prefix> can be all or v1.X.0-v1.Y.0, meaning that the file is for all-time data or for a specific release of kubernetes/kubernetes. <key> can be changeset, employers, lines or signoffs, meaning that the file contains data sorted by this key, descending. <type> can be sum, top or all:

  • all means that the file contains all data for the given <key>, sorted descending. The header is idx,company,n,percent, meaning: index, company name, n developers, % of all developers. All known is the sum of all detected developers.
  • top means that the file contains the top 10 entries from all, but it must also contain data for '(Unknown)', 'Gmail *', 'Qq *', 'Outlook *', 'Yahoo *', 'Hotmail *', '(Independent)' and '(Not Found)'. The header is the same as in all.
  • sum contains a summary value for all found developers. It has a different header, N companies,sum,percent, meaning: the number of developers' companies found, the sum of <key> for all found developers, and that sum as a percentage of the sum for all developers.
  • Special names: All known (sum of all known developers), (Independent) (developers working on their own), (Not Found) (developers for whom an employer was not found even though the search was done in multiple sources), (Unknown) (developers not mapped (yet?)), Some name * (sum of developers with emails on the Some name domain; the asterisk * indicates this).
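
The all, top and sum variants can be sketched in Python from a company -> count mapping. This illustrates the rules described above and is not analysis.rb itself; treating (Unknown), (Not Found) and the "Some name *" domain buckets as not known in sum_row is an assumption, as are all function names.

```python
SPECIAL = {"(Unknown)", "Gmail *", "Qq *", "Outlook *", "Yahoo *", "Hotmail *",
           "(Independent)", "(Not Found)"}


def all_rows(counts: dict[str, int]) -> list[tuple[int, str, int, float]]:
    """Rows of (idx, company, n, percent), sorted by n descending, ties by name."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(idx, company, n, 100.0 * n / total if total else 0.0)
            for idx, (company, n) in enumerate(ordered, start=1)]


def top_rows(counts: dict[str, int], limit: int = 10) -> list[tuple[int, str, int, float]]:
    """Top `limit` rows, plus any special names that fall below the cut-off."""
    rows = all_rows(counts)
    return rows[:limit] + [row for row in rows[limit:] if row[1] in SPECIAL]


def is_known(company: str) -> bool:
    # Assumption: unmapped, not-found and email-domain buckets do not count as known.
    return company not in ("(Unknown)", "(Not Found)") and not company.endswith(" *")


def sum_row(counts: dict[str, int]) -> tuple[int, int, float]:
    """(N companies, sum, percent) computed over known companies only."""
    total = sum(counts.values())
    known = {c: n for c, n in counts.items() if is_known(c)}
    found = sum(known.values())
    return len(known), found, (100.0 * found / total if total else 0.0)


counts = {"Google": 50, "Red Hat": 30, "(Unknown)": 15, "Gmail *": 5}
print(sum_row(counts))  # -> (2, 80, 80.0)
```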

The analysis.rb output is used directly for the "Who writes Kubernetes" report.

The ./kubernetes_repos.sh script is used to generate all-time data for all Kubernetes repositories.

To use it, you must have all of the Kubernetes repositories (68 from 3 different organizations) cloned in ~/dev/go/src/k8s/.

Orgs are: kubernetes, kubernetes-incubator, kubernetes-client.

It generates statistics for each single repo via ./anyrepo.sh ~/dev/go/src/k8s.io/<repo-name> <repo-name>.

See details in ./kubernetes_repos.sh. The first argument is the directory where a given Kubernetes repository is cloned.

To clone a repository, run cd ~/dev/go/src/k8s/ and then git clone https://github.com/<one-of-3-kubernetes-orgs>/<kubernetes-repo-name>.git.

one-of-3-kubernetes-orgs: kubernetes, kubernetes-incubator and kubernetes-client

kubernetes-repo-name: please look up the repository names in all three Kubernetes orgs on GitHub.
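
A small Python helper that prints the clone commands for a list of (org, repo) pairs so they can be reviewed before running them. The helper and its name are hypothetical, the destination default is taken from the text above, and the repository pairs in the usage line are only examples, not the full list of 68.

```python
import shlex

KUBERNETES_ORGS = ("kubernetes", "kubernetes-incubator", "kubernetes-client")


def clone_commands(repos: list[tuple[str, str]], dest: str = "~/dev/go/src/k8s/") -> list[str]:
    """Shell commands that clone each (org, repo) pair into `dest`."""
    commands = [f"cd {dest}"]  # dest is left unquoted so the shell expands '~'
    for org, repo in repos:
        if org not in KUBERNETES_ORGS:
            raise ValueError(f"{org!r} is not one of the three Kubernetes orgs")
        url = f"https://github.com/{org}/{repo}.git"
        commands.append(f"git clone {shlex.quote(url)}")
    return commands


print("\n".join(clone_commands([("kubernetes", "kubernetes"), ("kubernetes-client", "python")])))
```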

./anyrepo.sh just calls cncfdm.py with the appropriate arguments (such as excluding the vendor directory, numstat, etc.).

There is also ./anyreporange.sh that allows querying a repo for a specific time range (cncfdm.py supports that as well).

Output goes to repos/<repo-name>.<ext>, where <repo-name> is the repository name ./anyrepo.sh was called with and <ext> is one of:

  • txt: main data file
  • csv: list of employers in the given repo
  • html: the same as txt but in HTML format
  • out: cncfdm.py verbose output messages (for debugging)

Finally, ./kubernetes_repos.sh calls ./multirepo.sh with all 68 repository directories listed.

It gathers the git log of each of them, concatenates them, and then runs cncfdm.py on the concatenated result (see ./multirepo.sh).

Results are saved to repos/combined.<ext>, where <ext> is the same as for anyrepo.sh.

The typical workflow is re-running ./kubernetes_repos.sh and examining repos/combined.txt for unknown developers.

Research the unknown developers on Google, Clearbit, FullContact, GitHub, LinkedIn, Facebook or any other source, then update cncf-config/<filename> and re-run ./kubernetes_repos.sh. The files are usually updated in this order: email-map, domain-map and, in very rare cases, aliases, gitdm.config-cncf or the group mappings in groups/.

Likewise, when generating data for the single kubernetes/kubernetes repository (for example with ./all.sh), examine the developers found in ./kubernetes/all_time/first_run_patch.txt.

After all this data is generated, ./kubernetes_repos.sh concatenates all single-repo data into a single output file, repos/merged.out, so that all the data can be browsed in one file.

It also generates developer and company statistics via a ./topdevs.sh call.

That script runs a Ruby tool on the combined output of all 68 Kubernetes repos (saved as CSV): ruby topdevs.rb repos/combined.csv

That tool generates files as follows:

  • companies_by_name.csv - a list of the companies found, sorted by name (case-insensitive) to allow manual examination for duplicates arising from different names, such as "Google" vs "Googe Corporation" vs "Google Corp." or "google"
  • companies_by_count.csv - a list of the companies found, sorted (descending) by their number of developers. This serves a similar purpose from a different perspective.
  • unknown_devs.txt, unknown_devs.csv, unknown_emails.csv - lists of developers without a mapping, used to prioritize the search for developers; unknown_emails.csv is in a format suitable for a Clearbit batch.
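
As a rough aid for the duplicate review that companies_by_name.csv supports, the following Python sketch normalizes company names (lower-casing and dropping a trailing suffix such as "Corp.") and flags pairs within one edit of each other, which catches both "Google Corp." and a typo like "Googe Corporation". The suffix list, the edit-distance threshold and the function names are assumptions; results are candidates for manual review, not automatic merges.

```python
import re

# Trailing corporate suffixes to drop before comparing names (an illustrative list).
SUFFIX = re.compile(r"\b(corporation|corp|inc|llc|ltd)\.?$", re.IGNORECASE)


def normalize(name: str) -> str:
    """Lower-case a company name and strip one trailing corporate suffix."""
    return SUFFIX.sub("", name.strip().lower()).strip(" ,.")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def likely_duplicates(companies: list[str], max_distance: int = 1) -> list[tuple[str, str]]:
    """Pairs of names whose normalized forms are within `max_distance` edits."""
    # Case-insensitive order, with the original string as a deterministic tie-breaker.
    names = sorted(set(companies), key=lambda s: (s.lower(), s))
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if levenshtein(normalize(a), normalize(b)) <= max_distance:
                pairs.append((a, b))
    return pairs


names = ["Google", "Googe Corporation", "Google Corp.", "google", "Red Hat"]
for pair in likely_duplicates(names):
    print(pair)
```

A threshold of one edit will also pair genuinely different short names (for example two three-letter acronyms), which is why the output is meant for a human to confirm.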

There are clearbit tools in clearbit_tools/ directory.

Look for any files with the .rb extension. Three rounds of commercial Clearbit requests were performed, and they returned quite a lot of data.

However, the returned data files are not checked in; they are listed in ./.gitignore because we had to pay for that data.

These tools are used to enrich the cncf-config/email-map mapping. google_other.txt contains a list of Google developers with email on a domain other than @google.com. The ./changesets.csv, ./added.csv and ./removed.csv files contain developers sorted by changesets, added lines and removed lines (descending); they are used to generate the Top N developers by a given criterion.

A new set of tools to get Clearbit and FullContact data is located in the affiliation_finder/ directory. The two tools are described in the 'Tools to help find unknown affiliations' section of this document.

./new_devs.sh (also used by ./rerun_data.sh) is used to generate statistics about new developers between kubernetes/kubernetes releases.

It runs ruby new_devs.rb kubernetes/v1.X.0-v1.Y.0/output_strict_patch.csv for all X and Y. new_devs.rb generates information about the developers who were new in each release range. It also generates the file new_devs.csv, which lists the companies that introduced the most new developers overall (sorted by number of new developers, descending).
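The core of the new-developer count can be pictured as a running set difference: a developer is new in the first release range where they appear. The sketch below follows that idea. It assumes each range's CSV has already been reduced to (developer, company) pairs. It also assumes "new" means "not seen in any earlier range", whereas new_devs.rb may compare only consecutive releases. `new_devs_per_release` is a hypothetical name.

```ruby
require "set"

# releases: ordered list of [range_name, [[developer, company], ...]].
# A developer counts as new in the first range where they appear.
def new_devs_per_release(releases)
  seen = Set.new
  per_release = {}
  per_company = Hash.new(0)
  releases.each do |range, pairs|
    fresh = pairs.reject { |dev, _| seen.include?(dev) }.uniq { |dev, _| dev }
    per_release[range] = fresh.map(&:first)
    fresh.each { |_, company| per_company[company] += 1 }
    pairs.each { |dev, _| seen << dev }
  end
  # Companies by number of new developers, descending (cf. new_devs.csv).
  [per_release, per_company.sort_by { |company, n| [-n, company] }]
end

releases = [
  ["v1.5.0-v1.6.0", [["alice", "Google"], ["bob", "Red Hat"]]],
  ["v1.6.0-v1.7.0", [["alice", "Google"], ["carol", "Google"],
                     ["dave", "Huawei"], ["carol", "Google"]]]
]
per_release, per_company = new_devs_per_release(releases)
p per_release["v1.6.0-v1.7.0"]  # => ["carol", "dave"]
p per_company                   # => [["Google", 2], ["Huawei", 1], ["Red Hat", 1]]
```

In this sketch, every developer in the first range counts as new, because nothing earlier has been seen.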

That covers the typical usage and data for the "Who writes Kubernetes report".

Other tools

Other tools include:

  • see_parser.sh - displays the data feed as used by the cncfdm.py tool
  • range.sh - generates stats for the Linux kernel for a given date range (1st and 2nd command line arguments, e.g. 2016-01-01 2017-01-01); assumes the Linux repo (torvalds/linux) is cloned in ~/dev/linux/
  • range_<period>.sh - generates monthly, quarterly or yearly stats using ./range.sh above, for example ./range_monthly.sh.
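The range_<period>.sh scripts are described as driving ./range.sh once per period. The sketch below builds consecutive monthly date pairs and prints the corresponding range.sh commands without running them. It assumes range.sh takes the start and end dates as its 1st and 2nd arguments, as described above. Whether range.sh treats the end date as inclusive is not stated, so the boundary choice here is an assumption. `monthly_ranges` is a hypothetical helper.

```ruby
require "date"

# Consecutive [from, to] ISO date pairs covering [start, stop), one per month.
def monthly_ranges(start, stop)
  ranges = []
  from = start
  while from < stop
    to = [from >> 1, stop].min # Date#>> advances by whole months
    ranges << [from.iso8601, to.iso8601]
    from = to
  end
  ranges
end

monthly_ranges(Date.new(2016, 1, 1), Date.new(2016, 4, 1)).each do |from, to|
  puts "./range.sh #{from} #{to}"
end
# ./range.sh 2016-01-01 2016-02-01
# ./range.sh 2016-02-01 2016-03-01
# ./range.sh 2016-03-01 2016-04-01
```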

To compare Prometheus contributors before and after it joined the CNCF:

Prometheus joined the CNCF on 2016-05-09.

First, clone all Prometheus repos into ~/dev/prometheus using ./clone_prometheus.sh

Then get the number of distinct Prometheus contributors in the year before it joined the CNCF: ./prometheus_repos.sh 2015-05-09 2016-05-08 ~/dev/prometheus/

The result is:

Processed 2721 csets from 230 developers
252 employers found
A total of 1558445 lines added, 353900 removed (delta 1204545)

Now check the number of distinct contributors after 2016-05-09: ./prometheus_repos.sh 2016-05-09 2017-06-01 ~/dev/prometheus/

Processed 2817 csets from 346 developers
365 employers found
A total of 2696196 lines added, 771502 removed (delta 1924694)

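The number of distinct developers went from 230 to 346, an increase of about 50%. (The "after" window, 2016-05-09 to 2017-06-01, is a few weeks longer than the one-year "before" window.)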
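The comparison can be reproduced from the two "Processed ... csets from ... developers" summary lines shown above. The sketch below parses that line format, as it appears in the output above, and computes the relative change. `parse_summary` and `pct_change` are hypothetical helpers.

```ruby
# Extract changeset and developer counts from a gitdm-style summary line.
def parse_summary(line)
  m = line.match(/Processed (\d+) csets from (\d+) developers/)
  raise ArgumentError, "not a summary line: #{line}" unless m
  { csets: m[1].to_i, developers: m[2].to_i }
end

# Relative change in percent, rounded to one decimal place.
def pct_change(before, after)
  ((after - before) * 100.0 / before).round(1)
end

before = parse_summary("Processed 2721 csets from 230 developers")
after  = parse_summary("Processed 2817 csets from 346 developers")
puts pct_change(before[:developers], after[:developers]) # => 50.4
puts pct_change(before[:csets], after[:csets])           # => 3.5
```

Parsing the counts from the summary lines avoids mixing up different rows, such as developers versus employers, when comparing the two periods.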

Report

Links to the data and the generated report are in ./res/links.txt

CNCF Projects join statistics

  • CNCF Projects join dates are: https://github.com/cncf/toc#projects

  • To generate statistics for Prometheus for the 90 days before and the 90 days after it joined the CNCF:

  • Run ./clone_prometheus.sh

  • Run ./cncf_join_analysis.sh prometheus 2016-05-09 90 ~/dev/prometheus/

  • Results go to prometheus_repos/result.txt

  • For Kubernetes, create a directory for links to the Kubernetes repos, like this: mkdir ~/dev/kubernetes_repos_links

  • Copy kubernetes_repos.sh to link_kubernetes_repos.sh: cp kubernetes_repos.sh link_kubernetes_repos.sh

  • Open the copy and add this as the first line: cd ~/dev/kubernetes_repos_links

  • Replace lines like ./anyrepo.sh ~/dev/go/src/k8s.io/test-infra/ test-infra with ln -s ~/dev/go/src/k8s.io/test-infra/ test-infra, then run the script. The Kubernetes repo links are now in ~/dev/kubernetes_repos_links

  • The command to run on the Kubernetes repos is then: ./cncf_join_analysis.sh kubernetes 2016-03-10 90 ~/dev/kubernetes_repos_links

  • Results go to kubernetes_repos/result.txt

  • To generate statistics for OpenTracing for the 90 days before and the 90 days after it joined the CNCF:

  • Run ./clone_opentracing.sh

  • Run ./cncf_join_analysis.sh opentracing 2016-08-17 90 ~/dev/opentracing/

  • Results go to opentracing_repos/result.txt

  • There is also an all-in-one script to regenerate all CNCF project join statistics: run ./join_stats.sh
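The manual edit of link_kubernetes_repos.sh described in the steps above can also be scripted. The sketch below assumes every repo line in kubernetes_repos.sh has the form ./anyrepo.sh <path> <name>. Lines that do not match are copied unchanged. One deviation from the manual steps: if the script starts with a shebang, the sketch keeps the shebang as the very first line and puts the cd line second. `build_link_script` is a hypothetical helper. Review its output before running it.

```ruby
# Matches lines like "./anyrepo.sh ~/dev/go/src/k8s.io/test-infra/ test-infra".
REPO_LINE = %r{\A\s*\./anyrepo\.sh\s+(\S+)\s+(\S+)\s*\z}

# Turn kubernetes_repos.sh lines into link_kubernetes_repos.sh lines:
# each repo line becomes "ln -s <path> <name>", other lines are kept, and
# "cd <links_dir>" is added at the top (after a shebang, if there is one).
def build_link_script(lines, links_dir)
  body = lines.map do |line|
    m = line.match(REPO_LINE)
    m ? "ln -s #{m[1]} #{m[2]}" : line.chomp
  end
  shebang = body.first&.start_with?("#!") ? [body.shift] : []
  shebang + ["cd #{links_dir}"] + body
end

# Usage (from the repository root):
#   script = build_link_script(File.readlines("kubernetes_repos.sh"),
#                              "~/dev/kubernetes_repos_links")
#   File.write("link_kubernetes_repos.sh", script.join("\n") + "\n")
```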

Typical update of "Who writes Kubernetes report"

Since the Kubernetes project started in June 2014, 2623 developers from 789 companies have worked on it (counting Kubernetes and all its projects: 68 repos from 3 orgs).
A total of 28.4 million lines of code were added and 16.3 million lines removed.

Taken from: ./repos/combined.txt

Processed 59041 csets from 2623 developers
789 employers found
A total of 28440262 lines added, 16342872 removed (delta 12097390)

For the kubernetes/kubernetes repo alone, the data is in: kubernetes/all_time/first_run_numstat.txt

Processed 28225 csets from 1338 developers
400 employers found
A total of 6667288 lines added, 4132224 removed (delta 2535064)
  • How to fill in the data sheet/chart:
  • Sheet "all time data":
  • analysis_all_repos.sh generates files starting with: report/all_repos_rest
  • report/prefix_key_type (prefix: all - for kubernetes/kubernetes, all_repos - for all repos, v1.x for releases), project/
  • Commit info is in other_repos/all_kubernetes_dtfrom_dtto and other_repos/kubernetes_dtfrom_dtto (for all k8s repos and for kubernetes/kubernetes alone, respectively)
  • To see commits for all Kubernetes repos combined, for the last year and for each of the last 12 months separately: grep -HIn "csets from" other_repos/all_kubernetes_range_unknown_201*
  • The same for the kubernetes/kubernetes repo: grep -HIn "csets from" other_repos/kubernetes_range_unknown_201*
  • Update report and report data sheet with those results
  • The number of GitHub events etc. comes from cncf/velocity:projects/unlimited.csv (covering 201606-201705)
  • Values for May 2017 are in: cncf/velocity:projects/cncf_projects_201705.csv
period,activity,comments,prs,commits,issues,authors
Last year,308313,217684,46351,16000,28278,1728
Last month,30227,21371,4645,1741,2470,451
  • Analyses of kubernetes/kubernetes (the main repo) are in this format: report/all_{key}_top.csv; import them into the 2nd sheet
  • Big summaries like all developers etc are in ./repos/combined.txt, for the main k8s repo: kubernetes/all_time/first_run_numstat.txt
  • Top developer stats are in stats/all_key.csv (for all repos), stats/kubernetes_key.csv (for the main repo) and stats/v1.x_key.csv (per version).
  • Import those into the last 3 sheets of the data set
  • Per-version data: report/v1.x_v1.y_key_top.csv (key: changesets, lines, developers); import these into the data sheet for all versions: 7 x 3 = 21 imports
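Seven release pairs times three keys give 21 per-version files to import. The sketch below lists the expected file names so that missing ones can be spotted before importing. Only the report/v1.x_v1.y_key_top.csv pattern and the three keys come from the text above. The concrete version pairs (consecutive releases v1.0 to v1.7) are an assumption for illustration. `per_version_files` is a hypothetical helper.

```ruby
KEYS = %w[changesets lines developers].freeze

# Expected per-version report files for consecutive version pairs.
def per_version_files(versions, keys = KEYS)
  versions.each_cons(2).flat_map do |from, to|
    keys.map { |key| "report/#{from}_#{to}_#{key}_top.csv" }
  end
end

# Hypothetical version list: v1.0 ... v1.7 gives 7 consecutive pairs.
versions = (0..7).map { |minor| "v1.#{minor}" }
files = per_version_files(versions)
puts files.size # => 21
missing = files.reject { |f| File.exist?(f) }
puts "missing: #{missing.join(', ')}" unless missing.empty?
```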

Affiliations of some developers remain uncertain despite our best efforts. These developers are listed in the uncertain.csv file.

GitHub user data can be pulled using the Octokit GitHub API client.

To do this, call: ruby ghusers.rb or ./ghusers.sh

Requirements:

  • A standard GitHub OAuth token (https://github.com/settings/tokens --> Personal access tokens); put it in the /etc/github/oauth file.
  • A GitHub application, to increase the API rate limit from 60 to 5000 requests per hour (60 is not enough to process Kubernetes, 5000 is enough).
  • See https://github.com/settings/ --> OAuth application; put your client_id and client_secret in the /etc/github/client_id and /etc/github/client_secret files.
  • This tool caches all GitHub calls (saves them as JSON files in ./ghusers/).
  • The final JSON is saved in ./github_users.json (subsequent calls use data from this file, so to reset the cache, remove this file and all files from the ghusers/ directory).
  • To generate the actual mapping, process this JSON manually (and map company names, since GitHub users sometimes put strange values there).
  • I've done that by iteratively using a new tool: import_from_github_users.sh, import_from_github_users.rb with a mapping file (that tries to map a GitHub user's company name into something more accurate): company-names-mapping
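The caching behaviour described above can be sketched with the standard library alone: one JSON file per GitHub call in ./ghusers/, with the cache reset by deleting those files. This is not ghusers.rb itself. `cached_json`, the <key>.json file naming and `fetch_user` are hypothetical. The block stands in for the real API request.

```ruby
require "json"
require "fileutils"

# Return the cached JSON for `key` stored in `dir`; on a miss, call the block
# (the real API request), save its result as <dir>/<key>.json and return it.
# Deleting the files in `dir` resets the cache.
def cached_json(dir, key)
  path = File.join(dir, "#{key}.json")
  unless File.exist?(path)
    FileUtils.mkdir_p(dir)
    File.write(path, JSON.pretty_generate(yield))
  end
  # Always read back from disk so hits and misses return identical structures.
  JSON.parse(File.read(path))
end

# Usage, with a hypothetical fetch_user method wrapping the GitHub API call:
#   user = cached_json("ghusers", "octocat") { fetch_user("octocat") }
```

Reading the value back from disk on every call means a cache hit and a fresh fetch both return string-keyed hashes. This matters when later steps compare or merge the data.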

Tools to help find unknown affiliations

To enhance this JSON with pre-existing affiliations, call: ./enchance_json.sh

  • To generate JSON with some filtered data (like all unknown devs with a location, a LinkedIn profile link or just a blog entry), call: ./lookup_json.sh (see the script for details; lookup_json.rb also has a lot of comments on how to use it).

  • To generate a progress report (how many Not Found, Unknown, and Independent devs are defined in our affiliations), call: ./progress_report.sh.

  • To generate aliases for emails that are already known (those using the same GitHub user name), try ./aliaser.sh. The output is aliaser.txt, which can be analyzed and manually added to cncf-config/aliases if needed.

  • To generate a correlations map for company names (to avoid mapping typos etc.), run the ./correlations.sh script. The result is in the correlations.txt file, which can be used to update cncf-config/email-map with corrected employer names.

  • To generate per-file/per-directory statistics, use ./per_dirs.sh. This is part of the standard workflow, and the results are CSV files in the per_dirs directory.

  • To generate affiliation files (developers_affiliations.txt, company_developers.txt), use ./gen_aff_files.sh

  • To generate data for the stacked chart, run ./stacked_chart_<months|rels>_<csets|perc>.sh. It generates a CSV file: stacked_chart_<months|rels>_<csets|perc>.csv. To generate all stacked charts, run ./stacked_charts.sh

  • To import data from pretty-formatted files, use import_affs.sh; this is not part of the standard workflow.
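The aliaser.sh step rests on a simple idea: emails that resolve to the same GitHub login probably belong to one person. The sketch below performs that grouping. It assumes (email, login) pairs have already been extracted from github_users.json. The real aliaser.txt format and the cncf-config/aliases syntax are not shown here, so the output is only illustrative. `alias_candidates` is a hypothetical helper.

```ruby
# Group emails by GitHub login (logins are case-insensitive) and keep only
# logins with more than one email: { login => [emails...] }, emails sorted.
def alias_candidates(pairs)
  pairs.group_by { |_, login| login.downcase }
       .transform_values { |ps| ps.map(&:first).uniq.sort }
       .select { |_, emails| emails.size > 1 }
end

pairs = [
  ["jane@example.com", "JaneDoe"],
  ["jane.doe@corp.example", "janedoe"],
  ["bob@example.org", "bob"]
]
p alias_candidates(pairs)["janedoe"] # => ["jane.doe@corp.example", "jane@example.com"]
```

Choosing which email becomes the canonical one is left as a manual decision, matching the manual review step described above.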

All those tools are automatically called when running the full data regeneration script: ./rerun_data.sh

  • To automatically find affiliations (email to company) using Clearbit, run two scripts from the affiliation_finder folder, in order:
    • ruby clearbit_affiliation_lookup.rb
    • ruby clearbit_affiliation_merge.rb

The first script takes one optional argument and generates the file clearbit_affiliation_lookup.csv. The argument can be omitted or set to 'true' or 'false' (the default), so the invocation is clearbit_affiliation_lookup.rb, clearbit_affiliation_lookup.rb false or clearbit_affiliation_lookup.rb true. The argument controls whether the script's output file is overwritten (normally data is appended to it). With 'true', previously looked-up email addresses are also checked again.
The script sets Clearbit.key = ENV['CLEARBIT_KEY'], so the execution environment must define CLEARBIT_KEY: the secret API key of a Clearbit account with a subscription. When the file is generated, open it in a CSV editor and sort by the 'chance' field. Visually check and correct the data in the 'affiliation_suggestion' column; for example, replace values such as 'http://www.ghostcloud.cn/' with 'Ghostcloud'. If you find affiliations for other developers manually, change the 'none' value in the 'chance' column to 'high' and provide a value in the 'affiliation_suggestion' column. Columns to the right of 'affiliation_suggestion' are not required.
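The argument semantics of the lookup scripts can be summarized in a few lines. 'true' overwrites the output and re-checks every address. The default 'false' appends and skips addresses already present in the output file. The sketch below is not taken from clearbit_affiliation_lookup.rb. `plan_lookup` is hypothetical, and it assumes the email is the first comma-separated field of each data row, after a header line.

```ruby
# Decide which emails still need a lookup and how to open the output file.
# arg: the script's first argument ("true", "false" or nil).
def plan_lookup(arg, emails, existing_lines)
  overwrite = case arg
              when nil, "false" then false
              when "true" then true
              else raise ArgumentError, "expected 'true' or 'false', got #{arg.inspect}"
              end
  return { mode: "w", todo: emails.uniq } if overwrite

  # Assumption: first line is a header, email is the first field of each row.
  done = existing_lines.drop(1).map { |l| l.split(",", 2).first.to_s.strip }
  { mode: "a", todo: emails.uniq - done }
end

existing = ["email,chance,affiliation_suggestion\n", "a@x.io,high,X\n"]
p plan_lookup(nil, ["a@x.io", "b@y.io"], existing)[:todo]    # => ["b@y.io"]
p plan_lookup("true", ["a@x.io", "b@y.io"], existing)[:todo] # => ["a@x.io", "b@y.io"]
```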

The second script reads the 'clearbit_affiliation_lookup.csv' file and processes the data against the cncf-config/email-map file. When done, the 'email-map' file contains the new and updated affiliations and is sorted. The lookup file is not altered.
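The merge step can be pictured as a keyed update followed by a sort. The sketch below makes several assumptions. Email-map lines have the form "<email> <company>", with emails compared verbatim. Lookup rows are hashes with a hypothetical 'email' key plus the 'chance' and 'affiliation_suggestion' columns described above. Only rows with chance 'high' are accepted. `merge_email_map` is hypothetical. Unlike a real email-map, it keeps only one company per email, so dated multi-line entries would be lost.

```ruby
# Merge accepted lookup rows into email-map lines and return them sorted.
# Assumed line format: "<email> <company>" (first whitespace separates them).
def merge_email_map(map_lines, rows, accepted = %w[high])
  map = {}
  map_lines.each do |line|
    line = line.strip
    next if line.empty?
    email, company = line.split(" ", 2)
    map[email] = company
  end
  rows.each do |row|
    next unless accepted.include?(row["chance"])
    map[row["email"]] = row["affiliation_suggestion"]
  end
  map.sort.map { |email, company| "#{email} #{company}" }
end

rows = [
  { "email" => "b@x.io", "chance" => "high", "affiliation_suggestion" => "Ghostcloud" },
  { "email" => "c@z.io", "chance" => "low", "affiliation_suggestion" => "Skip" }
]
p merge_email_map(["b@x.io Old Co\n", "a@y.io Acme\n"], rows)
# => ["a@y.io Acme", "b@x.io Ghostcloud"]
```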

  • To automatically find affiliations (email to company) using FullContact, run two scripts from the affiliation_finder folder, in order:
    • ruby fullcontact_affiliation_lookup.rb
    • ruby fullcontact_affiliation_merge.rb

The first script takes one optional argument and generates the file fullcontact_affiliation_lookup.csv. The argument can be omitted or set to 'true' or 'false' (the default), so the invocation is fullcontact_affiliation_lookup.rb, fullcontact_affiliation_lookup.rb false or fullcontact_affiliation_lookup.rb true. The argument controls whether the script's output file is overwritten (normally data is appended to it). With 'true', previously looked-up email addresses are also checked again.
The script sets config.api_key = ENV['FULLCONTACT_KEY'], so the execution environment must define FULLCONTACT_KEY: the secret API key of a FullContact account with a subscription. The columns in this file differ from those in the Clearbit file. If you find affiliations for other developers manually, change the value in the 'org_1' column. By default this column should hold 5 pipe-delimited values; if you do not have the other 4 values, just type 4 pipes. Columns to the right of 'org_1' are not required.
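The sketch below builds and checks 'org_1' values. It assumes the organization name is the first of the five pipe-delimited fields. The other four fields are not described here, so they are left empty. `org_field` and `org_name` are hypothetical helpers.

```ruby
# Build an 'org_1' value: the organization name plus empty fields, 5 in total.
def org_field(name, *rest)
  fields = [name, *rest]
  raise ArgumentError, "at most 5 fields" if fields.size > 5
  (fields + Array.new(5 - fields.size, "")).join("|")
end

# Return the organization name from an 'org_1' value, checking the field count.
def org_name(value)
  fields = value.split("|", -1) # -1 keeps trailing empty fields
  raise ArgumentError, "expected 5 fields, got #{fields.size}" unless fields.size == 5
  fields.first
end

puts org_field("Ghostcloud")     # => Ghostcloud||||
puts org_name("Ghostcloud||||")  # => Ghostcloud
```

The -1 limit matters: without it, String#split drops trailing empty fields, and a correct "Name||||" value would look like a single field.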

The second script reads the 'fullcontact_affiliation_lookup.csv' file and processes the data against the cncf-config/email-map file. When done, the 'email-map' file contains the new and updated affiliations and is sorted. The lookup file is not altered. The merge script also exports developer work history to fullcontact_developer_historical_irganizations.csv.

Add a new project (CNCF or non-CNCF) to get affiliations for it

Please follow the instructions from ADD_PROJECT.md.