## Summary

### tl;dr

Notes:
* OpenSearch was forked from Elasticsearch on 2021-04-12 and was an Amazon AWS Project until it moved into the Linux Foundation on 2024-09-16.
* Caveat: This is purely a summary of the data and has not been validated by anyone within the community.
* Analysis of contributions to one repo: https://github.com/opensearch-project/OpenSearch

While Amazon / AWS has always been the dominant contributor to OpenSearch, organizational diversity has been gradually improving.

### After the Fork, Before the LF (2021-04-12 - 2024-09-16)

Amazon / AWS Employees with >= 10 commits: 
* People: 64 20.71% of people
* Commits: 2638 54.53% of total commits
* Additions: 891163 69.87% of total additions
* Deletions: 394616 82.72% of total deletions

Non-Employees with >= 10 commits: 
* People: 10 3.24% of people
* Commits: 631 13.04% of total commits
* Additions: 129341 10.14% of total additions
* Deletions: 45417 9.52% of total deletions

Totals in dataset of people with >=10 commits:
* 80.02% of total additions
* 92.24% of total deletions
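
The percentages above can be reproduced from the raw counts. The sketch below is a minimal illustration using figures copied from the notebook output for this period; the helper name `share` is hypothetical. Note that the denominators are the totals for every contributor in the period, computed before bot accounts are removed.

```python
def share(part: int, total: int) -> str:
    """Format part/total as a percentage with two decimals."""
    return format(part / total, ".2%")


# Figures copied from the 2021-04-12 to 2024-09-16 notebook output.
TOTAL_COMMITS = 4838
TOTAL_ADDITIONS = 1275368

amazon_commit_share = share(2638, TOTAL_COMMITS)
other_commit_share = share(631, TOTAL_COMMITS)
combined_addition_share = share(891163 + 129341, TOTAL_ADDITIONS)

print(amazon_commit_share)      # 54.53%
print(other_commit_share)       # 13.04%
print(combined_addition_share)  # 80.02%
```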


### First Year After the Fork (2021-04-12 to 2022-04-12)

Amazon / AWS Employees with >= 10 commits: 
* People: 7 7.53% of people
* Commits: 246 34.36% of total commits
* Additions: 296720 80.20% of total additions
* Deletions: 224179 90.90% of total deletions

Non-Employees with >= 10 commits: 
* People: 2 2.15% of people
* Commits: 110 15.36% of total commits
* Additions: 26995 7.30% of total additions
* Deletions: 10799 4.38% of total deletions

Totals in dataset of people with >=10 commits:
* 87.49% of total additions
* 95.28% of total deletions

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| reta | Aiven | 74 | 15854 | 2981 |
| xuezhou25 | None | 8 | 116 | 280 |
| adnapibar | Microsoft | 36 | 11141 | 7818 |

### Final Year under AWS before LF (2023-09-16 to 2024-09-16)

Amazon / AWS Employees with >= 10 commits: 
* People: 40 23.95% of people
* Commits: 923 48.84% of total commits
* Additions: 237781 63.28% of total additions
* Deletions: 48894 64.61% of total deletions

Non-Employees with >= 10 commits: 
* People: 6 3.59% of people
* Commits: 242 12.80% of total commits
* Additions: 42863 11.41% of total additions
* Deletions: 9936 13.13% of total deletions

Totals in dataset of people with >=10 commits:
* 74.69% of total additions
* 77.74% of total deletions

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| skumawat2025 | IIT Kharagpur | 8 | 2615 | 523 |
| Pranshu-S | None | 6 | 2342 | 89 |
| reta | Aiven | 182 | 31718 | 8175 |
| dzane17 | None | 11 | 2502 | 530 |
| rajiv-kv | None | 11 | 3410 | 186 |
| SwethaGuptha | None | 11 | 1792 | 523 |
| lukas-vlcek | Aiven | 9 | 497 | 69 |
| kkewwei | None | 16 | 2412 | 118 |
| HUSTERGS | @ByteDance | 6 | 260 | 48 |
| akolarkunnu | NetApp | 6 | 48 | 63 |
| gargharsh3134 | None | 8 | 996 | 222 |
| bugmakerrrrrr | None | 11 | 1029 | 404 |
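
To compare this period with the first year after the fork, the shares can be recomputed from the raw counts and subtracted. This is a small sketch using the Amazon additions figures reported in the two sections; the helper name `pct` is hypothetical.

```python
def pct(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


# Additions by Amazon / AWS employees with >= 10 commits, over total additions.
first_year = pct(296720, 369983)   # 2021-04-12 to 2022-04-12
final_year = pct(237781, 375749)   # 2023-09-16 to 2024-09-16
change_pp = final_year - first_year

print(f"{first_year:.2f}% -> {final_year:.2f}% ({change_pp:+.2f} percentage points)")
```

The same comparison for commits moves in the other direction (34.36% to 48.84%), so the direction of the trend depends on which metric is used.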

### 6 Months After LF (2024-09-16 - 2025-03-16)

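Note: No improvement in organizational diversity after moving to the LF. The notebook cell for this period loads the 2023-09-16 to 2024-09-16 pickle file, so these figures are drawn from the same 167 people and 1890 commits as the previous section and differ only in the manual affiliation fixes.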

Amazon / AWS Employees with >= 10 commits: 
* People: 42 25.15% of people
* Commits: 945 50.00% of total commits
* Additions: 242075 64.42% of total additions
* Deletions: 49947 66.00% of total deletions

Non-Employees with >= 10 commits: 
* People: 4 2.40% of people
* Commits: 220 11.64% of total commits
* Additions: 38569 10.26% of total additions
* Deletions: 8883 11.74% of total deletions

Totals in dataset of people with >=10 commits:
* 74.69% of total additions
* 77.74% of total deletions

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| skumawat2025 | IIT Kharagpur | 8 | 2615 | 523 |
| reta | Aiven | 182 | 31718 | 8175 |
| rajiv-kv | None | 11 | 3410 | 186 |
| lukas-vlcek | Aiven | 9 | 497 | 69 |
| kkewwei | None | 16 | 2412 | 118 |
| HUSTERGS | @ByteDance | 6 | 260 | 48 |
| akolarkunnu | NetApp | 6 | 48 | 63 |
| gargharsh3134 | None | 8 | 996 | 222 |
| bugmakerrrrrr | None | 11 | 1029 | 404 |


### Additional Notes

Nick Knize (nknize) left Elastic and joined Amazon well before the 2021-02-03 Elastic relicensing, but was instrumental in creating the OpenSearch fork. He stopped committing in Aug 2023, before he left Amazon, so all of his commits can be attributed to Amazon.
* Amazon: Nov 2020 - Mar 2024
* Elastic: Nov 2014 - Nov 2020

# After the Fork, Before the LF (2021-04-12 - 2024-09-16)

In [1]:
from pprint import pprint
import collections
import pandas as pd
import pickle

# Pickle files generated by this script:
# https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/fork-case-study/commits_people.py

people_pickle = '../data-files/OpenSearch_people_2021-04-12_2024-09-16.pkl'

with open(people_pickle, 'rb') as f:
    person_dict = pickle.load(f)

In [2]:
people = len(person_dict)
commits = 0
additions = 0
deletions = 0

for key,value in person_dict.items():
    # Normalize company names and use emails to derive Amazon affiliations
    if value['company'] is None:
        for email in value['email']:
            if "amazon.com" in email:
                person_dict[key]['company'] = 'Amazon'
    elif any(x in value['company'].lower() for x in ['aws','amazon']):
        person_dict[key]['company'] = 'Amazon'
    elif "aiven" in value['company'].lower():
        person_dict[key]['company'] = 'Aiven'
        
    # Get descriptive statistics
    commits = commits + value['commits']
    additions = additions + value['additions']
    deletions = deletions + value['deletions']
    
print("People:", people)
print("Commits:", commits)
print("Additions", additions)
print("Deletions", deletions)

People: 309
Commits: 4838
Additions 1275368
Deletions 477051
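
The normalization rules in the loop above can also be written as a standalone function, which makes them easier to test and reuse across the per-period cells. This is a minimal sketch, not the notebook's code: the function name `normalize_company` is hypothetical, and it keeps the same substring matching, which would also match unrelated names that happen to contain "aws".

```python
from typing import Optional, Sequence


def normalize_company(company: Optional[str], emails: Sequence[str]) -> Optional[str]:
    """Map a raw company string (or None) to a normalized company name."""
    if company is None:
        # No profile company: fall back to an Amazon email address if present.
        if any("amazon.com" in email for email in emails):
            return "Amazon"
        return None
    lowered = company.lower()
    if "aws" in lowered or "amazon" in lowered:
        return "Amazon"
    if "aiven" in lowered:
        return "Aiven"
    return company  # leave every other company string untouched


print(normalize_company(None, ["someone@amazon.com"]))
print(normalize_company("@aiven-io", []))
print(normalize_company("Microsoft", ["adnapibar@gmail.com"]))
```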


In [3]:
for key,value in person_dict.items():
    try:
        if (value['commits'] > 5) and (value['company'] is None):
            print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass

rajiv-kv {'name': None, 'deletions': 186, 'company': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11}
Rishikesh1159 {'name': 'Rishikesh', 'deletions': 2644, 'company': None, 'additions': 9336, 'email': ['62345295+Rishikesh1159@users.noreply.github.com', 'rishireddy1159@gmail.com'], 'commits': 76}
SwethaGuptha {'name': None, 'deletions': 523, 'company': None, 'additions': 1792, 'email': ['156877431+SwethaGuptha@users.noreply.github.com'], 'commits': 11}
kkewwei {'name': 'kkewwei', 'deletions': 118, 'company': None, 'additions': 2412, 'email': ['kewei.11@bytedance.com', 'kkewwei@163.com'], 'commits': 16}
shourya035 {'name': 'Shourya Dutta Biswas', 'deletions': 1801, 'company': None, 'additions': 7995, 'email': ['114977491+shourya035@users.noreply.github.com'], 'commits': 22}
dependabot[bot] {'name': None, 'deletions': 6780, 'company': None, 'additions': 8263, 'email': ['49699333+dependabot[bot]@users.noreply.github.com', 'dependabot[bot]@u

In [4]:
# Manual Fixes
person_dict['nknize']['company'] = 'Amazon' # https://www.linkedin.com/in/nknize/
person_dict['Rishikesh1159']['company'] = 'Amazon' # https://www.linkedin.com/in/rishikesh-reddy-pasham-678271164/
person_dict['rishabhmaurya']['company'] = 'Amazon' # https://www.linkedin.com/in/rishabh-maurya/
person_dict['shourya035']['company'] = 'Amazon' # https://www.linkedin.com/in/shourya-dutta-biswas-436a0b132/
person_dict['bowenlan-amzn']['company'] = 'Amazon' # https://www.linkedin.com/in/lanbowen23/
person_dict['harshavamsi']['company'] = 'Amazon' # https://www.linkedin.com/in/harshavamsi/
person_dict['vikasvb90']['company'] = 'Amazon' # https://www.linkedin.com/in/vikasbansal1/
person_dict['mattweber']['company'] = 'Amazon' # https://www.linkedin.com/in/matthew-g-weber/


# Remove bots
del person_dict['dependabot[bot]']
del person_dict['opensearch-trigger-bot[bot]']
del person_dict['opensearch-ci-bot']
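
Because the same fixes are repeated for each period, a login that has no commits in a given period would raise a `KeyError` on assignment or `del`. The sketch below shows one way to apply the overrides defensively; the function name `apply_cleanup` and the toy records are hypothetical, and only the logins and the Amazon attribution come from the notebook.

```python
def apply_cleanup(person_dict, company_overrides, bot_logins):
    """Apply manual company fixes and drop bot accounts in place.

    Returns the override logins that were not present in person_dict.
    """
    missing = [login for login in company_overrides if login not in person_dict]
    for login, company in company_overrides.items():
        if login in person_dict:
            person_dict[login]["company"] = company
    for login in bot_logins:
        # pop() with a default does not fail if the bot is absent this period
        person_dict.pop(login, None)
    return missing


# Toy records for illustration; the commit counts are made up.
people = {
    "nknize": {"company": None, "commits": 12},
    "dependabot[bot]": {"company": None, "commits": 150},
}
missing = apply_cleanup(
    people,
    company_overrides={"nknize": "Amazon", "mattweber": "Amazon"},
    bot_logins=["dependabot[bot]", "opensearch-ci-bot"],
)
print(people)
print(missing)
```

Returning the missing logins makes it visible when an override did not apply, instead of failing or passing silently.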

In [5]:
org_people = 0
org_commits = 0
org_additions = 0
org_deletions = 0

other_people = 0
other_commits = 0
other_additions = 0
other_deletions = 0

for key,value in person_dict.items():
    try:
        if value['commits'] >= 10:
            if value['company'] == 'Amazon':
                org_people += 1
                org_commits = org_commits + value['commits']
                org_additions = org_additions + value['additions']
                org_deletions = org_deletions + value['deletions']
            else:
                other_people += 1
                other_commits = other_commits + value['commits']
                other_additions = other_additions + value['additions']
                other_deletions = other_deletions + value['deletions']
                print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass
    
print("\nAmazon / AWS Employees with >= 10 commits:", "\n* People:", org_people, format(org_people/people, ".2%"), "of people")
print("* Commits:", org_commits, format(org_commits/commits, ".2%"), "of total commits")
print("* Additions:", org_additions, format(org_additions/additions, ".2%"), "of total additions")
print("* Deletions:", org_deletions, format(org_deletions/deletions, ".2%"), "of total deletions")

print("\nNon-Employees with >= 10 commits:", "\n* People:", other_people, format(other_people/people, ".2%"), "of people")
print("* Commits:", other_commits, format(other_commits/commits, ".2%"), "of total commits")
print("* Additions:", other_additions, format(other_additions/additions, ".2%"), "of total additions")
print("* Deletions:", other_deletions, format(other_deletions/deletions, ".2%"), "of total deletions")
      
print("\nTotals in dataset of people with >=10 commits:")
print('*', format((other_additions + org_additions)/additions, ".2%"), "of total additions")
print('*', format((other_deletions + org_deletions)/deletions, ".2%"), "of total deletions")

reta {'name': 'Andriy Redko', 'deletions': 29059, 'company': 'Aiven', 'additions': 90299, 'email': ['andriy.redko@aiven.io', 'reta@users.noreply.github.com', 'drreta@gmail.com'], 'commits': 459}
rajiv-kv {'name': None, 'deletions': 186, 'company': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11}
SwethaGuptha {'name': None, 'deletions': 523, 'company': None, 'additions': 1792, 'email': ['156877431+SwethaGuptha@users.noreply.github.com'], 'commits': 11}
kkewwei {'name': 'kkewwei', 'deletions': 118, 'company': None, 'additions': 2412, 'email': ['kewei.11@bytedance.com', 'kkewwei@163.com'], 'commits': 16}
adnapibar {'name': 'Rabi Panda', 'deletions': 10739, 'company': 'Microsoft', 'additions': 12976, 'email': ['adnapibar@gmail.com'], 'commits': 64}
lukas-vlcek {'name': 'Lukáš Vlček', 'deletions': 334, 'company': 'Aiven', 'additions': 3518, 'email': ['lukas.vlcek@aiven.io'], 'commits': 17}
willyborankin {'name': 'Andrey Pleskach', 'deletions'

In [6]:
# Make it easy for the print statements to be copied into a Markdown table
print('|Person|Company|Commits|Additions|Deletions|')
print('|:---|:---|:---|:---|:---|')

for key,value in person_dict.items():
    try:
        if (value['commits'] > 5) and (value['company'] != 'Amazon'):
            print('|',key,'|',value['company'],'|',value['commits'],'|',value['additions'],'|',value['deletions'],'|')
    except (KeyError, TypeError):
        pass

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| austintlee | Aryn | 8 | 1439 | 78 |
| reta | Aiven | 459 | 90299 | 29059 |
| akolarkunnu | NetApp | 6 | 48 | 63 |
| skumawat2025 | IIT Kharagpur | 8 | 2615 | 523 |
| rajiv-kv | None | 11 | 3410 | 186 |
| SwethaGuptha | None | 11 | 1792 | 523 |
| kkewwei | None | 16 | 2412 | 118 |
| adnapibar | Microsoft | 64 | 12976 | 10739 |
| rursprung | avaloq | 6 | 202 | 363 |
| lukas-vlcek | Aiven | 17 | 3518 | 334 |
| willyborankin | Aiven | 10 | 1826 | 682 |
| dzane17 | None | 12 | 2724 | 543 |
| HUSTERGS | @ByteDance | 6 | 260 | 48 |
| ketanv3 | @google | 17 | 9216 | 2809 |
| mohit0193 | Salesforce | 6 | 11202 | 720 |
| bugmakerrrrrr | None | 14 | 1168 | 424 |
| gargharsh3134 | None | 8 | 996 | 222 |
| Pranshu-S | None | 6 | 2342 | 89 |
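
The table-printing cell can be wrapped in a function that returns the rows, so the same code serves every period. This is a sketch under assumptions: the name `markdown_rows` and the optional sorting are not in the notebook, and the `example-amzn` record is made up.

```python
def markdown_rows(person_dict, exclude_company="Amazon", min_commits=6, sort_by_commits=False):
    """Build Markdown table lines for contributors outside exclude_company."""
    lines = [
        "|Person|Company|Commits|Additions|Deletions|",
        "|:---|:---|:---|:---|:---|",
    ]
    items = list(person_dict.items())
    if sort_by_commits:
        items.sort(key=lambda item: item[1]["commits"], reverse=True)
    for login, stats in items:
        if stats["commits"] >= min_commits and stats["company"] != exclude_company:
            lines.append(
                f"| {login} | {stats['company']} | {stats['commits']} "
                f"| {stats['additions']} | {stats['deletions']} |"
            )
    return lines


people = {
    "akolarkunnu": {"company": "NetApp", "commits": 6, "additions": 48, "deletions": 63},
    "reta": {"company": "Aiven", "commits": 459, "additions": 90299, "deletions": 29059},
    "example-amzn": {"company": "Amazon", "commits": 30, "additions": 10, "deletions": 5},
}
table = markdown_rows(people, sort_by_commits=True)
print("\n".join(table))
```

`min_commits=6` corresponds to the `> 5` filter used in the cell above.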


# First Year After the Fork (2021-04-12 to 2022-04-12)

In [7]:
from pprint import pprint
import collections
import pandas as pd
import pickle

# Pickle files generated by this script:
# https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/fork-case-study/commits_people.py

people_pickle_1styr = '../data-files/OpenSearch_people_2021-04-12_2022-04-12.pkl'

with open(people_pickle_1styr, 'rb') as f:
    person_dict_1styr = pickle.load(f)

In [8]:
people = len(person_dict_1styr)
commits = 0
additions = 0
deletions = 0

for key,value in person_dict_1styr.items():
    # Normalize company names and use emails to derive Amazon affiliations
    if value['company'] is None:
        for email in value['email']:
            if "amazon.com" in email:
                person_dict_1styr[key]['company'] = 'Amazon'
    elif any(x in value['company'].lower() for x in ['aws','amazon']):
        person_dict_1styr[key]['company'] = 'Amazon'
    elif "aiven" in value['company'].lower():
        person_dict_1styr[key]['company'] = 'Aiven'
        
    # Get descriptive statistics
    commits = commits + value['commits']
    additions = additions + value['additions']
    deletions = deletions + value['deletions']
    
print("People:", people)
print("Commits:", commits)
print("Additions", additions)
print("Deletions", deletions)

People: 93
Commits: 716
Additions 369983
Deletions 246615


In [9]:
for key,value in person_dict_1styr.items():
    try:
        if (value['commits'] > 5) and (value['company'] is None):
            print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass

Rishikesh1159 {'additions': 321, 'name': 'Rishikesh', 'deletions': 582, 'company': None, 'commits': 7, 'email': ['62345295+Rishikesh1159@users.noreply.github.com']}
xuezhou25 {'additions': 116, 'name': 'Xue Zhou', 'deletions': 280, 'company': None, 'commits': 8, 'email': ['85715413+xuezhou25@users.noreply.github.com']}
dependabot[bot] {'additions': 2181, 'name': None, 'deletions': 1612, 'company': None, 'commits': 150, 'email': ['49699333+dependabot[bot]@users.noreply.github.com', 'dependabot[bot]@users.noreply.github.com']}


In [10]:
# Manual Fixes
person_dict_1styr['nknize']['company'] = 'Amazon' # https://www.linkedin.com/in/nknize/
person_dict_1styr['Rishikesh1159']['company'] = 'Amazon' # https://www.linkedin.com/in/rishikesh-reddy-pasham-678271164/
person_dict_1styr['rishabhmaurya']['company'] = 'Amazon' # https://www.linkedin.com/in/rishabh-maurya/
person_dict_1styr['shourya035']['company'] = 'Amazon' # https://www.linkedin.com/in/shourya-dutta-biswas-436a0b132/
person_dict_1styr['bowenlan-amzn']['company'] = 'Amazon' # https://www.linkedin.com/in/lanbowen23/
person_dict_1styr['harshavamsi']['company'] = 'Amazon' # https://www.linkedin.com/in/harshavamsi/
person_dict_1styr['vikasvb90']['company'] = 'Amazon' # https://www.linkedin.com/in/vikasbansal1/
person_dict_1styr['mattweber']['company'] = 'Amazon' # https://www.linkedin.com/in/matthew-g-weber/


# Remove bots
del person_dict_1styr['dependabot[bot]']

In [11]:
org_people = 0
org_commits = 0
org_additions = 0
org_deletions = 0

other_people = 0
other_commits = 0
other_additions = 0
other_deletions = 0

for key,value in person_dict_1styr.items():
    try:
        if value['commits'] >= 10:
            if value['company'] == 'Amazon':
                org_people += 1
                org_commits = org_commits + value['commits']
                org_additions = org_additions + value['additions']
                org_deletions = org_deletions + value['deletions']
            else:
                other_people += 1
                other_commits = other_commits + value['commits']
                other_additions = other_additions + value['additions']
                other_deletions = other_deletions + value['deletions']
                print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass
    
print("\nAmazon / AWS Employees with >= 10 commits:", "\n* People:", org_people, format(org_people/people, ".2%"), "of people")
print("* Commits:", org_commits, format(org_commits/commits, ".2%"), "of total commits")
print("* Additions:", org_additions, format(org_additions/additions, ".2%"), "of total additions")
print("* Deletions:", org_deletions, format(org_deletions/deletions, ".2%"), "of total deletions")

print("\nNon-Employees with >= 10 commits:", "\n* People:", other_people, format(other_people/people, ".2%"), "of people")
print("* Commits:", other_commits, format(other_commits/commits, ".2%"), "of total commits")
print("* Additions:", other_additions, format(other_additions/additions, ".2%"), "of total additions")
print("* Deletions:", other_deletions, format(other_deletions/deletions, ".2%"), "of total deletions")
      
print("\nTotals in dataset of people with >=10 commits:")
print('*', format((other_additions + org_additions)/additions, ".2%"), "of total additions")
print('*', format((other_deletions + org_deletions)/deletions, ".2%"), "of total deletions")

reta {'additions': 15854, 'name': 'Andriy Redko', 'deletions': 2981, 'company': 'Aiven', 'commits': 74, 'email': ['andriy.redko@aiven.io', 'drreta@gmail.com']}
adnapibar {'additions': 11141, 'name': 'Rabi Panda', 'deletions': 7818, 'company': 'Microsoft', 'commits': 36, 'email': ['adnapibar@gmail.com']}

Amazon / AWS Employees with >= 10 commits: 
* People: 7 7.53% of people
* Commits: 246 34.36% of total commits
* Additions: 296720 80.20% of total additions
* Deletions: 224179 90.90% of total deletions

Non-Employees with >= 10 commits: 
* People: 2 2.15% of people
* Commits: 110 15.36% of total commits
* Additions: 26995 7.30% of total additions
* Deletions: 10799 4.38% of total deletions

Totals in dataset of people with >=10 commits:
* 87.49% of total additions
* 95.28% of total deletions
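
The grouping done in the cell above can be expressed as one function that returns both groups at once. This is a minimal sketch, with a hypothetical function name `summarize`; the `reta`, `adnapibar`, and `xuezhou25` records are copied from this section's output, while `example-amzn` is made up. Fed those records, it reproduces the "Non-Employees" counts printed above.

```python
def summarize(person_dict, org="Amazon", min_commits=10):
    """Sum people, commits, additions, and deletions for org vs. everyone else."""
    fields = ("commits", "additions", "deletions")
    totals = {
        "org": dict.fromkeys(("people",) + fields, 0),
        "other": dict.fromkeys(("people",) + fields, 0),
    }
    for stats in person_dict.values():
        if stats["commits"] < min_commits:
            continue  # below the threshold: counted in neither group
        group = totals["org" if stats["company"] == org else "other"]
        group["people"] += 1
        for field in fields:
            group[field] += stats[field]
    return totals


people = {
    "reta": {"company": "Aiven", "commits": 74, "additions": 15854, "deletions": 2981},
    "adnapibar": {"company": "Microsoft", "commits": 36, "additions": 11141, "deletions": 7818},
    "xuezhou25": {"company": None, "commits": 8, "additions": 116, "deletions": 280},
    "example-amzn": {"company": "Amazon", "commits": 20, "additions": 100, "deletions": 50},
}
totals = summarize(people)
print(totals["other"])
```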


In [12]:
# Make it easy for the print statements to be copied into a Markdown table
print('|Person|Company|Commits|Additions|Deletions|')
print('|:---|:---|:---|:---|:---|')

for key,value in person_dict_1styr.items():
    try:
        if (value['commits'] > 5) and (value['company'] != 'Amazon'):
            print('|',key,'|',value['company'],'|',value['commits'],'|',value['additions'],'|',value['deletions'],'|')
    except (KeyError, TypeError):
        pass

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| reta | Aiven | 74 | 15854 | 2981 |
| xuezhou25 | None | 8 | 116 | 280 |
| adnapibar | Microsoft | 36 | 11141 | 7818 |


# Final Year under AWS before LF (2023-09-16 to 2024-09-16)

In [13]:
from pprint import pprint
import collections
import pandas as pd
import pickle

# Pickle files generated by this script:
# https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/fork-case-study/commits_people.py

people_pickle_p1y = '../data-files/OpenSearch_people_2023-09-16_2024-09-16.pkl'

with open(people_pickle_p1y, 'rb') as f:
    person_dict_p1y = pickle.load(f)

In [14]:
people = len(person_dict_p1y)
commits = 0
additions = 0
deletions = 0

for key,value in person_dict_p1y.items():
    # Normalize company names and use emails to derive Amazon affiliations
    if value['company'] is None:
        for email in value['email']:
            if "amazon.com" in email:
                person_dict_p1y[key]['company'] = 'Amazon'
    elif any(x in value['company'].lower() for x in ['aws','amazon']):
        person_dict_p1y[key]['company'] = 'Amazon'
    elif "aiven" in value['company'].lower():
        person_dict_p1y[key]['company'] = 'Aiven'
        
    # Get descriptive statistics
    commits = commits + value['commits']
    additions = additions + value['additions']
    deletions = deletions + value['deletions']
    
print("People:", people)
print("Commits:", commits)
print("Additions", additions)
print("Deletions", deletions)

People: 167
Commits: 1890
Additions 375749
Deletions 75677


In [15]:
for key,value in person_dict_p1y.items():
    try:
        if (value['commits'] > 5) and (value['company'] is None):
            print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass

Pranshu-S {'deletions': 89, 'name': 'Pranshu Shukla', 'additions': 2342, 'email': ['55992439+Pranshu-S@users.noreply.github.com'], 'commits': 6, 'company': None}
Rishikesh1159 {'deletions': 329, 'name': 'Rishikesh', 'additions': 938, 'email': ['62345295+Rishikesh1159@users.noreply.github.com'], 'commits': 15, 'company': None}
opensearch-trigger-bot[bot] {'deletions': 12, 'name': None, 'additions': 342, 'email': ['98922864+opensearch-trigger-bot[bot]@users.noreply.github.com', 'opensearch-trigger-bot[bot]@users.noreply.github.com'], 'commits': 10, 'company': None}
dzane17 {'deletions': 530, 'name': 'David Zane', 'additions': 2502, 'email': ['38449481+dzane17@users.noreply.github.com'], 'commits': 11, 'company': None}
rajiv-kv {'deletions': 186, 'name': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11, 'company': None}
vikasvb90 {'deletions': 282, 'name': 'Vikas Bansal', 'additions': 2435, 'email': ['43470111+vikasvb90@users.noreply.github.

In [16]:
# Manual Fixes
person_dict_p1y['nknize']['company'] = 'Amazon' # https://www.linkedin.com/in/nknize/
person_dict_p1y['Rishikesh1159']['company'] = 'Amazon' # https://www.linkedin.com/in/rishikesh-reddy-pasham-678271164/
person_dict_p1y['rishabhmaurya']['company'] = 'Amazon' # https://www.linkedin.com/in/rishabh-maurya/
person_dict_p1y['shourya035']['company'] = 'Amazon' # https://www.linkedin.com/in/shourya-dutta-biswas-436a0b132/
person_dict_p1y['bowenlan-amzn']['company'] = 'Amazon' # https://www.linkedin.com/in/lanbowen23/
person_dict_p1y['harshavamsi']['company'] = 'Amazon' # https://www.linkedin.com/in/harshavamsi/
person_dict_p1y['vikasvb90']['company'] = 'Amazon' # https://www.linkedin.com/in/vikasbansal1/
person_dict_p1y['mattweber']['company'] = 'Amazon' # https://www.linkedin.com/in/matthew-g-weber/


# Remove bots
del person_dict_p1y['dependabot[bot]']
del person_dict_p1y['opensearch-trigger-bot[bot]']
del person_dict_p1y['opensearch-ci-bot']

In [17]:
org_people = 0
org_commits = 0
org_additions = 0
org_deletions = 0

other_people = 0
other_commits = 0
other_additions = 0
other_deletions = 0

for key,value in person_dict_p1y.items():
    try:
        if value['commits'] >= 10:
            if value['company'] == 'Amazon':
                org_people += 1
                org_commits = org_commits + value['commits']
                org_additions = org_additions + value['additions']
                org_deletions = org_deletions + value['deletions']
            else:
                other_people += 1
                other_commits = other_commits + value['commits']
                other_additions = other_additions + value['additions']
                other_deletions = other_deletions + value['deletions']
                print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass
    
print("\nAmazon / AWS Employees with >= 10 commits:", "\n* People:", org_people, format(org_people/people, ".2%"), "of people")
print("* Commits:", org_commits, format(org_commits/commits, ".2%"), "of total commits")
print("* Additions:", org_additions, format(org_additions/additions, ".2%"), "of total additions")
print("* Deletions:", org_deletions, format(org_deletions/deletions, ".2%"), "of total deletions")

print("\nNon-Employees with >= 10 commits:", "\n* People:", other_people, format(other_people/people, ".2%"), "of people")
print("* Commits:", other_commits, format(other_commits/commits, ".2%"), "of total commits")
print("* Additions:", other_additions, format(other_additions/additions, ".2%"), "of total additions")
print("* Deletions:", other_deletions, format(other_deletions/deletions, ".2%"), "of total deletions")
      
print("\nTotals in dataset of people with >=10 commits:")
print('*', format((other_additions + org_additions)/additions, ".2%"), "of total additions")
print('*', format((other_deletions + org_deletions)/deletions, ".2%"), "of total deletions")

reta {'deletions': 8175, 'name': 'Andriy Redko', 'additions': 31718, 'email': ['andriy.redko@aiven.io', 'reta@users.noreply.github.com', 'drreta@gmail.com'], 'commits': 182, 'company': 'Aiven'}
dzane17 {'deletions': 530, 'name': 'David Zane', 'additions': 2502, 'email': ['38449481+dzane17@users.noreply.github.com'], 'commits': 11, 'company': None}
rajiv-kv {'deletions': 186, 'name': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11, 'company': None}
SwethaGuptha {'deletions': 523, 'name': None, 'additions': 1792, 'email': ['156877431+SwethaGuptha@users.noreply.github.com'], 'commits': 11, 'company': None}
kkewwei {'deletions': 118, 'name': 'kkewwei', 'additions': 2412, 'email': ['kewei.11@bytedance.com', 'kkewwei@163.com'], 'commits': 16, 'company': None}
bugmakerrrrrr {'deletions': 404, 'name': 'panguixin', 'additions': 1029, 'email': ['panguixin@bytedance.com'], 'commits': 11, 'company': None}

Amazon / AWS Employees with >= 10 commits: 

In [18]:
# Make it easy for the print statements to be copied into a Markdown table
print('|Person|Company|Commits|Additions|Deletions|')
print('|:---|:---|:---|:---|:---|')

for key,value in person_dict_p1y.items():
    try:
        if (value['commits'] > 5) and (value['company'] != 'Amazon'):
            print('|',key,'|',value['company'],'|',value['commits'],'|',value['additions'],'|',value['deletions'],'|')
    except (KeyError, TypeError):
        pass

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| skumawat2025 | IIT Kharagpur | 8 | 2615 | 523 |
| Pranshu-S | None | 6 | 2342 | 89 |
| reta | Aiven | 182 | 31718 | 8175 |
| dzane17 | None | 11 | 2502 | 530 |
| rajiv-kv | None | 11 | 3410 | 186 |
| SwethaGuptha | None | 11 | 1792 | 523 |
| lukas-vlcek | Aiven | 9 | 497 | 69 |
| kkewwei | None | 16 | 2412 | 118 |
| HUSTERGS | @ByteDance | 6 | 260 | 48 |
| akolarkunnu | NetApp | 6 | 48 | 63 |
| gargharsh3134 | None | 8 | 996 | 222 |
| bugmakerrrrrr | None | 11 | 1029 | 404 |


# After LF (2024-09-16 - 2025-03-16)

In [19]:
from pprint import pprint
import collections
import pandas as pd
import pickle

# Pickle files generated by this script:
# https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/fork-case-study/commits_people.py

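# NOTE: this is the same 2023-09-16 to 2024-09-16 file loaded in the previous section,
# not a 2024-09-16 to 2025-03-16 extract, so the totals below repeat that section's.
people_pickle_lf6mo = '../data-files/OpenSearch_people_2023-09-16_2024-09-16.pkl'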

with open(people_pickle_lf6mo, 'rb') as f:
    person_dict_lf6mo = pickle.load(f)
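
The cell above loads the 2023-09-16 to 2024-09-16 file under a heading for 2024-09-16 to 2025-03-16. One way to keep the heading and the file in step is to build the path from the date range. This is a sketch only: the helper name `people_pickle_path` is hypothetical, the naming pattern is inferred from the file names used in this notebook, and whether a pickle for the later period exists is not known from the source.

```python
from pathlib import Path


def people_pickle_path(start, end, data_dir="../data-files", project="OpenSearch"):
    """Build the pickle path for a period from its ISO start and end dates."""
    return Path(data_dir) / f"{project}_people_{start}_{end}.pkl"


# Define the period once, then reuse it for both the heading and the path.
period = ("2021-04-12", "2024-09-16")
path = people_pickle_path(*period)
print(f"# Period {period[0]} - {period[1]}")
print(path.as_posix())
```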

In [20]:
people = len(person_dict_lf6mo)
commits = 0
additions = 0
deletions = 0

for key,value in person_dict_lf6mo.items():
    # Normalize company names and use emails to derive Amazon affiliations
    if value['company'] is None:
        for email in value['email']:
            if "amazon.com" in email:
                person_dict_lf6mo[key]['company'] = 'Amazon'
    elif any(x in value['company'].lower() for x in ['aws','amazon']):
        person_dict_lf6mo[key]['company'] = 'Amazon'
    elif "aiven" in value['company'].lower():
        person_dict_lf6mo[key]['company'] = 'Aiven'
        
    # Get descriptive statistics
    commits = commits + value['commits']
    additions = additions + value['additions']
    deletions = deletions + value['deletions']
    
print("People:", people)
print("Commits:", commits)
print("Additions", additions)
print("Deletions", deletions)

People: 167
Commits: 1890
Additions 375749
Deletions 75677


In [21]:
# Before Cleanup
for key,value in person_dict_lf6mo.items():
    try:
        if (value['commits'] > 5) and (value['company'] is None):
            print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass

Pranshu-S {'deletions': 89, 'name': 'Pranshu Shukla', 'additions': 2342, 'email': ['55992439+Pranshu-S@users.noreply.github.com'], 'commits': 6, 'company': None}
Rishikesh1159 {'deletions': 329, 'name': 'Rishikesh', 'additions': 938, 'email': ['62345295+Rishikesh1159@users.noreply.github.com'], 'commits': 15, 'company': None}
opensearch-trigger-bot[bot] {'deletions': 12, 'name': None, 'additions': 342, 'email': ['98922864+opensearch-trigger-bot[bot]@users.noreply.github.com', 'opensearch-trigger-bot[bot]@users.noreply.github.com'], 'commits': 10, 'company': None}
dzane17 {'deletions': 530, 'name': 'David Zane', 'additions': 2502, 'email': ['38449481+dzane17@users.noreply.github.com'], 'commits': 11, 'company': None}
rajiv-kv {'deletions': 186, 'name': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11, 'company': None}
vikasvb90 {'deletions': 282, 'name': 'Vikas Bansal', 'additions': 2435, 'email': ['43470111+vikasvb90@users.noreply.github.

In [22]:
# Manual Fixes
person_dict_lf6mo['Rishikesh1159']['company'] = 'Amazon' # https://www.linkedin.com/in/rishikesh-reddy-pasham-678271164/
person_dict_lf6mo['rishabhmaurya']['company'] = 'Amazon' # https://www.linkedin.com/in/rishabh-maurya/
person_dict_lf6mo['shourya035']['company'] = 'Amazon' # https://www.linkedin.com/in/shourya-dutta-biswas-436a0b132/
person_dict_lf6mo['vikasvb90']['company'] = 'Amazon' # https://www.linkedin.com/in/vikasbansal1/

person_dict_lf6mo['Pranshu-S']['company'] = 'Amazon' # https://www.linkedin.com/in/pranshu-shukla-50a79931/
person_dict_lf6mo['dzane17']['company'] = 'Amazon' # https://opensearch.org/blog/author/david-zane/
person_dict_lf6mo['SwethaGuptha']['company'] = 'Amazon' # https://www.linkedin.com/in/swetha-g-23209a147/
person_dict_lf6mo['harshavamsi']['company'] = 'Amazon' # https://www.linkedin.com/in/harshavamsi/

# Remove bots
del person_dict_lf6mo['dependabot[bot]']
del person_dict_lf6mo['opensearch-trigger-bot[bot]']
del person_dict_lf6mo['opensearch-ci-bot']

In [23]:
# Unknowns After Cleanup
for key,value in person_dict_lf6mo.items():
    try:
        if (value['commits'] > 5) and (value['company'] is None):
            print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass

rajiv-kv {'deletions': 186, 'name': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11, 'company': None}
kkewwei {'deletions': 118, 'name': 'kkewwei', 'additions': 2412, 'email': ['kewei.11@bytedance.com', 'kkewwei@163.com'], 'commits': 16, 'company': None}
gargharsh3134 {'deletions': 222, 'name': None, 'additions': 996, 'email': ['51459091+gargharsh3134@users.noreply.github.com'], 'commits': 8, 'company': None}
bugmakerrrrrr {'deletions': 404, 'name': 'panguixin', 'additions': 1029, 'email': ['panguixin@bytedance.com'], 'commits': 11, 'company': None}
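
Two of the remaining unknowns above have corporate-looking email domains even though their profile company is empty. The sketch below surfaces such domains as hints for manual review. It is an illustration, not part of the notebook: the function name and the list of personal domains are assumptions, and an email domain is only a hint, not proof of current employment.

```python
# Domains that say nothing about an employer; this set is an assumption.
PERSONAL_DOMAINS = {"users.noreply.github.com", "gmail.com", "163.com"}


def employer_domain_hints(emails):
    """Return the sorted email domains that are not in PERSONAL_DOMAINS."""
    hints = set()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain not in PERSONAL_DOMAINS:
            hints.add(domain)
    return sorted(hints)


print(employer_domain_hints(["kewei.11@bytedance.com", "kkewwei@163.com"]))
print(employer_domain_hints(["157019998+rajiv-kv@users.noreply.github.com"]))
```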


In [24]:
org_people = 0
org_commits = 0
org_additions = 0
org_deletions = 0

other_people = 0
other_commits = 0
other_additions = 0
other_deletions = 0

for key,value in person_dict_lf6mo.items():
    try:
        if value['commits'] >= 10:
            if value['company'] == 'Amazon':
                org_people += 1
                org_commits = org_commits + value['commits']
                org_additions = org_additions + value['additions']
                org_deletions = org_deletions + value['deletions']
            else:
                other_people += 1
                other_commits = other_commits + value['commits']
                other_additions = other_additions + value['additions']
                other_deletions = other_deletions + value['deletions']
                print(key,value)
    except (KeyError, TypeError):
        # Skip records that lack the expected fields
        pass
    
print("\nAmazon / AWS Employees with >= 10 commits:", "\n* People:", org_people, format(org_people/people, ".2%"), "of people")
print("* Commits:", org_commits, format(org_commits/commits, ".2%"), "of total commits")
print("* Additions:", org_additions, format(org_additions/additions, ".2%"), "of total additions")
print("* Deletions:", org_deletions, format(org_deletions/deletions, ".2%"), "of total deletions")

print("\nNon-Employees with >= 10 commits:", "\n* People:", other_people, format(other_people/people, ".2%"), "of people")
print("* Commits:", other_commits, format(other_commits/commits, ".2%"), "of total commits")
print("* Additions:", other_additions, format(other_additions/additions, ".2%"), "of total additions")
print("* Deletions:", other_deletions, format(other_deletions/deletions, ".2%"), "of total deletions")
      
print("\nTotals in dataset of people with >=10 commits:")
print('*', format((other_additions + org_additions)/additions, ".2%"), "of total additions")
print('*', format((other_deletions + org_deletions)/deletions, ".2%"), "of total deletions")

reta {'deletions': 8175, 'name': 'Andriy Redko', 'additions': 31718, 'email': ['andriy.redko@aiven.io', 'reta@users.noreply.github.com', 'drreta@gmail.com'], 'commits': 182, 'company': 'Aiven'}
rajiv-kv {'deletions': 186, 'name': None, 'additions': 3410, 'email': ['157019998+rajiv-kv@users.noreply.github.com'], 'commits': 11, 'company': None}
kkewwei {'deletions': 118, 'name': 'kkewwei', 'additions': 2412, 'email': ['kewei.11@bytedance.com', 'kkewwei@163.com'], 'commits': 16, 'company': None}
bugmakerrrrrr {'deletions': 404, 'name': 'panguixin', 'additions': 1029, 'email': ['panguixin@bytedance.com'], 'commits': 11, 'company': None}

Amazon / AWS Employees with >= 10 commits: 
* People: 42 25.15% of people
* Commits: 945 50.00% of total commits
* Additions: 242075 64.42% of total additions
* Deletions: 49947 66.00% of total deletions

Non-Employees with >= 10 commits: 
* People: 4 2.40% of people
* Commits: 220 11.64% of total commits
* Additions: 38569 10.26% of total additions
* Dele
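
The same tally is repeated for each time window in this notebook, so it could be factored into a helper. The following is a minimal sketch, not the notebook's actual code: the function name `summarize_by_company` and its parameters are hypothetical, and it assumes each record is a dict with `commits`, `additions`, `deletions`, and `company` keys, as in the printed output above. The sample rows are the four non-Amazon contributors printed by the cell.

```python
def summarize_by_company(people, company="Amazon", min_commits=10):
    """Tally stats for `company` vs. everyone else (hypothetical helper)."""
    fields = ("commits", "additions", "deletions")
    totals = {
        "org": dict.fromkeys(("people",) + fields, 0),
        "other": dict.fromkeys(("people",) + fields, 0),
    }
    for stats in people.values():
        commits = stats.get("commits")
        # Skip incomplete records and anyone below the commit threshold.
        if commits is None or commits < min_commits:
            continue
        bucket = totals["org"] if stats.get("company") == company else totals["other"]
        bucket["people"] += 1
        for field in fields:
            bucket[field] += stats.get(field) or 0
    return totals


# Non-Amazon rows copied from the cell output above.
sample = {
    "reta": {"commits": 182, "additions": 31718, "deletions": 8175, "company": "Aiven"},
    "rajiv-kv": {"commits": 11, "additions": 3410, "deletions": 186, "company": None},
    "kkewwei": {"commits": 16, "additions": 2412, "deletions": 118, "company": None},
    "bugmakerrrrrr": {"commits": 11, "additions": 1029, "deletions": 404, "company": None},
}

summary = summarize_by_company(sample)
print(summary["other"])
# {'people': 4, 'commits': 220, 'additions': 38569, 'deletions': 8883}
```

The people, commit, and addition totals for these four rows match the "Non-Employees" figures printed above (4, 220, and 38569).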

In [25]:
# Make it easy for the print statements to be copied into a Markdown table.
# Note: this table uses > 5 commits, not the >= 10 threshold used in the summary above.
print('|Person|Company|Commits|Additions|Deletions|')
print('|:---|:---|:---|:---|:---|')

for key, value in person_dict_lf6mo.items():
    try:
        if value['commits'] > 5 and value['company'] != 'Amazon':
            print(f"| {key} | {value['company']} | {value['commits']} | {value['additions']} | {value['deletions']} |")
    except (KeyError, TypeError):
        # Skip records with missing or non-numeric fields.
        pass

|Person|Company|Commits|Additions|Deletions|
|:---|:---|:---|:---|:---|
| skumawat2025 | IIT Kharagpur | 8 | 2615 | 523 |
| reta | Aiven | 182 | 31718 | 8175 |
| rajiv-kv | None | 11 | 3410 | 186 |
| lukas-vlcek | Aiven | 9 | 497 | 69 |
| kkewwei | None | 16 | 2412 | 118 |
| HUSTERGS | @ByteDance | 6 | 260 | 48 |
| akolarkunnu | NetApp | 6 | 48 | 63 |
| gargharsh3134 | None | 8 | 996 | 222 |
| bugmakerrrrrr | None | 11 | 1029 | 404 |


# People who contributed to Elasticsearch before the relicense and have also contributed to OpenSearch

In [26]:
elastic_pickle_1yr = '../data-files/elasticsearch_people_2020-02-03_2021-02-03.pkl'

with open(elastic_pickle_1yr, 'rb') as f:
    elastic_dict_1yr = pickle.load(f)

common_contrib = set(elastic_dict_1yr) & set(person_dict)
print("Common Contributors:", common_contrib)
print(len(common_contrib))

Common Contributors: {'nknize', 'gaobinlong', 'russcam', 'Gaurav614', 'rursprung', 'malpani', 'Bukhtawar', 'uschindler', 'markharwood'}
9
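
The overlap above is a set intersection over dictionary keys, so two people match only when the key (which appears to be the GitHub username) is identical in both datasets. The following is a small sketch of that behavior. The commit counts for `nknize` and `russcam` come from the outputs in this notebook, while the two `hypothetical-*` entries are made up for illustration.

```python
# Toy stand-ins for elastic_dict_1yr and person_dict.
elastic_sample = {
    "nknize": {"commits": 6},
    "russcam": {"commits": 19},
    "hypothetical-es-only": {"commits": 3},  # made-up entry
}
opensearch_sample = {
    "nknize": {"commits": 215},
    "russcam": {"commits": 1},
    "hypothetical-os-only": {"commits": 12},  # made-up entry
}

# dict.keys() views support set operations directly;
# set(d1) & set(d2) gives the same result.
common = elastic_sample.keys() & opensearch_sample.keys()
print(sorted(common))
# ['nknize', 'russcam']
```

Sorting the result keeps the printed order stable between runs, which a bare `set` does not guarantee.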


In [27]:
# Contributions to OpenSearch
# (the >= 0 filter keeps every common contributor; raise it to narrow the list)
for x in common_contrib:
    if person_dict[x]['commits'] >= 0:
        print(x, person_dict[x])

nknize {'name': 'Nick Knize', 'deletions': 92793, 'company': 'Amazon', 'additions': 78187, 'email': ['nknize@apache.org', 'nknize@gmail.com'], 'commits': 215}
gaobinlong {'name': 'gaobinlong', 'deletions': 530, 'company': 'Amazon', 'additions': 11046, 'email': ['gbinlong@amazon.com', 'gbl_long@163.com'], 'commits': 59}
russcam {'name': 'Russ Cam', 'deletions': 3, 'company': 'Search Pioneer', 'additions': 187, 'email': ['russ.cam@forloop.co.uk'], 'commits': 1}
Gaurav614 {'name': 'Gaurav Chandani', 'deletions': 355, 'company': 'Amazon', 'additions': 5875, 'email': ['chngau@amazon.com'], 'commits': 6}
rursprung {'name': 'Ralph Ursprung', 'deletions': 363, 'company': 'avaloq', 'additions': 202, 'email': ['39383228+rursprung@users.noreply.github.com'], 'commits': 6}
malpani {'name': None, 'deletions': 240, 'company': 'Amazon', 'additions': 3734, 'email': ['malpani@amazon.com'], 'commits': 2}
Bukhtawar {'name': 'Bukhtawar Khan', 'deletions': 3426, 'company': 'Amazon', 'additions': 17537, 'em

In [28]:
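# Contributions to Elasticsearch in the 2020-02-03 to 2021-02-03 window
# (only common contributors with >= 5 Elasticsearch commits are printed)
for x in common_contrib:
    if elastic_dict_1yr[x]['commits'] >= 5:
        print(x, elastic_dict_1yr[x])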

nknize {'company': 'Apache', 'name': 'Nick Knize', 'commits': 6, 'additions': 1852, 'deletions': 1334, 'email': ['nknize@gmail.com', 'nknize@apache.org']}
gaobinlong {'company': 'AWS', 'name': 'gaobinlong', 'commits': 49, 'additions': 3450, 'deletions': 572, 'email': ['gbl_long@163.com', 'gbinlong@amazon.com']}
russcam {'company': None, 'name': 'Russ Cam', 'commits': 19, 'additions': 2152, 'deletions': 390, 'email': ['russ.cam@elastic.co']}
markharwood {'company': None, 'name': None, 'commits': 26, 'additions': 6320, 'deletions': 1075, 'email': ['markharwood@gmail.com']}
