# Rabbit with ghmap
This notebook demonstrates how to use GitHubManager to extract GitHub events, convert them into activities, and predict whether a user is a bot using the BIMBAS model.

- GitHubManager is used to extract GitHub events, convert them into activities and extract features.
- The model_utils module is used to load the BIMBAS model and make predictions.


# 1 - Extract activities

## 1.1 - Setup the variables
An API key is not required to extract the events, but it is needed to make more than 15 queries per hour.

By default, this notebook uses ghmap to extract the activities. To use the RABBIT mapping instead, set the `ghmap` parameter to `False`.

In [2]:
from gitbot_utils.gh_api import GitHubManager

API_KEY = None # Set KEY
USER = 'robodoo'

gh_manager = GitHubManager(API_KEY, max_queries=3, min_events=5, ghmap=True)
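
Hard-coding a key in a notebook makes it easy to leak. One option, sketched below, is to read it from an environment variable and fall back to `None`, which matches the unauthenticated default used above. The variable name `GITHUB_API_KEY` and the helper `read_api_key` are hypothetical and not part of gitbot_utils.

```python
import os


def read_api_key(env_var="GITHUB_API_KEY"):
    """Return the key stored in env_var, or None if it is unset or blank."""
    # env_var is a hypothetical name; use whatever your environment defines.
    value = os.environ.get(env_var, "").strip()
    return value or None


API_KEY = read_api_key()
```

The result can be passed as the first argument of `GitHubManager`, exactly as `API_KEY` is in the cell above.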

## 1.2 - Get the raw events from the user
The events are extracted from the GitHub API. At most `max_queries` queries are made, and each query returns up to 100 events.
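
To make the query budget concrete, the sketch below computes the upper bound on the number of events and lists the page URLs that such a query loop would plausibly request. It assumes GitHubManager pages through the GitHub REST endpoint `/users/{username}/events` with `per_page=100`; that is an assumption about its internals, and `event_page_urls` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

PER_PAGE = 100  # events requested per query


def event_page_urls(user, max_queries):
    """Build one URL per query, from page 1 up to max_queries."""
    base = f"https://api.github.com/users/{quote(user)}/events"
    return [
        f"{base}?{urlencode({'per_page': PER_PAGE, 'page': page})}"
        for page in range(1, max_queries + 1)
    ]


urls = event_page_urls("robodoo", max_queries=3)
max_events = PER_PAGE * len(urls)  # upper bound on events for 3 queries
```
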

In [8]:
raw_events = gh_manager.query_events(USER)
raw_events[0]

{'id': '49656354046',
 'type': 'PushEvent',
 'actor': {'id': 16837285,
  'login': 'robodoo',
  'display_login': 'robodoo',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/robodoo',
  'avatar_url': 'https://avatars.githubusercontent.com/u/16837285?'},
 'repo': {'id': 232060976,
  'name': 'odoo/o-spreadsheet',
  'url': 'https://api.github.com/repos/odoo/o-spreadsheet'},
 'payload': {'repository_id': 232060976,
  'push_id': 24264943310,
  'size': 2,
  'distinct_size': 2,
  'ref': 'refs/heads/staging.17.0',
  'head': '31c642c8a453afcfd29af16803190abee01c01b8',
  'before': '485b23caaed69416abb311db5bec15a2f7ffb6b1',
  'commits': [{'sha': 'e1458ae8347c6dfc18d6e8fab18994389e5ff998',
    'author': {'email': 'lul@odoo.com', 'name': 'Lucas Lefèvre (lul)'},
    'message': '[FIX] tokenizer: support signed scientific exponent\n\nsigns were not supported in the exponent: 1e+3 or 1e-1\n\nTask: 4766910\nPart-of: odoo/o-spreadsheet#6390\nSigned-off-by: Pierre Rousseau (pro) <pro@odoo.com>\n

## 1.3 - Extract the activities
We can now use ghmap to extract the activities from the raw events.

In [9]:
activities = gh_manager.events_to_activities(raw_events)

In [10]:
print(activities[-1])

{'activity': 'PushCommits', 'start_date': '2025-05-13T09:34:54Z', 'end_date': '2025-05-13T09:34:54Z', 'actor': {'id': 16837285, 'login': 'robodoo'}, 'repository': {'id': 232060976, 'name': 'odoo/o-spreadsheet', 'organisation': 'odoo', 'organisation_id': 6368483}, 'actions': [{'action': 'PushCommits', 'event_id': '49656354046', 'date': '2025-05-13T09:34:54Z', 'details': {'push': {'id': 24264943310, 'ref': 'refs/heads/staging.17.0', 'commits': 2}}}, {'action': 'PushCommits', 'event_id': '49656354046', 'date': '2025-05-13T09:34:54Z', 'details': {'push': {'id': 24264943310, 'ref': 'refs/heads/staging.17.0', 'commits': 2}}}, {'action': 'PushCommits', 'event_id': '49656354046', 'date': '2025-05-13T09:34:54Z', 'details': {'push': {'id': 24264943310, 'ref': 'refs/heads/staging.17.0', 'commits': 2}}}]}
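
Each activity is a plain dictionary, so standard tools are enough to inspect the result. The sketch below counts activities by type. The sample records are illustrative and keep only a few of the keys printed above, and `count_activity_types` is a hypothetical helper.

```python
from collections import Counter


def count_activity_types(activities):
    """Count how many activities of each type appear in the list."""
    return Counter(activity["activity"] for activity in activities)


# Illustrative records; real activities carry more keys (actor, repository, actions).
sample = [
    {"activity": "PushCommits", "start_date": "2025-05-13T09:34:54Z"},
    {"activity": "CommentPullRequest", "start_date": "2025-05-13T09:32:44Z"},
    {"activity": "PushCommits", "start_date": "2025-05-13T09:28:16Z"},
]
counts = count_activity_types(sample)
print(counts.most_common())
```

In the notebook, `count_activity_types(activities)` would give the same summary for the real data.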


# 2 - Extracting features
Now we can extract the features used by the BIMBAS model.
The features are divided into two groups:
- **a** : Counting metrics
- **b** : Aggregated metrics (mean, std, median, IQR, gini)
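
As an illustration of group **b**, the sketch below computes the five aggregates for a list of values using only the standard library. The exact definitions used by RABBIT are not stated here, so the choices made (population standard deviation, inclusive quartiles, and the sorted-rank Gini formula) are assumptions, and `aggregate` and `gini` are hypothetical names.

```python
import statistics


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def aggregate(values):
    """Return mean, median, std, IQR and Gini for a non-empty list."""
    if len(values) > 1:
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        q1 = q3 = values[0]  # a single value has no spread
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # assumption: population std
        "IQR": q3 - q1,
        "gini": gini(values),
    }


stats = aggregate([1, 2, 3, 4, 10])
```
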



## 2.1 - Convert to a DataFrame compatible with RABBIT
RABBIT needs a DataFrame with the columns 'date', 'activity', 'contributor' and 'repository'.
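
A minimal sketch of that reshaping is shown below: each activity dictionary is flattened into one row with the four required columns. Which activity field feeds which column is an assumption inferred from the activity printed in section 1.3, not the actual implementation of `activity_to_df`, and `activity_to_row` is a hypothetical helper.

```python
from datetime import datetime


def activity_to_row(activity):
    """Flatten one activity dict into a row with the four required columns."""
    start = datetime.strptime(activity["start_date"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "date": start,
        "activity": activity["activity"],
        "contributor": activity["actor"]["login"],
        "repository": activity["repository"]["id"],  # assumption: repository ID
    }


# Trimmed copy of the activity printed in section 1.3.
activity = {
    "activity": "PushCommits",
    "start_date": "2025-05-13T09:34:54Z",
    "actor": {"id": 16837285, "login": "robodoo"},
    "repository": {"id": 232060976, "name": "odoo/o-spreadsheet"},
}
row = activity_to_row(activity)
```

A list of such rows can be handed to `pandas.DataFrame(rows)` to obtain a table with these columns.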


In [11]:
df = gh_manager.activity_to_df(activities)
display(df)

| | date | activity | contributor | repository | owner |
|---|---|---|---|---|---|
| 0 | 2025-05-13 08:37:46 | CommentPullRequest | robodoo | 677342336 | odoo |
| 1 | 2025-05-13 08:43:22 | CommentPullRequest | robodoo | 19745004 | odoo |
| 2 | 2025-05-13 08:46:10 | PushCommits | robodoo | 19745004 | odoo |
| 3 | 2025-05-13 08:46:12 | PushCommits | robodoo | 38825372 | odoo |
| 4 | 2025-05-13 08:46:17 | ClosePullRequest | robodoo | 19745004 | odoo |
| ... | ... | ... | ... | ... | ... |
| 84 | 2025-05-13 09:28:11 | ClosePullRequest | robodoo | 232060976 | odoo |
| 85 | 2025-05-13 09:28:13 | PushCommits | robodoo | 362812569 | odoo |
| 86 | 2025-05-13 09:28:16 | PushCommits | robodoo | 38825372 | odoo |
| 87 | 2025-05-13 09:32:44 | CommentPullRequest | robodoo | 19745004 | odoo |


## 2.2 - Extract the features
Now that we have the DataFrame, we can extract the features using the RABBIT extractor.

In [12]:
df_feat = gh_manager.extract_features(df, USER)
display(df_feat)

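| | NA | NT | NR | NOR | ORR | NAR_mean | NAR_median | NAR_std | NAR_gini | NAR_IQR | ... | DCA_mean | DCA_median | DCA_std | DCA_gini | DCA_IQR | DCAT_mean | DCAT_median | DCAT_std | DCAT_gini | DCAT_IQR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| robodoo | 89 | 3 | 7.0 | 2 | 0.286 | 12.714 | 7.0 | 16.018 | 0.549 | 10.5 | ... | 0.011 | 0.002 | 0.019 | 0.712 | 0.01 | 0.013 | 0.002 | 0.02 | 0.695 | 0.018 |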
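
Two relations can be read off the row above: ORR equals NOR / NR, and NAR_mean equals NA / NR, each rounded to three decimals. This is an observation about the printed numbers, not a statement of how RABBIT defines these features. A quick check:

```python
# Values copied from the feature row above.
NA, NR, NOR = 89, 7.0, 2
ORR, NAR_mean = 0.286, 12.714

# Recompute both ratios and round to the three decimals shown in the table.
orr_check = round(NOR / NR, 3)
nar_mean_check = round(NA / NR, 3)
print(orr_check == ORR, nar_mean_check == NAR_mean)
```
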


To show how GitHubManager works, we ran each step manually. You can also call the `compute_features` method directly to extract the features for a user. It performs the following steps:
1. Query the events from the user. (If the user has fewer than `min_events` events, it returns `None`.)
2. Convert the events to activities.
3. Convert the activities to a DataFrame.
4. Extract the features from the DataFrame.
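
The four steps can be sketched as a single function. The method names are the ones used earlier in this notebook; the body is an assumption about how `compute_features` chains them, not its actual source, and `compute_features_sketch` is a hypothetical name. The manager is passed in as an argument so the sketch does not depend on gitbot_utils being importable.

```python
def compute_features_sketch(manager, user, min_events):
    """Chain the four manual steps; return None when there are too few events."""
    # Step 1: query the raw events and stop early if there are too few.
    raw_events = manager.query_events(user)
    if len(raw_events) < min_events:
        return None
    # Step 2: convert the events to activities.
    activities = manager.events_to_activities(raw_events)
    # Step 3: convert the activities to a DataFrame.
    df = manager.activity_to_df(activities)
    # Step 4: extract the features from the DataFrame.
    return manager.extract_features(df, user)
```

With the objects defined above, `compute_features_sketch(gh_manager, USER, min_events=5)` would mirror the manual walk-through.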


In [19]:
df_feat = gh_manager.compute_features(USER)
display(df_feat)

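| | NA | NT | NR | NOR | ORR | NAR_mean | NAR_median | NAR_std | NAR_gini | NAR_IQR | ... | DCA_mean | DCA_median | DCA_std | DCA_gini | DCA_IQR | DCAT_mean | DCAT_median | DCAT_std | DCAT_gini | DCAT_IQR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| robodoo | 90 | 3 | 7.0 | 2 | 0.286 | 12.857 | 6.0 | 18.106 | 0.6 | 10.5 | ... | 0.012 | 0.002 | 0.03 | 0.76 | 0.008 | 0.012 | 0.002 | 0.019 | 0.701 | 0.012 |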


# 3 - Predict if the user is a bot or not
We can now use the BIMBAS model to predict if the user is a bot or not.

In [18]:
import gitbot_utils.model_utils as mod

bimbas = mod.load_model("../resources/models/bimbas.joblib")

label, confidence = mod.predict_contributor(df_feat, bimbas)
print(f"Contributor {USER} is a {label} with confidence {confidence}")

Contributor robodoo is a Bot with confidence 0.905
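
Because `compute_features` returns None for users with too few events, it can be worth guarding the prediction step. The sketch below wraps any predictor with that check. `predict_or_skip` is a hypothetical helper, the `"Unknown"` label is an illustrative choice, and the `(label, confidence)` return shape is taken from the call above.

```python
def predict_or_skip(features, predict):
    """Call predict(features), or return ("Unknown", None) if features is None."""
    if features is None:
        # Too few events were available, so no features were computed.
        return "Unknown", None
    return predict(features)
```

In this notebook the call would look like `predict_or_skip(df_feat, lambda f: mod.predict_contributor(f, bimbas))`.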
