# Rabbit with ghmap
This notebook shows how to extract the activities of a GitHub user with the `ghmap` package and then compute the features used by the `rabbit` package.

In [1]:
from src.bimbas_ghmap import query_events, events_to_activities, activity_to_df

import important_features as rabbit_features

# 1 - Extracting activities

## 1.1 - Setup the variables
We need a GitHub API key and the username of the account whose activities we want to extract.

In [2]:
API_KEY = None  # Set your GitHub API key here
USER = 'robodoo'
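
Rather than pasting a token into the notebook, the key can be read from the environment. The following is a minimal sketch, assuming the token is stored in an environment variable named `GITHUB_TOKEN`; both that name and the helper `load_api_key` are illustrative choices, not something `ghmap` or `rabbit` prescribes.

```python
import os


def load_api_key(var_name="GITHUB_TOKEN"):
    """Hypothetical helper: return the token stored in `var_name`, or None if unset or empty."""
    return os.environ.get(var_name) or None


# Same variables as in the cell above, with the key taken from the environment.
API_KEY = load_api_key()
USER = 'robodoo'
```

If the variable is not set, `API_KEY` stays `None`, which matches the placeholder used in the cell above.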

## 1.2 - Get the raw events from the user
By default, `ghmap` needs raw GitHub events to extract the activities, so we first query the events of the user.

In [3]:
raw_events = query_events(USER, API_KEY)
raw_events[-1]

{'id': '45243062685',
 'type': 'IssueCommentEvent',
 'actor': {'id': 16837285,
  'login': 'robodoo',
  'display_login': 'robodoo',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/robodoo',
  'avatar_url': 'https://avatars.githubusercontent.com/u/16837285?'},
 'repo': {'id': 19745004,
  'name': 'odoo/odoo',
  'url': 'https://api.github.com/repos/odoo/odoo'},
 'payload': {'action': 'created',
  'issue': {'url': 'https://api.github.com/repos/odoo/odoo/issues/192391',
   'repository_url': 'https://api.github.com/repos/odoo/odoo',
   'labels_url': 'https://api.github.com/repos/odoo/odoo/issues/192391/labels{/name}',
   'comments_url': 'https://api.github.com/repos/odoo/odoo/issues/192391/comments',
   'events_url': 'https://api.github.com/repos/odoo/odoo/issues/192391/events',
   'html_url': 'https://github.com/odoo/odoo/pull/192391',
   'id': 2767897765,
   'node_id': 'PR_kwDOAS1I7M6Gremp',
   'number': 192391,
   'title': '[FW][REV] l10n_cl: create EDI document for credit note
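
Each raw event is a dictionary whose `type` and `repo` keys identify what happened and where, as in the output above. Before mapping, it can be useful to tally the events by type as a sanity check. The sketch below uses a small hand-written stand-in for `raw_events`; the sample list and the helper `count_event_types` are illustrative, not real data or part of `ghmap`.

```python
from collections import Counter


def count_event_types(events):
    """Hypothetical helper: count events per GitHub event type."""
    return Counter(event.get("type", "Unknown") for event in events)


# Illustrative stand-in for raw_events; only the keys used here are included.
sample_events = [
    {"type": "IssueCommentEvent", "repo": {"name": "odoo/odoo"}},
    {"type": "PushEvent", "repo": {"name": "odoo/odoo"}},
    {"type": "PushEvent", "repo": {"name": "odoo/documentation"}},
]

event_counts = count_event_types(sample_events)
print(event_counts.most_common())
```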

## 1.3 - Extract the activities
We can now use `ghmap` to extract the activities from the raw events.

In [4]:
activities = events_to_activities(raw_events)

Mapping events to actions: 100%|██████████| 243/243 [00:00<00:00, 19301.87event/s]
Mapping actions to activities: 100%|██████████| 4/4 [00:00<00:00, 276.74group/s]


In [5]:
print(activities[-1])

{'activity': 'CommentPullRequest', 'start_date': '2025-01-05T05:37:27Z', 'end_date': '2025-01-05T05:37:27Z', 'actor': {'id': 16837285, 'login': 'robodoo'}, 'repository': {'id': 19745004, 'name': 'odoo/odoo', 'organisation': 'odoo', 'organisation_id': 6368483}, 'actions': [{'action': 'CreatePullRequestComment', 'event_id': '45266197403', 'date': '2025-01-05T05:37:27Z', 'details': {'pull_request': {'id': 2769124955, 'number': 192434, 'title': '[FIX] hr_timesheet_attendance: fix the timezone related issue', 'state': 'open', 'author': {'id': 24606113, 'login': 'ppr-odoo'}, 'labels': [], 'created_date': '2025-01-05T05:37:22Z', 'updated_date': '2025-01-05T05:37:27Z', 'closed_date': None, 'merged_date': None}, 'comment': {'id': 2571509387, 'position': 1}}}]}


# 2 - Extracting features
Now we can extract the features used by the BIMBAS model.
The features are divided into two groups:
- **a** : Counting metrics
- **b** : Aggregated metrics (mean, std, median, IQR, gini)
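
The aggregated metrics of group **b** can be sketched with the standard library alone. The helper names below are hypothetical, and the exact formulas used by `important_features` are not confirmed here: this sketch assumes a sample standard deviation, quartiles by linear interpolation, and the rank-based form of the Gini coefficient. The sample counts `[45, 50, 133]` are illustrative; they were picked because they sum to 228 and reproduce the `NAT_*` values shown in section 2.2 under these assumptions.

```python
import statistics


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def aggregate(values):
    """Hypothetical helper: the five aggregated statistics (needs at least 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # assumption: sample standard deviation
        "gini": gini(values),
        "IQR": q3 - q1,
    }


# Illustrative counts of activities per activity type.
summary = aggregate([45, 50, 133])
print({name: round(value, 3) for name, value in summary.items()})
```

With these counts the summary rounds to a mean of 76, a median of 50, a standard deviation of 49.427, a Gini coefficient of 0.257 and an IQR of 44.0.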



## 2.1 - Convert to a DataFrame compatible with RABBIT
RABBIT needs a DataFrame with the columns 'date', 'activity', 'contributor' and 'repository'.


In [6]:
df = activity_to_df(activities)
display(df)

|     | date | activity | contributor | repository |
| --- | --- | --- | --- | --- |
| 0 | 2025-01-03 17:04:22 | CommentPullRequest | robodoo | odoo/odoo |
| 1 | 2025-01-03 17:12:47 | PushCommits | robodoo | odoo/odoo |
| 2 | 2025-01-03 17:12:50 | PushCommits | robodoo | odoo/odoo |
| 3 | 2025-01-03 17:12:56 | ClosePullRequest | robodoo | odoo/odoo |
| 4 | 2025-01-03 17:13:01 | PushCommits | robodoo | odoo/odoo |
| ... | ... | ... | ... | ... |
| 223 | 2025-01-04 18:42:12 | PushCommits | robodoo | odoo/odoo |
| 224 | 2025-01-04 18:42:17 | PushCommits | robodoo | odoo/design-themes |
| 225 | 2025-01-04 18:42:18 | PushCommits | robodoo | odoo/documentation |
| 226 | 2025-01-04 19:58:42 | CommentPullRequest | robodoo | odoo/odoo |
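
The conversion in this step can be pictured as flattening each nested activity record into one row with the four required columns. The sketch below is not the implementation of `activity_to_df`: the helper `activity_to_row` is hypothetical, and taking the row date from `start_date` is an assumption. The sample record reuses a subset of the fields printed in section 1.3.

```python
from datetime import datetime


def activity_to_row(activity):
    """Hypothetical helper: flatten one ghmap-style activity into a row dict."""
    # Assumption: the row date is the activity's start_date.
    date = datetime.strptime(activity["start_date"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "date": date,
        "activity": activity["activity"],
        "contributor": activity["actor"]["login"],
        "repository": activity["repository"]["name"],
    }


# Subset of the activity printed in section 1.3.
sample_activity = {
    "activity": "CommentPullRequest",
    "start_date": "2025-01-05T05:37:27Z",
    "end_date": "2025-01-05T05:37:27Z",
    "actor": {"id": 16837285, "login": "robodoo"},
    "repository": {"id": 19745004, "name": "odoo/odoo"},
}

row = activity_to_row(sample_activity)
print(row)
```

Passing a list of such rows to `pandas.DataFrame` would give a frame with the columns 'date', 'activity', 'contributor' and 'repository'.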


## 2.2 - Extract the features
Now that we have the DataFrame, we can extract the features using the RABBIT feature extractor.

In [7]:
df_feat = (
    rabbit_features.extract_features(df)
    .set_index([[USER]]) # Set the row index to the username
)
display(df_feat)

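|     | NA | NT | NOR | ORR | DCA_mean | DCA_median | DCA_std | DCA_gini | NAR_mean | NAR_median | ... | DCAT_mean | DCAT_median | DCAT_std | DCAT_gini | DCAT_IQR | NAT_mean | NAT_median | NAT_std | NAT_gini | NAT_IQR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| robodoo | 228 | 3 | 1 | 0.25 | 0.161 | 0.002 | 0.819 | 0.93 | 57.0 | 37.5 | ... | 0.178 | 0.002 | 0.732 | 0.914 | 0.031 | 76.0 | 50.0 | 49.427 | 0.257 | 44.0 |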
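
The first four columns are the counting metrics. The sketch below shows one plausible way to compute them, assuming that `NA` is the number of activities, `NT` the number of distinct activity types, `NOR` the number of distinct repository owners, and `ORR` the ratio of owners to repositories. This reading is an assumption, although it fits the output above, where a single owner (`odoo`) gives `NOR` = 1. The helper `counting_features` is hypothetical, and the sample rows are taken from the DataFrame preview in section 2.1.

```python
def counting_features(rows):
    """Hypothetical helper: counting metrics under the assumed definitions."""
    repos = {row["repository"] for row in rows}
    # The owner is the part of "owner/name" before the slash.
    owners = {repo.split("/")[0] for repo in repos}
    return {
        "NA": len(rows),
        "NT": len({row["activity"] for row in rows}),
        "NOR": len(owners),
        "ORR": len(owners) / len(repos) if repos else 0.0,
    }


# A few rows from the DataFrame preview, reduced to the columns used here.
sample_rows = [
    {"activity": "CommentPullRequest", "repository": "odoo/odoo"},
    {"activity": "PushCommits", "repository": "odoo/odoo"},
    {"activity": "ClosePullRequest", "repository": "odoo/odoo"},
    {"activity": "PushCommits", "repository": "odoo/design-themes"},
    {"activity": "PushCommits", "repository": "odoo/documentation"},
]

counts = counting_features(sample_rows)
print(counts)
```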


# 3 - Predict if the user is a bot or not
We can now use the BIMBAS model to predict whether the user is a bot or a human.

In [8]:
from rabbit import get_model, compute_confidence
import warnings

model = get_model()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    proba = model.predict_proba(df_feat)
contributor_type, confidence = compute_confidence(proba[0][1])
print(f"{USER} is a {contributor_type} with a confidence of {confidence}")


robodoo is a Bot with a confidence of 0.947
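
To make the last step more concrete, here is one plausible mapping from a bot probability to a label and a confidence score. It is a sketch under stated assumptions, not the implementation of `compute_confidence`: the helper `label_with_confidence` is hypothetical, the decision threshold is assumed to be 0.5, and the confidence is assumed to be the distance from that threshold rescaled to the range 0 to 1. The input 0.9735 is an illustrative value, not a probability taken from the run above.

```python
def label_with_confidence(bot_probability):
    """Hypothetical helper: map a bot probability to (label, confidence)."""
    # Assumed decision rule: probabilities above 0.5 are labelled as bots.
    label = "Bot" if bot_probability > 0.5 else "Human"
    # Assumed scaling: 0 at the threshold, 1 at either extreme.
    confidence = round(abs(bot_probability - 0.5) * 2, 3)
    return label, confidence


print(label_with_confidence(0.9735))
print(label_with_confidence(0.1))
```

Under this assumed scaling, a probability close to 0.5 yields a confidence close to 0 for either label, so the score reflects how far the model is from being undecided.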
