Published on September 27, 2025. By Prata, Marília (mpwolke)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

# These two lines are required for Plotly figures to render in this notebook
import plotly.io as pio
pio.renderers.default = 'iframe'

import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px

#Ignore warnings
import warnings
warnings.filterwarnings('ignore')


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

![](https://media.tenor.com/AflzCeRpLWIAAAAM/lemmiwinks.gif)

## Mouse Action Recognition System (MARS)

The Mouse Action Recognition System (MARS) is a software pipeline for automated analysis of social behaviors in mice.

**Citation**: Cristina Segalin, Jalani Williams, Tomomi Karigo, May Hui, Moriel Zelikowsky, Jennifer J Sun, Pietro Perona, David J Anderson, Ann Kennedy (2021) The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice eLife 10:e63720.
https://doi.org/10.7554/eLife.63720

"The study of naturalistic social behavior requires quantification of animals’ interactions. This is generally done through manual annotation, a highly time-consuming and tedious process. Recent advances in computer vision enable tracking the pose (posture) of freely behaving animals. However, automatically and accurately classifying complex social behaviors remains technically challenging."

"The authors introduced the **Mouse Action Recognition System (MARS)**, an automated pipeline for pose estimation and behavior quantification in pairs of freely interacting mice. They compared MARS’s annotations to human annotations and found that MARS’s pose estimation and behavior classification achieve human-level performance."

"The authors also released the pose and annotation datasets used to train MARS to serve as community benchmarks and resources. Finally, they introduced the **Behavior Ensemble and Neural Trajectory Observatory (BENTO)**, a graphical user interface for analysis of multimodal neuroscience datasets. Together, MARS and BENTO provide an end-to-end pipeline for behavior data extraction and analysis in a package that is user-friendly and easily modifiable."

"In preliminary exploration, the authors found that high precision and recall values for individual binary behavior classifiers were achieved by gradient boosting using the XGBoost algorithm. Custom Python code to train novel behavior classifiers is included with the MARS_Developer software. Classifier hyperparameters may be set by the user, otherwise MARS will provide default values."

"Each trained classifier produces a predicted probability that the behavior occurred, as well as a binarized output created by thresholding that probability value. Following predictions by individual classifiers, MARS combines all classifier outputs to produce a single, multi-class label for each frame of a behavior video. To do so, the authors selected on each frame the behavior label that has the highest predicted probability of occurring; if no behavior has a predicted probability of >0.5, then the frame is labeled as ‘other’ (no behavior occurring)."

"The advantage of this approach over training multi-class XGBoost is that it allows their ensemble of classifiers to be more easily expanded in the future to include additional behaviors of interest because it does not require the original training set to be fully re-annotated for the new behavior."

https://elifesciences.org/articles/63720
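
The frame-labeling rule quoted above (binarize each classifier's probability at a threshold, then pick the most probable behavior per frame and fall back to 'other' when nothing exceeds 0.5) can be sketched in a few lines of NumPy. This is only an illustration of the rule as described, not the MARS code itself: the function name `combine_classifier_outputs`, the behavior names, and the probability values are all hypothetical.

```python
import numpy as np

def combine_classifier_outputs(probs, behaviors, threshold=0.5, other_label="other"):
    """Merge per-behavior probabilities into a single label per frame.

    probs: array of shape (n_frames, n_behaviors), one column per binary classifier.
    behaviors: behavior names, in the same order as the columns of `probs`.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)                      # most probable behavior per frame
    best_prob = probs[np.arange(len(probs)), best]   # probability of that behavior
    labels = np.array(behaviors, dtype=object)[best]
    labels[best_prob <= threshold] = other_label     # no behavior exceeds the threshold
    return labels

# Hypothetical probabilities for 4 frames and 3 behaviors (illustrative values only).
behaviors = ["attack", "mount", "investigation"]
probs = np.array([
    [0.90, 0.10, 0.20],
    [0.20, 0.30, 0.40],
    [0.10, 0.70, 0.60],
    [0.05, 0.02, 0.51],
])

binarized = probs > 0.5                               # per-classifier binary outputs
frame_labels = combine_classifier_outputs(probs, behaviors)
print(list(frame_labels))
```

Because each column comes from an independent binary classifier, adding a behavior only means appending a column and a name; the existing columns are untouched, which mirrors the extensibility argument in the quoted passage.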

## Competition Citation:

@misc{MABe-mouse-behavior-detection,

    author = {Jennifer J. Sun and Markus Marks and Sam Golden and Talmo Pereira and Ann Kennedy and Sohier Dane and Addison Howard and Ashley Chow},
    title = {MABe Challenge - Social Action Recognition in Mice},
    year = {2025},
    
    howpublished = {\url{https://kaggle.com/competitions/MABe-mouse-behavior-detection}},
    note = {Kaggle}
}

## train file

In [None]:
train = pd.read_csv('/kaggle/input/MABe-mouse-behavior-detection/train.csv')
pd.set_option('display.max_columns', None)
train.tail(3)

## Missing values

In [None]:
#By Yulya Odintsova https://www.kaggle.com/code/yulyaodintsova/eda-logistic-regression-lgbmclassifier

train_info = pd.DataFrame({
    "DataType": train.dtypes,
    "MissingValues": train.isnull().sum(),
    "UniqueValues": train.nunique()
}).sort_values(by="MissingValues", ascending=False)

train_info['MissingValuesRatio'] = round(train_info['MissingValues'] / train.shape[0], 2)

train_info

## Describe method

In [None]:
train.describe().loc[['mean','min','max']].T

In [None]:
# List the numerical columns
# Integers only: list(train.select_dtypes(include='int64').columns)
# Any numeric dtype: list(train.select_dtypes(include=[np.number]).columns)

list(train.select_dtypes(include=['int64', 'float64']).columns)

In [None]:
numerical_cols = ['video_id','mouse1_id','mouse2_id', 'mouse3_id','mouse4_id',
 'frames_per_second','video_duration_sec','pix_per_cm_approx','video_width_pix',
 'video_height_pix','arena_width_cm','arena_height_cm']

## Correlation - Heatmap

In [None]:
# OutlierPandas https://www.kaggle.com/code/abhyudaya456/s5e6-eda-for-predicting-optimal-fertilizers/notebook 
plt.figure(figsize=(10,6))
sns.heatmap(train[numerical_cols].corr(), annot=True, cmap='summer')
plt.title("Correlation among Numerical Features")
plt.show()

In [None]:
list(train.select_dtypes(include='object').columns)

## Categorical cols distributions

In [None]:
#By H-Z-Ning  https://www.kaggle.com/code/hzning/top-10-solution-0-97525-esay-is-all-you

categorical_columns = ["mouse1_sex", "mouse2_sex","mouse2_strain", "mouse1_color", "mouse2_color","mouse3_color", "mouse4_color", "arena_shape", "arena_type"]

plt.figure(figsize=(14, 12))
for i, column in enumerate(categorical_columns, 1):
    plt.subplot(3, 3, i)
    sns.countplot(x=column, data=train, palette='Set2')
    plt.title(f'Distribution of {column}')
    plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

## Tracking Methods

In [None]:
train['tracking_method'].value_counts()

In [None]:
labels = 'custom HRnet', 'MARS', 'DeepLabCut', 'SLEAP'
sizes = [7926, 528, 211, 124]  # labels, sizes, and explode must all have the same length
explode = (0, 0.2, 0, 0)  # only "explode" the 2nd slice (MARS)

fig1, ax1 = plt.subplots(figsize=(6,6))
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.title('Tracking Methods')
plt.show()
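
The pie chart above uses hardcoded labels and counts. As a hedged alternative, the same inputs could be derived directly from `value_counts()`, so they stay in sync with the data. The helper name `pie_inputs` and the toy Series below are hypothetical and are only there to make the sketch self-contained.

```python
import pandas as pd

def pie_inputs(series, explode_rank=1, explode_by=0.2):
    """Build labels, sizes, and explode offsets for a pie chart from a column.

    explode_rank is the 0-based position (by descending count) of the slice to pull out.
    """
    counts = series.value_counts()   # sorted by descending count
    labels = counts.index.tolist()
    sizes = counts.tolist()
    explode = [explode_by if i == explode_rank else 0 for i in range(len(counts))]
    return labels, sizes, explode

# Toy stand-in for train['tracking_method'] (hypothetical values).
toy = pd.Series(["A"] * 5 + ["B"] * 3 + ["C"])
labels, sizes, explode = pie_inputs(toy)
print(labels, sizes, explode)
```

In the notebook this would be called as `pie_inputs(train['tracking_method'])`, with the three results passed to `ax1.pie(sizes, explode=explode, labels=labels, ...)` exactly as in the cell above.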

In [None]:
#Code by Ducky https://www.kaggle.com/code/illgamhoduck/nfl-starter-eda

ENV_DIR = '../input'
DATA_DIR = f'{ENV_DIR}/elif-mars'

## MARS video with BENTO

**Citation**: Cristina Segalin, Jalani Williams, Tomomi Karigo, May Hui, Moriel Zelikowsky, Jennifer J Sun, Pietro Perona, David J Anderson, Ann Kennedy (2021) The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice eLife 10:e63720 https://doi.org/10.7554/eLife.63720

In [None]:
#Code by Ducky https://www.kaggle.com/code/illgamhoduck/nfl-starter-eda

from IPython.display import Video, display

def video(video_path, ratio=0.7):
    # Embed a video from DATA_DIR, scaled from a 1280x720 base size by `ratio`
    mars_video = Video(f"{DATA_DIR}/{video_path}",
                       embed=True,
                       height=int(720 * ratio),
                       width=int(1280 * ratio))
    return mars_video
    
video('elife-63720-video2 (1).mp4')

## test file

In [None]:
test = pd.read_csv('/kaggle/input/MABe-mouse-behavior-detection/test.csv')
test.tail()

## submission file

In [None]:
sub = pd.read_csv('/kaggle/input/MABe-mouse-behavior-detection/sample_submission.csv')
sub.tail()

## One DeliriousFly parquet file

In [None]:
#Read One parquet file. 
df = pd.read_parquet("../input/MABe-mouse-behavior-detection/train_tracking/DeliriousFly/1649549863.parquet")
df.tail()

## Mice Ippon

Lemmiwinks vs. WikiLeaks: Sniff, sniff.

![](https://pa1.aminoapps.com/6967/df93577746d0aec536c75fe5c8b6e5caf6f110e2r1-480-270_hq.gif)

## Draft Session: 2h:57m

## Acknowledgements:

Yulya Odintsova https://www.kaggle.com/code/yulyaodintsova/eda-logistic-regression-lgbmclassifier

OutlierPandas https://www.kaggle.com/code/abhyudaya456/s5e6-eda-for-predicting-optimal-fertilizers/notebook

H-Z-Ning  https://www.kaggle.com/code/hzning/top-10-solution-0-97525-esay-is-all-you

Ducky https://www.kaggle.com/code/illgamhoduck/nfl-starter-eda