Tourlat/AAA_project

AAA_project

A lightweight machine learning project that builds a CNN-based file type classifier inspired by Google's Magika. The model distinguishes JavaScript files from other code types using byte-level analysis.


Prerequisites

Required Python Packages

pip install requests tqdm tensorflow numpy scikit-learn python-dotenv seaborn

GitHub API Token

To avoid GitHub API rate limits, create a personal access token:

  1. Go to GitHub Settings → Developer settings → Personal access tokens
  2. Generate a new token (classic) with public_repo scope
  3. Create a .env file in the project root:
GITHUB_TOKEN=ghp_your_token_here

Or set the environment variable directly in your shell:

export GITHUB_TOKEN=ghp_your_token_here

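Without a token, the GitHub REST API allows only 60 requests per hour (counted per IP address). Authenticated requests with a personal access token get 5,000 per hour.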
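Here is a minimal sketch of how a collection script might pick up the token and check its remaining quota. It assumes the `requests` and `python-dotenv` packages from the prerequisites. The helper names (`load_github_token`, `github_headers`, `remaining_core_requests`) are hypothetical and are not part of this repository. `/rate_limit` is GitHub's REST endpoint for reading the current quota.

```python
import os
from typing import Dict, Optional


def load_github_token() -> Optional[str]:
    """Return GITHUB_TOKEN, reading a .env file first when python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        load_dotenv()  # looks for a .env file; does not override variables already set
    except ImportError:
        pass  # fall back to a variable exported in the shell
    return os.getenv("GITHUB_TOKEN") or None


def github_headers(token: Optional[str]) -> Dict[str, str]:
    """Build REST API headers, adding Authorization only when a token is available."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def remaining_core_requests(token: Optional[str]) -> int:
    """Return how many core REST API calls remain in the current rate-limit window."""
    import requests  # imported here so the helpers above work without it

    resp = requests.get(
        "https://api.github.com/rate_limit",
        headers=github_headers(token),
        timeout=10,
    )
    resp.raise_for_status()
    # Querying /rate_limit does not itself count against the limit.
    return resp.json()["resources"]["core"]["remaining"]
```

For example, `remaining_core_requests(load_github_token())` should report a count out of 5,000 when the token is valid, and out of 60 when no token is found.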


Roadmap

v1

First, we will implement a basic version of the Magika model that distinguishes JavaScript files from other file types.

  • Data Collection (JS and non-JS files from GitHub)
  • Data Preprocessing
  • Implement the CNN model from the Magika paper
  • Train and Evaluate the Model
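Here is one possible shape for the data collection step, sketched under assumptions. It lists every file in a public repository with GitHub's git trees endpoint (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`) and then labels the files by extension. The extension sets and function names are hypothetical choices made for illustration. Labeling by extension alone is also a simplification, because a file's name does not guarantee its content.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical label sets: 1 = JavaScript, 0 = other source code.
JS_EXTENSIONS = {".js", ".mjs", ".cjs"}
OTHER_CODE_EXTENSIONS = {".py", ".java", ".c", ".cpp", ".go", ".rb", ".rs", ".php"}


def fetch_repo_tree(owner: str, repo: str, ref: str, headers: Dict[str, str]) -> List[Dict]:
    """List every entry of a repository at `ref` (a branch, tag, or tree SHA)."""
    import requests  # listed in the prerequisites

    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}"
    resp = requests.get(url, headers=headers, params={"recursive": "1"}, timeout=30)
    resp.raise_for_status()
    # Very large repositories come back with "truncated": true and a partial listing.
    return resp.json()["tree"]


def label_tree_entries(tree: List[Dict]) -> List[Tuple[str, int]]:
    """Keep files with a known extension and pair each path with its label."""
    labelled = []
    for entry in tree:
        if entry.get("type") != "blob":
            continue  # skip directories ("tree") and submodules ("commit")
        suffix = PurePosixPath(entry["path"]).suffix.lower()
        if suffix in JS_EXTENSIONS:
            labelled.append((entry["path"], 1))
        elif suffix in OTHER_CODE_EXTENSIONS:
            labelled.append((entry["path"], 0))
    return labelled
```

The raw bytes of each labelled path can then be downloaded from `https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}`.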
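For preprocessing, here is a sketch of a byte-level feature extractor in the spirit of the Magika paper. Rather than the whole file, the model receives a fixed number of bytes taken from the beginning, middle, and end. Several values here are assumptions to check against the paper, not values confirmed by this repository: the 512-byte block size (3 × 512 = 1,536 tokens), stripping whitespace before taking the beginning and end blocks, and padding token 256 (one past the largest byte value).

```python
from typing import List

import numpy as np

PAD = 256    # assumption: padding token outside the byte range 0-255
BLOCK = 512  # assumption: bytes taken from each of the beginning, middle and end


def _pad(values: List[int], size: int, left: bool = False) -> List[int]:
    """Pad a token list to `size` with PAD, on the left or on the right."""
    padding = [PAD] * (size - len(values))
    return padding + values if left else values + padding


def extract_features(data: bytes, block: int = BLOCK) -> np.ndarray:
    """Return a fixed-length int vector of 3 * block tokens for one file."""
    beg_src = data.lstrip()  # bytes.lstrip() removes leading ASCII whitespace
    end_src = data.rstrip()  # bytes.rstrip() removes trailing ASCII whitespace

    beg = list(beg_src[:block])
    mid_start = max(0, (len(data) - block) // 2)  # window centred on the file
    mid = list(data[mid_start:mid_start + block])
    end = list(end_src[max(0, len(end_src) - block):])

    # Left-pad the end block so the file's last bytes stay aligned at the end.
    tokens = _pad(beg, block) + _pad(mid, block) + _pad(end, block, left=True)
    return np.asarray(tokens, dtype=np.int32)
```

Fixed-size inputs keep inference cost independent of file size, which is the main reason for sampling blocks instead of reading whole files.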
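The next sketch is a minimal Keras model: a small 1D CNN over the byte features. It loosely follows the kind of architecture the Magika paper describes, namely a byte embedding, a 1D convolution, global max pooling, and a dense output. The layer sizes, the dropout rates, and the single sigmoid output for the binary JS / non-JS task are assumptions, not the paper's exact hyperparameters. The sketch uses `Embedding` because it is equivalent to one-hot encoding each token and then applying a bias-free dense layer.

```python
import tensorflow as tf


def build_model(seq_len: int = 1536, vocab_size: int = 257) -> tf.keras.Model:
    """Binary classifier returning the probability that a byte sequence is JavaScript."""
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    # Map each token (bytes 0-255 plus padding token 256) to a 128-dim vector.
    x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
    # Drop entire embedding channels across all positions, for regularization.
    x = tf.keras.layers.SpatialDropout1D(0.2)(x)
    # Learn local byte patterns over sliding windows of 5 tokens.
    x = tf.keras.layers.Conv1D(256, kernel_size=5, activation="relu")(x)
    # Keep each filter's strongest response anywhere in the sequence.
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    x = tf.keras.layers.LayerNormalization()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Global max pooling keeps the output size independent of where a pattern appears in the sequence, which suits file-type detection: a telltale token counts the same whether it falls in the first block or the last.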
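Training and evaluation could then follow a standard pattern. First, a stratified train/test split with scikit-learn keeps the same JS / non-JS ratio in both splits. Then metrics are computed from the predicted probabilities after thresholding them. The helper names and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def split(X, y, test_size: float = 0.2, seed: int = 42):
    """Stratified split that preserves the JS / non-JS class ratio."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


def evaluate(y_true, y_prob, threshold: float = 0.5) -> dict:
    """Threshold predicted probabilities and report binary metrics (JS = positive class)."""
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1, zero_division=0
    )
    return {
        # Rows are true labels and columns are predicted labels, ordered [non-JS, JS].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Suppose you have a compiled Keras model, a feature matrix `X` with one row per file, and labels `y`. The flow would then be `X_train, X_test, y_train, y_test = split(X, y)`, followed by `model.fit(X_train, y_train, validation_split=0.1, epochs=10)` (an example epoch count), and finally `evaluate(y_test, model.predict(X_test))`. To plot the confusion matrix, pass it to `seaborn.heatmap(..., annot=True)`.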

Sources

About

GitHub repository for our AAA project.
