
Kallappa2005/JobShield-AI

JobShield AI

JobShield AI is a job fraud and scam detection platform built to screen suspicious job postings, recruiter messages, and related email/domain signals before candidates engage with them. The project combines a Python backend with a React frontend to deliver explainable risk scoring instead of a simple yes/no result.

What The Project Does

JobShield analyzes job-related content across multiple layers:

  • It classifies job text with a BERT-based model.
  • It inspects domains with WHOIS, DNS, PhishTank-style lists, and reputation checks.
  • It checks SSL certificate quality and domain mismatch signals.
  • It parses email headers for SPF, DKIM, DMARC, sender IP, and suspicious server chains.
  • It evaluates recruiter behavior such as free email usage, payment pressure, generic sender style, and urgency language.
  • It compares salary offers against expected ranges.
  • It stores repeated entities and patterns in a Neo4j knowledge graph.
  • It learns recurring scam phrasing and suspicious patterns over time.

The frontend turns those signals into a polished dashboard with navigation, sectioned storytelling, live scoring, and detailed per-signal cards.

Why This Project Is Different

Most scam detectors stop at one signal, such as text classification or domain lookup. JobShield is different because it combines many weak signals into one explainable risk decision.

Key differences

  • Multi-signal analysis instead of a single classifier.
  • Explainable scoring with reasons and per-module detail.
  • Graph memory that remembers recurring emails, domains, phones, and pattern types.
  • Support for both job text and email/header style investigation.
  • A user-facing frontend that shows how the backend reached its verdict, signal by signal.
  • A risk taxonomy that keeps the verdict easy to understand for non-technical users.

Risk Score Meaning

The project uses a normalized risk scale from 0 to 100.

  • 0 to 39: Safe
  • 40 to 69: Medium Risk
  • 70 to 100: High Risk

These bands are used throughout the frontend and backend to make the result easy to interpret.
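The banding above can be expressed as a small lookup. This is a minimal sketch, not the project's actual code: the function name `risk_band` is hypothetical, and treating fractional scores such as 39.5 as Safe is an assumption, since the README only lists whole-number boundaries.

```python
def risk_band(score: float) -> str:
    """Map a 0-100 risk score to the band names used in this README."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0..100, got {score}")
    if score >= 70:
        return "High Risk"
    if score >= 40:
        return "Medium Risk"
    # Assumption: anything below 40, including fractions, counts as Safe.
    return "Safe"
```

Keeping the thresholds in one function like this is one way to make sure the frontend and backend cannot drift apart on what each band means.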

Core Backend Features

1. Text Classification

The backend includes a BERT-based classifier for suspicious job text. It looks for scam phrasing, deceptive hiring language, payment pressure, and other scam-style signals.

2. Domain Intelligence

The domain analyzer checks:

  • domain age
  • DNS records
  • suspicious top-level domains
  • PhishTank-derived domain matches
  • VirusTotal-style reputation signals

This helps identify newly created or reputation-poor domains that often appear in fake job campaigns.
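Two of these checks, domain age and suspicious top-level domains, can be sketched without any network access. The TLD set, the 90-day threshold, and the function name below are illustrative assumptions; in a real pipeline the creation date would come from a WHOIS lookup.

```python
from datetime import date
from typing import List

SUSPICIOUS_TLDS = {"xyz", "top", "click"}  # illustrative list, not the project's
NEW_DOMAIN_DAYS = 90                       # hypothetical threshold


def domain_flags(domain: str, created_on: date, today: date) -> List[str]:
    """Return simple risk flags for a domain given its registration date."""
    flags = []
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld in SUSPICIOUS_TLDS:
        flags.append("suspicious_tld")
    if (today - created_on).days < NEW_DOMAIN_DAYS:
        flags.append("new_domain")
    return flags
```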

3. Domain Similarity

This module detects brand impersonation and lookalike domains by comparing the extracted domain against known trusted brands.
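One common way to detect lookalikes is a string similarity ratio against a list of trusted domains. The sketch below uses `difflib.SequenceMatcher` from the standard library; the brand list, the 0.85 threshold, and the function name are assumptions, and the project may use a different metric.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

TRUSTED_BRANDS = ["google.com", "microsoft.com", "amazon.com"]  # illustrative


def closest_brand(domain: str, threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (brand, similarity) if the domain imitates a trusted brand."""
    domain = domain.lower()
    best, best_ratio = None, 0.0
    for brand in TRUSTED_BRANDS:
        if domain == brand:
            return None  # an exact match is the real brand, not a lookalike
        ratio = SequenceMatcher(None, domain, brand).ratio()
        if ratio > best_ratio:
            best, best_ratio = brand, ratio
    if best is not None and best_ratio >= threshold:
        return best, round(best_ratio, 2)
    return None
```

A domain such as `go0gle.com` is close to `google.com` without being equal to it, which is exactly the case this check is meant to surface.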

4. SSL Certificate Analysis

The SSL module checks certificate age, issuer, subject matching, self-signed behavior, and expired certificates.
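The checks can be illustrated on a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`. This is a simplified sketch: the function names are hypothetical, "subject equals issuer" is only a heuristic for self-signed certificates, and the hostname comparison ignores subject alternative names and wildcards.

```python
import ssl
from typing import Dict, List


def _names(rdn_sequence) -> Dict[str, str]:
    """Flatten the nested tuples used for 'subject' and 'issuer'."""
    return {key: value for rdn in rdn_sequence for key, value in rdn}


def cert_flags(cert: dict, hostname: str, now: float) -> List[str]:
    """Return risk flags for a parsed certificate at POSIX time `now`."""
    flags = []
    subject = _names(cert.get("subject", ()))
    issuer = _names(cert.get("issuer", ()))
    if ssl.cert_time_to_seconds(cert["notAfter"]) < now:
        flags.append("expired")
    if subject == issuer:
        flags.append("self_signed")  # heuristic only
    if subject.get("commonName", "").lower() != hostname.lower():
        flags.append("subject_mismatch")  # simplification: no SAN or wildcard handling
    return flags
```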

5. Email Header Analysis

This module parses raw .eml files and extracts:

  • SPF result
  • DKIM result
  • DMARC result
  • sender domain
  • sender IP
  • received server chain

It adds risk when spoofing or unusual header behavior is present.
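A rough sketch of the extraction step, using only the standard library `email` package. It assumes the receiving server recorded its verdicts in an `Authentication-Results` header; the function name and the `"missing"` placeholder are illustrative choices, not the project's.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from typing import Dict


def auth_results(raw_email: str) -> Dict[str, str]:
    """Pull SPF/DKIM/DMARC verdicts and the sender domain from raw headers."""
    msg = message_from_string(raw_email)
    header = " ".join(msg.get_all("Authentication-Results", []))
    results = {}
    for mechanism in ("spf", "dkim", "dmarc"):
        match = re.search(rf"\b{mechanism}=(\w+)", header, re.IGNORECASE)
        results[mechanism] = match.group(1).lower() if match else "missing"
    # The sender domain is whatever follows "@" in the From address.
    address = parseaddr(msg.get("From", ""))[1]
    results["sender_domain"] = address.rpartition("@")[2].lower()
    return results
```

A failed or missing SPF, DKIM, or DMARC verdict would then raise the header risk for that message.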

6. Recruiter Detection

The recruiter analyzer combines heuristics and optional Gemini-based reasoning to score:

  • free email usage
  • generic recruiter names
  • domain mismatch
  • payment requests
  • urgency language
  • suspicious writing style
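The heuristic half of this analyzer might look like the sketch below. The provider list, the regular expressions, and the function name are assumptions for illustration; the optional Gemini-based reasoning is not shown.

```python
import re
from typing import List

FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}  # illustrative
PAYMENT_RE = re.compile(r"\b(?:registration fee|processing fee|security deposit)\b", re.I)
URGENCY_RE = re.compile(r"\b(?:urgent|urgently|immediately|within 24 hours)\b", re.I)


def recruiter_flags(sender_email: str, company_domain: str, message: str) -> List[str]:
    """Return heuristic flags for a recruiter message."""
    flags = []
    sender_domain = sender_email.rpartition("@")[2].lower()
    if sender_domain in FREE_EMAIL_DOMAINS:
        flags.append("free_email")
    if sender_domain != company_domain.lower():
        flags.append("domain_mismatch")
    if PAYMENT_RE.search(message):
        flags.append("payment_request")
    if URGENCY_RE.search(message):
        flags.append("urgency_language")
    return flags
```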

7. Salary Anomaly Detection

The salary analyzer extracts salary mentions from the text and compares them with an expected range for the detected role.
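As a minimal sketch of that comparison, assuming monthly figures and a hand-written range table (the roles, numbers, and function names below are hypothetical and carry no currency):

```python
import re
from typing import Optional

# Hypothetical expected monthly ranges per role.
EXPECTED_MONTHLY_RANGE = {"data entry": (15000, 35000), "software engineer": (60000, 250000)}
SALARY_RE = re.compile(r"(\d[\d,]*)\s*(?:per month|/\s*month|monthly)", re.I)


def extract_salary(text: str) -> Optional[int]:
    """Return the first monthly salary figure mentioned in the text, if any."""
    match = SALARY_RE.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def salary_anomaly(text: str, role: str) -> Optional[str]:
    """Compare the extracted salary with the expected range for the role."""
    amount = extract_salary(text)
    bounds = EXPECTED_MONTHLY_RANGE.get(role)
    if amount is None or bounds is None:
        return None  # nothing to compare
    low, high = bounds
    if amount > high:
        return "above_expected_range"
    if amount < low:
        return "below_expected_range"
    return "within_expected_range"
```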

8. Knowledge Graph Memory

The graph analyzer stores and checks entities such as:

  • email addresses
  • domains
  • phone numbers

This allows repeated scam infrastructure to be remembered across reports.
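One way to picture this is entity extraction followed by a Cypher `MERGE` that increments a counter on repeat sightings. The node label, property names, regular expressions, and helper names below are a hypothetical schema, not the repository's actual one.

```python
import re
from typing import Dict, List

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

# Hypothetical schema: one Entity node per (kind, value) with a report counter.
MERGE_QUERY = (
    "UNWIND $entities AS e "
    "MERGE (n:Entity {kind: e.kind, value: e.value}) "
    "ON CREATE SET n.reports = 1 "
    "ON MATCH SET n.reports = n.reports + 1"
)


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Collect emails, their domains, and phone numbers from free text."""
    emails = sorted({e.lower() for e in EMAIL_RE.findall(text)})
    domains = sorted({e.rpartition("@")[2] for e in emails})
    phones = sorted({re.sub(r"[\s-]", "", p) for p in PHONE_RE.findall(text)})
    return {"email": emails, "domain": domains, "phone": phones}


def merge_params(entities: Dict[str, List[str]]) -> dict:
    """Build the parameter map expected by MERGE_QUERY."""
    rows = [{"kind": kind, "value": v} for kind, values in entities.items() for v in values]
    return {"entities": rows}
```

With the official Neo4j Python driver, the query and parameters could then be passed to `session.run(MERGE_QUERY, merge_params(entities))`; an entity whose `reports` counter keeps growing has been seen across multiple reports.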

9. Pattern Learning

The pattern module detects recurring scam phrases such as:

  • payment requests
  • fake offer language
  • urgency pressure

It also stores pattern counts in Neo4j so the system can build memory over time.
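The counting side can be sketched in memory with `collections.Counter`. The pattern names mirror the categories above, but the regular expressions and the function name are illustrative assumptions; persisting the counts to Neo4j is left out.

```python
import re
from collections import Counter
from typing import Iterable, Optional

PATTERNS = {
    "payment_request": re.compile(r"\b(?:registration|processing|training) fee\b", re.I),
    "fake_offer": re.compile(r"\b(?:you have been selected|no interview required)\b", re.I),
    "urgency_pressure": re.compile(r"\b(?:act now|immediately|within 24 hours)\b", re.I),
}


def count_patterns(messages: Iterable[str], counts: Optional[Counter] = None) -> Counter:
    """Add one per message for every pattern category the message matches."""
    counts = Counter() if counts is None else counts
    for message in messages:
        for name, pattern in PATTERNS.items():
            if pattern.search(message):
                counts[name] += 1
    return counts
```

Passing an existing `Counter` back in on later calls is what lets the totals accumulate over time.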

Frontend Features

The React frontend is designed as an interactive, responsive UI rather than a basic form.

  • Hero landing section with clear product positioning.
  • Navigation between sections such as About, Features, Uniqueness, Risk Scores, Workflow, and Analyze.
  • Feature cards that explain each backend analyzer.
  • A risk-score section that explains Safe, Medium Risk, and High Risk bands.
  • A workflow section that shows how the analysis works step by step.
  • An analyzer panel that can call the backend and also show a fallback demo mode.
  • Signal breakdown cards for each analyzer module.
  • Responsive design for desktop and smaller screens.

Project Architecture

flowchart LR
	A[Job Text / Recruiter Message / Email] --> B[Frontend UI]
	B --> C[Flask Backend]
	C --> D[BERT Text Analyzer]
	C --> E[Domain Intelligence]
	C --> F[Domain Similarity]
	C --> G[SSL Analyzer]
	C --> H[Email Header Analyzer]
	C --> I[Recruiter Detector]
	C --> J[Salary Anomaly]
	C --> K[Knowledge Graph]
	C --> L[Pattern Learning]
	D --> M[Final Risk Score]
	E --> M
	F --> M
	G --> M
	H --> M
	I --> M
	J --> M
	K --> M
	L --> M
	M --> N[Explainable Result in UI]
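
The final step in the flowchart, where every analyzer feeds one risk score, could be a weighted average over whichever modules returned a result. The README does not document the real weighting, so the weights and the function name below are purely hypothetical.

```python
from typing import Dict, Optional

# Hypothetical weights; the project's real weighting is not documented here.
WEIGHTS = {"text": 0.3, "domain": 0.2, "email": 0.2, "recruiter": 0.2, "salary": 0.1}


def combine_scores(module_scores: Dict[str, Optional[float]],
                   weights: Dict[str, float] = WEIGHTS) -> Optional[float]:
    """Weighted average of the 0-100 module scores that are available."""
    total = 0.0
    weight_sum = 0.0
    for name, score in module_scores.items():
        if score is None or name not in weights:
            continue  # skip modules that did not run or have no weight
        total += weights[name] * score
        weight_sum += weights[name]
    if weight_sum == 0:
        return None  # no usable signal, so no score rather than a false "Safe"
    return round(total / weight_sum, 1)
```

Renormalizing by the weights actually used keeps the result on the 0 to 100 scale even when some modules are skipped.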

Main Folders

backend/

Contains the Flask app, analyzers, ML model code, training pipeline, domain scripts, scheduler, and sample data.

Important parts include:

  • app.py for API routes.
  • src/analyzers/ for all scam detection modules.
  • src/models/ for the BERT wrapper.
  • training/ for dataset loading and model retraining.
  • scripts/ for phishing domain extraction and filtering.
  • data/ for datasets, samples, and seed data.

frontend/

Contains the React + Tailwind UI used to present the analysis results.

Important parts include:

  • src/App.jsx for the page layout and analyzer experience.
  • src/components/ for navigation, cards, and footer UI.
  • src/data/ for content used in the sections.
  • src/lib/ for backend fetch helpers.

Backend API Endpoints

The backend exposes routes such as:

  • POST /predict
  • POST /analyze-job
  • POST /domain-analysis
  • POST /ssl-analysis
  • POST /domain-similarity
  • POST /email-analysis
  • POST /recruiter-analysis
  • POST /salary-analysis
  • POST /graph-analysis
  • POST /pattern-analysis
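
A client call to one of the POST routes might be built as follows. The base URL (Flask's default development port) and the `{"text": ...}` payload shape are assumptions; check `app.py` for the fields each route actually expects.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default dev port


def build_analysis_request(route: str, payload: dict) -> Request:
    """Build a JSON POST request for one of the backend routes."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it against a running backend:
#   from urllib.request import urlopen
#   with urlopen(build_analysis_request("/analyze-job", {"text": "..."})) as resp:
#       result = json.load(resp)
```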

How To Run The Project

Backend

Install backend dependencies and start the Flask app:

cd backend
pip install -r requirements.txt
python app.py

Frontend

Install frontend dependencies and start the Vite dev server:

cd frontend
npm install
npm run dev

The frontend is configured to talk to the backend through the local development setup.

Environment Variables

The backend expects external service keys for some optional analyzers and graph features.

Typical values include:

  • GEMINI_API_KEY
  • HUNTER_API_KEY
  • RAPID_API_KEY
  • NEO4J_URL
  • NEO4J_USER
  • NEO4J_PASSWORD
  • VirusTotal-related API key used by the domain intelligence module

Do not commit secrets to source control.
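
A small startup check can report which of the named variables are unset, so that optional analyzers can be disabled cleanly. The helper name is hypothetical, and the VirusTotal key is omitted because its exact variable name is not given above.

```python
import os
from typing import List, Mapping

OPTIONAL_KEYS = ["GEMINI_API_KEY", "HUNTER_API_KEY", "RAPID_API_KEY"]
NEO4J_KEYS = ["NEO4J_URL", "NEO4J_USER", "NEO4J_PASSWORD"]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of expected variables that are unset or empty."""
    return [key for key in OPTIONAL_KEYS + NEO4J_KEYS if not env.get(key)]
```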

Data And Model Assets

The repository includes sample and training assets that support the detection pipeline:

  • fake job posting dataset
  • phishing domain lists
  • sample real and fake email files
  • seed graph data
  • trained model artifacts and tokenizer files

Notes On Reliability

Some backend features depend on external APIs and local services. If one of them is unavailable, the UI can still fall back to its demo analysis mode, but the live backend result may be partial.
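
One way to produce a partial result instead of a failed request is to isolate each analyzer call. This is a sketch under the assumption that every analyzer is a callable returning a score; the function name and result shape are hypothetical.

```python
from typing import Callable, Dict


def run_analyzers(analyzers: Dict[str, Callable[[str], float]], text: str) -> dict:
    """Run every analyzer, recording failures instead of aborting the report."""
    scores, errors = {}, {}
    for name, analyze in analyzers.items():
        try:
            scores[name] = analyze(text)
        except Exception as exc:  # e.g. an external API or Neo4j being down
            errors[name] = str(exc)
    return {"scores": scores, "errors": errors, "partial": bool(errors)}
```

The `partial` flag gives the UI a simple signal for telling the user that some modules did not contribute to the score.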

Future Extensions

The project is structured so it can be expanded with:

  • analysis history and report export
  • better visual explainability charts
  • more model training and feedback loops
  • stronger graph visualizations
  • alerting and triage workflows for recruiters or candidates

Summary

JobShield AI is not just a job scam classifier. It is a layered detection and explanation system that combines machine learning, domain investigation, email analysis, graph memory, and a user-friendly interface to help people review suspicious hiring messages more safely.
