JobShield AI

JobShield AI is a job fraud and scam detection platform built to screen suspicious job postings, recruiter messages, and related email/domain signals before candidates engage with them. The project combines a Python backend with a React frontend to deliver explainable risk scoring instead of a simple yes/no result.

What The Project Does

JobShield analyzes job-related content across multiple layers:

It classifies job text with a BERT-based model.
It inspects domains with WHOIS, DNS, PhishTank-style lists, and reputation checks.
It checks SSL certificate quality and domain mismatch signals.
It parses email headers for SPF, DKIM, DMARC, sender IP, and suspicious server chains.
It evaluates recruiter behavior such as free email usage, payment pressure, generic sender style, and urgency language.
It compares salary offers against expected ranges.
It stores repeated entities and patterns in a Neo4j knowledge graph.
It learns recurring scam phrasing and suspicious patterns over time.

The frontend presents those signals in a dashboard with section navigation, live scoring, and a detail card for each signal.

Why This Project Is Different

Most scam detectors stop at one signal, such as text classification or domain lookup. JobShield instead combines many weak signals into one explainable risk decision.

Key differences

Multi-signal analysis instead of a single classifier.
Explainable scoring with reasons and per-module detail.
Graph memory that remembers recurring emails, domains, phones, and pattern types.
Support for both job text and email/header style investigation.
A user-facing frontend that shows which signals contributed to the score and why.
A risk taxonomy that keeps the verdict easy to understand for non-technical users.

Risk Score Meaning

The project uses a normalized risk scale from 0 to 100.

0 to 39: Safe
40 to 69: Medium Risk
70 to 100: High Risk

These bands are used throughout the frontend and backend to make the result easy to interpret.
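
The bands above can be expressed as a small lookup. This is a minimal sketch, not the project's actual code: the function name `risk_band` is hypothetical, and treating fractional scores between 69 and 70 as Medium Risk is an assumption.

```python
def risk_band(score: float) -> str:
    """Map a 0-100 risk score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 70:
        return "High Risk"
    if score >= 40:
        return "Medium Risk"
    return "Safe"
```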

Core Backend Features

1. Text Classification

The backend includes a BERT-based classifier for suspicious job text. It looks for scam phrasing, deceptive hiring language, payment pressure, and other scam-style signals.

2. Domain Intelligence

The domain analyzer checks:

domain age
DNS records
suspicious top-level domains
PhishTank-derived domain matches
VirusTotal-style reputation signals

This helps identify newly created or reputation-poor domains that often appear in fake job campaigns.
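
As an illustration of two of these checks, the sketch below flags young domains and suspicious top-level domains. It assumes the creation date has already been obtained (for example from WHOIS); the TLD set, the 90-day threshold, and the function name are hypothetical values chosen for the example.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical values for illustration only.
SUSPICIOUS_TLDS = {"xyz", "top", "click", "work"}
NEW_DOMAIN_DAYS = 90

def domain_signals(domain: str, created: datetime,
                   now: Optional[datetime] = None) -> List[str]:
    """Return human-readable reasons why a domain looks risky."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    age_days = (now - created).days
    if age_days < NEW_DOMAIN_DAYS:
        reasons.append(f"domain registered {age_days} days ago")
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld in SUSPICIOUS_TLDS:
        reasons.append(f"suspicious top-level domain .{tld}")
    return reasons
```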

3. Domain Similarity

This module detects brand impersonation and lookalike domains by comparing the extracted domain against known trusted brands.
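
One simple way to approximate this comparison is string similarity from the standard library. The sketch below uses `difflib.SequenceMatcher`; the trusted-domain list, the 0.8 threshold, and the function name are assumptions, and the real module may use a different metric.

```python
from difflib import SequenceMatcher

# Hypothetical trusted-brand list for illustration.
TRUSTED_DOMAINS = ["google.com", "microsoft.com", "amazon.com"]

def closest_brand(domain: str, threshold: float = 0.8):
    """Return (brand, similarity) if the domain imitates a trusted brand."""
    domain = domain.lower()
    if domain in TRUSTED_DOMAINS:
        return None  # an exact match is the brand itself, not a lookalike
    best = max(TRUSTED_DOMAINS,
               key=lambda t: SequenceMatcher(None, domain, t).ratio())
    score = SequenceMatcher(None, domain, best).ratio()
    return (best, round(score, 2)) if score >= threshold else None
```

A domain such as `micros0ft.com` is reported as a lookalike of `microsoft.com`, while the genuine domain and unrelated domains return `None`.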

4. SSL Certificate Analysis

The SSL module checks certificate age, issuer, subject matching, self-signed behavior, and expired certificates.
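
The sketch below shows how some of these checks could be evaluated on a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`. How the certificate is fetched is left out, and the 30-day "recently issued" threshold and the function name are assumptions.

```python
import ssl
import time
from typing import List, Optional

RECENT_CERT_DAYS = 30  # hypothetical threshold

def certificate_flags(cert: dict, now: Optional[float] = None) -> List[str]:
    """Flag expired, very new, or self-signed certificates."""
    now = time.time() if now is None else now
    flags = []
    if ssl.cert_time_to_seconds(cert["notAfter"]) < now:
        flags.append("certificate expired")
    age_days = (now - ssl.cert_time_to_seconds(cert["notBefore"])) / 86400
    if age_days < RECENT_CERT_DAYS:
        flags.append("certificate issued recently")
    # A certificate whose issuer equals its subject signed itself.
    if cert.get("issuer") == cert.get("subject"):
        flags.append("self-signed (issuer equals subject)")
    return flags
```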

5. Email Header Analysis

This module parses raw .eml files and extracts:

SPF result
DKIM result
DMARC result
sender domain
sender IP
received server chain

It adds risk when spoofing or unusual header behavior is present.
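
A minimal sketch of this extraction using Python's standard `email` package follows. It assumes the SPF, DKIM, and DMARC verdicts are available in an `Authentication-Results` header and that the sender IP appears in square brackets in the earliest `Received` header; the function name and result keys are hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr

def header_signals(raw_eml: str) -> dict:
    """Extract authentication results and routing details from raw headers."""
    msg = message_from_string(raw_eml)
    auth = " ".join(msg.get_all("Authentication-Results", []))
    results = {}
    for check in ("spf", "dkim", "dmarc"):
        match = re.search(rf"\b{check}=(\w+)", auth, re.IGNORECASE)
        results[check] = match.group(1).lower() if match else "missing"
    _, address = parseaddr(msg.get("From", ""))
    results["sender_domain"] = address.rpartition("@")[2].lower()
    received = msg.get_all("Received", [])
    results["received_hops"] = len(received)
    # The last Received header is the hop closest to the original sender.
    ip = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", received[-1]) if received else None
    results["sender_ip"] = ip.group(1) if ip else None
    return results
```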

6. Recruiter Detection

The recruiter analyzer combines heuristics and optional Gemini-based reasoning to score:

free email usage
generic recruiter names
domain mismatch
payment requests
urgency language
suspicious writing style
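
The heuristic side of this scoring could look roughly like the sketch below. The phrase lists, the free-provider set, and the function name are hypothetical, and the optional Gemini-based reasoning is not shown.

```python
from typing import List

# Hypothetical lists for illustration only.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}
PAYMENT_PHRASES = ("registration fee", "processing fee", "pay a deposit")
URGENCY_PHRASES = ("act now", "immediately", "within 24 hours")

def recruiter_reasons(sender_email: str, company_domain: str,
                      message: str) -> List[str]:
    """Collect heuristic reasons that make a recruiter message suspicious."""
    reasons = []
    sender_domain = sender_email.rpartition("@")[2].lower()
    text = message.lower()
    if sender_domain in FREE_EMAIL_DOMAINS:
        reasons.append("free email provider")
    elif sender_domain != company_domain.lower():
        reasons.append("sender domain does not match company domain")
    if any(phrase in text for phrase in PAYMENT_PHRASES):
        reasons.append("payment request")
    if any(phrase in text for phrase in URGENCY_PHRASES):
        reasons.append("urgency language")
    return reasons
```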

7. Salary Anomaly Detection

The salary analyzer extracts salary mentions from the text and compares them with an expected range for the detected role.
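
A simplified sketch of that comparison is shown below. The expected range, the dollar-amount pattern, and the rule that flags anything above twice the upper bound or below half the lower bound are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical expected monthly range in USD per role.
EXPECTED_MONTHLY_USD = {"data entry clerk": (2000, 4000)}

def salary_anomalies(text: str, role: str) -> List[int]:
    """Return salary mentions that fall far outside the expected range."""
    low, high = EXPECTED_MONTHLY_USD[role]
    amounts = [int(raw.replace(",", ""))
               for raw in re.findall(r"\$(\d[\d,]*)", text)]
    return [a for a in amounts if a > 2 * high or a < low / 2]
```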

8. Knowledge Graph Memory

The graph analyzer stores and checks entities such as:

email addresses
domains
phone numbers

This allows repeated scam infrastructure to be remembered across reports.
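
One way to remember an entity in Neo4j is a Cypher `MERGE` that creates the node on first sight and increments a counter afterwards. The sketch below only builds the query and its parameters; the `Entity` label, the `reports` property, and the function name are hypothetical and may not match the project's schema.

```python
from typing import Dict, Tuple

# Hypothetical schema: one node per (kind, value) with a report counter.
UPSERT_ENTITY = """
MERGE (e:Entity {kind: $kind, value: $value})
ON CREATE SET e.reports = 1
ON MATCH SET e.reports = e.reports + 1
RETURN e.reports AS reports
"""

def entity_upsert(kind: str, value: str) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized Cypher query for recording one entity."""
    if kind not in {"email", "domain", "phone"}:
        raise ValueError(f"unsupported entity kind: {kind}")
    # Normalize so the same entity always maps to the same node.
    return UPSERT_ENTITY, {"kind": kind, "value": value.strip().lower()}
```

With the official Neo4j Python driver, the returned pair could be passed to `session.run(query, params)`; a returned count greater than 1 means the entity has been reported before.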

9. Pattern Learning

The pattern module detects recurring scam phrases such as:

payment requests
fake offer language
urgency pressure

It also stores pattern counts in Neo4j so the system can build memory over time.
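
The sketch below illustrates the idea with an in-memory `Counter` standing in for the Neo4j-backed counts. The pattern names and regular expressions are hypothetical examples, not the project's actual rules.

```python
import re
from collections import Counter
from typing import List

# Hypothetical patterns for illustration only.
PATTERNS = {
    "payment_request": re.compile(r"\b(registration|processing) fee\b", re.I),
    "fake_offer": re.compile(r"\byou (have been|are) selected\b", re.I),
    "urgency": re.compile(r"\b(act now|urgent(ly)?|immediately)\b", re.I),
}

def record_patterns(text: str, memory: Counter) -> List[str]:
    """Detect pattern types in the text and add them to the running counts."""
    hits = [name for name, rx in PATTERNS.items() if rx.search(text)]
    memory.update(hits)
    return hits
```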

Frontend Features

The React frontend is designed as an interactive, responsive UI rather than a basic form.

Hero landing section with clear product positioning.
Navigation between sections such as About, Features, Uniqueness, Risk Scores, Workflow, and Analyze.
Feature cards that explain each backend analyzer.
A risk-score section that explains Safe, Medium Risk, and High Risk bands.
A workflow section that shows how the analysis works step by step.
An analyzer panel that can call the backend and also show a fallback demo mode.
Signal breakdown cards for each analyzer module.
Responsive design for desktop and smaller screens.

Project Architecture

flowchart LR
	A[Job Text / Recruiter Message / Email] --> B[Frontend UI]
	B --> C[Flask Backend]
	C --> D[BERT Text Analyzer]
	C --> E[Domain Intelligence]
	C --> F[Domain Similarity]
	C --> G[SSL Analyzer]
	C --> H[Email Header Analyzer]
	C --> I[Recruiter Detector]
	C --> J[Salary Anomaly]
	C --> K[Knowledge Graph]
	C --> L[Pattern Learning]
	D --> M[Final Risk Score]
	E --> M
	F --> M
	G --> M
	H --> M
	I --> M
	J --> M
	K --> M
	L --> M
	M --> N[Explainable Result in UI]
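
The final step in the diagram, where module results feed one risk score, could be sketched as a weighted average that skips modules with no result. The weights, module keys, and function name below are assumptions; the README does not state how the project actually combines scores.

```python
from typing import Dict, Optional

# Hypothetical weights; only three modules are shown to keep the example short.
WEIGHTS = {"text": 0.4, "domain": 0.3, "recruiter": 0.3}

def combine(module_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of available 0-100 module scores."""
    available = {name: score for name, score in module_scores.items()
                 if name in WEIGHTS and score is not None}
    if not available:
        return None
    # Renormalize so missing modules do not drag the score toward zero.
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted = sum(WEIGHTS[name] * score for name, score in available.items())
    return round(weighted / total_weight, 1)
```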

Main Folders

`backend/`

Contains the Flask app, analyzers, ML model code, training pipeline, domain scripts, scheduler, and sample data.

Important parts include:

app.py for API routes.
src/analyzers/ for all scam detection modules.
src/models/ for the BERT wrapper.
training/ for dataset loading and model retraining.
scripts/ for phishing domain extraction and filtering.
data/ for datasets, samples, and seed data.

`frontend/`

Contains the React + Tailwind UI used to present the analysis results.

Important parts include:

src/App.jsx for the page layout and analyzer experience.
src/components/ for navigation, cards, and footer UI.
src/data/ for content used in the sections.
src/lib/ for backend fetch helpers.

Backend API Endpoints

The backend exposes routes such as:

POST /predict
POST /analyze-job
POST /domain-analysis
POST /ssl-analysis
POST /domain-similarity
GET /email-analysis
POST /recruiter-analysis
POST /salary-analysis
POST /graph-analysis
POST /pattern-analysis
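
As an illustration, a client could prepare a request to one of these routes as sketched below. The base URL (Flask's default development port 5000), the JSON field name `text`, and the function name are assumptions; check `app.py` for the real request format.

```python
import json
from urllib.request import Request

def build_analyze_request(text: str,
                          base_url: str = "http://localhost:5000") -> Request:
    """Prepare a JSON POST request for the /analyze-job route."""
    body = json.dumps({"text": text}).encode("utf-8")  # field name is assumed
    return Request(
        f"{base_url}/analyze-job",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The prepared request can be sent with `urllib.request.urlopen(request)` once the backend is running.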

How To Run The Project

Backend

Install backend dependencies and start the Flask app:

cd backend
pip install -r requirements.txt
python app.py

Frontend

Install frontend dependencies and start the Vite dev server:

cd frontend
npm install
npm run dev

In local development, the frontend is configured to send its requests to the locally running backend.

Environment Variables

The backend expects external service keys for some optional analyzers and graph features.

Typical values include:

GEMINI_API_KEY
HUNTER_API_KEY
RAPID_API_KEY
NEO4J_URL
NEO4J_USER
NEO4J_PASSWORD
VirusTotal-related API key used by the domain intelligence module

Do not commit secrets to source control.
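
A startup check along the following lines can report which of the named variables are unset, so the matching analyzers can be skipped. The function name and the grouping into two lists are hypothetical; the VirusTotal key is omitted because its exact variable name is not given here.

```python
import os
from typing import List, Mapping, Optional

OPTIONAL_KEYS = ["GEMINI_API_KEY", "HUNTER_API_KEY", "RAPID_API_KEY"]
NEO4J_KEYS = ["NEO4J_URL", "NEO4J_USER", "NEO4J_PASSWORD"]

def missing_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in OPTIONAL_KEYS + NEO4J_KEYS if not env.get(key)]
```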

Data And Model Assets

The repository includes sample and training assets that support the detection pipeline:

fake job posting dataset
phishing domain lists
sample real and fake email files
seed graph data
trained model artifacts and tokenizer files

Notes On Reliability

Some backend features depend on external APIs and local services. If one of them is unavailable, the UI can still show its fallback demo analysis, but the live backend result may be partial.
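
One way to produce such a partial result is to run each analyzer independently and record failures instead of aborting. This is a sketch of that idea under assumed names (`run_analyzers`, the result keys), not the project's actual error handling.

```python
from typing import Any, Callable, Dict

def run_analyzers(analyzers: Dict[str, Callable[[Any], Any]],
                  payload: Any) -> dict:
    """Run every analyzer; keep successful results and note the failures."""
    results, failed = {}, []
    for name, analyze in analyzers.items():
        try:
            results[name] = analyze(payload)
        except Exception as exc:  # one failing service must not stop the rest
            failed.append(f"{name}: {exc}")
    return {"results": results, "failed": failed, "partial": bool(failed)}
```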

Future Extensions

The project is structured so it can be expanded with:

analysis history and report export
better visual explainability charts
more model training and feedback loops
stronger graph visualizations
alerting and triage workflows for recruiters or candidates

Summary

JobShield AI is not just a job scam classifier. It is a layered detection and explanation system that combines machine learning, domain investigation, email analysis, graph memory, and a user-friendly interface to help people review suspicious hiring messages more safely.
