JobShield AI is a job fraud and scam detection platform built to screen suspicious job postings, recruiter messages, and related email/domain signals before candidates engage with them. The project combines a Python backend with a React frontend to deliver explainable risk scoring instead of a simple yes/no result.
JobShield analyzes job-related content across multiple layers:
- It classifies job text with a BERT-based model.
- It inspects domains with WHOIS, DNS, PhishTank-style lists, and reputation checks.
- It checks SSL certificate quality and domain mismatch signals.
- It parses email headers for SPF, DKIM, DMARC, sender IP, and suspicious server chains.
- It evaluates recruiter behavior such as free email usage, payment pressure, generic sender style, and urgency language.
- It compares salary offers against expected ranges.
- It stores repeated entities and patterns in a Neo4j knowledge graph.
- It learns recurring scam phrasing and suspicious patterns over time.
The frontend turns those signals into a polished dashboard with navigation, sectioned storytelling, live scoring, and detailed per-signal cards.
Most scam detectors stop at one signal, such as text classification or domain lookup. JobShield is different because it combines many weak signals into one explainable risk decision.
- Multi-signal analysis instead of a single classifier.
- Explainable scoring with reasons and per-module detail.
- Graph memory that remembers recurring emails, domains, phones, and pattern types.
- Support for both job text and email/header style investigation.
- A user-facing frontend that shows how the backend actually thinks.
- A risk taxonomy that keeps the verdict easy to understand for non-technical users.
The project uses a normalized risk scale from 0 to 100.
- 0 to 39: Safe
- 40 to 69: Medium Risk
- 70 to 100: High Risk
These bands are used throughout the frontend and backend to make the result easy to interpret.
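As a minimal sketch, the banding above can be expressed as a small lookup. The function name `risk_band` is hypothetical, and treating fractional scores such as 39.5 as belonging to the lower band is an assumption, since the bands are only stated for whole numbers.

```python
def risk_band(score: float) -> str:
    """Map a normalized 0-100 risk score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "High Risk"
    if score >= 40:
        return "Medium Risk"
    return "Safe"
```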
The backend includes a BERT-based classifier for suspicious job text. It looks for scam phrasing, deceptive hiring language, payment pressure, and other scam-style signals.
The domain analyzer checks:
- domain age
- DNS records
- suspicious top-level domains
- PhishTank-derived domain matches
- VirusTotal-style reputation signals
This helps identify newly created or reputation-poor domains that often appear in fake job campaigns.
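The checks above might be combined roughly as follows. This is an offline sketch, not the project's analyzer: the TLD list, the 90-day threshold, and the function name are assumptions, and the creation date and blocklist are passed in rather than fetched from WHOIS or PhishTank.

```python
from datetime import date
from typing import List, Optional, Set

SUSPICIOUS_TLDS = {"xyz", "top", "click"}  # hypothetical example list
NEW_DOMAIN_DAYS = 90                       # hypothetical threshold


def domain_signals(domain: str, created_on: Optional[date],
                   today: date, blocklist: Set[str]) -> List[str]:
    """Return human-readable reasons why a domain looks risky."""
    reasons = []
    name = domain.lower().rstrip(".")
    tld = name.rsplit(".", 1)[-1]
    if tld in SUSPICIOUS_TLDS:
        reasons.append("suspicious top-level domain: ." + tld)
    if name in blocklist:
        reasons.append("domain appears on a phishing blocklist")
    if created_on is None:
        reasons.append("creation date unavailable")
    elif (today - created_on).days < NEW_DOMAIN_DAYS:
        reasons.append("domain registered recently")
    return reasons
```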
The domain similarity module detects brand impersonation and lookalike domains by comparing the extracted domain against known trusted brands.
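One way to approximate lookalike detection is a string-similarity ratio against a trusted list. The sketch below uses the standard library's `difflib`; the brand list, the 0.8 threshold, and the function name are assumptions, and the project may use a different similarity measure.

```python
from difflib import SequenceMatcher

TRUSTED_BRANDS = {"linkedin.com", "indeed.com", "google.com"}  # hypothetical list


def closest_brand(domain: str, threshold: float = 0.8):
    """Return (brand, ratio) when a domain resembles a trusted brand
    without matching it exactly, otherwise None."""
    domain = domain.lower()
    if domain in TRUSTED_BRANDS:
        return None  # an exact match is the brand itself, not a lookalike
    best, best_ratio = None, 0.0
    for brand in sorted(TRUSTED_BRANDS):
        ratio = SequenceMatcher(None, domain, brand).ratio()
        if ratio > best_ratio:
            best, best_ratio = brand, ratio
    return (best, best_ratio) if best_ratio >= threshold else None
```

For example, `closest_brand("linkedln.com")` points at `linkedin.com`, because only one character differs.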
The SSL module checks certificate age, issuer, subject matching, self-signed behavior, and expired certificates.
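A rough sketch of these certificate checks is shown below. It assumes the certificate has already been fetched and is in the dictionary shape returned by Python's `ssl.SSLSocket.getpeercert()`. The 7-day freshness threshold and the function name are assumptions, and wildcard names are not handled.

```python
import ssl


def certificate_flags(cert: dict, hostname: str, now: float) -> list:
    """Flag common certificate problems. `now` is a Unix timestamp."""
    flags = []
    if ssl.cert_time_to_seconds(cert["notAfter"]) < now:
        flags.append("expired")
    age_days = (now - ssl.cert_time_to_seconds(cert["notBefore"])) / 86400
    if age_days < 7:  # hypothetical threshold for a very new certificate
        flags.append("issued within the last 7 days")
    if cert.get("issuer") == cert.get("subject"):
        flags.append("self-signed")
    names = {value.lower()
             for kind, value in cert.get("subjectAltName", ())
             if kind == "DNS"}
    if hostname.lower() not in names:
        flags.append("hostname not in subjectAltName")
    return flags
```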
The email header analyzer parses raw .eml files and extracts:
- SPF result
- DKIM result
- DMARC result
- sender domain
- sender IP
- received server chain
It adds risk when spoofing or unusual header behavior is present.
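The extraction step can be sketched with Python's standard `email` package. This is an illustration under assumptions: it reads verdicts from the `Authentication-Results` header, only recognizes bracketed IPv4 addresses, and treats the last `Received` header as the earliest hop. The function name is hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def summarize_headers(raw_eml: str) -> dict:
    """Pull SPF/DKIM/DMARC verdicts and sender details from raw headers."""
    msg = message_from_string(raw_eml)
    auth = " ".join(msg.get_all("Authentication-Results", []))

    def verdict(mechanism: str) -> str:
        match = re.search(rf"\b{mechanism}=(\w+)", auth, re.IGNORECASE)
        return match.group(1).lower() if match else "missing"

    sender = parseaddr(msg.get("From", ""))[1]
    received = msg.get_all("Received", [])
    ips = re.findall(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", " ".join(received))
    return {
        "spf": verdict("spf"),
        "dkim": verdict("dkim"),
        "dmarc": verdict("dmarc"),
        "sender_domain": sender.rpartition("@")[2].lower(),
        # Received headers are prepended by each hop, so the last is the earliest.
        "sender_ip": ips[-1] if ips else None,
        "received_chain": received,
    }
```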
The recruiter analyzer combines heuristics and optional Gemini-based reasoning to score:
- free email usage
- generic recruiter names
- domain mismatch
- payment requests
- urgency language
- suspicious writing style
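The heuristic half of this analyzer might look like the sketch below. The provider list, phrase patterns, and function name are assumptions, and the optional Gemini-based reasoning is not shown.

```python
import re

FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com"}  # hypothetical list
PAYMENT = re.compile(r"registration fee|processing fee|security deposit", re.I)
URGENCY = re.compile(r"\b(?:urgent|immediately|act now|within 24 hours)\b", re.I)


def recruiter_flags(sender_email: str, company_domain: str, message: str) -> list:
    """Return heuristic red flags for a recruiter message."""
    flags = []
    sender_domain = sender_email.rpartition("@")[2].lower()
    if sender_domain in FREE_MAIL:
        flags.append("free email provider")
    if sender_domain != company_domain.lower():
        flags.append("sender domain does not match company")
    if PAYMENT.search(message):
        flags.append("payment request")
    if URGENCY.search(message):
        flags.append("urgency language")
    return flags
```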
The salary analyzer extracts salary mentions from the text and compares them with an expected range for the detected role.
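A simplified sketch of that comparison follows. It only recognizes dollar amounts, and the role table, its figures, and the function names are placeholders rather than the project's data.

```python
import re

# Hypothetical expected monthly ranges in USD, keyed by role.
EXPECTED_MONTHLY_USD = {"data entry clerk": (2000, 4000)}

AMOUNT = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)")


def extract_salaries(text: str) -> list:
    """Return every dollar amount mentioned in the text as an integer."""
    return [int(raw.replace(",", "")) for raw in AMOUNT.findall(text)]


def salary_anomalies(text: str, role: str) -> list:
    """Return the mentioned amounts that fall outside the role's expected range."""
    low, high = EXPECTED_MONTHLY_USD[role]
    return [amount for amount in extract_salaries(text)
            if not low <= amount <= high]
```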
The graph analyzer stores and checks entities such as:
- email addresses
- domains
- phone numbers
This allows repeated scam infrastructure to be remembered across reports.
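The idea of remembering entities across reports can be illustrated without a database. The sketch below uses an in-memory dictionary as a stand-in for Neo4j; the class name, regular expressions, and report identifiers are all assumptions.

```python
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class EntityMemory:
    """In-memory stand-in for the graph: entity -> set of report ids."""

    def __init__(self):
        self.reports = defaultdict(set)

    def record(self, report_id: str, text: str) -> set:
        """Store entities from a report; return those seen in earlier reports."""
        entities = set()
        for email in EMAIL_RE.findall(text):
            email = email.lower()
            entities.add(("email", email))
            entities.add(("domain", email.split("@", 1)[1]))
        for phone in PHONE_RE.findall(text):
            entities.add(("phone", re.sub(r"\D", "", phone)))
        repeated = {e for e in entities if self.reports[e] - {report_id}}
        for entity in entities:
            self.reports[entity].add(report_id)
        return repeated
```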
The pattern module detects recurring scam phrases such as:
- payment requests
- fake offer language
- urgency pressure
It also stores pattern counts in Neo4j so the system can build memory over time.
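Pattern counting can be sketched the same way. Here a `Counter` stands in for the counts the project keeps in Neo4j, and the pattern names, regular expressions, and class name are illustrative assumptions.

```python
import re
from collections import Counter

PATTERNS = {
    "payment_request": re.compile(r"registration fee|processing fee", re.I),
    "fake_offer": re.compile(r"you (?:are|have been) selected", re.I),
    "urgency_pressure": re.compile(r"urgent|act now|within 24 hours", re.I),
}


class PatternMemory:
    """Counts how often each pattern type has been observed."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, text: str) -> list:
        """Return the pattern types found in the text and update the counts."""
        hits = [name for name, rx in PATTERNS.items() if rx.search(text)]
        self.counts.update(hits)
        return hits
```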
The React frontend is designed as an interactive, responsive UI rather than a basic form.
- Hero landing section with clear product positioning.
- Navigation between sections such as About, Features, Uniqueness, Risk Scores, Workflow, and Analyze.
- Feature cards that explain each backend analyzer.
- A risk-score section that explains Safe, Medium Risk, and High Risk bands.
- A workflow section that shows how the analysis works step by step.
- An analyzer panel that can call the backend and also show a fallback demo mode.
- Signal breakdown cards for each analyzer module.
- Responsive design for desktop and smaller screens.
flowchart LR
A[Job Text / Recruiter Message / Email] --> B[Frontend UI]
B --> C[Flask Backend]
C --> D[BERT Text Analyzer]
C --> E[Domain Intelligence]
C --> F[Domain Similarity]
C --> G[SSL Analyzer]
C --> H[Email Header Analyzer]
C --> I[Recruiter Detector]
C --> J[Salary Anomaly]
C --> K[Knowledge Graph]
C --> L[Pattern Learning]
D --> M[Final Risk Score]
E --> M
F --> M
G --> M
H --> M
I --> M
J --> M
K --> M
L --> M
M --> N[Explainable Result in UI]
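The fan-in from the analyzers to the final risk score could be sketched as a weighted average. The source does not state how module scores are combined, so the weighting scheme, the function name, and the handling of unavailable modules are assumptions.

```python
def combine_signals(module_scores: dict, weights: dict):
    """Combine per-module scores (0-100, or None when a module is
    unavailable) into one score plus modules ranked by contribution."""
    available = {name: score for name, score in module_scores.items()
                 if score is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return 0.0, []
    final = sum(weights[name] * score
                for name, score in available.items()) / total_weight
    ranked = sorted(available,
                    key=lambda name: weights[name] * available[name],
                    reverse=True)
    return round(final, 1), ranked
```

Returning the ranked module names alongside the score keeps the result explainable: the UI can show which analyzers contributed most.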
The backend directory contains the Flask app, analyzers, ML model code, training pipeline, domain scripts, scheduler, and sample data.
Important parts include:
- app.py for API routes.
- src/analyzers/ for all scam detection modules.
- src/models/ for the BERT wrapper.
- training/ for dataset loading and model retraining.
- scripts/ for phishing domain extraction and filtering.
- data/ for datasets, samples, and seed data.
The frontend directory contains the React + Tailwind UI used to present the analysis results.
Important parts include:
- src/App.jsx for the page layout and analyzer experience.
- src/components/ for navigation, cards, and footer UI.
- src/data/ for content used in the sections.
- src/lib/ for backend fetch helpers.
The backend exposes routes such as:
- POST /predict
- POST /analyze-job
- POST /domain-analysis
- POST /ssl-analysis
- POST /domain-similarity
- POST /email-analysis
- POST /recruiter-analysis
- POST /salary-analysis
- POST /graph-analysis
- POST /pattern-analysis
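A client call to one of these routes might look like the sketch below. The request schema is not documented here, so the `text` field is hypothetical, as is the base URL, which assumes Flask's default development address. The helper names are illustrative.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask's default dev address


def build_request(path: str, payload: dict) -> Request:
    """Build a JSON POST request for a backend route."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urlopen(build_request(path, payload), timeout=10) as response:
        return json.load(response)
```

With the backend running, `call("/analyze-job", {"text": "..."})` would return the decoded response, provided the route accepts that payload shape.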
Install backend dependencies and start the Flask app:
cd backend
pip install -r requirements.txt
python app.py

Install frontend dependencies and start the Vite dev server:
cd frontend
npm install
npm run dev

The frontend is configured to talk to the backend through the local development setup.
The backend expects external service keys for some optional analyzers and graph features.
Typical values include:
- GEMINI_API_KEY
- HUNTER_API_KEY
- RAPID_API_KEY
- NEO4J_URL
- NEO4J_USER
- NEO4J_PASSWORD
- VirusTotal-related API key used by the domain intelligence module
Do not commit secrets to source control.
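A small startup check can report which settings are missing without printing their values. This sketch reads from the process environment; the function name is hypothetical, and the VirusTotal key is left out because its variable name is not given here.

```python
import os

SETTINGS = [
    "GEMINI_API_KEY", "HUNTER_API_KEY", "RAPID_API_KEY",
    "NEO4J_URL", "NEO4J_USER", "NEO4J_PASSWORD",
]


def missing_settings(env=None) -> list:
    """Return the names of settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SETTINGS if not env.get(name)]
```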
The repository includes sample and training assets that support the detection pipeline:
- fake job posting dataset
- phishing domain lists
- sample real and fake email files
- seed graph data
- trained model artifacts and tokenizer files
Some backend features depend on external APIs and local services. If one service is unavailable, the UI can still show the frontend fallback analysis experience, but the live backend result may be partial.
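Partial results of this kind can be produced by isolating each analyzer call. The sketch below is one possible approach, not the project's implementation; the function name and the result shape are assumptions.

```python
def run_analyzers(analyzers: dict, payload: dict) -> dict:
    """Run each analyzer independently so one failing service
    does not prevent the others from reporting."""
    modules = {}
    for name, analyze in analyzers.items():
        try:
            modules[name] = {"status": "ok", "score": analyze(payload)}
        except Exception as exc:  # e.g. network or API errors
            modules[name] = {"status": "unavailable", "error": str(exc)}
    partial = any(m["status"] != "ok" for m in modules.values())
    return {"partial": partial, "modules": modules}
```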
The project is structured so it can be expanded with:
- analysis history and report export
- better visual explainability charts
- more model training and feedback loops
- stronger graph visualizations
- alerting and triage workflows for recruiters or candidates
JobShield AI is not just a job scam classifier. It is a layered detection and explanation system that combines machine learning, domain investigation, email analysis, graph memory, and a user-friendly interface to help people review suspicious hiring messages more safely.