A machine learning project designed to identify and predict fraudulent job postings using data analysis and classification techniques.
- Overview
- Features
- Project Structure
- Dataset
- Installation
- Usage
- Model Performance
- Technologies Used
- Contributing
- License
This project aims to build a predictive model that can effectively distinguish between legitimate and fraudulent job postings. By analyzing various features of job postings, the model helps job seekers and platforms identify suspicious or fake job advertisements.
- Data preprocessing and exploration
- Feature engineering and selection
- Classification model development
- Performance evaluation and metrics
- Visualization of results
- Predictive analysis on new job postings
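The features above center on a text classification model. As a minimal sketch, assuming a TF-IDF plus logistic regression approach (which may differ from the models actually used in the notebooks), postings can be scored from their text alone with scikit-learn. The example postings and labels below are hypothetical toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy postings; label 1 = fake, 0 = legitimate
texts = [
    "Earn $5000 weekly from home, no experience needed, pay a small registration fee",
    "Senior software engineer, 5+ years Python experience, on-site in Berlin",
    "Work from home data entry, instant cash, send your bank details to apply",
    "Registered nurse for pediatric ward, state license required, full-time",
]
labels = [1, 0, 1, 0]

model = Pipeline([
    # Turn raw text into TF-IDF weighted unigram and bigram features
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    # class_weight="balanced" reweights classes inversely to their frequency,
    # which helps when fake postings are a small minority of the data
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(texts, labels)

new_postings = [
    "Pay a registration fee and earn cash from home",
    "Registered nurse, state license required",
]
predictions = model.predict(new_postings)                  # hard labels (0 or 1)
fake_probability = model.predict_proba(new_postings)[:, 1]  # estimated P(fake)
print(predictions, fake_probability)
```

Wrapping the vectorizer and classifier in one Pipeline keeps preprocessing identical at training and prediction time, so new postings can be passed in as raw strings.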
Fake-Job-Prediction/
├── README.md
├── notebooks/
│ └── [Jupyter notebooks with analysis]
├── data/
│ └── [Dataset files]
├── src/
│ └── [Python source files]
└── requirements.txt
The project uses job posting data with various features including:
- Job title and description
- Company information
- Location details
- Salary information
- Application requirements
- Employment type
- Required experience level
Target variable: a binary label (fake or legitimate)
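Before modeling, the free-text fields and a few structured signals need to be turned into model inputs. The sketch below is a minimal illustration using pandas; the column names (title, company_profile, description, requirements, salary_range, fraudulent) and the helper prepare_features are hypothetical and should be adjusted to match the actual files in data/.

```python
import pandas as pd

# Hypothetical column names; adjust to the real dataset
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements"]


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return one combined text column plus simple binary flags (hypothetical helper)."""
    out = pd.DataFrame(index=df.index)
    # Missing text becomes "", and empty fields are skipped when joining
    out["text"] = df[TEXT_COLUMNS].fillna("").apply(
        lambda row: " ".join(v.strip() for v in row if v.strip()), axis=1
    )
    # A blank company profile or missing salary may be a useful fraud signal
    out["has_company_profile"] = (
        df["company_profile"].fillna("").str.strip().ne("").astype(int)
    )
    out["has_salary"] = df["salary_range"].notna().astype(int)
    return out


# Toy rows for illustration only
raw = pd.DataFrame({
    "title": ["Data Entry Clerk", "Backend Engineer"],
    "company_profile": [None, "We build payment infrastructure."],
    "description": ["Earn fast cash from home.", "Design and maintain APIs."],
    "requirements": [None, "3+ years of Go or Python."],
    "salary_range": [None, "70000-90000"],
    "fraudulent": [1, 0],
})

features = prepare_features(raw)
target = raw["fraudulent"]  # 1 = fake, 0 = legitimate
print(features)
```

Combining the text fields into a single column lets one vectorizer handle all free text, while the binary flags capture missing information that plain text features would ignore.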
- Clone the repository:
git clone https://github.com/PJDEEPESH/Fake-Job-Prediction.git
cd Fake-Job-Prediction
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
- Open the Jupyter notebooks:
jupyter notebook
- Execute the notebooks in order:
- Start with data exploration and preprocessing
- Run feature engineering notebooks
- Execute model training and evaluation
- For prediction on new data, use the trained model:
from model import predict_fake_jobs
# new_data: the new job postings to classify
predictions = predict_fake_jobs(new_data)
The model is evaluated with the following metrics:
- Accuracy: [To be updated with actual metrics]
- Precision: [To be updated with actual metrics]
- Recall: [To be updated with actual metrics]
- F1-Score: [To be updated with actual metrics]
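Because fake postings are usually a minority class, accuracy alone can look high even for a model that misses most fakes, so precision, recall, and F1 on the fake class are more informative. A minimal sketch of computing these metrics with scikit-learn follows; the labels are made up purely for illustration and are not the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for illustration: 1 = fake, 0 = legitimate
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    # pos_label=1 scores the fake class, which is the class of interest
    "precision": precision_score(y_true, y_pred, pos_label=1),
    "recall": recall_score(y_true, y_pred, pos_label=1),
    "f1": f1_score(y_true, y_pred, pos_label=1),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Per-class breakdown for both legitimate and fake postings
print(classification_report(y_true, y_pred, target_names=["legitimate", "fake"]))
```

With these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so accuracy is 0.800 and precision, recall, and F1 are each 0.750.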
- Python 3.x - Programming language
- Jupyter Notebook - Interactive development and analysis
- Pandas - Data manipulation and analysis
- NumPy - Numerical computing
- Scikit-learn - Machine learning algorithms
- Matplotlib & Seaborn - Data visualization
Contributions are welcome! Feel free to:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit your changes (git commit -m 'Add improvement')
- Push to the branch (git push origin feature/improvement)
- Open a Pull Request
This project is open source and available under the MIT License.
For more information or questions, please open an issue on the GitHub repository.