shipment-volume-forecasting

The aim of this project is to build a model that forecasts the number of shipments in a given week. The repository contains Python code in Jupyter notebooks, along with a requirements.txt file for dependency management.

Overview

This project uses XGBoost models to forecast posting volumes for multiple clients. Customer_X has a separate model because this client provides its own forecast and sends a significantly higher average number of packages. The model built for the remaining clients incorporates lagged volume values and dummy variables that distinguish the clients.
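The feature construction described above could look roughly like the sketch below. The column names (`client`, `week`, `volume`), the lag lengths, and the sample values are illustrative assumptions, not taken from the notebooks.

```python
import pandas as pd

# Hypothetical weekly volumes for two clients; names and values are assumptions.
weekly = pd.DataFrame({
    "client": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "volume": [100, 120, 90, 110, 10, 12, 11, 15],
})


def add_lags_and_dummies(df, lags=(1, 2)):
    """Add per-client lagged volumes and one-hot client indicator columns."""
    out = df.sort_values(["client", "week"]).copy()
    for lag in lags:
        # Shift within each client so a lag never crosses a client boundary.
        out[f"volume_lag_{lag}"] = out.groupby("client")["volume"].shift(lag)
    # Rows without a complete lag history are dropped.
    out = out.dropna().reset_index(drop=True)
    # One dummy column per client, e.g. client_A and client_B.
    dummies = pd.get_dummies(out["client"], prefix="client", dtype=int)
    return pd.concat([out, dummies], axis=1)


features = add_lags_and_dummies(weekly)
```

Grouping by client before shifting keeps each client's history separate, which matters when all remaining clients share one model.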

Dataset

Data description:
- DimDates - CSV file - Table with information about dates
- PostingVolumes - Parquet file - Table with daily shipments
- X_ClientsORDERS - Excel file - Data provided by client X since the beginning of 2023. Data for each upcoming month is sent at the end of the preceding month. The file contains the number of orders (on the client's platform) that the client's analytics team expects for each day of the upcoming month for the APM product.
- Zadanie_Dane_Temperatura - Folder with CSV files - Each file in the folder stores temperature data

Location in this Project:
The dataset files are stored in the input/ folder.
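As a rough illustration of reading the temperature folder, the sketch below stacks every CSV file in a directory into one DataFrame. The helper name `load_csv_folder`, the added `source_file` column, and the assumption that all files share the same columns are hypothetical and may not match the notebooks.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV file in `folder` and stack the rows into one DataFrame."""
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    frames = []
    for path in files:
        frame = pd.read_csv(path)
        # Remember which file each row came from (hypothetical extra column).
        frame["source_file"] = path.name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

Assuming the folder sits directly under input/, a call could look like `load_csv_folder("input/Zadanie_Dane_Temperatura")`. The Parquet and Excel files can be read with `pandas.read_parquet` and `pandas.read_excel`, which need a Parquet engine such as pyarrow and an Excel reader such as openpyxl installed.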

Installation

Clone the Repository:
```
git clone https://github.com/alestankiewicz/shipment-volume-forecasting.git
cd shipment-volume-forecasting
```

Set Up a Virtual Environment (Optional but Recommended):
```
python -m venv venv
source venv/bin/activate
```
Install the Required Packages:
```
pip install -r requirements.txt
```

Usage

Launch Jupyter Notebook:
```
jupyter notebook
```
Open the notebooks:
- data_preparation.ipynb for data preprocessing and exploratory data analysis.
- model_development.ipynb for training models and evaluating performance.

Methodology

Data Preprocessing:
- Missing data is handled and the datasets are analyzed.
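Because the source data is daily while the forecast target is weekly, preprocessing plausibly includes filling gaps and aggregating to weeks. The sketch below shows one way to do that. The column names, the sample values, the choice to treat missing days as zero shipments, and the Monday-to-Sunday week are all assumptions rather than the notebooks' actual logic.

```python
import pandas as pd

# Hypothetical daily shipments: 2023-01-04 is absent and 2023-01-03 is NaN.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-05", "2023-01-06", "2023-01-09"]
    ),
    "volume": [10.0, None, 7.0, 3.0, 5.0],
})


def to_weekly(df):
    """Fill gaps in a daily series and sum it into weeks ending on Sunday."""
    series = df.set_index("date")["volume"]
    # Reindex onto every calendar day so absent dates become explicit NaNs.
    full_range = pd.date_range(series.index.min(), series.index.max(), freq="D")
    series = series.reindex(full_range)
    n_missing = int(series.isna().sum())
    # Assumption: a missing day means no shipments were recorded.
    series = series.fillna(0)
    return series.resample("W-SUN").sum(), n_missing


weekly_volume, n_missing = to_weekly(daily)
```

Returning the count of missing days alongside the weekly totals makes it easy to check how much of each series was imputed.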
Model Training & Hyperparameter Tuning:
- An XGBoost regressor is trained on the preprocessed data.
- A rolling window split is used to split the data.
- Gaussian process minimization (gp_minimize() from scikit-optimize) is applied for hyperparameter optimization to find the best model configurations.
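A rolling window split can be sketched in plain Python as follows. The function name `rolling_window_split` and the window sizes are hypothetical, and the notebooks may define their windows differently.

```python
def rolling_window_split(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs for fixed-length rolling windows.

    Each training window is immediately followed by its test window, and both
    slide forward by `step` observations (default: `test_size`).
    """
    step = test_size if step is None else step
    if min(train_size, test_size, step) <= 0:
        raise ValueError("train_size, test_size and step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        yield (
            list(range(start, train_end)),
            list(range(train_end, train_end + test_size)),
        )
        start += step


# Ten weekly observations, four weeks of training, two weeks of testing.
folds = list(rolling_window_split(n_samples=10, train_size=4, test_size=2))
```

Every test window lies strictly after its training window, so no future observations leak into training. Folds like these could be used inside the objective function passed to gp_minimize(), averaging the validation error across folds for each candidate set of hyperparameters.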
Evaluation:
- The models are evaluated using root mean squared error (RMSE).
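For reference, RMSE is the square root of the mean squared difference between actual and predicted values. A minimal NumPy sketch, with made-up sample values:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical actual and forecast weekly volumes.
score = rmse([100, 120, 90], [110, 115, 90])
```

RMSE is expressed in the same units as the target, here shipments per week, and penalizes large errors more heavily than small ones.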

Results

The notebooks contain detailed visualizations of the data analysis findings and the model results, along with suggestions for future improvements.
