A collaboration between Systematic and 'camelCaseCrew'
'PredictIT' is a system for predicting and visualizing device failures. It was developed by a team of ITU students known as 'camelCaseCrew' as part of a course on Scrum and software development in large teams.
Docker and docker-compose: https://docs.docker.com/engine/install/ubuntu/
Git LFS: https://git-lfs.com/
Node.js (for npm, the Node package manager): https://nodejs.org/en/download - this is only necessary for testing
Since the file /data_generator/data/harddrive.csv is too large for a regular git pull, it has to be fetched with Git LFS:
git lfs fetch
git lfs checkout
Alternatively, the file can be downloaded directly.
The project is run using Docker, so make sure to first have Docker running.
To run the program, there are 3 options (all equivalent):
- Run the start.sh file
- Run the terminal command make compose_up_attached for attached mode, or make compose_up for detached mode
- Run the terminal command docker compose -f docker/docker-compose.yml up
Then visit http://localhost:3001
Note: the 'rabbitmq' service is sometimes reported as unhealthy. If that happens, delete all containers and start the system again.
To build, run make build_services
It is also possible to run the frontend separately (without Docker) from the rest of the system.
cd frontend/
npm install
npm run dev
Then visit http://localhost:3003
The data stream simulation can be run at 3 different intensity levels. The system must already be running, started with any of the aforementioned commands. See data_generator/app/main.py to change these values.
- 250 records per minute:
make low_throughput_data_simulation
- 1000 records per minute:
make medium_throughput_data_simulation
- 4000 records per minute:
make high_throughput_data_simulation
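As a rough illustration of what the three intensity levels in the list above mean for the stream, the sketch below converts a records-per-minute target into the pause between consecutive records. The names THROUGHPUT_LEVELS and delay_between_records are made up for this example and are not taken from data_generator/app/main.py; only the three rates come from this README, and the real generator may pace records differently.

```python
# Hypothetical helper, not the project's code: it only shows the arithmetic
# behind the three simulation levels listed above.
THROUGHPUT_LEVELS = {
    "low": 250,      # records per minute
    "medium": 1000,
    "high": 4000,
}


def delay_between_records(records_per_minute: int) -> float:
    """Return the pause in seconds between records needed to hit the rate."""
    if records_per_minute <= 0:
        raise ValueError("records_per_minute must be positive")
    return 60.0 / records_per_minute


if __name__ == "__main__":
    for name, rate in THROUGHPUT_LEVELS.items():
        print(f"{name}: one record every {delay_between_records(rate):.3f} s")
```

At the high level this leaves roughly 15 ms per record, which is a reason to consider scaling the ML-worker when running that simulation.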
To launch multiple ML-workers at once, the --scale flag can be used.
E.g. the following command will launch 3 instances of the ML-worker on startup:
docker-compose -f docker/docker-compose.yaml up --scale predictive_maintenance=3
By default, 50,000 rows are taken from data_generator/data/harddrive.csv.
Unfortunately there is no simple way of configuring this amount; it has to be hardcoded.
This can be done on line 35 of data_generator/data_generator/app/main.py.
The last argument given to CSV_Parser can be changed to any number (the last digit is not allowed to be zero).
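Since the row limit is hardcoded, one possible way to make it configurable would be to read it from an environment variable. The sketch below is not the project's CSV_Parser: read_rows, configured_max_rows, the MAX_ROWS variable and the sample column names are all hypothetical, and it only shows the idea using the standard library.

```python
import csv
import io
import os
from itertools import islice
from typing import Iterator, TextIO

DEFAULT_MAX_ROWS = 50_000  # the default row count mentioned above


def read_rows(source: TextIO, max_rows: int) -> Iterator[dict]:
    """Yield at most max_rows data rows (header excluded) from a CSV stream."""
    reader = csv.DictReader(source)
    return islice(reader, max_rows)


def configured_max_rows() -> int:
    """Read the hypothetical MAX_ROWS variable, falling back to the default."""
    return int(os.environ.get("MAX_ROWS", DEFAULT_MAX_ROWS))


if __name__ == "__main__":
    # Tiny in-memory stand-in for harddrive.csv with made-up columns.
    sample = io.StringIO("serial,smart_5\nA,0\nB,3\nC,1\n")
    for row in read_rows(sample, max_rows=2):
        print(row)
```

Because islice stops reading once the limit is reached, the rest of the large CSV file is never parsed.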
These are the files that are unique for the specific pages.
- LogData (Logic)
- LogDataComponent (Visuals)
- FeedbackButton
- tailwind.config.js (Colours)
- ClickableIframe
- OverviewButton
This page is currently self-contained. Right now the context global.tsx supports this page with a global filter value, which is currently hardcoded to:
- 1 = Healthy
- 2 = Risk
- 3 = Critical
The other pages and components modify this value, and this page changes accordingly.
- Navbar
- NavbarButton
- Logo
- BackButton
General connections between system services
Frontend - visualizes the health of devices. Provides the ability to register new email addresses with the alerting system, and to view and 'flag' specific device health logs.
Grafana - this is where graphs on the frontend are sourced from.
Prometheus - time-series database responsible for storing all processed device health data. Responsible for detecting when to send alerts.
Alert manager - manages alerts received from Prometheus. Registers and deregisters email addresses. Generates emails to be sent.
SMTP server - Sends alert emails.
Data aggregator - collects processed data and exposes it in a form that Prometheus can collect.
RabbitMQ - message queue that backend-services interact with when pulling and pushing processed/unprocessed data.
ML Worker - pulls unprocessed data from the message queue, processes it (determines device health), and pushes the processed data back to the queue.
Data generator - pushes unprocessed data to the message queue.
Database - stores unprocessed data, as well as logs that have been 'flagged' on the frontend.
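The flow between the data generator, the message queue, the ML worker and the data aggregator described above can be sketched with in-memory queues. This is only an illustration of the data flow: Python's queue.Queue stands in for RabbitMQ, the field name smart_5 and the threshold rule are placeholders rather than the project's actual model, and none of the function names come from the code base.

```python
import queue


def generate(unprocessed: queue.Queue, records: list) -> None:
    """Data generator: push raw records onto the unprocessed queue."""
    for record in records:
        unprocessed.put(record)


def ml_worker(unprocessed: queue.Queue, processed: queue.Queue) -> None:
    """ML worker: pull raw records, attach a health label, push the result."""
    while True:
        try:
            record = unprocessed.get_nowait()
        except queue.Empty:
            break
        # Placeholder rule standing in for the real prediction model.
        health = "Critical" if record["smart_5"] > 100 else "Healthy"
        processed.put({**record, "health": health})


def aggregate(processed: queue.Queue) -> dict:
    """Data aggregator: count devices per health label."""
    counts = {}
    while not processed.empty():
        label = processed.get()["health"]
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    raw, done = queue.Queue(), queue.Queue()
    generate(raw, [{"id": 1, "smart_5": 0}, {"id": 2, "smart_5": 250}])
    ml_worker(raw, done)
    print(aggregate(done))
```

Because the worker only talks to the queues, several workers can drain the same unprocessed queue in parallel, which is what scaling the ML-worker service relies on.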
The different services are available at the following ports:
- Frontend: http://localhost:3001
  - If run with npm run dev: http://localhost:3003
- Grafana: http://localhost:3000
  - with credentials admin:admin
- Prometheus: http://localhost:9090
- RabbitMQ: http://localhost:15672
  - with credentials guest:guest
- Data Aggregator: http://localhost:8003/metrics
- Alert manager: http://localhost:9093
To register a new email with the alert manager, make sure 'curl' or a similar tool is installed.
Registering:
curl -X PUT http://localhost:5000/update/<email>
curl -X POST http://localhost:9093/-/reload
Deregistering:
curl -X DELETE http://localhost:5000/remove/<email>
curl -X POST http://localhost:9093/-/reload
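For environments without curl, the same two-step sequence (change the address list, then reload the alert manager) can be sketched with Python's standard library. The URLs and HTTP methods are taken from the curl commands above; the function names are made up for this example, and percent-encoding the address is an assumption about what the registration service accepts.

```python
import urllib.request
from urllib.parse import quote

UPDATE_BASE = "http://localhost:5000"   # registration endpoint from the curl commands
ALERTMANAGER = "http://localhost:9093"  # Alert manager


def build_requests(email: str, register: bool = True) -> list:
    """Build the address change request followed by the reload request."""
    action, method = ("update", "PUT") if register else ("remove", "DELETE")
    # Assumption: the service accepts a percent-encoded address; '@' is kept as-is.
    change = urllib.request.Request(
        f"{UPDATE_BASE}/{action}/{quote(email, safe='@')}", method=method
    )
    reload_ = urllib.request.Request(f"{ALERTMANAGER}/-/reload", method="POST")
    return [change, reload_]


def send_all(requests: list) -> None:
    """Send the requests in order; requires the system to be running."""
    for req in requests:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(req.get_method(), req.full_url, resp.status)


if __name__ == "__main__":
    # Only prints what would be sent; call send_all(...) against a running system.
    for req in build_requests("someone@example.com"):
        print(req.get_method(), req.full_url)
```

The reload request always comes second, mirroring the order of the curl commands.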
During a regular run of the program, you can flag reported logs on the history page. These flagged logs are saved in a table in the SQL database. This table allows you to pinpoint all the SMART values that went into a prediction using the log's unique ID number. The data can then be sent to a retraining service, which will have to be implemented from scratch.
There are two different types of tests: end-to-end Cypress tests and tests of specific services.
# from project root
make # or another command to run the system
npm install
npx cypress run
One can also use npx cypress open to open the testing UI.
Screenshots and videos of the tests end up in cypress/screenshots and cypress/videos respectively.
There are tests for the following services:
- data-aggregator
- data-stream
- feedback-storage
- ml-worker
- rabbitmq
docker build -t unit_tests ./unit_tests/<service>/
# from project root
make # or any other command to run the system
# wait till services are healthy/running
docker run --network docker_predictive-maintenance-net unit_tests
Replace <service> with whichever service is to be tested from the above list.
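The "wait till services are healthy/running" step can be automated with a small polling helper. The sketch below is an assumption-laden example rather than part of the project: it treats a service as ready once its URL answers with a successful HTTP response, which is not the same as Docker's own health checks, and the function names and the selection of URLs (taken from the port list in this README) are chosen for illustration only.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable

SERVICE_URLS = [
    "http://localhost:3001",          # Frontend
    "http://localhost:9090",          # Prometheus
    "http://localhost:8003/metrics",  # Data Aggregator
]


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for(
    urls: Iterable[str],
    probe: Callable[[str], bool] = is_up,
    attempts: int = 30,
    pause: float = 2.0,
) -> bool:
    """Poll until every URL passes the probe, or give up after `attempts` rounds."""
    urls = list(urls)
    for _ in range(attempts):
        if all(probe(url) for url in urls):
            return True
        time.sleep(pause)
    return False
```

Calling wait_for(SERVICE_URLS) before docker run would block until the listed services respond, or return False after about a minute with the default settings.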