Here are the tools and technologies I work with or have worked with:
This repository provides a Python library for working with tNavigator data. It lets you parse and manage certain data, and it can optionally be used via a CLI. The repository is private for now.
- Python
- NumPy
- Pandas
- Pydantic
- NetworkX
- ecl_data_io
- json
- yaml
- click
- logging
This repository covers how to build a recommendation system and deploy it as an ML microservice.
The system works with data from ecommerce-dataset, and the goal is to build a recommender that increases the number of add_to_cart events.
- Apache Airflow
- MLflow
- Python
- NetworkX
- Pydantic
- scikit-learn
- CatBoost
- LightGBM
- XGBoost
- implicit
- Docker
- FastAPI
- Redis
- Prometheus
- Grafana
- Uvicorn
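The core idea, recommending items that are frequently added to the cart together, can be sketched with a toy item-item co-occurrence model. This is an illustrative stand-in, not the repository's actual pipeline (which relies on implicit and the gradient-boosting libraries above); the carts and item names are made up.

```python
from collections import defaultdict

def build_cooccurrence(carts):
    """Count how often each pair of items appears in the same cart."""
    co = defaultdict(lambda: defaultdict(int))
    for cart in carts:
        for a in cart:
            for b in cart:
                if a != b:
                    co[a][b] += 1
    return co

def recommend(co, item, k=3):
    """Return the items most often carted together with `item`."""
    ranked = sorted(co[item].items(), key=lambda kv: -kv[1])
    return [b for b, _ in ranked[:k]]

carts = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"]]
co = build_cooccurrence(carts)
print(recommend(co, "milk"))  # bread co-occurs twice, eggs once
```

In production such scores would come from a trained model (e.g. implicit's ALS), but the serving logic — rank candidate items per user or item, return the top k — has the same shape.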
This repository covers how to build a recommendation system and deploy it as an ML microservice.
The system works with data from Yandex Music, and the goal is to recommend new tracks to users.
- MLflow
- Spark
- Python
- Pydantic
- scikit-learn
- CatBoost
- implicit
- Docker
- FastAPI
- Redis
- Prometheus
- Grafana
- Uvicorn
This repository covers all the steps needed to deploy an ML microservice using FastAPI, Python, Docker and Redis, and to monitor it via Prometheus and Grafana. The microservice can reject requests when there are too many of them, validate the input used by the ML model (e.g. a feature must lie within a certain range), and send back a proper response with all the necessary information about the nature of an error when one occurs.
- Docker
- FastAPI
- Redis
- Prometheus
- Grafana
- Python
- Pydantic
- Uvicorn
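The range-validation idea maps naturally onto Pydantic, which FastAPI uses for request bodies. A minimal sketch, assuming a hypothetical request schema with one probability-like feature (the field names and the [0, 1] range are illustrative, not the repository's actual schema):

```python
from pydantic import BaseModel, Field, ValidationError

class PredictRequest(BaseModel):
    user_id: int
    score: float = Field(ge=0.0, le=1.0)  # the feature must lie in [0, 1]

try:
    PredictRequest(user_id=1, score=1.5)  # out of range -> rejected
except ValidationError as err:
    print("rejected:", err.errors()[0]["loc"])
```

When such a model is used as a FastAPI request body, an out-of-range value is turned into an HTTP 422 response that names the offending field, which is exactly the "proper response indicating the nature of the error" behaviour described above.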
This repository contains a custom Extract, Transform, Load (ETL) pipeline that uses Docker, PostgreSQL, Python and Cron to run automatically. Realty data from Yandex is parsed automatically and then processed by the pipeline.
- Docker
- Cron
- Postgres
- Python
- Pandas
- SQLAlchemy
- Requests
- BeautifulSoup
- Geopy
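The extract-transform-load steps can be sketched end to end in a few lines. Here sqlite3 stands in for PostgreSQL, and the flat realty schema and derived column are illustrative assumptions, not the repository's actual tables:

```python
import sqlite3

def extract():
    # Stand-in for the parsed realty listings (district names are made up).
    return [{"district": "Arbat", "price": 120, "area": 40},
            {"district": "Arbat", "price": 90, "area": 30}]

def transform(rows):
    # Derive price per square metre and drop rows with an unusable area.
    return [dict(r, price_per_m2=r["price"] / r["area"])
            for r in rows if r["area"] > 0]

def load(rows, conn):
    # Persist the transformed rows into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS listings "
                 "(district TEXT, price REAL, area REAL, price_per_m2 REAL)")
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)",
                     [(r["district"], r["price"], r["area"], r["price_per_m2"])
                      for r in rows])

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0])  # 2
```

In the real pipeline, Cron triggers this sequence on a schedule and SQLAlchemy handles the PostgreSQL connection.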
An example of how to automate the parsing of news RSS feeds (or, with certain modifications, any news site) using cron jobs and, optionally, proxies, which are also parsed dynamically.
- Cron
- Python
- crontab
- feedparser
- bs4
- proxy_parse
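The parsing half of the setup boils down to pulling item titles and links out of an RSS document. The repository uses feedparser for this; the standard-library sketch below just illustrates the idea on an inline feed (the feed content is made up):

```python
import xml.etree.ElementTree as ET

RSS = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>First headline</title><link>http://example.com/1</link></item>
  <item><title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>"""

def parse_items(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(i.findtext("title"), i.findtext("link"))
            for i in root.iter("item")]

print(parse_items(RSS))
```

The automation half is then a crontab entry that runs the parser script on a schedule, which is what the cron job in the repository does.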
An implementation of a hybrid model for Russian news sentiment analysis, based on neural networks and a stacking approach. The model can be used to predict the sentiment of news text.
- PyPI
- Python
- PyTorch
- transformers
- joblib
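The stacking idea can be shown in miniature: base models each score the text, and a meta-model combines their outputs into the final prediction. In the real repository the base models are neural networks (PyTorch/transformers) and the meta-learner is trained; the toy lexicon and punctuation "models" and the fixed weights below are stand-ins:

```python
def base_model_a(text):
    # Hypothetical lexicon-based scorer (stands in for a neural network).
    return 0.9 if "great" in text else 0.2

def base_model_b(text):
    # Hypothetical punctuation-based scorer (also a stand-in).
    return 0.8 if "!" in text else 0.4

def meta_model(scores):
    # A trained meta-learner would combine these; a weighted average
    # with made-up weights stands in here.
    return "positive" if 0.6 * scores[0] + 0.4 * scores[1] > 0.5 else "negative"

def predict_sentiment(text):
    return meta_model((base_model_a(text), base_model_b(text)))

print(predict_sentiment("great news!"))  # positive
```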
A deep learning application built with FastAPI, Flask and Docker. It allows users to transform images according to textual prompts through a frontend service made with Flask. The application leverages state-of-the-art models from Hugging Face to perform the image transformations, making it a useful tool for various image processing tasks.
- Docker
- FastAPI
- Flask
- JavaScript
- HTML5
- Python
- PyTorch
- transformers
- PIL
- OpenCV
- CUDA
This project gives a simple example of how to use Apache Airflow to manage ML workflows, based on a telecom-company churn dataset stored in a PostgreSQL database. Specifically, it covers how to build an ETL pipeline using DAGs, plugins, hooks and callbacks (to Telegram).
- Docker
- Airflow
- S3
- Postgres
- Python
- Pandas
- SQLAlchemy
- Requests
- Telegram
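The callback mechanism follows the same pattern as Airflow's `on_failure_callback`: when a task raises, a user-supplied function is invoked with the failure context and can alert an external channel. A framework-free sketch of that pattern, with the Telegram API call stubbed out and all names illustrative:

```python
sent_messages = []

def telegram_alert(context):
    # In the real project this would call the Telegram Bot API.
    sent_messages.append(f"Task {context['task_id']} failed: {context['error']}")

def run_task(task_id, fn, on_failure_callback=None):
    """Run a task and fire the failure callback if it raises."""
    try:
        return fn()
    except Exception as e:
        if on_failure_callback:
            on_failure_callback({"task_id": task_id, "error": str(e)})
        raise

def broken_extract():
    raise RuntimeError("source table missing")

try:
    run_task("extract", broken_extract, on_failure_callback=telegram_alert)
except RuntimeError:
    pass
print(sent_messages[0])
```

In Airflow itself the callback is attached via the task's or DAG's `on_failure_callback` argument, and the context dict is supplied by the scheduler.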
A simple example of how to use DVC for logging ML models, based on a telecom-company churn dataset stored in a PostgreSQL database.
- Docker
- DVC
- S3
- Postgres
- Python
- Pandas
- scikit-learn
- CatBoost
- joblib
- SQLAlchemy
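In a DVC setup like this, model versioning hinges on a pipeline stage that declares its dependencies, parameters and outputs. A sketch of what such a stage could look like in `dvc.yaml` — the stage name, script, paths and parameter are illustrative, not the repository's actual files:

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - data/churn.csv
    params:
      - train.learning_rate
    outs:
      - models/model.joblib
```

With this in place, `dvc repro` reruns training only when a dependency or parameter changes, and the resulting `model.joblib` is tracked by DVC and pushed to the S3 remote.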
This is a project in which both Airflow and DVC are used: Airflow automates the ETL pipelines, while DVC logs the ML models. The dataset is based on realty data from Yandex and is stored in S3 storage and a PostgreSQL database.
- Docker
- Airflow
- DVC
- S3
- Postgres
- Python
- Pandas
- scikit-learn
- CatBoost
- joblib
- SQLAlchemy
- Telegram
This project covers the business problem of improving the key metrics of a model that predicts the value of Yandex Real Estate flats. The goal is to make the training process and related processes easily repeatable, and to improve the key model metrics that impact the company's business metrics, particularly the number of successful transactions. The MLflow framework is used to run a large number of experiments and ensure reproducibility.
- MLflow
- S3
- Postgres
- Python
- Pandas
- scikit-learn
- CatBoost
- joblib
- Pydantic
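Running a large number of experiments reduces to a loop that records each run's parameters and metric and then selects the best run. MLflow does the recording and comparison for real; in this framework-free sketch a plain list stands in, and the hyperparameter, search space and scoring function are made up:

```python
runs = []

def train_and_eval(depth):
    # Stand-in for actual model training: a made-up score that peaks
    # at a moderate tree depth.
    return 0.70 + 0.05 * depth - 0.012 * depth ** 2

for depth in range(1, 6):
    # With MLflow, each iteration would be an mlflow.start_run() with
    # log_param / log_metric calls; here we just append a dict.
    runs.append({"params": {"depth": depth},
                 "score": train_and_eval(depth)})

best = max(runs, key=lambda r: r["score"])
print(best["params"])  # the depth with the highest score
```

The payoff of the tracked version is reproducibility: every run's parameters, metrics and artifacts are stored, so the winning configuration can be retrained identically later.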
An implementation of special tools for Minecraft, built on top of mcpi. They can be used to render photos directly in the game.
- PyPI
- Python
- mcpi
- OpenCV
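The core of such a tool is mapping image pixels to block placements, which mcpi then executes via `setBlock`. A sketch of that mapping for a tiny grayscale image — the two-colour wool palette, threshold and origin are illustrative choices, and the real tools drive a running Minecraft server:

```python
def image_to_blocks(pixels, origin=(0, 64, 0)):
    """Map a grayscale pixel grid to (x, y, z, block_id, data) placements.

    Block id 35 is wool; its data value selects the colour
    (0 = white, 15 = black), a common trick for drawing with mcpi.
    """
    ox, oy, oz = origin
    placements = []
    for row, line in enumerate(pixels):
        for col, value in enumerate(line):
            data = 0 if value > 127 else 15  # bright pixel -> white wool
            # Build the image upright: columns -> x, rows -> y (top down).
            placements.append((ox + col, oy - row, oz, 35, data))
    return placements

pixels = [[255, 0], [0, 255]]  # a 2x2 checkerboard "photo"
for p in image_to_blocks(pixels):
    print(p)
```

Against a live server, each tuple would be passed to `mc.setBlock(x, y, z, block_id, data)` after `Minecraft.create()`.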
Here are repositories related to Kaggle competitions:
PS: I have competed in many more Kaggle competitions, but the corresponding code has gone missing on my local machine. I will probably redo the code in the future and publish new repositories.