Company name matcher

Compares company names and searches for similar companies in our database.

Pipeline

Project structure

Our solution consists of several main parts:

Classifier - solves a classification problem: deciding whether two names refer to the same company
Recommendation - solves a recommendation problem: suggesting the top n similar company names for a given company name

Each part has a strict interface and is independent of the other. A dedicated interface has been implemented for this module.
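As a rough illustration of what two independent parts behind strict interfaces could look like, here is a minimal sketch. The class and method names (`NameClassifier`, `NameRecommender`, `is_same_company`, `recommend`) are hypothetical and are not taken from this repository, and the string-ratio scoring from `difflib` only stands in for the real models.

```python
from abc import ABC, abstractmethod
from difflib import SequenceMatcher
from typing import List


class NameClassifier(ABC):
    """Hypothetical interface: decides whether two names are one company."""

    @abstractmethod
    def is_same_company(self, name_a: str, name_b: str) -> bool:
        ...


class NameRecommender(ABC):
    """Hypothetical interface: suggests the top n similar company names."""

    @abstractmethod
    def recommend(self, name: str, top_n: int = 5) -> List[str]:
        ...


def _ratio(a: str, b: str) -> float:
    # Case-insensitive string similarity in [0, 1]; a stand-in for a model score.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class RatioClassifier(NameClassifier):
    """Toy classifier: same company if the similarity reaches a threshold."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold

    def is_same_company(self, name_a: str, name_b: str) -> bool:
        return _ratio(name_a, name_b) >= self.threshold


class RatioRecommender(NameRecommender):
    """Toy recommender: ranks known names by similarity to the query."""

    def __init__(self, known_names: List[str]) -> None:
        self.known_names = list(known_names)

    def recommend(self, name: str, top_n: int = 5) -> List[str]:
        ranked = sorted(self.known_names, key=lambda k: _ratio(name, k), reverse=True)
        return ranked[:top_n]


classifier = RatioClassifier()
recommender = RatioRecommender(["Acme Corp", "Acme Corporation", "Zebra Holdings"])
top_two = recommender.recommend("Acme Corp", top_n=2)
```

Because the two parts only share the scoring helper, either one can be swapped for a Bert, Sentence Transformers, or FastText implementation without touching the other.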

This repository presents three methods for solving the problem:

Using Bert
Using Sentence Transformers
Using FastText

.
├── data
├── notebooks               <- Jupyter notebooks
├── README.md               <- The top-level README for developers using this project
├── requirements.txt        <- The requirements file for reproducing the analysis environment
├── weights                 <- Empty folder for saving results
├── src
│   ├── bert                <- Folder that contains bert solution
│   ├── fasttext            <- Folder that contains fasttext solution
│   ├── sentence_bert       <- Folder that contains sentence transformers solution
│   └── utils
└── tutorial.ipynb          <- Demonstration work

MLFlow

To track experiment results, we used MLFlow, an open source platform for the machine learning lifecycle. We used the architecture shown in the picture below.

With the help of MLFlow, the following tasks were solved:

Collect summary information
Save artifacts
Manage machine learning models (in progress)

Metrics

Classification task

Model	F1 Macro Score
Bert	0.97
Sentence Bert	0.61
FastText	0.87
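For readers unfamiliar with the metric, the sketch below shows how a macro-averaged F1 score is usually computed: F1 is calculated per class and the per-class values are averaged with equal weight. The labels are made up for illustration, and the source does not say which library produced the scores in the table.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up labels: 1 = "same company", 0 = "different companies".
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = f1_macro(y_true, y_pred)  # each class has F1 = 2/3, so the macro score is 2/3
```

Macro averaging gives the rarer class the same weight as the frequent one, which matters when matching pairs are much less common than non-matching pairs.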

Sentence Bert performs worse than Bert because of how the two models work. Sentence Bert is used to build embeddings, and the cosine distance between them is calculated. As a result, names of different companies that share similar words get similar embeddings and are hard to tell apart.
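The effect described above can be reproduced with a toy example. In the sketch below a simple word-count vector stands in for a Sentence Transformers embedding (an assumption made to keep the example self-contained), and the company names are invented.

```python
import math
from collections import Counter


def toy_embed(name):
    """Toy embedding: word counts of the lowercased name (not a real model)."""
    return Counter(name.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two different (invented) companies that share two of three words.
shared = cosine_similarity(
    toy_embed("Global Steel Trading"),
    toy_embed("Global Steel Logistics"),
)

# Two names with no words in common.
unrelated = cosine_similarity(
    toy_embed("Global Steel Trading"),
    toy_embed("Acme Foods"),
)
```

Here `shared` is 2/3 even though the two names belong to different companies, while `unrelated` is 0. A fixed similarity threshold therefore tends to produce false positives for names built from common words.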

Usage

We tested three different models:

Bert
Sentence Transformers
FastText

You can combine them however you like. Be careful with experiments and check the results.

To see the results of the project, you can use tutorial.ipynb. Before using it, install the project dependencies:

pip install -r requirements.txt

After installing the dependencies, run the following commands from the root folder of the repository:

# Linux command
chmod +x load_data.sh
./load_data.sh

Link to the directory with all weights that are used in this work.

In48semenov/Company-name-matcher
