SpacyToolKit

SpacyToolKit is a small library for quickly stacking spaCy models, preparing training data, and training models.

Choose the documentation language: English here

Getting started

Installing:

!git clone https://github.com/Lednik7/SpacyToolKit.git
!python -m pip install -r SpacyToolKit/requirements.txt

Required packages:

For the library to work correctly, you need to install the following packages:

pip install spacy
pip install googletrans
pip install pymorphy2==0.8
pip install scikit-learn
pip install numpy
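
To confirm that the packages above can be imported, a small standard-library check like the following may help. This is only a sketch: the helper name `missing_packages` is hypothetical and not part of SpacyToolKit. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the required packages (scikit-learn is imported as "sklearn").
REQUIRED = ["spacy", "googletrans", "pymorphy2", "sklearn", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

An empty result means every listed package was found; anything reported as missing can be installed with the matching pip command.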

Your first model

Prerequisites:

Before you run the code below, you need to install one of the models:

Russian models:
!git clone -b v2.1 https://github.com/buriy/spacy-ru.git && cp -r ./spacy-ru/ru2/.
English models:
!python -m spacy download en_core_web_sm  # size: 11 MB
!python -m spacy download en_core_web_md  # size: 48 MB
!python -m spacy download en_core_web_lg  # size: 746 MB

More information about the en and ru models
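
Models installed with `spacy download` are regular Python packages, so a script can pick whichever English model is present. The sketch below assumes a larger model is preferred when it is installed; `first_installed` is a hypothetical helper, not a SpacyToolKit function.

```python
from importlib import import_module
from importlib.util import find_spec

# Preference order, largest first (assumption: prefer a larger model when available).
ENGLISH_MODELS = ["en_core_web_lg", "en_core_web_md", "en_core_web_sm"]


def first_installed(names):
    """Return the first importable package name from `names`, or None."""
    for name in names:
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    name = first_installed(ENGLISH_MODELS)
    if name is None:
        print("No English model found; run: python -m spacy download en_core_web_sm")
    else:
        # Equivalent to `import en_core_web_sm; en_core_web_sm.load()` for that model.
        nlp = import_module(name).load()
        print("Loaded", name)
```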

Simple model:

To begin, we import the necessary functions and the main class:

from SpacyToolKit.Tools import SpacyTools, sort_doc
from SpacyToolKit.other import get_translate
import en_core_web_sm  # install with: python -m spacy download en_core_web_sm

Now create an instance of the class:

model = SpacyTools()

Since the model works best on English text, we first translate the input with Google Translate:

text = model.sample_text  # your text
trans = get_translate(text)  # translate the text into English

Now we are ready to load the text into the model and make a prediction:

model.load_text(trans)
nlp = en_core_web_sm.load()
doc = model.create(nlp)
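
The internals of `SpacyTools` are not shown in this README. For orientation, here is roughly what running a pipeline on text and reading its named entities looks like in plain spaCy. `entity_texts` is a hypothetical helper, and it is only an assumption that `model.create(nlp)` does something comparable.

```python
def entity_texts(nlp, text):
    """Run a spaCy pipeline on `text` and return unique entity strings in order of appearance."""
    doc = nlp(text)            # calling the pipeline returns a Doc
    seen = []
    for ent in doc.ents:       # Doc.ents holds the named-entity spans
        if ent.text not in seen:
            seen.append(ent.text)
    return seen


if __name__ == "__main__":
    try:
        import en_core_web_sm  # install with: python -m spacy download en_core_web_sm
    except ImportError:
        print("en_core_web_sm is not installed")
    else:
        nlp = en_core_web_sm.load()
        print(entity_texts(nlp, "Worked with data analysis frameworks in Python and GitHub."))
```

The helper only relies on the pipeline being callable and on the result exposing `ents`, so it works with any loaded spaCy model.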

Let's look at the result:

print(sort_doc(doc)) #filter the results
print(model.text)

Output:

Time: 1.39 - en_sm
['Python', 'Data Science', 'GitHub']
Data Analyst with work experience. He graduated from SSAU with a master's degree in mathematics.
I have experience working with various databases and in writing macros. Worked with various data analysis frameworks in Python.
He participated in the development of several systems for data analysis. There are examples of their Data Science projects on GitHub:

Great! Now you are ready to explore the library in more depth.

You can see examples here

License

This project is licensed under the MIT License; see the LICENSE file for details.
