self.text = text
sample_text = """Аналитик данных с опытом работы. Окончил СГАУ со степенью магистра по математике.
Имею опыт работы с различными БД и в написании макросов. Работал с различными фреймворками для анализа данных на Python.
Участвовал в разработке нескольких систем для анализа данных. Есть примеры своих проектов по Data Science на GitHub:""".split("\n")
self.sample_text = "\n".join(element.strip() for element in sample_text)
from SpacyToolKit.Tools import SpacyTools
model = SpacyTools()
SpacyTools is a class for quickly creating and using models for stacking. The main supported language is English.
model.load_text("your text here")
Stores the given text in self.text.
model.load_file("path to file.txt")
Reads the text from the file at the given path and stores it in self.text.
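As a rough illustration of the two loading methods described above, here is a minimal sketch. The class name `TextHolder` and the `encoding` parameter are assumptions made for this example; this is not the actual SpacyTools source.

```python
import os
import tempfile


class TextHolder:
    """Hypothetical stand-in for the text-loading part of SpacyTools."""

    def __init__(self):
        self.text = ""

    def load_text(self, text):
        # Keep the string as-is for later processing.
        self.text = text

    def load_file(self, path, encoding="utf-8"):
        # Read the whole file and store its contents.
        with open(path, encoding=encoding) as source:
            self.text = source.read()


holder = TextHolder()
holder.load_text("your text here")
print(holder.text)

# Round-trip through a temporary file to show load_file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("text from a file")
holder.load_file(handle.name)
os.remove(handle.name)
print(holder.text)
```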
model.create(nlp=None)
Takes an optional nlp argument (the language model to use). Lets you quickly create models for stacking. Returns a doc object.
from SpacyToolKit.Tools import SpacyTools
import en_core_web_sm
model = SpacyTools()
model.load_text(model.sample_text)
nlp = en_core_web_sm.load()
model.create(nlp)
For all of the examples below to work, import everything from the module:
from SpacyToolKit.Tools import *
text = "имя"
print(get_translate(text))  # translate the text into English
This function requires the googletrans module to be installed. It takes a single argument, a string, and returns a string. Any other argument will result in an error.
....
doc = model.create()
print(sort_doc(doc))  # filter the results
This function accepts the doc object returned by SpacyTools().create(). It removes unwanted values such as numbers and gaps (empty values), and returns the cleaned result as a list of strings.
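The kind of filtering described above can be sketched in plain Python. This is only an approximation under stated assumptions: the real sort_doc takes a doc object, while the hypothetical `keep_words` below works on plain token strings (for example `[token.text for token in doc]`), and the exact rules the library applies are not documented here.

```python
def keep_words(token_texts):
    """Drop tokens that are empty, whitespace-only, or numeric."""
    kept = []
    for text in token_texts:
        stripped = text.strip()
        if not stripped:
            # Skip gaps: empty or whitespace-only tokens.
            continue
        if stripped.replace(".", "", 1).replace(",", "", 1).isdigit():
            # Skip plain numbers, allowing one decimal separator.
            continue
        kept.append(stripped)
    return kept


tokens = ["Python", " ", "3", "SQL", "", "2.5", "GitHub"]
words = keep_words(tokens)
print(words)
```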
....
data = cleaning(data)  # use after sort_doc
print(data)
The function takes a list of strings and returns a set. Occurrences of a shorter word inside a longer one are removed. It is sometimes worth running the function twice.
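One possible reading of this behavior is sketched below: a word is dropped when it appears inside a longer word from the same list. The function name `drop_contained_words` and this interpretation are assumptions; the library's own cleaning() may work differently.

```python
def drop_contained_words(words):
    """Return a set without words that occur inside another, longer word."""
    unique = set(words)
    return {
        word
        for word in unique
        # Keep the word only if no other entry contains it as a substring.
        if not any(word != other and word in other for other in unique)
    }


result = drop_contained_words(["data", "database", "python", "sql"])
print(result)
```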
data = "list or string"
print(words_count(data))
The function takes a list or a string and returns a list of entries in the format (number of occurrences, word). By default, all words are included.
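A comparable result can be produced with the standard library. The sketch below is an illustration, not the library's code: `count_words` is a hypothetical name, splitting strings on whitespace is an assumption, and so is the ordering (most frequent first).

```python
from collections import Counter


def count_words(data):
    """Return (count, word) pairs for a string or a list of words."""
    # Accept either a raw string or a list of words.
    words = data.split() if isinstance(data, str) else list(data)
    counts = Counter(words)
    # Build (count, word) pairs, most frequent first.
    return sorted(((n, w) for w, n in counts.items()), reverse=True)


pairs = count_words("spam eggs spam ham spam eggs")
print(pairs)
```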
data = ["abx", "abc", "agb"]
print(find_copy(data))
The function takes a list and returns a set of entries in the format (percent similarity, word). You can use max() to find the most similar word.
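To show how such (similarity, word) pairs combine with max(), here is a sketch built on difflib from the standard library. The similarity metric, the name `best_similarities`, and the choice to score each word against its closest neighbor in the list are all assumptions; find_copy() itself may compute similarity differently.

```python
from difflib import SequenceMatcher


def best_similarities(words):
    """For each word, pair it with its highest similarity to any other word."""
    pairs = set()
    for i, word in enumerate(words):
        ratios = [
            SequenceMatcher(None, word, other).ratio()
            for j, other in enumerate(words)
            if i != j
        ]
        if ratios:
            pairs.add((max(ratios), word))
    return pairs


similar = best_similarities(["analysis", "analyses", "python"])
# Tuples compare by their first element, so max() picks the highest ratio.
top_ratio, top_word = max(similar)
print(top_ratio, top_word)
```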
data = ["abx", "abc", "agb"]
delete_copy(data)
print(data)
The function takes a list and deletes the most similar word from it (if the similarity is >= 0.75). It relies on the find_copy() function.
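The in-place removal can be sketched as follows. Only the 0.75 threshold comes from the description above; the helper name `remove_most_similar`, the use of difflib as the similarity metric, and the rule for which word of a similar pair gets removed are assumptions for illustration.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # threshold taken from the description above


def remove_most_similar(words, threshold=THRESHOLD):
    """Remove, in place, the later word of the most similar pair."""
    best_ratio, best_word = 0.0, None
    for i, word in enumerate(words):
        for other in words[i + 1:]:
            ratio = SequenceMatcher(None, word, other).ratio()
            if ratio > best_ratio:
                best_ratio, best_word = ratio, other
    # Only delete when the closest pair is similar enough.
    if best_word is not None and best_ratio >= threshold:
        words.remove(best_word)
    return words


skills = ["analysis", "analyses", "python"]
remove_most_similar(skills)
print(skills)
```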
help(get_translate)