# Example usage

To use `resumeanalyser` in a project:

In [None]:
import resumeanalyser
print(resumeanalyser.__version__)

# Example Usage of Text Extraction Functions

`resumeanalyser` allows you to read in text from PDF or docx documents, the formats most resumes come in. We will use the simple text files stored in the `tests/data` directory of this project repository to demonstrate how these functions work.

## Extraction from docx documents
The function `docx_to_text` allows you to extract text from Word documents and store the text as a string. It takes in a path name ending in `.docx` as an input.

In [None]:
from resumeanalyser.text_reading import docx_to_text, pdf_to_text

In [None]:
# Reading in text from Word document
simple_docx_path = "../tests/data/simple_text.docx"
sample_docx_text = docx_to_text(simple_docx_path)
print(sample_docx_text)

## Extraction from PDF documents
Similarly, the function `pdf_to_text` allows you to extract text from PDF documents and store the text as a string. It takes in a path name ending in `.pdf` as an input.

In [None]:
# Reading in text from PDF document
simple_pdf_path = "../tests/data/simple_text_pdf.pdf"
sample_pdf_text = pdf_to_text(simple_pdf_path)
print(sample_pdf_text)

`docx_to_text` can also extract text that has been formatted as headings in a Word document, as can be seen here:

In [None]:
# Reading in fancy formatted text from Word document
fancy_docx_path = "../tests/data/fancy_text_docx.docx"
fancy_sample_docx_text = docx_to_text(fancy_docx_path)
print(fancy_sample_docx_text)

This also works for PDF documents.

In [None]:
# Reading in fancy formatted text from PDF document
fancy_pdf_path = "../tests/data/fancy_text_pdf.pdf"
fancy_sample_pdf_text = pdf_to_text(fancy_pdf_path)
print(fancy_sample_pdf_text)
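
Because each reader expects a specific file extension, it can help to choose the reader from the path before calling it. The following is a minimal sketch of that idea, not part of `resumeanalyser`: the helper `reader_name_for` and the mapping `READER_BY_SUFFIX` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the name of the reader
# function demonstrated in this section.
READER_BY_SUFFIX = {".docx": "docx_to_text", ".pdf": "pdf_to_text"}


def reader_name_for(path):
    """Return the name of the reader function matching the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READER_BY_SUFFIX:
        raise ValueError(f"Unsupported resume format: {suffix or 'no extension'}")
    return READER_BY_SUFFIX[suffix]


print(reader_name_for("../tests/data/simple_text.docx"))
print(reader_name_for("../tests/data/simple_text_pdf.pdf"))
```

The returned name could then be looked up on the `resumeanalyser.text_reading` module, for example with `getattr`, and called with the same path.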

# Example Usage of Text Cleaning Functions

`resumeanalyser` offers a series of text cleaning functions. These functions cover:
1. Removing punctuation
2. Tokenization
3. Converting to lower case
4. Removing stop words
5. Lemmatization

You can apply these functions either step-by-step to understand each part of the text cleaning process, 
or you can use the `clean_text` function to apply all these steps in one go for convenience.

In [None]:
from resumeanalyser.text_cleaning import (
    remove_punctuation, tokenize, to_lower, remove_stop_words, lemmatize, clean_text
)

In [None]:
# Example text
sample_text = "The cats are chasing the mice, and one mouse is running faster than the others."

## Demonstrating step-by-step process

In [None]:
# Step 1: Remove punctuation
no_punctuation = remove_punctuation(sample_text)
print("Text without Punctuation:", no_punctuation)

In [None]:
# Step 2: Tokenize the text
tokens = tokenize(no_punctuation)
print("Tokenized Text:", tokens)

In [None]:
# Step 3: Convert to lower case
lower_tokens = to_lower(tokens)
print("Lowercase Tokens:", lower_tokens)

In [None]:
# Step 4: Remove stop words
no_stop_words = remove_stop_words(lower_tokens)
print("Tokens without Stop Words:", no_stop_words)

In [None]:
# Step 5: Lemmatize
lemmatized_tokens = lemmatize(no_stop_words)
print("Lemmatized Tokens:", lemmatized_tokens)

## Using the clean_text function for an all-in-one solution

In [None]:
from resumeanalyser.text_cleaning import clean_text
cleaned_text = clean_text(sample_text)
print("Cleaned Text:", cleaned_text)
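
To make the five cleaning steps concrete, here is a rough standard-library sketch of the same sequence. It is not the package's implementation: `sketch_clean`, `TOY_STOP_WORDS`, and `TOY_LEMMAS` are hypothetical names, and the stop-word set and lemma table are tiny hand-written stand-ins for the full resources an NLP library would provide, so the result may differ from what `clean_text` returns.

```python
import string

# Toy resources for illustration only; a real pipeline would use a full
# stop-word list and a proper lemmatizer from an NLP library.
TOY_STOP_WORDS = {"the", "are", "and", "one", "is", "than"}
TOY_LEMMAS = {
    "cats": "cat", "chasing": "chase", "mice": "mouse",
    "running": "run", "faster": "fast", "others": "other",
}


def sketch_clean(text):
    """Apply the five cleaning steps in order and return a list of tokens."""
    # Step 1: remove punctuation
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: tokenize on whitespace
    tokens = no_punct.split()
    # Step 3: convert to lower case
    lowered = [token.lower() for token in tokens]
    # Step 4: remove stop words
    kept = [token for token in lowered if token not in TOY_STOP_WORDS]
    # Step 5: lemmatize via the toy lookup table (unknown words pass through)
    return [TOY_LEMMAS.get(token, token) for token in kept]


cleaned = sketch_clean(
    "The cats are chasing the mice, and one mouse is running faster than the others."
)
print(cleaned)
```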

# Example Usage of Metric Functions for Comparing Two Texts

`resumeanalyser` offers two functions for comparing two texts provided by the user. They compute:
1. Keyword Matching Score
2. Semantic Text Matching Score

## Keyword Matching Score

The keyword matching score measures how closely two pieces of text align literally, character by character or word by word, without accounting for variations or synonyms.

In [1]:
from resumeanalyser.metrics import SimilarityCV

literal_match_score = SimilarityCV("I am studying Data Science at UBC", "There are many good sources to study Data Science online")
print("Literal Match Score:", literal_match_score, "%")

Literal Match Score: 25.82
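
One common way to compute a word-level match score is cosine similarity over word counts. The sketch below is an assumption about how such a score could be computed, not a description of `SimilarityCV`: the function `word_count_cosine` is a hypothetical name, and the tokenizer keeps only lowercase words of two or more characters. For the two sentences used above it yields 25.82, which agrees with the recorded output, although that agreement alone does not confirm how the package computes its score.

```python
import math
import re
from collections import Counter


def word_count_cosine(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    # Assumption: tokens are lowercase words with at least two characters.
    counts_a = Counter(re.findall(r"\b\w\w+\b", text_a.lower()))
    counts_b = Counter(re.findall(r"\b\w\w+\b", text_b.lower()))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


score = word_count_cosine(
    "I am studying Data Science at UBC",
    "There are many good sources to study Data Science online",
)
print(round(score * 100, 2))
```

Note that "studying" and "study" count as different words here, which is exactly the limitation a literal score has.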


## Semantic Text Matching Score

The semantic text matching score measures the similarity in meaning between two pieces of text. Unlike a literal or exact match score, semantic matching takes context, synonyms, and related concepts into account to determine how closely the content aligns in intent or significance.

In [3]:
from resumeanalyser.metrics import SimilaritySpacy

semantic_match_score = SimilaritySpacy("I am studying Data Science at UBC", "There are many good sources to study Data Science online")
print("Semantic Match Score:", round(semantic_match_score * 100, 2), "%")

Downloading 'en_core_web_md' model...
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
Semantic Match Score: 39.45 %
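
The idea behind a semantic score can be illustrated with word vectors: words with related meanings sit close together in vector space even when they share no characters. The sketch below uses made-up three-dimensional vectors purely for illustration. `TOY_VECTORS` and `cosine` are hypothetical names, and real models such as `en_core_web_md` use learned vectors with many more dimensions.

```python
import math

# Hypothetical word vectors, invented for this illustration only.
TOY_VECTORS = {
    "study": (0.9, 0.1, 0.0),
    "learn": (0.8, 0.2, 0.1),
    "banana": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


related = cosine(TOY_VECTORS["study"], TOY_VECTORS["learn"])
unrelated = cosine(TOY_VECTORS["study"], TOY_VECTORS["banana"])
print("study vs learn:", round(related, 2))
print("study vs banana:", round(unrelated, 2))
```

A literal word-by-word comparison would treat "study" and "learn" as unrelated, while the vector comparison scores them as close.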


# Examples of Using Plotting Functions of the Package

In [None]:
from resumeanalyser.plotting import plot_wordcloud, plot_topwords, plot_suite

In [None]:
test_text = 'I am going to fill in a test text here the the the a a a a'

You can plot a word cloud of the input resume or job description text:

In [None]:
fig1 = plot_wordcloud(test_text)

Or plot the highest-frequency words, which are the most prominent in the text:

In [None]:
fig2 = plot_topwords(test_text)
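
To see what a top-words plot summarizes, the word frequencies can be counted directly. This is a minimal sketch using only the standard library. It does not reproduce `plot_topwords`, which may clean the text or filter stop words before counting.

```python
from collections import Counter

test_text = 'I am going to fill in a test text here the the the a a a a'

# Count lowercase, whitespace-separated words and keep the two most frequent.
word_counts = Counter(test_text.lower().split())
top_two = word_counts.most_common(2)
print(top_two)  # [('a', 5), ('the', 3)]
```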

It is also possible to combine both plots in a single figure:

In [None]:
fig3 = plot_suite(test_text)