# NAI_MPP_3

- Single-layer language classifier.
- Identifies the language of a given text based on the proportion of Latin letters.
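- In this example, it is trained to recognize three languages: English, Danish, and German. The class labels in the data are Polish: `angielski` (English), `duński` (Danish), `niemiecki` (German).
- Users can enter their own text to have its language classified.
- The provided data are random fragments of Wikipedia articles.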
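
As a rough illustration of the feature described in the list above, the share of each Latin letter in a text could be computed as follows. This is a minimal sketch, not the code in `src.language_classifier`: the helper name `letter_proportions` and the choice to count only the 26 basic letters a-z (ignoring accented letters such as æ, ø, or ü) are assumptions.

```python
import string

# Assumption: only the 26 basic Latin letters are used as features.
ALPHABET = string.ascii_lowercase


def letter_proportions(text: str) -> list[float]:
    """Return, for each letter a-z, its share among all a-z letters in text."""
    counts = dict.fromkeys(ALPHABET, 0)
    for ch in text.lower():
        if ch in counts:
            counts[ch] += 1

    total = sum(counts.values())
    if total == 0:
        # No Latin letters at all: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


if __name__ == "__main__":
    vector = letter_proportions("Hello, World!")
    # Print only the letters that actually occur in the text.
    for letter, share in zip(ALPHABET, vector):
        if share > 0:
            print(f"{letter}: {share:.2f}")
```

Normalizing by the number of letters, rather than using raw counts, keeps texts of different lengths comparable.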

# Usage

## Data

In [512]:
from src.language_classifier import LanguageClassifier
from collections import Counter

classifier = LanguageClassifier('./data', 0.1, 0)

# Report the size and per-class counts of the training set.
print(f"Train set size: {len(classifier.train_data_langs)}")
counts = Counter(c['class'] for c in classifier.train_data_langs)
print(f"Train set classes: {counts}")

# Same report for the held-out test set.
print(f"Test set size: {len(classifier.test_data_langs)}")
counts = Counter(c['class'] for c in classifier.test_data_langs)
print(f"Test set classes: {counts}")

Train set size: 24
Train set classes: Counter({'angielski': 8, 'duński': 8, 'niemiecki': 8})
Test set size: 6
Test set classes: Counter({'angielski': 2, 'duński': 2, 'niemiecki': 2})


## Training

In [513]:
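max_epochs = 100    # hard cap on the number of training epochs
min_accuracy = 95   # stop early once test accuracy (in percent) reaches this value
epoch = 1
accuracy = 0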

while epoch <= max_epochs and accuracy < min_accuracy:
    print(f"[[[ EPOCH {epoch} ]]]")
    print("Training...")
    classifier.learn_once()
    print("Testing...")
    test_result = classifier.test_once()

    count_correct = 0
    for test in test_result:
        is_correct = test['class'] == test['prediction']
        status = 'CORRECT' if is_correct else 'INCORRECT'
        print(f"[{status}] Testing for {test['class'].upper()}; prediction is {test['prediction'].upper()}, in '{test['name']}'")
        count_correct += is_correct

    accuracy = 100 * count_correct / len(test_result)
    print(f"[ACCURACY]: {accuracy}\n")
    epoch += 1

[[[ EPOCH 1 ]]]
Training...
Testing...
[INCORRECT] Testing for ANGIELSKI; prediction is NIEMIECKI, in 'John Horton Conway'
[INCORRECT] Testing for DUŃSKI; prediction is NIEMIECKI, in 'Landsby'
[CORRECT] Testing for NIEMIECKI; prediction is NIEMIECKI, in 'Arthur von Weinberg'
[INCORRECT] Testing for ANGIELSKI; prediction is NIEMIECKI, in 'Myscellus'
[INCORRECT] Testing for DUŃSKI; prediction is NIEMIECKI, in 'Patent'
[CORRECT] Testing for NIEMIECKI; prediction is NIEMIECKI, in 'Siegmund von Hausegger'
[ACCURACY]: 33.333333333333336

[[[ EPOCH 2 ]]]
Training...
Testing...
[INCORRECT] Testing for ANGIELSKI; prediction is NIEMIECKI, in 'John Horton Conway'
[INCORRECT] Testing for DUŃSKI; prediction is NIEMIECKI, in 'Landsby'
[CORRECT] Testing for NIEMIECKI; prediction is NIEMIECKI, in 'Arthur von Weinberg'
[CORRECT] Testing for ANGIELSKI; prediction is ANGIELSKI, in 'Myscellus'
[INCORRECT] Testing for DUŃSKI; prediction is NIEMIECKI, in 'Patent'
[CORRECT] Testing for NIEMIECKI; prediction 
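
For orientation, one epoch of training a single-layer classifier might look like the sketch below: one linear unit per language, updated with the perceptron rule, with the most strongly activated unit giving the prediction. The class `SingleLayerSketch`, its methods, and the default learning rate are hypothetical; they are not taken from `LanguageClassifier`, whose internals are not shown here.

```python
import random


class SingleLayerSketch:
    """Hypothetical single-layer classifier: one linear unit per class."""

    def __init__(self, classes, n_features, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.classes = list(classes)
        self.learning_rate = learning_rate
        # One weight vector per class; the last element is the bias term.
        self.weights = {
            c: [rng.uniform(-0.1, 0.1) for _ in range(n_features + 1)]
            for c in self.classes
        }

    def _activation(self, cls, x):
        w = self.weights[cls]
        # zip stops after the feature weights, so the bias is added separately.
        return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

    def predict(self, x):
        """Return the class whose unit has the largest activation."""
        return max(self.classes, key=lambda c: self._activation(c, x))

    def train_epoch(self, samples):
        """One pass over (features, label) pairs using the perceptron rule."""
        for x, label in samples:
            for c in self.classes:
                target = 1 if c == label else 0
                output = 1 if self._activation(c, x) >= 0 else 0
                error = target - output
                if error:
                    w = self.weights[c]
                    for i, xi in enumerate(x):
                        w[i] += self.learning_rate * error * xi
                    w[-1] += self.learning_rate * error


if __name__ == "__main__":
    # Toy, linearly separable data; real inputs would be letter-proportion vectors.
    samples = [([1.0, 0.0], "angielski"), ([0.0, 1.0], "niemiecki")]
    model = SingleLayerSketch(["angielski", "niemiecki"], n_features=2)
    for _ in range(100):
        model.train_epoch(samples)
    print([model.predict(x) for x, _ in samples])
```

Each unit learns a one-vs-rest decision, so each language has its own weight vector, and taking the largest activation resolves cases where several units (or none) fire.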

## User input

In [514]:
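import textwrap

text = input("Text to classify: ")
# Wrap only the echoed copy; classify the raw input so the added line breaks and tabs cannot affect the result.
wrapped = textwrap.fill(text, width=120, subsequent_indent='\t')
print(f"[TEXT]: {wrapped}\n")
print(f"[CLASSIFICATION]: {classifier.predict_class_of_text(text).upper()}\n")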

[TEXT]: In mathematics, Gaussian elimination, also known as row reduction, is an algorithm for solving systems of linear
	equations. It consists of a sequence of row-wise operations performed on the corresponding matrix of coefficients. This
	method can also be used to compute the rank of a matrix, the determinant of a square matrix, and the inverse of an
	invertible matrix. The method is named after Carl Friedrich Gauss (1777–1855). To perform row reduction on a matrix,
	one uses a sequence of elementary row operations to modify the matrix until the lower left-hand corner of the matrix is
	filled with zeros, as much as possible. There are three types of elementary row operations:      Swapping two rows,
	Multiplying a row by a nonzero number,     Adding a multiple of one row to another row.  Using these operations, a
	matrix can always be transformed into an upper triangular matrix (possibly bordered by rows or columns of zeros), and
	in fact one that is in row echelon form. Once all 