# ***"Comprehensive Text Classification with FastText in Python: Training, Evaluation, and Prediction"***

## Importing Libraries

In [1]:
import fasttext
import pandas as pd
from sklearn.model_selection import train_test_split

### Creating an Expanded Dataset

In [2]:
# Create an expanded dataset
data = pd.DataFrame({
    'review': [
        "One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked...",
        "A wonderful little production...",
        "I thought this was a wonderful way to spend time on a too hot summer weekend...",
        "I absolutely loved this movie! It's a masterpiece of storytelling and acting.",
        "Terrible film. I couldn't stand it. Waste of time and money.",
        "Not my cup of tea. The plot was confusing, and the characters were unlikable.",
        "I was pleasantly surprised by this film. It exceeded my expectations.",
        "Awful. I can't believe I wasted two hours of my life on this garbage.",
        "This is a must-see movie for all ages. It's heartwarming and beautifully made.",
        "I found this film to be quite boring and predictable. Not worth the hype.",
        "One of the best movies I've ever seen. The acting and direction are top-notch.",
        "I couldn't finish this movie. It was too slow and uninteresting.",
        "A delightful film that made me smile throughout. Highly recommended.",
        "A complete disaster. The story made no sense, and the acting was terrible.",
        "This film is a true gem. It's emotionally powerful and thought-provoking.",
        "The worst movie I've ever seen. Don't waste your time on it.",
    ],
    'sentiment': ['positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'positive', 'negative',
                  'positive', 'negative', 'positive', 'negative', 'positive', 'negative', 'positive', 'negative']
})


`A dataset is constructed using a pandas DataFrame. This dataset includes 16 reviews, each with a corresponding sentiment label. The reviews cover both positive and negative sentiments.`

### Mapping Sentiment Labels

In [3]:
# Map sentiment labels to FastText format
data['sentiment'] = data['sentiment'].apply(lambda x: '__label__' + x)

`Sentiment labels are rewritten to match the FastText input format: each label is prefixed with "__label__", so "positive" becomes "__label__positive" and "negative" becomes "__label__negative".`
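The prefixing step and its inverse can be sketched in plain Python. The helper names `to_fasttext_label` and `from_fasttext_label` are hypothetical and not part of FastText; the only FastText-specific detail is the default label prefix `__label__`.

```python
LABEL_PREFIX = "__label__"  # FastText's default label prefix


def to_fasttext_label(name: str) -> str:
    """Add the prefix FastText uses to recognise a token as a label."""
    return LABEL_PREFIX + name


def from_fasttext_label(label: str) -> str:
    """Strip the prefix again, e.g. when displaying a prediction."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


print(to_fasttext_label("positive"))
print(from_fasttext_label("__label__negative"))
```

Keeping both directions in one place makes it less likely that training data and displayed predictions drift apart in how they spell the labels.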

### Splitting the Dataset

In [4]:
# Split the dataset into training and test sets
train_df, test_df = train_test_split(data, test_size=0.2, random_state=42)

`The dataset is split into a training set and a test set with scikit-learn's train_test_split function: 80% of the data is used for training and 20% for testing. With 16 reviews this gives 12 training examples and 4 test examples, and random_state=42 makes the split reproducible.`

### Saving Data to Text Files

In [5]:
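# Save the training and test data to text files in FastText format
train_file = 'train.txt'
test_file = 'test.txt'

# Write one example per line, label first. Plain writes avoid the double
# quotes that to_csv(sep=' ') wraps around any review containing spaces.
for path, df in [(train_file, train_df), (test_file, test_df)]:
    with open(path, 'w', encoding='utf-8') as f:
        for label, review in zip(df['sentiment'], df['review']):
            f.write(f"{label} {review}\n")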

`The training and test sets are written to 'train.txt' and 'test.txt'. FastText reads one example per line and treats any token that starts with "__label__" as that example's label; the remaining tokens are the text.`
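As a rough illustration of the file layout, the sketch below builds FastText-style lines by hand. The helper `to_fasttext_line` and the in-memory buffer are assumptions made for the example; the whitespace clean-up is a precaution rather than something the notebook does.

```python
import io


def to_fasttext_line(label: str, text: str) -> str:
    """Return one training line: label first, then the text on a single line."""
    # Collapse newlines and repeated whitespace so each example stays on one line.
    cleaned = " ".join(text.split())
    return f"{label} {cleaned}\n"


examples = [
    ("__label__positive", "A wonderful little production..."),
    ("__label__negative", "Terrible film.\nI couldn't stand it."),
]

buffer = io.StringIO()  # stands in for open('train.txt', 'w')
for label, text in examples:
    buffer.write(to_fasttext_line(label, text))

print(buffer.getvalue(), end="")
```

Because a newline ends an example, any review that contains line breaks has to be flattened before it is written.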

### Training the FastText Model

In [20]:
# Train a FastText text classification model
model = fasttext.train_supervised(input=train_file, epoch=500, lr=0.1, wordNgrams=4)

`The FastText text classification model is trained on the training file. Three parameters are set explicitly: the model is trained for 500 epochs, with a learning rate of 0.1, using word n-grams of up to 4 words. The high epoch count compensates for the very small training set.`
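To make the `wordNgrams` setting more concrete, here is a minimal sketch that lists the contiguous word sequences of up to a given length in a sentence. It is an illustration only: `word_ngrams` is a hypothetical helper, and FastText itself hashes n-grams into a fixed number of buckets rather than storing them as strings.

```python
from typing import List


def word_ngrams(text: str, max_n: int) -> List[str]:
    """List every contiguous word sequence of length 1..max_n."""
    tokens = text.split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


grams = word_ngrams("not worth the hype", max_n=4)
for gram in grams:
    print(gram)
```

Longer n-grams let the model see phrases such as "not worth" as a unit instead of as two unrelated words, at the cost of many more features to fit from the same small dataset.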

### Evaluating the Model

In [21]:
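# Evaluate the model on the test data
# model.test returns (number of examples, precision at 1, recall at 1)
n_samples, precision, recall = model.test(test_file)

# FastText does not report F1, so derive it from precision and recall
f1_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"Test examples: {n_samples}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1_score:.2f}")

Test examples: 4
Precision: 1.00
Recall: 1.00
F1 Score: 1.00


`The trained model is evaluated on the test set. model.test returns the number of test examples followed by precision and recall for the top predicted label; the F1 score is derived from those two values. With only 4 test examples the scores are coarse, so a perfect result here says little about how well the model generalizes.`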
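FastText's `test` method reports precision and recall but not F1, which can be computed as their harmonic mean. A minimal sketch follows; the function name `f1_from` and the example tuple are made up for illustration and are not output from this notebook.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical tuple in the (n_samples, precision, recall) shape
n_samples, precision, recall = (4, 0.75, 0.75)
f1 = f1_from(precision, recall)
print(f"N: {n_samples}  P@1: {precision:.2f}  R@1: {recall:.2f}  F1: {f1:.2f}")
```

The zero guard matters in practice: a model that gets every test example wrong would otherwise raise a division error instead of reporting an F1 of 0.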

### Classifying New Text Examples

In [22]:
# Classify new text examples
new_texts = [
    "This is a great movie!",
    "I didn't like this film at all.",
    "Absolutely fantastic! I was captivated from start to finish.",
    "An utter disappointment. The plot was confusing, and the acting was terrible.",
    "I can't recommend this film enough. It's a masterpiece.",
    "I couldn't even finish watching this. It was so boring and poorly made.",
    "A must-see for anyone who loves a good drama. The performances were outstanding.",
    "A complete waste of time. I regret watching this movie.",
    "I thoroughly enjoyed every minute of this film. Highly recommended.",
    "I found the story to be quite predictable and unoriginal.",
    "One of the best movies I've seen in years. It's a true gem.",
    "This film is a disaster. The script is terrible, and the direction is abysmal.",
    "Heartwarming and beautifully acted. I was moved by this movie.",
    "I couldn't believe how bad this film was. It's one of the worst I've seen.",
]

for text in new_texts:
    label, _ = model.predict(text)
    sentiment = label[0].replace('__label__', '')
    print('-'*35)
    print(f'Text: "{text}" - Predicted Sentiment: {sentiment}')

-----------------------------------
Text: "This is a great movie!" - Predicted Sentiment: positive
-----------------------------------
Text: "I didn't like this film at all." - Predicted Sentiment: negative
-----------------------------------
Text: "Absolutely fantastic! I was captivated from start to finish." - Predicted Sentiment: negative
-----------------------------------
Text: "An utter disappointment. The plot was confusing, and the acting was terrible." - Predicted Sentiment: negative
-----------------------------------
Text: "I can't recommend this film enough. It's a masterpiece." - Predicted Sentiment: positive
-----------------------------------
Text: "I couldn't even finish watching this. It was so boring and poorly made." - Predicted Sentiment: negative
-----------------------------------
Text: "A must-see for anyone who loves a good drama. The performances were outstanding." - Predicted Sentiment: positive
-----------------------------------
Text: "A complete waste of ti

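`A list of new reviews is classified with model.predict, which returns the predicted labels together with their probabilities. The "__label__" prefix is stripped before printing. The captured output above is cut off partway through the list, and it already contains one misclassification ("Absolutely fantastic! I was captivated from start to finish." is labeled negative), which is unsurprising for a model trained on 12 reviews.`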
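The prediction loop discards the second value returned by `model.predict`, which holds the probability of each returned label. The sketch below shows one way to use it. The helper `describe_prediction`, the 0.6 threshold, and the hand-written input pairs are all assumptions for illustration, not output from the trained model.

```python
def describe_prediction(labels, probabilities, min_confidence=0.6):
    """Format a (labels, probabilities) pair, flagging low-confidence results."""
    label = labels[0].replace("__label__", "")
    confidence = float(probabilities[0])
    if confidence < min_confidence:
        return f"uncertain (leaning {label}, p={confidence:.2f})"
    return f"{label} (p={confidence:.2f})"


# Hand-written stand-ins for the (labels, probabilities) pair from model.predict
print(describe_prediction(("__label__positive",), [0.93]))
print(describe_prediction(("__label__negative",), [0.52]))
```

Printing the probability next to the label makes borderline predictions visible instead of presenting every result as equally certain.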

<hr>