# FastText

## 🚀 What is FastText?

**FastText** is a lightweight and fast library developed by Facebook AI for:

* 🔤 **Text classification** (e.g., sentiment, topic tagging)
* 🧠 **Word embeddings** (similar to Word2Vec, but built from subword character n-grams)
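
To make "subword" concrete: fastText represents a word by the character n-grams of the word wrapped in `<` and `>` boundary markers, plus the whole bracketed word itself. The sketch below uses a hypothetical helper, `char_ngrams`, and only lists those n-grams. The real library also hashes them into buckets and learns a vector for each. The default lengths of 3 to 6 characters match fastText's unsupervised `-minn`/`-maxn` defaults. For `supervised` training, subwords are off unless you pass `-minn` and `-maxn`.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """List the character n-grams (lengths minn..maxn) of a word, fastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    (e.g. '<wh', 're>') are distinct from n-grams inside the word.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n across the wrapped word
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```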

It's known for being:

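* **Very fast to train**
* **Accurate**: often on par with much heavier deep-learning classifiers on standard text classification benchmarks
* Practical on limited compute, since it trains on ordinary CPUs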

In this example, you’re doing **supervised text classification**.



### 📦 Step 1: Download and Build FastText

```python
!wget https://github.com/facebookresearch/fastText/archive/0.2.0.zip
!unzip 0.2.0.zip
%cd fastText-0.2.0
!make
```

* `wget`: Downloads the FastText source code (v0.2.0)
* `unzip`: Extracts the archive
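* `%cd`: Changes the notebook's working directory (an IPython magic command; a plain `!cd` runs in a subshell and would not persist)
* `make`: Compiles the C++ source code, producing the binary `./fasttext`

✅ After this step, you have a working `./fasttext` CLI tool. These commands assume a Unix-like environment such as Linux, macOS, or Colab, with `make` and a C++ compiler installed. In a Windows command prompt, calls such as `./fasttext` fail with `'.' is not recognized as an internal or external command`.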
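
The later steps call `./fasttext` directly and fail if it was never built. A quick check before training can catch this. The sketch below uses a hypothetical helper, `find_fasttext_binary`. It looks for the `make` output in a build directory and falls back to any `fasttext` executable on the `PATH`.

```python
import os
import shutil
from typing import Optional


def find_fasttext_binary(build_dir: str = ".") -> Optional[str]:
    """Return the path of a usable fastText CLI, or None if none is found."""
    # First choice: the binary produced by `make` in the build directory
    local = os.path.join(build_dir, "fasttext")
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return local
    # Fallback: an installed `fasttext` on the PATH (None when absent)
    return shutil.which("fasttext")


binary = find_fasttext_binary()
print(binary if binary else "fastText binary not found; build it first")
```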



### 📝 Step 2: Create Training Data

```python
# __label__1 = positive, __label__0 = negative
with open('train.txt', 'w') as f:
    f.write('__label__1 i love you\n')
    f.write('__label__1 he loves me\n')
    f.write('__label__1 she likes baseball\n')
    f.write('__label__0 i hate you\n')
    f.write('__label__0 sorry for that\n')
    f.write('__label__0 this is awful\n')
```

* This is a **sentiment classification dataset**.
* Format: `__label__<label> <text>`

FastText expects:

* One labeled example per line.
* Labels prefixed by `__label__` (the default prefix; it can be changed with the `-label` option).

So here:

* `__label__1` → Positive
* `__label__0` → Negative
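
If your data lives in Python as `(label, text)` pairs, a small formatter keeps every line in the shape shown above. This is a sketch with a hypothetical helper name, `to_fasttext_line`. It collapses newlines and repeated spaces because fastText reads one example per line.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label, text: str) -> str:
    """Format one example as `__label__<label> <text>` on a single line."""
    # One line = one example, so collapse newlines/extra whitespace to single spaces
    clean = " ".join(text.split())
    return f"{LABEL_PREFIX}{label} {clean}"


examples = [(1, "i love you"), (0, "sorry\nfor that")]
for label, text in examples:
    print(to_fasttext_line(label, text))
# __label__1 i love you
# __label__0 sorry for that
```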


### 🧪 Step 3: Create Test File

```python
with open('test.txt', 'w') as f:
    f.write('sorry hate you\n')
```

* One test sentence.
* No label; we want FastText to predict it.
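
Prediction input only needs the text. Your test sentences might start out labeled, for example examples held back from training. In that case, a hypothetical helper like `strip_labels` can remove the labels so each line matches the format of `test.txt`. The held-out lines below are made up for illustration.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def strip_labels(line: str) -> str:
    """Remove every `__label__...` token from a line, keeping only the text."""
    return " ".join(tok for tok in line.split() if not tok.startswith(LABEL_PREFIX))


# Hypothetical held-out examples whose labels we hide before prediction
held_out = ["__label__0 sorry hate you", "__label__1 she loves baseball"]
unlabeled = [strip_labels(line) for line in held_out]
print(unlabeled)
# ['sorry hate you', 'she loves baseball']
```

Keeping `held_out` around lets you compare the true labels with the predictions afterwards.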



### 🧠 Step 4: Train the Model

```bash
!./fasttext supervised -input train.txt -output model -dim 2
```

This trains a **supervised text classifier**.

| Option       | Explanation                                                                   |
| ------------ | ----------------------------------------------------------------------------- |
| `supervised` | Train a text classifier (vs. `skipgram`/`cbow` for unsupervised word vectors) |
| `-input`     | File with labeled training data                                               |
| `-output`    | Prefix for saved model files (`model.bin`, `model.vec`)                       |
| `-dim 2`     | Embedding dimension (default 100; just 2 here for simplicity)                 |

✅ After training, FastText saves:

* `model.bin` → Trained binary model (used for prediction)
* `model.vec` → Learned word vectors in text format (not needed for prediction)
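
The fastText paper describes the classifier as a linear model. The vectors of a sentence's words are averaged into one text vector, which is scored against each label and normalized with a softmax. The sketch below shows only that forward pass, using made-up 2-dimensional vectors to match `-dim 2`. It is a simplification that skips word n-grams, subwords, and training itself. `predict_proba` is a hypothetical name, not part of the fastText API.

```python
import math
from typing import Dict, List


def predict_proba(tokens: List[str],
                  word_vecs: Dict[str, List[float]],
                  label_vecs: Dict[str, List[float]]) -> Dict[str, float]:
    """Simplified fastText-style forward pass: average, score, softmax."""
    dim = len(next(iter(word_vecs.values())))
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    # 1. Average the vectors of known words into a single text vector
    if known:
        hidden = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    else:
        hidden = [0.0] * dim  # no known words: every label gets score 0
    # 2. Score each label with a dot product against its output vector
    scores = {label: sum(h * w for h, w in zip(hidden, vec))
              for label, vec in label_vecs.items()}
    # 3. Softmax (subtracting the max score avoids overflow in exp)
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up toy vectors for illustration, not values learned from train.txt
word_vecs = {"love": [1.0, 0.0], "hate": [0.0, 1.0], "you": [0.5, 0.5]}
label_vecs = {"__label__1": [1.0, -1.0], "__label__0": [-1.0, 1.0]}
print(predict_proba("hate you".split(), word_vecs, label_vecs))
```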




### 📤 Step 5: Run Inference

```bash
!cat test.txt
!./fasttext predict model.bin test.txt
```

* `cat test.txt` shows the sentence: `sorry hate you`
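* `./fasttext predict` loads `model.bin` and prints the most likely label for each line of `test.txt`

📌 Likely output (the first line comes from `cat`):

```
sorry hate you
__label__0
```

With only six training examples, this is a likely outcome rather than a guaranteed one. The reasoning:

* “sorry” and “hate” appear only in negative examples.
* “you” appears once in a positive example (“i love you”) and once in a negative one (“i hate you”), so on its own it gives little signal either way.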
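
The reasoning above can be checked by counting how often each test word appears under each label in `train.txt`. The lines below are copied from Step 2. `label_counts_per_word` is a hypothetical helper and assumes exactly one label at the start of each line.

```python
from collections import defaultdict
from typing import Dict, List

# Same six examples as train.txt in Step 2
train_lines = [
    "__label__1 i love you",
    "__label__1 he loves me",
    "__label__1 she likes baseball",
    "__label__0 i hate you",
    "__label__0 sorry for that",
    "__label__0 this is awful",
]


def label_counts_per_word(lines: List[str], words: List[str]) -> Dict[str, Dict[str, int]]:
    """Count how often each word in `words` occurs under each label."""
    counts = {w: defaultdict(int) for w in words}
    for line in lines:
        label, *tokens = line.split()  # assumes one leading label per line
        for tok in tokens:
            if tok in counts:
                counts[tok][label] += 1
    return {w: dict(c) for w, c in counts.items()}


print(label_counts_per_word(train_lines, "sorry hate you".split()))
# {'sorry': {'__label__0': 1}, 'hate': {'__label__0': 1}, 'you': {'__label__1': 1, '__label__0': 1}}
```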

## ✅ Summary

| Step              | Purpose                                     |
| ----------------- | ------------------------------------------- |
| 🏗 Build FastText | Compiles the `fasttext` CLI from source     |
| 📝 `train.txt`    | Labeled training data for classification    |
| 🧪 `test.txt`     | Input file for prediction                   |
| ⚙️ Train model    | Learns word embeddings + classifier         |
| 🔮 Predict        | Uses learned model to classify new sentence |


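### 🎯 Step 6: Predict with Probabilities from Python

The helper below calls `fasttext predict-prob` through `subprocess` and returns `(label, probability)` pairs for the top `k` labels. Its `fasttext_bin` argument (default `./fasttext`) lets you point at a binary in another location.

```python
import subprocess


def predict_fasttext_label_with_prob(model_path, input_text, tmp_file='tmp_input.txt',
                                     k=1, fasttext_bin='./fasttext'):
    # The CLI reads examples from a file, one per line
    with open(tmp_file, 'w', encoding='utf-8') as f:
        f.write(input_text.strip() + '\n')

    # Run `fasttext predict-prob <model> <file> <k>`
    try:
        result = subprocess.run(
            [fasttext_bin, 'predict-prob', model_path, tmp_file, str(k)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except FileNotFoundError:
        # Raised when the binary does not exist (e.g. on Windows without a build)
        print(f"⚠️ Error: fastText binary not found at {fasttext_bin!r}")
        return []

    if result.returncode != 0:
        print("⚠️ Error:", result.stderr)
        return []

    predictions = []
    for line in result.stdout.strip().split('\n'):
        parts = line.split()
        # Each output line holds up to k "label probability" pairs
        for i in range(0, len(parts) - 1, 2):
            try:
                predictions.append((parts[i], float(parts[i + 1])))
            except ValueError:
                continue  # skip malformed entries

    return predictions
```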


Example usage (with `k=2`, both labels are returned, highest probability first):

```python
text = "this is amazing"
predictions = predict_fasttext_label_with_prob('model.bin', text, k=2)

for label, confidence in predictions:
    print(f"🔍 {label} with confidence {confidence:.4f}")
```
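
The helper above starts a new `fasttext` process, and reloads the model, for every string. To classify many strings, one option is to write them to the temporary file one per line, run `predict-prob` once, and split its output by line. Below is a sketch of that last step with a hypothetical `pair_batch_predictions` function. It assumes every text is a non-empty single line, so fastText prints exactly one output line per input. The `sample_stdout` string is made up to show the shape.

```python
from typing import List, Tuple

Prediction = Tuple[str, float]


def pair_batch_predictions(texts: List[str], stdout: str) -> List[Tuple[str, List[Prediction]]]:
    """Pair each input text with the (label, probability) pairs from its output line."""
    out_lines = stdout.strip().split("\n") if stdout.strip() else []
    if len(out_lines) != len(texts):
        raise ValueError(f"expected {len(texts)} output lines, got {len(out_lines)}")
    paired = []
    for text, line in zip(texts, out_lines):
        parts = line.split()
        # Each line alternates label and probability: label prob label prob ...
        preds = [(parts[i], float(parts[i + 1])) for i in range(0, len(parts) - 1, 2)]
        paired.append((text, preds))
    return paired


# Made-up output string, only to show the expected shape
sample_stdout = "__label__0 0.7 __label__1 0.3\n__label__1 0.9 __label__0 0.1\n"
for text, preds in pair_batch_predictions(["sorry hate you", "i love you"], sample_stdout):
    print(text, "->", preds)
```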

