# Text Classification API (preview)

## Install packages

To use the Text Classification API, install the following packages:

- [`Microsoft.ML`](https://www.nuget.org/packages/Microsoft.ML/)
- [`Microsoft.ML.TorchSharp`](https://www.nuget.org/packages/Microsoft.ML.TorchSharp/)
- [`TorchSharp-cpu`](https://www.nuget.org/packages/TorchSharp-cpu/) if you're using a CPU or [`TorchSharp-cuda-windows`](https://www.nuget.org/packages/TorchSharp-cuda-windows/) / [`TorchSharp-cuda-linux`](https://www.nuget.org/packages/TorchSharp-cuda-linux/) if you're using a GPU.

In [1]:
#r "nuget:Microsoft.ML,2.0.0-preview.22310.1"
#r "nuget:Microsoft.ML.TorchSharp,0.20.0-preview.22310.1"
#r "nuget:TorchSharp-cpu,0.96.7"
#r "nuget:Microsoft.Data.Analysis,0.20.0-preview.22310.1"

Loading extensions from `Microsoft.Data.Analysis.Interactive.dll`

## Add using statements

In [1]:
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Data.Analysis;
using Microsoft.ML.TorchSharp;

## Initialize MLContext

All ML.NET operations start in the `MLContext` class. Initializing `mlContext` creates a new ML.NET environment that can be shared across the model creation workflow objects. It's conceptually similar to `DbContext` in Entity Framework.

In [1]:
var mlContext = new MLContext();

## Load your data

Use the `#!value` magic command to fetch the data from GitHub and store it in the `yelp_reviews` variable. Then use the `#!share` magic command to make that value available to your C# code, and load it into a `DataFrame`.

In [1]:
#!value --name yelp_reviews --from-url "https://raw.githubusercontent.com/luisquintanilla/csharp-notebooks/text-classification/machine-learning/data/yelp_labelled.txt"

In [1]:
#!share yelp_reviews --from value 

In [1]:
// The file is tab-separated and has no header row, so supply the column names.
var columnNames = new[] { "Text", "Sentiment" };
var df = DataFrame.LoadCsvFromString(yelp_reviews, separator: '\t', header: false, columnNames: columnNames);

Once the data is loaded, use the `Head` method to preview the first three rows.

In [1]:
df.Head(3)

| index | Text | Sentiment |
| --- | --- | --- |
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |


## Split the data into train and test sets

The original dataset is split into two subsets: train and test. The model learns the patterns in your data from the train set. The test set is held back and used to evaluate the model's performance with metrics suited to the classification task.

In this case, the `testFraction` parameter reserves 20% of the data for evaluation and testing. The remaining 80% is used for training.

In [1]:
var trainTestSplit = mlContext.Data.TrainTestSplit(df, testFraction:0.2);
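
To make the 80/20 idea concrete, here is a minimal conceptual sketch of a shuffle-and-cut split on plain row indices. It is only an illustration and does not reproduce how `TrainTestSplit` is implemented internally; the ten stand-in rows, the seed value, and the variable names are all assumptions made for the example.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in data: ten row indices instead of real reviews.
int[] rows = Enumerable.Range(0, 10).ToArray();
double testFraction = 0.2;

// Shuffle with a fixed (arbitrary) seed so the split is repeatable.
var rng = new Random(42);
int[] shuffled = rows.OrderBy(_ => rng.Next()).ToArray();

// The first 20% of the shuffled rows form the test set, the rest the train set.
int testCount = (int)Math.Round(rows.Length * testFraction);
int[] testRows = shuffled.Take(testCount).ToArray();
int[] trainRows = shuffled.Skip(testCount).ToArray();

Console.WriteLine($"Train: {trainRows.Length}, Test: {testRows.Length}");
// prints "Train: 8, Test: 2"
```

Shuffling before cutting keeps the test set from simply being the last rows of the file, which matters when the source data is ordered.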

## Define your training pipeline

The Text Classification API is part of the multiclass classification catalog. To use it, add the `TextClassification` trainer to your pipeline. 

In [1]:
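var pipeline =
    // Convert the Sentiment values into the key-typed Label column the trainer expects.
    mlContext.Transforms.Conversion.MapValueToKey("Label", "Sentiment")
        // Train the text classifier on the review text.
        .Append(mlContext.MulticlassClassification.Trainers.TextClassification(sentence1ColumnName: "Text"))
        // Convert the predicted key back to the original Sentiment value.
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));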

## Train the model

Use the training dataset to train your model using the `Fit` method.

In [1]:
var model = pipeline.Fit(trainTestSplit.TrainSet);

## Use the model to make predictions

Use your model to make predictions on the test dataset by calling the `Transform` method. 

In [1]:
var predictionIDV = model.Transform(trainTestSplit.TestSet);

The result of calling `Transform` is an `IDataView` with your predicted values. To make your predictions easier to view, convert the `IDataView` to a `DataFrame`. In this case, the only columns of interest are `Text`, `Sentiment` (the actual value), and `PredictedLabel` (the predicted value).

In [1]:
var columnsToSelect = new [] {"Text", "Sentiment", "PredictedLabel"};

var predictions = predictionIDV.ToDataFrame(columnsToSelect);

Use the `Tail` method to preview the last three rows in your prediction `DataFrame`.

In [1]:
predictions.Tail(3)

| index | Text | Sentiment | PredictedLabel |
| --- | --- | --- | --- |
| 0 | Oh this is such a thing of beauty, this restaurant. | 1 | 1 |
| 1 | A greasy, unhealthy meal. | 0 | 1 |
| 2 | The best place in Vegas for breakfast (just check out a Sat, or Sun. | 1 | 1 |


## Evaluate the model

You can use a variety of metrics to evaluate how well your model performs. In this case, you'll calculate the model's accuracy.
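
For reference, accuracy for a two-class task like this one is commonly written in terms of the four confusion-matrix counts. The false positive (FP) and false negative (FN) counts aren't used elsewhere in this walkthrough and appear here only to complete the formula.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```

The numerator counts the correct predictions and the denominator counts all predictions.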

Start by taking the instances where the actual value matches the predicted value, also known as the true positives (TP) and true negatives (TN).

In [1]:
var tptn = 
	predictions.Filter(
		predictions["Sentiment"].ElementwiseEquals(predictions["PredictedLabel"]));

Then, divide the number of instances where the actual value matches the predicted value by the total number of predictions. 

In [1]:
var accuracy = ((float) tptn.Rows.Count / (float) predictions.Rows.Count);

$"Accuracy: {accuracy:0.####}"

Accuracy: 0.6
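
The same calculation can be sketched without `DataFrame`, using plain arrays. The example below is a minimal illustration that uses only the three rows previewed with `Tail` earlier, so its result differs from the accuracy over the full test set; the array and variable names are assumptions made for the example.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Actual and predicted labels copied from the three previewed prediction rows.
float[] sentiment = { 1f, 0f, 1f };
float[] predictedLabel = { 1f, 1f, 1f };

// Count the rows where the prediction matches the actual value (TP + TN).
int correct = sentiment
    .Zip(predictedLabel, (actual, predicted) => actual == predicted)
    .Count(isMatch => isMatch);

// Accuracy = correct predictions / total predictions.
float accuracy = (float)correct / sentiment.Length;

Console.WriteLine("Accuracy: " + accuracy.ToString("0.####", CultureInfo.InvariantCulture));
// prints "Accuracy: 0.6667"
```

Two of the three predictions match, so the accuracy on this tiny sample is 2/3.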