| page_type | name | description | urlFragment | languages | products |
|---|---|---|---|---|---|
| sample | Train a sentiment analysis deep learning model with ML.NET Model Builder | Train a deep learning text classification model to analyze and classify sentiment using ML.NET Model Builder | mlnet-sentiment-analysis-model-builder | csharp | dotnet, aspnet-core, mlnet |

Sentiment Analysis: Razor Pages sample optimized for scalability and performance when running/scoring an ML.NET model built with Model Builder (using the new Text Classification API)

| ML.NET version | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|
| v2.0.0 | Up-to-date | Razor Pages | Single data sample | Text classification | Text Classification | NAS-BERT |

Goal

Create a Razor Pages web application that hosts an ML.NET deep learning text classification model trained using Model Builder to analyze the sentiment of comments from a website.

Application

  • SentimentRazor: A .NET Core Razor Pages web application that uses a deep learning text classification model to analyze sentiment from comments made on the website.

The data

Each row in the wikipedia-detox-250-line-data.tsv dataset represents a different comment left by a user on Wikipedia. The first column contains the sentiment of the text (0 is non-toxic, 1 is toxic), and the second column contains the comment itself. The columns are separated by tabs, and the first row holds the column names. The data looks like the following:

| Sentiment | SentimentText |
|---|---|
| 1 | ==RUDE== Dude, you are rude upload that carl picture back, or else. |
| 1 | == OK! == IM GOING TO VANDALIZE WILD ONES WIKI THEN!!! |
| 0 | I hope this helps. |
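Model Builder reads this file for you, but to make the layout concrete, here is a minimal JavaScript sketch that parses rows in this format. The function name parseDetoxTsv is hypothetical and not part of the sample. It assumes the file starts with the Sentiment/SentimentText header row and splits each line on its first tab only, so any tab inside a comment stays part of the text.

```javascript
// Hypothetical helper (not part of the sample): parse the wikipedia-detox
// layout, i.e. a header row followed by "<label>\t<comment>" lines.
function parseDetoxTsv(content) {
  const lines = content.split(/\r?\n/).filter((line) => line.trim() !== "");
  // Skip the header row ("Sentiment\tSentimentText").
  return lines.slice(1).map((line) => {
    // Split on the first tab: column 1 is the label, the rest is the comment.
    const tab = line.indexOf("\t");
    if (tab === -1) {
      throw new Error(`Expected a tab-separated row, got: ${line}`);
    }
    return {
      sentiment: Number(line.slice(0, tab)), // 0 = non-toxic, 1 = toxic
      sentimentText: line.slice(tab + 1),
    };
  });
}

const sample = [
  "Sentiment\tSentimentText",
  "1\t== OK! == IM GOING TO VANDALIZE WILD ONES WIKI THEN!!!",
  "0\tI hope this helps.",
].join("\n");

console.log(parseDetoxTsv(sample));
```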

Text classification in ML.NET

The Text Classification API is powered by TorchSharp, a .NET library that provides access to libtorch, the library that powers PyTorch. TorchSharp contains the building blocks for training neural networks from scratch in .NET. However, these components are low-level, and building neural networks from scratch with them has a steep learning curve. In ML.NET, we’ve abstracted some of that complexity to the scenario level.

In direct collaboration with Microsoft Research, we’ve taken a TorchSharp implementation of NAS-BERT, a variant of BERT obtained with neural architecture search, and added it to ML.NET. Using a pre-trained version of this model, the Text Classification API uses your data to fine-tune the model.

The model

The goal of the application is to predict which of two categories (toxic or non-toxic) a comment belongs to. The machine learning task for this scenario is text classification. The model in this application was trained using Model Builder.

Model Builder is an intuitive graphical Visual Studio extension to build, train, and deploy custom machine learning models.

You don't need machine learning expertise to use Model Builder. All you need is some data and a problem to solve. Model Builder generates the code to add the model to your .NET application.

In this solution, both SentimentAnalysis.training.cs and SentimentAnalysis.consumption.cs are autogenerated by Model Builder.

SentimentAnalysis.zip is also autogenerated by Model Builder and is the serialized representation of your model.

The web application

Users interact with the application through a Razor Pages website. A user enters a comment in a text box on the application's main page, which triggers a handler on the page model. The handler passes the comment to the trained model and returns the predicted sentiment.
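On the browser side, that round trip could be sketched as follows. This is an assumption about how a getSentiment helper might look, not the sample's exact code. Razor Pages routes a request with `?handler=AnalyzeSentiment` to the page's OnGetAnalyzeSentiment method by naming convention, and the handler's `[FromQuery] string text` parameter binds to the `text` query-string value. The relative `Index` URL and the injectable fetchImpl parameter are illustrative choices.

```javascript
// Build the URL for the page handler. Razor Pages maps
// "?handler=AnalyzeSentiment" to OnGetAnalyzeSentiment, and "text"
// binds to the handler's [FromQuery] text parameter.
function buildSentimentUrl(text, page = "Index") {
  return `${page}?handler=AnalyzeSentiment&text=${encodeURIComponent(text)}`;
}

// fetchImpl defaults to the browser's fetch; passing a stub lets the
// function run without a server.
function getSentiment(text, fetchImpl = fetch) {
  return fetchImpl(buildSentimentUrl(text)).then((response) => {
    if (!response.ok) {
      throw new Error(`Sentiment request failed with status ${response.status}`);
    }
    // The handler returns its label with Content(...), i.e. as plain text.
    return response.text();
  });
}
```

Encoding the comment with encodeURIComponent keeps characters such as `&`, `#`, or `=` in user input from breaking the query string.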

Try a different dataset

If you want to try the application with a dataset that produces better results, such as the UCI Sentiment Labelled Sentences dataset, make the following adjustments.

Get the data

  1. Download the UCI Sentiment Labelled Sentences dataset ZIP file to any location on your computer and unzip it.

  2. Open PowerShell and navigate to the folder you unzipped in the previous step.

  3. The yelp_labelled.txt file does not include column names. To add a header row to the training data, use the following PowerShell commands, which write the header to a new file and then append the original rows to it:

    echo "Comment`tSentiment" | sc yelp_labelled_columns.tsv; cat yelp_labelled.txt | ac yelp_labelled_columns.tsv

These commands create a new file, yelp_labelled_columns.tsv, that contains the column names followed by the original data.
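If you prefer to script this step outside PowerShell, the same transformation can be sketched in Node.js using only the built-in fs module. The helper names addHeader and addHeaderToFile are hypothetical, and the sketch assumes the source file is UTF-8 text and writes the header row with a `\n` line ending.

```javascript
const fs = require("fs");

// Prepend a tab-separated header row to headerless TSV content.
function addHeader(content, columns = ["Comment", "Sentiment"]) {
  return `${columns.join("\t")}\n${content}`;
}

// Read the unzipped Yelp file and write a copy that starts with the header.
function addHeaderToFile(inputPath, outputPath) {
  const content = fs.readFileSync(inputPath, "utf8");
  fs.writeFileSync(outputPath, addHeader(content), "utf8");
}

// Example (paths are assumptions; run from the unzipped folder):
// addHeaderToFile("yelp_labelled.txt", "yelp_labelled_columns.tsv");
```

Writing the header and data in a single write avoids the overwrite-versus-append distinction that the two-command PowerShell version depends on.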

Each row after the header in yelp_labelled_columns.tsv represents a different restaurant review left by a user on Yelp. The first column contains the comment left by the user, and the second column contains the sentiment of the text (0 is negative, 1 is positive). The columns are separated by tabs. The data looks like the following:

| Comment | Sentiment |
|---|---|
| Wow... Loved this place. | 1 |
| Crust is not good. | 0 |
| Not tasty and the texture was just nasty. | 0 |

Train and use the model

  1. Use Model Builder to train a binary classification model on the new dataset, with Sentiment as the column to predict.
  2. Update the OnGetAnalyzeSentiment handler in the Index.cshtml.cs file:
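public IActionResult OnGetAnalyzeSentiment([FromQuery] string text)
{
    // Treat empty input as neutral instead of scoring it.
    if (String.IsNullOrEmpty(text)) return Content("Neutral");

    // Score the comment with the pooled prediction engine.
    var input = new ModelInput { Comment = text };
    var prediction = _predictionEnginePool.Predict(input);

    // Map the predicted label (1 = positive, 0 = negative) to display text.
    var sentiment = Convert.ToBoolean(prediction.Prediction) ? "Positive" : "Negative";
    return Content(sentiment);
}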
  3. Update the updateSentiment function in the site.js file:
function updateSentiment() {
    const userInput = $("#Message").val();

    // Ask the server for the predicted sentiment label, then update the marker.
    getSentiment(userInput)
        .then((sentiment) => {
            switch (sentiment) {
                case "Positive":
                    updateMarker(100.0, sentiment);
                    break;
                case "Negative":
                    updateMarker(0.0, sentiment);
                    break;
                default:
                    // Empty input or any other response is shown as neutral.
                    updateMarker(45.0, "Neutral");
            }
        });
}
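One possible refinement, offered as a suggestion rather than part of the sample, is to pull the label-to-position mapping out of updateSentiment into a pure function so it can be tested without a browser or server. The name markerFor is hypothetical; the positions 100, 0, and 45 are the ones updateSentiment passes to updateMarker.

```javascript
// Hypothetical helper: map the handler's label to a marker position (0-100)
// and the label to display, using the same values as updateSentiment.
function markerFor(sentiment) {
  switch (sentiment) {
    case "Positive":
      return { position: 100.0, label: "Positive" };
    case "Negative":
      return { position: 0.0, label: "Negative" };
    default:
      // Empty input or any unexpected label falls back to "Neutral".
      return { position: 45.0, label: "Neutral" };
  }
}
```

With this helper, the `.then` callback in updateSentiment reduces to destructuring `{ position, label }` from `markerFor(sentiment)` and calling `updateMarker(position, label)`.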