Feature/custom model api endpoint support #820

Merged — 4 commits merged into release/1.7.0 on Oct 17, 2023

Conversation

@chakravarthik27 (Collaborator) commented on Oct 10, 2023

Description

Enhanced Custom Model Support in Langtest Library

This pull request adds enhanced support for custom models to the Langtest library. The Harness class has been updated so that its 'hub' parameter now accepts "custom", letting users plug their own models into the testing workflow.

This extends the flexibility of the library: custom models can be integrated and tested with the same pipeline used for models from supported hubs, covering a broader range of use cases and linguistic requirements.

The change also simplifies configuration, making it more intuitive to specify the model under test and streamlining the experience for developers and researchers.


Custom model API endpoint support

Type of change


  • [x] New feature (non-breaking change which adds functionality)

Usage

The SentimentAnalysis class in this module enables sentiment analysis on textual data, with methods for training, prediction, and evaluation. For testing with Langtest, note the requirement on the predict method: it must accept a raw text string and return a label string (here, "positive" or "negative"):

import pickle

import numpy as np
import torch
import torch.optim as optim

# LSTMClassifier, convert_and_pad, and review_to_words are helpers assumed to be
# provided by the accompanying sentiment_analysis module.


class SentimentAnalysis:
    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Build the classifier and keep it on the same device as the input batches.
        self.model = LSTMClassifier(embedding_dim, hidden_dim, vocab_size).to(self.device)
        with open("./sentiment_analysis/word_dict.pkl", "rb") as f:
            self.word_dict = pickle.load(f)

    def train(self, train_loader, epochs):
        optimizer = optim.Adam(self.model.parameters(), lr=1e-3)
        loss_fn = torch.nn.BCELoss()
        for epoch in range(1, epochs + 1):
            self.model.train()
            total_loss = 0
            for batch in train_loader:
                batch_X, batch_y = batch

                batch_X = batch_X.to(self.device)
                batch_y = batch_y.to(self.device)

                optimizer.zero_grad()

                model_output = self.model(batch_X)
                loss = loss_fn(model_output, batch_y)

                loss.backward()
                optimizer.step()

                total_loss += loss.item()
            print("Epoch: {}, BCELoss: {}".format(
                epoch, total_loss / len(train_loader)))

    def predict(self, x):
        # Convert raw text into a fixed-length sequence of word indices,
        # prefixed with the original (unpadded) length.
        data_X, data_len = convert_and_pad(self.word_dict, review_to_words(x), pad=500)
        data_pack = np.hstack((data_len, data_X))
        data_pack = data_pack.reshape(1, -1)

        data = torch.from_numpy(data_pack)
        data = data.to(self.device)

        # Put the model into evaluation mode before inference.
        self.model.eval()
        with torch.no_grad():
            output = self.model(data)
            return "positive" if round(output.item()) else "negative"

    def evaluate(self, x, y):
        self.model.eval()
        with torch.no_grad():
            output = self.model(x)
            predicted = output.data.round()
            correct = (predicted == y).sum().item()
            return correct / len(y)

Keep the predict method exactly as shown above when integrating the model with Langtest; it is needed for the sentiment analysis predictions to be validated correctly within the Langtest framework.

Beyond testing, the SentimentAnalysis class can be used to train sentiment analysis models, make predictions on new data points, and evaluate model performance.
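As a rough illustration, here is a minimal sketch of calling the wrapper directly. The constructor dimensions and the review text are made up for this example, and it assumes the model has been trained beforehand and that word_dict.pkl is in place:

# Hypothetical dimensions; real values depend on the trained LSTMClassifier.
model = SentimentAnalysis(embedding_dim=32, hidden_dim=100, vocab_size=5000)

# Langtest only needs predict(): raw text in, label string out.
print(model.predict("A surprisingly heartfelt film with great performances."))
# -> "positive" or "negative"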

Running it with Langtest:

harness = Harness(
    task="text-classification",
    model={"model": model, "hub": "custom"},
    data={"data_source": "path/to/data/imdb.csv"},
)
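
From here the usual Harness workflow should apply to the custom hub as well; a sketch, assuming the standard generate/run/report methods:

harness.generate()           # generate test cases for the configured tests
harness.run()                # run each test case through the custom model's predict()
report = harness.report()    # summarize pass/fail results per test type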

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

Screenshots (if appropriate):

[Five screenshots omitted.]

Failed test cases in the add_contraction test:
[Screenshot omitted.]

@chakravarthik27 added the ⭐ Feature (Indicates new feature requests) label on Oct 10, 2023
@chakravarthik27 self-assigned this on Oct 10, 2023
@chakravarthik27 linked an issue on Oct 10, 2023 that may be closed by this pull request
@ArshaanNazir merged commit 454651d into release/1.7.0 on Oct 17, 2023
3 checks passed