HubNLP Documentation

Introduction

HubNLP is a small, user-friendly NLP utility library. It provides helpers for loading Named Entity Recognition (NER) data in CoNLL format, extracting per-word features, and building attention-based Keras models.

Installation

You can install HubNLP via pip:

pip install HubNLP

Functions and Features

1. `print_developer_info()`

Prints information about the library's developer.

Usage:

from HubNLP import print_developer_info

print_developer_info()

Output:

Developer: Self-nasu
Library: HubNLP
Email: nexiotech.2024@gmail.com

2. `print_library_tagline()`

Prints the tagline or description of the library.

Usage:

from HubNLP import print_library_tagline

print_library_tagline()

Output:

HubNLP - A simple NLP utility library

3. `load_ner_data(file_path)`

Loads NER data in CoNLL format (which includes words, part-of-speech tags, and NER tags).

Arguments:

file_path (str): The path to the CoNLL format text file.

Returns:

A list of sentences, where each sentence is represented as a list of tuples (word, pos, ner).

Usage:

from HubNLP import load_ner_data

file_path = "path/to/conll_file.txt"
sentences = load_ner_data(file_path)

print(sentences)

Example Output:

[[("John", "NNP", "B-PER"), ("Smith", "NNP", "I-PER")],
 [("London", "NNP", "B-LOC")]]
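
For orientation, the kind of parsing this involves can be sketched as below. This is an illustrative sketch, not HubNLP's implementation: the helper name `parse_conll_lines` is hypothetical, and it assumes whitespace-separated columns with the word first, the POS tag second, and the NER tag last, with a blank line ending each sentence.

```python
def parse_conll_lines(lines):
    """Hypothetical helper: group CoNLL-style lines into sentences.

    Assumes each token line is "word pos ... ner" separated by
    whitespace and that a blank line ends a sentence.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Blank line or document marker: close the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # skip lines that do not have at least three columns
        # First column is the word, second the POS tag, last the NER tag.
        current.append((parts[0], parts[1], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


sample = ["John NNP B-PER", "Smith NNP I-PER", "", "London NNP B-LOC"]
parsed = parse_conll_lines(sample)
print(parsed)
```

Taking the last column as the NER tag lets the same sketch handle four-column files (word, POS, chunk, NER) as well as three-column ones.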

4. `extract_word_features(sentence, index)`

Extracts features for a specific word in a sentence for NLP tasks (e.g., NER or POS tagging).

Arguments:

sentence (list): A list of tuples representing a sentence, where each tuple contains (word, POS, [optional] NER tag).
index (int): The index of the word in the sentence for which features are to be extracted.

Returns:

A dictionary containing features for the word.

Usage:

from HubNLP import extract_word_features

sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
features = extract_word_features(sentence, 2)

print(features)

Example Output:

{
    'word': 'running',
    'postag': 'VBG',
    'is_upper': False,
    'is_title': False,
    'is_digit': False,
    '-1:word': 'is',
    '-1:postag': 'VBZ',
    'BOS': False,
    '+1:word': '',
    '+1:postag': '',
    'EOS': True
}
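
To make the feature dictionary concrete, here is a minimal sketch that reproduces the keys shown in the example output. The function `sketch_word_features` is hypothetical and is not HubNLP's code; in particular, using empty strings for a missing previous word at the start of a sentence is an assumption that mirrors how the example treats the missing next word.

```python
def sketch_word_features(sentence, index):
    """Hypothetical re-creation of the feature dictionary shown above."""
    word, postag = sentence[index][0], sentence[index][1]
    is_first = index == 0
    is_last = index == len(sentence) - 1
    # Neighbouring tokens; empty strings stand in at sentence boundaries.
    prev_word, prev_pos = ("", "") if is_first else sentence[index - 1][:2]
    next_word, next_pos = ("", "") if is_last else sentence[index + 1][:2]
    return {
        "word": word,
        "postag": postag,
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "-1:word": prev_word,
        "-1:postag": prev_pos,
        "BOS": is_first,
        "+1:word": next_word,
        "+1:postag": next_pos,
        "EOS": is_last,
    }


sentence = [("John", "NNP"), ("is", "VBZ"), ("running", "VBG")]
# One feature dictionary per token, as a sequence tagger would consume them.
all_features = [sketch_word_features(sentence, i) for i in range(len(sentence))]
print(all_features[2])
```

Slicing each tuple with `[:2]` means the sketch accepts both (word, POS) and (word, POS, NER) tuples.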

5. `AttentionLayer` Class

Defines a custom attention layer that applies a Bi-directional LSTM followed by an attention mechanism.

Usage:

from HubNLP import AttentionLayer
import tensorflow as tf
from tensorflow.keras.layers import Embedding, Dense

# Example: using AttentionLayer inside a subclassed Keras model
class CustomModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, max_seq_len, lstm_units):
        super().__init__()
        # One embedding layer shared by the review and aspect inputs.
        # max_seq_len is not passed on: Embedding does not need it, and
        # newer Keras versions deprecate the input_length argument.
        self.embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
        self.attention_layer = AttentionLayer(lstm_units)
        self.dense = Dense(64, activation='relu')
        self.output_layer = Dense(4, activation='softmax')  # For 4-class classification

    def call(self, inputs):
        # inputs is a pair of token-ID tensors: (review, aspect)
        review_input, aspect_input = inputs
        review_embedded = self.embedding(review_input)
        aspect_embedded = self.embedding(aspect_input)
        # AttentionLayer receives both embedded sequences and returns a context vector
        context_vector = self.attention_layer([review_embedded, aspect_embedded])
        x = self.dense(context_vector)
        return self.output_layer(x)
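
The internals of `AttentionLayer` are not described here. As background only, the general idea of an attention mechanism (score each timestep, normalise the scores with a softmax, then take the weighted sum of the hidden states) can be sketched in plain Python. Dot-product scoring against a single query vector is an assumption made for illustration; HubNLP's layer may compute its scores differently, and the function name `attention_context` is hypothetical.

```python
import math


def attention_context(hidden_states, query):
    """Dot-product attention over a list of hidden-state vectors."""
    # One score per timestep: dot product of the hidden state and the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Softmax, subtracting the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attention_context(hidden_states, query)
print(weights, context)
```

Timesteps whose hidden state aligns with the query receive larger weights, so they contribute more to the context vector.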

6. `build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)`

Builds a model architecture with an attention mechanism for tasks like aspect-based sentiment analysis.

Arguments:

vocab_size (int): The size of the vocabulary.
embedding_dim (int): The dimensionality of the embedding layer.
max_seq_len (int): The maximum sequence length for input.
lstm_units (int): The number of units in the Bi-directional LSTM layer.

Returns:

A compiled Keras model.

Usage:

from HubNLP import build_attention_model

vocab_size = 5000  # Size of the vocabulary
embedding_dim = 128  # Dimensionality of embeddings
max_seq_len = 100  # Max sequence length
lstm_units = 64  # Number of LSTM units

model = build_attention_model(vocab_size, embedding_dim, max_seq_len, lstm_units)

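# Compile with your own optimizer, loss, and metrics
# (calling compile() again replaces any existing compile settings)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])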

# Print model summary
model.summary()
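
Before training, inputs usually need to be brought to a fixed length, and `categorical_crossentropy` expects one-hot encoded labels. The sketch below shows both steps in plain Python. It assumes the model consumes integer token IDs of length `max_seq_len`, that ID 0 is reserved for padding, and that there are four classes as in the `AttentionLayer` example; the helper names `pad_ids` and `one_hot` are hypothetical and not part of HubNLP.

```python
def pad_ids(token_ids, max_seq_len, pad_id=0):
    """Truncate or right-pad a list of token IDs to exactly max_seq_len."""
    trimmed = token_ids[:max_seq_len]
    return trimmed + [pad_id] * (max_seq_len - len(trimmed))


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


review_ids = pad_ids([12, 7, 431], max_seq_len=5)
too_long = pad_ids([1, 2, 3, 4, 5, 6, 7], max_seq_len=5)
target = one_hot(2, num_classes=4)
print(review_ids, too_long, target)
```

The resulting lists can be stacked into arrays and passed to `model.fit`, provided they match the input shapes the model actually expects.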

Additional Notes

The library is designed to work with TensorFlow and Keras for neural network-based models, and the functions can be used independently or together for various NLP tasks.
If you're using the attention-based models, make sure that you have TensorFlow installed in your environment:
```
pip install tensorflow
```

Conclusion

HubNLP provides utilities for loading NER data, extracting word features, and building attention-based models for sentiment analysis. Its small API is meant to make these functions easy to integrate into your own projects.
