# Machine learning approach with TF-IDF
## Feature extraction methods for machine learning in NLP
### TF-IDF
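* TF-IDF (Term Frequency-Inverse Document Frequency) vectorization is a common feature extraction method for text.
* It converts each document into a numerical vector that can then be used as input features for machine learning models.
* Each entry weights a term by how often it appears in the document (term frequency), scaled down by how many documents in the corpus contain it (inverse document frequency). Words that are frequent in one document but rare overall get high weights, while words that appear in almost every document, such as "the", get low weights.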
### Word Embeddings
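* Word embeddings (e.g., Word2Vec, GloVe, FastText) or contextual embeddings (e.g., BERT, GPT) convert text data into dense numerical vectors.
* Static word embeddings assign each word a single vector and place words used in similar contexts close together. Contextual embeddings produce a different vector for the same word depending on the surrounding text, so they also capture context.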
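To make the idea concrete without downloading a pretrained model, the sketch below uses a tiny hypothetical embedding table. The numbers are invented for illustration and are not taken from Word2Vec, GloVe, or FastText. The sketch shows two common operations. The first is cosine similarity between word vectors. The second is averaging the vectors of a sentence's words into one document vector (mean pooling), which is one simple way to feed static embeddings to a classifier. The names `EMBEDDINGS`, `doc_vector`, and `cosine` are hypothetical. Contextual embeddings such as BERT are not covered here.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for illustration.
# Real Word2Vec/GloVe/FastText vectors are learned from large corpora
# and usually have 100-300 dimensions.
EMBEDDINGS = {
    "love": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "terrible": np.array([-0.9, 0.0, 0.1]),
    "product": np.array([0.0, 0.7, 0.3]),
    "item": np.array([0.1, 0.6, 0.4]),
}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def doc_vector(text, embeddings, dim=3):
    """Mean of the vectors of known words; a zero vector if no word is known."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


print(cosine(EMBEDDINGS["product"], EMBEDDINGS["item"]))  # close vectors
print(cosine(EMBEDDINGS["product"], EMBEDDINGS["love"]))  # less similar
print(doc_vector("I love this product.", EMBEDDINGS))     # mean of "love" and "product"
```

Mean pooling discards word order. It is only a baseline way to turn per-word vectors into a fixed-length document feature.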

## Machine learning algorithms used in NLP
* Logistic Regression: used for text classification and sentiment analysis.
* Naive Bayes: used for text classification and sentiment analysis.
* Support Vector Machines (SVM): effective for text classification tasks.
* Decision Trees and Random Forests: useful for various NLP tasks, including information extraction.
* Deep learning models: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer-based models (e.g., BERT, GPT) are widely used across NLP tasks because they can learn complex patterns in text data.
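Scikit-learn classifiers share the same `fit`/`predict` interface, so the classical algorithms listed above can be swapped on top of identical TF-IDF features. The sketch below uses scikit-learn's `LogisticRegression`, `MultinomialNB`, `LinearSVC`, and `RandomForestClassifier` as stand-ins for those algorithm families. The six-sentence dataset is hypothetical and far too small to compare the models meaningfully. Deep learning models are not covered.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical dataset, for illustration only
texts = [
    "I love this product",
    "Great quality, works well",
    "Excellent and great value",
    "This is terrible",
    "Awful, it broke quickly",
    "Terrible quality, do not buy",
]
labels = ["positive"] * 3 + ["negative"] * 3

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

new_texts = ["great product", "terrible, it broke"]
predictions = {}
for name, clf in classifiers.items():
    # Same TF-IDF feature extraction, different classifier on top
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    predictions[name] = pipe.predict(new_texts)
    print(name, list(predictions[name]))
```

Wrapping the vectorizer and classifier in one pipeline ensures that the TF-IDF vocabulary is learned only from the data passed to `fit`.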

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sample data: one training sentence per class, far too little for reliable predictions
X_train = ["I love this product.", "This is terrible.", "It's okay."]
y_train = ["positive", "negative", "neutral"]

X_test = ["This is a great item.", "It's not good at all."]

# Text vectorization (TF-IDF): learn the vocabulary and IDF weights from the
# training set only, then apply the same transformation to the test set
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train a logistic regression classifier on the TF-IDF features
model = LogisticRegression()
model.fit(X_train_vectorized, y_train)

# Predict a sentiment label for each test sentence
y_pred = model.predict(X_test_vectorized)
print(y_pred)

# Print each test sentence with its predicted sentiment
for text, sentiment in zip(X_test, y_pred):
    print(f"Text: {text} - Predicted Sentiment: {sentiment}")
```

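```
['negative' 'neutral']
Text: This is a great item. - Predicted Sentiment: negative
Text: It's not good at all. - Predicted Sentiment: neutral
```

Both predictions are wrong, which is expected with only three training sentences (one per class). After tokenization, the only words in "This is a great item." that appear in the training vocabulary are "this" and "is", and both occur in the negative example. "great" and "item" were never seen during training. Likewise, "it" is the only known word in "It's not good at all.", and it occurs only in the neutral example.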
