# Word Embedding and TF-IDF Vectorization

This notebook converts cleaned tweet text into numerical features using TF-IDF vectorization. The key steps are:

## 1. Importing Necessary Libraries
- `pandas` for data manipulation.
- `TfidfVectorizer` from `sklearn.feature_extraction.text` for text vectorization.
- `pickle` (standard library) for saving the results; it is imported later, in the saving cell.

## 2. Loading Cleaned Data
- The cleaned text data is loaded from `cleaned_tweets.csv`.
- Missing values in the 'cleaned_text' column are filled with empty strings.

## 3. TF-IDF Vectorization
- A `TfidfVectorizer` instance is created to convert the text data into a TF-IDF matrix.
- The `fit_transform` method is applied to the 'cleaned_text' column. It learns the vocabulary and IDF weights and returns a sparse TF-IDF matrix with one row per document and one column per vocabulary term.
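
To make the weighting concrete, here is a minimal pure-Python sketch of the formula that `TfidfVectorizer` applies with its default settings: raw term counts, smoothed IDF, and L2 row normalization. The toy corpus and the whitespace tokenizer are assumptions for illustration only. The real vectorizer uses its own token pattern and returns a sparse matrix, so this is not a drop-in replacement.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the 'cleaned_text' column.
docs = ["good morning world", "good night", "hello world world"]

# Simplified tokenization (assumption): split on whitespace.
tokenized = [doc.split() for doc in docs]
vocabulary = sorted({term for tokens in tokenized for term in tokens})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1, the scikit-learn default.
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}


def tfidf_row(tokens):
    """Return the L2-normalized TF-IDF weights for one document."""
    counts = Counter(tokens)
    weights = [counts[t] * idf[t] for t in vocabulary]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm if norm else 0.0 for w in weights]


tfidf_rows = [tfidf_row(tokens) for tokens in tokenized]
print(len(tfidf_rows), len(vocabulary))  # prints: 3 5
```

Terms that appear in fewer documents (such as `hello`) get a larger IDF than terms shared across documents (such as `good`), which is what lets distinctive words dominate each row.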

## 4. Saving the Results
- The resulting TF-IDF matrix and the vectorizer are saved using `pickle` for future use.
  - The TF-IDF matrix is saved to `tfidf_matrix.pkl`.
  - The `TfidfVectorizer` instance is saved to `vectorizer.pkl`.
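
As a hedged sketch of how the saved vectorizer might be reused later, the example below pickles a fitted `TfidfVectorizer`, loads it back, and calls `transform` on unseen text. The toy corpus, the temporary directory, and the names `train_texts`, `loaded_vectorizer`, and `new_matrix` are assumptions for illustration; the notebook itself writes to `vectorizer.pkl` in the working directory.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the 'cleaned_text' column.
train_texts = ["good morning world", "good night", "hello world"]
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(train_texts)

# Round-trip the fitted vectorizer through pickle in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectorizer.pkl")
    with open(path, "wb") as f:
        pickle.dump(vectorizer, f)
    with open(path, "rb") as f:
        loaded_vectorizer = pickle.load(f)

# transform (not fit_transform) reuses the learned vocabulary and IDF weights,
# so new text is mapped onto the same columns; unseen words are ignored.
new_matrix = loaded_vectorizer.transform(["good evening world"])
print(new_matrix.shape)
```

Calling `fit_transform` again on new text would learn a different vocabulary, and the columns would no longer line up with the saved matrix. Pickle files should only be loaded from trusted sources, ideally with the same scikit-learn version that created them.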


In [None]:
# word_embedding.ipynb

# Importing Necessary Libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


In [2]:
# Loading the cleaned data
df = pd.read_csv('cleaned_tweets.csv')
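# Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas versions
df['cleaned_text'] = df['cleaned_text'].fillna('')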

In [3]:
# TF-IDF Vectorization
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df['cleaned_text'])

In [4]:
# Saving the TF-IDF matrix and vectorizer for further use
import pickle
with open('tfidf_matrix.pkl', 'wb') as f:
    pickle.dump(tfidf_matrix, f)
with open('vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)


In [5]:
# Display the shape of the TF-IDF matrix: (number of documents, vocabulary size)
print(tfidf_matrix.shape)

(27981, 28645)
