## Introduction to TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure of how important a word is to a document relative to a collection of documents (a corpus). It gives more weight to terms that are distinctive of a particular document and less weight to terms that occur commonly across all documents.

1. **Term Frequency (TF)**: Measures how frequently a term appears in a document. It is often normalized by the total number of terms in the document.
2. **Inverse Document Frequency (IDF)**: Measures how informative a term is by how rarely it appears across the corpus. Terms that appear in fewer documents receive higher IDF scores.
3. **TF-IDF Score**: The product of TF and IDF. It is high for terms that occur often in one document but in few other documents of the corpus.
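
The three quantities above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes lowercase whitespace tokenization, TF normalized by document length, and the unsmoothed form IDF = ln(N / df) with a natural logarithm. The helper names `term_frequency`, `inverse_document_frequency`, and `tf_idf` are made up for this example.

```python
import math
from collections import Counter

docs = [
    "I love programming in Python",
    "Python programming is fun",
    "Machine learning is fascinating",
]

# Assumption: lowercase and split on whitespace (no punctuation handling).
tokenized = [doc.lower().split() for doc in docs]


def term_frequency(term, tokens):
    """Occurrences of `term` divided by the number of tokens in the document."""
    return Counter(tokens)[term] / len(tokens)


def inverse_document_frequency(term, corpus):
    """ln(N / df), where df is the number of documents containing `term`.

    Assumes `term` occurs in at least one document (otherwise df would be 0).
    """
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df)


def tf_idf(term, tokens, corpus):
    """Product of TF and IDF for `term` in one document."""
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)


# Score two terms of the first document.
for term in ("python", "love"):
    print(term, round(tf_idf(term, tokenized[0], tokenized), 4))
# python 0.0811
# love 0.2197
```

Both terms occur once in the first document, so their TF is the same. "love" scores higher because it appears in only one of the three documents, while "python" appears in two.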

## Example: Using scikit-learn for TF-IDF

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
# Sample documents
documents = [
    "I love programming in Python",
    "Python programming is fun",
    "Machine learning is fascinating"
]

In [None]:
# Create the TfidfVectorizer object
vectorizer = TfidfVectorizer()

In [None]:
# Fit and transform the documents
X = vectorizer.fit_transform(documents)

In [None]:
# Convert to array and get feature names
X_array = X.toarray()
feature_names = vectorizer.get_feature_names_out()

In [None]:
# Display the TF-IDF representation
import pandas as pd
df = pd.DataFrame(X_array, columns=feature_names)
print(df)
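
The printed values will not match a hand calculation with the textbook formula, because `TfidfVectorizer` applies some defaults of its own. As far as the scikit-learn documentation describes them, it uses raw counts for TF, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row. Its default tokenizer also drops single-character tokens such as "I". The sketch below tries to reproduce the matrix from raw counts under those assumptions. The variable names `counts`, `manual`, and `reference` are chosen for this example only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "I love programming in Python",
    "Python programming is fun",
    "Machine learning is fascinating",
]

# Raw term counts, using the same default tokenization as TfidfVectorizer.
counts = CountVectorizer().fit_transform(documents).toarray()

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)  # number of documents containing each term

# Assumed default: smoothed IDF (smooth_idf=True).
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

# Weight the counts by IDF, then scale each row to unit L2 norm (norm="l2").
weighted = counts * idf
manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

# Compare against the library's own result.
reference = TfidfVectorizer().fit_transform(documents).toarray()
print(np.allclose(manual, reference))
```

If the comparison prints `True`, the assumptions above match the installed scikit-learn version. The fitted IDF weights are also available as `vectorizer.idf_` for direct inspection.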

In [None]:
documents = [
    "Data science is an interdisciplinary field",
    "It uses scientific methods, processes, algorithms",
    "Data science is used for data analysis"
]



# TODO: Create the TfidfVectorizer object

# TODO: Fit and transform the documents

# TODO: Convert to array and get feature names

# TODO: Display the TF-IDF representation
