# Feature scaling 

Feature scaling is a preprocessing technique that brings the features of a dataset onto a similar scale, which can improve the performance
of machine-learning models. We don't apply it directly to raw text. Instead, we apply it to the output of text representation techniques
such as bag-of-words (BoW) or TF-IDF. These techniques convert the text into numerical features, which we can then scale.

Reasons for feature scaling:

 ![image.png](attachment:49e74283-71d6-42b5-9c51-bd093efc180e.png)

Let's now explore feature scaling using Python. We'll use scikit-learn's `TfidfVectorizer` class, a common text representation technique, to transform
the text data into numerical TF-IDF features. We'll then apply min-max scaling to these TF-IDF features with the `MinMaxScaler` class.
Min-max scaling rescales each feature (column) to a fixed range, which is [0, 1] by default.
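To make the min-max step concrete, here is a minimal sketch on a small made-up matrix. It assumes the usual per-column formula x' = (x - min) / (max - min) and checks that `MinMaxScaler` with its default range gives the same result. The values are illustrative only and are not taken from the reviews dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix (made-up values): 3 documents x 2 features
X = np.array([[0.2, 0.0],
              [0.4, 0.5],
              [0.8, 1.5]])

# Min-max scaling applied column by column: (x - min) / (max - min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

# MinMaxScaler with the default feature_range=(0, 1) should agree
scaled = MinMaxScaler().fit_transform(X)

print(manual)
print(np.allclose(manual, scaled))
```

Because the minimum and maximum are computed per column, each feature is rescaled independently of the others.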

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

In [2]:
# Load the reviews dataset (adjust the path to your local copy)
df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews.csv")
df.head()

Unnamed: 0,review_id,text
0,txt145,The software had a steep learning curve at fir...
1,txt327,I'm really impressed with the user interface o...
2,txt209,The latest update to the software fixed severa...
3,txt825,I encountered a few glitches while using the s...
4,txt878,I was skeptical about trying the software init...


In [3]:
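# Convert the review text into a sparse TF-IDF matrix (documents x vocabulary terms)
tfidf_vectorizer = TfidfVectorizer()
tfidf_features = tfidf_vectorizer.fit_transform(df['text'])
# MinMaxScaler does not accept sparse input, so convert to a dense array first,
# then rescale each feature (column) to the default [0, 1] range
min_max_scaler = MinMaxScaler()
scaled_tfidf_features = min_max_scaler.fit_transform(tfidf_features.toarray())
print("Scaled TF-IDF Features:")
print(scaled_tfidf_features)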

Scaled TF-IDF Features:
[[0.         0.         0.         ... 0.         0.94824294 0.        ]
 [0.         0.         0.         ... 0.         0.         1.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]


In [4]:
# Rows are reviews, columns are vocabulary terms
scaled_tfidf_features.shape

(16, 150)