**Explanation**

The provided code builds a simple machine learning model that classifies Rotten Tomatoes audience scores as positive or negative. It first imports the libraries needed for data handling, model training, and evaluation, and suppresses warnings for cleaner output. After the dataset is loaded with Pandas, a custom function, binarize_score, converts each audience score into a binary label: scores below 50% are treated as negative (0), and scores of 50% or higher are treated as positive (1). The resulting labels are stored in a new audience_score_bin column.
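The thresholding rule can be illustrated with a minimal sketch. The sample values below are hypothetical and not taken from the dataset, and the function is a simplified variant that assumes scores arrive either as numbers or as strings such as "87%". In the notebook, the function is applied to the whole column with Series.apply rather than to a plain list.

```python
def binarize_score(score):
    """Map an audience score to 1 (>= 50%) or 0 (< 50%)."""
    if isinstance(score, str):
        # Assumed string format: digits followed by an optional percent sign
        score = int(score.strip().rstrip('%'))
    return 1 if score >= 50 else 0


# Hypothetical sample values mixing strings and integers
sample_scores = ["49%", "50%", "87%", 12, 93]
labels = [binarize_score(s) for s in sample_scores]
print(labels)  # [0, 1, 1, 0, 1]
```

Note that a score of exactly 50% falls on the positive side of the threshold.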

Two features are used for modelling: the critics score and the release date. Because the release date is stored as a categorical (non-numeric) value, LabelEncoder converts it into integer codes. The binarized audience score is the target variable. The dataset is split into training and testing subsets in a 75%/25% ratio.
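The encoding and split steps can be sketched on a toy DataFrame. The rows and the date format below are assumptions made for illustration; the real dataset may store dates differently. The sketch also shows that LabelEncoder assigns codes according to the sorted order of the unique strings, so the resulting integers are not necessarily chronological.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows; column names follow the notebook
toy = pd.DataFrame({
    "Release Date": ["Mar 3, 2017", "Jan 12, 2018", "Mar 3, 2017", "Dec 25, 2015",
                     "Jul 4, 2019", "Jan 12, 2018", "Oct 31, 2016", "Feb 14, 2020"],
    "Critics Score": [91, 45, 78, 33, 60, 52, 88, 17],
    "audience_score_bin": [1, 0, 1, 0, 1, 1, 1, 0],
})

# Copy the feature columns so the encoding does not touch the original frame
features = toy[["Release Date", "Critics Score"]].copy()
encoder = LabelEncoder()
features["Release Date"] = encoder.fit_transform(features["Release Date"])

# classes_ lists the unique date strings in sorted (lexicographic) order;
# each string's position in this list is its integer code
print(list(encoder.classes_))

# 75%/25% split, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    features, toy["audience_score_bin"], test_size=0.25, random_state=42
)
print(len(X_train), len(X_test))  # 6 2
```

If chronological order matters for the model, parsing the dates (for example with pd.to_datetime) and deriving a numeric feature such as the year would be one alternative; that is a suggestion, not what the notebook does.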

A logistic regression model is then trained on the training subset and used to predict labels for the testing subset. Finally, the accuracy_score function computes the model's accuracy, that is, the fraction of test-set predictions that match the true labels, and the result is printed. This accuracy measures how well the logistic regression model classifies audience scores from critic scores and release dates.
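To make the metric concrete, the following sketch computes accuracy by hand as the number of correct predictions divided by the number of test examples, which is the same quantity accuracy_score reports with its default settings. The labels and predictions are hypothetical.

```python
# Hypothetical true labels and model predictions for eight test movies
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_hat = [1, 0, 1, 0, 0, 1, 1, 1]

# Count positions where the prediction matches the true label
correct = sum(t == p for t, p in zip(y_true, y_hat))
accuracy = correct / len(y_true)

print(correct, accuracy)  # 6 0.75
```

Here two of the eight predictions are wrong, so the accuracy is 6/8 = 0.75.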



In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import warnings

# Suppress warnings
warnings.filterwarnings("ignore")

# Load the CSV file created from the first notebook
df = pd.read_csv("rottentomatoesmovies.csv")

# Binarize audience score (negative if <50%, positive if >=50%)
def binarize_score(score):
    # Strings such as "85%" are stripped of the percent sign and converted first
    if isinstance(score, str):
        score = int(score.strip().rstrip('%'))
    # Numeric values (Python or NumPy integers and floats) are compared directly
    return 1 if score >= 50 else 0

df['audience_score_bin'] = df['Audience Score'].apply(binarize_score)

# Select features and target variable
# 'Title' is excluded; .copy() avoids modifying a view of df when encoding below
features = df[['Release Date', 'Critics Score']].copy()
label_encoder = LabelEncoder()  # For encoding categorical features
features['Release Date'] = label_encoder.fit_transform(features['Release Date'])

target = df['audience_score_bin']

# Split the data into training (75%) and testing (25%) sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Train a simple logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the testing set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Model accuracy on the testing set:", accuracy)


Model accuracy on the testing set: 0.93974175035868


The model achieves an accuracy of approximately 94% (0.9397) on the testing set.