## Model testing to ensure project feasibility

In [1]:
import pandas as pd
df = pd.read_csv("../data/youtube_data.csv")

In [2]:
df.columns

Index(['video_id', 'duration', 'bitrate', 'bitrate(video)', 'height', 'width',
       'frame rate', 'frame rate(est.)', 'codec', 'category', 'url', 'title',
       'description', 'hashtags', 'views', 'likes', 'comments'],
      dtype='object')

**Dropping** rows with null values in the features that will be used later for anomaly detection

In [3]:
df = df.dropna(subset=['bitrate', 'views', 'likes', 'duration'])
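
As a small illustration of what `subset` does here, the sketch below uses a made-up `toy` frame (not rows from `youtube_data.csv`). Only rows with a gap in one of the four model features are dropped; gaps in other columns, such as `hashtags`, are left alone.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: gaps in model features and in an unused column
toy = pd.DataFrame({
    "bitrate": [800.0, np.nan, 1200.0, 950.0],
    "views": [100.0, 250.0, np.nan, 40.0],
    "likes": [5.0, 10.0, 3.0, 1.0],
    "duration": [60.0, 120.0, 30.0, 45.0],
    "hashtags": [None, "#a", "#b", None],
})

model_cols = ["bitrate", "views", "likes", "duration"]

# Drop a row only if one of the model features is missing
cleaned = toy.dropna(subset=model_cols)

print(len(toy), "->", len(cleaned))
print("Remaining nulls in hashtags:", cleaned["hashtags"].isna().sum())
```

Comparing `len(df)` before and after the real `dropna` call would show how many videos were excluded for missing features.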

Using **Isolation Forest** for quick anomaly detection without adding much complexity

In [5]:
from sklearn.ensemble import IsolationForest

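# Numeric features the model uses to isolate unusual videos
features = df[['bitrate', 'views', 'likes', 'duration']]
# contamination=0.01 asks the model to flag roughly 1% of rows as anomalies
model = IsolationForest(contamination=0.01, random_state=42)
# fit_predict returns -1 for anomalies and 1 for inliers
df['is_anomaly'] = model.fit_predict(features)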
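
The `contamination` argument fixes roughly what share of rows is labeled `-1`, so the number of flagged videos is largely set by that choice rather than discovered from the data. A minimal sketch on synthetic data illustrates this. The `demo` frame and its lognormal values are made up for illustration and are not drawn from `youtube_data.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for the four features; values are illustrative only
demo = pd.DataFrame({
    "bitrate": rng.lognormal(6, 1, 2000),
    "views": rng.lognormal(8, 2, 2000),
    "likes": rng.lognormal(4, 2, 2000),
    "duration": rng.lognormal(5, 1, 2000),
})

demo_model = IsolationForest(contamination=0.01, random_state=42)
labels = demo_model.fit_predict(demo)

# Share of rows labeled -1; expected to sit close to the contamination value
flagged_fraction = (labels == -1).mean()
print(f"Flagged fraction: {flagged_fraction:.4f}")
```

Rerunning the sketch with a different `contamination` value should move the flagged fraction accordingly.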

**Filtering** the dataframe to get the number of anomalies

In [6]:
anomalies = df[df['is_anomaly'] == -1]
print(f"Found {len(anomalies)} out of {len(df)} videos")

Found 176 out of 17589 videos
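
The `-1`/`1` labels do not say how unusual each flagged row is. One possible follow-up, sketched here on synthetic data with a hypothetical planted outlier, is to rank rows with `score_samples`, where lower scores mean more anomalous. The array `X` and the planted point are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 200 synthetic inliers in four dimensions plus one planted extreme point (last row)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
planted = np.full((1, 4), 50.0)
X = np.vstack([inliers, planted])

forest = IsolationForest(contamination=0.01, random_state=42).fit(X)

# score_samples: the lower the score, the more anomalous the row
scores = forest.score_samples(X)
ranking = np.argsort(scores)  # most anomalous first
most_anomalous = int(ranking[0])
print("Most anomalous row index:", most_anomalous)
```

On the real data, the same call on `features` could be stored as an extra column and used to sort `anomalies` before manual review.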


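176 of the 17,589 videos are flagged as anomalies, about 1% of the dataset, which is what `contamination=0.01` requests. The model trains and flags candidates on this data without extra preprocessing, so the project is feasible.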