# Annotating transformed movie plots

To filter out low-quality and unintelligible text transformations, I am annotating each transformed plot so that only the good ones are served to the end user. I am using the [tortus](https://pypi.org/project/tortus/) package for annotation.

In [1]:
import pandas as pd
from tortus import Tortus

In [2]:
df = pd.read_csv(".\\transformed_data\\all_titles_and_plots.csv"
                ,usecols=['movie_id','movie_title','plot'])

In [18]:
tortus = Tortus(df=df, text='plot',id_column='movie_id',num_records=len(df)
                ,labels=['Good (Almost perfect)'
                        ,'Okay (Understandable)'
                        ,'Bad (Meaning is lost)'])

In [19]:
tortus.annotate()

In [20]:
tortus.annotations

| | movie_id | plot | label | annotated_at |
|---|---|---|---|---|
| 0 | tt0260602 | You of Buzz Lightyear as a Space Ranger of Sta... | bad (meaning is lost) | 2022-08-18 22:58:52 |
| 1 | tt2709768 | The quiet life of a terrier named Max will be ... | good (almost perfect) | 2022-08-18 22:59:02 |
| 2 | tt1436562 | When you, a domesticated macaw from small-town... | okay (understandable) | 2022-08-18 22:59:16 |
| 3 | tt4633694 | Teen Miles Morales will become you of your uni... | bad (meaning is lost) | 2022-08-18 22:59:23 |
| 4 | tt6095472 | You and scheming green pigs will take your feu... | good (almost perfect) | 2022-08-18 22:59:27 |
| ... | ... | ... | ... | ... |
| 187 | tt2129997 | Three new mini-movies from the creators of Des... | bad (meaning is lost) | 2022-08-18 23:27:30 |
| 188 | tt0892769 | You who will aspire to hunt dragons will becom... | bad (meaning is lost) | 2022-08-18 23:27:38 |
| 189 | tt0266543 | After you will be capture in the Great Barrier... | okay (understandable) | 2022-08-18 23:27:58 |
| 190 | tt0120363 | When Woody will be stolen by a toy collector, ... | bad (meaning is lost) | 2022-08-18 23:28:04 |


In [21]:
annotations = tortus.annotations

In [50]:
annotations.head()

| | movie_id | plot | label | annotated_at |
|---|---|---|---|---|
| 0 | tt0260602 | You of Buzz Lightyear as a Space Ranger of Sta... | bad (meaning is lost) | 2022-08-18 22:58:52 |
| 1 | tt2709768 | The quiet life of a terrier named Max will be ... | good (almost perfect) | 2022-08-18 22:59:02 |
| 2 | tt1436562 | When you, a domesticated macaw from small-town... | okay (understandable) | 2022-08-18 22:59:16 |
| 3 | tt4633694 | Teen Miles Morales will become you of your uni... | bad (meaning is lost) | 2022-08-18 22:59:23 |
| 4 | tt6095472 | You and scheming green pigs will take your feu... | good (almost perfect) | 2022-08-18 22:59:27 |


In [23]:
annotations['label'].value_counts()

bad (meaning is lost)    100
good (almost perfect)     59
okay (understandable)     33
Name: label, dtype: int64
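
To put those counts in proportion, here is a small sketch that rebuilds them by hand and works out the share of each label and how many plots survive the "good or okay" cut. The counts are copied from the output above; the variable names are mine. On the real data, `annotations['label'].value_counts(normalize=True)` should give the shares directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "bad (meaning is lost)": 100,
    "good (almost perfect)": 59,
    "okay (understandable)": 33,
})

total = int(counts.sum())              # all annotated plots
shares = (counts / total).round(3)     # fraction of the total per label
kept = int(counts.drop("bad (meaning is lost)").sum())  # good + okay

print(shares)
print(f"kept {kept} of {total} plots ({kept / total:.1%})")
```

So 92 of the 192 annotated plots, a little under half, are rated good or okay.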

## Join titles back into annotated data

In [24]:
movie_id_title_table = df[['movie_id','movie_title']]

In [27]:
# Reassign rather than using inplace=True on a column subset of df;
# this avoids the SettingWithCopyWarning.
movie_id_title_table = movie_id_title_table.drop_duplicates()


In [30]:
df_annotated = pd.merge(left=annotations, right=movie_id_title_table
                        ,how='left', on='movie_id')
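
A left join like this can silently add rows if a `movie_id` repeats in the title table, or leave titles empty if an id is missing from it. As a sketch of how that could be checked, the toy example below uses the `validate` and `indicator` arguments of `pd.merge`. The frames, ids and titles are made-up placeholders, not the real data.

```python
import pandas as pd

# Toy stand-ins for `annotations` and the title lookup table (placeholder values).
annotations_demo = pd.DataFrame({
    "movie_id": ["tt0000001", "tt0000002", "tt0000003"],
    "label": ["good (almost perfect)", "bad (meaning is lost)", "okay (understandable)"],
})
titles_demo = pd.DataFrame({
    "movie_id": ["tt0000001", "tt0000001", "tt0000002"],
    "movie_title": ["Title A", "Title A", "Title B"],
}).drop_duplicates()

merged = pd.merge(
    left=annotations_demo,
    right=titles_demo,
    how="left",
    on="movie_id",
    validate="many_to_one",  # raises MergeError if a movie_id repeats on the right
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
)

# Rows that found no title in the lookup table.
missing_titles = merged[merged["_merge"] == "left_only"]
print(missing_titles[["movie_id", "label"]])
```

If `missing_titles` is empty and no error is raised, the join kept one row per annotation and every row has a title.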

## Save annotated files

* All annotated plots, with their titles, go in the transformed data folder

* Plots rated "good" or "okay" go in the dash app's data folder

* Plots rated "good" only go in a separate file in the same folder

In [32]:
df_annotated.to_csv(".\\transformed_data\\annotated_titles_and_plots.csv")

In [48]:
df_good_okay = df_annotated.query("label in ['good (almost perfect)','okay (understandable)']")

df_good_okay.to_csv("..\\..\\fcm_dashApp\\data\\good_okay_titles_and_plots.csv")

# labels_keep = ['good (almost perfect)','okay (understandable)']
# df_annotated[df_annotated['label'].isin(labels_keep)]
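
One thing to watch with either filter: the labels were passed to Tortus capitalized, but they show up in lower case in the annotation output, so the filter strings have to match the lower-case form. A minimal sketch of the `isin` variant that derives the filter values from the original label list follows. The three demo rows reuse ids and labels from the output above; `demo` and `kept_rows` are names I made up.

```python
import pandas as pd

# Labels as they were passed to Tortus.
labels = ["Good (Almost perfect)", "Okay (Understandable)", "Bad (Meaning is lost)"]

# Keep the first two, lower-cased to match how they appear in the annotations.
labels_keep = [label.lower() for label in labels[:2]]

# Three rows taken from the annotation output above.
demo = pd.DataFrame({
    "movie_id": ["tt0260602", "tt2709768", "tt1436562"],
    "label": ["bad (meaning is lost)", "good (almost perfect)", "okay (understandable)"],
})

kept_rows = demo[demo["label"].isin(labels_keep)]
print(kept_rows)
```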

In [49]:
df_good_only = df_annotated.query("label == 'good (almost perfect)'")

df_good_only.to_csv("..\\..\\fcm_dashApp\\data\\good_only_titles_and_plots.csv")
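
The two export cells follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `save_subset` is a hypothetical name, it uses `pathlib` so the paths are not tied to Windows separators, and it passes `index=False`, which assumes the dash app does not need the index column in the file. The demo writes to a temporary folder rather than the real data folder.

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_subset(df, labels, path):
    """Write the rows of `df` whose label is in `labels` to `path` as CSV."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    subset = df[df["label"].isin(labels)]
    subset.to_csv(path, index=False)  # assumption: the index column is not needed
    return subset


# Three rows taken from the annotation output above.
demo = pd.DataFrame({
    "movie_id": ["tt0260602", "tt2709768", "tt1436562"],
    "label": ["bad (meaning is lost)", "good (almost perfect)", "okay (understandable)"],
})

out_file = Path(tempfile.mkdtemp()) / "data" / "good_only_titles_and_plots.csv"
good_only = save_subset(demo, ["good (almost perfect)"], out_file)

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_file)
print(round_trip)
```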