![image info](https://raw.githubusercontent.com/albahnsen/MIAD_ML_and_NLP/main/images/banner_1.png)

# Project 2 - Movie Genre Classification

The purpose of this project is for you to put into practice, in your respective working groups, your knowledge of preprocessing techniques, predictive NLP models, and model deployment. As you work on it, follow the instructions in the "Guía del proyecto 2: Clasificación de género de películas" (Project 2 guide).

**Submission**: The project must be submitted during week 8. However, it is important to make progress during week 7 on modeling the problem and on part of the report, as indicated in the guide.

To submit, attach the self-contained report as a PDF to the project submission activity found in week 8, and upload the predictions file to the [Kaggle competition](https://www.kaggle.com/t/2c54d005f76747fe83f77fbf8b3ec232).

## Data for movie genre prediction

![image info](https://raw.githubusercontent.com/albahnsen/MIAD_ML_and_NLP/main/images/moviegenre.png)

# Model deployment

In [1]:
import pandas as pd
import string
import nltk
import joblib
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

In [2]:
data = pd.read_csv('https://github.com/albahnsen/MIAD_ML_and_NLP/raw/main/datasets/dataTraining.zip', encoding='UTF-8', index_col=0) # Make sure it contains the 'plot' and 'genres' columns

In [3]:
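# Genre classes: parse the stringified lists with ast.literal_eval instead of eval,
# which would execute arbitrary code found in the CSV
import ast
data['genres'] = data['genres'].map(ast.literal_eval)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(data['genres'])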

In [4]:
# Save the fitted label binarizer (genre classes) for reference
joblib.dump(mlb, 'model_deployment/generos.pkl')

['model_deployment/generos.pkl']
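
To make the label encoding concrete, here is a minimal sketch of what `MultiLabelBinarizer` does on a few hand-made genre lists. The toy lists and the `toy_` names are assumptions for illustration; they are not rows from `dataTraining`.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical genre lists, one per movie (not rows from the dataset)
toy_genres = [['Action', 'Thriller'], ['Drama'], ['Action', 'Drama']]

toy_mlb = MultiLabelBinarizer()
toy_y = toy_mlb.fit_transform(toy_genres)

print(toy_mlb.classes_)  # labels in sorted order, one per column of toy_y
print(toy_y)             # one row per movie, 1 where the genre applies

# inverse_transform maps indicator rows back to tuples of genre names
print(toy_mlb.inverse_transform(toy_y))
```

Saving the fitted binarizer matters because the column order in `classes_` is what later lets a prediction row be turned back into genre names.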

In [5]:
# Text preprocessing
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def procesar(texto):
    # Lowercase and strip punctuation before tokenizing
    texto = texto.lower()
    texto = texto.translate(str.maketrans('', '', string.punctuation))
    tokens = word_tokenize(texto)
    # Drop English stop words, then lemmatize each token as a verb
    tokens = [t for t in tokens if t not in stop_words]
    tokens = [lemmatizer.lemmatize(t, pos='v') for t in tokens]
    return ' '.join(tokens)

data['plot_n'] = data['plot'].apply(procesar)

In [6]:
# Vectorization
vectorizer = CountVectorizer(max_features=1000)
X = vectorizer.fit_transform(data['plot_n'])
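
As a rough illustration of the `max_features` setting used above, the sketch below fits a `CountVectorizer` on three made-up, already-normalized plots and keeps only the two most frequent terms. The toy plots are assumptions, not dataset rows.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical preprocessed plots (not taken from dataTraining)
toy_plots = [
    'killer stalk video store clerk',
    'killer video tape kill reporter',
    'fairy restore pixie dust tree',
]

# max_features keeps only the terms with the highest total count in the corpus
toy_vectorizer = CountVectorizer(max_features=2)
toy_X = toy_vectorizer.fit_transform(toy_plots)

print(sorted(toy_vectorizer.vocabulary_))  # the terms that survived the cut
print(toy_X.toarray())                     # per-document counts for those terms
```

A document that contains none of the retained terms, like the third toy plot, becomes an all-zero row, which is one reason a small `max_features` can hurt rare genres.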

In [7]:
# Train the multilabel model
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)

In [8]:
# Save the model and the vectorizer

joblib.dump(clf, 'model_deployment/modelo.pkl')
joblib.dump(vectorizer, 'model_deployment/vectorizer.pkl')

['model_deployment/vectorizer.pkl']
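
The notebook next imports `predict_genres` from `model_deployment/m09_model_deployment.py`, a module whose source is not shown here. The following is only a sketch of how such a function could combine the three saved artifacts; `predict_genres_sketch` and the toy data are hypothetical, and the real module presumably also applies the same `procesar` preprocessing and loads the `.pkl` files with `joblib.load`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer


def predict_genres_sketch(plot, vectorizer, clf, mlb):
    """Hypothetical stand-in for predict_genres: returns a list of genre names."""
    X_new = vectorizer.transform([plot])  # reuse the fitted vocabulary
    y_new = clf.predict(X_new)            # indicator row, one column per genre
    return list(mlb.inverse_transform(y_new)[0])


# Toy artifacts fitted in memory (assumption: stand-ins for the saved .pkl files)
toy_plots = [
    'space battle explosion chase',
    'explosion chase gun fight',
    'family love tears wedding',
    'love tears family reunion',
]
toy_genres = [['Action'], ['Action'], ['Drama'], ['Drama']]

toy_mlb = MultiLabelBinarizer()
toy_y = toy_mlb.fit_transform(toy_genres)
toy_vectorizer = CountVectorizer()
toy_X = toy_vectorizer.fit_transform(toy_plots)
toy_clf = OneVsRestClassifier(LogisticRegression()).fit(toy_X, toy_y)

print(predict_genres_sketch('a gun fight and a chase', toy_vectorizer, toy_clf, toy_mlb))
```

With `clf.predict`, each genre is decided independently at a 0.5 probability cut-off, so a plot can come back with several genres or with an empty list.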

In [9]:
from model_deployment.m09_model_deployment import predict_genres

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\carol\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\carol\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\carol\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [10]:
print(predict_genres("a serial killer decides to teach the secrets of his satisfying career to a video store clerk ."))

['Drama']


In [11]:
print(predict_genres("now that tony stark has revealed to the world that he is iron man ,  the entire world is now eager to get their hands on his hot technology  -  whether it ' s the united states government ,  weapons contractors ,  or someone else .  that someone else happens to be ivan vanko  -  the son of now deceased anton vanko ,  howard stark ' s former partner .  stark had vanko banished to russia for conspiring to commit treason against the us ,  and now ivan wants revenge against tony  -  and he ' s willing to get it at any cost .  but after being humiliated in front of the senate armed forces committee ,  rival weapons contractor justin hammer sees ivan as the key to upping his status against stark enterprises after an attack on the monaco  N  .  but an ailing tony has to figure out a way to save himself ,  get vanko ,  and get hammer before the government shows up and takes his beloved suits away .  and can he figure out what a mysterious figure named nick fury wants with him ?"))

['Action', 'Thriller']


In [12]:
print(predict_genres('tinker bell journey far north of never land to patch things up with her friend terence and restore a pixie dust tree .'))

['Adventure', 'Drama', 'Family']


In [13]:
print(predict_genres("a mysterious killer video tape is circulating around .  one look at this tape and you have seven days left to live .  news reporter cindy campbell  ( faris )  witnesses this video tape and tries to work out a way to prevent her death .  but this is not the only mystery to appear .  crop circles have been appearing in the local farm of tom  ( sheen )  and george  ( rex )  .  with help from aunt shaneequa  ( latifah )  ,  cindy suspects that the aliens may be linked with the killer tape and must now work out both mysteries before it ' s the end of the world ."))

['Crime', 'Horror', 'Mystery', 'Thriller']


In [14]:
dataTesting = pd.read_csv('https://github.com/albahnsen/MIAD_ML_and_NLP/raw/main/datasets/dataTesting.zip', encoding='UTF-8', index_col=0)

In [15]:
dataTesting

| | year | title | plot |
|---|---|---|---|
| 1 | 1999 | Message in a Bottle | who meets by fate , shall be sealed by fate .... |
| 4 | 1978 | Midnight Express | the true story of billy hayes , an american c... |
| 5 | 1996 | Primal Fear | martin vail left the chicago da ' s office to ... |
| 6 | 1950 | Crisis | husband and wife americans dr . eugene and mr... |
| 7 | 1959 | The Tingler | the coroner and scientist dr . warren chapin ... |
| ... | ... | ... | ... |
| 11263 | 2008 | The Fifth Commandment | in bangkok , an assassin who turns down a job... |
| 11265 | 2003 | Coffee and Cigarettes | eleven separate vignettes are presented . in ... |
| 11269 | 1957 | Pal Joey | joey evans is charming , handsome , funny , ... |
| 11270 | 2002 | Jonah: A VeggieTales Movie | when the singing veggies encounter some car tr... |


In [16]:
# app.py
from flask import Flask
from flask_restx import Api, Resource, fields
from model_deployment.m09_model_deployment import predict_genres

app = Flask(__name__)
api = Api(app, version='1.0', title='Movie genre classification', description='Predicts the genres of a movie from its plot synopsis.')

ns = api.namespace('predict', description='Genre classification')

parser = api.parser()

# The synopsis is read from the query string: GET /predict/?plot=...
parser.add_argument('plot', type=str, required=True, help='Plot synopsis of the movie', location='args')

resource_fields = api.model('Resource', {
    'genres': fields.List(fields.String),
})

@ns.route('/')
class GenreApi(Resource):
    @api.doc(parser=parser)
    @api.marshal_with(resource_fields)
    def get(self):
        args = parser.parse_args()
        plot = args['plot']
        genres = predict_genres(plot)
        return {"genres": genres}, 200

if __name__ == '__main__':
    app.run(debug=True, use_reloader=False, host='0.0.0.0', port=5000)

 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.80.18:5000
Press CTRL+C to quit
127.0.0.1 - - [21/May/2025 21:35:50] "GET /predict/?plot=dr%20.%20%20tess%20coleman%20and%20her%20fifteen%20-%20year%20-%20old%20daughter%20,%20%20anna%20,%20%20are%20not%20getting%20along%20.%20%20they%20don%20'%20t%20see%20eye%20-%20to%20-%20eye%20on%20clothes%20,%20%20hair%20,%20%20music%20,%20%20and%20certainly%20not%20in%20each%20other%20'%20s%20taste%20in%20men%20.%20%20one%20thursday%20evening%20,%20%20their%20disagreements%20reach%20a%20fever%20pitch%20%20-%20%20anna%20is%20incensed%20that%20her%20mother%20doesn%20'%20t%20support%20her%20musical%20aspirations%20and%20tess%20,%20%20a%20widow%20about%20to%20remarry%20,%20%20can%20'%20t%20see%20why%20anna%20won%20'%20t%20give%20her%20fiancÃ©%20a%20break%20.%20%20everything%20soon%20changes%20when%20two%20identical%20chinese%20fortune%20cookies%20cause%20a%20little%20mystic%20mayhem%20.%20%20the%20n