#### IMPORTS

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 200)

#### Creating the Simpsons Predictions CSV: AI-Assisted Data Generation

It is worth noting that creating and structuring these datasets (CSV) on The Simpsons predictions was a practical exercise assisted by Artificial Intelligence (AI) tools, following the methodologies learned in workshops on the correct and advanced use of language models.

The process of creating the CSV files with The Simpsons predictions is divided into three main phases: data collection, structuring (column definition), and normalization (cleaning and formatting).

The path from the initial idea to the final product is detailed below:

>*1. Research and Data Collection Phase. This phase begins with identifying the data source and gathering the information.*

The initial search focuses on compiling an exhaustive list of the show's "predictions". This involves reviewing articles, fan databases, and resources that compare real-life events with specific moments from episodes of The Simpsons.

- Identification and Verification: For each alleged prediction, two critical components are verified:
 - The Event in the Series: The exact episode title, season, episode number, and original air date are determined.
 - The Real-Life Event (Prediction): The date on which the real event occurred and the precise description of the coincidence are verified.
- Definition of Key Fields: During this collection, the pieces of information needed to catalog and compare each case are identified, which leads directly to the creation of the columns.

The data was searched for, located, and validated manually; the AI only processed the text it was given. Rather than typing each of the 59 records by hand, Artificial Intelligence (AI) was used as a text-processing tool to build the lists.

>*2. Structuring and Column Definition Phase*

Once the raw data has been collected, the next decision is how to organize it for analysis. Twelve columns are defined to document and analyze each Simpsons prediction.

1. Episode Identification and Location Fields:

* ID: Unique, referenceable identifier for each prediction (e.g., ES-01).
* EPISODIO: The full episode title. It is enclosed in double quotes to handle internal commas and avoid CSV formatting errors.
* NÚMERO TEMPORADA Y EPISODIO (season and episode number): used to make the source material easy to locate.
* FECHA DE EMISION and AÑO (air date and year): the basis for all time calculations.

2. Classification and Context Fields:

* CATEGORIA (category): The main topic of the prediction (e.g., Politics, Technology, Society). Enables thematic analysis.
* UBICACION (location): The geographical place relevant to the event in the series or to the real event (e.g., US, Japan). Enables geographical analysis.
* DESCRIPCION (description): Details of the scene or event that takes place within the episode of The Simpsons.
* PREDICCION (prediction): Details of the event that occurred in real life and the date on which it happened.

3. Key Metric Fields (Analysis):

* DIFERENCIA (AÑOS) (difference in years): The number of years elapsed between the broadcast and the real event, calculated as (Real Year - Air Year).
* NIVEL DE COINCIDENCIA (coincidence level): A subjective rating (Low, Medium, High) of how faithfully the episode matches the real event. It serves as a measure of the quality of the coincidence.
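The difference metric can be sketched in a few lines of Pandas. This is only an illustration: `real_year` is an assumed helper column (it is not one of the 12 columns), filled here with the event years that the PREDICTION text gives for two records of the dataset.

```python
import pandas as pd

# Two records from the dataset; "real_year" is a hypothetical helper column
# holding the year of the real-world event taken from the PREDICTION text.
sample = pd.DataFrame({
    "ID": ["EN-03", "EN-39"],
    "YEAR": [1991, 2000],        # air year of the episode
    "real_year": [1994, 2016],   # year of the real event
})

# DIFFERENCE (YEARS) = Real Year - Air Year
sample["DIFFERENCE (YEARS)"] = sample["real_year"] - sample["YEAR"]
print(sample)
```

Keeping the real-event year in its own column, rather than only inside the free-text PREDICTION field, would make this metric reproducible instead of hand-typed.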

>*3. Normalization and Cleaning Phase (CSV Creation). Once the columns are defined, the data is entered and formatted in a standardized way.*

* Format Standardization: 
    - The comma (,) is used as the field separator (the CSV standard). 
    - The date format is unified as YYYY-MM-DD (example: 1990-11-01).

* Generation of Structured Lists: 
The AI processed the information and formatted it as Python lists or nested dictionaries. This output was designed to be consumed directly by the Pandas library.

* CSV Creation with Pandas: 
The AI-generated structured lists were imported into a Python environment. Pandas used these lists as input to build the final DataFrame, which, after an EDA process, was cleaned, complemented with another dataset, and finally exported to CSV.
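The three normalization steps above can be sketched together as follows. The records and the day-first input date format are assumptions made for illustration; the point is that Pandas quotes a title containing a comma on export, and that the dates come out as YYYY-MM-DD.

```python
import io
import pandas as pd

# Hypothetical records in the "list of dictionaries" shape described above.
# The day-first input dates are an assumption for this sketch.
records = [
    {"ID": "EN-05", "EPISODIO": "oh brother, where art thou?", "AIR DATE": "21/02/1991"},
    {"ID": "EN-10", "EPISODIO": "marge vs. the monorail", "AIR DATE": "14/01/1993"},
]

df = pd.DataFrame(records)

# Unify the date format as YYYY-MM-DD.
df["AIR DATE"] = (
    pd.to_datetime(df["AIR DATE"], format="%d/%m/%Y").dt.strftime("%Y-%m-%d")
)

# With no path, to_csv returns the CSV text; fields containing the
# separator are wrapped in double quotes by default.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back recovers the title with its internal comma intact.
restored = pd.read_csv(io.StringIO(csv_text))
```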

In summary: the schema analysis was done by a human, the bulk generation of code-ready data was assisted by AI, and the final creation of the CSV file was handled by Pandas.

#### CREATING THE CSV WITH PANDAS:

ENGLISH

In [2]:
datos_en = {
    'ID': ['EN-01', 'EN-02', 'EN-03', 'EN-04', 'EN-05', 'EN-06', 'EN-07', 'EN-08', 'EN-09', 'EN-10', 'EN-11', 'EN-12', 'EN-13', 'EN-14', 'EN-15', 'EN-16', 'EN-17', 'EN-18', 'EN-19', 'EN-20', 'EN-21', 'EN-22', 'EN-23', 'EN-24', 'EN-25', 'EN-26', 'EN-27', 'EN-28', 'EN-29', 'EN-30', 'EN-31', 'EN-32', 'EN-33', 'EN-34', 'EN-35', 'EN-36', 'EN-37', 'EN-38', 'EN-39', 'EN-40', 'EN-41', 'EN-42', 'EN-43', 'EN-44', 'EN-45', 'EN-46', 'EN-47', 'EN-48', 'EN-49', 'EN-50', 'EN-51', 'EN-52', 'EN-53', 'EN-54', 'EN-55', 'EN-56', 'EN-57', 'EN-58', 'EN-59'],
    'EPISODIO': ['two cars in every garage...', 'Itchy & Scratchy & Marge', 'bart the murderer', 'brush with greatness', 'oh brother, where art thou?', 'brother, can you spare two dimes?', 'homer at the bat', 'itchy & scratchy: the movie', 'new kid on the block', 'marge vs. the monorail', 'rosebud', '$$pringfield (tiger attack)', 'the old man and the sea of trouble', 'the old man and the sea of trouble', 'the old man and the sea of trouble', 'whacking day', 'last exit to springfield', 'lisa vs. malibu stacy', 'lisa on ice', "lisa's wedding", "lisa's wedding", 'lemon of troy', 'who shot mr. burns?', "a bard's tale / homerpalooza", 'the day the violence died', 'bart after dark', 'the mysterious voyage of homer', 'the city of new york vs. homer...', 'treehouse of horror viii', "lisa's sax", 'lisa the simpson', 'bart carny', 'the wizard of evergreen terrace', 'lard of the dance', 'when you dish upon a star', 'viva ned flanders', "they saved lisa's brain", 'e-i-e-i-(annoyed grunt)', 'bart to the future', 'bart to the future', "poppa's got a brand new badge", 'diatribe of a mad housewife', 'midnight rx', "homer's paternity test", 'the simpsons movie', "please homer, don't hammer 'em", 'treehouse of horror xix', 'elementary school musical', 'the Winter of His Content', 'the replaceable you', 'lisa goes gaga', 'the ned and edna blend', 'politically inept, with homer...', 'you don\'t have to live like a referee', 'friends and family', 'the great phatsby', 'the serfsons', 'west wing story', 'it\'s a blunderful life'],
    'S.': ['S2', 'S2', 'S3', 'S2', 'S2', 'S3', 'S3', 'S4', 'S4', 'S4', 'S5', 'S5', 'S4', 'S4', 'S4', 'S4', 'S4', 'S5', 'S6', 'S6', 'S6', 'S6', 'S6', 'S7', 'S7', 'S8', 'S8', 'S9', 'S9', 'S9', 'S9', 'S9', 'S10', 'S10', 'S10', 'S10', 'S10', 'S11', 'S11', 'S11', 'S13', 'S15', 'S16', 'S17', 'MOVIE', 'S18', 'S20', 'S22', 'S25', 'S23', 'S23', 'S23', 'S23', 'S25', 'S28', 'S28', 'S29', 'SHORT', 'S35'],
    'EP. NUM.': [4, 9, 4, 18, 15, 24, 16, 6, 8, 12, 4, 10, 21, 21, 21, 20, 17, 14, 8, 19, 19, 24, 25, '21 / 19', 18, 7, 9, 1, 4, 3, 17, 12, 2, 1, 5, 10, 22, 5, 17, 17, 22, 14, 6, 10, 'n/a', 3, 4, 1, 14, 4, 22, 21, 10, 16, 2, 12, 1, 'n/a', 7],
    'AIR DATE': ['1990-11-01', '1990-12-20', '1991-10-10', '1991-04-11', '1991-02-21', '1992-08-27', '1992-02-20', '1992-11-03', '1992-11-12', '1993-01-14', '1993-10-21', '1993-12-16', '1993-05-06', '1993-05-06', '1993-05-06', '1993-04-29', '1993-03-11', '1994-02-17', '1994-11-19', '1995-03-19', '1995-03-19', '1995-05-07', '1995-05-21', '1996-05-05', '1996-03-17', '1996-11-24', '1997-01-05', '1997-09-21', '1997-10-26', '1997-10-19', '1998-03-08', '1998-04-26', '1998-09-20', '1998-08-23', '1998-11-08', '1999-01-10', '1999-05-09', '1999-11-07', '2000-04-09', '2000-04-09', '2002-05-21', '2004-04-18', '2005-01-16', '2006-01-08', '2007-07-27', '2008-09-21', '2008-11-02', '2010-09-26', '2014-03-16', '2011-11-06', '2012-05-20', '2012-05-13', '2012-01-08', '2014-03-30', '2016-10-02', '2017-01-15', '2017-04-30', '2018-04-08', '2019-02-10'],
    'YEAR': [1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1993, 1994, 1994, 1995, 1995, 1995, 1995, 1996, 1996, 1996, 1997, 1997, 1997, 1997, 1998, 1998, 1998, 1998, 1998, 1999, 1999, 1999, 2000, 2000, 2002, 2004, 2005, 2006, 2007, 2008, 2008, 2010, 2014, 2011, 2012, 2012, 2012, 2014, 2016, 2017, 2017, 2018, 2019],
    'PREDICTION CATEGORY': ['environment', 'society', 'celebrity', 'celebrity', 'automotive', 'technology', 'sports', 'advertising', 'legal', 'history', 'society', 'celebrity', 'politics', 'pandemics', 'nature', 'society', 'cinema', 'politics', 'technology', 'technology', 'technology', 'society', 'environment', 'music', 'politics', 'technology', 'politics', 'history', 'animation', 'pandemics', 'science', 'technology', 'science', 'economy', 'business', 'cinema', 'science', 'society', 'politics', 'video games', 'legal', 'cinema', 'legal', 'technology', 'politics', 'economy', 'politics', 'economy', 'sports', 'robotics', 'music', 'technology', 'economy', 'sports', 'technology', 'celebrity', 'television', 'politics', 'infrastructure'],
    'GEOGRAPHICAL LOCATION': ['japan / south america', 'us / russia', 'global', 'global', 'global', 'global', 'us', 'new zealand', 'us (massachusetts)', 'us (new york)', 'global', 'us (las vegas)', 'us', 'global', 'us', 'us (florida)', 'global', 'us / global', 'global', 'uk / global', 'global', 'us (houston)', 'global', 'uk', 'us (washington d. c.)', 'global', 'us', 'us (new york)', 'global', 'africa', 'global', 'global', 'global', 'us (new york)', 'us', 'global', 'global', 'japan / global', 'us', 'global', 'argentina', 'global', 'canada', 'north atlantic ocean', 'us', 'us', 'us', 'sweden / global', 'south korea', 'japan / global', 'us', 'global', 'greece', 'brazil / switzerland', 'global', 'us', 'global', 'us', 'chile / spain'],
    'DIFFERENCE (YEARS)': [33, 26, 3, 22, 30, 20, 1, 11, 21, 8, 20, 10, 28, 27, 27, 27, 4, 29, 18, 15, 15, 18, 18, 28, 25, 5, 16, 4, 1, 3, 14, 10, 14, 14, 21, 10, 1, 14, 16, 6, 11, 17, 13, 17, 6, 13, 4, 6, 8, 0, 5, 11, 3, 1, 8, 7, 2, 0, 2],
    'COINCIDENCE LEVEL': ['medium', 'medium', 'high', 'high', 'medium', 'medium', 'high', 'medium', 'high', 'low', 'high', 'high', 'high', 'medium', 'high', 'high', 'medium', 'high', 'high', 'medium', 'high', 'high', 'medium', 'high', 'high', 'high', 'high', 'low', 'high', 'medium', 'medium', 'medium', 'high', 'high', 'high', 'high', 'medium', 'medium', 'high', 'high', 'high', 'medium', 'high', 'high', 'medium', 'high', 'high', 'high', 'high', 'medium', 'high', 'high', 'high', 'high', 'high', 'high', 'high', 'high', 'medium'],
    'DESCRIPTION': ['blinky the three-eyed fish appears near the nuclear plant. the japanese pm eats fukushima fish.', "marge calls for the censorship of michelangelo's david. the statue is covered with jeans.", 'it is mentioned that melanie griffith and don johnson are going to separate.', 'ringo starr is seen responding to countless fan letters.', 'homer designs "the homer," a ridiculous and futuristic car priced at $82,000.', "herb invents a device capable of decoding babies' babbling.", 'mr. burns benches don mattingly for his "rebellious sideburns.".', 'a billboard is installed with scratchy being decapitated, and blood splashing outside the sign.', 'homer sues an all-you-can-eat restaurant for deceptive advertising after being kicked out.', 'a picture shows two towers with a destroyed plane (coincidence visual).', 'burns finds his teddy bear, bobo, after a long time in a buried box.', 'gunter and ernst (siegfried and roy parody) are attacked by their trained white tiger.', 'mayor quimby flees springfield during a pandemic.', 'the rest of springfield catches the osaka flu after a shipment arrives.', 'a crowd overturns a truck, releasing a swarm of "killer bees".', 'annual springfield tradition where residents gather to beat snakes to death.', 'a scene from the mcbain movie parodies cheesy dialogue with a terrible ice-related pun.', 'lisa is concerned about the message of the doll. / the anchor briefly mentions that "the president was arrested."', 'the message "beat up martin" is autocorrected to "eat up martha" (apple newton).', 'poster advertises rolling stones\' "steel wheelchair tour 2010." / spire-like building (the shard). / robot librarians.', "lisa uses a screen phone showing marge's face in real-time (a video call).", "homer and bart work together to retrieve springfield's lemon tree after it is stolen.", 'mr. 
burns blocks the sun with a giant disc to force citizens to use his power.', "cypress hill performs with the london symphony orchestra after hiring them.", 'a furious mob of cartoon characters assaults the capitol.', 'burns\' communicator is a square with a smaller black square and a circle.', 'telephone wires are seen labeled with names of spy agencies like nsa, fbi, and cia.', 'a magazine cover shows "$9" next to the twin towers\' silhouettes.', 'bart fuses his dog and cat.', 'marge reads bart a book titled "**curious george and the ebola virus**."', 'the simpson gene that affects the intelligence of male simpsons is discussed.', 'children play *yard work simulator*. a house has satellite dishes.', 'homer writes a complex mathematical equation on a blackboard.', 'homer devises a plan to steal used cooking grease from restaurants to sell it.', 'a scene shows that 20th century fox studio is now owned by walt disney.', 'homer wakes up in a destroyed hotel room and meets a boxer.', 'homer theorizes about a donut-shaped or "torus" universe.', 'homer invents the mutated tomato "tomacco." horse meat is discovered in schools.', 'lisa mentions the difficult task of cleaning up the economy after president donald trump.', 'a virtual console exists in the future.', 'fat tony plans to sell ferrets with cotton balls glued on as toy poodles.', 'a poster says "matrix christmas" and "coming soon."', 'ned discovers that "reeferino" (marihuana) is legal in canada.', 'homer is trapped in a submersible while looking for a sunken ship.', 'a man catching a signal appears to work for the nsa. springfield is enclosed in a dome. / president schwarzenegger.', 'marge mentions that a jcpenney "used to be here" in a run-down mall.', 'homer attempts to vote for obama on an electronic machine that incorrectly registers his vote.', "milhouse's pick for the nobel prize in economics is bengt r. 
holmstrom.", 'homer and marge win olympic gold in curling by defeating sweden.', '**a robot seal is created**. homer uses a camera hat that can twist his neck.', 'lady gaga flies over the audience and plays the piano during her concert.', '**an application on homer\'s smartphone contains an "x" as a symbol**.', '**a headline on a news program addresses the financial instability in greece**.', 'homer becomes a referee after a fifa corruption scandal. / brazil vs. germany match. neymar injured.', 'all of springfield uses virtual reality glasses, bumping and stumbling in public.', 'homer and burns crash an all-white themed party hosted by a hip-hop musician similar to diddy.', 'in a *game of thrones* parody, a dragon sets the village on fire.', 'president joe biden and vice president kamala harris are shown as dance partners.', 'there is a total power outage in springfield.'],
    'PREDICTION': ['a real three-eyed fish was found. pm kishida ate fukushima seafood (2023).', 'controversies over david in st. petersburg (2016) and a florida school (2023).', 'melanie griffith and don johnson separated in 1994.', 'paul mccartney replied to a letter sent by two fans 50 years prior (1963-2013).', "the launch of elon musk's cybertruck.", "apps like cry translator and zoundream exist today.", 'the yankees manager benched don mattingly for not cutting his hair (1993).', 'a billboard for *kill bill* showed the character\'s blood splashing outside the sign in new zealand.', 'a man from springfield, ma, sued golden corral for deceptive advertising after being expelled.', 'september 11, 2001, attacks.', 'a 100+ year-old teddy bear was found in a time capsule in 2013.', 'roy horn was attacked by his tiger, montecore, during a live show in 2003.', 'us mayors and senators traveled while urging citizens to stay home (2021).', 'similar to the covid-19 pandemic.', 'report of the asian giant hornet ("murder hornet") in the us (2020).', 'the "florida python challenge" (annual event to control the python population).', 'the movie *batman & robin* (1997) became infamous for mr. freeze\'s terrible puns.', 'the *barbie* movie (2023). / president donald trump was arrested (2023).', 'the error is a parallel to the modern autocorrect function.', 'the rolling stones are still active. / construction of the shard. 
/ prototype librarian robot (2016).', 'apple announced the development of "facetime" in 2010.', "a similar lemon tree theft occurred in houston, texas (2013).", 'george soros\' proposal to use a cloud layer over the arctic (geoengineering).', 'cypress hill and the london symphony orchestra performed together in 2024.', 'us capitol riots (jan 6, 2021).', 'the ipod is invented.', 'nsa spying was confirmed in 2013.', 'september 11, 2001, attacks.', 'the *catdog* series premiered in 1998.', 'ebola outbreaks occurred in africa in 2000 and 2014.', 'the rgs14 gene was nicknamed **homer simpson** in 2012.', 'farmville is created and satellite dishes become popular.', 'homer\'s equation closely predicts the mass of the higgs boson (discovered in 2012).', 'thieves in new york stole grease to recycle it into biodiesel (2012).', 'walt disney announced a merger with 21st century fox in 2019.', 'the movie *the hangover* came out (2009).', 'the "torus" theory (donut-shaped universe) became popular in the early 2000s.', 'mutant vegetables in 2013 after fukushima. horse meat in schools (2013).', 'donald trump was elected president in 2016.', 'the wii is created.', 'an argentine pensioner was discovered selling weasels groomed as toy poodles (2013).', '*the matrix resurrections* (the fourth installment) premiered in december 2021.', 'recreational marijuana was legalized in canada in 2018.', 'the titan submersible tragedy (2023).', 'nsa spying was confirmed in 2013. the series *under the dome* came out. / politician herman cain.', 'the jcpenney department store chain filed for bankruptcy in 2020.', 'problems with electronic voting machines were reported in 2012 and 2016.', 'bengt r. holmstrom won the nobel prize in economics six years later (2016).', 'the united states won the gold medal in curling at the 2018 pyeongchang olympics by defeating sweden.', '**a robot seal (Paro) is created. 
the GoPro is invented**.', 'lady gaga performed at Super Bowl LI (2017), where she was lifted over the audience and played the piano.', '**Twitter was renamed to X in 2023**.', '**the Greek economic crisis of 2015**.', 'police raid on FIFA headquarters (2015). the Brazil vs. Germany match and Neymar\'s injury (2014 World Cup).', 'Apple launched its Vision Pro glasses (2024), with viral videos of people using them in public.', 'diddy\'s "white party" scandal (Sean Combs).', 'Daenerys Targaryen\'s villainous turn (2019).', 'the scene was created before Biden and Harris were running mates.', 'there are massive power outages in Chile and Spain in 2025.'],
}

In [3]:
df_En = pd.DataFrame(datos_en)
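Because the dictionary is assembled from long hand-checked lists, a single missing entry makes `pd.DataFrame` raise a `ValueError` without saying which column is short. A minimal sketch of a pre-check, using a toy dictionary and a hypothetical helper `column_lengths`:

```python
def column_lengths(data: dict) -> dict:
    """Return the number of entries in each column list."""
    return {name: len(values) for name, values in data.items()}

# Toy dictionary with one deliberately short column.
toy = {
    "ID": ["EN-01", "EN-02"],
    "YEAR": [1990, 1990],
    "PREDICTION": ["only one entry"],
}

lengths = column_lengths(toy)
# Columns whose length differs from the ID column.
mismatched = {name: n for name, n in lengths.items() if n != lengths["ID"]}
print(mismatched)
```

Running the same check on `datos_en` before building the DataFrame should report an empty dictionary when every column has 59 entries.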


In [4]:
df_En.head()

Unnamed: 0,ID,EPISODIO,S.,EP. NUM.,AIR DATE,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION
0,EN-01,two cars in every garage...,S2,4,1990-11-01,1990,environment,japan / south america,33,medium,blinky the three-eyed fish appears near the nuclear plant. the japanese pm eats fukushima fish.,a real three-eyed fish was found. pm kishida ate fukushima seafood (2023).
1,EN-02,Itchy & Scratchy & Marge,S2,9,1990-12-20,1990,society,us / russia,26,medium,marge calls for the censorship of michelangelo's david. the statue is covered with jeans.,controversies over david in st. petersburg (2016) and a florida school (2023).
2,EN-03,bart the murderer,S3,4,1991-10-10,1991,celebrity,global,3,high,it is mentioned that melanie griffith and don johnson are going to separate.,melanie griffith and don johnson separated in 1994.
3,EN-04,brush with greatness,S2,18,1991-04-11,1991,celebrity,global,22,high,ringo starr is seen responding to countless fan letters.,paul mccartney replied to a letter sent by two fans 50 years prior (1963-2013).
4,EN-05,"oh brother, where art thou?",S2,15,1991-02-21,1991,automotive,global,30,medium,"homer designs ""the homer,"" a ridiculous and futuristic car priced at $82,000.",the launch of elon musk's cybertruck.


In [5]:
archivo_en = 'thesimpsons_predictions_en.csv'

In [6]:
df_En.to_csv(archivo_en, index=False, encoding='utf-8')

print(f'DataFrame created and exported to {archivo_en}')

df_En


DataFrame created and exported to thesimpsons_predictions_en.csv


Unnamed: 0,ID,EPISODIO,S.,EP. NUM.,AIR DATE,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION
0,EN-01,two cars in every garage...,S2,4,1990-11-01,1990,environment,japan / south america,33,medium,blinky the three-eyed fish appears near the nuclear plant. the japanese pm eats fukushima fish.,a real three-eyed fish was found. pm kishida ate fukushima seafood (2023).
1,EN-02,Itchy & Scratchy & Marge,S2,9,1990-12-20,1990,society,us / russia,26,medium,marge calls for the censorship of michelangelo's david. the statue is covered with jeans.,controversies over david in st. petersburg (2016) and a florida school (2023).
2,EN-03,bart the murderer,S3,4,1991-10-10,1991,celebrity,global,3,high,it is mentioned that melanie griffith and don johnson are going to separate.,melanie griffith and don johnson separated in 1994.
3,EN-04,brush with greatness,S2,18,1991-04-11,1991,celebrity,global,22,high,ringo starr is seen responding to countless fan letters.,paul mccartney replied to a letter sent by two fans 50 years prior (1963-2013).
4,EN-05,"oh brother, where art thou?",S2,15,1991-02-21,1991,automotive,global,30,medium,"homer designs ""the homer,"" a ridiculous and futuristic car priced at $82,000.",the launch of elon musk's cybertruck.
5,EN-06,"brother, can you spare two dimes?",S3,24,1992-08-27,1992,technology,global,20,medium,herb invents a device capable of decoding babies' babbling.,apps like cry translator and zoundream exist today.
6,EN-07,homer at the bat,S3,16,1992-02-20,1992,sports,us,1,high,"mr. burns benches don mattingly for his ""rebellious sideburns."".",the yankees manager benched don mattingly for not cutting his hair (1993).
7,EN-08,itchy & scratchy: the movie,S4,6,1992-11-03,1992,advertising,new zealand,11,medium,"a billboard is installed with scratchy being decapitated, and blood splashing outside the sign.",a billboard for *kill bill* showed the character's blood splashing outside the sign in new zealand.
8,EN-09,new kid on the block,S4,8,1992-11-12,1992,legal,us (massachusetts),21,high,homer sues an all-you-can-eat restaurant for deceptive advertising after being kicked out.,"a man from springfield, ma, sued golden corral for deceptive advertising after being expelled."
9,EN-10,marge vs. the monorail,S4,12,1993-01-14,1993,history,us (new york),8,low,a picture shows two towers with a destroyed plane (coincidence visual).,"september 11, 2001, attacks."


#### EXTENDING THE DATA WITH ANOTHER DATASET

In [7]:
df_epi = pd.read_csv('Files/simpsons_episodes.csv')
df_epi.head(3)

Unnamed: 0,id,image_url,imdb_rating,imdb_votes,number_in_season,number_in_series,original_air_date,original_air_year,production_code,season,title,us_viewers_in_millions,video_url,views
0,10,http://static-media.fxx.com/img/FX_Networks_-_FXX/305/815/Simpsons_01_10.jpg,7.4,1511.0,10,10,1990-03-25,1990,7G10,1,Homer's Night Out,30.3,http://www.simpsonsworld.com/video/275197507879,50816.0
1,12,http://static-media.fxx.com/img/FX_Networks_-_FXX/245/843/Simpsons_01_12.jpg,8.3,1716.0,12,12,1990-04-29,1990,7G12,1,Krusty Gets Busted,30.4,http://www.simpsonsworld.com/video/288019523914,62561.0
2,14,http://static-media.fxx.com/img/FX_Networks_-_FXX/662/811/bart_gets_F.jpg,8.2,1638.0,1,14,1990-10-11,1990,7F03,2,"Bart Gets an ""F""",33.6,http://www.simpsonsworld.com/video/260539459671,59575.0


In [8]:
df_epi.shape

(600, 14)

In [9]:
df_epi.columns

Index(['id', 'image_url', 'imdb_rating', 'imdb_votes', 'number_in_season',
       'number_in_series', 'original_air_date', 'original_air_year',
       'production_code', 'season', 'title', 'us_viewers_in_millions',
       'video_url', 'views'],
      dtype='object')

In [10]:
df_epi.isna().sum()

id                        0
image_url                 4
imdb_rating               3
imdb_votes                3
number_in_season          0
number_in_series          0
original_air_date         0
original_air_year         0
production_code           0
season                    0
title                     0
us_viewers_in_millions    6
video_url                 4
views                     4
dtype: int64
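Raw counts are easier to judge next to the share of rows they represent. A small sketch on a toy frame (the values are made up) that puts both side by side:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing rating; the values are illustrative only.
toy = pd.DataFrame({
    "imdb_rating": [7.4, np.nan, 8.2, 8.0],
    "title": ["a", "b", "c", "d"],
})

summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    # mean() of a boolean mask is the fraction of True values
    "percent": (toy.isna().mean() * 100).round(2),
})
print(summary)
```

Applied to `df_epi`, the same summary would show that every affected column is missing 1% or less of its 600 rows.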

#### RENAMING COLUMNS SO BOTH DATASETS FOLLOW THE SAME CONVENTION

In [11]:
print("\nColumns before renaming, and the naming example to follow:")
print(df_En.columns.tolist())
print(df_epi.columns.tolist())

# MAP OF NEW NAMES
nuevos_nombres = {
    'EPISODIO': 'TITLE',
    'S.': 'SEASON',
    'EP. NUM.': 'NUMBER_IN_SEASON'
}

# RENAME
df_En = df_En.rename(columns=nuevos_nombres)
print("\nColumns after renaming:")
print(df_En.columns.tolist())

# CHECK
df_En[['ID', 'TITLE', 'SEASON', 'NUMBER_IN_SEASON']].head(2)


Columns before renaming, and the naming example to follow:
['ID', 'EPISODIO', 'S.', 'EP. NUM.', 'AIR DATE', 'YEAR', 'PREDICTION CATEGORY', 'GEOGRAPHICAL LOCATION', 'DIFFERENCE (YEARS)', 'COINCIDENCE LEVEL', 'DESCRIPTION', 'PREDICTION']
['id', 'image_url', 'imdb_rating', 'imdb_votes', 'number_in_season', 'number_in_series', 'original_air_date', 'original_air_year', 'production_code', 'season', 'title', 'us_viewers_in_millions', 'video_url', 'views']

Columns after renaming:
['ID', 'TITLE', 'SEASON', 'NUMBER_IN_SEASON', 'AIR DATE', 'YEAR', 'PREDICTION CATEGORY', 'GEOGRAPHICAL LOCATION', 'DIFFERENCE (YEARS)', 'COINCIDENCE LEVEL', 'DESCRIPTION', 'PREDICTION']


Unnamed: 0,ID,TITLE,SEASON,NUMBER_IN_SEASON
0,EN-01,two cars in every garage...,S2,4
1,EN-02,Itchy & Scratchy & Marge,S2,9


#### NORMALIZING THE CSV (Files/simpsons_episodes.csv) TO COMPARE DATA AND COMPLETE OUR INFORMATION:

In [12]:
# MAP OF NEW NAMES
renombrar_episodes = {'season':'SEASON',
    'original_air_year': 'YEAR',
    'imdb_rating': 'IMDB_RATING',
    'imdb_votes': 'IMDB_VOTES',
    'production_code': 'PRODUCTION_CODE',
    'us_viewers_in_millions': 'US_VIEWERS_IN_MILLIONS',
    'video_url': 'VIDEO_URL',
    'views': 'VIEWS',
    'number_in_season':'NUMBER_IN_SEASON',
    'title': 'TITLE_EPISODES'
}

# RENAME
df_epi = df_epi.rename(columns=renombrar_episodes)
print("\nColumns after renaming:")
print(df_epi.columns.tolist())  # inspect the episodes frame that was just renamed

# CHECK
df_epi[['TITLE_EPISODES','YEAR','IMDB_RATING', 'IMDB_VOTES', 'PRODUCTION_CODE', 
    'US_VIEWERS_IN_MILLIONS', 'VIDEO_URL', 'VIEWS', 'NUMBER_IN_SEASON']].head(2)


Columns after renaming:
['id', 'image_url', 'IMDB_RATING', 'IMDB_VOTES', 'NUMBER_IN_SEASON', 'number_in_series', 'original_air_date', 'YEAR', 'PRODUCTION_CODE', 'SEASON', 'TITLE_EPISODES', 'US_VIEWERS_IN_MILLIONS', 'VIDEO_URL', 'VIEWS']


Unnamed: 0,TITLE_EPISODES,YEAR,IMDB_RATING,IMDB_VOTES,PRODUCTION_CODE,US_VIEWERS_IN_MILLIONS,VIDEO_URL,VIEWS,NUMBER_IN_SEASON
0,Homer's Night Out,1990,7.4,1511.0,7G10,30.3,http://www.simpsonsworld.com/video/275197507879,50816.0,10
1,Krusty Gets Busted,1990,8.3,1716.0,7G12,30.4,http://www.simpsonsworld.com/video/288019523914,62561.0,12


In [13]:
df_En[df_En.duplicated(keep=False)].head()

Unnamed: 0,ID,TITLE,SEASON,NUMBER_IN_SEASON,AIR DATE,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION


In [14]:
df_epi[df_epi.duplicated(keep=False)].head()

Unnamed: 0,id,image_url,IMDB_RATING,IMDB_VOTES,NUMBER_IN_SEASON,number_in_series,original_air_date,YEAR,PRODUCTION_CODE,SEASON,TITLE_EPISODES,US_VIEWERS_IN_MILLIONS,VIDEO_URL,VIEWS
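Both checks return empty frames, so neither table contains fully identical rows. In the predictions table, however, one episode can legitimately carry several predictions (for example "lisa's wedding" in EN-20 and EN-21), and a whole-row check cannot see that because the ID differs. A sketch on three rows taken from the dataset, using `subset` to look at the episode only:

```python
import pandas as pd

# Three rows from the predictions table, reduced to a few columns.
toy = pd.DataFrame({
    "ID": ["EN-20", "EN-21", "EN-22"],
    "TITLE": ["lisa's wedding", "lisa's wedding", "lemon of troy"],
    "YEAR": [1995, 1995, 1995],
})

# Whole-row duplicates: none, because every ID is unique.
full_dupes = toy[toy.duplicated(keep=False)]

# Rows that share the same episode (several predictions per episode).
same_episode = toy[toy.duplicated(subset=["TITLE", "YEAR"], keep=False)]
print(same_episode)
```

This matters for the later join: each of those rows will pick up the same episode metadata, which is expected rather than a sign of duplicated data.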


In [15]:
df_epi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      600 non-null    int64  
 1   image_url               596 non-null    object 
 2   IMDB_RATING             597 non-null    float64
 3   IMDB_VOTES              597 non-null    float64
 4   NUMBER_IN_SEASON        600 non-null    int64  
 5   number_in_series        600 non-null    int64  
 6   original_air_date       600 non-null    object 
 7   YEAR                    600 non-null    int64  
 8   PRODUCTION_CODE         600 non-null    object 
 9   SEASON                  600 non-null    int64  
 10  TITLE_EPISODES          600 non-null    object 
 11  US_VIEWERS_IN_MILLIONS  594 non-null    float64
 12  VIDEO_URL               596 non-null    object 
 13  VIEWS                   596 non-null    float64
dtypes: float64(4), int64(5), object(5)
memory 

In [16]:
df_En.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   ID                     59 non-null     object
 1   TITLE                  59 non-null     object
 2   SEASON                 59 non-null     object
 3   NUMBER_IN_SEASON       59 non-null     object
 4   AIR DATE               59 non-null     object
 5   YEAR                   59 non-null     int64 
 6   PREDICTION CATEGORY    59 non-null     object
 7   GEOGRAPHICAL LOCATION  59 non-null     object
 8   DIFFERENCE (YEARS)     59 non-null     int64 
 9   COINCIDENCE LEVEL      59 non-null     object
 10  DESCRIPTION            59 non-null     object
 11  PREDICTION             59 non-null     object
dtypes: int64(2), object(10)
memory usage: 5.7+ KB


>*NORMALIZATION NOTES*
1.  CONVERT NUMBER_IN_SEASON TO FLOAT64
2.  CONVERT SEASON TO INT64

In [17]:
df_En['NUMBER_IN_SEASON'].unique()

array([4, 9, 18, 15, 24, 16, 6, 8, 12, 10, 21, 20, 17, 14, 19, 25,
       '21 / 19', 7, 1, 3, 2, 5, 22, 'n/a'], dtype=object)

Because we want to preserve the more specific SEASON information (MOVIE or SHORT), we keep that column as object and create a new numeric column alongside it.

In [18]:
df_En['SEASON'].unique()

array(['S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11',
       'S13', 'S15', 'S16', 'S17', 'MOVIE', 'S18', 'S20', 'S22', 'S25',
       'S23', 'S28', 'S29', 'SHORT', 'S35'], dtype=object)

In [19]:
# CLEAN NUMBER_IN_SEASON ('X / Y'; 'n/a')

# For 'X / Y' values keep only the first number
df_En['NUMBER_IN_SEASON'] = (
    df_En['NUMBER_IN_SEASON']
    .astype(str)
    .str.split('/', n=1)
    .str[0]
    .str.strip())

df_En['NUMBER_IN_SEASON'] = pd.to_numeric(
    df_En['NUMBER_IN_SEASON'], 
    errors='coerce') # Converts any remaining non-numeric value to NaN


# NORMALIZATION
df_En['NUMBER_IN_SEASON'] = df_En['NUMBER_IN_SEASON'].astype(np.float64)

# CHECK
print("Data type after conversion")
print(df_En['NUMBER_IN_SEASON'].dtypes)

Data type after conversion
float64
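To make the effect of this cleaning concrete, here is the same chain applied to the three kinds of values found in the column. Note that for `'21 / 19'` only the first number survives, and `'n/a'` becomes `NaN`, which is why the column ends up as float rather than integer.

```python
import pandas as pd

# The three kinds of raw values present in NUMBER_IN_SEASON.
raw = pd.Series([4, "21 / 19", "n/a"], dtype=object)

cleaned = pd.to_numeric(
    raw.astype(str)
       .str.split("/", n=1)   # '21 / 19' -> ['21 ', ' 19'], 'n/a' -> ['n', 'a']
       .str[0]                # keep the part before the slash
       .str.strip(),
    errors="coerce",          # 'n' is not numeric, so it becomes NaN
)
print(cleaned)
```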


In [20]:
# CLEAN SEASON 

# KEEP 'SEASON' AS OBJECT, CREATE 'SEASON_NUMERICA'
df_En['SEASON_NUMERICA'] = (
    df_En['SEASON']
    .astype(str)
    .str.extract(r'(\d+)', expand=False) # Extracts only the digits (2 from 'S2', 23 from 'S23')
    .astype(float)
).astype(pd.Int64Dtype()) # Convert to Int64 for the join

# ASSIGN -1 TO 'MOVIE'
df_En.loc[df_En['SEASON'].str.contains('MOVIE', na=False), 'SEASON_NUMERICA'] = -1

# ASSIGN -2 TO 'SHORT'
df_En.loc[df_En['SEASON'].str.contains('SHORT', na=False), 'SEASON_NUMERICA'] = -2

# NORMALIZATION (allowing integers and missing values)
df_En['SEASON_NUMERICA'] = df_En['SEASON_NUMERICA'].astype(pd.Int64Dtype())

In [21]:
# CHECK
print("Data type after conversion")
print(df_En['SEASON_NUMERICA'].dtypes)

Data type after conversion
Int64
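
The season conversion can be traced on a few labels taken from the `SEASON` values listed earlier. This is only a sketch: the names `season` and `season_num` are illustrative, and it uses `expand=False` as a small variation so that `str.extract` returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

season = pd.Series(['S2', 'S23', 'MOVIE', 'SHORT'])

season_num = (
    season.str.extract(r'(\d+)', expand=False)
          .astype(float)      # labels without digits are NaN at this point
          .astype('Int64')    # nullable integer: holds ints and <NA>
)

# Sentinel codes for the labels that carry no season number
season_num[season.eq('MOVIE')] = -1
season_num[season.eq('SHORT')] = -2

print(season_num.tolist())
```

The negative sentinels keep the column free of missing values, at the cost that -1 and -2 must be excluded before any arithmetic on season numbers.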


In [22]:
df_epi.columns

Index(['id', 'image_url', 'IMDB_RATING', 'IMDB_VOTES', 'NUMBER_IN_SEASON',
       'number_in_series', 'original_air_date', 'YEAR', 'PRODUCTION_CODE',
       'SEASON', 'TITLE_EPISODES', 'US_VIEWERS_IN_MILLIONS', 'VIDEO_URL',
       'VIEWS'],
      dtype='object')

In [23]:

# COLUMNS WE WANT FROM THE EPISODES DATASET FOR THE MERGE
columnas_episodios = [
    'SEASON',         
    'YEAR',           
    'NUMBER_IN_SEASON', 
    'TITLE_EPISODES', 
    'IMDB_RATING', 
    'IMDB_VOTES', 
    'PRODUCTION_CODE', 
    'US_VIEWERS_IN_MILLIONS', 
    'VIDEO_URL', 
    'VIEWS'
]
# WORKING COPY
df_episodes_seleccion = df_epi[columnas_episodios].copy()
# LEFT MERGE (KEEP EVERY PREDICTION ROW AND COMPLEMENT IT WITH EPISODE DATA)

join_keys_left = ['YEAR', 'NUMBER_IN_SEASON', 'SEASON_NUMERICA'] # Prediction keys
join_keys_right = ['YEAR', 'NUMBER_IN_SEASON', 'SEASON']        # Episode keys

df_final = pd.merge(
    df_En,
    df_episodes_seleccion,
    left_on=join_keys_left,
    right_on=join_keys_right,
    how='left',
    suffixes=('_PREDICTION', '_EPISODE')
)
print("\nThe merge was completed successfully")

# FINAL CLEANUP AND RENAMING
# Rename the columns that were duplicated or kept by the merge.

df_final = df_final.rename(columns={ 
    # Shared keys keep the left-hand (predictions) name; these two entries only apply if a suffix was added
    'YEAR_PREDICTION': 'YEAR',
    'NUMBER_IN_SEASON_PREDICTION': 'NUMBER_IN_SEASON',
    # Raw season label from the predictions (S2, MOVIE)
    'SEASON_PREDICTION': 'SEASON_RAW_PREDICTION', 
    # Numeric season key from the episodes (1, 2, 3, ...)
    'SEASON_EPISODE': 'SEASON_EPISODE_NUMERIC',
    'AIR DATE': 'AIR_DATE_PREDICTION'
})

# DROP UNNEEDED COLUMNS (errors='ignore' skips any that are not present)
df_final = df_final.drop(columns=['SEASON_RAW_PREDICTION_PREDICTION', 'NUMBER_IN_SEASON_RAW'], errors='ignore')

df_final.head()


The merge was completed successfully


Unnamed: 0,ID,TITLE,SEASON_RAW_PREDICTION,NUMBER_IN_SEASON,AIR_DATE_PREDICTION,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION,SEASON_NUMERICA,SEASON_EPISODE_NUMERIC,TITLE_EPISODES,IMDB_RATING,IMDB_VOTES,PRODUCTION_CODE,US_VIEWERS_IN_MILLIONS,VIDEO_URL,VIEWS
0,EN-01,two cars in every garage...,S2,4.0,1990-11-01,1990,environment,japan / south america,33,medium,blinky the three-eyed fish appears near the nuclear plant. the japanese pm eats fukushima fish.,a real three-eyed fish was found. pm kishida ate fukushima seafood (2023).,2,2.0,Two Cars in Every Garage and Three Eyes on Every Fish,8.1,1457.0,7F01,26.1,http://www.simpsonsworld.com/video/260537411822,64959.0
1,EN-02,Itchy & Scratchy & Marge,S2,9.0,1990-12-20,1990,society,us / russia,26,medium,marge calls for the censorship of michelangelo's david. the statue is covered with jeans.,controversies over david in st. petersburg (2016) and a florida school (2023).,2,2.0,Itchy & Scratchy & Marge,8.1,1402.0,7F09,22.2,http://www.simpsonsworld.com/video/260538435885,55413.0
2,EN-03,bart the murderer,S3,4.0,1991-10-10,1991,celebrity,global,3,high,it is mentioned that melanie griffith and don johnson are going to separate.,melanie griffith and don johnson separated in 1994.,3,3.0,Bart the Murderer,8.7,1446.0,8F03,20.8,http://www.simpsonsworld.com/video/288015427922,64342.0
3,EN-04,brush with greatness,S2,18.0,1991-04-11,1991,celebrity,global,22,high,ringo starr is seen responding to countless fan letters.,paul mccartney replied to a letter sent by two fans 50 years prior (1963-2013).,2,2.0,Brush with Greatness,8.0,1257.0,7F18,20.6,http://www.simpsonsworld.com/video/272055875843,58561.0
4,EN-05,"oh brother, where art thou?",S2,15.0,1991-02-21,1991,automotive,global,30,medium,"homer designs ""the homer,"" a ridiculous and futuristic car priced at $82,000.",the launch of elon musk's cybertruck.,2,2.0,"Oh Brother, Where Art Thou?",8.2,1413.0,7F16,26.8,http://www.simpsonsworld.com/video/272046659561,47426.0


In [24]:
df_final.shape

(59, 21)
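
The shape confirms that the left merge kept all 59 prediction rows, but it does not show how many of them found a matching episode, or whether an episode key could match more than once. A minimal sketch of both checks follows, using toy frames (`left`, `right` and their values are hypothetical stand-ins, not the notebook's data). `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the predictions and episodes tables (hypothetical data)
left = pd.DataFrame({
    'ID': ['EN-01', 'EN-45', 'EN-46'],
    'YEAR': [1990, 2007, 2008],
    'NUMBER_IN_SEASON': [4.0, float('nan'), 3.0],
    'SEASON_NUMERICA': [2, -1, 18],
})
right = pd.DataFrame({
    'YEAR': [1990, 1990],
    'NUMBER_IN_SEASON': [4.0, 9.0],
    'SEASON': [2, 2],
    'IMDB_RATING': [8.1, 8.1],
})

merged = pd.merge(
    left, right,
    left_on=['YEAR', 'NUMBER_IN_SEASON', 'SEASON_NUMERICA'],
    right_on=['YEAR', 'NUMBER_IN_SEASON', 'SEASON'],
    how='left',
    validate='many_to_one',  # raises MergeError if an episode key is duplicated
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Prediction rows that found no episode on the right-hand side
unmatched = merged.loc[merged['_merge'] == 'left_only', 'ID'].tolist()
print(len(merged), unmatched)
```

Rows reported as `left_only` are the ones that will show nulls in every episode column and need a manual lookup.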

#### NULL REVIEW AFTER THE MERGE

The data on The Simpsons episodes and films was verified through manual checking and lookup of real information, rather than automated imputation methods (such as means or statistical estimates).

* Identifying nulls: 
When a field was empty, it was flagged as a missing value to be filled in.
* Extracting real data: The real, verifiable value was looked up (for example, the exact Nielsen figure of 9.72 million viewers or the production code "HABF20").
* Rigorous inclusion: These real values were added to the dataset to keep it as accurate as possible for the research.
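
The three steps above can be sketched as a lookup table of manually verified values that is written into the matching rows. The helper `fill_by_id` and the dictionary `manual_values` are hypothetical names for illustration. The sketch keys on the `ID` column rather than on the positional row index, on the assumption that IDs are unique.

```python
import numpy as np
import pandas as pd

def fill_by_id(df, manual_values, id_col='ID'):
    """Write manually verified values into the rows whose ID matches (hypothetical helper)."""
    for row_id, new_values in manual_values.items():
        mask = df[id_col] == row_id
        if not mask.any():
            raise KeyError(f"{row_id} not found in column {id_col!r}")
        for column, value in new_values.items():
            df.loc[mask, column] = value
    return df

# Toy frame with the same kind of gaps left by the merge
df = pd.DataFrame({
    'ID': ['EN-45', 'EN-46'],
    'PRODUCTION_CODE': pd.Series([None, None], dtype=object),
    'US_VIEWERS_IN_MILLIONS': [np.nan, np.nan],
})

# Values quoted in the text above for EN-46
manual_values = {
    'EN-46': {'PRODUCTION_CODE': 'HABF20', 'US_VIEWERS_IN_MILLIONS': 9.72},
}
fill_by_id(df, manual_values)
print(df)
```

Keying on the ID means the fill still targets the right row if the DataFrame is later sorted or filtered, and an unknown ID fails loudly instead of creating a new row.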

In [25]:
round(df_final.isna().sum()/df_final.shape[0]*100,2)

ID                         0.00
TITLE                      0.00
SEASON_RAW_PREDICTION      0.00
NUMBER_IN_SEASON           3.39
AIR_DATE_PREDICTION        0.00
YEAR                       0.00
PREDICTION CATEGORY        0.00
GEOGRAPHICAL LOCATION      0.00
DIFFERENCE (YEARS)         0.00
COINCIDENCE LEVEL          0.00
DESCRIPTION                0.00
PREDICTION                 0.00
SEASON_NUMERICA            0.00
SEASON_EPISODE_NUMERIC    10.17
TITLE_EPISODES            10.17
IMDB_RATING               11.86
IMDB_VOTES                11.86
PRODUCTION_CODE           10.17
US_VIEWERS_IN_MILLIONS    13.56
VIDEO_URL                 11.86
VIEWS                     11.86
dtype: float64

In [26]:
# IDENTIFY NULLS
nulos_por_celda = df_final.isnull()

# FILTER BY ROW
filas_con_nulos = nulos_por_celda.any(axis=1) # axis=1 checks across the columns of each row
df_con_nulos = df_final[filas_con_nulos]

# CHECK
print('Rows containing at least one null value (NaN):')
df_con_nulos

Rows containing at least one null value (NaN):


Unnamed: 0,ID,TITLE,SEASON_RAW_PREDICTION,NUMBER_IN_SEASON,AIR_DATE_PREDICTION,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION,SEASON_NUMERICA,SEASON_EPISODE_NUMERIC,TITLE_EPISODES,IMDB_RATING,IMDB_VOTES,PRODUCTION_CODE,US_VIEWERS_IN_MILLIONS,VIDEO_URL,VIEWS
25,EN-26,bart after dark,S8,7.0,1996-11-24,1996,technology,global,5,high,burns' communicator is a square with a smaller black square and a circle.,the ipod is invented.,8,8.0,Lisa's Date with Density,7.8,1005.0,4F01,,http://www.simpsonsworld.com/video/306394691862,60912.0
44,EN-45,the simpsons movie,MOVIE,,2007-07-27,2007,politics,us,6,medium,a man catching a signal appears to work for the nsa. springfield is enclosed in a dome. / president schwarzenegger.,nsa spying was confirmed in 2013. the series *under the dome* came out. / politician herman cain.,-1,,,,,,,,
45,EN-46,"please homer, don't hammer 'em",S18,3.0,2008-09-21,2008,economy,us,13,high,"marge mentions that a jcpenney ""used to be here"" in a run-down mall.",the jcpenney department store chain filed for bankruptcy in 2020.,18,,,,,,,,
54,EN-55,friends and family,S28,2.0,2016-10-02,2016,technology,global,8,high,"all of springfield uses virtual reality glasses, bumping and stumbling in public.","Apple launched its Vision Pro glasses (2024), with viral videos of people using them in public.",28,28.0,"Friends and Family""[203]",,,VABF18,,,
55,EN-56,the great phatsby,S28,12.0,2017-01-15,2017,celebrity,us,7,high,homer and burns crash an all-white themed party hosted by a hip-hop musician similar to diddy.,"diddy's ""white party"" scandal (Sean Combs).",28,,,,,,,,
56,EN-57,the serfsons,S29,1.0,2017-04-30,2017,television,global,2,high,"in a *game of thrones* parody, a dragon sets the village on fire.",Daenerys Targaryen's villainous turn (2019).,29,,,,,,,,
57,EN-58,west wing story,SHORT,,2018-04-08,2018,politics,us,0,high,president joe biden and vice president kamala harris are shown as dance partners.,the scene was created before Biden and Harris were running mates.,-2,,,,,,,,
58,EN-59,it's a blunderful life,S35,7.0,2019-02-10,2019,infrastructure,chile / spain,2,medium,there is a total power outage in springfield.,there are massive power outages in Chile and Spain in 2025.,35,,,,,,,,
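
In the table above, row 25 (EN-26) pairs the prediction title "bart after dark" with the episode title "Lisa's Date with Density", so a match on the three join keys does not guarantee that both sides describe the same episode. A minimal sketch for flagging such rows follows; the toy frame reuses titles from the tables above, and `normalize` is a hypothetical helper.

```python
import pandas as pd

# Toy rows mirroring the merged columns (titles copied from the tables above)
df = pd.DataFrame({
    'ID': ['EN-02', 'EN-26', 'EN-45'],
    'TITLE': ['Itchy & Scratchy & Marge', 'bart after dark', 'the simpsons movie'],
    'TITLE_EPISODES': ['Itchy & Scratchy & Marge', "Lisa's Date with Density", None],
})

def normalize(title):
    # Lowercase and trim so that case differences alone are not flagged
    return title.str.lower().str.strip()

both_present = df['TITLE_EPISODES'].notna()
differs = normalize(df['TITLE']) != normalize(df['TITLE_EPISODES'])
mismatched = df.loc[both_present & differs, 'ID'].tolist()
print(mismatched)
```

A comparison like this also flags titles that differ only by truncation or punctuation, so its output is a list of rows to inspect by hand rather than a list of confirmed errors.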


In [27]:
# HELPER TO FILL IN VALUES IN THE DATAFRAME
def rellenar_datos(df, index, datos_nuevos):
    # 'index' is the row label of df (not the prediction ID); datos_nuevos maps column -> value
    for columna, valor in datos_nuevos.items():
        df.loc[index, columna] = valor

    print(f"Row {index} updated.")

In [28]:
datos_EN_26 = {'US_VIEWERS_IN_MILLIONS': 7.2}

rellenar_datos(df_final, 25, datos_EN_26)

Row 25 updated.


In [29]:
datos_EN_45 = {
    'SEASON_EPISODE_NUMERIC': -1,
    'TITLE_EPISODES': 'The Simpsons movie',
    'PRODUCTION_CODE': 'K38',
    'US_VIEWERS_IN_MILLIONS': np.nan, 
    'IMDB_RATING': 7.3,
    'IMDB_VOTES': 260000.0,
    'VIEWS': np.nan}

rellenar_datos(df_final, 44, datos_EN_45)

Row 44 updated.


In [30]:
datos_EN_46 = {
    'SEASON_EPISODE_NUMERIC': 18.0,
    'TITLE_EPISODES': "Please homer, don't hammer'em",
    'PRODUCTION_CODE': 'HABF20',
    'US_VIEWERS_IN_MILLIONS': 9.72, 
    'IMDB_RATING': 7.3,
    'IMDB_VOTES': 1800.0,
    'VIEWS': 9.72}

rellenar_datos(df_final, 45, datos_EN_46)

Row 45 updated.


In [31]:
# NOTE: the merged row already carried PRODUCTION_CODE 'VABF18'; this overwrites it
# with 'WABF04', the same code used for EN-56 below, so it is worth re-checking.
datos_EN_55 = {
    'TITLE_EPISODES': 'Friends and Family',
    'PRODUCTION_CODE': 'WABF04',
    'US_VIEWERS_IN_MILLIONS': 6.00, 
    'IMDB_RATING': 6.7,
    'IMDB_VOTES': 1100.0,
    'VIEWS': 6.00}

rellenar_datos(df_final, 54, datos_EN_55)

Row 54 updated.


In [32]:
datos_EN_56 = {
    'SEASON_EPISODE_NUMERIC': 28.0,
    'TITLE_EPISODES': 'The Great Phatsby',
    'PRODUCTION_CODE': 'WABF04',
    'US_VIEWERS_IN_MILLIONS': 3.10, 
    'IMDB_RATING': 6.9,
    'IMDB_VOTES': 1100.0,
    'VIEWS': 3.10}

rellenar_datos(df_final, 55, datos_EN_56)

Row 55 updated.


In [33]:
datos_EN_57 = {
    'SEASON_EPISODE_NUMERIC': 29.0,
    'TITLE_EPISODES': 'the serfsons',
    'PRODUCTION_CODE': 'WABF17',
    'US_VIEWERS_IN_MILLIONS': 3.25, 
    'IMDB_RATING': 7.1,
    'IMDB_VOTES': 1700.0,
    'VIEWS': 3.25}

rellenar_datos(df_final, 56, datos_EN_57)

Row 56 updated.


In [34]:
datos_EN_59 = {
    'SEASON_EPISODE_NUMERIC': 35.0,
    'TITLE_EPISODES': "It's a Blunderful Life",
    'PRODUCTION_CODE': 'OABF19',
    'US_VIEWERS_IN_MILLIONS': 1.11, 
    'IMDB_RATING': 6.4,
    'IMDB_VOTES': 700.0,
    'VIEWS': 1.11}

rellenar_datos(df_final, 58, datos_EN_59)

Row 58 updated.


>*ANALYSIS OF NULLS IN "West Wing Story":*

We cannot fill in these fields (US_VIEWERS_IN_MILLIONS, VIEWS, IMDB_RATING, IMDB_VOTES, PRODUCTION_CODE) because the short "West Wing Story" was made for a different distribution medium than a standard television episode or a theatrical film.

*Format and distribution:* 
The short was released as a viral clip on social media, not as a traditional television broadcast. As a result, metrics such as US_VIEWERS_IN_MILLIONS and the series' internal PRODUCTION_CODE values do not exist or do not apply.

*Alternative metrics:* 
Web metrics are used instead (millions of combined views on YouTube/Facebook), which are approximate rather than official figures.

*Limited participation:* 
IMDB_VOTES is very low (hundreds of votes), so the IMDb rating is not a reliable measure of general audience opinion.






In [35]:
round(df_final.isna().sum()/df_final.shape[0]*100,2)

ID                         0.00
TITLE                      0.00
SEASON_RAW_PREDICTION      0.00
NUMBER_IN_SEASON           3.39
AIR_DATE_PREDICTION        0.00
YEAR                       0.00
PREDICTION CATEGORY        0.00
GEOGRAPHICAL LOCATION      0.00
DIFFERENCE (YEARS)         0.00
COINCIDENCE LEVEL          0.00
DESCRIPTION                0.00
PREDICTION                 0.00
SEASON_NUMERICA            0.00
SEASON_EPISODE_NUMERIC     1.69
TITLE_EPISODES             1.69
IMDB_RATING                1.69
IMDB_VOTES                 1.69
PRODUCTION_CODE            1.69
US_VIEWERS_IN_MILLIONS     3.39
VIDEO_URL                 11.86
VIEWS                      3.39
dtype: float64

In [36]:
# IDENTIFY NULLS
nulos_por_celda = df_final.isnull()

# FILTER BY ROW
filas_con_nulos = nulos_por_celda.any(axis=1) # axis=1 checks across the columns of each row
df_con_nulos = df_final[filas_con_nulos]

# CHECK
print('Rows containing at least one null value (NaN):')
df_con_nulos

Rows containing at least one null value (NaN):


Unnamed: 0,ID,TITLE,SEASON_RAW_PREDICTION,NUMBER_IN_SEASON,AIR_DATE_PREDICTION,YEAR,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,DIFFERENCE (YEARS),COINCIDENCE LEVEL,DESCRIPTION,PREDICTION,SEASON_NUMERICA,SEASON_EPISODE_NUMERIC,TITLE_EPISODES,IMDB_RATING,IMDB_VOTES,PRODUCTION_CODE,US_VIEWERS_IN_MILLIONS,VIDEO_URL,VIEWS
44,EN-45,the simpsons movie,MOVIE,,2007-07-27,2007,politics,us,6,medium,a man catching a signal appears to work for the nsa. springfield is enclosed in a dome. / president schwarzenegger.,nsa spying was confirmed in 2013. the series *under the dome* came out. / politician herman cain.,-1,-1.0,The Simpsons movie,7.3,260000.0,K38,,,
45,EN-46,"please homer, don't hammer 'em",S18,3.0,2008-09-21,2008,economy,us,13,high,"marge mentions that a jcpenney ""used to be here"" in a run-down mall.",the jcpenney department store chain filed for bankruptcy in 2020.,18,18.0,"Please homer, don't hammer'em",7.3,1800.0,HABF20,9.72,,9.72
54,EN-55,friends and family,S28,2.0,2016-10-02,2016,technology,global,8,high,"all of springfield uses virtual reality glasses, bumping and stumbling in public.","Apple launched its Vision Pro glasses (2024), with viral videos of people using them in public.",28,28.0,Friends and Family,6.7,1100.0,WABF04,6.0,,6.0
55,EN-56,the great phatsby,S28,12.0,2017-01-15,2017,celebrity,us,7,high,homer and burns crash an all-white themed party hosted by a hip-hop musician similar to diddy.,"diddy's ""white party"" scandal (Sean Combs).",28,28.0,The Great Phatsby,6.9,1100.0,WABF04,3.1,,3.1
56,EN-57,the serfsons,S29,1.0,2017-04-30,2017,television,global,2,high,"in a *game of thrones* parody, a dragon sets the village on fire.",Daenerys Targaryen's villainous turn (2019).,29,29.0,the serfsons,7.1,1700.0,WABF17,3.25,,3.25
57,EN-58,west wing story,SHORT,,2018-04-08,2018,politics,us,0,high,president joe biden and vice president kamala harris are shown as dance partners.,the scene was created before Biden and Harris were running mates.,-2,,,,,,,,
58,EN-59,it's a blunderful life,S35,7.0,2019-02-10,2019,infrastructure,chile / spain,2,medium,there is a total power outage in springfield.,there are massive power outages in Chile and Spain in 2025.,35,35.0,It's a Blunderful Life,6.4,700.0,OABF19,1.11,,1.11
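
With wide rows like the ones above it is hard to see which fields are still empty. The following sketch lists the null columns per prediction ID. The toy frame copies a few values from the table above and keeps only four columns; `missing_by_id` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Reduced copy of three rows from the table above
df = pd.DataFrame({
    'ID': ['EN-45', 'EN-46', 'EN-58'],
    'IMDB_RATING': [7.3, 7.3, np.nan],
    'US_VIEWERS_IN_MILLIONS': [np.nan, 9.72, np.nan],
    'VIDEO_URL': [None, None, None],
})

missing = df.set_index('ID').isna()

# For each row, keep the names of the columns that are still null
missing_by_id = {
    row_id: [col for col, is_null in flags.items() if is_null]
    for row_id, flags in missing.iterrows()
}
print(missing_by_id)
```

A summary like this makes it clear that, after the manual fills, most remaining gaps are concentrated in VIDEO_URL.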


#### COLUMN ORDER AND CSV CREATION

NEXT WE DEFINE A NEW, MORE COHERENT COLUMN ORDER AND DROP TITLE IN FAVOR OF TITLE_EPISODES

In [37]:
# COPY 'TITLE' INTO 'TITLE_EPISODES' WHERE THE LATTER IS NULL (THESE ARE NOT TRUE GAPS)
if 'TITLE' in df_final.columns:
    df_final['TITLE_EPISODES'] = df_final['TITLE_EPISODES'].fillna(df_final['TITLE'])
# NEW ORDER

nuevo_orden = [
    # OFFICIAL EPISODE DATA
    'ID',                    # Unique prediction ID
    'TITLE_EPISODES',        # Official title (the one we keep)
    'SEASON_EPISODE_NUMERIC',# Join key (episodes side): official season
    'NUMBER_IN_SEASON',      # Join key: episode number within the season
    'SEASON_NUMERICA',       # Join key (predictions side): numeric season (includes -1, -2)
    'SEASON_RAW_PREDICTION', # Raw season label (e.g., S2, MOVIE)
    'AIR_DATE_PREDICTION',   # Air date recorded with the prediction
    'YEAR',                  # Year
    'IMDB_RATING',           # Rating
    'IMDB_VOTES',            # Votes
    'US_VIEWERS_IN_MILLIONS',# TV audience
    'VIEWS',                 # Digital audience
    'PRODUCTION_CODE',       # Production code
   
    # PREDICTION DATA AND OUTCOMES
    'PREDICTION CATEGORY',   # Category (politics, technology, etc.)
    'GEOGRAPHICAL LOCATION', # Location
    'COINCIDENCE LEVEL',     # Coincidence level
    'DIFFERENCE (YEARS)',    # Difference in years
    'DESCRIPTION',           # Scene in the episode
    'PREDICTION',            # Real-world event
    'VIDEO_URL'              # Video URL
]

# APPLY (columns not listed here, such as TITLE, are dropped)
df_final = df_final[nuevo_orden].copy()

print("Columns reordered successfully:")
print(list(df_final.columns))

# CHECK NULLS

nulos_titulo = df_final['TITLE_EPISODES'].isnull().sum()
print(f"Remaining nulls in TITLE_EPISODES: {nulos_titulo}")

Columns reordered successfully:
['ID', 'TITLE_EPISODES', 'SEASON_EPISODE_NUMERIC', 'NUMBER_IN_SEASON', 'SEASON_NUMERICA', 'SEASON_RAW_PREDICTION', 'AIR_DATE_PREDICTION', 'YEAR', 'IMDB_RATING', 'IMDB_VOTES', 'US_VIEWERS_IN_MILLIONS', 'VIEWS', 'PRODUCTION_CODE', 'PREDICTION CATEGORY', 'GEOGRAPHICAL LOCATION', 'COINCIDENCE LEVEL', 'DIFFERENCE (YEARS)', 'DESCRIPTION', 'PREDICTION', 'VIDEO_URL']
Remaining nulls in TITLE_EPISODES: 0


In [42]:
round(df_final.isna().sum()/df_final.shape[0]*100,2)

ID                         0.00
TITLE_EPISODES             0.00
SEASON_EPISODE_NUMERIC     1.69
NUMBER_IN_SEASON           3.39
SEASON_NUMERICA            0.00
SEASON_RAW_PREDICTION      0.00
AIR_DATE_PREDICTION        0.00
YEAR                       0.00
IMDB_RATING                1.69
IMDB_VOTES                 1.69
US_VIEWERS_IN_MILLIONS     3.39
VIEWS                      3.39
PRODUCTION_CODE            1.69
PREDICTION CATEGORY        0.00
GEOGRAPHICAL LOCATION      0.00
COINCIDENCE LEVEL          0.00
DIFFERENCE (YEARS)         0.00
DESCRIPTION                0.00
PREDICTION                 0.00
VIDEO_URL                 11.86
dtype: float64

In [38]:
df_final

Unnamed: 0,ID,TITLE_EPISODES,SEASON_EPISODE_NUMERIC,NUMBER_IN_SEASON,SEASON_NUMERICA,SEASON_RAW_PREDICTION,AIR_DATE_PREDICTION,YEAR,IMDB_RATING,IMDB_VOTES,US_VIEWERS_IN_MILLIONS,VIEWS,PRODUCTION_CODE,PREDICTION CATEGORY,GEOGRAPHICAL LOCATION,COINCIDENCE LEVEL,DIFFERENCE (YEARS),DESCRIPTION,PREDICTION,VIDEO_URL
0,EN-01,Two Cars in Every Garage and Three Eyes on Every Fish,2.0,4.0,2,S2,1990-11-01,1990,8.1,1457.0,26.1,64959.0,7F01,environment,japan / south america,medium,33,blinky the three-eyed fish appears near the nuclear plant. the japanese pm eats fukushima fish.,a real three-eyed fish was found. pm kishida ate fukushima seafood (2023).,http://www.simpsonsworld.com/video/260537411822
1,EN-02,Itchy & Scratchy & Marge,2.0,9.0,2,S2,1990-12-20,1990,8.1,1402.0,22.2,55413.0,7F09,society,us / russia,medium,26,marge calls for the censorship of michelangelo's david. the statue is covered with jeans.,controversies over david in st. petersburg (2016) and a florida school (2023).,http://www.simpsonsworld.com/video/260538435885
2,EN-03,Bart the Murderer,3.0,4.0,3,S3,1991-10-10,1991,8.7,1446.0,20.8,64342.0,8F03,celebrity,global,high,3,it is mentioned that melanie griffith and don johnson are going to separate.,melanie griffith and don johnson separated in 1994.,http://www.simpsonsworld.com/video/288015427922
3,EN-04,Brush with Greatness,2.0,18.0,2,S2,1991-04-11,1991,8.0,1257.0,20.6,58561.0,7F18,celebrity,global,high,22,ringo starr is seen responding to countless fan letters.,paul mccartney replied to a letter sent by two fans 50 years prior (1963-2013).,http://www.simpsonsworld.com/video/272055875843
4,EN-05,"Oh Brother, Where Art Thou?",2.0,15.0,2,S2,1991-02-21,1991,8.2,1413.0,26.8,47426.0,7F16,automotive,global,medium,30,"homer designs ""the homer,"" a ridiculous and futuristic car priced at $82,000.",the launch of elon musk's cybertruck.,http://www.simpsonsworld.com/video/272046659561
5,EN-06,"Brother, Can You Spare Two Dimes?",3.0,24.0,3,S3,1992-08-27,1992,8.2,1227.0,17.2,50936.0,8F23,technology,global,medium,20,herb invents a device capable of decoding babies' babbling.,apps like cry translator and zoundream exist today.,http://www.simpsonsworld.com/video/279098435798
6,EN-07,Bart the Lover,3.0,16.0,3,S3,1992-02-20,1992,8.3,1272.0,20.5,53123.0,8F16,sports,us,high,1,"mr. burns benches don mattingly for his ""rebellious sideburns."".",the yankees manager benched don mattingly for not cutting his hair (1993).,http://www.simpsonsworld.com/video/272228931984
7,EN-08,Itchy & Scratchy: The Movie,4.0,6.0,4,S4,1992-11-03,1992,8.2,1293.0,20.1,55740.0,9F03,advertising,new zealand,medium,11,"a billboard is installed with scratchy being decapitated, and blood splashing outside the sign.",a billboard for *kill bill* showed the character's blood splashing outside the sign in new zealand.,http://www.simpsonsworld.com/video/278803011830
8,EN-09,New Kid on the Block,4.0,8.0,4,S4,1992-11-12,1992,8.2,1240.0,23.1,54557.0,9F06,legal,us (massachusetts),high,21,homer sues an all-you-can-eat restaurant for deceptive advertising after being kicked out.,"a man from springfield, ma, sued golden corral for deceptive advertising after being expelled.",http://www.simpsonsworld.com/video/273542211871
9,EN-10,Marge vs. the Monorail,4.0,12.0,4,S4,1993-01-14,1993,9.0,2028.0,23.0,88171.0,9F10,history,us (new york),low,8,a picture shows two towers with a destroyed plane (coincidence visual).,"september 11, 2001, attacks.",http://www.simpsonsworld.com/video/306386499796


In [39]:
df_final.shape

(59, 20)
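
The column count drops from 21 to 20 because selecting with an explicit list silently leaves out any column that is not listed (here, TITLE). A minimal sketch of a check that makes this explicit follows; the toy frame and the shortened `new_order` list are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['EN-01'],
    'TITLE': ['t'],
    'TITLE_EPISODES': ['T'],
    'YEAR': [1990],
})
new_order = ['ID', 'TITLE_EPISODES', 'YEAR']   # hypothetical shortened order

before = set(df.columns)
# Names in the list that do not exist would raise a KeyError on selection
missing_from_df = [c for c in new_order if c not in before]
# Existing columns that the new order leaves out
dropped = sorted(before - set(new_order))

assert not missing_from_df, f"Unknown columns: {missing_from_df}"
df = df[new_order].copy()
print(dropped, df.shape)
```

Printing the dropped names before reordering confirms that only the intended columns disappear.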

EXPORT THE CSV

In [41]:
archivo = 'Thesimpsons_Predictions.csv'

df_final.to_csv(archivo, sep=',', index=False, encoding='utf-8')

print(f"DataFrame successfully exported to '{archivo}'")

DataFrame successfully exported to 'Thesimpsons_Predictions.csv'
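
A CSV file stores no dtype information, so a quick round-trip read is a reasonable way to confirm the export. The sketch below writes a toy frame to a temporary file and reads it back; the file name `roundtrip_check.csv` and the toy values are hypothetical.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'ID': ['EN-45', 'EN-58'],
    'SEASON_NUMERICA': pd.array([-1, -2], dtype='Int64'),
    'NUMBER_IN_SEASON': [float('nan'), float('nan')],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip_check.csv')   # hypothetical file name
    df.to_csv(path, sep=',', index=False, encoding='utf-8')
    # The nullable integer dtype is requested explicitly when reading back
    reloaded = pd.read_csv(path, dtype={'SEASON_NUMERICA': 'Int64'})

print(reloaded.shape == df.shape, list(reloaded.columns) == list(df.columns))
```

Anyone loading the exported file later would need to pass the same `dtype` mapping to recover the nullable integer columns.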
