# Web scraping

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [2]:
url = 'http://rickandmorty.newtfire.org/transcripts.html'
response = requests.get(url)
response.status_code

200

After importing the necessary libraries, a request was sent to the web page containing the transcripts to be used, and the response status code was checked (200 means the request succeeded).

# Extracting the first character and line of dialogue

In [3]:
scripts = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly to avoid bs4's parser-guessing warning
speaker_1 = scripts.find('span', class_='speaker')
print(speaker_1.text)

Rick


In [4]:
speech_1 = scripts.find('span', class_='speech')
print(speech_1.text)


stumbles in drunkenly, and turns on the lights. Morty! You gotta
                    come on. Jus'... you gotta come with me.


In [5]:
texto_completo = speaker_1.text + ": " + speech_1.text  # separate the name from the speech
print(texto_completo)

Rick: 
stumbles in drunkenly, and turns on the lights. Morty! You gotta
                    come on. Jus'... you gotta come with me.


Using the Beautiful Soup library, a search was performed to find the first character with a line of dialogue (`speaker`) and retrieve that character's speech (`speech`).
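To make the `find` step concrete without network access or third-party packages, here is a minimal sketch that uses Python's standard `html.parser` module in place of Beautiful Soup. It collects the text of every `<span>` whose class list contains a given class, so its first result corresponds to `find` and the full list to `find_all`. The sample markup is a hypothetical reconstruction based on the printed output, not the page's actual HTML, and `SpanCollector` and `find_all_spans` are illustrative names.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains target_class.

    Stdlib stand-in for BeautifulSoup's find_all('span', class_=...):
    text of nested tags inside a matching span is included, as with .text.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._depth = 0  # > 0 while inside a matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside the current match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data


def find_all_spans(html, target_class):
    """Return the text of all matching spans, in document order."""
    parser = SpanCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup modelled on the printed output, not the real page.
SAMPLE_HTML = """
<p><span class="speaker">Rick</span>
<span class="speech"><i>stumbles in</i> Morty! You gotta come with me.</span></p>
<p><span class="speaker">Morty</span><span class="speech"> What, Rick?</span></p>
"""

first_speaker = find_all_spans(SAMPLE_HTML, "speaker")[0]  # analogue of find()
print(first_speaker)  # prints: Rick
```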

# Dialogues and characters

After checking both the `speaker` and the `speech` spans, all such elements on the page were retrieved.

In [6]:
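speakers = scripts.find_all('span', class_='speaker')
speeches = scripts.find_all('span', class_='speech')

# Pair each speaker with the speech at the same position;
# distinct loop names avoid overwriting the two lists
for speaker, speech in zip(speakers, speeches):
    print(speaker.text)
    print(speech.text)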

Rick

stumbles in drunkenly, and turns on the lights. Morty! You gotta
                    come on. Jus'... you gotta come with me.
Morty

rubs his eyes. What, Rick? What’s going on?
Rick
 I got a surprise for you, Morty.
Morty
 It's the middle of the night. What are you talking about?
Rick

spills alcohol on Morty's bed Come on, I got a surprise for you. drags
                        Morty by the ankle Come on, hurry up. pulls Morty out of his bed and into
                        the hall.

Morty
 Ow! Ow! You're tugging me too hard!
Rick
 We gotta go, gotta get outta here, come on. Got a surprise for you Morty.
Rick

Rick drives through the night sky. What do you think of this...
                    flying vehicle, Morty? I built it outta stuff I found in the garage.
Morty
 Yeah, Rick... I-it's great. Is this the surprise?
Rick
 Morty. I had to... I had to do it. I had— I had to— I had to make a bomb, Morty.
                    I had to create a bomb.
Morty
 What?! A bomb?!
Rick
 We'r
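The loop above relies on `zip`, which stops at the shorter of its inputs. If the page had one more `speaker` span than `speech` spans (or the reverse), lines would be dropped at the end without any error. A minimal sketch of a stricter pairing step, with `pair_dialogue` as a hypothetical helper name; on Python 3.10+ `zip(speakers, speeches, strict=True)` gives the same guarantee. Whether every speaker span on this page really has exactly one matching speech span has not been verified here.

```python
def pair_dialogue(speakers, speeches):
    """Pair speaker names with speeches, refusing to silently drop items.

    zip() alone stops at the shorter input, so a missing or extra
    span would go unnoticed; checking the lengths makes it an error.
    """
    if len(speakers) != len(speeches):
        raise ValueError(
            f"{len(speakers)} speakers but {len(speeches)} speeches; "
            "the lists cannot be paired one-to-one"
        )
    return list(zip(speakers, speeches))


print(pair_dialogue(["Rick", "Morty"], ["I got a surprise for you, Morty.", "What?"]))
```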

# Dataframes

Once all speaker names had been extracted and collected into a list, a pandas DataFrame was built from them.

**Creating the DataFrame**

In [7]:
e1_speakers = scripts.find_all('span', class_='speaker')

speakers_list = []  # List holding each speaker's name as text

for speaker in e1_speakers:
    speaker_text = speaker.text
    speakers_list.append(speaker_text)  # Add the speaker's name to the list

# Build a pandas DataFrame from the list
df_characters = pd.DataFrame({'Characters': speakers_list})

In [8]:
df_characters

|      | Characters        |
|------|-------------------|
| 0    | Rick              |
| 1    | Morty             |
| 2    | Rick              |
| 3    | Morty             |
| 4    | Rick              |
| ...  | ...               |
| 7847 | Beth (on Device)  |
| 7848 | Jerry (on Device) |
| 7849 | Rick              |
| 7850 | Summer            |


**Exploring the DataFrame**

In [9]:
df_characters.Characters.unique()

array(['Rick', 'Morty', 'Robot Voice', 'Jerry', 'Summer', 'Beth',
       'Mr. Goldenfold', 'All classmates except Morty', 'Class',
       'Jessica', 'Frank', 'Davin', 'Tom', 'Principal Vagina',
       'Announcer', 'Alien', 'Gromflomite', 'Gromfomite', 'Glenn',
       'Other Gromflomite', 'Ruben', 'Leonard', 'Joyce',
       'Beth, Summer, and Morty', 'Jacob', 'Automated voice', 'Poncho',
       'Dr. Bloom', 'Roger', 'Alexander', 'Annie', 'Animatronics',
       'Ethan', 'Animatronic Ruben', 'Reporter', 'Eric', 'Reporter on TV',
       'Alejandro', 'All', 'TV', 'Mrs. Pancakes', 'Commercial Announcer',
       'Mr. Pancakes', 'Mr.Goldenfold', 'Beth ', 'Snuffles',
       'Plane Passengers', 'Plane Passenger', 'Sexualized S&M Monster',
       'Giant Frog Woman', 'Centaur', 'Scary Terry', 'Little Girl',
       'Scary Melissa', 'Monster Teacher',
       "Monster Teacher: Oh, come on, Terry, you can't think of a pun involving pumpkins, bitch? Morty",
       'Dog 1', 'Bill', 'Dr. Dog', 'Accountan
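The `unique()` output includes labels that probably refer to the same character written in different ways, such as `'Beth'` and `'Beth '` (trailing space), or `'Mr. Goldenfold'` and `'Mr.Goldenfold'`. A minimal sketch for spotting such variants groups names under a normalised key. The key rule used here (lowercase, ignore whitespace and periods) is an assumption chosen for illustration. It does not catch genuine misspellings such as `'Gromfomite'` vs `'Gromflomite'`, which would need a manual mapping.

```python
import re
from collections import defaultdict


def name_key(name):
    """Hypothetical normalisation key: lowercase, ignoring whitespace and periods."""
    return re.sub(r"[\s.]+", "", name).lower()


def group_variants(names):
    """Group raw speaker labels whose keys collide, keeping first-seen order."""
    groups = defaultdict(list)
    for name in names:
        variants = groups[name_key(name)]
        if name not in variants:
            variants.append(name)
    # Only keys with more than one spelling are worth reviewing.
    return {key: v for key, v in groups.items() if len(v) > 1}


labels = ['Mr. Goldenfold', 'Mr.Goldenfold', 'Beth', 'Beth ', 'Rick',
          'Gromflomite', 'Gromfomite']
print(group_variants(labels))
```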

In [10]:
df_characters.Characters.value_counts()

Characters
                                                                                                      1964
Rick                                                                                                  1268
Morty                                                                                                  930
Jerry                                                                                                  648
Beth                                                                                                   452
                                                                                                      ... 
Hammer Morty                                                                                             1
All religious Mortys                                                                                     1
Doofus Rick: Okay, if we add a little more titanium nitrate, and just a tad of chlorified tartrate       1
Rick 1 and 2              
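In the `value_counts()` output, the most frequent label (1964 rows) prints as blank, and at least one label contains a line of dialogue (`'Doofus Rick: Okay, if we add ...'`). The printout does not show whether the blank label is an empty string or whitespace only, so the sketch below strips every label before counting. Stripping also merges trailing-space variants such as `'Beth '`. It uses `collections.Counter` as a standard-library analogue of `value_counts()`. Treating a `':'` as a sign of leaked dialogue is a heuristic assumption, and `audit_speakers` is a hypothetical helper.

```python
from collections import Counter


def audit_speakers(names):
    """Split raw speaker labels into usable counts and suspicious entries.

    Returns (clean, blank, leaked):
      clean  - Counter of stripped names without a ':'
      blank  - number of empty or whitespace-only labels
      leaked - labels containing ':' (likely a name plus dialogue)
    """
    counts = Counter(name.strip() for name in names)
    blank = counts.pop("", 0)
    leaked = {name: n for name, n in counts.items() if ":" in name}
    clean = Counter({name: n for name, n in counts.items() if ":" not in name})
    return clean, blank, leaked


clean, blank, leaked = audit_speakers(
    ["Rick", "", "  ", "Morty", "Rick ",
     "Doofus Rick: Okay, if we add a little more titanium nitrate"]
)
print(clean.most_common(), blank, leaked)
```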

**Cleaning the DataFrame**

In [11]:
df_characters = df_characters.drop_duplicates()

In [12]:
df_characters.value_counts()

Characters          
                        1
Police officer          1
Police Cheif (on TV)    1
Plutonians              1
Plutonian woman         1
                       ..
Froopies                1
Frankenstein            1
Frank                   1
Flippynips              1
regular Legs (on TV)    1
Name: count, Length: 520, dtype: int64

In [13]:
# Keep only the text before the first '(' (raw string avoids an invalid-escape warning)
df_characters.loc[:, 'Characters'] = df_characters['Characters'].str.split(r'\(', n=1).str[0]
df_characters

|      | Characters   |
|------|--------------|
| 0    | Rick         |
| 1    | Morty        |
| 26   | Robot Voice  |
| 39   | Jerry        |
| 40   | Summer       |
| ...  | ...          |
| 7828 | Business Man |
| 7839 | Jerry        |
| 7840 | News Anchor  |
| 7847 | Beth         |
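The output above still contains duplicates. `Jerry` appears at both index 39 and 7839, and index 7847 (originally `'Beth (on Device)'`) now reads `Beth` again. This happens because `drop_duplicates()` ran before the split. The split also leaves a trailing space (`'Beth (on Device)'` becomes `'Beth '`), and the cleaned labels are never compared again. A minimal plain-Python sketch of the opposite order follows: clean every label first, then de-duplicate. `clean_name` and `unique_characters` are hypothetical helpers. In pandas, the same idea would be to add `.str.strip()` after the split and call `drop_duplicates()` only afterwards.

```python
def clean_name(raw):
    """Drop a parenthesised qualifier such as '(on Device)' and trim whitespace."""
    return raw.split("(", 1)[0].strip()


def unique_characters(raw_names):
    """Clean every label first, then de-duplicate keeping first-seen order."""
    cleaned = (clean_name(name) for name in raw_names)
    # dict.fromkeys preserves insertion order (Python 3.7+); blanks are dropped.
    return [name for name in dict.fromkeys(cleaned) if name]


raw = ["Rick", "Jerry", "Beth", "Beth (on Device)", "Jerry (on Device)", "", "Rick"]
print(unique_characters(raw))  # prints: ['Rick', 'Jerry', 'Beth']
```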


**Exporting the DataFrame to a .csv file**

In [14]:
df_characters.to_csv('characters.csv', sep=';', index=True, encoding='utf-8', decimal=',')
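A semicolon separator is a reasonable choice here because some speaker labels contain commas (for example the `'Monster Teacher: Oh, come on, Terry, ...'` entry). Note that `decimal=','` only affects floating-point columns, so it has no effect on this text-only frame. A minimal sketch with the standard `csv` module shows that a `;`-delimited file round-trips safely even when a name contains the delimiter or quotes, because the writer quotes such fields. The helper names are hypothetical, and the sketch numbers rows with `enumerate`, unlike the non-contiguous index that pandas writes after `drop_duplicates()`.

```python
import csv
import io


def write_characters(names, stream):
    """Write one character per row with ';' as the delimiter and a header row."""
    writer = csv.writer(stream, delimiter=";")
    writer.writerow(["", "Characters"])  # blank header cell for the index column
    for index, name in enumerate(names):
        writer.writerow([index, name])  # fields containing ';' or '"' get quoted


def read_characters(stream):
    """Read the rows back, skipping the header, and return the names."""
    reader = csv.reader(stream, delimiter=";")
    next(reader)  # skip the header row
    return [row[1] for row in reader]


buffer = io.StringIO()
write_characters(["Rick", "Beth; Summer", 'Say "hi"'], buffer)
buffer.seek(0)
print(read_characters(buffer))
```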

# Scripts

The scripts are saved to a text file.

In [15]:
speakers = scripts.find_all('span', class_='speaker')
speeches = scripts.find_all('span', class_='speech')

# The with-block closes the file even if an error occurs while writing
with open('scripts.txt', 'w', encoding='utf-8') as file:
    for speaker, speech in zip(speakers, speeches):
        soup_script = f"{speaker.text}: {speech.text}\n"
        print(soup_script)
        file.write(soup_script)

Rick: 
stumbles in drunkenly, and turns on the lights. Morty! You gotta
                    come on. Jus'... you gotta come with me.

Morty: 
rubs his eyes. What, Rick? What’s going on?

Rick:  I got a surprise for you, Morty.

Morty:  It's the middle of the night. What are you talking about?

Rick: 
spills alcohol on Morty's bed Come on, I got a surprise for you. drags
                        Morty by the ankle Come on, hurry up. pulls Morty out of his bed and into
                        the hall.


Morty:  Ow! Ow! You're tugging me too hard!

Rick:  We gotta go, gotta get outta here, come on. Got a surprise for you Morty.

Rick: 
Rick drives through the night sky. What do you think of this...
                    flying vehicle, Morty? I built it outta stuff I found in the garage.

Morty:  Yeah, Rick... I-it's great. Is this the surprise?

Rick:  Morty. I had to... I had to do it. I had— I had to— I had to make a bomb, Morty.
                    I had to create a bomb.

Morty:  What?
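The lines written to `scripts.txt` keep the line breaks and indentation that the speech spans carry over from the page's HTML source (for example, the newline right after `Rick: `). If one line per utterance is preferred, a minimal sketch is to collapse whitespace before writing. `format_line` and `write_script` are hypothetical helpers, and the `(speaker, speech)` pairs stand in for the `.text` values of the scraped spans.

```python
def format_line(speaker, speech):
    """Build one 'Speaker: speech' line with internal whitespace collapsed.

    str.split() with no argument splits on any run of whitespace
    (spaces, tabs, newlines), so joining with ' ' removes the line
    breaks and indentation carried over from the HTML source.
    """
    return f"{speaker.strip()}: {' '.join(speech.split())}"


def write_script(pairs, path):
    """Write (speaker, speech) pairs, one utterance per line, as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        for speaker, speech in pairs:
            handle.write(format_line(speaker, speech) + "\n")


print(format_line("Rick", "\nstumbles in drunkenly, and turns on the lights. Morty! You gotta\n"
                          "                    come on. Jus'... you gotta come with me.\n"))
```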

# End