# Introduction
This notebook presents a content-based recommender system for series and films. It is structured as follows:
* Collect and homogenize the data
* Apply the recommendation algorithm
* Train a sentiment classification algorithm
* Apply the algorithm to user-generated data
* Set up a manual rating system
* Apply a recommendation algorithm based on collaborative filtering (not implemented)
* Install and launch the Flask server with the application

### Authors
* Eliana Patricia Aray Cappello
* Iria Martinez Alvarez
* Antonio Cebrerio Bernardez
* Brais Fontan Costas


# **Data collection and preparation**
## Loading the data
We use the file All_Streaming_Shows.csv, obtained from the following [page](https://www.kaggle.com/amritvirsinghx/web-series-ultimate-edition).

To start working with the data, we load the CSV into a pandas DataFrame.



In [None]:
!wget https://github.com/ElBley/ABP/raw/main/All_Streaming_Shows.csv

--2021-01-28 17:18:24--  https://github.com/ElBley/ABP/raw/main/All_Streaming_Shows.csv
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ElBley/ABP/main/All_Streaming_Shows.csv [following]
--2021-01-28 17:18:25--  https://raw.githubusercontent.com/ElBley/ABP/main/All_Streaming_Shows.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9095510 (8.7M) [text/plain]
Saving to: ‘All_Streaming_Shows.csv’


2021-01-28 17:18:26 (29.2 MB/s) - ‘All_Streaming_Shows.csv’ saved [9095510/9095510]



In [None]:
import pandas
data = pandas.read_csv('All_Streaming_Shows.csv')

## Using the 'Description' field
Since the recommendation algorithm works on the description, series without one (marked with the value '-1') are of no use, so we drop them and reset the index so that rows can be manipulated later on.

In [None]:
data = data[data['Description'] != '-1']
data = data.reset_index()
data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Streaming Platform
0,0,Breaking Bad,2008,18+,9.5,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,Netflix
1,1,Game of Thrones,2011,18+,9.3,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,"HBO MAX,HBO"
2,2,Rick and Morty,2013,18+,9.2,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,"Free Services,HBO MAX,Hulu"
3,3,Stranger Things,2016,16+,8.8,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,Netflix
4,4,The Boys,2019,18+,8.7,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,Prime Video
...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,,,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,Hulu
11870,12348,A Fishing Story with Ronnie Green,2017,,,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,"Prime Video,fuboTV"
11871,12350,NHL Road to the Outdoor Classics,2016,,,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,"Prime Video,Epix"
11872,12351,Addy Media,2018,,,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,Prime Video


## Format of 'Content Rating'
The 'Content Rating' field takes several possible values, including "nan". Since this is a null value, we assume that every series with "nan" is suitable for all audiences and replace it with "all".

In [None]:
data['Content Rating'].unique()

array(['18+', '16+', '7+', 'all', nan, '13+'], dtype=object)

In [None]:
data['Content Rating'] = data['Content Rating'].fillna('all')
data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Streaming Platform
0,0,Breaking Bad,2008,18+,9.5,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,Netflix
1,1,Game of Thrones,2011,18+,9.3,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,"HBO MAX,HBO"
2,2,Rick and Morty,2013,18+,9.2,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,"Free Services,HBO MAX,Hulu"
3,3,Stranger Things,2016,16+,8.8,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,Netflix
4,4,The Boys,2019,18+,8.7,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,Prime Video
...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,Hulu
11870,12348,A Fishing Story with Ronnie Green,2017,all,,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,"Prime Video,fuboTV"
11871,12350,NHL Road to the Outdoor Classics,2016,all,,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,"Prime Video,Epix"
11872,12351,Addy Media,2018,all,,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,Prime Video


## 'No of Seasons' field
The value of 'No of Seasons' is a text string. An integer is more useful, so we iterate over the whole dataset and store each value, converted to an integer, in a new column. This operation requires the index to be consecutive.

In [None]:
import re

# Matches the leading integer in strings such as '5Seasons' or '1Season'
p = re.compile('-?[0-9]+')
var = 0
data['Number of Seasons'] = 0
for row in data.itertuples():
  # row[9] is the 'No of Seasons' column. data.at writes into the DataFrame itself;
  # chained indexing (data['Number of Seasons'][var] = ...) may write to a copy
  data.at[var, 'Number of Seasons'] = int(p.match(row[9]).group())
  var += 1
print(var)
data



11874


Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Streaming Platform,Number of Seasons
0,0,Breaking Bad,2008,18+,9.5,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,Netflix,5
1,1,Game of Thrones,2011,18+,9.3,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,"HBO MAX,HBO",8
2,2,Rick and Morty,2013,18+,9.2,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,"Free Services,HBO MAX,Hulu",4
3,3,Stranger Things,2016,16+,8.8,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,Netflix,3
4,4,The Boys,2019,18+,8.7,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,Prime Video,2
...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,Hulu,1
11870,12348,A Fishing Story with Ronnie Green,2017,all,,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,"Prime Video,fuboTV",2
11871,12350,NHL Road to the Outdoor Classics,2016,all,,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,"Prime Video,Epix",1
11872,12351,Addy Media,2018,all,,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,Prime Video,1
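
The same conversion can also be written without an explicit loop. The following is a minimal sketch on a toy frame (the `toy` DataFrame is illustrative and not part of the dataset). It assumes, as the regular expression above does, that every value starts with digits: `Series.str.extract` pulls out the leading number and `astype(int)` converts it.

```python
import pandas as pd

# Toy data in the same format as the 'No of Seasons' column (illustrative only)
toy = pd.DataFrame({'No of Seasons': ['5Seasons', '8Seasons', '1Season']})

# Capture the leading integer of each string and convert the column to int
toy['Number of Seasons'] = (
    toy['No of Seasons'].str.extract(r'^(-?[0-9]+)', expand=False).astype(int)
)

print(toy['Number of Seasons'].tolist())
```

Unlike the loop, this version does not depend on the index being consecutive.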


## Mean of the ratings
We average the IMDb and R Rating scores. The two columns use different scales, so we first bring them onto the same one, in this case out of 100 points. IMDb also has NaN values, which we set to -1 because that is the value R Rating uses for missing data. Once both columns share the same format, we generate a new column, called 'Mean Rating', with the mean of the two.

In [None]:
data['IMDB Rating'] = data['IMDB Rating'] * 10
data['IMDB Rating'] = data['IMDB Rating'].fillna(-1)
data['Mean Rating'] = data[['IMDB Rating', 'R Rating']].mean(axis=1)
data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Streaming Platform,Number of Seasons,Mean Rating
0,0,Breaking Bad,2008,18+,95.0,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,Netflix,5,97.5
1,1,Game of Thrones,2011,18+,93.0,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,"HBO MAX,HBO",8,96.0
2,2,Rick and Morty,2013,18+,92.0,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,"Free Services,HBO MAX,Hulu",4,94.5
3,3,Stranger Things,2016,16+,88.0,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,Netflix,3,92.0
4,4,The Boys,2019,18+,87.0,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,Prime Video,2,91.0
...,...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,-1.0,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,Hulu,1,-1.0
11870,12348,A Fishing Story with Ronnie Green,2017,all,-1.0,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,"Prime Video,fuboTV",2,-1.0
11871,12350,NHL Road to the Outdoor Classics,2016,all,-1.0,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,"Prime Video,Epix",1,-1.0
11872,12351,Addy Media,2018,all,-1.0,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,Prime Video,1,-1.0
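
With the -1 sentinel, a series that has only one of the two scores gets the sentinel averaged in: an R Rating of 80 with no IMDb score yields (-1 + 80) / 2 = 39.5. One possible variation, shown here only as a sketch and not what the notebook does, averages whichever scores exist. The function name `mean_rating` and its arguments are illustrative.

```python
def mean_rating(imdb, r_rating, missing=-1):
    """Average the available scores on a 0-100 scale; return `missing` if none exist."""
    scores = []
    if imdb is not None:
        scores.append(imdb * 10)  # IMDb scores are on a 0-10 scale
    if r_rating != missing:
        scores.append(r_rating)   # R Rating is already on a 0-100 scale
    return sum(scores) / len(scores) if scores else missing

print(mean_rating(9.5, 100))   # both scores present
print(mean_rating(None, 80))   # only R Rating present
print(mean_rating(None, -1))   # no score at all
```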


## 'Streaming Platform' field
The 'Streaming Platform' column can take many values, so we keep only the following platforms:

    -Amazon Prime Video
    -Netflix
    -HBO
    -Others, which groups all the rest


First we create the columns for these dummy attributes:





In [None]:
data.insert(9,'Others',0)
data.insert(9,'Prime Video',0)
data.insert(9,'HBO',0)
data.insert(9,'Netflix',0)

data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Netflix,HBO,Prime Video,Others,Streaming Platform,Number of Seasons,Mean Rating
0,0,Breaking Bad,2008,18+,95.0,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,0,0,0,0,Netflix,5,97.5
1,1,Game of Thrones,2011,18+,93.0,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,0,0,0,0,"HBO MAX,HBO",8,96.0
2,2,Rick and Morty,2013,18+,92.0,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,0,0,0,0,"Free Services,HBO MAX,Hulu",4,94.5
3,3,Stranger Things,2016,16+,88.0,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,0,0,0,0,Netflix,3,92.0
4,4,The Boys,2019,18+,87.0,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,0,0,0,0,Prime Video,2,91.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,-1.0,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,0,0,0,0,Hulu,1,-1.0
11870,12348,A Fishing Story with Ronnie Green,2017,all,-1.0,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,0,0,0,0,"Prime Video,fuboTV",2,-1.0
11871,12350,NHL Road to the Outdoor Classics,2016,all,-1.0,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,0,0,0,0,"Prime Video,Epix",1,-1.0
11872,12351,Addy Media,2018,all,-1.0,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,0,0,0,0,Prime Video,1,-1.0


Next we turn any "-1" we find into "Not Found" so that we only have to work with strings.

In [None]:
# Treat both missing values (NaN) and the "-1" sentinel as "Not Found"
data['Streaming Platform'] = data['Streaming Platform'].fillna("Not Found")
data['Streaming Platform'] = data['Streaming Platform'].replace("-1", "Not Found")
data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Netflix,HBO,Prime Video,Others,Streaming Platform,Number of Seasons,Mean Rating
0,0,Breaking Bad,2008,18+,95.0,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,0,0,0,0,Netflix,5,97.5
1,1,Game of Thrones,2011,18+,93.0,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,0,0,0,0,"HBO MAX,HBO",8,96.0
2,2,Rick and Morty,2013,18+,92.0,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,0,0,0,0,"Free Services,HBO MAX,Hulu",4,94.5
3,3,Stranger Things,2016,16+,88.0,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,0,0,0,0,Netflix,3,92.0
4,4,The Boys,2019,18+,87.0,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,0,0,0,0,Prime Video,2,91.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,-1.0,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,0,0,0,0,Hulu,1,-1.0
11870,12348,A Fishing Story with Ronnie Green,2017,all,-1.0,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,0,0,0,0,"Prime Video,fuboTV",2,-1.0
11871,12350,NHL Road to the Outdoor Classics,2016,all,-1.0,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,0,0,0,0,"Prime Video,Epix",1,-1.0
11872,12351,Addy Media,2018,all,-1.0,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,0,0,0,0,Prime Video,1,-1.0


Finally, we go through the column that lists the platforms offering each series and, whenever it matches one of our chosen platforms, set the corresponding dummy variable to 1.

In [None]:
aux = data["Streaming Platform"]
for i in range(aux.size):
    # data.at writes into the DataFrame itself; chained indexing such as
    # data['Netflix'][i] = 1 may write to a temporary copy instead
    if 'Netflix' in aux[i]:
        data.at[i, 'Netflix'] = 1
    if 'Prime Video' in aux[i]:
        data.at[i, 'Prime Video'] = 1
    # Substring test, so 'HBO MAX' also counts as HBO
    if 'HBO' in aux[i]:
        data.at[i, 'HBO'] = 1
    if ('Netflix' not in aux[i]
            and 'Prime Video' not in aux[i]
            and 'HBO' not in aux[i]):
        data.at[i, 'Others'] = 1


## Finishing the data preparation
The data is now ready to feed the algorithm.

In [None]:
data

Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Netflix,HBO,Prime Video,Others,Streaming Platform,Number of Seasons,Mean Rating
0,0,Breaking Bad,2008,18+,95.0,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,1,0,0,0,Netflix,5,97.5
1,1,Game of Thrones,2011,18+,93.0,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,0,1,0,0,"HBO MAX,HBO",8,96.0
2,2,Rick and Morty,2013,18+,92.0,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,0,1,0,0,"Free Services,HBO MAX,Hulu",4,94.5
3,3,Stranger Things,2016,16+,88.0,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,1,0,0,0,Netflix,3,92.0
4,4,The Boys,2019,18+,87.0,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,0,0,1,0,Prime Video,2,91.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,-1.0,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,0,0,0,1,Hulu,1,-1.0
11870,12348,A Fishing Story with Ronnie Green,2017,all,-1.0,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,0,0,1,0,"Prime Video,fuboTV",2,-1.0
11871,12350,NHL Road to the Outdoor Classics,2016,all,-1.0,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,0,0,1,0,"Prime Video,Epix",1,-1.0
11872,12351,Addy Media,2018,all,-1.0,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,0,0,1,0,Prime Video,1,-1.0
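
The platform flags shown in the table above can also be computed without iterating row by row. This is a minimal sketch on toy data (the `toy` frame and the `platforms` list are illustrative names). It uses `Series.str.contains` with `regex=False`, which performs the same plain substring test as the `in` operator.

```python
import pandas as pd

# Toy values in the same format as the 'Streaming Platform' column
toy = pd.DataFrame({'Streaming Platform': [
    'Netflix', 'HBO MAX,HBO', 'Prime Video,fuboTV', 'Hulu', 'Not Found',
]})

platforms = ['Netflix', 'HBO', 'Prime Video']
for name in platforms:
    # Plain substring test per row, stored as 0/1
    toy[name] = toy['Streaming Platform'].str.contains(name, regex=False).astype(int)

# 'Others' is set when none of the three platforms matched
toy['Others'] = (toy[platforms].sum(axis=1) == 0).astype(int)

print(toy)
```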


# **Content-based recommender**
## Bag of Words
To give our algorithm a valid input format we use a "bag-of-words" representation. It turns each text into a frequency vector (BoW) with one entry per word of the vocabulary we keep. We also need to get rid of the words that carry no information for our search, such as "and", "of", etc.

In [None]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

import nltk
nltk.download('punkt')
nltk.download('stopwords')

ps = PorterStemmer()
# Build the stop-word set once instead of on every row
stops = set(stopwords.words("english"))

preprocessedText = []

for row in data.itertuples():
    # row[8] is the 'Description' column
    text = word_tokenize(row[8])
    # Drop stop words and non-alphanumeric tokens, then stem what is left
    text = [ps.stem(w) for w in text if w not in stops and w.isalnum()]
    text = " ".join(text)

    preprocessedText.append(text)

# preprocessedData refers to the same DataFrame as data (no copy is made)
preprocessedData = data
preprocessedData['processed_text'] = preprocessedText

preprocessedData

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Unnamed: 0,index,Series Title,Year Released,Content Rating,IMDB Rating,R Rating,Genre,Description,No of Seasons,Netflix,HBO,Prime Video,Others,Streaming Platform,Number of Seasons,Mean Rating,processed_text
0,0,Breaking Bad,2008,18+,95.0,100,"Crime,Drama","When Walter White, a New Mexico chemistry teac...",5Seasons,1,0,0,0,Netflix,5,97.5,when walter white new mexico chemistri teacher...
1,1,Game of Thrones,2011,18+,93.0,99,"Action & Adventure,Drama",Seven noble families fight for control of the ...,8Seasons,0,1,0,0,"HBO MAX,HBO",8,96.0,seven nobl famili fight control mythic land we...
2,2,Rick and Morty,2013,18+,92.0,97,"Animation,Comedy",Rick is a mentally-unbalanced but scientifical...,4Seasons,0,1,0,0,"Free Services,HBO MAX,Hulu",4,94.5,rick old man recent reconnect famili He spend ...
3,3,Stranger Things,2016,16+,88.0,96,"Drama,Fantasy","When a young boy vanishes, a small town uncove...",3Seasons,1,0,0,0,Netflix,3,92.0,when young boy vanish small town uncov mysteri...
4,4,The Boys,2019,18+,87.0,95,"Action & Adventure,Comedy",A group of vigilantes known informally as “The...,2Seasons,0,0,1,0,Prime Video,2,91.0,A group vigilant known inform the boy set take...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11869,12344,"Stop, Breathe & Think Kids: Mindful Games",2017,all,-1.0,-1,"2017,Hulu",Mindfulness made easy and fun for kids. Discov...,1Season,0,0,0,1,Hulu,1,-1.0,mind made easi fun kid discov superpow curios ...
11870,12348,A Fishing Story with Ronnie Green,2017,all,-1.0,-1,"2017,Prime Video",A Fishing Story with Ronnie Green has one or m...,2Seasons,0,0,1,0,"Prime Video,fuboTV",2,-1.0,A fish stori ronni green one episod stream sub...
11871,12350,NHL Road to the Outdoor Classics,2016,all,-1.0,-1,"2016,Prime Video",Road to the NHL Outdoor Classics takes us deep...,1Season,0,0,1,0,"Prime Video,Epix",1,-1.0,road nhl outdoor classic take us deep insid fo...
11872,12351,Addy Media,2018,all,-1.0,-1,"2018,Prime Video",Addy Media has one or more episodes streaming ...,1Season,0,0,1,0,Prime Video,1,-1.0,addi media one episod stream subscript prime v...


In [None]:
preprocessedData.iloc[0]['Description']

"When Walter White, a New Mexico chemistry teacher, is diagnosed with Stage III cancer and given a prognosis of only two years left to live. He becomes filled with a sense of fearlessness and an unrelenting desire to secure his family's financial future at any cost as he enters the dangerous world of drugs and crime.Breaking Bad featuring Bryan Cranston and Aaron Paul has one or more episodes streaming with subscription on Netflix, available for purchase on iTunes, available for purchase on Google Play, and 3 others. It's a crime and drama show with 62 episodes over 5 seasons. Breaking Bad is no longer running and has no plans to air new episodes or seasons. It has a very high IMDb audience rating of 9.5 (1,391,409 votes) and was very well received by critics."

In [None]:
preprocessedData.iloc[0]['processed_text']

'when walter white new mexico chemistri teacher diagnos stage iii cancer given prognosi two year left live He becom fill sens fearless unrel desir secur famili financi futur cost enter danger world drug bad featur bryan cranston aaron paul one episod stream subscript netflix avail purchas itun avail purchas googl play 3 other It crime drama show 62 episod 5 season break bad longer run plan air new episod season It high imdb audienc rate vote well receiv critic'
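
To make the bag-of-words idea concrete, here is a small sketch using only the standard library. The two toy documents and the names `vocabulary` and `bag_of_words` are illustrative. Each document becomes a vector of word counts over a shared vocabulary.

```python
from collections import Counter

# Two toy documents, already preprocessed (illustrative only)
docs = ["crime drama teacher crime", "drama fantasy town"]

# Vocabulary: every distinct word in the corpus, in a fixed order
vocabulary = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    """Return the frequency vector of `doc` over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

print(vocabulary)
for doc in docs:
    print(bag_of_words(doc))
```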

## TF-IDF
The words we care about are the ones that carry information, but those are precisely the ones that occur least often. We therefore use TF-IDF (through TfidfVectorizer), which expresses how relevant a word is to a document.
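
As an illustration of the weighting, the sketch below computes the textbook TF-IDF (term frequency times log(N / document frequency)) on three toy documents. It is not a reimplementation of TfidfVectorizer, which by default uses a smoothed IDF and normalizes each vector, so its numbers differ. All names here are illustrative.

```python
import math
from collections import Counter

docs = [
    "crime drama teacher",
    "crime drama family",
    "fantasy drama town",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each word
df = Counter(word for tokens in tokenized for word in set(tokens))

def tf_idf(tokens):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    tf = Counter(tokens)
    return {word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()}

weights = tf_idf(tokenized[0])
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(word, round(weight, 3))
```

The word "drama" appears in every document and gets weight 0, while "teacher", which appears in only one, gets the highest weight.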

## Computing the distance between frequency vectors
Next we compute an N x N matrix, where N is the number of series, in which distance_matrix[i,j] is the distance from series i to series j.

We use the cosine distance to compute it.
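
The cosine distance is one minus the cosine of the angle between two vectors. A minimal standard-library sketch, assuming non-zero vectors (the function name and toy vectors are illustrative):

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v): 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

vectors = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

# Small N x N matrix built the same way as distance_matrix, entry by entry
distance_matrix_toy = [[cosine_distance(u, v) for v in vectors] for u in vectors]
for row in distance_matrix_toy:
    print([round(d, 3) for d in row])
```

Identical vectors are at distance 0 (up to floating-point rounding) and vectors with no words in common are at distance 1, which is why most entries of the real matrix are close to 1.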

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

bagOfWordsModel = TfidfVectorizer()
bagOfWordsModel.fit(preprocessedData['processed_text'])
textsBoW= bagOfWordsModel.transform(preprocessedData['processed_text'])

distance_matrix= pairwise_distances(textsBoW,textsBoW ,metric='cosine')

In [None]:
textsBoW.shape

(11874, 33058)
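
Because `pairwise_distances` returns a dense array, the size of `distance_matrix` grows with the square of the number of series. A quick back-of-the-envelope estimate, assuming 8-byte float64 entries (an assumption about the dtype, not something the notebook prints):

```python
n_series = 11874          # rows of textsBoW, i.e. number of series
bytes_per_value = 8       # assumed float64

# One distance per pair of series
dense_bytes = n_series * n_series * bytes_per_value
print(f"{dense_bytes / 1024**3:.2f} GiB")
```

Under that assumption the matrix takes roughly 1 GiB, which is worth keeping in mind before adding more series.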

In [None]:
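#bagOfWordsModel.get_feature_names_out()

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (scikit-learn >= 1.0)
bagOfWordsModel.get_feature_names_out()[2210]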

'anchor'

## Function to find the series most similar to a given one, based on the description

In [None]:
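def buscar(serieText):
    """Return the titles of the 10 series whose descriptions are closest to `serieText`."""
    searchTitle = serieText
    # Row of the requested title (raises IndexError if the title is not in the dataset)
    indexOfTitle = preprocessedData[preprocessedData['Series Title']==searchTitle].index.values[0]
    distance_scores = list(enumerate(distance_matrix[indexOfTitle]))
    # Smallest distance first
    ordered_scores = sorted(distance_scores, key=lambda x: x[1])
    # Skip position 0, which is normally the series itself (distance 0), and keep the next 10
    top_scores = ordered_scores[1:11]
    top_indexes = [i[0] for i in top_scores]
    return preprocessedData['Series Title'].iloc[top_indexes]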
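
The `[1:11]` slice in `buscar` relies on the query series sorting first. If another series has an identical description (distance 0) and a lower index, that series takes position 0 and the query itself stays in the results. A sketch of a variant that excludes the query by index instead is shown below. `top_k_nearest` and the toy `row` are hypothetical names, not part of the notebook.

```python
def top_k_nearest(distances, query_index, k=10):
    """Indices of the k smallest distances, excluding the query item itself."""
    candidates = [(i, d) for i, d in enumerate(distances) if i != query_index]
    candidates.sort(key=lambda pair: pair[1])
    return [i for i, _ in candidates[:k]]

# Toy distance row for the item at index 2; index 0 is an exact duplicate of it
row = [0.0, 0.94, 0.0, 0.35, 0.80]
print(top_k_nearest(row, query_index=2, k=3))
```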

In [None]:
print(preprocessedData)

       index  ...                                     processed_text
0          0  ...  when walter white new mexico chemistri teacher...
1          1  ...  seven nobl famili fight control mythic land we...
2          2  ...  rick old man recent reconnect famili He spend ...
3          3  ...  when young boy vanish small town uncov mysteri...
4          4  ...  A group vigilant known inform the boy set take...
...      ...  ...                                                ...
11869  12344  ...  mind made easi fun kid discov superpow curios ...
11870  12348  ...  A fish stori ronni green one episod stream sub...
11871  12350  ...  road nhl outdoor classic take us deep insid fo...
11872  12351  ...  addi media one episod stream subscript prime v...
11873  12352  ...  My dream derelict home one episod stream subsc...

[11874 rows x 17 columns]


In [None]:
searchTitle = "The Boys" # Base series for the recommendations
indexOfTitle = preprocessedData[preprocessedData['Series Title']==searchTitle].index.values[0]
indexOfTitle

4

In [None]:
distance_scores = list(enumerate(distance_matrix[indexOfTitle]))
ordered_scores = sorted(distance_scores, key=lambda x: x[1])
top_scores = ordered_scores[1:11]
top_indexes = [i[0] for i in top_scores]

In [None]:
distance_matrix

array([[0.        , 0.93902789, 0.9482054 , ..., 0.98950162, 0.97759465,
        0.96954089],
       [0.93902789, 0.        , 0.94058848, ..., 0.98890607, 0.97746615,
        0.97528826],
       [0.9482054 , 0.94058848, 0.        , ..., 0.98301855, 0.97329228,
        0.95696585],
       ...,
       [0.98950162, 0.98890607, 0.98301855, ..., 0.        , 0.95907629,
        0.95340978],
       [0.97759465, 0.97746615, 0.97329228, ..., 0.95907629, 0.        ,
        0.91723382],
       [0.96954089, 0.97528826, 0.95696585, ..., 0.95340978, 0.91723382,
        0.        ]])

# **Sentiment analysis**
## Loading the data
We import the training files for emotion recognition. To keep training time short, we take only a portion of the data.

In [None]:
import pandas as pd

!wget "https://github.com/adrseara/abp_notebooks/raw/master/data/semeval-2017-train.csv"
trainingData = pd.read_csv('semeval-2017-train.csv', delimiter='\t')
trainingData = trainingData.head(5000) # Remove the call to head() to use the whole dataset. For the tests we use only the first 5000 tweets
trainingData

--2021-01-28 17:19:12--  https://github.com/adrseara/abp_notebooks/raw/master/data/semeval-2017-train.csv
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/adrseara/abp_notebooks/master/data/semeval-2017-train.csv [following]
--2021-01-28 17:19:13--  https://raw.githubusercontent.com/adrseara/abp_notebooks/master/data/semeval-2017-train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5883437 (5.6M) [text/plain]
Saving to: ‘semeval-2017-train.csv’


2021-01-28 17:19:14 (26.0 MB/s) - ‘semeval-2017-train.csv’ saved [5883437/5883437]



Unnamed: 0,label,text
0,1,One Night like In Vegas I make dat Nigga Famous
1,1,Walking through Chelsea at this time of day is...
2,0,"And on the very first play of the night, Aaron..."
3,0,"Drove the bike today, about 40 miles. Felt lik..."
4,-1,looking at the temp outside....hpw did it get ...
...,...,...
4995,0,China Telecom 1st Half Net Profit Falls 8.3% b...
4996,0,@HWG91 I get back on Sunday. I was in Split fo...
4997,1,We have just won our 1st match in the MYSA int...
4998,1,Finally made it back from French storms to Lut...


## Processing the training data
We tokenize the words and feed the bag of words to an SVM for training.

In [None]:
from nltk.tokenize import word_tokenize 
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

import nltk
nltk.download('punkt')
nltk.download('stopwords')

ps = PorterStemmer()

preprocessedText = []

for row in trainingData.itertuples():
    
    
    text = word_tokenize(row[2]) ## index of the column that contains the text
    ## Remove stop words
    stops = set(stopwords.words("english"))
    text = [ps.stem(w) for w in text if not w in stops and w.isalnum()]
    text = " ".join(text)
    
    preprocessedText.append(text)

mypreprocessedData = trainingData
mypreprocessedData['processed_text'] = preprocessedText


from sklearn.feature_extraction.text import TfidfVectorizer

bagOfWordsModel = TfidfVectorizer()
bagOfWordsModel.fit(mypreprocessedData['processed_text'])
textsBoW= bagOfWordsModel.transform(mypreprocessedData['processed_text'])
print("Finished")

from sklearn import svm
svc = svm.SVC(kernel='linear') # Classification model

X_train = textsBoW # Documents
Y_train = trainingData['label'] # Document labels
svc.fit(X_train, Y_train) # Training

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Finished


SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

We download and tokenize the test tweets and feed them to the trained model. We then compare the results with the true labels and measure the error with several metrics.
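
The per-class metrics that a classification report lists (precision, recall and F1) can be computed by hand as follows. This is an illustrative sketch with made-up labels, using the same -1 / 0 / 1 label values as the training data. The function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted
    fp = sum(1 for t, p in pairs if t != label and p == label)  # predicted but wrong
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up true labels and predictions (illustrative only)
y_true = [1, 1, 0, -1, 0, 1]
y_pred = [1, 0, 0, -1, 1, 1]

for label in (-1, 0, 1):
    print(label, precision_recall_f1(y_true, y_pred, label))
```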

In [None]:
!wget "https://github.com/adrseara/abp_notebooks/raw/master/data/semeval-2017-test.csv"
testData = pd.read_csv('semeval-2017-test.csv', delimiter='\t')
testData = testData.head(500)
testData


ps = PorterStemmer()

preprocessedText = []

for row in testData.itertuples():
    
    
    text = word_tokenize(row[2]) ## index of the column that contains the text
    ## Remove stop words
    stops = set(stopwords.words("english"))
    text = [ps.stem(w) for w in text if not w in stops and w.isalnum()]
    text = " ".join(text)
    
    preprocessedText.append(text)

preprocessedDataTest = testData
preprocessedDataTest['processed_text'] = preprocessedText

textsBoWTest= bagOfWordsModel.transform(preprocessedDataTest['processed_text'])

X_test = textsBoWTest # Documents

predictions = svc.predict(X_test) # The classifier's predictions are stored in the predictions array
print(X_test)

from sklearn.metrics import classification_report

Y_test = testData['label'] # True labels of the documents

print (classification_report(Y_test, predictions))

--2021-01-28 17:19:22--  https://github.com/adrseara/abp_notebooks/raw/master/data/semeval-2017-test.csv
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/adrseara/abp_notebooks/master/data/semeval-2017-test.csv [following]
--2021-01-28 17:19:22--  https://raw.githubusercontent.com/adrseara/abp_notebooks/master/data/semeval-2017-test.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1393528 (1.3M) [text/plain]
Saving to: ‘semeval-2017-test.csv’


2021-01-28 17:19:22 (15.2 MB/s) - ‘semeval-2017-test.csv’ saved [1393528/1393528]

  (0, 610)	0.32006369057953055
  (0, 1215)	0.32723665168896054
  (0, 1592

## Testing the algorithm
Once we have an algorithm trained to recognize the sentiment of text strings, we need to feed it the users' opinions.<br/>
In our example we use comments collected from Twitter through the free API that the social network offers. Unfortunately, this free version of the API limits both how many tweets can be collected and how they can be selected, but it is enough for an example. <br/>
Here we define a function that collects the tweets associated with a text string and runs them through the previously defined algorithm to tell positive, negative and neutral ones apart.

In [None]:
import requests
import os
import json
import unicodedata
import csv
import re

def auth():
    # Read the bearer token from the environment instead of hard-coding the secret.
    # TWITTER_BEARER_TOKEN is an assumed variable name; set it before running this cell.
    return os.environ["TWITTER_BEARER_TOKEN"]

    
def create_url(nombre):
    query = "lang:en -https " + nombre
    url = "https://api.twitter.com/2/tweets/search/recent?query={}".format(
        query
    )
    return url

def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def connect_to_endpoint(url, headers):
    response = requests.request("GET", url, headers=headers)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

def searchTweets(nombreSerie):
    bearer_token = auth()
    url = create_url(nombreSerie)
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(url, headers)

    # Dump the tweets to a CSV file, flattening line breaks and tabs to spaces
    with open('tweets.csv', 'w', encoding='utf-8', newline='') as csvfile:
        filewriter = csv.writer(csvfile)
        filewriter.writerow(['Label','text'])
        regex = re.compile(r'[\n\r\t]')
        for xx in json_response['data']:
            tweet = xx['text'].strip()
            tweet = regex.sub(" ", tweet)
            filewriter.writerow(['0', tweet])  # placeholder label, overwritten below

    tweets = pd.read_csv("tweets.csv")
    ps = PorterStemmer()
    stops = set(stopwords.words("english"))  # built once instead of once per tweet
    preprocessedText = []

    for row in tweets.itertuples():
      text = word_tokenize(row[2])  # row[2] is the column that holds the tweet text
      # Drop stop words and non-alphanumeric tokens, then stem what is left
      text = [ps.stem(w) for w in text if w not in stops and w.isalnum()]
      preprocessedText.append(" ".join(text))

    tweets['processed_text'] = preprocessedText

    # Vectorize with the bag-of-words model fitted during training
    X_test = bagOfWordsModel.transform(tweets['processed_text'])  # Documents

    # Overwrite the placeholder labels with the classifier's predictions
    tweets["Label"] = svc.predict(X_test)

    return tweets

print(searchTweets("Pokemon"))



   Label  ...                                     processed_text
0      1  ...  never forget time french pokémon account accid...
1     -1  ...  RT furbyfriday pokemon real peopl would tweet ...
2      0  ...  RT look friend Xp fast send gift pleas add 883...
3     -1  ...  RT furbyfriday pokemon real peopl would tweet ...
4      0  ...  xiipanash kthugstontv furbyfriday nah legit po...
5      1  ...  In 90 liabl get get rob jordan barkley walkman...
6     -1  ...  RT furbyfriday pokemon real peopl would tweet ...
7      0  ...            want pokemon snap game dont want pay 60
8      0  ...  think 6 fav game day tweet uh mayb megaman X n...
9      0  ...  RT thrccracha stay fire tell everyon prepar ev...

[10 rows x 3 columns]
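
To turn a table like the one above into a single figure per show, the predicted labels can be tallied. The sketch below is only an illustration: `summarize_sentiment` and `LABEL_NAMES` are hypothetical names, and the label meaning (1 positive, 0 neutral, -1 negative) is assumed from how the labels are used in the web interface later in this notebook.

```python
from collections import Counter

# Assumed label meaning; not defined explicitly anywhere in the notebook
LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}

def summarize_sentiment(labels):
    """Return the share of each sentiment among the predicted labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: (counts[value] / total if total else 0.0)
            for value, name in LABEL_NAMES.items()}

# Labels copied from the sample output above
sample_labels = [1, -1, 0, -1, 0, 1, -1, 0, 0, 0]
print(summarize_sentiment(sample_labels))
```

With the real DataFrame, the call would look like `summarize_sentiment(searchTweets("Pokemon")["Label"].tolist())`.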


# Manual rating
To let users rate each item manually, we set up a table that links each user to their rating for every movie.
First we load the user data from a CSV file downloaded from a remote repository, which holds a small sample dataset.

In [None]:
!wget https://github.com/ElBley/ABP/raw/main/User_Table.csv
import pandas as pd
import numpy as np
users = pd.read_csv('User_Table.csv')
users

--2021-01-28 17:19:23--  https://github.com/ElBley/ABP/raw/main/User_Table.csv
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ElBley/ABP/main/User_Table.csv [following]
--2021-01-28 17:19:23--  https://raw.githubusercontent.com/ElBley/ABP/main/User_Table.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 131448 (128K) [text/plain]
Saving to: ‘User_Table.csv’


2021-01-28 17:19:23 (4.77 MB/s) - ‘User_Table.csv’ saved [131448/131448]



Unnamed: 0,usuario,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,...,12297,12298,12299,12302,12304,12308,12309,12310,12311,12312,12313,12315,12316,12317,12318,12319,12320,12321,12322,12323,12324,12326,12327,12328,12329,12330,12333,12334,12335,12338,12339,12341,12342,12343,12344,12348,12350,12351,12352,passwd
0,B,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,b
1,A,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,a
2,C,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,c
3,D,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,d
4,F,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,f
5,G,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g


We define functions to sign users up and remove them, based on the data in the CSV file. Every call writes the data back to the file. The dataset holds one row per stored user and one column per movie that can be rated, plus the username and password columns.

In [None]:
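def altausuario(usuario,passwd):
  # Sign up a user: append a row unless the username already exists
  users = pd.read_csv('User_Table.csv')
  if usuario not in users["usuario"].values:
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    new_row = pd.DataFrame([{"usuario":usuario,"passwd":passwd}])
    users = pd.concat([users, new_row], ignore_index=True)
    users.to_csv("User_Table.csv",index=False)
  print(users)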

def bajausuario(usuario):
  users = pd.read_csv('User_Table.csv')
  if usuario in users["usuario"].values:
    users = users.set_index("usuario")
    users = users.drop(usuario ,axis=0)
    users = users.reset_index()
    users.to_csv("User_Table.csv",index=False)
  print(users)

altausuario("H","h")
bajausuario("H")

  usuario   0   1   2   3   4  ...  12344  12348  12350  12351  12352  passwd
0       B NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       b
1       A NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       a
2       C NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       c
3       D NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       d
4       F NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       f
5       G NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       g
6       H NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       h

[7 rows x 11876 columns]
  usuario   0   1   2   3   4  ...  12344  12348  12350  12351  12352  passwd
0       B NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       b
1       A NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       a
2       C NaN NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       c
3       D NaN NaN NaN NaN NaN  ...    

Finally, we define a function that lets each user rate any of the series. This table structure would allow the data to be fed directly into a collaborative recommendation algorithm, but since we have no representative real data we do not implement that example in this notebook.

In [None]:
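def rateShow(usuario,serie,rating):
  # Store the rating that a user gives to a series (serie is the column name)
  users = pd.read_csv('User_Table.csv')
  # Look the user up in the username column only, not in every cell of the table
  if usuario in users["usuario"].values:
    users.loc[users["usuario"]==usuario,serie] = rating
    users.to_csv("User_Table.csv",index=False)
  print(users)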

rateShow("C","1",5)

  usuario   0    1   2   3   4  ...  12344  12348  12350  12351  12352  passwd
0       B NaN  NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       b
1       A NaN  NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       a
2       C NaN  5.0 NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       c
3       D NaN  NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       d
4       F NaN  NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       f
5       G NaN  NaN NaN NaN NaN  ...    NaN    NaN    NaN    NaN    NaN       g

[6 rows x 11876 columns]
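
The notebook does not implement collaborative filtering. As a rough idea of how a table like the one above could feed it, here is a minimal sketch of user-based filtering with cosine similarity in plain Python. The ratings, the function names and the choice of cosine similarity over co-rated shows are all assumptions made for this illustration, not the authors' method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity computed over the shows both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[s] * b[s] for s in common)
    norm_a = math.sqrt(sum(a[s] ** 2 for s in common))
    norm_b = math.sqrt(sum(b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, user, show):
    """Similarity-weighted average of the other users' ratings for `show`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or show not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[show]
        den += abs(sim)
    return num / den if den else None

# Made-up ratings: {user: {show id: rating}}; a missing key plays the role of NaN
ratings = {
    "A": {"1": 5, "2": 3},
    "B": {"1": 5, "2": 3, "3": 4},
    "C": {"1": 1, "2": 5, "3": 2},
}
print(predict_rating(ratings, "A", "3"))
```

User A rates the shared shows exactly like B and quite differently from C, so the prediction for show "3" lands closer to B's rating than to C's.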


# Configuring and launching a graphical interface
We chose Flask to serve the web pages because it lets us build, fairly easily, a page that covers the functionality we needed.

In [None]:
!pip install flask
!pip install flask-ngrok

from flask_ngrok import run_with_ngrok
from flask import Flask
from flask import request

Collecting flask-ngrok
  Downloading https://files.pythonhosted.org/packages/af/6c/f54cb686ad1129e27d125d182f90f52b32f284e6c8df58c1bae54fa1adbc/flask_ngrok-0.0.25-py3-none-any.whl
Installing collected packages: flask-ngrok
Successfully installed flask-ngrok-0.0.25


In [None]:
from html import escape  # tweet text is untrusted, so escape it before embedding it in HTML

app = Flask(__name__)
run_with_ngrok(app)   #starts ngrok when the app is run

@app.route("/", methods = ["GET","POST"])
def home():
  temp = """
    <h1><strong>Case Sensitive series recommender</strong></h1>
    <div>
        <h2>Pick a series:</h2>
        <form action="/" method="POST">
            <input type="text" name="serie">
            <button type="submit" style=" display:block; width: 100px; height: 20.98px;">Search</button>
        </form>
    </div>"""
  series = ""
  positivos = ""
  negativos = ""
  if request.method == 'POST':
    serie = request.form['serie']

    # Content-based recommendations for the requested series
    try:
      series = "<ul>"
      for row in buscar(serie):
        series = series + "<li>" + escape(row) + "</li>"
      series = series + "</ul>"
    except Exception:
      series = "No series found"

    # Tweets about the series, split by predicted sentiment
    try:
      x = searchTweets(serie)
      print(x)
      positivos = "Positive opinions<ul>"
      negativos = "Negative opinions<ul>"
      for index, row in x.iterrows():
        if row.Label == 1:
          positivos = positivos + "<li>" + escape(row.text) + "</li>"
        elif row.Label == -1:
          negativos = negativos + "<li>" + escape(row.text) + "</li>"
      positivos = positivos + "</ul>"
      negativos = negativos + "</ul>"
    except Exception as e:
      print(e)
      positivos = "No series found"
      negativos = "No series found"

  ret = "<html><body>" + temp + series + positivos + negativos +"</body></html>"
  return ret

app.run()


 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)


 * Running on http://f9227e0acb6e.ngrok.io
 * Traffic stats available on http://127.0.0.1:4040


127.0.0.1 - - [28/Jan/2021 17:42:19] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [28/Jan/2021 17:42:20] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [28/Jan/2021 17:42:30] "POST / HTTP/1.1" 200 -


   Label  ...                                     processed_text
0     -1  ...  RT furbyfriday pokemon real peopl would tweet ...
1      0  ...  hoodlumcallum legendari legendari rare mythic ...
2      0  ...                          relationship statu master
3      1  ...  RT sjokz excit power internet sinc twitch play...
4      1  ...  wigglytuffisgay I pokémon fan mechan mysteri d...
5      0  ...  pokemongoapp don forget repair ball lag catch ...
6     -1  ...  king kong vs godzilla everi pokemon vs billion...
7     -1  ...  RT furbyfriday pokemon real peopl would tweet ...
8     -1  ...              I dream last night I got kill pokémon
9      1  ...  yosherinho ye plz mr pokemon also enargi strea...

[10 rows x 3 columns]


127.0.0.1 - - [28/Jan/2021 17:42:49] "POST / HTTP/1.1" 200 -


   Label  ...                                     processed_text
0      0  ...  netflix like waaaaay better version game thron...
1      0  ...  RT blackstonejason mikkihereego I alway though...
2      1  ...  dnbrgr ajbauer I thought game throne I also ne...
3      1  ...  RT carlboucherkne when matt wolf blame trudeau...
4      0  ...  abdulla54685312 neilhimself lovegwendolin netf...
5      1  ...  neilhimself lovegwendolin netflix Oh god love ...
6      1  ...  secondgentleman merriamwebst unless destroy ch...
7      0  ...  the best comparison think stock market today r...
8     -1  ...  RT hcatz123 neilhimself lovegwendolin netflix ...
9      0  ...                               daiot20i game throne

[10 rows x 3 columns]
