# Loading libraries and the dataset

In [2]:
# To summarize text with a pre-trained model,
## we first need to reload the dataset and extract the strings corresponding
## to each label.

import pandas as pd
import pickle

df = pd.read_csv("./data/flagged/Sheet_1.csv")
columns_df = df[['response_text', 'class']]
df = columns_df.copy()
columns_name = ["text", "label"]
df.columns = columns_name

Flagged_List = df['text'][df.label == 'flagged'].to_list()
NotFlagged_List = df['text'][df.label == 'not_flagged'].to_list()

# Loading the model

This section uses the pre-trained model "google/pegasus-large" from the Hugging Face platform, 
which provides a complete framework for manipulating text strings quickly 
and accurately.

This model was selected because of its published results compared with other models 
built for the same purpose. 

Source: https://huggingface.co/google/pegasus-large

In [1]:
# Import the Transformers library and the classes used for text generation
## and tokenization.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
  
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-large")

In [3]:
# Define a function that prepares the data, then encodes and decodes
## the texts into new strings.

def summarize(text):
    # truncation=True keeps long inputs within the model's maximum input length.
    input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    output_ids = model.generate(input_ids)[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)

## Processing with the "Pegasus" Large model

In [4]:
Flagged_Temp = []

print("Summarizing texts labeled 'flagged'")

for sentence in Flagged_List:
    len_sentence = len(sentence.split())
    if len_sentence >= 50:
        print("Full text: \n")
        print(sentence + '\n')
        new_sentence = summarize(sentence)
        print("New summarized text: \n")
        print(new_sentence + '\n')
        Flagged_Temp.append(new_sentence)
        print('-------------' + '\n')


Summarizing texts labeled 'flagged'
Full text: 

Having gone through depression and anxiety myself, I understand the struggles and have a few personal methods to cope when stuff hits. Having this knowledge has allowed me to help several people on the internet as well as my other friends when they have faced similar issues and talk to me about them. I understand how important listening is and offer my experiences to help them get through what they face.

New summarized text: 

I understand how important listening is and offer my experiences to help them get through what they face.

-------------

Full text: 

Friend who had big addiction issues, ended up being completely isolated, skipped school, and had a very low self esteem. I convinced him to go see a doc together, and promised him that i would go through everything with him. He went to rehab for a few months and now he's clean. But the complete lack of support from his family, from school, changed him. I think it wa

In [9]:
NotFlagged_Temp = []

print("Summarizing texts labeled 'not_flagged'")

for sentence in NotFlagged_List:
    len_sentence = len(sentence.split())
    if len_sentence >= 50:
        print("Full text: \n")
        print(sentence + '\n')
        new_sentence = summarize(sentence)
        print("New summarized text: \n")
        print(new_sentence + '\n')
        NotFlagged_Temp.append(new_sentence)
        print('-------------' + '\n')


Summarizing texts labeled 'not_flagged'
Full text: 

Only really one friend who doesn't fit into the any of the above categories. Her therapist calls it spiraling." Anyway she pretty much calls me any time she is frustrated by something with  her boyfriend to ask me if it's logical or not. Before they would just fight and he would call her crazy. Now she asks me if it's ok he didn't say "please" when he said  "hand me the remote."

New summarized text: 

Now she asks me if it's ok he didn't say "please" when he said "hand me the remote."

-------------

Full text: 

Took a week off work, packed up the car and picked up a friend who was on the verge of losing it and went camping/surfing for a week. His parents were a big part of the problem and being away from them and others and physical activity every day for a week. but more just being around helped i feel.

New summarized text: 

His parents were a big part of the problem and being away from them and others and phys

In [10]:
sorted(Flagged_Temp)

['But how i helped was by basically talking to her, and giving her the advice she needed to hear, not the one she wanted to such as oh you\'ll be ok this will all blow over" what i said was more along the lines of being so blunt that many may find it rude but for her and i it was essential to making any progress.""y friend dealt with anxiety and this desire for everything in her life to be perfect she describes it as caring what happens to much but either way I simply talked to her and when "',
 "He went to rehab for a few months and now he's clean.",
 'I understand how important listening is and offer my experiences to help them get through what they face.',
 'I was one of the only people supporting her and she felt as though I could help because I had been in her spot.',
 "I've had some friends come to me saying people or acquaintances they've known who have killed themselves try and find comfort with me because my best friend killed himself my junior year of high school so they've c

In [11]:
sorted(NotFlagged_Temp)

['His parents were a big part of the problem and being away from them and others and physical activity every day for a week.',
 'I think a lot of guys also feel that it can be hard to talk to girl friends irl about their innermost feelings sometimes, thats the feeling I get at least.',
 "I'm an open book and excited to see how many people you're going to help.",
 "I've always lent an ear for someone to speak to.",
 'Now she asks me if it\'s ok he didn\'t say "please" when he said "hand me the remote."',
 "Probably more, but that's what comes up off the top of my head."]

## Final observations:

* Summarizing with a pre-trained model generates new text strings from the originals, with accuracy and speed that are
quite acceptable for building new datasets. 
* Only texts of at least 50 whitespace-separated words were summarized, to avoid needlessly repeating shorter texts.
* To make good use of processing capacity, this data will be processed further in the "EDA - preprocesamiento" notebook.
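
The length filter described in the observations can be written as a small reusable helper. What follows is a minimal sketch rather than the notebook's actual code: `summarize_long_texts` and `last_sentence` are hypothetical names introduced here, and `last_sentence` is only a stand-in summarizer so that the example runs without downloading the model.

```python
from typing import Callable, List


def summarize_long_texts(
    texts: List[str],
    summarize_fn: Callable[[str], str],
    min_words: int = 50,
) -> List[str]:
    """Summarize only the texts with at least `min_words` whitespace-separated words."""
    summaries = []
    for text in texts:
        # Length is measured in whitespace-separated words, not model tokens.
        if len(text.split()) >= min_words:
            summaries.append(summarize_fn(text))
    return summaries


# Hypothetical stand-in summarizer for illustration only: keeps the last sentence.
def last_sentence(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[-1] + "."


short_text = "Too short to summarize."
long_text = " ".join(["word"] * 60) + ". Final sentence here."
result = summarize_long_texts([short_text, long_text], last_sentence)
print(result)
```

In the notebook, the same helper could presumably be called as `summarize_long_texts(Flagged_List, summarize)`, which would replace each explicit loop with a single call.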

In [14]:
with open("./data_final/NotFlagged_Summ.p", "wb") as f:
    pickle.dump(NotFlagged_Temp, f)
with open("./data_final/Flagged_Summ.p", "wb") as f:
    pickle.dump(Flagged_Temp, f)
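
A later notebook would need to read these files back. The sketch below shows one way the round trip might look, under some assumptions: it writes to a temporary directory instead of `./data_final`, and the list contents are sample data standing in for `Flagged_Temp`.

```python
import os
import pickle
import tempfile

# Illustrative data; in the notebook this would be Flagged_Temp or NotFlagged_Temp.
summaries = ["He went to rehab for a few months and now he's clean."]

# A temporary directory stands in for ./data_final so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Flagged_Summ.p")

    # Write the list to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(summaries, f)

    # Read it back; the restored object is an equal Python list.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == summaries)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.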