# Final Project: Artificial Intelligence
Students: Lucas Soria and Alejandro Marotta

Topic: TechTober and TechTember

## Load the Token


In [1]:
import os
from dotenv import load_dotenv
import pandas as pd
import requests

load_dotenv()  # Load the values from the .env file into environment variables

bearer_token = os.environ.get("BEARER_TOKEN")

### Display a larger number of results

In [2]:
pd.options.display.max_rows = 4000
pd.options.display.max_seq_items = 2000
pd.options.display.max_columns = 2000
pd.set_option('display.max_colwidth', None)

## Define the URL and Query Parameters

In [3]:
url = "https://api.twitter.com/2/tweets/search/recent"
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent": "v2FullArchiveSearchPython"
}
words = '#techtember OR #techtober OR tectober OR tectember'
params = {
    'query': f'({words}) lang:en -is:retweet',
    'max_results': 100
}
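
As a small sketch of how a query string like the one above can be assembled, the snippet below builds it from a list of terms. The helper name `build_query` and its parameters are hypothetical and not part of the notebook. The parentheses matter: in Twitter's v2 query syntax an implicit AND binds more tightly than OR, so without them the `lang:` and `-is:retweet` filters would apply only to the last term.

```python
def build_query(terms, lang="en", exclude_retweets=True):
    """Join search terms with OR and wrap them in parentheses before adding filters."""
    grouped = "(" + " OR ".join(terms) + ")"
    filters = [f"lang:{lang}"]
    if exclude_retweets:
        filters.append("-is:retweet")
    return " ".join([grouped] + filters)


terms = ["#techtember", "#techtober", "tectober", "tectember"]
query = build_query(terms)
# Hypothetical parameter dict mirroring the one defined in the cell above
example_params = {"query": query, "max_results": 100}
print(query)
```

Keeping the terms in a list makes it easier to add or remove spellings without touching the filter part of the query.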

## Create a function to fetch the data and format the response


In [4]:
def get_data(url, params):
    results = []
    while True:
        response = requests.get(url, headers=headers, params=params)
        # Raise an exception if the request was not successful
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        payload = response.json()
        # 'data' is absent when a page has no matching tweets
        results.append(pd.json_normalize(payload.get('data', [])))
        meta_data = payload.get('meta', {})
        if 'next_token' not in meta_data:
            break
        token = meta_data['next_token']
        print(token)
        # Keep the original query and max_results; only add the pagination token
        params = {**params, 'next_token': token}
    return pd.concat(results)
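
The pagination loop can be tried without network access. The following is a minimal sketch, assuming responses shaped like the ones the function above reads (a `data` list plus a `meta` dict that may carry `next_token`). The names `fetch_all`, `fake_fetch_page`, and `FAKE_PAGES` are hypothetical stand-ins and not part of the notebook.

```python
def fetch_all(fetch_page, params):
    """Collect the 'data' items of every page, following 'next_token' until it is absent."""
    items = []
    while True:
        payload = fetch_page(params)
        items.extend(payload.get("data", []))
        token = payload.get("meta", {}).get("next_token")
        if token is None:
            break
        # Copy the original parameters and only add the pagination token
        params = {**params, "next_token": token}
    return items


# Hypothetical in-memory pages standing in for HTTP responses
FAKE_PAGES = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "3"}], "meta": {}},
}


def fake_fetch_page(params):
    """Return the fake page that matches the current pagination token."""
    return FAKE_PAGES[params.get("next_token")]


tweets = fetch_all(fake_fetch_page, {"query": "(a OR b) lang:en", "max_results": 100})
print([t["id"] for t in tweets])
```

Passing the page-fetching function as an argument keeps the loop testable; in real use it would wrap the HTTP request.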

## Fetch the data

In [5]:
#df = get_data(url, params)
#print(df)
#df.to_csv('TechTweets2.csv')

# Tokenization

In [6]:
df = pd.read_csv("TechTweets2.csv")  # load the saved tweets to avoid requesting the data again
df

Unnamed: 0.1,Unnamed: 0,id,text
0,0,1454969841912147977,And that’s a wrap on #Techtober- We had such a great month it’s almost scary 😉 🎃\n\nThanks for following along as we showcased the tech enabling #OFP to stand on the front lines of #climatetech innovation!\n\nWe can’t wait to see what November brings (rumor has it #NEARvember 😁) https://t.co/2lEqVfF45x
1,1,1454928095144726528,Upgraded from the iPhone 8 Plus to the iPhone 13 ….sheesh 🔥🔥🔥 this phone here is crazy smooth #TechOctober or “TecTober 🤣🙏🏼💯
2,2,1454678098256007170,It's the last day of tectober(October) and how better to end it with Samsungs flagship phones.\nAs for the s21 it's one of the king's of the android camera 📸 world\nGet Ur self a crown from @gadgetworld89 https://t.co/TJxnQe5AGv
3,3,1454419778773323785,@Captain2Phones This!!!! 💯🙌🏼👏🏼😅 \n\n#techtober #YouTuber
4,4,1454347902617362435,"Nothing says TechTober like C++ , angithi Thanos....#objectorientedprogramming ...#C++....#techtober https://t.co/10d2WmkjOL"
5,5,1454278040121401359,@MKBHD #techtober\nwhere those reviews for the AirPods
6,6,1454185070093295623,“Wait! This all happened in techtober?…Siiiick!“\n\n#techtober #HALLOWEEN #tech #rewind #WAKEMEUP #weekendvibes https://t.co/eFX6UGkLJ0
7,7,1454131155096117250,@RexChapman @CraigWeekend @MKBHD we just finished #techtober and his reviews of tech make me smile every time.
8,8,1454078437090631682,"Because it #Techtober, we have the #Oneplus8T on offer for the current stock. Call 0716 690 990 or Dm to order while stocks last. #PhonesTabletsKe https://t.co/L0IUx8w9Pa"
9,9,1454039838068445185,#NikonZ9 is a beast of a camera but it’s price tag won’t let it make the sensation #sonya7iv made. Nikon should work on some Z5/6/7 upgrades to catch the eye of the every day user. What do you think?\n#techtober #nikon #SonyAlpha #Alpha7IV #camera #photography #Filmmaking


## Filter Columns

In [7]:
df = df[['text']]
df

## Tokenize


In [8]:
from nltk.tokenize import TweetTokenizer
# Instantiate the tokenizer
tt = TweetTokenizer()
# Apply the tokenizer to the column
tokenized_text = df['text'].apply(tt.tokenize)
df["tokenized_text"] = tokenized_text
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["tokenized_text"] = tokenized_text


Unnamed: 0,text,tokenized_text
0,And that’s a wrap on #Techtober- We had such a great month it’s almost scary 😉 🎃\n\nThanks for following along as we showcased the tech enabling #OFP to stand on the front lines of #climatetech innovation!\n\nWe can’t wait to see what November brings (rumor has it #NEARvember 😁) https://t.co/2lEqVfF45x,"[And, that, ’, s, a, wrap, on, #Techtober, -, We, had, such, a, great, month, it, ’, s, almost, scary, 😉, 🎃, Thanks, for, following, along, as, we, showcased, the, tech, enabling, #OFP, to, stand, on, the, front, lines, of, #climatetech, innovation, !, We, can, ’, t, wait, to, see, what, November, brings, (, rumor, has, it, #NEARvember, 😁, ), https://t.co/2lEqVfF45x]"
1,Upgraded from the iPhone 8 Plus to the iPhone 13 ….sheesh 🔥🔥🔥 this phone here is crazy smooth #TechOctober or “TecTober 🤣🙏🏼💯,"[Upgraded, from, the, iPhone, 8, Plus, to, the, iPhone, 13, …, ., sheesh, 🔥, 🔥, 🔥, this, phone, here, is, crazy, smooth, #TechOctober, or, “, TecTober, 🤣, 🙏, 🏼, 💯]"
2,It's the last day of tectober(October) and how better to end it with Samsungs flagship phones.\nAs for the s21 it's one of the king's of the android camera 📸 world\nGet Ur self a crown from @gadgetworld89 https://t.co/TJxnQe5AGv,"[It's, the, last, day, of, tectober, (, October, ), and, how, better, to, end, it, with, Samsungs, flagship, phones, ., As, for, the, s21, it's, one, of, the, king's, of, the, android, camera, 📸, world, Get, Ur, self, a, crown, from, @gadgetworld89, https://t.co/TJxnQe5AGv]"
3,@Captain2Phones This!!!! 💯🙌🏼👏🏼😅 \n\n#techtober #YouTuber,"[@Captain2Phones, This, !, !, !, 💯, 🙌, 🏼, 👏, 🏼, 😅, #techtober, #YouTuber]"
4,"Nothing says TechTober like C++ , angithi Thanos....#objectorientedprogramming ...#C++....#techtober https://t.co/10d2WmkjOL","[Nothing, says, TechTober, like, C, +, +, ,, angithi, Thanos, ..., #objectorientedprogramming, ..., #, C, +, +, ..., #techtober, https://t.co/10d2WmkjOL]"
5,@MKBHD #techtober\nwhere those reviews for the AirPods,"[@MKBHD, #techtober, where, those, reviews, for, the, AirPods]"
6,“Wait! This all happened in techtober?…Siiiick!“\n\n#techtober #HALLOWEEN #tech #rewind #WAKEMEUP #weekendvibes https://t.co/eFX6UGkLJ0,"[“, Wait, !, This, all, happened, in, techtober, ?, …, Siiiick, !, “, #techtober, #HALLOWEEN, #tech, #rewind, #WAKEMEUP, #weekendvibes, https://t.co/eFX6UGkLJ0]"
7,@RexChapman @CraigWeekend @MKBHD we just finished #techtober and his reviews of tech make me smile every time.,"[@RexChapman, @CraigWeekend, @MKBHD, we, just, finished, #techtober, and, his, reviews, of, tech, make, me, smile, every, time, .]"
8,"Because it #Techtober, we have the #Oneplus8T on offer for the current stock. Call 0716 690 990 or Dm to order while stocks last. #PhonesTabletsKe https://t.co/L0IUx8w9Pa","[Because, it, #Techtober, ,, we, have, the, #Oneplus8T, on, offer, for, the, current, stock, ., Call, 0716, 690, 990, or, Dm, to, order, while, stocks, last, ., #PhonesTabletsKe, https://t.co/L0IUx8w9Pa]"
9,#NikonZ9 is a beast of a camera but it’s price tag won’t let it make the sensation #sonya7iv made. Nikon should work on some Z5/6/7 upgrades to catch the eye of the every day user. What do you think?\n#techtober #nikon #SonyAlpha #Alpha7IV #camera #photography #Filmmaking,"[#NikonZ9, is, a, beast, of, a, camera, but, it, ’, s, price, tag, won, ’, t, let, it, make, the, sensation, #sonya7iv, made, ., Nikon, should, work, on, some, Z5, /, 6/7, upgrades, to, catch, the, eye, of, the, every, day, user, ., What, do, you, think, ?, #techtober, #nikon, #SonyAlpha, #Alpha7IV, #camera, #photography, #Filmmaking]"
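
The `SettingWithCopyWarning` in the output above is raised because pandas cannot tell whether a frame obtained by selecting columns is a view or a copy of the frame it came from. One common way to avoid the ambiguity is to take an explicit copy before adding columns. The sketch below uses hypothetical data and an illustrative `n_chars` column; neither is part of the notebook.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the tweet data
raw = pd.DataFrame({"id": [1, 2], "text": ["hello #techtober", "new phone"]})

# .copy() makes the selection an independent DataFrame
subset = raw[["text"]].copy()
subset["n_chars"] = subset["text"].str.len()

print(subset)
```

The original frame `raw` is left untouched, so later steps can still rely on its columns.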


## Get the Frequency of Each Term

### Convert everything to lowercase

In [9]:
tokenized_list = df.explode('tokenized_text')
tokenized_list_text = tokenized_list['tokenized_text']

tokenized_list_text_min = list(map(str.lower,tokenized_list_text))
tokenized_list_text_min
    

['and',
 'that',
 '’',
 's',
 'a',
 'wrap',
 'on',
 '#techtober',
 '-',
 'we',
 'had',
 'such',
 'a',
 'great',
 'month',
 'it',
 '’',
 's',
 'almost',
 'scary',
 '😉',
 '🎃',
 'thanks',
 'for',
 'following',
 'along',
 'as',
 'we',
 'showcased',
 'the',
 'tech',
 'enabling',
 '#ofp',
 'to',
 'stand',
 'on',
 'the',
 'front',
 'lines',
 'of',
 '#climatetech',
 'innovation',
 '!',
 'we',
 'can',
 '’',
 't',
 'wait',
 'to',
 'see',
 'what',
 'november',
 'brings',
 '(',
 'rumor',
 'has',
 'it',
 '#nearvember',
 '😁',
 ')',
 'https://t.co/2leqvff45x',
 'upgraded',
 'from',
 'the',
 'iphone',
 '8',
 'plus',
 'to',
 'the',
 'iphone',
 '13',
 '…',
 '.',
 'sheesh',
 '🔥',
 '🔥',
 '🔥',
 'this',
 'phone',
 'here',
 'is',
 'crazy',
 'smooth',
 '#techoctober',
 'or',
 '“',
 'tectober',
 '🤣',
 '🙏',
 '🏼',
 '💯',
 "it's",
 'the',
 'last',
 'day',
 'of',
 'tectober',
 '(',
 'october',
 ')',
 'and',
 'how',
 'better',
 'to',
 'end',
 'it',
 'with',
 'samsungs',
 'flagship',
 'phones',
 '.',
 'as',
 'for',
 

### First cleaning pass
This pass is deliberately shallow so that hashtags, mentions, and emoji are not removed.
It drops odd symbols that appear as standalone tokens and all the spam links.


In [10]:
caracteres_a_eliminar = {"http","https",'“','"','–',"[","]","{","}","#","@","!","’",",",";",".",":","+","/","*","'","?","¿","¡","|","ª","~","¬","%","&","(",")","=","-","...",'…',"_"}

# Drop tokens that are exactly one of the standalone symbols
tokenized_list_text_min = [t for t in tokenized_list_text_min if t not in caracteres_a_eliminar]

# Drop link tokens; a new list is built instead of removing items while iterating
tokenized_list_text_min = [t for t in tokenized_list_text_min if "http" not in t and "t.co" not in t]
    

tokenized_list_text_min

['and',
 'that',
 's',
 'a',
 'wrap',
 'on',
 '#techtober',
 'we',
 'had',
 'such',
 'a',
 'great',
 'month',
 'it',
 's',
 'almost',
 'scary',
 '😉',
 '🎃',
 'thanks',
 'for',
 'following',
 'along',
 'as',
 'we',
 'showcased',
 'the',
 'tech',
 'enabling',
 '#ofp',
 'to',
 'stand',
 'on',
 'the',
 'front',
 'lines',
 'of',
 '#climatetech',
 'innovation',
 'we',
 'can',
 't',
 'wait',
 'to',
 'see',
 'what',
 'november',
 'brings',
 'rumor',
 'has',
 'it',
 '#nearvember',
 '😁',
 'upgraded',
 'from',
 'the',
 'iphone',
 '8',
 'plus',
 'to',
 'the',
 'iphone',
 '13',
 'sheesh',
 '🔥',
 '🔥',
 '🔥',
 'this',
 'phone',
 'here',
 'is',
 'crazy',
 'smooth',
 '#techoctober',
 'or',
 'tectober',
 '🤣',
 '🙏',
 '🏼',
 '💯',
 "it's",
 'the',
 'last',
 'day',
 'of',
 'tectober',
 'october',
 'and',
 'how',
 'better',
 'to',
 'end',
 'it',
 'with',
 'samsungs',
 'flagship',
 'phones',
 'as',
 'for',
 'the',
 's21',
 "it's",
 'one',
 'of',
 'the',
 "king's",
 'of',
 'the',
 'android',
 'camera',
 '📸',
 'wo
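
When filtering tokens, calling `list.remove` on a list while looping over that same list can skip the element that follows each removed item, so adjacent links may survive a single pass. Building a new list avoids this. A minimal sketch with hypothetical tokens:

```python
tokens = ["great", "https://t.co/a", "https://t.co/b", "#techtober"]

# Removing during iteration: the element after each removed item is skipped
mutated = list(tokens)
for tok in mutated:
    if "http" in tok:
        mutated.remove(tok)

# Building a new list checks every token exactly once
filtered = [tok for tok in tokens if "http" not in tok]

print(mutated)
print(filtered)
```

In this example the first approach leaves the second link in place, while the list comprehension removes both.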

### Get the frequencies

In [11]:
from nltk.probability import FreqDist

fdist = FreqDist(tokenized_list_text_min)
fdist

FreqDist({'#techtober': 62, 'the': 40, 'to': 32, 'a': 25, 'and': 21, 'is': 18, 'i': 17, 'of': 16, 'it': 15, 'for': 15, ...})
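
NLTK's `FreqDist` is a subclass of `collections.Counter`, so the same counting can be sketched with the standard library alone. The token list below is hypothetical and much smaller than the notebook's data.

```python
from collections import Counter

# Hypothetical lowercase tokens
tokens = ["#techtober", "the", "phone", "the", "#techtober", "#techtober"]
counts = Counter(tokens)

# most_common sorts by descending count
top_two = counts.most_common(2)
print(top_two)
```

`most_common` is also available on a `FreqDist`, which can be handy for inspecting the top terms without building a DataFrame first.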

### Convert to a DataFrame


In [12]:
df_fdist = pd.DataFrame.from_dict(fdist, orient='index')
df_fdist.columns = ['Frequency']
df_fdist.index.name = 'Term'
df_fdist.sort_values(by=['Frequency'], inplace=True)
df_fdist

Term,Frequency
#nanowrimo,1
year's,1
😍,1
may,1
open,1
browser,1
👍,1
looking,1
lucky,1
#900,1


## First word cloud with shallow cleaning

### Import libraries

In [13]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

In [14]:
# Generate the word cloud from the term frequencies
# (df_fdist.to_string() would list each term only once, so sizes would not reflect the counts)
#wordcloud = WordCloud(max_words=100, background_color="white").generate_from_frequencies(fdist)

# Show the plot
#plt.imshow(wordcloud, interpolation='bilinear')
#plt.axis("off")
#plt.rcParams['figure.figsize'] = [500, 500]
#plt.show()

## Remove Stop Words

In [15]:
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

tokenized_list_text_no_stopwords = [x for x in tokenized_list_text_min if not x.lower() in stop_words]

tokenized_list_text_no_stopwords


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


['wrap',
 '#techtober',
 'great',
 'month',
 'almost',
 'scary',
 '😉',
 '🎃',
 'thanks',
 'following',
 'along',
 'showcased',
 'tech',
 'enabling',
 '#ofp',
 'stand',
 'front',
 'lines',
 '#climatetech',
 'innovation',
 'wait',
 'see',
 'november',
 'brings',
 'rumor',
 '#nearvember',
 '😁',
 'upgraded',
 'iphone',
 '8',
 'plus',
 'iphone',
 '13',
 'sheesh',
 '🔥',
 '🔥',
 '🔥',
 'phone',
 'crazy',
 'smooth',
 '#techoctober',
 'tectober',
 '🤣',
 '🙏',
 '🏼',
 '💯',
 'last',
 'day',
 'tectober',
 'october',
 'better',
 'end',
 'samsungs',
 'flagship',
 'phones',
 's21',
 'one',
 "king's",
 'android',
 'camera',
 '📸',
 'world',
 'get',
 'ur',
 'self',
 'crown',
 '@gadgetworld89',
 '@captain2phones',
 '💯',
 '🙌',
 '🏼',
 '👏',
 '🏼',
 '😅',
 '#techtober',
 '#youtuber',
 'nothing',
 'says',
 'techtober',
 'like',
 'c',
 'angithi',
 'thanos',
 '#objectorientedprogramming',
 'c',
 '#techtober',
 '@mkbhd',
 '#techtober',
 'reviews',
 'airpods',
 'wait',
 'happened',
 'techtober',
 'siiiick',
 '#techtober

### New word frequencies

In [16]:
fdist = FreqDist(tokenized_list_text_no_stopwords)
fdist

FreqDist({'#techtober': 62, 'via': 8, 'techtober': 7, '@mkbhd': 7, '#nikon': 7, '#z9': 7, 'tech': 6, 'see': 6, 'today': 6, 'new': 6, ...})

### New DataFrame


In [17]:
df_fdist = pd.DataFrame.from_dict(fdist, orient='index')
df_fdist.columns = ['Frequency']
df_fdist.index.name = 'Term'
df_fdist.sort_values(by=['Frequency'], inplace=True)
df_fdist

Term,Frequency
wrap,1
join,1
robots,1
build,1
games,1
play,1
come,1
#techcommunity,1
#techreviews,1
cameras,1


### Word cloud without stop words

In [18]:
# Generate the word cloud from the term frequencies
# (df_fdist.to_string() would list each term only once, so sizes would not reflect the counts)
#wordcloud = WordCloud(max_words=100, background_color="white").generate_from_frequencies(fdist)

# Show the plot
#plt.imshow(wordcloud, interpolation='bilinear')
#plt.axis("off")
#plt.rcParams['figure.figsize'] = [500, 500]
#plt.show()

## POS Tagging

In [19]:
nltk.download('tagsets')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
text_taged = nltk.pos_tag(tokenized_list_text_no_stopwords)
text_taged

[nltk_data] Downloading package tagsets to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package tagsets is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


[('wrap', 'NN'),
 ('#techtober', 'NNP'),
 ('great', 'JJ'),
 ('month', 'NN'),
 ('almost', 'RB'),
 ('scary', 'JJ'),
 ('😉', 'NNP'),
 ('🎃', 'NNP'),
 ('thanks', 'NNS'),
 ('following', 'VBG'),
 ('along', 'RB'),
 ('showcased', 'VBN'),
 ('tech', 'NN'),
 ('enabling', 'VBG'),
 ('#ofp', 'JJ'),
 ('stand', 'NN'),
 ('front', 'NN'),
 ('lines', 'NNS'),
 ('#climatetech', 'JJ'),
 ('innovation', 'NN'),
 ('wait', 'NN'),
 ('see', 'VBP'),
 ('november', 'JJ'),
 ('brings', 'NNS'),
 ('rumor', 'VBP'),
 ('#nearvember', 'NNP'),
 ('😁', 'NNP'),
 ('upgraded', 'VBD'),
 ('iphone', 'NN'),
 ('8', 'CD'),
 ('plus', 'CC'),
 ('iphone', 'JJ'),
 ('13', 'CD'),
 ('sheesh', 'JJ'),
 ('🔥', 'NNP'),
 ('🔥', 'NNP'),
 ('🔥', 'NNP'),
 ('phone', 'NN'),
 ('crazy', 'NN'),
 ('smooth', 'JJ'),
 ('#techoctober', 'NNP'),
 ('tectober', 'NNP'),
 ('🤣', 'NNP'),
 ('🙏', 'NNP'),
 ('🏼', 'NNP'),
 ('💯', 'NNP'),
 ('last', 'JJ'),
 ('day', 'NN'),
 ('tectober', 'VBD'),
 ('october', 'NNP'),
 ('better', 'RBR'),
 ('end', 'VBP'),
 ('samsungs', 'NNS'),
 ('flagship

## Lemmatization

In [20]:
# Map Penn Treebank tag prefixes to the POS letters that WordNet expects:
# N* -> N (noun), J* -> A (adjective), V* -> V (verb), R* -> R (adverb)
# Tokens with any other tag (for example CD or CC) are dropped.
tag_map = {"N": "N", "J": "A", "V": "V", "R": "R"}
taged_OK = [(word, tag_map[tag[0]]) for word, tag in text_taged if tag[0] in tag_map]
taged_OK

[('wrap', 'N'),
 ('#techtober', 'N'),
 ('great', 'A'),
 ('month', 'N'),
 ('almost', 'R'),
 ('scary', 'A'),
 ('😉', 'N'),
 ('🎃', 'N'),
 ('thanks', 'N'),
 ('following', 'V'),
 ('along', 'R'),
 ('showcased', 'V'),
 ('tech', 'N'),
 ('enabling', 'V'),
 ('#ofp', 'A'),
 ('stand', 'N'),
 ('front', 'N'),
 ('lines', 'N'),
 ('#climatetech', 'A'),
 ('innovation', 'N'),
 ('wait', 'N'),
 ('see', 'V'),
 ('november', 'A'),
 ('brings', 'N'),
 ('rumor', 'V'),
 ('#nearvember', 'N'),
 ('😁', 'N'),
 ('upgraded', 'V'),
 ('iphone', 'N'),
 ('iphone', 'A'),
 ('sheesh', 'A'),
 ('🔥', 'N'),
 ('🔥', 'N'),
 ('🔥', 'N'),
 ('phone', 'N'),
 ('crazy', 'N'),
 ('smooth', 'A'),
 ('#techoctober', 'N'),
 ('tectober', 'N'),
 ('🤣', 'N'),
 ('🙏', 'N'),
 ('🏼', 'N'),
 ('💯', 'N'),
 ('last', 'A'),
 ('day', 'N'),
 ('tectober', 'V'),
 ('october', 'N'),
 ('better', 'R'),
 ('end', 'V'),
 ('samsungs', 'N'),
 ('flagship', 'N'),
 ('phones', 'N'),
 ('s21', 'V'),
 ("king's", 'N'),
 ('android', 'N'),
 ('camera', 'N'),
 ('📸', 'N'),
 ('world', 'N'

In [21]:
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
# Instantiate the lemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
lemmatized = []
for word, simbol in taged_OK:
    lemmatized.append(wordnet_lemmatizer.lemmatize(word, simbol.lower()))
lemmatized

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


['wrap',
 '#techtober',
 'great',
 'month',
 'almost',
 'scary',
 '😉',
 '🎃',
 'thanks',
 'follow',
 'along',
 'showcased',
 'tech',
 'enable',
 '#ofp',
 'stand',
 'front',
 'line',
 '#climatetech',
 'innovation',
 'wait',
 'see',
 'november',
 'brings',
 'rumor',
 '#nearvember',
 '😁',
 'upgrade',
 'iphone',
 'iphone',
 'sheesh',
 '🔥',
 '🔥',
 '🔥',
 'phone',
 'crazy',
 'smooth',
 '#techoctober',
 'tectober',
 '🤣',
 '🙏',
 '🏼',
 '💯',
 'last',
 'day',
 'tectober',
 'october',
 'well',
 'end',
 'samsungs',
 'flagship',
 'phone',
 's21',
 "king's",
 'android',
 'camera',
 '📸',
 'world',
 'get',
 'ur',
 'crown',
 '@gadgetworld89',
 '@captain2phones',
 '💯',
 '🙌',
 '🏼',
 '👏',
 '🏼',
 '😅',
 '#techtober',
 '#youtuber',
 'nothing',
 'say',
 'c',
 'angithi',
 '#objectorientedprogramming',
 'c',
 '#techtober',
 '@mkbhd',
 '#techtober',
 'review',
 'airpods',
 'wait',
 'happen',
 'techtober',
 'siiiick',
 '#techtober',
 '#halloween',
 '#tech',
 '#rewind',
 '#wakemeup',
 '#weekendvibes',
 '@rexchapman',

### Word cloud with lemmatization

In [22]:
#wordcloud = WordCloud(max_words=25, background_color="white").generate(" ".join(lemmatized))

# Show the plot
#plt.imshow(wordcloud, interpolation='bilinear')
#plt.axis("off")
#plt.rcParams['figure.figsize'] = [100, 100]
#plt.show()

## Sentiment Analysis

In [23]:
import nltk
nltk.download('vader_lexicon')
#conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\amaro\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [24]:
"""l = []
for i in df["tokenized_text"]:
    l.append(" ".join(i))
df["polaridad"] = l
df"""

'l = []\nfor i in df["tokenized_text"]:\n    l.append(" ".join(i))\ndf["polaridad"] = l\ndf'

In [25]:
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row['text'])
    # Write through df.loc: assigning to `row` is not guaranteed to update df
    df.loc[index, "negative"] = analisis["neg"]
    df.loc[index, "neutral"] = analisis["neu"]
    df.loc[index, "positive"] = analisis["pos"]
    # Decide which compound scores count as positive or negative.
    # With 0.6 as both cut-offs, "Neutral" is reached only when compound == 0.6 exactly.
    if analisis['compound'] > 0.6:
        df.loc[index, "result"] = "Positive"
    elif analisis['compound'] < 0.6:
        df.loc[index, "result"] = "Negative"
    else:
        df.loc[index, "result"] = "Neutral"
df

Unnamed: 0,text,tokenized_text,negative,neutral,positive,result
0,And that’s a wrap on #Techtober- We had such a great month it’s almost scary 😉 🎃\n\nThanks for following along as we showcased the tech enabling #OFP to stand on the front lines of #climatetech innovation!\n\nWe can’t wait to see what November brings (rumor has it #NEARvember 😁) https://t.co/2lEqVfF45x,"[And, that, ’, s, a, wrap, on, #Techtober, -, We, had, such, a, great, month, it, ’, s, almost, scary, 😉, 🎃, Thanks, for, following, along, as, we, showcased, the, tech, enabling, #OFP, to, stand, on, the, front, lines, of, #climatetech, innovation, !, We, can, ’, t, wait, to, see, what, November, brings, (, rumor, has, it, #NEARvember, 😁, ), https://t.co/2lEqVfF45x]",0.044,0.763,0.193,Positive
1,Upgraded from the iPhone 8 Plus to the iPhone 13 ….sheesh 🔥🔥🔥 this phone here is crazy smooth #TechOctober or “TecTober 🤣🙏🏼💯,"[Upgraded, from, the, iPhone, 8, Plus, to, the, iPhone, 13, …, ., sheesh, 🔥, 🔥, 🔥, this, phone, here, is, crazy, smooth, #TechOctober, or, “, TecTober, 🤣, 🙏, 🏼, 💯]",0.224,0.701,0.075,Negative
2,It's the last day of tectober(October) and how better to end it with Samsungs flagship phones.\nAs for the s21 it's one of the king's of the android camera 📸 world\nGet Ur self a crown from @gadgetworld89 https://t.co/TJxnQe5AGv,"[It's, the, last, day, of, tectober, (, October, ), and, how, better, to, end, it, with, Samsungs, flagship, phones, ., As, for, the, s21, it's, one, of, the, king's, of, the, android, camera, 📸, world, Get, Ur, self, a, crown, from, @gadgetworld89, https://t.co/TJxnQe5AGv]",0.0,0.901,0.099,Negative
3,@Captain2Phones This!!!! 💯🙌🏼👏🏼😅 \n\n#techtober #YouTuber,"[@Captain2Phones, This, !, !, !, 💯, 🙌, 🏼, 👏, 🏼, 😅, #techtober, #YouTuber]",0.0,0.838,0.162,Negative
4,"Nothing says TechTober like C++ , angithi Thanos....#objectorientedprogramming ...#C++....#techtober https://t.co/10d2WmkjOL","[Nothing, says, TechTober, like, C, +, +, ,, angithi, Thanos, ..., #objectorientedprogramming, ..., #, C, +, +, ..., #techtober, https://t.co/10d2WmkjOL]",0.19,0.81,0.0,Negative
5,@MKBHD #techtober\nwhere those reviews for the AirPods,"[@MKBHD, #techtober, where, those, reviews, for, the, AirPods]",0.0,1.0,0.0,Negative
6,“Wait! This all happened in techtober?…Siiiick!“\n\n#techtober #HALLOWEEN #tech #rewind #WAKEMEUP #weekendvibes https://t.co/eFX6UGkLJ0,"[“, Wait, !, This, all, happened, in, techtober, ?, …, Siiiick, !, “, #techtober, #HALLOWEEN, #tech, #rewind, #WAKEMEUP, #weekendvibes, https://t.co/eFX6UGkLJ0]",0.0,1.0,0.0,Negative
7,@RexChapman @CraigWeekend @MKBHD we just finished #techtober and his reviews of tech make me smile every time.,"[@RexChapman, @CraigWeekend, @MKBHD, we, just, finished, #techtober, and, his, reviews, of, tech, make, me, smile, every, time, .]",0.0,0.865,0.135,Negative
8,"Because it #Techtober, we have the #Oneplus8T on offer for the current stock. Call 0716 690 990 or Dm to order while stocks last. #PhonesTabletsKe https://t.co/L0IUx8w9Pa","[Because, it, #Techtober, ,, we, have, the, #Oneplus8T, on, offer, for, the, current, stock, ., Call, 0716, 690, 990, or, Dm, to, order, while, stocks, last, ., #PhonesTabletsKe, https://t.co/L0IUx8w9Pa]",0.0,1.0,0.0,Negative
9,#NikonZ9 is a beast of a camera but it’s price tag won’t let it make the sensation #sonya7iv made. Nikon should work on some Z5/6/7 upgrades to catch the eye of the every day user. What do you think?\n#techtober #nikon #SonyAlpha #Alpha7IV #camera #photography #Filmmaking,"[#NikonZ9, is, a, beast, of, a, camera, but, it, ’, s, price, tag, won, ’, t, let, it, make, the, sensation, #sonya7iv, made, ., Nikon, should, work, on, some, Z5, /, 6/7, upgrades, to, catch, the, eye, of, the, every, day, user, ., What, do, you, think, ?, #techtober, #nikon, #SonyAlpha, #Alpha7IV, #camera, #photography, #Filmmaking]",0.0,1.0,0.0,Negative


In [26]:
total = len(df["result"])
positive = 0
negative = 0
neutral = 0

for i in df["result"]:
    if i == "Positive":
        positive += 1
    if i == "Neutral":
        neutral += 1
    if i == "Negative":
        negative += 1


print("Total: ",total)
print("Positive: ",positive)
print("Neutral: ",neutral)
print("Negative: ",negative)

Total:  65
Positive:  13
Neutral:  0
Negative:  52


In [27]:
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row['text'])
    # Write through df.loc: assigning to `row` is not guaranteed to update df
    df.loc[index, "negative"] = analisis["neg"]
    df.loc[index, "neutral"] = analisis["neu"]
    df.loc[index, "positive"] = analisis["pos"]
    # Decide which compound scores count as positive or negative
    if analisis['compound'] > 0.4:
        df.loc[index, "result"] = "Positive"
    elif analisis['compound'] < -0.2:
        df.loc[index, "result"] = "Negative"
    else:
        df.loc[index, "result"] = "Neutral"
df[["text", "result"]]

Unnamed: 0,text,result
0,And that’s a wrap on #Techtober- We had such a great month it’s almost scary 😉 🎃\n\nThanks for following along as we showcased the tech enabling #OFP to stand on the front lines of #climatetech innovation!\n\nWe can’t wait to see what November brings (rumor has it #NEARvember 😁) https://t.co/2lEqVfF45x,Positive
1,Upgraded from the iPhone 8 Plus to the iPhone 13 ….sheesh 🔥🔥🔥 this phone here is crazy smooth #TechOctober or “TecTober 🤣🙏🏼💯,Negative
2,It's the last day of tectober(October) and how better to end it with Samsungs flagship phones.\nAs for the s21 it's one of the king's of the android camera 📸 world\nGet Ur self a crown from @gadgetworld89 https://t.co/TJxnQe5AGv,Positive
3,@Captain2Phones This!!!! 💯🙌🏼👏🏼😅 \n\n#techtober #YouTuber,Positive
4,"Nothing says TechTober like C++ , angithi Thanos....#objectorientedprogramming ...#C++....#techtober https://t.co/10d2WmkjOL",Negative
5,@MKBHD #techtober\nwhere those reviews for the AirPods,Neutral
6,“Wait! This all happened in techtober?…Siiiick!“\n\n#techtober #HALLOWEEN #tech #rewind #WAKEMEUP #weekendvibes https://t.co/eFX6UGkLJ0,Neutral
7,@RexChapman @CraigWeekend @MKBHD we just finished #techtober and his reviews of tech make me smile every time.,Neutral
8,"Because it #Techtober, we have the #Oneplus8T on offer for the current stock. Call 0716 690 990 or Dm to order while stocks last. #PhonesTabletsKe https://t.co/L0IUx8w9Pa",Neutral
9,#NikonZ9 is a beast of a camera but it’s price tag won’t let it make the sensation #sonya7iv made. Nikon should work on some Z5/6/7 upgrades to catch the eye of the every day user. What do you think?\n#techtober #nikon #SonyAlpha #Alpha7IV #camera #photography #Filmmaking,Neutral


In [28]:
total = len(df["result"])
positive = 0
negative = 0
neutral = 0

for i in df["result"]:
    if i == "Positive":
        positive += 1
    if i == "Neutral":
        neutral += 1
    if i == "Negative":
        negative += 1


print("Total: ",total)
print("Positive: ",positive)
print("Neutral: ",neutral)
print("Negative: ",negative)

Total:  65
Positive:  24
Neutral:  33
Negative:  8
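
The labeling rule used in the second pass can be isolated into a small function, which makes it easier to try other cut-offs. This is a minimal sketch: the function name `classify`, its parameter names, and the sample scores are hypothetical, and the defaults simply repeat the 0.4 and -0.2 values from the cell above. For comparison, the VADER project's own documentation suggests ±0.05 as typical thresholds.

```python
def classify(compound, pos_threshold=0.4, neg_threshold=-0.2):
    """Label a compound score in [-1, 1] using two separate cut-offs."""
    if compound > pos_threshold:
        return "Positive"
    if compound < neg_threshold:
        return "Negative"
    return "Neutral"


scores = [0.85, 0.10, -0.05, -0.60]  # hypothetical compound scores
labels = [classify(s) for s in scores]
print(labels)
```

Using two different cut-offs leaves a band in between that is labeled neutral; when both cut-offs are the same value, that band collapses to a single point.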


Negative results

In [29]:

df.loc[df['result'] == "Negative", 'text']


1        Upgraded from the iPhone 8 Plus to the iPhone 13 ….sheesh 🔥🔥🔥 this phone here is crazy smooth #TechOctober or “TecTober 🤣🙏🏼💯
4        Nothing says TechTober like C++ , angithi Thanos....#objectorientedprogramming ...#C++....#techtober https://t.co/10d2WmkjOL
15                        #Nikon #Z9 - No Mechanical Shutter?!? The Future Is Here! https://t.co/fjhFXcsY4p via @tedforbes #techtober
16                                             #Nikon #Z9 has NO shutter! First Lok https://t.co/Vq56D58cP4 via @lokcheung #techtober
21                                          Tectober / Techtober has been crazy busy...the overload was real! https://t.co/fshDNjVwQj
31                                                          #Z9 is Coming - Teaser 4 https://t.co/sfGZiiCRBD via @NikonUSA #techtober
44    A #Techtober tutorial for those folks who look fine but deep down wish they were lost in a corn maze. 🍂 https://t.co/ofkeGDhpoB
53         @MKBHD No wonder… I’ve been wondering why the chann