# Getting data from the API

## Getting a Twitter Developer account

- Create an account on <a href="https://twitter.com/">Twitter</a> or log in to an existing one
- Apply for a <a href="https://developer.twitter.com/en/portal/petition/use-case">developer account</a>  
<br/>
<img src="img/twitter_api_1.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_2.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_3.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_4.png" style="width:50%;float:left;border:1px solid black">

#### How will you use the Twitter API or Twitter Data?

I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.

#### Are you planning to analyze Twitter data?

I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).


<br/><br/>
<img src="img/twitter_api_5.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_6.png" style="width:50%;float:left;border:1px solid black">


### Replying to the email

If Twitter sends an email asking for more information, reply with the following message.

<code>
    I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.
    I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).
    I will not use the Tweet, Retweet, or Like functionality. I will only use the API to retrieve tweet content.
    The content of the tweets will not be displayed. It will only be used to carry out data analysis exercises during the course.
</code>

## Creating an application

- Create a project
- Create an app within the project
- Get and save the keys (copy all of them before continuing, since they cannot be viewed again later)

<br/>
<img src="img/twitter_api_8.png" style="width:30%;float:left;border:1px solid black">
<img src="img/twitter_api_7.png" style="width:70%;float:left;border:1px solid black">


## Storing the token in environment variables

- Store the token value in a .env file:
 <code>export BEARER_TOKEN='bearer token value'</code>
- Add the .env file to .gitignore if you are working in a repository

## Loading the token value in the application

In [5]:
```python
import os
from dotenv import load_dotenv
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
```
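`os.environ.get` returns `None` when the variable is missing, so the request would later be sent with the header `Bearer None` and be rejected by the API. A minimal sketch of failing early instead, assuming a hypothetical helper `require_env` (it is not part of python-dotenv or the standard library):

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Missing (None) and empty values are both treated as "not configured"
        raise RuntimeError(
            f"{name} is not set; check that the .env file exists and that "
            "load_dotenv() ran before this call"
        )
    return value


# Usage, after load_dotenv():
# bearer_token = require_env("BEARER_TOKEN")
```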

## Defining the API query

### Query URL

Define the URL for the data you need, following the <a href="https://developer.twitter.com/en/docs/twitter-api/api-reference-index">API</a> documentation.

In [6]:
```python
url = "https://api.twitter.com/2/tweets/search/recent"
```

## Defining additional parameters

Define values such as the date range, hashtags, content filters, and the fields to return.

In [7]:
```python
params = {
    'query': '#machinelearning -is:retweet',
    'tweet.fields': 'created_at',
    'max_results': 100
}
```
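The parameters above do not restrict the date range. Recent search accepts a `start_time` (and `end_time`) parameter as a UTC timestamp in `YYYY-MM-DDTHH:mm:ssZ` format, and it only covers roughly the last seven days. A sketch, assuming a hypothetical helper `start_time_param` that builds the `start_time` value from a number of hours:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Recent search only reaches back about seven days
MAX_LOOKBACK = timedelta(days=7)


def start_time_param(hours_back: float, now: Optional[datetime] = None) -> dict:
    """Return {'start_time': ...} covering the last `hours_back` hours."""
    now = now or datetime.now(timezone.utc)
    delta = timedelta(hours=hours_back)
    if delta > MAX_LOOKBACK:
        raise ValueError("recent search only covers about the last 7 days")
    start = (now - delta).astimezone(timezone.utc)
    # UTC timestamp with a trailing Z, e.g. 2021-10-20T15:36:15Z
    return {"start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ")}


# Usage: merge into the existing parameters
# params = {**params, **start_time_param(6)}
```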

## Defining the header
The header must carry the authentication token so that the request is authorized.


In [8]:
```python
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent": "v2FullArchiveSearchPython"
}
```
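Instead of passing `headers` to every `requests.get` call, the same header can be attached once to a `requests.Session`, whose headers are merged into every request made through it. A sketch, assuming a hypothetical helper `make_session`; the default User-Agent string is an arbitrary label, not a value required by the API:

```python
import requests


def make_session(bearer_token: str, user_agent: str = "v2RecentSearchPython") -> requests.Session:
    """Return a Session that sends the Bearer token and User-Agent on every request."""
    session = requests.Session()
    # Session.headers is merged into the headers of each request made with this session
    session.headers.update({
        "Authorization": f"Bearer {bearer_token}",
        "User-Agent": user_agent,
    })
    return session


# Usage:
# session = make_session(bearer_token)
# response = session.get(url, params=params)
```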

## Running the query

In [9]:
```python
import requests
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())
```

```
<Response [200]>
{'data': [{'created_at': '2021-10-20T17:36:15.000Z', 'id': '1450878543186075649', 'text': 'Region features for YOLO architecture #DeepLearning #learning #machinelearning via https://t.co/5KoJKMHpsB https://t.co/66Mde6cYe4'}, {'created_at': '2021-10-20T17:36:00.000Z', 'id': '1450878479692713988', 'text': 'The Train and Test Split in #MachineLearning Process: https://t.co/ioYoMbsjpL #BigData #DataScience #AI'}, {'created_at': '2021-10-20T17:35:52.000Z', 'id': '1450878449032302594', 'text': 'Top 10 Best Laptop In Full Details Review 2021 #BigData #learning #machinelearning via https://t.co/wjnQGYLeUp https://t.co/04titOOR4g'}, {'created_at': '2021-10-20T17:35:12.000Z', 'id': '1450878281067151367', 'text': 'The latest The SAINTS Daily! https://t.co/dyco4ktkP0 #machinelearning #ai'}, {'created_at': '2021-10-20T17:35:04.000Z', 'id': '1450878246447423489', 'text': 'Pass Components as Props in React \n#cybersecurity #devops #100DaysOfCode #ai #codenewbie #machinelearning #DEVC
```
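The check above raises on any non-200 status. When the app exceeds its rate limit the API answers with status 429, and, as far as the v2 documentation describes, the response carries an `x-rate-limit-reset` header holding the epoch second at which the window resets. A sketch of turning that header into a wait time, assuming a hypothetical helper `seconds_until_reset`:

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait until the rate-limit window resets (0.0 if unknown or already past)."""
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, float(reset) - now)


# Usage (response.headers in requests is case-insensitive):
# if response.status_code == 429:
#     time.sleep(seconds_until_reset(response.headers))
```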

## Formatting the response

Convert the response into a pandas DataFrame.

In [10]:
```python
import pandas as pd
df = pd.json_normalize(response.json()['data'])
df
```

| | created_at | id | text |
|---|---|---|---|
| 0 | 2021-10-20T17:36:15.000Z | 1450878543186075649 | Region features for YOLO architecture #DeepLea... |
| 1 | 2021-10-20T17:36:00.000Z | 1450878479692713988 | The Train and Test Split in #MachineLearning P... |
| 2 | 2021-10-20T17:35:52.000Z | 1450878449032302594 | Top 10 Best Laptop In Full Details Review 2021... |
| 3 | 2021-10-20T17:35:12.000Z | 1450878281067151367 | The latest The SAINTS Daily! https://t.co/dyco... |
| 4 | 2021-10-20T17:35:04.000Z | 1450878246447423489 | Pass Components as Props in React \n#cybersecu... |
| ... | ... | ... | ... |
| 95 | 2021-10-20T17:15:55.000Z | 1450873427057852427 | https://t.co/DDVSG294Jw #BigData #learning #ma... |
| 96 | 2021-10-20T17:15:37.000Z | 1450873352399335432 | Big Data On Amazon Web Services (AWS) Cloud #B... |
| 97 | 2021-10-20T17:15:19.000Z | 1450873276994138118 | The latest Enterprise Mobility News! https://t... |
| 98 | 2021-10-20T17:15:11.000Z | 1450873244026875910 | #hclswlobp #nocode #lowcode #javascript #githu... |
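In the table above `created_at` is still a plain string. A sketch of parsing it into timezone-aware timestamps so the tweets can be sorted or grouped by time; the two records are copied from the response above (texts shortened), and the variable names are only illustrative:

```python
import pandas as pd

# Two records shaped like response.json()['data'] above (texts shortened)
sample = [
    {"created_at": "2021-10-20T17:36:15.000Z", "id": "1450878543186075649",
     "text": "Region features for YOLO architecture #DeepLearning ..."},
    {"created_at": "2021-10-20T17:36:00.000Z", "id": "1450878479692713988",
     "text": "The Train and Test Split in #MachineLearning Process ..."},
]

tweets = pd.json_normalize(sample)
# Parse the ISO 8601 strings into timezone-aware UTC timestamps
tweets["created_at"] = pd.to_datetime(tweets["created_at"], utc=True)
# The ids are left as strings: they are larger than 2**53, so a float
# column would silently round them
tweets = tweets.set_index("created_at").sort_index()
```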


# Exercises

Using the documentation for the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent"> Recent </a> search endpoint and the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query"> query </a> building options, obtain:
 
- A list of the creation dates of the tweets posted by the user @kdnuggets that contain the hashtag #NLP

In [11]:
```python
params = {
    'query': '#NLP -is:retweet from:kdnuggets',
    'tweet.fields': 'created_at',
    'max_results': 100
}
```
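Each exercise combines the same few query operators: hashtags separated by spaces (an implicit AND), `OR` inside parentheses, `from:`, `lang:` and `-is:retweet`. A sketch of composing them programmatically, assuming a hypothetical helper `build_query`; it only covers the operators used in this notebook:

```python
from typing import Iterable, Optional


def build_query(hashtags: Iterable[str], match_any: bool = False,
                from_user: Optional[str] = None, lang: Optional[str] = None,
                exclude_retweets: bool = True) -> str:
    """Compose a search query string from the operators used in this notebook."""
    tags = [t if t.startswith("#") else f"#{t}" for t in hashtags]
    if match_any and len(tags) > 1:
        # Group the OR terms so the remaining operators apply to the whole group
        parts = ["(" + " OR ".join(tags) + ")"]
    else:
        # Terms separated by spaces must all match (implicit AND)
        parts = list(tags)
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


# Usage:
# params['query'] = build_query(["InteligenciaArtificial", "IA"], match_any=True, lang="es")
```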

In [12]:
```python
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())
```

```
<Response [200]>
{'meta': {'result_count': 0}}
```


In [13]:
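```python
# With no matching tweets the response only contains 'meta', so fall back to an empty list
df = pd.json_normalize(response.json().get('data', []))
df
```

Indexing `response.json()['data']` directly raised `KeyError: 'data'` here, because this query returned no tweets (`result_count` is 0) and the response has no `data` key. With `.get('data', [])` the result is an empty DataFrame instead.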

- A list of the texts and usernames of the tweets that contain both hashtags #NLP and #MachineLearning and are not retweets

In [14]:
```python
params = {
    'query': '#NLP #MachineLearning -is:retweet',
    'tweet.fields': 'created_at',
    'expansions': 'author_id',
    'user.fields': 'username',
    'max_results': 100
}
```

In [15]:
```python
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())
```

```
<Response [200]>
{'data': [{'id': '1450875908982808576', 'text': 'AM Best TV Presents “Advancing Tech Exposes Insurers to Bias Risk” - Business Wire https://t.co/oZDVE9yawn\n\n#DataScience #MachineLearning #DeepLearning #Insurance #NLP #Robots #AI #IoT #BigData', 'created_at': '2021-10-20T17:25:47.000Z', 'author_id': '918112383628963841'}, {'id': '1450875904251637764', 'text': "Insurance Stocks' Q3 Earnings on Oct 21: WRB, MMC, &amp; FAF - Yahoo Finance https://t.co/djfoEvZ3TO\n\n#DataScience #MachineLearning #DeepLearning #Insurance #NLP #Robots #AI #IoT #BigData", 'created_at': '2021-10-20T17:25:46.000Z', 'author_id': '918112383628963841'}, {'id': '1450875512541388809', 'text': 'Hmu for quality assignment help\nEssays\nBiology\nChemistry\nMath\nLaw\nEcology\nAnatomy\n#MachineLearning  #DataScience #5G #100DaysOfCode\n#Python #Cybersecurity #BigData #AI #IoT #DeepLearning\n#ArtificialIntelligence #NLP #robots #Industry40 #javascript \n https://t.co/mOyoOPHhSs', 'created_at': '2021-10-
```

In [19]:
```python
df = pd.json_normalize(response.json()['data'])
df
```

| | id | text | created_at | author_id |
|---|---|---|---|---|
| 0 | 1450875908982808576 | AM Best TV Presents “Advancing Tech Exposes In... | 2021-10-20T17:25:47.000Z | 918112383628963841 |
| 1 | 1450875904251637764 | Insurance Stocks' Q3 Earnings on Oct 21: WRB, ... | 2021-10-20T17:25:46.000Z | 918112383628963841 |
| 2 | 1450875512541388809 | Hmu for quality assignment help\nEssays\nBiolo... | 2021-10-20T17:24:12.000Z | 1341306390862827520 |
| 3 | 1450874956607213575 | Take a break and listen to this #poem created ... | 2021-10-20T17:22:00.000Z | 1320676594780934144 |
| 4 | 1450874632568033282 | HMU for quality assignment help\nEssays\nExams... | 2021-10-20T17:20:43.000Z | 1341306390862827520 |
| ... | ... | ... | ... | ... |
| 95 | 1450798455966691343 | College teams closely watching how professiona... | 2021-10-20T12:18:01.000Z | 444522096 |
| 96 | 1450797922673639425 | The Myths of AI https://t.co/0OPOcNkWLK via @C... | 2021-10-20T12:15:53.000Z | 1704823387 |
| 97 | 1450797645124014082 | Implications of #ArtificialIntelligence (#AI) ... | 2021-10-20T12:14:47.000Z | 18229080 |
| 98 | 1450794189269176330 | The Myths of AI https://t.co/kSTi71MMJq #Docup... | 2021-10-20T12:01:03.000Z | 950410848283119616 |


In [20]:
```python
df_users = pd.json_normalize(response.json()['includes']['users'])
df_users.rename(columns={'id': 'author_id'}, inplace=True)
df_users
```


| | author_id | name | username |
|---|---|---|---|
| 0 | 918112383628963841 | David Sobo | DS_Analytics |
| 1 | 1341306390862827520 | Excellent Writers | writersxcellent |
| 2 | 1320676594780934144 | Bot Poets Society | BotPoetsSociety |
| 3 | 1409946409499979777 | Sally M. Fedor | FedorSally |
| 4 | 1083699084760895496 | Nikseam | NikseamC |
| 5 | 950410848283119616 | C-Suite | CSuitePro |
| 6 | 1446513127055511552 | Thomas M | Thomas91580600 |
| 7 | 843593628337553408 | KUNGFU.AI | kungfuai |
| 8 | 300750392 | Verinite | Verinite |
| 9 | 1403014419953635332 | BEST ASSIGNMENTS AND ESSAY HELP | MAYASSIGNMENT1 |


In [21]:
```python
pd.merge(df, df_users, on="author_id")
```

| | id | text | created_at | author_id | name | username |
|---|---|---|---|---|---|---|
| 0 | 1450875908982808576 | AM Best TV Presents “Advancing Tech Exposes In... | 2021-10-20T17:25:47.000Z | 918112383628963841 | David Sobo | DS_Analytics |
| 1 | 1450875904251637764 | Insurance Stocks' Q3 Earnings on Oct 21: WRB, ... | 2021-10-20T17:25:46.000Z | 918112383628963841 | David Sobo | DS_Analytics |
| 2 | 1450860725199593472 | Origami Risk Enhances Workers' Comp Claims Adm... | 2021-10-20T16:25:27.000Z | 918112383628963841 | David Sobo | DS_Analytics |
| 3 | 1450845581849550852 | AM Best Upgrades Issuer Credit Rating for Farm... | 2021-10-20T15:25:16.000Z | 918112383628963841 | David Sobo | DS_Analytics |
| 4 | 1450830483298865152 | New Analysis from Global Industry Analysts Rev... | 2021-10-20T14:25:17.000Z | 918112383628963841 | David Sobo | DS_Analytics |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 1450801466424193025 | How To Remove Bad Reviews From Google Local\nh... | 2021-10-20T12:29:58.000Z | 1422214344754925570 | Brock Dyess | BrockDyess |
| 96 | 1450798455966691343 | College teams closely watching how professiona... | 2021-10-20T12:18:01.000Z | 444522096 | Colin Bristow | BristowColin |
| 97 | 1450797922673639425 | The Myths of AI https://t.co/0OPOcNkWLK via @C... | 2021-10-20T12:15:53.000Z | 1704823387 | Chris Meyer | OnTopBln |
| 98 | 1450797645124014082 | Implications of #ArtificialIntelligence (#AI) ... | 2021-10-20T12:14:47.000Z | 18229080 | Chris Rigatuso | crigatuso |
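The three cells above (normalize `data`, normalize `includes.users`, merge) can be wrapped in one function that also copes with an empty response. A sketch, assuming a hypothetical helper `tweets_with_users`; it uses `how="left"` so that a tweet is kept even if its author is missing from `includes`, and `validate="many_to_one"` so pandas raises if an author id appears twice among the users:

```python
import pandas as pd


def tweets_with_users(payload: dict) -> pd.DataFrame:
    """Join payload['data'] with payload['includes']['users'] on author_id."""
    tweets = pd.json_normalize(payload.get("data", []))
    users = pd.json_normalize(payload.get("includes", {}).get("users", []))
    if tweets.empty or users.empty:
        # Nothing to join: return the tweets (possibly an empty DataFrame) as-is
        return tweets
    users = users.rename(columns={"id": "author_id"})
    return tweets.merge(users, on="author_id", how="left", validate="many_to_one")


# Usage:
# df = tweets_with_users(response.json())
```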


- A list of the texts and links of the tweets in Spanish that contain the hashtag #InteligenciaArtificial or #IA

In [22]:
```python
params = {
    'query': '(#InteligenciaArtificial OR #IA) lang:es -is:retweet',
    'tweet.fields': 'created_at,entities',
    'expansions': 'author_id',
    'user.fields': 'username',
    'max_results': 100
}
```

In [23]:
```python
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())
```

```
<Response [200]>
{'data': [{'created_at': '2021-10-20T17:34:58.000Z', 'entities': {'mentions': [{'start': 115, 'end': 129, 'username': 'CNT_infoLibre', 'id': '1320674590335684608'}], 'urls': [{'start': 81, 'end': 104, 'url': 'https://t.co/pDIcINW61E', 'expanded_url': 'https://paper.li/Interconexiona/1463484771?pub_id=b8594899-fd71-40ff-bd32-8430cdd7995e', 'display_url': 'paper.li/Interconexiona…'}], 'hashtags': [{'start': 13, 'end': 28, 'tag': 'Interconexiona'}, {'start': 32, 'end': 39, 'tag': 'Empleo'}, {'start': 40, 'end': 45, 'tag': 'RRHH'}, {'start': 46, 'end': 60, 'tag': 'MarcaPersonal'}, {'start': 61, 'end': 80, 'tag': 'Orientacionlaboral'}, {'start': 130, 'end': 153, 'tag': 'inteligenciaartificial'}, {'start': 154, 'end': 163, 'tag': 'startups'}]}, 'id': '1450878220673429507', 'text': 'Lo último de #Interconexiona en #Empleo #RRHH #MarcaPersonal #Orientacionlaboral https://t.co/pDIcINW61E gracias a @CNT_infoLibre #inteligenciaartificial #startups', 'author_id': '617338053'}, {'c
```

In [26]:
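```python
df = pd.json_normalize(response.json()['data'])
# Drop the entity columns that are not needed; errors='ignore' skips any
# column that does not appear in this particular response
df.drop(columns=['entities.annotations', 'entities.hashtags', 'entities.mentions'],
        inplace=True, errors='ignore')
#df["url_aux"] = [df['entities.urls'][i][0]['url'] for i in range(len(df['entities.urls']))]
#print(df['entities.urls'])
df
```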


| | created_at | id | text | author_id | entities.mentions | entities.urls | entities.hashtags | entities.annotations |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-10-20T17:34:58.000Z | 1450878220673429507 | Lo último de #Interconexiona en #Empleo #RRHH ... | 617338053 | [{'start': 115, 'end': 129, 'username': 'CNT_i... | [{'start': 81, 'end': 104, 'url': 'https://t.c... | [{'start': 13, 'end': 28, 'tag': 'Interconexio... | |
| 1 | 2021-10-20T17:34:19.000Z | 1450878057661833220 | 5 DAS 21 Empresas de Biotecnologia Europeias\n... | 2956693744 | | [{'start': 235, 'end': 258, 'url': 'https://t.... | [{'start': 68, 'end': 72, 'tag': 'EMA'}, {'sta... | |
| 2 | 2021-10-20T17:30:35.000Z | 1450877117445648393 | Este Jueves!! nueva edición del Ciclo #innovac... | 2499492799 | [{'start': 73, 'end': 89, 'username': 'Argenti... | [{'start': 218, 'end': 241, 'url': 'https://t.... | [{'start': 38, 'end': 49, 'tag': 'innovación'}... | [{'start': 125, 'end': 150, 'probability': 0.2... |
| 3 | 2021-10-20T17:30:32.000Z | 1450877106087469058 | @iLABMex invita al panel internacional: Labora... | 104654597 | [{'start': 0, 'end': 8, 'username': 'iLABMex',... | [{'start': 127, 'end': 150, 'url': 'https://t.... | [{'start': 151, 'end': 169, 'tag': 'innovacion... | [{'start': 78, 'end': 91, 'probability': 0.465... |
| 4 | 2021-10-20T17:29:30.000Z | 1450876842949451787 | El futuro en la securización de espacios en ce... | 1200581984 | [{'start': 265, 'end': 279, 'username': 'Geria... | [{'start': 236, 'end': 259, 'url': 'https://t.... | [{'start': 91, 'end': 114, 'tag': 'Inteligenci... | [{'start': 174, 'end': 182, 'probability': 0.7... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 2021-10-20T13:40:48.000Z | 1450819290765135876 | 🆕📰 @Bolsamania: KPMG distribuirá la nueva plat... | 613074939 | [{'start': 3, 'end': 14, 'username': 'bolsaman... | [{'start': 90, 'end': 113, 'url': 'https://t.c... | [{'start': 115, 'end': 118, 'tag': 'IA'}, {'st... | [{'start': 18, 'end': 21, 'probability': 0.628... |
| 96 | 2021-10-20T13:39:28.000Z | 1450818953169752067 | 🆕📰 @europapress: KPMG distribuirá la nueva pla... | 613074939 | [{'start': 3, 'end': 15, 'username': 'europapr... | [{'start': 91, 'end': 114, 'url': 'https://t.c... | [{'start': 116, 'end': 119, 'tag': 'IA'}, {'st... | [{'start': 19, 'end': 22, 'probability': 0.628... |
| 97 | 2021-10-20T13:36:57.000Z | 1450818323051171845 | #Algoritmos en RRSS son más #Peligrosos que oj... | 3287967557 | | [{'start': 205, 'end': 228, 'url': 'https://t.... | [{'start': 0, 'end': 11, 'tag': 'Algoritmos'},... | [{'start': 63, 'end': 68, 'probability': 0.491... |
| 98 | 2021-10-20T13:36:08.000Z | 1450818117773447168 | La #inteligenciaartificial nos impacta en nues... | 1442754255581073409 | | | [{'start': 3, 'end': 26, 'tag': 'inteligenciaa... | |


In [18]:
```python
# Get the indices of the rows without links (entities.urls is missing)
index_to_drop = df[pd.isna(df['entities.urls'])].index
# Drop those indices from the DataFrame
df = df.drop(index_to_drop, axis=0)
df
```

| | author_id | id | created_at | text | entities.urls |
|---|---|---|---|---|---|
| 0 | 1361813588093018112 | 1443597309921562628 | 2021-09-30T15:23:14.000Z | ¿Cuándo es necesario desarrollar tecnologías d... | [{'start': 66, 'end': 89, 'url': 'https://t.co... |
| 2 | 1123930563751051271 | 1443597049568583684 | 2021-09-30T15:22:12.000Z | GPT-3 ¿Un paso más cerca de la Inteligencia Ar... | [{'start': 247, 'end': 270, 'url': 'https://t.... |
| 3 | 130792364 | 1443596832182095881 | 2021-09-30T15:21:20.000Z | La #InteligenciaArtificial se alimenta de dato... | [{'start': 237, 'end': 260, 'url': 'https://t.... |
| 5 | 170027490 | 1443596526664753164 | 2021-09-30T15:20:07.000Z | 🤖Demos a conocer los proyectos que construyen ... | [{'start': 236, 'end': 259, 'url': 'https://t.... |
| 6 | 958956108890034176 | 1443596496792952838 | 2021-09-30T15:20:00.000Z | La #TransformaciónDigital hace especial uso de... | [{'start': 278, 'end': 301, 'url': 'https://t.... |
| ... | ... | ... | ... | ... | ... |
| 95 | 613074939 | 1443555476642299906 | 2021-09-30T12:37:00.000Z | 🤔 ¿Alguna vez te has preguntado qué es la #IA?... | [{'start': 79, 'end': 102, 'url': 'https://t.c... |
| 96 | 1046893958918541312 | 1443554349335076868 | 2021-09-30T12:32:31.000Z | ¿Sabíais que es posible usar nuestra #onesaitP... | [{'start': 213, 'end': 236, 'url': 'https://t.... |
| 97 | 93828125 | 1443554010561060868 | 2021-09-30T12:31:10.000Z | Un informe de @Gartner_inc apunta que un terci... | [{'start': 158, 'end': 181, 'url': 'https://t.... |
| 98 | 111706303 | 1443553726724288515 | 2021-09-30T12:30:03.000Z | Las empresas que consideran estratégica su #Co... | [{'start': 178, 'end': 201, 'url': 'https://t.... |


In [19]:
```python
# Keep only the first (t.co) link of each tweet
values = []
for v in df['entities.urls']:
    values.append(v[0]['url'])
df['entities.urls'] = values
df
```

| | author_id | id | created_at | text | entities.urls |
|---|---|---|---|---|---|
| 0 | 1361813588093018112 | 1443597309921562628 | 2021-09-30T15:23:14.000Z | ¿Cuándo es necesario desarrollar tecnologías d... | https://t.co/wzDSLyg1mP |
| 2 | 1123930563751051271 | 1443597049568583684 | 2021-09-30T15:22:12.000Z | GPT-3 ¿Un paso más cerca de la Inteligencia Ar... | https://t.co/sYENh8pRbn |
| 3 | 130792364 | 1443596832182095881 | 2021-09-30T15:21:20.000Z | La #InteligenciaArtificial se alimenta de dato... | https://t.co/t26M9JcDgk |
| 5 | 170027490 | 1443596526664753164 | 2021-09-30T15:20:07.000Z | 🤖Demos a conocer los proyectos que construyen ... | https://t.co/h6YpO6P2gV |
| 6 | 958956108890034176 | 1443596496792952838 | 2021-09-30T15:20:00.000Z | La #TransformaciónDigital hace especial uso de... | https://t.co/gKE7gPorjS |
| ... | ... | ... | ... | ... | ... |
| 95 | 613074939 | 1443555476642299906 | 2021-09-30T12:37:00.000Z | 🤔 ¿Alguna vez te has preguntado qué es la #IA?... | https://t.co/HPrH8Pxeab |
| 96 | 1046893958918541312 | 1443554349335076868 | 2021-09-30T12:32:31.000Z | ¿Sabíais que es posible usar nuestra #onesaitP... | https://t.co/WrlhYX1SFi |
| 97 | 93828125 | 1443554010561060868 | 2021-09-30T12:31:10.000Z | Un informe de @Gartner_inc apunta que un terci... | https://t.co/IQDINASLRr |
| 98 | 111706303 | 1443553726724288515 | 2021-09-30T12:30:03.000Z | Las empresas que consideran estratégica su #Co... | https://t.co/syjp1tMQqP |
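The loop above keeps only the first short `t.co` link of each tweet and needs the rows without links to be dropped first. Each entry in `entities.urls` also carries an `expanded_url` field (visible in the raw response above), so a variant can keep every destination link and return an empty list for tweets without links. A sketch, assuming a hypothetical helper `expanded_urls`:

```python
import pandas as pd


def expanded_urls(url_entities) -> list:
    """Return every expanded URL in one row's entities.urls (empty list if none)."""
    # Rows without links hold NaN (a float) instead of a list after json_normalize
    if not isinstance(url_entities, list):
        return []
    # Fall back to the short t.co link if expanded_url is absent
    return [u.get("expanded_url", u.get("url")) for u in url_entities]


# Usage, on the DataFrame before the rows without links are dropped:
# df["links"] = df["entities.urls"].map(expanded_urls)
```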


## Downloading to CSV

In [1]:
```python
params = {
    'query': '(#InteligenciaArtificial OR #IA) lang:es -is:retweet',
    'tweet.fields': 'created_at,entities',
    'expansions': 'author_id',
    'user.fields': 'username',
    'max_results': 100
}
url = "https://api.twitter.com/2/tweets/search/recent"
```

In [2]:
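```python
import os

import pandas as pd
import requests
from dotenv import load_dotenv

# This cell runs in a fresh kernel, so re-create everything it depends on
load_dotenv()
headers = {
    "Authorization": f"Bearer {os.environ.get('BEARER_TOKEN')}",
    "User-Agent": "v2FullArchiveSearchPython"
}

response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json()['meta'])


def get_data(url, params):
    """Download every page of results, following meta.next_token."""
    results = []
    page_params = dict(params)
    while True:
        response = requests.get(url, headers=headers, params=page_params)
        # Raise an exception if the response is not successful
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        payload = response.json()
        meta_data = payload['meta']
        # A page with no results has no 'data' key
        if 'data' in payload:
            results.append(pd.json_normalize(payload['data']))
        if 'next_token' not in meta_data:
            break
        token = meta_data['next_token']
        print(token)
        # Reuse the original query and only add the pagination token
        page_params = {**params, 'next_token': token}
    return pd.concat(results, ignore_index=True) if results else pd.DataFrame()


df = get_data(url, params)
df
```

In the original run this cell failed with `NameError: name 'requests' is not defined`: it was executed in a fresh kernel (note `In [1]`) without importing its dependencies again. The version above re-imports them and rebuilds `headers` before making any request.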
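`get_data` mixes the HTTP call with the paging logic and follows `next_token` until it disappears, which for a popular hashtag can use up the rate limit. A sketch that separates the two and adds an optional page limit; `paginate` and `max_pages` are hypothetical names, and `fetch_page` is any function that takes the parameters and returns the decoded JSON of one page:

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[dict], dict], params: dict,
             max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield each page's JSON, following meta.next_token until it disappears."""
    page_params = dict(params)  # copy so the caller's dict is not modified
    pages = 0
    while True:
        page = fetch_page(page_params)
        yield page
        pages += 1
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None or (max_pages is not None and pages >= max_pages):
            break
        page_params = {**params, "next_token": next_token}


# Usage with requests (the status check from the cell above is omitted for brevity):
# fetch = lambda p: requests.get(url, headers=headers, params=p).json()
# frames = [pd.json_normalize(pg["data"]) for pg in paginate(fetch, params, max_pages=5) if "data" in pg]
```

Because the paging logic never touches the network, it can be checked with a fake `fetch_page` that returns canned pages.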

In [None]:
```python
# Write without the pandas index, using an explicit .csv extension
df.to_csv('tweets_ej.csv', index=False)
```
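Columns such as `entities.urls` hold lists of dictionaries, and `to_csv` writes them using their Python representation (single quotes), which is not valid JSON and is awkward to parse back. A sketch that encodes those cells as JSON strings before writing, assuming a hypothetical helper `to_csv_json_safe`:

```python
import json

import pandas as pd


def to_csv_json_safe(df: pd.DataFrame, path: str) -> None:
    """Write df to CSV, encoding list/dict cells as JSON strings."""
    out = df.copy()
    for col in out.columns:
        # Only convert columns that actually contain lists or dicts
        if out[col].map(lambda v: isinstance(v, (list, dict))).any():
            out[col] = out[col].map(
                lambda v: json.dumps(v, ensure_ascii=False) if isinstance(v, (list, dict)) else v
            )
    out.to_csv(path, index=False)


# Reading back: keep ids as strings and decode the JSON cells
# back = pd.read_csv('tweets_ej.csv', dtype={'id': str, 'author_id': str})
# back['entities.urls'] = back['entities.urls'].map(json.loads)
```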