# Getting data from the API

## Get a Twitter Developer account

- Create a <a href="https://twitter.com/">Twitter</a> account or sign in to an existing one
- Apply for a <a href="https://developer.twitter.com/en/portal/petition/use-case">developer account</a>  
<br/>
<img src="img/twitter_api_1.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_2.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_3.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_4.png" style="width:50%;float:left;border:1px solid black">

#### How will you use the Twitter API or Twitter Data?

I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.

#### Are you planning to analyze Twitter data?

I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).


<br/><br/>
<img src="img/twitter_api_5.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_6.png" style="width:50%;float:left;border:1px solid black">


### Reply to the email

If Twitter sends an email asking for more information, reply with the following message.

<code>
    I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.
    I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).
    I will not use the API for tweeting, retweeting, or liking content. I will only use it to retrieve tweet content.
    The content of the tweets will not be shown. The content will only be used to carry out data analysis exercises during the course.
</code>

## Create an app

- Create a project
- Create an app inside the project
- Get and save the keys (copy all of them before continuing, since they cannot be viewed again later)

<br/>
<img src="img/twitter_api_8.png" style="width:30%;float:left;border:1px solid black">
<img src="img/twitter_api_7.png" style="width:70%;float:left;border:1px solid black">


## Load the token into environment variables

 - Store the token value in a .env file
 <code>BEARER_TOKEN='your bearer token value'</code>
 - Add the .env file to .gitignore if you are working in a repository

## Load the token value in the application

In [1]:
import os
from dotenv import load_dotenv
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token value into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
print(bearer_token)

None
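The `None` printed above means `BEARER_TOKEN` was not found in the environment (for example because the `.env` file was not found by `load_dotenv()`). That would explain the 401 Unauthorized responses further down: the header then literally contains `Bearer None`. A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, failing early if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; check that the .env file exists and was loaded by load_dotenv()"
        )
    return value


# Hypothetical use right after load_dotenv():
# bearer_token = require_env("BEARER_TOKEN")
```

Failing here gives a clear message in the first cell instead of an opaque 401 several cells later.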


## Define the API query

### Query URL

Choose the endpoint URL for the data you need, as described in the <a href="https://developer.twitter.com/en/docs/twitter-api/api-reference-index">API reference</a>.

In [52]:
url = "https://api.twitter.com/2/tweets/search/recent"

## Define additional parameters

Define values such as the date range, hashtag, content, and required fields.

In [53]:
params = {
    'query': '#machinelearning -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}
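The `query` value combines operators from the build-a-query guide: terms separated by a space must all match, `OR` inside parentheses groups alternatives, `-is:retweet` excludes retweets, `lang:` filters by language and `from:` by author. The date range is not part of the query string; on this endpoint it goes in separate parameters such as `start_time` (an ISO 8601 timestamp within the last seven days), and `max_results` accepts values from 10 to 100. A minimal sketch with two hypothetical helpers, `build_query` and `since`:

```python
from datetime import datetime, timedelta, timezone


def build_query(hashtags, any_of=False, from_user=None, lang=None, exclude_retweets=True):
    """Compose a v2 search query from hashtags plus common operators (hypothetical helper)."""
    tags = [t if t.startswith("#") else f"#{t}" for t in hashtags]
    if any_of and len(tags) > 1:
        # Parentheses + OR: any one of the hashtags is enough
        parts = ["(" + " OR ".join(tags) + ")"]
    else:
        # Space-separated terms: every hashtag must be present
        parts = list(tags)
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


def since(hours, now=None):
    """Return a start_time parameter `hours` before `now`, formatted in UTC."""
    now = now or datetime.now(timezone.utc)
    start = (now - timedelta(hours=hours)).astimezone(timezone.utc)
    return {"start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ")}


# Example: the first exercise's query, limited to the last 24 hours
params = {
    "query": build_query(["NLP"], from_user="@kdnuggets"),
    "tweet.fields": "created_at",
    "max_results": 100,
    **since(hours=24),
}
print(params["query"])  # #NLP from:kdnuggets -is:retweet
```

With `any_of=True, lang="es"` the same helper produces the Spanish query used later: `(#InteligenciaArtificial OR #IA) lang:es -is:retweet`.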

## Define the headers
The headers must carry the authentication token so that the request is authorized.


In [54]:
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
} 
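To check what is actually sent, the final URL can be reproduced with the standard library: `requests` form-encodes `params`, so spaces become `+` and characters such as `#`, `(` and `:` are percent-escaped (the same form that appears in the `ConnectionError` URL at the end of this notebook). A minimal sketch; `preview_request` is hypothetical and the token is a placeholder:

```python
from urllib.parse import urlencode


def preview_request(url, params, token):
    """Return the full URL and headers a GET with these params would use (hypothetical helper)."""
    headers = {
        # Bearer authentication: the word 'Bearer', one space, then the token
        "Authorization": f"Bearer {token}",
    }
    full_url = f"{url}?{urlencode(params, doseq=True)}"
    return full_url, headers


full_url, hdrs = preview_request(
    "https://api.twitter.com/2/tweets/search/recent",
    {"query": "#machinelearning -is:retweet", "tweet.fields": "created_at", "max_results": 100},
    token="PLACEHOLDER",
)
print(full_url)
# https://api.twitter.com/2/tweets/search/recent?query=%23machinelearning+-is%3Aretweet&tweet.fields=created_at&max_results=100
```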

## Send the request

In [55]:
import requests
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [401]>


Exception: (401, '{"title":"Unauthorized","detail":"Unauthorized","type":"about:blank","status":401}')

## Format the response

Convert the response into a pandas DataFrame

In [None]:
import pandas as pd
df = pd.json_normalize(response.json()['data'])
df

KeyError: 'data'
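The `KeyError: 'data'` is a consequence of the failed request above: an error body such as the 401 one has `title`, `detail` and `status` but no `data`, and a search that matches nothing returns only `meta` (with `result_count` 0). A minimal sketch of a defensive extraction; `extract_tweets` is a hypothetical helper:

```python
def extract_tweets(payload):
    """Return the tweets in a decoded v2 search response (hypothetical helper).

    Raises ValueError for an error body such as the 401 shown above, and
    returns an empty list when the search matched nothing.
    """
    if "data" not in payload and "status" in payload and "title" in payload:
        # Error body: {"title": "Unauthorized", "detail": ..., "status": 401}
        raise ValueError(f"{payload['status']} {payload['title']}: {payload.get('detail', '')}")
    # A search without matches returns 'meta' (result_count 0) but no 'data'
    return payload.get("data", [])


# Possible use in the cell above:
# df = pd.json_normalize(extract_tweets(response.json()))
```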

# Exercises

 Using the documentation of the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent">Recent</a> search endpoint and the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query">query</a> options, obtain:
 
 - A list of the creation dates of the tweets posted by user @kdnuggets that contain the hashtag #NLP

In [None]:
params = {
    'query': '#NLP from:kdnuggets -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}

In [None]:
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())


<Response [401]>


Exception: (401, '{"title":"Unauthorized","detail":"Unauthorized","type":"about:blank","status":401}')

In [None]:
df = pd.json_normalize(response.json()['data'])
df

| | created_at | id | text |
|---|---|---|---|
| 0 | 2021-09-28T14:14:03.000Z | 1442855125337485318 | .@BaiduResearch Releases PLATO-XL: World’s Fir... |
| 1 | 2021-09-27T17:22:15.000Z | 1442540100303327234 | Language, Vision and Deep Learning Models - Fr... |
| 2 | 2021-09-24T14:51:29.000Z | 1441414993346539521 | Get a Free Dataset Worth $1350 - test your mod... |
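The `created_at` values are UTC timestamps in ISO 8601 form, with milliseconds and a `Z` suffix. To sort or group them by date they need to be parsed; in pandas, `pd.to_datetime(df['created_at'])` does this. A stdlib sketch using the three sample values from the table above; `parse_created_at` is a hypothetical helper:

```python
from datetime import datetime, timezone


def parse_created_at(value):
    """Parse a created_at string like '2021-09-28T14:14:03.000Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = [
    "2021-09-28T14:14:03.000Z",
    "2021-09-27T17:22:15.000Z",
    "2021-09-24T14:51:29.000Z",
]
# Distinct calendar days on which the sample tweets were posted
days = sorted({parse_created_at(c).date().isoformat() for c in created})
print(days)  # ['2021-09-24', '2021-09-27', '2021-09-28']
```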


- A list of the texts and usernames of the tweets that contain the hashtags #NLP and #MachineLearning and are not retweets

In [None]:
params = {
    'query': '#NLP #MachineLearning -is:retweet',
    'tweet.fields':'created_at',
    'expansions': 'author_id',
    'user.fields':'username',
    'max_results':100
}

In [None]:
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

NameError: name 'requests' is not defined

In [None]:
df = pd.json_normalize(response.json()['data'])
df

| | created_at | text | id | author_id |
|---|---|---|---|---|
| 0 | 2021-09-30T15:16:55.000Z | HMU for quality assignment help\nEssays\nExams... | 1443595721349939201 | 1246118622306631682 |
| 1 | 2021-09-30T15:15:00.000Z | ⚕️#AI May Predict the Next High-Risk #Virus To... | 1443595239244042241 | 1403861754808049666 |
| 2 | 2021-09-30T15:13:45.000Z | Dm for help\nExams\nEssays\nBiology\nMath\nEng... | 1443594924901994499 | 1400694407498809348 |
| 3 | 2021-09-30T15:01:39.000Z | We can help with;\nMath\nEnglish\nHistory\nChe... | 1443591881112244230 | 1403014419953635332 |
| 4 | 2021-09-30T15:01:03.000Z | Lessons Learned: Training and Deploying State ... | 1443591728582168582 | 705539763349164032 |
| ... | ... | ... | ... | ... |
| 95 | 2021-09-30T09:25:27.000Z | Benchmark Analytics Launches Risk Solutions Bu... | 1443507272802525191 | 918112383628963841 |
| 96 | 2021-09-30T09:25:26.000Z | Mark Wilson's Abacai partners with Ticker \| In... | 1443507269916930049 | 918112383628963841 |
| 97 | 2021-09-30T09:21:57.000Z | بــري وجــعــجــع وجــنـبــلاط بـخـطــر\n#Arti... | 1443506390182633477 | 1355257214529892352 |
| 98 | 2021-09-30T09:21:30.000Z | بــري وجــعــجــع وجــنـبــلاط بـخـطــر\n#Arti... | 1443506279562072064 | 1364806129717563394 |


In [None]:
df_users = pd.json_normalize(response.json()['includes']['users'])
df_users.rename(columns={'id': 'author_id'}, inplace=True)
df_users


| | author_id | name | username |
|---|---|---|---|
| 0 | 1246118622306631682 | MAYA ÀSSIGNMENT AND ONLINE CLASSES HELP | Mayassignment |
| 1 | 1403861754808049666 | د. خلود صالح المانع \| Dr. Khulood Almani | Khulood_Almani |
| 2 | 1400694407498809348 | JM ASSIGNMENTS HELP | ASSIGNMENTSHE17 |
| 3 | 1403014419953635332 | BEST ASSIGNMENTS AND ESSAY HELP | MAYASSIGNMENT1 |
| 4 | 705539763349164032 | ipfconline | ipfconline1 |
| 5 | 49913640 | Vijay | Vijaypal87 |
| 6 | 467513287 | Iain Brown, PhD | IainLJBrown |
| 7 | 1083699084760895496 | Nikseam | NikseamC |
| 8 | 1400418183052283911 | Indika AI | Indika_AI |
| 9 | 918112383628963841 | David Sobo | DS_Analytics |


In [None]:
pd.merge(df, df_users, on="author_id")

| | created_at | text | id | author_id | name | username |
|---|---|---|---|---|---|---|
| 0 | 2021-09-30T15:16:55.000Z | HMU for quality assignment help\nEssays\nExams... | 1443595721349939201 | 1246118622306631682 | MAYA ÀSSIGNMENT AND ONLINE CLASSES HELP | Mayassignment |
| 1 | 2021-09-30T15:15:00.000Z | ⚕️#AI May Predict the Next High-Risk #Virus To... | 1443595239244042241 | 1403861754808049666 | د. خلود صالح المانع \| Dr. Khulood Almani | Khulood_Almani |
| 2 | 2021-09-30T11:01:06.000Z | Why #organizations are slow to patch even high... | 1443531341883117574 | 1403861754808049666 | د. خلود صالح المانع \| Dr. Khulood Almani | Khulood_Almani |
| 3 | 2021-09-30T15:13:45.000Z | Dm for help\nExams\nEssays\nBiology\nMath\nEng... | 1443594924901994499 | 1400694407498809348 | JM ASSIGNMENTS HELP | ASSIGNMENTSHE17 |
| 4 | 2021-09-30T15:01:39.000Z | We can help with;\nMath\nEnglish\nHistory\nChe... | 1443591881112244230 | 1403014419953635332 | BEST ASSIGNMENTS AND ESSAY HELP | MAYASSIGNMENT1 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 2021-09-30T10:10:03.000Z | How AI Completed Beethoven's Unfinished Tenth ... | 1443518497057878016 | 1017816919980769280 | Ravi Dugh | ravidugh |
| 96 | 2021-09-30T09:56:20.000Z | Buy Verified PayPal Accounts-Personal &amp; bu... | 1443515046445793281 | 1370393914692280322 | usaxoom | usaxoom |
| 97 | 2021-09-30T09:42:43.000Z | HIRING: Research Internship, NLP (Spring 2022)... | 1443511619070406659 | 1014944290236174336 | ai-jobs.net | ai_jobsNET |
| 98 | 2021-09-30T09:35:11.000Z | Ever wondered how to approach an evaluation of... | 1443509723773427713 | 946706181913022466 | deepset | deepset_ai |
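With `expansions=author_id` the API returns each referenced author once, under `includes.users`, while every tweet carries only its `author_id`. `pd.merge` defaults to an inner join, so a tweet whose author were missing from `includes` would be silently dropped, and in the output above the rows came back grouped by author (rows 1 and 2) rather than in tweet order. In pandas, passing `how='left'` keeps every tweet in its original order. A stdlib sketch of that left join; `attach_authors` and the sample payload are hypothetical:

```python
def attach_authors(payload):
    """Left-join each tweet with its author from includes.users, keeping tweet order."""
    users_by_id = {u["id"]: u for u in payload.get("includes", {}).get("users", [])}
    rows = []
    for tweet in payload.get("data", []):
        user = users_by_id.get(tweet["author_id"], {})
        rows.append({
            **tweet,
            # None when the author is not in includes (an inner join would drop the tweet)
            "name": user.get("name"),
            "username": user.get("username"),
        })
    return rows


# Hypothetical, trimmed payload with the same shape as the API response
sample = {
    "data": [
        {"id": "1", "author_id": "10", "text": "first"},
        {"id": "2", "author_id": "20", "text": "second"},
        {"id": "3", "author_id": "10", "text": "third"},
    ],
    "includes": {"users": [{"id": "10", "name": "Ana", "username": "ana"}]},
}
for row in attach_authors(sample):
    print(row["id"], row["username"])
# 1 ana
# 2 None
# 3 ana
```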


- A list of the texts and links of the tweets in Spanish that contain the hashtags #InteligenciaArtificial or #IA

In [None]:
params = {
    'query': '(#InteligenciaArtificial OR #IA) lang:es -is:retweet',
    'tweet.fields':'created_at,entities',
    'expansions': 'author_id',
    'user.fields':'username',
    'max_results':100
}

In [None]:
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'author_id': '1361813588093018112', 'entities': {'urls': [{'start': 66, 'end': 89, 'url': 'https://t.co/wzDSLyg1mP', 'expanded_url': 'https://youtu.be/NN3UzxlQO2E', 'display_url': 'youtu.be/NN3UzxlQO2E', 'images': [{'url': 'https://pbs.twimg.com/news_img/1443579455406936072/UDGc42aI?format=jpg&name=orig', 'width': 1280, 'height': 720}, {'url': 'https://pbs.twimg.com/news_img/1443579455406936072/UDGc42aI?format=jpg&name=150x150', 'width': 150, 'height': 150}], 'status': 200, 'title': 'fAIr LAC Jalisco presenta: InventIA, foro de Inteligencia Artificial 2021 - 30 de septiembre de 2021', 'description': 'WEB OFICIAL → https://www.inventiajalisco.com/https://twitter.com/Hub_IA_Tec https://www.facebook.com/HubIATechttps://www.linkedin.com/company/hub-ia-tec/ ht...', 'unwound_url': 'https://www.youtube.com/watch?v=NN3UzxlQO2E&feature=youtu.be'}], 'hashtags': [{'start': 48, 'end': 51, 'tag': 'IA'}, {'start': 92, 'end': 105, 'tag': 'InventIA2021'}]}, 'id': '144359730

In [None]:
df = pd.json_normalize(response.json()['data'])
# Drop the entity columns that are not needed (ignore any that are absent in this page)
df.drop(columns=['entities.annotations', 'entities.hashtags', 'entities.mentions'],
        errors='ignore', inplace=True)

In [None]:
# Drop the tweets that contain no URLs ('entities.urls' is missing)
df = df.dropna(subset=['entities.urls'])
df

| | author_id | id | created_at | text | entities.urls |
|---|---|---|---|---|---|
| 0 | 1361813588093018112 | 1443597309921562628 | 2021-09-30T15:23:14.000Z | ¿Cuándo es necesario desarrollar tecnologías d... | [{'start': 66, 'end': 89, 'url': 'https://t.co... |
| 2 | 1123930563751051271 | 1443597049568583684 | 2021-09-30T15:22:12.000Z | GPT-3 ¿Un paso más cerca de la Inteligencia Ar... | [{'start': 247, 'end': 270, 'url': 'https://t.... |
| 3 | 130792364 | 1443596832182095881 | 2021-09-30T15:21:20.000Z | La #InteligenciaArtificial se alimenta de dato... | [{'start': 237, 'end': 260, 'url': 'https://t.... |
| 5 | 170027490 | 1443596526664753164 | 2021-09-30T15:20:07.000Z | 🤖Demos a conocer los proyectos que construyen ... | [{'start': 236, 'end': 259, 'url': 'https://t.... |
| 6 | 958956108890034176 | 1443596496792952838 | 2021-09-30T15:20:00.000Z | La #TransformaciónDigital hace especial uso de... | [{'start': 278, 'end': 301, 'url': 'https://t.... |
| ... | ... | ... | ... | ... | ... |
| 95 | 613074939 | 1443555476642299906 | 2021-09-30T12:37:00.000Z | 🤔 ¿Alguna vez te has preguntado qué es la #IA?... | [{'start': 79, 'end': 102, 'url': 'https://t.c... |
| 96 | 1046893958918541312 | 1443554349335076868 | 2021-09-30T12:32:31.000Z | ¿Sabíais que es posible usar nuestra #onesaitP... | [{'start': 213, 'end': 236, 'url': 'https://t.... |
| 97 | 93828125 | 1443554010561060868 | 2021-09-30T12:31:10.000Z | Un informe de @Gartner_inc apunta que un terci... | [{'start': 158, 'end': 181, 'url': 'https://t.... |
| 98 | 111706303 | 1443553726724288515 | 2021-09-30T12:30:03.000Z | Las empresas que consideran estratégica su #Co... | [{'start': 178, 'end': 201, 'url': 'https://t.... |


In [None]:
# Keep only the first (t.co-shortened) URL of each tweet
df['entities.urls'] = df['entities.urls'].apply(lambda urls: urls[0]['url'])
df

| | author_id | id | created_at | text | entities.urls |
|---|---|---|---|---|---|
| 0 | 1361813588093018112 | 1443597309921562628 | 2021-09-30T15:23:14.000Z | ¿Cuándo es necesario desarrollar tecnologías d... | https://t.co/wzDSLyg1mP |
| 2 | 1123930563751051271 | 1443597049568583684 | 2021-09-30T15:22:12.000Z | GPT-3 ¿Un paso más cerca de la Inteligencia Ar... | https://t.co/sYENh8pRbn |
| 3 | 130792364 | 1443596832182095881 | 2021-09-30T15:21:20.000Z | La #InteligenciaArtificial se alimenta de dato... | https://t.co/t26M9JcDgk |
| 5 | 170027490 | 1443596526664753164 | 2021-09-30T15:20:07.000Z | 🤖Demos a conocer los proyectos que construyen ... | https://t.co/h6YpO6P2gV |
| 6 | 958956108890034176 | 1443596496792952838 | 2021-09-30T15:20:00.000Z | La #TransformaciónDigital hace especial uso de... | https://t.co/gKE7gPorjS |
| ... | ... | ... | ... | ... | ... |
| 95 | 613074939 | 1443555476642299906 | 2021-09-30T12:37:00.000Z | 🤔 ¿Alguna vez te has preguntado qué es la #IA?... | https://t.co/HPrH8Pxeab |
| 96 | 1046893958918541312 | 1443554349335076868 | 2021-09-30T12:32:31.000Z | ¿Sabíais que es posible usar nuestra #onesaitP... | https://t.co/WrlhYX1SFi |
| 97 | 93828125 | 1443554010561060868 | 2021-09-30T12:31:10.000Z | Un informe de @Gartner_inc apunta que un terci... | https://t.co/IQDINASLRr |
| 98 | 111706303 | 1443553726724288515 | 2021-09-30T12:30:03.000Z | Las empresas que consideran estratégica su #Co... | https://t.co/syjp1tMQqP |
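The cell above keeps only the first `t.co` link of each tweet. As the raw response shows, each item in `entities.urls` also carries `expanded_url` and sometimes `unwound_url`, so collecting every link, or the real destination instead of the shortener, needs a slightly different extraction. A stdlib sketch; `tweet_links` is a hypothetical helper and the sample item is trimmed from the response printed earlier:

```python
def tweet_links(url_items, prefer_expanded=True):
    """Return all links of one tweet from its entities.urls list (hypothetical helper).

    json_normalize leaves NaN in 'entities.urls' for tweets without links,
    so anything that is not a list yields an empty result.
    """
    if not isinstance(url_items, list):
        return []
    links = []
    for item in url_items:
        if prefer_expanded:
            # unwound_url (final destination) > expanded_url > the t.co short link
            links.append(item.get("unwound_url") or item.get("expanded_url") or item["url"])
        else:
            links.append(item["url"])
    return links


sample_urls = [{
    "url": "https://t.co/wzDSLyg1mP",
    "expanded_url": "https://youtu.be/NN3UzxlQO2E",
    "unwound_url": "https://www.youtube.com/watch?v=NN3UzxlQO2E&feature=youtu.be",
}]
print(tweet_links(sample_urls))
# ['https://www.youtube.com/watch?v=NN3UzxlQO2E&feature=youtu.be']

# Possible use, before 'entities.urls' is overwritten with a single URL:
# df['links'] = df['entities.urls'].apply(tweet_links)
```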


## Export to CSV

In [None]:
params = {
    'query': '(#InteligenciaArtificial OR #IA) lang:es -is:retweet',
    'tweet.fields':'created_at,entities',
    'expansions': 'author_id',
    'user.fields':'username',
    'max_results':100
}
url = "https://api.twitter.com/2/tweets/search/recent"

In [None]:
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json()['meta'])

def get_data(url, params):
    """Request every page of results, following meta['next_token']."""
    params = dict(params)  # work on a copy so the caller's dict is unchanged
    results = []
    while True:
        response = requests.get(url, headers=headers, params=params)
        # Raise an exception if the response is not successful
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        payload = response.json()
        # A page with no matching tweets has no 'data' key
        results.append(pd.json_normalize(payload.get('data', [])))
        meta_data = payload['meta']
        if 'next_token' not in meta_data:
            break
        token = meta_data['next_token']
        print(token)
        # Same query, next page
        params['next_token'] = token
    return pd.concat(results, ignore_index=True)

df = get_data(url, params)
df

<Response [200]>
{'newest_id': '1443597309921562628', 'oldest_id': '1443553004691410950', 'result_count': 100, 'next_token': 'b26v89c19zqg8o3fpds7a5e44knh5clsa5aczlrmutwn1'}
b26v89c19zqg8o3fpds7a5e44knh5clsa5aczlrmutwn1
b26v89c19zqg8o3fpds7a5cm4y0gt09icsr51lfuk1qbh
b26v89c19zqg8o3fpds7a159nw2przhytgh8u6359ezjx
b26v89c19zqg8o3fpds79yzxllg2a94nya952dzpe9ri5
b26v89c19zqg8o3fpds79ww445r4jz90whc2on8i0d8jh
b26v89c19zqg8o3fpds79wub7ilcw99j1rkz17xyrx7jx
b26v89c19zqg8o3fpds6v8as9kcpmipfrltdzvz2loz99
b26v89c19zqg8o3fpds6v66cwmp6bpvjjoz23cea0p665
b26v89c19zqg8o3fpds6v42tqxbhlr0lsoxaewqvmmav1
b26v89c19zqg8o3fpds6v1wwkhnel3b3gabri0pz8fpml
b26v89c19zqg8o3fpds6uzr9vsuxsiw80gdr4zewxy6pp
b26v89c19zqg8o3fpds6uxng286qhin7fzof1rjgx4kn1


ConnectionError: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /2/tweets/search/recent?query=%28%23InteligenciaArtificial+OR+%23IA%29+lang%3Aes+-is%3Aretweet&tweet.fields=created_at%2Centities&expansions=author_id&user.fields=username&next_token=b26v89c19zqg8o3fpds6uxng286qhin7fzof1rjgx4kn1&max_results=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2a9e4b34c0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
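The loop above stops only when `next_token` disappears, so a busy query keeps requesting pages until something fails; here the connection dropped after a dozen or so pages, and because the exception escapes the function, every page already downloaded was lost. A minimal sketch of a bounded version: a page limit, a few retries with a pause that grows with each attempt, and the tweets gathered so far are returned if a page keeps failing. `paginate` and `fetch_page` are hypothetical (`fetch_page` stands in for `requests.get(...).json()`), and the retry counts and delays are arbitrary choices:

```python
import time


def paginate(fetch_page, params, max_pages=10, retries=3, backoff=2.0):
    """Collect tweets page by page by following meta['next_token'] (hypothetical helper).

    fetch_page(params) must return the decoded JSON of one response.
    At most max_pages pages are requested. If a page still fails after
    `retries` attempts, the tweets gathered so far are returned.
    """
    params = dict(params)  # copy: next_token is added to this dict only
    tweets = []
    for _ in range(max_pages):
        for attempt in range(retries):
            try:
                payload = fetch_page(params)
                break
            except OSError:
                # requests' ConnectionError and Timeout derive from OSError
                if attempt == retries - 1:
                    return tweets
                time.sleep(backoff * (attempt + 1))  # wait longer after each failure
        tweets.extend(payload.get("data", []))
        token = payload.get("meta", {}).get("next_token")
        if token is None:
            break
        params["next_token"] = token
    return tweets


# Hypothetical wiring with the notebook's objects:
# def fetch_page(p):
#     r = requests.get(url, headers=headers, params=p)
#     if r.status_code != 200:
#         raise Exception(r.status_code, r.text)
#     return r.json()
# df = pd.json_normalize(paginate(fetch_page, params, max_pages=20))
```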

In [None]:
# Save with a .csv extension and without the DataFrame index
df.to_csv('tweets_ej.csv', index=False)
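Tweet text often contains line breaks, commas and quotes (see the `\n` in the tables above), so those fields must be quoted in the CSV file; both `DataFrame.to_csv` and Python's `csv` module do this automatically. A stdlib sketch showing that such rows survive a write/read round trip; the rows are trimmed samples from the outputs above and the in-memory buffer stands in for a real file:

```python
import csv
import io

rows = [
    {"id": "1443595721349939201", "created_at": "2021-09-30T15:16:55.000Z",
     "text": "HMU for quality assignment help\nEssays\nExams"},
    {"id": "1443511619070406659", "created_at": "2021-09-30T09:42:43.000Z",
     "text": "HIRING: Research Internship, NLP (Spring 2022)"},
]

# For a real file: open('tweets_ej.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["id", "created_at", "text"])
writer.writeheader()
writer.writerows(rows)  # fields containing newlines or commas are quoted

buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored == rows)  # True
```

Opening the file with `newline=''` matters: without it, line breaks embedded in quoted fields are not read back correctly.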