# Obtener datos de la API

## Obtener una cuenta de Twitter Developer

- Crear una cuenta en <a href="https://twitter.com/">Twitter</a> o ingresar a una ya creada
- Solicitar una <a href="https://developer.twitter.com/en/portal/petition/use-case">cuenta de developer</a>  
<br/>
<img src="img/twitter_api_1.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_2.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_3.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_4.png" style="width:50%;float:left;border:1px solid black">

#### How will you use the Twitter API or Twitter Data?

I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.

#### Are you planning to analyze Twitter data?

I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos)


<br/><br/>
<img src="img/twitter_api_5.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_6.png" style="width:50%;float:left;border:1px solid black">


### Responder mail

Si Twitter envía un mail solicitando más información responder con el siguiente mensaje.

<code>
    I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.
    I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos)
    I will not be using the Tweeting, Retweeting, or liking content. I will only use the API to obtain tweets content.
    The content of the tweets will not be shown. The content will only be used to carry out data analysis exercises during the course.
</code>

## Crear aplicación

- Crear proyecto
- Crear aplicación dentro del proyecto
- Obtener y guardar claves (copiar todas las claves antes de continuar ya que no pueden ser accedidas más adelante)

<br/>
<img src="img/twitter_api_8.png" style="width:30%;float:left;border:1px solid black">
<img src="img/twitter_api_7.png" style="width:70%;float:left;border:1px solid black">


## Cargar Token en variables de entorno

 - Cargar el valor del token en un archivo .env
 <code>export 'BEARER_TOKEN'='valor del bearer token' </code>
 - Agregar el archivo .env dentro del .gitignore en caso de trabajar en repositorio

## Cargar valor del Token en la aplicación

In [2]:
import os
from dotenv import load_dotenv
# Cargar valores del archivo .env en las variables de entorno
load_dotenv()
# Cargar valor del token a variable
bearer_token = os.environ.get("BEARER_TOKEN")

## Definir consulta a la API

### URL de la consulta

Definir la URL de acuerdo a los datos requeridos de acuerdo a la documentación de la <a href="https://developer.twitter.com/en/docs/twitter-api/api-reference-index">API</a>

In [3]:
url = "https://api.twitter.com/2/tweets/search/recent"

## Definir parámetros adicionales

Definr valores como el rango de fecha, hashtag, contenido y campos requeridos.

In [4]:
params = {
    'query': '#machinelearning -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}

## Definir cabecera
La cabecera debe llevar el Token de autenticación para que la consulta sea autorizada


In [5]:
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
} 

## Realizar consulta

In [6]:
import requests
response = requests.get(url, headers=headers, params=params)
print(response)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'created_at': '2021-10-04T14:57:55.000Z', 'id': '1445040491490328579', 'text': 'The latest Great ideas from Good Comms! https://t.co/mkSMskdnsq Thanks to @CynoteckTS #ai #machinelearning'}, {'created_at': '2021-10-04T14:57:54.000Z', 'id': '1445040487094788104', 'text': "Machine learning can pinpoint 'genes of importance' that help crops to grow with less fertilizer, according to a new study. It can also predict additional traits in plants and disease outcomes in animals!\n\n#futurepositive #machinelearning #ml #artificialintelligence #ai #plants https://t.co/7vkY81PJqz"}, {'created_at': '2021-10-04T14:57:26.000Z', 'id': '1445040368429449219', 'text': 'You should read Prof. #ArtificialIntelligence #learning #machinelearning via https://t.co/eBW8Lmmpx7 https://t.co/hIovg91MRW'}, {'created_at': '2021-10-04T14:57:14.000Z', 'id': '1445040318223683586', 'text': '#hclswlobp\xa0#nocode\xa0#lowcode\xa0#javascript\xa0#github\xa0#nodejs\xa0#cybersecurity\xa0#devops\xa0

## Formatear respuesta

Convertir respuesta en un dataframe de Pandas

In [7]:
import pandas as pd
df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,created_at,id,text
0,2021-10-04T14:57:55.000Z,1445040491490328579,The latest Great ideas from Good Comms! https:...
1,2021-10-04T14:57:54.000Z,1445040487094788104,Machine learning can pinpoint 'genes of import...
2,2021-10-04T14:57:26.000Z,1445040368429449219,You should read Prof. #ArtificialIntelligence ...
3,2021-10-04T14:57:14.000Z,1445040318223683586,#hclswlobp #nocode #lowcode #javascript #githu...
4,2021-10-04T14:56:11.000Z,1445040054682984452,Most people come face to face with AI technolo...
...,...,...,...
95,2021-10-04T14:35:01.000Z,1445034728428507147,Top Machine Learning Projects Beginners Must ...
96,2021-10-04T14:34:55.000Z,1445034702243569668,The latest The Frieling-Bailey Daily! https://...
97,2021-10-04T14:34:44.000Z,1445034656542478347,The latest The Education Daily! https://t.co/Z...
98,2021-10-04T14:34:41.000Z,1445034643032510465,The latest The c&amp;c Daily! https://t.co/Cmr...


# Ejercicios

 A partir de la documentación del endpoint <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent"> Recent </a> y las opciones de <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query"> query </a> obtener:
 
 - Una lista de las fechas y creación de los tweets realizados por el usuario @kdnuggets que contenga el hashtag #NLP

In [8]:
user='@kdnuggets'
hashtag='#NLP'
params = {
    'query': f'{user} {hashtag} -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}
response = requests.get(url, headers=headers, params=params)
print(response)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
df = pd.json_normalize(response.json()['data'])
df

<Response [200]>


Unnamed: 0,created_at,id,text
0,2021-09-26T15:37:38.000Z,1442151386163003396,Understanding the day-to-day applications of #...
1,2021-09-26T13:45:04.000Z,1442123056613249033,Understanding the day-to-day applications of #...
2,2021-09-25T11:46:43.000Z,1441730882310586375,Relax! #DataScientists will not go extinct in ...
3,2021-09-24T10:56:47.000Z,1441355928792551425,Are Larger Language Models Less Truthful?\n\n#...


- Una lista de los textos y nombres de usuario correspondientes a los tweets que contengan los hashtags #NLP y #MachineLearning que no sean retweets

In [54]:
hashtag='#NLP #MachineLearning'
params = {
    'query': f'{hashtag} -is:retweet',
    'user.fields': 'username',
    'expansions': 'author_id',
    'max_results': 100
}
response = requests.get(url, headers=headers, params=params)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
df_tweets = pd.json_normalize(response.json()['data'])
authors = pd.json_normalize(response.json()['includes']['users'])
authors

Unnamed: 0,id,name,username
0,1403861754808049666,د. خلود صالح المانع | Dr. Khulood Almani,Khulood_Almani
1,467513287,"Iain Brown, PhD",IainLJBrown
2,1029341716883492864,TechTalk™ Boulevard,TechTalkBlvd
3,1346747196452925442,Sakib Hossain,sakib8783
4,1083699084760895496,Nikseam,NikseamC
5,1264433760,Elitsa Krumova,Eli_Krumova
6,918112383628963841,David Sobo,DS_Analytics
7,1435645100051206154,Neil Esparz,EsparzNeil
8,1269107130776223745,Grepnetics,Grepnetics
9,1442886288927981570,Tony Edwin,TonyEdw68907630


In [66]:
list = []
for index, row in df_tweets.iterrows():
    author = authors.loc[authors['id'] == row['author_id']].username.item()
    text = row['text']
    list.append({'author': author, 'text': text})
print(list)

[{'author': 'Khulood_Almani', 'text': '🚘#ElectricVehicles set to triple in 2021\n\n#EV #SmartCities #Technology #innovation #Python\n#DigitalMarketing #Industry40 #ArtificialIntelligence #MachineLearning #CyberSecurity #DataScience #bot #coding  #NodeJS #NLP #javascript #django #TensorFlow #devops #100DaysOfCode #RHOBH https://t.co/Dngv9qFTZY'}, {'author': 'IainLJBrown', 'text': 'Learn tips on how to break the cycle of bias in AI decision making to build a better place for humanity. #WomenInAnalytics #ArtificialIntelligence #AI #DataScience #100DaysOfCode #Python #MachineLearning #BigData #DeepLearning #NLP #Robots #IoT https://t.co/kU8x5Qz2ye'}, {'author': 'TechTalkBlvd', 'text': 'Should you fear #MachineLearning\n\n#DigitalTransformation #DeepLearning\n#programming #Database #AI #coding #DataScientists #Analytics #BigData #Rstats #AI #Reactjs #Python #DataScience #Tech #IIoT #ML #NLP #javascript #TensorFlow #DEVCommunity #Serverless #100DaysOfCode #Dataviz https://t.co/r02I1SIiPq'}, 

- Una lista de los textos y enlaces de los tweets que contengan los hashtags #InteligenciaArtificial o #IA en español

In [71]:
hashtag='#InteligenciaArtificial OR #IA'
params = {
    'query': f'{hashtag} -is:retweet lang:es',
    'user.fields': 'username',
    'expansions': 'author_id',
    'max_results': 100
}
response = requests.get(url, headers=headers, params=params)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
df_tweets = pd.json_normalize(response.json()['data'])
authors = pd.json_normalize(response.json()['includes']['users'])
authors

Unnamed: 0,id,name,username
0,751352816762023936,Big Data Campus,campusbigdata
1,110380312,Plain Concepts,plainconcepts
2,1100696023528292354,AllianzGI_ESP,AllianzGI_Esp
3,205640203,Telefónica Grandes Empresas,TE_GranEmpresa
4,562466814,Rigoberto J Nodal,nodalrigobertoj
...,...,...,...
82,1218902551,Carlos Quenan,CarlosAQuenan
83,62426117,Julio Cesar,CesarRuizRuiz
84,275179722,Tania Martagón,TaniaMartagon
85,372460779,IFEX ALC,IFEXALC


In [74]:
list = []
for index, row in df_tweets.iterrows():
    author = authors.loc[authors['id'] == row['author_id']].username.item()
    tid = row['id']
    text = row['text']
    list.append({'url': f'https://twitter.com/{author}/status/{tid}', 'text': text})
print(list)

[{'url': 'https://twitter.com/campusbigdata/status/1445056829772476417', 'text': 'RT @InnovaCuenta: Datos Personales e Inteligencia Artificial🤖 \nGuillermo Cernuda, Director de Operaciones en DoGood y tutor del Máster en B…'}, {'url': 'https://twitter.com/plainconcepts/status/1445056815402782721', 'text': 'Gracias a nuestra solución Smart Concepts, @tecnalia ha conseguido la digitalización de sus laboratorios y además la aceleración de los procesos de #innovacion  y comunicación.\n\n#IoT #CloudComputing #IA #Industria40\n\n¿Quieres descubrir más?\nhttps://t.co/PkR0hfGDHa'}, {'url': 'https://twitter.com/AllianzGI_Esp/status/1445056153256402954', 'text': 'Perspectivas de la #inteligenciaartificial para 2021\n\nComo inversores, nos centramos en identificar empresas que potencien la IA para generar resultados que beneficien a todas sus partes interesadas. \n\nhttps://t.co/xWyx2yFOJr https://t.co/mQAYocCafQ'}, {'url': 'https://twitter.com/TE_GranEmpresa/status/1445056115335524358', 'text': 

## Descargar a CSV

In [19]:
df.to_csv('tweets_ej.csv')  