# Obtener datos de la API

## Obtener una cuenta de Twitter Developer

- Crear una cuenta en <a href="https://twitter.com/">Twitter</a> o ingresar a una ya creada
- Solicitar una <a href="https://developer.twitter.com/en/portal/petition/use-case">cuenta de developer</a>  
<br/>
<img src="img/twitter_api_1.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_2.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_3.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_4.png" style="width:50%;float:left;border:1px solid black">

#### How will you use the Twitter API or Twitter Data?

I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.

#### Are you planning to analyze Twitter data?

I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos)


<br/><br/>
<img src="img/twitter_api_5.png" style="width:50%;float:left;border:1px solid black">
<img src="img/twitter_api_6.png" style="width:50%;float:left;border:1px solid black">


### Responder mail

Si Twitter envía un mail solicitando más información responder con el siguiente mensaje.

<code>
    I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.
    I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos)
    I will not be using the Tweeting, Retweeting, or liking content. I will only use the API to obtain tweets content.
    The content of the tweets will not be shown. The content will only be used to carry out data analysis exercises during the course.
</code>

## Crear aplicación

- Crear proyecto
- Crear aplicación dentro del proyecto
- Obtener y guardar claves (copiar todas las claves antes de continuar ya que no pueden ser accedidas más adelante)

<br/>
<img src="img/twitter_api_8.png" style="width:30%;float:left;border:1px solid black">
<img src="img/twitter_api_7.png" style="width:70%;float:left;border:1px solid black">


## Cargar Token en variables de entorno

 - Cargar el valor del token en un archivo .env
 <code>export 'BEARER_TOKEN'='valor del bearer token' </code>
 - Agregar el archivo .env dentro del .gitignore en caso de trabajar en repositorio

## Cargar valor del Token en la aplicación

In [116]:
import os
from dotenv import load_dotenv
# Cargar valores del archivo .env en las variables de entorno
load_dotenv()
# Cargar valor del token a variable
bearer_token = os.environ.get("BEARER_TOKEN")


## Definir consulta a la API

### URL de la consulta

Definir la URL de acuerdo a los datos requeridos de acuerdo a la documentación de la <a href="https://developer.twitter.com/en/docs/twitter-api/api-reference-index">API</a>

In [80]:
url = "https://api.twitter.com/2/tweets/search/recent"

## Definir parámetros adicionales

Definr valores como el rango de fecha, hashtag, contenido y campos requeridos.

In [81]:
params = {
    'query': '#machinelearning -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}

## Definir cabecera
La cabecera debe llevar el Token de autenticación para que la consulta sea autorizada


In [82]:
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
} 

## Realizar consulta

In [83]:
import requests
response = requests.get(url, headers=headers, params=params)
print(response)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'created_at': '2021-10-07T13:22:10.000Z', 'id': '1446103559129403392', 'text': 'Understand the concept of #dataannotation and its advantages! https://t.co/i1ITnRN1iz\n\n#DataLabeling #DataMining #ImageAnnotation #ArtificialIntelligence #AI #MachineLearning #ML #DeepLearning\n\ncc: @alexjc @karpathy @AndrewYNg @andyjankowski @bobgourley @CadeMetz @chrismessina https://t.co/W4ipgg6wvL'}, {'created_at': '2021-10-07T13:22:04.000Z', 'id': '1446103533036597248', 'text': 'How To Identify AI Opportunities https://t.co/VOzfpmIqSa … \n\n#MachineLearning #DataScience #Python #AI #100DaysOfCode #DEVCommunity #IoT #flutter #javascript #Serverless #womenintech #cybersecurity  #CodeNewbie #technology #WomenWhoCode  #DeepLearning #Job #innovation #startups https://t.co/sJMC03OeM7'}, {'created_at': '2021-10-07T13:22:01.000Z', 'id': '1446103522789842945', 'text': '#python #statistics #pythonprogramming #datascience #bigdata #machinelearning #programming  #ArtificialIntelligen

## Formatear respuesta

Convertir respuesta en un dataframe de Pandas

In [8]:
import pandas as pd
df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,created_at,id,text
0,2021-10-04T20:18:05.000Z,1445121066532175879,#python #datascience #machinelearning #compute...
1,2021-10-04T20:18:04.000Z,1445121060869795844,#timeseriesanalysis #engineering #ai #machinel...
2,2021-10-04T20:17:54.000Z,1445121019497177090,Improve the UX of Your Website \n#ArtificialIn...
3,2021-10-04T20:17:51.000Z,1445121004292952075,#hclswlobp #nocode #lowcode #javascript #githu...
4,2021-10-04T20:17:48.000Z,1445120993673060360,I've just updated my webpage with some great a...
...,...,...,...
95,2021-10-04T19:59:18.000Z,1445116338473553923,https://t.co/toHC09MXNG \n\n#EARTH #SCiENCES \...
96,2021-10-04T19:59:17.000Z,1445116331590881289,Containerize Springboot CRUD App With Docker &...
97,2021-10-04T19:59:02.000Z,1445116271926956032,DeepMind is now able to accurately predict inc...
98,2021-10-04T19:58:41.000Z,1445116182541991956,Mexican B2B Payments Company Higo Raises A $23...


# Ejercicios

 A partir de la documentación del endpoint <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent"> Recent </a> y las opciones de <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query"> query </a> obtener:
 
 - Una lista de las fechas y creación de los tweets realizados por el usuario @kdnuggets que contenga el hashtag #NLP

In [78]:
user='@kdnuggets'
hashtag='#NLP'
params = {
    'query': f'{user} {hashtag} -is:retweet',
    'tweet.fields': 'created_at',
    'max_results': 100
} 
response = requests.get(url, headers=headers, params=params)
print(response)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
df = pd.json_normalize(response.json()['data'])
df

<Response [200]>


Unnamed: 0,created_at,id,text
0,2021-10-06T13:15:47.000Z,1445739564098678786,All Recent Books Written By GPT-3 - @OpenAI \n...
1,2021-09-30T23:45:04.000Z,1443723602067611656,A Breakdown of Deep Learning Frameworks\nhttps...


- Una lista de los textos y nombres de usuario correspondientes a los tweets que contengan los hashtags #NLP y #MachineLearning que no sean retweets

In [84]:
hashtag='#NPL #MachineLearning'
params = {
    'query': f'{hashtag} -is:retweet',
    'user.fields': 'username',
    'expansions': 'author_id',
    'max_results': 10
} 
response = requests.get(url, headers=headers, params=params)
print(response)
# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())
df1 = pd.json_normalize(response.json()['data'])
df2 = pd.json_normalize(response.json()['includes']['users'])
# Hago un merge entre ambos df aplicando un 'join'. Elimino las columnas que no necesito
pd.merge(df1, df2, left_on='author_id', right_on='id').drop(['id_x', 'id_y', 'author_id', 'name'], axis=1)

<Response [200]>
{'data': [{'author_id': '1156578830879997954', 'id': '1443975999910301697', 'text': "Giacomo Fava, Lead Artificial Intelligence Engineer di @CherryNpl, parla di #IntelligenzaArtificiale, #MachineLearning e #DeepLearning, di #potenzialità, #sfide e #innovazione.\n\nLeggi l'articolo completo sul #CherryBlog! 🍒🍒🍒\n\n--&gt; https://t.co/8rPpxGK53D\n\n#cherry #npl #fintech"}], 'includes': {'users': [{'id': '1156578830879997954', 'name': 'Cherry NPL', 'username': 'CherryNpl'}]}, 'meta': {'newest_id': '1443975999910301697', 'oldest_id': '1443975999910301697', 'result_count': 1}}


Unnamed: 0,text,username
0,"Giacomo Fava, Lead Artificial Intelligence Eng...",CherryNpl


- Una lista de los textos y enlaces de los tweets que contengan los hashtags #InteligenciaArtificial o #IA en español

In [121]:
twitterUrl='https://twitter.com/'
hashtag='#InteligenciaArtificial #IA'
lang='es'
params = {
    'query': f'{hashtag} {lang} -is:retweet',
    'user.fields': 'username',
    'expansions': 'author_id',
    'max_results': 100
} 
response = requests.get(url, headers=headers, params=params)
print(response)
#print(response.json())

# Generar excepción si la respuesta no es exitosa
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
df1 = pd.json_normalize(response.json()['data'])
df2 = pd.json_normalize(response.json()['includes']['users'])
# Hago un merge entre ambos df aplicando un 'join'.
dfmerge = pd.merge(df1, df2, left_on='author_id', right_on='id')
# Agrego una columna para formar el enlace al tweet.. La url final tiene la forma: https://twitter.com/{username}/status/{tweetId}
dfmerge['url'] = twitterUrl + dfmerge['username'] + "/status/" + dfmerge['id_x']
#Elimino las columnas que no necesito
del dfmerge['author_id']
del dfmerge['id_x']
del dfmerge['id_y']
del dfmerge['name']
del dfmerge['username']
pd.set_option('display.max_colwidth', None)
dfmerge

<Response [401]>


Exception: (401, '{"title":"Unauthorized","detail":"Unauthorized","type":"about:blank","status":401}')

## Descargar a CSV

In [115]:
dfmerge.to_csv('tweets_ej2.csv')  