# Extracting Twitter data through the API

In this notebook we use the Twitter API to extract tweets with the [Twarc](https://pypi.org/project/twarc/) library.

**Warning: the steps in this notebook are repetitive because every query used to extract the data is documented.**

### Steps to access the Twitter API
- Have a Twitter account.
- Apply for a developer account; once it is approved, create an app to extract data from Twitter.
- Obtain the access credentials.

The full-archive search used in this notebook (`search_all`) requires an [Academic Research Project](https://developer.twitter.com/en/docs/projects/overview#product-track).




In [43]:
import twarc 
import datetime
import itertools
from twarc.client2 import Twarc2
import json
import pandas as pd

We authenticate with our credentials so that we can send search requests.

In [47]:
import os
# Never publish credentials: read the token from an environment variable (the variable name is our choice)
t = Twarc2(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])

We restrict the queries to the period from 13 March 2020 to 10 May 2021, set with `start_time` and `end_time` respectively. Some queries change these parameters to obtain better results.

In [48]:
start_time = datetime.datetime(2020, 3, 13, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)

# These datetimes must be in UTC
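The API works with UTC timestamps. A minimal standard-library sketch of why the time zone matters: midnight in Caracas is 04:00 UTC, so a local datetime shifts the search window by four hours. The helpers `to_api_timestamp` and `in_window` are hypothetical, the fixed UTC-4 offset for Venezuela is an assumption, and the rule that `start_time` is inclusive while `end_time` is exclusive reflects our reading of the v2 search reference.

```python
import datetime

# Assumption: Venezuela uses a fixed UTC-4 offset with no daylight saving time.
CARACAS = datetime.timezone(datetime.timedelta(hours=-4))

def to_api_timestamp(dt: datetime.datetime) -> str:
    """Format an aware datetime as a YYYY-MM-DDTHH:MM:SSZ string in UTC."""
    if dt.tzinfo is None:
        # Refuse naive datetimes instead of guessing which time zone they mean.
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def in_window(created_at, start, end):
    """start_time is inclusive and end_time is exclusive: start <= created_at < end."""
    return start <= created_at < end

start_time = datetime.datetime(2020, 3, 13, tzinfo=datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, tzinfo=datetime.timezone.utc)
local_start = datetime.datetime(2020, 3, 13, tzinfo=CARACAS)

print(to_api_timestamp(start_time))   # 2020-03-13T00:00:00Z
print(to_api_timestamp(local_start))  # 2020-03-13T04:00:00Z
print(in_window(end_time, start_time, end_time))  # False
```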

The query is defined by:

**Keywords**

- covid-19
- coronavirus
- infectado covid-19
- muerte covid-19
- bombona de oxígeno
- flujómetro de oxígeno
- saturación de oxígeno
- dexametasona
- GoFundMe venezuela covid-19
- recolectar dinero covid-19
- ayuda covid-19
- tratamiento covid-19
- UCI 
- unidad de cuidados intensivos
- servicio público covid-19 

The search is case-insensitive; however, it does distinguish between words with and without accents.
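Because accented and unaccented words are different terms for the search, some queries below are repeated without accents (for example `flujómetro de oxígeno` and `flujometro de oxigeno`). A minimal standard-library sketch of generating those variants; `strip_accents` and `accent_variants` are hypothetical helpers, not part of twarc.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining marks after NFD decomposition (ó -> o). Note: ñ also becomes n."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

def accent_variants(term: str) -> list:
    """Return the original term plus its unaccented form when the two differ."""
    plain = strip_accents(term)
    return [term] if plain == term else [term, plain]

for term in ["flujómetro de oxígeno", "saturación de oxígeno", "dexametasona"]:
    print(accent_variants(term))
```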

**Operators and filters**

- Original tweets only, that is, no retweets: `-is:retweet`
- Tweets tagged with a place in Venezuela: `place_country:VE`
- Tweets that contain links: `has:links`
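According to the Twitter API v2 query documentation, terms joined by a space (AND) are evaluated before OR, so `covid-19 place_country:VE -is:retweet OR coronavirus place_country:VE -is:retweet` behaves like `(covid-19 OR coronavirus) place_country:VE -is:retweet`. The grouped form is shorter and states each filter once. A minimal sketch of a hypothetical `build_query` helper that produces it; wrapping a term in double quotes asks for the exact phrase instead of all of its words.

```python
def build_query(terms, country="VE", require_links=False, exclude_retweets=True):
    """Join alternative terms with OR in parentheses and append the shared filters once."""
    group = " OR ".join(terms)
    # Parentheses are only needed when there is more than one alternative.
    parts = [f"({group})" if len(terms) > 1 else group]
    if country:
        parts.append(f"place_country:{country}")
    if require_links:
        parts.append("has:links")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)

print(build_query(["covid-19", "coronavirus"]))
# (covid-19 OR coronavirus) place_country:VE -is:retweet
print(build_query(['"bombona de oxígeno"', "dexametasona"]))
# ("bombona de oxígeno" OR dexametasona) place_country:VE -is:retweet
```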


We first show the steps in detail and then write a function that applies them to each search.

We run our first query with `t.search_all(query, start_time, end_time)`.

[Building queries for Search Tweets](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#examples).

### Query 1
- covid-19
- coronavirus

In [None]:
search_results = t.search_all(query="covid-19 place_country:VE -is:retweet OR coronavirus place_country:VE -is:retweet", start_time=start_time, end_time=end_time)

We write every page of results to a .json file, one page per line.

In [None]:
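# Open the file once: reopening it with "w+" for each page overwrote the previous pages
with open("datos_twitter.json", "w") as f:
    for page in search_results:
        f.write(json.dumps(page) + "\n")  # each page is a full API response, one per line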
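Each line of the resulting file is one complete API response page; in the v2 format the tweets themselves sit in the page's `data` list, with expansions under `includes`. A minimal standard-library sketch of reading the tweets back; `read_tweets` is a hypothetical helper.

```python
import json

def read_tweets(path):
    """Yield each tweet from a file that holds one v2 API response page per line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            page = json.loads(line)
            # A page with no matches has no "data" key, only "meta"
            yield from page.get("data", [])
```

For example, `pd.DataFrame(list(read_tweets("datos_twitter.json")))` gives one row per tweet and one column per top-level tweet field.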

Next, we write a function that repeats the procedure above for any query and time window.

In [26]:
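def extraccion_datos_twitter(query, start_time, end_time, output_path="../datos/datos_twitter/data.json"):
    # Full-archive search; twarc fetches the following pages as we iterate
    search_results = t.search_all(query, start_time=start_time, end_time=end_time)

    # Open the file once in append mode so every page (and every query) is kept;
    # reopening it with "w+" for each page overwrote the previous results
    with open(output_path, "a") as f:
        for page in search_results:
            f.write(json.dumps(page) + "\n")  # one results page per line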
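As written, every call of `extraccion_datos_twitter` targets the same `data.json` file. If each query should keep its own file instead, one option is to derive the filename from the query and its time window. `query_output_path` below is a hypothetical helper; the short SHA-1 suffix keeps apart queries that differ only in accents (for example `#ServicioPublico` and `#ServicioPúblico`), which would otherwise produce the same name.

```python
import datetime
import hashlib
import re
import unicodedata

def query_output_path(query, start_time, end_time, directory="../datos/datos_twitter"):
    """Build a file-system-safe .json path from a query and its time window."""
    # Drop accents, then collapse every run of non-alphanumeric characters to "_"
    ascii_text = unicodedata.normalize("NFKD", query).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_").lower()[:80]
    # Short hash of the exact query string keeps accent-only variants distinct
    digest = hashlib.sha1(query.encode("utf-8")).hexdigest()[:8]
    return f"{directory}/{slug}_{start_time:%Y%m%d}_{end_time:%Y%m%d}_{digest}.json"

utc = datetime.timezone.utc
print(query_output_path("#ServicioPúblico covid-19 place_country:VE -is:retweet",
                        datetime.datetime(2021, 1, 1, tzinfo=utc),
                        datetime.datetime(2021, 5, 10, tzinfo=utc)))
```

Its result could replace the fixed `../datos/datos_twitter/data.json` path used in the function.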

### Query 2
- infectado covid-19
- muerte covid-19

In [None]:
query = "infectado covid-19 place_country:VE -is:retweet OR muerte covid-19 place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [23]:
query = "#infectadocovid-19 place_country:VE -is:retweet OR #muertecovid-19 place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)
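As far as we know, Twitter ends a hashtag at the first character that is not a letter, digit or underscore, so `#infectadocovid-19` and `#muertecovid-19` may not match the hashtags the query seems to target. A minimal sketch of a hypothetical check that flags such tokens before a query is sent; the allowed character set is an assumption based on that behaviour.

```python
import re

def suspicious_hashtags(query):
    """Return '#' tokens containing characters other than letters, digits or underscore."""
    return [
        token for token in query.split()
        # \w matches Unicode letters, digits and underscore, so accents are allowed
        if token.startswith("#") and not re.fullmatch(r"#\w+", token)
    ]

example_query = "#infectadocovid-19 place_country:VE -is:retweet OR #muertecovid-19 place_country:VE -is:retweet"
print(suspicious_hashtags(example_query))
# ['#infectadocovid-19', '#muertecovid-19']
```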

### Query 3

- oxígeno
- dexametasona

In [None]:
query = "oxígeno place_country:VE -is:retweet OR dexametasona place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

### Query 4

- bombona de oxígeno
- dexametasona

In [None]:
query="bombona de oxígeno place_country:VE -is:retweet OR dexametasona place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

### Query 5

- GoFundMe venezuela covid-19
- GoFundMe venezuela covid-19 (with `has:links`)

In [None]:
query="GoFundMe venezuela covid-19 -is:retweet OR GoFundMe venezuela covid-19 has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="#GoFundMe venezuela covid-19 -is:retweet OR #GoFundMe venezuela covid-19 has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [32]:
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query="GoFundMe venezuela covid-19 -is:retweet OR GoFundMe venezuela covid-19 has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="#GoFundMe venezuela covid-19 -is:retweet OR #GoFundMe venezuela covid-19 has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)
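Several queries overlap: the same GoFundMe query is run over the full period and again from January 2021, and keyword and hashtag versions can match the same tweets. Before the analysis it is worth removing duplicates. A minimal sketch, assuming each file holds one v2 response page per line as written above; `unique_tweets` is a hypothetical helper that keeps the first copy of each tweet `id`.

```python
import json

def unique_tweets(paths):
    """Merge tweets from several page-per-line JSON files, one copy per tweet id."""
    by_id = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                page = json.loads(line)
                for tweet in page.get("data", []):
                    # setdefault keeps the first occurrence and ignores later copies
                    by_id.setdefault(tweet["id"], tweet)
    return list(by_id.values())  # dicts keep insertion order
```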

### Query 6

- recolectar dinero covid-19
- ayuda covid-19

In [None]:
start_time = datetime.datetime(2020, 3, 13, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query = "recolectar dinero covid-19 place_country:VE -is:retweet OR ayuda covid-19 place_country:VE has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [33]:
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query = "#RecolectarDinero covid-19 place_country:VE -is:retweet OR #ayuda covid-19 place_country:VE has:links -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

### Query 7

- flujómetro de oxígeno
- saturación de oxígeno

In [None]:
start_time = datetime.datetime(2020, 3, 13, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query="flujómetro de oxígeno place_country:VE -is:retweet OR Saturación de oxígeno place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [35]:
query="flujometro de oxigeno place_country:VE -is:retweet OR Saturacion de oxigeno place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [46]:
query="#flujometroOxigeno place_country:VE -is:retweet OR SaturacionOxigeno place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

### Query 8

- servicio público covid-19

In [None]:
query="#ServicioPúblico covid-19 place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="#ServicioPublico covid-19 place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [28]:
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [30]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 2, 1, 0, 0, 0, 0, datetime.timezone.utc)  # end_time is exclusive: this includes 31 January
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
start_time = datetime.datetime(2021, 2, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 3, 1, 0, 0, 0, 0, datetime.timezone.utc)  # end_time is exclusive: this includes 28 February
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
start_time = datetime.datetime(2021, 3, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 4, 1, 0, 0, 0, 0, datetime.timezone.utc)  # end_time is exclusive: this includes 31 March
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
start_time = datetime.datetime(2021, 4, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 1, 0, 0, 0, 0, datetime.timezone.utc)  # end_time is exclusive: this includes 30 April
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
start_time = datetime.datetime(2021, 5, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query="servicio público covid-19 place_country:VE -is:retweet OR servicio publico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="ServicioPublico covid-19 place_country:VE -is:retweet OR ServicioPúblico covid-19 place_country:VE -is:retweet" 
extraccion_datos_twitter(query, start_time, end_time)
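The month-by-month windows above can also be generated in a loop. Because `end_time` is exclusive, each window should end at the first instant of the next month rather than on the month's last day, otherwise that last day is skipped. A minimal sketch with a hypothetical `month_windows` helper:

```python
import datetime

def month_windows(start, end):
    """Split [start, end) into calendar-month windows whose ends are exclusive."""
    windows = []
    current = start
    while current < end:
        # First instant of the following month (December rolls over to January)
        if current.month == 12:
            next_month = current.replace(year=current.year + 1, month=1, day=1,
                                         hour=0, minute=0, second=0, microsecond=0)
        else:
            next_month = current.replace(month=current.month + 1, day=1,
                                         hour=0, minute=0, second=0, microsecond=0)
        window_end = min(next_month, end)
        windows.append((current, window_end))
        current = window_end
    return windows

utc = datetime.timezone.utc
for s, e in month_windows(datetime.datetime(2021, 1, 1, tzinfo=utc),
                          datetime.datetime(2021, 5, 10, tzinfo=utc)):
    print(s.date(), "->", e.date())
```

Each `(s, e)` pair could then be passed to `extraccion_datos_twitter(query, s, e)`.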

### Query 9

- UCIS 
- unidad de cuidados intensivos

In [None]:
start_time = datetime.datetime(2020, 3, 13, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 5, 10, 0, 0, 0, 0, datetime.timezone.utc)
query="UCIS place_country:VE -is:retweet OR unidad de cuidados intensivos place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

In [None]:
query="#UCIS place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)

### Query 10

- tratamiento covid-19

In [None]:
query="tratamiento covid-19 place_country:VE -is:retweet"
extraccion_datos_twitter(query, start_time, end_time)