# Data Extraction
## Author: Juan Carlos Alonso Valenzuela
This notebook runs the complete tweet extraction process used to build the dataset. It relies on the Twython library (https://twython.readthedocs.io/en/latest/).

In [1]:
from twython import Twython
import pandas as pd

In [2]:
APP_KEY = '---'
APP_SECRET = '---'

twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()

In [3]:
# Twython objects (instantiated with the app key and the access token)
# are used to make calls to the Twitter API
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)
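
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key and secret are stored in environment variables. The variable names `TWITTER_APP_KEY` and `TWITTER_APP_SECRET` and the helper `load_credentials` are hypothetical and are not part of Twython.

```python
import os

def load_credentials(key_var='TWITTER_APP_KEY', secret_var='TWITTER_APP_SECRET'):
    """Read the app key and secret from environment variables (hypothetical names)."""
    app_key = os.environ.get(key_var)
    app_secret = os.environ.get(secret_var)
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in ((key_var, app_key), (secret_var, app_secret)) if not value]
    if missing:
        raise RuntimeError('Missing environment variables: {}'.format(', '.join(missing)))
    return app_key, app_secret
```

The returned pair could then be passed to `Twython(APP_KEY, APP_SECRET, oauth_version=2)` in place of the literal strings.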

- Twitter API reference: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
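
Each status returned by the search endpoint is a nested dictionary. The sketch below shows the fields this notebook reads from it. It uses a hand-written `sample_tweet` that only imitates the shape of a real response (its values are invented for illustration) and a hypothetical helper, `tweet_to_row`.

```python
# Hand-written stand-in for one search result; the values are invented.
sample_tweet = {
    'created_at': 'Sat Apr 30 16:15:00 +0000 2022',
    'text': 'Example tweet mentioning @user_a and @user_b',
    'favorite_count': 3,
    'user': {'screen_name': 'example_user'},
    'entities': {
        'user_mentions': [
            {'screen_name': 'user_a'},
            {'screen_name': 'user_b'},
        ]
    },
}

def tweet_to_row(tweet):
    """Flatten one status dictionary into the columns used for the dataset."""
    return {
        'username': tweet['user']['screen_name'],
        'tweet_content': tweet['text'],
        'n_favorites': tweet['favorite_count'],
        'timestamp': tweet['created_at'],
        # Keep only the screen name of each mentioned user
        'mentions': [m['screen_name'] for m in tweet['entities']['user_mentions']],
    }

row = tweet_to_row(sample_tweet)
print(row['username'], row['mentions'])
```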

In [4]:
import os

def generate_csv(hashtag, max_counter):
    """Download up to max_counter tweets for a hashtag and save them as a CSV file."""
    # Search parameters for the standard search endpoint
    query_data = {
        'q': hashtag,
        'lang': 'es',             # Spanish-language tweets only
        'result_type': 'recent',
        'count': 100,             # tweets per page (API maximum)
        'until': '2022-05-01'     # only tweets created before this date
    }

    results = {'username': [], 'tweet_content': [], 'n_favorites': [], 'timestamp': [], 'mentions': []}

    # Extract data from Twitter; the cursor pages through the search results
    counter = 0
    for tweet in twitter.cursor(twitter.search, **query_data):
        if counter % 100 == 0:
            print("Extracting tweet {} for hashtag: {}".format(counter, hashtag))
        counter = counter + 1
        results['username'].append(tweet['user']['screen_name'])
        results['tweet_content'].append(tweet['text'])
        results['n_favorites'].append(tweet['favorite_count'])
        results['timestamp'].append(tweet['created_at'])

        # Screen names of every user mentioned in the tweet
        mentions = [mention['screen_name'] for mention in tweet['entities']['user_mentions']]
        results['mentions'].append(mentions)

        # Stop once the requested number of tweets has been collected
        if counter >= max_counter:
            break

    # Create DataFrame from tweets data
    df = pd.DataFrame(results)

    # Save DataFrame as a file, creating the output directory if needed
    filename = hashtag.replace('#', '')
    os.makedirs('dataset', exist_ok=True)
    df.to_csv('dataset/{}.csv'.format(filename))
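
The counter-and-break logic in `generate_csv` caps how many items are pulled from the cursor. The same cap can be sketched with `itertools.islice`. It is shown here on a stand-in generator so that it runs without network access: `fake_cursor` and `take` are hypothetical names, and `fake_cursor` only imitates a lazy stream of tweets.

```python
from itertools import islice

def fake_cursor():
    """Hypothetical stand-in for twitter.cursor: yields tweets lazily, without end."""
    n = 0
    while True:
        yield {'id': n, 'text': 'tweet {}'.format(n)}
        n += 1

def take(iterable, max_items):
    """Return at most max_items elements, pulling no more than that from the source."""
    return list(islice(iterable, max_items))

first_five = take(fake_cursor(), 5)
print(len(first_five))
```

Because `islice` stops requesting items once the limit is reached, a lazy source would not be asked for further pages after the cap is hit.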

In [5]:
hashtags = [
    # Neutral
    "#RealMadridEspanyol",
    # Madrid
    "#CAMPEON35",
    "#RMFans",
    # Espanyol
    "#RCDE",
    "#planterRCDE"
]

for hashtag in hashtags:
    generate_csv(hashtag, 5)

Extracting tweet 0 for hashtag: #RealMadridEspanyol
Extracting tweet 0 for hashtag: #CAMPEON35
Extracting tweet 0 for hashtag: #RMFans
Extracting tweet 0 for hashtag: #RCDE
Extracting tweet 0 for hashtag: #planterRCDE
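
The saved CSV files store `timestamp` as the raw `created_at` string and `mentions` as the text representation of a Python list. The following is a minimal sketch of how both could be decoded when the data is read back, assuming timestamps follow the `Wed Oct 10 20:19:24 +0000 2018` layout used by the v1.1 API. The helper names `parse_timestamp` and `parse_mentions` are hypothetical, and the input values are invented.

```python
import ast
from datetime import datetime

# Layout of the created_at field; %a and %b assume English day and month names
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'

def parse_timestamp(value):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)

def parse_mentions(value):
    """Convert the stringified list stored in the CSV back into a Python list."""
    return ast.literal_eval(value)

ts = parse_timestamp('Sat Apr 30 16:15:00 +0000 2022')
print(ts.isoformat())
print(parse_mentions("['user_a', 'user_b']"))
```

`ast.literal_eval` is used rather than `eval` because it only accepts Python literals, which is the safer choice for text read from a file.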
