# Problem 1 - Analysis of Disinformation on Social Media during the Presidential Elections

## *Authors*:
- _Leonardo Ponce 202030531-5 (leonardo.ponde@usm.cl)_
- _Álvaro Pozo 202030535-8 (alvaro.pozo@usm.cl)_


## Context

In the context of presidential elections, social media has taken on a central role
as a space for disseminating information, debate, and political propaganda.
However, it has also become fertile ground for the circulation of fake news,
coordinated disinformation campaigns, and the use of automated accounts (bots)
that distort public conversation.
This problem asks you to analyze how this kind of information spreads, which
actors and communities amplify it, and how hidden patterns can be identified
using networks, temporal analysis, and text processing.

## Objectives

In this first part, we will work with social media data associated with the presidential elections. The focus is on studying how fake news spreads compared with legitimate posts.

Main tasks:
1. Collect and organize relevant posts about the elections (e.g., using
keywords or shared links).
2. Reconstruct diffusion cascades (retweets, shares, mentions) as
propagation graphs.
3. Compute network metrics (degree, betweenness, closeness, etc.) to identify
actors who amplify rumors.
4. Visualize the temporal and geographic dynamics of the propagation.
5. Identify likely automated accounts from their activity patterns.
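
As a rough illustration of tasks 2 and 3, the sketch below rebuilds one cascade as a directed graph and computes two simple quantities: how many direct reshares each account triggered (out-degree) and how deep the cascade reaches from its seed. It uses only the standard library. The edge list and the names `build_cascade`, `out_degree`, and `cascade_depth` are hypothetical and do not come from the scraper in this notebook. Betweenness and closeness would normally be taken from a graph library such as NetworkX rather than written by hand.

```python
from collections import defaultdict, deque

# Hypothetical sample: (original_author, resharing_user) pairs for one story.
edges = [
    ("seed", "a"), ("seed", "b"), ("a", "c"), ("a", "d"), ("d", "e"),
]

def build_cascade(edges):
    """Return adjacency lists: author -> users who reshared from that author."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph

def out_degree(graph):
    """Number of direct reshares each account triggered."""
    return {node: len(children) for node, children in graph.items()}

def cascade_depth(graph, root):
    """Greatest breadth-first distance from root to any account it reached."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in depth:  # visit each account once
                depth[child] = depth[node] + 1
                queue.append(child)
    return max(depth.values())

graph = build_cascade(edges)
degrees = out_degree(graph)
top_amplifier = max(degrees, key=degrees.get)
depth = cascade_depth(graph, "seed")
print(degrees, top_amplifier, depth)
```

Computing the same two numbers for cascades labeled as false and as legitimate gives a first, coarse comparison of how each kind of content spreads.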


# Problem development

In [None]:
import time
import random
import pandas as pd
import json
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException, WebDriverException
import itertools
from fake_useragent import UserAgent
import requests
from datetime import datetime, timedelta
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class XScraper:
    def __init__(self, credentials_file="credentials.json"):
        self.credentials = self.load_credentials(credentials_file)
        self.current_account_index = 0
        self.proxy_cycle = itertools.cycle(self.credentials.get('proxies', [])) if self.credentials.get('proxies') else None
        self.ua = UserAgent()
        self.tweets_data = []
        self.session_start_time = datetime.now()
        self.tweets_scraped_this_session = 0
        self.max_tweets_per_account = 800
        
    def load_credentials(self, file_path):
        """Load credentials from a JSON file."""
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                return json.load(f)
        except FileNotFoundError:
            logger.error(f"Credentials file {file_path} not found")
            return self.create_sample_credentials(file_path)
    
    def create_sample_credentials(self, file_path):
        """Create a sample credentials file."""
        sample = {
            "accounts": [
                {
                    "username": "your_username1",
                    "password": "your_password1",
                    "phone": ""
                }
            ],
            "proxies": [],
            "delays": {
                "min_tweet_delay": 2,
                "max_tweet_delay": 5,
                "min_scroll_delay": 3,
                "max_scroll_delay": 7,
                "query_delay": 180,
                "account_switch_delay": 300
            }
        }
        
        with open(file_path, 'w', encoding='utf-8') as f:
            json.dump(sample, f, indent=2)
        
        logger.info(f"Sample credentials file created: {file_path}")
        return sample
    
    def get_free_proxies(self):
        """Fetch free proxies."""
        try:
            # Multiple free proxy sources
            sources = [
                "https://api.proxyscrape.com/v2/?request=get&protocol=http&timeout=10000&country=all",
                "https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt"
            ]
            
            all_proxies = []
            for source in sources:
                try:
                    response = requests.get(source, timeout=10)
                    proxies = response.text.strip().split('\n')
                    all_proxies.extend([f"http://{proxy.strip()}" for proxy in proxies if proxy.strip()])
                except requests.RequestException:
                    continue
            
            # Validate a few proxies
            valid_proxies = []
            for proxy in random.sample(all_proxies[:50], min(10, len(all_proxies))):
                if self.test_proxy(proxy):
                    valid_proxies.append(proxy)
                    if len(valid_proxies) >= 5:  # At most 5 valid proxies
                        break
            
            return valid_proxies
            
        except Exception as e:
            logger.warning(f"Error fetching free proxies: {e}")
            return []
    
    def test_proxy(self, proxy):
        """Check whether a proxy works."""
        try:
            response = requests.get(
                "http://httpbin.org/ip", 
                proxies={"http": proxy, "https": proxy},
                timeout=5
            )
            return response.status_code == 200
        except Exception:
            return False
    
    def setup_driver(self, use_proxy=True):
        """Configure the driver with anti-detection measures."""
        chrome_options = Options()
        
        # Basic anti-detection options
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        chrome_options.add_argument('--disable-extensions')
        chrome_options.add_argument('--disable-plugins')
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        chrome_options.add_argument('--disable-web-security')
        chrome_options.add_argument('--allow-running-insecure-content')
        
        # Random user agent
        user_agent = self.ua.random
        chrome_options.add_argument(f'--user-agent={user_agent}')
        
        # Proxy configuration
        if use_proxy:
            proxy = None
            
            # Use configured proxies
            if self.proxy_cycle:
                try:
                    proxy = next(self.proxy_cycle)
                except StopIteration:
                    pass
            
            # If no proxies are configured, fetch free ones
            if not proxy and not self.credentials.get('proxies'):
                free_proxies = self.get_free_proxies()
                if free_proxies:
                    proxy = random.choice(free_proxies)
                    logger.info(f"Using free proxy: {proxy}")
            
            if proxy:
                chrome_options.add_argument(f'--proxy-server={proxy}')
                
        try:
            driver = webdriver.Chrome(options=chrome_options)
            
            # Anti-detection scripts
            driver.execute_script("""
                Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
                window.chrome = { runtime: {} };
                Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
                Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});
            """)
            
            return driver
            
        except WebDriverException as e:
            logger.error(f"Error creating driver: {e}")
            return None
    
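    def human_delay(self, min_delay=None, max_delay=None):
        """Simulate human behavior with random delays."""
        delays = self.credentials.get('delays', {})
        # Compare against None so an explicit 0 is not replaced by the default
        min_d = min_delay if min_delay is not None else delays.get('min_tweet_delay', 2)
        max_d = max_delay if max_delay is not None else delays.get('max_tweet_delay', 5)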
        
        delay = random.uniform(min_d, max_d)
        time.sleep(delay)
    
    def human_typing(self, element, text):
        """Simulate human typing."""
        element.clear()
        for char in text:
            element.send_keys(char)
            time.sleep(random.uniform(0.05, 0.2))
    
    def scroll_like_human(self, driver):
        """Simulate human scrolling."""
        # Gradual scroll
        scroll_pause_time = random.uniform(1, 3)
        
        for _ in range(random.randint(2, 5)):
            scroll_amount = random.randint(300, 800)
            driver.execute_script(f"window.scrollBy(0, {scroll_amount});")
            time.sleep(scroll_pause_time)
        
        # Occasionally scroll back up
        if random.random() < 0.2:
            driver.execute_script(f"window.scrollBy(0, -{random.randint(100, 300)});")
            time.sleep(1)
    
    def login_x(self, driver, account_info):
        """Log in to X.com, handling verification prompts."""
        try:
            # Go to the X login page
            driver.get("https://x.com/i/flow/login")
            self.human_delay(3, 6)
            
            # Wait for and locate the username field
            username_input = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, 'input[autocomplete="username"]'))
            )
            
            # Human-like typing of the username
            self.human_typing(username_input, account_info['username'])
            self.human_delay(1, 2)
            
            # Find and click "Next"
            next_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//span[contains(text(), "Next")]//ancestor::button | //button[contains(@role, "button") and contains(., "Next")]'))
            )
            next_button.click()
            self.human_delay(2, 4)
            
            # Handle phone verification if it appears
            try:
                # Check whether the verification field appears
                verification_input = WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.CSS_SELECTOR, 'input[data-testid="ocfEnterTextTextInput"]'))
                )
                
                if account_info.get('phone'):
                    self.human_typing(verification_input, account_info['phone'])
                    next_button = driver.find_element(By.XPATH, '//span[contains(text(), "Next")]//ancestor::button')
                    next_button.click()
                    self.human_delay(2, 4)
                else:
                    logger.warning("Phone verification is required but no phone is configured")
                    return False
                    
            except TimeoutException:
                # No phone verification; continue
                pass
            
            # Password field
            password_input = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, 'input[name="password"]'))
            )
            
            self.human_typing(password_input, account_info['password'])
            self.human_delay(1, 2)
            
            # Login button
            login_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//span[contains(text(), "Log in")]//ancestor::button | //button[contains(., "Log in")]'))
            )
            login_button.click()
            self.human_delay(5, 10)
            
            # Verify successful login by looking for elements typical of a logged-in X session
            try:
                WebDriverWait(driver, 20).until(
                    EC.any_of(
                        EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid="SideNav_NewTweet_Button"]')),
                        EC.presence_of_element_located((By.CSS_SELECTOR, '[aria-label="Home timeline"]')),
                        EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid="primaryColumn"]'))
                    )
                )
                logger.info(f"Login succeeded for {account_info['username']}")
                return True
                
            except TimeoutException:
                logger.error(f"Login failed for {account_info['username']}")
                return False
            
        except Exception as e:
            logger.error(f"Login error: {e}")
            return False
    
    def extract_tweet_data(self, tweet_element):
        """Extract the full data for one tweet on X."""
        try:
            tweet_data = {}
            
            # Tweet ID
            try:
                tweet_links = tweet_element.find_elements(By.CSS_SELECTOR, 'a[href*="/status/"]')
                if tweet_links:
                    tweet_url = tweet_links[0].get_attribute('href')
                    tweet_id = tweet_url.split('/status/')[-1].split('?')[0]
                    tweet_data['tweet_id'] = tweet_id
                    tweet_data['url'] = tweet_url
                else:
                    return None
            except Exception:
                return None
            
            # Tweet text
            try:
                text_element = tweet_element.find_element(By.CSS_SELECTOR, '[data-testid="tweetText"]')
                tweet_data['text'] = text_element.text
            except Exception:
                tweet_data['text'] = ""
            
            # Username (handle), taken from the profile link
            try:
                user_elements = tweet_element.find_elements(By.CSS_SELECTOR, '[data-testid="User-Name"] a')
                if user_elements:
                    username_href = user_elements[0].get_attribute('href')
                    username = username_href.split('/')[-1] if username_href else ""
                    tweet_data['username'] = username
                else:
                    tweet_data['username'] = ""
            except Exception:
                tweet_data['username'] = ""
            
            # Display name
            try:
                display_elements = tweet_element.find_elements(By.CSS_SELECTOR, '[data-testid="User-Name"] span')
                if display_elements:
                    tweet_data['display_name'] = display_elements[0].text
                else:
                    tweet_data['display_name'] = ""
            except Exception:
                tweet_data['display_name'] = ""
            
            # Timestamp
            try:
                time_element = tweet_element.find_element(By.CSS_SELECTOR, 'time')
                tweet_data['timestamp'] = time_element.get_attribute('datetime')
            except Exception:
                tweet_data['timestamp'] = ""
            
            # Metrics (more robust)
            metrics_map = {
                'reply': 'replies',
                'retweet': 'retweets', 
                'like': 'likes',
                'bookmark': 'bookmarks'
            }
            
            import re  # imported once here instead of on every loop iteration
            
            for test_id, key in metrics_map.items():
                try:
                    metric_elements = tweet_element.find_elements(By.CSS_SELECTOR, f'[data-testid="{test_id}"]')
                    if metric_elements:
                        # Try to read the number from the aria-label, falling back to the text
                        metric_text = metric_elements[0].get_attribute('aria-label') or metric_elements[0].text
                        # Extract digits (with thousands separators); counts are kept as strings
                        numbers = re.findall(r'[\d,]+', metric_text)
                        count = numbers[0].replace(',', '') if numbers else "0"
                        tweet_data[key] = count
                    else:
                        tweet_data[key] = "0"
                except Exception:
                    tweet_data[key] = "0"
            
            # Metadata
            tweet_data['scraped_at'] = datetime.now().isoformat()
            tweet_data['scraper_account'] = self.credentials['accounts'][self.current_account_index]['username']
            
            return tweet_data
            
        except Exception as e:
            logger.debug(f"Error extracting tweet data: {e}")
            return None
    
    def scrape_search_results(self, driver, query, max_tweets=1000):
        """Scrape search results on X."""
        # X search URL
        encoded_query = requests.utils.quote(query)
        search_url = f"https://x.com/search?q={encoded_query}&src=typed_query&f=live"
        
        logger.info(f"Searching: {query}")
        driver.get(search_url)
        self.human_delay(5, 10)
        
        tweets_found = 0
        no_new_tweets_count = 0
        max_no_new = 5
        seen_tweet_ids = set()
        
        while tweets_found < max_tweets and no_new_tweets_count < max_no_new:
            # Check limits
            if self.tweets_scraped_this_session >= self.max_tweets_per_account:
                logger.info("Per-account limit reached")
                break
            
            # Find tweets
            tweet_elements = driver.find_elements(By.CSS_SELECTOR, 'article[data-testid="tweet"]')
            
            new_tweets_batch = 0
            for tweet_element in tweet_elements:
                try:
                    tweet_data = self.extract_tweet_data(tweet_element)
                    
                    if (tweet_data and 
                        tweet_data.get('tweet_id') and 
                        tweet_data['tweet_id'] not in seen_tweet_ids):
                        
                        seen_tweet_ids.add(tweet_data['tweet_id'])
                        self.tweets_data.append(tweet_data)
                        tweets_found += 1
                        new_tweets_batch += 1
                        self.tweets_scraped_this_session += 1
                        
                        if tweets_found % 20 == 0:
                            logger.info(f"Tweets found: {tweets_found}")
                        
                        if tweets_found >= max_tweets:
                            break
                            
                except Exception:
                    continue
            
            # Flow control
            if new_tweets_batch == 0:
                no_new_tweets_count += 1
                logger.debug(f"No new tweets: {no_new_tweets_count}/{max_no_new}")
            else:
                no_new_tweets_count = 0
            
            # Human-like scroll
            self.scroll_like_human(driver)
            
            # Delay between scrolls
            delays = self.credentials.get('delays', {})
            self.human_delay(
                delays.get('min_scroll_delay', 3),
                delays.get('max_scroll_delay', 7)
            )
        
        logger.info(f"Completed '{query}': {tweets_found} tweets")
        return tweets_found

def run_scraping_pipeline(search_queries, max_tweets_per_query=300):
    """Main scraping pipeline."""
    scraper = XScraper()
    
    if not scraper.credentials.get('accounts'):
        logger.error("No accounts configured")
        return []
    
    logger.info(f"Starting with {len(scraper.credentials['accounts'])} accounts")
    
    for query_idx, query in enumerate(search_queries):
        logger.info(f"Query {query_idx + 1}/{len(search_queries)}: {query}")
        
        max_retries = 3
        for attempt in range(max_retries):
            driver = scraper.setup_driver()
            if not driver:
                continue
                
            try:
                # Login
                current_account = scraper.credentials['accounts'][scraper.current_account_index]
                
                if scraper.login_x(driver, current_account):
                    # Scraping
                    tweets_found = scraper.scrape_search_results(driver, query, max_tweets_per_query)
                    logger.info(f"Tweets collected for '{query}': {tweets_found}")
                    break
                else:
                    logger.error(f"Login failed for {current_account['username']}")
                    
            except Exception as e:
                logger.error(f"Error on attempt {attempt + 1}: {e}")
                
            finally:
                if driver:
                    driver.quit()
            
            if attempt < max_retries - 1:
                time.sleep(30)  # Wait before retrying
        
        # Account rotation
        if (query_idx + 1) % 2 == 0 and len(scraper.credentials['accounts']) > 1:
            scraper.current_account_index = (scraper.current_account_index + 1) % len(scraper.credentials['accounts'])
            scraper.tweets_scraped_this_session = 0
            logger.info(f"Switching to account {scraper.current_account_index + 1}")
        
        # Delay between queries
        if query_idx < len(search_queries) - 1:
            delay = scraper.credentials.get('delays', {}).get('query_delay', 120)
            logger.info(f"Waiting {delay} seconds...")
            time.sleep(delay)
    
    return scraper.tweets_data

def save_results(tweets_data, filename=None):
    """Save results in multiple formats."""
    if not tweets_data:
        logger.warning("No data to save")
        return None
        
    if not filename:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"x_scraping_{timestamp}"
    
    df = pd.DataFrame(tweets_data)
    
    # Clean data
    df = df.drop_duplicates(subset=['tweet_id'])
    
    # Save in multiple formats
    df.to_csv(f"{filename}.csv", index=False, encoding='utf-8-sig')
    df.to_json(f"{filename}.json", orient='records', indent=2)
    
    # Excel with analysis
    with pd.ExcelWriter(f"{filename}.xlsx", engine='openpyxl') as writer:
        df.to_excel(writer, sheet_name='Todos_los_tweets', index=False)
        
        # Per-user summary
        if 'username' in df.columns and not df.empty:
            user_stats = df.groupby('username').agg({
                'text': 'count',
                'likes': lambda x: pd.to_numeric(x, errors='coerce').sum(),
                'retweets': lambda x: pd.to_numeric(x, errors='coerce').sum(),
                'replies': lambda x: pd.to_numeric(x, errors='coerce').sum()
            }).rename(columns={'text': 'tweet_count'})
            user_stats = user_stats.sort_values('tweet_count', ascending=False)
            user_stats.to_excel(writer, sheet_name='Resumen_usuarios')
    
    logger.info(f"Data saved: {filename}.*")
    logger.info(f"Total unique tweets: {len(df)}")
    
    return df

# Usage in Jupyter
if __name__ == "__main__":
    # Search terms for electoral disinformation
    search_queries = [
        "fraude electoral",
        "elecciones manipuladas",
        "voto fraudulento",
        "urnas trucadas", 
        "conteo falso",
        "elecciones robadas"
    ]
    
    # Run scraping
    logger.info("=== STARTING X SCRAPING ===")
    tweets_data = run_scraping_pipeline(
        search_queries=search_queries,
        max_tweets_per_query=200
    )
    
    # Save results
    if tweets_data:
        df_results = save_results(tweets_data)
        
        # Show statistics
        print("\n🎯 SCRAPING COMPLETE!")
        print(f"📊 Total tweets: {len(tweets_data)}")
        if df_results is not None and not df_results.empty:
            print(f"👥 Unique users: {df_results['username'].nunique()}")
            print(f"📅 Period: {df_results['timestamp'].min()} - {df_results['timestamp'].max()}")
            
            # Data sample
            print("\n📝 Sample of tweets:")
            print(df_results[['username', 'text', 'likes', 'retweets']].head())
    else:
        logger.error("❌ No data was collected")

INFO:__main__:=== STARTING X SCRAPING ===
INFO:__main__:Starting with 2 accounts
INFO:__main__:Query 1/6: fraude electoral
INFO:__main__:Using free proxy: http://8.221.141.88:80
ERROR:__main__:Login error: Message: 
Stacktrace:
	GetHandleVerifier [0x0x7ff6f2c4e995+80021]
	GetHandleVerifier [0x0x7ff6f2c4e9f0+80112]
	(No symbol) [0x0x7ff6f29d060f]
	(No symbol) [0x0x7ff6f2a28854]
	(No symbol) [0x0x7ff6f2a28b1c]
	(No symbol) [0x0x7ff6f2a7c927]
	(No symbol) [0x0x7ff6f2a5126f]
	(No symbol) [0x0x7ff6f2a7968a]
	(No symbol) [0x0x7ff6f2a51003]
	(No symbol) [0x0x7ff6f2a195d1]
	(No symbol) [0x0x7ff6f2a1a3f3]
	GetHandleVerifier [0x0x7ff6f2f0dd4d+2960461]
	GetHandleVerifier [0x0x7ff6f2f0800a+2936586]
	GetHandleVerifier [0x0x7ff6f2f28a47+3070279]
	GetHandleVerifier [0x0x7ff6f2c6847e+185214]
	GetHandleVerifier [0x0x7ff6f2c6fecf+216527]
	GetHandleVerifier [0x0x7ff6f2c57bd4+117460]
	GetHandleVerifier [0x0x7ff6f2c57d8f+117903]
	GetHandleVerifier [0x0x7ff6f2c3dc68+11112]
	BaseThreadInitThunk [0

In [None]:
import json
from typing import Dict

import jmespath
from nested_lookup import nested_lookup
from scrapfly import ScrapflyClient, ScrapeConfig

SCRAPFLY = ScrapflyClient(key="YOUR SCRAPFLY KEY")


def parse_thread(data: Dict) -> Dict:
    """Parse a Threads post JSON dataset for the most important fields"""
    result = jmespath.search(
        """{
        text: post.caption.text,
        published_on: post.taken_at,
        id: post.id,
        pk: post.pk,
        code: post.code,
        username: post.user.username,
        user_pic: post.user.profile_pic_url,
        user_verified: post.user.is_verified,
        user_pk: post.user.pk,
        user_id: post.user.id,
        has_audio: post.has_audio,
        reply_count: post.text_post_app_info.direct_reply_count,
        like_count: post.like_count,
        images: post.carousel_media[].image_versions2.candidates[1].url,
        image_count: post.carousel_media_count,
        videos: post.video_versions[].url
    }""",
        data,
    )
    result["videos"] = list(set(result["videos"] or []))
    if result["reply_count"] and not isinstance(result["reply_count"], int):
        result["reply_count"] = int(result["reply_count"].split(" ")[0])
    result[
        "url"
    ] = f"https://www.threads.net/@{result['username']}/post/{result['code']}"
    return result


async def scrape_thread(url: str) -> dict:
    """Scrape Threads post and replies from a given URL"""
    result = await SCRAPFLY.async_scrape(
        ScrapeConfig(
            url,
            asp=True,  # enables scraper blocking bypass if any
            country="US",  # use US IP address as threads is only available in select countries
        )
    )
    hidden_datasets = result.selector.css(
        'script[type="application/json"][data-sjs]::text'
    ).getall()
    # find datasets that contain threads data
    for hidden_dataset in hidden_datasets:
        # skip loading datasets that clearly don't contain threads data
        if '"ScheduledServerJS"' not in hidden_dataset:
            continue
        if "thread_items" not in hidden_dataset:
            continue
        data = json.loads(hidden_dataset)
        # datasets are heavily nested, use nested_lookup to find
        # the thread_items key for thread data
        thread_items = nested_lookup("thread_items", data)
        if not thread_items:
            continue
        # use our jmespath parser to reduce the dataset to the most important fields
        threads = [parse_thread(t) for thread in thread_items for t in thread]
        return {
            "thread": threads[0],
            "replies": threads[1:],
        }
    raise ValueError("could not find thread data in page")


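# Example use:
if __name__ == "__main__":
    import asyncio
    # asyncio.run() raises RuntimeError inside Jupyter, where an event loop is
    # already running; in a notebook cell, use `print(await scrape_thread(url))` instead.
    print(asyncio.run(scrape_thread("https://www.threads.net/t/C8H5FiCtESk")))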

RuntimeError: asyncio.run() cannot be called from a running event loop


## Cells must be commented!
## There must be plots!!

# Guiding questions:
- Which candidates appear most closely linked to disinformation communities?
- Which terms or narratives stand out within the communities, and what biases do they reflect?
- Which connection patterns between communities help explain the spread of false narratives?
- What differences are observed between communities dominated by humans and those boosted by bots?
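
One possible starting point for the last question is to check how regular each account's posting rhythm is. The sketch below computes the coefficient of variation of the gaps between consecutive posts; values near zero mean near-constant spacing, which is one common hint of scheduling or automation. The function names, the 0.1 threshold, and the sample timestamps are all assumptions made for illustration, and a low value is a signal worth inspecting, not proof that an account is a bot.

```python
from datetime import datetime
from statistics import mean, pstdev

def interval_cv(timestamps):
    """Coefficient of variation of the gaps (in seconds) between consecutive posts.

    Returns None when there are fewer than two gaps or the mean gap is zero.
    """
    # Accept a trailing "Z" (UTC) so ISO strings parse on older Python versions too.
    times = sorted(datetime.fromisoformat(t.replace("Z", "+00:00")) for t in timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)

def looks_automated(timestamps, cv_threshold=0.1):
    """Flag accounts whose posting gaps are almost constant (assumed threshold)."""
    cv = interval_cv(timestamps)
    return cv is not None and cv < cv_threshold

# Hypothetical accounts: one posts exactly every minute, one posts irregularly.
bot_times = ["2024-05-01T10:00:00", "2024-05-01T10:01:00",
             "2024-05-01T10:02:00", "2024-05-01T10:03:00"]
human_times = ["2024-05-01T10:00:00", "2024-05-01T10:07:30",
               "2024-05-01T13:45:00", "2024-05-01T22:10:00"]

print(looks_automated(bot_times), looks_automated(human_times))
```

Applied per `username` to the `timestamp` values collected above, this would give one score per account that can then be compared across communities.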


# Discussion

***The discussion must be connected to the guiding questions***