# YouTube Analysis for Multidimensional Poverty Classification in Mexico

This implementation creates a text classification system to analyze YouTube comments and categorize content according to multidimensional poverty dimensions. The analysis follows CONEVAL's (Consejo Nacional de Evaluación de la Política de Desarrollo Social) framework with adaptations for real-time social media data.

We examine seven key dimensions of multidimensional poverty:

- **Income**: Employment status, wages, economic instability, unemployment
- **Access to Health Services**: Healthcare availability, medical infrastructure, health insurance
- **Educational Lag**: School dropout rates, educational access, academic delays
- **Access to Social Security**: Labor protection, social benefits, pension systems
- **Housing**: Living conditions, basic utilities (water, electricity), housing quality
- **Access to Food**: Food security, nutrition, food prices, hunger
- **Social Cohesion**: Community integration, discrimination, social exclusion, belonging

## Technical Methodology

### 1. Data Collection

**Search Parameters:**
- **Temporal Scope**: Full year analysis (2022: January 1 - December 31)
- **Geographic Coverage**: All 32 Mexican states
- **Search Terms**: State name + ["noticias", "news", "economía"] (3 queries per state)
- **Volume Limits**: 100 videos per query, 300 comments per video
- **Language Priority**: Spanish content prioritized via `relevanceLanguage="es"`
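
The search parameters above can be sketched as a small query builder. This is an illustrative sketch only: `build_queries` and `max_volume` are hypothetical helpers that do not appear in the notebook, and the volume figures are upper bounds that assume every query returns 100 distinct videos and every video has at least 300 comments.

```python
# Hypothetical helpers (not part of the notebook) that restate the search parameters.
SEARCH_SUFFIXES = ["noticias", "news", "economía"]

def build_queries(states, suffixes=SEARCH_SUFFIXES):
    """Return {state: [query, ...]} with one query per suffix."""
    return {state: [f"{state} {suffix}" for suffix in suffixes] for state in states}

def max_volume(n_states, queries_per_state=3, videos_per_query=100, comments_per_video=300):
    """Upper bound on queries, videos, and comments for a full run."""
    n_queries = n_states * queries_per_state
    n_videos = n_queries * videos_per_query
    return n_queries, n_videos, n_videos * comments_per_video

# A three-state subset, for illustration only
queries = build_queries(["Aguascalientes", "Baja California", "Zacatecas"])
print(queries["Baja California"])
print(max_volume(32))
```

In practice the totals are lower, because many videos have fewer than 300 comments and the same video can match more than one query.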

### 2. Text Preprocessing

**Preprocessing Steps:**
1. **HTML/Markup Removal**: Strip HTML tags and web links
2. **Character Normalization**: Preserve only alphanumeric + Spanish accented characters
3. **Whitespace Normalization**: Remove extra spaces, convert to lowercase
4. **Length Filtering**: Exclude texts shorter than 10 characters
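
As a rough illustration, the four steps can be written as two standalone functions. The regular expressions mirror those used in `SimpleTextProcessor.clean_text()`; the `keep_text` helper and the sample strings are assumptions added for the example.

```python
import re

def clean_text(text):
    text = re.sub(r'<.*?>', ' ', text)       # 1. strip HTML tags
    text = re.sub(r'http\S+', '', text)      # 1. strip web links
    text = re.sub(r'[^\w\sáéíóúüñÁÉÍÓÚÜÑ]', ' ', text)  # 2. drop punctuation and symbols
    return re.sub(r'\s+', ' ', text).strip().lower()    # 3. collapse spaces, lowercase

def keep_text(clean, min_length=10):
    # 4. hypothetical helper: discard texts shorter than min_length characters
    return len(clean) >= min_length

samples = [
    "<b>Hola</b> mundo! Visita https://example.com AHORA",
    "¡Qué caro está el gas!",
    "ok 👍",
]
for raw in samples:
    cleaned = clean_text(raw)
    print(repr(cleaned), keep_text(cleaned))
```

The first two samples survive the length filter; the third is reduced to `ok` and is dropped.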

### 3. Embedding and Classification

**Embedding Generation:**
- **Model**: `paraphrase-multilingual-MiniLM-L12-v2` (384-dimensional embeddings)
- **Language Support**: Multilingual model that handles Spanish, English, and mixed-language content
- **Dimension Preprocessing**: Convert keyword lists to normalized text phrases for embedding

**Classification Logic:**
1. Generate embeddings for both input text and poverty dimension definitions
2. Calculate cosine similarity between text and all dimension embeddings
3. Assign text to highest-scoring dimension if score ≥ 0.10 threshold
4. Classify as "OTHER" if below threshold (filters non-poverty content)
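
The decision rule can be illustrated without loading the embedding model. The sketch below uses NumPy and made-up three-dimensional vectors in place of real sentence embeddings; the `classify` function and the toy vectors are assumptions for the example, not the notebook's implementation.

```python
import numpy as np

def classify(text_vec, dim_vecs, dim_names, threshold=0.10):
    """Return (label, score) using cosine similarity and a minimum-score threshold."""
    text_unit = text_vec / np.linalg.norm(text_vec)
    dim_units = dim_vecs / np.linalg.norm(dim_vecs, axis=1, keepdims=True)
    scores = dim_units @ text_unit            # cosine similarity to each dimension
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return "OTHER", float(scores[best])   # nothing is similar enough
    return dim_names[best], float(scores[best])

# Toy stand-ins for dimension embeddings (real ones come from the sentence encoder)
dim_names = ["INCOME", "HOUSING"]
dim_vecs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])

print(classify(np.array([0.9, 0.1, 0.0]), dim_vecs, dim_names))   # close to INCOME
print(classify(np.array([0.05, 0.0, 1.0]), dim_vecs, dim_names))  # below threshold
```

Because every text is assigned to its single best dimension, a low threshold such as 0.10 mainly removes texts that are unrelated to all seven keyword lists.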

### 4. Sentiment Analysis 

**Model**: `nlptown/bert-base-multilingual-uncased-sentiment`
- **Input**: Cleaned text (max 512 tokens with truncation)
- **Output**: 5-star rating scale converted to normalized sentiment
- **Normalization**: 1 star = -1.0, 3 stars = 0.0, 5 stars = +1.0
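
The normalization is the linear map `(stars - 3) / 2`. A minimal sketch follows, with a made-up list of logits standing in for real model output; `stars_from_logits` and `stars_to_sentiment` are hypothetical names.

```python
def stars_from_logits(logits):
    # the model has five classes; class index 0 corresponds to 1 star
    return max(range(len(logits)), key=lambda i: logits[i]) + 1

def stars_to_sentiment(stars):
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("stars must be an integer from 1 to 5")
    return (stars - 3) / 2

example_logits = [0.1, 0.2, 2.5, 0.3, 0.0]   # illustrative values only
stars = stars_from_logits(example_logits)
print(stars, stars_to_sentiment(stars))
print([stars_to_sentiment(s) for s in range(1, 6)])
```

The intermediate ratings of 2 and 4 stars map to -0.5 and +0.5, so the score takes only five distinct values per text.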

### 5. Extracted Components

**State-Level Metrics:**
- **Dimension Coverage**: Percentage of analyzed texts (comments plus each video's title and description) per poverty dimension
- **Conditional Sentiment**: Average sentiment score per dimension per state
- **General Statistics**: Total videos and comments analyzed
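
To make the two per-dimension metrics concrete, here is a small sketch that aggregates a list of `(dimension, sentiment)` pairs. The `summarize` function and the sample records are invented for illustration and are not taken from the analysis results.

```python
from collections import defaultdict

def summarize(records):
    """Compute coverage (%) and average sentiment per dimension."""
    totals = defaultdict(lambda: {"sentiment_sum": 0.0, "count": 0})
    for dimension, sentiment in records:
        totals[dimension]["sentiment_sum"] += sentiment
        totals[dimension]["count"] += 1
    n_texts = sum(v["count"] for v in totals.values())
    return {
        dim: {
            "coverage_pct": 100 * v["count"] / n_texts,
            "avg_sentiment": v["sentiment_sum"] / v["count"],
        }
        for dim, v in totals.items()
    }

# Invented sample: four classified texts
records = [("INCOME", -1.0), ("INCOME", 0.0), ("HOUSING", 0.5), ("OTHER", 1.0)]
summary = summarize(records)
for dim, metrics in summary.items():
    print(dim, metrics)
```

The average sentiment is conditional on the dimension: it is computed only over the texts assigned to that dimension.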

In [None]:
# load necessary libraries
import pandas as pd
import numpy as np
import os
import re
import json
from datetime import datetime
from googleapiclient.discovery import build
from time import sleep
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from tqdm import tqdm

# load environment variables from .env file
load_dotenv()
YT_API_KEY = os.getenv("YT_API_KEY")

In [None]:
# mapping of Mexican states with their corresponding search terms
STATES_SEARCH_TERMS = {
    "Aguascalientes": ["Aguascalientes noticias", "Aguascalientes news", "Aguascalientes economía"],
    "Baja California": ["Baja California noticias", "Baja California news", "Baja California economía"],
    "Baja California Sur": ["Baja California Sur noticias", "Baja California Sur news", "Baja California Sur economía"],
    "Campeche": ["Campeche noticias", "Campeche news", "Campeche economía"],
    "Chiapas": ["Chiapas noticias", "Chiapas news", "Chiapas economía"],
    "Chihuahua": ["Chihuahua noticias", "Chihuahua news", "Chihuahua economía"],
    "Ciudad de México": ["Ciudad de México noticias", "Ciudad de México news", "Ciudad de México economía"],
    "Coahuila": ["Coahuila noticias", "Coahuila news", "Coahuila economía"],
    "Colima": ["Colima noticias", "Colima news", "Colima economía"],
    "Durango": ["Durango noticias", "Durango news", "Durango economía"],
    "Estado de México": ["Estado de México noticias", "Estado de México news", "Estado de México economía"],
    "Guanajuato": ["Guanajuato noticias", "Guanajuato news", "Guanajuato economía"],
    "Guerrero": ["Guerrero noticias", "Guerrero news", "Guerrero economía"],
    "Hidalgo": ["Hidalgo noticias", "Hidalgo news", "Hidalgo economía"],
    "Jalisco": ["Jalisco noticias", "Jalisco news", "Jalisco economía"],
    "Michoacán": ["Michoacán noticias", "Michoacán news", "Michoacán economía"],
    "Morelos": ["Morelos noticias", "Morelos news", "Morelos economía"],
    "Nayarit": ["Nayarit noticias", "Nayarit news", "Nayarit economía"],
    "Nuevo León": ["Nuevo León noticias", "Nuevo León news", "Nuevo León economía"],
    "Oaxaca": ["Oaxaca noticias", "Oaxaca news", "Oaxaca economía"],
    "Puebla": ["Puebla noticias", "Puebla news", "Puebla economía"],
    "Querétaro": ["Querétaro noticias", "Querétaro news", "Querétaro economía"],
    "Quintana Roo": ["Quintana Roo noticias", "Quintana Roo news", "Quintana Roo economía"],
    "San Luis Potosí": ["San Luis Potosí noticias", "San Luis Potosí news", "San Luis Potosí economía"],
    "Sinaloa": ["Sinaloa noticias", "Sinaloa news", "Sinaloa economía"],
    "Sonora": ["Sonora noticias", "Sonora news", "Sonora economía"],
    "Tabasco": ["Tabasco noticias", "Tabasco news", "Tabasco economía"],
    "Tamaulipas": ["Tamaulipas noticias", "Tamaulipas news", "Tamaulipas economía"],
    "Tlaxcala": ["Tlaxcala noticias", "Tlaxcala news", "Tlaxcala economía"],
    "Veracruz": ["Veracruz noticias", "Veracruz news", "Veracruz economía"],
    "Yucatán": ["Yucatán noticias", "Yucatán news", "Yucatán economía"],
    "Zacatecas": ["Zacatecas noticias", "Zacatecas news", "Zacatecas economía"]}

# Poverty dimension definitions with keywords. Each dimension contains a mix of formal Spanish terms, 
# Mexican slang, and English words to capture the diverse jargon used in YouTube comments
POVERTY_DIMENSIONS = {
    "INCOME": """
    empleo trabajo salario ingresos dinero economía sueldo ahorro impuestos
    chamba lana nómina billete jale job salary income money
    """,
    
    "ACCESS TO HEALTH SERVICES": """
    salud médico hospital medicina tratamiento atención clínica seguro
    sistema de salud servicios médicos doctor cuidado ir al doctor health insurance
    seguro médico doctor particular ir a consulta healthcare medical treatment 
    """,
    
    "EDUCATIONAL LAG": """
    educación escuela universidad maestro estudiante aprendizaje escuela pública
    clases formación conocimiento título bachillerato preparatoria escuela secundaria
    """,
    
    "ACCESS TO SOCIAL SECURITY": """
    seguridad social pensión jubilación contrato derechos laborales
    prestaciones protección IMSS ISSSTE afore finiquito ahorro para retiro
    cotizar retirement benefits social security worker rights informal job
    """,
    
    "HOUSING": """
    vivienda casa habitación hogar alquiler renta depa housing utilities
    servicios agua luz gas electricidad construcción propiedad rent 
    techo colonia vecindario urbanización asentamiento cuartito mortgage
    """,
    
    "ACCESS TO FOOD": """
    alimentación comida nutrición alimentos dieta cocinar recetas
    canasta básica food security nutrition meal groceries
    comida saludable dieta balanceada comida rápida comida chatarra
    """,
    
    "SOCIAL COHESION": """
    comunidad sociedad integración participación convivencia barrio raza community
    respeto diversidad solidaridad inclusión pertenencia 
    vecinos apoyo redes sociales confianza belonging inclusion
    """}

In [None]:
# confidence threshold: comments with similarity scores below this threshold will be classified as 'OTHER'
MIN_DIMENSION_CONFIDENCE = 0.10

# define constants for YouTube API usage
MAX_VIDEOS_PER_SEARCH = 100  
MAX_COMMENTS_PER_VIDEO = 300  
API_SLEEP_TIME = 0.5  

### SimpleTextProcessor
**Purpose**: Handles text preprocessing, dimension classification, and sentiment analysis

**Key Methods:**
- `clean_text()`: Normalizes and cleans input text
- `classify_dimension()`: Assigns text to poverty dimensions using embeddings
- `get_sentiment_score()`: Computes normalized sentiment scores

In [None]:
# preprocess text, classify it into poverty dimensions, and compute sentiment scores
class SimpleTextProcessor:
    def __init__(self):
        # initialize the multilingual sentence transformer. This model works well with Spanish, English, and mixed-language content.
        self.embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
        
        # load pre-trained sentiment analysis model. This model outputs a score between 1 and 5, which we normalize to the scale -1 to 1.
        self.tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
        self.model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
        
        # prepare dimension names and their corresponding keyword phrases
        self.dimension_names = list(POVERTY_DIMENSIONS.keys())
        self.dimension_texts = []
        
        # clean formatting: remove extra spaces and handle indentation
        for keywords in POVERTY_DIMENSIONS.values():
            word_list = keywords.strip().split()
            phrase = " ".join(word_list)
            self.dimension_texts.append(phrase)
        
        # pre-compute embeddings for all poverty dimensions
        self.dimension_embeddings = self.embedder.encode(self.dimension_texts, convert_to_tensor=True)


    def clean_text(self, text):
        # remove HTML tags 
        text = re.sub(r'<.*?>', ' ', text)
        
        # remove URLs and links
        text = re.sub(r'http\S+', '', text)
        
        # keep only word characters, spaces, and Spanish accented characters
        text = re.sub(r'[^\w\sáéíóúüñÁÉÍÓÚÜÑ]', ' ', text)
        
        # normalize whitespace and convert to lowercase
        return re.sub(r'\s+', ' ', text).strip().lower()


    def classify_dimension(self, text):
        if not text:
            return "OTHER", 0.0
        
        # generate embedding for the input text
        embedding = self.embedder.encode(text, convert_to_tensor=True)
        
        # calculate cosine similarity with all dimension embeddings
        cosine_scores = util.cos_sim(embedding, self.dimension_embeddings)[0]
        
        # find the dimension with highest similarity
        max_idx = torch.argmax(cosine_scores).item()
        max_score = cosine_scores[max_idx].item()
        
        # apply confidence threshold to filter out irrelevant content - assign "OTHER" if score is too low
        if max_score < MIN_DIMENSION_CONFIDENCE:
            return "OTHER", max_score
        
        return self.dimension_names[max_idx], max_score

    def get_sentiment_score(self, text):
        if not text:
            return 0.0
        
        # tokenize input with truncation to handle long texts
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        
        # get model predictions without gradient computation
        with torch.no_grad():
            outputs = self.model(**inputs)
        
        # convert logits to star rating (1-5 stars)
        stars = torch.argmax(outputs.logits, dim=1).item() + 1
        
        # normalize to sentiment score: 1 star = -1, 3 stars = 0, 5 stars = 1
        return (stars - 3) / 2

### SimpleYouTubeAnalyzer  
**Purpose**: Manages YouTube API interactions and orchestrates analysis pipeline

**Key Methods:**
- `search_videos()`: Retrieves videos based on search terms and date range
- `get_video_comments()`: Extracts comments from individual videos
- `analyze_state_by_keywords()`: Processes complete state analysis workflow
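
Both retrieval methods page through results by requesting `min(page_limit, remaining)` items at a time. The sketch below shows only that arithmetic; `page_sizes` is a hypothetical helper, and it assumes every page comes back full, which the real API does not guarantee.

```python
def page_sizes(total, per_page):
    """List the maxResults value of each request needed to reach `total` items."""
    sizes = []
    remaining = total
    while remaining > 0:
        size = min(per_page, remaining)
        sizes.append(size)
        remaining -= size
    return sizes

print(page_sizes(100, 50))    # video search: 50 results per page at most
print(page_sizes(300, 100))   # comment threads: 100 results per page at most
print(page_sizes(120, 50))    # last page is smaller
```

With the limits used here, a full video search takes two requests and a full comment fetch takes three.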

In [None]:
# handle YouTube API interactions and content analysis. 
class SimpleYouTubeAnalyzer:
    def __init__(self, api_key):
        self.api_key = api_key
        self.youtube = build("youtube", "v3", developerKey=api_key)
        self.processor = SimpleTextProcessor()

    # search for videos on YouTube based on a query and date range
    def search_videos(self, query, published_after, published_before, max_results=100):
        videos = []
        next_page_token = None
        
        try:
            while len(videos) < max_results:
                # request videos from YouTube API with pagination
                response = self.youtube.search().list(
                    q=query,
                    part="snippet",
                    maxResults=min(50, max_results - len(videos)),  # API limit is 50 per request
                    pageToken=next_page_token,
                    type="video",
                    order="relevance",
                    publishedAfter=published_after,
                    publishedBefore=published_before,
                    relevanceLanguage="es"  # prioritize Spanish content
                ).execute()
                
                # extract video information from API response
                for item in response.get("items", []):
                    if item["id"]["kind"] == "youtube#video":
                        videos.append({
                            "id": item["id"]["videoId"],
                            "title": item["snippet"]["title"],
                            "description": item["snippet"].get("description", ""),
                            "published_at": item["snippet"]["publishedAt"]
                        })
                
                # check if more pages are available
                next_page_token = response.get("nextPageToken")
                if not next_page_token or len(videos) >= max_results:
                    break
                
                # small pause between requests to avoid hitting API rate limits
                sleep(API_SLEEP_TIME)
                
        except Exception as e:
            print(f"Error searching for '{query}': {e}")
        
        print(f"Found {len(videos)} videos for query '{query}'")
        return videos

    # retrieve comments for a specific video
    def get_video_comments(self, video_id, max_comments=300):
        comments = []
        next_page_token = None
        
        try:
            while len(comments) < max_comments:
                # request comments with pagination support
                response = self.youtube.commentThreads().list(
                    part="snippet",
                    videoId=video_id,
                    maxResults=min(100, max_comments - len(comments)),  # API limit is 100 per request
                    pageToken=next_page_token
                ).execute()
                
                # extract comment text from API response
                for item in response.get("items", []):
                    comment_text = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
                    comments.append(comment_text)
                
                # check for additional pages
                next_page_token = response.get("nextPageToken")
                if not next_page_token or len(comments) >= max_comments:
                    break
                
                # pause between requests to avoid hitting API rate limits
                sleep(API_SLEEP_TIME)
                
        except Exception:
            # handle videos that have disabled comments; any API error ends collection for this video
            pass
        
        return comments

    def analyze_state_by_keywords(self, state_name, search_terms, date_range):
        print(f"\nAnalyzing {state_name}...")
        
        # initialize statistics tracking for all categories and 'OTHER'
        all_categories = list(POVERTY_DIMENSIONS.keys()) + ["OTHER"]
        dimension_stats = {cat: {"sentiment_sum": 0.0, "count": 0} for cat in all_categories}
        
        total_videos = 0
        total_comments = 0
        classification_stats = {cat: 0 for cat in all_categories}
        
        # process each search term for the current state
        for search_term in search_terms:
            print(f"  Searching for '{search_term}'...")
            
            # get relevant videos for this search term
            videos = self.search_videos(
                query=search_term,
                published_after=date_range["published_after"],
                published_before=date_range["published_before"],
                max_results=MAX_VIDEOS_PER_SEARCH
            )
            
            if not videos:
                continue
                
            total_videos += len(videos)
            
            # process each video and its comments
            for video in tqdm(videos, desc=f"Processing videos for '{search_term}'"):
                # extract comments from the current video
                comments = self.get_video_comments(video["id"], MAX_COMMENTS_PER_VIDEO)
                total_comments += len(comments)
                
                # combine video metadata with comments
                all_texts = [video["title"] + ". " + video["description"]] + comments
                
                # analyze each piece of text individually
                for text in all_texts:
                    clean = self.processor.clean_text(text)
                    
                    # skip very short texts 
                    if len(clean) < 10:
                        continue
                    
                    # classify text into poverty dimensions or 'OTHER'
                    category, confidence = self.processor.classify_dimension(clean)
                    
                    # update classification statistics for reporting
                    classification_stats[category] += 1
                    
                    # calculate sentiment score for all classified texts
                    sentiment = self.processor.get_sentiment_score(clean)
                    dimension_stats[category]["sentiment_sum"] += sentiment
                    dimension_stats[category]["count"] += 1
        
        # print classification statistics for this state
        total_texts = sum(classification_stats.values())
        print(f"  Classification statistics for {state_name}:")
        for category, count in classification_stats.items():
            percentage = (count / total_texts * 100) if total_texts > 0 else 0
            print(f"    {category}: {count} texts ({percentage:.1f}%)")
        
        print(f"  Analyzed {total_videos} videos and {total_comments} comments for {state_name}")
        return dimension_stats, total_videos, total_comments, classification_stats

## Output Files

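Results are saved as CSV files in the `yt_data_2022/` directory:
- One file per state, named after the state in lowercase with spaces replaced by underscores (for example, `baja_california_sur.csv`)
- Each file holds one row per dimension (including `OTHER`) with the average sentiment, mention count, percentage of total, and the state's video and comment totals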
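
The file naming can be sketched as follows. `state_to_filename` reproduces the expression used when saving each CSV; `state_to_ascii_filename` is a hypothetical alternative, not used in the notebook, for environments where accented file names are inconvenient.

```python
import unicodedata

def state_to_filename(state, out_dir="yt_data_2022"):
    # same transformation as the save step: spaces to underscores, lowercase
    return f"{out_dir}/{state.replace(' ', '_').lower()}.csv"

def state_to_ascii_filename(state, out_dir="yt_data_2022"):
    # hypothetical variant: strip accents before building the name
    folded = unicodedata.normalize("NFKD", state).encode("ascii", "ignore").decode("ascii")
    return f"{out_dir}/{folded.replace(' ', '_').lower()}.csv"

for name in ["Baja California Sur", "Yucatán", "Nuevo León"]:
    print(state_to_filename(name), state_to_ascii_filename(name))
```

Accents are preserved by the original naming, so the Yucatán results are written to `yt_data_2022/yucatán.csv`.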

In [None]:
#  main execution function that processes all Mexican states
def analyze_all_states_simple():
    # initialize the YouTube analyzer with API credentials
    analyzer = SimpleYouTubeAnalyzer(YT_API_KEY)
    
    # define the analysis time period (2022 full year in this case)
    date_range = {
        "published_after": "2022-01-01T00:00:00Z",
        "published_before": "2022-12-31T23:59:59Z"
    }
    
    # create output directory for results
    os.makedirs("yt_data_2022", exist_ok=True)
    
    # initialize lists for aggregated results
    all_results = []
    overall_classification_stats = {}
    
    # process each Mexican state individually
    for state, search_terms in STATES_SEARCH_TERMS.items():
        # analyze the current state using its specific search terms
        stats, total_videos, total_comments, classification_stats = analyzer.analyze_state_by_keywords(
            state_name=state,
            search_terms=search_terms,
            date_range=date_range
        )
        
        # accumulate classification statistics across all states
        for category, count in classification_stats.items():
            overall_classification_stats[category] = overall_classification_stats.get(category, 0) + count
        
        # prepare structured data for this state
        df_rows = []
        # total classified texts for this state, computed once for the percentage column
        total_mentions = sum(s["count"] for s in stats.values())
        for category, v in stats.items():
            df_rows.append({
                "state": state,
                "dimension": category.replace("_", " ").title(),
                "avg_sentiment": v["sentiment_sum"] / v["count"] if v["count"] > 0 else 0,
                "mentions_count": v["count"],
                "percentage_of_total": (v["count"] / total_mentions * 100) if total_mentions > 0 else 0,
                "videos_analyzed": total_videos,
                "comments_analyzed": total_comments})
        
        # create DataFrame for this state's results
        df = pd.DataFrame(df_rows)
        
        # save state-specific results to CSV file
        output_file = f"yt_data_2022/{state.replace(' ', '_').lower()}.csv"
        df.to_csv(output_file, index=False)
        print(f"Saved results to {output_file}")
        
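        # add to aggregated results collection
        all_results.append(df)

    # return the combined per-state table and the overall classification counts
    return pd.concat(all_results, ignore_index=True), overall_classification_stats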

if __name__ == "__main__":
    analyze_all_states_simple()




Analyzing Sinaloa...
  Searching for 'Sinaloa noticias'...
Found 100 videos for query 'Sinaloa noticias'


Processing videos for 'Sinaloa noticias': 100%|██████████| 100/100 [03:18<00:00,  1.98s/it]


  Searching for 'Sinaloa news'...
Found 100 videos for query 'Sinaloa news'


Processing videos for 'Sinaloa news': 100%|██████████| 100/100 [05:45<00:00,  3.46s/it]


  Searching for 'Sinaloa economía'...
Found 100 videos for query 'Sinaloa economía'


Processing videos for 'Sinaloa economía': 100%|██████████| 100/100 [01:58<00:00,  1.18s/it]


  Classification statistics for Sinaloa:
    INCOME: 4119 texts (38.9%)
    ACCESS TO HEALTH SERVICES: 668 texts (6.3%)
    EDUCATIONAL LAG: 1798 texts (17.0%)
    ACCESS TO SOCIAL SECURITY: 285 texts (2.7%)
    HOUSING: 562 texts (5.3%)
    ACCESS TO FOOD: 647 texts (6.1%)
    SOCIAL COHESION: 979 texts (9.2%)
    OTHER: 1528 texts (14.4%)
  Analyzed 300 videos and 11108 comments for Sinaloa
Saved results to yt_data_2022/sinaloa.csv

Analyzing Sonora...
  Searching for 'Sonora noticias'...
Found 100 videos for query 'Sonora noticias'


Processing videos for 'Sonora noticias': 100%|██████████| 100/100 [02:32<00:00,  1.53s/it]


  Searching for 'Sonora news'...
Found 100 videos for query 'Sonora news'


Processing videos for 'Sonora news': 100%|██████████| 100/100 [04:41<00:00,  2.81s/it]


  Searching for 'Sonora economía'...
Found 100 videos for query 'Sonora economía'


Processing videos for 'Sonora economía': 100%|██████████| 100/100 [00:25<00:00,  3.98it/s]


  Classification statistics for Sonora:
    INCOME: 2774 texts (34.7%)
    ACCESS TO HEALTH SERVICES: 436 texts (5.5%)
    EDUCATIONAL LAG: 1517 texts (19.0%)
    ACCESS TO SOCIAL SECURITY: 370 texts (4.6%)
    HOUSING: 288 texts (3.6%)
    ACCESS TO FOOD: 456 texts (5.7%)
    SOCIAL COHESION: 847 texts (10.6%)
    OTHER: 1303 texts (16.3%)
  Analyzed 300 videos and 8216 comments for Sonora
Saved results to yt_data_2022/sonora.csv

Analyzing Tabasco...
  Searching for 'Tabasco noticias'...
Found 100 videos for query 'Tabasco noticias'


Processing videos for 'Tabasco noticias': 100%|██████████| 100/100 [01:48<00:00,  1.09s/it]


  Searching for 'Tabasco news'...
Found 100 videos for query 'Tabasco news'


Processing videos for 'Tabasco news': 100%|██████████| 100/100 [04:31<00:00,  2.71s/it]


  Searching for 'Tabasco economía'...
Found 100 videos for query 'Tabasco economía'


Processing videos for 'Tabasco economía': 100%|██████████| 100/100 [02:11<00:00,  1.32s/it]


  Classification statistics for Tabasco:
    INCOME: 3455 texts (41.4%)
    ACCESS TO HEALTH SERVICES: 474 texts (5.7%)
    EDUCATIONAL LAG: 1393 texts (16.7%)
    ACCESS TO SOCIAL SECURITY: 209 texts (2.5%)
    HOUSING: 561 texts (6.7%)
    ACCESS TO FOOD: 570 texts (6.8%)
    SOCIAL COHESION: 598 texts (7.2%)
    OTHER: 1083 texts (13.0%)
  Analyzed 300 videos and 8406 comments for Tabasco
Saved results to yt_data_2022/tabasco.csv

Analyzing Tamaulipas...
  Searching for 'Tamaulipas noticias'...
Found 100 videos for query 'Tamaulipas noticias'


Processing videos for 'Tamaulipas noticias': 100%|██████████| 100/100 [02:17<00:00,  1.37s/it]


  Searching for 'Tamaulipas news'...
Found 100 videos for query 'Tamaulipas news'


Processing videos for 'Tamaulipas news': 100%|██████████| 100/100 [05:57<00:00,  3.57s/it]


  Searching for 'Tamaulipas economía'...
Found 100 videos for query 'Tamaulipas economía'


Processing videos for 'Tamaulipas economía': 100%|██████████| 100/100 [02:52<00:00,  1.72s/it]


  Classification statistics for Tamaulipas:
    INCOME: 4403 texts (39.5%)
    ACCESS TO HEALTH SERVICES: 571 texts (5.1%)
    EDUCATIONAL LAG: 1949 texts (17.5%)
    ACCESS TO SOCIAL SECURITY: 340 texts (3.0%)
    HOUSING: 475 texts (4.3%)
    ACCESS TO FOOD: 838 texts (7.5%)
    SOCIAL COHESION: 1218 texts (10.9%)
    OTHER: 1356 texts (12.2%)
  Analyzed 300 videos and 11411 comments for Tamaulipas
Saved results to yt_data_2022/tamaulipas.csv

Analyzing Tlaxcala...
  Searching for 'Tlaxcala noticias'...
Found 100 videos for query 'Tlaxcala noticias'


Processing videos for 'Tlaxcala noticias': 100%|██████████| 100/100 [01:17<00:00,  1.30it/s]


  Searching for 'Tlaxcala news'...
Found 100 videos for query 'Tlaxcala news'


Processing videos for 'Tlaxcala news': 100%|██████████| 100/100 [02:06<00:00,  1.27s/it]


  Searching for 'Tlaxcala economía'...
Found 100 videos for query 'Tlaxcala economía'


Processing videos for 'Tlaxcala economía': 100%|██████████| 100/100 [01:53<00:00,  1.13s/it]


  Classification statistics for Tlaxcala:
    INCOME: 1715 texts (34.8%)
    ACCESS TO HEALTH SERVICES: 345 texts (7.0%)
    EDUCATIONAL LAG: 969 texts (19.7%)
    ACCESS TO SOCIAL SECURITY: 113 texts (2.3%)
    HOUSING: 148 texts (3.0%)
    ACCESS TO FOOD: 360 texts (7.3%)
    SOCIAL COHESION: 685 texts (13.9%)
    OTHER: 587 texts (11.9%)
  Analyzed 300 videos and 4850 comments for Tlaxcala
Saved results to yt_data_2022/tlaxcala.csv

Analyzing Veracruz...
  Searching for 'Veracruz noticias'...
Found 100 videos for query 'Veracruz noticias'


Processing videos for 'Veracruz noticias': 100%|██████████| 100/100 [01:45<00:00,  1.06s/it]


  Searching for 'Veracruz news'...
Found 100 videos for query 'Veracruz news'


Processing videos for 'Veracruz news': 100%|██████████| 100/100 [03:52<00:00,  2.32s/it]


  Searching for 'Veracruz economía'...
Found 100 videos for query 'Veracruz economía'


Processing videos for 'Veracruz economía': 100%|██████████| 100/100 [05:03<00:00,  3.04s/it]


  Classification statistics for Veracruz:
    INCOME: 3760 texts (37.7%)
    ACCESS TO HEALTH SERVICES: 582 texts (5.8%)
    EDUCATIONAL LAG: 1946 texts (19.5%)
    ACCESS TO SOCIAL SECURITY: 225 texts (2.3%)
    HOUSING: 559 texts (5.6%)
    ACCESS TO FOOD: 703 texts (7.1%)
    SOCIAL COHESION: 929 texts (9.3%)
    OTHER: 1262 texts (12.7%)
  Analyzed 300 videos and 10213 comments for Veracruz
Saved results to yt_data_2022/veracruz.csv

Analyzing Yucatán...
  Searching for 'Yucatán noticias'...
Found 100 videos for query 'Yucatán noticias'


Processing videos for 'Yucatán noticias': 100%|██████████| 100/100 [00:58<00:00,  1.70it/s]


  Searching for 'Yucatán news'...
Found 100 videos for query 'Yucatán news'


Processing videos for 'Yucatán news': 100%|██████████| 100/100 [04:07<00:00,  2.48s/it]


  Searching for 'Yucatán economía'...
Found 100 videos for query 'Yucatán economía'


Processing videos for 'Yucatán economía': 100%|██████████| 100/100 [01:05<00:00,  1.52it/s]


  Classification statistics for Yucatán:
    INCOME: 1578 texts (30.7%)
    ACCESS TO HEALTH SERVICES: 302 texts (5.9%)
    EDUCATIONAL LAG: 972 texts (18.9%)
    ACCESS TO SOCIAL SECURITY: 126 texts (2.5%)
    HOUSING: 737 texts (14.4%)
    ACCESS TO FOOD: 321 texts (6.3%)
    SOCIAL COHESION: 522 texts (10.2%)
    OTHER: 575 texts (11.2%)
  Analyzed 300 videos and 5027 comments for Yucatán
Saved results to yt_data_2022/yucatán.csv

Analyzing Zacatecas...
  Searching for 'Zacatecas noticias'...
Found 100 videos for query 'Zacatecas noticias'


Processing videos for 'Zacatecas noticias': 100%|██████████| 100/100 [03:51<00:00,  2.32s/it]


  Searching for 'Zacatecas news'...
Found 100 videos for query 'Zacatecas news'


Processing videos for 'Zacatecas news': 100%|██████████| 100/100 [11:16<00:00,  6.77s/it]


  Searching for 'Zacatecas economía'...
Found 100 videos for query 'Zacatecas economía'


Processing videos for 'Zacatecas economía': 100%|██████████| 100/100 [04:31<00:00,  2.71s/it]

  Classification statistics for Zacatecas:
    INCOME: 6907 texts (38.0%)
    ACCESS TO HEALTH SERVICES: 1122 texts (6.2%)
    EDUCATIONAL LAG: 2963 texts (16.3%)
    ACCESS TO SOCIAL SECURITY: 537 texts (3.0%)
    HOUSING: 708 texts (3.9%)
    ACCESS TO FOOD: 1211 texts (6.7%)
    SOCIAL COHESION: 2445 texts (13.4%)
    OTHER: 2293 texts (12.6%)
  Analyzed 300 videos and 18448 comments for Zacatecas
Saved results to yt_data_2022/zacatecas.csv



