# 📒 Notebook: Reddit (official API) + **support / no support** analysis of comments

This notebook uses the **official Reddit API with OAuth (PRAW)** to download **posts** and **comments** from a subreddit, then estimates **support vs. no support** from the **sentiment** of the comments (VADER) plus a few small agreement/disagreement **heuristics**.

**What you get per post:**  
- **Title / selftext**  
- **Images** (if the post's main URL is an image)  
- **Score (aggregated upvotes)** and **number of comments**  
- **Comments** (text, author, score, date)  
- **Support metrics**: share of positive/negative comments (VADER), keyword counts for "agreement"/"disagreement", and a **SupportIndex** (0–1).

> **Sources**: PRAW (Reddit API wrapper) and the OAuth docs; VADER for sentiment; optional Hugging Face *pipeline*.  
> Use responsibly and respect the rate limits and ToS. Check the `X-Ratelimit-*` headers so you do not exceed them.
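As a quick illustration of the SupportIndex, here is a minimal sketch of the formula the download loop applies later in this notebook: supportive signals (positive comments plus agreement hits) divided by all non-neutral signals. The standalone function name `support_index` is an assumption for this example; the notebook computes the value inline.

```python
def support_index(pos: int, neg: int, agree_hits: int, disagree_hits: int) -> float:
    """Share of supportive signals among all non-neutral signals, rounded to 3 places."""
    # Fall back to 1 when there are no signals at all, so the index becomes 0.0
    denom = (pos + neg + agree_hits + disagree_hits) or 1
    return round((pos + agree_hits) / denom, 3)

# 53 positive and 2 negative comments, no keyword hits
print(support_index(53, 2, 0, 0))
# 13 positive, 11 negative, 1 agreement hit
print(support_index(13, 11, 1, 0))
# No signals at all
print(support_index(0, 0, 0, 0))
```

Neutral comments do not enter the formula, so a post with many neutral comments and a handful of positive ones can still reach an index of 1.0.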


## 📦 Quick install

In [1]:
# Run this cell once in your local environment
%pip -q install praw pandas tqdm vaderSentiment nltk transformers --upgrade
# (Optional) download NLTK resources in case you want to use other tools
import nltk
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')





## ⚙️ Configuration

In [None]:
from dataclasses import dataclass, asdict
from typing import List, Optional, Tuple
from pathlib import Path
import time, re, json
import pandas as pd
from tqdm import tqdm

# === Edit these values ===
SUBREDDIT         = "USCIS"    # <-- the subreddit
SORT              = "top"      # "hot" | "new" | "top" | "rising"
TIME_FILTER       = "year"     # used only when SORT="top": "day"|"week"|"month"|"year"|"all"
MIN_POSTS         = 20         # minimum to fetch
MAX_POSTS         = 200        # maximum to fetch
MAX_COMMENTS_PER_POST = 80     # cap on comments per post
DOWNLOAD_IMAGES   = False      # download images locally when the URL points to one
REQUEST_SLEEP_S   = 0.7        # pause between posts to respect the rate limit
OUT_DIR           = Path("./reddit_api_output")

OUT_DIR.mkdir(parents=True, exist_ok=True)
(OUT_DIR / "images").mkdir(parents=True, exist_ok=True)

@dataclass
class PostRow:
    post_id: str
    title: Optional[str]
    author: Optional[str]
    score: Optional[int]
    num_comments: Optional[int]
    created: Optional[str]
    permalink: str
    is_self: bool
    image_urls: List[str]
    selftext: Optional[str]
    subreddit: Optional[str]
    # post-level support metrics (summary of the comments)
    comments_total: int = 0
    comments_pos: int = 0
    comments_neg: int = 0
    comments_neu: int = 0
    agree_hits: int = 0
    disagree_hits: int = 0
    support_index: float = 0.0

@dataclass
class CommentRow:
    post_id: str
    comment_id: str
    author: Optional[str]
    created: Optional[str]
    score: Optional[int]
    body: str
    vader_compound: Optional[float]
    sentiment_label: Optional[str]
    agrees: int
    disagrees: int


## 🔐 Authentication and utilities

In [3]:
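import os
import praw, requests
from datetime import datetime, timezone
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def make_reddit():
    # Credentials come from environment variables rather than being hard-coded.
    # The variable names and the default user agent are this notebook's own choice.
    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent=os.environ.get("REDDIT_USER_AGENT", "reddit-support-notebook/0.1"),
    )
    return reddit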

def is_image_url(u: str) -> bool:
    return bool(re.search(r"\.(jpg|jpeg|png|gif)(?:\?.*)?$", (u or ""), flags=re.I))

def to_iso(ts_utc: Optional[float]) -> Optional[str]:
    # Returns None when the timestamp is missing or cannot be converted
    try:
        return datetime.fromtimestamp(ts_utc, tz=timezone.utc).isoformat()
    except Exception:
        return None

def sleep():
    time.sleep(REQUEST_SLEEP_S)

# Simple agreement/disagreement heuristic (EN/ES; extend the lexicon as needed).
# Matching is by substring, so "agree" also matches inside "disagree".
AGREE_TERMS = [
    "i agree", "agree", "supported", "support this", "i support", "valid point",
    "de acuerdo", "apoyo", "tiene razón", "cierto", "totalmente de acuerdo", "estoy de acuerdo",
]
DISAGREE_TERMS = [
    "i disagree", "disagree", "not support", "oppose", "against this", "bad take",
    "no apoyo", "en desacuerdo", "no estoy de acuerdo", "mala idea", "me opongo"
]

def count_hits(text: str, terms: List[str]) -> int:
    t = (text or "").lower()
    return sum(1 for term in terms if term in t)

analyzer = SentimentIntensityAnalyzer()

def label_from_compound(c: float) -> str:
    # The VADER README suggests cutoffs of ±0.05; this notebook uses a stricter ±0.5,
    # which labels more comments as neutral (see references)
    if c >= 0.5:
        return "pos"
    elif c <= -0.5:
        return "neg"
    else:
        return "neu"
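Because `count_hits` matches by substring, a term such as "agree" is also found inside "disagree". The sketch below shows one possible alternative that only accepts whole words or phrases. The name `count_hits_bounded` is hypothetical and not part of the notebook; it is an illustration under that assumption, not a drop-in replacement.

```python
import re
from typing import List

def count_hits_substring(text: str, terms: List[str]) -> int:
    """Same logic as the notebook's count_hits: plain substring matching."""
    t = (text or "").lower()
    return sum(1 for term in terms if term in t)

def count_hits_bounded(text: str, terms: List[str]) -> int:
    """Hypothetical variant: count terms that occur as whole words or phrases."""
    t = (text or "").lower()
    hits = 0
    for term in terms:
        # The lookarounds require a non-word character (or the string edge)
        # on both sides, so "agree" no longer matches inside "disagree".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, t):
            hits += 1
    return hits

print(count_hits_substring("I disagree with this", ["agree"]))
print(count_hits_bounded("I disagree with this", ["agree"]))
print(count_hits_bounded("Totally AGREE!", ["agree"]))
```

Word boundaries do not handle negation: "no apoyo" still contains "apoyo" as a whole word, so that comment would count in both lists. Overlapping entries such as "agree" and "i agree" also still produce two hits for one phrase.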


## 🚀 Downloading posts and comments (official API)

In [4]:
reddit = make_reddit()

# Pick the listing according to SORT (unknown values fall back to "hot")
sub = reddit.subreddit(SUBREDDIT)
if SORT == "hot":
    it = sub.hot(limit=MAX_POSTS)
elif SORT == "new":
    it = sub.new(limit=MAX_POSTS)
elif SORT == "rising":
    it = sub.rising(limit=MAX_POSTS)
elif SORT == "top":
    it = sub.top(time_filter=TIME_FILTER, limit=MAX_POSTS)
else:
    it = sub.hot(limit=MAX_POSTS)

posts_rows, comments_rows = [], []

for p in tqdm(it, total=MAX_POSTS, desc="Posts"):
    sleep()
    # build the post record
    img_urls = [p.url] if is_image_url(getattr(p, "url", "")) else []
    pr = PostRow(
        post_id=p.id,
        title=p.title,
        author=f"u/{p.author}" if p.author else None,
        score=int(p.score) if p.score is not None else None,
        num_comments=int(p.num_comments) if p.num_comments is not None else None,
        created=to_iso(getattr(p, "created_utc", None)),
        permalink=f"https://www.reddit.com{p.permalink}",
        is_self=bool(p.is_self),
        image_urls=img_urls,
        selftext=(p.selftext or None),
        subreddit=str(p.subreddit) if p.subreddit else SUBREDDIT,
    )

    # comments (up to the cap); replace_more(limit=0) drops the "load more" placeholders
    try:
        p.comments.replace_more(limit=0)
        com_list = p.comments.list()[:MAX_COMMENTS_PER_POST]
    except Exception as e:
        print("[WARN] Comments", e, "in", pr.permalink)
        com_list = []

    pos = neg = neu = 0
    agree_hits = disagree_hits = 0

    for c in com_list:
        text = getattr(c, "body", "") or ""
        comp = analyzer.polarity_scores(text).get("compound")
        lab = label_from_compound(comp) if comp is not None else None
        if lab == "pos": pos += 1
        elif lab == "neg": neg += 1
        else: neu += 1

        ah = count_hits(text, AGREE_TERMS)
        dh = count_hits(text, DISAGREE_TERMS)
        agree_hits += ah
        disagree_hits += dh

        comments_rows.append(CommentRow(
            post_id=pr.post_id,
            comment_id=getattr(c, "id", ""),
            author=f"u/{c.author}" if c.author else None,
            created=to_iso(getattr(c, "created_utc", None)),
            score=int(getattr(c, "score", 0)) if getattr(c, "score", None) is not None else None,
            body=text,
            vader_compound=comp,
            sentiment_label=lab,
            agrees=ah,
            disagrees=dh
        ))

    total = pos + neg + neu
    pr.comments_total = total
    pr.comments_pos = pos
    pr.comments_neg = neg
    pr.comments_neu = neu
    pr.agree_hits = agree_hits
    pr.disagree_hits = disagree_hits

    # simple support index: (positives + agreements) / (positives + negatives + agreements + disagreements);
    # the denominator falls back to 1 when there are no signals, which yields an index of 0
    denom = (pos + neg + agree_hits + disagree_hits) or 1
    pr.support_index = round((pos + agree_hits) / denom, 3)

    posts_rows.append(pr)

print(f"Posts: {len(posts_rows)} | Comments: {len(comments_rows)}")


Posts: 100%|██████████| 200/200 [06:24<00:00,  1.92s/it]

Posts: 200 | Comments: 15093
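The `if`/`elif` chain that picks the listing could also be written as a small lookup, sketched below. The helper `select_listing` and the stand-in class `FakeSubreddit` are assumptions made for this example so it runs without network access; with PRAW you would pass the object returned by `reddit.subreddit(SUBREDDIT)` instead.

```python
from typing import Any, Dict

VALID_SORTS = ("hot", "new", "rising", "top")

def select_listing(sub: Any, sort: str, time_filter: str, limit: int) -> Any:
    """Call the listing method named by `sort`; unknown values fall back to "hot"."""
    if sort not in VALID_SORTS:
        sort = "hot"
    kwargs: Dict[str, Any] = {"limit": limit}
    if sort == "top":
        # The notebook passes a time window only for "top"
        kwargs["time_filter"] = time_filter
    return getattr(sub, sort)(**kwargs)

class FakeSubreddit:
    """Hypothetical stand-in that echoes the call instead of contacting Reddit."""
    def hot(self, limit): return ("hot", limit)
    def new(self, limit): return ("new", limit)
    def rising(self, limit): return ("rising", limit)
    def top(self, time_filter, limit): return ("top", time_filter, limit)

print(select_listing(FakeSubreddit(), "top", "year", 200))
print(select_listing(FakeSubreddit(), "unknown", "year", 5))
```

Restricting `sort` to a fixed tuple before calling `getattr` keeps a typo in the configuration from invoking an unrelated attribute.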





## 💾 Saving results

In [None]:
posts_df = pd.DataFrame([asdict(p) for p in posts_rows])
comments_df = pd.DataFrame([asdict(c) for c in comments_rows])

posts_csv = OUT_DIR / "vader_posts.csv"
comments_csv = OUT_DIR / "vader_comments.csv"
posts_jsonl = OUT_DIR / "vader_posts.jsonl"
comments_jsonl = OUT_DIR / "vader_comments.jsonl"

posts_df.to_csv(posts_csv, index=False, encoding="utf-8-sig")
comments_df.to_csv(comments_csv, index=False, encoding="utf-8-sig")

# JSON Lines: one JSON object per line, terminated by a real newline character
with open(posts_jsonl, "w", encoding="utf-8") as f:
    for _, row in posts_df.iterrows():
        f.write(json.dumps(row.to_dict(), ensure_ascii=False) + "\n")

with open(comments_jsonl, "w", encoding="utf-8") as f:
    for _, row in comments_df.iterrows():
        f.write(json.dumps(row.to_dict(), ensure_ascii=False) + "\n")

posts_csv, comments_csv, posts_jsonl, comments_jsonl


(WindowsPath('reddit_api_output/vader_posts.csv'),
 WindowsPath('reddit_api_output/vader_comments.csv'),
 WindowsPath('reddit_api_output/vader_posts.jsonl'),
 WindowsPath('reddit_api_output/vader_comments.jsonl'))
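A JSON Lines file holds one JSON object per line, each ended by a real newline. The sketch below writes and re-reads such a file using only the standard library, as a way to check that the output can be parsed back. The helpers `write_jsonl` and `read_jsonl` and the sample rows are hypothetical and not part of the notebook.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

def write_jsonl(path: Path, rows: List[Dict[str, Any]]) -> None:
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Parse each non-empty line back into a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Made-up rows that mimic the shape of a post record
rows = [
    {"post_id": "a1", "title": "¡Hola!", "image_urls": []},
    {"post_id": "b2", "title": "Second post", "image_urls": ["https://example.com/x.png"]},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.jsonl"
    write_jsonl(demo_path, rows)
    loaded = read_jsonl(demo_path)
    line_count = len(demo_path.read_text(encoding="utf-8").splitlines())

print(line_count, loaded == rows)
```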

## 👀 Quick look and support ranking

In [6]:
# Top 20 by support index among posts with at least 10 comments
rank_df = (posts_df[posts_df["comments_total"] >= 10]
           .sort_values(["support_index", "comments_total"], ascending=[False, False])
           .head(20))

display(posts_df.head(50))
display(comments_df.head(100))
display(rank_df)

Unnamed: 0,post_id,title,author,score,num_comments,created,permalink,is_self,image_urls,selftext,subreddit,comments_total,comments_pos,comments_neg,comments_neu,agree_hits,disagree_hits,support_index
0,1lcftjz,Got my mom her green card by enlisting in the ...,u/SuperiorT,4148,537,2025-06-16T00:52:02+00:00,https://www.reddit.com/r/USCIS/comments/1lcftj...,False,[],"Well that's a wrap. On June 12, 2025 my mom fi...",USCIS,80,53,2,25,0,0,0.964
1,1grmeq4,Today I became a US citizen,u/adepojus,3865,261,2024-11-15T02:45:34+00:00,https://www.reddit.com/r/USCIS/comments/1grmeq...,False,[https://i.redd.it/j2iwqcb0bz0e1.jpeg],I came into United States as an F-1 student in...,USCIS,80,52,1,27,0,0,0.981
2,1gkfbph,Today I became a US citizen,u/Asteroids19_9,3683,142,2024-11-05T19:40:39+00:00,https://www.reddit.com/r/USCIS/comments/1gkfbp...,False,[https://i.redd.it/ejtng9ozy4zd1.jpeg],I am a 19 year old student at college. It took...,USCIS,80,50,4,26,0,0,0.926
3,1glflxy,"So, what now? An immigration attorney perspect...",u/Honest-Grape-9352,2900,715,2024-11-07T02:01:16+00:00,https://www.reddit.com/r/USCIS/comments/1glflx...,True,[],"(Before I begin, I kindly ask that I not be DM...",USCIS,80,35,8,37,0,0,0.814
4,1ltlanr,Became a Citizen after 26 years!!,u/Ajax4557,2655,165,2025-07-07T04:44:32+00:00,https://www.reddit.com/r/USCIS/comments/1ltlan...,False,[https://i.redd.it/5d0ljp8jtdbf1.jpeg],,USCIS,80,59,0,21,0,0,1.0
5,1lmkuyj,"After 25 years, I can finally say, and with a ...",u/liamthegooner,2446,237,2025-06-28T11:45:04+00:00,https://www.reddit.com/r/USCIS/comments/1lmkuy...,False,[https://i.redd.it/flrhy369on9f1.jpeg],,USCIS,80,57,2,21,1,0,0.967
6,1kepdw8,Be careful out there,u/Mrkinkade,2428,768,2025-05-04T17:31:34+00:00,https://www.reddit.com/r/USCIS/comments/1kepdw...,False,[https://i.redd.it/eety23l1wsye1.jpeg],,USCIS,80,13,11,56,1,0,0.56
7,1i8arwf,Judge in Seattle blocks Trump order on birthri...,u/lovetree77,2351,144,2025-01-23T18:50:35+00:00,https://www.reddit.com/r/USCIS/comments/1i8arw...,False,[],,USCIS,80,10,12,58,3,0,0.52
8,1h7t1fl,Disappointed in my country,u/Aggravating_Salad604,2199,849,2024-12-06T04:22:52+00:00,https://www.reddit.com/r/USCIS/comments/1h7t1f...,True,[],I'm an American citizen who is filing for my s...,USCIS,80,24,21,35,5,0,0.58
9,1i68n9x,I’m here for you. Many of us are,u/KFelts910,2167,892,2025-01-21T02:48:28+00:00,https://www.reddit.com/r/USCIS/comments/1i68n9...,True,[],Hey all - Immigration attorney here. It’s been...,USCIS,80,20,5,55,1,0,0.808


Unnamed: 0,post_id,comment_id,author,created,score,body,vader_compound,sentiment_label,agrees,disagrees
0,1lcftjz,my05pdb,u/DrummerHistorical493,2025-06-16T01:03:23+00:00,205,What a great son!,0.6588,pos,0,0
1,1lcftjz,my05mw5,u/Thedippyhoe,2025-06-16T01:02:57+00:00,216,Congratulations!!! Big hugs to your momma!\n\n...,0.8507,pos,0,0
2,1lcftjz,my0c8ez,u/WonderfulVariation93,2025-06-16T01:44:40+00:00,192,You are exempt from all future Mother’s Day gi...,0.5562,pos,0,0
3,1lcftjz,my08ip3,u/GeekNoy,2025-06-16T01:21:16+00:00,50,Congrats to your mom. You're an awesome son.,0.8176,pos,0,0
4,1lcftjz,my067yj,u/Greedy_Disaster_3130,2025-06-16T01:06:41+00:00,125,This is a great benefit offered to service mem...,0.9413,pos,0,0
...,...,...,...,...,...,...,...,...,...,...
95,1grmeq4,lxbuo2s,,2024-11-15T21:05:15+00:00,2,[deleted],0.0000,neu,0,0
96,1grmeq4,lxc2915,u/adisonpooh4,2024-11-15T21:43:53+00:00,2,Big congrats to your journey!🇺🇸🙏,0.5707,pos,0,0
97,1grmeq4,lxcac8i,u/Overall-Dot-7681,2024-11-15T22:27:05+00:00,2,So happy for you!,0.6468,pos,0,0
98,1grmeq4,lxcei8p,u/Georgiabulldawgsgurl,2024-11-15T22:50:24+00:00,2,Congratulations!!!!!,0.7243,pos,0,0


Unnamed: 0,post_id,title,author,score,num_comments,created,permalink,is_self,image_urls,selftext,subreddit,comments_total,comments_pos,comments_neg,comments_neu,agree_hits,disagree_hits,support_index
4,1ltlanr,Became a Citizen after 26 years!!,u/Ajax4557,2655,165,2025-07-07T04:44:32+00:00,https://www.reddit.com/r/USCIS/comments/1ltlan...,False,[https://i.redd.it/5d0ljp8jtdbf1.jpeg],,USCIS,80,59,0,21,0,0,1.0
10,1g9ohqs,I’m officially a U.S citizen!!,u/HeavyProcess3968,2047,93,2024-10-22T17:56:41+00:00,https://www.reddit.com/r/USCIS/comments/1g9ohq...,False,[https://i.redd.it/xc85minojcwd1.jpeg],,USCIS,80,53,0,27,0,0,1.0
12,1j83svb,I did it!! I’m an American!!!,u/SafinJade,1940,266,2025-03-10T17:13:27+00:00,https://www.reddit.com/r/USCIS/comments/1j83sv...,False,[https://i.redd.it/sodsv7zmawne1.jpeg],"Cranbury, NJ office. Super pleasant experience...",USCIS,80,70,0,10,1,0,1.0
16,1o0ktt1,"I became a U.S. citizen today in Newark, NJ 🇺🇸",u/TeklaTch,1579,164,2025-10-07T17:12:50+00:00,https://www.reddit.com/r/USCIS/comments/1o0ktt...,False,[],I want to take a moment to thank everyone here...,USCIS,80,66,0,14,0,0,1.0
33,1n3nb9t,I'm finally a US citizen,u/Mountain-Goatz,1133,115,2025-08-30T00:01:24+00:00,https://www.reddit.com/r/USCIS/comments/1n3nb9...,False,[],Saint Louis FO\nWe were the second batch that ...,USCIS,80,58,0,22,0,0,1.0
43,1my4fk9,Today I’m an American citizen!,u/bluedog33,1012,112,2025-08-23T15:29:11+00:00,https://www.reddit.com/r/USCIS/comments/1my4fk...,False,[https://i.redd.it/j2qdezmjdskf1.jpeg],"I had my oath ceremony in Fairfax, Virginia (t...",USCIS,80,57,0,23,0,0,1.0
45,1ig655q,I GOT IT✅,u/Glad-Craft2735,998,138,2025-02-02T20:01:54+00:00,https://www.reddit.com/r/USCIS/comments/1ig655...,False,[],Thank you all. \nLove god😆🥹,USCIS,80,61,0,19,0,0,1.0
48,1l6f2pf,My green card is hereee✅,u/dinda_xilu,956,207,2025-06-08T15:37:21+00:00,https://www.reddit.com/r/USCIS/comments/1l6f2p...,False,[https://i.redd.it/nid05kjk3q5f1.jpeg],"My pd is November 17, 2025. I was here on F1 v...",USCIS,80,56,0,24,0,0,1.0
57,1ms3goj,FINALLY 🇺🇸,u/Acrobatic-Notice-531,868,81,2025-08-16T18:28:13+00:00,https://www.reddit.com/r/USCIS/comments/1ms3go...,False,[https://i.redd.it/qkjlcy9ycfjf1.jpeg],FINALLY AFTER 1 YEAR we got our passport . tha...,USCIS,80,40,0,40,0,0,1.0
63,1klskny,I'm now officially a United States citizen!,u/stelgado,787,84,2025-05-13T17:54:03+00:00,https://www.reddit.com/r/USCIS/comments/1klskn...,False,[https://i.redd.it/ynsf73698l0f1.jpeg],It's finally over! There's no better feeling o...,USCIS,80,50,0,30,0,0,1.0


## 🔁 (Optional) Hugging Face `pipeline('sentiment-analysis')`

In [7]:
# If you want to try a transformers model to compare against VADER:
# - This is heavier and may need a GPU and a model cache.
from transformers import pipeline

try:
    hf_sa = pipeline("sentiment-analysis")
    sample = comments_df["body"].dropna().head(5).tolist()
    if sample:
        print(hf_sa(sample))
except Exception as e:
    print("Transformers pipeline unavailable:", e)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998666048049927}, {'label': 'POSITIVE', 'score': 0.999363124370575}, {'label': 'POSITIVE', 'score': 0.9965884685516357}, {'label': 'POSITIVE', 'score': 0.9998729228973389}, {'label': 'POSITIVE', 'score': 0.999794065952301}]
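To place these results next to VADER's `compound` value, one option is to fold the label into the sign of the score. The sketch below assumes the `POSITIVE`/`NEGATIVE` labels shown in the output above; the function name `signed_score` is hypothetical. The classifier score is a confidence rather than an intensity, so the two scales are only loosely comparable.

```python
from typing import Any, Dict, List

def signed_score(result: Dict[str, Any]) -> float:
    """Map {'label': 'POSITIVE' | 'NEGATIVE', 'score': p} to a value in [-1, 1]."""
    score = float(result["score"])
    return score if result["label"] == "POSITIVE" else -score

# Made-up results in the same shape as the pipeline output
results: List[Dict[str, Any]] = [
    {"label": "POSITIVE", "score": 0.75},
    {"label": "NEGATIVE", "score": 0.5},
]
print([signed_score(r) for r in results])
```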


## 📚 Notes and references

- **PRAW (Reddit API + OAuth):** authentication and basic usage.  
- **Official rate limits:** 100 queries per minute (QPM) per client ID with OAuth (check `X-Ratelimit-*`).  
- **VADER (Hutto & Gilbert, 2014):** rule-based scoring. The VADER README suggests `compound ≥ 0.05` as positive and `≤ -0.05` as negative; this notebook applies stricter cutoffs of `±0.5`.  
- **Hugging Face Transformers:** `pipeline('sentiment-analysis')` to compare approaches.
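The choice of cutoff changes how many comments end up labelled neutral. The sketch below compares the ±0.05 cutoffs suggested in the VADER README with the ±0.5 used by `label_from_compound` in this notebook. The function `label_with_cutoff` and the sample scores are illustrative assumptions.

```python
from typing import List

def label_with_cutoff(compound: float, cutoff: float = 0.05) -> str:
    """Label a VADER compound score using a symmetric cutoff."""
    if compound >= cutoff:
        return "pos"
    if compound <= -cutoff:
        return "neg"
    return "neu"

# Illustrative compound scores, from clearly positive to clearly negative
scores: List[float] = [0.6588, 0.30, 0.0, -0.20, -0.75]

loose = [label_with_cutoff(s) for s in scores]               # README-style cutoff
strict = [label_with_cutoff(s, cutoff=0.5) for s in scores]  # this notebook's cutoff
print(loose)
print(strict)
```

With the stricter cutoff, mildly positive or negative comments move into the neutral bucket and therefore drop out of the SupportIndex.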

> Reminder: this support/no-support analysis is **heuristic** (sentiment + agreement keywords). You can extend it with *stance detection* models or adjust the lexicon to the topic/subreddit.
