# 01 - EDA YouTube Toxic Comments

In this notebook we carry out the **Exploratory Data Analysis (EDA)** of a dataset of YouTube comments labeled for toxicity.

## Objectives

- Understand the structure of the dataset (rows, columns, data types).
- Analyze the distribution of the toxicity labels (multi-label).
- Study the length and basic characteristics of the comments.
- Detect potential problems: null values, duplicates, class imbalance.
- Analyze the relationships between toxicity types.
- Explore differences between toxic and non-toxic comments.
- Gather insights to guide preprocessing and modeling decisions.


In [2]:
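# Data handling
import pandas as pd
import numpy as np

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Text utilities
from collections import Counter
import re

plt.style.use("ggplot")
# Note: sns.set() applies seaborn's default theme (darkgrid) on top of the
# matplotlib style chosen above, in addition to scaling the fonts.
sns.set(font_scale=1.1)

# Show more of each comment and more rows when displaying DataFrames
pd.set_option("display.max_colwidth", 200)
pd.set_option("display.max_rows", 100)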


## 1. Loading the data

In this section we:

- Load the CSV that contains the YouTube comments.
- Check the number of rows and columns.
- Display the first few rows to get a quick idea of the content.


In [3]:
# Relative path to the dataset CSV
DATA_PATH = "../../data/youtoxic_english_1000.csv"

df = pd.read_csv(DATA_PATH)

# Report the dimensions and preview the first five rows
print("DataFrame size (rows, columns):", df.shape)
df.head()


DataFrame size (rows, columns): (1000, 15)


,CommentId,VideoId,Text,IsToxic,IsAbusive,IsThreat,IsProvocative,IsObscene,IsHatespeech,IsRacist,IsNationalist,IsSexist,IsHomophobic,IsReligiousHate,IsRadicalism
0,Ugg2KwwX0V8-aXgCoAEC,04kJtp6pVXI,"If only people would just take a step back and not make this case about them, because it wasn't about anyone except the two people in that situation. To lump yourself into this mess and take matt...",False,False,False,False,False,False,False,False,False,False,False,False
1,Ugg2s5AzSPioEXgCoAEC,04kJtp6pVXI,Law enforcement is not trained to shoot to apprehend. They are trained to shoot to kill. And I thank Wilson for killing that punk bitch.,True,True,False,False,False,False,False,False,False,False,False,False
2,Ugg3dWTOxryFfHgCoAEC,04kJtp6pVXI,\r\nDont you reckon them 'black lives matter' banners being held by white cunts is kinda patronizing and ironically racist. could they have not come up with somethin better.. or is it just what w...,True,True,False,False,True,False,False,False,False,False,False,False
3,Ugg7Gd006w1MPngCoAEC,04kJtp6pVXI,There are a very large number of people who do not like police officers. They are called Criminals and its the reason we have police officers. The fact that Criminals do not like police officers i...,False,False,False,False,False,False,False,False,False,False,False,False
4,Ugg8FfTbbNF8IngCoAEC,04kJtp6pVXI,"The Arab dude is absolutely right, he should have not been shot 6 extra time. Shoot him once if hes attacking you and that would stop his attack. Shoot him twice if he's still attacking you, but s...",False,False,False,False,False,False,False,False,False,False,False,False
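
The preview above shows twelve boolean label columns next to the comment text, and several of the stated objectives (label distribution, nulls, duplicates, class imbalance) come down to simple aggregations over them. The following is a minimal sketch of those checks, not the analysis actually run in this notebook. It uses a small made-up DataFrame so that it runs on its own. The helper name `summarize_labels` and the toy values are assumptions for illustration; only the column names come from the table above.

```python
import pandas as pd

# Label columns as they appear in the table header above.
LABEL_COLS = [
    "IsToxic", "IsAbusive", "IsThreat", "IsProvocative", "IsObscene",
    "IsHatespeech", "IsRacist", "IsNationalist", "IsSexist",
    "IsHomophobic", "IsReligiousHate", "IsRadicalism",
]


def summarize_labels(frame: pd.DataFrame, label_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: count and share of positive rows per label."""
    # Only use the label columns that actually exist in this frame.
    present = [c for c in label_cols if c in frame.columns]
    counts = frame[present].sum()  # each True counts as 1
    summary = pd.DataFrame({
        "positives": counts.astype(int),
        "share": counts / len(frame),
    })
    return summary.sort_values("positives", ascending=False)


# Toy data, made up for illustration (not taken from the dataset).
toy = pd.DataFrame({
    "Text": ["a", "b", "c", "d"],
    "IsToxic": [False, True, True, False],
    "IsAbusive": [False, True, False, False],
    "IsThreat": [False, False, False, False],
})

toy_summary = summarize_labels(toy, LABEL_COLS)
print(toy_summary)

# Basic data-quality checks named in the objectives.
print("Nulls per column:")
print(toy.isna().sum())
print("Duplicated texts:", toy["Text"].duplicated().sum())
```

On the real data, the same calls would presumably be made with `df` in place of `toy`. Labels with a very small `share` point to class imbalance that the modeling stage would need to handle.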
