{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploratory Data Analysis - US Vehicles\n",
    "\n",
    "This notebook contains the exploratory analysis of the vehicle data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import plotly.express as px\n",
    "import numpy as np\n",
    "\n",
    "# Render Plotly figures inline in the notebook\n",
    "import plotly.io as pio\n",
    "pio.renderers.default = 'notebook'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "df = pd.read_csv('../vehicles_us.csv')\n",
    "\n",
    "print(\"📊 DATASET INFORMATION\")\n",
    "print(\"=\" * 50)\n",
    "print(f\"Rows: {df.shape[0]}, Columns: {df.shape[1]}\")\n",
    "print(\"\\nFirst 5 rows:\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Column information and data types\n",
    "print(\"📋 COLUMN INFORMATION\")\n",
    "print(\"=\" * 50)\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Descriptive statistics\n",
    "print(\"📈 DESCRIPTIVE STATISTICS\")\n",
    "print(\"=\" * 50)\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📊 Visualizations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Price histogram\n",
    "fig1 = px.histogram(df, x='price', nbins=50,\n",
    "                   title='Distribution of Vehicle Prices',\n",
    "                   labels={'price': 'Price ($)', 'count': 'Count'})\n",
    "fig1.show()"
   ]
  },
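  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch for putting numbers on the price histogram: quantiles show how concentrated listings are in the lower price ranges, and skewness summarizes the long right tail. The specific percentiles below are an arbitrary choice (an assumption), not part of the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Price quantiles (the percentile list is an assumption; adjust as needed)\n",
    "price_quantiles = df['price'].quantile([0.10, 0.25, 0.50, 0.75, 0.90, 0.99])\n",
    "print(price_quantiles)\n",
    "\n",
    "# Positive skewness indicates a long tail of expensive vehicles\n",
    "price_skew = df['price'].skew()\n",
    "print('Skewness:', round(price_skew, 2))"
   ]
  },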
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2. Scatter plot: price vs. mileage\n",
    "fig2 = px.scatter(df, x='odometer', y='price',\n",
    "                 title='Price vs. Mileage',\n",
    "                 labels={'odometer': 'Mileage (odometer)', 'price': 'Price ($)'},\n",
    "                 opacity=0.5)\n",
    "fig2.show()"
   ]
  },
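  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scatter plot suggests that price falls as mileage rises. One way to quantify this, sketched here rather than taken from the original analysis, is to compute the Pearson and Spearman correlations between `odometer` and `price`. Spearman is included as an assumption-light complement because it depends only on ranks and is less sensitive to extreme prices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Correlation between odometer reading and price.\n",
    "# DataFrame.corr only uses rows where both values are present.\n",
    "pair = df[['odometer', 'price']]\n",
    "pearson = pair.corr(method='pearson').loc['odometer', 'price']\n",
    "spearman = pair.corr(method='spearman').loc['odometer', 'price']\n",
    "print(f'Pearson:  {pearson:.3f}')\n",
    "print(f'Spearman: {spearman:.3f}')"
   ]
  },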
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3. Vehicles by condition\n",
    "condition_counts = df['condition'].value_counts().reset_index()\n",
    "# Rename explicitly: reset_index() column names differ between pandas versions\n",
    "condition_counts.columns = ['condition', 'count']\n",
    "\n",
    "fig3 = px.bar(condition_counts, x='condition', y='count',\n",
    "             title='Number of Vehicles by Condition',\n",
    "             color='condition')\n",
    "fig3.show()"
   ]
  },
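  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check the condition claim numerically, the counts can be expressed as percentages. This sketch assumes the category labels are spelled exactly `'good'` and `'excellent'`, as they appear in the conclusions; percentages are relative to listings with a recorded condition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Percentage of listings per condition (missing conditions are excluded)\n",
    "condition_share = df['condition'].value_counts(normalize=True) * 100\n",
    "print(condition_share.round(1))\n",
    "\n",
    "# Combined share of the two labels named in the conclusions (assumed spellings);\n",
    "# reindex yields NaN for a missing label, which sum() skips\n",
    "good_or_excellent = condition_share.reindex(['good', 'excellent']).sum()\n",
    "print(f'good + excellent: {good_or_excellent:.1f}%')"
   ]
  },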
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 4. Average price by vehicle type\n",
    "avg_price_type = df.groupby('type')['price'].mean().reset_index()\n",
    "avg_price_type = avg_price_type.sort_values('price', ascending=False)\n",
    "\n",
    "fig4 = px.bar(avg_price_type, x='type', y='price',\n",
    "             title='Average Price by Vehicle Type',\n",
    "             labels={'type': 'Type', 'price': 'Average Price ($)'},\n",
    "             color='price')\n",
    "fig4.show()"
   ]
  },
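  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The average-price chart does not show how many listings each type has, yet the conclusions refer to the most common types. A count per type, built the same way as the condition chart, is one possible way to support that statement. The variable `type_counts` and the figure name `fig_types` are illustrative choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of listings per vehicle type (value_counts sorts most common first)\n",
    "type_counts = df['type'].value_counts().reset_index()\n",
    "type_counts.columns = ['type', 'count']\n",
    "\n",
    "fig_types = px.bar(type_counts, x='type', y='count',\n",
    "                   title='Number of Vehicles by Type',\n",
    "                   labels={'type': 'Type', 'count': 'Count'})\n",
    "fig_types.show()"
   ]
  },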
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5. Vehicles by model year\n",
    "# Keep only rows with a known model year; .copy() avoids SettingWithCopyWarning\n",
    "df_year = df.dropna(subset=['model_year']).copy()\n",
    "df_year['model_year'] = df_year['model_year'].astype(int)\n",
    "\n",
    "fig5 = px.histogram(df_year, x='model_year',\n",
    "                   title='Distribution by Model Year',\n",
    "                   labels={'model_year': 'Model Year', 'count': 'Count'},\n",
    "                   nbins=30)\n",
    "fig5.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🔍 Missing-value analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Missing values per column\n",
    "print(\"🔍 MISSING VALUES PER COLUMN\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "null_counts = df.isnull().sum()\n",
    "null_percentage = (null_counts / len(df)) * 100\n",
    "\n",
    "null_df = pd.DataFrame({\n",
    "    'Column': null_counts.index,\n",
    "    'Missing_Values': null_counts.values,\n",
    "    'Percentage': null_percentage.values\n",
    "})\n",
    "\n",
    "# Keep only columns with at least one missing value, highest percentage first\n",
    "null_df = null_df[null_df['Missing_Values'] > 0].sort_values('Percentage', ascending=False)\n",
    "null_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📝 EDA Conclusions\n",
    "\n",
    "1. **Price distribution**: Most vehicles fall in the low-to-mid price ranges.\n",
    "2. **Price vs. mileage**: There is a negative correlation (more mileage → lower price).\n",
    "3. **Vehicle condition**: Most vehicles are in 'good' or 'excellent' condition.\n",
    "4. **Missing values**: Some columns, such as `odometer` and `paint_color`, have missing values.\n",
    "5. **Vehicle types**: SUVs and pickups are the most common types in the dataset."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}