1. Environment Setup
Make sure Pandas is installed in your working environment.
Download a dataset.csv file from Kaggle. Choose a dataset that interests you and that does not include data visualization. Suggested topics include sales, purchases, and products.
Selected file: Chocolate Sales.csv
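
A quick way to confirm the first requirement is to check whether the package can be found before importing it. This is a minimal sketch; the helper name `is_installed` is hypothetical, and the version printed depends on your environment.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pandas"):
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    # In Colab or Jupyter, running `%pip install pandas` in a cell installs it.
    print("pandas is not installed")
```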

In [40]:
# Use the google.colab library to access files stored in Google Drive.
from google.colab import drive
# Mount Drive so that Colab can read from it.
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


2. Load the Data
Load the CSV file into a Pandas DataFrame.
Show the first 10 rows of the DataFrame to confirm that the data loaded correctly.

In [43]:
# Import libraries.
import pandas as pd

In [45]:
# Load the dataset (a CSV file stored in Drive) into a DataFrame.
path = "/content/drive/MyDrive/BBDD/Chocolate Sales.csv"
df = pd.read_csv(path)

In [46]:
# Summarize the DataFrame: column names, non-null counts, and dtypes.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Sales Person   1094 non-null   object
 1   Country        1094 non-null   object
 2   Product        1094 non-null   object
 3   Date           1094 non-null   object
 4   Amount         1094 non-null   object
 5   Boxes Shipped  1094 non-null   int64 
dtypes: int64(1), object(5)
memory usage: 51.4+ KB


In [47]:
# Show the first 10 rows of the DataFrame.
df.head(10)

Unnamed: 0,Sales Person,Country,Product,Date,Amount,Boxes Shipped
0,Jehu Rudeforth,UK,Mint Chip Choco,04-Jan-22,"$5,320",180
1,Van Tuxwell,India,85% Dark Bars,01-Aug-22,"$7,896",94
2,Gigi Bohling,India,Peanut Butter Cubes,07-Jul-22,"$4,501",91
3,Jan Morforth,Australia,Peanut Butter Cubes,27-Apr-22,"$12,726",342
4,Jehu Rudeforth,UK,Peanut Butter Cubes,24-Feb-22,"$13,685",184
5,Van Tuxwell,India,Smooth Sliky Salty,06-Jun-22,"$5,376",38
6,Oby Sorrel,UK,99% Dark & Pure,25-Jan-22,"$13,685",176
7,Gunar Cockshoot,Australia,After Nines,24-Mar-22,"$3,080",73
8,Jehu Rudeforth,New Zealand,50% Dark Bites,20-Apr-22,"$3,990",59
9,Brien Boise,Australia,99% Dark & Pure,04-Jul-22,"$2,835",102


3. Initial Data Exploration
Show the last 5 rows of the DataFrame.
Use the info() method to get general information about the DataFrame, including the number of entries, column names, data types, and memory usage.
Generate descriptive statistics for the DataFrame with the describe() method.

In [48]:
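# Summarize the DataFrame.
df.info()

# Last 5 rows. A notebook cell renders only its final expression,
# so use display(df.tail(5)) to see this table as well.
df.tail(5)

# Descriptive statistics for the numeric columns.
df.describe()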

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Sales Person   1094 non-null   object
 1   Country        1094 non-null   object
 2   Product        1094 non-null   object
 3   Date           1094 non-null   object
 4   Amount         1094 non-null   object
 5   Boxes Shipped  1094 non-null   int64 
dtypes: int64(1), object(5)
memory usage: 51.4+ KB


Unnamed: 0,Boxes Shipped
count,1094.0
mean,161.797989
std,121.544145
min,1.0
25%,70.0
50%,135.0
75%,228.75
max,709.0


4. Data Cleaning
Identify and handle missing data using appropriate techniques (filling with statistical values, interpolation, removal, etc.).
Correct data types where necessary (for example, convert strings to dates).
Remove duplicates, if any.
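
The three cleaning tasks above can be sketched on a small, made-up frame. The sample rows, the median fill, and the `%d-%b-%y` date format are assumptions chosen to mimic the notebook's columns; they are not taken from the full dataset.

```python
import pandas as pd

# Hypothetical sample with one duplicated row and one missing amount.
sample = pd.DataFrame({
    "date": ["04-Jan-22", "01-Aug-22", "01-Aug-22", "07-Jul-22"],
    "amount": ["$5,320", "$7,896", "$7,896", None],
    "boxes shipped": [180, 94, 94, 91],
})

# 1. Count missing values per column.
missing = sample.isna().sum()

# 2. Drop exact duplicate rows.
clean = sample.drop_duplicates().copy()

# 3. Strip "$" and "," and convert to numbers; missing values stay NaN.
clean["amount"] = pd.to_numeric(
    clean["amount"].str.replace(r"[$,]", "", regex=True)
)

# 4. Fill missing amounts with the column median (one of several strategies).
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# 5. Parse strings such as "04-Jan-22" into datetime values.
clean["date"] = pd.to_datetime(clean["date"], format="%d-%b-%y")

print(missing)
print(clean.dtypes)
```

Whether to fill, interpolate, or drop missing values depends on the column; the median is used here only because it is robust to outliers.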

In [49]:
df.columns = df.columns.str.lower().str.strip()

In [50]:
df.info()
# No missing values: every column has 1094 non-null entries.
# 'amount' and 'date' are still stored as strings (object dtype).

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   sales person   1094 non-null   object
 1   country        1094 non-null   object
 2   product        1094 non-null   object
 3   date           1094 non-null   object
 4   amount         1094 non-null   object
 5   boxes shipped  1094 non-null   int64 
dtypes: int64(1), object(5)
memory usage: 51.4+ KB


In [51]:
df.head()

Unnamed: 0,sales person,country,product,date,amount,boxes shipped
0,Jehu Rudeforth,UK,Mint Chip Choco,04-Jan-22,"$5,320",180
1,Van Tuxwell,India,85% Dark Bars,01-Aug-22,"$7,896",94
2,Gigi Bohling,India,Peanut Butter Cubes,07-Jul-22,"$4,501",91
3,Jan Morforth,Australia,Peanut Butter Cubes,27-Apr-22,"$12,726",342
4,Jehu Rudeforth,UK,Peanut Butter Cubes,24-Feb-22,"$13,685",184


In [52]:
# Check for duplicates: flag each row that repeats an earlier one.
df.duplicated()

Unnamed: 0,0
0,False
1,False
2,False
3,False
4,False
...,...
1089,False
1090,False
1091,False
1092,False


In [53]:
# Count the duplicated rows.
df.duplicated().sum()

np.int64(0)

In [54]:
# Show the duplicated rows, if any.
df[df.duplicated()]

Unnamed: 0,sales person,country,product,date,amount,boxes shipped


In [64]:
# Clean the 'amount' column so it can be used to build new columns:
# remove the '$' and ',' characters, then convert to int.
df['amount'] = df['amount'].replace(r'[\$,]', '', regex=True).astype(int)

In [71]:
# Check the result.
print(df[['amount', 'boxes shipped']].dtypes)
print(df)

amount           int64
boxes shipped    int64
dtype: object
          sales person    country              product       date  amount  \
0       Jehu Rudeforth         UK      Mint Chip Choco  04-Jan-22    5320   
1          Van Tuxwell      India        85% Dark Bars  01-Aug-22    7896   
2         Gigi Bohling      India  Peanut Butter Cubes  07-Jul-22    4501   
3         Jan Morforth  Australia  Peanut Butter Cubes  27-Apr-22   12726   
4       Jehu Rudeforth         UK  Peanut Butter Cubes  24-Feb-22   13685   
...                ...        ...                  ...        ...     ...   
1089  Karlen McCaffrey  Australia  Spicy Special Slims  17-May-22    4410   
1090    Jehu Rudeforth        USA           White Choc  07-Jun-22    6559   
1091      Ches Bonnell     Canada  Organic Choco Syrup  26-Jul-22     574   
1092    Dotty Strutley      India              Eclairs  28-Jul-22    2086   
1093  Karlen McCaffrey      India       70% Dark Bites  23-May-22    5075   

      boxes shi

In [60]:
df.columns

Index(['sales person', 'country', 'product', 'date', 'amount', 'boxes shipped',
       'Total_sales'],
      dtype='object')

5. Data Transformation
Create new columns based on operations with existing columns (for example, compute revenue from sales and prices).
Normalize or standardize columns if necessary.
Classify the data into relevant categories.
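
Normalization, standardization, and categorization can be sketched as follows. The sample values, bin edges, and labels are illustrative assumptions rather than values derived from the dataset.

```python
import pandas as pd

# Hypothetical amounts used only for illustration.
sample = pd.DataFrame({"amount": [1000, 3000, 5000, 9000]})

# Min-max normalization: rescale values into the [0, 1] range.
a_min, a_max = sample["amount"].min(), sample["amount"].max()
sample["amount_minmax"] = (sample["amount"] - a_min) / (a_max - a_min)

# Z-score standardization: zero mean, unit (sample) standard deviation.
sample["amount_z"] = (
    sample["amount"] - sample["amount"].mean()
) / sample["amount"].std()

# Categorize with pd.cut; bins are right-closed by default.
sample["amount_band"] = pd.cut(
    sample["amount"],
    bins=[0, 2500, 6000, float("inf")],
    labels=["Low", "Medium", "High"],
)

print(sample)
```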


In [68]:
# Create the Total_sales column (amount * boxes shipped) to compare how much each salesperson sells.
df["Total_sales"] = df["amount"]  * df["boxes shipped"]

In [70]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   sales person   1094 non-null   object
 1   country        1094 non-null   object
 2   product        1094 non-null   object
 3   date           1094 non-null   object
 4   amount         1094 non-null   int64 
 5   boxes shipped  1094 non-null   int64 
 6   Total_sales    1094 non-null   int64 
dtypes: int64(3), object(4)
memory usage: 60.0+ KB


In [72]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
amount,1094.0,5652.308044,4102.442,7.0,2390.5,4868.5,8027.25,22050.0
boxes shipped,1094.0,161.797989,121.5441,1.0,70.0,135.0,228.75,709.0
Total_sales,1094.0,905153.063071,1041773.0,588.0,186784.5,553980.0,1244832.75,6985888.0


In [69]:
df.head()

Unnamed: 0,sales person,country,product,date,amount,boxes shipped,Total_sales
0,Jehu Rudeforth,UK,Mint Chip Choco,04-Jan-22,5320,180,957600
1,Van Tuxwell,India,85% Dark Bars,01-Aug-22,7896,94,742224
2,Gigi Bohling,India,Peanut Butter Cubes,07-Jul-22,4501,91,409591
3,Jan Morforth,Australia,Peanut Butter Cubes,27-Apr-22,12726,342,4352292
4,Jehu Rudeforth,UK,Peanut Butter Cubes,24-Feb-22,13685,184,2518040


In [73]:
# Countries where these chocolates are sold, with the number of sales records in each.
df['country'].value_counts()

Unnamed: 0_level_0,count
country,Unnamed: 1_level_1
Australia,205
India,184
USA,179
UK,178
Canada,175
New Zealand,173


In [121]:
# Number of sales records for each product within each country.
df.groupby(["country", "product"]).size().sort_index()

Unnamed: 0_level_0,Unnamed: 1_level_0,0
country,product,Unnamed: 2_level_1
Australia,50% Dark Bites,16
Australia,70% Dark Bites,9
Australia,85% Dark Bars,8
Australia,99% Dark & Pure,11
Australia,After Nines,7
...,...,...
USA,Peanut Butter Cubes,7
USA,Raspberry Choco,11
USA,Smooth Sliky Salty,8
USA,Spicy Special Slims,8


In [132]:
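# Group by product, boxes shipped, and country. Note that sum() concatenates
# text columns and does not by itself rank products within a country.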
df.groupby(["product", "boxes shipped", "country"]).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,sales person,date,amount,Total_sales,mediana,Categoria
product,boxes shipped,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
50% Dark Bites,6,Australia,Madelene Upcott,25-Jul-22,7350,44100,4868.5,Top Seller
50% Dark Bites,10,New Zealand,Van Tuxwell,11-Jul-22,3626,36260,4868.5,Top Seller
50% Dark Bites,12,Australia,Dennison Crosswaite,01-Apr-22,7287,87444,4868.5,Top Seller
50% Dark Bites,12,India,Roddy Speechley,14-Mar-22,8337,100044,4868.5,Top Seller
50% Dark Bites,13,USA,Mallorie Waber,19-Apr-22,1736,22568,4868.5,Top Seller
...,...,...,...,...,...,...,...,...
White Choc,321,New Zealand,Brien Boise,15-Jun-22,5509,1768389,4868.5,Top Seller
White Choc,330,India,Dotty Strutley,15-Aug-22,12327,4067910,4868.5,Top Seller
White Choc,341,USA,Ches Bonnell,11-Feb-22,5271,1797411,4868.5,Top Seller
White Choc,342,India,Curtice Advani,04-Jul-22,7154,2446668,4868.5,Top Seller
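
One possible way to get a single top product per country is to total the boxes shipped per (country, product) pair and then keep the row with the largest total in each country. This is a sketch on made-up rows; ranking by boxes shipped rather than by amount is an assumption.

```python
import pandas as pd

# Hypothetical rows that reuse the notebook's column names.
sample = pd.DataFrame({
    "country": ["UK", "UK", "UK", "India", "India"],
    "product": ["Eclairs", "White Choc", "Eclairs", "Eclairs", "White Choc"],
    "boxes shipped": [100, 250, 200, 50, 300],
})

# Total boxes shipped for each (country, product) pair.
totals = (
    sample.groupby(["country", "product"])["boxes shipped"]
    .sum()
    .reset_index()
)

# For each country, keep the row whose total is the largest.
top_rows = totals.groupby("country")["boxes shipped"].idxmax()
top_per_country = totals.loc[top_rows]

print(top_per_country)
```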


6. Data Analysis
Group the data with groupby to obtain specific insights (for example, sales by product, sales by region, etc.).
Apply aggregation functions such as sum, mean, count, min, max, std, and var.
Use the apply method to perform more complex, custom operations.
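
As a small illustration of these three tasks, the sketch below combines named aggregation with a custom function passed to apply. The sample rows and the helper `amount_per_box` are hypothetical.

```python
import pandas as pd

# Hypothetical rows that reuse the notebook's column names.
sample = pd.DataFrame({
    "country": ["UK", "UK", "India", "India"],
    "amount": [100, 300, 200, 600],
    "boxes shipped": [10, 30, 20, 40],
})

# Named aggregation: one readable output column per (column, function) pair.
summary = sample.groupby("country").agg(
    total_amount=("amount", "sum"),
    mean_amount=("amount", "mean"),
    max_boxes=("boxes shipped", "max"),
)


def amount_per_box(group: pd.DataFrame) -> float:
    """Average amount per box shipped within one group."""
    return group["amount"].sum() / group["boxes shipped"].sum()


# apply runs the custom function once per country.
per_box = sample.groupby("country")[["amount", "boxes shipped"]].apply(amount_per_box)

print(summary)
print(per_box)
```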

In [98]:
# Compare the sum, min, max, mean, standard deviation, median, and variance of amount and boxes shipped by country.

dict_1 = {
  "amount": ["sum", "min", "max", "mean", "std", "median", "var"],
  "boxes shipped": ["sum", "min", "max", "mean", "std", "median", "var"]
}
df.groupby("country").agg(dict_1).round(2)


Unnamed: 0_level_0,amount,amount,amount,amount,amount,amount,amount,boxes shipped,boxes shipped,boxes shipped,boxes shipped,boxes shipped,boxes shipped,boxes shipped
Unnamed: 0_level_1,sum,min,max,mean,std,median,var,sum,min,max,mean,std,median,var
country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2
Australia,1137367,63,19453,5548.13,3800.39,5194.0,14442943.5,32647,3,708,159.25,125.75,119.0,15813.87
Canada,962899,210,16793,5502.28,3888.96,4830.0,15124020.86,31221,1,709,178.41,140.87,151.0,19842.96
India,1045800,28,22050,5683.7,4362.83,4581.5,19034303.74,29470,2,581,160.16,122.25,136.0,14944.86
New Zealand,950418,7,19481,5493.75,3942.99,5061.0,15547196.74,26580,4,518,153.64,117.84,129.0,13886.7
UK,1051792,7,18991,5908.94,4176.18,5274.5,17440448.4,30265,2,554,170.03,108.52,152.0,11776.11
USA,1035349,70,17465,5784.07,4464.22,4802.0,19929228.06,26824,2,508,149.85,110.19,131.0,12141.5


In [130]:
# Total sales per salesperson.
total_por_persona = df.groupby("sales person")["Total_sales"].sum()

# Return the category that corresponds to a salesperson's total sales.
def clasificar_por_ventas(nombre):
    total = total_por_persona[nombre]
    if total > 15000000:
        return "Top Seller"
    elif total > 10000000:
        return "Medio"
    else:
        return "Bajo"

# Add the salesperson category column.
df["Categoria"] = df["sales person"].apply(clasificar_por_ventas)

# Totals per salesperson and category.
df.groupby(["sales person", "Categoria"]).sum().sort_index()


Unnamed: 0_level_0,Unnamed: 1_level_0,country,product,date,amount,boxes shipped,Total_sales,mediana
sales person,Categoria,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Andria Kimpton,Top Seller,CanadaUKIndiaAustraliaAustraliaNew ZealandUSAC...,After Nines50% Dark BitesSpicy Special Slims50...,11-May-2224-Aug-2223-Feb-2202-Jun-2228-Jun-222...,201747,6448,31770956,189871.5
Barr Faughny,Top Seller,USAAustraliaUSAUSAAustraliaUKCanadaNew Zealand...,Orange ChocoFruit & Nut BarsRaspberry ChocoFru...,10-Mar-2230-Jun-2212-Apr-2228-Jan-2204-Jul-222...,258713,6366,37271192,209345.5
Beverie Moffet,Top Seller,AustraliaCanadaUSAAustraliaNew ZealandUKIndiaN...,Organic Choco SyrupMilk BarsRaspberry ChocoEcl...,26-Jan-2216-Feb-2219-Apr-2220-Jun-2224-Mar-223...,278922,9214,51179380,243425.0
Brien Boise,Top Seller,AustraliaAustraliaAustraliaCanadaNew ZealandCa...,99% Dark & PureEclairsFruit & Nut Bars99% Dark...,04-Jul-2227-Jun-2206-Jun-2218-May-2201-Aug-220...,312816,8102,52568747,258030.5
Camilla Castle,Top Seller,AustraliaIndiaUSAUSAAustraliaUKNew ZealandIndi...,85% Dark Bars50% Dark BitesSpicy Special Slims...,19-May-2204-Jul-2211-Feb-2218-Aug-2229-Apr-220...,196616,5374,33360705,155792.0
Ches Bonnell,Top Seller,New ZealandIndiaUKAustraliaAustraliaUKAustrali...,Spicy Special SlimsSpicy Special SlimsSmooth S...,14-Feb-2206-Jul-2211-Jul-2205-Jul-2202-Mar-222...,320901,7522,45304784,233688.0
Curtice Advani,Top Seller,UKAustraliaIndiaCanadaCanadaAustraliaAustralia...,85% Dark BarsMilk BarsFruit & Nut BarsAlmond C...,08-Jun-2214-Feb-2207-Jul-2224-Mar-2216-Jun-221...,216461,7074,30335928,223951.0
Dennison Crosswaite,Top Seller,New ZealandNew ZealandAustraliaAustraliaUKAust...,White ChocManuka Honey Choco70% Dark Bites85% ...,05-Jul-2215-Jun-2207-Feb-2205-Apr-2224-Mar-221...,291669,8767,54975977,238556.5
Dotty Strutley,Top Seller,AustraliaAustraliaUKUSAUKAustraliaUKAustraliaC...,Drinking CocoFruit & Nut BarsAlmond ChocoManuk...,19-Jul-2223-Aug-2228-Jul-2210-Aug-2203-Jan-221...,190624,6853,33557615,175266.0
Gigi Bohling,Top Seller,IndiaUSAUKCanadaNew ZealandNew ZealandUKUSAInd...,Peanut Butter CubesChoco Coated AlmondsAlmond ...,07-Jul-2202-Mar-2210-Jan-2228-Jan-2204-Apr-222...,232666,6303,32982670,228819.5
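
Because sum() concatenates the text columns in the table above, an alternative is to aggregate only the numeric column and classify the per-person totals directly. The sketch below uses made-up salespeople and assumes the same 10,000,000 and 15,000,000 thresholds, expressed as right-closed bins for pd.cut.

```python
import pandas as pd

# Hypothetical salespeople and totals, for illustration only.
sample = pd.DataFrame({
    "sales person": ["Ana", "Ana", "Ben", "Cy"],
    "Total_sales": [9_000_000, 8_000_000, 12_000_000, 4_000_000],
})

# Sum only the numeric column, so no text is concatenated.
totals = sample.groupby("sales person")["Total_sales"].sum()

# (-inf, 10M] -> Bajo, (10M, 15M] -> Medio, (15M, inf) -> Top Seller.
category = pd.cut(
    totals,
    bins=[float("-inf"), 10_000_000, 15_000_000, float("inf")],
    labels=["Bajo", "Medio", "Top Seller"],
)

summary = pd.DataFrame({"Total_sales": totals, "Categoria": category})
print(summary)
```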


7. Documentation
Document each step of the analysis clearly, explaining what was done and why.
Make sure the code is readable and well commented.