We import the pandas and NumPy libraries.

# Chipotle

We have been contacted to analyze the Chipotle restaurant chain and better understand possible areas for improvement. For now they have only given us a sample of the data [here](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). Can we explore it to see what it contains? What kind of information is there, is it complete, could we ask them to send it in another format or to provide additional fields, and so on?


In [14]:
%pip install pandas numpy

import pandas as pd
import numpy as np


Note: you may need to restart the kernel to use updated packages.


### Step 2. Take the file at [this URL](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv) and read it as a DataFrame.

Hint: [read_csv](https://numpy.org/doc/stable/user/absolute_beginners.html#importing-and-exporting-a-csv)

In [27]:
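url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"

# Display options: print every row and column, widen the output,
# and truncate long cell text to 30 characters
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)
pd.set_option('display.max_colwidth', 30)

# The file is tab-separated, hence sep='\t'
df = pd.read_csv(url, sep='\t')

print(df.head())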


   order_id  quantity                      item_name             choice_description item_price
0         1         1   Chips and Fresh Tomato Salsa                            NaN     $2.39 
1         1         1                           Izze                   [Clementine]     $3.39 
2         1         1               Nantucket Nectar                        [Apple]     $3.39 
3         1         1  Chips and Tomatillo-Green ...                            NaN     $2.39 
4         2         2                   Chicken Bowl  [Tomatillo-Red Chili Salsa...    $16.98 
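The introduction asks whether the data is complete. A quick first check is to count missing values per column. The sketch below is a minimal example that assumes a tiny hand-copied sample of the first rows printed above, read from an in-memory string with `io.StringIO` instead of the URL so it runs offline; `sample_tsv`, `df_sample` and `missing_per_column` are illustrative names. On the real DataFrame the equivalent call would be `df.isna().sum()`.

```python
import io

import pandas as pd

# Hand-copied sample of the first rows shown above (not the full file).
# An empty field between two tabs is read as NaN by default.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
    "1\t1\tNantucket Nectar\t[Apple]\t$3.39 \n"
)
df_sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Number of missing values in each column
missing_per_column = df_sample.isna().sum()
print(missing_per_column)
```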


### Step 3. Let's look at the data types. Can we find the highest-priced product?

In [16]:
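# Column data types: item_price is still 'object' (text such as "$9.39 ")
print(df.dtypes)

# Caution: on a text column idxmax() compares strings, not numbers,
# so the row returned below is not necessarily the most expensive one
highest_priced_product = df.loc[df['item_price'].idxmax()]

print("\nProduct with the highest price:")
print(highest_priced_product)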


order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

Product with the highest price:
order_id                                        250
quantity                                          1
item_name                          Steak Salad Bowl
choice_description    [Fresh Tomato Salsa, Lettuce]
item_price                                   $9.39 
Name: 607, dtype: object
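The result above is not the real maximum: `item_price` is still text, so the comparison is done character by character and `"$9.39 "` outranks `"$16.98 "` because `"9"` sorts after `"1"`. A minimal sketch of the problem and the fix, assuming a few price strings copied from the outputs above (`raw_prices` and `numeric_prices` are illustrative names):

```python
import pandas as pd

# Raw price strings as they appear in the file: "$" prefix and a trailing space
raw_prices = pd.Series(["$2.39 ", "$3.39 ", "$16.98 ", "$9.39 "])

# String comparison: "$9.39 " wins because "9" > "1" at the second character
print(raw_prices.max())  # $9.39

# Remove "$" (literally, not as a regex) and whitespace, then convert to float
numeric_prices = raw_prices.str.replace("$", "", regex=False).str.strip().astype(float)
print(numeric_prices.max())     # 16.98
print(numeric_prices.idxmax())  # 2
```

Step 4 applies the same kind of conversion to `df` itself; after that, `idxmax()` compares numbers instead of text.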


### Step 4. Which products cost more than $10?

In [28]:
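# Remove the "$" sign literally (regex=False) and the trailing whitespace
df['item_price'] = df['item_price'].str.replace('$', '', regex=False).str.strip()

# Convert the cleaned text to numbers
df['item_price'] = pd.to_numeric(df['item_price'])

# Keep the order lines priced above $10
expensive_products = df[df['item_price'] > 10]

print("Products that cost more than $10:")
print(expensive_products[['item_name', 'item_price']])
print(f'Number of order lines that cost more than $10: {len(expensive_products)}')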


Products that cost more than $10:
                         item_name  item_price
4                     Chicken Bowl       16.98
5                     Chicken Bowl       10.98
7                    Steak Burrito       11.75
13                    Chicken Bowl       11.25
23                 Chicken Burrito       10.98
39                   Barbacoa Bowl       11.75
42                    Chicken Bowl       11.25
43                   Steak Burrito       11.75
45                 Chicken Burrito       10.98
52                 Chicken Burrito       10.98
57                  Veggie Burrito       11.25
58                   Barbacoa Bowl       11.75
62                     Veggie Bowl       11.25
68                 Chicken Burrito       10.98
79              Chicken Soft Tacos       11.25
90                      Steak Bowl       11.75
91                      Steak Bowl       11.75
93                Carnitas Burrito       11.75
97                   Carnitas Bowl       11.75
123                   Chic
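Each row above is an order line, so the same product appears many times. To answer which *products* cost more than $10, one option is to take the distinct names. A minimal sketch on a subset of the rows listed above (`expensive` is an illustrative name); on the full data the equivalent would be `expensive_products['item_name'].unique()`. Keep in mind that `item_price` may include the quantity: row 4 is a line with quantity 2.

```python
import pandas as pd

# Subset of the order lines printed above (prices already numeric)
expensive = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Steak Burrito",
                  "Chicken Bowl", "Chicken Burrito", "Barbacoa Bowl"],
    "item_price": [16.98, 10.98, 11.75, 11.25, 10.98, 11.75],
})

# Order lines above $10 versus distinct product names
num_lines = len(expensive)
distinct_products = sorted(expensive["item_name"].unique())

print(num_lines)          # 6
print(distinct_products)  # ['Barbacoa Bowl', 'Chicken Bowl', 'Chicken Burrito', 'Steak Burrito']
```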

### Step 4.1: And how many orders include a product costing more than $10? Is it the same?

In [31]:
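# Distinct orders that contain at least one line priced above $10
orders_with_expensive_products = df[df['item_price'] > 10]['order_id'].nunique()

# Time the same expression; assignments made inside %timeit are not kept afterwards
%timeit df[df['item_price'] > 10]['order_id'].nunique()

print("Number of orders made with a product that costs more than $10:", orders_with_expensive_products)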


192 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Number of orders made with a product that costs more than $10: 863
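This is not the same as the number of rows in Step 4: that counts order *lines*, while `nunique()` counts distinct orders, and a single order can contain several lines above $10. A minimal illustration on hypothetical toy data (the values are made up):

```python
import pandas as pd

# Hypothetical toy data: order 1 has two lines above $10, order 2 has one, order 3 none
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "item_price": [11.75, 16.98, 10.98, 4.45],
})

mask = toy["item_price"] > 10
lines_over_10 = mask.sum()                            # order lines above $10
orders_over_10 = toy.loc[mask, "order_id"].nunique()  # distinct orders with such a line

print(lines_over_10, orders_over_10)  # 3 2
```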


### Step 4.2: And how many orders had a total above $10? Is it the same?

In [32]:
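# Total amount spent per order
total_per_order = df.groupby('order_id')['item_price'].sum()
%timeit df.groupby('order_id')['item_price'].sum()

# Orders whose total exceeds $10
num_orders_total_over_10 = (total_per_order > 10).sum()
%timeit (total_per_order > 10).sum()

print("Number of orders with a total of more than $10:", num_orders_total_over_10)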


234 µs ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
37.2 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Number of orders with a total of more than $10: 1834
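Again this is a different question: an order can pass $10 in total without any single line costing more than $10, which is consistent with 1834 orders here versus 863 in Step 4.1. A minimal sketch on hypothetical toy data (the values are made up):

```python
import pandas as pd

# Hypothetical toy data: order 1 has two cheap lines that add up to more than $10
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "item_price": [6.49, 4.45, 11.75, 2.39],
})

# Orders whose total exceeds $10
total_per_order = toy.groupby("order_id")["item_price"].sum()
orders_total_over_10 = (total_per_order > 10).sum()

# Orders with at least one single line above $10
orders_with_item_over_10 = toy.loc[toy["item_price"] > 10, "order_id"].nunique()

print(orders_total_over_10, orders_with_item_over_10)  # 2 1
```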


### Step 4.3: And in how many orders was more than $10 paid for the same product? Is it the same?

In [20]:
# Note: grouping by product only sums each product's revenue across *all* orders,
# so this counts products, not orders
total_per_product = df.groupby('item_name')['item_price'].sum()

expensive_products = total_per_product[total_per_product > 10]

num_products_over_10 = len(expensive_products)

print("Number of products that have been paid more than $10 for the same product:", num_products_over_10)


Number of products that have been paid more than $10 for the same product: 47
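The cell above reports how many products have more than $10 of revenue across the whole sample, which is not quite the question. One way to answer it as asked is to group by both `order_id` and `item_name`, so amounts are summed per product *within* each order. A minimal sketch on hypothetical toy data (the values are made up); on the real data the same `groupby(['order_id', 'item_name'])` call would apply to `df`:

```python
import pandas as pd

# Hypothetical toy data: order 1 has two Chicken Bowl lines that together exceed $10
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 3],
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Chicken Bowl", "Steak Burrito", "Canned Soda"],
    "item_price": [8.49, 8.49, 10.98, 11.75, 1.09],
})

# Amount spent on each product within each order
per_order_product = toy.groupby(["order_id", "item_name"])["item_price"].sum()
over_10 = per_order_product[per_order_product > 10]

# (order, product) pairs above $10, and the distinct orders they belong to
num_pairs = len(over_10)
num_orders = over_10.index.get_level_values("order_id").nunique()

print(num_pairs, num_orders)  # 3 2
```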


### Step 5. What price does each product have across different orders? Are there products with several prices?

In [21]:
prices_per_product = df.groupby('item_name')['item_price'].unique()

print("Prices per product:")
print(prices_per_product)


Prices per product:
item_name
6 Pack Soft Drink                                                                                               [6.49, 12.98]
Barbacoa Bowl                                                                         [11.75, 9.25, 8.99, 11.48, 8.69, 11.49]
Barbacoa Burrito                                                                      [8.99, 9.25, 11.75, 11.08, 8.69, 11.48]
Barbacoa Crispy Tacos                                                                        [11.75, 9.25, 11.48, 8.99, 18.5]
Barbacoa Salad Bowl                                                                                             [11.89, 9.39]
Barbacoa Soft Tacos                                                                                [9.25, 8.99, 11.75, 11.48]
Bottled Water                                                         [1.09, 1.5, 3.0, 3.27, 2.18, 6.0, 7.5, 4.5, 10.5, 15.0]
Bowl                                                                                    
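Part of this variation seems to come from quantity: in the Step 6 output, a `6 Pack Soft Drink` line with quantity 2 costs 12.98 and one with quantity 1 costs 6.49, which suggests `item_price` is the total for the line. A minimal sketch, assuming that interpretation, that derives a per-unit price (`unit_price` is a column name introduced here) from rows copied from earlier outputs:

```python
import pandas as pd

# Rows copied from earlier outputs (prices already numeric)
sample = pd.DataFrame({
    "item_name": ["6 Pack Soft Drink", "6 Pack Soft Drink", "Chicken Bowl"],
    "quantity": [2, 1, 2],
    "item_price": [12.98, 6.49, 16.98],
})

# Assumption: item_price is the line total, so divide by quantity
sample["unit_price"] = sample["item_price"] / sample["quantity"]

# Distinct per-unit prices for each product
unit_prices = sample.groupby("item_name")["unit_price"].unique()
print(unit_prices)
```

With this sample the two soft-drink prices collapse into a single unit price of 6.49. Any variation that remains on the full data (for example the several `Barbacoa Bowl` prices) would need another explanation, such as the chosen toppings, which this sketch does not test.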

### Step 6. Sort the DataFrame by product name (item_name)

In [22]:
df_sorted = df.sort_values(by='item_name')

print("Sorted DataFrame based on item name:")
print(df_sorted)


Sorted DataFrame based on item name:
      order_id  quantity                              item_name                                                                                                                                                                                         choice_description  item_price
3389      1360         2                      6 Pack Soft Drink                                                                                                                                                                                                [Diet Coke]       12.98
341        148         1                      6 Pack Soft Drink                                                                                                                                                                                                [Diet Coke]        6.49
1849       749         1                      6 Pack Soft Drink                                                               
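In the output above, rows with the same `item_name` are not in index order (3389, 341, 1849). `sort_values` defaults to `kind='quicksort'`, which is not guaranteed to be stable, so ties can come out in any order. Adding a secondary key makes the result deterministic. A minimal sketch on a few rows copied from earlier outputs, using `order_id` as the tiebreaker (an arbitrary choice):

```python
import pandas as pd

# A few rows copied from earlier outputs
sample = pd.DataFrame({
    "order_id": [1360, 148, 1, 2],
    "quantity": [2, 1, 1, 2],
    "item_name": ["6 Pack Soft Drink", "6 Pack Soft Drink", "Izze", "Chicken Bowl"],
    "item_price": [12.98, 6.49, 3.39, 16.98],
})

# Sort by product name, then by order_id to break ties deterministically
df_sorted = sample.sort_values(by=["item_name", "order_id"]).reset_index(drop=True)
print(df_sorted)
```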

### Step 7. How many times have the most expensive products been ordered?

In [23]:
most_expensive_products = df[df['item_price'] == df['item_price'].max()]

num_orders_most_expensive = most_expensive_products['quantity'].sum()

print("Number of times the most expensive products have been ordered:", num_orders_most_expensive)


Number of times the most expensive products have been ordered: 15
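If `item_price` is a line total, the maximum above may simply pick the line with a large quantity rather than the product with the highest unit price. A minimal sketch comparing both readings on hypothetical toy data (names and values are made up; `unit_price` is a column introduced here):

```python
import pandas as pd

# Hypothetical toy data: one large-quantity line and a pricier product per unit
toy = pd.DataFrame({
    "item_name": ["Chips", "Steak Burrito", "Steak Burrito", "Canned Soda"],
    "quantity": [15, 1, 2, 1],
    "item_price": [44.25, 11.75, 23.50, 1.09],
})

# Reading 1: most expensive by line total (what the cell above does)
top_line = toy[toy["item_price"] == toy["item_price"].max()]
top_line_quantity = top_line["quantity"].sum()

# Reading 2: most expensive by per-unit price
toy["unit_price"] = toy["item_price"] / toy["quantity"]
top_unit = toy[toy["unit_price"] == toy["unit_price"].max()]
top_unit_quantity = top_unit["quantity"].sum()

print(top_line_quantity, top_unit_quantity)  # 15 3
```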


### Step 8. Let's look at the Veggie Salad Bowl case. Extract that information.

In [24]:
veggie_salad_bowl_info = df[df['item_name'] == 'Veggie Salad Bowl']

print("Information for Veggie Salad Bowl:")
print(veggie_salad_bowl_info)


Information for Veggie Salad Bowl:
      order_id  quantity          item_name                                                                                    choice_description  item_price
186         83         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Cheese, Sour Cream, Guacamole, Lettuce]]       11.25
295        128         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Lettuce, Guacamole, Sour Cream, Cheese, Black Beans, Rice]]       11.25
455        195         1  Veggie Salad Bowl              [Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Cheese, Guacamole, Lettuce]]       11.25
496        207         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Rice, Lettuce, Guacamole, Fajita Vegetables, Cheese, Sour Cream, Black Beans]]       11.25
960        394         1  Veggie Salad Bowl                                                    [Fresh Tomato Salsa, [Fajita Vegetables, Lettuce]]        8.75
1316       536   
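In the rows shown, the four Veggie Salad Bowl lines priced 11.25 all include Guacamole in `choice_description`, while the 8.75 line does not. A minimal sketch that checks this pattern using only those five rows copied from the output, so it is an observation on a tiny sample, not proof that guacamole sets the price; on the full data the same steps could start from `veggie_salad_bowl_info`. The `has_guacamole` column is a name introduced here.

```python
import pandas as pd

# The five Veggie Salad Bowl rows printed above
veggie = pd.DataFrame({
    "order_id": [83, 128, 195, 207, 394],
    "quantity": [1, 1, 1, 1, 1],
    "choice_description": [
        "[Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Cheese, Sour Cream, Guacamole, Lettuce]]",
        "[Fresh Tomato Salsa, [Fajita Vegetables, Lettuce, Guacamole, Sour Cream, Cheese, Black Beans, Rice]]",
        "[Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Cheese, Guacamole, Lettuce]]",
        "[Fresh Tomato Salsa, [Rice, Lettuce, Guacamole, Fajita Vegetables, Cheese, Sour Cream, Black Beans]]",
        "[Fresh Tomato Salsa, [Fajita Vegetables, Lettuce]]",
    ],
    "item_price": [11.25, 11.25, 11.25, 11.25, 8.75],
})

# Flag lines that mention Guacamole; na=False treats a missing description as "no"
veggie["has_guacamole"] = veggie["choice_description"].str.contains("Guacamole", regex=False, na=False)

# Price range for each group
summary = veggie.groupby("has_guacamole")["item_price"].agg(["count", "min", "max"])
print(summary)
```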