# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [38]:
import pandas as pd
import numpy as np

print("Libraries imported.")


Libraries imported.



### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

In [39]:
# URL for the Chipotle dataset (TSV format)
url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"

# Load the dataset
chipo = pd.read_csv(url, sep='\t')

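# Clean the dataset: drop rows with any missing value, then drop exact duplicate rows.
# Note: this removes rows from the raw file, so every count below describes the cleaned subset.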
chipo.dropna(inplace=True)
chipo.drop_duplicates(inplace=True)

print("Data loaded, cleaned, and ready for processing.")


Data loaded, cleaned, and ready for processing.
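
Because the cleaning calls above change how many rows survive, it can help to measure what each one removes. The following is a minimal sketch on a small, hypothetical `sample` DataFrame (the rows are invented for illustration and are not taken from the Chipotle file); the same pattern should apply to `chipo` if the row counts are taken before and after each call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, invented for illustration only
sample = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 3],
    "item_name": ["Chips", "Izze", "Chicken Bowl", "Chicken Bowl", "Canned Soda"],
    "choice_description": [np.nan, "[Clementine]", "[Salsa]", "[Salsa]", "[Sprite]"],
})

rows_raw = len(sample)

# dropna() removes every row that has at least one missing value
after_dropna = sample.dropna()
rows_after_dropna = len(after_dropna)

# drop_duplicates() removes rows that repeat an earlier row exactly
after_dedup = after_dropna.drop_duplicates()
rows_after_dedup = len(after_dedup)

print("Raw rows:", rows_raw)
print("Removed by dropna():", rows_raw - rows_after_dropna)
print("Removed by drop_duplicates():", rows_after_dropna - rows_after_dedup)
```

Using the non-`inplace` form here keeps the raw data available for comparison.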



### Step 3. Assign it to a variable called chipo.

In [40]:
# The dataset is already assigned to the variable 'chipo'
print("Dataset is assigned to variable 'chipo'.")


Dataset is assigned to variable 'chipo'.



### Step 4. See the first 10 entries

In [41]:
# Display the first 10 rows of the dataset
chipo.head(10)


|    | order_id | quantity | item_name | choice_description | item_price |
|---:|---:|---:|:---|:---|:---|
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |
| 11 | 6 | 1 | Chicken Crispy Tacos | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | $8.75 |
| 12 | 6 | 1 | Chicken Soft Tacos | [Roasted Chili Corn Salsa, [Rice, Black Beans,... | $8.75 |
| 13 | 7 | 1 | Chicken Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $11.25 |



### Step 5. What is the number of observations in the dataset?

In [42]:
# Number of rows (observations)
num_observations = chipo.shape[0]
print("Number of observations:", num_observations)


Number of observations: 3335



### Step 6. What is the number of columns in the dataset?

In [43]:
# Number of columns
num_columns = chipo.shape[1]
print("Number of columns:", num_columns)


Number of columns: 5
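
Steps 5 and 6 read the two entries of `shape` separately. As a small sketch, both can be unpacked in one statement; the `sample` DataFrame below is hypothetical and only stands in for `chipo`.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset
sample = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Izze", "Nantucket Nectar", "Chicken Bowl"],
})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = sample.shape

print("Rows:", n_rows)
print("Columns:", n_cols)
```

`len(sample)` and `len(sample.columns)` give the same two numbers.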



### Step 7. Print the name of all the columns.

In [44]:
# Print all column names
print("Column names:")
print(chipo.columns.tolist())


Column names:
['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']



### Step 8. How is the dataset indexed?

In [45]:
# Display the index details of the DataFrame
print("Dataset index:")
print(chipo.index)


Dataset index:
Index([   1,    2,    4,    5,    7,    8,    9,   11,   12,   13,
       ...
       4609, 4610, 4611, 4612, 4615, 4617, 4618, 4619, 4620, 4621],
      dtype='int64', length=3335)
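
The index above has gaps (1, 2, 4, 5, 7, ...) because dropped rows keep their original labels out of circulation. If a contiguous 0-based index is preferred, `reset_index(drop=True)` is one option. This is a sketch on a hypothetical `sample` DataFrame, not on the real data.

```python
import pandas as pd

# Hypothetical rows; None marks a missing choice_description
sample = pd.DataFrame({
    "item_name": ["Chips", "Izze", "Nantucket Nectar", "Chips", "Chicken Bowl"],
    "choice_description": [None, "[Clementine]", "[Apple]", None, "[Salsa]"],
})

# Dropping rows leaves the surviving labels untouched, so gaps appear
cleaned = sample.dropna()
gapped_labels = cleaned.index.tolist()

# drop=True discards the old labels instead of adding them as a column
renumbered = cleaned.reset_index(drop=True)

print("Labels after dropna():", gapped_labels)
print("Labels after reset_index():", renumbered.index.tolist())
```

Keeping the gapped index is also reasonable when the original row positions are needed to trace rows back to the raw file.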



### Step 9. Which was the most-ordered item? 

In [46]:
# Find the most-ordered item by counting order lines (rows) per 'item_name'.
# Note: value_counts() ignores the 'quantity' column.
most_ordered_item = chipo['item_name'].value_counts().idxmax()
print("Most-ordered item:", most_ordered_item)


Most-ordered item: Chicken Bowl



### Step 10. For the most-ordered item, how many items were ordered?

In [47]:
# Get the number of order lines (rows) for the most-ordered item.
# Note: this is a row count, not the sum of 'quantity'.
most_ordered_count = chipo['item_name'].value_counts().max()
print("Number of orders for", most_ordered_item, ":", most_ordered_count)


Number of orders for Chicken Bowl : 717
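
Steps 9 and 10 count rows per item. If "most-ordered" is read as the largest total quantity instead, a `groupby` sum is one way to compute it. The sketch below uses a hypothetical `sample` DataFrame chosen so that the two readings disagree.

```python
import pandas as pd

# Hypothetical data: Izze appears on more lines, Chicken Bowl has more units
sample = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Izze", "Izze", "Chicken Bowl"],
    "quantity": [3, 1, 1, 1, 2],
})

# Reading 1: number of order lines per item
line_counts = sample["item_name"].value_counts()
top_by_lines = line_counts.idxmax()

# Reading 2: total units per item
quantity_totals = sample.groupby("item_name")["quantity"].sum()
top_by_quantity = quantity_totals.idxmax()

print("Top by order lines:", top_by_lines, line_counts.max())
print("Top by quantity:", top_by_quantity, quantity_totals.max())
```

On the real dataset both readings may well point to the same item, but the counts themselves will differ whenever any line has a quantity above 1.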



### Step 11. What was the most ordered item in the choice_description column?

In [48]:
# Find the most common value in the 'choice_description' column.
# value_counts() skips NaN by default, and the dropna() call in Step 2
# already removed rows with a missing description.
most_common_choice = chipo['choice_description'].value_counts().idxmax()
print("Most common choice description:", most_common_choice)


Most common choice description: [Diet Coke]
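
To see how missing descriptions would affect this answer on uncleaned data, `value_counts` accepts `dropna=False`. A minimal sketch on a hypothetical `choices` Series:

```python
import numpy as np
import pandas as pd

# Hypothetical values; NaN stands for a missing description
choices = pd.Series(["[Diet Coke]", np.nan, np.nan, np.nan, "[Coke]", "[Diet Coke]"])

# Default behaviour: NaN is excluded from the counts
without_missing = choices.value_counts()

# dropna=False counts NaN as its own category
with_missing = choices.value_counts(dropna=False)

print("Most common (NaN excluded):", without_missing.idxmax())
print("Top count (NaN included):", with_missing.iloc[0])
```

In this sample the missing value is the most frequent entry once it is counted, which is why it matters whether NaN rows were dropped beforehand.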



### Step 12. How many items were ordered in total?

In [49]:
# Sum the 'quantity' column to get the total number of items ordered
total_items_ordered = chipo['quantity'].sum()
print("Total items ordered:", total_items_ordered)


Total items ordered: 3549



### Step 13. Turn the item price into a float

In [50]:
# Convert the 'item_price' column from a string (with '$') to a float
chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.replace('$', '')))
print("Item price converted to float.")


Item price converted to float.



#### Step 13.a. Check the item price type

In [51]:
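# Check the current data type of the 'item_price' column.
# Note: the Step 13 cell above already ran the conversion, so this reports float64
# rather than the original object (string) dtype.
print("Current item_price type:", chipo['item_price'].dtype)


Current item_price type: float64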



#### Step 13.b. Create a lambda function and change the type of item price

In [52]:
# Convert the 'item_price' from string to float.
# If the price contains a dollar sign, remove it first.
chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.replace('$','')) if isinstance(x, str) else float(x))
print("Conversion of item_price completed.")


Conversion of item_price completed.



#### Step 13.c. Check the item price type

In [53]:
# Verify that the conversion was successful
print("After conversion, item_price type:", chipo['item_price'].dtype)


After conversion, item_price type: float64
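
The exercise asks for a lambda, but the same conversion can also be written with pandas string methods. This is an alternative sketch, not the exercise's method, shown on a hypothetical `prices` Series.

```python
import pandas as pd

# Hypothetical price strings in the same "$x.xx" format as the dataset
prices = pd.Series(["$3.39", "$16.98", "$10.98"])

# regex=False treats "$" as a literal character, not a regex end-of-line anchor
as_float = prices.str.replace("$", "", regex=False).astype(float)

print(as_float.dtype)
print(as_float.tolist())
```

Unlike the guarded lambda in Step 13.b, this version expects string input, so it would need the `.str` accessor to be valid and should be run only once on the raw column.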



### Step 14. How much was the revenue for the period in the dataset?

In [54]:
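# Calculate revenue for each order line and then sum for the total revenue.
# Revenue per line = quantity * item_price
# Note: this assumes 'item_price' is a per-unit price. If it is already the line total
# (the quantity-2 Chicken Bowl in Step 4 is listed at $16.98), this product overstates revenue.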
total_revenue = (chipo['quantity'] * chipo['item_price']).sum()
print("Total revenue for the period: $", total_revenue)


Total revenue for the period: $ 33326.56



### Step 15. How many orders were made in the period?

In [55]:
# Count the unique order IDs
num_orders = chipo['order_id'].nunique()
print("Total number of orders:", num_orders)


Total number of orders: 1833



### Step 16. What is the average revenue amount per order?

In [56]:
# Calculate the average revenue per order
avg_revenue_per_order = total_revenue / num_orders
print("Average revenue per order: $", round(avg_revenue_per_order, 2))


Average revenue per order: $ 18.18
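
Dividing total revenue by the number of unique orders is equivalent to averaging the per-order totals. The sketch below checks this on a hypothetical `sample` DataFrame and, like Step 14, treats `item_price` as a per-unit price, which is an assumption.

```python
import pandas as pd

# Hypothetical order lines with numeric prices
sample = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "quantity": [1, 2, 1, 2],
    "item_price": [2.0, 3.0, 10.0, 4.0],
})

# Revenue per line, assuming item_price is a per-unit price
line_revenue = sample["quantity"] * sample["item_price"]

# Sum the line revenue within each order, then average across orders
order_totals = line_revenue.groupby(sample["order_id"]).sum()
mean_of_totals = order_totals.mean()

# The shortcut used in Step 16: total revenue / number of unique orders
ratio = line_revenue.sum() / sample["order_id"].nunique()

print("Per-order totals:", order_totals.tolist())
print("Mean of totals:", round(mean_of_totals, 2))
print("Total / orders:", round(ratio, 2))
```

Keeping `order_totals` around also makes it easy to inspect the distribution of order sizes, not only the mean.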



### Step 17. How many different items are sold?

In [57]:
# Count the number of unique items sold
num_unique_items = chipo['item_name'].nunique()
print("Different items sold:", num_unique_items)


Different items sold: 38

