# 🌯 Chipotle Data Analysis: A Beginner's Guide to Pandas

![Chipotle Banner](https://upload.wikimedia.org/wikipedia/en/thumb/3/3b/Chipotle_Mexican_Grill_logo.svg/200px-Chipotle_Mexican_Grill_logo.svg.png)

## 👋 Introduction
Welcome to this **interactive tutorial** on analyzing real-world data using Pandas! 

In this notebook, we will explore a dataset of **Chipotle orders**. As a Data Analyst for Chipotle, your task is to dig into the data, answer key business questions, and derive insights about customer preferences and sales.

## 🎯 Objectives
- **Load and Inspect Data**: Learn how to read TSV files and glimpse the raw data.
- **Data Cleaning**: Handle data types (converting currency strings to floats).
- **Exploratory Analysis**: Calculate revenue, find popular items, and analyze averages.
- **Visualization**: Use charts to showcase your findings.

---
### 🛠️ Libraries & Setup
First, let's import the necessary libraries. We'll use `pandas` for data manipulation and `matplotlib`/`seaborn` for visualization.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set(style="whitegrid")
%matplotlib inline


## 📂 Step 1: Loading the Data
The dataset is hosted on GitHub. Since it's a **TSV** (Tab Separated Values) file, we need to specify the separator `sep='\t'`.

> **Pro Tip**: By default, `read_csv` expects comma-separated values. Always check your file's delimiter!


In [None]:
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep='\t')


**Excellent!** The data is now loaded into a DataFrame called `chipo`.
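
To see why the separator matters, here is a minimal sketch that parses the same text twice. The two-row `sample_tsv` string is a hypothetical stand-in for the real file, so the example runs without a network connection.

```python
import io
import pandas as pd

# Hypothetical two-row sample laid out like a tab-separated orders file
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
)

# Default comma separator: every line ends up in one wide column
wrong = pd.read_csv(io.StringIO(sample_tsv))

# Tab separator: the five fields are split as intended
right = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 5)
```

A column count of 1 right after loading is usually the first hint that the delimiter is wrong.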


## 🧐 Step 2: First Look at the Data
Let's verify the data loaded correctly by peeking at the first 10 rows.
*Question: What does a single row represent?*


In [None]:
chipo.head(10)


**Observation**: Each row represents an item in an order. Notice that `order_id` is repeated, meaning one order can contain multiple items.
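
One way to confirm that an order spans several rows is to count rows per `order_id`. The sketch below uses a small hypothetical table (`toy`) rather than the real data.

```python
import pandas as pd

# Hypothetical mini version of the orders table
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "item_name": ["Chips", "Izze", "Chicken Bowl",
                  "Chips", "Steak Burrito", "Canned Soda"],
})

# Number of rows (line items) recorded under each order_id
lines_per_order = toy.groupby("order_id").size()
print(lines_per_order)
```

The same call on `chipo` would show how many line items each real order contains.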


## 📏 Step 3: Dataset Dimensions
How much data do we have? Let's check the number of rows (observations) and columns.


In [None]:
chipo.info()


**Insight**: 
- `4622` entries (rows).
- `5` columns.
- Notice `choice_description` has some null values (NaN), likely meaning no specific choice was made (e.g., for Chips).
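
If only the dimensions and the missing-value counts are needed, `shape` and `isna().sum()` give them directly. A minimal sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the first item has no choice recorded
toy = pd.DataFrame({
    "item_name": ["Chips", "Canned Soda", "Chicken Bowl"],
    "choice_description": [np.nan, "[Diet Coke]", "[Tomatillo Salsa, [Rice]]"],
})

n_rows, n_cols = toy.shape             # (rows, columns) as a tuple
missing_per_column = toy.isna().sum()  # count of NaN values in each column

print(n_rows, n_cols)
print(missing_per_column)
```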


## 📋 Step 4: Columns & Indexing
Let's list out our column names to understand our features.


In [None]:
chipo.columns


And check how the DataFrame is indexed:


In [None]:
chipo.index


## 🏆 Step 5: What is the Most Popular Item?
Let's determine which item appears most frequently in orders.

*Logic*: We count the occurrences of each `item_name` and find the top one.


In [None]:
# Get the top 5 most ordered items
item_counts = chipo['item_name'].value_counts()
most_ordered = item_counts.head(5)
print("Top 5 Items:")
print(most_ordered)

print(f"\nThe most ordered item is: {item_counts.idxmax()}")
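
Note that `value_counts` counts rows, not units: a row with `quantity` 4 still counts once. If "most popular" should mean most units sold, summing `quantity` per item is an alternative. The sketch below compares both on hypothetical data, where the two definitions disagree.

```python
import pandas as pd

# Hypothetical order lines
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chips", "Canned Soda"],
    "quantity": [1, 1, 1, 4],
})

# Ranking by number of rows the item appears in
by_rows = toy["item_name"].value_counts()

# Ranking by total units ordered
by_units = (
    toy.groupby("item_name")["quantity"]
       .sum()
       .sort_values(ascending=False)
)

print(by_rows.idxmax())   # Chips
print(by_units.idxmax())  # Canned Soda
```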


### 📊 Visualization: Top 5 Popular Items
A bar chart makes this ranking instantly clear.


In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(x=most_ordered.values, y=most_ordered.index, palette='viridis')
plt.title('Top 5 Most Ordered Items', fontsize=15)
plt.xlabel('Number of Orders', fontsize=12)
plt.ylabel('Item Name', fontsize=12)
plt.show()


## 🥤 Step 6: Most Popular Choice
Some items like Canned Soda have choices (e.g., Diet Coke). What's the most popular choice overall?


In [None]:
chipo['choice_description'].value_counts().head(1)
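
By default `value_counts` skips missing values, so rows without a choice do not compete for the top spot. The sketch below, on a hypothetical Series, shows the difference when `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical choices: three rows have no choice recorded
choices = pd.Series(
    [np.nan, np.nan, np.nan, "[Diet Coke]", "[Diet Coke]", "[Sprite]"]
)

# NaN is excluded by default, so the top entry is a real choice
top_choice = choices.value_counts().idxmax()

# With dropna=False, NaN is counted as its own bucket
with_missing = choices.value_counts(dropna=False)

print(top_choice)
print(with_missing)
```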


## 📦 Step 7: Total Items Ordered
How many items were ordered in total (sum of all quantities)?


In [None]:
total_items = chipo['quantity'].sum()
print(f"Total items ordered: {total_items}")


## 🧹 Step 8: Data Cleaning (Item Price)
We want to calculate revenue, but look at the `item_price` column type:


In [None]:
print(chipo.item_price.dtype)


It's an `object` (string) because of the `$` sign. We need to convert it to a `float`.

**Strategy**:
1. Remove the `$` character.
2. Convert the type to `float`.


In [None]:
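# Create a function that strips whitespace and the leading '$', then converts to float.
# Slicing with x[1:-1] would also drop the last character, which is only safe
# when every value ends in a trailing space.
def dollarizer(x):
    return float(x.strip().lstrip('$'))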

# Apply it
chipo['item_price'] = chipo['item_price'].apply(dollarizer)

# Verify
print(chipo.item_price.dtype)
chipo.head()
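
The same conversion can also be written with vectorized string methods instead of `apply`. This is a sketch on hypothetical raw values, with and without a trailing space, not the notebook's own method.

```python
import pandas as pd

# Hypothetical raw price strings; some carry a trailing space
raw = pd.Series(["$2.39 ", "$16.98 ", "$8.75"])

cleaned = (
    raw.str.replace("$", "", regex=False)  # drop the literal dollar sign
       .str.strip()                        # remove surrounding whitespace
       .astype(float)                      # convert the strings to float64
)

print(cleaned.tolist())  # [2.39, 16.98, 8.75]
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.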


### 💸 Visualization: Price Distribution
Now that it's numeric, let's see how expensive the items are.


In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(chipo['item_price'], bins=20, kde=True, color='green')
plt.title('Distribution of Item Prices', fontsize=15)
plt.xlabel('Price ($)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.show()


## 💰 Step 9: Total Revenue
Revenue = Quantity * Price.
Let's calculate the total revenue for the period.


In [None]:
revenue = (chipo['quantity'] * chipo['item_price']).sum()
print(f"Total Revenue: ${revenue:,.2f}")
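
The formula above treats `item_price` as the price of a single unit. That assumption is worth checking before trusting the total: if rows with `quantity` greater than 1 show a price that is already a multiple of the single-item price, then `item_price` is a line total and multiplying by `quantity` would overcount. A minimal sketch of that check, using hypothetical values rather than rows from the real file:

```python
import pandas as pd

# Hypothetical rows: the same item ordered once and twice
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl"],
    "quantity": [1, 2],
    "item_price": [8.49, 16.98],
})

# If this ratio is constant per item, item_price looks like a line total
toy["price_per_unit"] = toy["item_price"] / toy["quantity"]

# The two candidate revenue figures, depending on the interpretation
revenue_if_unit_price = (toy["quantity"] * toy["item_price"]).sum()
revenue_if_line_total = toy["item_price"].sum()

print(toy)
print(round(revenue_if_unit_price, 2), round(revenue_if_line_total, 2))
```

On the real DataFrame, the same comparison could be run on `chipo[chipo['quantity'] > 1]` once the price column has been converted to float.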


## 🧾 Step 10: Total Orders
How many unique orders were placed?


In [None]:
# nunique() counts distinct order IDs directly
orders = chipo['order_id'].nunique()
print(f"Number of unique orders: {orders}")


## 💵 Step 11: Average Revenue per Order
On average, how much does a customer spend per order?


In [None]:
avg_revenue = revenue / orders
print(f"Average Revenue per Order: ${avg_revenue:,.2f}")
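
Dividing total revenue by the number of orders gives the same figure as averaging the per-order totals. Computing the per-order totals explicitly has the advantage that their spread (median, maximum) can be inspected too. A sketch on hypothetical numbers, where `line_revenue` is a helper column introduced only for this example:

```python
import pandas as pd

# Hypothetical order lines with numeric prices
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})

# Revenue per line, using the notebook's quantity * price definition
toy["line_revenue"] = toy["quantity"] * toy["item_price"]

# Total revenue of each order, then the mean across orders
revenue_per_order = toy.groupby("order_id")["line_revenue"].sum()
avg_from_groupby = revenue_per_order.mean()

# Same figure from total revenue divided by the number of unique orders
avg_from_ratio = toy["line_revenue"].sum() / toy["order_id"].nunique()

print(avg_from_groupby, avg_from_ratio)  # 9.0 9.0
```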


## 🔢 Step 12: Variety of Items
How many *unique* items are sold at Chipotle?


In [None]:
# nunique() counts distinct item names directly
unique_items = chipo['item_name'].nunique()
print(f"Number of unique items sold: {unique_items}")


---
## 🏁 Conclusion
We've successfully explored the Chipotle dataset! We learned how to:
1. Load TSV data.
2. Clean currency strings.
3. Count, sum, and average values to answer business questions.

**Well done!** 🚀

### 👨‍💻 About the Author
**Author**: Tassawar Abbas
**Contact**: [abbas829@gmail.com](mailto:abbas829@gmail.com)
**Role**: Data Science Enthusiast
