# 🍽️ Restaurant Tips Data Analysis Tutorial

## 🎯 What is Data Analysis?

<div style="background-color: #e6f3ff; padding: 15px; border-radius: 10px; border-left: 5px solid #0066cc;">
<h3>💡 Think of data analysis like being a detective!</h3>
<p>You have clues (data) and you want to solve mysteries (find patterns and insights). Just like how detectives look for evidence to solve cases, data analysts look at numbers to answer questions!</p>
</div>

### 🔍 Today's Mystery: Restaurant Tips!
We're going to analyze **real restaurant data** to answer questions like:
- 💰 Do people tip more when they spend more money?
- 🚬 Do smokers tip differently than non-smokers?
- 📅 Which day of the week gets the best tips?
- 👥 Does group size affect tipping?

### 📊 Our Tools:
- **<span style="color: #ff6b6b;">NumPy</span>** = Our calculator for working with numbers
- **<span style="color: #4ecdc4;">Matplotlib</span>** = Our artist for drawing charts and graphs
- **<span style="color: #45b7d1;">Tips Dataset</span>** = Real data from a restaurant!

In [None]:
# Import our tools
import numpy as np
import matplotlib.pyplot as plt

# Print welcome message
print("🎉 Welcome to Restaurant Tips Data Analysis!")
print("📚 Let's learn how to be data detectives!")

# 📂 Step 1: Loading Our Data

## 🤔 What is a Dataset?

<div style="background-color: #fff2e6; padding: 15px; border-radius: 10px; border-left: 5px solid #ff9500;">
<h3>📋 Think of a dataset like a digital spreadsheet!</h3>
<p>Each <strong>row</strong> is one restaurant visit, and each <strong>column</strong> tells us something about that visit (like how much they spent, what day it was, etc.)</p>
</div>

### 📊 Our Tips Dataset Contains:
| Column | Description | Example |
|--------|-------------|----------|
| **total_bill** | 💵 Total amount spent | $25.50 |
| **tip** | 💰 Tip amount given | $4.00 |
| **gender** | 👤 Customer gender | Male/Female |
| **smoker** | 🚬 Smoking section? | Yes/No |
| **day** | 📅 Day of the week | Saturday |
| **time** | 🕐 Meal time | Lunch/Dinner |
| **size** | 👥 Party size | 4 people |

### 🔧 How We Load Data:
We'll type in a small version of the dataset by hand (20 restaurant visits). It's a well-known dataset that data scientists often use for learning!
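
Because each column lives in its own array or list, position `i` in every column has to describe the same visit. Here is a minimal sketch of that idea, using only the first three visits and three of the columns; the names `columns`, `lengths`, and `rows` are just for illustration.

```python
import numpy as np

# A tiny three-visit sample (the first three visits from this tutorial's data)
total_bill = np.array([16.99, 10.34, 21.01])
tip = np.array([1.01, 1.66, 3.50])
day = ['Sun', 'Sun', 'Sun']

# Every column must have the same length, or the rows will not line up
columns = {'total_bill': total_bill, 'tip': tip, 'day': day}
lengths = {name: len(values) for name, values in columns.items()}
all_same_length = len(set(lengths.values())) == 1
print(f"Column lengths: {lengths} (all equal: {all_same_length})")

# zip() walks the columns in parallel, giving one visit (row) at a time
rows = list(zip(total_bill, tip, day))
for number, (bill, tip_amount, weekday) in enumerate(rows, start=1):
    print(f"Visit {number}: bill=${bill:.2f}, tip=${tip_amount:.2f}, day={weekday}")
```

If the lengths ever differ, a value was probably skipped or typed twice while entering the data.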

In [None]:
# Create our restaurant tips dataset
# This is real data from a restaurant!

# Numerical data (money amounts and party sizes)
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78,
                      10.27, 35.26, 15.42, 18.43, 14.83, 21.58, 10.33, 16.29, 16.97, 20.65])

tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23,
               1.71, 5.00, 1.57, 3.00, 3.02, 3.92, 1.67, 3.71, 3.50, 3.35])

party_size = np.array([2, 3, 3, 2, 4, 4, 2, 4, 2, 2, 2, 3, 2, 2, 2, 2, 3, 3, 2, 3])

# Categorical data (text descriptions)
gender = ['Female', 'Male', 'Male', 'Male', 'Female', 'Male', 'Male', 'Male', 'Male', 'Female',
         'Male', 'Female', 'Female', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male']

smoker = ['No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No',
         'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes']

day = ['Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun',
      'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sat']

time = ['Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner',
       'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner', 'Dinner']

print("✅ Data loaded successfully!")
print(f"📊 We have {len(total_bill)} restaurant visits to analyze")
print(f"💡 Each visit has 7 pieces of information")

# 🔍 Step 2: Exploring Our Data

## 🧐 What Does "Data Exploration" Mean?

<div style="background-color: #f0f8ff; padding: 15px; border-radius: 10px; border-left: 5px solid #1e90ff;">
<h3>🔦 Data exploration is like turning on the lights in a dark room!</h3>
<p>Before we can find patterns, we need to understand what our data looks like. It's like getting to know a new friend - we ask basic questions first!</p>
</div>

### 📋 Basic Questions We Should Ask:
1. **How much data do we have?** (How many restaurant visits?)
2. **What's the range of our numbers?** (Cheapest vs most expensive bill?)
3. **What's typical?** (Average bill, average tip?)
4. **Any surprises?** (Unusually high or low values?)

### 🧮 Basic Statistics We'll Calculate:
- **<span style="color: #e74c3c;">Mean (Average)</span>**: Add all numbers and divide by count
- **<span style="color: #f39c12;">Median</span>**: The middle number when sorted
- **<span style="color: #27ae60;">Minimum</span>**: The smallest number
- **<span style="color: #8e44ad;">Maximum</span>**: The largest number
- **<span style="color: #2980b9;">Standard Deviation</span>**: How spread out the values are around the mean

### 💡 Formula for Average:
$$\text{Average} = \frac{\text{Sum of all values}}{\text{Number of values}}$$
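
To see what these statistics do under the hood, here is a small sketch that works them out step by step and compares each result with NumPy's built-in function. It uses only the first five bills, and the `manual_*` names are illustrative.

```python
import numpy as np

# First five bills from this tutorial's data (a small sample for illustration)
sample_bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

# Mean: sum of all values divided by the number of values
manual_mean = sample_bills.sum() / len(sample_bills)

# Median: the middle value once the data is sorted (odd count, so one middle value)
sorted_bills = np.sort(sample_bills)
manual_median = sorted_bills[len(sorted_bills) // 2]

# Standard deviation: square root of the average squared distance from the mean
manual_std = np.sqrt(np.mean((sample_bills - manual_mean) ** 2))

print(f"Mean:   {manual_mean:.2f} (NumPy: {np.mean(sample_bills):.2f})")
print(f"Median: {manual_median:.2f} (NumPy: {np.median(sample_bills):.2f})")
print(f"Std:    {manual_std:.2f} (NumPy: {np.std(sample_bills):.2f})")

# np.std divides by n by default; ddof=1 divides by n - 1 (the "sample" version)
sample_std = np.std(sample_bills, ddof=1)
print(f"Sample std (ddof=1): {sample_std:.2f}")
```

With small datasets the two versions of the standard deviation can differ noticeably, so it helps to say which one you are reporting.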

In [None]:
# Let's explore our bill amounts first
import numpy as np

print("🍽️ RESTAURANT BILL ANALYSIS")
print("=" * 40)

# Calculate basic statistics for total bills
bill_mean = np.mean(total_bill)
bill_median = np.median(total_bill)
bill_min = np.min(total_bill)
bill_max = np.max(total_bill)
bill_std = np.std(total_bill)  # population standard deviation (NumPy's default, ddof=0)

print(f"💰 Average bill: ${bill_mean:.2f}")
print(f"📊 Median bill: ${bill_median:.2f}")
print(f"💸 Cheapest bill: ${bill_min:.2f}")
print(f"💎 Most expensive bill: ${bill_max:.2f}")
print(f"📈 Standard deviation: ${bill_std:.2f}")

print("\n🎯 TIP ANALYSIS")
print("=" * 40)

# Calculate basic statistics for tips
tip_mean = np.mean(tip)
tip_median = np.median(tip)
tip_min = np.min(tip)
tip_max = np.max(tip)
tip_std = np.std(tip)  # population standard deviation (NumPy's default, ddof=0)

print(f"💰 Average tip: ${tip_mean:.2f}")
print(f"📊 Median tip: ${tip_median:.2f}")
print(f"💸 Smallest tip: ${tip_min:.2f}")
print(f"💎 Largest tip: ${tip_max:.2f}")
print(f"📈 Standard deviation: ${tip_std:.2f}")

# Calculate tip percentage
tip_percentage = (tip / total_bill) * 100
avg_tip_percentage = np.mean(tip_percentage)

print(f"\n🎯 Average tip percentage: {avg_tip_percentage:.1f}%")

# 📊 Step 3: Understanding Correlation

## 🤝 What is Correlation?

<div style="background-color: #ffeaa7; padding: 15px; border-radius: 10px; border-left: 5px solid #fdcb6e;">
<h3>🔗 Correlation shows if two things are connected!</h3>
<p>Think of it like friendship - do two things tend to increase together, decrease together, or have no relationship at all?</p>
</div>

### 🎯 Types of Correlation:

| Type | Symbol | Meaning | Example |
|------|--------|---------|----------|
| **<span style="color: #27ae60;">Positive</span>** | **↗️** | When one goes up, the other goes up | More study time → Better grades |
| **<span style="color: #e74c3c;">Negative</span>** | **↘️** | When one goes up, the other goes down | More TV time → Lower grades |
| **<span style="color: #95a5a6;">No Correlation</span>** | **↔️** | No clear pattern | Shoe size → Test scores |

### 📏 Correlation Values:
- **+1.0**: Perfect positive correlation (best friends! 👫)
- **0.0**: No correlation (strangers 🤷‍♂️)
- **-1.0**: Perfect negative correlation (opposites! ↘️)
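
A quick way to get a feel for these values is to build toy data where the answer is known in advance. The numbers below are made up for illustration (they are not from the tips dataset): one series rises in lockstep with `hours`, the other falls in lockstep.

```python
import numpy as np

# Hypothetical data, invented only to illustrate the extreme values
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rises_with_hours = 2 * hours + 1    # always goes up when hours goes up
falls_with_hours = 10 - 3 * hours   # always goes down when hours goes up

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation we want
r_up = np.corrcoef(hours, rises_with_hours)[0, 1]
r_down = np.corrcoef(hours, falls_with_hours)[0, 1]

print(f"Perfectly rising together:  r = {r_up:.1f}")
print(f"Perfectly moving opposite:  r = {r_down:.1f}")
```

Real data almost never lands exactly on +1 or -1; values somewhere in between are the normal case.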

### 🤔 Our Question:
**Do people tip more when their bill is higher?**

### 🧮 Correlation Formula:
$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}$$

*Don't worry about the math - NumPy does this for us!*
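
If you are curious, the formula translates almost line for line into NumPy. This is a sketch on the first five visits only, with `x` standing for bills and `y` for tips; it should give the same number as `np.corrcoef`.

```python
import numpy as np

# First five visits from this tutorial's data
x = np.array([16.99, 10.34, 21.01, 23.68, 24.59])  # bills
y = np.array([1.01, 1.66, 3.50, 3.31, 3.61])       # tips

# Distance of every value from its own mean
dx = x - x.mean()
dy = y - y.mean()

# Numerator: sum of the products of the distances
# Denominator: square root of (sum of squared dx) times (sum of squared dy)
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# NumPy's built-in version for comparison
r_numpy = np.corrcoef(x, y)[0, 1]

print(f"By hand: {r_manual:.3f}")
print(f"NumPy:   {r_numpy:.3f}")
```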

In [None]:
# Calculate correlation between bill amount and tip amount
correlation_matrix = np.corrcoef(total_bill, tip)
bill_tip_correlation = correlation_matrix[0, 1]

print("🔍 CORRELATION ANALYSIS")
print("=" * 40)
print(f"🎯 Correlation between bill and tip: {bill_tip_correlation:.3f}")

# Interpret the correlation
if bill_tip_correlation > 0.7:
    interpretation = "🔥 Strong positive correlation! Higher bills = Higher tips"
elif bill_tip_correlation > 0.3:
    interpretation = "📈 Moderate positive correlation! Bills and tips tend to increase together"
elif bill_tip_correlation > -0.3:
    interpretation = "🤷‍♂️ Weak or no correlation! Bills and tips aren't strongly related"
else:
    interpretation = "📉 Negative correlation! Higher bills might mean lower tips"

print(f"💡 What this means: {interpretation}")

# Let's also look at other correlations
print("\n🔍 OTHER INTERESTING CORRELATIONS:")
print("-" * 40)

# Correlation between bill and party size
bill_size_correlation = np.corrcoef(total_bill, party_size)[0, 1]
print(f"👥 Bill amount vs Party size: {bill_size_correlation:.3f}")

# Correlation between tip and party size
tip_size_correlation = np.corrcoef(tip, party_size)[0, 1]
print(f"💰 Tip amount vs Party size: {tip_size_correlation:.3f}")

# 📈 Step 4: Data Visualization - Scatter Plot

## 🎨 Why Do We Need Charts and Graphs?

<div style="background-color: #dda0dd; padding: 15px; border-radius: 10px; border-left: 5px solid #9370db;">
<h3>👀 A picture is worth a thousand numbers!</h3>
<p>Imagine trying to describe your best friend using only numbers - their height, weight, age. You'd miss so much! Charts help us <strong>see</strong> patterns that numbers alone can't show.</p>
</div>

### 🔍 What is a Scatter Plot?
A **scatter plot** is like plotting points on a treasure map! Each point represents one restaurant visit:
- **X-axis (horizontal)**: Total bill amount 💵
- **Y-axis (vertical)**: Tip amount 💰
- **Each dot**: One customer's visit 🔴

### 🎯 What We're Looking For:
- **📈 Upward trend**: Higher bills → Higher tips
- **📉 Downward trend**: Higher bills → Lower tips  
- **➡️ No trend**: Bills and tips aren't related
- **🔍 Outliers**: Unusual points that don't fit the pattern

### 💡 Reading the Chart:
If we see dots forming a line going **up and to the right** ↗️, it means people tip more when they spend more!
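
The "up and to the right" pattern can also be summarized with a number: the slope of a straight line fitted through the dots. Below is a minimal sketch using `np.polyfit` on the first five visits; the $20 bill in the prediction is just an example value.

```python
import numpy as np

# First five visits from this tutorial's data
bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tips = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Fit a straight line: tip = slope * bill + intercept
# np.polyfit with degree 1 returns the slope first, then the intercept
slope, intercept = np.polyfit(bills, tips, 1)

print(f"Slope: {slope:.3f} (extra tip dollars per extra bill dollar)")
print(f"Intercept: {intercept:.3f}")

# Use the line to estimate the tip for an example $20 bill
example_bill = 20.0
estimated_tip = slope * example_bill + intercept
print(f"Estimated tip for a ${example_bill:.2f} bill: ${estimated_tip:.2f}")
```

A positive slope means the line rises from left to right; a negative slope means it falls. With only five points the estimate is rough, so treat it as an illustration rather than a prediction.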

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Create our first visualization: Scatter plot

plt.figure(figsize=(10, 6))

# Create the scatter plot
plt.scatter(total_bill, tip, color='dodgerblue', alpha=0.7, s=100, edgecolors='black', linewidth=1)

# Add a trend line to help see the pattern
z = np.polyfit(total_bill, tip, 1)  # Create a line of best fit
p = np.poly1d(z)
plt.plot(total_bill, p(total_bill), "r--", alpha=0.8, linewidth=2, label='Trend line')

# Customize the chart
plt.title('💰 Restaurant Bills vs Tips: Do Higher Bills Mean Higher Tips?', fontsize=16, fontweight='bold')
plt.xlabel('💵 Total Bill Amount ($)', fontsize=12)
plt.ylabel('💰 Tip Amount ($)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.legend()

# Add some annotations
plt.text(30, 1, f'Correlation: {bill_tip_correlation:.3f}', 
         bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.7),
         fontsize=11, fontweight='bold')

# Show the plot
plt.tight_layout()
plt.show()

print("🎯 What do you see in this chart?")
print("📈 Look for the pattern: Do the dots generally go up from left to right?")
print("🔍 Any dots that seem unusual or far from the others?")

# 📊 Step 5: Distribution Analysis - Histograms

## 📈 What is a Histogram?

<div style="background-color: #e8f5e8; padding: 15px; border-radius: 10px; border-left: 5px solid #4caf50;">
<h3>📚 Think of a histogram like organizing books on shelves!</h3>
<p>Imagine you have a bunch of books of different heights. A histogram groups similar heights together and shows you how many books are in each height range. It shows the <strong>shape</strong> of your data!</p>
</div>

### 🎯 What Histograms Tell Us:
- **📊 Shape**: Is the data spread out evenly or bunched up?
- **🎯 Center**: Where do most values cluster?
- **📏 Spread**: Are values close together or spread far apart?
- **🔍 Outliers**: Any values that are unusually high or low?

### 🏔️ Common Shapes:
| Shape | Description | What it means |
|-------|-------------|---------------|
| **🔔 Bell Curve** | Most values in middle, few at extremes | Normal, balanced data |
| **⛰️ Left Skewed** | Long tail on the left | Few very low values |
| **🏔️ Right Skewed** | Long tail on the right | Few very high values |
| **📊 Uniform** | All bars similar height | Values spread evenly |
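
One rough hint about shape, without drawing anything, is to compare the mean with the median: a long right tail tends to pull the mean above the median. The sketch below uses made-up tip values (not from the tips dataset), and `describe_shape` is a hypothetical helper written for this example.

```python
import numpy as np

def describe_shape(name, values):
    """Print the mean and median side by side and return both."""
    mean = np.mean(values)
    median = np.median(values)
    print(f"{name}: mean = {mean:.2f}, median = {median:.2f}")
    return mean, median

# Hypothetical data, invented only to illustrate the idea
balanced_tips = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 6.0])
right_skewed_tips = np.array([2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 15.0])

balanced_mean, balanced_median = describe_shape("Balanced", balanced_tips)
skewed_mean, skewed_median = describe_shape("Right skewed", right_skewed_tips)
```

This is only a rule of thumb, so it is still worth looking at the histogram itself.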

### 🤔 Our Questions:
- **Do most people tip similar amounts?**
- **Are there any extremely generous (or stingy) tippers?**
- **What's the most common tip range?**
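
Behind every histogram is a simple count of how many values fall into each range (or "bin"). As a sketch, `np.histogram` gives those counts without drawing anything; the example uses the first ten tips and four bins, both chosen just to keep the output short.

```python
import numpy as np

# First ten tips from this tutorial's data
tips = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23])

# counts[i] is the number of tips between edges[i] and edges[i + 1]
counts, edges = np.histogram(tips, bins=4)

# Print a simple text version of the histogram
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    bar = '#' * int(count)
    print(f"${left:.2f} to ${right:.2f}: {bar} ({count})")
```

The bin with the longest row of `#` marks is the most common tip range in this sample.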

In [None]:
# Create histograms to see the distribution of our data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Histogram 1: Bill amounts
ax1.hist(total_bill, bins=8, color='lightcoral', alpha=0.7, edgecolor='black', linewidth=1)
ax1.axvline(np.mean(total_bill), color='red', linestyle='--', linewidth=2, label=f'Average: ${np.mean(total_bill):.2f}')
ax1.axvline(np.median(total_bill), color='blue', linestyle='--', linewidth=2, label=f'Median: ${np.median(total_bill):.2f}')
ax1.set_title('💵 Distribution of Restaurant Bills', fontsize=14, fontweight='bold')
ax1.set_xlabel('Bill Amount ($)', fontsize=12)
ax1.set_ylabel('Number of Customers', fontsize=12)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Histogram 2: Tip amounts
ax2.hist(tip, bins=8, color='lightgreen', alpha=0.7, edgecolor='black', linewidth=1)
ax2.axvline(np.mean(tip), color='red', linestyle='--', linewidth=2, label=f'Average: ${np.mean(tip):.2f}')
ax2.axvline(np.median(tip), color='blue', linestyle='--', linewidth=2, label=f'Median: ${np.median(tip):.2f}')
ax2.set_title('💰 Distribution of Tips', fontsize=14, fontweight='bold')
ax2.set_xlabel('Tip Amount ($)', fontsize=12)
ax2.set_ylabel('Number of Customers', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Analyze the distributions
print("🔍 DISTRIBUTION ANALYSIS")
print("=" * 40)
print(f"📊 Middle 50% of bills: ${np.percentile(total_bill, 25):.2f} - ${np.percentile(total_bill, 75):.2f}")
print(f"💰 Middle 50% of tips: ${np.percentile(tip, 25):.2f} - ${np.percentile(tip, 75):.2f}")
print(f"\n💡 25% of customers spend less than ${np.percentile(total_bill, 25):.2f}")
print(f"💡 25% of customers tip less than ${np.percentile(tip, 25):.2f}")

# 👥 Step 6: Categorical Analysis - Bar Charts

## 📊 Understanding Categories in Data

<div style="background-color: #fff0f5; padding: 15px; border-radius: 10px; border-left: 5px solid #ff69b4;">
<h3>🏷️ Not all data is numbers - some data is labels!</h3>
<p><strong>Categorical data</strong> is like putting things into different boxes or groups. Instead of measuring <em>how much</em>, we're looking at <em>what type</em> or <em>which group</em>.</p>
</div>

### 🎯 Our Categories:
- **👤 Gender**: Male vs Female customers
- **🚬 Smoker**: Smoking vs Non-smoking section
- **📅 Day**: Different days of the week
- **🕐 Time**: Lunch vs Dinner

### 🤔 Questions We Want to Answer:
1. **Who tips more on average - men or women?**
2. **Do people in the smoking section tip differently?**
3. **Which day of the week gets the best tips?**
4. **Do dinner customers tip more than lunch customers?**

### 📊 Why Bar Charts?
Bar charts are perfect for comparing **averages between groups**. Each bar shows the average tip for that category, making it easy to see which group tips more!

### 📏 How to Read Bar Charts:
- **Taller bars** = Higher average tips 📈
- **Shorter bars** = Lower average tips 📉
- **Similar heights** = Groups tip about the same 🤝
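
Each bar's height is simply the average of one group. One compact way to compute group averages with NumPy is a boolean mask, which keeps only the values belonging to one category. The sketch below uses the first five visits; `sizes` and `average_by_size` are illustrative names.

```python
import numpy as np

# First five visits from this tutorial's data
tips = np.array([1.01, 1.66, 3.50, 3.31, 3.61])
sizes = np.array([2, 3, 3, 2, 4])  # party size for each visit

average_by_size = {}
for size in np.unique(sizes):
    # sizes == size is True only where the party had this many people
    group_tips = tips[sizes == size]
    average_by_size[int(size)] = group_tips.mean()

for size, average in average_by_size.items():
    print(f"{size} people: average tip ${average:.2f}")
```

These averages are exactly the numbers a bar chart would draw, one bar per party size.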

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# This cell reuses tip, gender, smoker, day, and party_size from Step 1,
# so run that cell first

# Analyze categorical data with bar charts
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# 1. Tips by Gender
male_tips = [tip[i] for i in range(len(gender)) if gender[i] == 'Male']
female_tips = [tip[i] for i in range(len(gender)) if gender[i] == 'Female']

avg_male_tip = np.mean(male_tips)
avg_female_tip = np.mean(female_tips)

ax1.bar(['👨 Male', '👩 Female'], [avg_male_tip, avg_female_tip], 
        color=['lightblue', 'pink'], alpha=0.7, edgecolor='black')
ax1.set_title('💰 Average Tips by Gender', fontsize=14, fontweight='bold')
ax1.set_ylabel('Average Tip ($)')
ax1.grid(True, alpha=0.3)

# Add value labels on bars
ax1.text(0, avg_male_tip + 0.1, f'${avg_male_tip:.2f}', ha='center', fontweight='bold')
ax1.text(1, avg_female_tip + 0.1, f'${avg_female_tip:.2f}', ha='center', fontweight='bold')

# 2. Tips by Smoking Status
smoker_tips = [tip[i] for i in range(len(smoker)) if smoker[i] == 'Yes']
nonsmoker_tips = [tip[i] for i in range(len(smoker)) if smoker[i] == 'No']

avg_smoker_tip = np.mean(smoker_tips) if smoker_tips else 0
avg_nonsmoker_tip = np.mean(nonsmoker_tips)

ax2.bar(['🚬 Smoker', '🚭 Non-Smoker'], [avg_smoker_tip, avg_nonsmoker_tip], 
        color=['orange', 'lightgreen'], alpha=0.7, edgecolor='black')
ax2.set_title('💰 Average Tips: Smokers vs Non-Smokers', fontsize=14, fontweight='bold')
ax2.set_ylabel('Average Tip ($)')
ax2.grid(True, alpha=0.3)

# Add value labels
if smoker_tips:
    ax2.text(0, avg_smoker_tip + 0.1, f'${avg_smoker_tip:.2f}', ha='center', fontweight='bold')
ax2.text(1, avg_nonsmoker_tip + 0.1, f'${avg_nonsmoker_tip:.2f}', ha='center', fontweight='bold')

# 3. Tips by Day of Week
unique_days = sorted(set(day))  # sorted so the bar order is the same on every run
day_averages = []
for d in unique_days:
    day_tips = [tip[i] for i in range(len(day)) if day[i] == d]
    day_averages.append(np.mean(day_tips))

ax3.bar(unique_days, day_averages, color=['gold', 'coral'], alpha=0.7, edgecolor='black')
ax3.set_title('📅 Average Tips by Day of Week', fontsize=14, fontweight='bold')
ax3.set_ylabel('Average Tip ($)')
ax3.grid(True, alpha=0.3)

# Add value labels
for i, avg in enumerate(day_averages):
    ax3.text(i, avg + 0.1, f'${avg:.2f}', ha='center', fontweight='bold')

# 4. Tips by Party Size
unique_sizes = sorted(list(set(party_size)))
size_averages = []
for size in unique_sizes:
    size_tips = [tip[i] for i in range(len(party_size)) if party_size[i] == size]
    size_averages.append(np.mean(size_tips))

ax4.bar([f'{size} people' for size in unique_sizes], size_averages, 
        color=['lightcyan', 'lightyellow', 'lightpink'], alpha=0.7, edgecolor='black')
ax4.set_title('👥 Average Tips by Party Size', fontsize=14, fontweight='bold')
ax4.set_ylabel('Average Tip ($)')
ax4.grid(True, alpha=0.3)

# Add value labels
for i, avg in enumerate(size_averages):
    ax4.text(i, avg + 0.1, f'${avg:.2f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

# Print insights
print("🔍 CATEGORICAL ANALYSIS INSIGHTS")
print("=" * 40)
print(f"👤 Gender: {'Men' if avg_male_tip > avg_female_tip else 'Women'} tip more on average")
print(f"🚬 Smoking: {'Smokers' if avg_smoker_tip > avg_nonsmoker_tip else 'Non-smokers'} tip more on average")
print(f"📅 Best day for tips: {unique_days[day_averages.index(max(day_averages))]}")
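print(f"👥 Best party size for tips: {unique_sizes[size_averages.index(max(size_averages))]} people")
print("⚠️ Note: this sample has only one smoker and one Saturday visit, so treat those two comparisons with caution")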

# 🧮 Step 8: Advanced Statistics - Percentiles

## 📊 What are Percentiles?

<div style="background-color: #ffe4e1; padding: 15px; border-radius: 10px; border-left: 5px solid #ff6347;">
<h3>🎯 Percentiles are like class rankings!</h3>
<p>If you scored in the <strong>90th percentile</strong> on a test, it means you did better than 90% of all students. Percentiles help us understand where any value stands compared to all other values!</p>
</div>

### 🎓 Understanding Percentiles:
| Percentile | Meaning | Example |
|------------|---------|----------|
| **10th** | 📉 Bottom 10% | Only 10% of customers tip less |
| **25th (Q1)** | 📊 Bottom quarter | 25% of customers tip less |
| **50th (Median)** | 🎯 Middle | Half tip more, half tip less |
| **75th (Q3)** | 📈 Top quarter | Only 25% tip more |
| **90th** | 🏆 Top 10% | Only 10% of customers tip more |

### 🤔 Practical Questions:
- **What tip amount puts you in the "generous tipper" category?**
- **What's considered a "typical" tip?**
- **How much do the most generous 10% of customers tip?**

### 📐 Formula for Percentile:
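To find the **Pth percentile**, sort the data and look at this position:
$$\text{Position} = \frac{P}{100} \times (n + 1)$$
where **n** is the number of data points.

*But don't worry - NumPy calculates this for us! Its default method interpolates between neighboring values, so its results can differ slightly from this textbook rule.*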
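
For the curious, here is a sketch of how NumPy's default method arrives at its answer. It places the Pth percentile at zero-based position (P / 100) × (n - 1) in the sorted data and interpolates linearly between the two neighboring values, which can differ slightly from the (n + 1) rule above. `percentile_linear` is a hypothetical helper written for this example.

```python
import numpy as np

def percentile_linear(values, p):
    """Pth percentile using linear interpolation between sorted neighbors."""
    ordered = np.sort(values)
    position = (p / 100) * (len(ordered) - 1)  # zero-based, may be fractional
    lower = int(np.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Move "fraction" of the way from the lower neighbor to the upper one
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# First five tips from this tutorial's data
tips = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

for p in [10, 25, 50, 75, 90]:
    by_hand = percentile_linear(tips, p)
    by_numpy = np.percentile(tips, p)
    print(f"{p:2d}th percentile: by hand ${by_hand:.2f}, NumPy ${by_numpy:.2f}")
```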

In [None]:
# Calculate percentiles for tips
percentiles = [10, 25, 50, 75, 90, 95]
tip_percentiles = [np.percentile(tip, p) for p in percentiles]

print("🎯 TIP PERCENTILES ANALYSIS")
print("=" * 50)
print("💰 What different tip amounts mean:")
print()

interpretations = [
    "😔 Low tipper (bottom 10%)",
    "📉 Below average tipper", 
    "🎯 Typical tipper (median)",
    "📈 Good tipper",
    "💎 Generous tipper (top 10%)",
    "🏆 Extremely generous (top 5%)"
]

for p, value, interp in zip(percentiles, tip_percentiles, interpretations):
    print(f"{p:2d}th percentile: ${value:5.2f} - {interp}")

# Calculate tip percentages percentiles
tip_percentage = (tip / total_bill) * 100
percentage_percentiles = [np.percentile(tip_percentage, p) for p in percentiles]

print("\n🎯 TIP PERCENTAGE ANALYSIS")
print("=" * 50)
print("📊 What different tip percentages mean:")
print()

for p, value, interp in zip(percentiles, percentage_percentiles, interpretations):
    print(f"{p:2d}th percentile: {value:5.1f}% - {interp}")

# Create a visual representation
plt.figure(figsize=(12, 8))

# Create two subplots
plt.subplot(2, 1, 1)
plt.bar(range(len(percentiles)), tip_percentiles, 
        color=['red', 'orange', 'yellow', 'lightgreen', 'green', 'darkgreen'], alpha=0.7)
plt.title('💰 Tip Amounts by Percentile', fontsize=14, fontweight='bold')
plt.ylabel('Tip Amount ($)')
plt.xticks(range(len(percentiles)), [f'{p}th' for p in percentiles])
plt.grid(True, alpha=0.3)

# Add value labels
for i, value in enumerate(tip_percentiles):
    plt.text(i, value + 0.1, f'${value:.2f}', ha='center', fontweight='bold')

plt.subplot(2, 1, 2)
plt.bar(range(len(percentiles)), percentage_percentiles,
        color=['red', 'orange', 'yellow', 'lightgreen', 'green', 'darkgreen'], alpha=0.7)
plt.title('📊 Tip Percentages by Percentile', fontsize=14, fontweight='bold')
plt.ylabel('Tip Percentage (%)')
plt.xlabel('Percentile')
plt.xticks(range(len(percentiles)), [f'{p}th' for p in percentiles])
plt.grid(True, alpha=0.3)

# Add value labels
for i, value in enumerate(percentage_percentiles):
    plt.text(i, value + 0.5, f'{value:.1f}%', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\n💡 INSIGHTS:")
print(f"🎯 A 'typical' tip is ${np.median(tip):.2f} or {np.median(tip_percentage):.1f}%")
print(f"💎 To be a 'generous' tipper, you need to tip at least ${tip_percentiles[4]:.2f} or {percentage_percentiles[4]:.1f}%")
print(f"🏆 The top 5% of tippers give ${tip_percentiles[5]:.2f} or more ({percentage_percentiles[5]:.1f}%+ of bill)")

# 🎯 Step 9: Final Insights and Conclusions

## 🕵️ What Did We Discover?

<div style="background-color: #f0f8ff; padding: 20px; border-radius: 15px; border: 3px solid #4169e1;">
<h3>🎉 Congratulations! You've completed your first data analysis project!</h3>
<p>Let's summarize all the interesting patterns and insights we discovered about restaurant tipping behavior.</p>
</div>

### 📊 Our Data Science Journey:
1. **🔍 Data Exploration** - We loaded and examined our restaurant data
2. **📈 Correlation Analysis** - We found relationships between variables
3. **📊 Visualization** - We created charts to see patterns
4. **📦 Distribution Analysis** - We understood how our data is spread
5. **👥 Category Comparison** - We compared different groups
6. **🎯 Statistical Analysis** - We calculated percentiles and rankings

### 🏆 Key Skills You've Learned:
- **NumPy**: Working with arrays and mathematical operations
- **Matplotlib**: Creating beautiful and informative visualizations
- **Statistics**: Understanding correlation, percentiles, and distributions
- **Critical thinking**: Asking questions and finding answers in data

### 🚀 Next Steps in Your Data Science Journey:
- **🐼 Learn Pandas**: For handling larger, more complex datasets
- **🤖 Machine Learning**: Predicting future outcomes
- **📈 Advanced Visualization**: Interactive charts and dashboards
- **🔍 Real Projects**: Analyze data from your own interests!

In [None]:
# Final summary of all our discoveries
import warnings
warnings.filterwarnings("ignore")
print("🎯 RESTAURANT TIPS: FINAL INSIGHTS SUMMARY")
print("=" * 60)
print()

# Correlation insights
correlation = np.corrcoef(total_bill, tip)[0, 1]
print(f"💰 SPENDING vs TIPPING:")
print(f"   Correlation: {correlation:.3f}")
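if correlation > 0.7:
    print("   💡 Strong relationship: Higher bills = Higher tips!")
elif correlation > 0.3:
    print("   💡 Moderate relationship: Bills somewhat predict tips")
else:
    print("   💡 Weak or no positive relationship: Bills don't reliably predict tips")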

print()

# Gender insights
male_tips = [tip[i] for i in range(len(gender)) if gender[i] == 'Male']
female_tips = [tip[i] for i in range(len(gender)) if gender[i] == 'Female']
male_avg = np.mean(male_tips)
female_avg = np.mean(female_tips)

print(f"👥 GENDER DIFFERENCES:")
print(f"   👨 Men average: ${male_avg:.2f}")
print(f"   👩 Women average: ${female_avg:.2f}")
print(f"   💡 {'Men' if male_avg > female_avg else 'Women'} tip ${abs(male_avg - female_avg):.2f} more on average")

print()

# Party size insights
unique_sizes = sorted(list(set(party_size)))
size_averages = []
for size in unique_sizes:
    size_tips = [tip[i] for i in range(len(party_size)) if party_size[i] == size]
    size_averages.append(np.mean(size_tips))

best_size = unique_sizes[size_averages.index(max(size_averages))]
best_avg = max(size_averages)

print(f"👥 PARTY SIZE INSIGHTS:")
print(f"   🏆 Best tipping group size: {best_size} people")
print(f"   💰 They tip an average of: ${best_avg:.2f}")

print()

# Overall statistics
tip_percentage = (tip / total_bill) * 100
print(f"📊 OVERALL STATISTICS:")
print(f"   🎯 Average tip: ${np.mean(tip):.2f} ({np.mean(tip_percentage):.1f}% of bill)")
print(f"   📈 Most generous tip: ${np.max(tip):.2f} ({np.max(tip_percentage):.1f}% of bill)")
print(f"   📉 Smallest tip: ${np.min(tip):.2f} ({np.min(tip_percentage):.1f}% of bill)")
print(f"   💎 Top 10% of tippers give: ${np.percentile(tip, 90):.2f}+ ({np.percentile(tip_percentage, 90):.1f}%+ of bill)")

print()
print("🎉 CONGRATULATIONS! 🎉")
print("You've successfully completed your first data analysis project!")
print("You're now a data detective! 🕵️‍♂️🔍")

# Create a final summary visualization
plt.figure(figsize=(12, 8))

# Create a comprehensive dashboard
plt.subplot(2, 2, 1)
plt.scatter(total_bill, tip, color='blue', alpha=0.6)
plt.title('💰 Bills vs Tips')
plt.xlabel('Bill ($)')
plt.ylabel('Tip ($)')

plt.subplot(2, 2, 2)
plt.hist(tip_percentage, bins=6, color='green', alpha=0.7)
plt.title('📊 Tip Percentage Distribution')
plt.xlabel('Tip %')
plt.ylabel('Count')

plt.subplot(2, 2, 3)
plt.bar(['👨 Male', '👩 Female'], [male_avg, female_avg], color=['lightblue', 'pink'])
plt.title('👥 Average Tips by Gender')
plt.ylabel('Tip ($)')

plt.subplot(2, 2, 4)
plt.bar([f'{size}p' for size in unique_sizes], size_averages, color=['cyan', 'yellow', 'orange'])
plt.title('👥 Tips by Party Size')
plt.ylabel('Tip ($)')

plt.suptitle('🍽️ Restaurant Tips Analysis Dashboard', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()