# Example 1: Quick Start Data Analysis

---

**Author:** Brandon Deloatch
**Affiliation:** Quipu Research Labs, LLC
**Date:** 2025-10-02
**Version:** v1.0
**License:** MIT
**Example Type:** Getting Started Tutorial
**Based On:** Tier1_Descriptive.ipynb
**Estimated Time:** 15 minutes

---

> **Citation:**
> Brandon Deloatch, "Example 1: Quick Start Data Analysis," Quipu Research Labs, LLC, v1.0, 2025-10-02.

---

*This example notebook is provided "as-is" for educational and research purposes. Users assume full responsibility for any results or applications derived from it.*

---

## Quick Start Guide to Coffee Sales Analytics

**Learning Objectives:**
- Master basic data loading and preprocessing
- Calculate comprehensive descriptive statistics
- Create professional business visualizations
- Perform statistical tests for business insights
- Generate actionable recommendations

**Cross-References:**
- **Foundation:** `Tier1_Descriptive.ipynb` (statistical foundations)
- **Next Steps:** `machine_learning_example.ipynb` (predictive modeling)
- **Advanced:** `time_series_example.ipynb` (temporal analysis)

**Key Applications:**
- Retail analytics and sales optimization
- Business intelligence dashboards
- Data quality assessment
- Exploratory data analysis workflows

## 1. Import Required Libraries

Start by importing the essential libraries for data analysis:

In [11]:
"""
Example 1: Quick Start Data Analysis.

This module demonstrates basic data analysis techniques using real coffee sales data.
Covers descriptive statistics, visualization, and statistical testing.

Author: Brandon Deloatch
Date: 2025-10-02
"""

# Example 1: Quick Start Data Analysis
# ====================================
# Professional coffee sales analytics with real business datasets

import warnings
from scipy.stats import f_oneway, ttest_ind

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots

warnings.filterwarnings('ignore')

# Set style for better visualizations
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Example 1: Quick Start Data Analysis")
print("=" * 50)
print("CROSS-REFERENCES:")
print("• Prerequisites: None (Entry point for examples)")
print("• Next Steps: machine_learning_example.ipynb (predictive modeling)")
print("• Next Steps: time_series_example.ipynb (temporal analysis)")
print("• Foundation: Tier1_Descriptive.ipynb (statistical theory)")
print("• Full Guide: See main notebooks directory for complete analytics suite")
print(" Libraries loaded successfully - Ready for coffee sales analysis!")

Example 1: Quick Start Data Analysis
CROSS-REFERENCES:
• Prerequisites: None (Entry point for examples)
• Next Steps: machine_learning_example.ipynb (predictive modeling)
• Next Steps: time_series_example.ipynb (temporal analysis)
• Foundation: Tier1_Descriptive.ipynb (statistical theory)
• Full Guide: See main notebooks directory for complete analytics suite
 Libraries loaded successfully - Ready for coffee sales analysis!


## 2. Load Coffee Sales Dataset

Load and explore our real Coffee Sales dataset:

In [12]:
# Load the Coffee Sales dataset (tab-separated)
df = pd.read_csv('../data/Coffee_sales.csv', sep='\t')

# Basic data preprocessing
df['Date'] = pd.to_datetime(df['Date'])
df['money'] = pd.to_numeric(df['money'], errors='coerce')
df = df.dropna(subset=['money'])

# Add some useful calculated fields
df['revenue'] = df['money']
df['day_name'] = df['Date'].dt.day_name()
df['month_name'] = df['Month_name']
df['is_weekend'] = df['Weekday'].isin(['Sat', 'Sun'])

print("Coffee Sales Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"Date range: {df['Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total revenue: ${df['revenue'].sum():,.2f}")
print(f"Average transaction: ${df['revenue'].mean():.2f}")
print(f"Coffee types: {df['coffee_name'].nunique()}")
print(f"Payment methods: {', '.join(df['cash_type'].unique())}")

df.head()

Coffee Sales Dataset loaded successfully!
Shape: (3547, 15)
Date range: 2024-03-01 to 2025-03-23
Total revenue: $112,245.58
Average transaction: $31.65
Coffee types: 8
Payment methods: card


| | hour_of_day | cash_type | money | coffee_name | Time_of_Day | Weekday | Month_name | Weekdaysort | Monthsort | Date | Time | revenue | day_name | month_name | is_weekend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | card | 38.7 | Latte | Morning | Fri | Mar | 5 | 3 | 2024-03-01 | 10:15:50.520000 | 38.7 | Friday | Mar | False |
| 1 | 12 | card | 38.7 | Hot Chocolate | Afternoon | Fri | Mar | 5 | 3 | 2024-03-01 | 12:19:22.539000 | 38.7 | Friday | Mar | False |
| 2 | 12 | card | 38.7 | Hot Chocolate | Afternoon | Fri | Mar | 5 | 3 | 2024-03-01 | 12:20:18.089000 | 38.7 | Friday | Mar | False |
| 3 | 13 | card | 28.9 | Americano | Afternoon | Fri | Mar | 5 | 3 | 2024-03-01 | 13:46:33.006000 | 28.9 | Friday | Mar | False |
| 4 | 13 | card | 38.7 | Latte | Afternoon | Fri | Mar | 5 | 3 | 2024-03-01 | 13:48:14.626000 | 38.7 | Friday | Mar | False |
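
Because `errors='coerce'` silently turns unparseable values into `NaN`, and `dropna` then removes those rows, it can be worth counting how many rows are affected before discarding them. The sketch below shows one way to do that on a small made-up frame; the values in `raw` are illustrative assumptions, not rows from the Coffee Sales file.

```python
import pandas as pd

# Hypothetical toy data standing in for a raw 'money' column
raw = pd.DataFrame({'money': ['38.7', 'n/a', '28.9', 'free']})

# Coerce to numeric: anything that cannot be parsed becomes NaN
parsed = pd.to_numeric(raw['money'], errors='coerce')

# Rows that held a value but failed to parse
bad_mask = parsed.isna() & raw['money'].notna()
n_bad = int(bad_mask.sum())
print(f"Rows that would be dropped: {n_bad} of {len(raw)}")
print("Offending values:", raw.loc[bad_mask, 'money'].tolist())

# Same cleaning steps as the cell above, applied to the toy frame
clean = raw.assign(money=parsed).dropna(subset=['money'])
```

Printing the offending values before dropping them makes it easier to tell a formatting quirk (for example a currency symbol) from genuinely missing data.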


## 3. Basic Data Exploration

Let's explore the dataset structure and basic properties:

In [13]:
# Dataset overview
print("=== COFFEE SALES DATASET OVERVIEW ===")
print(f"Shape: {df.shape}")
print("\nColumn types:")
print(df.dtypes)

print("\nMissing values:")
print(df.isnull().sum())

print("\nBasic statistics for revenue:")
print(df['revenue'].describe())

=== COFFEE SALES DATASET OVERVIEW ===
Shape: (3547, 15)

Column types:
hour_of_day             int64
cash_type              object
money                 float64
coffee_name            object
Time_of_Day            object
Weekday                object
Month_name             object
Weekdaysort             int64
Monthsort               int64
Date           datetime64[ns]
Time                   object
revenue               float64
day_name               object
month_name             object
is_weekend               bool
dtype: object

Missing values:
hour_of_day    0
cash_type      0
money          0
coffee_name    0
Time_of_Day    0
Weekday        0
Month_name     0
Weekdaysort    0
Monthsort      0
Date           0
Time           0
revenue        0
day_name       0
month_name     0
is_weekend     0
dtype: int64

Basic statistics for revenue:
count    3547.000000
mean       31.645216
std         4.877754
min        18.120000
25%        27.920000
50%        32.820000
75%        35.760000
max        38.700000
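
As a quick data quality check, the quartiles reported above can be plugged into Tukey's 1.5 × IQR rule to see whether any transaction amounts would count as outliers. This is a minimal sketch rather than part of the original notebook. The quartiles and minimum are copied from the `describe()` output, and the maximum is taken from the Section 4 statistics.

```python
# Values copied from the notebook's printed statistics
q1, q3 = 27.92, 35.76                       # 25% and 75% quantiles of revenue
observed_min, observed_max = 18.12, 38.70   # minimum and maximum of revenue

# Tukey's rule: points beyond 1.5 * IQR from the quartiles are flagged
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

has_low_outliers = observed_min < lower_fence
has_high_outliers = observed_max > upper_fence

print(f"IQR: {iqr:.2f}")
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"Low outliers present: {has_low_outliers}")
print(f"High outliers present: {has_high_outliers}")
```

Both extremes fall inside the fences, so this rule flags no transaction as an outlier.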

## 4. Descriptive Statistics

Calculate comprehensive descriptive statistics:

In [14]:
# Calculate comprehensive descriptive statistics
print("COFFEE SALES DESCRIPTIVE STATISTICS:")
print("=" * 50)

# Revenue statistics
print("\nREVENUE ANALYSIS:")
print(f"• Total Revenue: ${df['revenue'].sum():,.2f}")
print(f"• Average Transaction: ${df['revenue'].mean():.2f}")
print(f"• Median Transaction: ${df['revenue'].median():.2f}")
print(f"• Standard Deviation: ${df['revenue'].std():.2f}")
print(f"• Minimum: ${df['revenue'].min():.2f}")
print(f"• Maximum: ${df['revenue'].max():.2f}")

# Payment method check
print("\nPAYMENT METHODS:")
payment_counts = df['cash_type'].value_counts()
for method, count in payment_counts.items():
    percentage = (count / len(df)) * 100
    avg_amount = df[df['cash_type'] == method]['revenue'].mean()
    print(f"• {method}: {count} transactions ({percentage:.1f}%) - Avg: ${avg_amount:.2f}")

# Coffee preferences
print("\nCOFFEE PREFERENCES:")
coffee_analysis = df['coffee_name'].value_counts()
for coffee, count in coffee_analysis.head().items():
    percentage = (count / len(df)) * 100
    avg_price = df[df['coffee_name'] == coffee]['revenue'].mean()
    print(f"• {coffee}: {count} orders ({percentage:.1f}%) - Avg: ${avg_price:.2f}")

print("\nTIME PATTERNS:")
time_analysis = df['Time_of_Day'].value_counts()
for time_period, count in time_analysis.items():
    percentage = (count / len(df)) * 100
    avg_revenue = df[df['Time_of_Day'] == time_period]['revenue'].mean()
    print(f"• {time_period}: {count} transactions ({percentage:.1f}%) - Avg: ${avg_revenue:.2f}")

COFFEE SALES DESCRIPTIVE STATISTICS:

REVENUE ANALYSIS:
• Total Revenue: $112,245.58
• Average Transaction: $31.65
• Median Transaction: $32.82
• Standard Deviation: $4.88
• Minimum: $18.12
• Maximum: $38.70

PAYMENT METHODS:
• card: 3547 transactions (100.0%) - Avg: $31.65

COFFEE PREFERENCES:
• Americano with Milk: 809 orders (22.8%) - Avg: $30.59
• Latte: 757 orders (21.3%) - Avg: $35.50
• Americano: 564 orders (15.9%) - Avg: $25.98
• Cappuccino: 486 orders (13.7%) - Avg: $35.88
• Cortado: 287 orders (8.1%) - Avg: $25.73

TIME PATTERNS:
• Afternoon: 1205 transactions (34.0%) - Avg: $31.64
• Morning: 1181 transactions (33.3%) - Avg: $30.42
• Night: 1161 transactions (32.7%) - Avg: $32.89


## 5. Interactive Visualizations

Create professional interactive charts:

In [15]:
# Distribution plots for coffee sales data
fig = go.Figure()
fig.add_trace(go.Histogram(
    x=df['revenue'], 
    nbinsx=30,
    name='Revenue Distribution',
    marker_color='lightblue'
))
fig.update_layout(
    title='Transaction Revenue Distribution',
    xaxis_title='Transaction Amount ($)',
    yaxis_title='Number of Transactions',
    showlegend=False
)
fig.show()

# Coffee type popularity
coffee_counts = df['coffee_name'].value_counts().head(10)
fig2 = go.Figure()
fig2.add_trace(go.Bar(
    x=coffee_counts.index,
    y=coffee_counts.values,
    marker_color='brown'
))
fig2.update_layout(
    title='Top 10 Most Popular Coffee Types',
    xaxis_title='Coffee Type',
    yaxis_title='Number of Sales'
)
fig2.show()

In [16]:
# Time of day analysis (more meaningful than payment method since all are cards)
time_analysis = df.groupby('Time_of_Day')['revenue'].agg(['count', 'mean', 'sum'])
fig3 = go.Figure()
fig3.add_trace(go.Bar(
    x=time_analysis.index,
    y=time_analysis['mean'],
    marker_color='green'
))
fig3.update_layout(
    title='Average Transaction Value by Time of Day',
    xaxis_title='Time of Day',
    yaxis_title='Average Revenue ($)'
)
fig3.show()

# Daily sales patterns
daily_sales = df.groupby('day_name')['revenue'].sum()
# Reorder days properly
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
daily_sales = daily_sales.reindex([day for day in day_order if day in daily_sales.index])

fig4 = go.Figure()
fig4.add_trace(go.Bar(
    x=daily_sales.index,
    y=daily_sales.values,
    marker_color='orange'
))
fig4.update_layout(
    title='Total Sales by Day of Week',
    xaxis_title='Day of Week',
    yaxis_title='Total Revenue ($)'
)
fig4.show()
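
The weekday chart above restores calendar order by reindexing after the `groupby`. An alternative, sketched here on made-up data, is to store the day names as an ordered categorical so that every later `groupby` sorts them in calendar order automatically. The `toy` frame is a hypothetical stand-in for `df`.

```python
import pandas as pd

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Hypothetical toy data; only the column names mirror the notebook
toy = pd.DataFrame({
    'day_name': ['Sunday', 'Monday', 'Friday', 'Monday'],
    'revenue': [10.0, 20.0, 30.0, 5.0],
})

# An ordered categorical makes groupby sort by calendar order, not alphabetically
toy['day_name'] = pd.Categorical(toy['day_name'], categories=day_order, ordered=True)

# observed=True keeps only the days that actually occur in the data
daily = toy.groupby('day_name', observed=True)['revenue'].sum()
print(daily)
```

With this approach the ordering is declared once on the column, so plots and tables built later stay consistent without repeating the reindex step.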

## 6. Combined Dashboard

Combine the key charts into a single dashboard figure:

In [17]:
# Create comprehensive visualizations
fig = make_subplots(
 rows=2, cols=2,
 subplot_titles=['Revenue Distribution', 'Sales by Coffee Type',
 'Sales by Time of Day', 'Daily Revenue Trend'],
 specs=[[{"secondary_y": False}, {"secondary_y": False}],
 [{"secondary_y": False}, {"secondary_y": False}]]
)

# 1. Revenue distribution histogram
fig.add_trace(go.Histogram(
 x=df['revenue'],
 name='Revenue Distribution',
 nbinsx=30,
 marker_color='lightblue'
), row=1, col=1)

# 2. Sales by coffee type
coffee_counts = df['coffee_name'].value_counts().head(10)
fig.add_trace(go.Bar(
 x=coffee_counts.index,
 y=coffee_counts.values,
 name='Coffee Sales',
 marker_color='brown'
), row=1, col=2)

# 3. Sales by time of day
time_counts = df['Time_of_Day'].value_counts()
fig.add_trace(go.Bar(
 x=time_counts.index,
 y=time_counts.values,
 name='Time Analysis',
 marker_color='orange'
), row=2, col=1)

# 4. Daily revenue trend
daily_revenue = df.groupby('Date')['revenue'].sum().reset_index()
fig.add_trace(go.Scatter(
 x=daily_revenue['Date'],
 y=daily_revenue['revenue'],
 mode='lines',
 name='Daily Revenue',
 line={'color': 'green', 'width': 2}
), row=2, col=2)

fig.update_layout(
 height=800,
 title_text="Coffee Sales Analysis Dashboard",
 showlegend=False
)

fig.show()

## 7. Statistical Analysis and Key Insights

Run basic statistical tests and summarize the findings:

In [18]:
# Perform statistical tests and generate insights
print("STATISTICAL ANALYSIS & INSIGHTS:")
print("=" * 50)

# 1. Revenue comparison by time of day
morning_revenue = df[df['Time_of_Day'] == 'Morning']['revenue']
afternoon_revenue = df[df['Time_of_Day'] == 'Afternoon']['revenue']
night_revenue = df[df['Time_of_Day'] == 'Night']['revenue']

# Statistical test - ANOVA for time periods
stat, p_value = f_oneway(morning_revenue, afternoon_revenue, night_revenue)
print("\n1. REVENUE BY TIME OF DAY (ANOVA Test):")
print(f"• F-statistic: {stat:.4f}")
print(f"• P-value: {p_value:.6f}")
if p_value < 0.05:
    print("• Result: Significant difference in revenue by time of day!")
else:
    print("• Result: No significant difference in revenue by time of day")

# 2. Weekend vs Weekday analysis (since all payments are cards)
weekend_revenue = df[df['is_weekend']]['revenue']
weekday_revenue = df[~df['is_weekend']]['revenue']

if len(weekend_revenue) > 0:
    stat, p_value = ttest_ind(weekend_revenue, weekday_revenue)
    print("\n2. WEEKEND vs WEEKDAY COMPARISON (T-Test):")
    print(f"• T-statistic: {stat:.4f}")
    print(f"• P-value: {p_value:.6f}")
    print(f"• Weekend avg: ${weekend_revenue.mean():.2f}")
    print(f"• Weekday avg: ${weekday_revenue.mean():.2f}")
else:
    print("\n2. WEEKEND vs WEEKDAY COMPARISON:")
    print("• No weekend data available in current dataset")

# 3. Coffee type profitability
print("\n3. TOP PERFORMING COFFEE TYPES:")
coffee_stats = df.groupby('coffee_name').agg({
    'revenue': ['count', 'mean', 'sum']
}).round(2)
coffee_stats.columns = ['Orders', 'Avg_Price', 'Total_Revenue']
coffee_stats = coffee_stats.sort_values('Total_Revenue', ascending=False)

for coffee in coffee_stats.head().index:
    orders = coffee_stats.loc[coffee, 'Orders']
    avg_price = coffee_stats.loc[coffee, 'Avg_Price']
    total = coffee_stats.loc[coffee, 'Total_Revenue']
    print(f"• {coffee}: {orders} orders, ${avg_price} avg, ${total:,.0f} total")

# 4. Time-based insights
print("\n4. TIME-BASED PATTERNS:")
time_stats = df.groupby('Time_of_Day')['revenue'].agg(['count', 'mean']).round(2)
for time_period in time_stats.index:
    count = time_stats.loc[time_period, 'count']
    avg = time_stats.loc[time_period, 'mean']
    print(f"• {time_period}: {count} transactions, ${avg} average")

# 5. Business insights
print("\n5. KEY BUSINESS INSIGHTS:")
total_revenue = df['revenue'].sum()
total_transactions = len(df)
avg_transaction = df['revenue'].mean()
best_day = df.groupby('day_name')['revenue'].sum().idxmax()
best_coffee = df.groupby('coffee_name')['revenue'].sum().idxmax()
best_time = df.groupby('Time_of_Day')['revenue'].mean().idxmax()

print(f"• Total Revenue: ${total_revenue:,.2f}")
print(f"• Total Transactions: {total_transactions:,}")
print(f"• Average Transaction: ${avg_transaction:.2f}")
print(f"• Best Day: {best_day}")
print(f"• Top Coffee: {best_coffee}")
print(f"• Best Time Period: {best_time}")
print("• Business Strategy: Focus on premium coffee types during peak periods")
print(f"• Product Strategy: Promote {best_coffee} during {best_time.lower()} hours")

STATISTICAL ANALYSIS & INSIGHTS:

1. REVENUE BY TIME OF DAY (ANOVA Test):
• F-statistic: 78.2178
• P-value: 0.000000
• Result: Significant difference in revenue by time of day!

2. WEEKEND vs WEEKDAY COMPARISON (T-Test):
• T-statistic: -0.5005
• P-value: 0.616764
• Weekend avg: $31.57
• Weekday avg: $31.67

3. TOP PERFORMING COFFEE TYPES:
• Latte: 757 orders, $35.5 avg, $26,875 total
• Americano with Milk: 809 orders, $30.59 avg, $24,751 total
• Cappuccino: 486 orders, $35.88 avg, $17,439 total
• Americano: 564 orders, $25.98 avg, $14,650 total
• Hot Chocolate: 276 orders, $35.99 avg, $9,933 total

4. TIME-BASED PATTERNS:
• Afternoon: 1205 transactions, $31.64 average
• Morning: 1181 transactions, $30.42 average
• Night: 1161 transactions, $32.89 average

5. KEY BUSINESS INSIGHTS:
• Total Revenue: $112,245.58
• Total Transactions: 3,547
• Average Transaction: $31.65
• Best Day: Tuesday
• Top Coffee: Latte
• Best Time Period: Night
• Business Strategy: Focus on premium coffee types during peak periods
• Product Strategy: Promote Latte during night hours
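
With more than 3,500 transactions, even a small difference between groups can produce a very small p-value, so it helps to report an effect size next to the ANOVA result. The sketch below recovers eta squared (the share of revenue variance explained by time of day) from the F-statistic printed above. It is an addition to the original analysis, and it assumes all 3,547 rows fall into exactly the three time-of-day groups, which matches the transaction counts shown.

```python
def eta_squared_from_f(f_stat, df_between, df_within):
    """Eta squared (SS_between / SS_total) recovered from a one-way ANOVA F.

    Follows from F = (SS_between / df_between) / (SS_within / df_within).
    """
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# Values copied from the notebook output above
f_stat = 78.2178   # F-statistic from the time-of-day ANOVA
n_groups = 3       # Morning, Afternoon, Night
n_obs = 3547       # total transactions

eta_sq = eta_squared_from_f(f_stat, n_groups - 1, n_obs - n_groups)
print(f"Eta squared: {eta_sq:.3f}")
```

The result is roughly 0.04, meaning time of day accounts for about 4% of the variance in transaction value. The difference is statistically significant, but it is small in practical terms.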

---

## Summary and Next Steps

### **What You've Accomplished:**
- **Data Loading**: Successfully loaded and preprocessed real coffee sales data
- **Descriptive Analytics**: Calculated comprehensive statistics for business insights
- **Statistical Testing**: Ran a one-way ANOVA and a two-sample t-test to check whether observed differences are statistically significant
- **Visualization**: Created professional dashboards for business reporting
- **Business Intelligence**: Generated actionable recommendations from data

### **Key Coffee Sales Insights Discovered:**
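1. **Revenue Patterns**: Transactions average $31.65 (median $32.82); night transactions average the most ($32.89) and morning transactions the least ($30.42)
2. **Product Performance**: Latte generates the most revenue ($26,875 from 757 orders), while Americano with Milk has the most orders (809)
3. **Customer Behavior**: All 3,547 transactions were paid by card, and volume is split almost evenly across morning, afternoon, and night
4. **Statistical Significance**: Average revenue differs by time of day (ANOVA, F = 78.22, p < 0.001), but weekend and weekday averages do not differ significantly (t-test, p = 0.62)
5. **Business Strategy**: Tuesday is the highest-revenue day, and the suggested action is to promote Latte during night hours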

### **Next Learning Steps:**

#### **Continue Your Analytics Journey:**
- **Machine Learning**: `machine_learning_example.ipynb` - Predict customer churn with Spotify data
- **Time Series**: `time_series_example.ipynb` - Forecast sales trends and seasonality
- **Advanced Stats**: `notebooks/tier1_descriptive/Tier1_Distribution.ipynb` - Deep dive into distributions

#### **Tier Progression Path:**
- **Tier 1**: Master foundational descriptive analytics and visualization
- **Tier 2**: Learn supervised/unsupervised machine learning techniques
- **Tier 3**: Advance to time series analysis and forecasting methods
- **Tier 4**: Explore clustering and dimensionality reduction
- **Tier 5**: Apply ensemble methods and advanced classification
- **Tier 6**: Implement anomaly detection and outlier analysis

### 🏢 **Business Applications:**
- **Retail Analytics**: Apply these techniques to your sales data
- **Performance Monitoring**: Create dashboards for business KPIs
- **Data-Driven Decisions**: Use statistical tests to validate business strategies
- **Reporting Automation**: Implement these analyses in production systems

### 🔗 **Professional Development:**
- **Portfolio Project**: Use this example as a foundation for your data science portfolio
- **Real-World Skills**: These techniques directly apply to business intelligence roles
- **Best Practices**: Professional formatting and documentation standards demonstrated

---

> **Next Recommendation**: Try `machine_learning_example.ipynb` to learn predictive modeling with Random Forest!

---

*Thank you for completing the Quick Start Data Analysis example. Continue building your analytics expertise with the comprehensive Quipu Analytics Suite.*