# Example 1: Quick Start Data Analysis

---

**Author:** Brandon Deloatch
**Affiliation:** Quipu Research Labs, LLC
**Date:** 2025-10-02
**Version:** v1.0
**License:** MIT
**Example Type:** Getting Started Tutorial
**Based On:** Tier1_Descriptive.ipynb
**Estimated Time:** 15 minutes

---

> **Citation:**
> Brandon Deloatch, "Example 1: Quick Start Data Analysis," Quipu Research Labs, LLC, v1.0, 2025-10-02.

---

*This example notebook is provided "as-is" for educational and research purposes. Users assume full responsibility for any results or applications derived from it.*

---

## Quick Start Guide to Coffee Sales Analytics

**Learning Objectives:**
- Master basic data loading and preprocessing
- Calculate comprehensive descriptive statistics
- Create professional business visualizations
- Perform statistical tests for business insights
- Generate actionable recommendations

**Cross-References:**
- **Foundation:** `Tier1_Descriptive.ipynb` (statistical foundations)
- **Next Steps:** `machine_learning_example.ipynb` (predictive modeling)
- **Advanced:** `time_series_example.ipynb` (temporal analysis)

**Key Applications:**
- Retail analytics and sales optimization
- Business intelligence dashboards
- Data quality assessment
- Exploratory data analysis workflows

## 1. Import Required Libraries

Start by importing the essential libraries for data analysis:

In [None]:
"""
Example 1: Quick Start Data Analysis.

This module demonstrates basic data analysis techniques using real coffee sales data.
Covers descriptive statistics, visualization, and statistical testing.

Author: Brandon Deloatch
Date: 2025-10-02
"""

# Example 1: Quick Start Data Analysis
# ====================================
# Professional coffee sales analytics with real business datasets

import warnings
from scipy.stats import f_oneway, ttest_ind

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots

warnings.filterwarnings('ignore')

# Set style for better visualizations
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Example 1: Quick Start Data Analysis")
print("=" * 50)
print("CROSS-REFERENCES:")
print("• Prerequisites: None (Entry point for examples)")
print("• Next Steps: machine_learning_example.ipynb (predictive modeling)")
print("• Next Steps: time_series_example.ipynb (temporal analysis)")
print("• Foundation: Tier1_Descriptive.ipynb (statistical theory)")
print("• Full Guide: See main notebooks directory for complete analytics suite")
print(" Libraries loaded successfully - Ready for coffee sales analysis!")

## 2. Load Coffee Sales Dataset

Load and explore our real Coffee Sales dataset:

In [None]:
# Load the Coffee Sales dataset
df = pd.read_csv('../data/Coffee_sales.csv')

# Basic data preprocessing
df['Date'] = pd.to_datetime(df['Date'])
df['money'] = pd.to_numeric(df['money'], errors='coerce')
df = df.dropna(subset=['money'])

# Add some useful calculated fields
df['revenue'] = df['money']
df['day_name'] = df['Date'].dt.day_name()
df['month_name'] = df['Month_name']
df['is_weekend'] = df['Weekday'].isin(['Sat', 'Sun'])

print("Coffee Sales Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"Date range: {df['Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total revenue: ${df['revenue'].sum():,.2f}")
print(f"Average transaction: ${df['revenue'].mean():.2f}")
print(f"Coffee types: {df['coffee_name'].nunique()}")
print(f"Payment methods: {', '.join(df['cash_type'].unique())}")

df.head()
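
The preprocessing steps above can be tried without the CSV file. The following is a minimal sketch, assuming a tiny made-up frame with the same `Date`, `money`, and `coffee_name` columns; the helper name `preprocess_sales` and the sample values are hypothetical. It also derives the weekend flag from the parsed date rather than from the `Weekday` column, which is one possible alternative when the weekday labels are not known in advance.

```python
import pandas as pd

# Hypothetical miniature stand-in for Coffee_sales.csv (values are made up).
raw = pd.DataFrame({
    "Date": ["2024-03-01", "2024-03-02", "2024-03-03", "2024-03-04"],
    "money": ["38.70", "n/a", "28.90", "33.80"],
    "coffee_name": ["Latte", "Latte", "Americano", "Cappuccino"],
})


def preprocess_sales(frame: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, coerce amounts to numbers, and drop unparseable rows."""
    out = frame.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # errors="coerce" turns values such as "n/a" into NaN instead of raising.
    out["money"] = pd.to_numeric(out["money"], errors="coerce")
    out = out.dropna(subset=["money"])
    out["revenue"] = out["money"]
    # dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
    out["is_weekend"] = out["Date"].dt.dayofweek >= 5
    return out


clean = preprocess_sales(raw)
print(clean[["Date", "revenue", "is_weekend"]])
```

The row with the unparseable amount is dropped, so three of the four sample rows survive.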

## 3. Basic Data Exploration

Let's explore the dataset structure and basic properties:

In [None]:
# Dataset overview
print("=== COFFEE SALES DATASET OVERVIEW ===")
print(f"Shape: {df.shape}")
print("\nColumn types:")
print(df.dtypes)

print("\nMissing values:")
print(df.isnull().sum())

print("\nBasic statistics for revenue:")
print(df['revenue'].describe())
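
Raw missing-value counts can be hard to judge on their own. As a small sketch, assuming a made-up sample frame (the values below are hypothetical and not taken from the dataset), the same check can be expressed as a percentage per column, alongside the number of distinct values in the text columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps (made-up values, not the real dataset).
sample = pd.DataFrame({
    "money": [38.7, np.nan, 28.9, 33.8],
    "cash_type": ["card", "card", None, "cash"],
    "coffee_name": ["Latte", "Latte", "Americano", "Cappuccino"],
})

# isnull().mean() gives the fraction of missing entries per column.
missing_pct = sample.isnull().mean().mul(100).round(1)
print(missing_pct[missing_pct > 0].sort_values(ascending=False))

# Distinct-value counts help spot ID-like or free-text columns.
# nunique() ignores missing entries by default.
unique_counts = sample[["cash_type", "coffee_name"]].nunique()
print(unique_counts)
```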

## 4. Descriptive Statistics

Calculate comprehensive descriptive statistics:

In [None]:
# Calculate comprehensive descriptive statistics
print("COFFEE SALES DESCRIPTIVE STATISTICS:")
print("=" * 50)

# Revenue statistics
print("\nREVENUE ANALYSIS:")
print(f" Total Revenue: ${df['revenue'].sum():,.2f}")
print(f" Average Transaction: ${df['revenue'].mean():.2f}")
print(f" Median Transaction: ${df['revenue'].median():.2f}")
print(f" Standard Deviation: ${df['revenue'].std():.2f}")
print(f" Minimum: ${df['revenue'].min():.2f}")
print(f" Maximum: ${df['revenue'].max():.2f}")

# Coffee preference analysis
print("\nCOFFEE PREFERENCES:")
coffee_analysis = df['coffee_name'].value_counts()
for coffee, count in coffee_analysis.head().items():
 percentage = (count / len(df)) * 100
 avg_price = df[df['coffee_name'] == coffee]['revenue'].mean()
 print(f" {coffee}: {count} orders ({percentage:.1f}%) - Avg: ${avg_price:.2f}")

print("\nTIME PATTERNS:")
time_analysis = df['Time_of_Day'].value_counts()
for time_period, count in time_analysis.items():
 percentage = (count / len(df)) * 100
 avg_revenue = df[df['Time_of_Day'] == time_period]['revenue'].mean()
 print(f" {time_period}: {count} transactions ({percentage:.1f}%) - Avg: ${avg_revenue:.2f}")
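
The per-category loops above filter the frame once per category. One possible alternative is a single `groupby` with named aggregations, sketched here on a made-up sample (the values are hypothetical); the column names `orders`, `avg_price`, and `share_pct` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample rows (made-up values, not the real dataset).
sample = pd.DataFrame({
    "coffee_name": ["Latte", "Latte", "Americano", "Latte", "Americano"],
    "revenue": [30.0, 40.0, 20.0, 35.0, 30.0],
})

# One pass over the data: order count and average price per coffee type.
summary = (
    sample.groupby("coffee_name")["revenue"]
    .agg(orders="count", avg_price="mean")
    .sort_values("orders", ascending=False)
)
# Share of all orders, in percent.
summary["share_pct"] = summary["orders"] / len(sample) * 100
print(summary.round(1))
```

The same pattern should carry over to `Time_of_Day` by changing the grouping column.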

## 5. Interactive Visualizations

Create professional interactive charts:

In [None]:
# Plotly Express provides the px.* chart helpers used in this section
import plotly.express as px

# Distribution of transaction revenue
fig = px.histogram(df, x='revenue', nbins=30,
                   title='Transaction Revenue Distribution',
                   labels={'revenue': 'Transaction Revenue ($)'})
fig.update_layout(showlegend=False)
fig.show()

# Scatter plot of individual transactions, color-coded by coffee type
fig2 = px.scatter(df, x='Date', y='revenue', color='coffee_name',
                  hover_data=['cash_type', 'Time_of_Day'],
                  title='Transaction Revenue Over Time by Coffee Type')
fig2.show()

In [None]:
# Box plots comparing revenue across coffee types
fig3 = px.box(df, x='coffee_name', y='revenue', color='coffee_name',
              title='Revenue Distribution by Coffee Type')
fig3.show()

# Correlation heatmap over the numeric columns
numerical_cols = df.select_dtypes(include='number').columns
correlation_matrix = df[numerical_cols].corr()
fig4 = px.imshow(correlation_matrix,
                 title='Correlation Matrix of Numerical Variables',
                 color_continuous_scale='RdBu_r')
fig4.show()

## 6. Dashboard Visualization

Combine the key views into a single four-panel dashboard:

In [None]:
# Create comprehensive visualizations
fig = make_subplots(
 rows=2, cols=2,
 subplot_titles=['Revenue Distribution', 'Sales by Coffee Type',
 'Sales by Time of Day', 'Daily Revenue Trend'],
 specs=[[{"secondary_y": False}, {"secondary_y": False}],
 [{"secondary_y": False}, {"secondary_y": False}]]
)

# 1. Revenue distribution histogram
fig.add_trace(go.Histogram(
 x=df['revenue'],
 name='Revenue Distribution',
 nbinsx=30,
 marker_color='lightblue'
), row=1, col=1)

# 2. Sales by coffee type
coffee_counts = df['coffee_name'].value_counts().head(10)
fig.add_trace(go.Bar(
 x=coffee_counts.index,
 y=coffee_counts.values,
 name='Coffee Sales',
 marker_color='brown'
), row=1, col=2)

# 3. Sales by time of day
time_counts = df['Time_of_Day'].value_counts()
fig.add_trace(go.Bar(
 x=time_counts.index,
 y=time_counts.values,
 name='Time Analysis',
 marker_color='orange'
), row=2, col=1)

# 4. Daily revenue trend
daily_revenue = df.groupby('Date')['revenue'].sum().reset_index()
fig.add_trace(go.Scatter(
 x=daily_revenue['Date'],
 y=daily_revenue['revenue'],
 mode='lines',
 name='Daily Revenue',
 line={'color': 'green', 'width': 2}
), row=2, col=2)

fig.update_layout(
 height=800,
 title_text="Coffee Sales Analysis Dashboard",
 showlegend=False
)

fig.show()

## 7. Statistical Analysis and Key Insights

Run basic statistical tests and summarize the findings:

In [None]:
# Perform statistical tests and generate insights
print("STATISTICAL ANALYSIS & INSIGHTS:")
print("=" * 50)

# 1. Revenue comparison by time of day
morning_revenue = df[df['Time_of_Day'] == 'Morning']['revenue']
afternoon_revenue = df[df['Time_of_Day'] == 'Afternoon']['revenue']
night_revenue = df[df['Time_of_Day'] == 'Night']['revenue']

# Statistical test - ANOVA for time periods
stat, p_value = f_oneway(morning_revenue, afternoon_revenue, night_revenue)
print("\n1. REVENUE BY TIME OF DAY (ANOVA Test):")
print(f" F-statistic: {stat:.4f}")
print(f" P-value: {p_value:.6f}")
if p_value < 0.05:
 print(" Result: Significant difference in revenue by time of day!")
else:
 print(" Result: No significant difference in revenue by time of day")

# 2. Payment method analysis
card_revenue = df[df['cash_type'] == 'card']['revenue']
cash_revenue = df[df['cash_type'] == 'cash']['revenue']

stat, p_value = ttest_ind(card_revenue, cash_revenue)
print("\n2. PAYMENT METHOD COMPARISON (T-Test):")
print(f" T-statistic: {stat:.4f}")
print(f" P-value: {p_value:.6f}")
print(f" Card avg: ${card_revenue.mean():.2f}")
print(f" Cash avg: ${cash_revenue.mean():.2f}")

# 3. Coffee type profitability
print("\n3. TOP PERFORMING COFFEE TYPES:")
coffee_stats = df.groupby('coffee_name').agg({
 'revenue': ['count', 'mean', 'sum']
}).round(2)
coffee_stats.columns = ['Orders', 'Avg_Price', 'Total_Revenue']
coffee_stats = coffee_stats.sort_values('Total_Revenue', ascending=False)

for coffee in coffee_stats.head().index:
 orders = coffee_stats.loc[coffee, 'Orders']
 avg_price = coffee_stats.loc[coffee, 'Avg_Price']
 total = coffee_stats.loc[coffee, 'Total_Revenue']
 print(f" {coffee}: {orders} orders, ${avg_price} avg, ${total:,.0f} total")

# 4. Weekend vs Weekday analysis
weekend_avg = df[df['is_weekend']]['revenue'].mean()
weekday_avg = df[~df['is_weekend']]['revenue'].mean()
print("\n4. WEEKEND vs WEEKDAY:")
print(f" Weekend average: ${weekend_avg:.2f}")
print(f" Weekday average: ${weekday_avg:.2f}")
print(f" Difference: ${weekend_avg - weekday_avg:.2f}")

# 5. Business insights
print("\n5. KEY BUSINESS INSIGHTS:")
total_revenue = df['revenue'].sum()
total_transactions = len(df)
avg_transaction = df['revenue'].mean()
best_day = df.groupby('day_name')['revenue'].sum().idxmax()
best_coffee = df.groupby('coffee_name')['revenue'].sum().idxmax()

print(f" 💰 Total Revenue: ${total_revenue:,.2f}")
print(f" Total Transactions: {total_transactions:,}")
print(f" 💳 Average Transaction: ${avg_transaction:.2f}")
print(f" 📅 Best Day: {best_day}")
print(f" ☕ Top Coffee: {best_coffee}")
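slowest_period = df.groupby('Time_of_Day')['revenue'].sum().idxmin()
print(f" Revenue Growth Opportunity: Focus on {slowest_period} hours (lowest total revenue)")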
print(f" Product Strategy: Promote {best_coffee} during peak hours")
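
The t-test above uses the default settings of `ttest_ind`, which assume equal variances in both groups. Card and cash transactions may differ a lot in count and spread, so a more cautious variant is sketched below. It is only an illustration on synthetic data: the helper `compare_groups`, its `min_size` guard, and the generated amounts are all hypothetical and not part of the original analysis. It runs Welch's t-test (`equal_var=False`) and reports a rough standardized effect size alongside the p-value.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=42)
# Hypothetical synthetic transaction amounts (not from Coffee_sales.csv).
card = rng.normal(loc=32.0, scale=5.0, size=200)
cash = rng.normal(loc=35.0, scale=8.0, size=40)


def compare_groups(a, b, alpha=0.05, min_size=2):
    """Welch's t-test with a guard for groups too small to test."""
    if len(a) < min_size or len(b) < min_size:
        return None  # not enough data to run the test
    # equal_var=False selects Welch's variant (no equal-variance assumption).
    stat, p_value = ttest_ind(a, b, equal_var=False)
    # Rough effect size: mean difference divided by the root of the
    # average of the two sample variances.
    spread = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return {
        "t": float(stat),
        "p": float(p_value),
        "d": float((np.mean(a) - np.mean(b)) / spread),
        "significant": bool(p_value < alpha),
    }


result = compare_groups(card, cash)
print(result)

# A single-element group cannot be tested, so the helper returns None.
too_small = compare_groups(card, cash[:1])
print(too_small)
```

Reporting the effect size next to the p-value helps separate a statistically detectable difference from one large enough to matter for pricing or promotion decisions.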

---

## Summary and Next Steps

### **What You've Accomplished:**
- **Data Loading**: Successfully loaded and preprocessed real coffee sales data
- **Descriptive Analytics**: Calculated comprehensive statistics for business insights
- **Statistical Testing**: Performed ANOVA and t-tests for significant findings
- **Visualization**: Created professional dashboards for business reporting
- **Business Intelligence**: Generated actionable recommendations from data

### **Key Coffee Sales Insights Discovered:**
1. **Revenue Patterns**: Characterized transaction values and identified peak sales periods
2. **Product Performance**: Determined top-performing coffee types and profitability
3. **Customer Behavior**: Analyzed payment preferences and time-of-day patterns
4. **Statistical Significance**: Validated findings with proper hypothesis testing
5. **Business Strategy**: Provided data-driven recommendations for growth

### **Next Learning Steps:**

#### **Continue Your Analytics Journey:**
- **Machine Learning**: `machine_learning_example.ipynb` - Predict customer churn with Spotify data
- **Time Series**: `time_series_example.ipynb` - Forecast sales trends and seasonality
- **Advanced Stats**: `notebooks/tier1_descriptive/Tier1_Distribution.ipynb` - Deep dive into distributions

#### **Tier Progression Path:**
- **Tier 1**: Master foundational descriptive analytics and visualization
- **Tier 2**: Learn supervised/unsupervised machine learning techniques
- **Tier 3**: Advance to time series analysis and forecasting methods
- **Tier 4**: Explore clustering and dimensionality reduction
- **Tier 5**: Apply ensemble methods and advanced classification
- **Tier 6**: Implement anomaly detection and outlier analysis

### 🏢 **Business Applications:**
- **Retail Analytics**: Apply these techniques to your sales data
- **Performance Monitoring**: Create dashboards for business KPIs
- **Data-Driven Decisions**: Use statistical tests to validate business strategies
- **Reporting Automation**: Implement these analyses in production systems

### 🔗 **Professional Development:**
- **Portfolio Project**: Use this example as a foundation for your data science portfolio
- **Real-World Skills**: These techniques directly apply to business intelligence roles
- **Best Practices**: Professional formatting and documentation standards demonstrated

---

> **Next Recommendation**: Try `machine_learning_example.ipynb` to learn predictive modeling with Random Forest!

---

*Thank you for completing the Quick Start Data Analysis example. Continue building your analytics expertise with the comprehensive Quipu Analytics Suite.*