# Introduction to Data Analytics - Tutorial

Welcome to your first hands-on data analytics tutorial! In this notebook, you'll learn the fundamental workflow of data analytics through practical examples.

## Learning Objectives
- Understand the data analytics workflow
- Load and explore a dataset
- Perform basic data analysis
- Create simple visualizations
- Draw insights from data

## 1. Setting Up Your Environment

First, let's import the essential libraries for data analytics.

In [None]:
# Core data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-whitegrid')  # this style name requires Matplotlib 3.6 or newer

print("Libraries loaded successfully!")
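
If the style name above raises an error on your installation, a fallback along these lines may help. This is a minimal sketch, assuming the older `seaborn-whitegrid` name is what pre-3.6 Matplotlib releases provide; the variable names `preferred`, `legacy`, and `chosen` are illustrative only.

```python
import matplotlib
import matplotlib.pyplot as plt

preferred = "seaborn-v0_8-whitegrid"  # name used since Matplotlib 3.6
legacy = "seaborn-whitegrid"          # name used by older releases (assumption)

# Pick the first style this installation actually provides, else the default.
chosen = next(
    (name for name in (preferred, legacy) if name in plt.style.available),
    "default",
)
plt.style.use(chosen)
print(f"Matplotlib {matplotlib.__version__} is using style '{chosen}'")
```

Checking `plt.style.available` first avoids a hard failure when the notebook runs on a different Matplotlib version.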

## 2. The Data Analytics Workflow

Most data analytics projects follow a similar workflow:

1. **Define the Question** - What do we want to learn?
2. **Collect Data** - Where is our data coming from?
3. **Clean Data** - Is our data ready for analysis?
4. **Analyze Data** - What patterns can we find?
5. **Visualize Results** - How can we communicate findings?
6. **Draw Conclusions** - What actions should we take?
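
The dataset used later in this tutorial is generated without gaps, so the "Clean Data" step never comes up in practice. The following is a minimal sketch of what that step might look like, using a hypothetical `raw` table with one duplicated row and one missing quantity; the column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract: row 2 duplicates row 1, and the last quantity is missing.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "product": ["Laptop", "Phone", "Phone", "Tablet"],
    "quantity": [1.0, 2.0, 2.0, np.nan],
})

cleaned = (
    raw.drop_duplicates()              # remove exact duplicate rows
       .dropna(subset=["quantity"])    # drop rows with no quantity recorded
       .astype({"quantity": "int64"})  # restore an integer dtype once NaN is gone
       .reset_index(drop=True)
)
print(cleaned)
```

Whether to drop or fill missing values depends on the question being asked; dropping is simply the shortest option to show here.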

## 3. Loading Your First Dataset

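Rather than loading a file, we'll generate a synthetic sales dataset to practice the analytics workflow. Because every column is drawn at random and independently of the others, the values have no built-in relationships (for example, a given product can appear under both categories and at any price).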

In [None]:
# Create a sample sales dataset
np.random.seed(42)

# Generate sample data; each column is drawn independently at random
n_records = 100
data = {
    'date': pd.date_range('2024-01-01', periods=n_records, freq='D'),  # one row per day
    'product': np.random.choice(['Laptop', 'Phone', 'Tablet', 'Watch'], n_records),
    'category': np.random.choice(['Electronics', 'Accessories'], n_records),
    'quantity': np.random.randint(1, 10, n_records),  # integers 1-9 (upper bound is exclusive)
    'unit_price': np.random.choice([299, 499, 799, 999, 1299], n_records),
    'region': np.random.choice(['North', 'South', 'East', 'West'], n_records)
}

# Create DataFrame
df = pd.DataFrame(data)
df['total_sales'] = df['quantity'] * df['unit_price']

print(f"Dataset created with {len(df)} records")
df.head()
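
In a real project the data usually comes from a file rather than a random generator. As a rough sketch, the snippet below reads a small, hypothetical CSV from an in-memory string with `pd.read_csv`; with an actual file you would pass its path instead of the `io.StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """date,product,quantity,unit_price
2024-01-01,Laptop,2,999
2024-01-02,Phone,1,499
2024-01-03,Watch,3,299
"""

# parse_dates converts the 'date' column to datetime64 while loading.
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
sales["total_sales"] = sales["quantity"] * sales["unit_price"]

print(sales.dtypes)
print(sales)
```

Parsing dates at load time means later steps such as grouping by date work without extra conversion.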

## 4. Exploring the Data

Before diving into analysis, we need to understand our data.

In [None]:
# Basic information about the dataset
print("=" * 50)
print("DATASET OVERVIEW")
print("=" * 50)
print(f"\nShape: {df.shape[0]} rows x {df.shape[1]} columns")
print(f"\nColumn Types:")
print(df.dtypes)
print(f"\nMissing Values:")
print(df.isnull().sum())

In [None]:
# Statistical summary of numeric columns
print("\nStatistical Summary:")
df.describe()

In [None]:
# Unique values in categorical columns
print("\nCategorical Column Summary:")
for col in ['product', 'category', 'region']:
    print(f"\n{col}: {df[col].nunique()} unique values")
    print(df[col].value_counts())
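
Counting one column at a time does not show how two categorical columns relate. One possible check is a cross-tabulation; the sketch below uses a small hypothetical `toy` frame rather than the tutorial's `df`, but the same call applies to `df['product']` and `df['category']`.

```python
import pandas as pd

# Hypothetical mini-dataset: note that 'Laptop' appears under both categories.
toy = pd.DataFrame({
    "product": ["Laptop", "Laptop", "Phone", "Watch", "Watch"],
    "category": ["Electronics", "Accessories", "Electronics",
                 "Accessories", "Accessories"],
})

# Rows are products, columns are categories, cells are row counts.
combo_counts = pd.crosstab(toy["product"], toy["category"])
print(combo_counts)
```

In the generated dataset, where product and category are drawn independently, a table like this would typically show every product under both categories, which is a hint that the data is synthetic.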

## 5. Asking Questions About the Data

Good data analytics starts with good questions. Let's answer some business questions:

1. What is our total revenue?
2. Which product generates the most sales?
3. Which region performs best?
4. How do sales trend over time? (We answer this with the line chart in Section 6.)

In [None]:
# Question 1: Total Revenue
total_revenue = df['total_sales'].sum()
print(f"Total Revenue: ${total_revenue:,.2f}")

In [None]:
# Question 2: Sales by Product
sales_by_product = df.groupby('product')['total_sales'].agg(['sum', 'mean', 'count'])
sales_by_product.columns = ['Total Sales', 'Avg Sale', 'Transactions']
sales_by_product = sales_by_product.sort_values('Total Sales', ascending=False)
print("Sales by Product:")
sales_by_product
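
The same summary can also be written with pandas named aggregation, which sets the output column names in the `agg` call instead of renaming them afterwards. This is a sketch on a hypothetical `toy` frame; the output column names are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "product": ["Laptop", "Phone", "Laptop", "Phone"],
    "total_sales": [1000, 400, 600, 200],
})

# Each keyword is an output column: (input column, aggregation function).
product_summary = (
    toy.groupby("product")
       .agg(
           total_sales=("total_sales", "sum"),
           avg_sale=("total_sales", "mean"),
           transactions=("total_sales", "count"),
       )
       .sort_values("total_sales", ascending=False)
)
print(product_summary)
```

Either style gives the same numbers; named aggregation just keeps the column naming next to the calculation.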

In [None]:
# Question 3: Sales by Region
sales_by_region = df.groupby('region')['total_sales'].sum().sort_values(ascending=False)
print("Sales by Region:")
print(sales_by_region)
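
For Question 4, a numeric view of the trend can complement a chart. One way to get it is to group by calendar month; the sketch below does this on a hypothetical `daily` frame, and the same expression should work on `df` because its `date` column is already a datetime.

```python
import pandas as pd

daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28", "2024-03-10",
    ]),
    "total_sales": [1000, 500, 800, 1200, 300],
})

# Convert each date to its month (a Period such as 2024-01), then sum per month.
monthly_sales = daily.groupby(daily["date"].dt.to_period("M"))["total_sales"].sum()
print(monthly_sales)
```

Monthly totals smooth out day-to-day noise, which matters for random data like the tutorial's.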

## 6. Visualizing Your Findings

Visualizations help communicate insights effectively.

In [None]:
# Create a dashboard-style visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Sales by Product (Bar Chart)
sales_by_product['Total Sales'].plot(kind='bar', ax=axes[0, 0], color='steelblue')
axes[0, 0].set_title('Total Sales by Product', fontweight='bold')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].tick_params(axis='x', rotation=45)

# 2. Sales by Region (Pie Chart)
sales_by_region.plot(kind='pie', ax=axes[0, 1], autopct='%1.1f%%', startangle=90)
axes[0, 1].set_title('Sales Distribution by Region', fontweight='bold')
axes[0, 1].set_ylabel('')

# 3. Daily Sales Trend (Line Chart)
daily_sales = df.groupby('date')['total_sales'].sum()
daily_sales.plot(ax=axes[1, 0], color='green', linewidth=1, label='Daily sales')
# The 7-day moving average is undefined (NaN) for the first six days
daily_sales.rolling(7).mean().plot(ax=axes[1, 0], color='red', linewidth=2, label='7-day MA')
axes[1, 0].set_title('Daily Sales Trend', fontweight='bold')
axes[1, 0].set_ylabel('Sales ($)')
axes[1, 0].legend()

# 4. Quantity Distribution (Histogram)
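# Quantities are integers 1-9, so center one bin on each value to avoid an empty gap bin
df['quantity'].hist(ax=axes[1, 1], bins=np.arange(0.5, 10.5, 1), color='coral', edgecolor='white')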
axes[1, 1].set_title('Order Quantity Distribution', fontweight='bold')
axes[1, 1].set_xlabel('Quantity')
axes[1, 1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()
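
The moving-average line in the trend chart starts later than the daily line because `rolling(7)` needs a full window before it returns a value. The sketch below shows that behavior on a short series with a window of 3, alongside the `min_periods` option that relaxes it.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], dtype="float64")

# Default: the result is NaN until a full window of 3 values is available.
strict = s.rolling(3).mean()

# min_periods=1: average whatever values are available so far.
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "strict": strict, "lenient": lenient}))
```

The strict version is more honest about how much data supports each point; the lenient version gives a line that spans the whole chart.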

## 7. Drawing Insights

Based on our analysis, let's summarize the key insights:

In [None]:
# Generate insights summary
top_product = sales_by_product['Total Sales'].idxmax()
top_region = sales_by_region.idxmax()
avg_order = df['total_sales'].mean()

print("=" * 50)
print("KEY INSIGHTS")
print("=" * 50)
print(f"\n1. Total Revenue: ${total_revenue:,.2f}")
print(f"2. Top Performing Product: {top_product}")
print(f"3. Best Region: {top_region}")
print(f"4. Average Order Value: ${avg_order:,.2f}")
print(f"5. Total Transactions: {len(df)}")
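
If you want to reuse these numbers elsewhere (for example in a report), it can help to collect them in a function rather than print them. The helper below, `summarize_sales`, is a hypothetical name, and the sketch assumes a frame with `product`, `region`, and `total_sales` columns like the one in this tutorial.

```python
import pandas as pd

def summarize_sales(frame: pd.DataFrame) -> dict:
    """Return headline metrics for a sales table (hypothetical helper)."""
    by_product = frame.groupby("product")["total_sales"].sum()
    by_region = frame.groupby("region")["total_sales"].sum()
    return {
        "total_revenue": float(frame["total_sales"].sum()),
        "top_product": by_product.idxmax(),
        "top_region": by_region.idxmax(),
        "avg_order_value": float(frame["total_sales"].mean()),
        "transactions": len(frame),
    }

# Small hypothetical table to exercise the helper.
toy = pd.DataFrame({
    "product": ["Laptop", "Phone", "Laptop"],
    "region": ["North", "South", "South"],
    "total_sales": [1000, 400, 700],
})
summary = summarize_sales(toy)
print(summary)
```

Keep in mind that with randomly generated data, the "top" product or region reflects chance rather than a real business pattern.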

## 8. Practice Exercise

Now it's your turn! Complete the following exercises:

### Exercise 1: Find the average quantity sold per region

In [None]:
# Your code here
# Hint: Use groupby() and mean()



### Exercise 2: Create a bar chart showing sales by category

In [None]:
# Your code here
# Hint: Group by category, then use plot(kind='bar')



### Exercise 3: Find the day with the highest sales

In [None]:
# Your code here
# Hint: Group by date and find the maximum



---

<details>
<summary>Click to see solutions</summary>

```python
# Exercise 1 Solution
avg_qty_by_region = df.groupby('region')['quantity'].mean()
print(avg_qty_by_region)

# Exercise 2 Solution
df.groupby('category')['total_sales'].sum().plot(kind='bar')
plt.title('Sales by Category')
plt.ylabel('Sales ($)')
plt.show()

# Exercise 3 Solution
daily_totals = df.groupby('date')['total_sales'].sum()  # compute once, reuse below
best_day = daily_totals.idxmax()
best_sales = daily_totals.max()
print(f"Best day: {best_day:%Y-%m-%d} with sales of ${best_sales:,.2f}")
```
</details>

## Summary

In this tutorial, you learned:

- The data analytics workflow (Question → Collect → Clean → Analyze → Visualize → Conclude)
- How to load and explore datasets with pandas
- Basic aggregation and grouping operations
- Creating visualizations with matplotlib
- Drawing actionable insights from data

### Next Steps
- Practice with different datasets
- Learn more advanced pandas operations
- Explore other visualization libraries such as seaborn (imported above but not used yet) and plotly