# Introduction to Tabular Data: The Tips Dataset

In this notebook, we will explore a sample data set called **Tips** from seaborn. Each row records one restaurant bill: the total bill, the tip, the sex of the person paying, whether the party included smokers, the day, the meal time, and the party size.

## About the Libraries: pandas and seaborn

In Python, we use libraries to add extra functionality beyond the built‐in features. Two important libraries for data analysis are:

- **pandas (imported as `pd`)**: A powerful library for data manipulation and analysis. It provides a DataFrame data structure (similar to a spreadsheet) that makes it easy to work with tabular data.

- **seaborn (imported as `sns`)**: A library for creating statistical graphics and visualisations. It also includes built‐in data sets (like the Tips data set) which are ideal for learning and practice.

We use the aliases `pd` and `sns` to write code more concisely. For example, instead of writing `pandas.read_csv()`, we use `pd.read_csv()`.

## Loading Data

We will start by importing the necessary libraries and loading the Tips data set.

In [None]:
import pandas as pd
import seaborn as sns

In [None]:
# Load the Tips data set
df = sns.load_dataset('tips')

## Inspecting the Data

Before analysing the data, it is important to inspect it. We can use methods like `head()` and `tail()` to view the first and last few rows, the `shape` attribute to get the number of rows and columns, and the `info()` method to see each column's name, data type, and count of non-missing values.

In [None]:
# Print the first five rows
print(df.head())

# Print the last five rows
print(df.tail())

In [None]:
# Print the shape and summary info of the DataFrame
print("Rows and columns:", df.shape)
print("Info:")
# info() prints its summary itself and returns None, so it is not wrapped in print()
df.info()
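
The difference between an attribute (no parentheses) and a method (parentheses) can be seen on a very small table. The sketch below assumes a hand-made DataFrame called `mini`, with Tips-like column names and made-up values, so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical mini table: Tips-like column names, made-up values
mini = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.0, 3.0, 6.0],
    "day": ["Fri", "Sat", "Fri"],
})

# shape is an attribute: a (rows, columns) tuple, written without parentheses
n_rows, n_cols = mini.shape

# columns is also an attribute; list() turns it into a plain Python list
column_names = list(mini.columns)

# head() is a method: it takes an optional number of rows to return
first_two = mini.head(2)

print(n_rows, n_cols)
print(column_names)
print(first_two)
```

The same calls work on `df` once the Tips data set has been loaded.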

## Accessing Data

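You can access specific parts of the DataFrame using square bracket notation or the `iloc` indexer. `iloc` selects by integer position and counts from zero, so `iloc[4, 2]` is the fifth row and third column. For example: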

In [None]:
# Access a single column (e.g. total_bill)
print(df['total_bill'])

# Access multiple columns
print(df[['total_bill', 'tip']])

# Access rows where the day is 'Fri'
print(df[df['day'] == 'Fri'])

# Access the first row
print(df.iloc[0])

# Access the fifth row and third column
print(df.iloc[4, 2])
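
Two related patterns are often useful alongside the ones above: selecting by label with `loc`, and combining several conditions in one filter. The following is a minimal sketch on a hypothetical `mini` table with made-up values, not on the real Tips data.

```python
import pandas as pd

# Hypothetical mini table: Tips-like column names, made-up values
mini = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 6.0, 4.0],
    "day": ["Fri", "Sat", "Fri", "Sun"],
})

# iloc selects by position: second row, first column
by_position = mini.iloc[1, 0]

# loc selects by label: row labelled 1, column named 'total_bill'
by_label = mini.loc[1, "total_bill"]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
fri_big = mini[(mini["day"] == "Fri") & (mini["total_bill"] > 15)]

print(by_position, by_label)
print(fri_big)
```

Here the row labels happen to equal the positions, so `iloc` and `loc` return the same value. After filtering or sorting, labels and positions can differ.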

## Basic Operations

pandas makes it easy to perform basic operations on your data. For example, you can compute the mean of a column or find the maximum value in another column. Other useful methods include `min()`, `sum()`, and `std()` (standard deviation).

In [None]:
# Calculate the mean of 'total_bill'
print(df['total_bill'].mean())

# Find the maximum tip
print(df['tip'].max())
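
The other methods mentioned above are called in the same way. This sketch uses a hypothetical Series of four made-up tip values, so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical tip values, made up for illustration
tips_sample = pd.Series([1.0, 2.0, 3.0, 4.0], name="tip")

tip_min = tips_sample.min()   # smallest value
tip_sum = tips_sample.sum()   # total of all values

# std() returns the sample standard deviation by default (ddof=1)
tip_std = tips_sample.std()

# Pass ddof=0 for the population standard deviation instead
tip_std_pop = tips_sample.std(ddof=0)

print(tip_min, tip_sum, tip_std, tip_std_pop)
```

Note that `std()` in pandas divides by `n - 1` by default, so it may differ slightly from tools that divide by `n`.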

### Exercise 1

Using the Tips data set, print the following information:

* Total revenue generated (sum of `total_bill`).
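* Total number of customers (sum of `size`). Use `df['size']` to select this column: `df.size` is a DataFrame attribute that returns the total number of cells, not the `size` column.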
* Average spend per customer (total revenue divided by total number of customers).

In [None]:
## YOUR CODE GOES HERE

### Exercise 2

Compare the average `tip` at Lunch and Dinner. Which meal time has the higher average tip? Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE

## Adding and Modifying Columns

Sometimes you might want to derive new insights by creating additional features. For example, you can create a new column that represents the tip as a percentage of the total bill. This new column, `tip_pct`, allows you to compare tips relative to the size of the bill.

In [None]:
# Create a new column 'tip_pct' representing the tip percentage
df['tip_pct'] = (df['tip'] / df['total_bill']) * 100

# Display the first few rows to verify the new column
print(df.head())
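
Assigning to a column name that already exists overwrites that column, which is how an existing column is modified. The sketch below assumes a hypothetical `mini` table with made-up values. The `generous` column and its 15% threshold are invented for illustration only.

```python
import pandas as pd

# Hypothetical mini table: Tips-like column names, made-up values
mini = pd.DataFrame({
    "total_bill": [8.0, 20.0, 15.0],
    "tip": [1.0, 3.0, 2.0],
})

# Add: a new column is created because 'tip_pct' does not exist yet
mini["tip_pct"] = mini["tip"] / mini["total_bill"] * 100

# Modify: the name already exists, so the column is overwritten with rounded values
mini["tip_pct"] = mini["tip_pct"].round(1)

# Add a True/False column from a comparison (illustrative threshold)
mini["generous"] = mini["tip_pct"] >= 15

print(mini)
```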

## Grouping Data

Grouping is a powerful feature in pandas that allows you to split your data into subsets based on a key, and then apply an aggregation function to each group. In this notebook, we group the data by the `day` column to observe how values differ across the days of the week.

In [None]:
# Group the data by 'day' and calculate the average total_bill, tip, and tip_pct
grouped_day = df.groupby('day', observed=False)[['total_bill', 'tip', 'tip_pct']].mean()
print(grouped_day)
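
The `observed=False` argument matters because `day` is stored as a categorical column in the Tips data set. With `observed=False`, every category appears in the result, even one with no rows. With `observed=True`, only categories that actually occur are kept. A minimal sketch, assuming a hypothetical `mini` table with made-up values in which the category `Thur` is declared but never used:

```python
import pandas as pd

# Hypothetical mini table: 'Thur' is a declared category with no rows
mini = pd.DataFrame({
    "day": pd.Categorical(
        ["Fri", "Sat", "Fri", "Sat"],
        categories=["Thur", "Fri", "Sat"],
    ),
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# observed=False: one row per declared category; empty groups give NaN
all_days = mini.groupby("day", observed=False)["tip"].mean()

# observed=True: only categories that appear in the data
seen_days = mini.groupby("day", observed=True)["tip"].mean()

print(all_days)
print(seen_days)
```

Passing the argument explicitly also avoids depending on the default, which has changed between pandas versions.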

### Exercise 3: Spend per Head and Smoker

Now it's your turn to apply what you've learnt:

1. **Create a new feature:** Calculate the spend per head by dividing `total_bill` by `size`, and add it as a new column called `spend_per_head`.

2. **Group the data:** Using the existing `smoker` column, group the data set and calculate the average spend per head for smokers and non‐smokers.

Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE