# Introduction to Tabular Data: The Tips Dataset

In this notebook, we will explore **Tips**, a sample dataset available through seaborn. It records meals at a restaurant, including the bill, the tip, and details about each dining party.

## About the Libraries: pandas and seaborn

In Python, we use libraries to add extra functionality beyond the built‐in features. Two important libraries for data analysis are:

- **pandas (imported as `pd`)**: A powerful library for data manipulation and analysis. It provides a DataFrame data structure (similar to a spreadsheet) that makes it easy to work with tabular data.

- **seaborn (imported as `sns`)**: A library for creating statistical graphics and visualisations. It also provides access to example datasets (like the Tips dataset), which are ideal for learning and practice.

We use the aliases `pd` and `sns` to write code more concisely. For example, instead of writing `pandas.DataFrame()`, we use `pd.DataFrame()`.

## About the Data: The Tips Dataset

The Tips dataset is a sample dataset included with seaborn that contains restaurant dining information, including:

- **Total bill**: The cost of the meal, in US dollars
- **Tip amount**: How much the customer tipped, in US dollars
- **Time**: Whether the meal was for lunch or dinner
- **Day**: Which day of the week the meal occurred
- **Size**: The number of people in the dining party
- **Sex**: The sex of the bill payer
- **Smoker**: Whether the party sat in the smoking or non-smoking section

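No separate data file is needed: the dataset is loaded by name with `sns.load_dataset`, which downloads it from seaborn's online example-data repository on first use and caches a local copy.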

## Loading Data

We will start by importing the necessary libraries and loading the Tips data set.
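As a minimal sketch of these two steps, the code below imports both libraries under their usual aliases and loads the dataset by name. It assumes seaborn is installed and that an internet connection is available the first time it runs. The variable name `tips` is our own choice.

```python
import pandas as pd      # data manipulation; provides the DataFrame
import seaborn as sns    # statistical graphics; also serves example datasets

# load_dataset returns a pandas DataFrame. seaborn fetches the file from its
# online example-data repository on first use and caches a local copy.
tips = sns.load_dataset("tips")

print(type(tips))
print(list(tips.columns))
```

Because the result is an ordinary DataFrame, everything pandas offers can be applied to `tips` directly.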

In [None]:
# Importing libraries

In [None]:
# Loading data

## Inspecting the Data

Before analysing the data, it is important to inspect it. We can use methods like `head()` and `tail()` to view the first and last few rows, the `shape` attribute to get the number of rows and columns, and the `info()` and `describe()` methods to understand the structure and contents of the DataFrame.
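The sketch below tries each of these on a tiny hand-written DataFrame rather than the real data, so the output is small enough to check by eye. The four rows and their values are made up for illustration; only the column names are borrowed from Tips.

```python
import pandas as pd

# Hypothetical stand-in rows (not real Tips data), reusing its column names
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 6.0, 6.0],
    "day": ["Thur", "Thur", "Sat", "Sun"],
    "size": [1, 2, 3, 4],
})

print(tips.head(2))   # first 2 rows (the default is 5)
print(tips.tail(2))   # last 2 rows
print(tips.shape)     # (4, 4): an attribute, so no parentheses

tips.info()           # prints dtypes and non-null counts itself; returns None

summary = tips.describe()   # count, mean, std, min, quartiles, max
print(summary)              # covers the numeric columns only by default
```

Note that `describe()` skips the text column `day` here, because by default it summarises only numeric columns when a DataFrame mixes types.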

In [None]:
# Looking at the data

In [None]:
# Getting information

In [None]:
# Quick statistics 

## Understanding Our Data

Based on the output of `shape`, `info()` and `describe()`, the Tips dataset contains 244 rows (dining experiences) with 7 columns of different types:

**Numerical Columns:**
- `total_bill` and `tip` are stored as `float64`, which is appropriate as these monetary values can have decimal places
- `size` is stored as `int64`, which makes sense for whole numbers representing the count of diners
- `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numerical column, which shows the range and spread of the values

**Categorical Columns:**
- Four columns are stored as `category` type: `sex`, `smoker`, `day`, and `time`
- Using the category type is memory-efficient for columns with a limited set of possible values

All columns have 244 non-null values, so the dataset has no missing values and we won't need to handle missing data in our analysis.
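Two of the points above can be checked directly: the memory saving from the `category` type and the count of missing values. The sketch below uses made-up Series (not the Tips rows), and the names `days`, `as_category`, and `with_gap` are hypothetical.

```python
import pandas as pd

# 10,000 values drawn from only four distinct labels (made-up data)
days = pd.Series(["Thur", "Fri", "Sat", "Sun"] * 2500)
as_category = days.astype("category")

# deep=True includes the memory used by the strings themselves
plain_bytes = days.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(plain_bytes, category_bytes)   # the category version is smaller

# isna() marks missing entries; summing the booleans counts them
with_gap = pd.Series([1.0, None, 3.0])
missing_count = with_gap.isna().sum()
print(missing_count)   # 1
```

A categorical column stores each distinct label once plus a small integer code per row, which is why it takes less memory when labels repeat. On a DataFrame, `isna().sum()` gives one count per column.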

## Accessing Data

You can access specific parts of the DataFrame using square bracket notation (to select columns by name) or the `iloc` indexer (to select rows and columns by integer position). For example:
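Here is a small sketch of both styles, again on a hand-written stand-in DataFrame with made-up values so each result can be verified by eye.

```python
import pandas as pd

# Hypothetical stand-in rows (not real Tips data)
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 6.0, 6.0],
    "size": [1, 2, 3, 4],
})

bills = tips["total_bill"]             # one column name -> Series
subset = tips[["total_bill", "tip"]]   # list of names -> DataFrame

first_row = tips.iloc[0]               # first row, by position -> Series
cell = tips.iloc[1, 0]                 # second row, first column -> 20.0
top_left = tips.iloc[0:2, 0:2]         # first two rows and first two columns

print(cell)
print(top_left)
```

Positions in `iloc` start at 0, and slices such as `0:2` exclude the end position, just like Python lists.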

In [None]:
# Data access

## Basic Operations

With pandas, it is easy to perform basic operations on your data. For example, you can compute the mean of a column with `mean()` or find the maximum value in another column with `max()`. Other useful methods include `min()`, `sum()`, and `std()` (standard deviation).
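The sketch below applies each of these to a hand-written stand-in DataFrame. The values are made up and chosen so the results are easy to confirm by hand.

```python
import pandas as pd

# Hypothetical stand-in rows (not real Tips data)
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 6.0, 6.0],
    "size": [1, 2, 3, 4],
})

mean_bill = tips["total_bill"].mean()   # 25.0
max_size = tips["size"].max()           # 4
min_tip = tips["tip"].min()             # 1.0
total_tips = tips["tip"].sum()          # 16.0
std_bill = tips["total_bill"].std()     # sample standard deviation

print(mean_bill, max_size, min_tip, total_tips)
print(round(std_bill, 2))
```

By default, `std()` in pandas computes the sample standard deviation (dividing by n - 1 rather than n).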

In [None]:
# Basic operations

### Exercise 1: Average spend

Compute and print the following information:

* Total revenue generated (sum of `total_bill`).
* Total number of customers (sum of `size`).
* Average revenue per customer.

In [None]:
## YOUR CODE GOES HERE

### Exercise 2: Time Tips

Compare the average `tip` at Lunch and Dinner. Which meal time has the higher average tip? Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE

## Adding and Modifying Columns

Sometimes you might want to derive new insights by creating additional features. For example, you can create a new column that represents the tip as a percentage of the total bill. This new column, `tip_pct`, allows you to compare tips relative to the size of the bill for each observation in the data.
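A minimal sketch of this idea, on made-up stand-in rows. It assumes `tip_pct` is expressed on a 0 to 100 scale; a fraction between 0 and 1 would work equally well if you drop the factor of 100.

```python
import pandas as pd

# Hypothetical stand-in rows (not real Tips data)
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 6.0, 6.0],
})

# Assigning to a new column name creates the column.
# The arithmetic is applied row by row across the whole column at once.
tips["tip_pct"] = 100 * tips["tip"] / tips["total_bill"]

print(tips)
```

The last two rows have the same tip in dollars, but `tip_pct` shows that the tip on the 40-dollar bill is proportionally smaller (15% against 20%).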

In [None]:
# Adding columns

## Grouping Data

Grouping is a powerful feature in pandas that allows you to split your data into subsets based on a key, and then apply an aggregation function to each group. In this notebook, we group the data by the `day` column to see how values differ across the days of the week.

Grouping is most often used with categorical data. To indicate that we are only interested in categories that actually appear in our data, we pass `observed=True` as an additional argument to `groupby()`.
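The sketch below shows the effect of `observed=True` on a made-up stand-in DataFrame whose `day` column is categorical. The category `"Fri"` is declared but deliberately has no rows, so the two calls give different results.

```python
import pandas as pd

# Hypothetical stand-in rows (not real Tips data).
# "Fri" is a declared category but never appears in the data.
tips = pd.DataFrame({
    "tip": [1.0, 3.0, 6.0, 6.0],
    "day": pd.Categorical(
        ["Thur", "Thur", "Sat", "Sun"],
        categories=["Thur", "Fri", "Sat", "Sun"],
    ),
})

# Split by day, select the tip column, then average within each group
mean_tip_by_day = tips.groupby("day", observed=True)["tip"].mean()
print(mean_tip_by_day)   # Thur 2.0, Sat 6.0, Sun 6.0

# With observed=False, every declared category gets a row,
# and the empty "Fri" group has a missing (NaN) mean
all_days = tips.groupby("day", observed=False)["tip"].mean()
print(all_days)
```

The pattern is always the same: choose the grouping key, select the column of interest, and finish with an aggregation such as `mean()`.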

In [None]:
# Grouping

### Exercise 3: Smoking spend per head

Calculate the average spend per head for each row of the dataset. Then compare the average spend per head for smokers and non-smokers.

Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE