<a href="https://colab.research.google.com/github/HansikaGunasekara/-Analysis-of-Retail-Sales/blob/main/Session_15_Demo_Notebook_Retail.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# üêº **Session 15: Introduction to Pandas** üêº
## Demo Notebook - Retail Sales Dataset


## 1. Importing Pandas and Loading Data

First, we need to import the pandas library. By convention, we import it as `pd`.

In [None]:
# Import pandas library
import pandas as pd

Now let's load the retail sales dataset from a CSV file:

In [None]:
# Load the CSV file
# NOTE: upload 'retail_sales_dataset.csv' to the Colab session first (Files panel) so it sits in the working directory
df = pd.read_csv('retail_sales_dataset.csv')
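If the file has not been uploaded yet, `read_csv` raises a `FileNotFoundError`. As a minimal sketch for trying the cells without the file: `read_csv` accepts any file-like object, so a small CSV string wrapped in `io.StringIO` works too. The four rows and the column subset below are hypothetical toy data, not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical miniature CSV text standing in for retail_sales_dataset.csv
csv_text = """Gender,Age,Product Category,Total Amount
Male,34,Beauty,150
Female,26,Clothing,1000
Male,50,Electronics,30
Female,37,Clothing,500
"""

# read_csv accepts a file-like object, so no file on disk is needed here
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```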

## 2. Basic Data Exploration

### 2.1 Viewing the First Few Rows

The `head()` method returns the first 5 rows of the DataFrame by default:

In [None]:
# Display first 5 rows
df.head()
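`head()` also takes an optional number of rows. A small sketch on a hypothetical 8-row DataFrame:

```python
import pandas as pd

# Hypothetical 8-row DataFrame of transaction totals
df = pd.DataFrame({'Total Amount': [150, 1000, 30, 500, 100, 30, 50, 100]})

first_default = df.head()   # no argument: first 5 rows
first_three = df.head(3)    # pass n to choose how many rows

print(len(first_default), len(first_three))
```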

### 2.2 Viewing the Last Few Rows

The `tail()` method returns the last 5 rows by default:

In [None]:
# Display last 5 rows
df.tail()
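`tail(n)` works the same way, and the returned rows keep their original index labels, which shows where they sit in the full table. Sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical 8-row DataFrame (index labels 0..7)
df = pd.DataFrame({'Total Amount': [150, 1000, 30, 500, 100, 30, 50, 100]})

last_two = df.tail(2)
# The original index labels are preserved, not renumbered from 0
print(last_two.index.tolist())
```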

### 2.3 Getting DataFrame Dimensions

The `shape` attribute returns a tuple of (rows, columns).

Because `shape` is an attribute rather than a method, it is written without parentheses (see the Attributes vs Methods slide).

In [None]:
# get the number of rows and columns
df.shape

In [None]:
# get just the rows (index 0)
df.shape[0]

In [None]:
# get just the cols (index 1)
df.shape[1]

In [None]:
# or if we want to format in a tidier way
print(f"DataFrame shape: {df.shape}")
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")
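Since `shape` is an ordinary tuple, it can also be unpacked into two variables in one line, and `len(df)` gives the row count directly. A sketch on a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame
df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male'], 'Age': [34, 26, 50]})

# shape is a plain tuple, so it can be unpacked
n_rows, n_cols = df.shape

# len() on a DataFrame returns the number of rows
assert n_rows == len(df)
print(f"{n_rows} rows, {n_cols} columns")
```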

### 2.4 Listing Column Names

The `columns` attribute displays all column names:

In [None]:
# Display all column names
df.columns

In [None]:
# Display all column names, formatted in a cleaner way
print("Column names:")
print(df.columns.to_list())
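`df.columns` is a pandas `Index`, so the `in` operator checks whether a column label exists before you try to select it. Labels are matched exactly, including case. Sketch on hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male'], 'Age': [34], 'Total Amount': [150]})

# 'in' tests membership against the column labels
print('Total Amount' in df.columns)   # True
print('Total amount' in df.columns)   # False: labels are case-sensitive
```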

### 2.5 DataFrame Info

The `info()` method provides a summary of the DataFrame including:
- Number of entries
- Column names
- Data types
- Non-null counts
- Memory usage

In [None]:
# Display DataFrame information
df.info()
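Two pieces of the `info()` summary can also be fetched on their own: `dtypes` lists each column's type, and `isna().sum()` counts missing values per column (the rows not covered by the non-null count). A sketch on hypothetical data with one missing age:

```python
import pandas as pd

# Hypothetical data with one missing Age value
df = pd.DataFrame({
    'Age': [34, None, 50],
    'Product Category': ['Beauty', 'Clothing', 'Electronics'],
})

print(df.dtypes)        # Age becomes float64 because the missing value is stored as NaN
print(df.isna().sum())  # number of missing values in each column
```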

### 2.6 Summary Statistics

The `describe()` method generates descriptive statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numerical columns:

In [None]:
# Generate summary statistics
df.describe()
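By default `describe()` leaves text columns out. Calling it on a single text column instead reports the count, number of unique values, the most frequent value (`top`) and its frequency (`freq`). Sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product Category': ['Clothing', 'Beauty', 'Clothing', 'Electronics'],
    'Total Amount': [500, 150, 1000, 30],
})

# Only the numeric column appears in the default summary
print(df.describe().columns.tolist())

# On a text column: count, unique, top (most frequent) and freq
print(df['Product Category'].describe())
```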

## 3. Selecting Single Columns

You can access a single column using square bracket notation:

In [None]:
# Select the 'Total Amount' column
df['Total Amount']

In [None]:
# Common practice for readability and reuse: store the column in a variable
# Select the 'Total Amount' column
total_amount = df['Total Amount']

# Display first few values
print(total_amount.head())
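Single brackets with a column name return a one-dimensional `Series`; putting the name inside a list (double brackets) returns a one-column `DataFrame` instead. The difference matters because some methods, such as `value_counts()`, are used on a Series. Sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Female'], 'Total Amount': [150, 1000]})

one_col = df['Total Amount']      # single brackets -> Series
as_frame = df[['Total Amount']]   # list with one name -> one-column DataFrame

print(type(one_col).__name__, type(as_frame).__name__)
```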

## 4. Descriptive Statistics on Single Columns

### 4.1 Calculate Mean (Average)

In [None]:
# Calculate average spend per transaction
avg_total = df['Total Amount'].mean()
print(f"Average transaction amount: £{avg_total:.2f}")  # round to 2 dp as it's currency
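`mean()` skips missing values by default, so it equals the sum divided by the count of non-missing values. A sketch on a hypothetical Series with one gap:

```python
import pandas as pd

# Hypothetical amounts, including one missing value
amounts = pd.Series([150, 1000, None, 30])

# mean() ignores NaN: (150 + 1000 + 30) / 3
print(amounts.mean())
# Same result by hand; count() also excludes NaN
print(amounts.sum() / amounts.count())
```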

### 4.2 Find Unique Values

In [None]:
# Get all unique product categories
unique_categories = df['Product Category'].unique()
print(f"Product categories: {unique_categories}")
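`unique()` returns the distinct values themselves, while `nunique()` returns only how many there are. Sketch on hypothetical categories:

```python
import pandas as pd

categories = pd.Series(['Beauty', 'Clothing', 'Clothing', 'Electronics', 'Beauty'])

print(categories.unique())    # distinct values, in order of first appearance
print(categories.nunique())   # number of distinct values
```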

### 4.3 Count Values

The `value_counts()` method counts the frequency of each unique value:

In [None]:
# Count transactions per category
category_counts = df['Product Category'].value_counts()
print(category_counts)
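Passing `normalize=True` to `value_counts()` returns each value's share of the total instead of a raw count, which is easier to compare across datasets of different sizes. Sketch on hypothetical data:

```python
import pandas as pd

categories = pd.Series(['Clothing', 'Beauty', 'Clothing', 'Electronics'])

# Proportions instead of counts; they sum to 1
shares = categories.value_counts(normalize=True)
print(shares)
```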

## 5. Exploring Patterns in the Data

### 5.1 Spending Patterns

In [None]:
# Calculate various statistics for Total Amount
df['Total Amount'].describe()

In [None]:
print(f"Average spend: £{df['Total Amount'].mean():.2f}")
print(f"Median spend: £{df['Total Amount'].median():.2f}")
print(f"Minimum spend: £{df['Total Amount'].min():.2f}")
print(f"Maximum spend: £{df['Total Amount'].max():.2f}")
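The four statistics above can also be computed in a single call with `agg()`, which takes a list of function names. In the hypothetical amounts below, the mean sits well above the median, the usual sign that a few large purchases pull the average up.

```python
import pandas as pd

# Hypothetical transaction totals
amounts = pd.Series([150, 1000, 30, 500, 30], name='Total Amount')

# One value per requested statistic, returned as a Series
summary = amounts.agg(['mean', 'median', 'min', 'max'])
print(summary.round(2))
```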

### 5.2 Customer Demographics

In [None]:
# Analyze age distribution
df['Age'].describe()


In [None]:
# Gender distribution
df['Gender'].value_counts()

## 6. Introducing GroupBy

The `groupby()` method allows us to split data into groups and apply functions to each group.
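The three steps can be seen one at a time. As a sketch on hypothetical data: `groupby()` on its own only builds a GroupBy object (the split); nothing is aggregated until a function such as `sum()` is applied, and the results are then combined into one Series indexed by the group keys.

```python
import pandas as pd

df = pd.DataFrame({
    'Product Category': ['Clothing', 'Beauty', 'Clothing', 'Electronics'],
    'Total Amount': [500, 150, 1000, 30],
})

groups = df.groupby('Product Category')   # split: a GroupBy object, no results yet

print(groups.ngroups)                     # number of groups
print(groups.get_group('Clothing'))       # the rows that belong to one group

# apply + combine: one total per group, indexed by category
totals = groups['Total Amount'].sum()
print(totals)
```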

### 6.1 Gender Preferences by Product

In [None]:
# Count gender distribution for each product category
df.groupby('Product Category')['Gender'].value_counts()
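The result above has a two-level index (category, then gender). `unstack()` turns the inner level into columns, giving a cross-tab that is easier to read; `fill_value=0` fills combinations that never occur. Sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product Category': ['Clothing', 'Beauty', 'Clothing', 'Electronics', 'Beauty'],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

counts = df.groupby('Product Category')['Gender'].value_counts()

# Move the Gender level into columns; missing combinations become 0
table = counts.unstack(fill_value=0)
print(table)
```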

In [None]:
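# Transform the counts into percentages within each product category
df.groupby('Product Category')['Gender'].value_counts(normalize=True) * 100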

### 6.2 Revenue by Product Category

In [None]:
# Calculate total revenue per product category
df.groupby('Product Category')['Total Amount'].sum().sort_values(ascending=False)

In [None]:
df.groupby('Product Category')['Total Amount'].mean().sort_values(ascending=False)
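Total and average revenue can be put side by side with one `agg()` call, which helps tell apart a category that earns more from many transactions and one that earns more from larger transactions. Sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product Category': ['Clothing', 'Beauty', 'Clothing', 'Electronics'],
    'Total Amount': [500, 150, 1000, 30],
})

# One groupby, several statistics per category, sorted by total revenue
stats = (
    df.groupby('Product Category')['Total Amount']
      .agg(['sum', 'mean', 'count'])
      .sort_values('sum', ascending=False)
)
print(stats)
```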

## 7. Practice Challenge

Now try these exercises on your own:

1. Find the average customer age
2. Count purchases by gender

In [None]:
# YOUR CODE HERE
# Challenge 1: Find the average customer age


In [None]:
# YOUR CODE HERE
# Challenge 2: Count purchases by gender


## Key Takeaways

| Task | Introductory Python | Pandas | Key Benefit |
|------|---------------------|--------|-------------|
| Data Structure | Lists, dicts, nested structures | DataFrame | Labeled, columnar structure |
| Data Inspection | Print entire variable | `head()`, `describe()` | Quick summaries |
| Selecting Column | Dictionary key or list index | `df['ColumnName']` | Easy access by label |
| Aggregation | For loops | `sum()`, `mean()`, `max()` | Built-in, optimized functions |
| Grouping Data | Nested loops, manual tracking | `groupby()` | Splits, applies, combines in one step |
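To make the "Aggregation" and "Grouping Data" rows concrete, here is a sketch that totals spend per category twice on the same hypothetical records: once with a plain Python loop over a list of dicts, once with a DataFrame and `groupby()`.

```python
import pandas as pd

# Hypothetical transactions as a list of dicts (the introductory Python way)
transactions = [
    {'Product Category': 'Clothing', 'Total Amount': 500},
    {'Product Category': 'Beauty', 'Total Amount': 150},
    {'Product Category': 'Clothing', 'Total Amount': 1000},
]

# Plain Python: loop and accumulate totals by hand
totals = {}
for t in transactions:
    cat = t['Product Category']
    totals[cat] = totals.get(cat, 0) + t['Total Amount']

# pandas: build a DataFrame from the same records and group in one line
df = pd.DataFrame(transactions)
pandas_totals = df.groupby('Product Category')['Total Amount'].sum().to_dict()

print(totals)
print(pandas_totals)
```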

---

**Remember:** Pandas makes data analysis much easier and more efficient than using basic Python structures!