# Importing Data from Excel

$\textbf{by Ahmed Pirzada, University of Bristol}$

$\textbf{aj.pirzada@bristol.ac.uk}$

$\textbf{27th October 2025}$

## Learning Objectives

- Import `pandas` for tabular data work.
- Read GDP data from Excel into a DataFrame.
- Create new variables (GDP per capita, labour productivity, participation rate).
- Analyse with correlations and grouped summaries by region.
- Index by 'country' and select specific entries.
- Filter rows with conditions using `query`.
- Visualise relationships with a scatter plot.

## 1. IMPORT Library

You will import `pandas` to load and transform data.

In [None]:
# Import Pandas for tabular data work.
import pandas as pd


## 2. IMPORT DATA from Excel

You will read an Excel sheet into a DataFrame and preview it with `head()`.

In [None]:
# Only run this cell in Google Colab; it prompts you to upload the Excel file gdpdata.xlsx.
from google.colab import files
uploaded = files.upload()

In [None]:
# Read GDP data from Excel into a DataFrame.
df_gdp = pd.read_excel('gdpdata.xlsx', sheet_name='Sheet1')

In [None]:
# Preview the first few rows.
df_gdp.head()

## 3. CREATE New Variables

You will create three derived columns:
- `gdppc` = real GDP / population (GDP per capita).
- `lab_prod` = real GDP / employment (labour productivity).
- `participation_rate` = employment / population × 100 (in %).

$\textbf{To-do:}$ Create new variables

In [None]:
# Create GDP per capita variable.


In [None]:
# Create labour productivity variable.


In [None]:
# Create participation rate (%) variable.


---

## 4. Analyse

You will:
- Compute correlations among the new variables.
- Compute grouped summaries by region using `groupby` and `agg`.
- Set 'country' as the index and select a specific country with `loc`.

$\textbf{To-do:}$ Analyse your dataset

In [None]:
# Compute correlation matrix among derived variables.


In [None]:
# Use groupby to calculate average values for each region.


In [None]:
# Use groupby and aggregation to calculate multiple stats for participation rate


In [None]:
# Change index to country.


In [None]:
# Use .loc[] to display data for your chosen country.


---

## 5. QUERY Data

You will filter rows based on value ranges using `query()`.

$\textbf{To-do:}$ Use `query()` to select a subset of the data

In [None]:
# Filter rows using value conditions with query().
df_gdp.query('gdppc > 5000 and gdppc < 10000')

---

## 6. Visualisation

You will make a scatter plot to examine relationships and add a title and axis labels.

In [None]:
import matplotlib.pyplot as plt

$\textbf{To-do:}$ Draw a scatter plot of two variables

In [None]:
# Scatter plot to examine relationship between variables.


---

# Student Notes: Code Explanations

This recap explains the intent of each section and how to interpret results.

1) Import library
- `pandas` provides DataFrame (table) functionality.

2) Import data from Excel
- Read the sheet into a DataFrame and preview the first rows with `head()`.
- Check column names and types; they determine later calculations.
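
A minimal sketch of the column and type check, using a small made-up table in place of the Excel data. The name `df_demo` and the column names `rgdp` and `pop` are assumptions for illustration only; the columns in `gdpdata.xlsx` may be named differently.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; real column names may differ.
df_demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "region": ["North", "North", "South"],
    "rgdp": [100.0, 250.0, 80.0],
    "pop": [10, 20, 16],
})

# List the column names exactly as pandas stored them.
print(df_demo.columns.tolist())

# Show the data type of each column.
print(df_demo.dtypes)

# Keep only the numeric columns, which are the ones usable in ratios.
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

If a column you expect to be numeric is missing from `numeric_cols`, it was probably read as text and needs converting before any division.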

3) Create new variables
- `gdppc` = real GDP / population (GDP per capita).
- `lab_prod` = real GDP / employment (labour productivity).
- `participation_rate` = employment / population × 100.
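
The three formulas can be sketched on a small made-up table. The input column names `rgdp`, `pop` and `emp` are assumptions, as are the numbers; substitute the names that appear in your own DataFrame.

```python
import pandas as pd

# Hypothetical data; column names and values are for illustration only.
df_demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "rgdp": [1000.0, 2500.0, 800.0],  # real GDP
    "pop": [10.0, 20.0, 16.0],        # population
    "emp": [5.0, 8.0, 4.0],           # employment
})

# GDP per capita: real GDP divided by population.
df_demo["gdppc"] = df_demo["rgdp"] / df_demo["pop"]

# Labour productivity: real GDP divided by employment.
df_demo["lab_prod"] = df_demo["rgdp"] / df_demo["emp"]

# Participation rate: employment as a percentage of population.
df_demo["participation_rate"] = df_demo["emp"] / df_demo["pop"] * 100

print(df_demo[["country", "gdppc", "lab_prod", "participation_rate"]])
```

Each division is applied row by row, so every country gets its own value without a loop.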

4) Analyse
- `corr()` reveals linear relationships among variables.
- `groupby('region').agg(...)` summarises by region (median/mean/min/max).
- `set_index('country')` followed by `.loc['Pakistan']` enables label-based selection; note that `loc` takes square brackets, not parentheses.
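
One possible way to carry out these steps, shown on made-up values. The table `df_demo`, the country labels and the numbers are hypothetical; only the column names `gdppc`, `lab_prod`, `participation_rate`, `region` and `country` follow the worksheet.

```python
import pandas as pd

# Hypothetical data with the derived columns already in place.
df_demo = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "region": ["North", "North", "South", "South"],
    "gdppc": [100.0, 125.0, 50.0, 70.0],
    "lab_prod": [200.0, 312.5, 200.0, 175.0],
    "participation_rate": [50.0, 40.0, 25.0, 40.0],
})
derived = ["gdppc", "lab_prod", "participation_rate"]

# Pairwise (Pearson) correlations among the derived variables.
corr_matrix = df_demo[derived].corr()

# Average of each derived variable within each region.
region_means = df_demo.groupby("region")[derived].mean()

# Several statistics at once for a single column.
rate_stats = df_demo.groupby("region")["participation_rate"].agg(
    ["median", "mean", "min", "max"]
)

# Use country names as row labels, then select one row by its label.
df_indexed = df_demo.set_index("country")
row_b = df_indexed.loc["B"]

print(corr_matrix)
print(region_means)
print(rate_stats)
print(row_b)
```

`set_index` returns a new DataFrame here rather than changing `df_demo`, so the original table stays available with its default integer index.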

5) Query data
- `query()` filters rows using conditions, e.g., value ranges for gdppc.
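
As a sketch, the range filter from section 5 can be written three equivalent ways. The data and the variable names `lower` and `upper` are made up for illustration.

```python
import pandas as pd

# Hypothetical GDP per capita values.
df_demo = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "gdppc": [3000.0, 6000.0, 9500.0, 12000.0],
})

# Condition written directly inside the query string.
middle = df_demo.query("gdppc > 5000 and gdppc < 10000")

# Same filter with the bounds held in Python variables, referenced with @.
lower, upper = 5000, 10000
middle_vars = df_demo.query("gdppc > @lower and gdppc < @upper")

# Equivalent boolean-mask form, for comparison.
middle_mask = df_demo[(df_demo["gdppc"] > 5000) & (df_demo["gdppc"] < 10000)]

print(middle)
```

The `@` form is convenient when the thresholds change often, because the query string itself does not need editing.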

6) Visualisation
- Scatter plots show relationships between two variables; add titles/labels to explain axes.
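
A minimal scatter plot sketch with a title and axis labels. The data are made up, and the choice of which two variables to plot is an assumption; any pair of numeric columns works the same way.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for two derived variables.
df_demo = pd.DataFrame({
    "gdppc": [100.0, 125.0, 50.0, 70.0],
    "lab_prod": [200.0, 312.5, 200.0, 175.0],
})

# One figure with one set of axes; figsize controls the width and height in inches.
fig, ax = plt.subplots(figsize=(6, 4))

# Each point is one row: x is labour productivity, y is GDP per capita.
ax.scatter(df_demo["lab_prod"], df_demo["gdppc"])

# Title and axis labels tell the reader what each axis measures.
ax.set_title("GDP per capita vs labour productivity")
ax.set_xlabel("Labour productivity (real GDP / employment)")
ax.set_ylabel("GDP per capita (real GDP / population)")
fig.tight_layout()
```

In a notebook the figure appears under the cell; in a plain script, `fig.savefig(...)` or `plt.show()` would be needed to see it.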

Tips
- Re-run cells in order if variables are undefined.
- Use `df.info()` and `df.describe()` for quick structure and summary checks.
- If plots overlap or look crowded, adjust figure size or create subplots.