# Importing Panel Data from Excel

$\textbf{by Ahmed Pirzada, University of Bristol}$

$\textbf{aj.pirzada@bristol.ac.uk}$

$\textbf{26th November 2025}$

## Learning Objectives

- Import `pandas` for tabular data work.
- Import `plotly` for advanced visualisation.
- Read GDP data from Excel into a DataFrame.
- Create new variables (GDP per capita, labour productivity, employment rate).
- Analyse with correlations and grouped summaries by region.
- Filter rows with conditions using `query`.
- Visualise relationships with scatter and line plots using the Plotly library.

## 1. IMPORT Libraries

Import `pandas` to load and transform data.

In [None]:
# Import Pandas for tabular data work.
import pandas as pd

Import the `matplotlib` library for basic visualisation.

In [None]:
# Import matplotlib.pyplot
import matplotlib.pyplot as plt

For *advanced* visualisation, also install and import the `plotly` library.

In [None]:
# Install Plotly for advanced visualisations.
# The %pip magic installs into the environment used by this notebook.
# Once you have installed it, you can comment out or remove this line.
%pip install plotly

In [None]:
# Import Plotly Express to the environment.
import plotly.express as px

## 2. IMPORT DATA from Excel and Preview

- Read an Excel sheet into a DataFrame and preview it with `head()`.

In [None]:
# Read GDP data from Excel into a DataFrame.
df_panel = pd.read_excel('gdppaneldata.xlsx', sheet_name='Sheet1')

In [None]:
# Use head() to preview the first few rows.


- Use the `query()` method to select a subset of the data and the `head()` method to preview it.

In [None]:
# Filter data by country using query method (fill in the ...)
df_panel.query("...")

In [None]:
# Use head() to preview the first few rows of your filtered data.


In [None]:
# Filter data by year using query method (fill in the ...) and view first few rows.
df_panel.query("...").head()
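
As a hint for the `query()` exercises, here is a minimal sketch on a toy DataFrame. The values are made up for illustration, and the column names `country` and `year` are assumptions about how the Excel sheet is laid out; check `df_panel.columns` for the real names.

```python
import pandas as pd

# Toy panel with made-up values (not the real GDP data).
# 'country' and 'year' are assumed column names.
toy = pd.DataFrame({
    "country": ["United Kingdom", "United Kingdom", "China", "China"],
    "year": [2022, 2023, 2022, 2023],
    "pop": [10.0, 11.0, 100.0, 101.0],
})

# Filter by country: text values need their own quotes inside the query string.
uk = toy.query("country == 'United Kingdom'")

# Filter by year: numeric values are written without quotes.
y2023 = toy.query("year == 2023")

# Preview the first rows of each filtered table.
print(uk.head())
print(y2023.head())
```

The same pattern should carry over to `df_panel` once the condition uses the column names in your sheet.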

- Use the `groupby()` and `median()` methods to calculate the median of each variable by region.

In [None]:
df_panel

- Use the `query()` and `groupby()` methods to calculate the `median()` of each variable by region for the year 2023.

In [None]:
df_panel
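
One possible way to chain these methods is sketched below on a toy table with made-up values. The column names `region` and `year` are assumptions, and `numeric_only=True` is passed so that text columns such as the country name are skipped when taking the median.

```python
import pandas as pd

# Toy panel with made-up values; 'region' and 'year' are assumed column names.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "region": ["Europe", "Europe", "Asia", "Europe", "Europe", "Asia"],
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "rgdpo": [100.0, 300.0, 500.0, 120.0, 320.0, 550.0],
    "pop": [10.0, 20.0, 50.0, 10.0, 22.0, 51.0],
})

# Median of every numeric column by region, pooling all years.
median_all = toy.groupby("region").median(numeric_only=True)

# Same summary, restricted to a single year by filtering first.
median_2023 = (
    toy.query("year == 2023")
       .groupby("region")
       .median(numeric_only=True)
)

print(median_all)
print(median_2023)
```

Filtering before grouping keeps the summary to one cross-section; grouping without the filter pools every country-year observation in the region.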

## 3. CREATE New Variables

- gdppc = real GDP / population

In [None]:
# Create GDP per capita variable.
df_panel['gdppc'] = df_panel['rgdpo'] / df_panel['pop']

- Use `head()` to preview the new dataframe

In [None]:
df_panel.head()

- lab_prod = real GDP / employment

In [None]:
# Create labour productivity variable.


- employment_rate = employment / population × 100 (%)

In [None]:
# Create employment rate (%) variable.
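
The two ratios above follow the same pattern as `gdppc`. A minimal sketch on toy data, assuming the employment column is called `emp` (a hypothetical name here; use whatever your sheet calls it):

```python
import pandas as pd

# Toy data with made-up values.
# 'emp' is an assumed name for the employment column.
toy = pd.DataFrame({
    "country": ["A", "B"],
    "rgdpo": [200.0, 900.0],
    "pop": [10.0, 30.0],
    "emp": [5.0, 12.0],
})

# Labour productivity: real GDP per person employed.
toy["lab_prod"] = toy["rgdpo"] / toy["emp"]

# Employment rate: share of the population in employment, as a percentage.
toy["employment_rate"] = toy["emp"] / toy["pop"] * 100

print(toy.head())
```

Column arithmetic in pandas is element-wise, so each row gets its own ratio without an explicit loop.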


## 4. Analyse

Compute:
- Correlations among new variables using `query` and `corr`.
- Grouped summaries by region using `groupby` and `agg`.
- Set 'country' as the index and select a specific country with `loc`.

- Use `describe()` method to get summary statistics for the numeric variables in the dataset

In [None]:
# Compute summary statistics for all numeric variables


- Use the `query()` method to select data for a year and the `corr()` method to compute correlations for selected variables

In [None]:
# Compute correlation matrix for selected variables for a cross-section of data (e.g., year 2023)


- Use the `query()` method to select data for a country and the `corr()` method to compute correlations for selected variables

In [None]:
# Compute correlation matrix for selected variables for a country (e.g., United Kingdom)


In [None]:
# Compute (pooled) correlation matrix between selected variables for African countries from 2000, using all country-year observations.
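
The correlation exercises all share one shape: filter with `query()`, select the columns of interest, then call `corr()`. The sketch below uses a toy table with made-up values; the column names `region` and `year` and the region label `'Africa'` are assumptions about the dataset.

```python
import pandas as pd

# Toy panel with made-up values; column names and region labels are assumed.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    "region": ["Africa", "Africa", "Africa", "Africa", "Africa", "Europe"],
    "year": [1999, 2000, 2001, 2000, 2001, 2001],
    "gdppc": [1.0, 2.0, 3.0, 4.0, 5.0, 30.0],
    "lab_prod": [5.0, 4.0, 6.0, 8.0, 10.0, 60.0],
})

cols = ["gdppc", "lab_prod"]

# Cross-section: all countries in a single year.
corr_2001 = toy.query("year == 2001")[cols].corr()

# Pooled: every country-year observation for one region from 2000 onwards.
# Conditions are combined with 'and' inside the query string.
pooled = toy.query("region == 'Africa' and year >= 2000")
corr_pooled = pooled[cols].corr()

print(corr_2001)
print(corr_pooled)
```

`corr()` returns a symmetric matrix with ones on the diagonal; the off-diagonal entry is the Pearson correlation between the two selected variables.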


- Use `groupby()` and `agg()` methods to summarise data for any particular variable.

In [None]:
# Calculate summary stats for any particular variable
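
As a rough illustration of `groupby()` with `agg()`, the sketch below summarises one variable by region on toy data. The values are made up and `region` is an assumed column name.

```python
import pandas as pd

# Toy data with made-up values; 'region' is an assumed column name.
toy = pd.DataFrame({
    "region": ["Europe", "Europe", "Europe", "Asia", "Asia"],
    "gdppc": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Select one variable, then request several statistics at once.
summary = toy.groupby("region")["gdppc"].agg(["median", "mean", "min", "max"])

print(summary)
```

Passing a list of statistic names to `agg()` produces one column per statistic and one row per region.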


## 5. Visualisation

Select a particular year and make a scatter plot between `gdppc` and `lab_prod`; add a title and axis labels.

In [None]:
# Scatter plot to examine relationship between variables.
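
One way to build this plot with `matplotlib` is sketched below on toy data with made-up values; `year` is an assumed column name, and the title and label text are only suggestions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data with made-up values; 'year' is an assumed column name.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "gdppc": [10.0, 20.0, 30.0, 12.0, 24.0, 36.0],
    "lab_prod": [20.0, 40.0, 60.0, 30.0, 35.0, 80.0],
})

# Keep a single year so that each point is one country.
cross = toy.query("year == 2023")

# Draw the scatter plot and label it.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(cross["gdppc"], cross["lab_prod"])
ax.set_title("GDP per capita and labour productivity, 2023")
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Labour productivity")
```

In a notebook the figure is normally displayed when the cell finishes; in a plain script, call `plt.show()` afterwards.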


## 6. Advanced Visualisation

$\textbf{Using Plotly:}$ See here https://plotly.com/python/plotly-express/

$\textbf{A. Colored Scatter}$

- Let's start with a simple one

In [None]:
# Scatter chart: fig = px.scatter(Data, x=..., y=...)



In [None]:
# Show the figure
fig.show()

- Let's add more features e.g. color by region and circle size by population

In [None]:
# Scatter chart: fig = px.scatter(Data, x=..., y=..., color=..., size=...)


fig.show()

- Let's make it nicer: title, axis labels, and cleaner background

In [None]:
# Let's do it together



fig.show()

$\textbf{B. Line}$

- Again, let's start with a simple line chart for China and the United Kingdom

In [None]:
# Line chart: fig = px.line(Data, x=..., y=..., color=...)



fig.show()

- Now make it nicer: title, axis labels, and cleaner background

# Student Notes: Code Explanations

This recap explains the intent of each section and how to interpret results.

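1) Import libraries
- `pandas` provides DataFrame (table) functionality.
- `matplotlib.pyplot` provides basic plots; `plotly.express` provides the advanced, interactive ones.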

2) Import data from Excel
- Read the sheet into a DataFrame and preview the first rows with `head()`.
- Check column names and types; they determine later calculations.

3) Create new variables
- `gdppc` = real GDP / population (GDP per capita).
- `lab_prod` = real GDP / employment (labour productivity).
- `employment_rate` = employment / population × 100.

4) Analyse
- `corr()` reveals linear relationships among variables.
- `groupby('region').agg(...)` summarises by region (median/mean/min/max).

5) Query data
- `query()` filters rows using conditions, e.g., value ranges for gdppc.

6) Visualisation
- Scatter plots show relationships between two variables; add titles/labels to explain axes.

Tips
- Re-run cells in order if variables are undefined.
- Use `df.info()` and `df.describe()` for quick structure and summary checks.
- If plots overlap or look crowded, adjust figure size or create subplots.