## ECO225: Data Tools for Economists
## Practical 2 - NumPy & Pandas Basics

**Course:** ECO225 - Data Tools for Economists  
**Instructor:** Professor Nicholas Zammit  
**Tutorial Leader:** Saurabh Nair  
**Date:** January 20, 2026

---

### Learning Objectives

- Create and manipulate NumPy arrays
- Perform vectorized operations
- Create and work with pandas DataFrames
- Index and filter data
- Calculate summary statistics

---

In [13]:
import numpy as np
import pandas as pd

print("Libraries imported!")

Libraries imported!


## Part 1: NumPy Basics

In [11]:
# Creating arrays
gdp_list = [21000, 1600, 1100, 2800]
gdp_array = np.array(gdp_list)

print("Array:", gdp_array)
print("Type:", type(gdp_array))


Array: [21000  1600  1100  2800]
Type: <class 'numpy.ndarray'>


In [None]:
# Array operations (vectorized)
gdp_trillions = gdp_array / 1000
gdp_growth = gdp_array * 1.03  # 3% growth

print("GDP in trillions (GDP/1000):", gdp_trillions)
print("After 3% growth:", gdp_growth)
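
A vectorized operation applies the arithmetic to every element of the array in one expression, so no explicit loop is needed. As a minimal sketch (the names `loop_result` and `vectorized_result` are illustrative and not part of the lab), the two approaches below produce the same values:

```python
import numpy as np

gdp_array = np.array([21000, 1600, 1100, 2800])  # same values as gdp_list above

# Loop version: convert one element at a time
loop_result = []
for value in gdp_array:
    loop_result.append(value / 1000)

# Vectorized version: one expression covers the whole array
vectorized_result = gdp_array / 1000

print("Loop:      ", loop_result)
print("Vectorized:", vectorized_result)
print("Same values:", np.allclose(loop_result, vectorized_result))
```

On four values the speed difference is negligible; the benefit shows up on large arrays, where the vectorized form is also shorter to read.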

In [None]:
# Statistical operations
print("Mean GDP:", np.mean(gdp_array))
print("Median GDP:", np.median(gdp_array))
print("Std Dev:", np.std(gdp_array))
print("Max GDP:", np.max(gdp_array))
print("Min GDP:", np.min(gdp_array))
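
One detail worth knowing about `np.std`: by default it divides by n (the population formula, `ddof=0`), while the pandas `.std()` method divides by n - 1 (the sample formula, `ddof=1`). The sketch below, with illustrative variable names, shows how to make the two agree:

```python
import numpy as np
import pandas as pd

gdp_array = np.array([21000, 1600, 1100, 2800])

population_std = np.std(gdp_array)          # divides by n (NumPy default, ddof=0)
sample_std = np.std(gdp_array, ddof=1)      # divides by n - 1
pandas_std = pd.Series(gdp_array).std()     # pandas default is ddof=1

print("NumPy, ddof=0:", population_std)
print("NumPy, ddof=1:", sample_std)
print("pandas default:", pandas_std)
```

With only four observations the gap between the two formulas is noticeable, so it matters which one a result was computed with.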

In [15]:
# Array creation helpers
zeros = np.zeros(5)
ones = np.ones(5)
range_arr = np.arange(0, 10, 2)     # start, stop (excluded), step size
linspace = np.linspace(0, 100, 5)   # start, stop (included), number of elements

print("Zeros:", zeros)
print("Ones:", ones)
print("Range:", range_arr)
print("Linspace:", linspace)

Zeros: [0. 0. 0. 0. 0.]
Ones: [1. 1. 1. 1. 1.]
Range: [0 2 4 6 8]
Linspace: [  0.  25.  50.  75. 100.]
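
The difference between `np.arange` and `np.linspace` is easiest to see when both are given the same start and stop. A short sketch (variable names are illustrative):

```python
import numpy as np

# arange: you choose the step; the stop value is NOT included
stepped = np.arange(0, 10, 2)

# linspace: you choose how many points; the stop value IS included
spaced = np.linspace(0, 10, 5)

print("arange:  ", stepped)   # [0 2 4 6 8]
print("linspace:", spaced)    # [ 0.   2.5  5.   7.5 10. ]
```

`np.arange` keeps integer inputs as integers, whereas `np.linspace` always returns floats.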


In [16]:
# 2D arrays (matrices)
data_matrix = np.array([
    [21000, 331],  # USA: GDP, Population
    [1600, 38],     # Canada
    [1100, 128]     # Mexico
])

print("Matrix shape:", data_matrix.shape)
print("\nMatrix:")
print(data_matrix)

Matrix shape: (3, 2)

Matrix:
[[21000   331]
 [ 1600    38]
 [ 1100   128]]


In [7]:
# Indexing
print("First row:", data_matrix[0])
print("GDP column:", data_matrix[:, 0])
print("USA GDP:", data_matrix[0, 0])


First row: [21000   331]
GDP column: [21000  1600  1100]
USA GDP: 21000
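
Indexing can also select rows by a condition rather than by position. The sketch below rebuilds the same matrix and keeps only the rows whose population value exceeds 100; the threshold and variable names are chosen for illustration only:

```python
import numpy as np

data_matrix = np.array([
    [21000, 331],   # USA: GDP, Population
    [1600, 38],     # Canada
    [1100, 128],    # Mexico
])

population = data_matrix[:, 1]          # second column
is_large = population > 100             # boolean array, one entry per row
large_rows = data_matrix[is_large]      # keeps rows where the mask is True

print("Mask:", is_large)
print("Rows with population > 100:")
print(large_rows)
print("Last row:", data_matrix[-1])     # negative indices count from the end
```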


## Part 2: Pandas Basics

In [8]:
# Creating a DataFrame
df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Population': [331, 38, 128, 67],
    'Region': ['North America', 'North America', 'North America', 'Europe']
})

print(df)

  Country    GDP  Population         Region
0     USA  21000         331  North America
1  Canada   1600          38  North America
2  Mexico   1100         128  North America
3      UK   2800          67         Europe


In [9]:
# Basic info
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("Data types:")
print(df.dtypes)

Shape: (4, 4)
Columns: ['Country', 'GDP', 'Population', 'Region']
Data types:
Country       object
GDP            int64
Population     int64
Region        object
dtype: object


In [None]:
# Selecting columns
print("GDP column:")
print(df['GDP'])

print("\nMultiple columns:")
print(df[['Country', 'GDP']])
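
Single and double brackets return different types: `df['GDP']` gives a Series, while `df[['GDP']]` gives a one-column DataFrame. A small sketch using a trimmed copy of the DataFrame (the names `as_series` and `as_frame` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
})

as_series = df['GDP']      # single brackets -> Series
as_frame = df[['GDP']]     # double brackets -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

This matters later when a function expects one type or the other.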

In [None]:
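# Adding new column
# GDP is in billions (see the /1000 conversion to trillions above); assuming Population
# is in millions, GDP * 1000 / Population gives dollars per person
df['GDP_per_capita'] = (df['GDP'] * 1000) / df['Population']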

print(df)

In [None]:
# Filtering
print("Countries with GDP > 2000:")
high_gdp = df[df['GDP'] > 2000]
print(high_gdp)
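
Filters can be combined. In pandas, use `&` for "and" and `|` for "or", and wrap each condition in parentheses. The thresholds below are arbitrary values picked for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Region': ['North America', 'North America', 'North America', 'Europe'],
})

# Both conditions must hold
both = df[(df['GDP'] > 1500) & (df['Region'] == 'North America')]

# At least one condition must hold
either = df[(df['GDP'] > 20000) | (df['Region'] == 'Europe')]

print(both)
print(either)
```

Python's `and` / `or` keywords do not work element-wise on Series, which is why the symbols are needed here.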

In [None]:
# Sorting
print("Sorted by GDP (descending):")
sorted_df = df.sort_values('GDP', ascending=False)
print(sorted_df)
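
`sort_values` returns a new DataFrame and leaves the original untouched. It also accepts a list of columns. As a sketch, the example below sorts by region first and then by GDP within each region:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Region': ['North America', 'North America', 'North America', 'Europe'],
})

# Region A-Z, then GDP from largest to smallest inside each region
by_region_then_gdp = df.sort_values(['Region', 'GDP'], ascending=[True, False])

print(by_region_then_gdp)
print("Original order is unchanged:", df['Country'].tolist())
```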

In [None]:
# Summary statistics
print("Summary statistics:")
print(df.describe())
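
The result of `.describe()` is itself a DataFrame, so individual statistics can be pulled out with `.loc`. By default only numeric columns are summarized. A minimal sketch (the name `stats` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Population': [331, 38, 128, 67],
})

stats = df.describe()                  # 'Country' is skipped: it is not numeric
mean_gdp = stats.loc['mean', 'GDP']    # row label, then column label

print(stats.columns.tolist())
print("Mean GDP:", mean_gdp)
```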

In [None]:
# GroupBy
print("Average GDP by region:")
region_stats = df.groupby('Region')['GDP'].mean()
print(region_stats)
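
A group can be summarized with several statistics at once by passing a list to `.agg()`. The sketch below computes the mean, total, and number of countries per region (the name `region_summary` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Region': ['North America', 'North America', 'North America', 'Europe'],
})

# One row per region, one column per statistic
region_summary = df.groupby('Region')['GDP'].agg(['mean', 'sum', 'count'])

print(region_summary)
```

The `count` column is a useful sanity check: Europe has a single country here, so its mean is just that country's GDP.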

## Part 3: .loc and .iloc

In [None]:
# .loc - label-based
print("First row using .loc:")
print(df.loc[0])

print("Specific cell:")
print(df.loc[0, 'GDP'])
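
Labels do not have to be integers, and `.loc` also accepts a boolean condition for the rows. The sketch below shows both; setting `Country` as the index is an illustrative choice, not something the lab requires:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Population': [331, 38, 128, 67],
})

# Rows chosen by condition, columns chosen by name
rich = df.loc[df['GDP'] > 2000, ['Country', 'GDP']]
print(rich)

# With country names as the index, labels become meaningful
by_country = df.set_index('Country')
canada_gdp = by_country.loc['Canada', 'GDP']
print("Canada GDP:", canada_gdp)
```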

In [None]:
# .iloc - position-based
print("First row using .iloc:")
print(df.iloc[0])

print("First 2 rows, first 2 columns:")
print(df.iloc[0:2, 0:2])
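
Slices behave differently in the two indexers: `.iloc` excludes the end position, as ordinary Python slicing does, while `.loc` includes the end label. A short sketch on a DataFrame with the default 0, 1, 2, 3 index:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
})

by_position = df.iloc[0:2]   # positions 0 and 1
by_label = df.loc[0:2]       # labels 0, 1 and 2

print("iloc[0:2] rows:", len(by_position))
print("loc[0:2] rows: ", len(by_label))
```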

In [10]:
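# The index labels are deliberately out of order, so label 0 is not at position 0
df1 = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
print(df1)
print("Using loc:", df1.loc[0])    # label 0 -> 'd'
print("Using iloc:", df1.iloc[0])  # position 0 -> 'a'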


49    a
48    b
47    c
0     d
1     e
2     f
dtype: object
Using loc: d
Using iloc: a


---
## Exercises

### Exercise 1: NumPy Arrays (20 points)

Create an array of inflation rates and calculate statistics.

In [None]:
# TODO: Create an array with the most recent inflation rates (in %) for the 5 biggest countries
inflation = 

# TODO: Calculate mean, median, and standard deviation

# TODO: Find rates above 3%


### Exercise 2: Array Operations (20 points)

Calculate real GDP given nominal GDP and inflation.

In [None]:
nominal_gdp = np.array([1000, 1500, 2000, 2500])
inflation_rates = np.array([2.0, 2.5, 3.0, 1.5])

# TODO: Calculate real GDP
# Real GDP = Nominal GDP / (1 + inflation/100)
real_gdp = 

print(real_gdp)

### Exercise 3: Create DataFrame (20 points)

Create a DataFrame with economic data for 5 countries.

In [None]:
# TODO: Create DataFrame with columns: Country, GDP, Unemployment, Inflation
my_df = 

print(my_df)

### Exercise 4: DataFrame Operations (20 points)

Filter and calculate statistics.

In [None]:
# TODO: Filter countries with unemployment > 5%

# TODO: Calculate average GDP

# TODO: Sort by inflation (ascending)


### Exercise 5: Misery Index (20 points)

Calculate Misery Index = Unemployment + Inflation

In [None]:
# TODO: Add Misery Index column

# TODO: Find country with highest misery index

# TODO: Calculate average misery index


## Summary
This lab covered:
- Creating and manipulating NumPy arrays
- Vectorized operations for efficient calculations
- Creating pandas DataFrames from dictionaries
- Selecting, filtering, and sorting data
- Calculating summary statistics
- Using .loc and .iloc for indexing
- GroupBy operations for aggregation

**Key NumPy functions:**
- `np.array()` - create arrays
- `np.mean()`, `np.median()`, `np.std()` - statistics
- `np.zeros()`, `np.ones()`, `np.arange()`, `np.linspace()` - array creation
- Vectorized operations (faster than loops)

**Key pandas functions:**
- `pd.DataFrame()` - create DataFrames
- `.head()`, `.tail()` - view data
- `.describe()` - summary statistics
- `.groupby()` - group data for aggregation
- `.sort_values()` - sort data
- `.loc[]` - label-based indexing
- `.iloc[]` - position-based indexing
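
`.head()` and `.tail()` appear in the list above but were not demonstrated in the cells. As a brief sketch, both take the number of rows to show (the default is 5):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
})

first_two = df.head(2)   # first 2 rows
last_one = df.tail(1)    # last row

print(first_two)
print(last_one)
```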

**Next week:** Data Cleaning & Wrangling!

---
### Submission Instructions
1. Complete all TODO sections
2. Ensure all code cells run without errors
3. Calculate Misery Index correctly
4. Save your notebook
5. Push to GitHub repository

**Grading:** This lab is worth 1% of your final grade and will be graded on:
- Completion of all tasks (70%)
- Code correctness (20%)
- Reasonable participation and effort (10%)
