## ECO225: Data Tools for Economists
## Practical 2 - NumPy & Pandas Basics

**Course:** ECO225 - Data Tools for Economists  
**Instructor:** Professor Nicholas Zammit  
**Tutorial Leader:** Saurabh Nair  
**Date:** January 20, 2026

---

### Learning Objectives

- Create and manipulate NumPy arrays
- Perform vectorized operations
- Create and work with pandas DataFrames
- Index and filter data
- Calculate summary statistics

---

In [4]:
import numpy as np
import pandas as pd

# The two most essential libraries: the building blocks for statistics and data science


print("Libraries imported!")

Libraries imported!


## Part 1: NumPy Basics

In [5]:
# Creating arrays
gdp_list = [21000, 1600, 1100, 2800]
gdp_array = np.array(gdp_list)
# np.array() converts a Python list into a NumPy array (ndarray)
# Printed arrays show no commas between elements
# Arrays make specific statistics (mean, median, etc.) easy to compute

print("Array:", gdp_array)
print("Type:", type(gdp_array))

Array: [21000  1600  1100  2800]
Type: <class 'numpy.ndarray'>


In [6]:
# Array operations (vectorized)
gdp_trillions = gdp_array / 1000
gdp_growth = gdp_array * 1.03  # 3% growth


print("GDP in trillions(GDP/1000):", gdp_trillions)
print("After 3% growth:", gdp_growth)

GDP in trillions(GDP/1000): [21.   1.6  1.1  2.8]
After 3% growth: [21630.  1648.  1133.  2884.]
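
"Vectorized" means one expression is applied to every element at once, with no explicit loop. The sketch below compares a loop with the vectorized form on the same GDP values; the names `growth_loop` and `growth_vectorized` are illustrative only.

```python
import numpy as np

gdp_array = np.array([21000, 1600, 1100, 2800])  # same values as the cell above

# Loop version: multiply one element at a time
growth_loop = [value * 1.03 for value in gdp_array]

# Vectorized version: one expression covers the whole array
growth_vectorized = gdp_array * 1.03

# Both approaches should produce the same numbers
print(np.allclose(growth_loop, growth_vectorized))
```

On large arrays the vectorized form is usually much faster and is easier to read.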


In [7]:
# Statistical operations
print("Mean GDP:", np.mean(gdp_array))
print("Median GDP:", np.median(gdp_array))
print("Std Dev:", np.std(gdp_array))
print("Max GDP:", np.max(gdp_array))
print("Min GDP:", np.min(gdp_array))

Mean GDP: 6625.0
Median GDP: 2200.0
Std Dev: 8322.371957561138
Max GDP: 21000
Min GDP: 1100
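
One detail worth knowing: `np.std()` computes the population standard deviation by default (it divides by n), while pandas `.std()` and `.describe()` default to the sample standard deviation (dividing by n - 1). A minimal sketch of the difference, using the `ddof` argument of `np.std()`:

```python
import numpy as np

gdp_array = np.array([21000, 1600, 1100, 2800])

population_std = np.std(gdp_array)      # ddof=0 (default): divide by n
sample_std = np.std(gdp_array, ddof=1)  # ddof=1: divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```

With only four observations the two values differ noticeably, so it helps to state which one is being reported.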


In [8]:
# Array creation helpers
zeros = np.zeros(5)
ones = np.ones(5)
range_arr = np.arange(0, 10, 2)  # start, stop (excluded), step size
linspace = np.linspace(0, 100, 5)  # start, stop (included), number of elements

print("Zeros:", zeros)
print("Ones:", ones)
print("Range:", range_arr)
print("Linspace:", linspace)

Zeros: [0. 0. 0. 0. 0.]
Ones: [1. 1. 1. 1. 1.]
Range: [0 2 4 6 8]
Linspace: [  0.  25.  50.  75. 100.]
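
The two range helpers treat the stop value differently: `np.arange()` excludes it, while `np.linspace()` includes it by default. A small sketch on the same interval (the names `by_step` and `by_count` are illustrative):

```python
import numpy as np

# arange: the step size is given and the stop value (10) is excluded
by_step = np.arange(0, 10, 2)

# linspace: the number of points is given and the stop value (10) is included
by_count = np.linspace(0, 10, 6)

print(by_step)
print(by_count)
```

Here `by_step` ends at 8, while `by_count` ends at 10 and holds floats rather than integers.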


In [9]:
# 2D arrays (matrices)
data_matrix = np.array([
    [21000, 331],  # USA: GDP, Population
    [1600, 38],     # Canada
    [1100, 128]     # Mexico
])

print("Matrix shape:", data_matrix.shape)
print("\nMatrix:")
# "\n" starts a new line
print(data_matrix)

Matrix shape: (3, 2)

Matrix:
[[21000   331]
 [ 1600    38]
 [ 1100   128]]


In [10]:
# Indexing
print("First row:", data_matrix[0])
print("GDP column:", data_matrix[:, 0])
print("U.S. Row:", data_matrix[0, :])
print("USA GDP:", data_matrix[0, 0])

First row: [21000   331]
GDP column: [21000  1600  1100]
U.S. Row: [21000   331]
USA GDP: 21000
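
Column slices can be combined with vectorized arithmetic and with the `axis` argument of aggregation methods. The sketch below assumes, as elsewhere in this lab, that GDP is in billions and population in millions; the variable names are illustrative.

```python
import numpy as np

data_matrix = np.array([
    [21000, 331],  # USA: GDP, Population
    [1600, 38],    # Canada
    [1100, 128],   # Mexico
])

gdp = data_matrix[:, 0]         # every row, first column
population = data_matrix[:, 1]  # every row, second column

# Element-wise arithmetic on two columns
gdp_per_capita = gdp * 1000 / population

# axis=0 collapses the rows, giving one mean per column
column_means = data_matrix.mean(axis=0)

print(gdp_per_capita)
print(column_means)
```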


## Part 2: Pandas Basics

In [11]:
# Creating a DataFrame
df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Population': [331, 38, 128, 67],
    'Region': ['North America', 'North America', 'North America', 'Europe']
})

print(df)

  Country    GDP  Population         Region
0     USA  21000         331  North America
1  Canada   1600          38  North America
2  Mexico   1100         128  North America
3      UK   2800          67         Europe


In [12]:
# Basic info
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("Data types:")
print(df.dtypes)

Shape: (4, 4)
Columns: ['Country', 'GDP', 'Population', 'Region']
Data types:
Country       object
GDP            int64
Population     int64
Region        object
dtype: object


In [13]:
# Selecting columns
print("GDP column:")
print(df['GDP'])

print("\nMultiple columns:")
print(df[['Country', 'GDP', 'Population']])
# Selecting multiple columns takes a list of names, hence the double square brackets

GDP column:
0    21000
1     1600
2     1100
3     2800
Name: GDP, dtype: int64

Multiple columns:
  Country    GDP  Population
0     USA  21000         331
1  Canada   1600          38
2  Mexico   1100         128
3      UK   2800          67


In [14]:
# Adding new column
df['GDP_per_capita'] = (df['GDP'] * 1000) / df['Population']

print(df)

  Country    GDP  Population         Region  GDP_per_capita
0     USA  21000         331  North America    63444.108761
1  Canada   1600          38  North America    42105.263158
2  Mexico   1100         128  North America     8593.750000
3      UK   2800          67         Europe    41791.044776


In [15]:
# Filtering
print("Countries with GDP > 2000:")
high_gdp = df[df['GDP'] > 2000]
print(high_gdp)

Countries with GDP > 2000:
  Country    GDP  Population         Region  GDP_per_capita
0     USA  21000         331  North America    63444.108761
3      UK   2800          67         Europe    41791.044776
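
Filters can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on the same data, with illustrative thresholds:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Population': [331, 38, 128, 67],
    'Region': ['North America', 'North America', 'North America', 'Europe'],
})

# AND: both conditions must hold
na_large = df[(df['Region'] == 'North America') & (df['GDP'] > 1500)]

# OR: at least one condition must hold
big_or_europe = df[(df['GDP'] > 20000) | (df['Region'] == 'Europe')]

print(na_large['Country'].tolist())
print(big_or_europe['Country'].tolist())
```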


In [16]:
# Sorting
print("Sorted by GDP (descending):")
sorted_df = df.sort_values('GDP', ascending=False)
print(sorted_df)

Sorted by GDP (descending):
  Country    GDP  Population         Region  GDP_per_capita
0     USA  21000         331  North America    63444.108761
3      UK   2800          67         Europe    41791.044776
1  Canada   1600          38  North America    42105.263158
2  Mexico   1100         128  North America     8593.750000


In [17]:
# Summary statistics
print("Summary statistics:")
print(df.describe())

Summary statistics:
               GDP  Population  GDP_per_capita
count      4.00000    4.000000        4.000000
mean    6625.00000  141.000000    38983.541674
std     9609.84738  132.103495    22653.079253
min     1100.00000   38.000000     8593.750000
25%     1475.00000   59.750000    33491.721082
50%     2200.00000   97.500000    41948.153967
75%     7350.00000  178.750000    47439.974559
max    21000.00000  331.000000    63444.108761


In [18]:
# GroupBy
print("Average GDP by region:")
region_stats = df.groupby('Region')['GDP'].mean()
print(region_stats)



Average GDP by region:
Region
Europe           2800.0
North America    7900.0
Name: GDP, dtype: float64
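
`groupby()` is not limited to a single statistic. As a sketch, `.agg()` accepts a list of aggregation names and returns one column per statistic (`region_summary` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
    'Region': ['North America', 'North America', 'North America', 'Europe'],
})

# One row per region, one column per statistic
region_summary = df.groupby('Region')['GDP'].agg(['mean', 'max', 'count'])

print(region_summary)
```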


## Part 3: .loc and .iloc

In [19]:
# .loc - label-based

print(df)

print("\nFirst row using .loc:")
print(df.loc[0])

# .loc selects by label; .iloc selects by integer position


print("\nSpecific cell:")
print(df.loc[0, 'GDP'])

  Country    GDP  Population         Region  GDP_per_capita
0     USA  21000         331  North America    63444.108761
1  Canada   1600          38  North America    42105.263158
2  Mexico   1100         128  North America     8593.750000
3      UK   2800          67         Europe    41791.044776

First row using .loc:
Country                     USA
GDP                       21000
Population                  331
Region            North America
GDP_per_capita     63444.108761
Name: 0, dtype: object

Specific cell:
21000


In [20]:
# .iloc - position-based
print("First row using .iloc:")
print(df.iloc[0])

print("\nFirst 2 rows, first 2 columns:")
print(df.iloc[0:2, 0:2])

First row using .iloc:
Country                     USA
GDP                       21000
Population                  331
Region            North America
GDP_per_capita     63444.108761
Name: 0, dtype: object

First 2 rows, first 2 columns:
  Country    GDP
0     USA  21000
1  Canada   1600


In [24]:
df1 = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
print(df1)

print("\nUsing loc:", df1.loc[0])
print("Using iloc:", df1.iloc[0])


# .loc[0] looks up the label 0 (value 'd'); .iloc[0] takes the first position (value 'a')

49    a
48    b
47    c
0     d
1     e
2     f
dtype: object

Using loc: d
Using iloc: a
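
Slicing is another place where the two indexers differ: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, like ordinary Python slicing. A short sketch on a DataFrame with the default 0, 1, 2, ... index:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'Canada', 'Mexico', 'UK'],
    'GDP': [21000, 1600, 1100, 2800],
})

by_label = df.loc[0:2, 'Country']  # labels 0, 1 and 2: three rows
by_position = df.iloc[0:2, 0]      # positions 0 and 1: two rows

print(by_label.tolist())
print(by_position.tolist())
```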


---
## Exercises

### Exercise 1: NumPy Arrays (20 points)

Create an array of inflation rates and calculate statistics.

In [40]:
# TODO: Create array with most recent inflation rates for the 5 biggest countries:

FiveBiggestCountries = ["India", "China", "United States", "Indonesia", "Pakistan"]
inflation_list = [1.33, 0.80, 2.70, 2.90, 1.50]
inflation_array = np.array(inflation_list)

# Parentheses () are for function calls such as print(); square brackets [] are for lists and indexing
# Curly braces {} are for dictionaries



# TODO: Calculate mean, median, and standard deviation
print("Mean inflation:", np.mean(inflation_array))
print("Median inflation:", np.median(inflation_array))
print("Std Dev:", np.std(inflation_array))
# TODO: Find rates above 3%

# Loop approach (prints nothing here because no rate exceeds 3%)
for i in inflation_array:
    if i > 3.00:
        print(f"{i}%")

# Vectorized approach: a boolean mask selects the matching elements
print(inflation_array[inflation_array > 3])
# Boolean masks work on NumPy arrays, not on plain Python lists


Mean inflation: 1.846
Median inflation: 1.5
Std Dev: 0.8149012210077979
[]
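
A boolean mask built from one array can also index a second array of the same length, which is a handy way to recover the country names. The sketch below uses a threshold of 2% purely for illustration, since no rate in this data exceeds 3%; the `countries` array is an assumed NumPy copy of the list above.

```python
import numpy as np

countries = np.array(["India", "China", "United States", "Indonesia", "Pakistan"])
inflation_array = np.array([1.33, 0.80, 2.70, 2.90, 1.50])

# One True/False value per element
mask = inflation_array > 2

print(mask)
print(countries[mask])  # names where the mask is True
print(mask.sum())       # True counts as 1, so this is the number of matches
```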


### Exercise 2: Array Operations (20 points)

Calculate real GDP given nominal GDP and inflation.

In [38]:
nominal_gdp = np.array([1000, 1500, 2000, 2500])
inflation_rates = np.array([2.0, 2.5, 3.0, 1.5])

# TODO: Calculate real GDP


# Real GDP = Nominal GDP / (1 + inflation/100)
real_gdp = nominal_gdp / (1 + inflation_rates / 100)

print(real_gdp)

[ 980.39215686 1463.41463415 1941.74757282 2463.05418719]
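
Because the inflation rates are stored in percent, the formula divides them by 100 before adding 1. A quick way to catch a unit slip is to sanity-check the result. The sketch below is one possible check, with `inflation_fraction` as an illustrative name:

```python
import numpy as np

nominal_gdp = np.array([1000, 1500, 2000, 2500])
inflation_rates = np.array([2.0, 2.5, 3.0, 1.5])  # in percent

# Convert percent to a fraction before deflating
inflation_fraction = inflation_rates / 100
real_gdp = nominal_gdp / (1 + inflation_fraction)

# With positive inflation, real GDP should sit slightly below nominal GDP
print(np.all(real_gdp < nominal_gdp))

# Re-inflating real GDP should recover the nominal values
print(np.allclose(real_gdp * (1 + inflation_fraction), nominal_gdp))
```

With inflation of a few percent, real GDP should be only a few percent below nominal GDP; a much larger gap suggests the rates were not converted.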


### Exercise 3: Create DataFrame (20 points)

Create a DataFrame with economic data for 5 countries.

In [27]:
# TODO: Create DataFrame with columns: Country, GDP, Unemployment, Inflation
my_df = pd.DataFrame({
    'Country': ["India", "China", "United States", "Indonesia", "Pakistan"],
    'GDP': [3910, 18740, 28750, 1400, 371],
    'Unemployment': [4.20, 4.60, 4.1, 3.3, 5.5],
    'Inflation': [1.33, 0.80, 2.70, 2.90, 1.50]
})

# Figures per the World Bank


print(my_df)

         Country    GDP  Unemployment  Inflation
0          India   3910           4.2       1.33
1          China  18740           4.6       0.80
2  United States  28750           4.1       2.70
3      Indonesia   1400           3.3       2.90
4       Pakistan    371           5.5       1.50


### Exercise 4: DataFrame Operations (20 points)

Filter and calculate statistics.

In [51]:
# TODO: Filter countries with unemployment > 5%

# Filtering
print("Countries with unemployment > 5%:")
high_unemployment = my_df[my_df['Unemployment'] > 5]
print(high_unemployment)

# TODO: Calculate average GDP
print("Average GDP:")
print(my_df["GDP"].mean())

# TODO: Sort by inflation (ascending)

print("Sorted by inflation (ascending):")
my_sorted_df = my_df.sort_values('Inflation', ascending=True)
print(my_sorted_df)

Countries with unemployment > 5%:
    Country  GDP  Unemployment  Inflation
4  Pakistan  371           5.5        1.5
Average GDP:
10634.2
Sorted by inflation (ascending):
         Country    GDP  Unemployment  Inflation
1          China  18740           4.6       0.80
0          India   3910           4.2       1.33
4       Pakistan    371           5.5       1.50
2  United States  28750           4.1       2.70
3      Indonesia   1400           3.3       2.90


### Exercise 5: Misery Index (20 points)

Calculate Misery Index = Unemployment + Inflation

In [56]:
# TODO: Add Misery Index column

# Adding new column
my_df['Misery_Index'] = my_df['Unemployment']+my_df['Inflation']

print(my_df)

# TODO: Find country with highest misery index


print("\nHighest misery index")
my_df1 = my_df.sort_values("Misery_Index", ascending=False)
print("Using iloc:", my_df1.iloc[0]['Country'])


# Alternatively, find the position of the maximum with np.argmax
highestMI_countryindex = np.argmax(my_df['Misery_Index'].values)
print("Using other method:", my_df.iloc[highestMI_countryindex]['Country'])

# TODO: Calculate average misery index
print("\nAverage Misery Index:", my_df['Misery_Index'].mean())

         Country    GDP  Unemployment  Inflation  Misery_Index
0          India   3910           4.2       1.33          5.53
1          China  18740           4.6       0.80          5.40
2  United States  28750           4.1       2.70          6.80
3      Indonesia   1400           3.3       2.90          6.20
4       Pakistan    371           5.5       1.50          7.00

Highest misery index
Using iloc: Pakistan
Using other method: Pakistan

Average Misery Index: 6.186
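
A third option, sketched here as an alternative to the two approaches above, is the pandas method `.idxmax()`. It returns the index label of the largest value, which can be passed straight to `.loc`. The names `top_label` and `top_country` are illustrative.

```python
import pandas as pd

my_df = pd.DataFrame({
    'Country': ["India", "China", "United States", "Indonesia", "Pakistan"],
    'Unemployment': [4.20, 4.60, 4.1, 3.3, 5.5],
    'Inflation': [1.33, 0.80, 2.70, 2.90, 1.50],
})
my_df['Misery_Index'] = my_df['Unemployment'] + my_df['Inflation']

# idxmax() returns the index label of the row with the highest value
top_label = my_df['Misery_Index'].idxmax()
top_country = my_df.loc[top_label, 'Country']

print(top_country)
```

Because `.idxmax()` returns a label rather than a position, it pairs with `.loc`, not `.iloc`.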


## Summary
In this lab, you learned:
- Creating and manipulating NumPy arrays
- Vectorized operations for efficient calculations
- Creating pandas DataFrames from dictionaries
- Selecting, filtering, and sorting data
- Calculating summary statistics
- Using .loc and .iloc for indexing
- GroupBy operations for aggregation

**Key NumPy functions:**
- `np.array()` - create arrays
- `np.mean()`, `np.median()`, `np.std()` - statistics
- `np.zeros()`, `np.ones()`, `np.arange()` - array creation
- Vectorized operations (faster than loops)

**Key pandas functions:**
- `pd.DataFrame()` - create DataFrames
- `.head()`, `.tail()` - view data
- `.describe()` - summary statistics
- `.groupby()` - group data for aggregation
- `.sort_values()` - sort data
- `.loc[]` - label-based indexing
- `.iloc[]` - position-based indexing

**Next week:** Data Cleaning & Wrangling!

---
### Submission Instructions
1. Complete all TODO sections
2. Ensure all code cells run without errors
3. Calculate Misery Index correctly
4. Save your notebook
5. Push to GitHub repository

**Grading:** This lab is worth 1% of your final grade and will be graded on:
- Completion of all tasks (70%)
- Code correctness (20%)
- Reasonable participation and effort (10%)
