<blockquote style="
    padding: 10px 15px;
    border: 2px solid #360084;
    border-radius: 8px;
    margin: 20px 5px 15px 0;
    background: #fafafa;
    box-shadow: 2px 2px 10px rgba(0, 0, 0, 0.1);
">

  <!-- Header Title -->
  <p style="
      padding: 12px;
      font-size: 22pt;
      font-weight: bold;
      color: #fff;
      background: linear-gradient(to right, #360084, #7a1fa2);
      border-radius: 6px 6px 0 0;
      text-align: center;
      margin: -10px -15px 15px;
  ">Pandas Fundamentals for Econometricians</p>

  <!-- Course Details Section -->
  <div style="
      background-color: #f7f7f7;
      padding: 15px;
      border-radius: 6px;
  ">
    <div class="row">
      <div class="col-md-6">
        <!-- <strong>📚 Course:</strong> <span style="color:#360084;"></span><br/> -->
        <strong>📖 Chapter:</strong> <span style="color:#360084;">Data Frames</span> <br/>
        <strong>🎯 Lesson:</strong> <span style="color:#360084;">Introduction to Data Frames</span><br/>
        <strong>👨‍🏫 Author:</strong> <span style="color:#360084;">Dr. Saad Laouadi</span>
      </div>
    </div>
  </div>

  <!-- Objectives Section -->
  <div style="
    background-color: #f8fafc;
    padding: 20px;
    border-radius: 8px;
    border-left: 4px solid #0284c7;
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
    margin: 20px 0;
">
    <strong style="color: #0284c7; font-size: 18px;">🎯 Learning Objectives</strong>
    <ul style="padding-left: 20px; font-size: 16px; line-height: 1.6; margin-top: 12px;">
        <li>Understand what Pandas is and why it's essential for economic data analysis</li>
        <li>Create and manipulate DataFrames with economic data</li>
        <li>Use basic DataFrame methods (info(), describe()) to explore economic datasets</li>
        <li>Access DataFrame attributes (index, columns, dtypes) to understand data structure</li>
        <li>Apply basic data exploration techniques to real-world economic indicators</li>
    </ul>
</div>

  <!-- Footer -->
  <p style="
      text-align: center;
      font-size: 14px;
      font-style: italic;
      color: #777;
      margin-top: 15px;
  ">© 2025 Dr. Saad Laouadi. All Rights Reserved.</p>

</blockquote>

## Understanding Pandas Library

<div style="
    background: linear-gradient(to right, #f0f9ff, #e0f2fe);
    padding: 10px;
    border-radius: 10px;
    border-left: 5px solid #0369a1;
    margin: 20px;
    box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
    font-family: Arial, sans-serif;
    line-height: 1.6;
    color: #1e293b;
">
    <h3 style="color: #0369a1; margin-bottom: 15px; font-size: 1.5em;">What is Pandas?</h3>
    <p>Pandas is a powerful Python library for data manipulation and analysis. It provides fast, flexible, and expressive data structures designed to make working with tabular and time series data as intuitive as working with a spreadsheet.</p>
</div>

### Simple Analogy

If Excel is like a calculator, Pandas is like a supercomputer for data analysis. It takes the familiar spreadsheet concept and supercharges it with programming capabilities.

### What This Means in Practice

Think of Pandas as an advanced version of Excel. It helps you:

1. **Work with Familiar Table Formats**
   - Organize data in rows and columns, just like Excel
   - Each row represents an observation (e.g., a country, a time period)
   - Each column represents a variable (e.g., GDP, inflation rate)

2. **Handle Economic Data Efficiently**
   - Perfect for time series data (quarterly GDP, monthly inflation)
   - Great for cross-sectional data (comparing different countries)
   - Excellent for panel data (multiple countries over time)

3. **Data Organization Examples**
   - Time Series: Track inflation rates over months
   - Cross-Section: Compare GDP across countries
   - Panel Data: Analyze unemployment rates across countries over years
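
To make the three layouts concrete, here is a minimal sketch of how each one might look in pandas. The country labels (`A`, `B`, `C`), the variable names, and all the numbers are made up for illustration; they are not real economic data.

```python
import pandas as pd

# Time series: one entity observed over time (illustrative values only)
inflation_ts = pd.Series(
    [3.1, 3.2, 3.5],
    index=pd.period_range("2024-01", periods=3, freq="M"),
    name="inflation",
)

# Cross-section: many entities at a single point in time
gdp_cross = pd.DataFrame(
    {"country": ["A", "B", "C"], "gdp": [100.0, 250.0, 80.0]}
)

# Panel: many entities over time, indexed by (country, year)
unemployment_panel = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2021, 2022, 2021, 2022],
        "unemployment": [5.0, 4.5, 7.2, 6.8],
    }
).set_index(["country", "year"])

# Select every year for country "A" from the panel
print(unemployment_panel.loc["A"])
```

The two-level index on the panel is one common choice, not the only one; keeping `country` and `year` as ordinary columns also works.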

### Key features that make pandas essential for economic analysis:
  - Efficient handling of large datasets
  - Built-in time series functionality
  - Powerful data aggregation and transformation tools
  - Easy handling of missing data
  - Intuitive merging and joining of datasets
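
A short sketch of three of these features (missing data, merging, and aggregation) is shown below. The two small tables and their values are hypothetical, and filling gaps with the column mean is used only because it is easy to follow, not because it is the right choice for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical figures, for illustration only
growth = pd.DataFrame(
    {"country": ["A", "B", "C"], "gdp_growth": [2.0, np.nan, 1.5]}
)
region = pd.DataFrame(
    {"country": ["A", "B", "C"], "region": ["North", "North", "South"]}
)

# Missing data: count the gaps, then fill them with the column mean
n_missing = growth["gdp_growth"].isna().sum()
growth["gdp_growth_filled"] = growth["gdp_growth"].fillna(
    growth["gdp_growth"].mean()
)

# Merging: attach the region label to each country
merged = growth.merge(region, on="country", how="left")

# Aggregation: average (filled) growth by region
by_region = merged.groupby("region")["gdp_growth_filled"].mean()
print(by_region)
```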

## Why Pandas for Economic Data Analysis?

For economists and econometricians, pandas offers several crucial advantages:

1. **Familiar Format**: Works like Excel but with more power
2. **Data Import**: Easily read data from various sources (CSV, Excel, databases)
3. **Data Cleaning**: Powerful tools for handling missing values and outliers
4. **Analysis Ready**: Seamlessly connects with statistical tools like statsmodels
5. **Large Datasets**: Efficiently handles large economic datasets
6. **Time Series Handling**: Built-in support for dates, periods, and time-based operations
7. **Panel Data**: Excellent tools for working with cross-sectional and longitudinal data
8. **Data Integration**: Easy to combine data from different sources (e.g., World Bank, IMF, FRED)
9. **Statistical Tools**: Built-in methods for descriptive statistics and basic analysis
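
As a rough illustration of the data import and time series points, the sketch below reads a small CSV and computes a month-over-month change. The CSV text is an in-memory stand-in for a file on disk, and the `cpi` values are invented for the example.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the values are made up for illustration
csv_text = """date,cpi
2024-01-01,100.0
2024-02-01,100.4
2024-03-01,100.9
"""

# parse_dates turns the "date" column into real timestamps
cpi = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Time-based operation: month-over-month percentage change
cpi["mom_pct"] = cpi["cpi"].pct_change() * 100
print(cpi)
```

With a real file, the first argument to `pd.read_csv` would simply be the file path instead of the `io.StringIO` object.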

## Setting Up Your Environment

First, let's make sure we have pandas installed and import the necessary libraries:

In [1]:
# Import essential libraries
import pandas as pd
import numpy as np

# Check pandas version
print(f"Pandas version: {pd.__version__}")

Pandas version: 2.2.2


## Reading Your First Economic Dataset

Let's start by creating a simple economic dataset by hand. Reading external data files is covered in the next notebook.

In [2]:
# Create a simple economic dataset
data = {
    'Country': ['USA', 'UK', 'Japan', 'Germany', 'France'],
    'GDP_Growth_2022': [2.1, 1.8, 1.6, 1.9, 2.5],
    'Inflation_2022': [8.0, 9.1, 2.5, 8.7, 5.9],
    'Unemployment_2022': [3.7, 3.8, 2.6, 3.0, 7.1]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the whole DataFrame (it has only five rows)
print("\nOur Economic Dataset:")
print(df)


Our Economic Dataset:
   Country  GDP_Growth_2022  Inflation_2022  Unemployment_2022
0      USA              2.1             8.0                3.7
1       UK              1.8             9.1                3.8
2    Japan              1.6             2.5                2.6
3  Germany              1.9             8.7                3.0
4   France              2.5             5.9                7.1


## Basic DataFrame Overview

Let's explore the basic operations and information we can get from our DataFrame:

1. `info()` method:
    - Provides a concise summary of your DataFrame
    - Shows the number of rows and columns, each column's data type, and memory usage
    - Reports the non-null count per column, which reveals missing values

In [3]:
# Basic information about the DataFrame
print("\nDataFrame Info:")
# info() prints its summary directly and returns None,
# so wrapping it in print() would add a stray "None" line
df.info()


DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Country            5 non-null      object 
 1   GDP_Growth_2022    5 non-null      float64
 2   Inflation_2022     5 non-null      float64
 3   Unemployment_2022  5 non-null      float64
dtypes: float64(3), object(1)
memory usage: 288.0+ bytes


2. `describe()` method:
    - Generates descriptive statistics of numerical columns
    - Shows count, mean, std, min, 25%, 50%, 75%, max
    - Perfect for quick statistical overview of your data

In [4]:
# Summary statistics
print("\nSummary Statistics:")
print(df.describe())


Summary Statistics:
       GDP_Growth_2022  Inflation_2022  Unemployment_2022
count         5.000000        5.000000           5.000000
mean          1.980000        6.840000           4.040000
std           0.342053        2.721764           1.781292
min           1.600000        2.500000           2.600000
25%           1.800000        5.900000           3.000000
50%           1.900000        8.000000           3.700000
75%           2.100000        8.700000           3.800000
max           2.500000        9.100000           7.100000
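
It can help to see where one of these numbers comes from. The sketch below recomputes the mean and standard deviation of the GDP growth column by hand and compares the result with `describe()`. It shows that the `std` row is the sample standard deviation, which divides by n - 1 rather than n.

```python
import pandas as pd

# Same GDP growth values as in the dataset above
gdp_growth = pd.Series([2.1, 1.8, 1.6, 1.9, 2.5], name="GDP_Growth_2022")

n = len(gdp_growth)
mean = gdp_growth.sum() / n

# Sample variance: squared deviations from the mean, divided by n - 1
sample_var = ((gdp_growth - mean) ** 2).sum() / (n - 1)
sample_std = sample_var ** 0.5

print(round(mean, 6), round(sample_std, 6))

# The "std" row of describe() reports the same quantity
print(round(gdp_growth.describe()["std"], 6))
```

Both printed standard deviations should agree with the 0.342053 shown in the summary table.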


In [5]:
# Column names
print("\nColumns in our dataset:")
print(df.columns.tolist())


Columns in our dataset:
['Country', 'GDP_Growth_2022', 'Inflation_2022', 'Unemployment_2022']


### DataFrame Attributes:

1. `index`: Row labels of your DataFrame

In [6]:
# Index
print(f"{'The dataframe index:':<30} {df.index}")
print(f"{'The dataframe index values:':<30} {df.index.values}")

The dataframe index:           RangeIndex(start=0, stop=5, step=1)
The dataframe index values:    [0 1 2 3 4]
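
The default index is just the row positions 0 to 4, but row labels can be more meaningful. As a small sketch (using `set_index`, which this lesson has not introduced, and only two columns of the dataset to keep it short), the country names can serve as the index so that rows are looked up by name:

```python
import pandas as pd

# A two-column slice of the dataset used in this lesson
df_small = pd.DataFrame({
    "Country": ["USA", "UK", "Japan", "Germany", "France"],
    "Inflation_2022": [8.0, 9.1, 2.5, 8.7, 5.9],
})

# Use the country names as row labels instead of 0..4
by_country = df_small.set_index("Country")
print(by_country.index.tolist())

# Look up a value by row label and column name
print(by_country.loc["Japan", "Inflation_2022"])
```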


2. `columns`: Column names of your DataFrame

In [7]:
print("The dataframe column names:\n")
print(df.columns.tolist())

The dataframe column names:

['Country', 'GDP_Growth_2022', 'Inflation_2022', 'Unemployment_2022']


3. `dtypes`: Data types of each column

In [8]:
# Data types of each column
print("\nData types:")
print(df.dtypes)


Data types:
Country               object
GDP_Growth_2022      float64
Inflation_2022       float64
Unemployment_2022    float64
dtype: object
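
Checking `dtypes` matters because numbers sometimes arrive as text, and text columns cannot be averaged or summed meaningfully. The sketch below uses a hypothetical two-row table in which the inflation figures were loaded as strings, then converts them with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical raw data where a numeric column arrived as text
raw = pd.DataFrame({"Country": ["A", "B"], "Inflation": ["8.0", "9.1"]})

# Under pandas 2.x the Inflation column is reported as object, not float64
print(raw.dtypes)

# Convert the text to numbers so that statistics work as expected
raw["Inflation"] = pd.to_numeric(raw["Inflation"])
print(raw.dtypes)
print(raw["Inflation"].mean())
```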


---

## Practice Exercise: Exploring Latin American Economic Indicators

Now it's your turn! Let's practice the DataFrame operations we learned using a different economic dataset focusing on Latin American economies.

In [9]:
# Latin American economic dataset is provided for you
latam_data = {
    'Country': ['Brazil', 'Mexico', 'Argentina', 'Chile', 'Colombia', 'Peru'],
    'GDP_Billions_2022': [1608.1, 1322.4, 487.2, 301.5, 343.1, 242.6],
    'Inflation_2022': [5.8, 7.9, 94.8, 11.6, 10.2, 8.5],
    'FDI_Billions': [61.4, 35.9, 15.1, 17.1, 16.8, 8.4],
    'Debt_to_GDP': [75.8, 57.1, 83.7, 36.3, 64.5, 35.8]
}

In [10]:
# 1. Create a DataFrame from latam_data



In [11]:
# 2. Display the first few rows using .head()



In [12]:
# 3. Get basic information about the DataFrame using .info()



In [13]:
# 4. Generate descriptive statistics using .describe()



In [14]:
# 5. Check the column names and data types



In [15]:
# 6. Calculate the mean FDI for all countries



In [16]:
# 7. Find the country with the highest and lowest inflation rate



### Questions to Consider:

1. How does the economic situation vary across these Latin American countries?
2. What patterns do you notice in the relationship between GDP and Debt-to-GDP ratio?
3. Is there any relationship between FDI and GDP?
4. Why might Argentina's inflation rate be significantly different from its neighbors?

### Bonus Challenge:

Try creating a new column that calculates the GDP per billion of FDI (GDP_Billions_2022 / FDI_Billions) to see which country gets the most GDP "bang for their buck" from foreign investment!
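
If you are unsure how to start, here is a hint on a toy table rather than the exercise data. The column names `GDP` and `FDI` and the numbers are invented for illustration; the same pattern should carry over to `GDP_Billions_2022` and `FDI_Billions`.

```python
import pandas as pd

# Toy data, not the exercise dataset
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP": [500.0, 300.0, 240.0],
    "FDI": [20.0, 15.0, 8.0],
})

# A new column is created by assigning the result of column arithmetic
toy["GDP_per_FDI"] = toy["GDP"] / toy["FDI"]

# Sort so the highest ratio comes first
ranked = toy.sort_values("GDP_per_FDI", ascending=False)
print(ranked[["Country", "GDP_per_FDI"]])
```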

## Summary

1. Pandas provides powerful tools for handling economic data
2. DataFrames are the main data structure in pandas
3. Basic operations include:
   - Creating a DataFrame from a dictionary
   - Inspecting its structure with `info()`, `index`, `columns`, and `dtypes`
   - Accessing columns and rows
   - Getting basic descriptive statistics with `describe()`

## Next Steps

In the next notebook, we'll dive deeper into importing different types of economic data and handling various file formats commonly used in economic research.

## Additional Resources

- [Pandas Official Documentation](https://pandas.pydata.org/docs/)
- [Pandas Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
- [World Bank Data with Pandas Tutorial](https://pandas-datareader.readthedocs.io/en/latest/readers/world-bank.html)

In [17]:
### Solution Template (try it yourself first!) ###

# 1. Create DataFrame
df_latam = pd.DataFrame(latam_data)

# 2. Display first rows
print("\nLatin American Economic Dataset:")
print(df_latam.head())

# 3. Basic information
print("\nDataset Information:")
print(df_latam.info())

# 4. Descriptive statistics
print("\nSummary Statistics:")
print(df_latam.describe())

# 5. Column names and types
print("\nColumns:")
print(df_latam.columns)
print("\nData Types:")
print(df_latam.dtypes)

# 6. Mean FDI
mean_fdi = df_latam['FDI_Billions'].mean()
print(f"\nAverage FDI: ${mean_fdi:.2f} billion")

# 7. Inflation analysis
max_inflation = df_latam.loc[df_latam['Inflation_2022'].idxmax()]
min_inflation = df_latam.loc[df_latam['Inflation_2022'].idxmin()]
print(f"\nHighest Inflation: {max_inflation['Country']} ({max_inflation['Inflation_2022']}%)")
print(f"Lowest Inflation: {min_inflation['Country']} ({min_inflation['Inflation_2022']}%)")


Latin American Economic Dataset:
     Country  GDP_Billions_2022  Inflation_2022  FDI_Billions  Debt_to_GDP
0     Brazil             1608.1             5.8          61.4         75.8
1     Mexico             1322.4             7.9          35.9         57.1
2  Argentina              487.2            94.8          15.1         83.7
3      Chile              301.5            11.6          17.1         36.3
4   Colombia              343.1            10.2          16.8         64.5

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Country            6 non-null      object 
 1   GDP_Billions_2022  6 non-null      float64
 2   Inflation_2022     6 non-null      float64
 3   FDI_Billions       6 non-null      float64
 4   Debt_to_GDP        6 non-null      float64
dtypes: float64(4), object(1)
memory usage: 368.0+ bytes
N