In [1]:
# ========================================================================
#                   Advanced Econometrics with Python
# ========================================================================
#    Module: Data tools
#    Topic: Understanding pandas-datareader
#    
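#    Description:
#    A walkthrough of the DataReader() function in pandas-datareader:
#    its parameters, return values, source-specific behaviors,
#    and common errors.
#
#    Contents:
#    1. Understanding the DataReader function
#    2. Source-specific behaviors and examples
#    3. Common errors, troubleshooting, and alternatives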
#
#    Author: Dr. Saad Laouadi
#    Version: 1.0
#    
# ========================================================================
#  ®Copyright Dr. Saad Laouadi, 2025. All rights reserved.
# ========================================================================

In [2]:
# ============================================= #
#           Setting Up Our Environment
# ============================================= #

import datetime

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import pandas_datareader as pdr
import pandas_datareader.data as web
import pandas_datareader.wb as wb



# Set pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 120)
pd.set_option('display.precision', 3)

# Set the default figure size for plots
plt.rcParams['figure.figsize'] = [12, 6]

plt.style.use('seaborn-v0_8')

%reload_ext watermark

print("*"*52)
%watermark -a "Dr. Saad Laouadi"
%watermark -ud

print("-"*52)
print("The loaded packages".center(52))
print("-"*52)

%watermark -iv

print("*"*52)

****************************************************
Author: Dr. Saad Laouadi

Last updated: 2025-03-02

----------------------------------------------------
                The loaded packages                 
----------------------------------------------------
pandas_datareader: 0.10.0
matplotlib       : 3.10.0
seaborn          : 0.13.2
numpy            : 1.26.4
pandas           : 1.5.3

****************************************************


## Understanding the DataReader Function

The core function in pandas-datareader is `DataReader()`, which handles retrieving data from various sources. Let's explore how this function works and its parameters in detail.

### Importing the Data Module

The most common way to import the data module is using the alias `web`:

```python
import pandas_datareader.data as web
```

This convention is widely used in the financial data analysis community and in the pandas-datareader documentation. The `web` alias provides a concise way to access the data module's functions.

### DataReader Function Signature

The complete signature of `DataReader` is:

```python
DataReader(name, data_source=None, start=None, end=None, retry_count=3, pause=0.1, session=None, api_key=None)
```

## `DataReader` Function Parameters in Detail

1. **name** (string or list of strings):
    - Represents the dataset to fetch
    - **For stock data**: ticker symbols (e.g., `'AAPL'`, `'MSFT'`)
    - **For economic data**: series IDs (e.g., `'GDP'`, `'UNRATE'`)
    - Can be a single string or a list of strings for multiple series
    - **Example**: `'GDP'` for US GDP from FRED or `'AAPL'` for Apple stock prices

2. **data_source** (string):
    - The data source to fetch from
    - Common values: `'fred'`, `'stooq'`, `'famafrench'`, `'oecd'`, `'yahoo'`, etc.
    - World Bank data is not a `DataReader` source; it is accessed through the `wb` module instead
    - Each source expects different formats for the `name` parameter
    - **Example**: `'fred'` for Federal Reserve Economic Data

3. **start** (datetime, string, or int):
    - Starting date for the data
    - Can be a `datetime` object, a date string such as `'YYYY-MM-DD'` (anything `pandas.to_datetime` can parse), or an integer year
    - If `None`, most readers default to five years before the current date
    - **Example**: `datetime.datetime(2010, 1, 1)` or `'2010-01-01'`

4. **end** (datetime, string, or int):
    - Ending date for the data
    - Accepts the same types as `start`
    - If `None`, defaults to the current date, so the most recent available data is included
    - **Example**: `datetime.datetime(2023, 12, 31)` or `'2023-12-31'`

5. **retry_count** (int, default 3):
    - Number of times to retry the request if it fails
    - Useful for handling temporary API availability issues
    - Higher values increase resilience to connection problems

6. **pause** (float, default 0.1):
    - Time in seconds to pause between retried requests
    - Helps avoid hitting rate limits when making multiple requests
    - May need to be increased for some data sources with strict rate limiting

7. **session** (`requests.Session`, default None):
    - `requests.Session` object to use for HTTP requests
    - Can be used to customize headers, proxies, etc.
    - Useful for more complex authentication scenarios

8. **api_key** (string, default None):
    - API key for data sources that require authentication
    - Used by sources like IEX, Tiingo, Alpha Vantage
    - Not required for FRED, Yahoo Finance, or World Bank
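The date rules for `start` and `end` can be sketched in plain pandas. The helper below, `normalize_dates`, is hypothetical. It is a simplified approximation of how pandas-datareader 0.10.0 appears to normalize its date arguments, not the library's own code. It assumes four rules: integers are read as years, strings and `datetime` objects are parsed with `pandas.to_datetime`, a missing `start` falls back to five years before today, and a missing `end` falls back to today.

```python
import datetime as dt

import pandas as pd


def normalize_dates(start=None, end=None, today=None):
    """Hypothetical helper approximating how DataReader interprets start/end."""
    # `today` is injectable so the defaults can be checked deterministically
    today = pd.Timestamp(today or dt.date.today())
    # An integer is treated as a year: 2010 -> 2010-01-01
    if isinstance(start, int):
        start = dt.datetime(start, 1, 1)
    if isinstance(end, int):
        end = dt.datetime(end, 1, 1)
    # Strings and datetime objects are parsed with pandas.to_datetime;
    # missing values fall back to the assumed defaults
    start = pd.to_datetime(start) if start is not None else today - pd.Timedelta(days=365 * 5)
    end = pd.to_datetime(end) if end is not None else today
    if start > end:
        raise ValueError("start must be an earlier date than end")
    return start, end
```

For example, `normalize_dates(2010, 2020)` returns `(Timestamp('2010-01-01'), Timestamp('2020-01-01'))`. A string such as `"2018-1-1"` is parsed the same way as `"2018-01-01"`.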

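## Return Value
The exact return type depends on the source. For most sources, `DataReader` returns:

1. **For single series requests**:
    - A `pandas.DataFrame` with a `DatetimeIndex`
    - Columns depend on the data source and series requested (e.g. one `GDP` column from FRED, or `Open`, `High`, `Low`, `Close`, `Volume` from a price source such as Stooq)

2. **For multiple series requests (when `name` is a list)**:
    - A `DataFrame` with a `DatetimeIndex`
    - For FRED, each requested series becomes a column in the `DataFrame`
    - For price sources such as Stooq, the columns form a two-level `MultiIndex` with levels `Attributes` and `Symbols`

Some sources break this pattern. For example, `'famafrench'` returns a dictionary of `DataFrame`s.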
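With a list of tickers, daily price readers are assumed here to return two-level columns named `Attributes` and `Symbols`. In that layout, one attribute per symbol can be pulled out with `DataFrame.xs`. The helper `select_attribute` is hypothetical and only sketches this selection.

```python
import pandas as pd


def select_attribute(prices: pd.DataFrame, attribute: str = "Close") -> pd.DataFrame:
    """Hypothetical helper: one attribute for every symbol, one column per symbol.

    Assumes two-level columns named ('Attributes', 'Symbols'), the layout
    assumed here for multi-ticker price downloads.
    """
    if not isinstance(prices.columns, pd.MultiIndex):
        raise ValueError("expected two-level columns named ('Attributes', 'Symbols')")
    # xs drops the 'Attributes' level, leaving the symbols as column labels
    return prices.xs(attribute, axis=1, level="Attributes")
```

In the notebook this might be applied as `select_attribute(web.DataReader(['AAPL', 'MSFT'], 'stooq', start, end))`. That call needs network access, and the column layout should be checked against the actual download.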

## Source-Specific Behaviors

Different data sources have unique characteristics when used with DataReader:

### FRED

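```python
import datetime
import pandas_datareader.data as web

start, end = datetime.datetime(2018, 1, 1), datetime.datetime(2023, 12, 31)
# Returns a DataFrame with DatetimeIndex and a single column named after the series ID
gdp = web.DataReader('GDP', 'fred', start, end)
# gdp has a column named 'GDP'
```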

In [3]:
# Returns a DataFrame with DatetimeIndex and a single column named after the series ID
gdp = web.DataReader('GDP', 'fred', start="2018-01-01", end="2023-12-31")
gdp.head()

                  GDP
DATE                 
2018-01-01  20328.553
2018-04-01  20580.912
2018-07-01  20798.730
2018-10-01  20917.867
2019-01-01  21111.600


### World Bank (note: slightly different usage pattern)

```python
# World Bank data is not a DataReader source; it is fetched with wb.download()
from pandas_datareader import wb
wb_data = wb.download(indicator='NY.GDP.PCAP.CD', country=['US', 'JP'], start=2010, end=2020)
# wb_data is indexed by (country, year), with one column per indicator
```
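`wb.download` returns a long panel, one row per country and year. To get years as rows and one column per country, the indicator column can be unstacked. The helper `wb_to_wide` is hypothetical. It assumes the index levels are named `country` and `year`, and that year labels may arrive as strings, so they are cast to integers before sorting.

```python
import pandas as pd


def wb_to_wide(wb_data: pd.DataFrame, indicator: str) -> pd.DataFrame:
    """Hypothetical helper: reshape a (country, year) panel to years x countries."""
    # Move the 'country' index level into the columns
    wide = wb_data[indicator].unstack(level="country")
    # Cast year labels to int so they sort numerically (assumed to be strings)
    wide.index = wide.index.astype(int)
    return wide.sort_index()
```

For the download above, `wb_to_wide(wb_data, 'NY.GDP.PCAP.CD')` would give one GDP-per-capita column per country.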

### Examples with Different Parameter Configurations

#### Basic Usage

In [4]:
import pandas_datareader.data as web
import datetime

start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2022, 12, 31)

# Get US GDP data from FRED
gdp = web.DataReader('GDP', 'fred', start, end)
print(gdp.head())

                  GDP
DATE                 
2020-01-01  21727.657
2020-04-01  19935.444
2020-07-01  21684.551
2020-10-01  22068.767
2021-01-01  22656.793


#### Multiple Series

In [5]:
# Get multiple economic indicators in one call
indicators = web.DataReader(['GDP', 'UNRATE', 'CPIAUCSL'], 'fred', start, end)
print(indicators.head())

                  GDP  UNRATE  CPIAUCSL
DATE                                   
2020-01-01  21727.657     3.6   259.127
2020-02-01        NaN     3.5   259.250
2020-03-01        NaN     4.4   258.076
2020-04-01  19935.444    14.8   256.032
2020-05-01        NaN    13.2   255.802
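The `NaN` values in the `GDP` column appear because GDP is a quarterly series, while `UNRATE` and `CPIAUCSL` are monthly. The combined frame therefore has a row for every month, and GDP is filled only on quarter-start dates. Two common ways to align the frequencies are sketched below. The five rows printed above are typed in by hand so the example runs offline; in the notebook you would use `indicators` directly. Averaging the monthly values within each quarter is one choice among several; taking end-of-quarter values is another.

```python
import numpy as np
import pandas as pd

# The first five rows printed above, typed in so the example runs offline
indicators = pd.DataFrame(
    {
        "GDP": [21727.657, np.nan, np.nan, 19935.444, np.nan],
        "UNRATE": [3.6, 3.5, 4.4, 14.8, 13.2],
        "CPIAUCSL": [259.127, 259.250, 258.076, 256.032, 255.802],
    },
    index=pd.DatetimeIndex(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01"],
        name="DATE",
    ),
)

# Option 1: keep only the dates on which the quarterly series is observed
quarter_rows = indicators.dropna(subset=["GDP"])

# Option 2: aggregate the monthly series to quarterly means, labelled by quarter start.
# With only these five rows, the 2020-Q2 means cover April and May only.
quarterly = indicators.resample("QS").mean()
print(quarterly)
```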


#### With Custom Session
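The `session` parameter accepts a `requests.Session`, so headers and transport-level retries can be configured once and reused across calls. The sketch below is one possible setup, not a configuration pandas-datareader requires. `make_session` is a hypothetical helper, the User-Agent string is a placeholder, and whether a given server accepts it is an assumption.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(user_agent: str = "econometrics-notebook/1.0",
                 max_retries: int = 3) -> requests.Session:
    """Hypothetical helper: a Session with a custom User-Agent and connection retries."""
    session = requests.Session()
    # Placeholder User-Agent; some servers reject the default python-requests value
    session.headers.update({"User-Agent": user_agent})
    # Retry failed connections at the transport level, underneath DataReader's retry_count loop
    adapter = HTTPAdapter(max_retries=max_retries)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session()

# Usage in the notebook (needs network access):
# gdp = web.DataReader('GDP', 'fred', start, end, session=session)
```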

#### With Error Handling

In [6]:
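from pandas_datareader._utils import RemoteDataError

try:
    # The 'yahoo' reader in pandas-datareader 0.10.0 can no longer parse Yahoo Finance pages
    data = web.DataReader('AAPL', 'yahoo', start, end, retry_count=5, pause=0.5)
except RemoteDataError as e:
    print(f"Data source error: {e}")
except Exception as e:
    print(f"Error fetching data: {e}")
    # Fallback logic or error handling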

Error fetching data: 'NoneType' object has no attribute 'group'
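One way to fill in the fallback logic is to try a list of sources in order and keep the first one that succeeds. `first_successful` is a hypothetical helper. Whether an alternative source actually carries a given ticker is an assumption to check for each source.

```python
def first_successful(fetchers, errors=(Exception,)):
    """Hypothetical helper: try each (label, fetch) pair in order.

    Returns (label, result) from the first fetch that does not raise one of
    `errors`; if every fetch fails, raises RuntimeError listing all failures.
    """
    failures = []
    for label, fetch in fetchers:
        try:
            return label, fetch()
        except errors as exc:
            # Record the failure and move on to the next source
            failures.append(f"{label}: {exc}")
    raise RuntimeError("all sources failed -> " + "; ".join(failures))
```

In the notebook, each entry would wrap a real call. For example, `first_successful([("yahoo", lambda: web.DataReader('AAPL', 'yahoo', start, end)), ("stooq", lambda: web.DataReader('AAPL', 'stooq', start, end))])` would return the Stooq data when the Yahoo call fails.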


### Common Errors and Troubleshooting

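1. **`RemoteDataError`**: Raised when the requested data isn't available or the server keeps returning errors
    - Check that the series ID or ticker symbol is correct
    - Verify the date range is valid for the requested data
    - Ensure the data source is operating normally

2. **HTTP 403 Forbidden**: Often indicates rate limiting or access restrictions
    - Increase the `pause` parameter
    - Use a custom `session` with appropriate headers
    - Check whether the data source requires authentication (`api_key`)

3. **Connection issues**:
    - Increase the `retry_count` and `pause` parameters
    - Verify internet connectivity
    - Consider implementing exponential backoff for retries

4. **`AttributeError: 'NoneType' object has no attribute 'group'`** with `'yahoo'`:
    - The Yahoo Finance reader in pandas-datareader 0.10.0 can no longer parse Yahoo's pages
    - Use another source (for example `'stooq'`)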
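Exponential backoff doubles the wait after each failed attempt instead of using a fixed `pause`. Below is a minimal sketch using only the standard library. `retry_with_backoff` is a hypothetical helper, and the delays and jitter are illustrative defaults, not values documented by any data source.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Hypothetical helper: call func(), waiting base_delay * 2**attempt (+ jitter) between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            # Double the base wait each time, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter (up to half the delay) spreads out retries from many clients
            sleep(delay + random.uniform(0, delay / 2))
```

In the notebook this could wrap a call such as `retry_with_backoff(lambda: web.DataReader('GDP', 'fred', start, end), retry_on=(RemoteDataError,))`, with `RemoteDataError` imported from `pandas_datareader._utils`. `DataReader` already retries `retry_count` times internally, so the two retry loops multiply the total number of requests.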

### Alternatives to DataReader
For more control or specialized access, pandas-datareader provides source-specific reader classes such as `FredReader`, which `DataReader` dispatches to internally:

In [7]:
# Instead of DataReader('GDP', 'fred', start, end)
from pandas_datareader.fred import FredReader
fred_reader = FredReader('GDP', start=start, end=end)
data = fred_reader.read()

data.head()

                  GDP
DATE                 
2020-01-01  21727.657
2020-04-01  19935.444
2020-07-01  21684.551
2020-10-01  22068.767
2021-01-01  22656.793
