<blockquote style="
    padding: 10px 15px;
    border: 2px solid #360084;
    border-radius: 8px;
    margin: 20px 5px 15px 0;
    background: #fafafa;
    box-shadow: 2px 2px 10px rgba(0, 0, 0, 0.1);
">
  <p style="
      padding: 12px;
      font-size: 22pt;
      font-weight: bold;
      color: #fff;
      background: linear-gradient(to right, #360084, #7a1fa2);
      border-radius: 6px 6px 0 0;
      text-align: center;
      margin: -10px -15px 15px;
  ">Importing Economic Data with Pandas</p>
  <div style="
      background-color: #f7f7f7;
      padding: 15px;
      border-radius: 6px;
  ">
    <div class="row">
      <div class="col-md-6">
        <strong>📚 Course:</strong> <span style="color:#360084;">Advanced Econometrics with Python</span><br/>
        <strong>📖 Chapter:</strong> <span style="color:#360084;">Data Management and Manipulation</span> <br/>
        <strong>🎯 Lesson:</strong> <span style="color:#360084;">Importing Economic Data</span><br/>
        <strong>👨‍🏫 Author:</strong> <span style="color:#360084;">Dr. Saad Laouadi</span>
      </div>
    </div>
  </div>
  <div style="
    background-color: #f8fafc;
    padding: 20px;
    border-radius: 8px;
    border-left: 4px solid #0284c7;
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
    margin: 20px 0;
">
    <strong style="color: #0284c7; font-size: 18px;">🎯 Learning Objectives</strong>
      <h4>Import economic data from various file formats</h4>
    <ul style="padding-left: 20px; font-size: 16px; line-height: 1.6; margin-top: 12px;">
        <li>CSV Files</li>
        <li>Excel files</li>
        <li>JSON files</li>
        <li>Statistical Software: Stata, SPSS, SAS</li>
    </ul>
</div>
  <p style="
      text-align: center;
      font-size: 14px;
      font-style: italic;
      color: #777;
      margin-top: 15px;
  ">© 2025 Dr. Saad Laouadi. All Rights Reserved.</p>
</blockquote>

In [1]:
# ========================================= #
# Setting Up Our Environment
# ========================================= #
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)

## 2. Reading Different File Formats

### 2.1 Reading CSV Files
CSV (comma-separated values) is one of the most common formats for distributing economic data.

```python
# Basic CSV reading
df_basic = pd.read_csv('example.csv')

# CSV with specific options
df_advanced = pd.read_csv('example.csv',
    sep=',',              # Delimiter to use
    decimal='.',          # Decimal separator
    thousands=',',        # Thousands separator
    encoding='utf-8',     # File encoding
    na_values=['NA', 'missing'],  # Values to treat as NA
    parse_dates=['Date']  # Columns to parse as dates
)
```

In [2]:
# Practice: Read the temperature dataset
data_path = "../datasets/temperatures.csv"

temp = pd.read_csv(data_path)

# Check data information
temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16500 entries, 0 to 16499
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        16500 non-null  object 
 1   city        16500 non-null  object 
 2   country     16500 non-null  object 
 3   avg_temp_c  16407 non-null  float64
dtypes: float64(1), object(3)
memory usage: 515.8+ KB
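
The `info()` output above shows `date` stored as `object` and fewer non-null values in `avg_temp_c` than rows. Both can be handled at import time. The following is a minimal sketch, assuming an in-memory sample with the same column names; the rows are made up for illustration and are not taken from `temperatures.csv`.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the layout of temperatures.csv (values are illustrative)
sample_csv = io.StringIO(
    "date,city,country,avg_temp_c\n"
    "2000-01-01,Algiers,Algeria,11.5\n"
    "2000-02-01,Algiers,Algeria,NA\n"
    "2000-03-01,Algiers,Algeria,14.2\n"
)

# parse_dates converts the column to datetime64 while reading
sample = pd.read_csv(sample_csv, parse_dates=["date"], na_values=["NA"])

# Count missing values per column
missing_counts = sample.isna().sum()

print(sample.dtypes)
print(missing_counts)
```

Parsing dates during the import avoids a separate `pd.to_datetime` step later, and the per-column missing counts give a quick check before any analysis.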


### 2.2 Reading Excel Files
Excel files are commonly used by economic institutions.

```python
# Basic Excel reading
df_excel = pd.read_excel('example.xlsx')

# Excel with specific options
df_excel_advanced = pd.read_excel('example.xlsx',
    sheet_name='Data',    # Specific sheet to read
    header=0,             # Header row, counted after the skipped rows
    skiprows=2,           # Number of rows to skip at the top of the sheet
    usecols='A:E',       # Columns to use
    na_values=['NA', ''], # Values to treat as NA
    parse_dates=['Date']  # Columns to parse as dates
)

# Reading all sheets: sheet_name=None returns a dict of DataFrames keyed by sheet name
all_sheets = pd.read_excel('example.xlsx', sheet_name=None)
```
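
With `sheet_name=None`, `read_excel` returns a dictionary that maps each sheet name to a DataFrame. A common follow-up is to stack the sheets into one table. The sketch below assumes all sheets share the same columns and uses a hand-built dictionary as a stand-in for the `read_excel` result, so the sheet names and values are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the dict that pd.read_excel(..., sheet_name=None) returns;
# the sheet names and values are hypothetical
all_sheets_demo = {
    "2019": pd.DataFrame({"country": ["A", "B"], "gdp": [1.0, 2.0]}),
    "2020": pd.DataFrame({"country": ["A", "B"], "gdp": [1.1, 1.9]}),
}

# pd.concat on a dict uses the keys as the outer index level
combined = pd.concat(all_sheets_demo, names=["sheet", "row"])

# Move the sheet name into a regular column and discard the per-sheet row index
combined = combined.reset_index(level="sheet").reset_index(drop=True)

print(combined)
```

Keeping the sheet name as a column preserves the origin of each row, which is useful when each sheet holds a different year or region.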

In [3]:
# Practice: Read an Excel file
excel_file_path = "../datasets/collegecost.xlsx"

data = pd.read_excel(excel_file_path)

In [4]:
# Check data info
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1571 entries, 0 to 1570
Data columns (total 45 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   instnm     1571 non-null   object 
 1   unitid     1571 non-null   int64  
 2   private    1571 non-null   int64  
 3   year       1571 non-null   int64  
 4   tc         1571 non-null   float64
 5   cf         1571 non-null   float64
 6   tt         1571 non-null   float64
 7   ptf        1571 non-null   float64
 8   ga         1571 non-null   float64
 9   ftef       1571 non-null   float64
 10  ftestu     1571 non-null   float64
 11  ftgrad     1571 non-null   float64
 12  ptstu      1571 non-null   float64
 13  ttnap      1571 non-null   float64
 14  ttnap2     1571 non-null   float64
 15  staffsal   1571 non-null   float64
 16  benstaff   1571 non-null   float64
 17  ftenap     1571 non-null   float64
 18  ftenap2    1571 non-null   float64
 19  ftenpro    1571 non-null   float64
 20  ptnap   

### 2.3 Reading JSON Data
JSON format is common when working with APIs.


```python
# Reading JSON files
df_json = pd.read_json('example.json')

# Reading JSON with specific options
df_json_advanced = pd.read_json('example.json',
    orient='records',     # Expected layout: one JSON object per record
    lines=True,           # Read the file as JSON Lines (one object per line)
    convert_dates=['date'] # Columns to parse as dates
)
```
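
The `lines=True` option expects one JSON object per line, a layout often used for API dumps and logs. The following is a minimal sketch using an in-memory buffer; the field names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical JSON Lines content: one JSON object per line
json_lines = io.StringIO(
    '{"date": "2024-01-31", "indicator": "cpi", "value": 3.1}\n'
    '{"date": "2024-02-29", "indicator": "cpi", "value": 3.2}\n'
)

# Each line becomes one row; the date column is parsed to datetime64
df_lines = pd.read_json(
    json_lines,
    orient="records",
    lines=True,
    convert_dates=["date"],
)

print(df_lines.dtypes)
```

Wrapping the text in `io.StringIO` gives `read_json` a file-like object, which is how recent pandas versions expect literal JSON text to be passed.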

In [5]:
# Practice: Read JSON file
json_file_path = "../datasets/chemical_dataset.json"

# read the data
data = pd.read_json(json_file_path)

In [6]:
# Check the data info
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1200 entries, 0 to 1199
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   year         1200 non-null   int64  
 1   firm         1200 non-null   int64  
 2   lsales       1200 non-null   float64
 3   lcapital     1200 non-null   float64
 4   llabor       1200 non-null   float64
 5   sk_labor     1200 non-null   float64
 6   lmaterials   1200 non-null   float64
 7   foreign      1200 non-null   float64
 8   export       1200 non-null   int64  
 9   intangibles  1200 non-null   int64  
 10  ownership    1200 non-null   int64  
 11  latitude     1200 non-null   float64
 12  longitude    1200 non-null   float64
dtypes: float64(8), int64(5)
memory usage: 131.2 KB


### 2.4 Reading Statistical Software Files (Stata, SPSS, SAS)
Statistical software files are commonly used in academic research and government institutions.

```python
# Reading Stata files (.dta)
df_stata = pd.read_stata('econometrics_data.dta',
    convert_dates=True,          # Convert Stata date variables to datetime
    preserve_dtypes=True,        # Keep original data types
    convert_categoricals=True    # Convert value-labeled variables to categoricals
)

# Reading SPSS files (.sav)
df_spss = pd.read_spss('survey_data.sav',
    usecols=['income', 'education', 'employment'],  # Select specific columns
    convert_categoricals=True    # Convert categorical variables to pandas categories
)

# Reading SAS files (.sas7bdat)
# With chunksize set, read_sas returns an iterator of DataFrames, not a single DataFrame
sas_reader = pd.read_sas('financial_data.sas7bdat',
    encoding='latin1',         # Specify encoding
    format='sas7bdat',         # SAS file format
    chunksize=10000            # Read file in chunks for large datasets
)
df_sas = pd.concat(sas_reader, ignore_index=True)  # Or process each chunk in a loop
```


**Note:** Only SPSS files require an additional dependency:
  - For SAS: no extra package is needed; pandas ships its own reader for `.sas7bdat` and `.xpt` files
  - For SPSS: `pip install pyreadstat`
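
The `chunksize` option shown above for `read_sas` is also accepted by `read_stata`, and in both cases the call returns a reader that yields DataFrames instead of a single DataFrame. The following is a minimal sketch of chunked reading, assuming a small hypothetical dataset written to a temporary `.dta` file so the example runs without external data.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data written to a temporary .dta file so the example is self-contained
demo = pd.DataFrame({
    "firm": [1, 2, 3, 4, 5],
    "sales": [10.5, 20.0, 15.25, 30.0, 12.75],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    dta_path = os.path.join(tmp_dir, "demo.dta")
    demo.to_stata(dta_path, write_index=False)

    # With chunksize set, read_stata returns a reader that yields DataFrames
    chunk_sizes = []
    total_sales = 0.0
    with pd.read_stata(dta_path, chunksize=2) as reader:
        for chunk in reader:
            chunk_sizes.append(len(chunk))
            total_sales += chunk["sales"].sum()

print(chunk_sizes)
print(total_sales)
```

Accumulating a statistic chunk by chunk keeps memory use bounded by the chunk size, which matters for survey or administrative files that do not fit in memory.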

In [7]:
# Practice: Read a Stata file
stata_file_path = "../datasets/wooldridge-statafiles/APPLE.DTA"
apple_data = pd.read_stata(stata_file_path)

In [8]:
# Check the data info
apple_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660 entries, 0 to 659
Data columns (total 17 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   id        660 non-null    int16  
 1   educ      660 non-null    int8   
 2   date      660 non-null    object 
 3   state     660 non-null    object 
 4   regprc    660 non-null    float32
 5   ecoprc    660 non-null    float32
 6   inseason  660 non-null    int8   
 7   hhsize    660 non-null    int8   
 8   male      660 non-null    int8   
 9   faminc    660 non-null    int16  
 10  age       660 non-null    int8   
 11  reglbs    660 non-null    float32
 12  ecolbs    660 non-null    float32
 13  numlt5    660 non-null    int8   
 14  num5_17   660 non-null    int8   
 15  num18_64  660 non-null    int8   
 16  numgt64   660 non-null    int8   
dtypes: float32(4), int16(2), int8(9), object(2)
memory usage: 29.1+ KB


### Next Steps

In the next lesson, we will read data from common economic databases.