# 5. Importing and Exporting Data

In this notebook, we will explore how to import data from various sources (CSV, Excel, JSON, SQL, APIs, and binary formats) into Pandas DataFrames and export data to different formats. We will also discuss customizing import/export options and handling compression.

## Topics Covered:
- Reading data from various formats: CSV, Excel, JSON, Parquet, SQL, HTML, APIs, Binary formats (Feather, ORC, HDF5)
- Customizing import/export options
- Handling compression (gzip, bz2, etc.)
- Practical: Import/export with various formats


## Reading Data from CSV

Comma-Separated Values (CSV) files are one of the most common formats for tabular data.
We will use the COVID-19 dataset from Indonesia as an example.

### Steps:
1. Use the `read_csv` function from Pandas.
2. Specify the file path.
3. Use parameters like `sep` (alias `delimiter`), `header`, and `index_col` as needed.

In [None]:
import pandas as pd

# Reading data from a CSV file
csv_file = '../DataSets/Data_COVID19_Indonesia.csv'
try:
    covid_data = pd.read_csv(csv_file)
    print(covid_data.head())
except FileNotFoundError:
    print(f'Error: The file {csv_file} does not exist. Please check the file path.')
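
The parameters named in step 3 are not used in the cell above, so here is a minimal sketch of them on an in-memory sample. The sample text and its column names are hypothetical and are not taken from the COVID-19 dataset.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data with a title line above the header row
raw_csv = """Daily report
date;province;new_cases
2021-01-01;Jakarta;120
2021-01-02;Jakarta;135
"""

sample = pd.read_csv(
    io.StringIO(raw_csv),
    sep=';',           # field delimiter (`delimiter` is an alias)
    header=1,          # column names are on the second line (0-based row 1)
    index_col='date',  # use the 'date' column as the row index
)
print(sample)
```

Reading from `io.StringIO` keeps the example independent of any file on disk; the same arguments apply when a file path is passed instead.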

## Reading Data from Excel

Excel files are often used for sharing structured data. Pandas provides the `read_excel` function to work with Excel files.

We will use the IMF Investment and Capital Stock dataset as an example.

### Steps:
1. Use the `read_excel` function from Pandas.
2. Specify the sheet name using the `sheet_name` parameter.
3. Check for missing values and column types.

In [None]:
# Reading data from an Excel file
excel_file = '../DataSets/IMFInvestmentandCapitalStockDataset2021.xlsx'
try:
    imf_data = pd.read_excel(excel_file, sheet_name='Datasets')
    print(imf_data.head())
except FileNotFoundError:
    print(f'Error: The file {excel_file} does not exist. Please check the file path.')
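
Step 3 (checking missing values and column types) could look roughly like this. The small DataFrame is a hypothetical stand-in for a freshly loaded sheet, so the sketch runs without the Excel file; with the real data you would call the same methods on `imf_data`.

```python
import numpy as np
import pandas as pd

# Stand-in for a freshly loaded sheet; the column names are hypothetical
sheet = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'year': [2019, 2019, 2019],
    'investment': [1.5, np.nan, 3.0],
})

missing_per_column = sheet.isna().sum()  # count of missing values in each column
print(missing_per_column)
print(sheet.dtypes)                      # inferred type of each column
```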

## Reading Data from JSON

JSON (JavaScript Object Notation) is widely used for APIs and web data. Pandas can easily handle JSON files using the `read_json` function.

We will use a mock signup dataset in JSON format as an example.

### Steps:
1. Use the `read_json` function from Pandas.
2. Specify the file path.
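3. Use the `orient` parameter to tell Pandas how the JSON is laid out (for example `records` or `index`). For deeply nested JSON, flatten it with `pd.json_normalize()` instead.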

In [None]:
# Reading data from a JSON file
json_file = '../DataSets/users.JSON'
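try:
    json_data = pd.read_json(json_file)
    print(json_data.head())
except FileNotFoundError:
    print(f'Error: The file {json_file} does not exist. Please check the file path.')
except ValueError:
    # Raised for malformed JSON (some older pandas versions also raise it for a missing file)
    print(f'Error: Unable to parse JSON file {json_file}. Check the file content.')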
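
To illustrate what `orient` changes, the sketch below reads the same two records from two different JSON layouts. The records are made up for illustration and do not come from `users.JSON`.

```python
import io
import pandas as pd

# The same two hypothetical records in two different JSON layouts
records_json = '[{"name": "Ana", "age": 30}, {"name": "Budi", "age": 25}]'
index_json = '{"u1": {"name": "Ana", "age": 30}, "u2": {"name": "Budi", "age": 25}}'

# A list of row objects
by_records = pd.read_json(io.StringIO(records_json), orient='records')
# An object keyed by row label
by_index = pd.read_json(io.StringIO(index_json), orient='index')

print(by_records)
print(by_index)
```

With `orient='index'` the outer keys (`u1`, `u2`) become row labels; without it, Pandas would treat them as column names.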

## Reading Data from SQL

Structured Query Language (SQL) databases are widely used for storing large datasets. Pandas provides the `read_sql_query` function to query databases.

We will use a mock signup dataset stored in an SQL database as an example.

### Steps:
1. Use the `sqlite3` library to connect to the database.
2. Write a SQL query to fetch data.
3. Use the `read_sql_query` function to load the data into a Pandas DataFrame.

In [None]:
# Reading data from an SQL database
import sqlite3

# Connect to the database
sql_file = '../DataSets/mock_signup.db'
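try:
    connection = sqlite3.connect(sql_file)
    try:
        sql_query = 'SELECT * FROM mock_signup'
        sql_data = pd.read_sql_query(sql_query, connection)
        print(sql_data.head())
    finally:
        # Always release the database handle, even if the query fails
        connection.close()
except (sqlite3.Error, pd.errors.DatabaseError) as exc:
    # pandas wraps failed queries in pd.errors.DatabaseError (pandas 1.5 or later)
    print(f'Error: Unable to query the database {sql_file}: {exc}')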
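
When the query depends on a value, passing it through `params` is usually safer than building the SQL string by hand. The sketch below uses a throwaway in-memory SQLite database; the table name `signups` and its columns are hypothetical, not the schema of `mock_signup.db`.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Build a throwaway in-memory database; table and column names are hypothetical
with closing(sqlite3.connect(':memory:')) as conn:
    pd.DataFrame({
        'name': ['Ana', 'Budi', 'Citra'],
        'plan': ['free', 'pro', 'pro'],
    }).to_sql('signups', conn, index=False)

    # The '?' placeholder is filled from `params`, so the value is never
    # pasted into the SQL string
    pro_users = pd.read_sql_query(
        'SELECT name FROM signups WHERE plan = ? ORDER BY name',
        conn,
        params=('pro',),
    )

print(pro_users)
```

`contextlib.closing` closes the connection when the block ends, which a bare `with sqlite3.connect(...)` does not do.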

## Reading Data from APIs

APIs (Application Programming Interfaces) provide dynamic data access over the web. We use the `requests` library to fetch data and convert it into a DataFrame.

### Steps:
1. Use the `requests.get` function to fetch JSON data from an API.
2. Convert the JSON response into a Pandas DataFrame using `pd.json_normalize()`.



In [None]:
import requests

# Fetch data from a sample API
api_url = 'https://jsonplaceholder.typicode.com/posts'
response = requests.get(api_url, timeout=10)  # fail after 10 seconds instead of waiting indefinitely

if response.status_code == 200:
    api_data = response.json()
    df_api = pd.json_normalize(api_data)
    print('Data fetched from API:')
    print(df_api.head())
else:
    print(f'Failed to fetch data. HTTP Status Code: {response.status_code}')
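
Many APIs wrap their records in an outer object and nest further objects inside each record. The sketch below shows one way `pd.json_normalize()` can flatten such a response. The payload is a hypothetical example written inline (no network call), and its field names are assumptions.

```python
import pandas as pd

# Hypothetical API payload: records sit under "results", each with a nested "user"
payload = {
    'page': 1,
    'results': [
        {'id': 1, 'user': {'name': 'Ana', 'city': 'Bandung'}},
        {'id': 2, 'user': {'name': 'Budi', 'city': 'Medan'}},
    ],
}

flat = pd.json_normalize(
    payload,
    record_path='results',  # the list that holds one dict per row
    meta=['page'],          # top-level field repeated on every row
    sep='_',                # join nested keys as user_name, user_city
)
print(flat)
```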

## Reading and Writing Binary Formats

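Binary file formats such as Feather, Parquet, ORC, and HDF5 are optimized for fast reading and writing, and Pandas supports them for efficient data storage and retrieval. They rely on optional packages: `pyarrow` for Feather and ORC, `pyarrow` or `fastparquet` for Parquet, and PyTables (`tables`) for HDF5.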

### Steps:
1. Use DataFrame methods like `to_feather`, `to_parquet`, and `to_hdf` to write binary files.
2. Use corresponding `read_` functions like `read_feather`, `read_parquet`, and `read_hdf` to load them.



In [None]:
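import os

# Writing and reading a Feather file
feather_file = '../tmp/data.feather'
os.makedirs(os.path.dirname(feather_file), exist_ok=True)  # create ../tmp if it is missing
df_api.to_feather(feather_file)
print(f'Data written to Feather format at {feather_file}')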

# Reading a Feather file
df_feather = pd.read_feather(feather_file)
print('Data read from Feather format:')
print(df_feather.head())
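
The steps above also mention `to_parquet` and `read_parquet`, so here is a small round-trip sketch for Parquet. It writes to a temporary directory and falls back to a message when no Parquet engine is installed; the sample DataFrame is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'title': ['a', 'b', 'c']})
restored = None

with tempfile.TemporaryDirectory() as tmp_dir:
    parquet_file = os.path.join(tmp_dir, 'data.parquet')
    try:
        df.to_parquet(parquet_file)               # needs pyarrow or fastparquet
        restored = pd.read_parquet(parquet_file)  # load it straight back
    except ImportError as exc:
        print(f'Parquet engine not installed: {exc}')

if restored is not None:
    print(restored.dtypes)  # column types survive the round trip
```

Unlike CSV, Parquet stores column types in the file, so there is no need to re-specify them when reading.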

## Exporting Data to CSV, Excel, and Other Formats

Pandas makes it easy to export data to different formats using methods like `to_csv` and `to_excel`.

### Steps:
1. Use the appropriate method based on the desired format.
2. Specify the file path and optional parameters like `index`.

In [None]:
# Exporting data to a CSV file
output_csv_file = '../tmp/exported_covid_data.csv'
covid_data.to_csv(output_csv_file, index=False)
print(f'Data exported to {output_csv_file}')

# Exporting data to an Excel file
output_excel_file = '../tmp/exported_imf_data.xlsx'
imf_data.to_excel(output_excel_file, index=False)
print(f'Data exported to {output_excel_file}')
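
Besides `index`, `to_csv` accepts options such as `columns` and `sep`. The sketch below uses a small hypothetical DataFrame and returns the CSV text rather than writing a file, which makes the effect of each option easy to inspect.

```python
import pandas as pd

df = pd.DataFrame(
    {'province': ['Jakarta', 'Bali'], 'cases': [120, 45], 'note': ['x', 'y']}
)

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(
    index=False,                    # leave out the 0, 1, ... row labels
    columns=['province', 'cases'],  # export only these columns
    sep='\t',                       # tab-separated output
)
print(csv_text)
```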

## Handling File Paths and Common Errors

When working with files, common errors include:

- **FileNotFoundError**: Ensure the file path is correct. Use absolute paths if needed.
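- **ImportError**: Some formats need an optional package (for example `openpyxl` for `.xlsx` files or `pyarrow` for Feather and Parquet). Install the dependency named in the error message.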
- **PermissionError**: Check if you have write permissions for the directory when exporting files.
- **ValueError**: Raised when the content does not match what the function expects, such as malformed JSON passed to `read_json`. Inspect the file and the reader's parameters.
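
One possible way to handle several of these errors in one place is a small wrapper around `read_csv`. The helper name `load_csv_or_none` is hypothetical, and the sketch also catches two pandas-specific parsing errors (`EmptyDataError`, `ParserError`) that are subclasses of `ValueError`.

```python
import io
import pandas as pd

def load_csv_or_none(source):
    """Return a DataFrame, or None when the source cannot be read."""
    try:
        return pd.read_csv(source)
    except FileNotFoundError:
        print(f'Not found: {source}')
    except PermissionError:
        print(f'No permission to read: {source}')
    except pd.errors.EmptyDataError:
        print('The source contains no data to parse.')
    except pd.errors.ParserError as exc:
        print(f'Malformed CSV: {exc}')
    return None

missing = load_csv_or_none('no_such_file_12345.csv')  # path does not exist
empty = load_csv_or_none(io.StringIO(''))             # nothing to parse
ok = load_csv_or_none(io.StringIO('a,b\n1,2\n'))      # valid input
```

Returning `None` keeps the notebook running, but the caller then has to check the result before using it.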

## Handling Compression

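Compression reduces file size for storage and transmission. Pandas supports compression formats such as gzip, bz2, zip, and xz for both import and export. With the default `compression='infer'`, the format is detected from the file extension (for example `.gz`), so passing `compression` explicitly is only needed when the extension does not match.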

### Example:
- Reading a compressed file using `read_csv` with the `compression` parameter.
- Exporting a DataFrame to a compressed file using `to_csv` with `compression`.



In [None]:
# Reading compressed data
compressed_input = '../DataSets/compressed_data.csv.gz'
try:
    compressed_data = pd.read_csv(compressed_input, compression='gzip')
    print('Data read from compressed file:')
    print(compressed_data.head())
except FileNotFoundError:
    print(f'Error: The file {compressed_input} does not exist.')
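
The cell above depends on a file that may not be present, so here is a self-contained round trip as a sketch. It writes the same hypothetical data as plain and gzip-compressed CSV in a temporary directory, compares the sizes, and reads the compressed copy back.

```python
import os
import tempfile

import pandas as pd

# Repetitive stand-in data compresses well
df = pd.DataFrame({'province': ['Jakarta'] * 1000, 'cases': range(1000)})

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'data.csv')
    gz_path = os.path.join(tmp_dir, 'data.csv.gz')

    df.to_csv(plain_path, index=False)
    df.to_csv(gz_path, index=False)  # '.gz' suffix selects gzip (compression='infer')

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)
    round_trip = pd.read_csv(gz_path)  # decompressed transparently on read

print(f'plain: {plain_size} bytes, gzip: {gz_size} bytes')
```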

## Customizing Import and Export Options

You can customize import/export options to handle specific requirements, such as setting delimiters, encoding, column selection, or compression.

### Example:
- Setting a custom delimiter while reading CSV files using the `sep` parameter.
- Exporting data with specific encoding (e.g., UTF-8).



In [None]:
# Customizing import options
csv_file = '../DataSets/custom_delimiter_data.csv'
try:
    custom_data = pd.read_csv(csv_file, sep=';', encoding='utf-8')
    print('Data read with custom delimiter:')
    print(custom_data.head())
except FileNotFoundError:
    print(f'Error: The file {csv_file} does not exist.')

# Exporting data with an explicit encoding and gzip compression
compressed_file = '../tmp/compressed_data.csv.gz'
df_api.to_csv(compressed_file, index=False, encoding='utf-8', compression='gzip')
print(f'Data exported with gzip compression to {compressed_file}')
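
Column selection and encoding are mentioned above but not demonstrated, so the sketch below covers both. The input text is hypothetical; `usecols` and `dtype` are standard `read_csv` parameters, and `utf-8-sig` is the Python codec that prefixes the file with a byte-order mark.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical semicolon-delimited data with zero-padded codes and non-ASCII names
raw = 'code;name;score\n007;Zoë;3.5\n012;José;4.0\n'

subset = pd.read_csv(
    io.StringIO(raw),
    sep=';',
    usecols=['code', 'name'],  # load only these columns
    dtype={'code': str},       # keep leading zeros instead of parsing as int
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'names.csv')
    # 'utf-8-sig' writes a byte-order mark, which helps Excel detect UTF-8
    subset.to_csv(out_path, index=False, encoding='utf-8-sig')
    with open(out_path, 'rb') as fh:
        first_bytes = fh.read(3)

print(subset)
```

Without `dtype={'code': str}`, the codes would be parsed as the integers 7 and 12 and the leading zeros would be lost.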

## Practical: Reading and Saving Datasets

Let’s combine the techniques above: read a dataset, filter it, and export the result in several formats, including a compressed one.

### Steps:
1. Read a CSV file using `read_csv`.
2. Filter rows based on specific conditions.
3. Export the filtered data to multiple formats (CSV, Feather, gzip-compressed CSV).



In [None]:
# Practical example
# Filter COVID data for provinces with more than 10,000 total cases
filtered_data = covid_data[covid_data['Total Cases'] > 10000]

# Export filtered data to CSV
filtered_csv = '../tmp/filtered_covid_data.csv'
filtered_data.to_csv(filtered_csv, index=False)
print(f'Filtered data exported to CSV at {filtered_csv}')

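# Export filtered data to Feather format
# Filtering leaves gaps in the index, and older pandas versions can only write a default index to Feather
filtered_feather = '../tmp/filtered_covid_data.feather'
filtered_data.reset_index(drop=True).to_feather(filtered_feather)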
print(f'Filtered data exported to Feather format at {filtered_feather}')

# Export filtered data to compressed CSV
filtered_compressed = '../tmp/filtered_covid_data.csv.gz'
filtered_data.to_csv(filtered_compressed, index=False, compression='gzip')
print(f'Filtered data exported with compression to {filtered_compressed}')
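
The same workflow can be wrapped in a reusable function that also creates the output folder and verifies the exports by reading them back. This is only a sketch: the helper `export_filtered` and the stand-in data are hypothetical, and Feather is left out so the example needs no optional packages.

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_filtered(df, column, threshold, out_dir):
    """Filter rows where `column` > `threshold` and write CSV and gzip copies."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    filtered = df[df[column] > threshold]
    paths = [out_dir / 'filtered.csv', out_dir / 'filtered.csv.gz']
    for path in paths:
        filtered.to_csv(path, index=False)      # gzip is inferred from '.gz'
    return paths

# Stand-in data; the notebook itself uses covid_data and 'Total Cases'
demo = pd.DataFrame({'Location': ['A', 'B', 'C'], 'Total Cases': [500, 20000, 15000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    written = export_filtered(demo, 'Total Cases', 10000, Path(tmp_dir) / 'exports')
    row_counts = [len(pd.read_csv(path)) for path in written]  # read each file back

print(row_counts)
```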