# Data Import/Export: CSV, Excel, JSON, SQL, and APIs in Pandas

## What Is Data Import/Export?

Real-world data doesn't always come as preloaded DataFrames. Most often, data is stored in external formats like CSVs, Excel files, JSON, SQL databases, or served through APIs. Being able to **read from** and **write to** these formats is crucial for any data pipeline.

Pandas has built-in readers and writers for all of these sources, and most of them take only one or two lines of code. Mastering them lets us bring in raw datasets, clean and analyze them, and save our results efficiently.

## Reading Data with Pandas

1. **CSV Files** (Most common format)

In [None]:
import pandas as pd

df = pd.read_csv("data/train.csv")  # Load Titanic data

Other options:

In [None]:
pd.read_csv("data.csv", delimiter=";", encoding="utf-8", nrows=100)
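The options above can be tried without a file on disk. The following is a minimal sketch that reads from an in-memory buffer; the sample rows, the `?` missing-value marker, and the name `df_small` are assumptions made up for illustration, not part of the Titanic dataset files.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample standing in for a file on disk.
raw = io.StringIO(
    "PassengerId;Name;Age;Fare\n"
    "1;Braund;22;7.25\n"
    "2;Cumings;38;71.28\n"
    "3;Heikkinen;?;7.92\n"
    "4;Futrelle;35;53.10\n"
)

df_small = pd.read_csv(
    raw,
    sep=";",                                 # same meaning as delimiter=";"
    usecols=["PassengerId", "Age", "Fare"],  # load only the columns we need
    na_values=["?"],                         # treat "?" as a missing value
    nrows=3,                                 # read only the first three data rows
)
print(df_small)
```

Because of `nrows=3`, the fourth passenger is never read, and the `?` in the `Age` column arrives as `NaN`.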

2. **Excel Files**

In [None]:
# Reading .xlsx files requires the openpyxl package to be installed
df_excel = pd.read_excel("data/titanic.xlsx", sheet_name="Sheet1")

3. **JSON Files**

In [None]:
df_json = pd.read_json("data/titanic.json")

For nested JSON, `pd.json_normalize()` flattens nested objects into ordinary columns.
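As a small illustration of flattening, here is a sketch with two made-up records. The field names (`id`, `name`, `ticket`) are hypothetical and only serve to show how nested keys become dotted column names.

```python
import pandas as pd

# Hypothetical nested records, e.g. as parsed from a JSON file or API response.
records = [
    {"id": 1, "name": "Braund", "ticket": {"class": 3, "fare": 7.25}},
    {"id": 2, "name": "Cumings", "ticket": {"class": 1, "fare": 71.28}},
]

# Each nested key becomes its own column, named "<parent>.<child>" by default.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(flat)
```

The nested `ticket` object ends up as the columns `ticket.class` and `ticket.fare`; the separator can be changed with the `sep` argument.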

4. **SQL Databases**

In [None]:
import sqlite3

conn = sqlite3.connect("data/titanic.db")
df_sql = pd.read_sql_query("SELECT * FROM passengers", conn)
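When a query depends on a value from our code, it is usually safer to pass that value through `params` than to build the SQL string by hand. The sketch below uses an in-memory SQLite database so it runs without `data/titanic.db`; the table contents and the names `conn_demo` and `first_class` are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database with a tiny, made-up passengers table.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE passengers "
    "(PassengerId INTEGER, Name TEXT, Pclass INTEGER, Survived INTEGER)"
)
conn_demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?, ?)",
    [(1, "Braund", 3, 0), (2, "Cumings", 1, 1), (3, "Futrelle", 1, 1)],
)
conn_demo.commit()

# The "?" placeholder is filled from params by the SQLite driver.
first_class = pd.read_sql_query(
    "SELECT PassengerId, Name FROM passengers WHERE Pclass = ?",
    conn_demo,
    params=(1,),
)
conn_demo.close()  # release the connection once the data is loaded
print(first_class)
```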

5. **APIs (Web Data)**

Use the `requests` library to fetch data from web APIs:

In [None]:
import requests

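# Placeholder URL; replace it with a real endpoint that returns JSON
response = requests.get("https://api.example.com/titanic", timeout=10)
response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
data = response.json()  # parse the JSON body into Python objects
df_api = pd.DataFrame(data)  # works when the body is a list of records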
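Many APIs wrap their records in an envelope object instead of returning a bare list. The sketch below simulates such a response with `json.loads` so it runs offline; the payload shape and the `results` key are assumptions, and a real API may use a different key entirely.

```python
import json

import pandas as pd

# Simulated response body; stands in for what response.json() might return.
payload = json.loads(
    '{"count": 2, "results": ['
    '{"id": 1, "name": "Braund", "survived": 0}, '
    '{"id": 2, "name": "Cumings", "survived": 1}]}'
)

# Unwrap the hypothetical "results" key if present, else use the payload as-is.
if isinstance(payload, dict) and "results" in payload:
    records = payload["results"]
else:
    records = payload

df_api_demo = pd.DataFrame(records)
print(df_api_demo)
```

Inspecting the raw payload first (for example with `type(payload)` and `payload.keys()`) is a quick way to find where the records actually live.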

## Exporting Data with Pandas

1. **To CSV**

In [None]:
df.to_csv("data/titanic_clean.csv", index=False)
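To see why `index=False` is used here, the sketch below writes a small made-up DataFrame to CSV text both ways and reads it back. The DataFrame `df_demo` is an assumption for illustration; calling `to_csv()` without a path returns the CSV as a string.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

# Without a path, to_csv() returns the CSV text instead of writing a file.
csv_with_index = df_demo.to_csv()
csv_without_index = df_demo.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(csv_without_index)).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'PassengerId', 'Survived']
print(cols_without)  # ['PassengerId', 'Survived']
```

With the default `index=True`, the row labels are written as an extra unnamed column, which then shows up as `Unnamed: 0` on the next read.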

2. **To Excel**

In [None]:
df.to_excel("data/titanic_clean.xlsx", index=False)

3. **To JSON**

In [None]:
df.to_json("data/titanic_clean.json", orient="records")
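The `orient` argument controls the layout of the JSON output. As a rough sketch on a made-up two-row DataFrame, `orient="records"` produces a JSON array of row objects, and adding `lines=True` writes one object per line (often called JSON Lines).

```python
import json

import pandas as pd

df_demo = pd.DataFrame({"PassengerId": [1, 2], "Name": ["Braund", "Cumings"]})

# Without a path, to_json() returns the JSON text instead of writing a file.
records_json = df_demo.to_json(orient="records")
lines_json = df_demo.to_json(orient="records", lines=True)

print(records_json)  # a single JSON array of row objects
print(lines_json)    # one JSON object per line

# Both layouts carry the same rows.
rows_from_array = json.loads(records_json)
rows_from_lines = [json.loads(line) for line in lines_json.splitlines() if line]
```

The line-per-record layout can be read back with `pd.read_json(..., lines=True)` and is convenient for appending or streaming large outputs.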

4. **To SQL**

In [None]:
df.to_sql("clean_passengers", conn, if_exists="replace", index=False)
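The `if_exists` argument decides what happens when the target table is already there. The sketch below contrasts `"replace"` and `"append"` on an in-memory SQLite database; `df_demo` and `conn_demo` are made-up names for illustration.

```python
import sqlite3

import pandas as pd

df_demo = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
conn_demo = sqlite3.connect(":memory:")

# "replace" drops and recreates the table, so repeated runs keep 2 rows.
df_demo.to_sql("clean_passengers", conn_demo, if_exists="replace", index=False)
df_demo.to_sql("clean_passengers", conn_demo, if_exists="replace", index=False)

# "append" adds rows to the existing table, so the count grows to 4.
df_demo.to_sql("clean_passengers", conn_demo, if_exists="append", index=False)

row_count = int(
    pd.read_sql_query("SELECT COUNT(*) AS n FROM clean_passengers", conn_demo)["n"].iloc[0]
)
conn_demo.close()
print(row_count)  # 4
```

`"replace"` suits pipelines that rebuild a table from scratch on each run, while `"append"` suits incremental loads, at the cost of possible duplicates if a run is repeated.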

## AI/ML Use Case: Data Sources for Modeling

In machine learning workflows:

- We may **ingest data from APIs**, public datasets, or databases.
- We store **cleaned features in `.csv` or `.parquet` files** for modeling.
- Use `.to_sql()` for **tracking pipeline outputs**.
- Export predictions or dashboards as `.xlsx` or `.json`.

Good data I/O (input/output) handling makes our projects reproducible, shareable, and production-ready.

## Exercises

**Q1.** Load the Titanic CSV file into a DataFrame.

In [None]:
df = pd.read_csv("data/train.csv")
print(df.head())

**Q2.** Export the top 100 rows to Excel.

In [None]:
df.head(100).to_excel("data/top100.xlsx", index=False)

**Q3.** Save a JSON file with selected columns.

In [None]:
df[['PassengerId', 'Name', 'Survived']].to_json("data/passengers.json", orient="records")

**Q4.** Load a mock SQL table and query the passengers who survived.

In [None]:
conn = sqlite3.connect("data/titanic.db")
df_survived = pd.read_sql_query("SELECT * FROM passengers WHERE Survived=1", conn)

**Q5.** Load data from a public API and convert it into a DataFrame.

In [None]:
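# Placeholder URL; replace it with a real public endpoint
response = requests.get("https://api.example.com/titanic", timeout=10)
response.raise_for_status()  # stop early on HTTP errors
df_api = pd.DataFrame(response.json())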

## Summary

Pandas makes importing and exporting data simple across the common formats covered here: local files such as CSV, Excel, and JSON, SQL databases, and web APIs. Whether we're loading a dataset for analysis or saving ML model outputs, the `read_*` functions and `to_*` methods give us control over delimiters, encodings, sheet names, JSON layout, and how existing tables are handled.

Knowing the `read_*` functions and `to_*` methods helps us build data workflows that are **automated**, **scalable**, and **ready for production**. For machine learning and analytics, clean I/O is the bridge between raw data and meaningful insights. When saving, use well-formatted files with clear names; this keeps the pipeline reproducible.