
# STEP 11: Save Clean Data (Complete Pandas Guide)

This notebook covers the **practical, commonly used ways** to save cleaned data
after validation.

Focus: **CSV, Excel, JSON, Parquet, pickle, SQL databases, versioning, compression, and best practices**.


In [None]:

import pandas as pd
import numpy as np


## 1. Sample Cleaned Dataset

In [None]:

df = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer": ["Alice", "Bob", "Charlie", "David"],
    "amount": [2500.0, 1800.0, 2200.0, 3000.0],
    "city": ["Mumbai", "Delhi", "Mumbai", "Pune"],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"])
})
df


## 2. Save Data as CSV

In [None]:

# Basic CSV
df.to_csv("clean_data.csv", index=False)

# With custom separator
df.to_csv("clean_data_pipe.csv", sep='|', index=False)
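
CSV stores only text, so column dtypes are not preserved on reload. The sketch below is a minimal round trip under stated assumptions: it uses a two-row stand-in for the sample frame and a temporary directory (both chosen here for illustration) to show that `order_date` comes back as plain strings unless `parse_dates` names it.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for the sample dataset (illustrative values).
df = pd.DataFrame({
    "order_id": [1001, 1002],
    "amount": [2500.0, 1800.0],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clean_data.csv"
    df.to_csv(path, index=False)

    # Without parse_dates the date column is read back as strings.
    naive = pd.read_csv(path)
    # Naming the column in parse_dates restores a datetime dtype.
    restored = pd.read_csv(path, parse_dates=["order_date"])

naive_is_datetime = pd.api.types.is_datetime64_any_dtype(naive["order_date"])
restored_is_datetime = pd.api.types.is_datetime64_any_dtype(restored["order_date"])
print(naive_is_datetime, restored_is_datetime)
```

If the reloaded dtypes matter downstream, it may be worth recording the expected `parse_dates` columns next to the file, or using a typed format such as Parquet.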


## 3. Save Data as Excel

In [None]:

# Single sheet
df.to_excel("clean_data.xlsx", index=False)

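# Multiple sheets: one ExcelWriter, one to_excel() call per sheet
with pd.ExcelWriter("clean_data_multi.xlsx") as writer:
    df.to_excel(writer, sheet_name="orders", index=False)
    # Second sheet: total amount per city
    df.groupby("city", as_index=False)["amount"].sum().to_excel(
        writer, sheet_name="city_totals", index=False
    )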


## 4. Save Data as JSON

In [None]:

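# lines=True writes JSON Lines: one record per line, not a single JSON array.
# Read it back with pd.read_json("clean_data.json", lines=True).
df.to_json("clean_data.json", orient='records', lines=True)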
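
A JSON Lines file has to be read back with the same `lines=True` flag, and date columns need some care. The following is a rough sketch, assuming a two-row stand-in frame, a temporary directory, and the `.jsonl` suffix (all illustrative choices). It passes `date_format="iso"` so timestamps are written as ISO 8601 text, and names the date column in `convert_dates` on read.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for the sample dataset (illustrative values).
df = pd.DataFrame({
    "order_id": [1001, 1002],
    "customer": ["Alice", "Bob"],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clean_data.jsonl"
    # date_format="iso" writes human-readable timestamps.
    df.to_json(path, orient="records", lines=True, date_format="iso")

    # Each line of the file is one standalone JSON object.
    first_line = path.read_text().splitlines()[0]

    # lines=True is required again on read; convert_dates names the column to parse.
    restored = pd.read_json(path, lines=True, convert_dates=["order_date"])

print(first_line)
print(restored.dtypes)
```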


## 5. Save Data as Parquet (Big Data)

In [None]:

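# Requires pyarrow or fastparquet; uncomment once one of them is installed.
# Parquet is columnar and stores column dtypes alongside the data.
# df.to_parquet("clean_data.parquet", index=False)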


## 6. Save Data as Pickle (Fast, Python-only)

In [None]:

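# Preserves dtypes exactly, but only unpickle files you trust:
# loading a pickle can execute arbitrary code.
df.to_pickle("clean_data.pkl")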


## 7. Save Data to SQL Database

In [None]:

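# Requires SQLAlchemy; uncomment once it is installed.
# if_exists='replace' drops the existing table first; use 'append' to add rows.
# from sqlalchemy import create_engine
# engine = create_engine("sqlite:///clean_data.db")
# df.to_sql("orders", engine, if_exists='replace', index=False)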
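
For a runnable variant without SQLAlchemy, `to_sql` also accepts a standard-library `sqlite3` connection. The sketch below swaps in an in-memory SQLite database and a two-row stand-in frame (both assumptions made for illustration) to contrast `if_exists="replace"` with `if_exists="append"`.

```python
import sqlite3

import pandas as pd

# Small stand-in for the sample dataset (illustrative values).
df = pd.DataFrame({
    "order_id": [1001, 1002],
    "city": ["Mumbai", "Delhi"],
    "amount": [2500.0, 1800.0],
})

conn = sqlite3.connect(":memory:")
try:
    # 'replace' drops and recreates the table if it already exists.
    df.to_sql("orders", conn, if_exists="replace", index=False)
    # 'append' adds rows to the existing table and keeps earlier ones.
    df.to_sql("orders", conn, if_exists="append", index=False)

    row_count = int(pd.read_sql("SELECT COUNT(*) AS n FROM orders", conn)["n"].iloc[0])
    total = float(pd.read_sql("SELECT SUM(amount) AS total FROM orders", conn)["total"].iloc[0])
finally:
    conn.close()

print(row_count, total)
```

Because `'append'` does not check for duplicates, re-running a load with it doubles the rows, which is why the count here is twice the frame length.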


## 8. Versioning Clean Data

In [None]:

# Example naming strategy
version = "v1_0"
file_name = f"clean_data_{version}.csv"
df.to_csv(file_name, index=False)
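
One way to make the naming strategy safer is to add a date stamp and refuse to overwrite an existing version. The helper below, `save_versioned`, is hypothetical (it is not part of pandas), and the file-name pattern and temporary directory are illustrative assumptions.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def save_versioned(df: pd.DataFrame, out_dir: Path, stem: str, version: str) -> Path:
    """Write df to <stem>_<version>_<UTC date>.csv, refusing to overwrite."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    path = out_dir / f"{stem}_{version}_{stamp}.csv"
    if path.exists():
        raise FileExistsError(f"{path} already exists; bump the version instead")
    df.to_csv(path, index=False)
    return path


df = pd.DataFrame({"order_id": [1001, 1002], "amount": [2500.0, 1800.0]})

with tempfile.TemporaryDirectory() as tmp:
    saved = save_versioned(df, Path(tmp), "clean_data", "v1_0")
    saved_name = saved.name
    try:
        # A second save with the same version on the same day is rejected.
        save_versioned(df, Path(tmp), "clean_data", "v1_0")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True

print(saved_name, overwrite_blocked)
```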


## 9. Compression while Saving

In [None]:

df.to_csv("clean_data.csv.gz", compression='gzip', index=False)
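
To see what compression buys, the sketch below writes the same frame with and without gzip and compares file sizes. It assumes a synthetic 10,000-row frame with repetitive values (made up for illustration, since the four-row sample is too small to compress meaningfully) and relies on `compression` defaulting to `"infer"`, so the `.gz` suffix alone selects gzip.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Larger, repetitive frame so the size difference is visible (synthetic data).
df = pd.DataFrame({
    "order_id": range(10_000),
    "city": ["Mumbai", "Delhi", "Pune", "Mumbai"] * 2_500,
    "amount": [2500.0, 1800.0, 2200.0, 3000.0] * 2_500,
})

with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "clean_data.csv"
    gz_path = Path(tmp) / "clean_data.csv.gz"

    df.to_csv(raw_path, index=False)
    # No compression argument: gzip is inferred from the .gz suffix.
    df.to_csv(gz_path, index=False)

    raw_size = raw_path.stat().st_size
    gz_size = gz_path.stat().st_size

    # read_csv infers gzip from the suffix as well.
    restored = pd.read_csv(gz_path)

print(f"raw: {raw_size} bytes, gzip: {gz_size} bytes")
```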



## ✅ Best Practices & Interview Notes
- Always save AFTER final validation
- Prefer Parquet for analytics (columnar, keeps dtypes); use CSV when other tools or people need to open the file
- Use versioning for reproducibility
- Never overwrite raw data
- Compression helps with large datasets
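
The "save after validation" rule can be enforced in code rather than by convention. Below is a minimal sketch: `validation_errors` and `save_if_valid` are hypothetical helpers, and the three checks (no missing values, unique `order_id`, positive `amount`) are assumed rules for this sample schema, not a complete validation suite.

```python
import tempfile
from pathlib import Path

import pandas as pd


def validation_errors(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame may be saved."""
    errors = []
    if df.isna().any().any():
        errors.append("missing values present")
    if df["order_id"].duplicated().any():
        errors.append("duplicate order_id")
    if (df["amount"] <= 0).any():
        errors.append("non-positive amount")
    return errors


def save_if_valid(df: pd.DataFrame, path: Path) -> bool:
    """Write df to path only when validation passes; report whether it was saved."""
    if validation_errors(df):
        return False
    df.to_csv(path, index=False)
    return True


good = pd.DataFrame({"order_id": [1001, 1002], "amount": [2500.0, 1800.0]})
bad = pd.DataFrame({"order_id": [1001, 1001], "amount": [2500.0, -5.0]})

with tempfile.TemporaryDirectory() as tmp:
    good_path = Path(tmp) / "clean_good.csv"
    bad_path = Path(tmp) / "clean_bad.csv"

    good_saved = save_if_valid(good, good_path)
    bad_saved = save_if_valid(bad, bad_path)

    good_exists = good_path.exists()
    bad_exists = bad_path.exists()

bad_errors = validation_errors(bad)
print(good_saved, bad_saved, bad_errors)
```

Writing to a separate output path, as here, also keeps the raw input file untouched.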



## ✔ Summary
- `DataFrame.to_*()` methods (`to_csv`, `to_excel`, `to_json`, `to_parquet`, `to_pickle`, `to_sql`) export data
- Choose format based on use-case
- Versioning keeps outputs reproducible; compression reduces file size for large datasets
- Saving is the final gate of data cleaning
