# Pickle

In [1]:
import pandas as pd

1. Writing a DataFrame to a Pickle File

In [2]:
df = pd.DataFrame({
    "Name" : ["Hassam", "Ahmad"],
    "Age" : [24, 23]
})

df.to_pickle("Data_pickle.pkl")
df

     Name  Age
0  Hassam   24
1   Ahmad   23

2. Reading a Pickle File

In [11]:
df2 = pd.read_pickle("Data_pickle.pkl")
df2

     Name  Age
0  Hassam   24
1   Ahmad   23
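To check that nothing is lost on the way, a round trip can be verified explicitly. This is a minimal sketch; it writes to a temporary directory (an assumption made here so the example leaves no files behind) instead of the working directory used above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Hassam", "Ahmad"], "Age": [24, 23]})

# Write and read inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Data_pickle.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() checks shape, values, and column dtypes.
same = df.equals(restored)
dtypes_match = bool((df.dtypes == restored.dtypes).all())
print(same, dtypes_match)
```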


COMPRESSION SUPPORT

In [None]:
df.to_pickle("Data_pickle.pkl.gz", compression="gzip")
df.to_pickle("Data_pickle.pkl.bz2", compression="bz2")
df.to_pickle("Data_pickle.pkl.xz", compression="xz")

Reading needs no extra arguments, because read_pickle() infers the compression from the file extension:

In [None]:
pd.read_pickle("Data_pickle.pkl.gz")
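The inference works in both directions, since `compression="infer"` is the default for `to_pickle()` and `read_pickle()`. The sketch below relies on that default and checks the gzip magic bytes to confirm the file really is compressed; the temporary directory is an assumption of this example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Hassam", "Ahmad"], "Age": [24, 23]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Data_pickle.pkl.gz")
    df.to_pickle(path)                 # gzip is inferred from the ".gz" suffix
    with open(path, "rb") as f:
        magic = f.read(2)              # gzip streams start with bytes 1f 8b
    restored = pd.read_pickle(path)    # decompression is inferred the same way

is_gzip = magic == b"\x1f\x8b"
print(is_gzip, df.equals(restored))
```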

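Sometimes you want to store multiple objects in one file. Each pickle.dump() call appends one object, and pickle.load() returns them one at a time in the same order: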

In [12]:
import pickle

with open("Data_pickle.pkl", "wb") as f:
    pickle.dump(df, f)
    pickle.dump([1,2,3], f)
    pickle.dump({"a": 10}, f)

In [14]:
with open("Data_pickle.pkl", "rb") as f:
    df1 = pickle.load(f)
    lst = pickle.load(f)
    dct = pickle.load(f)
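When the number of stored objects is not known in advance, one option is to keep calling `pickle.load()` until it raises `EOFError`. The helper name `load_all` is hypothetical, and an in-memory buffer stands in for a real file.

```python
import io
import pickle


def load_all(fileobj):
    """Yield every pickled object in the stream until it is exhausted."""
    while True:
        try:
            yield pickle.load(fileobj)
        except EOFError:
            # pickle.load() raises EOFError when no data is left.
            return


# An in-memory buffer stands in for a file opened in binary mode.
buffer = io.BytesIO()
for obj in ([1, 2, 3], {"a": 10}, "done"):
    pickle.dump(obj, buffer)

buffer.seek(0)
objects = list(load_all(buffer))
print(objects)
```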

STORE ANY PYTHON OBJECT IN A DATAFRAME AND PICKLE IT

In [15]:
df = pd.DataFrame({
    "objects": [ [1,2,3], {"a": 1}, (4,5,6) ]
})

df.to_pickle("weird_pickle.pkl")



Pickle can serialize almost any Python object, including the lists, dicts, and tuples stored in the cells above.
CSV and JSON cannot round-trip arbitrary Python objects.
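The difference shows up when the same frame is written both ways and read back. This is a small sketch using temporary files (an assumption of the example): the pickle returns the original Python objects, while the CSV returns their text form.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"objects": [[1, 2, 3], {"a": 1}, (4, 5, 6)]})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "objects.pkl")
    csv_path = os.path.join(tmp, "objects.csv")
    df.to_pickle(pkl_path)
    df.to_csv(csv_path, index=False)
    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle keeps the Python types; CSV hands back strings such as "[1, 2, 3]".
pickle_types = [type(v).__name__ for v in from_pickle["objects"]]
csv_types = [type(v).__name__ for v in from_csv["objects"]]
print(pickle_types)
print(csv_types)
```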

#### Why Pickle is Faster Than CSV, JSON, Excel

Pickle serializes the DataFrame's in-memory data as binary, with almost no conversion.

CSV must: convert every value to text, and handle separators and quoting.

JSON must: create nested structures, and escape and encode data.

Excel must: build XML structures, embed styles, and handle cells individually.

#### Pickle → binary dump → instant restore.
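A rough way to see the difference on your own machine is to time both formats on the same frame. The frame size, column names, and the `timed` helper below are assumptions made for illustration, and the numbers will vary with hardware and pandas version, so no expected timings are shown.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50_000, 5)), columns=list("abcde"))


def timed(func):
    """Run func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    pkl = os.path.join(tmp, "frame.pkl")
    csv = os.path.join(tmp, "frame.csv")
    _, pkl_write = timed(lambda: df.to_pickle(pkl))
    _, csv_write = timed(lambda: df.to_csv(csv, index=False))
    from_pkl, pkl_read = timed(lambda: pd.read_pickle(pkl))
    from_csv, csv_read = timed(lambda: pd.read_csv(csv))

print(f"pickle: write {pkl_write:.4f}s, read {pkl_read:.4f}s")
print(f"csv:    write {csv_write:.4f}s, read {csv_read:.4f}s")
```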

Security Threat

If you load an untrusted pickle file: pd.read_pickle("virus.pkl")

That file can execute arbitrary code on your system.
Pickle is unsafe by design: loading a file can call any function the file names, so only unpickle data from sources you trust.
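The mechanism can be illustrated with a harmless payload. The `Payload` class below is hypothetical; its `__reduce__` method tells pickle which callable to run at load time. Here that callable is `print`, but a malicious file could name any function.

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle stores "call print(...) with this argument" and
        # performs that call while loading.
        return (print, ("this ran during unpickling",))


blob = pickle.dumps(Payload())

# Loading runs print(); the "object" we get back is just its return value.
result = pickle.loads(blob)
```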

# Flat Files

1. read_table()

read_table() reads any text file that uses a delimiter between fields.

It is very similar to read_csv() but defaults to tab-separated values (\t).

In [None]:
df = pd.read_table("data.txt")
print(df)

If the file is tab-delimited, this works out of the box.
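The relationship to `read_csv()` can be checked directly. This sketch uses made-up tab-separated text held in a `StringIO` buffer instead of a file on disk.

```python
import io

import pandas as pd

# Made-up sample data; "\t" separates the fields.
text = "name\tage\nHassam\t24\nAhmad\t23\n"

from_table = pd.read_table(io.StringIO(text))
from_csv = pd.read_csv(io.StringIO(text), sep="\t")

print(from_table.equals(from_csv))
```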

Custom Delimiter

df = pd.read_table("data.txt", sep="|")

Specify Column Names

df = pd.read_table("data.txt", sep=",", names=["A", "B", "C"])
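One detail worth knowing: when `names` is passed without `header`, pandas treats every line as data, so an existing header row ends up as the first record. A sketch with made-up data, where `header=0` replaces the header row instead:

```python
import io

import pandas as pd

# Made-up sample whose first line is a header row.
text = "x,y,z\n1,2,3\n4,5,6\n"

# names alone: the original header line is kept as a data row.
kept = pd.read_table(io.StringIO(text), sep=",", names=["A", "B", "C"])

# names plus header=0: the original header line is replaced.
replaced = pd.read_table(
    io.StringIO(text), sep=",", names=["A", "B", "C"], header=0
)

print(len(kept), len(replaced))
```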

Skip Rows

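Useful when a file starts with comment or metadata lines. With skiprows=3, the first three lines are skipped and the fourth is read as the header.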

In [None]:
df = pd.read_table("data.txt", skiprows=3)
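A small sketch of the effect, using made-up text with three metadata lines. When those lines share a marker such as `#`, the `comment` parameter is an alternative that does not depend on counting lines.

```python
import io

import pandas as pd

# Made-up file content: three metadata lines, then a tab-separated table.
text = (
    "# exported by a demo tool\n"
    "# source: sample\n"
    "# units: years\n"
    "name\tage\n"
    "Hassam\t24\n"
)

by_skip = pd.read_table(io.StringIO(text), skiprows=3)
by_comment = pd.read_table(io.StringIO(text), comment="#")

print(by_skip.equals(by_comment))
```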

Handle Missing Values

In [None]:
df = pd.read_table("data.txt", na_values=["NA", "missing", "---"])
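To illustrate, the sketch below parses made-up text in which each of the three markers appears once. All of them become NaN, so the column is read as numeric.

```python
import io

import pandas as pd

# Made-up sample: three different spellings of "no value".
text = "name\tscore\nA\t10\nB\tmissing\nC\t---\nD\tNA\n"

df = pd.read_table(io.StringIO(text), na_values=["NA", "missing", "---"])

missing_mask = df["score"].isna().tolist()
total = df["score"].sum()  # NaN values are skipped by sum()
print(missing_mask, total)
```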

High-Performance Chunk Reading

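For huge files, chunksize makes read_table() return an iterator of DataFrames instead of loading everything at once:

In [None]:
df_iter = pd.read_table("bigfile.txt", sep="\t", chunksize=100000)

for chunk in df_iter:
    process(chunk)  # process() is a placeholder for your own per-chunk logic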
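A typical per-chunk step is to accumulate a running aggregate so that only one chunk is in memory at a time. The sketch below uses ten made-up rows and a tiny `chunksize` so the behavior is easy to follow.

```python
import io

import pandas as pd

# Made-up data: 10 rows, where value = 2 * id.
text = "id\tvalue\n" + "".join(f"{i}\t{i * 2}\n" for i in range(10))

total = 0
chunk_sizes = []
for chunk in pd.read_table(io.StringIO(text), sep="\t", chunksize=4):
    total += chunk["value"].sum()   # aggregate without keeping the chunk
    chunk_sizes.append(len(chunk))

print(total, chunk_sizes)
```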

Mixed Encoding, Large File, Complex Parsing

In [None]:
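# chunksize makes read_table return an iterator of DataFrames, not one DataFrame
reader = pd.read_table(
    "logs_2025.txt",
    sep="|",
    engine="python",           # slower than the default C engine
    encoding_errors="ignore",  # drop bytes that cannot be decoded
    dtype={
        "id": "int32",
        "value": "float32"
    },
    parse_dates=["ts"],        # "ts" is parsed as datetime, so it needs no dtype entry
    on_bad_lines="skip",       # drop malformed rows instead of raising
    chunksize=200_000
)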

2. read_csv()

The most widely used I/O function in pandas.

In [None]:
df = pd.read_csv("file.csv")

Custom Delimiter

In [None]:
df = pd.read_csv("file.csv", sep="|")

Speed Tip: Give dtypes

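Declaring dtypes up front skips type inference and, with smaller types such as int32 or int8, reduces memory use (int8 holds values from -128 to 127):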

In [None]:
df = pd.read_csv("file.csv", dtype={"id": "int32", "age": "int8"})
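The memory effect can be measured with `memory_usage()`. This sketch generates 1,000 made-up rows in memory; the column names mirror the call above.

```python
import io

import pandas as pd

# Made-up data: ids 0..999 and ages that fit comfortably in int8.
text = "id,age\n" + "".join(f"{i},{i % 90}\n" for i in range(1000))

default = pd.read_csv(io.StringIO(text))
compact = pd.read_csv(io.StringIO(text), dtype={"id": "int32", "age": "int8"})

default_bytes = int(default.memory_usage(index=False).sum())
compact_bytes = int(compact.memory_usage(index=False).sum())
print(default_bytes, compact_bytes)
```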

Handling Dates

In [None]:
df = pd.read_csv("data.csv", parse_dates=["date"])
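Once a column is parsed as dates, the `.dt` accessor becomes available. A sketch with two made-up rows:

```python
import io

import pandas as pd

# Made-up sample with ISO-formatted dates.
text = "date,sales\n2025-01-01,10\n2025-01-02,12\n"

df = pd.read_csv(io.StringIO(text), parse_dates=["date"])

is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
days = df["date"].dt.day.tolist()
print(is_datetime, days)
```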

Skipping Bad Lines

In [None]:
df = pd.read_csv("file.csv", on_bad_lines="skip")
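A "bad line" here means a row with more fields than the header. The sketch below, on made-up text, shows that the default behavior raises `ParserError`, while `on_bad_lines="skip"` drops the offending row.

```python
import io

import pandas as pd

# Made-up sample: the second data row has one field too many.
text = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

try:
    pd.read_csv(io.StringIO(text))
    raised = False
except pd.errors.ParserError:
    raised = True

df = pd.read_csv(io.StringIO(text), on_bad_lines="skip")
print(raised, df["a"].tolist())
```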

Reading in Chunks

In [None]:
reader = pd.read_csv("huge_file.csv", chunksize=500000)
for chunk in reader:
    process(chunk)  # process() is a placeholder for your own per-chunk logic

Schema + Converters + Performance

In [None]:
df = pd.read_csv(
    "transactions.csv",
    sep=",",
    dtype={
        "user_id": "int32",
        "country": "category",
        "amount": "float32"
    },
    parse_dates=["timestamp"],
    converters={
        "tags": lambda x: x.split("|")
    },
    skip_blank_lines=True,
    low_memory=False,
    na_values=["", "null", "NA", "missing"],
)
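To see what the `dtype` and `converters` arguments produce, here is a cut-down sketch on two made-up rows that reuses the same column names. Note that a converter receives the raw cell text, so an empty cell would arrive as an empty string.

```python
import io

import pandas as pd

# Made-up sample using the column names from the call above.
text = (
    "user_id,country,amount,tags\n"
    "1,PK,9.5,new|vip\n"
    "2,US,3.0,returning\n"
)

df = pd.read_csv(
    io.StringIO(text),
    dtype={"user_id": "int32", "country": "category", "amount": "float32"},
    converters={"tags": lambda x: x.split("|")},  # raw string -> list of tags
)

tags = df["tags"].tolist()
dtypes = {col: str(df[col].dtype) for col in ["user_id", "country", "amount"]}
print(tags, dtypes)
```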

3. read_fwf() (Fixed-width formatted files)

Example file (file.txt), where each column occupies a fixed number of characters (10, 5, and 5):

Name      Age  Score
John      23   89
Alex      45   77

In [None]:
df = pd.read_fwf("file.txt")

Provide column widths

In [None]:
df = pd.read_fwf("file.txt", widths=[10, 5, 5])
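A sketch of `widths` in action, on made-up text laid out as 10 + 5 + 5 characters per line. `read_fwf()` strips the padding and infers numeric types for the remaining values.

```python
import io

import pandas as pd

# Made-up sample: Name padded to 10 characters, Age to 5, Score to 5.
text = (
    "Name      Age  Score\n"
    "John      23   89\n"
    "Alex      45   77\n"
)

df = pd.read_fwf(io.StringIO(text), widths=[10, 5, 5])

print(list(df.columns), df["Age"].tolist(), df["Score"].tolist())
```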

Custom colspecs (instead of auto-detection)

In [None]:
colspecs = [(0, 10), (10, 15), (15, 20)]

df = pd.read_fwf(
    "report.txt",
    colspecs=colspecs,
    header=None,
    names=["Name", "Age", "Score"]
)
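Each colspec is a half-open `(start, end)` character range, so these three ranges describe the same layout as `widths=[10, 5, 5]`. The sketch below checks that on made-up headerless text.

```python
import io

import pandas as pd

# Made-up headerless sample laid out as 10 + 5 + 5 characters.
text = "John      23   89\nAlex      45   77\n"
names = ["Name", "Age", "Score"]

by_colspecs = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 10), (10, 15), (15, 20)],
    header=None,
    names=names,
)
by_widths = pd.read_fwf(
    io.StringIO(text), widths=[10, 5, 5], header=None, names=names
)

print(by_colspecs.equals(by_widths))
```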