# This notebook compares the speed of loading CSV data into a dataframe with pandas against loading the same data from the feather format.

Feather files are fast, lightweight, and easy to use for storing dataframes. The dataset in feather format is available [here](https://www.kaggle.com/yusufmuhammedraji/tpsdec2021featherdataset).

References

* [Stop Using CSVs for Storage — This File Format Is 150 Times Faster](https://towardsdatascience.com/stop-using-csvs-for-storage-this-file-format-is-150-times-faster-158bd322074e)
* [Feather Files: Faster Than the Speed of Light](https://medium.com/@steven.p.dye/feather-files-faster-than-the-speed-of-light-d4666ce24387)

In [None]:
import feather
import pandas as pd
from pathlib import Path


In [None]:
data_dir = Path('../input/tabular-playground-series-dec-2021')

# Load the train and test data using pandas

In [None]:
%%time

train_df = pd.read_csv(data_dir / "train.csv")
test_df = pd.read_csv(data_dir / "test.csv")

# Load the train and test data using pandas, then convert to feather

In [None]:
%%time

# convert the dataframes to feather; to_feather() writes the file and returns None,
# so its result is not assigned to a variable
pd.read_csv(data_dir / "train.csv").to_feather("train_pd.feather")
pd.read_csv(data_dir / "test.csv").to_feather("test_pd.feather")

In [None]:
%%time

# then load the files into dataframes using the pandas read_feather method
train_df = pd.read_feather("train_pd.feather")
test_df = pd.read_feather("test_pd.feather")

# Load the train and test datasets using feather


In [None]:
%%time

train_df_feather = feather.read_dataframe('train_pd.feather')
test_df_feather = feather.read_dataframe('test_pd.feather')
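
A single `%%time` run can be noisy, for example because files read earlier in the notebook may already sit in the operating system's cache. One way to get steadier numbers is to repeat each load a few times and keep the fastest run. The helper below is only a sketch: `best_of` is a hypothetical name, not part of pandas or feather, and the workload is a stand-in so the cell runs anywhere.

```python
import time


def best_of(load, repeats=3):
    """Call load() `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload (an assumption for illustration). In this notebook it could be
# replaced with e.g. lambda: pd.read_feather("train_pd.feather")
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum rather than the mean is a design choice: it reports the run least disturbed by other activity on the machine.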

# Save dataset to feather format

In [None]:
%%time

feather.write_dataframe(train_df_feather, "train.feather")
feather.write_dataframe(test_df_feather, "test.feather")

# Conclusion

The following summarises the approximate time it took to load the data with pandas `read_csv`, the pandas `read_feather` method, and `feather.read_dataframe`.
* pandas `read_csv` ~16.1 seconds
* pandas `read_feather` ~1.6 seconds
* feather `read_dataframe` ~1.3 seconds
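
To put the timings above in relative terms, the short sketch below divides the CSV load time by each of the others. The numbers are the approximate values from this run, copied in by hand; the dictionary keys are just labels chosen for illustration.

```python
# Approximate load times in seconds, taken from the summary above
timings = {
    "pandas read_csv": 16.1,
    "pandas read_feather": 1.6,
    "feather read_dataframe": 1.3,
}

# Speedup of each method relative to reading the CSV files with pandas
baseline = timings["pandas read_csv"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

On these figures, both feather readers come out roughly 10 to 12 times faster than `read_csv`. The one-time cost of converting the CSV files to feather is not included.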

So, if you load the same data more than once, just use feather. It's FAST!