# Compressio basics

This notebook demonstrates the basic usage of `compressio` on synthetic data.

In [1]:
%load_ext autoreload
%autoreload 2

import random
import datetime

import numpy as np
import pandas as pd
from visions import StandardSet

from compressio import Compress, storage_size, savings

Generate a DataFrame with various types:

In [2]:
n = 1_000_000
df = pd.DataFrame({
    # Small integers drawn from {0, 1, 2}; pandas stores these as int64
    'integer': [random.choice([0, 1, 2]) for _ in range(n)],
    # Same values plus missing entries, in a nullable integer column
    'integer_missing': pd.Series([random.choice([0, 1, 2, np.nan]) for _ in range(n)], dtype="Int32"),
    # Constant-valued columns of the remaining types
    'float': [3.0 for _ in range(n)],
    'complex': pd.Series([complex(0, 2) for _ in range(n)], dtype='complex128'),
    'object': ['strings' for _ in range(n)],
    'datetime': pd.Series([datetime.datetime(2020, 10, 10) for _ in range(n)])
})

Initialize the `Compress` object:

In [3]:
compress = Compress()

We start off with around 53 MB of data:

In [4]:
original_size = storage_size(df).to('megabyte')
print(f'Original DataFrame size: {original_size}')

Original DataFrame size: 53.000128 megabyte
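The reported figure can be cross-checked against pandas' own accounting. The sketch below assumes, without confirmation from the library, that `storage_size` reflects the shallow `DataFrame.memory_usage()` total expressed in decimal megabytes. It uses a smaller, hypothetical frame (`small`) so that it runs quickly:

```python
import numpy as np
import pandas as pd

n = 1_000  # far smaller than the notebook's 1,000,000 rows, to keep this quick

# Hypothetical two-column frame; both columns take 8 bytes per value.
small = pd.DataFrame({
    "integer": np.zeros(n, dtype="int64"),
    "float": np.full(n, 3.0),
})

# Shallow memory usage in bytes: the index plus each column's buffer.
total_bytes = int(small.memory_usage().sum())

# Decimal megabytes (1 MB = 1,000,000 bytes).
total_megabytes = total_bytes / 1_000_000

print(f"{total_bytes} bytes = {total_megabytes} megabyte")
```

Note that a shallow count only measures the 8-byte pointers of an `object` column, not the Python strings they point to.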


This line of code automatically compresses the DataFrame:

In [5]:
df_compressed = compress.it(df)

Let's see what has changed:

In [6]:
savings(df, df_compressed)

In [7]:
df_compressed.memory_usage()

Index                  128
integer            1000000
integer_missing    2000000
float              2000000
complex            8000000
object             1000088
datetime           1000088
dtype: int64
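These per-column sizes are consistent with narrower dtypes: one byte per value for `integer`, two bytes per row for `integer_missing` (a one-byte value plus a one-byte mask), two bytes for `float`, and eight for `complex`. The sketch below reproduces similar sizes by hand with plain pandas. It is an illustration under those assumptions, not compressio's implementation, and the frame `small` and the chosen target dtypes are hypothetical:

```python
import numpy as np
import pandas as pd

n = 1_000  # small row count so the example runs quickly

small = pd.DataFrame({
    "integer": np.random.default_rng(0).integers(0, 3, size=n, dtype="int64"),
    "integer_missing": pd.Series([0, 1, 2, None] * (n // 4), dtype="Int32"),
    "float": np.full(n, 3.0),
    "complex": pd.Series([complex(0, 2)] * n, dtype="complex128"),
    "object": ["strings"] * n,
})

manual = small.copy()
# Values 0..2 fit in the smallest signed integer type.
manual["integer"] = pd.to_numeric(manual["integer"], downcast="integer")
# Nullable integers keep a separate one-byte mask next to the values.
manual["integer_missing"] = manual["integer_missing"].astype("Int8")
# 3.0 and 2j are exactly representable at lower precision; this is not true in general.
manual["float"] = manual["float"].astype("float16")
manual["complex"] = manual["complex"].astype("complex64")
# A repeated string becomes one category plus a small integer code per row.
manual["object"] = manual["object"].astype("category")

print(manual.dtypes)
print(manual.memory_usage(index=False))
```

Downcasting floats by hand is only safe when the values survive the loss of precision, which is one reason to leave the choice to a tool that checks the data first.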

In [8]:
df.memory_usage()

Index                   128
integer             8000000
integer_missing     5000000
float               8000000
complex            16000000
object              8000000
datetime            8000000
dtype: int64
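Summing the two `memory_usage()` outputs gives the overall effect: 53,000,128 bytes before and 15,000,304 bytes after, a reduction of roughly 72%. The arithmetic can be checked with the figures copied from the outputs above; the dictionary and variable names are only for illustration:

```python
# Per-column byte counts copied from the memory_usage() outputs above.
before = {
    "Index": 128,
    "integer": 8_000_000,
    "integer_missing": 5_000_000,
    "float": 8_000_000,
    "complex": 16_000_000,
    "object": 8_000_000,
    "datetime": 8_000_000,
}
after = {
    "Index": 128,
    "integer": 1_000_000,
    "integer_missing": 2_000_000,
    "float": 2_000_000,
    "complex": 8_000_000,
    "object": 1_000_088,
    "datetime": 1_000_088,
}

before_total = sum(before.values())
after_total = sum(after.values())
saved = before_total - after_total

print(f"before: {before_total / 1e6:.6f} MB")
print(f"after:  {after_total / 1e6:.6f} MB")
print(f"saved:  {saved / 1e6:.6f} MB ({saved / before_total:.1%})")

# Per-column reduction, largest absolute saving first.
for column in sorted(before, key=lambda c: before[c] - after[c], reverse=True):
    print(f"{column:16s} {before[column]:>10d} -> {after[column]:>10d}")
```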