## Libraries used in this section

```python
import pandas as pd
```

---

## Typing

Large DataFrames can need a large amount of RAM. Aside from alternatives such as the Dask library and similar tools, the main way to reduce the RAM a DataFrame needs without changing its values is to change the dtypes of its columns. Some key points:

- When reading .csv files, pandas assigns default dtypes, mainly int64, float64 and object, however small or repetitive the values are
- CSV files do not store dtypes, so to keep the changed dtypes in a saved file you need another format, such as Parquet or Feather
- Pandas provides a wide range of dtypes that can greatly reduce the RAM needed; category is a great example for columns with few distinct values
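
To make these points concrete, here is a minimal sketch on synthetic data (the column names ``city`` and ``age`` and their values are made up for illustration). It measures memory with ``memory_usage(deep=True)``, converts a low-cardinality text column to ``category``, downcasts an integer column, and then shows that a CSV round trip brings back the default dtypes.

```python
import io

import numpy as np
import pandas as pd

# Synthetic, made-up data: 100_000 rows with a low-cardinality text column
# and an integer column whose values fit in int8.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["Lisbon", "Porto", "Braga"], size=n),
    "age": rng.integers(0, 100, size=n),
})


def mem(s):
    """Bytes used by a Series; deep=True also counts the string objects."""
    return s.memory_usage(deep=True, index=False)


before_city, before_age = mem(df["city"]), mem(df["age"])

# Only the dtypes change; the values stay the same.
df["city"] = df["city"].astype("category")
df["age"] = pd.to_numeric(df["age"], downcast="integer")

print(df.dtypes)  # city -> category, age -> int8
print(f"city: {before_city} -> {mem(df['city'])} bytes")
print(f"age:  {before_age} -> {mem(df['age'])} bytes")  # 800000 -> 100000

# CSV has no type information: reading it back restores the defaults.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded.dtypes)  # city is no longer category, age is int64 again
```

The exact byte count of the text column depends on the pandas version and its string storage, but the ``category`` version should stay far smaller.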

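This function tries to reduce the memory of each column by testing three candidate dtypes (a downcast integer, category and a downcast float) and keeping the one that uses the least memory. It changes the DataFrame in place. Keep in mind that downcasting to float32 keeps only about 7 significant digits, so float values may change slightly.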

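```python
def type_test(df):
    """Convert each column of ``df`` in place to the candidate dtype that uses
    the least memory: downcast integer, ``category`` or downcast float."""
    for col in df.columns:
        # Best candidate found so far (starts as the column itself)
        menor_col = df[col]
        # deep=True counts the string objects too; index=False ignores the
        # index, which is the same for every candidate
        menor_memo = df[col].memory_usage(deep=True, index=False)

        # int: only tried when the column has no missing values
        if df[col].isna().sum() == 0:
            try:
                aux = pd.to_numeric(df[col], downcast='integer')
                a = aux.memory_usage(deep=True, index=False)
                if menor_memo > a:
                    menor_col = aux
                    menor_memo = a
            except (ValueError, TypeError):
                print(f'Column {col} - int value error')

        # category
        try:
            aux = df[col].astype('category')
            a = aux.memory_usage(deep=True, index=False)
            if menor_memo > a:
                menor_col = aux
                menor_memo = a
        except (ValueError, TypeError):
            print(f'Column {col} - category value error')

        # float (downcasting to float32 may slightly change the values)
        try:
            aux = pd.to_numeric(df[col], downcast='float')
            a = aux.memory_usage(deep=True, index=False)
            if menor_memo > a:
                menor_col = aux
                menor_memo = a
        except (ValueError, TypeError):
            print(f'Column {col} - float value error')

        # Keep the converted Series itself: re-casting the original column
        # with astype() may fail for text that pd.to_numeric can parse
        # (e.g. '1.0' to an integer dtype)
        df[col] = menor_col

    return df
```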
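
``type_test`` relies on how ``pd.to_numeric`` behaves with ``downcast``. The small sketch below (sample values made up for illustration) shows the cases that matter: which inputs become ``int8``, which stay ``float64``, that ``downcast="float"`` gives ``float32`` even when this rounds the value, and that text raises ``ValueError``.

```python
import pandas as pd

# Whole numbers in the int8 range [-128, 127] are downcast to int8.
ints = pd.to_numeric(pd.Series([1, 2, 3]), downcast="integer")
# Floats that hold whole numbers (and no NaN) can become int8 as well.
whole_floats = pd.to_numeric(pd.Series([1.0, 2.0, 3.0]), downcast="integer")
# Fractional values cannot be stored as integers: the dtype stays float64.
fractions = pd.to_numeric(pd.Series([1.5, 2.5]), downcast="integer")
# NaN also blocks the integer downcast (NumPy integer dtypes have no NaN).
with_nan = pd.to_numeric(pd.Series([1.0, None]), downcast="integer")
# downcast="float" never goes below float32.
as_float32 = pd.to_numeric(pd.Series([1.5, 2.5]), downcast="float")
# 0.1 is rounded to the nearest float32, so the stored value changes.
lossy = pd.to_numeric(pd.Series([0.1]), downcast="float")

for name, s in [("ints", ints), ("whole_floats", whole_floats),
                ("fractions", fractions), ("with_nan", with_nan),
                ("as_float32", as_float32), ("lossy", lossy)]:
    print(f"{name:>12}: {s.dtype}")

print(float(lossy.iloc[0]))  # 0.10000000149011612, not 0.1

# Text cannot be parsed as a number, so pd.to_numeric raises ValueError,
# which is what the except branches in type_test catch.
try:
    pd.to_numeric(pd.Series(["John", "Mary"]), downcast="integer")
except ValueError as exc:
    print("ValueError:", exc)
```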
        

> Important information about the ``category`` dtype: it is not kept by every file format. CSV stores no dtypes at all, and with Parquet (pyarrow engine) a ``category`` column is read back as ``category`` only if its categories are strings; categorical columns with non-string categories, such as integers, are saved but read back as their underlying dtype. Converting the column to string before casting it to ``category`` avoids this.
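
A minimal sketch of the string-first workaround: ``to_string_categories`` is a hypothetical helper written for this example (not a pandas function) that casts the given columns to string and then to ``category``. The Parquet round trip at the end is only attempted if ``pyarrow`` is installed, since ``to_parquet`` needs it (or ``fastparquet``).

```python
import importlib.util
import os
import tempfile

import pandas as pd


def to_string_categories(df, cols):
    """Hypothetical helper: cast each column in ``cols`` to str, then to
    category, so that every category label is a string."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].astype(str).astype("category")
    return out


df = pd.DataFrame({"code": [10, 20, 10, 30, 20]})
cat_df = to_string_categories(df, ["code"])
print(cat_df["code"].dtype)                 # category
print(list(cat_df["code"].cat.categories))  # ['10', '20', '30']

# Optional Parquet round trip, only attempted if pyarrow is installed.
if importlib.util.find_spec("pyarrow") is not None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.parquet")
        cat_df.to_parquet(path, engine="pyarrow")
        print(pd.read_parquet(path, engine="pyarrow")["code"].dtype)
```

One trade-off: depending on the pandas version, ``astype(str)`` may turn missing values into the literal string ``'nan'``, so it can be worth handling missing values before this conversion.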