# Features

**Skimpy** provides:

- a way to create summary statistics of **pandas** or **Polars** dataframes, using the `skim()` function, and print them to your console via the [rich](https://github.com/willmcgugan/rich) package
- support for summarising boolean, numeric, datetime, timedelta, string, and category datatypes
- a command line interface to `skim` CSV files
- intelligent rounding of numerical values to 4 significant figures
- a way to export the visual summary statistics to lossless formats, namely SVG or HTML
- a way to further work with the summary statistics, by returning them as a dictionary
- a way to clean up messy column names in both **pandas** and **Polars** dataframes

When using **skimpy**, please be aware that *numerical columns are rounded to 4 significant figures*. You should also be aware that *any timezone-aware datetimes are converted into their naive equivalents*.
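To make these two caveats concrete, here is a minimal standard-library sketch of what rounding to 4 significant figures and dropping timezone information look like. It is not **skimpy**'s own implementation: `round_to_sig_figs` is a hypothetical helper, and the sketch assumes the naive equivalent keeps the local wall-clock time, which may not match what **skimpy** does internally.

```python
from datetime import datetime, timedelta, timezone


def round_to_sig_figs(value: float, sig_figs: int = 4) -> float:
    """Round a number to a given count of significant figures (hypothetical helper)."""
    # The "g" format keeps `sig_figs` significant digits before converting back.
    return float(f"{value:.{sig_figs}g}")


rounded_large = round_to_sig_figs(1234.5678)
rounded_small = round_to_sig_figs(0.00123456)

# A timezone-aware datetime with a +02:00 offset.
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

# Assumption: the naive equivalent keeps the wall-clock time and drops the offset.
naive = aware.replace(tzinfo=None)

print(rounded_large, rounded_small, naive)
```

If the precision beyond 4 significant figures or the original timezone matters for your analysis, compute those statistics directly on the dataframe instead.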

You can find a full guide to the API on the [reference pages](reference/index.qmd).

## Skim a dataframe and return the statistics

To use `skim()` in its default mode, see the quickstart on [the homepage](index.ipynb).

If you want to export your results to a dictionary within Python, rather than printing them to the console, use the `skim_get_data()` function instead. Let's see an example:

```python
import pandas as pd
from rich import print
from skimpy import generate_test_data, skim_get_data

df = generate_test_data()

summary = skim_get_data(df)
```

And the dictionary has contents as follows:

```python
print(summary)
```
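Because the result is a plain dictionary, it can be processed with ordinary Python tools. The sketch below flattens a nested dictionary into dotted keys and serialises it to JSON. The `flatten` helper and the `example_summary` contents are hypothetical stand-ins used for illustration, and the real keys returned by `skim_get_data()` may differ.

```python
import json


def flatten(nested: dict, parent: str = "") -> dict:
    """Flatten a nested dictionary into one level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in nested.items():
        path = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical stand-in for a summary dictionary; not real skimpy output.
example_summary = {
    "Data Summary": {"Number of rows": 1000, "Number of columns": 13},
    "number": {"length": {"mean": 0.5}},
}

flat_summary = flatten(example_summary)

# default=str is a fallback for values that JSON cannot encode directly.
as_json = json.dumps(flat_summary, indent=2, default=str)
print(as_json)
```

A flat mapping like this is convenient for logging or for comparing summaries of two datasets key by key.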

## Clean up messy dataframe column names

**skimpy** also comes with a `clean_columns` function as a convenience (with thanks to the [**dataprep**](https://dataprep.ai/) package). This slugifies column names in **pandas** or **Polars** dataframes. For example:

```python
from skimpy import clean_columns

columns = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
messy_df = pd.DataFrame(columns=columns, index=[0], data=[range(len(columns))])
print("Column names:")
print(list(messy_df.columns))
```

Now let's clean these. By default, what we get back is in *snake case*:

```python
clean_df = clean_columns(messy_df)
print(list(clean_df.columns))
```

Other naming conventions are available, for example *camel case*:

```python
clean_df = clean_columns(messy_df, case="camel")
print(list(clean_df.columns))
```
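For intuition about what slugifying to snake case involves, here is a minimal standard-library sketch. It is not the implementation behind `clean_columns`, and `to_snake_case` is a hypothetical helper, so its results may differ from **skimpy**'s in edge cases.

```python
import re
import unicodedata


def to_snake_case(name: str) -> str:
    """Slugify a column name into snake case (hypothetical helper)."""
    # Split accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop anything that is not plain ASCII, including the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters and digits; everything else acts as a separator.
    words = re.findall(r"[A-Za-z0-9]+", ascii_only)
    return "_".join(word.lower() for word in words)


messy_names = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
snake_names = [to_snake_case(name) for name in messy_names]
print(snake_names)
```

The three steps are: normalise accents, discard characters that are unsafe in identifiers, then join the remaining words with underscores.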

## Export the visual summary table to SVG

To export the figure containing the table of summary statistics, use the `skim_get_figure()` function. This will save an SVG file to the given (relative) path that you pass with the `save_path` argument.

## Run skim on a CSV file from the command line

Although it's usually better to set datatypes before running **skimpy** on data, we provide a command line utility that can work with CSV files as a convenience.

You can run it as shown below. Note that the command is `skimpy`, the name of the package, rather than `skim`, the name of the Python function.

```bash
$ skimpy file.csv
```