# Features

Skimpy provides:

- a way to create summary statistics of **pandas** or **Polars** dataframes and print them to your console via the [rich](https://github.com/willmcgugan/rich) package
- support for summarising boolean, numeric, datetime, string, and category datatypes
- a command line interface to `skim` CSV files
- intelligent rounding of numerical values to 2 significant figures
- a way to export the visual summary statistics to a lossless format (SVG)
- a way to further work with the summary statistics, by returning them as a dictionary
- a way to clean up messy column names

You can find a full guide to the API on the [reference pages](reference/index.qmd).

## Skim a dataframe and return the statistics

If you want to export your results to a dictionary, pass `return_data=True` to the `skim` function and assign the result to a variable. You can then use the returned data in any further application.


```python
from skimpy import skim
from skimpy import generate_test_data
from rich import print
import pandas as pd

df = generate_test_data()

summary = skim(df, return_data=True)
```

The dictionary has the following contents:

```python
print(summary)
```

## Clean up messy dataframe column names

**skimpy** also comes with a `clean_columns` function as a convenience (with thanks to the [**dataprep**](https://dataprep.ai/) package). This slugifies column names in **pandas** dataframes. For example,

```python
import pandas as pd
from skimpy import clean_columns

# Deliberately messy column names: stray whitespace, punctuation,
# accented characters, and leading/trailing underscores.
columns = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
messy_df = pd.DataFrame(columns=columns, index=[0], data=[range(len(columns))])
print("Column names:")
print(list(messy_df.columns))
```

Now let's clean these. By default, what we get back is in *snake case*:

```python
clean_df = clean_columns(messy_df)
print(list(clean_df.columns))
```

Other naming conventions are available, for example *camel case*:

```python
clean_df = clean_columns(messy_df, case="camel")
print(list(clean_df.columns))
```
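
To give a feel for what "slugifying" into snake case involves, here is a minimal standard-library sketch. It is not the code that `clean_columns` runs: `to_snake_case` is a hypothetical helper written for illustration, and its output is not guaranteed to match **skimpy**'s in every case.

```python
import re
import unicodedata


def to_snake_case(name: str) -> str:
    """Hypothetical helper: a rough approximation of snake-case slugification."""
    # Decompose accented characters (e.g. "Ü" becomes "U" + a combining mark)...
    decomposed = unicodedata.normalize("NFKD", name)
    # ...then drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^0-9A-Za-z]+", "_", stripped)
    # Trim leading/trailing underscores and lowercase the result.
    return slug.strip("_").lower()


columns = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
snake_columns = [to_snake_case(column) for column in columns]
print(snake_columns)
```

This sketch only keeps ASCII letters and digits, so characters with no ASCII decomposition (for example "ß") are treated as separators. A real implementation would likely need to handle those, and existing camel-case names, more carefully.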

## Export the visual summary table to SVG

To export the figure containing the table of summary statistics, pass a file path to the `record_results_path` keyword argument of `skim`. This saves an SVG file to the given (relative) path.

## Run skim on a CSV file from the command line

Although it's usually better to set datatypes before running **skimpy** on data, we provide a command line utility that can work with CSV files as a convenience.

You can run it as shown below. Note that the command is `skimpy`, the name of the package, rather than `skim`, the name of the Python function.


```bash
$ skimpy file.csv
```
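
As noted above, it is usually better to set datatypes before summarising. One way to do that, sketched here with **pandas** alone, is to declare the types when reading the CSV. The column names and values are made up for illustration, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents, standing in for a file such as file.csv.
csv_text = """name,grade,signup,score
Ada,A,2021-01-05,9.5
Grace,B,2021-02-11,8.0
Alan,A,2021-03-23,7.25
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    # Declare string and category columns explicitly instead of
    # leaving them as generic object columns.
    dtype={"name": "string", "grade": "category"},
    # Parse this column as datetimes rather than plain text.
    parse_dates=["signup"],
)
print(df.dtypes)
```

A dataframe prepared this way can then be passed to `skim` from Python, so that each column is summarised according to its declared type.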