# File formats

In principle, Niimpy can work with files of any format: you only need to convert them to a standard DataFrame. Still, it is useful to have some common formats, so Niimpy provides two standard formats with default readers:

* **CSV files** are standard and easy to create and inspect, but everything must be loaded into memory to work with them.
* **sqlite3 databases** require sqlite3 to read, but allow filtering and automatic processing without reading everything into memory.

## DataFrame format (in-memory)

In memory, data is stored in a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). It is an ordinary DataFrame, with some standardized columns (see {doc}`schema`) and a DatetimeIndex as its index.
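
As a rough illustration of that shape, the sketch below builds a small DataFrame with plain pandas whose index is a timezone-aware DatetimeIndex. The `user` and `screen_status` columns and the timestamps are hypothetical placeholders, not the actual standardized columns; see {doc}`schema` for those.

```python
import pandas as pd

TZ = "Europe/Helsinki"

# Hypothetical Unix timestamps (seconds), interpreted as UTC.
times = pd.to_datetime([1577836800, 1577836860, 1577836920], unit="s", utc=True)

df = pd.DataFrame(
    # Illustrative columns only; the real standardized columns are in the schema docs.
    {"user": ["u1", "u1", "u2"], "screen_status": [1, 0, 1]},
    # The index is a DatetimeIndex converted to local time.
    index=times.tz_convert(TZ),
)
print(df)
```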

## CSV files

CSV files should have a header row listing the column names and should generally be readable by `pandas.read_csv`.

Reading these can be done with `niimpy.read_csv`:

```python
import niimpy

TZ = 'Europe/Helsinki'
# niimpy.sampledata.DATA_CSV is a string containing a filename
df = niimpy.read_csv(niimpy.sampledata.DATA_CSV, tz=TZ)
```
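
To show why CSV reading needs the whole file in memory, here is a minimal sketch using plain pandas, not Niimpy's own reader. It assumes a hypothetical `time` column holding Unix timestamps in seconds and a made-up `battery_level` column, and converts the timestamps into a timezone-aware DatetimeIndex.

```python
import io

import pandas as pd

TZ = "Europe/Helsinki"

# Hypothetical CSV content: a header row, then data rows.
# "time" is assumed to hold Unix timestamps in seconds.
csv_text = """time,user,battery_level
1577836800,u1,90
1577836860,u1,89
1577836920,u2,75
"""

# The whole file is parsed into memory at once.
df = pd.read_csv(io.StringIO(csv_text))

# Convert the timestamp column into a timezone-aware DatetimeIndex.
ts = pd.to_datetime(df["time"], unit="s", utc=True)
df.index = pd.DatetimeIndex(ts).tz_convert(TZ)
print(df)
```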

## sqlite3 databases

For the purposes of niimpy, sqlite3 databases can generally be seen as supercharged CSV files.

A single database file can contain multiple datasets (tables), so a **table name** must be specified when reading one.

To read a whole table into memory, use `niimpy.read_sqlite`:

```python
# niimpy.sampledata.DATA is a string containing a filename
df = niimpy.read_sqlite(niimpy.sampledata.DATA, 'AwareScreen', tz=TZ)
```
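
The sketch below shows what choosing one table out of a multi-table file amounts to. It uses only the standard-library `sqlite3` module and pandas, not Niimpy's implementation. The in-memory database, the second table `AwareBattery`, and all column names are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database holding two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE AwareScreen (time REAL, user TEXT, screen_status INTEGER);
    CREATE TABLE AwareBattery (time REAL, user TEXT, battery_level INTEGER);
    INSERT INTO AwareScreen VALUES (1577836800, 'u1', 1), (1577836860, 'u1', 0);
    INSERT INTO AwareBattery VALUES (1577836800, 'u1', 90);
    """
)

# One file can hold several tables, so the table must be named explicitly.
# Table names cannot be bound as SQL parameters, hence the quoted identifier.
table = "AwareScreen"
df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
print(df)
```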

You can list the tables within a database using `niimpy.read_sqlite_tables`:

```python
niimpy.read_sqlite_tables(niimpy.sampledata.DATA)
```

which returns:

```
{'AwareScreen'}
```
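
In plain SQLite, table names can be listed from the built-in `sqlite_master` schema table. The sketch below does this with the standard-library `sqlite3` module. It is not necessarily how `niimpy.read_sqlite_tables` is implemented, and the two tables are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database with two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AwareScreen (time REAL, user TEXT)")
conn.execute("CREATE TABLE AwareBattery (time REAL, user TEXT)")

# sqlite_master describes the schema; rows with type = 'table' are tables.
tables = {
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
print(tables)
```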

sqlite3 files are highly recommended as a data storage format, since many common exploration operations can be done inside the database itself, without reading all the data into memory or writing an iterator. However, the interface is more difficult to use. Before 2021-07, this was Niimpy's primary interface, but it has since been de-emphasized. You can read more in {doc}`database`, but that interface is only recommended if you need efficiency with very large amounts of data.
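
Here is an example of exploration done inside the database. The sketch uses the standard-library `sqlite3` module rather than Niimpy's database interface. SQLite filters and aggregates the rows, and only the small per-user summary is returned to Python. The table contents and the time cutoff are hypothetical.

```python
import sqlite3

# Hypothetical screen events: (unix time, user, screen status).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AwareScreen (time REAL, user TEXT, screen_status INTEGER)")
conn.executemany(
    "INSERT INTO AwareScreen VALUES (?, ?, ?)",
    [
        (1577836800, "u1", 1),
        (1577836860, "u1", 0),
        (1577836920, "u2", 1),
        (1577840400, "u2", 0),
    ],
)

# Filtering and aggregation run inside SQLite; only the per-user
# counts come back, not the raw rows.
rows = conn.execute(
    """
    SELECT user, COUNT(*) AS n_events
    FROM AwareScreen
    WHERE time >= ?
    GROUP BY user
    ORDER BY user
    """,
    (1577836860,),
).fetchall()
print(rows)
```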

## Other formats

You can use any format that pandas can read. Then apply `niimpy.util.df_normalize` to convert the resulting DataFrame into the standard format.
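
As an example of "any format pandas can read", the sketch below loads hypothetical JSON Lines data with `pandas.read_json`. It then builds a timezone-aware DatetimeIndex by hand, matching the in-memory format described above. It does not reproduce what `niimpy.util.df_normalize` does. The `time`, `user`, and `steps` columns are assumptions, with `time` taken to be Unix seconds.

```python
import io

import pandas as pd

TZ = "Europe/Helsinki"

# Hypothetical JSON Lines input; "time" is assumed to be Unix seconds.
jsonl = io.StringIO(
    '{"time": 1577836800, "user": "u1", "steps": 12}\n'
    '{"time": 1577836860, "user": "u1", "steps": 30}\n'
)
# convert_dates=False keeps pandas from guessing which columns are dates.
df = pd.read_json(jsonl, lines=True, convert_dates=False)

# Manually build a timezone-aware DatetimeIndex from the timestamp column.
ts = pd.to_datetime(df["time"], unit="s", utc=True)
df.index = pd.DatetimeIndex(ts).tz_convert(TZ)
print(df)
```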