In [1]:
import logging
logging.basicConfig(level=logging.INFO)

from pystatis import Table

# The `Table` class

The `Table` class in `pystatis` is the main interface for interacting with the different databases and downloading tables in the form of `pandas` data frames.

To use the class, you only need to pass a single parameter: the `name` of the table you want to download.

In [2]:
t = Table(name="81000-0001")

## Downloading data

Creating a new `Table` instance does not automatically retrieve the data from the database (or cache). Instead, you have to call a separate method: `get_data()`. This design gives you full control over the download process and avoids unnecessary downloads of big tables until you are certain you want to start the download.

In [3]:
t.get_data()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/81000-0001/22f1c95a5b4dbac1ad74/20240625.csv.
INFO:pystatis.http_helper:Code 0: erfolgreich
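
The deferred-download idea described above can be illustrated with a small stand-alone sketch. Note that `LazyTable`, its `fetch` argument, and `fake_fetch` are hypothetical names invented for this illustration. This is not how `pystatis` is implemented, only a minimal picture of the pattern: constructing the object is cheap, and the expensive call happens only when `get_data()` is invoked.

```python
from typing import Callable, List, Optional


class LazyTable:
    """Hypothetical stand-in that defers the download until get_data() is called."""

    def __init__(self, name: str, fetch: Callable[[str], str]) -> None:
        self.name = name
        self._fetch = fetch  # assumed download function, injected for this demo
        self.raw_data: Optional[str] = None  # stays None until get_data() runs

    def get_data(self) -> None:
        # The (potentially expensive) download happens only here.
        self.raw_data = self._fetch(self.name)


calls: List[str] = []


def fake_fetch(name: str) -> str:
    # Records every call so we can see when a "download" actually happens.
    calls.append(name)
    return "Zeit;Wert\n2014;1,0"


table = LazyTable("81000-0001", fake_fetch)
calls_before = len(calls)  # creating the instance has not fetched anything
table.get_data()
calls_after = len(calls)  # exactly one fetch after the explicit call
```
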


You can access the name of a table via the `.name` attribute.

In [4]:
t.name

'81000-0001'

After a successful download (or cache retrieval), you can always access the raw data, that is, the original response from the web API as a string, via the `.raw_data` attribute.

In [8]:
print(t.raw_data)

Statistik_Code;Statistik_Label;Zeit_Code;Zeit_Label;Zeit;1_Merkmal_Code;1_Merkmal_Label;1_Auspraegung_Code;1_Auspraegung_Label;2_Merkmal_Code;2_Merkmal_Label;2_Auspraegung_Code;2_Auspraegung_Label;BWS001__Bruttowertschoepfung__jew._ME;STR006__Guetersteuern_abzuegl._Guetersubventionen__jew._ME;STR020_______Guetersteuern__jew._ME;SUB003_______Guetersubventionen__jew._ME;VGR014__Bruttoinlandsprodukt__jew._ME;BIP005__nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent;BIP004__nachr.:_Bruttoinlandsprodukt_je_Einwohner__jew._ME
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2014;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis;VGRJPM;in jeweiligen Preisen (Mrd. EUR);2635,393;292,037;298,774;6,737;2927,430;4,1;36149,000
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2014;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis;VGRPKM;preisbereinigt, Kettenindex (2015=100);98,810;96,150;96,250;100,970;98,530;2,2;99,380
81000;Volkswirtsc

More likely, you are interested in the `pandas` `DataFrame`, which is accessible via the `.data` attribute.

In [13]:
t.data.head()

Unnamed: 0,Jahr,Deutschland insgesamt,Preisbasis,Bruttowertschoepfung__jew._ME,Guetersteuern_abzuegl._Guetersubventionen__jew._ME,_____Guetersteuern__jew._ME,_____Guetersubventionen__jew._ME,Bruttoinlandsprodukt__jew._ME,nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent,nachr.:_Bruttoinlandsprodukt_je_Einwohner__jew._ME
0,2014,Deutschland,in jeweiligen Preisen (Mrd. EUR),2635393,292037.0,298774.0,6737.0,2927430,41.0,36149000.0
1,2014,Deutschland,"preisbereinigt, Kettenindex (2015=100)",98810,96150.0,96250.0,100970.0,98530,22.0,99380.0
2,2014,Deutschland,"preisbereinigt, verkettete Volumenang. (Mrd. EUR)",2689628,,,,2981695,22.0,
3,2014,Deutschland,"preisbereinigt, unverkettete Volumenang.(Mrd. EUR)",2584829,,,,2873722,,
4,2015,Deutschland,in jeweiligen Preisen (Mrd. EUR),2722020,304160.0,310942.0,6782.0,3026180,34.0,37046000.0


## How `pystatis` prepares the data for you

As you can see by comparing the `.raw_data` and `.data` formats, `pystatis` does a lot behind the scenes to provide you with a format that is hopefully the most useful to you. There are a few parameters that you can use to change this behavior and adjust the table to your needs.

But first we would like to explain how `pystatis` prepares the data by default, so you have a better understanding of the underlying process.

When we look at the header of the raw data, we notice a few things:
- Some columns have no direct use because they contain information not needed in the table, like the `Statistik_Code` and `Statistik_Label` columns at the beginning. You already know the statistic from the name of the table, and this information is the same for every row anyway.
- There is always a time dimension, broken down into three columns: `Zeit_Code`, `Zeit_Label`, and `Zeit`.
- The other dimensions are called variables (German "Merkmale") and they always come in groups of four columns: `N_Merkmal_Code`, `N_Merkmal_Label`, `N_Auspraegung_Code`, and `N_Auspraegung_Label`.
- The actual measurements or values are at the end of the table, after the variables, and each measurement has one column. The name of this column follows the format `<CODE>__<LABEL>__<UNIT>`, e.g. "BWS001__Bruttowertschoepfung__jew._ME". "BWS001" is the unique code for this variable, "Bruttowertschoepfung" is the human-readable label of the variable, and "jew._ME" is the unit the measurement was recorded in.
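
The column groups listed above can be made concrete with a short sketch. It sorts a shortened copy of the raw header into the four groups and splits the value columns into code, label, and unit. The helper names `classify` and `split_value_column` are made up for this illustration and are not part of `pystatis`; the splitting rule (first and last double underscore) is an assumption based on the header shown in this notebook.

```python
import re
from typing import Dict, List, Tuple

# Shortened copy of the raw header shown in this notebook.
header = (
    "Statistik_Code;Statistik_Label;Zeit_Code;Zeit_Label;Zeit;"
    "1_Merkmal_Code;1_Merkmal_Label;1_Auspraegung_Code;1_Auspraegung_Label;"
    "BWS001__Bruttowertschoepfung__jew._ME;"
    "STR020_______Guetersteuern__jew._ME;"
    "BIP005__nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent"
)


def classify(column: str) -> str:
    """Assign a raw column name to one of the four groups described above."""
    if column.startswith("Statistik_"):
        return "statistic"
    if column == "Zeit" or column.startswith("Zeit_"):
        return "time"
    if re.match(r"^\d+_(Merkmal|Auspraegung)_(Code|Label)$", column):
        return "variable"
    return "value"


def split_value_column(column: str) -> Tuple[str, str, str]:
    """Split '<CODE>__<LABEL>__<UNIT>' at the first and last double underscore."""
    code, rest = column.split("__", 1)
    label, unit = rest.rsplit("__", 1)
    return code, label, unit


groups: Dict[str, List[str]] = {}
for col in header.split(";"):
    groups.setdefault(classify(col), []).append(col)

value_parts = [split_value_column(col) for col in groups["value"]]
```

Splitting at the first and last `__` keeps labels intact even when they carry leading underscores, as in the `STR020` column.
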

**Note** This is only true for tables from Genesis and Regionalstatistik. The format of the Zensus tables is noticeably different, but we follow a similar approach to provide you with the same convenient format.

In [20]:
print("\n".join(t.raw_data.splitlines()[:2]))

Statistik_Code;Statistik_Label;Zeit_Code;Zeit_Label;Zeit;1_Merkmal_Code;1_Merkmal_Label;1_Auspraegung_Code;1_Auspraegung_Label;2_Merkmal_Code;2_Merkmal_Label;2_Auspraegung_Code;2_Auspraegung_Label;BWS001__Bruttowertschoepfung__jew._ME;STR006__Guetersteuern_abzuegl._Guetersubventionen__jew._ME;STR020_______Guetersteuern__jew._ME;SUB003_______Guetersubventionen__jew._ME;VGR014__Bruttoinlandsprodukt__jew._ME;BIP005__nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent;BIP004__nachr.:_Bruttoinlandsprodukt_je_Einwohner__jew._ME
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2014;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis;VGRJPM;in jeweiligen Preisen (Mrd. EUR);2635,393;292,037;298,774;6,737;2927,430;4,1;36149,000
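
To get a feeling for the raw format, here is a minimal sketch that reads a shortened excerpt of the response with Python's standard `csv` module. It assumes that fields are separated by semicolons and that numbers use a decimal comma, as the output above suggests. The helper `parse_german_number` is a hypothetical name for this example, and the sketch does not reproduce what `pystatis` does internally.

```python
import csv
import io

# Shortened excerpt of the raw response shown above (only four columns kept).
raw = (
    "Zeit;2_Auspraegung_Label;BWS001__Bruttowertschoepfung__jew._ME;"
    "BIP005__nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent\n"
    "2014;in jeweiligen Preisen (Mrd. EUR);2635,393;4,1\n"
)


def parse_german_number(text: str) -> float:
    """Convert a number written with a decimal comma, e.g. '2635,393'."""
    return float(text.replace(",", "."))


# The raw data is semicolon-separated, so the delimiter must be set explicitly.
rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

first = rows[0]
year = first["Zeit"]
gross_value_added = parse_german_number(
    first["BWS001__Bruttowertschoepfung__jew._ME"]
)
gdp_change = parse_german_number(
    first["BIP005__nachr.:_Bruttoinlandsprodukt_(Veraenderung_in_%)__Prozent"]
)
```

With `pandas`, the same assumptions would translate to `pd.read_csv(io.StringIO(raw), sep=";", decimal=",")`.
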


## All `get_data()` parameters explained

In [24]:
t.get_data(startyear=2020)

INFO:pystatis.http_helper:Code 0: erfolgreich


In [10]:
from pprint import pprint

pprint(t.metadata)

{'Copyright': '© Statistisches Bundesamt (Destatis), 2024',
 'Ident': {'Method': 'table', 'Service': 'metadata'},
 'Object': {'Code': '81000-0001',
            'Content': 'VGR des Bundes - Bruttowertschöpfung, '
                       'Bruttoinlandsprodukt\n'
                       '(nominal/preisbereinigt): Deutschland, Jahre',
            'Structure': {'Columns': [{'Code': 'JAHR',
                                       'Content': 'Jahr',
                                       'Selected': '10',
                                       'Structure': None,
                                       'Type': 'Merkmal',
                                       'Updated': 'see parent',
                                       'Values': '10'}],
                          'Head': {'Code': '81000',
                                   'Content': 'Volkswirtschaftliche '
                                              'Gesamtrechnungen des Bundes',
                                   'Selected': None,
          