In [1]:
import logging

from pystatis import Table

logging.basicConfig(level=logging.INFO)

# The `Table` Class

The `Table` class in `pystatis` is the main interface to the supported databases. It downloads tables and provides them as `pandas` `DataFrame`s.


To use the class, you only have to pass a single parameter: the `name` of the table you want to download.


In [2]:
t = Table(name="81000-0001")

## Downloading data


Creating a new `Table` instance does not automatically retrieve the data from the database (or cache). Instead, you have to call another method: `get_data()`. This gives you full control over the download process and avoids unnecessary downloads of big tables until you are certain you want to start the download.
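
The idea of deferring the download can be illustrated with a small sketch. This is not how `pystatis` is implemented; `LazyTable` and `fake_fetch` are hypothetical names used only to show that constructing the object costs nothing until `get_data()` is called.

```python
class LazyTable:
    """Hypothetical stand-in for a table object that defers its download."""

    def __init__(self, name, fetch):
        self.name = name
        self._fetch = fetch  # callable that performs the (expensive) download
        self.raw_data = None  # nothing is loaded at construction time

    def get_data(self):
        # The download only happens when the caller explicitly asks for it.
        self.raw_data = self._fetch(self.name)


calls = []


def fake_fetch(name):
    # Records the call instead of contacting a real API.
    calls.append(name)
    return "statistics_code;value\n81000;1"


t_lazy = LazyTable("81000-0001", fake_fetch)
calls_before = list(calls)  # still empty: no download has happened yet
t_lazy.get_data()
calls_after = list(calls)  # now contains the table name
```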


In [3]:
t.get_data()

INFO:pystatis.http_helper:Database selected: genesis


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


You can access the name of a table via the `.name` attribute.


In [4]:
t.name

'81000-0001'

After a successful download (or cache retrieval), you can always access the raw data, that is the original response from the web API as a string, via the `.raw_data` attribute.


In [5]:
print(t.raw_data)

statistics_code;statistics_label;time_code;time_label;time;1_variable_code;1_variable_label;1_variable_attribute_code;1_variable_attribute_label;2_variable_code;2_variable_label;2_variable_attribute_code;2_variable_attribute_label;value;value_unit;value_variable_code;value_variable_label
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2020;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis (jeweilige Preise / preisbereinigt);VGRPVU;preisbereinigt, unverkettete Volumenang.(Mrd. EUR);-;jew. ME;BIP004;Bruttoinlandsprodukt je Einwohner
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2020;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis (jeweilige Preise / preisbereinigt);VGRPVU;preisbereinigt, unverkettete Volumenang.(Mrd. EUR);-;jew. ME;STR020;Gütersteuern
81000;Volkswirtschaftliche Gesamtrechnungen des Bundes;JAHR;Jahr;2020;DINSG;Deutschland insgesamt;DG;Deutschland;VGRPB5;Preisbasis (jeweilige Preise / preisbereinigt);VGRPVU;preisb

More likely, you are interested in the `pandas` `DataFrame`, which is accessible via the `.data` attribute.


In [6]:
t.data.head()

Unnamed: 0,Jahr,Preisbasis (jeweilige Preise / preisbereinigt),Bruttoinlandsprodukt (Veränderung in %)__Prozent,Bruttoinlandsprodukt je Einwohner__jew. ME,Bruttoinlandsprodukt__jew. ME,Bruttowertschöpfung__jew. ME,Gütersteuern abzügl. Gütersubventionen__jew. ME,Gütersteuern__jew. ME,Gütersubventionen__jew. ME
0,2015,in jeweiligen Preisen (Mrd. EUR),3.4,37774.0,3085.65,2751.937,333.713,333.77,0.057
1,2015,"preisbereinigt, Kettenindex (2020=100)",1.7,98.93,97.18,97.41,95.36,95.29,17.96
2,2015,"preisbereinigt, verkettete Volumenang. (Mrd. EUR)",,,3352.341,3018.733,,,
3,2015,"preisbereinigt, unverkettete Volumenang.(Mrd. EUR)",,,3034.516,2700.857,,,
4,2016,in jeweiligen Preisen (Mrd. EUR),3.6,38812.0,3196.11,2853.046,343.064,343.124,0.06


Finally, you can also access the metadata for this table via the `.metadata` attribute.


In [7]:
from pprint import pprint

pprint(t.metadata)

{'Copyright': '© Statistisches Bundesamt (Destatis), 2025',
 'Ident': {'Method': 'table', 'Service': 'metadata'},
 'Object': {'Code': '81000-0001',
            'Content': 'VGR des Bundes - Bruttowertschöpfung, '
                       'Bruttoinlandsprodukt\n'
                       '(nominal/preisbereinigt): Deutschland, Jahre',
            'Structure': {'Columns': [{'Code': 'JAHR',
                                       'Content': 'Jahr',
                                       'Functions': None,
                                       'Selected': '10',
                                       'Structure': None,
                                       'Type': 'Merkmal',
                                       'Updated': 'see parent',
                                       'Values': '10'}],
                          'Head': {'Code': '81000',
                                   'Content': 'Volkswirtschaftliche '
                                              'Gesamtrechnungen des Bundes',
     

## How `pystatis` prepares the data for you


As a comparison of the `.raw_data` and `.data` formats shows, `pystatis` does a lot behind the scenes to provide you with a format that is hopefully the most useful for you. There are a few parameters that you can use to change this behavior and adjust the table to your needs.

But first we would like to explain how `pystatis` prepares the data by default, so you have a better understanding of the underlying process.


When we look at the header of the raw data, we can notice a few things:

- Many columns always come in a pair of `*_Code` and `*_Label` columns. Both contain the same information, only provided differently.
- There are columns that don't have a direct use as they contain information not needed in the table, like the `Statistik_Code` and `Statistik_Label` columns at the beginning. You already know the statistic from the name of the table and this information is the same for each and every row anyway.
- There is always a time dimension, broken down into three different columns `Zeit_Code`, `Zeit_Label` and `Zeit` (or `time_*` in English).
- The other dimensions are called variables (German "Merkmale") and they always come in groups of four columns: `N_Merkmal_Code`, `N_Merkmal_Label`, `N_Auspraegung_Code`, and `N_Auspraegung_Label` (English: variable code and label and variable value code and label).
- The actual measurements or values are at the end of the table after the variables and each measurement has one column. The name of this column follows the format `<CODE>__<LABEL>__<UNIT>`, e.g. `BWS001__Bruttowertschoepfung__jew._ME`. `BWS001` is the unique code for this variable, `Bruttowertschoepfung` is the human-readable label of the variable, and `jew._ME` is the unit the measurement was recorded in.
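
To make the `<CODE>__<LABEL>__<UNIT>` naming scheme concrete, here is a minimal sketch that splits such a column name into its parts and rebuilds a shorter name without the code. The helper names are hypothetical, and replacing underscores with spaces is an assumption inferred from the prettified column names shown in this notebook, not taken from the `pystatis` source.

```python
def split_value_column(column):
    """Split '<CODE>__<LABEL>__<UNIT>' into its three parts."""
    code, label, unit = column.split("__", 2)
    return code, label, unit


def prettify_value_column(column):
    """Drop the code and keep '<LABEL>__<UNIT>', with underscores as spaces."""
    _, label, unit = split_value_column(column)
    return f"{label.replace('_', ' ')}__{unit.replace('_', ' ')}"


parts = split_value_column("BWS001__Bruttowertschoepfung__jew._ME")
pretty = prettify_value_column("GEM001__Zahl_der_Gemeinden__Anzahl")

print(parts)
print(pretty)
```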

**Note** This is only true for tables from Genesis and Regionalstatistik. The format of the Zensus tables is noticeably different, but we follow a similar approach to provide you with the same convenient output format.


The following table should make it clearer what happens when going from the raw data string to the pandas `DataFrame`. The example shows table "11111-02-01-4" from Regionalstatistik, but remember that Genesis and Regionalstatistik have identical formats. The table has a time dimension, one variable and one value.

| Statistik_Code | Statistik_Label                 | Zeit_Code | Zeit_Label | Zeit       | 1_Merkmal_Code | 1_Merkmal_Label              | 1_Auspraegung_Code | 1_Auspraegung_Label | GEM001__Zahl_der_Gemeinden__Anzahl |
| -------------- | ------------------------------- | --------- | ---------- | ---------- | -------------- | ---------------------------- | ------------------ | ------------------- | ---------------------------------- |
| 11111          | Feststellung des Gebietsstandes | STAG      | Stichtag   | 31.12.2022 | KREISE         | Kreise und kreisfreie Städte | DG                 | Deutschland         | 10786                              |
| 11111          | Feststellung des Gebietsstandes | STAG      | Stichtag   | 31.12.2022 | KREISE         | Kreise und kreisfreie Städte | 01                 | Schleswig-Holstein  | 1106                               |


The same table has the following pandas representation after being "prettified" by `pystatis`:


In [8]:
t = Table("11111-02-01-4")
t.get_data()
t.data.head(2)

INFO:pystatis.http_helper:Database selected: regio


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Stichtag,Amtlicher Gemeindeschlüssel (AGS)__Code,Amtlicher Gemeindeschlüssel (AGS),Zahl der Gemeinden__Anzahl
0,2023-12-31,DG,Deutschland,10775.0
1,2023-12-31,01,Schleswig-Holstein,1104.0


As you can see, and hopefully agree, the pandas version (what we call "prettified") provides the same information in a more useful form: the column names have become meaningful and there is a lot less noise to filter out before you get to the actual data.


For Zensus, `pystatis` does basically the same, but in a slightly different way. Since the release of Zensus 2022, the API no longer returns each measurement as its own column but uses a single column for all values. `pystatis` transforms this long data format back into a wide data format, so you can work with a tidy data set. See the following example of table "4000W-1002" to understand what is going on.

| statistics_code | statistics_label                    | time_code | time_label | time       | 1_variable_code | 1_variable_label | 1_variable_attribute_code | 1_variable_attribute_label | 2_variable_code | 2_variable_label                      | 2_variable_attribute_code | 2_variable_attribute_label | value   | value_unit | value_variable_code | value_variable_label               |
| --------------- | ----------------------------------- | --------- | ---------- | ---------- | --------------- | ---------------- | ------------------------- | -------------------------- | --------------- | ------------------------------------- | ------------------------- | -------------------------- | ------- | ---------- | ------------------- | ---------------------------------- |
| 4000W           | Wohnungen (Gebietsstand 15.05.2022) | STAG      | Stichtag   | 2022-05-15 | GEODL1          | Deutschland      | DG                        | Deutschland                | WHGFL2          | Fläche der Wohnung (10 m²-Intervalle) | WFL170B179                | 170 - 179 m²               | 1,2     | %          | WHG002              | Wohnungen in Gebäuden mit Wohnraum |
| 4000W           | Wohnungen (Gebietsstand 15.05.2022) | STAG      | Stichtag   | 2022-05-15 | GEODL1          | Deutschland      | DG                        | Deutschland                | WHGFL2          | Fläche der Wohnung (10 m²-Intervalle) | WFL170B179                | 170 - 179 m²               | 509041  | Anzahl     | WHG002              | Wohnungen in Gebäuden mit Wohnraum |
| 4000W           | Wohnungen (Gebietsstand 15.05.2022) | STAG      | Stichtag   | 2022-05-15 | GEODL1          | Deutschland      | DG                        | Deutschland                | WHGFL2          | Fläche der Wohnung (10 m²-Intervalle) | WFL090B099                | 90 - 99 m²                 | 7,2     | %          | WHG002              | Wohnungen in Gebäuden mit Wohnraum |
| 4000W           | Wohnungen (Gebietsstand 15.05.2022) | STAG      | Stichtag   | 2022-05-15 | GEODL1          | Deutschland      | DG                        | Deutschland                | WHGFL2          | Fläche der Wohnung (10 m²-Intervalle) | WFL090B099                | 90 - 99 m²                 | 3082890 | Anzahl     | WHG002              | Wohnungen in Gebäuden mit Wohnraum |


In [9]:
t = Table("4000W-1002")
t.get_data()
t.data.head(2)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/4000W-1002/3d1d8e69a7d4397b08b0/20250412.zip.
INFO:pystatis.http_helper:Database selected: zensus
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Stichtag,Fläche der Wohnung (10 m²-Intervalle),Wohnungen in Gebäuden mit Wohnraum__%,Wohnungen in Gebäuden mit Wohnraum__Anzahl
0,2022-05-15,Insgesamt,100.0,43106589.0
1,2022-05-15,Unter 30 m²,2.1,909120.0


As you can see, `pystatis` not only increases readability and makes data access easy, it also reduces the amount of data you have to work with. Going from the long format back to a tidy wide format divides the number of rows by the number of measurements, because each measurement gets its own column again. In the example above there are two measurements (`%` and `Anzahl`), so the number of rows is cut in half.
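
The long-to-wide step can be sketched in plain Python. The snippet below is only an illustration under simplifying assumptions, not the `pystatis` implementation: it uses a shortened version of the raw rows shown above (only four of the original columns) and the standard `csv` module, and it collapses four long rows into two wide rows.

```python
import csv
import io

# Shortened raw data in the long Zensus layout (values taken from the table above).
raw = """\
2_variable_attribute_label;value;value_unit;value_variable_label
170 - 179 m²;1,2;%;Wohnungen in Gebäuden mit Wohnraum
170 - 179 m²;509041;Anzahl;Wohnungen in Gebäuden mit Wohnraum
90 - 99 m²;7,2;%;Wohnungen in Gebäuden mit Wohnraum
90 - 99 m²;3082890;Anzahl;Wohnungen in Gebäuden mit Wohnraum
"""

wide = {}  # one entry per attribute value, one key per measurement
for row in csv.DictReader(io.StringIO(raw), delimiter=";"):
    # Column name in the '<LABEL>__<UNIT>' style used by the prettified output.
    column = f"{row['value_variable_label']}__{row['value_unit']}"
    value = float(row["value"].replace(",", "."))  # German decimal comma
    wide.setdefault(row["2_variable_attribute_label"], {})[column] = value

for label, values in wide.items():
    print(label, values)
```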


`pystatis` is doing the following things (by default) when parsing the original raw string:

- remove the information about the statistic
- for all variables: only keep the value column and choose the variable label as the column name
- for all measurements: remove the variable code from the column name, only keep label and unit
- set the proper data types (`datetime` for the time variable, if appropriate; `str` for regional codes)
- handle missing values (i.e. replace the characters "...", ".", "-", "/" and "x" with proper `NaN` values) and special characters
- choose the right decimal character depending on the specified language (German: ",", English: ".")


All of this happens behind the scenes when you are downloading the data with `get_data()` and access it via the `Table.data` attribute.
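
As an illustration of the missing-value and decimal-character handling listed above, here is a minimal sketch. `parse_value` and `MISSING_MARKERS` are hypothetical names, and the real parsing in `pystatis` may differ in its details.

```python
import math

# Markers that stand for missing or suppressed values, as listed above.
MISSING_MARKERS = {"...", ".", "-", "/", "x"}


def parse_value(raw, language="de"):
    """Convert one raw cell into a float, or NaN for a missing-value marker."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return math.nan
    if language == "de":
        # German numbers use "," as the decimal character.
        text = text.replace(",", ".")
    return float(text)


values = [parse_value(v) for v in ["1,2", "509041", "-", "..."]]
print(values)
```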


## All `get_data()` parameters explained


You can find a list of all parameters in the [documentation](https://correlaid.github.io/pystatis/dev/pystatis.html#pystatis.table.Table.get_data) or in the docstring. All parameters are keyword parameters only (fancy Python star syntax: `f(*, everything from here on has to be a keyword only parameter)`).
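
If the star syntax is new to you, the following sketch shows its effect. `get_data_sketch` is a hypothetical function that only borrows two parameter names from `get_data()`: everything after the bare `*` must be passed by keyword, and a positional argument raises a `TypeError`.

```python
def get_data_sketch(*, prettify=True, language="de"):
    """Hypothetical function: every parameter after `*` is keyword-only."""
    return {"prettify": prettify, "language": language}


ok = get_data_sketch(prettify=False)  # keyword argument: accepted

try:
    get_data_sketch(False)  # positional argument: rejected
except TypeError as exc:
    error = str(exc)
    print(error)
```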


In [10]:
?t.get_data

Signature:
t.get_data(
    *,
    prettify: bool = True,
    area: str = 'all',
    startyear: str = '',
    endyear: str = '',
    timeslices: str = '',
    regionalvariable: str = '',
    regionalkey: str = '',
    stand: str = '',
    language: str = 'de',
    quality: str = 'off',
) -> None
Docstring:
Downloads raw data and metadata from GENESIS-Online.

Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.

Args:
    prettify (bool, optional): Reformats the table into a readable format. Defaults to True.
    area (str, optional): Area to search for the object in GENESIS-Online. Defaults to "all".
    startyear (str, optional): Data beginning with that year will be returned.
        Parameter is cumulative to `timeslices`. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj).
        Accepts values 

### `prettify`


`prettify` is a boolean and can only be `True` or `False`. The default is `True` because `prettify` does all the work described above to transform the raw data into the tidy version. However, we don't know your specific requirements, so the default transformation may not be what you need, or may even be wrong for your use case. In that case you don't have to start from scratch with the raw string: `prettify=False` still gives you a pandas `DataFrame`, but without any of the transformations described in the previous sections.


In [11]:
t = Table("1000A-0000")
t.get_data(prettify=False)
t.data.head(3)

INFO:pystatis.http_helper:Database selected: zensus


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,statistics_code,statistics_label,time_code,time_label,time,1_variable_code,1_variable_label,1_variable_attribute_code,1_variable_attribute_label,value,value_unit,value_variable_code,value_variable_label
0,1000A,Bevölkerung kompakt (Gebietsstand 15.05.2022),STAG,Stichtag,2022-05-15,GEODL1,Deutschland,DG,Deutschland,82719540,Anzahl,PRS001,Personen
1,1000A,Bevölkerung kompakt (Gebietsstand 15.05.2022),STAG,Stichtag,2022-05-15,GEOGM4,Gemeinden (Gebietsstand 15.05.2022),092760130130,Lindberg,2294,Anzahl,PRS018,Personen
2,1000A,Bevölkerung kompakt (Gebietsstand 15.05.2022),STAG,Stichtag,2022-05-15,GEOGM4,Gemeinden (Gebietsstand 15.05.2022),073355011022,"Landstuhl, Sickingenstadt, Stadt",8305,Anzahl,PRS018,Personen


In [None]:
# Don't be confused by the query: prettify=True sorts the data by ARS, so the row order differs from above and we select the same rows by ARS.
t = Table("1000A-0000")
t.get_data(prettify=True)
t.data[
    t.data["Amtlicher Regionalschlüssel (ARS)__Code"].isin(
        ["092760130130", "073355011022", "DG"]
    )
]

INFO:pystatis.http_helper:Database selected: zensus


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Stichtag,Amtlicher Regionalschlüssel (ARS)__Code,Amtlicher Regionalschlüssel (ARS),Personen__Anzahl
0,2022-05-15,DG,Deutschland,82719540
4816,2022-05-15,073355011022,"Landstuhl, Sickingenstadt, Stadt",8305
6934,2022-05-15,092760130130,Lindberg,2294


In [19]:
t.data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10787 entries, 0 to 10786
Data columns (total 4 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Stichtag                                 10787 non-null  datetime64[ns]
 1   Amtlicher Regionalschlüssel (ARS)__Code  10787 non-null  object        
 2   Amtlicher Regionalschlüssel (ARS)        10787 non-null  object        
 3   Personen__Anzahl                         10787 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 337.2+ KB


### `area`


We don't have a good explanation for this one, so if you have a concrete use case, please let us know!

Here is the description from the official [documentation](https://www-genesis.destatis.de/genesis/misc/GENESIS-Webservices_Einfuehrung.pdf):

The area query parameter specifies the area in which the object is stored, which is analogous to online navigation. Here is the breakdown:

For internal users:

- Meine/Benutzer
- Gruppe
- Amt
- Katalog/Öffentlich
- Alle

For external users:

- Meine/Benutzer
- Katalog/Öffentlich

This parameter corresponds to:

- Bereich=Benutzer as Bereich=Meine
- Bereich=Öffentlich as Bereich=Katalog


### `startyear`, `endyear` and `timeslices`


All three parameters can be used to fetch data for a certain time range of the given table. The default is table-specific and has to be checked for each table; often it is just the latest available period.

The important thing here is that `timeslices` is **cumulative** to the other two options: as the examples below show, `timeslices=N` adds the N most recent periods on top of whatever `startyear` and `endyear` select.


Let's say you are interested in school-leaving qualifications over the years in Germany. Then Table [21111-0004](https://www-genesis.destatis.de/genesis//online?operation=table&code=21111-0004) might be of interest to you. The description of the table mentions that data is available for the years 1997/98 - 2021/22. But what will the API return if you specify no time parameter?


In [20]:
t = Table("21111-0004")
t.get_data()
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/7f787b175d83ae25ee55/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['2021-P1Y', '2022-P1Y'], dtype=object)

As you can see, `pystatis` returns, for whatever reason, only the years 2020/21 and 2021/22. How can you get the ten latest years? Let's see:


In [21]:
t.get_data(timeslices=10)
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/188b2885044566b1329d/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['2013-P1Y', '2014-P1Y', '2015-P1Y', '2016-P1Y', '2017-P1Y',
       '2018-P1Y', '2019-P1Y', '2020-P1Y', '2021-P1Y', '2022-P1Y'],
      dtype=object)

In [22]:
t.get_data(startyear="2012")
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/2cd6fefd6a6c423b70bd/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['2012-P1Y', '2013-P1Y', '2014-P1Y', '2015-P1Y', '2016-P1Y',
       '2017-P1Y', '2018-P1Y', '2019-P1Y', '2020-P1Y', '2021-P1Y',
       '2022-P1Y'], dtype=object)

If you are only interested in a time period somewhere in between, you need to use both `startyear` and `endyear`:


In [23]:
t.get_data(startyear="2012", endyear="2015")
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/23e95897ca6a80fd2617/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['2012-P1Y', '2013-P1Y', '2014-P1Y', '2015-P1Y'], dtype=object)

You might expect that using `startyear` and `timeslices` gives the same result as using `startyear` and `endyear`, but this is not the case. In fact, `timeslices` always comes on top of whatever you have selected with `startyear` and `endyear`. Is that confusing? We definitely think so!


In [None]:
t.get_data(
    startyear="2012", endyear="2015", timeslices=3
)  # gives everything between 2012 and 2015 plus three more years
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/3286f8f1af46f895cd26/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['2012-P1Y', '2013-P1Y', '2014-P1Y', '2015-P1Y', '2020-P1Y',
       '2021-P1Y', '2022-P1Y'], dtype=object)

In [None]:
t.get_data(
    endyear="2015", timeslices=3
)  # gives everything up to 2015 and three more years
t.data["Schuljahr"].unique()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/6e1a814d2cac7d1233f1/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


array(['1997-P1Y', '1998-P1Y', '1999-P1Y', '2000-P1Y', '2001-P1Y',
       '2002-P1Y', '2003-P1Y', '2004-P1Y', '2005-P1Y', '2006-P1Y',
       '2007-P1Y', '2008-P1Y', '2009-P1Y', '2010-P1Y', '2011-P1Y',
       '2012-P1Y', '2013-P1Y', '2014-P1Y', '2015-P1Y', '2020-P1Y',
       '2021-P1Y', '2022-P1Y'], dtype=object)
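
The behavior seen in the outputs above can be summarized with a toy model. `select_periods` is a hypothetical helper that merely reproduces these outputs for explicit parameters. It is not the actual logic of the API, and it does not model the table-specific default that applies when no time parameter is given.

```python
def select_periods(available, startyear=None, endyear=None, timeslices=0):
    """Toy model of the period selection seen in the outputs above."""
    available = sorted(available)
    selected = set()
    if startyear is not None or endyear is not None:
        # An open end of the range falls back to the first or last period.
        lo = startyear if startyear is not None else available[0]
        hi = endyear if endyear is not None else available[-1]
        selected.update(y for y in available if lo <= y <= hi)
    if timeslices:
        # The N most recent periods come on top of the selected range.
        selected.update(available[-timeslices:])
    return sorted(selected)


years = range(1997, 2023)  # 1997 .. 2022, as in the outputs above

print(select_periods(years, startyear=2012, endyear=2015, timeslices=3))
print(select_periods(years, endyear=2015, timeslices=3))
```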

### `regionalvariable` and `regionalkey`


Tables in Regionalstatistik that end with a "B" are special: they allow you to change the regional depth of the data, meaning that you can fetch data for different regional levels depending on these two parameters. The same is true for all Zensus tables.

To select a specific regional level, you can specify `regionalvariable` and pass one of the reserved codes for this geo variable. Alternatively, you can select specific regions directly via `regionalkey`. Let's look at some examples using table [12613-01-01-5-B](https://www.regionalstatistik.de/genesis//online?operation=table&code=12613-01-01-5-B):


In [26]:
t = Table("12613-01-01-5-B")
t.get_data()
t.data.head(5)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/12613-01-01-5-B/f2058cbcdbbce74f2ba4/20250412.zip.
INFO:pystatis.http_helper:Database selected: regio
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Amtlicher Gemeindeschlüssel (AGS)__Code,Amtlicher Gemeindeschlüssel (AGS),Geschlecht,Gestorbene__Anzahl
0,2023,1001000,"Flensburg, kreisfreie Stadt",Insgesamt,
1,2023,1001000,"Flensburg, kreisfreie Stadt",männlich,
2,2023,1001000,"Flensburg, kreisfreie Stadt",weiblich,
3,2023,1002000,"Kiel, kreisfreie Stadt, Landeshauptstadt",Insgesamt,
4,2023,1002000,"Kiel, kreisfreie Stadt, Landeshauptstadt",männlich,


Instead of fetching the data for all municipalities, we can choose a different regional depth (see the codes [here](https://correlaid.github.io/pystatis/dev/pystatis.html#module-pystatis.table)), for example "KREISE", one level above "GEMEINDE", which is the default for this table.


In [27]:
t = Table("12613-01-01-5-B")
t.get_data(regionalvariable="KREISE")
t.data.head(5)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/12613-01-01-5-B/2827c2eaae48cfa74268/20250412.zip.
INFO:pystatis.http_helper:Database selected: regio
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Amtlicher Gemeindeschlüssel (AGS)__Code,Amtlicher Gemeindeschlüssel (AGS),Geschlecht,Gestorbene__Anzahl
0,2023,1001,"Flensburg, kreisfreie Stadt",Insgesamt,1165.0
1,2023,1001,"Flensburg, kreisfreie Stadt",männlich,609.0
2,2023,1001,"Flensburg, kreisfreie Stadt",weiblich,556.0
3,2023,1002,"Kiel, kreisfreie Stadt",Insgesamt,2758.0
4,2023,1002,"Kiel, kreisfreie Stadt",männlich,1364.0


`regionalkey` can be used to fetch only certain areas, see <https://datengui.de/statistik-erklaert/ags>. We now fetch only municipalities in Baden-Württemberg:


In [28]:
t = Table("12613-01-01-5-B")
t.get_data(regionalkey="08*")
t.data.head(5)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/12613-01-01-5-B/a9ac273e2f2278e1df8b/20250412.zip.
INFO:pystatis.http_helper:Database selected: regio
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Amtlicher Gemeindeschlüssel (AGS)__Code,Amtlicher Gemeindeschlüssel (AGS),Geschlecht,Gestorbene__Anzahl
0,2023,8111000,"Stuttgart, Landeshauptstadt",Insgesamt,
1,2023,8111000,"Stuttgart, Landeshauptstadt",männlich,
2,2023,8111000,"Stuttgart, Landeshauptstadt",weiblich,
3,2023,8115001,Aidlingen,Insgesamt,
4,2023,8115001,Aidlingen,männlich,
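
To get a feeling for what a wildcard such as `"08*"` selects, the following sketch applies the same pattern to a handful of AGS codes with the standard `fnmatch` module. This only illustrates the prefix logic; the real filtering happens in the API request. The codes are written as eight-digit strings with their leading zero, which is an assumption about how you would store them locally.

```python
from fnmatch import fnmatchcase

# Example AGS codes as strings; the first two digits identify the federal state.
ags_codes = ["01001000", "01002000", "08111000", "08115001", "09162000"]

pattern = "08*"  # same wildcard as regionalkey="08*"
matches = [code for code in ags_codes if fnmatchcase(code, pattern)]

print(matches)
```

Keeping the codes as strings matters here: as integers the leading zero would be lost and the prefix would no longer match.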


### `stand`


`stand` can be used to download a table only if it has a version newer than the given date.


In [29]:
t = Table("21111-0004")
t.get_data()
t.data.head(5)

INFO:pystatis.http_helper:Database selected: genesis


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Schuljahr,Geschlecht,Schulart (mit Abschlussmöglichkeit),Schulabschlüsse,Absolventen und Abgänger__Anzahl
0,2021-P1Y,Insgesamt,Insgesamt,Insgesamt,769411.0
1,2021-P1Y,Insgesamt,Insgesamt,Ohne Hauptschulabschluss,52262.0
2,2021-P1Y,Insgesamt,Insgesamt,Hauptschulabschluss,125224.0
3,2021-P1Y,Insgesamt,Insgesamt,Mittlerer Schulabschluss,331806.0
4,2021-P1Y,Insgesamt,Insgesamt,Fachhochschulreife,770.0


In [30]:
t.metadata["Object"]["Updated"]

'24.11.2023 14:49:25h'

In [31]:
t.get_data(stand="01.01.2023")  # before updated date, so should return data
t.data.head()

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/21111-0004/6d26e280578c1022c09e/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Schuljahr,Geschlecht,Schulart (mit Abschlussmöglichkeit),Schulabschlüsse,Absolventen und Abgänger__Anzahl
0,2021-P1Y,Insgesamt,Insgesamt,Insgesamt,769411.0
1,2021-P1Y,Insgesamt,Insgesamt,Ohne Hauptschulabschluss,52262.0
2,2021-P1Y,Insgesamt,Insgesamt,Hauptschulabschluss,125224.0
3,2021-P1Y,Insgesamt,Insgesamt,Mittlerer Schulabschluss,331806.0
4,2021-P1Y,Insgesamt,Insgesamt,Fachhochschulreife,770.0


In [32]:
t.get_data(stand="01.12.2024")  # after updated date, so error
t.data.head()

NoNewerDataError: Keine aktualisierten Daten vorhanden. (Mindestens ein Parameter enthält ungültige Werte. Er wurde angepasst, um den Service starten zu können.: stand)
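
The two results above follow from a simple date comparison, which can be sketched locally. `has_newer_data` is a hypothetical helper; the actual check is performed by the API, and how it treats a `stand` that falls exactly on the update date is not covered here.

```python
from datetime import datetime


def has_newer_data(updated, stand):
    """Return True if the table was updated after the `stand` date."""
    # The metadata timestamp ends with a literal "h", e.g. "24.11.2023 14:49:25h".
    updated_at = datetime.strptime(updated, "%d.%m.%Y %H:%M:%Sh")
    stand_at = datetime.strptime(stand, "%d.%m.%Y")
    return updated_at > stand_at


updated = "24.11.2023 14:49:25h"  # value of t.metadata["Object"]["Updated"] above
before = has_newer_data(updated, "01.01.2023")  # stand before the update
after = has_newer_data(updated, "01.12.2024")  # stand after the update

print(before, after)
```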

### `language`


`language` can be either "de" or "en", with "de" being the default. Regionalstatistik does not support "en" and will not translate any data. Genesis and Zensus have some support for English, but you have to check for yourself whether the data is translated and to what extent.


In [33]:
t = Table("81000-0001")
t.get_data()
t.data.head(1)

INFO:pystatis.http_helper:Database selected: genesis


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Preisbasis (jeweilige Preise / preisbereinigt),Bruttoinlandsprodukt (Veränderung in %)__Prozent,Bruttoinlandsprodukt je Einwohner__jew. ME,Bruttoinlandsprodukt__jew. ME,Bruttowertschöpfung__jew. ME,Gütersteuern abzügl. Gütersubventionen__jew. ME,Gütersteuern__jew. ME,Gütersubventionen__jew. ME
0,2015,in jeweiligen Preisen (Mrd. EUR),3.4,37774.0,3085.65,2751.937,333.713,333.77,0.057


In [34]:
t = Table("81000-0001")
t.get_data(language="en")
t.data.head(1)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/81000-0001/e70f84061d5d48f4285a/20250412.zip.
INFO:pystatis.http_helper:Database selected: genesis
INFO:pystatis.http_helper:Code 0: successfull


Unnamed: 0,Year,Price base (current prices / price-adjusted),Gross domestic product (change in %)__percent,Gross domestic product per inhabitant__unit app.,Gross domestic product__unit app.,Gross value added__unit app.,Subsidies on products__unit app.,Taxes on products less subsidies__unit app.,Taxes on products__unit app.
0,2015,At current prices (bn EUR),3.4,37774.0,3085.65,2751.937,0.057,333.713,333.77


### `quality`


`quality` can be either "on" or "off", with "off" being the default. When switching to "on", the downloaded table has an additional quality column with the suffix `__q` for each value column, containing quality symbols. See the [Explanation of symbols](https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.) for their meaning. Not supported for all tables or databases.


In [35]:
t = Table("52111-0001")
t.get_data(quality="on")
t.data.head(1)

INFO:pystatis.http_helper:Database selected: genesis


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Beschäftigtengrößenklassen,WZ2008 (Abschnitte): URS,Unternehmen (EU)__Anzahl,Unternehmen (EU)__Anzahl__q
0,2022,0 bis unter 10 abhängig Beschäftigte,Bergbau und Gewinnung von Steinen und Erden,1051,e


In [36]:
t = Table("12211-Z-11")
t.get_data(quality="on")  # not supported, ignored, but also no warning
t.data.head(1)

INFO:pystatis.http_helper:Database selected: regio


hit


INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Jahr,Amtlicher Gemeindeschlüssel (AGS)__Code,Amtlicher Gemeindeschlüssel (AGS),Art der Lebensform,Lebensformen__1000,Lebensformen__1000__q
0,2019,DG,Deutschland,Alleinstehende,18653.0,e


In [37]:
t = Table("1000A-0000")
t.get_data(quality="on")
t.data.head(1)

INFO:pystatis.cache:Data was successfully cached under /Users/miay/.pystatis/data/1000A-0000/ec488574ddc014f6340b/20250412.zip.
INFO:pystatis.http_helper:Database selected: zensus
INFO:pystatis.http_helper:Code 0: erfolgreich


Unnamed: 0,Stichtag,Amtlicher Regionalschlüssel (ARS)__Code,Amtlicher Regionalschlüssel (ARS),Personen__Anzahl,Personen__Anzahl__q
0,2022-05-15,DG,Deutschland,82719540,e
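
If you want to separate the quality columns from the value columns afterwards, the `__q` suffix is enough to find them. The sketch below works on a plain list of column names copied from the Genesis output above; with a real table you would pass `list(t.data.columns)` instead.

```python
SUFFIX = "__q"

# Column names copied from the output of table "52111-0001" above.
columns = [
    "Jahr",
    "Beschäftigtengrößenklassen",
    "WZ2008 (Abschnitte): URS",
    "Unternehmen (EU)__Anzahl",
    "Unternehmen (EU)__Anzahl__q",
]

quality_columns = [c for c in columns if c.endswith(SUFFIX)]
# Map each quality column back to the value column it describes.
value_for_quality = {c: c[: -len(SUFFIX)] for c in quality_columns}

print(value_for_quality)
```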
