# Summary of Entity Type and Legal Form Columns
This notebook summarizes the unique values in the entity type (`firm_entity_type`) and legal form (`firm_legal_form`) columns of the merged time series checks output. The analysis uses the final merged panel data specified in the project workflow.

In [8]:
import polars as pl

# Define constants for file path and column names
MERGED_PANEL_PATH = "../data/data_ready/merged_panel_imputed.parquet"
ENTITY_TYPE_COL = "firm_entity_type"
LEGAL_FORM_COL = "firm_legal_form"

# Load the merged panel data lazily
panel = pl.scan_parquet(MERGED_PANEL_PATH)

# Collect unique values for entity type and legal form, handling missing columns
def unique_values(lf: pl.LazyFrame, column: str):
    """Return the distinct values of `column`, or an error string if the column is missing."""
    try:
        return lf.select(column).unique().collect().get_column(column).to_list()
    except pl.exceptions.ColumnNotFoundError as e:
        return f"Error: {e}"


entity_types = unique_values(panel, ENTITY_TYPE_COL)
legal_forms = unique_values(panel, LEGAL_FORM_COL)

print("Unique values in 'firm_entity_type':", entity_types)
print("Unique values in 'firm_legal_form':", legal_forms)

Unique values in 'firm_entity_type': ['Podnik', 'Investiční společnost', 'Zájmová sdružení a spolky', 'Státní instituce', 'Banka', 'Fond kolektivního investování', 'Jiný subjekt', 'Vzdělávací zařízení', 'Zdravotnické zařízení', 'Pojišťovna', 'Úřady veřejné správy', 'Družstvo', 'Penzijní fond', 'Podnikatel']
Unique values in 'firm_legal_form': ['Společnost s ručením omezeným', 'Střední škola', 'Komoditní burza', 'Banka - akciová společnost', 'Veřejná obchodní společnost', 'Veřejně prospěšná instituce', 'Veřejná výzkumná instituce', 'Fyzická osoba podnikající dle živnostenského zákona zapsaná v obchodním rejstříku', 'Státní příspěvková organizace', 'Ústav', 'Fyzická osoba podnikající dle jiných zákonů než živnostenského a zákona o zemědělství', 'Evropská společnost', 'Zdravotní pojišťovna', 'Státní organizace Správa železnic', 'Komanditní společnost', 'Spolek', 'Družstvo', 'Akciová společnost', 'Pobočný spolek', 'Státní podnik', 'Obec (obecní úřad)', 'Zájmové sdružení právnických osob', 

In [9]:
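# Count unique ICOs (company identification numbers) per entity type and per legal form
# Note: `.to_pandas()` below needs pandas installed, but no explicit import is required.
ICO_COL = "firm_ico"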

# Unique ICOs per entity type
entity_type_counts = (
    panel.group_by(ENTITY_TYPE_COL)
    .agg(pl.col(ICO_COL).n_unique().alias("unique_ico_count"))
    .sort("unique_ico_count", descending=True)
    .collect()
)
print("Unique ICOs per entity type:")
print(entity_type_counts.to_pandas().to_string(index=False))

# Unique ICOs per legal form
legal_form_counts = (
    panel.group_by(LEGAL_FORM_COL)
    .agg(pl.col(ICO_COL).n_unique().alias("unique_ico_count"))
    .sort("unique_ico_count", descending=True)
    .collect()
)
print("\nUnique ICOs per legal form:")
print(legal_form_counts.to_pandas().to_string(index=False))

Unique ICOs per entity type:
             firm_entity_type  unique_ico_count
                       Podnik             48607
                     Družstvo               951
                 Jiný subjekt               652
                   Podnikatel               263
          Vzdělávací zařízení                54
             Státní instituce                49
                        Banka                46
                   Pojišťovna                46
    Zájmová sdružení a spolky                32
        Investiční společnost                24
         Úřady veřejné správy                24
                Penzijní fond                11
Fond kolektivního investování                 6
        Zdravotnické zařízení                 1

Unique ICOs per legal form:
                                                                          firm_legal_form  unique_ico_count
                                                            Společnost s ručením omezeným             40968
      

In [10]:
# empty entity type


## Methodology: Sample Selection for the Non-Financial Corporate Sector
To quantify the relationship between corporate profit margins and inflation, we first define and isolate the relevant population of firms. The raw dataset is comprehensive but heterogeneous: it includes state institutions, non-profits, and financial corporations, whose economic behavior and accounting standards differ fundamentally from those of non-financial private enterprises. Including these entities would introduce measurement error and bias into an analysis of market-driven profit behavior.

Therefore, we implement a two-stage filtering process to construct a panel representative of the Czech non-financial corporate sector (NFCS).

### Stage 1: Selection Based on Legal Form

The first and broadest filter uses the entity's legal form (the `firm_legal_form` column). Our research objective is to analyze firms operating under standard corporate governance and for-profit motives. Consequently, we retain only entities with legal forms consistent with corporate status. Our inclusion list comprises:

- Společnost s ručením omezeným (Limited Liability Company)
- Akciová společnost (Joint-Stock Company)
- Družstvo (Cooperative)
- Komanditní společnost (Limited Partnership)
- Veřejná obchodní společnost (General Partnership)
- Evropská společnost (European Company)

This step effectively removes entities such as public administration bodies, state-funded organizations, associations, and unincorporated sole proprietors. This initial filter reduced the sample by 52,800 observations, confirming the presence of a significant number of non-corporate entities in the original data.

### Stage 2: Exclusion of the Financial and Investment Sector Based on Entity Type

Even within recognized corporate legal forms, the financial and investment sector operates under a distinct business model where traditional metrics like "operating margin" are not directly comparable to those of non-financial firms. Their profitability is driven by interest rate spreads, investment returns, and risk management, rather than the production and sale of goods and services.

To ensure the homogeneity of our sample, we perform a second filtering step based on the entity type classification (the `firm_entity_type` column). After the legal form filter, the data still contained 2,496 observations corresponding to banks (Banka), insurance companies (Pojišťovna), pension funds (Penzijní fond), and investment entities. These are explicitly excluded from the primary analysis. We retain only entities classified as:

- Podnik (Enterprise/Business)
- Družstvo (Cooperative)

This refinement removes the remaining financial and investment firms, resulting in a final panel of 1,229,808 observations across 50,653 unique firms representing the Czech non-financial corporate sector. This sample is the basis for investigating the drivers of profit margins and their contribution to inflation.

