# `allClazz` Data

Building on what we learned in the previous chapter, we can quickly make a start on previewing the `allClazz` data.

Reviewing the available data feeds in the browser developer tools while viewing pages on the [Dakar live results site](https://www.dakar.live.worldrallyraidchampionship.com/en/), we see that the URLs for the class data use path elements of the form `allClazz-2025-A`, `allClazz-2025-M` and so on. From inspecting the category data in the previous chapter, we know the available category codes are `A` (auto/car), `M` (moto/bike), `K` (classic) and `F` (Future Mission).

So let's make a start by reviewing the data feed for the auto/car category.

In [28]:
# Load in the required pandas package
import pandas as pd

dakar_api_template = "https://www.dakar.live.worldrallyraidchampionship.com/api/{path}"

# Define the year
YEAR = 2025
# Define the category
CATEGORY = "A"

# Define the API path to the car clazz resource
# Use a Python f-string to instantiate variable values directly
clazz_path = f"allClazz-{YEAR}-{CATEGORY}"

# Define the URL
clazz_url = dakar_api_template.format(path=clazz_path)

# Preview the path and the URL
clazz_path, clazz_url

('allClazz-2025-A',
 'https://www.dakar.live.worldrallyraidchampionship.com/api/allClazz-2025-A')
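
The same template can be reused for the other category codes. The following is a minimal sketch that builds one URL per category; only the `A` and `M` path forms were observed above, so the assumption that the `K` and `F` feeds follow the same pattern is untested here, and the names `CATEGORIES` and `clazz_urls` are introduced purely for illustration.

```python
# Base template, as used in the cell above
dakar_api_template = "https://www.dakar.live.worldrallyraidchampionship.com/api/{path}"

YEAR = 2025
# Category codes identified in the previous chapter
# (assumption: every code has an allClazz feed of the same form)
CATEGORIES = ["A", "M", "K", "F"]

# Map each category code to its candidate allClazz feed URL
clazz_urls = {
    category: dakar_api_template.format(path=f"allClazz-{YEAR}-{category}")
    for category in CATEGORIES
}

for category, url in clazz_urls.items():
    print(category, url)
```

Keeping the URLs in a dictionary keyed by category code makes it easy to look up a single feed or to loop over all of them later.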

Assuming that the data feed is a JSON data feed, let's try loading it directly into a dataframe:

In [29]:
# Load the data
clazz_df = pd.read_json(clazz_url)

# Preview the data
clazz_df.head()

,position,label,refueling,promotionalDisplay,categoryClazzLangs,liveDisplay,shortLabel,updatedAt,reference,_bind,_origin,_id,_key,_updatedAt,_parent,$group,_gets,categoryGroupLangs,tinyLabel,color
0,1,U,0,True,"[{'variable': 'cat.name.A_T3_U', 'text': 'T3.U...",False,cat.name.A_T3_U,2025-01-05T20:25:31+01:00,2025-A-T3-U,allClazz-2025-A,categoryClazz-2025-A-T3,18af44f476a4dc9363554ccfe1a9b9fe,_id,1736183714844,categoryGroup-2025-A:15f329900afa29e3e6b099ae6...,categoryGroup-2025-A:15f329900afa29e3e6b099ae6...,{'group': '$group'},,,
1,0,1,0,True,"[{'variable': 'cat.name.A_T3_1', 'text': 'T3.1...",False,cat.name.A_T3_1,2025-01-05T20:25:31+01:00,2025-A-T3-1,allClazz-2025-A,categoryClazz-2025-A-T3,a0a6386a4b9a61b73b036a50966345c0,_id,1736183714844,categoryGroup-2025-A:15f329900afa29e3e6b099ae6...,categoryGroup-2025-A:15f329900afa29e3e6b099ae6...,{'group': '$group'},,,
2,3,T4,0,False,"[{'variable': 'cat.name.A_T4_T4', 'locale': 'e...",False,cat.name.A_T4_T4,2025-01-05T20:25:31+01:00,2025-A-T4-T4,allClazz-2025-A,categoryClazz-2025-A-T4,058d77cc7db191813c30a902a8d5ba7c,_id,1736183714670,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,{'group': '$group'},,,
3,0,NO,0,False,"[{'text': 'T4: Modified Production SSV', 'loca...",False,cat.name.A_T4_NO,2025-01-05T20:25:31+01:00,2025-A-T4-NO,allClazz-2025-A,categoryClazz-2025-A-T4,0ec1b5373f8c1fb5ff70ea0590e16c50,_id,1736183714670,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,{'group': '$group'},,,
4,2,SSV2,0,False,"[{'variable': 'cat.name.A_T4_SSV2', 'text': 'S...",False,cat.name.A_T4_SSV2,2025-01-05T20:25:31+01:00,2025-A-T4-SSV2,allClazz-2025-A,categoryClazz-2025-A-T4,23ae09bc22535129a9af1e6b3071bc2c,_id,1736183714670,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,categoryGroup-2025-A:423ea731fdcba5cda62c83349...,{'group': '$group'},,,
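
If we would rather not take the "it's JSON" assumption on trust, we can check the payload before building a dataframe from it. The following is a rough sketch using a made-up sample payload rather than the live feed; the helper name `records_to_df` is hypothetical, and the expectation that the feed is a list of records is an assumption based on the preview above.

```python
import json

import pandas as pd


def records_to_df(payload_text):
    """Parse a JSON text payload and return it as a dataframe.

    Raises a ValueError with a readable message if the payload
    is not valid JSON or is not a list of records.
    """
    try:
        data = json.loads(payload_text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Payload is not valid JSON: {err}") from err

    if not isinstance(data, list):
        raise ValueError(f"Expected a list of records, got {type(data).__name__}")

    return pd.DataFrame(data)


# Made-up sample payload standing in for the text of a feed response
sample_payload = '[{"label": "U", "position": 1}, {"label": "1", "position": 0}]'
sample_df = records_to_df(sample_payload)
print(sample_df)
```

With a live feed, the text passed in would be the body of the HTTP response; how that is fetched is left open here.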


A quick preview of the data suggests that once again we have a multiplicity of language labels. It's going to be a bit of a faff if we have to keep unpacking these into a long form, reshaping them to a wide form, and then merging them back into the original dataframe.

But code is for nothing if not automating out repeated tasks, so let's create a way of doing that.

As with many other programming languages, Python allows you to define your own named functions or procedures. We have already seen how the `pandas` package contains routines for loading data and working with dataframes, but it doesn't seem to offer an off-the-shelf one-liner that addresses our immediate concern.

So let's fix that.

We already know what we want to do, and have identified the steps for doing it in the previous chapter. So let's wrap those steps into a single function that we can apply straightforwardly.

In [26]:
# We define a function by using the `def` statement.
# The function signature identifies required and optional parameters.
# In this case, we require a dataframe and the name of the column
# we want to reshape. We also (optionally) identify the column
# that we want to merge against. By default, this is "shortLabel".
def mergeInLangLabels(df, col, key="shortLabel"):
    # Unpack the lists of labels into their own rows
    # to give a long dataframe.
    longLabels = pd.json_normalize(df[col].explode())

    # This is the only new bit
    # If there are no labels, we may get empty rows
    # or rows filled with null / NA values in the long dataframe.
    # So we can pre-emptively drop such rows if they appear.
    longLabels.dropna(axis="index", how="all", inplace=True)
    # If we don't drop the empty rows, we may get issues
    # in the pivot stage.

    # Reshape the long dataframe to a wide dataframe by pivoting
    # the locale to column names using text values, and using
    # the category (variable) as the row index.
    wideLabels = longLabels.pivot(
                    index='variable',
                    columns='locale',
                    values='text',
                    ).reset_index()
    
    # Merge the data back in to the original dataframe
    _df = pd.merge(df, wideLabels,
             left_on=key, right_on='variable')
    
    # Tidy up the dataframe by dropping the now redundant columns
    _df.drop("variable", axis=1, inplace=True)
    # If we pass in a column named "variable", trying to drop it
    # again will cause an error, so ignore any error...
    _df.drop(col, axis=1, inplace=True, errors="ignore")

    return _df

We can now generate our expanded, labelled dataframe with a single line:

In [19]:
# Update the dataframe by using our new function to
# merge in the exploded and widened language labels
clazz_df = mergeInLangLabels(clazz_df, "categoryClazzLangs")

# Preview the dataframe, limited to a few illustrative columns
clazz_df[["shortLabel", "en", "fr"]].head()

,shortLabel,en,fr
0,cat.name.A_T3_U,"T3.U: ""Ultimate"" Lightweight Prototype Cross-C...","T3.U: Véhicules Tout-terrain Prototype léger ""..."
1,cat.name.A_T3_1,T3.1: Lightweight Prototype Cross-Country,T3.1: Véhicules Tout-terrain Prototype léger
2,cat.name.A_T4_T4,T4: Modified Production SSV,T4 SSV de série modifié
3,cat.name.A_T4_NO,T4: Modified Production SSV,T4 SSV de série modifié
4,cat.name.A_T4_SSV2,SSV2,SSV2
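
To see what each step contributes, it can help to run the same explode, normalise, pivot and merge sequence on a tiny dataframe. The sketch below uses made-up toy data (the `toy_df` frame, its `langs` column and the `cat.name.X` style labels are all invented for illustration). It also highlights one thing worth knowing: `pd.merge` performs an inner join by default, so a row with no labels at all drops out of the result unless we ask for `how="left"`.

```python
import pandas as pd

# Toy data (made up): two classes with en/fr labels, one with no labels
toy_df = pd.DataFrame({
    "shortLabel": ["cat.name.X", "cat.name.Y", "cat.name.Z"],
    "langs": [
        [{"variable": "cat.name.X", "locale": "en", "text": "Class X"},
         {"variable": "cat.name.X", "locale": "fr", "text": "Classe X"}],
        [{"variable": "cat.name.Y", "locale": "en", "text": "Class Y"},
         {"variable": "cat.name.Y", "locale": "fr", "text": "Classe Y"}],
        [],
    ],
})

# Step 1: explode to one label record per row; the empty list becomes
# a missing value, which we drop before normalising to a long dataframe
label_records = toy_df["langs"].explode().dropna().tolist()
long_labels = pd.json_normalize(label_records)

# Step 2: pivot the long dataframe to a wide one, one column per locale
wide_labels = long_labels.pivot(
    index="variable", columns="locale", values="text"
).reset_index()

# Step 3: merge back in. The default inner join keeps only matched rows...
inner_df = pd.merge(toy_df, wide_labels,
                    left_on="shortLabel", right_on="variable")
# ...whereas a left join keeps every original row
left_df = pd.merge(toy_df, wide_labels,
                   left_on="shortLabel", right_on="variable", how="left")

print(long_labels)
print(wide_labels)
print(inner_df["shortLabel"].tolist())
print(left_df[["shortLabel", "en", "fr"]])
```

Whether dropping unlabelled rows is desirable depends on the feed; for the auto/car classes previewed above, every row shown has labels, so the distinction may not matter in practice.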


To make things even more reusable, we can save our function to a file so that we can import it into other notebooks.

In [27]:
# The inspect package allows us to inspect Python objects
import inspect

# For example, we can get the source code of our function
source_code = inspect.getsource(mergeInLangLabels)

# We also need to add in any package imports...
imports = """
import pandas as pd
"""

# And now we can write the source code to a file
with open("dakar_utils_2025.py", "w") as file:
    file.write(imports + "\n" + source_code)

# We can then use that file as a simple module
# and import our function from it:
# from dakar_utils_2025 import mergeInLangLabels
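
One thing to watch out for when working this way in a long-running notebook: once a module has been imported, Python caches it, so regenerating the file does not change the function we already have in memory until the module is reloaded. The following is a small sketch of that behaviour using a throwaway module written to a temporary directory; the module name `demo_utils` and its `shout` function are hypothetical stand-ins, not part of the Dakar utilities.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, for illustration only
module_source = '''
def shout(text):
    return text.upper()
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_file = Path(tmp_dir, "demo_utils.py")
    module_file.write_text(module_source)

    # Make the temporary directory importable
    sys.path.insert(0, tmp_dir)
    try:
        demo_utils = importlib.import_module("demo_utils")
        first_result = demo_utils.shout("dakar")

        # "Regenerate" the file with a changed function body...
        module_file.write_text(
            module_source.replace("text.upper()", "text.upper() + '!'")
        )
        # ...and reload the module to pick up the change
        importlib.reload(demo_utils)
        second_result = demo_utils.shout("dakar")
    finally:
        # Tidy up so the demo leaves no trace behind
        sys.path.remove(tmp_dir)
        sys.modules.pop("demo_utils", None)

print(first_result, second_result)
```

Restarting the notebook kernel has the same effect as the reload, at the cost of losing any other state.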