# Price variation analysis for the PMGF campaign


**PMGF**

- AOM SIRET: 20007537200017
- Stronger incentive for trips within the AOM (4e
- DATE: 22/04/2025, when the variation took effect (lower incentive towards the passenger's fare, i.e. a higher remaining cost for the passenger), only for external trips (inbound or outbound)

Before: €3 incentive / €1 paid by the passenger
Now (after 22/04/2025): the reverse, €1 incentive / €3 paid by the passenger

**Details BEFORE (01/01/2025 to 22/04/2025)**:

Departure and arrival within the PMGF:

- 5 to 20 km: €1.50 for the driver per trip;
- 20 to 40 km: €1.50 for the driver per trip + €0.125 per passenger per additional kilometre;
- Beyond 40 km: €4.00 for the driver per trip and per passenger.

Departure or arrival within the PMGF:

- 5 to 20 km: €0.50 for the driver per trip;
- 20 to 40 km: €0.50 for the driver per trip + €0.125 per passenger per additional kilometre;
- Beyond 40 km: €3.00 for the driver per trip and per passenger.

**Details of the incentive AFTER 22/04/2025**:

Departure and arrival within the PMGF (same as before)

- 5 to 20 km: €1.50 per passenger carried;
- 20 to 30 km: €1.50 per passenger carried + €0.15 per passenger per additional kilometre, i.e. €3.00 for 30 km;
- 30 to 40 km: €1.50 per passenger carried + €0.10 per passenger per additional kilometre, i.e. €4.00 for 40 km;
- Beyond 40 km: €4.00 per passenger carried.

Departure or arrival within the PMGF:

- 5 to 21 km: €0.50 per passenger carried;
- Beyond 21 km: €1.00 per passenger carried.
- COMMUNICATION: communication campaign through the BBC app at the time of the variation
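
The post-change rules above can be sketched as a small function. This is an illustration, not the official calculation: `incentive_after_change` is a hypothetical helper. It assumes that the 30 to 40 km tier continues from the €3.00 reached at 30 km (the only reading that gives €4.00 at 40 km), and that the €1.00 amount for external trips applies strictly above 21 km.

```python
def incentive_after_change(distance_km: float, fully_inside: bool) -> float:
    """Per-passenger incentive in euros after 22/04/2025 (sketch under assumptions)."""
    if distance_km < 5:
        return 0.0
    if not fully_inside:
        # Only one end of the trip is inside the PMGF: flat amounts.
        # Assumption: the higher amount applies strictly above 21 km.
        return 0.50 if distance_km <= 21 else 1.00
    if distance_km < 20:
        return 1.50
    if distance_km < 30:
        # 0.15 € per kilometre beyond 20 km, reaching 3.00 € at 30 km.
        return round(1.50 + 0.15 * (distance_km - 20), 2)
    if distance_km < 40:
        # Assumption: continue from 3.00 € so that 40 km yields 4.00 €.
        return round(3.00 + 0.10 * (distance_km - 30), 2)
    return 4.00


for km in (10, 25, 30, 35, 40):
    print(km, incentive_after_change(km, fully_inside=True))
```

With these assumptions the amount is continuous at 20, 30 and 40 km, which matches the "€3.00 for 30 km" and "€4.00 for 40 km" figures quoted in the rules.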


# Dependencies


In [None]:
import json
import math
import os
from datetime import datetime, timedelta
from itertools import product
from zoneinfo import ZoneInfo

import branca.colormap as bcm
import folium
import geopandas as gpd
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import polars as pl
import polars_h3 as plh3
import shapely
from sqlalchemy import create_engine

# Configuration


In [None]:
DB_URL = os.environ["DB_URL"]

In [None]:
AOM_CODE = "247400690"
AOM_SIRET = "20007537200017"
CAMPAIGN_CHANGE_DATE = datetime(2025, 4, 22, tzinfo=ZoneInfo("GMT"))
CAMPAIGN_CONFIGS = {
    "Période 1": {
        "dates": (datetime(2025, 1, 1), datetime(2025, 4, 21)),
        "distances_cat": {
            "full_inside_trip": {
                (0, 4): {
                    "incentive_amount_per_trip": 0,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (5, 19): {
                    "incentive_amount_per_trip": 1.5,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (20, 39): {
                    "incentive_amount_per_trip": 1.5,
                    "incentive_amount_per_passenger_per_km": 0.125,
                },
                (40, math.inf): {
                    "incentive_amount_per_trip": 4,
                    "incentive_amount_per_passenger_per_km": 0,
                },
            },
            "semi_inside_trip": {
                (0, 4): {
                    "incentive_amount_per_trip": 0,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (5, 19): {
                    "incentive_amount_per_trip": 0.5,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (20, 39): {
                    "incentive_amount_per_trip": 0.5,
                    "incentive_amount_per_passenger_per_km": 0.125,
                },
                (40, math.inf): {
                    "incentive_amount_per_trip": 3,
                    "incentive_amount_per_passenger_per_km": 0,
                },
            },
        },
    },
    "Période 2": {
        "dates": (datetime(2025, 4, 22), datetime.now()),
        "distances_cat": {
            "full_inside_trip": {
                (0, 4): {
                    "incentive_amount_per_passenger": 0,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (5, 19): {
                    "incentive_amount_per_passenger": 1.5,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (20, 29): {
                    "incentive_amount_per_passenger": 1.5,
                    "incentive_amount_per_passenger_per_km": 0.15,
                },
                (30, 39): {
                    "incentive_amount_per_passenger": 1.5,
                    "incentive_amount_per_passenger_per_km": 0.10,
                },
                (40, math.inf): {
                    "incentive_amount_per_passenger": 4,
                    "incentive_amount_per_passenger_per_km": 0,
                },
            },
            "semi_inside_trip": {
                (0, 4): {
                    "incentive_amount_per_passenger": 0,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (5, 21): {
                    "incentive_amount_per_passenger": 0.5,
                    "incentive_amount_per_passenger_per_km": 0,
                },
                (22, math.inf): {
                    "incentive_amount_per_passenger": 1,
                    "incentive_amount_per_passenger_per_km": 0,
                },
            },
        },
    },
}
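
A lookup in this kind of structure could work as follows. This is only a sketch: `find_bracket` is a hypothetical helper that is not part of the notebook, and it assumes the tuple keys are inclusive whole-kilometre bounds, so the distance is floored before matching.

```python
import math

# Reduced copy of one `distances_cat` entry, for illustration only.
full_inside_trip = {
    (0, 4): {"incentive_amount_per_trip": 0},
    (5, 19): {"incentive_amount_per_trip": 1.5},
    (20, 39): {"incentive_amount_per_trip": 1.5},
    (40, math.inf): {"incentive_amount_per_trip": 4},
}


def find_bracket(brackets: dict, distance_km: float) -> tuple:
    """Return the (low, high) key whose inclusive range contains the distance."""
    whole_km = math.floor(distance_km)  # assumption: brackets use whole kilometres
    for low, high in brackets:
        if low <= whole_km <= high:
            return (low, high)
    raise ValueError(f"no bracket for {distance_km} km")


print(find_bracket(full_inside_trip, 19.7))
print(find_bracket(full_inside_trip, 40))
```

Flooring is one possible convention. If the campaign rules treat boundaries differently, the comparison in the loop is the place to adjust.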

In [None]:
## For plotting purposes
labels_map = {
    "month": "Mois",
    "num_journeys": "Nombre de journeys",
    "num_journeys_with_incentive": "Nombre de journeys avec incitation",
    "operator": "Opérateur",
    "incentive_amount_avg": "Incitation moyenne",
    "driver_revenue_avg": "Revenu moyen conducteur",
    "passenger_contribution_avg": "Contribution moyenne passager",
    "incentive_amount_intra_avg": "Incitation moyenne intra",
    "driver_revenue_intra_avg": "Revenu moyen conducteur intra",
    "passenger_contribution_intra_avg": "Contribution moyenne passager intra",
    "incentive_amount_inter_avg": "Incitation moyenne inter",
    "driver_revenue_inter_avg": "Revenu moyen conducteur inter",
    "passenger_contribution_inter_avg": "Contribution moyenne passager inter",
    "incentive_amount_per_km_avg": "Montant moyen d'incitation par km",
    "passenger_contribution_per_km_avg": "Contribution moyenne passager par km",
    "driver_revenue_per_km_avg": "Revenu moyen conducteur par km",
    "week": "Semaine",
    "distance_avg": "Distance moyenne",
    "campaign_type": "Campagne",
    "distance": "Distance",
    "num_journeys_with_aom_incentive": "Nombre de journeys incitées par l'AOM",
    "num_journeys_with_operator_incentive": "Nombre de journeys incitées par un opérateur",
    "num_journeys_intra_territory": "Nombre de journeys intra-territoire",
    "num_journeys_inter_territory": "Nombre de journeys inter-territoires",
    "share_journeys_intra_territory": "% de journeys intra-territoire",
    "share_journeys_inter_territory": "% de journeys inter-territoires",
    "share_drivers": "% des conducteurs",
    "num_trips": "Nombre de trips",
    "is_intra_driver": "Conducteur intra",
    "driver_campaign_type": "Type de campagne du conducteur",
    "passenger_campaign_type": "Type de campagne du passager",
    "drivers_share": "% des conducteurs",
    "week_number": "Semaine n°",
    "passengers_share": "% des passagers",
}

In [None]:
colors_map = {
    "Période 0": "grey",
    "Période 1": "#9fc2b2",
    "Période 2": "#ebd999",
}

In [None]:
campaign_order = {
    "Période 0": 0,
    "Période 1": 1,
    "Période 2": 2,
}

In [None]:
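def add_campaign_annotations(
    fig: go.Figure, label_position: str = "inside bottom left"
) -> None:
    """Shade both campaign periods on a figure with a datetime x-axis."""
    fig.add_vrect(
        x0=datetime(2025, 1, 1),
        x1=datetime(2025, 4, 22),
        xref="x",
        fillcolor=colors_map["Période 1"],
        layer="below",  # keep the shading behind the traces
        line_width=0,
        annotation={"text": "Période 1"},
        annotation_position=label_position,
    )
    fig.add_vrect(
        x0=datetime(2025, 4, 22),
        x1=datetime(2025, 6, 9),
        xref="x",
        fillcolor=colors_map["Période 2"],
        layer="below",  # keep the shading behind the traces
        line_width=0,
        annotation={"text": "Période 2"},
        annotation_position=label_position,
    )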

In [None]:
campaign_type_expr = (
    pl.when(pl.col("start_datetime") < pl.datetime(2025, 1, 1, time_zone="GMT"))
    .then(pl.lit("Période 0"))
    .when(pl.col("start_datetime") < pl.datetime(2025, 4, 22, time_zone="GMT"))
    .then(pl.lit("Période 1"))
    .otherwise(pl.lit("Période 2"))
    .alias("campaign_type")
)
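
The same classification can be written in plain Python to check the boundary behaviour. This sketch assumes, like the expression above, that the cut-offs are midnight GMT. `campaign_type` is a hypothetical helper, and the fixed UTC+2 offset stands in for Paris summer time.

```python
from datetime import datetime, timedelta, timezone

PERIOD_1_START = datetime(2025, 1, 1, tzinfo=timezone.utc)
PERIOD_2_START = datetime(2025, 4, 22, tzinfo=timezone.utc)


def campaign_type(start: datetime) -> str:
    """Mirror the when/then chain: strict '<' against each period start."""
    if start < PERIOD_1_START:
        return "Période 0"
    if start < PERIOD_2_START:
        return "Période 1"
    return "Période 2"


# Assumption: UTC+2 approximates local (Paris) time in late April.
paris_summer = timezone(timedelta(hours=2))
local_start = datetime(2025, 4, 22, 0, 30, tzinfo=paris_summer)
print(campaign_type(local_start))
```

A trip starting at 00:30 local time on 22 April is 22:30 GMT on 21 April, so it still falls in "Période 1". If the change took effect at local midnight, the boundaries would need a local time zone instead.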

# Queries


In [None]:
SQL_ENGINE = create_engine(DB_URL)

## Journeys


In [None]:
SQL = """
with perimeters_filtered as (
select
	p.com,
    p.l_com
from
	territory.territory_group_selector tgs
inner join geo.perimeters p on
	case
		when tgs.selector_type = 'aom' then p.aom = tgs.selector_value
		when tgs.selector_type = 'epci' then p.epci = tgs.selector_value
		else false
	end
	and tgs.territory_group_id = 36102
where p.year = 2024
),
first_trip as (
select
    driver_identity_key,
    min(c.start_datetime) as first_trip_datetime
from carpool_v2.carpools c
group by 1
),
first_trip_passengers as (
select
    passenger_identity_key,
    min(c.start_datetime) as first_trip_datetime
from carpool_v2.carpools c
group by 1
),
geo_filtered as (
select
	g.*,
    (p.com is not null and p2.com is not null) as is_fully_inside_campaign_area
from
	carpool_v2.geo g
left join perimeters_filtered p on
	g.start_geo_code = p.com
left join perimeters_filtered p2 on
	g.end_geo_code = p2.com	
where (p.com is not null or p2.com is not null)
and g.updated_at >= '2024-09-01'
),
incentives as (
select
	oi.carpool_id,
	sum(oi.amount) as incentive_amount,
    sum(oi.amount) filter (where siret='49190454600034') as amount_bbc,
    sum(oi.amount) filter (where siret='20007537200017') as amount_aom,
	array_agg(distinct oi.siret) as incentive_sirets
from
	carpool_v2.operator_incentives oi
inner join geo_filtered g on
	oi.carpool_id = g.carpool_id
where amount>0
group by
	1
),
journeys as 
(
select
	c."_id",
	c.operator_id,
	c.operator_journey_id,
	c.operator_trip_id,
    c.driver_identity_key,
    ft.first_trip_datetime,
    c.passenger_identity_key,
    ftp.first_trip_datetime as passenger_first_trip_datetime,
	c.start_datetime,
	c.end_datetime,
	c.distance,
	c.driver_revenue,
	c.passenger_contribution,
	i.incentive_amount,
    i.amount_bbc,
    i.amount_aom,
	i.incentive_sirets,
	c.start_position,
	c.end_position,
    c.passenger_seats,
    is_fully_inside_campaign_area,
	ST_MAKELINE(c.start_position::geometry,c.end_position::geometry) as journey_line	
from
	carpool_v2.carpools c
inner join geo_filtered g on
	c."_id" = g.carpool_id
left join incentives i on
	c."_id" = i.carpool_id
left join first_trip ft on ft.driver_identity_key=c.driver_identity_key
left join first_trip_passengers ftp on ftp.passenger_identity_key=c.passenger_identity_key
left join carpool_v2.status s on s."carpool_id"=c."_id" 
where
	(c.start_datetime between '2024-09-01' and '2025-06-30')
    and s.acquisition_status='processed'
    and s.fraud_status='passed'
    and s.anomaly_status='passed'
    )
SELECT
    j.*,
    CASE WHEN p.l_arr = p.country THEN p.l_country ELSE p.l_arr END as start_com,
    CASE WHEN p2.l_arr = p2.country THEN p2.l_country ELSE p2.l_arr END as end_com
from journeys j
left join carpool_v2.geo g on j."_id"=g."carpool_id"
left join geo.perimeters p on g."start_geo_code"=p.arr and p.year=2024
left join geo.perimeters p2 on g."end_geo_code"=p2.arr and p2.year=2024
"""

In [None]:
df_journeys_raw = pl.read_database(query=SQL, connection=SQL_ENGINE)

In [None]:
df_journeys_raw.head()

Look at which origin-destination pairs BBC targets with incentives.
Annecy - Geneva express line project: check the Swiss incentives.

Estimate the number of cars avoided.

Switch to % of drivers.


In [None]:
df_journeys_raw.describe()

### Adding campaign categories


In [None]:
# Add campaign cat:
df_journeys_raw = df_journeys_raw.with_columns(campaign_type_expr)

### Adding distance classes


In [None]:
distance_cat_expr = (
    pl.when(pl.col("campaign_type") == "Période 1")
    .then((pl.col("distance") / 1000).cut(breaks=[5, 20, 40], left_closed=True))
    .when(pl.col("campaign_type") == "Période 2")
    .then(
        pl.when(pl.col("is_fully_inside_campaign_area"))
        .then((pl.col("distance") / 1000).cut(breaks=[5, 20, 30, 40], left_closed=True))
        .otherwise((pl.col("distance") / 1000).cut(breaks=[5, 22], left_closed=True))
    )
)
df_journeys_raw = df_journeys_raw.with_columns(distance_cat_expr.alias("distance_cat"))
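
To make the left-closed binning explicit, here is a plain-Python sketch of the same idea using `bisect`. `left_closed_bin` is a hypothetical helper. The label format is assumed to match the interval strings used as keys in `cat_labels_map` (for example `[5, 20)`).

```python
import math
from bisect import bisect_right


def left_closed_bin(value: float, breaks: list) -> str:
    """Return the left-closed interval [low, high) that contains `value`."""
    edges = [-math.inf, *breaks, math.inf]
    # bisect_right sends a value equal to a break into the interval starting there.
    index = bisect_right(breaks, value)
    return f"[{edges[index]}, {edges[index + 1]})"


for metres in (4_200, 19_999, 20_000, 45_000):
    print(metres, left_closed_bin(metres / 1000, [5, 20, 40]))
```

With left-closed intervals a journey of exactly 20 km lands in `[20, 40)`, not in `[5, 20)`.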

## Operators


In [None]:
df_operators = pl.read_database(
    query="""
SELECT
    "_id",
    "name",
    "siret"
from operator.operators
where deleted_at is null
and name!='BlaBlaCar'
""",
    connection=SQL_ENGINE,
)

In [None]:
df_operators

## Campaign geometry


In [None]:
df_geometries = pl.read_database(
    query="""
    select
        p.l_com,
    	p.geom_simple as geometry
    from
    	territory.territory_group_selector tgs
    inner join geo.perimeters p on
    	case
    		when tgs.selector_type = 'aom' then p.aom = tgs.selector_value
    		when tgs.selector_type = 'epci' then p.epci = tgs.selector_value
    		else false
    	end
    	and tgs.territory_group_id = 36102
    where p.year = 2024
    """,
    connection=SQL_ENGINE,
)
df_geometries = gpd.GeoDataFrame(df_geometries.to_pandas())
df_geometries = df_geometries.set_geometry(
    gpd.GeoSeries.from_wkb(df_geometries["geometry"])
)

In [None]:
geojson = json.loads(df_geometries.to_json())

# Identifying incentive providers


In [None]:
df_journeys_raw = df_journeys_raw.with_columns(
    pl.col("incentive_sirets").list.contains(AOM_SIRET).alias("incentived_by_aom"),
    (
        pl.col("incentive_sirets")
        .list.set_intersection(df_operators["siret"].to_list())
        .list.len()
        > 0
    ).alias("incentived_by_operator"),
)

# Geo processing


In [None]:
df_journeys_raw = df_journeys_raw.with_columns(
    pl.col("start_position")
    .map_elements(shapely.from_wkb, return_dtype=pl.Object)
    .alias("start_pos"),
    pl.col("end_position")
    .map_elements(shapely.from_wkb, return_dtype=pl.Object)
    .alias("end_pos"),
).with_columns(
    pl.col("start_pos")
    .map_elements(lambda x: x.x, return_dtype=pl.Float64)
    .alias("start_longitude"),
    pl.col("start_pos")
    .map_elements(lambda x: x.y, return_dtype=pl.Float64)
    .alias("start_latitude"),
    pl.col("end_pos")
    .map_elements(lambda x: x.x, return_dtype=pl.Float64)
    .alias("end_longitude"),
    pl.col("end_pos")
    .map_elements(lambda x: x.y, return_dtype=pl.Float64)
    .alias("end_latitude"),
)
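
For readers unfamiliar with WKB, the sketch below decodes a point by hand with `struct`. It assumes the positions are 2D points serialised as (E)WKB, as PostGIS usually returns them. `parse_ewkb_point` is a hypothetical helper and the sample coordinates are made up. The notebook itself relies on `shapely.from_wkb`, which handles the general case.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # set in the type word when an SRID follows
WKB_POINT = 1


def parse_ewkb_point(data: bytes) -> tuple:
    """Decode a 2D (E)WKB point and return (x, y), i.e. (longitude, latitude)."""
    order = "<" if data[0] == 1 else ">"  # first byte: 1 = little endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    if geom_type & 0xFF != WKB_POINT:
        raise ValueError("not a point geometry")
    offset = 5
    if geom_type & EWKB_SRID_FLAG:
        offset += 4  # skip the 4-byte SRID
    return struct.unpack_from(order + "dd", data, offset)


# Hypothetical sample: a little-endian point with SRID 4326.
sample = struct.pack("<BIIdd", 1, EWKB_SRID_FLAG | WKB_POINT, 4326, 6.1296, 45.8992)
longitude, latitude = parse_ewkb_point(sample)
print(longitude, latitude)
```

This also shows why `.x` is read as the longitude and `.y` as the latitude: the first coordinate stored is x.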

## Filtering out journeys without incentives


In [None]:
df_journeys = df_journeys_raw.filter(pl.col("incentive_amount").is_not_null())

# Overall statistics


In [None]:
agg_expressions = [
    pl.col("operator_journey_id").count().alias("num_journeys"),
    pl.col("is_fully_inside_campaign_area").sum().alias("num_journeys_intra_territory"),
    pl.col("operator_journey_id")
    .filter(pl.col("incentive_amount") > 0)
    .count()
    .alias("num_journeys_incentived"),
    pl.col("incentived_by_aom").sum().alias("num_journeys_with_aom_incentive"),
    pl.col("incentived_by_operator")
    .sum()
    .alias("num_journeys_with_operator_incentive"),
    pl.col("operator_journey_id")
    .filter(pl.col("is_fully_inside_campaign_area"))
    .count()
    .alias("num_journeys_fully_inside_campaign_area"),
    (pl.col("distance") / 1000).mean().alias("distance_avg"),
    (pl.col("incentive_amount").mean() / 100).alias("incentive_amount_avg"),
    (pl.col("passenger_contribution") / 100).mean().alias("passenger_contribution_avg"),
    (pl.col("driver_revenue").mean() / 100).alias("driver_revenue_avg"),
    (
        pl.col("incentive_amount")
        .filter(pl.col("is_fully_inside_campaign_area"))
        .mean()
        / 100
    ).alias("incentive_amount_intra_avg"),
    (
        pl.col("passenger_contribution")
        .filter(pl.col("is_fully_inside_campaign_area"))
        .mean()
        / 100
    ).alias("passenger_contribution_intra_avg"),
    (
        pl.col("driver_revenue").filter(pl.col("is_fully_inside_campaign_area")).mean()
        / 100
    ).alias("driver_revenue_intra_avg"),
    (
        pl.col("incentive_amount")
        .filter(pl.col("is_fully_inside_campaign_area").not_())
        .mean()
        / 100
    ).alias("incentive_amount_inter_avg"),
    (
        pl.col("passenger_contribution")
        .filter(pl.col("is_fully_inside_campaign_area").not_())
        .mean()
        / 100
    ).alias("passenger_contribution_inter_avg"),
    (
        pl.col("driver_revenue")
        .filter(pl.col("is_fully_inside_campaign_area").not_())
        .mean()
        / 100
    ).alias("driver_revenue_inter_avg"),
    # Amounts are in cents and distances in metres: 10 * cents / m gives euros per km
    (10 * (pl.col("incentive_amount") / pl.col("distance")))
    .mean()
    .alias("incentive_amount_per_km_avg"),
    (10 * (pl.col("passenger_contribution") / pl.col("distance")))
    .mean()
    .alias("passenger_contribution_per_km_avg"),
    (10 * (pl.col("driver_revenue") / pl.col("distance")))
    .mean()
    .alias("driver_revenue_per_km_avg"),
]
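
The factor 10 in the per-kilometre expressions comes from the units: amounts are divided by 100 elsewhere to get euros and distances by 1000 to get kilometres. The sketch below checks that arithmetic on made-up values. It also shows that averaging per-journey ratios, as done above, differs from dividing total amount by total distance. `euros_per_km` is a hypothetical helper.

```python
def euros_per_km(amount_cents: float, distance_m: float) -> float:
    """(cents / 100) / (metres / 1000) simplifies to 10 * cents / metres."""
    return 10 * amount_cents / distance_m


# Made-up journeys: (incentive in cents, distance in metres)
journeys = [(150, 5_000), (150, 15_000)]

mean_of_ratios = sum(euros_per_km(a, d) for a, d in journeys) / len(journeys)
ratio_of_totals = euros_per_km(
    sum(a for a, _ in journeys), sum(d for _, d in journeys)
)
print(mean_of_ratios, ratio_of_totals)
```

The mean of ratios weights short journeys as much as long ones. That is a design choice to keep in mind when reading the per-kilometre charts.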

In [None]:
df_stats_by_month = (
    df_journeys.group_by(pl.col("start_datetime").dt.truncate("1mo").alias("month"))
    .agg(agg_expressions)
    .sort(pl.col("month"))
)
df_stats_by_month

In [None]:
df_stats_by_week = (
    df_journeys.filter(
        pl.col("start_datetime") <= datetime(2025, 6, 23, tzinfo=ZoneInfo("GMT"))
    )
    .group_by(pl.col("start_datetime").dt.truncate("1w").alias("week"))
    .agg(agg_expressions)
    .sort(pl.col("week"))
)
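
According to the polars documentation, `dt.truncate("1w")` produces weeks that start on Monday. The plain-Python sketch below reproduces that convention to show where the campaign change falls. `week_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def week_start(moment: datetime) -> datetime:
    """Return the Monday 00:00 that opens the week containing `moment`."""
    monday = moment - timedelta(days=moment.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


change = datetime(2025, 4, 22, 15, 30)
print(week_start(change))
```

22 April 2025 is a Tuesday, so the weekly bucket that contains the change starts on Monday 21 April and mixes one day of "Période 1" with six days of "Période 2".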

## Number of journeys


In [None]:
with pl.Config(set_fmt_str_lengths=120, set_tbl_width_chars=1000):
    print(
        df_journeys.select(
            pl.col("operator_journey_id").count().alias("Nombre de journeys"),
            pl.col("operator_journey_id")
            .filter(pl.col("incentive_amount") > 0)
            .count()
            .alias("Nombre de journeys avec incitation"),
            (
                100
                * pl.col("operator_journey_id")
                .filter(pl.col("incentive_amount") > 0)
                .count()
                / pl.col("operator_journey_id").count()
            ).alias("% journeys avec incitation"),
            pl.col("incentived_by_aom")
            .sum()
            .alias("Nombre de journeys avec incitation AOM"),
            (
                100
                * pl.col("incentived_by_aom").sum()
                / pl.col("operator_journey_id").count()
            ).alias("% journeys avec incitation AOM"),
            pl.col("incentived_by_operator")
            .sum()
            .alias("Nombre de journeys avec incitation opérateur"),
            (
                100
                * pl.col("incentived_by_operator").sum()
                / pl.col("operator_journey_id").count()
            ).alias("% journeys avec incitation opérateur"),
        )
        .with_columns(pl.selectors.all().round(2))
        .unpivot()
    )

### Trend over time


#### Overall


In [None]:
def create_num_journeys_fig(df: pl.DataFrame, x_col: str = "month") -> go.Figure:
    traces = []
    max_y = 0
    for name in [
        "num_journeys",
        "num_journeys_with_aom_incentive",
        "num_journeys_with_operator_incentive",
    ]:
        trace = go.Scatter(
            x=df[x_col],
            y=df[name],
            mode="lines+text" if name == "num_journeys" else "lines",
            textposition="top center",
            text=df[name] if name == "num_journeys" else None,
            name=labels_map.get(name, name),
        )
        traces.append(trace)
        max_y = max(max_y, df[name].max())

    fig = go.Figure(traces)

    fig.update_layout(
        template="simple_white",
        title=f"PMGF - Nombre de journeys par {'mois' if x_col == 'month' else 'semaine'}",
        legend_orientation="h",
        legend_y=0.7,
        legend_yref="container",
    )
    fig.update_xaxes(title="Mois" if x_col == "month" else "Semaine")
    fig.update_yaxes(range=[0, max_y * 1.2], showgrid=True, title="Nombre de journeys")
    add_campaign_annotations(fig)

    return fig


fig_journeys_by_month = create_num_journeys_fig(df_stats_by_month)
fig_journeys_by_month.show()

os.makedirs("outputs", exist_ok=True)  # make sure the export directory exists
fig_journeys_by_month.write_html("outputs/fig_journeys_par_mois.html")
fig_journeys_by_month.write_image(
    "outputs/fig_journeys_par_mois.svg", width=1280, height=720
)

#### Incentivising operators


In [None]:
fig_journeys_by_operator = px.line(
    df_journeys.explode("incentive_sirets")
    .join(df_operators, left_on="incentive_sirets", right_on="siret", how="left")
    .group_by(["name", pl.col("start_datetime").dt.truncate("1mo")])
    .agg(pl.col("operator_journey_id").n_unique().alias("num_journeys"))
    .rename({"name": "operator", "start_datetime": "month"})
    .sort("month"),
    x="month",
    y="num_journeys",
    color="operator",
    template="simple_white",
    labels=labels_map,
    title="Nombre de journeys incités par opérateur",
)
fig_journeys_by_operator.update_yaxes(showgrid=True)
add_campaign_annotations(fig_journeys_by_operator)
fig_journeys_by_operator.show()


fig_journeys_by_operator.write_html("outputs/fig_journeys_par_operateur_mois.html")
fig_journeys_by_operator.write_image(
    "outputs/fig_journeys_par_operateur_mois.svg", width=1280, height=720
)

#### Intra vs. inter


In [None]:
fig_journeys_by_journey_type = px.line(
    df_stats_by_week.with_columns(
        (pl.col("num_journeys") - pl.col("num_journeys_intra_territory")).alias(
            "num_journeys_inter_territory"
        )
    ),
    x="week",
    y=["num_journeys_intra_territory", "num_journeys_inter_territory"],
    template="simple_white",
    labels=labels_map,
    title="Nombre de journeys par type de trajets",
)
fig_journeys_by_journey_type.update_traces(
    {"name": labels_map["num_journeys_intra_territory"]},
    selector={"name": "num_journeys_intra_territory"},
)
fig_journeys_by_journey_type.update_traces(
    {"name": labels_map["num_journeys_inter_territory"]},
    selector={"name": "num_journeys_inter_territory"},
)
fig_journeys_by_journey_type.update_yaxes(showgrid=True, title="Nombre de journeys")
add_campaign_annotations(fig_journeys_by_journey_type)
fig_journeys_by_journey_type.update_layout(
    legend_title="", legend_orientation="h", legend_y=0.7, legend_yref="container"
)
fig_journeys_by_journey_type.show()


fig_journeys_by_journey_type.write_html("outputs/fig_journeys_par_type_semaine.html")
fig_journeys_by_journey_type.write_image(
    "outputs/fig_journeys_par_type_semaine.svg", width=1280, height=720
)

In [None]:
fig_journeys_share_by_journey_type = px.line(
    df_stats_by_week.with_columns(
        (pl.col("num_journeys") - pl.col("num_journeys_intra_territory")).alias(
            "num_journeys_inter_territory"
        )
    ).with_columns(
        (100 * pl.col("num_journeys_intra_territory") / pl.col("num_journeys")).alias(
            "share_journeys_intra_territory"
        ),
        (100 * pl.col("num_journeys_inter_territory") / pl.col("num_journeys")).alias(
            "share_journeys_inter_territory"
        ),
    ),
    x="week",
    y=["share_journeys_intra_territory", "share_journeys_inter_territory"],
    template="simple_white",
    labels=labels_map,
    title="Répartition du type de journeys",
)
fig_journeys_share_by_journey_type.update_traces(
    {"name": labels_map["share_journeys_intra_territory"]},
    selector={"name": "share_journeys_intra_territory"},
)
fig_journeys_share_by_journey_type.update_traces(
    {"name": labels_map["share_journeys_inter_territory"]},
    selector={"name": "share_journeys_inter_territory"},
)
fig_journeys_share_by_journey_type.update_yaxes(
    showgrid=True, title="% des journeys", range=[0, 105]
)
add_campaign_annotations(fig_journeys_share_by_journey_type)
fig_journeys_share_by_journey_type.update_layout(
    legend_title="", legend_orientation="h", legend_y=0.7, legend_yref="container"
)
fig_journeys_share_by_journey_type.show()


fig_journeys_share_by_journey_type.write_html(
    "outputs/fig_journeys_ratio_par_type_semaine.html"
)
fig_journeys_share_by_journey_type.write_image(
    "outputs/fig_journeys_ratio_par_type_semaine.svg", width=1280, height=720
)

In [None]:
df_journeys_by_campaign_and_distance_cat = (
    df_journeys.filter(pl.col("campaign_type") != "Période 0")
    .group_by(["campaign_type", "is_fully_inside_campaign_area", "distance_cat"])
    .agg(pl.col("operator_journey_id").count().alias("num_journeys"))
    .with_columns(
        pl.when(pl.col("is_fully_inside_campaign_area"))
        .then(pl.lit("Intra-territoire"))
        .otherwise(pl.lit("Inter-territoire"))
        .alias("journey_type_label"),
        (
            pl.col("num_journeys")
            / (
                pl.col("num_journeys")
                .sum()
                .over(partition_by=["campaign_type", "is_fully_inside_campaign_area"])
            )
        ).alias("share"),
    )
)


def create_journeys_share_by_distance_cat_fig(
    df: pl.DataFrame, mode_intra_territory: bool = True, title=None
) -> go.Figure:
    colors = colors_map
    cat_labels_map = {
        "[-inf, 5)": "0-4 km",
        "[5, 20)": "5-19 km",
        "[20, 40)": "20-39 km",
        "[40, inf)": "40+ km",
        "[20, 30)": "20-29 km",
        "[30, 40)": "30-39 km",
        "[5, 22)": "5-21 km",
        "[22, inf)": "22+ km",
    }
    sorting_order = {
        "0-4 km": 0,
        "5-19 km": 1,
        "5-21 km": 1.5,
        "22+ km": 1.7,
        "20-29 km": 2,
        "20-39 km": 2,
        "30-39 km": 3,
        "40+ km": 4,
    }
    traces = []

    journey_type_filter_expr = pl.col("is_fully_inside_campaign_area")
    if not mode_intra_territory:
        journey_type_filter_expr = journey_type_filter_expr.not_()

    df = (
        df.filter(journey_type_filter_expr)
        .with_columns(pl.col("distance_cat").cast(pl.String).replace(cat_labels_map))
        .with_columns(
            pl.format(
                "<b>{}</b><br>{}%", "distance_cat", (100 * pl.col("share")).round(1)
            ).alias("text_vals"),
        )
    )
    for campaign_type in df["campaign_type"].unique().sort(descending=True):
        data_campaign = df.filter(pl.col("campaign_type") == campaign_type)
        distance_cat_list = data_campaign["distance_cat"].unique().to_list()
        distance_cat_list.sort(key=lambda x: sorting_order[x])
        for distance_cat in distance_cat_list:
            data_distance = data_campaign.filter(pl.col("distance_cat") == distance_cat)
            trace = go.Bar(
                x=data_distance["campaign_type"],
                y=data_distance["share"],
                text=data_distance["text_vals"],
                marker_color=[colors[campaign_type]],
                legendgroup=campaign_type,
                legendgrouptitle_text=campaign_type,
                name=f"{distance_cat}",
                marker_line_color="white",
                marker_line_width=2,
            )
            traces.append(trace)

    fig = go.Figure(traces)
    fig.update_layout(
        barmode="stack",
        template="simple_white",
        legend_groupclick="toggleitem",
        legend_grouptitlefont_color="black",
        height=500,
        title=title,
    )
    fig.update_yaxes(
        showgrid=True,
        title="% des trajets",
        tickformat=",.0%",
    )
    return fig
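
The `share` column used by this figure is each distance class's count divided by the total of its (campaign, trip type) group, which is what the `.over(...)` window computes. Here is a plain-Python sketch of that normalisation. The counts are invented for illustration and are not results from the data.

```python
from collections import defaultdict

# Hypothetical counts: (campaign, fully inside?, distance class) -> journeys
counts = {
    ("Période 1", True, "5-19 km"): 60,
    ("Période 1", True, "20-39 km"): 40,
    ("Période 2", True, "5-19 km"): 30,
    ("Période 2", True, "20-29 km"): 10,
}

# Total per (campaign, trip type) group, the equivalent of sum().over(...)
group_totals = defaultdict(int)
for (campaign, inside, _), n in counts.items():
    group_totals[(campaign, inside)] += n

shares = {key: n / group_totals[key[:2]] for key, n in counts.items()}
for key, share in shares.items():
    print(key, f"{share:.0%}")
```

Because shares are normalised per group, each stacked bar sums to 100% regardless of how many journeys the period contains.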

In [None]:
fig_journeys_share_by_distance_cat = create_journeys_share_by_distance_cat_fig(
    df_journeys_by_campaign_and_distance_cat,
    True,
    "Répartition des journeys par classes de distance - Trajets <i>intra</i>",
)
fig_journeys_share_by_distance_cat.show()

fig_journeys_share_by_distance_cat.write_html(
    "outputs/fig_journeys_ratio_par_distance_cat_intra.html"
)
fig_journeys_share_by_distance_cat.write_image(
    "outputs/fig_journeys_ratio_par_distance_cat_intra.svg", width=1280, height=720
)

In [None]:
fig_journeys_share_by_distance_cat_inter = create_journeys_share_by_distance_cat_fig(
    df_journeys_by_campaign_and_distance_cat,
    False,
    "Répartition des journeys par classes de distance - Trajets <i>inter</i>",
)
fig_journeys_share_by_distance_cat_inter.show()

fig_journeys_share_by_distance_cat_inter.write_html(
    "outputs/fig_journeys_ratio_par_distance_cat_inter.html"
)
fig_journeys_share_by_distance_cat_inter.write_image(
    "outputs/fig_journeys_ratio_par_distance_cat_inter.svg", width=1280, height=720
)

## Prices, revenues and incentives


In [None]:
def create_scatter_fig_prices(
    df: pl.DataFrame, stats_cols: list[str], x_col: str, title: str
) -> go.Figure:
    traces = []
    for name in stats_cols:
        trace = go.Scatter(
            x=df[x_col],
            y=df[name],
            name=labels_map.get(name, name),
            mode="lines+markers",
            marker_size=4,
        )
        traces.append(trace)
    fig = go.Figure(traces)
    add_campaign_annotations(fig)
    fig.update_layout(
        template="simple_white",
        title=title,
        legend_orientation="h",
        legend_y=0.7,
        legend_yref="container",
    )

    max_y = df.select(stats_cols).max().max_horizontal().item()

    fig.update_yaxes(
        range=[0, max_y * 1.2],
        title="Montant (euros)",
        showgrid=True,
        gridwidth=2,
        ticksuffix="€",
    )
    fig.update_xaxes(title="Mois" if x_col == "month" else "Semaine")

    return fig

In [None]:
fig_prices_by_week = create_scatter_fig_prices(
    df_stats_by_week,
    [
        "incentive_amount_avg",
        "passenger_contribution_avg",
        "driver_revenue_avg",
    ],
    "week",
    (
        "Montants moyens par trajet des incitations,"
        "<br>contributions passagers et revenus conducteurs"
    ),
)
fig_prices_by_week.show()


fig_prices_by_week.write_html("outputs/fig_prix_par_semaine.html")
fig_prices_by_week.write_image(
    "outputs/fig_prix_par_semaine.svg", width=1280, height=720
)

In [None]:
fig_prices_by_week_intra = create_scatter_fig_prices(
    df_stats_by_week,
    [
        "incentive_amount_intra_avg",
        "passenger_contribution_intra_avg",
        "driver_revenue_intra_avg",
    ],
    "week",
    (
        "Montants moyens par trajet <b>intra</b> des incitations,"
        "<br>contributions passagers et revenus conducteurs"
    ),
)
fig_prices_by_week_intra.show()


fig_prices_by_week_intra.write_html("outputs/fig_prix_intra_par_semaine.html")
fig_prices_by_week_intra.write_image(
    "outputs/fig_prix_intra_par_semaine.svg", width=1280, height=720
)

In [None]:
fig_prices_by_week_inter = create_scatter_fig_prices(
    df_stats_by_week,
    [
        "incentive_amount_inter_avg",
        "passenger_contribution_inter_avg",
        "driver_revenue_inter_avg",
    ],
    "week",
    (
        "Montants moyens par trajet <b>inter</b> des incitations,"
        "<br>contributions passagers et revenus conducteurs"
    ),
)
fig_prices_by_week_inter.show()


fig_prices_by_week_inter.write_html("outputs/fig_prix_inter_par_semaine.html")
fig_prices_by_week_inter.write_image(
    "outputs/fig_prix_inter_par_semaine.svg", width=1280, height=720
)

In [None]:
fig_prices_per_km_by_week = create_scatter_fig_prices(
    df_stats_by_week,
    [
        "incentive_amount_per_km_avg",
        "passenger_contribution_per_km_avg",
        "driver_revenue_per_km_avg",
    ],
    "week",
    (
        "Montants moyens <b>par km</b> par trajet des incitations,"
        "<br>contributions passagers et revenus conducteurs"
    ),
)
fig_prices_per_km_by_week.show()


fig_prices_per_km_by_week.write_html("outputs/fig_prix_par_km_par_semaine.html")
fig_prices_per_km_by_week.write_image(
    "outputs/fig_prix_par_km_par_semaine.svg", width=1280, height=720
)

### Incentive versus distance


In [None]:
px.scatter(
    df_journeys.filter(
        pl.col("amount_aom").is_not_null()
        & (pl.col("incentive_sirets") == ["20007537200017"])
        & (pl.col("campaign_type") != "Période 0")
        & (pl.col("is_fully_inside_campaign_area").not_())
    )
    .with_columns(pl.col("distance") / 1000)
    .unpivot(
        on=["amount_aom", "passenger_contribution", "driver_revenue"],
        index=["_id", "distance", "campaign_type"],
    )
    .sort("distance"),
    x="distance",
    y="value",
    color="variable",
    symbol="campaign_type",
    template="simple_white",
)

In [None]:
df_incentive_aom_by_distance = (
    df_journeys.filter(
        pl.col("amount_aom").is_not_null()
        & (pl.col("campaign_type") != "Période 0")
        & (pl.col("incentive_sirets") == ["20007537200017"])
        & (pl.col("is_fully_inside_campaign_area").not_())
    )
    .with_columns(
        pl.col("distance") / 1000,
        pl.col("driver_revenue")
        .cum_sum()
        .over(
            partition_by=[
                "driver_identity_key",
                pl.col("start_datetime").dt.truncate("1mo"),
            ],
            order_by="start_datetime",
        )
        .alias("driver_revenue_cumsum"),
    )
    .filter(pl.col("driver_revenue_cumsum") <= 5000)
    .group_by(
        [
            pl.col("campaign_type"),
            pl.col("distance").cut(
                list(range(0, 100, 5)), include_breaks=True, left_closed=True
            ),
        ]
    )
    .agg(
        pl.len(),
        (pl.col("amount_aom") / 100).mean().alias("Incitation AOM"),
        (pl.col("passenger_contribution") / 100).mean().alias("Contribution passager"),
        (pl.col("driver_revenue") / 100).mean().alias("Revenu conducteur"),
    )
    .with_columns(pl.col("distance").struct.unnest())
    .unpivot(
        on=["Incitation AOM", "Contribution passager", "Revenu conducteur"],
        index=["breakpoint", "category", "campaign_type"],
    )
    .sort(["campaign_type", "breakpoint"])
)

In [None]:
fig_incentive_aom_by_distance = px.line(
    df_incentive_aom_by_distance,
    x="breakpoint",
    y="value",
    color="variable",
    line_dash="campaign_type",
    template="simple_white",
    height=800,
    labels={**labels_map, "breakpoint": "Distance", "value": "Montant (€)"},
)

fig_incentive_aom_by_distance.update_layout(
    legend_title="",
    title="Montants moyens du revenu conducteur, contribution passager et incitation AOM en fonction de la distance"
    "<br><sub>Uniquement les trajets inter, intervalles de distance de 5km.</sub>",
)
fig_incentive_aom_by_distance.show()

fig_incentive_aom_by_distance.write_html("outputs/fig_incitation_aom_par_distance.html")
fig_incentive_aom_by_distance.write_image(
    "outputs/fig_incitation_aom_par_distance.svg", width=1280, height=720
)

In [None]:
df_incentive_aom_by_distance_intra = (
    df_journeys.filter(
        pl.col("amount_aom").is_not_null()
        & (pl.col("campaign_type") != "Période 0")
        & (pl.col("incentive_sirets") == ["20007537200017"])
        & (pl.col("is_fully_inside_campaign_area"))
    )
    .with_columns(
        pl.col("distance") / 1000,
        pl.col("driver_revenue")
        .cum_sum()
        .over(
            partition_by=[
                "driver_identity_key",
                pl.col("start_datetime").dt.truncate("1mo"),
            ],
            order_by="start_datetime",
        )
        .alias("driver_revenue_cumsum"),
    )
    .filter(pl.col("driver_revenue_cumsum") <= 5000)
    .group_by(
        [
            pl.col("campaign_type"),
            pl.col("distance").cut(
                list(range(0, 100, 5)), include_breaks=True, left_closed=True
            ),
        ]
    )
    .agg(
        pl.len(),
        (pl.col("amount_aom") / 100).mean().alias("Incitation AOM"),
        (pl.col("passenger_contribution") / 100).mean().alias("Contribution passager"),
        (pl.col("driver_revenue") / 100).mean().alias("Revenu conducteur"),
    )
    .with_columns(pl.col("distance").struct.unnest())
    .unpivot(
        on=["Incitation AOM", "Contribution passager", "Revenu conducteur"],
        index=["breakpoint", "category", "campaign_type"],
    )
    .sort(["campaign_type", "breakpoint"])
)

In [None]:
fig_incentive_aom_by_distance_intra = px.line(
    df_incentive_aom_by_distance_intra,
    x="breakpoint",
    y="value",
    color="variable",
    line_dash="campaign_type",
    template="simple_white",
    height=800,
    labels={**labels_map, "breakpoint": "Distance", "value": "Montant (€)"},
)

fig_incentive_aom_by_distance_intra.update_layout(
    legend_title="",
    title="Montants moyens du revenu conducteur, contribution passager et incitation AOM en fonction de la distance"
    "<br><sub>Uniquement les trajets intra, intervalles de distance de 5km.</sub>",
)
fig_incentive_aom_by_distance_intra.show()

fig_incentive_aom_by_distance_intra.write_html(
    "outputs/fig_incitation_aom_par_distance_intra.html"
)
fig_incentive_aom_by_distance_intra.write_image(
    "outputs/fig_incitation_aom_par_distance_intra.svg", width=1280, height=720
)

## Incentive providers


In [None]:
df_incitators_by_month = (
    (
        df_journeys.group_by(
            pl.col("start_datetime").dt.truncate("1mo"),
            pl.col("incentive_sirets").list.sort(),
        )
        .agg(pl.col("_id").n_unique().alias("num_journeys"))
        .with_columns(
            pl.col("incentive_sirets")
            .list.join(", ")
            .replace(
                {
                    "80820346700051": "ECOV",
                    "20007537200017": "PMGF",
                    "34409790200037": "ALLERGAN INDUSTRIE",
                    "20007085200013": "COMMUNAUTE DE COMMUNES USSES ET RHONE",
                    "49190454600034": "BBC",
                    "20004035000015": "CC BUGEY SUD",
                    "20007537200017, 34409790200037": "PMGF, ALLERGAN INDUSTRIE",
                    "20004035000015, 49190454600034": "CC BUGEY SUD, BBC",
                    "37937771600012": "NAEF IMMOBILIER",
                    "20007537200017, 37937771600012": "PMGF, NAEF IMMOBILIER",
                    "49985825600013": "SWISSPORT",
                    "20007537200017, 49985825600013": "PMGF, SWISSPORT",
                    "20005379100014": "REGION OCCITANIE",
                    "20007537200017, 49190454600034": "PMGF, BBC",
                    "80279897500024": "KAROS MOBILITY",
                    "20007196700018": "COMMUNAUTE DE COMMUNES PAYS D'EVIAN VALLEE D'ABONDANCE",
                    "20007537200017, 64203606500075": "PMGF, HILTON",
                    "30295849100946": "CREDIT AGRICOLE",
                }
            )
        )
    )
    .sort(["start_datetime", "incentive_sirets"])
    .filter(pl.col("num_journeys") > 10)
)

In [None]:
fig_incitators_by_month = px.line(
    df_incitators_by_month,
    x="start_datetime",
    y="num_journeys",
    color="incentive_sirets",
    template="simple_white",
    labels=labels_map,
)

add_campaign_annotations(fig_incitators_by_month, label_position="inside top left")

fig_incitators_by_month.update_layout(height=800)
fig_incitators_by_month.show()

## Distance
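The histograms in this section group journeys into fixed-width distance bins and express each bin as a share of its own campaign's journeys. The plain-Python sketch below illustrates that computation on toy data. The helper `distance_share_by_campaign` and its input layout are hypothetical and independent of the polars pipeline used in the following cells.

```python
from collections import Counter


def distance_share_by_campaign(journeys, bin_size_km=5):
    """Share (%) of journeys per left-closed distance bin, within each campaign.

    `journeys` is an iterable of (campaign_type, distance_in_metres) pairs
    (hypothetical layout used only for this illustration).
    """
    counts = Counter()
    totals = Counter()
    for campaign, distance_m in journeys:
        # Left-closed bins: [0, 5), [5, 10), ...
        bin_start = int(distance_m / 1000 // bin_size_km) * bin_size_km
        counts[(campaign, bin_start)] += 1
        totals[campaign] += 1

    shares = {}
    for (campaign, bin_start), n in counts.items():
        share = round(100 * n / totals[campaign], 2)
        shares.setdefault(campaign, {})[bin_start] = share
    return shares


toy_journeys = [
    ("Période 1", 3_000),
    ("Période 1", 7_000),
    ("Période 1", 8_000),
    ("Période 2", 12_000),
]
toy_shares = distance_share_by_campaign(toy_journeys)
```

Normalising within each campaign, rather than over all journeys, is what makes periods of different length comparable on the same axis.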


In [None]:
campaign_types_map_sort = {
    "Période 0": 0,
    "Période 1": 1,
    "Période 2": 2,
}

df_journeys.with_columns(campaign_type_expr).group_by("campaign_type").agg(
    (pl.col("distance") / 1000).mean().round(2)
).sort(
    pl.col("campaign_type").replace_strict(
        campaign_types_map_sort, return_dtype=pl.Int8
    )
)

In [None]:
fig_distance_by_week = px.line(
    df_stats_by_week,
    x="week",
    y="distance_avg",
    title="Distance moyenne des trajets",
    template="simple_white",
    labels=labels_map,
)

fig_distance_by_week.update_yaxes(
    range=[0, df_stats_by_week.select("distance_avg").max().item() * 1.2],
    tickvals=list(range(0, 45, 5)),
    showgrid=True,
    side="right",
    title="Distance (km)",
)
add_campaign_annotations(fig_distance_by_week)
fig_distance_by_week.show()

fig_distance_by_week.write_html("outputs/fig_distance_par_semaine.html")
fig_distance_by_week.write_image(
    "outputs/fig_distance_par_semaine.svg", width=1280, height=720
)

In [None]:
fig_distance_boxplots_by_campaign = px.box(
    df_journeys.with_columns(campaign_type_expr, pl.col("distance") / 1000),
    x="campaign_type",
    y="distance",
    template="simple_white",
    labels=labels_map,
    title="Box plots de la distribution de la distance en fonction de la campagne",
    color_discrete_map=colors_map,
)
fig_distance_boxplots_by_campaign.update_yaxes(range=[0, 150], title="Distance (km)")
fig_distance_boxplots_by_campaign.show()

In [None]:
def create_campaign_distance_hist_fig(
    df: pl.DataFrame,
    bin_size: int,
    intra_filter: bool | None = None,
    use_share: bool = True,
) -> go.Figure:
    title_suffix = ""
    if intra_filter is not None:
        df = df.filter(
            pl.col("is_fully_inside_campaign_area")
            if intra_filter
            else pl.col("is_fully_inside_campaign_area").not_()
        )
        title_suffix = " - Trajets Intra" if intra_filter else " - Trajets Inter"

    df_distance_binned_by_campaign = (
        df.with_columns(campaign_type_expr, pl.col("distance") / 1000)
        .group_by(
            [
                pl.col("campaign_type"),
                pl.col("distance").cut(
                    range(0, 102, bin_size), include_breaks=True, left_closed=True
                ),
            ]
        )
        .len()
        .with_columns(pl.col("distance").struct.unnest())
        .with_columns(
            pl.col("category")
            .cast(pl.String)
            .str.extract(r"\[([0-9]+)\,", 1)
            .cast(pl.Int8)
            .alias("category_clean"),
            (
                100
                * pl.col("len")
                / pl.col("len").sum().over(partition_by="campaign_type")
            )
            .round(2)
            .alias("share"),
        )
    ).sort(["campaign_type", "distance"])
    traces = []

    value_col = "share"
    y_axis_title = "% des trajets"
    opacity = 0.7
    if not use_share:
        value_col = "len"
        y_axis_title = "Nombre de journeys"
        opacity = 1

    for campaign in df_distance_binned_by_campaign["campaign_type"].unique().sort():
        data = df_distance_binned_by_campaign.filter(
            pl.col("campaign_type") == campaign
        ).sort("category")
        bar = go.Bar(
            x=data["category"],
            y=data[value_col],
            name=labels_map.get(campaign, campaign),
            opacity=opacity,
            visible=True if campaign != "Période 0" else "legendonly",
            marker_line_width=0,
            width=1,
            marker_color=colors_map.get(campaign),
        )
        traces.append(bar)
    fig = go.Figure(traces)
    fig.update_layout(
        barmode="overlay",
        template="simple_white",
        title=f"Distribution des distances effectuées dans les différentes campagnes{title_suffix}",
    )
    fig.update_yaxes(title=y_axis_title)
    fig.update_xaxes(title="Distance (km)")

    return fig

In [None]:
fig_campaign_distance_hist = create_campaign_distance_hist_fig(df_journeys, 5, None)
fig_campaign_distance_hist.show()

fig_campaign_distance_hist.write_html("outputs/fig_histo_distance.html")
fig_campaign_distance_hist.write_image(
    "outputs/fig_histo_distance.svg", width=1280, height=720
)

In [None]:
fig_campaign_intra_distance_hist = create_campaign_distance_hist_fig(
    df_journeys, 5, True
)
fig_campaign_intra_distance_hist.show()

fig_campaign_intra_distance_hist.write_html("outputs/fig_histo_distance_intra.html")
fig_campaign_intra_distance_hist.write_image(
    "outputs/fig_histo_distance_intra.svg", width=1280, height=720
)

In [None]:
fig_campaign_intra_distance_hist = create_campaign_distance_hist_fig(
    df_journeys, 5, True, use_share=False
)
fig_campaign_intra_distance_hist.show()

fig_campaign_intra_distance_hist.write_html(
    "outputs/fig_histo_distance_intra_volume.html"
)
fig_campaign_intra_distance_hist.write_image(
    "outputs/fig_histo_distance_intra_volume.svg", width=1280, height=720
)

In [None]:
fig_campaign_inter_distance_hist = create_campaign_distance_hist_fig(
    df_journeys, 5, False
)
fig_campaign_inter_distance_hist.show()
fig_campaign_inter_distance_hist.write_html("outputs/fig_histo_distance_inter.html")
fig_campaign_inter_distance_hist.write_image(
    "outputs/fig_histo_distance_inter.svg", width=1280, height=720
)

In [None]:
fig_campaign_inter_distance_hist = create_campaign_distance_hist_fig(
    df_journeys, 5, False, use_share=False
)
fig_campaign_inter_distance_hist.show()
fig_campaign_inter_distance_hist.write_html(
    "outputs/fig_histo_distance_inter_volume.html"
)
fig_campaign_inter_distance_hist.write_image(
    "outputs/fig_histo_distance_inter_volume.svg", width=1280, height=720
)

In [None]:
fig_distance_distribution_by_campaign = px.histogram(
    df_journeys.with_columns(campaign_type_expr, pl.col("distance") / 1000),
    x="distance",
    color="campaign_type",
    template="simple_white",
    labels={**labels_map},
    title="Distribution de la distance en fonction de la campagne",
    histnorm="percent",
    barmode="overlay",
    histfunc="count",
    color_discrete_map={
        "Période 0": "grey",
        "Période 1": "#9fc2b2",
        "Période 2": "#ebd999",
    },
)
fig_distance_distribution_by_campaign.update_xaxes(
    range=[0, 80], title="Distance effectuée (km)"
)
fig_distance_distribution_by_campaign.update_yaxes(title="% des trajets")
fig_distance_distribution_by_campaign.show()

In [None]:
bucket_size = 5

fig_distance_cumsum = px.line(
    (
        df_journeys.sort("distance")
        .group_by(
            [campaign_type_expr, (pl.col("distance") / (1_000 * bucket_size)).ceil()]
        )
        .agg(pl.col("_id").len())
        .sort("distance")
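        .with_columns(
            # Cumulative share within each campaign, so every curve reaches 100 %.
            (
                100
                * pl.col("_id").cum_sum().over("campaign_type")
                / pl.col("_id").sum().over("campaign_type")
            ).alias("share"),
            pl.col("distance") * bucket_size,
        )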
    ),
    x="distance",
    y="share",
    color="campaign_type",
    color_discrete_map=colors_map,
    template="simple_white",
    labels={**labels_map, "share": "% des trajets"},
)
fig_distance_cumsum.update_layout(
    title="Fonction de répartition des trajets en fonction de la distance"
)
fig_distance_cumsum.update_xaxes(range=[0, 80])
fig_distance_cumsum.show()

In [None]:
bucket_size = 5

fig_distance_cumsum = px.line(
    (
        df_journeys.filter(pl.col("is_fully_inside_campaign_area"))
        .sort("distance")
        .group_by(
            [campaign_type_expr, (pl.col("distance") / (1_000 * bucket_size)).ceil()]
        )
        .agg(pl.col("_id").len())
        .sort("distance")
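        .with_columns(
            # Cumulative share within each campaign, so every curve reaches 100 %.
            (
                100
                * pl.col("_id").cum_sum().over("campaign_type")
                / pl.col("_id").sum().over("campaign_type")
            ).alias("share"),
            pl.col("distance") * bucket_size,
        )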
    ),
    x="distance",
    y="share",
    color="campaign_type",
    color_discrete_map=colors_map,
    template="simple_white",
    labels={**labels_map, "share": "% des trajets"},
)
fig_distance_cumsum.update_layout(
    title="Fonction de répartition des trajets en fonction de la distance - Trajets intra"
)
fig_distance_cumsum.update_xaxes(range=[0, 80])
fig_distance_cumsum.show()

In [None]:
bucket_size = 5

fig_distance_cumsum = px.line(
    (
        df_journeys.filter(pl.col("is_fully_inside_campaign_area").not_())
        .sort("distance")
        .group_by(
            [campaign_type_expr, (pl.col("distance") / (1_000 * bucket_size)).ceil()]
        )
        .agg(pl.col("_id").len())
        .sort("distance")
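        .with_columns(
            # Cumulative share within each campaign, so every curve reaches 100 %.
            (
                100
                * pl.col("_id").cum_sum().over("campaign_type")
                / pl.col("_id").sum().over("campaign_type")
            ).alias("share"),
            pl.col("distance") * bucket_size,
        )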
    ),
    x="distance",
    y="share",
    color="campaign_type",
    color_discrete_map=colors_map,
    template="simple_white",
    labels={**labels_map, "share": "% des trajets"},
)
fig_distance_cumsum.update_layout(
    title="Fonction de répartition des trajets en fonction de la distance - Trajets inter"
)
fig_distance_cumsum.update_xaxes(range=[0, 80])
fig_distance_cumsum.show()

# Drivers


## Acquisition


In [None]:
df_journeys.select(
    pl.col("driver_identity_key").n_unique().alias("Nombre de conducteurs uniques"),
    pl.col("driver_identity_key")
    .filter(
        pl.col("first_trip_datetime").is_between(
            datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")), CAMPAIGN_CHANGE_DATE
        ),
    )
    .alias(
        f"Nombre de conducteurs arrivés entre le 01/09/2024 et le {CAMPAIGN_CHANGE_DATE:%d/%m/%Y}"
    )
    .n_unique(),
    pl.col("driver_identity_key")
    .filter(
        pl.col("first_trip_datetime") >= CAMPAIGN_CHANGE_DATE,
    )
    .alias(f"Nombre de conducteurs arrivés après le {CAMPAIGN_CHANGE_DATE:%d/%m/%Y}")
    .n_unique(),
)

In [None]:
fig_new_drivers_count_by_week = px.bar(
    df_journeys.filter(
        pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT"))
    )
    .group_by(pl.col("first_trip_datetime").dt.truncate("1w").alias("week"))
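    # Count distinct drivers: df_journeys has one row per journey, not per driver.
    .agg(pl.col("driver_identity_key").n_unique().alias("len"))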
    .sort(pl.col("week")),
    x="week",
    y="len",
    labels={**labels_map, "len": "Nombre de nouveaux conducteurs"},
    template="simple_white",
    title="Évolution de l'acquisition des conducteurs",
)
add_campaign_annotations(fig_new_drivers_count_by_week)
fig_new_drivers_count_by_week.show()

fig_new_drivers_count_by_week.write_html("outputs/fig_conducteurs_par_semaine.html")
fig_new_drivers_count_by_week.write_image(
    "outputs/fig_conducteurs_par_semaine.svg", width=1280, height=720
)

## Number of trips


In [None]:
df_journeys.filter(
    pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("first_trip_datetime") <= datetime.now(ZoneInfo("GMT")) - timedelta(days=14),
).group_by(
    ["campaign_type", "driver_identity_key", pl.col("start_datetime").dt.truncate("1w")]
).agg(
    pl.len().alias("num_journeys"),
    pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
    .n_unique()
    .alias("num_trips"),
).group_by(["campaign_type", "start_datetime"]).agg(
    pl.col("num_journeys").mean().alias("Nombre moyen de journeys par semaine"),
    pl.col("num_trips").mean().alias("Nombre moyen de trips par semaine"),
).group_by(["campaign_type"]).agg(
    pl.col("Nombre moyen de journeys par semaine").mean(),
    pl.col("Nombre moyen de trips par semaine").mean(),
).sort(pl.col("campaign_type").replace(campaign_order))

In [None]:
df_journeys.filter(
    pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("first_trip_datetime") <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
    pl.col("start_datetime") <= pl.col("first_trip_datetime") + pl.duration(days=30),
).group_by(["campaign_type", "driver_identity_key"]).agg(
    pl.len().alias("num_journeys"),
    pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
    .n_unique()
    .alias("num_trips"),
).group_by("campaign_type").agg(
    pl.col("num_journeys").mean().alias("Nombre moyen de journeys sur 30 jours"),
    pl.col("num_trips").mean().alias("Nombre moyen de trips sur 30 jours"),
).sort(pl.col("campaign_type").replace(campaign_order))

In [None]:
def create_num_drivers_by_num_trips_hist_fig(
    df: pl.DataFrame, step_size: int, max_step: int
) -> tuple[go.Figure, pl.DataFrame]:
    breaks = range(1, max_step + 1, step_size)

    campaign_types = df.select("campaign_type").unique().to_series().to_list()
    campaign_types = sorted(campaign_types, key=lambda x: campaign_order.get(x))

    # Build a DataFrame holding every (campaign, break) combination
    combinations = pl.DataFrame(
        product(campaign_types, breaks), schema=["campaign_type", "breaks_raw"]
    ).with_columns(
        pl.col("breaks_raw")
        .cut(breaks, include_breaks=True, left_closed=True)
        .struct.unnest()
    )
    data_agg = (
        df.filter(
            pl.col("first_trip_datetime")
            >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            pl.col("first_trip_datetime")
            <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
            pl.col("start_datetime")
            <= pl.col("first_trip_datetime") + pl.duration(days=30),
        )
        .group_by(["campaign_type", "driver_identity_key"])
        .agg(
            pl.len().alias("num_journeys"),
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .n_unique()
            .alias("num_trips"),
        )
        .with_columns(
            pl.col("num_trips").cut(
                breaks=breaks, left_closed=True, include_breaks=True
            )
        )
        .group_by(["campaign_type", "num_trips"])
        .agg(pl.col("driver_identity_key").n_unique().alias("num_drivers"))
        .with_columns(pl.col("num_trips").struct.unnest())
    )

    data_complete = (
        combinations.join(
            data_agg,
            on=["campaign_type", "breakpoint"],
            how="left",
        )
        .with_columns(pl.col("num_drivers").fill_null(0))
        .with_columns(
            (
                100
                * pl.col("num_drivers")
                / pl.col("num_drivers").sum().over("campaign_type")
            )
            .round(2)
            .alias("share_drivers")
        )
        .sort([pl.col("campaign_type").replace(campaign_order), "breakpoint"])
    )

    traces = []

    for campaign_type in campaign_types:
        data_filtered = data_complete.filter(pl.col("campaign_type") == campaign_type)

        trace = go.Bar(
            x=data_filtered["category"],
            y=data_filtered["share_drivers"],
            marker_color=colors_map.get(campaign_type),
            name=campaign_type,
        )
        traces.append(trace)

    fig = go.Figure(traces)
    fig.update_layout(
        barmode="group",
        template="simple_white",
        title="Distribution du nombre de trips effectués sur 30 jours pour chaque campagne",
    )
    fig.update_xaxes(title="Nombre de trajets")
    fig.update_yaxes(title="% des conducteurs")

    return fig, data_complete


fig_drivers_by_trip_numbers_hist, data_complete = (
    create_num_drivers_by_num_trips_hist_fig(df_journeys, step_size=3, max_step=30)
)
fig_drivers_by_trip_numbers_hist.show()

fig_drivers_by_trip_numbers_hist.write_html(
    "outputs/fig_histo_trajets_conducteurs.html"
)
fig_drivers_by_trip_numbers_hist.write_image(
    "outputs/fig_histo_trajets_conducteurs.svg", width=1280, height=720
)

## Trip types
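The cells below label a driver as intra when the number of distinct intra trips is at least the number of distinct inter trips, so a tie counts as intra. A minimal sketch of that rule on toy data follows. The helper name `is_intra_driver` matches the column alias used below, but the function itself and its input layout are hypothetical.

```python
def is_intra_driver(trips) -> bool:
    """Return True when distinct intra trips >= distinct inter trips.

    `trips` is an iterable of (trip_id, is_fully_inside_campaign_area) pairs,
    one pair per journey (hypothetical layout used only for this illustration).
    """
    intra_trips = {trip_id for trip_id, inside in trips if inside}
    inter_trips = {trip_id for trip_id, inside in trips if not inside}
    return len(intra_trips) >= len(inter_trips)


# Two journeys on the same intra trip count once; the tie resolves to intra.
tie_case = is_intra_driver([("t1", True), ("t1", True), ("t2", False)])
mostly_inter = is_intra_driver([("t1", True), ("t2", False), ("t3", False)])
```

Counting distinct trips rather than journeys avoids giving extra weight to trips that carried several passengers.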


In [None]:
df_journeys.filter(
    pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("first_trip_datetime") <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
    pl.col("start_datetime") <= pl.col("first_trip_datetime") + pl.duration(days=30),
).group_by(["campaign_type", "driver_identity_key"]).agg(
    (
        (
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .filter(pl.col("is_fully_inside_campaign_area"))
            .n_unique()
        )
        >= (
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .filter(pl.col("is_fully_inside_campaign_area").not_())
            .n_unique()
        )
    ).alias("is_intra_driver")
).group_by(["campaign_type"]).agg(
    (100 * pl.col("is_intra_driver").sum() / pl.len()).alias(
        "% des conducteurs avec une majorité de trips intra"
    )
).sort(pl.col("campaign_type").replace(campaign_order))

In [None]:
df_journeys_trips_count_by_campagin_trip_type = (
    df_journeys.filter(
        pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
        pl.col("first_trip_datetime")
        <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
        pl.col("start_datetime")
        <= pl.col("first_trip_datetime") + pl.duration(days=30),
    )
    .group_by(["campaign_type", "driver_identity_key"])
    .agg(
        (
            (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_driver"),
        pl.len().alias("num_journeys"),
        pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
        .n_unique()
        .alias("num_trips"),
    )
    .group_by(["campaign_type", "is_intra_driver"])
    .agg(
        pl.col("num_journeys").mean().alias("Nombre moyen de journeys sur 30 jours"),
        pl.col("num_trips").mean().alias("Nombre moyen de trips sur 30 jours"),
    )
    .sort(pl.col("campaign_type").replace(campaign_order), "is_intra_driver")
)
df_journeys_trips_count_by_campagin_trip_type

In [None]:
fig_journeys_count_by_driver_type_campaign = px.bar(
    df_journeys_trips_count_by_campagin_trip_type.with_columns(
        pl.when(pl.col("is_intra_driver"))
        .then(pl.lit("Conducteur intra"))
        .otherwise(pl.lit("Conducteur inter"))
        .alias("driver_type")
    ),
    x="campaign_type",
    y="Nombre moyen de journeys sur 30 jours",
    color="driver_type",
    text="Nombre moyen de journeys sur 30 jours",
    text_auto=".1f",
    template="simple_white",
    barmode="group",
    labels=labels_map,
    title="Nombre de journeys par type de conducteur et campagne",
)
fig_journeys_count_by_driver_type_campaign.update_layout(legend_title=None)
fig_journeys_count_by_driver_type_campaign.show()
fig_journeys_count_by_driver_type_campaign.write_html(
    "outputs/fig_journeys_par_type_conducteur_et_campagne.html"
)
fig_journeys_count_by_driver_type_campaign.write_image(
    "outputs/fig_journeys_par_type_conducteur_et_campagne.svg", width=1280, height=720
)

In [None]:
fig_trips_count_by_driver_type_campaign = px.bar(
    df_journeys_trips_count_by_campagin_trip_type.with_columns(
        pl.when(pl.col("is_intra_driver"))
        .then(pl.lit("Conducteur intra"))
        .otherwise(pl.lit("Conducteur inter"))
        .alias("driver_type")
    ),
    x="campaign_type",
    y="Nombre moyen de trips sur 30 jours",
    color="driver_type",
    text="Nombre moyen de trips sur 30 jours",
    text_auto=".1f",
    template="simple_white",
    barmode="group",
    labels=labels_map,
    title="Nombre de trips par type de conducteur et campagne",
)

fig_trips_count_by_driver_type_campaign.update_layout(legend_title=None)
fig_trips_count_by_driver_type_campaign.show()
fig_trips_count_by_driver_type_campaign.write_html(
    "outputs/fig_trips_par_type_conducteur_et_campagne.html"
)
fig_trips_count_by_driver_type_campaign.write_image(
    "outputs/fig_trips_par_type_conducteur_et_campagne.svg", width=1280, height=720
)

In [None]:
df_passenger_mean_by_campagin_trip_type = (
    df_journeys.filter(
        pl.col("first_trip_datetime") >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
        pl.col("first_trip_datetime")
        <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
        pl.col("start_datetime")
        <= pl.col("first_trip_datetime") + pl.duration(days=30),
    )
    .group_by(
        [
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            ).alias("trip_id"),
        ]
    )
    .agg(
        pl.col("is_fully_inside_campaign_area").max(),
        pl.col("passenger_seats").sum(),
        pl.col("campaign_type").max(),
        pl.col("driver_identity_key").max(),
    )
    .group_by(["campaign_type", "driver_identity_key"])
    .agg(
        (
            (
                pl.col("trip_id")
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.col("trip_id")
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_driver"),
        pl.len().alias("num_journeys"),
        pl.col("passenger_seats").mean(),
    )
    .group_by(["campaign_type", "is_intra_driver"])
    .agg(
        pl.col("passenger_seats").mean().alias("Nombre moyen de passagers"),
    )
    .sort(pl.col("campaign_type").replace(campaign_order), "is_intra_driver")
)
df_passenger_mean_by_campagin_trip_type

In [None]:
fig_passengers_count_by_driver_type_campaign = px.bar(
    df_passenger_mean_by_campagin_trip_type.with_columns(
        pl.when(pl.col("is_intra_driver"))
        .then(pl.lit("Conducteur intra"))
        .otherwise(pl.lit("Conducteur inter"))
        .alias("driver_type")
    ),
    x="campaign_type",
    y="Nombre moyen de passagers",
    color="driver_type",
    text="Nombre moyen de passagers",
    text_auto=".2f",
    template="simple_white",
    barmode="group",
    labels=labels_map,
    title="Nombre moyen de passagers par type de conducteur et campagne",
)

fig_passengers_count_by_driver_type_campaign.update_layout(legend_title=None)
fig_passengers_count_by_driver_type_campaign.show()
fig_passengers_count_by_driver_type_campaign.write_html(
    "outputs/fig_passagers_par_type_conducteur_et_campagne.html"
)
fig_passengers_count_by_driver_type_campaign.write_image(
    "outputs/fig_passagers_par_type_conducteur_et_campagne.svg", width=1280, height=720
)

## Retention
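The retention curve below follows each driver for six weeks, starting with the week of the first journey, and reports the share of drivers with at least one journey in each of those weeks. The plain-Python sketch below shows the same idea on toy data. It assumes weeks start on Monday, and the helper names `week_start` and `retention_by_week` are hypothetical.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing `day` (assumed week convention)."""
    return day - timedelta(days=day.weekday())


def retention_by_week(journey_dates_by_driver, num_weeks=6):
    """Share (%) of drivers active in each week since their first journey.

    `journey_dates_by_driver` maps a driver key to a list of journey dates
    (hypothetical layout used only for this illustration).
    """
    active = [0] * num_weeks
    for days in journey_dates_by_driver.values():
        first_week = week_start(min(days))
        # Index 0 is the week of the first journey, 1 the following week, etc.
        weeks_seen = {(week_start(d) - first_week).days // 7 for d in days}
        for week_index in range(num_weeks):
            if week_index in weeks_seen:
                active[week_index] += 1
    num_drivers = len(journey_dates_by_driver)
    return [100 * count / num_drivers for count in active]


toy_retention = retention_by_week(
    {
        "a": [date(2025, 1, 6), date(2025, 1, 14)],
        "b": [date(2025, 1, 8)],
    },
    num_weeks=3,
)
```

By construction the first value is always 100 %, since every driver is active in the week of the first journey.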


In [None]:
df_acquisition_by_driver_type = (
    (
        df_journeys.filter(
            pl.col("first_trip_datetime")
            >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            pl.col("first_trip_datetime")
            <= datetime.now(ZoneInfo("GMT")) - timedelta(weeks=6),
        )
        .group_by(["driver_identity_key"])
        .agg(
            pl.col("start_datetime").min(),
            pl.datetime_range(
                pl.col("start_datetime").min().dt.truncate("1w"),
                pl.col("start_datetime").min().dt.truncate("1w") + pl.duration(weeks=5),
                "1w",
            ).alias("week"),
        )
        .with_columns(
            (
                pl.when(
                    pl.col("start_datetime")
                    <= datetime(2025, 1, 1, tzinfo=ZoneInfo("GMT"))
                )
                .then(pl.lit("Période 0"))
                .when(pl.col("start_datetime") <= CAMPAIGN_CHANGE_DATE)
                .then(pl.lit("Période 1"))
                .otherwise(pl.lit("Période 2"))
                .alias("driver_campaign_type")
            )
        )
        .explode("week")
        .join(
            df_journeys.filter(
                pl.col("start_datetime")
                >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            ),
            left_on=["driver_identity_key", "week"],
            right_on=[
                "driver_identity_key",
                pl.col("start_datetime").dt.truncate("1w"),
            ],
            how="left",
        )
    )
    .group_by([pl.col("driver_identity_key"), "week"])
    .agg(
        (pl.col("_id").count() > 0).alias("has_traveled"),
        pl.col("driver_campaign_type").max(),
    )
    .with_columns(
        pl.col("week")
        .rank()
        .over(partition_by=["driver_identity_key"], order_by="week")
        .alias("week_number")
    )
    .group_by(["driver_campaign_type", "week_number"])
    .agg(
        (100 * pl.col("has_traveled").sum() / pl.col("has_traveled").count()).alias(
            "drivers_share"
        )
    )
    .sort(
        [pl.col("driver_campaign_type").replace(campaign_order), pl.col("week_number")]
    )
)
df_acquisition_by_driver_type

In [None]:
fig_churn_by_campaign_type = px.line(
    df_acquisition_by_driver_type,
    x="week_number",
    y="drivers_share",
    color="driver_campaign_type",
    color_discrete_map=colors_map,
    template="simple_white",
    labels=labels_map,
    title="Attrition by campaign type",
)

fig_churn_by_campaign_type.update_yaxes(showgrid=True)
fig_churn_by_campaign_type.show()

fig_churn_by_campaign_type.write_html("outputs/fig_attrition_par_campagne.html")
fig_churn_by_campaign_type.write_image(
    "outputs/fig_attrition_par_campagne.svg", width=1280, height=720
)
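
The curve above plots, for each of a driver's first six weeks, the share of drivers who made at least one journey that week. A minimal standard-library sketch of the same idea follows, assuming weeks start on Monday; `week_start`, `retention_by_week` and the toy data are hypothetical names for illustration, not part of the notebook.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def retention_by_week(trips: dict[str, list[date]], n_weeks: int = 6) -> list[float]:
    """Share (in %) of users active in each of their first `n_weeks` weeks."""
    active = [0] * n_weeks
    for days in trips.values():
        first_week = week_start(min(days))
        weeks_with_trip = {week_start(d) for d in days}
        for offset in range(n_weeks):
            if first_week + timedelta(weeks=offset) in weeks_with_trip:
                active[offset] += 1
    return [100 * count / len(trips) for count in active]


# Hypothetical toy data: journey dates per driver.
toy_trips = {
    "driver_a": [date(2025, 1, 6), date(2025, 1, 14), date(2025, 2, 4)],
    "driver_b": [date(2025, 1, 8)],
}
retention = retention_by_week(toy_trips)
```

Week 1 is always 100 % by construction, since every driver is active in the week of their first journey; the informative part of the curve starts at week 2.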

## Drivers who reach the threshold


In [None]:
df_incentives_stats_by_month_driver = (
    df_journeys_raw.with_columns(
        (pl.col("incentive_amount") / 100)
        .cum_sum()
        .over(
            partition_by=[
                "driver_identity_key",
                pl.col("start_datetime").dt.truncate("1mo"),
            ],
            order_by="start_datetime",
        )
        .alias("incentive_amount_cumu")
    )
    .group_by([pl.col("start_datetime").dt.truncate("1mo"), "driver_identity_key"])
    .agg(
        pl.col("incentive_amount_cumu").max().alias("incentive_amount_cumu_max"),
        pl.col("operator_trip_id")
        .filter(pl.col("incentive_amount_cumu") >= 50)
        .n_unique()
        .alias("num_trips_above_threshold"),
        pl.col("_id")
        .filter(pl.col("incentive_amount_cumu") >= 50)
        .n_unique()
        .alias("num_journeys_above_threshold"),
    )
)
df_incentives_stats_by_month_driver
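
The cell above accumulates each driver's incentives within a month and counts the trips made once the running total has reached 50 €. The sketch below reproduces that logic for a single driver and month with plain Python. It assumes amounts are stored in cents, as the division by 100 suggests; the function names and category strings are illustrative only.

```python
def trips_at_or_above_threshold(
    amounts_cents: list[int], threshold_eur: float = 50.0
) -> int:
    """Count trips whose running monthly total has reached the threshold.

    `amounts_cents` holds one driver's incentives for one month,
    in chronological order.
    """
    running_total = 0.0
    count = 0
    for cents in amounts_cents:
        running_total += cents / 100
        if running_total >= threshold_eur:
            count += 1
    return count


def threshold_category(count: int) -> str:
    """Classify a driver-month using the same cut-offs as the bar chart."""
    if count == 0:
        return "below threshold"
    if count == 1:
        return "stops at threshold"
    return "continues after threshold"


# Hypothetical month: the third trip reaches 50 EUR, two more trips follow.
n_trips = trips_at_or_above_threshold([2000, 2000, 1000, 0, 0])
```

A count of exactly 1 means the trip that crossed the threshold was the last one of the month, which is how the next chart separates drivers who stop from drivers who keep carpooling.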

In [None]:
fig_drivers_incentives_cat = px.bar(
    (
        df_incentives_stats_by_month_driver.group_by(["start_datetime"])
        .agg(
            (100 * (pl.col("num_trips_above_threshold") > 1).sum() / pl.len()).alias(
                "Conducteurs qui continuent à covoiturer"
            ),
            (100 * (pl.col("num_trips_above_threshold") == 1).sum() / pl.len()).alias(
                "Conducteurs qui s'arrêtent après avoir atteint le seuil"
            ),
        )
        .unpivot(
            index="start_datetime",
            value_name="share_drivers",
            variable_name="driver_cat",
        )
        .sort(["start_datetime", "driver_cat"])
    ),
    x="start_datetime",
    y="share_drivers",
    color="driver_cat",
    template="simple_white",
    title="Breakdown of drivers who reach the threshold",
    labels={
        **labels_map,
        "start_datetime": "Month",
    },
    barmode="group",
    text="share_drivers",
    text_auto=".1f",
)

fig_drivers_incentives_cat.update_layout(
    legend_orientation="h",
    legend_y=0.7,
    legend_yref="container",
    legend_title="",
)
fig_drivers_incentives_cat.update_yaxes(showgrid=True, title="% of drivers")
add_campaign_annotations(fig_drivers_incentives_cat, label_position="inside top left")

fig_drivers_incentives_cat.show()

fig_drivers_incentives_cat.write_html("outputs/fig_conducteurs_seuils_incitation.html")
fig_drivers_incentives_cat.write_image(
    "outputs/fig_conducteurs_seuils_incitation.svg", width=1280, height=720
)

## Distribution of earnings


In [None]:
breaks = list(range(0, 101, 10))
df_incentives_by_drivers_hist = (
    df_incentives_stats_by_month_driver.group_by(
        pl.col("incentive_amount_cumu_max").cut(breaks=breaks, include_breaks=True)
    )
    .agg((pl.col("driver_identity_key").n_unique()))
    .with_columns(
        pl.col("incentive_amount_cumu_max").struct.unnest(),
        (100 * pl.col("driver_identity_key") / pl.col("driver_identity_key").sum())
        .round(2)
        .alias("share"),
    )
    .sort("breakpoint")
)
fig_incentives_by_driver_hist = px.bar(
    df_incentives_by_drivers_hist,
    x="category",
    y="share",
    text="share",
    labels={**labels_map, "share": "% of drivers", "category": "Incentive received"},
    template="simple_white",
    title="Distribution of drivers' monthly earnings<br><sub>In 10 € bins;"
    " the first bin holds drivers who received no incentive.</sub>",
)
fig_incentives_by_driver_hist.show()
fig_incentives_by_driver_hist.write_html("outputs/fig_histo_incitation_conducteur.html")
fig_incentives_by_driver_hist.write_image(
    "outputs/fig_histo_incitation_conducteur.svg", width=1280, height=720
)
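
The histogram relies on right-closed bins, so a total of exactly 0 € falls in the first bin and a total of exactly 10 € falls in the second. A small standard-library sketch of that binning rule is given here; `bin_label` and the toy totals are hypothetical, and the label strings are illustrative rather than the exact ones Polars produces.

```python
from bisect import bisect_left
from collections import Counter


def bin_label(value: float, breaks: list[int]) -> str:
    """Label of the right-closed bin (lower, upper] that contains `value`."""
    i = bisect_left(breaks, value)
    lower = "-inf" if i == 0 else str(breaks[i - 1])
    if i == len(breaks):
        return f"({lower}, inf)"
    return f"({lower}, {breaks[i]}]"


breaks = list(range(0, 101, 10))
# Hypothetical monthly totals in euros, one value per driver.
toy_totals = [0, 0, 5, 10, 37.5]
counts = Counter(bin_label(total, breaks) for total in toy_totals)
shares = {label: 100 * n / len(toy_totals) for label, n in counts.items()}
```

`bisect_left` returns the index of the first break that is greater than or equal to the value, which is exactly the upper bound of a right-closed bin.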

## Profile changes


In [None]:
df_driver_cat_changes_stats = (
    df_journeys.group_by(["campaign_type", "driver_identity_key"])
    .agg(
        (
            (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_driver")
    )
    .filter(pl.col("campaign_type") != "Période 0")
    .sort(pl.col("campaign_type").replace(campaign_order))
    .group_by("driver_identity_key", maintain_order=True)
    .agg(pl.col("campaign_type"), pl.col("is_intra_driver"))
    .with_columns(
        pl.when(pl.col("campaign_type") == ["Période 1"])
        .then(pl.lit("Driver churned"))
        .when(pl.col("campaign_type") == ["Période 2"])
        .then(pl.lit("New driver"))
        .when(pl.col("is_intra_driver") == [False, True])
        .then(pl.lit("Driver switched from inter to intra"))
        .when(pl.col("is_intra_driver") == [True, False])
        .then(pl.lit("Driver switched from intra to inter"))
        .otherwise(pl.lit("Driver did not change category"))
        .alias("driver_cat")
    )
    .group_by("driver_cat")
    .agg(pl.len().alias("num_drivers"))
    .with_columns(
        (100 * pl.col("num_drivers") / pl.col("num_drivers").sum()).alias(
            "share_drivers"
        )
    )
)
df_driver_cat_changes_stats.sort("share_drivers", descending=True)

In [None]:
df_driver_cat_changes_stats.sort("share_drivers", descending=True).write_clipboard()
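
The classification above can be read as a two-step rule: a driver is "intra" in a period when at least half of their trips are fully inside the campaign area, and the change category compares that flag between Période 1 and Période 2. The following sketch states the rule in plain Python; the function names and returned strings are hypothetical, and `None` stands for "no trip in that period".

```python
from typing import Optional


def is_intra(n_intra_trips: int, n_other_trips: int) -> bool:
    """A user is 'intra' when intra trips are at least as many as the others."""
    return n_intra_trips >= n_other_trips


def classify_change(before: Optional[bool], after: Optional[bool]) -> str:
    """Compare the intra flag of period 1 (`before`) and period 2 (`after`)."""
    if before is None and after is None:
        raise ValueError("the user must be active in at least one period")
    if after is None:
        return "churned"
    if before is None:
        return "new"
    if before == after:
        return "unchanged"
    return "inter to intra" if after else "intra to inter"
```

Note that ties count as intra, because the comparison uses `>=`; a driver with as many intra as inter trips is therefore never reported as inter.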

# Passengers


## Acquisition


In [None]:
df_journeys.select(
    pl.col("passenger_identity_key").n_unique().alias("Nombre de passagers uniques"),
    pl.col("passenger_identity_key")
    .filter(
        pl.col("passenger_first_trip_datetime").is_between(
            datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")), CAMPAIGN_CHANGE_DATE
        ),
    )
    .alias(
        f"Nombre de passagers arrivés entre le 01/09/2024 et le {CAMPAIGN_CHANGE_DATE:%d/%m/%Y}"
    )
    .n_unique(),
    pl.col("passenger_identity_key")
    .filter(
        pl.col("passenger_first_trip_datetime") > CAMPAIGN_CHANGE_DATE,
    )
    .alias(f"Nombre de passagers arrivés après le {CAMPAIGN_CHANGE_DATE:%d/%m/%Y}")
    .n_unique(),
)

In [None]:
fig_new_passengers_count_by_week = px.bar(
    df_journeys.filter(
        pl.col("passenger_first_trip_datetime")
        >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT"))
    )
    .group_by(pl.col("passenger_first_trip_datetime").dt.truncate("1w").alias("week"))
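    # Count distinct passengers, not journeys, per week of first trip
    .agg(pl.col("passenger_identity_key").n_unique().alias("len"))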
    .sort(pl.col("week")),
    x="week",
    y="len",
    labels={**labels_map, "len": "Number of new passengers"},
    template="simple_white",
    title="Passenger acquisition over time",
)
add_campaign_annotations(
    fig_new_passengers_count_by_week, label_position="inside top left"
)
fig_new_passengers_count_by_week.show()

fig_new_passengers_count_by_week.write_html("outputs/fig_passagers_par_semaine.html")
fig_new_passengers_count_by_week.write_image(
    "outputs/fig_passagers_par_semaine.svg", width=1280, height=720
)

## Number of trips


In [None]:
df_journeys.filter(
    pl.col("passenger_first_trip_datetime")
    >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("passenger_first_trip_datetime")
    <= datetime.now(ZoneInfo("GMT")) - timedelta(days=14),
).group_by(
    [
        "campaign_type",
        "passenger_identity_key",
        pl.col("start_datetime").dt.truncate("1w"),
    ]
).agg(
    pl.len().alias("num_journeys"),
    pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
    .n_unique()
    .alias("num_trips"),
).group_by(["campaign_type", "start_datetime"]).agg(
    pl.col("num_journeys").mean().alias("Nombre moyen de journeys par semaine"),
    pl.col("num_trips").mean().alias("Nombre moyen de trips par semaine"),
).group_by(["campaign_type"]).agg(
    pl.col("Nombre moyen de journeys par semaine").mean(),
    pl.col("Nombre moyen de trips par semaine").mean(),
).sort(pl.col("campaign_type").replace(campaign_order))

In [None]:
df_journeys.filter(
    pl.col("passenger_first_trip_datetime")
    >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("passenger_first_trip_datetime")
    <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
    pl.col("start_datetime")
    <= pl.col("passenger_first_trip_datetime") + pl.duration(days=30),
).group_by(["campaign_type", "passenger_identity_key"]).agg(
    pl.len().alias("num_journeys"),
    pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
    .n_unique()
    .alias("num_trips"),
).group_by("campaign_type").agg(
    pl.col("num_journeys").mean().alias("Nombre moyen de journeys sur 30 jours"),
    pl.col("num_trips").mean().alias("Nombre moyen de trips sur 30 jours"),
).sort(pl.col("campaign_type").replace(campaign_order))
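
The 30-day figures above only count journeys that start within 30 days of a passenger's first trip, so that recent and long-standing passengers are compared over the same window. A minimal sketch of that windowing, using only the standard library, is shown below; the function name and the toy data are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def journeys_in_first_window(starts: list[datetime], window_days: int = 30) -> int:
    """Count journeys starting within `window_days` of the earliest one (inclusive)."""
    first = min(starts)
    limit = first + timedelta(days=window_days)
    return sum(1 for start in starts if start <= limit)


# Hypothetical journey start times per passenger.
toy_journeys = {
    "passenger_a": [
        datetime(2025, 1, 1, 8),
        datetime(2025, 1, 15, 8),
        datetime(2025, 1, 31, 8),  # exactly 30 days later: still counted
        datetime(2025, 2, 20, 8),  # outside the window
    ],
    "passenger_b": [datetime(2025, 3, 3, 9)],
}
per_passenger = [journeys_in_first_window(s) for s in toy_journeys.values()]
mean_journeys = mean(per_passenger)
```

The notebook additionally drops passengers whose first trip is less than 30 days old, since their window is not yet complete.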

In [None]:
def create_num_passengers_by_num_trips_hist_fig(
    df: pl.DataFrame, step_size: int, max_step: int
) -> tuple[go.Figure, pl.DataFrame]:
    breaks = list(range(1, max_step + 1, step_size))

    # Campaign types present in the data, in chronological order
    campaign_types = df.select("campaign_type").unique().to_series().to_list()
    campaign_types = sorted(campaign_types, key=lambda x: campaign_order.get(x))

    # Build a DataFrame of every (campaign type, bin) combination
    combinations = pl.DataFrame(
        product(campaign_types, breaks), schema=["campaign_type", "breaks_raw"]
    ).with_columns(
        pl.col("breaks_raw")
        .cut(breaks, include_breaks=True, left_closed=True)
        .struct.unnest()
    )
    data_agg = (
        df.filter(
            pl.col("passenger_first_trip_datetime")
            >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            pl.col("passenger_first_trip_datetime")
            <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
            pl.col("start_datetime")
            <= pl.col("passenger_first_trip_datetime") + pl.duration(days=30),
        )
        .group_by(["campaign_type", "passenger_identity_key"])
        .agg(
            pl.len().alias("num_journeys"),
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .n_unique()
            .alias("num_trips"),
        )
        .with_columns(
            pl.col("num_trips").cut(
                breaks=breaks, left_closed=True, include_breaks=True
            )
        )
        .group_by(["campaign_type", "num_trips"])
        .agg(pl.col("passenger_identity_key").n_unique().alias("num_passengers"))
        .with_columns(pl.col("num_trips").struct.unnest())
    )

    data_complete = (
        combinations.join(
            data_agg,
            on=["campaign_type", "breakpoint"],
            how="left",
        )
        .with_columns(pl.col("num_passengers").fill_null(0))
        .with_columns(
            (
                100
                * pl.col("num_passengers")
                / pl.col("num_passengers").sum().over("campaign_type")
            )
            .round(2)
            .alias("share_passengers")
        )
        .sort([pl.col("campaign_type").replace(campaign_order), "breakpoint"])
    )

    traces = []

    for campaign_type in campaign_types:
        data_filtered = data_complete.filter(pl.col("campaign_type") == campaign_type)

        trace = go.Bar(
            x=data_filtered["category"],
            y=data_filtered["share_passengers"],
            marker_color=colors_map.get(campaign_type),
            name=campaign_type,
        )
        traces.append(trace)

    fig = go.Figure(traces)
    fig.update_layout(
        barmode="group",
        template="simple_white",
        title="Distribution of the number of trips made over 30 days for each campaign",
    )
    fig.update_xaxes(title="Number of trips")
    fig.update_yaxes(title="% of passengers")

    return fig, data_complete


fig_passengers_by_trip_numbers_hist, data_complete = (
    create_num_passengers_by_num_trips_hist_fig(df_journeys, step_size=3, max_step=30)
)
fig_passengers_by_trip_numbers_hist.show()

fig_passengers_by_trip_numbers_hist.write_html(
    "outputs/fig_histo_trajets_passagers.html"
)
fig_passengers_by_trip_numbers_hist.write_image(
    "outputs/fig_histo_trajets_passagers.svg", width=1280, height=720
)

## Trip types


In [None]:
df_journeys.filter(
    pl.col("passenger_first_trip_datetime")
    >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
    pl.col("passenger_first_trip_datetime")
    <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
    pl.col("start_datetime")
    <= pl.col("passenger_first_trip_datetime") + pl.duration(days=30),
).group_by(["campaign_type", "passenger_identity_key"]).agg(
    (
        (
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .filter(pl.col("is_fully_inside_campaign_area"))
            .n_unique()
        )
        >= (
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            )
            .filter(pl.col("is_fully_inside_campaign_area").not_())
            .n_unique()
        )
    ).alias("is_intra_passenger")
).group_by(["campaign_type"]).agg(
    (100 * pl.col("is_intra_passenger").sum() / pl.len()).alias(
        "% des passagers avec une majorité de journeys intra"
    )
).sort(pl.col("campaign_type").replace(campaign_order))

In [None]:
df_passengers_journeys_trips_count_by_campagin_trip_type = (
    df_journeys.filter(
        pl.col("passenger_first_trip_datetime")
        >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
        pl.col("passenger_first_trip_datetime")
        <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
        pl.col("start_datetime")
        <= pl.col("passenger_first_trip_datetime") + pl.duration(days=30),
    )
    .group_by(["campaign_type", "passenger_identity_key"])
    .agg(
        (
            (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_passenger"),
        pl.len().alias("num_journeys"),
        pl.concat_str(pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id"))
        .n_unique()
        .alias("num_trips"),
    )
    .group_by(["campaign_type", "is_intra_passenger"])
    .agg(
        pl.col("num_journeys").mean().alias("Nombre moyen de journeys sur 30 jours"),
        pl.col("num_trips").mean().alias("Nombre moyen de trips sur 30 jours"),
    )
    .sort(pl.col("campaign_type").replace(campaign_order), "is_intra_passenger")
)
df_passengers_journeys_trips_count_by_campagin_trip_type

In [None]:
fig_passengers_journeys_count_by_driver_type_campaign = px.bar(
    df_passengers_journeys_trips_count_by_campagin_trip_type.with_columns(
        pl.when(pl.col("is_intra_passenger"))
        .then(pl.lit("Passager intra"))
        .otherwise(pl.lit("Passager inter"))
        .alias("passenger_type")
    ),
    x="campaign_type",
    y="Nombre moyen de journeys sur 30 jours",
    color="passenger_type",
    text="Nombre moyen de journeys sur 30 jours",
    text_auto=".1f",
    template="simple_white",
    barmode="group",
    labels=labels_map,
    title="Number of journeys by passenger type and campaign",
)
fig_passengers_journeys_count_by_driver_type_campaign.update_layout(legend_title=None)
fig_passengers_journeys_count_by_driver_type_campaign.show()
fig_passengers_journeys_count_by_driver_type_campaign.write_html(
    "outputs/fig_journeys_par_type_passagers_et_campagne.html"
)
fig_passengers_journeys_count_by_driver_type_campaign.write_image(
    "outputs/fig_journeys_par_type_passagers_et_campagne.svg", width=1280, height=720
)

## Number of drivers


In [None]:
df_passengers_drivers_mean_by_campaign_trip_type = (
    df_journeys.filter(
        pl.col("passenger_first_trip_datetime")
        >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
        pl.col("passenger_first_trip_datetime")
        <= datetime.now(ZoneInfo("GMT")) - timedelta(days=30),
        pl.col("start_datetime")
        <= pl.col("passenger_first_trip_datetime") + pl.duration(days=30),
    )
    .group_by(
        [
            pl.concat_str(
                pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
            ).alias("trip_id"),
            # Keep one row per passenger on each trip so that no passenger is dropped
            "passenger_identity_key",
        ]
    )
    .agg(
        pl.col("is_fully_inside_campaign_area").max(),
        pl.col("passenger_seats").sum(),
        pl.col("campaign_type").max(),
        pl.col("driver_identity_key").max(),
    )
    .group_by(["campaign_type", "passenger_identity_key"])
    .agg(
        (
            (
                pl.col("trip_id")
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.col("trip_id")
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_passenger"),
        pl.len().alias("num_journeys"),
        pl.col("driver_identity_key").n_unique().alias("num_drivers"),
    )
    .group_by(["campaign_type", "is_intra_passenger"])
    .agg(
        pl.col("num_drivers").mean().alias("Nombre moyen de conducteurs différents"),
    )
    .sort(pl.col("campaign_type").replace(campaign_order), "is_intra_passenger")
)
df_passengers_drivers_mean_by_campaign_trip_type

In [None]:
fig_passengers_drivers_count_by_driver_type_campaign = px.bar(
    df_passengers_drivers_mean_by_campaign_trip_type.with_columns(
        pl.when(pl.col("is_intra_passenger"))
        .then(pl.lit("Passager intra"))
        .otherwise(pl.lit("Passager inter"))
        .alias("passenger_type")
    ),
    x="campaign_type",
    y="Nombre moyen de conducteurs différents",
    color="passenger_type",
    text="Nombre moyen de conducteurs différents",
    text_auto=".2f",
    template="simple_white",
    barmode="group",
    labels=labels_map,
    title="Mean number of distinct drivers by passenger type and campaign",
)

fig_passengers_drivers_count_by_driver_type_campaign.update_layout(legend_title=None)
fig_passengers_drivers_count_by_driver_type_campaign.show()
fig_passengers_drivers_count_by_driver_type_campaign.write_html(
    "outputs/fig_conducteurs_type_passager_et_campagne.html"
)
fig_passengers_drivers_count_by_driver_type_campaign.write_image(
    "outputs/fig_conducteurs_type_passager_et_campagne.svg", width=1280, height=720
)

## Attrition


In [None]:
CAMPAIGN_CHANGE_DATE

In [None]:
df_acquisition_by_passenger_type = (
    (
        df_journeys.filter(
            pl.col("passenger_first_trip_datetime")
            >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            pl.col("passenger_first_trip_datetime")
            <= datetime.now(ZoneInfo("GMT")) - timedelta(weeks=6),
        )
        .group_by(["passenger_identity_key"])
        .agg(
            pl.col("start_datetime").min(),
            pl.datetime_range(
                pl.col("start_datetime").min().dt.truncate("1w"),
                pl.col("start_datetime").min().dt.truncate("1w") + pl.duration(weeks=5),
                "1w",
            ).alias("week"),
        )
        .with_columns(
            (
                pl.when(
                    pl.col("start_datetime")
                    <= datetime(2025, 1, 1, tzinfo=ZoneInfo("GMT"))
                )
                .then(pl.lit("Période 0"))
                .when(pl.col("start_datetime") <= CAMPAIGN_CHANGE_DATE)
                .then(pl.lit("Période 1"))
                .otherwise(pl.lit("Période 2"))
                .alias("passenger_campaign_type")
            )
        )
        .explode("week")
        .join(
            df_journeys.filter(
                pl.col("start_datetime")
                >= datetime(2024, 9, 1, tzinfo=ZoneInfo("GMT")),
            ),
            left_on=["passenger_identity_key", "week"],
            right_on=[
                "passenger_identity_key",
                pl.col("start_datetime").dt.truncate("1w"),
            ],
            how="left",
        )
    )
    .group_by([pl.col("passenger_identity_key"), "week"])
    .agg(
        (pl.col("_id").count() > 0).alias("has_traveled"),
        pl.col("passenger_campaign_type").max(),
    )
    .with_columns(
        pl.col("week")
        .rank()
        .over(partition_by=["passenger_identity_key"], order_by="week")
        .alias("week_number")
    )
    .group_by(["passenger_campaign_type", "week_number"])
    .agg(
        (100 * pl.col("has_traveled").sum() / pl.col("has_traveled").count()).alias(
            "passengers_share"
        )
    )
    .sort(
        [
            pl.col("passenger_campaign_type").replace(campaign_order),
            pl.col("week_number"),
        ]
    )
)
df_acquisition_by_passenger_type

In [None]:
fig_passenger_churn_by_campaign_type = px.line(
    df_acquisition_by_passenger_type,
    x="week_number",
    y="passengers_share",
    color="passenger_campaign_type",
    color_discrete_map=colors_map,
    template="simple_white",
    labels=labels_map,
    title="Passenger attrition by campaign type",
)

fig_passenger_churn_by_campaign_type.update_yaxes(showgrid=True)
fig_passenger_churn_by_campaign_type.show()

fig_passenger_churn_by_campaign_type.write_html(
    "outputs/fig_attrition_passagers_par_campagne.html"
)
fig_passenger_churn_by_campaign_type.write_image(
    "outputs/fig_attrition_passagers_par_campagne.svg", width=1280, height=720
)

## Profile changes


In [None]:
df_passenger_cat_changes_stats = (
    df_journeys.group_by(["campaign_type", "passenger_identity_key"])
    .agg(
        (
            (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area"))
                .n_unique()
            )
            >= (
                pl.concat_str(
                    pl.col("operator_id"), pl.lit("-"), pl.col("operator_trip_id")
                )
                .filter(pl.col("is_fully_inside_campaign_area").not_())
                .n_unique()
            )
        ).alias("is_intra_passenger")
    )
    .filter(pl.col("campaign_type") != "Période 0")
    .sort(pl.col("campaign_type").replace(campaign_order))
    .group_by("passenger_identity_key", maintain_order=True)
    .agg(pl.col("campaign_type"), pl.col("is_intra_passenger"))
    .with_columns(
        pl.when(pl.col("campaign_type") == ["Période 1"])
        .then(pl.lit("Passenger churned"))
        .when(pl.col("campaign_type") == ["Période 2"])
        .then(pl.lit("New passenger"))
        .when(pl.col("is_intra_passenger") == [False, True])
        .then(pl.lit("Passenger switched from inter to intra"))
        .when(pl.col("is_intra_passenger") == [True, False])
        .then(pl.lit("Passenger switched from intra to inter"))
        .otherwise(pl.lit("Passenger did not change category"))
        .alias("passenger_cat")
    )
    .group_by("passenger_cat")
    .agg(pl.len().alias("num_passengers"))
    .with_columns(
        (100 * pl.col("num_passengers") / pl.col("num_passengers").sum()).alias(
            "share_passengers"
        )
    )
)
df_passenger_cat_changes_stats.sort("share_passengers", descending=True)

In [None]:
df_passenger_cat_changes_stats.sort(
    "share_passengers", descending=True
).write_clipboard()

# Geo


In [None]:
df_journeys.filter(pl.col("campaign_type") == "Période 1").group_by(
    ["start_com", "end_com"]
).agg(pl.col("_id").n_unique().alias("num_journeys")).with_columns(
    (100 * pl.col("num_journeys") / pl.col("num_journeys").sum()).alias("share")
).sort("num_journeys", descending=True).head(10)

In [None]:
df_journeys.filter(pl.col("campaign_type") == "Période 2").group_by(
    ["start_com", "end_com"]
).agg(pl.col("_id").n_unique().alias("num_journeys")).with_columns(
    (100 * pl.col("num_journeys") / pl.col("num_journeys").sum()).alias("share")
).sort("num_journeys", descending=True).head(10)

In [None]:
fig_density_map_start_campaign_before = px.density_map(
    df_journeys.filter(pl.col("campaign_type") == "Période 1").sample(20000),
    lat="start_latitude",
    lon="start_longitude",
    radius=10,
    center=dict(lat=46.2, lon=6.4),
    zoom=9,
    map_style="open-street-map",
    opacity=0.7,
)
fig_density_map_start_campaign_before.update_traces({"name": "Période 1"})
fig_density_map_start_campaign_before.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Campaign area",
    )
)
fig_density_map_start_campaign_before.update_layout(height=1000, showlegend=True)
fig_density_map_start_campaign_before.show()

In [None]:
fig_density_map_start_campaign_after = px.density_map(
    df_journeys.filter(pl.col("campaign_type") == "Période 2").sample(20000),
    lat="start_latitude",
    lon="start_longitude",
    radius=10,
    center=dict(lat=46.2, lon=6.4),
    zoom=9,
    map_style="open-street-map",
    opacity=0.7,
)
fig_density_map_start_campaign_after.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Campaign area",
    )
)
fig_density_map_start_campaign_after.update_layout(height=1000)
fig_density_map_start_campaign_after.show()

In [None]:
fig_density_map_end_campaign_before = px.density_map(
    df_journeys.filter(pl.col("campaign_type") == "Période 1").sample(20000),
    lat="end_latitude",
    lon="end_longitude",
    radius=10,
    center=dict(lat=46.2, lon=6.4),
    zoom=9,
    map_style="open-street-map",
    opacity=0.7,
)
fig_density_map_end_campaign_before.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Campaign area",
    )
)
fig_density_map_end_campaign_before.update_layout(height=1000)
fig_density_map_end_campaign_before.show()

In [None]:
fig_density_map_end_campaign_after = px.density_map(
    df_journeys.filter(pl.col("campaign_type") == "Période 2").sample(20000),
    lat="end_latitude",
    lon="end_longitude",
    radius=10,
    center=dict(lat=46.2, lon=6.4),
    zoom=9,
    map_style="open-street-map",
    opacity=0.7,
)
fig_density_map_end_campaign_after.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Campaign area",
    )
)
fig_density_map_end_campaign_after.update_layout(height=1000)
fig_density_map_end_campaign_after.show()

In [None]:
lats = []
lons = []
names = []
for row in (
    df_journeys.filter(pl.col("campaign_type") == "Période 1")
    .sample(500)
    .iter_rows(named=True)
):
    lats.extend([row["start_latitude"], row["end_latitude"]])
    lons.extend([row["start_longitude"], row["end_longitude"]])
    names.extend([row["_id"], row["_id"]])
    # A None entry ends the current segment so each journey is drawn as its own line
    lats.append(None)
    lons.append(None)
    names.append(None)

fig_trips_map_campaign_after = px.line_map(
    lat=lats,
    lon=lons,
    hover_name=names,
    center=dict(lat=46.2, lon=6.4),
    zoom=8.5,
    map_style="open-street-map",
)
fig_trips_map_campaign_after.update_traces({"line_width": 0.5, "line_color": "black"})

fig_trips_map_campaign_after.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Campaign area",
    )
)
fig_trips_map_campaign_after.update_layout(height=1000)
fig_trips_map_campaign_after.show()
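
The map above draws every sampled journey as a separate straight segment inside a single trace. This works because Plotly line traces break the line wherever a coordinate is `None`. The sketch below isolates the list-building step, assuming each journey is a dict with the same keys as the notebook rows; `build_segments` and the toy coordinates are hypothetical.

```python
def build_segments(journeys: list[dict]) -> tuple[list, list, list]:
    """Flatten journeys into lat/lon/name lists with a None gap after each one."""
    lats, lons, names = [], [], []
    for journey in journeys:
        lats += [journey["start_latitude"], journey["end_latitude"], None]
        lons += [journey["start_longitude"], journey["end_longitude"], None]
        names += [journey["_id"], journey["_id"], None]
    return lats, lons, names


# Hypothetical journeys around the campaign area.
toy_rows = [
    {"_id": "j1", "start_latitude": 46.20, "start_longitude": 6.15,
     "end_latitude": 46.19, "end_longitude": 6.23},
    {"_id": "j2", "start_latitude": 46.31, "start_longitude": 6.05,
     "end_latitude": 46.20, "end_longitude": 6.14},
]
lats, lons, names = build_segments(toy_rows)
```

The three lists must stay the same length and carry the separator at the same positions, otherwise hover names and coordinates drift out of step.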

## H3


In [None]:
df_journeys_by_campaign_h3 = (
    df_journeys.with_columns(
        plh3.latlng_to_cell(
            lat="start_latitude", lng="start_longitude", resolution=7
        ).alias("h3_cell")
    )
    .group_by(["campaign_type", "h3_cell"])
    .agg(pl.col("_id").n_unique().alias("num_journeys"))
    .with_columns(
        (
            100
            * pl.col("num_journeys")
            / pl.col("num_journeys").sum().over("campaign_type")
        ).alias("share_journeys")
    )
)
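
The next cell compares each H3 cell's share of journeys between Période 1 and Période 2. A relative change is undefined when the cell had no journeys before, so the notebook maps those cases to sentinel values. Here is a plain-Python sketch of that rule; `variation_rate` is a hypothetical name, and the inputs are the cell's shares before and after the change.

```python
import math


def variation_rate(share_before: float, share_after: float) -> float:
    """Relative change in % between two shares, with sentinels for empty cells."""
    if share_before == 0 and share_after == 0:
        return 0.0
    if share_before == 0:
        return math.inf  # the cell only has journeys after the change
    if share_after == 0:
        return -math.inf  # the cell only has journeys before the change
    return 100 * (share_after - share_before) / share_before
```

Using infinities rather than a large finite number keeps appearances and disappearances distinguishable from genuine large variations when the values are later mapped to colors.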

In [None]:
variation_case_expr = (
    pl.when((pl.col("share_journeys") == 0) & (pl.col("share_journeys_right") == 0))
    .then(pl.lit(0))
    .when((pl.col("share_journeys") == 0) & (pl.col("share_journeys_right") != 0))
    .then(pl.lit(float("+inf")))
    .when((pl.col("share_journeys") != 0) & (pl.col("share_journeys_right") == 0))
    .then(pl.lit(float("-inf")))
)

df_journeys_variation_h3 = (
    df_journeys_by_campaign_h3.filter(pl.col("campaign_type") == "Période 1")
    .join(
        df_journeys_by_campaign_h3.filter(pl.col("campaign_type") == "Période 2"),
        on="h3_cell",
        how="full",
        validate="1:1",
    )
    .with_columns(
        pl.col("share_journeys").fill_null(0),
        pl.col("share_journeys_right").fill_null(0),
        pl.col("num_journeys").fill_null(0),
        pl.col("num_journeys_right").fill_null(0),
        pl.coalesce(pl.col("h3_cell"), pl.col("h3_cell_right")).alias("cell_joined"),
    )
    .with_columns(
        (
            pl.col("num_journeys_right").cast(pl.Float64)
            - pl.col("num_journeys").cast(pl.Float64)
        ).alias("variation_absolue"),
        (pl.col("share_journeys_right") - pl.col("share_journeys")).alias(
            "diff_variation"
        ),
        variation_case_expr.otherwise(
            100
            * (pl.col("share_journeys_right") - pl.col("share_journeys"))
            / pl.col("share_journeys")
        ).alias("taux_variation"),
        (
            100
            * (
                pl.col("share_journeys_right").fill_null(0)
                - pl.col("share_journeys").fill_null(0)
            )
            / pl.col("share_journeys")
        ).alias("taux_variation_clean"),
        plh3.cell_to_boundary("cell_joined").alias("cell_geom"),
    )
    .with_columns(
        pl.col("cell_geom").map_elements(
            lambda x: shapely.Polygon([[e[1], e[0]] for e in x]), return_dtype=pl.Object
        ),
        pl.col("cell_joined").cast(pl.String),
    )
)

valid_values = df_journeys_variation_h3.filter(
    (pl.col("share_journeys") > 0) & (pl.col("share_journeys_right") > 0)
)["diff_variation"].to_list()


# Map a value to a hex color
def assign_color(taux, vmin: float, vmax: float):
    # Build the normalization and the colormap
    norm = mcolors.Normalize(vmin=vmin, vmax=vmax)
    cmap = cm.get_cmap("PiYG")
    if taux == float("+inf"):  # Apparition
        return mcolors.to_hex(cmap.get_over())
    elif taux == float("-inf"):  # Disparition
        return mcolors.to_hex(cmap.get_under())
    else:
        rgba = cmap(norm(taux))
        return mcolors.to_hex(rgba)


df_journeys_variation_h3 = df_journeys_variation_h3.with_columns(
    pl.col("variation_absolue")
    .map_elements(
        lambda row: assign_color(row, -2000, 2000),
        return_dtype=pl.String,
    )
    .alias("color_variation_absolue"),
    pl.col("diff_variation")
    .map_elements(
        lambda row: assign_color(row, -1, 1),
        return_dtype=pl.String,
    )
    .alias("color_diff_variation"),
    pl.col("taux_variation")
    .map_elements(
        lambda row: assign_color(
            row,
            -100,
            100,
        ),
        return_dtype=pl.String,
    )
    .alias("color_taux_variation"),
    pl.col("taux_variation_clean")
    .map_elements(
        lambda row: assign_color(
            row,
            -100,
            100,
        ),
        return_dtype=pl.String,
    )
    .alias("color_taux_variation_clean"),
)

# H3 cell boundaries are WGS84 lon/lat coordinates, i.e. EPSG:4326
gdf_journeys_variation_h3 = gpd.GeoDataFrame(
    df_journeys_variation_h3.to_pandas()
).set_geometry("cell_geom", crs=4326)
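
The `taux_variation` column built above handles three edge cases before computing the relative change in share. A minimal plain-Python sketch of the same rule for a single cell may make the convention easier to check. The function name `relative_variation` and the sample shares are hypothetical and not part of the notebook.

```python
import math


def relative_variation(share_before: float, share_after: float) -> float:
    """Relative change (in %) of a cell's share of journeys between two periods.

    Mirrors the edge cases of `variation_case_expr`:
    - absent in both periods          -> 0
    - absent before, present after    -> +inf (the cell appears)
    - present before, absent after    -> -inf (the cell disappears)
    """
    if share_before == 0 and share_after == 0:
        return 0.0
    if share_before == 0:
        return math.inf
    if share_after == 0:
        return -math.inf
    return 100 * (share_after - share_before) / share_before


# Hypothetical (before, after) shares, in % of journeys, for four cells
examples = [(0.0, 0.0), (0.0, 1.2), (2.0, 0.0), (2.0, 3.0)]
variations = [relative_variation(before, after) for before, after in examples]
```

Without the third case, a disappearing cell would evaluate to -100 %. The notebook maps it to `-inf` instead, which `assign_color` then renders with the colormap's "under" color.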

In [None]:
def create_map_h3(
    gdf: gpd.GeoDataFrame,
    metric_colname: str,
    color_colname: str,
    values_interval: list[float],
    colorscale_title: str,
) -> folium.Map:
    center = gdf.cell_geom.unary_union.centroid.coords[0][::-1]  # (lat, lon)
    cmap = cm.get_cmap("PiYG")
    # Convert to a branca LinearColormap
    color_scale = bcm.LinearColormap(
        colors=[mcolors.to_hex(cmap(i)) for i in [0.0, 0.25, 0.5, 0.75, 1.0]],
        vmin=values_interval[0],
        vmax=values_interval[1],
    )
    color_scale.caption = colorscale_title

    # Create the folium map
    m = folium.Map(location=center, zoom_start=9, tiles="cartodbpositron")
    folium.GeoJson(
        geojson,
        style_function=lambda feature: {
            "weight": 2,
            "fillOpacity": 0.2,
        },
    ).add_to(m)
    # Add the H3 cells, filled with their precomputed color
    folium.GeoJson(
        gdf.to_json(),
        style_function=lambda feature: {
            "fillColor": feature["properties"][color_colname],
            "color": "black",
            "weight": 1,
            "fillOpacity": 0.7,
        },
        tooltip=folium.GeoJsonTooltip(
            fields=[
                "cell_joined",
                "share_journeys",
                "share_journeys_right",
                metric_colname,
            ],
            aliases=["Zone", "Before", "After", "Difference"],
            localize=True,
        ),
    ).add_to(m)
    # Add the color bar to the map
    color_scale.add_to(m)

    return m

In [None]:
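# create_map_h3 also requires the value interval and the color bar title;
# [-1, 1] matches the bounds used for color_diff_variation
m = create_map_h3(
    gdf_journeys_variation_h3,
    "diff_variation",
    "color_diff_variation",
    [-1, 1],
    "Difference in share of journeys (percentage points)",
)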

m.save("map_diff_variation_start.html")
m

In [None]:
m = create_map_h3(
    gdf_journeys_variation_h3,
    "variation_absolue",
    "color_variation_absolue",
    [-2000, 2000],
    "Difference in number of journeys",
)

m.save("map_variation_absolu_start.html")
m

In [None]:
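# create_map_h3 also requires the value interval and the color bar title;
# [-100, 100] matches the bounds used for color_taux_variation_clean
m = create_map_h3(
    gdf_journeys_variation_h3,
    "taux_variation_clean",
    "color_taux_variation_clean",
    [-100, 100],
    "Relative change in share of journeys (%)",
)

m.save("map_taux_variation_start.html")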
m

# Marianne's questions


## Are you certain that the operator Blablacar Daily subsidized journeys out of its own funds between March and June?

If YES, it is essential that we know which journeys, to what extent, and whether per-journey commissions were charged to us for these journeys when they should not have been.


In [None]:
df_journeys.columns

In [None]:
df_journeys_multi_incentives_by_month = df_journeys.group_by(
    pl.col("start_datetime").dt.truncate("1mo")
).agg(
    pl.col("_id").n_unique().alias("journey_count"),
    pl.col("_id")
    .filter((pl.col("amount_bbc") > 0) & (pl.col("amount_aom") > 0))
    .n_unique()
    .alias("journey_count_with_aom_and_bbc_incentive"),
    pl.col("_id")
    .filter(pl.col("amount_bbc").is_null() & (pl.col("amount_aom") > 0))
    .n_unique()
    .alias("journey_count_with_aom_incentive_only"),
    pl.col("_id")
    .filter((pl.col("amount_bbc") > 0) & pl.col("amount_aom").is_null())
    .n_unique()
    .alias("journey_count_with_bbc_incentive_only"),
    pl.col("_id")
    .filter((pl.col("amount_bbc").is_null()) & pl.col("amount_aom").is_null())
    .n_unique()
    .alias("journey_count_without_incentive"),
)
df_journeys_multi_incentives_by_month.unpivot(
    pl.selectors.numeric(), index="start_datetime"
)

In [None]:
px.line(
    df_journeys_multi_incentives_by_month.unpivot(
        pl.selectors.numeric(), index="start_datetime"
    ).sort("start_datetime"),
    x="start_datetime",
    y="value",
    color="variable",
    template="simple_white",
)

-> Bug: Blablacar started sending amount=0 values, and I have no explanation for it.


| month                         | num_journeys_at_0 |
| ----------------------------- | ----------------- |
| 2025-07-01 00:00:00.000 +0200 | 0                 |
| 2025-06-01 00:00:00.000 +0200 | 17                |
| 2025-05-01 00:00:00.000 +0200 | 12,213            |
| 2025-04-01 00:00:00.000 +0200 | 6,373             |
| 2025-03-01 00:00:00.000 +0100 | 699               |
| 2025-02-01 00:00:00.000 +0100 | 0                 |
| 2025-01-01 00:00:00.000 +0100 | 0                 |
| 2024-12-01 00:00:00.000 +0100 | 0                 |
| 2024-11-01 00:00:00.000 +0100 | 0                 |
| 2024-10-01 00:00:00.000 +0200 | 0                 |
| 2024-09-01 00:00:00.000 +0200 | 0                 |
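
Since zero amounts now arrive where nulls used to, a journey with `amount_bbc = 0` is neither `> 0` nor null, so the null-based filters in the monthly breakdown would place it in none of the four categories. One possible workaround, sketched here in plain Python under the assumption that an amount of 0 means "no incentive", is to treat 0 and `None` the same way before classifying. The function name, category labels and sample amounts are hypothetical.

```python
from collections import Counter
from typing import Optional


def incentive_category(amount_bbc: Optional[int], amount_aom: Optional[int]) -> str:
    """Classify a journey by who paid an incentive.

    Assumption: an amount of 0 is equivalent to a missing amount.
    """
    has_bbc = amount_bbc is not None and amount_bbc > 0
    has_aom = amount_aom is not None and amount_aom > 0
    if has_bbc and has_aom:
        return "aom_and_bbc"
    if has_aom:
        return "aom_only"
    if has_bbc:
        return "bbc_only"
    return "none"


# Hypothetical (amount_bbc, amount_aom) pairs, in cents
journeys = [
    (150, 100),
    (None, 100),
    (0, 100),  # zero amount: counted as AOM only
    (200, None),
    (200, 0),  # zero amount: counted as BBC only
    (None, None),
    (0, 0),
]
counts = Counter(incentive_category(bbc, aom) for bbc, aom in journeys)
```

With this rule every journey lands in exactly one category, so the four counts add up to the total number of journeys.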


## Average amounts per km: how do you explain the spike in cost per km in early June, with a passenger who no longer contributes?


In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 16, tzinfo=ZoneInfo("GMT"))
)

In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 16, tzinfo=ZoneInfo("GMT"))
).select(
    pl.col("incentive_amount").mean() / 100,
    pl.col("passenger_contribution").mean() / 100,
    pl.col("driver_revenue").mean() / 100,
    pl.col("distance").mean() / 1000,
    pl.col("is_fully_inside_campaign_area").sum() / pl.len(),
)
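
The averages above are converted with `/ 100` and `/ 1000`, which suggests amounts stored in cents and distances in meters (an assumption here). When turning them into a cost per km, two definitions are possible and they generally disagree: the ratio of totals, or the mean of per-journey ratios. The latter gives more weight to short journeys with a flat incentive, so a shift in the mix of journeys can move it sharply. A small sketch with made-up journeys; the function names are hypothetical.

```python
def cost_per_km_ratio_of_totals(journeys: list[tuple[int, int]]) -> float:
    """Total incentive (EUR) divided by total distance (km)."""
    total_eur = sum(cents for cents, _ in journeys) / 100
    total_km = sum(meters for _, meters in journeys) / 1000
    return total_eur / total_km


def cost_per_km_mean_of_ratios(journeys: list[tuple[int, int]]) -> float:
    """Average of each journey's own EUR-per-km ratio."""
    ratios = [(cents / 100) / (meters / 1000) for cents, meters in journeys]
    return sum(ratios) / len(ratios)


# Made-up (incentive in cents, distance in meters) pairs
journeys = [(150, 5_000), (150, 20_000), (400, 40_000)]

ratio_of_totals = cost_per_km_ratio_of_totals(journeys)  # 7.00 EUR over 65 km
mean_of_ratios = cost_per_km_mean_of_ratios(journeys)  # mean of 0.30, 0.075, 0.10
```

Here the mean of ratios is noticeably higher than the ratio of totals because the 5 km journey alone costs 0.30 EUR per km.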

In [None]:
lats = []
lons = []
names = []
for row in df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 16, tzinfo=ZoneInfo("GMT"))
).iter_rows(named=True):
    lats.extend([row["start_latitude"], row["end_latitude"]])
    lons.extend([row["start_longitude"], row["end_longitude"]])
    names.extend([row["_id"], row["_id"]])
    # A None entry breaks the line so consecutive journeys are not connected
    lats.append(None)
    lons.append(None)
    names.append(None)

fig_trips_map_campaign_after = px.line_map(
    lat=lats,
    lon=lons,
    hover_name=names,
    center=dict(lat=46.2, lon=6.4),
    zoom=8.5,
    map_style="open-street-map",
)
fig_trips_map_campaign_after.update_traces({"line_width": 0.5, "line_color": "black"})

fig_trips_map_campaign_after.add_trace(
    go.Choroplethmap(
        geojson=geojson,
        z=[1] * len(geojson["features"]),
        locations=[feature["id"] for feature in geojson["features"]],
        colorscale=[[0, "rgba(41, 128, 185,1.0)"], [1, "rgba(41, 128, 185,1.0)"]],
        marker_opacity=0.4,
        marker_line_width=0,
        marker_line_color="white",
        name="Territoire campagne",
    )
)
fig_trips_map_campaign_after.update_layout(height=1000)
fig_trips_map_campaign_after.show()

In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 9, tzinfo=ZoneInfo("GMT"))
).explode("incentive_sirets").group_by("incentive_sirets").agg(
    pl.len(),
    pl.col("incentive_amount").mean() / 100,
    pl.col("passenger_contribution").mean() / 100,
    pl.col("driver_revenue").mean() / 100,
    pl.col("distance").mean() / 1000,
    pl.col("is_fully_inside_campaign_area").sum() / pl.len(),
).with_columns((pl.col("len") / pl.col("len").sum()).alias("share")).sort(
    "len", descending=True
)

In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 16, tzinfo=ZoneInfo("GMT"))
).explode("incentive_sirets").group_by("incentive_sirets").agg(
    pl.len(),
    pl.col("incentive_amount").mean() / 100,
    pl.col("passenger_contribution").mean() / 100,
    pl.col("driver_revenue").mean() / 100,
    pl.col("distance").mean() / 1000,
    pl.col("is_fully_inside_campaign_area").sum() / pl.len(),
).with_columns((pl.col("len") / pl.col("len").sum()).alias("share")).sort(
    "len", descending=True
)

In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    == datetime(2025, 6, 16, tzinfo=ZoneInfo("GMT"))
).explode("incentive_sirets").filter(
    pl.col("incentive_sirets") == "80820346700051"
).head(1)["operator_journey_id"].item()

- 80820346700051 = Ecov / France Covoit
- 20007196700018 = COMMUNAUTE DE COMMUNES PAYS D'EVIAN VALLEE D'ABONDANCE (CCPEVA)
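
To make the per-SIRET tables easier to read, the SIRET numbers identified above could be kept in a small lookup and used to label the `incentive_sirets` values. This is only a sketch: the names `SIRET_LABELS` and `label_siret` are hypothetical, and the PMGF entry reuses the AOM SIRET given in the notebook's introduction.

```python
# Known incentive payers, keyed by SIRET (kept as strings to preserve the digits as written)
SIRET_LABELS = {
    "80820346700051": "Ecov / France Covoit",
    "20007196700018": "CCPEVA",
    "20007537200017": "PMGF",  # AOM SIRET from the notebook's introduction
}


def label_siret(siret: str) -> str:
    """Return a readable label for a SIRET, or the SIRET itself if unknown."""
    return SIRET_LABELS.get(siret, siret)


labels = [label_siret(s) for s in ["80820346700051", "20007196700018", "00000000000000"]]
```

Unknown SIRETs fall through unchanged, so new payers remain visible in the output rather than being silently dropped.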


In [None]:
df_journeys.filter(
    pl.col("start_datetime").dt.truncate("1w")
    >= datetime(2025, 4, 1, tzinfo=ZoneInfo("GMT"))
).group_by([pl.col("start_datetime").dt.truncate("1mo"), "operator_id"]).agg(
    pl.len(),
    pl.col("incentive_amount").mean() / 100,
    pl.col("passenger_contribution").mean() / 100,
    pl.col("driver_revenue").mean() / 100,
    pl.col("distance").mean() / 1000,
    pl.col("is_fully_inside_campaign_area").sum() / pl.len(),
).join(df_operators, left_on="operator_id", right_on="_id", validate="m:1").sort(
    "start_datetime", descending=True
)