# Automated Daily Reports to Telegram Chat

**Author:**  

Pavel Grigoryev

**Project Description:**

- As our application grows, stakeholders need consistent, automated insights into product performance. 
- Currently, manual reporting processes are time-consuming and lack standardization.
- This project replaces manual reporting with an automated pipeline that generates insights and sends them straight to the team chat, ensuring consistent access to key metrics.

**Project Goal:**

- To build a complete automated pipeline that generates daily product reports and delivers them to a Telegram chat, providing stakeholders with regular business insights without manual intervention.
   
**Data Sources:**

- `feed_actions` - News feed activity
- `message_actions` - Messaging activity 

**Main Conclusions:**

- **Data Pipeline Development:** 
  - Built SQL queries to extract and calculate key product metrics from the ClickHouse database
- **Automation Framework:** 
  - Created an Airflow DAG for scheduled daily execution with task retries and failure callbacks
- **Chat Delivery System:** 
  - Integrated the Telegram API for automated report distribution directly to the team chat
- **Business Reporting:** 
  - Designed comprehensive visualizations covering both application-wide and feature-specific metrics
- **Stakeholder Focus:** 
  - Developed business-ready reports that answer key product performance questions

# Data Description

The product data is stored in ClickHouse in the following tables.

Table `feed_actions`

Field | Description
-|-
user_id | User ID
post_id | Post ID
action | Action: view or like
time | Timestamp
gender | User's gender (1 = Male)
age | User's age
os | User's OS
source | Traffic source
country | User's country
city | User's city
exp_group | A/B test group

Table `message_actions`

Field | Description
-|-
user_id | Sender's ID
receiver_id | Receiver's ID
time | Send timestamp
gender | Sender's gender (1 = Male)
age | Sender's age
os | Sender's OS
source | Sender's traffic source
country | Sender's country
city | Sender's city
exp_group | Sender's A/B test group

# SQL Queries Development

SQL queries for extracting and calculating daily product metrics.
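
Several of the queries below (`QUERY_DAU`, `QUERY_USERS_DAILY_BY_SOURCE`) split daily users into feed-only, messenger-only, and both-services groups using the `has_feed` and `has_messenger` flags. The same logic can be illustrated outside the database. This is a minimal sketch in plain Python, not part of the pipeline: the `segment_dau` helper and the sample `events` are hypothetical.

```python
from collections import defaultdict

def segment_dau(events):
    """Count daily users by service mix.

    `events` is an iterable of (date, user_id, service) tuples, where
    service is 'feed' or 'messenger' (hypothetical sample format).
    """
    # Collect the set of services each user touched on each date
    services = defaultdict(set)
    for date, user_id, service in events:
        services[(date, user_id)].add(service)

    result = defaultdict(lambda: {
        'total_dau': 0, 'feed_only_dau': 0,
        'messenger_only_dau': 0, 'both_services_dau': 0,
    })
    for (date, _user), used in services.items():
        row = result[date]
        row['total_dau'] += 1
        if used == {'feed'}:
            row['feed_only_dau'] += 1
        elif used == {'messenger'}:
            row['messenger_only_dau'] += 1
        elif used == {'feed', 'messenger'}:
            row['both_services_dau'] += 1
    return dict(result)

# Made-up events: user 1 uses both services, user 2 only the feed
# (twice), user 3 only the messenger
events = [
    ('2025-09-24', 1, 'feed'),
    ('2025-09-24', 1, 'messenger'),
    ('2025-09-24', 2, 'feed'),
    ('2025-09-24', 3, 'messenger'),
    ('2025-09-24', 2, 'feed'),
]
dau = segment_dau(events)
print(dau['2025-09-24'])
```

Repeated events from the same user on the same day are counted once, which matches grouping by `date` and `user_id` before counting.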

In [None]:
# app_daily_report_queries.py

"""
SQL queries for application report
"""

QUERY_DAU = """
    WITH union_data as (
        SELECT 
            toDate(time) as date
            , user_id 
            , max(service = 'feed') as has_feed
            , max(service = 'messenger') as has_messenger
        FROM (
            SELECT
                time
                , user_id
                , 'feed' as service
            FROM 
                feed_actions
            UNION ALL
            SELECT
                time
                , user_id
                , 'messenger' as service
            FROM 
                message_actions 
        )
        WHERE 
            toDate(time) BETWEEN yesterday() - 13 AND yesterday()  
        GROUP BY 
            date
            , user_id 
    )
    SELECT 
        date
        , uniq(user_id) as total_dau
        , uniqIf(user_id, has_feed = 1 and has_messenger = 0) as feed_only_dau
        , uniqIf(user_id, has_feed = 0 and has_messenger = 1) as messenger_only_dau
        , uniqIf(user_id, has_feed = 1 and has_messenger = 1) as both_services_dau
        , if(date >= yesterday() - 6, 'current', 'previous') as week_status
    FROM 
        union_data
    GROUP BY 
        date
    ORDER BY 
        date    
"""        

QUERY_NEW_USERS = """
    WITH first_actions AS (
        SELECT 
            user_id
            , toDate(min(time)) as first_date
            , toDate(minIf(time, service = 'feed')) as first_feed_date
            , toDate(minIf(time, service = 'messenger')) as first_messenger_date
        FROM (
            SELECT time, user_id, 'feed' as service
            FROM feed_actions
            UNION ALL
            SELECT time, user_id, 'messenger' as service  
            FROM message_actions
        )
        GROUP BY user_id
    )
    , first_all as (
        SELECT 
        first_date as date
        , uniq(user_id) as all_new_users
        FROM 
        first_actions
        WHERE 
        date BETWEEN yesterday() - 13 AND yesterday()
        GROUP BY 
        date
    )
    , first_feed as (
        SELECT 
        first_feed_date as date
        , uniq(user_id) as feed_new_users
        FROM 
        first_actions
        WHERE 
        date BETWEEN yesterday() - 13 AND yesterday()      
        GROUP BY 
        date
    )
    , first_messenger as (
        SELECT 
        first_messenger_date as date
        , uniq(user_id) as messenger_new_users
        FROM 
        first_actions
        WHERE 
        date BETWEEN yesterday() - 13 AND yesterday()      
        GROUP BY 
        date
    )
    , both_activity as (
        SELECT 
        toDate(time) as date
        , user_id
        FROM (
            SELECT time, user_id, 'feed' as service
            FROM feed_actions
            UNION ALL
            SELECT time, user_id, 'messenger' as service  
            FROM message_actions
        )
        GROUP BY 
        date
        , user_id
        HAVING 
        uniq(service) = 2
    )
    , first_both as (
        SELECT 
        date
        , uniq(user_id) as both_first_users
        FROM 
        both_activity
        WHERE 
        date BETWEEN yesterday() - 13 AND yesterday()      
        GROUP BY 
        date
    )
    SELECT 
        a.date as date
        , if(date >= yesterday() - 6, 'current', 'previous') as week_status
        , ifNull(a.all_new_users, 0) as all_new_users
        , ifNull(f.feed_new_users, 0) as feed_new_users
        , ifNull(m.messenger_new_users, 0) as messenger_new_users
        , ifNull(b.both_first_users, 0) as both_first_users
    FROM 
        first_all a
    LEFT JOIN first_feed f ON a.date = f.date
    LEFT JOIN first_messenger m ON a.date = m.date
    LEFT JOIN first_both b ON a.date = b.date
"""

QUERY_ACTIVITY = """
    WITH feed_stats as (
        SELECT 
            toDate(time) as date
            , countIf(action = 'view') as views
            , countIf(action = 'like') as likes
            , ifNull(likes / nullIf(views, 0), 0) as ctr
        FROM 
            feed_actions
        WHERE 
            date BETWEEN yesterday() - 13 AND yesterday()
        GROUP BY 
            date
    )
    , messenger_stats as (
        SELECT 
            toDate(time) as date
            , count() as messages
        FROM 
            message_actions 
        WHERE 
            date BETWEEN yesterday() - 13 AND yesterday()
        GROUP BY 
            date
    )    
    SELECT 
        if(f.date != toDate(0), f.date, m.date) as date
        , if(date >= yesterday() - 6, 'current', 'previous') as week_status
        , ifNull(f.views, 0) as views
        , ifNull(f.likes, 0) as likes
        , ifNull(f.ctr, 0) as ctr
        , ifNull(m.messages, 0) as messages
    FROM 
        feed_stats f
        FULL JOIN messenger_stats m ON f.date = m.date
    ORDER BY 
        date
"""    

QUERY_RETENTION = """
    WITH cohorts as (
        SELECT 
            user_id
            , toDate(min(time)) as cohort
        FROM (
            SELECT
                time
                , user_id
            FROM 
                feed_actions
            UNION ALL 
            SELECT
                time
                , user_id
            FROM 
                message_actions 
        )
        GROUP BY 
            user_id 
        HAVING  
            cohort BETWEEN yesterday() - 14 AND yesterday() - 7
    )
    , activity as (
        SELECT 
            toDate(time) as date
            , user_id
        FROM (
            SELECT
                time
                , user_id
            FROM 
                feed_actions
            UNION ALL  
            SELECT
                time
                , user_id
            FROM 
                message_actions 
            )
        WHERE   
            date BETWEEN yesterday() - 14 AND yesterday()
        GROUP BY 
            date
            , user_id
    )
    SELECT 
        c.cohort
        , a.date - c.cohort as lifetime
        , uniq(c.user_id) as users
    FROM 
        cohorts c
        JOIN activity a ON c.user_id = a.user_id
    WHERE 
        lifetime <= 7
    GROUP BY 
        c.cohort
        , lifetime
    ORDER BY 
        c.cohort  
"""

QUERY_ROLL_RETENTION_7D = """
    WITH users as (
        SELECT 
            user_id
            , max(toDate(time) = yesterday()) as returned_yesterday
            , max(toDate(time) != yesterday()) as returned_prev_week
        FROM (
            SELECT
                time
                , user_id
            FROM 
                feed_actions
            UNION ALL 
            SELECT
                time
                , user_id
            FROM 
                message_actions 
        )
        WHERE 
            toDate(time) BETWEEN yesterday() - 7 AND yesterday()
        GROUP BY 
            user_id
    )
    SELECT 
        sum(returned_yesterday) as yesterday_users
        , sum(returned_yesterday AND returned_prev_week) as all_week_users
        , all_week_users / yesterday_users as retention_7d
    FROM 
        users
"""

QUERY_FEED_DETAILED = """
    WITH users_stats as (
        SELECT 
            toDate(time) as date
            , user_id
            , uniq(post_id) as posts
            , countIf(action = 'view') as views
            , countIf(action = 'like') as likes
            , ifNull(likes / nullIf(views, 0), 0) as ctr
        FROM 
            feed_actions
        WHERE 
            toDate(time) BETWEEN yesterday() - 13 AND yesterday()
        GROUP BY 
            date
            , user_id
    )
    SELECT 
        date
        , if(date >= yesterday() - 6, 'current', 'previous') as week_status
        , AVG(posts) as posts_per_user
        , AVG(views) as views_per_user
        , AVG(likes) as likes_per_user
        , AVG(ctr) as ctr_per_user
    FROM 
        users_stats
    GROUP BY 
        date
"""

QUERY_MESSENGER_DETAILED = """
    SELECT 
        toDate(time) as date
        , if(date >= yesterday() - 6, 'current', 'previous') as week_status
        , uniq(user_id) as sender_dau
        , uniq(receiver_id) as receiver_dau
        , ifNull(sender_dau / nullIf(receiver_dau, 0), 0) as sender_to_receiver_ratio
        , ifNull(count() / nullIf(sender_dau, 0), 0) as messages_per_sender
    FROM 
        message_actions
    WHERE 
        toDate(time) BETWEEN yesterday() - 13 AND yesterday()
    GROUP BY 
        date
    ORDER BY 
        date        
"""

QUERY_USERS_DAILY_BY_SOURCE = """
    WITH union_data as (
        SELECT 
            toDate(time) as date
            , user_id 
            , max(service = 'feed') as has_feed
            , max(service = 'messenger') as has_messenger
            , any(source) as source
        FROM (
            SELECT
                time
                , user_id
                , 'feed' as service
                , source
            FROM 
                feed_actions
            UNION ALL
            SELECT
                time
                , user_id
                , 'messenger' as service
                , source
            FROM 
                message_actions 
        )
        WHERE 
            toDate(time) BETWEEN yesterday() - 6 AND yesterday()  
        GROUP BY 
            date
            , user_id 
    )
    SELECT 
        date
        , source
        , uniq(user_id) as total_dau
        , uniqIf(user_id, has_feed = 1 and has_messenger = 0) as feed_only_dau
        , uniqIf(user_id, has_feed = 0 and has_messenger = 1) as messenger_only_dau
        , uniqIf(user_id, has_feed = 1 and has_messenger = 1) as both_services_dau
    FROM 
        union_data
    GROUP BY 
        date
        , source
    ORDER BY 
        date   
"""        

# Helper Functions

Create a function for number formatting.

In [None]:
def format_number(value, variable_type=None):
    """Format a number for compact, human-readable display."""
    if pd.isna(value):
        return ""
    
    # Percent format for CTR-like metrics and explicit ratios
    if (variable_type and 'ctr' in variable_type) or variable_type == 'ratio':
        return f"{value:.1%}"
    elif variable_type == 'rate':
        return f"{value:.2f}"
    elif isinstance(value, (int, float)):
        if abs(value) >= 1e6:
            return f"{value/1e6:.1f}M"
        elif abs(value) >= 1e3:
            return f"{value/1e3:.0f}K"
        else:
            return f"{value:.0f}"
    else:
        return str(value)        
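
The format specifications used by `format_number` can be checked in isolation. The snippet below is a small illustration that uses only Python f-strings; the sample values are made up.

```python
# Percent format used for CTR-like metrics: multiplies by 100, one decimal
ctr_text = f"{0.2134:.1%}"

# Two-decimal format used for 'rate' values
rate_text = f"{3.14159:.2f}"

# Millions and thousands abbreviations for large counts
millions_text = f"{1_300_000 / 1e6:.1f}M"
thousands_text = f"{15_400 / 1e3:.0f}K"

# Small counts are printed without decimals
small_text = f"{842:.0f}"

print(ctr_text, rate_text, millions_text, thousands_text, small_text)
```

Because thousands are rounded to whole numbers, a value such as 1,499 is displayed as `1K`, so the abbreviated form trades precision for compactness.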

Create a function to fill missing dates.

In [None]:

def complete_missing_dates(df, date_column='date', days_back=13):
    """
    Complete missing dates in dataframe with zeros and a week status derived from the date.
    """
    # Create full date range
    end_date = datetime.now().date() - pd.Timedelta(days=1)
    start_date = end_date - pd.Timedelta(days=days_back)
    full_range = pd.date_range(start=start_date, end=end_date, name=date_column)
    
    # Reindex to complete missing dates
    df_complete = (
        df
        .sort_values(date_column)
        .set_index(date_column)
        .reindex(full_range)
        .reset_index()
    )
    
    # Fill missing values: label added rows by their date, then zero-fill metrics
    current_week_start = pd.Timestamp(end_date) - pd.Timedelta(days=6)
    week_by_date = df_complete[date_column].map(
        lambda d: 'current' if d >= current_week_start else 'previous'
    )
    df_complete['week_status'] = df_complete['week_status'].fillna(week_by_date)
    df_complete = df_complete.fillna(0)
    
    return df_complete
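
The idea behind the gap filling can be shown without pandas. This is a minimal sketch under stated assumptions: `fill_missing_days` is a hypothetical helper, the input is a single metric keyed by date, and the week label is derived from the date itself (the last 7 days of the window are `current`, the 7 days before are `previous`).

```python
from datetime import date, timedelta

def fill_missing_days(rows, end_date, days_back=13):
    """Return one row per day in the window, adding zero rows for gaps.

    `rows` maps a date to a metric value (hypothetical single-metric input).
    """
    start_date = end_date - timedelta(days=days_back)
    current_week_start = end_date - timedelta(days=6)
    filled = []
    for offset in range(days_back + 1):
        day = start_date + timedelta(days=offset)
        filled.append({
            'date': day,
            'value': rows.get(day, 0),  # missing day -> 0
            # The week label is derived from the date itself
            'week_status': 'current' if day >= current_week_start else 'previous',
        })
    return filled

end = date(2025, 9, 24)
rows = {end: 120, end - timedelta(days=7): 100}
filled = fill_missing_days(rows, end)
print(len(filled), filled[0], filled[-1])
```

A complete 14-row window matters downstream, because the WoW calculation selects rows by position.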

Create a function to calculate WoW (Week-over-Week) change.

In [None]:
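def calc_wow(df: pd.DataFrame, date_col: str='date') -> pd.DataFrame:
    """
    Return the 'current', 'previous', and 'wow' rows for the report.
    Expects 14 daily rows: position 13 is yesterday and position 6 is the same weekday one week earlier.
    """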
    df = df.sort_values(date_col)
    df_wow = (
        df.iloc[[6, 13]]
        .drop(date_col, axis=1)
        .set_index('week_status')
    )
    df_wow.loc['wow'] = (df_wow.loc['current'].astype('float') - df_wow.loc['previous'].astype('float')) / df_wow.loc['previous'].astype('float')
    return df_wow    
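
The WoW change computed above is the relative difference between yesterday's value and the value for the same weekday one week earlier. A minimal sketch of the formula for a single metric follows; `wow_change` is a hypothetical helper, and returning `None` for a zero baseline is a choice made for this sketch.

```python
def wow_change(current, previous):
    """Relative change of `current` versus `previous`; None if undefined."""
    if previous == 0:
        return None  # avoid division by zero when last week's value is 0
    return (current - previous) / previous

change = wow_change(1150, 1000)
print(f"{change:+.1%}")
```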

Create a function to format metrics for the report.

In [None]:
def format_metrics_report_wow(df: pd.DataFrame, metrics: dict, header: str=None):
    """
    Build the text block for one report section:
    one line per metric with an icon, the current value, and the WoW change.
    """
    report_lines = []
    if header:
        report_lines.append(header + '\n')
    for metric in metrics.keys():
        current_value = df.loc['current', metric]
        current_value = format_number(value=current_value, variable_type=metric)
        wow_change = df.loc['wow', metric]
        circle = "🟢" if wow_change > 0 else "🔴" if wow_change < 0 else "⚪"
        line = f"{metrics[metric][1]} {metrics[metric][0]}: {current_value} {circle} {wow_change:+.1%} WoW"
        report_lines.append(line)
    
    return "\n".join(report_lines)       
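
To show what a single report line looks like, here is a small standalone sketch of the line template. The `metric_line` helper and the sample values are hypothetical; only the template mirrors the function above.

```python
def metric_line(icon, label, value_text, wow):
    """Build one report line: icon, label, value, trend marker, WoW change."""
    circle = "🟢" if wow > 0 else "🔴" if wow < 0 else "⚪"
    return f"{icon} {label}: {value_text} {circle} {wow:+.1%} WoW"

line_up = metric_line("• 👥", "Total", "15K", 0.042)
line_down = metric_line("• 💬", "Messages", "3K", -0.1)
print(line_up)
print(line_down)
```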

Create a function to prepare the dataframe.

In [None]:
def prepare_comparison_df(
    df: pd.DataFrame
    , date_col: str
    , id_var: str
    , has_previous: bool
    , value_in_id_var_for_text: str=None
    , text_format: str=None
    ) -> pd.DataFrame:
    """Prepares data for comparing the current and previous week"""
    if id_var not in df.columns:
        raise ValueError(f"DataFrame must contain {id_var} column")    
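    if has_previous:
        # Work on a copy so the caller's DataFrame is not modified
        df = df.copy()
        # Shift the previous week forward by 7 days to overlay it on the current week
        df.loc[df[id_var] == 'previous', date_col] += pd.Timedelta(days=7)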
    df = (
        df.melt(
            id_vars=[date_col, id_var]
        ) 
    )
    if value_in_id_var_for_text:
        mask = df[id_var]==value_in_id_var_for_text
        mask &= df.groupby('variable')[date_col].transform(
            lambda x: x.isin([x.min(), x.max()])
        )
        df['text'] = df['value'].where(mask)
        if text_format:
            df['text'] = df['text'].apply(lambda x: f"{x:{text_format}}")
        else:
            df['text'] = [format_number(v, var) for v, var in zip(df['text'], df['variable'])]
    return df    
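
The key step in the preparation is shifting the previous week forward by 7 days, so that both weeks share the same x-axis and each day is drawn next to the same weekday of the other week. A minimal sketch of that step with plain dictionaries follows; `align_previous_week` and the sample rows are hypothetical.

```python
from datetime import date, timedelta

def align_previous_week(rows):
    """Shift 'previous' rows forward by 7 days so both weeks share one x-axis."""
    aligned = []
    for row in rows:
        shifted = dict(row)  # copy so the input rows are not modified
        if row['week_status'] == 'previous':
            shifted['date'] = row['date'] + timedelta(days=7)
        aligned.append(shifted)
    return aligned

rows = [
    {'date': date(2025, 9, 17), 'week_status': 'previous', 'total_dau': 1000},
    {'date': date(2025, 9, 24), 'week_status': 'current', 'total_dau': 1150},
]
aligned = align_previous_week(rows)
print(aligned)
```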

Create a function to render charts.

In [None]:

def create_comparison_dashboard(
    df: pd.DataFrame
    , date_col: str
    , color: str
    , metrics: list
    , metric_titles: list
    , title: str
    , make_gray: bool
    , text: str=None
    , category_orders: dict=None
    , tickformats: list=None
    , trace_name_for_text: str=None
    , labels: dict=None
    , trace_names_map: dict=None
    ) -> go.Figure:
    """Creates a 2x2 dashboard for comparing metrics"""
    fig = px.line(
        df
        , x=date_col
        , y='value'
        , color=color
        , text=text
        , facet_col='variable'
        , facet_col_wrap=2
        , facet_col_spacing=0.08
        , facet_row_spacing=0.15
        , category_orders=category_orders
        , labels=labels
        , line_shape='spline'
        , title=title
        , width=1000
        , height=600
    )
    # Set up lines
    added_legends = []
    for trace in fig.data:
        # Update trace
        trace.showlegend = False
        trace.mode = 'lines+markers'
        if make_gray:
            if trace.name == trace_name_for_text:
                trace.update(
                    mode='lines+markers+text',
                    textposition='top center',
                    line=dict(color='#777777'),
                    marker=dict(color='#777777'),
                )
            else:  
                trace.update(
                    line=dict(color='#C1C1C1', dash='dash'), 
                    marker=dict(color='#C1C1C1'),       
                )
        if trace.name not in added_legends:
            added_legends.append(trace.name)
            legend_name = trace_names_map[trace.name] if trace_names_map else trace.name
            fig.add_trace(go.Scatter(
                x=[None], y=[None],
                mode=trace.mode.replace('+text', ''),
                name=legend_name,  
                legendgroup=trace.legendgroup,
                showlegend=True,
                line=dict(color=trace.line.color)
            ))                
    fig.update_layout(legend_title=None, legend_itemsizing='constant')
    fig.update_xaxes(matches=None, tickformat='%b %d', dtick='1D', showticklabels=True)
    fig.update_yaxes(matches=None, showticklabels=True, title=None)
    # Apply value formats if provided
    if tickformats:
        for tickformat, row, col in tickformats:
            fig.update_yaxes(tickformat=tickformat, row=row, col=col)
    # Update the subplot titles
    # Reorder the titles to match the annotation order of the 2x2 grid
    metric_titles = [metric_titles[2], metric_titles[3], metric_titles[0], metric_titles[1]]
    for i, annotation in enumerate(fig.layout.annotations):
        annotation.text = metric_titles[i]
    # Adjust Y-axis ranges to accommodate text labels
    # Add 10% padding based on data range for each subplot
    if trace_name_for_text:
        yaxis_map = {metrics[0]: 'yaxis3', metrics[1]: 'yaxis4', metrics[2]: 'yaxis', metrics[3]: 'yaxis2'}
        for variable in df['variable'].unique():
            subset = df[df['variable'] == variable]
            min_val = subset['value'].min()
            max_val = subset['value'].max()
            delta = (max_val - min_val) * 0.1  
            yaxis_name = yaxis_map[variable]
            
            if yaxis_name in fig.layout:
                fig.layout[yaxis_name]['range'] = (min_val - delta, max_val + delta)
    return fig
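
The y-axis padding at the end of the function leaves room for the text labels by extending each subplot's range by 10% of its data range on both sides. A minimal sketch of that calculation follows; `padded_range` is a hypothetical helper.

```python
def padded_range(values, padding=0.1):
    """Return (low, high) axis limits with `padding` of the data range added on each side."""
    low, high = min(values), max(values)
    delta = (high - low) * padding
    return (low - delta, high + delta)

axis_range = padded_range([100, 150, 200])
print(axis_range)
```

For a flat series the data range is zero, so this approach adds no padding and the limits collapse to a single value.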

Create a function to render retention charts.

In [None]:

def create_retention_dashboard(df_retention):
    """Create Dashboard with Retention Heatmap and Line Chart"""
    
    # Data preparation for Line Chart
    df_retention_7d = (
        df_retention
        .reset_index()
        .rename_axis(columns=None)    
        [['cohort', 7]]
        .copy()
        .rename(columns={7: 'retention_7_day'})
    )
    df_retention_7d['text'] = np.nan
    df_retention_7d.loc[[0, df_retention_7d.index[-1]], 'text'] = (
        df_retention_7d['retention_7_day']
        .apply(lambda x: f"{x:.1%}")
    )
    
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Retention First Week Performance', '7-Day Retention by Cohort'),
        horizontal_spacing=0.15
    )
    
    heatmap_fig = go.Heatmap(
        z=df_retention.values,
        x=df_retention.columns,
        y=df_retention.index.strftime('%b %d'),
        colorscale='Greens',
        colorbar=dict(
            tickformat='.0%',
            x=0.43,  
            y=0.5,  
            xanchor='left',  
            yanchor='middle', 
            thickness=15,  
        ),
    )
    fig.add_trace(heatmap_fig, row=1, col=1)
    
    # Right figure - Line Chart
    line_fig = go.Scatter(
        x=df_retention_7d['cohort'],
        y=df_retention_7d['retention_7_day'],
        mode='lines+markers+text',
        text=df_retention_7d['text'],
        textposition='top center',
        line=dict(color='#777777', shape='spline'),
        marker=dict(color='#777777')
    )
    fig.add_trace(line_fig, row=1, col=2)
    
    # Update Layout
    fig.update_layout(
        height=400,
        width=1100,
        showlegend=False,
        title_text="Retention Analysis Dashboard",
        margin=dict(l=90, r=20)
    )
    
    # Setting axes for Heatmap (left figure)
    fig.update_xaxes(
        title_text='Lifetime',
        title_standoff=7,
        type='category', 
        showgrid=False,
        row=1, col=1
    )
    fig.update_yaxes(
        title_text='Cohort',
        title_standoff=10,
        type='category',  
        showgrid=False,
        autorange='reversed',
        row=1, col=1
    )
    
    # Setting axes for Line Chart (right figure)
    fig.update_xaxes(
        title_text='Cohort',
        tickformat='%b %d',
        title_standoff=7,
        row=1, col=2
    )
    fig.update_yaxes(
        title_text='7-Day Retention',
        title_standoff=10,
        tickformat='.0%',
        row=1, col=2
    )
    vmax = df_retention.max().max()
    vmin = df_retention.min().min()
    center_color_bar = vmin + (vmax - vmin) * 0.7 if (vmax - vmin) > 0 else vmin
    
    for row in range(len(df_retention)):
        for col in range(len(df_retention.columns)):
            fig.add_annotation(
                text=f"{df_retention.values[row, col]:.0%}",
                x=col,
                y=row,
                xref="x",
                yref="y",
                showarrow=False,
                font=dict(
                    color="white" if df_retention.values[row, col] >= center_color_bar else "rgba(0, 0, 0, 0.7)",
                    size=10
                ),
                xanchor='center',
                yanchor='middle',
                row=1,  
                col=1  
            )
    return fig
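
The heatmap expects a cohort-by-lifetime table of retention shares rather than raw user counts. A minimal sketch of how rows shaped like the `QUERY_RETENTION` output (cohort, lifetime, users) could be converted into shares is shown below. The `retention_table` helper and the sample rows are hypothetical, and plain dictionaries stand in for a DataFrame.

```python
from collections import defaultdict

def retention_table(rows):
    """Convert (cohort, lifetime, users) rows into retention shares.

    Each share is the number of users active on day `lifetime` divided by
    the cohort's day-0 users; day 0 itself is omitted from the result.
    """
    counts = defaultdict(dict)
    for cohort, lifetime, users in rows:
        counts[cohort][lifetime] = users

    table = {}
    for cohort, by_day in counts.items():
        base = by_day.get(0, 0)
        if base == 0:
            continue  # skip cohorts without a day-0 count
        table[cohort] = {
            day: users / base for day, users in sorted(by_day.items()) if day > 0
        }
    return table

rows = [
    ('2025-09-10', 0, 200),
    ('2025-09-10', 1, 80),
    ('2025-09-10', 7, 30),
]
table = retention_table(rows)
print(table)
```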

# Airflow DAG Implementation  

Complete DAG code for automated daily report generation.

In [None]:
# dag_app_daily_report.py

from datetime import datetime, timedelta
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from telegram import InputFile
from airflow.decorators import dag, task
from airflow.operators.python import get_current_context
from textwrap import dedent
from dotenv import load_dotenv
import sys
import os
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, current_dir)
from utils_for_dags import (
    ChConnector
    , TelegramBot
    , calc_wow
    , format_number
    , format_metrics_report_wow
    , prepare_comparison_df
    , create_comparison_dashboard
    , create_retention_dashboard
    , complete_missing_dates    
)
from app_daily_report_queries import (
    QUERY_DAU
    , QUERY_NEW_USERS
    , QUERY_ACTIVITY
    , QUERY_RETENTION
    , QUERY_ROLL_RETENTION_7D
    , QUERY_FEED_DETAILED
    , QUERY_MESSENGER_DETAILED
    , QUERY_USERS_DAILY_BY_SOURCE
)

db = ChConnector()
bot = TelegramBot()

default_args = {
    'owner': 'Pavel Grigoryev',
    'depends_on_past': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'start_date': datetime(2025, 9, 25),
}

dag_config = {
    'default_args': default_args,
    'description': 'DAG for sending a daily report in Telegram',
    'schedule_interval': '0 11 * * *',
    'catchup': False,
    'tags': ['daily_report'],
    'max_active_runs': 1,
}

def handle_failure(context):
    """
    Callback that logs details of a failed task
    """
    exception = context.get('exception')
    task_instance = context['task_instance']

    print(f"Task {task_instance.task_id} failed with an error:")
    print(f"Error: {exception}")
    print(f"Execution date: {context['ds']}")
    print(f"Attempt number: {context['ti'].try_number}")

@dag(**dag_config)
def app_report():
    """
    Daily DAG that extracts data from the database, calculates metrics, builds charts, composes the report, and sends it to Telegram
    """
    # ==========================================================================
    # Extract
    # ==========================================================================
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_dau() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        df = db.get_df(query=QUERY_DAU)
        return complete_missing_dates(df)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_new_users() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        df = db.get_df(query=QUERY_NEW_USERS)
        return complete_missing_dates(df)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_activity() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        df = db.get_df(query=QUERY_ACTIVITY)
        return complete_missing_dates(df)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_retention() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        return db.get_df(query=QUERY_RETENTION)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_roll_retention_7d() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        return db.get_df(query=QUERY_ROLL_RETENTION_7D)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_feed_detailed() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        df = db.get_df(query=QUERY_FEED_DETAILED)
        return complete_missing_dates(df)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_messenger_detailed() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        df = db.get_df(query=QUERY_MESSENGER_DETAILED)
        return complete_missing_dates(df)

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def extract_users_daily_by_source() -> pd.DataFrame:
        """
        Extracts data from the database
        """
        return db.get_df(query=QUERY_USERS_DAILY_BY_SOURCE)

    # ==========================================================================
    # Transform
    # ==========================================================================

    # DAU
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_dau(df: pd.DataFrame) -> dict:
        """
        Calculates week-over-week (WoW) growth and builds the DAU report section
        """
        context = get_current_context()
        yesterday = context['execution_date']  # execution_date is the period start for schedule_interval = '0 11 * * *'
        date_str = yesterday.strftime('%d %b %Y')          
        df_wow = calc_wow(df)
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='week_status'
            , value_in_id_var_for_text='current'
            , has_previous=True
        )
        metrics = {
            'total_dau': ['Total', '• 👥'],
            'feed_only_dau': ['Feed Only', '• 📰'],
            'messenger_only_dau': ['Messenger Only', '• 💬'],
            'both_services_dau': ['Both Services', '• 🔄']
        }
        msg = format_metrics_report_wow(df_wow, metrics, f'📊 App Report 📅 {date_str}\n\n👤 Daily Active Users')

        metrics = ['total_dau', 'feed_only_dau', 'messenger_only_dau', 'both_services_dau']
        metric_titles = ['Total DAU', 'Feed Only DAU', 'Messenger Only DAU', 'Both Services DAU']
        category_orders={'variable': metrics, 'week_status': ['current', 'previous']}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='week_status'
            , text='text'
            , metrics=metrics
            , metric_titles=metric_titles
            , trace_name_for_text='current'
            , title='Daily Active Users'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , trace_names_map={'current': 'Current Week', 'previous': 'Previous Week'}
            , make_gray=True
        )
        return {"msg": msg, "fig": fig}

    # New Users
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_new_users(df: pd.DataFrame) -> dict:
        """
        Calculates week-over-week (WoW) growth and builds the new users report section
        """       
        df_wow = calc_wow(df)
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='week_status'
            , value_in_id_var_for_text='current'
            , has_previous=True
        )
        metrics = {
            'all_new_users': ['All New Users', '• 👥'],
            'feed_new_users': ['Feed New Users', '• 📰'],
            'messenger_new_users': ['Msg New Users', '• 💬'],
            'both_first_users': ['Used Both First Time', '• 🔄']
        }
        msg = format_metrics_report_wow(df_wow, metrics, '🆕 New Users & First Usage')

        metrics = ['all_new_users', 'feed_new_users', 'messenger_new_users', 'both_first_users']
        metric_titles = ['All New Users', 'Feed New Users', 'Msg New Users', 'Used Both First Time']
        category_orders={'variable': metrics, 'week_status': ['current', 'previous']}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='week_status'
            , text='text'
            , metrics=metrics
            , metric_titles=metric_titles
            , trace_name_for_text='current'
            , title='New Users Activity'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , trace_names_map={'current': 'Current Week', 'previous': 'Previous Week'}
            , make_gray=True
        )
        return {"msg": msg, "fig": fig}

    # Activity
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_activity(df: pd.DataFrame) -> dict:
        """
        Calculates week-over-week (WoW) growth and builds the activity report section
        """
        df_wow = calc_wow(df)
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='week_status'
            , value_in_id_var_for_text='current'
            , has_previous=True
        )

        metrics = {
            'views': ['Views', '• 👀'],
            'likes': ['Likes', '• ❤️'],
            'ctr': ['CTR', '• 🎯'],
            'messages': ['Messages', '• ✉️']
        }
        msg = format_metrics_report_wow(df_wow, metrics, '🔥 Activity Metrics')

        metrics = ['views', 'likes', 'ctr', 'messages']
        metric_titles = ['Views', 'Likes', 'CTR', 'Messages']
        category_orders = {'variable': metrics, 'week_status': ['current', 'previous']}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='week_status'
            , text='text'
            , metrics=metrics
            , metric_titles=metric_titles
            , trace_name_for_text='current'
            , title='Activity Metrics'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , trace_names_map={'current': 'Current Week', 'previous': 'Previous Week'}
            , make_gray=True
        )
        return {"msg": msg, "fig": fig}

    # Retention
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_retention(df: pd.DataFrame, df_roll: pd.DataFrame) -> dict:
        """
        Computes 7-day cohort retention and rolling retention summaries
        and builds the retention chart.
        """
        df_cohort = df.pivot_table(index='cohort', columns='lifetime', values='users', fill_value=0)
        df_retention = (
            df_cohort.div(df_cohort[0], axis=0)
            .drop(0, axis=1)
        )

        # Copy so that relabeling the index cannot affect df_retention, which is plotted below
        df_retention_7d = df_retention[7].copy()
        df_retention_7d.index = df_retention_7d.index.strftime('%b %d')
        mean_retention = df_retention_7d.mean()
        best_cohort = df_retention_7d.agg(['idxmax', 'max'])
        worst_cohort = df_retention_7d.agg(['idxmin', 'min'])
        spread = abs(best_cohort['max'] - worst_cohort['min']) * 100
        yesterday_dau = format_number(int(df_roll['yesterday_users'].iloc[0]))
        all_week_users = format_number(int(df_roll['all_week_users'].iloc[0]))
        msg = dedent(f"""
        🎯 Retention Analysis

        7-Day Cohort Retention
        (Last 7 completed cohorts)
        • ⚖️ Mean: {mean_retention:.1%}
        • 🏆 Best Cohort: {best_cohort['max']:.1%} ({best_cohort['idxmax']})
        • ⚠️ Worst Cohort: {worst_cohort['min']:.1%} ({worst_cohort['idxmin']})
        • ↔️ Spread: {spread:.1f} pp

        Rolling Retention 7D (Current Audience)
        • 👥 Yesterday's DAU: {yesterday_dau}
        • 📅 Active in last 7 days: {all_week_users}
        • 🎯 Rolling Retention: {df_roll['retention_7d'].iloc[0]:.1%}
        """)

        fig = create_retention_dashboard(df_retention=df_retention)
        return {"msg": msg, "fig": fig}

    # Feed Detailed
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_feed(df: pd.DataFrame) -> dict:
        """
        Builds the Feed Detailed report: week-over-week (WoW) changes as text
        plus a current-vs-previous-week comparison chart.
        """
        df_wow = calc_wow(df)
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='week_status'
            , value_in_id_var_for_text='current'
            , has_previous=True
        )

        metrics = {
            'posts_per_user': ['Posts per User', '• 📝'],
            'views_per_user': ['Views per User', '• 👀'],
            'likes_per_user': ['Likes per User', '• ❤️'],
            'ctr_per_user': ['CTR per User', '• 🎯']
        }
        msg = format_metrics_report_wow(df_wow, metrics, '📰 Feed Detailed')

        metrics = ['posts_per_user', 'views_per_user', 'likes_per_user', 'ctr_per_user']
        metric_titles = ['Posts per User', 'Views per User', 'Likes per User', 'CTR per User']
        category_orders = {'variable': metrics, 'week_status': ['current', 'previous']}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='week_status'
            , text='text'
            , metrics=metrics
            , metric_titles=metric_titles
            , trace_name_for_text='current'
            , title='Feed Detailed Metrics'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , trace_names_map={'current': 'Current Week', 'previous': 'Previous Week'}
            , tickformats=[['.1%', 1, 2]]
            , make_gray=True
        )
        return {"msg": msg, "fig": fig}

    # Messenger Detailed
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_messenger(df: pd.DataFrame) -> dict:
        """
        Builds the Messenger Detailed report: week-over-week (WoW) changes as text
        plus a current-vs-previous-week comparison chart.
        """
        df_wow = calc_wow(df)
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='week_status'
            , value_in_id_var_for_text='current'
            , has_previous=True
        )

        metrics = {
            'sender_dau': ['Sender DAU', '• 📢'],
            'receiver_dau': ['Receiver DAU', '• 📭'],
            'sender_to_receiver_ratio': ['Sender DAU / Receiver DAU', '• ⚖️'],
            'messages_per_sender': ['Messages per Sender', '• ✉️']
        }
        msg = format_metrics_report_wow(df_wow, metrics, '💬 Messenger Detailed')

        metrics = ['sender_dau', 'receiver_dau', 'sender_to_receiver_ratio', 'messages_per_sender']
        metric_titles = ['Sender DAU', 'Receiver DAU', 'Sender DAU / Receiver DAU', 'Messages per Sender']
        category_orders = {'variable': metrics, 'week_status': ['current', 'previous']}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='week_status'
            , text='text'
            , metrics=metrics
            , metric_titles=metric_titles
            , trace_name_for_text='current'
            , title='Messenger Detailed Metrics'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , trace_names_map={'current': 'Current Week', 'previous': 'Previous Week'}
            , make_gray=True
        )
        return {"msg": msg, "fig": fig}

    # DAU by Source
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure,
        multiple_outputs=True
    )
    def transform_by_source(df: pd.DataFrame) -> dict:
        """
        Summarizes the latest day's DAU by traffic source as text
        and builds the by-source comparison chart.
        """
        df_comparison = prepare_comparison_df(
            df=df
            , date_col='date'
            , id_var='source'
            , has_previous=False
        )

        mask = df['date'] == df['date'].max()
        latest_data = df[mask]

        # Select each source's row for the latest date once and reuse it
        ads = latest_data[latest_data['source'] == 'ads'].iloc[0]
        organic = latest_data[latest_data['source'] == 'organic'].iloc[0]

        total_dau = latest_data['total_dau'].sum()
        ads_dau = ads['total_dau']
        organic_dau = organic['total_dau']

        # Share of each source in total DAU
        ads_share = ads_dau / total_dau
        organic_share = organic_dau / total_dau

        # Service mix within each source
        ads_feed_share = ads['feed_only_dau'] / ads_dau
        ads_messenger_share = ads['messenger_only_dau'] / ads_dau
        ads_both_share = ads['both_services_dau'] / ads_dau

        organic_feed_share = organic['feed_only_dau'] / organic_dau
        organic_messenger_share = organic['messenger_only_dau'] / organic_dau
        organic_both_share = organic['both_services_dau'] / organic_dau

        msg = dedent(f"""
            🌐 Active Users by Source

            👥 Total DAU: {total_dau:,.0f}
            • Ads: {ads_share:.1%} ({ads_dau:,.0f})
            • Organic: {organic_share:.1%} ({organic_dau:,.0f})
            📰 Feed Only:
            • Ads: {ads_feed_share:.1%} • Organic: {organic_feed_share:.1%}
            💬 Messenger Only:
            • Ads: {ads_messenger_share:.1%} • Organic: {organic_messenger_share:.1%}
            🔄 Both Services:
            • Ads: {ads_both_share:.1%} • Organic: {organic_both_share:.1%}
        """)

        metrics = ['total_dau', 'feed_only_dau', 'messenger_only_dau', 'both_services_dau']
        metric_titles = ['Total DAU', 'Feed Only DAU', 'Messenger Only DAU', 'Both Services DAU']
        category_orders = {'variable': metrics}
        trace_names_map = {'ads': 'Ads', 'organic': 'Organic'}

        fig = create_comparison_dashboard(
            df=df_comparison
            , date_col='date'
            , color='source'
            , metrics=metrics
            , metric_titles=metric_titles
            , title='Daily Active Users by Source'
            , labels={'date': 'Date'}
            , category_orders=category_orders
            , make_gray=False
            , trace_names_map=trace_names_map
        )
        return {"msg": msg, "fig": fig}

    # ==========================================================================
    # Load
    # ==========================================================================
    # DAU
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_dau_message(message: str) -> None:
        """Task for sending DAU message"""
        print("📨 Sending DAU message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send DAU message")
        print("✅ DAU message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_dau_chart(figure: go.Figure) -> None:
        """Task for sending DAU chart"""
        print("📊 Sending DAU chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send DAU chart")
        print("✅ DAU chart sent successfully")

    # New users
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_new_users_message(message: str) -> None:
        """Task for sending new users message"""
        print("📨 Sending new users message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send new users message")
        print("✅ New users message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_new_users_chart(figure: go.Figure) -> None:
        """Task for sending new users chart"""
        print("📊 Sending new users chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send new users chart")
        print("✅ New users chart sent successfully")

    # Activity
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_activity_message(message: str) -> None:
        """Task for sending Activity message"""
        print("📨 Sending Activity message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send Activity message")
        print("✅ Activity message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_activity_chart(figure: go.Figure) -> None:
        """Task for sending Activity chart"""
        print("📊 Sending Activity chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send Activity chart")
        print("✅ Activity chart sent successfully")

    # Retention
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_retention_message(message: str) -> None:
        """Task for sending Retention message"""
        print("📨 Sending Retention message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send Retention message")
        print("✅ Retention message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_retention_chart(figure: go.Figure) -> None:
        """Task for sending Retention chart"""
        print("📊 Sending Retention chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send Retention chart")
        print("✅ Retention chart sent successfully")

    # Feed Detailed
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_feed_detailed_message(message: str) -> None:
        """Task for sending Feed Detailed message"""
        print("📨 Sending Feed Detailed message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send Feed Detailed message")
        print("✅ Feed Detailed message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_feed_detailed_chart(figure: go.Figure) -> None:
        """Task for sending Feed Detailed chart"""
        print("📊 Sending Feed Detailed chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send Feed Detailed chart")
        print("✅ Feed Detailed chart sent successfully")

    # Messenger Detailed
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_messenger_detailed_message(message: str) -> None:
        """Task for sending Messenger Detailed message"""
        print("📨 Sending Messenger Detailed message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send Messenger Detailed message")
        print("✅ Messenger Detailed message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_messenger_detailed_chart(figure: go.Figure) -> None:
        """Task for sending Messenger Detailed chart"""
        print("📊 Sending Messenger Detailed chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send Messenger Detailed chart")
        print("✅ Messenger Detailed chart sent successfully")

    # Users by Source
    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_by_source_message(message: str) -> None:
        """Task for sending Users by Source message"""
        print("📨 Sending Users by Source message...")
        if not bot.send_message(message=message):
            raise Exception("Failed to send Users by Source message")
        print("✅ Users by Source message sent successfully")

    @task(
        retries=3,
        retry_delay=timedelta(minutes=5),
        on_failure_callback=handle_failure
    )
    def load_by_source_chart(figure: go.Figure) -> None:
        """Task for sending Users by Source chart"""
        print("📊 Sending Users by Source chart...")
        if not bot.send_chart(figure=figure):
            raise Exception("Failed to send Users by Source chart")
        print("✅ Users by Source chart sent successfully")

    # ==========================================================================
    # WORKFLOW
    # ==========================================================================

    # extract
    df_dau = extract_dau()
    df_new_users = extract_new_users()
    df_activity = extract_activity()
    df_retention = extract_retention()
    df_roll_retention_7d = extract_roll_retention_7d()
    df_feed_detailed = extract_feed_detailed()
    df_messenger_detailed = extract_messenger_detailed()
    df_users_daily_by_source = extract_users_daily_by_source()

    # transform
    transform_dau_result = transform_dau(df_dau)
    msg_dau = transform_dau_result["msg"]
    fig_dau = transform_dau_result["fig"]

    transform_new_users_result = transform_new_users(df_new_users)
    msg_new_users = transform_new_users_result["msg"]
    fig_new_users = transform_new_users_result["fig"]

    transform_activity_result = transform_activity(df_activity)
    msg_activity = transform_activity_result["msg"]
    fig_activity = transform_activity_result["fig"]

    transform_retention_result = transform_retention(df_retention, df_roll_retention_7d)
    msg_retention = transform_retention_result["msg"]
    fig_retention = transform_retention_result["fig"]

    transform_feed_detailed_result = transform_feed(df_feed_detailed)
    msg_feed_detailed = transform_feed_detailed_result["msg"]
    fig_feed_detailed = transform_feed_detailed_result["fig"]

    transform_messenger_detailed_result = transform_messenger(df_messenger_detailed)
    msg_messenger_detailed = transform_messenger_detailed_result["msg"]
    fig_messenger_detailed = transform_messenger_detailed_result["fig"]

    transform_by_source_result = transform_by_source(df_users_daily_by_source)
    msg_by_source = transform_by_source_result["msg"]
    fig_by_source = transform_by_source_result["fig"]


    # load
    dau_msg_task = load_dau_message(msg_dau)
    dau_chart_task = load_dau_chart(fig_dau)

    new_users_msg_task = load_new_users_message(msg_new_users)
    new_users_chart_task = load_new_users_chart(fig_new_users)

    activity_msg_task = load_activity_message(msg_activity)
    activity_chart_task = load_activity_chart(fig_activity)

    retention_msg_task = load_retention_message(msg_retention)
    retention_chart_task = load_retention_chart(fig_retention)

    feed_detailed_msg_task = load_feed_detailed_message(msg_feed_detailed)
    feed_detailed_chart_task = load_feed_detailed_chart(fig_feed_detailed)

    messenger_detailed_msg_task = load_messenger_detailed_message(msg_messenger_detailed)
    messenger_detailed_chart_task = load_messenger_detailed_chart(fig_messenger_detailed)

    by_source_msg_task = load_by_source_message(msg_by_source)
    by_source_chart_task = load_by_source_chart(fig_by_source)
    
    # task dependencies
    (
        dau_msg_task >> dau_chart_task >>
        new_users_msg_task >> new_users_chart_task >>
        activity_msg_task >> activity_chart_task >>
        retention_msg_task >> retention_chart_task >>
        feed_detailed_msg_task >> feed_detailed_chart_task >>
        messenger_detailed_msg_task >> messenger_detailed_chart_task >>
        by_source_msg_task >> by_source_chart_task
    )
      
# ==========================================================================
# DAG execution
# ==========================================================================

app_report = app_report()

# Report Screenshots

Below are screenshots of a sample report.

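<img src="app_report_telegram_part_1.png" alt="Sample Telegram report, part 1">
<img src="app_report_telegram_part_2.png" alt="Sample Telegram report, part 2">
<img src="app_report_telegram_part_3.png" alt="Sample Telegram report, part 3">
<img src="app_report_telegram_part_4.png" alt="Sample Telegram report, part 4">
<img src="app_report_telegram_part_5.png" alt="Sample Telegram report, part 5">
<img src="app_report_telegram_part_6.png" alt="Sample Telegram report, part 6">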
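
The retention figures in the report come from the cohort pivot in `transform_retention`. The sketch below reproduces that calculation on a tiny made-up dataset so the arithmetic can be checked by hand. The numbers and dates are assumptions for illustration only; the column names (`cohort`, `lifetime`, `users`) follow the code above.

```python
import pandas as pd

# Toy cohort data (made-up numbers): users still active N days after first use.
df = pd.DataFrame({
    "cohort": pd.to_datetime(["2024-01-01"] * 3 + ["2024-01-02"] * 3),
    "lifetime": [0, 1, 7, 0, 1, 7],
    "users": [100, 40, 20, 200, 90, 50],
})

# Rows are cohorts, columns are days since first use.
df_cohort = df.pivot_table(index="cohort", columns="lifetime", values="users", fill_value=0)

# Divide every column by the day-0 cohort size, then drop day 0 (always 100%).
df_retention = df_cohort.div(df_cohort[0], axis=0).drop(0, axis=1)

# Day-7 retention per cohort, relabeled with short date strings.
retention_7d = df_retention[7].copy()
retention_7d.index = retention_7d.index.strftime("%b %d")

mean_retention = retention_7d.mean()
best_label, best_value = retention_7d.idxmax(), retention_7d.max()
worst_label, worst_value = retention_7d.idxmin(), retention_7d.min()
spread_pp = (best_value - worst_value) * 100  # percentage points

print(f"Mean: {mean_retention:.1%}")
print(f"Best: {best_value:.1%} ({best_label})")
print(f"Worst: {worst_value:.1%} ({worst_label})")
print(f"Spread: {spread_pp:.1f} pp")
```

Here the first cohort retains 20 of 100 users on day 7 and the second retains 50 of 200, so the spread between the best and worst cohort is 5 percentage points.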

# Conclusion

- **Data Pipeline Development:**
  - Built SQL queries that extract and calculate key product metrics from the ClickHouse database
- **Automation Framework:**
  - Created an Airflow DAG for scheduled daily execution, with task retries and failure callbacks
- **Chat Delivery System:**
  - Integrated the Telegram API to deliver reports automatically to the team chat
- **Business Reporting:**
  - Designed visualizations covering both application-wide and feature-specific metrics
- **Stakeholder Focus:**
  - Developed business-ready reports that answer key product performance questions
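
The error handling in the load tasks follows one pattern: call the bot, and raise if the call reports failure so that Airflow can retry the task. The sketch below shows that pattern outside Airflow. It assumes, as the DAG code suggests, that the bot's send methods return a truthy value on success. `send_or_raise` and `FakeBot` are hypothetical names introduced only for this illustration and are not part of the project.

```python
from typing import Callable


def send_or_raise(send: Callable[..., bool], label: str, **payload) -> None:
    """Call `send` and raise if it reports failure, so a scheduler can retry."""
    print(f"Sending {label}...")
    if not send(**payload):
        raise RuntimeError(f"Failed to send {label}")
    print(f"{label} sent successfully")


class FakeBot:
    """Hypothetical stand-in for the Telegram bot wrapper; the first call fails."""

    def __init__(self) -> None:
        self.calls = 0

    def send_message(self, message: str) -> bool:
        self.calls += 1
        return self.calls > 1


bot = FakeBot()

# First attempt fails and raises, which is what would trigger a retry.
try:
    send_or_raise(bot.send_message, "DAU message", message="hello")
except RuntimeError as exc:
    first_error = str(exc)

# Second attempt succeeds, mimicking a retried task.
send_or_raise(bot.send_message, "DAU message", message="hello")
```

In the DAG itself, the retry is performed by Airflow through `retries=3` and `retry_delay`, and `on_failure_callback` runs once the task has finally failed.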