```{toctree}
:hidden:
building_startup_analytics/1_data_infrastructure/1_analytical_database.ipynb
building_startup_analytics/2_product_dashboard/2_product_dashboard.ipynb
building_startup_analytics/3_metric_deep_dive/3_metric_deep_dive.ipynb
building_startup_analytics/4_ab_testing/4_ab_testing.ipynb
building_startup_analytics/5_metric_forecasting/5_metric_forecasting.ipynb
building_startup_analytics/6_automated_reporting/6_automated_reporting.ipynb
building_startup_analytics/7_anomaly_detection/7_alert_system.ipynb
```

# 🧩 Building Startup Analytics

Building an analytics process for a startup: infrastructure, dashboards, A/B testing, forecasting, automated reports, and anomaly detection.

### 👨‍💻 Author

**Pavel Grigoryev**

### 🛠️ Tech Stack & Methods

**Stack:**

- **Data & DB:** `Python` `Pandas` `ClickHouse`
- **Viz & BI:** `Superset` `Yandex DataLens` `Plotly`
- **ML & Stats:** `StatsModels` `SciPy` `Pingouin` `Uber Orbit`
- **Automation:** `Apache Airflow` `Telegram API`

**Methods:**

- **Data Infrastructure Design:**
  - Star schema modeling, ETL pipeline development, and data quality validation
- **Product Analytics:**
  - Retention analysis, cohort analysis, and engagement metrics tracking
- **Business Intelligence:**
  - Real-time dashboard design, KPI definition, and self-service reporting implementation
- **Statistical Hypothesis Testing:**
  - A/A and A/B test analysis, sample size calculation, and statistical power analysis
- **Time Series Forecasting:**
  - Bayesian structural models, trend/seasonality decomposition, and model validation
- **Anomaly Detection:**
  - MAD-based outlier detection, alert threshold optimization, and real-time monitoring
- **Automation Engineering:**
  - DAG orchestration, API integration, and scheduled reporting systems
- **Monte Carlo Simulation:**
  - Statistical power estimation and sample size determination through simulation
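
As an illustration of the Monte Carlo approach to power estimation listed above, here is a minimal sketch rather than the project's actual implementation. It assumes a binary per-user metric and Welch's t-test; the function name `simulate_power` and all parameter values are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_power(n_per_group, base_rate, uplift, alpha=0.05, n_sims=1000, seed=0):
    """Estimate test power as the share of simulated experiments with p < alpha.

    Assumption: each user contributes one Bernoulli outcome (for example,
    "liked at least one post"), and groups are compared with Welch's t-test.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Draw one synthetic experiment: control at base_rate, treatment shifted by uplift.
        control = rng.binomial(1, base_rate, size=n_per_group)
        treatment = rng.binomial(1, base_rate + uplift, size=n_per_group)
        _, p_value = stats.ttest_ind(control, treatment, equal_var=False)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims


# Sample size determination: scan candidate sizes and keep the smallest
# one whose estimated power reaches the target (commonly 0.8).
for n in (1000, 2000, 5000):
    print(n, simulate_power(n, base_rate=0.20, uplift=0.03))
```

Setting `uplift=0` turns the same function into an A/A check: the rejection share should then stay close to `alpha`.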

### 📌 Project Overview

- This project implements a complete product analytics system for an early-stage startup whose application combines a messenger with a personalized news feed.
- In the application, users can browse and interact with posts (views, likes) and communicate with each other through direct messages.
- The core challenge was to build the entire analytical infrastructure from scratch in order to understand user behavior across both features and enable data-driven decision-making.

**Project Goals:**

- **Build Data Infrastructure**
  - Create a scalable analytical database for product metrics
  - Establish a single source of truth for user behavior data

- **Enable Product Analytics**
  - Develop interactive dashboards for real-time monitoring
  - Implement retention analysis and engagement tracking

- **Establish Experimentation**
  - Build a complete A/B testing pipeline for feature validation
  - Ensure statistical reliability of experiment results

- **Create Forecasting System**
  - Develop predictive models for key business metrics
  - Enable proactive anomaly detection and planning

- **Automate Business Reporting**
  - Implement daily automated reports to Telegram
  - Build a comprehensive stakeholder communication system

- **Establish Proactive Monitoring**
  - Build a real-time alerting system for metric anomalies
  - Implement 24/7 monitoring with immediate notifications

- **Drive Business Decisions**
  - Translate data insights into product recommendations
  - Establish continuous monitoring processes

**Expected Outcome:**

- A complete analytics ecosystem that provides real-time business visibility, enables data-driven product decisions, and automates stakeholder reporting through reliable, scalable data infrastructure.

### 🗃️ Data Source

The analysis uses data from the product database in ClickHouse, which consists of two main tables:

- **`feed_actions`** - tracks user interactions with the news feed
- **`message_actions`** - records messaging activity between users

### 🎯 Goal Delivered

The project delivers a complete analytics workflow, from data collection to actionable business insights:

- **Centralized Data Foundation** with an optimized analytical database
- **Comprehensive Monitoring** through interactive dashboards and alerts
- **Automated Intelligence** with daily reporting and anomaly detection
- **Experimental Framework** for data-driven feature validation
- **Predictive Capabilities** enabling proactive planning and optimization
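
To make the MAD-based outlier detection behind the alerting concrete, the sketch below flags points using the modified z-score. It is an assumption-laden illustration, not the project's code: the function name `mad_anomalies`, the 3.5 threshold, and the 0.6745 scaling constant follow a common convention and may differ from what the notebooks use.

```python
import numpy as np


def mad_anomalies(values, threshold=3.5):
    """Return a boolean mask marking points far from the median in MAD units.

    Assumption: the modified z-score 0.6745 * (x - median) / MAD with a 3.5
    cutoff, a widely used convention for robust outlier detection.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # At least half of the points equal the median; the score is undefined,
        # so this sketch flags nothing.
        return np.zeros(values.shape, dtype=bool)
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold


# Hypothetical metric values for consecutive time slices; the last one spikes.
metric = [100, 102, 98, 101, 99, 103, 97, 250]
flags = mad_anomalies(metric)
print(flags)
```

Because the median and MAD are barely affected by a single extreme value, the spike does not inflate the baseline it is compared against, which is the main reason to prefer this over a mean and standard deviation rule.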