Dashboard: https://lookerstudio.google.com/reporting/5853456b-ffa8-4663-a534-bf4248b4b9f2
This dbt project analyzes Stack Overflow data to answer the business question: "Which topics have the highest need for answers?"
Built as a dimensional star schema model using Stack Overflow's public dataset from BigQuery.
- Stack Overflow Public Dataset (
bigquery-public-data.stackoverflow
)- Questions, answers, users, tags, and votes data
- Fact Table:
fact_questions
- One row per question with metrics - Dimensions:
dim_tags
,dim_users
,dim_date
- Analysis View: Pre-joined view optimized for dashboard analysis
unanswered_question_count
- Questions with zero answersno_accepted_answer_count
- Questions without accepted solutionsanswer_count
- Total answers per questionview_count
- Question popularity
models/
├── staging/
│ ├── sources.yml # Source definitions
│ ├── stg_stackoverflow__questions.sql
│ ├── stg_stackoverflow__tags.sql
│ └── stg_stackoverflow__users.sql
├── marts/
│ ├── dim_tags.sql # Topic dimension
│ ├── dim_users.sql # User dimension
│ ├── dim_date.sql # Time dimension
│ ├── fact_questions.sql # Main fact table
│ └── analysis_view.sql # Dashboard-ready view
└── schema.yml # Documentation & tests
- dbt Cloud account
- BigQuery project with billing enabled
- Access to
bigquery-public-data.stackoverflow
# Install dependencies and run models
dbt run
# Run data quality tests
dbt test
# Generate documentation
dbt docs generate
Top 5 Topics Needing Answers:
- Python
- JavaScript
- ReactJS
- Java
- Android
Top 5 Topics Needing Better Quality Answers:
- Python
- Javascript
- ReactJS
- Java
- C#
Insights:
- Most topics have 2x more "no accepted answer" than "completely unanswered"
- Indicates answer quality issues rather than lack of responses
- Popular languages dominate both question volume and unanswered rates
Interactive Looker Studio dashboard available showing:
- Topics ranked by unanswered question counts
- Answer quality analysis (unanswered vs no accepted answer)
- User expertise distribution by topic
- Temporal trends in question patterns
- Simple joins from fact to dimensions
- Fast aggregations for dashboard queries
- Business-friendly structure for analysts
- Scalable design for additional metrics
- Questions filtered to 2022+ (manageable data size)
- Tags parsed from pipe-separated format
- User expertise classified by reputation thresholds
- Degenerate dimensions for commonly-used attributes