# 📰 Bangladesh News Articles - Starter Notebook

This notebook demonstrates how to explore the Bangladesh News Articles Dataset.

**Dataset Features:**
- 74+ Bangladeshi newspapers (English & Bangla)
- Daily automated updates
- Multiple formats: JSON, CSV, Excel, Parquet
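
Because the dataset ships in several formats, a small helper can choose the pandas reader from the file extension. This is a minimal sketch: `load_articles` is a hypothetical helper, only `articles.csv` is a file name used in this notebook, and the JSON layout is assumed to be one that `pd.read_json` reads with default settings. Excel and Parquet also need optional engines such as `openpyxl` and `pyarrow`.

```python
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers. Excel and Parquet need optional
# engines (e.g. openpyxl, pyarrow) to be installed.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
    ".parquet": pd.read_parquet,
}


def load_articles(path):
    """Load an articles file with the reader matching its extension.

    Hypothetical helper, not part of the dataset or of pandas.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported format: {suffix!r}")
    return READERS[suffix](path)
```

Parquet is usually the faster choice for large files because it stores column types, while CSV remains the easiest to inspect by hand.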

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('/kaggle/input/bangladesh-news-articles/articles.csv')
print(f"Total articles: {len(df):,}")
df.head()

## 📊 Dataset Overview

In [None]:
# Basic statistics
print("Columns:", df.columns.tolist())
print("\nArticles per newspaper (top 10):")
df['paper_name'].value_counts().head(10)

In [None]:
# Articles by newspaper (top 15)
top_papers = df['paper_name'].value_counts().head(15)
plt.figure(figsize=(12, 6))
top_papers.plot(kind='barh', color='steelblue')
plt.xlabel('Number of Articles')
plt.title('Top 15 Newspapers by Article Count')
plt.tight_layout()
plt.show()

## 📅 Publication Timeline

In [None]:
# Parse dates and plot timeline
df['date'] = pd.to_datetime(df['publication_date'], errors='coerce')
daily_counts = df.groupby(df['date'].dt.date).size()

plt.figure(figsize=(14, 5))
daily_counts.plot(kind='line', color='teal')
plt.xlabel('Date')
plt.ylabel('Articles')
plt.title('Daily Article Count')
plt.tight_layout()
plt.show()
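
Daily counts can be noisy, so weekly totals often give a clearer picture of the trend. The sketch below uses a small toy frame in place of `df`; the dates are made up for illustration, and only the `publication_date` column name comes from this notebook.

```python
import pandas as pd

# Toy stand-in for the dataset; only `publication_date` is needed here.
toy = pd.DataFrame({
    "publication_date": [
        "2024-01-01", "2024-01-02", "2024-01-02",
        "2024-01-09", "not a date",
    ]
})

# Same parsing as in the cell above: unparseable values become NaT,
# and they are dropped before counting.
dates = pd.to_datetime(toy["publication_date"], errors="coerce").dropna()

# Count articles per calendar week (Monday through Sunday).
weekly_counts = dates.dt.to_period("W").value_counts().sort_index()
print(weekly_counts)
```

Calling `weekly_counts.plot(kind='line')` would draw the smoothed timeline in the same way as the daily plot.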

## 📝 Category Distribution

In [None]:
# Category breakdown
if 'category' in df.columns:
    categories = df['category'].value_counts().head(10)
    plt.figure(figsize=(10, 6))
    categories.plot(kind='pie', autopct='%1.1f%%')
    plt.title('Top 10 Categories')
    plt.ylabel('')
    plt.show()
else:
    print("No category column found")
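
A pie chart shows the overall mix, but it hides how coverage differs between newspapers. One way to compare them is a cross-tabulation of `paper_name` against `category`, sketched here on toy data. The values are invented for illustration, and the sketch assumes both columns are present.

```python
import pandas as pd

# Toy stand-in for the dataset, using the `paper_name` and `category`
# columns from the notebook. The values are made up for illustration.
toy = pd.DataFrame({
    "paper_name": ["Paper A", "Paper A", "Paper B", "Paper B", "Paper B"],
    "category": ["politics", "sports", "politics", "politics", "sports"],
})

# Rows are newspapers, columns are categories, cells are article counts.
counts = pd.crosstab(toy["paper_name"], toy["category"])

# Convert each row to shares so large and small papers are comparable.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

Row-wise shares are used instead of raw counts because newspapers with many articles would otherwise dominate any comparison.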

## 🔍 Sample Articles

In [None]:
# Show up to 5 random sample headlines (rows with missing values are skipped)
complete = df[['paper_name', 'headline', 'category', 'publication_date']].dropna()
sample = complete.sample(min(5, len(complete)))
for _, row in sample.iterrows():
    print(f"📰 [{row['paper_name']}] {str(row['headline'])[:80]}...")
    print(f"   Category: {row['category']} | Date: {row['publication_date']}\n")

## 💡 Next Steps

- **Sentiment Analysis**: Analyze article sentiment using TextBlob or Transformers
- **Topic Modeling**: Discover topics with LDA or BERTopic
- **Text Classification**: Train a classifier on categories
- **Bengali NLP**: Use BanglaLLaMA or mBERT for Bengali articles
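
Most of these steps work best when English and Bangla articles are handled separately. As a rough first pass, the share of letters in the Bengali Unicode block (U+0980 to U+09FF) can label each headline by script. This is a heuristic sketch rather than a real language detector: `bengali_ratio`, the 0.5 threshold, and the `bn`/`en` labels are all assumptions, and the toy headlines are made up.

```python
import pandas as pd


def bengali_ratio(text):
    """Share of alphabetic characters that fall in the Bengali Unicode block."""
    letters = [ch for ch in str(text) if ch.isalpha()]
    if not letters:
        return 0.0
    bengali = sum("\u0980" <= ch <= "\u09ff" for ch in letters)
    return bengali / len(letters)


# Toy headlines, made up for illustration.
toy = pd.DataFrame({
    "headline": ["Dhaka traffic eases after holiday", "ঢাকায় আজ বৃষ্টি হতে পারে"]
})

# Label a headline as Bangla when most of its letters are Bengali script.
toy["language"] = toy["headline"].map(
    lambda text: "bn" if bengali_ratio(text) > 0.5 else "en"
)
print(toy)
```

With a `language` column in place, each subset can be passed to a model suited to it, for example an English sentiment tool for `en` rows and a multilingual model for `bn` rows.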