In [1]:
# Project Overview

## Introduction
This project compiles data on violence against the Dalit community in India, sourced from credible media outlets. The aim is to present this information in a structured format that makes it easier to understand and raises awareness of these incidents.

## Objectives
- **Data Compilation**: Collect and organize news articles and reports.
- **Presentation**: Display data in a clear and accessible table format.
- **Awareness**: Highlight issues faced by the Dalit community through media reports.

## Methodology

### Data Sources
- **RSS Feeds**: Gather news from trusted sources.
- **Google News API**: Supplementary source for obtaining articles.
- **Twitter API**: Collect tweets to understand public discourse.
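As a rough illustration of how items from an RSS feed could be screened for relevance, the sketch below parses an RSS 2.0 document and keeps items whose titles match a keyword pattern. It uses the standard library's `xml.etree.ElementTree` in place of `feedparser` so it runs without extra packages. The keyword stems, the placeholder feed, and the `matching_items` function are assumptions for illustration, not the project's actual filter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical relevance filter: matches words *starting* with these stems,
# so "Dalits" and "atrocities" match, but "broadcaster" does not match "caste".
KEYWORD_PATTERN = re.compile(r"\b(?:dalit|caste|atrocit)", re.IGNORECASE)

# Placeholder RSS 2.0 feed standing in for one fetched from a real source.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Placeholder story mentioning Dalits</title>
      <link>https://example.com/story-1</link>
    </item>
    <item>
      <title>Placeholder story about a broadcaster</title>
      <link>https://example.com/story-2</link>
    </item>
  </channel>
</rss>"""


def matching_items(rss_xml: str) -> list[dict]:
    """Return title/link dicts for RSS <item>s whose title matches KEYWORD_PATTERN."""
    root = ET.fromstring(rss_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if KEYWORD_PATTERN.search(title):
            matches.append({"title": title, "link": link})
    return matches


print(matching_items(SAMPLE_RSS))
```

With `feedparser`, `feedparser.parse(url)` would fetch and normalize real feeds (including Atom) into `entries` that expose `title` and `link`, and the same pattern could be applied to each entry. Matching on titles alone will miss stories that mention the topic only in the body.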

### Tools and Technologies
- **Python Libraries**:
  - `requests` for API interactions
  - `beautifulsoup4` for web scraping
  - `tweepy` for Twitter API access
  - `feedparser` for RSS feed parsing
  - `pandas` for tabulating the news sources and saving them as CSV
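`beautifulsoup4` is listed for web scraping; a typical step is reducing an article's HTML to plain text before searching it. The sketch below does this with the standard library's `html.parser` so it runs without dependencies. `TextExtractor` and `html_to_text` are hypothetical names, and the snippet is placeholder markup rather than a real article.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of `html`, with chunks joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


snippet = "<p>Placeholder <b>article</b> text.</p><script>var x = 1;</script>"
print(html_to_text(snippet))
```

With `beautifulsoup4`, the usual equivalent is `BeautifulSoup(html, "html.parser").get_text(" ", strip=True)`, which copes better with malformed markup than this hand-rolled parser.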

## Folder Structure

- **docs/**: Documentation files
  - `data_sources.md`: Information on data sources used
  - `project_overview.md`: Overview of the project

- **src/**: Source code for data collection and processing

- **tests/**: Scripts for testing code functionality

- **data/**:
  - `raw/`: Unprocessed data files
  - `processed/`: Cleaned data ready for presentation

- **notebooks/**: Jupyter notebooks for exploratory analysis, if needed
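The notebook writes its output to `data/raw/`, so the layout above has to exist before the code runs. A minimal sketch, assuming the directories are exactly those listed, that creates them with `pathlib`; `create_layout` is a hypothetical helper, not part of the project, and it does not create the documentation files.

```python
import tempfile
from pathlib import Path

# Directories from the Folder Structure list; files such as docs/*.md are not created.
PROJECT_DIRS = ["docs", "src", "tests", "data/raw", "data/processed", "notebooks"]


def create_layout(root: str = ".") -> list[Path]:
    """Ensure each project directory exists under `root` and return their paths."""
    paths = []
    for rel in PROJECT_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


# Demo in a throwaway directory so the real project tree is not modified.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_layout(tmp):
        print(path.relative_to(tmp).as_posix())
```

Calling `create_layout()` with no argument would set up the directories relative to the current working directory; because `exist_ok=True` is used, running it again is harmless.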

## Future Work
- Expand to include more regional news sources.
- Develop a web interface for real-time data updates.
- Explore partnerships with NGOs for broader dissemination.

## Conclusion
This project serves as a resource for understanding the extent of violence against the Dalit community by presenting media-reported data in an organized manner.

In [2]:
import pandas as pd

# Define the data for the CSV
data = {
    "News Platforms": [
        "Google News RSS Feed", 
        "Indian Express", 
        "The Hindu", 
        "Dainik Bhaskar", 
        "Dalit Times", 
        "Times of India", 
        "Dalit Archive", 
        "Ambedkarite India", 
        "NDTV", 
        "LiveMint", 
        "Deccan Herald", 
        "India News Network", 
        "The Statesman", 
        "The Print", 
        "Hindustan Times", 
        "The Wire", 
        "News Laundry", 
        "India Today", 
        "Newsclick", 
        "Scroll.in"
    ],
    "RSS Feeds": [
        "https://rss.app/rss-feed/google-news-rss-feed", 
        "https://indianexpress.com/section/india/crime/", 
        "https://www.thehindu.com/rssfeeds/", 
        "https://www.bhaskar.com/rss", 
        "", 
        "https://timesofindia.indiatimes.com/rss.cms", 
        "", 
        "", 
        "https://www.ndtv.com/rss", 
        "https://www.livemint.com/rss", 
        "", 
        "https://www.indianewsnetwork.com/rss-feeds/", 
        "", 
        "", 
        "https://tech.hindustantimes.com/rss", 
        "", 
        "", 
        "https://www.indiatoday.in/rss", 
        "https://www.newsclick.in/rss-feed", 
        ""
    ],
    "Twitter Handles": [
        "", 
        "https://x.com/IndianExpress", 
        "https://x.com/the_hindu", 
        "https://x.com/DainikBhaskar", 
        "https://x.com/DalitTime", 
        "https://x.com/timesofindia", 
        "https://x.com/dalitarchive", 
        "https://x.com/ambedkariteIND", 
        "https://x.com/ndtv", 
        "https://x.com/livemint", 
        "https://x.com/DeccanHerald", 
        "", 
        "https://x.com/TheStatesmanLtd", 
        "https://x.com/ThePrintIndia", 
        "https://x.com/htTweets", 
        "https://x.com/thewire_in", 
        "https://x.com/newslaundry", 
        "https://x.com/IndiaToday", 
        "https://x.com/newsclickin", 
        "https://x.com/scroll_in"
    ],
    "Facebook Pages": [
        "", 
        "https://www.facebook.com/indianexpress", 
        "https://www.facebook.com/thehindu", 
        "https://www.facebook.com/dainikbhaskar", 
        "https://www.facebook.com/timesdalit", 
        "https://www.facebook.com/TimesofIndia", 
        "",  
        "https://www.facebook.com/ambedkariteIND",
        "https://www.facebook.com/ndtv",
        "https://www.facebook.com/mint.live",
        "https://www.facebook.com/deccanherald",
        "",
        "https://www.facebook.com/thestatesman1875",
        "https://www.facebook.com/theprintindia",
        "https://www.facebook.com/hindustantimes",
        "https://www.facebook.com/thewire.in",
        "https://www.facebook.com/newslaundry",
        "https://www.facebook.com/IndiaToday",
        "https://www.facebook.com/newsclickonline",
        "https://www.facebook.com/scroll.in"
    ]
}

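# Create a DataFrame (pandas raises ValueError if the column lists differ in length)
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file, creating data/raw/ first if it is missing
import os

csv_file_path = 'data/raw/news_sources.csv'
os.makedirs(os.path.dirname(csv_file_path), exist_ok=True)
df.to_csv(csv_file_path, index=False)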

df


,News Platforms,RSS Feeds,Twitter Handles,Facebook Pages
0,Google News RSS Feed,https://rss.app/rss-feed/google-news-rss-feed,,
1,Indian Express,https://indianexpress.com/section/india/crime/,https://x.com/IndianExpress,https://www.facebook.com/indianexpress
2,The Hindu,https://www.thehindu.com/rssfeeds/,https://x.com/the_hindu,https://www.facebook.com/thehindu
3,Dainik Bhaskar,https://www.bhaskar.com/rss,https://x.com/DainikBhaskar,https://www.facebook.com/dainikbhaskar
4,Dalit Times,,https://x.com/DalitTime,https://www.facebook.com/timesdalit
5,Times of India,https://timesofindia.indiatimes.com/rss.cms,https://x.com/timesofindia,https://www.facebook.com/TimesofIndia
6,Dalit Archive,,https://x.com/dalitarchive,
7,Ambedkarite India,,https://x.com/ambedkariteIND,https://www.facebook.com/ambedkariteIND
8,NDTV,https://www.ndtv.com/rss,https://x.com/ndtv,https://www.facebook.com/ndtv
9,LiveMint,https://www.livemint.com/rss,https://x.com/livemint,https://www.facebook.com/mint.live
