<a href="https://colab.research.google.com/github/deedeeharris/scripts/blob/main/tg_get_subscribers_count_from_channels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Telegram Channel Subscribers Scraper

## Overview

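This Python script reads Telegram channel links from a CSV or Excel file and scrapes each channel's subscriber count from its public page. It uses Requests to fetch each page, BeautifulSoup to parse the HTML, and Pandas to read the input and write the results. The count is read from the `div` element with the class `tgme_page_extra`. If that element is missing, parsing fails, or the page returns a non-200 status, the script records `N/A`, `Error`, or a status message for that channel instead of stopping.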

By: [Yedidya Harris](https://www.linkedin.com/in/yedidya-harris/), 01/2024
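
To illustrate the extraction step, here is a minimal sketch of how the text inside `tgme_page_extra` can be reduced to a number. It assumes the element's text looks like `12 345 subscribers`, with digit groups separated by whitespace. The helper name `digits_from_text` and the sample strings are hypothetical and not part of the script.

```python
def digits_from_text(text):
    """Join the whitespace-separated tokens of `text` that consist only of digits."""
    # "12 345 subscribers" -> tokens ["12", "345", "subscribers"] -> "12345"
    return ''.join(token for token in text.split() if token.isdigit())


# Hypothetical sample strings, assumed to resemble the text on a channel page.
print(digits_from_text("12 345 subscribers"))    # prints 12345
print(digits_from_text("1 subscriber"))          # prints 1
print(digits_from_text("@some_username") == "")  # prints True (no digit tokens)
```

Because non-digit tokens are dropped, a page without a numeric count yields an empty string rather than an error.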

## How to Use

1. **Upload a CSV or Excel file** containing a column named 'Telegram_Link' with the Telegram channel links.

    **Example Input File Structure:**

| Telegram_Link                  |
| ------------------------------ |
| https://t.me/channel1           |
| https://t.me/channel2           |
| https://t.me/channel3           |
| https://t.me/channel4           |
| https://t.me/channel5           |

2. **Run the script** by clicking the 'Play' button (to the left of the words 'Show code'), then choose your file when the upload prompt appears. The script retrieves and displays the subscriber counts.

3. The processed data is saved to an Excel file, which is then **automatically downloaded**.
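
If you do not have an input file yet, one way to create it is sketched below using Python's standard `csv` module. The file name `channels.csv` and the channel links are placeholders. Only the `Telegram_Link` column header is required by the script.

```python
import csv

# Placeholder links; replace them with real channel URLs.
links = [
    "https://t.me/channel1",
    "https://t.me/channel2",
    "https://t.me/channel3",
]

# Write a one-column CSV whose header matches the column the script reads.
with open("channels.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Telegram_Link"])
    writer.writerows([link] for link in links)

# Read the file back to confirm its structure.
with open("channels.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows[0])        # header row
print(len(rows) - 1)  # number of links written
```

The resulting `channels.csv` can then be selected in the upload prompt.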


In [None]:
#@title Run me by clicking on the 'Play' button.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
from IPython.display import display, HTML
from google.colab import files

def get_subscribers_count(channel_link, headers):
    """Return the subscriber count shown on a channel page as a string of digits."""
    try:
        # a timeout keeps one unresponsive link from stalling the whole run
        response = requests.get(channel_link, headers=headers, timeout=10)
    except requests.RequestException as e:
        # handle network failures and invalid URLs
        print(f"Error while requesting {channel_link}: {e}")
        return 'Error'

    if response.status_code != 200:
        return f"Failed to retrieve the page. Status code: {response.status_code}"

    try:
        soup = BeautifulSoup(response.text, 'html.parser')
        subscribers_element = soup.find('div', class_='tgme_page_extra')
        if subscribers_element is None:
            # handle the case when the specified class is not found
            print(f"Warning: 'tgme_page_extra' class not found on {channel_link}")
            return 'N/A'
        # keep only the all-digit tokens, e.g. "12 345 subscribers" -> "12345"
        return ''.join(s for s in subscribers_element.get_text(strip=True).split() if s.isdigit())
    except Exception as e:
        # handle any other exceptions that may occur during parsing
        print(f"Error while parsing {channel_link}: {e}")
        return 'Error'


def process_telegram_links(file_name):
    if file_name.endswith('.csv'):
        df = pd.read_csv(file_name)
    elif file_name.endswith('.xlsx'):
        df = pd.read_excel(file_name)
    else:
        raise ValueError("Unsupported file type. Supported types: .csv, .xlsx")

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    df['Subscribers'] = ''

    for index, row in df.iterrows():
        channel_link = row['Telegram_Link']
        subscribers_count = get_subscribers_count(channel_link, headers)
        df.at[index, 'Subscribers'] = subscribers_count
        time.sleep(0.5)

    display(HTML(df.to_html()))

    output_file_path = 'output_subscribers_counts.xlsx'
    df.to_excel(output_file_path, index=False)

    files.download(output_file_path)
    print(f"\nDataFrame with subscriber counts saved and downloaded: {output_file_path}")

uploaded_file = files.upload()

file_name = list(uploaded_file.keys())[0]

# process the uploaded file
process_telegram_links(file_name)