# Retrieving information with `telethon`
## by Alexander Zhamoydin

Hello everyone! In this tutorial we are going to scrape some information from **Telegram** with the help of the `telethon` package. See the <a href="https://docs.telethon.dev/en/latest/">documentation</a>.

## Preparation

Before we begin, you have to create a Telegram app: we need an `api_id` and an `api_hash` to be able to use the Telegram API. Here are the steps you should follow:

- Visit https://my.telegram.org/auth?to=apps and enter your phone number. You will then receive a confirmation code

- After you receive and enter the code, give your app a name, choose its type and write a description.
    You will see something like this:
    <img src="https://media.proglib.io/posts/2019/11/02/978bb3286b84b1b487f0a0c6afc0398b.png">

- After you click `Create application`, Telegram will generate and show you your `App api_id` and `App api_hash`. Copy them and assign them to the corresponding variables.
    Also add your username here

In [None]:
API_ID = "YOUR_API_HERE"
API_HASH = "YOUR_HASH_HERE"
USERNAME = "YOUR_USERNAME_HERE"
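
If you would rather not keep credentials inside the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names `TELEGRAM_API_ID`, `TELEGRAM_API_HASH` and `TELEGRAM_USERNAME` and the helper `load_credentials` are hypothetical, and a plain dictionary stands in for the real environment so the cell runs anywhere.

```python
import os


def load_credentials(env=os.environ):
    """Read Telegram credentials from a mapping of environment variables.

    The variable names used here are hypothetical; pick any names you like.
    """
    api_id = int(env["TELEGRAM_API_ID"])  # the api_id issued by Telegram is a number
    api_hash = env["TELEGRAM_API_HASH"]
    username = env["TELEGRAM_USERNAME"]
    return api_id, api_hash, username


# Demonstration with a plain dict standing in for os.environ (made-up values)
sample_env = {
    "TELEGRAM_API_ID": "12345",
    "TELEGRAM_API_HASH": "0123456789abcdef",
    "TELEGRAM_USERNAME": "my_username",
}
demo_id, demo_hash, demo_user = load_credentials(sample_env)
print(demo_id, demo_user)
```

With real environment variables set, calling `load_credentials()` without arguments would read from `os.environ` instead.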

- Next, we install the package

In [None]:
!pip install telethon

## Writing code

- First, we import the libraries

In [None]:
import json
from datetime import date, datetime

from telethon.sync import TelegramClient
from telethon import connection
from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from telethon.tl.functions.messages import GetHistoryRequest

- We create the client with the `TelegramClient` class, passing in the variables we defined earlier. The first argument is the session name, so the login session is saved in a file named after it

In [None]:
client = TelegramClient(USERNAME, API_ID, API_HASH)

- We call `.start()`. It returns a coroutine, so we add `await`. More on coroutines: <a href="https://docs.python.org/3/library/asyncio-task.html">Python docs</a>
    You will be asked to enter your phone number and then the confirmation code. After that you will be signed in

In [None]:
await client.start()
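
A notebook lets us write `await` at the top level of a cell. In a standalone `.py` script there is no running event loop, so the same calls have to live inside an `async` function that is started with `asyncio.run`. Here is a minimal sketch of that pattern, meant to be run as a script rather than in a notebook cell; `fake_start` is a hypothetical stand-in for `client.start()` and does not talk to Telegram.

```python
import asyncio


async def fake_start():
    """Hypothetical stand-in for client.start(); it only simulates waiting."""
    await asyncio.sleep(0)
    return "signed in"


async def main():
    # Inside a coroutine, await suspends until the awaited coroutine finishes
    status = await fake_start()
    return status


# A plain script has no running event loop, so we start one explicitly
result = asyncio.run(main())
print(result)
```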

- Here we define the URL of the channel we want to scrape and "connect" to this channel

In [None]:
URL = "YOUR URL HERE"

In [None]:
await client.get_entity(URL)

- We define a function that collects information about all users in the channel. It builds a list of dictionaries and then writes them to a single JSON file

In [None]:
async def dump_all_participants(channel):
    offset_user = 0
    limit_user = 100

    all_participants = []
    # Created before the loop so it exists even if the channel returns no users
    all_users_details = []
    filter_user = ChannelParticipantsSearch('')

    while True:
        participants = await client(GetParticipantsRequest(
            channel, filter_user, offset_user, limit_user, hash=0))
        if not participants.users:
            break
        all_participants.extend(participants.users)
        offset_user += len(participants.users)

    # Here you can get any info you want, read more in docs
    for participant in all_participants:
        all_users_details.append({
            "id": participant.id,
            "first_name": participant.first_name,
            "last_name": participant.last_name,
            "user": participant.username,
            "phone": participant.phone,
            "is_bot": participant.bot
        })

    with open('channel_users.json', 'w', encoding='utf8') as outfile:
        json.dump(all_users_details, outfile, ensure_ascii=False)
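
The loop in `dump_all_participants` is ordinary offset-based pagination: ask for a page, stop when the page is empty, otherwise advance the offset by the number of items received. The sketch below isolates that pattern without any network access. `fetch_page` and `collect_all` are hypothetical helpers, and a plain list plays the role of the channel.

```python
def fetch_page(source, offset, limit):
    """Hypothetical stand-in for a paged API call: returns one slice of `source`."""
    return source[offset:offset + limit]


def collect_all(source, limit=100):
    offset = 0
    collected = []
    while True:
        page = fetch_page(source, offset, limit)
        if not page:
            break  # an empty page means we have reached the end
        collected.extend(page)
        offset += len(page)  # advance by what we actually received
    return collected


fake_users = [f"user_{i}" for i in range(250)]
all_users = collect_all(fake_users)
print(len(all_users))
```

Advancing by `len(page)` rather than by `limit` keeps the loop correct when the last page is shorter than the others.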

- Next we define a function that scrapes messages. Before that, we define a JSON encoder class that knows how to serialize dates and bytes

In [None]:
class DateTimeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, bytes):
            return list(o)
        return json.JSONEncoder.default(self, o)
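
To see what the encoder does, we can try it on a small hand-made dictionary. The class is repeated here only so that the cell runs on its own; the sample values are made up.

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    # Same idea as the class above, repeated so this cell is self-contained
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()  # datetime -> ISO 8601 string
        if isinstance(o, bytes):
            return list(o)  # bytes -> list of integers
        return super().default(o)


sample = {"date": datetime(2020, 1, 2, 3, 4, 5), "raw": b"\x01\x02"}
encoded = json.dumps(sample, cls=DateTimeEncoder)
print(encoded)  # {"date": "2020-01-02T03:04:05", "raw": [1, 2]}
```

Without `cls=DateTimeEncoder`, `json.dumps` would raise a `TypeError` on the `datetime` value.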

In [None]:
async def dump_all_messages(channel, max_messages=-1):
    offset_msg = 0
    limit_msg = 100

    all_messages = []   # list of all messages
    total_messages = 0

    while True:
        history = await client(GetHistoryRequest(
            peer=channel,
            offset_id=offset_msg,
            offset_date=None, add_offset=0,
            limit=limit_msg, max_id=0, min_id=0,
            hash=0))
        if not history.messages:
            break
        messages = history.messages
        for message in messages:
            # here you can edit what properties you want to save
            all_messages.append(message.to_dict())
        offset_msg = messages[-1].id
        total_messages = len(all_messages)
        if max_messages != -1 and total_messages >= max_messages:
            break

    with open('channel_messages.json', 'w', encoding='utf8') as outfile:
        json.dump(all_messages, outfile, ensure_ascii=False, cls=DateTimeEncoder)
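
Unlike the participants loop, `dump_all_messages` pages by message id: each request starts from the id of the oldest message received so far. The sketch below imitates that with plain integers, assuming (as I understand the history request) that pages come newest first and contain only messages older than `offset_id`. `fetch_history` and `collect_messages` are hypothetical helpers, not part of `telethon`.

```python
def fetch_history(message_ids, offset_id, limit):
    """Hypothetical stand-in for a history request.

    Returns up to `limit` ids, newest first, that are older than `offset_id`.
    An `offset_id` of 0 means "start from the newest message".
    """
    newest_first = sorted(message_ids, reverse=True)
    if offset_id:
        newest_first = [i for i in newest_first if i < offset_id]
    return newest_first[:limit]


def collect_messages(message_ids, max_messages=-1, limit=100):
    offset_id = 0
    collected = []
    while True:
        page = fetch_history(message_ids, offset_id, limit)
        if not page:
            break
        collected.extend(page)
        offset_id = page[-1]  # oldest id seen so far
        if max_messages != -1 and len(collected) >= max_messages:
            break
    return collected


fake_ids = list(range(1, 351))
first_200 = collect_messages(fake_ids, max_messages=200)
everything = collect_messages(fake_ids)
print(len(first_200), len(everything))
```

Note that the cap is checked only after a whole page has been added, so with a page size of 100 a call with `max_messages=150` would still return 200 items.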

- Now let's run our functions

In [None]:
await dump_all_participants(URL)
await dump_all_messages(URL, 200)

# Thanks for your attention (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

## P.S. This tutorial is largely a translation of this article: <a href="https://proglib.io/p/pishem-prostoy-grabber-dlya-telegram-chatov-na-python-2019-11-06">Tutorial</a>