# Telegram

## About Telegram
This is a section about Telegram.

## Prerequisites
Complete the following steps before writing a script that collects data from Telegram.
### Create a new folder
This tutorial is going to use 2 files: the Python script, in a .py file, and a .env file to store credentials. Create a new folder somewhere on your computer to store these files.

Once you've created the folder, create 2 files in it, "script.py" and ".env", so that your directory tree looks something like this:

```
-- Project Folder
   | script.py
   | .env
```
### Telegram Credentials
We're going to need an api_id and an api_hash value from Telegram in order to access its API. These are granted to users that register an app with Telegram.

First, [create a Telegram account](https://telegram.org/sign-up) if you don't already have one.

Then go to [my.telegram.org](https://my.telegram.org/), log in, and press the "API development tools" button. Copy the api_id and api_hash values into your .env file, along with the phone number of your Telegram account, to use later.

![TELEGRAM APP EXAMPLE](images/telegram1.png)

In **.env**:
```Python
TELEGRAM_API_ID = "<your telegram api id>"
TELEGRAM_API_HASH = "<your telegram api hash>"
PHONE_NUMBER = "<your phone number (starting with +)>"
```

## Installing Telethon
The library that we're going to use to scrape messages from Telegram channels is called [Telethon](https://docs.telethon.dev/en/stable/). We'll also use the [dotenv](https://pypi.org/project/python-dotenv/) library to load credentials from the .env file.

To install Telethon and dotenv, run the following commands in your terminal:
```bash
pip install telethon
pip install python-dotenv
```

This installs both packages so we can use them on our local machine.

```{note}
When installing on your machine, use "pip" on Windows or "pip3" on macOS. If you need more guidance, check out [this guide](https://python.land/virtual-environments/installing-packages-with-pip).
```
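
To confirm that both packages can be imported by the interpreter you plan to run the script with, a quick check along these lines can help. This is a minimal sketch that uses only the standard library; the helper name missing_packages is our own and not part of Telethon or dotenv.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# "telethon" and "dotenv" are the import names of the two packages
# (python-dotenv is imported as "dotenv").
missing = missing_packages(["telethon", "dotenv"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```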

## Basics
Now that we have the credentials ready and Telethon installed, we can start writing the script. The first step is to import the TelegramClient class from the telethon package so we can use it in our script:

```python
# script.py

from telethon import TelegramClient
```

Next, we need to load the variables that we stored in the .env file earlier. To do this, we'll use the dotenv library that we installed. Use the following code to load them:

```python
from dotenv import load_dotenv
import os

# Parse the .env file and load its variables into the environment
# of the current process so that os.getenv() can read them.
load_dotenv()

TELEGRAM_API_ID = os.getenv("TELEGRAM_API_ID")
TELEGRAM_API_HASH = os.getenv("TELEGRAM_API_HASH")
PHONE_NUMBER = os.getenv("PHONE_NUMBER")
```
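
Note that os.getenv returns None when a variable is missing, which tends to show up later as a confusing login error. A small guard like the one below fails early instead. This is only a sketch: require_env is a hypothetical helper name, and EXAMPLE_VARIABLE is set by hand purely for demonstration.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value


# Demonstration with a variable set by hand (not one of the tutorial's variables).
os.environ["EXAMPLE_VARIABLE"] = "12345"
print(require_env("EXAMPLE_VARIABLE"))
```

In script.py, the same helper could be called as require_env("TELEGRAM_API_ID") after load_dotenv() has run.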

The TelegramClient class is how we will interface with Telegram. Its constructor takes 3 arguments, which we pass as strings:
```python
TelegramClient(<session_name>, <telegram_api_id>, <telegram_api_hash>)
```
The first argument is the session name. The Telethon library stores each login in a session file. This session file stores login information so that the API user only has to go through the login process once. Any string works, as long as it's consistent whenever you create a new TelegramClient object with the same Telegram account. It makes sense to use the username of the account the session will be associated with as a session name to keep things simple.

The last 2 arguments should look familiar: those are the login credentials stored in the .env file.
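
To see whether a session already exists for a given name, you can look for the session file on disk. The sketch below assumes Telethon's default behavior of saving the session as "<session_name>.session" in the working directory; session_file is a hypothetical helper, not a Telethon function.

```python
from pathlib import Path


def session_file(session_name, folder="."):
    """Path where a session called `session_name` is expected to be stored."""
    return Path(folder) / f"{session_name}.session"


path = session_file("simpatbos")
if path.exists():
    print(f"Found {path}; the login step should be skipped.")
else:
    print(f"No {path} yet; expect a login prompt on the first run.")
```

Because the session file stores login information, it is worth keeping it out of shared folders and version control, just like the .env file.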

The TelegramClient object allows us to collect data using the credentials we obtained from [my.telegram.org](https://my.telegram.org). To start, we can request information about our account to verify that our credentials are working.

```python
client = TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH)

async def main():
    await client.connect()  # open a connection to Telegram

    # Without a valid session file, request a verification code and log in
    if not await client.is_user_authorized():
        print("Sending code...")
        await client.send_code_request(PHONE_NUMBER)
        code = input("Enter the code you received: ")
        await client.sign_in(PHONE_NUMBER, code)

    me = await client.get_me()  # get_me() returns info about the logged-in user
    print(me.username)
    await client.disconnect()  # disconnect() must be awaited inside async code

# Top-level await works in Jupyter/IPython. In a plain script.py, run
# client.loop.run_until_complete(main()) instead.
await main()
```

```
simpatbos
```


```{note}
The "async" and "await" keywords are necessary because the client's methods are coroutines. The await keyword pauses execution of the current function until the awaited call has finished. The async keyword marks a function as a coroutine, which allows it to contain await statements.
```

After inputting the login code, the script output "simpatbos", the username of the author. Our script works!
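
The async and await keywords used in the script can be tried out without Telegram at all. The following sketch uses only the standard asyncio module; fetch_value is a made-up stand-in for a network call such as client.get_me().

```python
import asyncio


async def fetch_value(delay, value):
    """Stand-in for a network call: wait `delay` seconds, then return `value`."""
    await asyncio.sleep(delay)
    return value


async def main():
    # await suspends main() here until fetch_value() has finished
    first = await fetch_value(0.01, "first")
    second = await fetch_value(0.01, "second")
    return [first, second]


# asyncio.run starts an event loop, runs main() to completion, and closes the loop
result = asyncio.run(main())
print(result)
```

In a Jupyter notebook an event loop is already running, so there you would write result = await main() instead of calling asyncio.run.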

## Collecting Data

Now that we're logged in, we can start collecting data using the methods of the client object.

### Getting a Channel Object

To get data from a channel, we first have to obtain an object that represents the channel. To do this, we'll use the following method:
```python
client.get_entity()
```

It returns an entity (Telethon's term for a User, Chat or Channel object) and takes 1 argument: the name, id, or link of a user, chat or channel.

**Example:** Getting the official New York Times channel. The channel is called "nytimes".

```python
async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_name = "nytimes"
    nytimes_channel = await client.get_entity(channel_name)
    print(nytimes_channel.title)
```

```
The New York Times
```


```{note}
Try printing the nytimes_channel object instead of the nytimes_channel.title attribute. You can observe all the attributes that are available for data collection.
```
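
If the name you pass to get_entity does not exist, the call fails with an exception. One way to handle that is sketched below. It assumes the failure surfaces as a ValueError, which is worth verifying against your Telethon version; get_entity_or_none and FakeClient are hypothetical names, and FakeClient only stands in for TelegramClient so the example runs without a login.

```python
import asyncio


async def get_entity_or_none(client, identifier):
    """Return the entity for `identifier`, or None if the lookup fails."""
    try:
        return await client.get_entity(identifier)
    except ValueError:
        # Assumption: an unknown name is reported as ValueError
        return None


class FakeClient:
    """Stand-in for TelegramClient; knows only the "nytimes" channel."""

    async def get_entity(self, identifier):
        if identifier != "nytimes":
            raise ValueError(f"Cannot find any entity corresponding to {identifier!r}")
        return {"title": "The New York Times"}


found = asyncio.run(get_entity_or_none(FakeClient(), "nytimes"))
missing = asyncio.run(get_entity_or_none(FakeClient(), "no_such_channel"))
print(found, missing)
```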

### Collecting Chat Messages
To get the messages from a given channel, we'll use the following method:
```python
client.iter_messages()
```
This method takes the name, id or link of a channel and returns an asynchronous iterator over its messages, newest first. It also takes a limit argument, which sets the maximum number of messages to retrieve. (You can omit the limit argument to get all messages.)

**Example:** Getting the 10 most recent messages from the official New York Times channel.

```python
async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_name = "nytimes"
    limit = 10
    async for message in client.iter_messages(channel_name, limit=limit):
        print(message.date)
```

```
2025-04-02 19:19:03+00:00
2025-04-01 18:39:58+00:00
2025-03-31 19:59:24+00:00
2025-03-28 19:23:00+00:00
2025-03-27 19:47:07+00:00
2025-03-26 19:32:46+00:00
2025-03-25 18:46:01+00:00
2025-03-24 19:08:03+00:00
2025-03-21 19:30:57+00:00
2025-03-20 19:27:00+00:00
```
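
Printing dates is fine for a quick check, but for analysis you will usually want to keep the messages. The sketch below turns a message into a plain dictionary and writes a list of such rows as CSV. It assumes the message object exposes id, date and text attributes; message_to_row and rows_to_csv are hypothetical helpers, and the sample message is a hand-made stand-in rather than real data.

```python
import csv
import io
from datetime import datetime, timezone
from types import SimpleNamespace


def message_to_row(message):
    """Pick a few fields from a message-like object."""
    return {
        "id": message.id,
        "date": message.date.isoformat(),
        "text": message.text or "",  # media-only messages may have no text
    }


def rows_to_csv(rows):
    """Serialize a list of row dicts to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "date", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-made stand-in for a message, so the sketch runs without a login
sample = SimpleNamespace(
    id=1,
    date=datetime(2025, 4, 2, 19, 19, 3, tzinfo=timezone.utc),
    text="Hello",
)
rows = [message_to_row(sample)]
print(rows_to_csv(rows))
```

Inside the async for loop, you would call rows.append(message_to_row(message)) instead of print(message.date).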


#### Search Queries

You can also retrieve only the messages that match a search query by passing the search argument.

**Example:** Getting all messages from the official New York Times channel that include the word "ran"

```python
async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_name = "nytimes"
    query = "ran"
    async for message in client.iter_messages(channel_name, search=query):
        print(message.date)
```

```
2023-04-07 01:36:51+00:00
2022-05-22 15:28:13+00:00
2022-04-21 22:35:11+00:00
2022-04-12 20:44:45+00:00
2022-03-28 05:58:34+00:00
```

#### Date offset
You can also retrieve only the messages that were sent before a certain date with the offset_date argument. This argument is a [datetime](https://www.w3schools.com/python/python_datetime.asp) object. The date given is NOT included in the returned messages.

**Example:** Getting all messages from the official New York Times channel that include the word "Trump" from before April 2, 2024.

```python
import datetime

async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_name = "nytimes"
    query = "Trump"
    before = datetime.datetime(2024, 4, 2)
    async for message in client.iter_messages(channel_name, search=query, offset_date=before):
        print(message.date)
```

```
2024-02-12 20:20:18+00:00
2023-11-20 20:33:54+00:00
2023-11-16 20:53:51+00:00
2023-08-14 19:39:56+00:00
2023-03-31 19:10:12+00:00
2022-05-20 21:19:43+00:00
2022-05-13 09:25:18+00:00
```
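
The offset_date argument only sets an upper bound. To also apply a lower bound, you can compare each message's date yourself. The dates printed above carry a +00:00 offset, meaning they are timezone-aware, so the bounds must be timezone-aware too: Python raises a TypeError when comparing an aware datetime with a naive one. The sketch below uses made-up sample dates; in_window is a hypothetical helper.

```python
from datetime import datetime, timezone


def in_window(message_date, start, end):
    """True if start <= message_date < end (all timezone-aware datetimes)."""
    return start <= message_date < end


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 4, 2, tzinfo=timezone.utc)

# Stand-ins for message.date values
sample_dates = [
    datetime(2024, 2, 12, 20, 20, 18, tzinfo=timezone.utc),
    datetime(2023, 11, 20, 20, 33, 54, tzinfo=timezone.utc),
]
kept = [d for d in sample_dates if in_window(d, start, end)]
print(kept)
```

Because messages arrive newest first, a loop over iter_messages can stop with break as soon as message.date falls below start.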


### Getting a Message's Sender
The message object contains a reference to the sender object. You can obtain information about the sender of the message with this object.

**Example:** Getting the names of the senders of the last 10 messages in the "Python" Telegram group (with id "Python")

```python
async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_id = "Python"
    limit = 10
    async for message in client.iter_messages(channel_id, limit=limit):
        # first_name and last_name may be None, so fall back to an empty string
        print(message.sender.first_name or "",
              message.sender.last_name or "")
```

```
cj 🚀
Артур Фурт
علي
Charly Román
George 392.85 °C
ȺʍìɾⱮօհąʍʍąժ
علي
Bᴏᴛss
Doragonsureiyā
Bᴏᴛss
```
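
Not every message has a user as its sender. The sender can be missing, or it can be a channel, which has a title rather than a first and last name. A more defensive way to build a printable name is sketched below; display_name is a hypothetical helper, and the sample senders are hand-made stand-ins. Telethon's utils module also offers get_display_name for a similar purpose.

```python
from types import SimpleNamespace


def display_name(sender):
    """Build a printable name for a user-like or channel-like object."""
    if sender is None:
        return "(unknown sender)"
    title = getattr(sender, "title", None)  # channels carry a title
    if title:
        return title
    parts = [getattr(sender, "first_name", None), getattr(sender, "last_name", None)]
    return " ".join(part for part in parts if part) or "(no name)"


user = SimpleNamespace(first_name="Charly", last_name="Román")
channel = SimpleNamespace(title="The New York Times")
print(display_name(user))
print(display_name(channel))
print(display_name(None))
```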


### Getting Participants in a Channel
The following code demonstrates how to iterate over the participants of the Python Telegram group.
```{note}
Changes to the Telegram API have made this iterator much less useful for large groups. As you can see, the following code only gets 16 users even though the group had 100,000+ participants at the time of writing.
```

```python
async with TelegramClient("simpatbos", TELEGRAM_API_ID, TELEGRAM_API_HASH) as client:
    channel_id = "Python"
    async for user in client.iter_participants(channel_id):
        # first_name and last_name may be None, so fall back to an empty string
        print(user.first_name or "",
              user.last_name or "")
```

```
cj 🚀
clouded
Maxim N
Charly Román
Inuk Syooperstar
Артём🇷🇺
steve raphael
George 392.85 °C
Samuel Bautista
Doragonsureiyā
Vivian - vivian@Bleachdrinker
Rohan K
⟨ Simon | Schürrle ⟩
jcjordyn120
inchidi
Fede
```
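
Since the participant iterator returns so few users, one possible workaround is to collect the distinct senders of recent messages instead. This only finds users who have actually posted, so it is an approximation and not a member list. The sketch below assumes each message exposes a sender_id attribute; unique_sender_ids is a hypothetical helper, and fake_messages stands in for client.iter_messages so the example runs without a login.

```python
import asyncio
from types import SimpleNamespace


async def unique_sender_ids(messages):
    """Collect distinct sender_id values from an async iterator of messages."""
    seen = set()
    async for message in messages:
        if message.sender_id is not None:
            seen.add(message.sender_id)
    return seen


async def fake_messages():
    """Stand-in for client.iter_messages(...), yielding made-up sender ids."""
    for sender_id in [101, 202, 101, None, 303]:
        yield SimpleNamespace(sender_id=sender_id)


ids = asyncio.run(unique_sender_ids(fake_messages()))
print(sorted(ids))
```

With a real client, the call would look like await unique_sender_ids(client.iter_messages(channel_id, limit=1000)), where the limit of 1000 is an arbitrary choice.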
