**Article:**
https://www.learndatasci.com/tutorials/how-stream-text-data-twitch-sockets-python/m

# Twitch Scraping Bot

Get the notebook and scripts for this article at:
https://github.com/LearnDataSci/articles/tree/master/How%20to%20Stream%20Text%20Data%20from%20Twitch%20with%20Sockets%20in%20Python

#### Getting your credentials

Go to https://twitchapps.com/tmi/ to request an auth token for your Twitch account. You'll need to click "Connect with Twitch" and "Authorize" to produce a token.

**channel** is the streamer's channel name prefixed with #; it can be any channel you're interested in.

In [347]:
server = 'irc.chat.twitch.tv'
port = 6667
nickname = 'mck_ix'
token = 'oauth:ltnrj0wblnumqh17kjgckqu6v60arp'
channel = '#ponce'
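
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `TWITCH_TOKEN`, `TWITCH_NICKNAME` and `TWITCH_CHANNEL` and the helper `load_credentials` are assumptions made for illustration; they are not part of the original article.

```python
import os


def load_credentials(env=os.environ):
    """Read Twitch credentials from environment variables (hypothetical names)."""
    token = env.get("TWITCH_TOKEN", "")
    nickname = env.get("TWITCH_NICKNAME", "")
    channel = env.get("TWITCH_CHANNEL", "")

    # The PASS command expects the token to carry the "oauth:" prefix
    if token and not token.startswith("oauth:"):
        token = "oauth:" + token
    # IRC channel names start with "#"
    if channel and not channel.startswith("#"):
        channel = "#" + channel
    # Twitch channel names are written in lowercase
    return token, nickname, channel.lower()
```

Calling `token, nickname, channel = load_credentials()` would then replace the literal assignments in the cell above.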

#### Connecting to Twitch with sockets
To establish a connection to Twitch IRC we'll be using Python's socket library. First we instantiate a socket and connect it to the server:

In [348]:
import socket
sock = socket.socket()
sock.connect((server, port))

Once connected, we authenticate and join a channel by sending three IRC commands over the socket:

- PASS carries our token
- NICK carries our username
- JOIN carries the channel

Note that we send encoded strings by calling .encode('utf-8'). This converts each string to bytes, which is what the socket actually transmits.

In [32]:
sock.send(f"PASS {token}\n".encode('utf-8'))
sock.send(f"NICK {nickname}\n".encode('utf-8'))
sock.send(f"JOIN {channel}\n".encode('utf-8'))

12
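
The `12` printed above is the return value of the last `send` call: the number of bytes written for the JOIN line. The sketch below wraps the three commands in a hypothetical helper, `build_login_commands`, and ends each line with `\r\n`, the terminator the IRC protocol specifies (the plain `\n` used in the notebook also worked in the session shown). `login` is likewise an illustrative name; it uses `sendall`, which keeps writing until every byte has been sent.

```python
def build_login_commands(token, nickname, channel):
    """Return the PASS, NICK and JOIN lines as bytes, in the order they are sent."""
    lines = [f"PASS {token}", f"NICK {nickname}", f"JOIN {channel}"]
    # IRC messages are terminated by CRLF
    return [(line + "\r\n").encode("utf-8") for line in lines]


def login(sock, token, nickname, channel):
    """Send the three login commands over an already connected socket."""
    for command in build_login_commands(token, nickname, channel):
        # Unlike send(), sendall() retries until the whole command is written
        sock.sendall(command)
```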

#### Receiving channel messages

Now we have successfully connected and can receive responses from the channel we subscribed to. To get a single response we can call **.recv()** and then decode the message from bytes:

In [33]:
resp = sock.recv(2048).decode('utf-8')

resp

':tmi.twitch.tv 001 mck_ix :Welcome, GLHF!\r\n:tmi.twitch.tv 002 mck_ix :Your host is tmi.twitch.tv\r\n:tmi.twitch.tv 003 mck_ix :This server is rather new\r\n:tmi.twitch.tv 004 mck_ix :-\r\n:tmi.twitch.tv 375 mck_ix :-\r\n:tmi.twitch.tv 372 mck_ix :You are in a maze of twisty passages, all alike.\r\n:tmi.twitch.tv 376 mck_ix :>\r\nPING :tmi.twitch.tv\r\n'

In [36]:
resp = sock.recv(2048).decode('utf-8')

resp

':huranthil!huranthil@huranthil.tmi.twitch.tv PRIVMSG #ponce :Mais Wingo il se défend de ouf ? Quel crack haha ponceOVERTOAD\r\n:randoche!randoche@randoche.tmi.twitch.tv PRIVMSG #ponce :GEOZONE\r\n'
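
Both responses above contain several IRC messages in a single `recv` result, separated by `\r\n`. A small helper can break a response into individual messages before they are logged or parsed. This is only a sketch: the name `split_irc_lines` is hypothetical and the sample text is abbreviated from the output above.

```python
def split_irc_lines(resp):
    """Split one decoded recv() payload into individual IRC messages."""
    # Each message ends with CRLF; drop the empty string left after the last one
    return [line for line in resp.split("\r\n") if line]


sample = (
    ":huranthil!huranthil@huranthil.tmi.twitch.tv PRIVMSG #ponce :Mais Wingo\r\n"
    ":randoche!randoche@randoche.tmi.twitch.tv PRIVMSG #ponce :GEOZONE\r\n"
)

for line in split_irc_lines(sample):
    print(line)
```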

In [37]:
# sock.close()

#### Writing messages to a file

In [38]:
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s — %(message)s',
                    datefmt='%Y-%m-%d_%H:%M:%S',
                    handlers=[logging.FileHandler('chat.log', encoding='utf-8')])

In [39]:
logging.info(resp)

#### Continuous message writing

In [40]:
from emoji import demojize

In [41]:
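while True:
    resp = sock.recv(2048).decode('utf-8')

    # Twitch sends PING periodically; answer with a matching PONG
    # so the server does not close the connection
    if resp.startswith('PING'):
        sock.send("PONG :tmi.twitch.tv\n".encode('utf-8'))

    # Log everything else, with emoji converted to text aliases
    elif len(resp) > 0:
        logging.info(demojize(resp))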

KeyboardInterrupt: 
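
The loop above assumes every `recv` call returns whole messages and that a PING arrives at the start of a response. The first response shown earlier ends with `PING :tmi.twitch.tv` instead, and a long message can be cut in two by the 2048-byte limit. A minimal sketch of a buffered variant follows; `extract_lines` and `handle_chunk` are hypothetical helpers, and `send` and `log` are stand-ins for `sock.sendall` and `logging.info`.

```python
def extract_lines(buffer):
    """Split a byte buffer into complete CRLF-terminated lines plus a remainder."""
    # Everything after the last CRLF is an incomplete line and stays buffered
    *complete, remainder = buffer.split(b"\r\n")
    lines = [line.decode("utf-8", errors="replace") for line in complete]
    return lines, remainder


def handle_chunk(buffer, chunk, send, log):
    """Process one recv() chunk and return the bytes still waiting for a CRLF."""
    lines, buffer = extract_lines(buffer + chunk)
    for line in lines:
        if line.startswith("PING"):
            # Answer the keepalive, wherever it appeared in the chunk
            send(b"PONG :tmi.twitch.tv\r\n")
        elif line:
            log(line)
    return buffer
```

In the receive loop this would be used as `buffer = handle_chunk(buffer, sock.recv(2048), sock.sendall, logging.info)`, starting from `buffer = b""`. Decoding only complete lines also avoids splitting a multi-byte UTF-8 character across two chunks.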

#### Parsing logs

In [42]:
msg = '2018-12-10_11:26:40 — :spappygram!spappygram@spappygram.tmi.twitch.tv PRIVMSG #ninja :Chat, let Ninja play solos'

In [43]:
from datetime import datetime

time_logged = msg.split()[0].strip()

time_logged = datetime.strptime(time_logged, '%Y-%m-%d_%H:%M:%S')

time_logged

datetime.datetime(2018, 12, 10, 11, 26, 40)

In [44]:
username_message = msg.split('—')[1:]
username_message = '—'.join(username_message).strip()

username_message

':spappygram!spappygram@spappygram.tmi.twitch.tv PRIVMSG #ninja :Chat, let Ninja play solos'
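
The two cells above separate the timestamp from the rest of the log line. As a sketch, both steps could be combined in one helper that returns `None` for lines that do not start with a timestamp. The name `split_log_line` is hypothetical; the separator is the ` — ` written by the logging format configured earlier.

```python
from datetime import datetime


def split_log_line(line, sep=" — "):
    """Return (timestamp, raw IRC text) for one log line, or None if malformed."""
    # partition() splits at the first separator only, so a dash
    # inside the chat message itself is left untouched
    stamp, found, rest = line.partition(sep)
    if not found:
        return None
    try:
        logged_at = datetime.strptime(stamp.strip(), "%Y-%m-%d_%H:%M:%S")
    except ValueError:
        return None
    return logged_at, rest.strip()
```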

In [45]:
import re

# Raw string avoids invalid-escape warnings; "!" needs no escaping
username, channel, message = re.search(r':(.*)!.*@.*\.tmi\.twitch\.tv PRIVMSG #(.*) :(.*)', username_message).groups()

print(f"Channel: {channel} \nUsername: {username} \nMessage: {message}")

Channel: ninja 
Username: spappygram 
Message: Chat, let Ninja play solos
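
`re.search(...).groups()` raises an `AttributeError` when a line is not a chat message, for example a server notice. A possible variant, sketched below, compiles the pattern once, uses named groups, and returns `None` when nothing matches. `CHAT_PATTERN` and `parse_chat_line` are hypothetical names, and the pattern assumes usernames consist of letters, digits and underscores.

```python
import re

# Named groups make the result self-describing; \w+ assumes usernames
# contain only letters, digits and underscores
CHAT_PATTERN = re.compile(
    r":(?P<username>\w+)!\w+@\w+\.tmi\.twitch\.tv "
    r"PRIVMSG #(?P<channel>\S+) :(?P<message>.*)"
)


def parse_chat_line(line):
    """Return a dict with username, channel and message, or None if no match."""
    match = CHAT_PATTERN.search(line)
    return match.groupdict() if match else None


print(parse_chat_line(
    ":spappygram!spappygram@spappygram.tmi.twitch.tv "
    "PRIVMSG #ninja :Chat, let Ninja play solos"
))
```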


In [50]:
import pandas as pd

def get_chat_dataframe(file):
    data = []

    with open(file, 'r', encoding='utf-8') as f:
        lines = f.read().split('\n\n\n')
        
        for line in lines:
            try:
                time_logged = line.split('—')[0].strip()
                time_logged = datetime.strptime(time_logged, '%Y-%m-%d_%H:%M:%S')

                username_message = line.split('—')[1:]
                username_message = '—'.join(username_message).strip()

                username, channel, message = re.search(
                    r':(.*)!.*@.*\.tmi\.twitch\.tv PRIVMSG #(.*) :(.*)', username_message
                ).groups()

                d = {
                    'dt': time_logged,
                    'channel': channel,
                    'username': username,
                    'message': message
                }

                data.append(d)
            
            except Exception:
                # Skip entries that are not chat messages (server notices, blanks)
                pass
            
    return pd.DataFrame.from_records(data)
        
    
df = get_chat_dataframe('chat.log')

In [51]:
df.set_index('dt', inplace=True)

print(df.shape)

df.head()

(56, 3)


dt,channel,username,message
2022-03-18 14:28:40,ponce,lesaumonparfait,ponceFLEUR ponceFLEUR ponceFLEUR ponceFLEUR po...
2022-03-18 14:32:19,ponce,eri_keii,Bonjour à toutes les belles fleurs ponceCOEURF...
2022-03-18 14:32:21,ponce,spyhna,@afternoune omg j'viens de capter que mon 05 c...
2022-03-18 14:32:22,ponce,raizie_,wooow
2022-03-18 14:32:24,ponce,hyoukho,ponceBLEUE


In [52]:
df

dt,channel,username,message
2022-03-18 14:28:40,ponce,lesaumonparfait,ponceFLEUR ponceFLEUR ponceFLEUR ponceFLEUR po...
2022-03-18 14:32:19,ponce,eri_keii,Bonjour à toutes les belles fleurs ponceCOEURF...
2022-03-18 14:32:21,ponce,spyhna,@afternoune omg j'viens de capter que mon 05 c...
2022-03-18 14:32:22,ponce,raizie_,wooow
2022-03-18 14:32:24,ponce,hyoukho,ponceBLEUE
2022-03-18 14:32:24,ponce,saylux_,Tu rides la peuf la
2022-03-18 14:32:25,ponce,bblackmc_,LUL
2022-03-18 14:32:26,ponce,cyrial_42,LUL
2022-03-18 14:32:28,ponce,goldengameon,ponceBLEUE
2022-03-18 14:32:30,ponce,thestrangiatobytor,LUL LUL


In [64]:
df.iloc[0:10,:]

dt,channel,username,message
2022-03-18 14:28:40,ponce,lesaumonparfait,ponceFLEUR ponceFLEUR ponceFLEUR ponceFLEUR po...
2022-03-18 14:32:19,ponce,eri_keii,Bonjour à toutes les belles fleurs ponceCOEURF...
2022-03-18 14:32:21,ponce,spyhna,@afternoune omg j'viens de capter que mon 05 c...
2022-03-18 14:32:22,ponce,raizie_,wooow
2022-03-18 14:32:24,ponce,hyoukho,ponceBLEUE
2022-03-18 14:32:24,ponce,saylux_,Tu rides la peuf la
2022-03-18 14:32:25,ponce,bblackmc_,LUL
2022-03-18 14:32:26,ponce,cyrial_42,LUL
2022-03-18 14:32:28,ponce,goldengameon,ponceBLEUE
2022-03-18 14:32:30,ponce,thestrangiatobytor,LUL LUL


In [73]:
df.iloc[0:10,1:3]

dt,username,message
2022-03-18 14:28:40,lesaumonparfait,ponceFLEUR ponceFLEUR ponceFLEUR ponceFLEUR po...
2022-03-18 14:32:19,eri_keii,Bonjour à toutes les belles fleurs ponceCOEURF...
2022-03-18 14:32:21,spyhna,@afternoune omg j'viens de capter que mon 05 c...
2022-03-18 14:32:22,raizie_,wooow
2022-03-18 14:32:24,hyoukho,ponceBLEUE
2022-03-18 14:32:24,saylux_,Tu rides la peuf la
2022-03-18 14:32:25,bblackmc_,LUL
2022-03-18 14:32:26,cyrial_42,LUL
2022-03-18 14:32:28,goldengameon,ponceBLEUE
2022-03-18 14:32:30,thestrangiatobytor,LUL LUL


In [78]:
msg_01 = "EP 1"

# Check whether the message contains the digit "1"
motif = re.compile("1")
obj = motif.search(msg_01)
if obj:
    print('Found')
else:
    print('Not found')

Found


In [113]:
msg_01 = "Naruto Shippuden"
msg_02 = "Naruto"

In [114]:
def hamming_distance(string1, string2): 
    # Hamming distance is only defined for strings of equal length
    if len(string1) != len(string2):
        return -1
    # Start with a distance of zero, and count up
    distance = 0
    # Loop over the indices of the string
    L = len(string1)
    for i in range(L):
        # Add 1 to the distance if these two characters are not equal
        if string1[i] != string2[i]:
            distance += 1
    # Return the final count of differences
    return distance
 
hamming_distance(msg_01, msg_02)

-1

In [115]:
import jaro
jaro.jaro_metric(msg_01, msg_02)

0.7916666666666666
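
To make the Jaro score less of a black box, here is a pure-Python sketch of the standard Jaro similarity. It is an illustration of the metric, not the code of the `jaro` package; the function name `jaro_similarity` is an assumption. Two characters count as a match when they are equal and no farther apart than half the longer string's length minus one.

```python
def jaro_similarity(s1, s2):
    """Standard Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Equal characters only match if they are within this many positions
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


print(jaro_similarity("Naruto Shippuden", "Naruto"))
```

For the pair above, all 6 characters of "Naruto" match in order, so the score is (6/16 + 6/6 + 6/6) / 3, which agrees with the value returned by `jaro.jaro_metric`.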

In [117]:
import Levenshtein as lev

In [330]:
def levCalclulate(str1, str2):
    Distance = lev.distance(str1, str2)
    Ratio = lev.ratio(str1, str2)
#     print("Levenshtein between {0} and {1}".format(str1, str2))
#     print("> Distance: {0}\n> Ratio: {1}\n".format(Distance, Ratio))
    return Distance, Ratio

# levCalclulate("Benoit", "Ben")
# levCalclulate("Benoit", "Benoist")
# levCalclulate(msg_01, msg_02)

In [127]:
msg_01 = "naruto shippuden"
msg_02 = "Narut shipouden".lower()

levCalclulate(msg_01, msg_02)

Levenshtein between naruto Shippuden and narut shipouden
> Distance: 3
> Ratio: 0.8387096774193549
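
For readers who want to see what these two numbers measure, the sketch below reimplements them in pure Python. `levenshtein_distance` is the classic dynamic-programming edit distance. `similarity_ratio` rests on an assumption about the library: that `lev.ratio` scores a substitution as a deletion plus an insertion, which makes it equal to twice the longest common subsequence divided by the combined length. All three function names are hypothetical. The inputs reproduce the strings shown in the output above, including the capital "S".

```python
def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    """Share of characters the two strings have in common, from 0.0 to 1.0."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * lcs_length(a, b) / total


print(levenshtein_distance("naruto Shippuden", "narut shipouden"))
print(similarity_ratio("naruto Shippuden", "narut shipouden"))
```

The three edits are the dropped "o", the "S" versus "s", and the "p" replaced by "o". The strings share 13 characters in order out of 31 in total, giving 26/31.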



#### Testing the Excel database

In [182]:
bdd_01 = pd.read_excel(r'BDD\bdd_01.xlsx')  

In [183]:
bdd_01

,Série,Question,Catégorie,Titre du son,Nom de l'oeuvre,"Type du son (EP,ED,OST)",Numéro du son,Auteur du son
0,1,1,Anime,Unravel,Tokyo Ghoul,OP,1,Toru Kitajima
1,1,2,Anime,The World,Death Note,OP,1,NIGHTMARE
2,1,3,Anime,Sign,Naruto Shippuden,OP,6,Flow
3,1,4,Anime,CHA-LA HEAD CHA-LA,Dragon Ball Z,OP,1,Kageyama Hironobu
4,1,5,Anime,Departure!,Hunter x Hunter (2011),OP,1,Ono Masatoshi


In [286]:
def remove_all_extra_spaces(string):
    return " ".join(string.split())

In [219]:
bdd_01_lower = bdd_01.apply(lambda x: x.astype(str).str.lower())

In [262]:
msg_01 = "Unravel / Tokyo Ghoul / OP/ 1/ Toru Kitajima"
msg_01 = msg_01.lower()
msg_01_liste = msg_01.split("/")
msg_01_clean = msg_01_liste

In [287]:
# Collapse repeated and surrounding whitespace in each field of the answer
msg_01_clean = [remove_all_extra_spaces(elem) for elem in msg_01_liste]

msg_01_clean

['unravel', 'tokyo ghoul', 'op', '1', 'toru kitajima']
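
The cleaning steps above (lowercasing, splitting on "/", collapsing whitespace) can be gathered into a single function so that every incoming chat answer is normalized the same way. This is a sketch; the name `parse_answer` and its `sep` parameter are assumptions.

```python
def parse_answer(raw, sep="/"):
    """Lowercase an answer, split it into fields and normalize whitespace."""
    # " ".join(part.split()) strips the ends and collapses inner runs of spaces
    return [" ".join(part.split()) for part in raw.lower().split(sep)]


print(parse_answer("Unravel / Tokyo Ghoul / OP/ 1/ Toru Kitajima"))
```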

In [288]:
bdd_01_lower.iloc[0, 3]

'unravel'

In [327]:
bdd_01_score = pd.read_excel(r'BDD\bdd_score.xlsx')  

In [329]:
bdd_01_score

,Username,Série,Question,Titre du son (+1),Nom de l'oeuvre (+3),"Type du son (EP,ED,OST) (+1)",Numéro du son (+1),Auteur du son (+1),Points,Total
0,MCK_IX,1,1,,,,,,,
1,MCK_IX,1,2,,,,,,,
2,MCK_IX,1,3,,,,,,,
3,MCK_IX,1,4,,,,,,,
4,MCK_IX,1,5,,,,,,,
5,aypepito,1,1,,,,,,,
6,aypepito,1,2,,,,,,,
7,aypepito,1,3,,,,,,,
8,aypepito,1,4,,,,,,,
9,aypepito,1,5,,,,,,,


In [339]:
msg_01_clean

['unravel', 'tokyo ghoul', 'op', '1', 'toru kitajima']

In [346]:
tot = 0
for elem in msg_01_clean:

    # Column 3: song title ("Titre du son"), worth 1 point
    distance, ratio = levCalclulate(elem, bdd_01_lower.iloc[0, 3])
    if (distance <= 3) and (ratio > 0.8):
        print("Song title:", elem, " (+1 pt)")
        tot += 1

    # Column 4: title of the work ("Nom de l'oeuvre"), worth 3 points
    distance, ratio = levCalclulate(elem, bdd_01_lower.iloc[0, 4])
    if (distance <= 3) and (ratio > 0.8):
        print("Title of the work:", elem, " (+3 pts)")
        tot += 3

    # Column 5: song type (OP/ED/OST)
    distance, ratio = levCalclulate(elem, bdd_01_lower.iloc[0, 5])
    if (distance <= 3) and (ratio > 0.8):
        print("Song type (OP, ED, OST):", elem, " (+1 pt)")
        tot += 1

    # Column 6: song number
    distance, ratio = levCalclulate(elem, bdd_01_lower.iloc[0, 6])
    if (distance <= 3) and (ratio > 0.8):
        print("Song number:", elem, " (+1 pt)")
        tot += 1

    # Column 7: song artist
    distance, ratio = levCalclulate(elem, bdd_01_lower.iloc[0, 7])
    if (distance <= 3) and (ratio > 0.8):
        print("Song artist:", elem, " (+1 pt)")
        tot += 1

print("Total: ", tot, "/ 7 pts")

Song title: unravel  (+1 pt)
Title of the work: tokyo ghoul  (+3 pts)
Song type (OP, ED, OST): op  (+1 pt)
Song number: 1  (+1 pt)
Song artist: toru kitajima  (+1 pt)
Total:  7 / 7 pts


In [336]:
print(bdd_01_lower.iloc[0, 3])
print(bdd_01_lower.iloc[0, 4])
print(bdd_01_lower.iloc[0, 5])
print(bdd_01_lower.iloc[0, 6])
print(bdd_01_lower.iloc[0, 7])

unravel
tokyo ghoul
op
1
toru kitajima
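
The scoring loop above repeats the same comparison for each of the five columns. A table-driven version, sketched below, keeps the expected values and point weights in dictionaries. Several things here are assumptions rather than the notebook's method: the names `is_close`, `score_answer`, `expected` and `points`, the English field keys, and the use of the standard library's `difflib.SequenceMatcher` as a stand-in for the Levenshtein distance and ratio test. The weights follow the score sheet headers (+3 for the title of the work, +1 for every other field), and each field is awarded at most once.

```python
from difflib import SequenceMatcher

# Expected values for question 1 and the points each field is worth
expected = {
    "song_title": "unravel",
    "work_title": "tokyo ghoul",
    "song_type": "op",
    "song_number": "1",
    "song_artist": "toru kitajima",
}
points = {"song_title": 1, "work_title": 3, "song_type": 1,
          "song_number": 1, "song_artist": 1}


def is_close(a, b, threshold=0.8):
    """Fuzzy comparison; difflib stands in for the Levenshtein test here."""
    return SequenceMatcher(None, a, b).ratio() > threshold


def score_answer(parts, expected, points):
    """Add up the points of every expected field matched by some answer part."""
    total = 0
    for field, target in expected.items():
        if any(is_close(part, target) for part in parts):
            total += points[field]
    return total


answer = ["unravel", "tokyo ghoul", "op", "1", "toru kitajima"]
print(score_answer(answer, expected, points), "/", sum(points.values()), "pts")
```

Because the threshold is applied to a similarity ratio, small typos such as "unravl" or "tokyo goul" would still earn their points, while unrelated fields score nothing.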
