# Settings

Libraries used:

- datetime
- pandas
- matplotlib

- os
- sys
- json
- glob

- webvtt

Other tools:

- Youtube API (https://developers.google.com/youtube/v3)
- Youtube-dl (https://youtube-dl.org/)

## Basic setup

- Libraries used
- Working directories and locations

In [1]:
# Libraries used

"""
If any of these libraries is missing, install it with "!pip install <package>", e.g. !pip install webvtt-py (the PyPI package that provides the webvtt module).
The exceptions are the YouTube API and youtube-dl: for those, follow the links above and check the documentation available there.
"""

import os
import pandas as pd
from matplotlib import pyplot as plt

import sys
import datetime
import json
import glob

import webvtt
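Before running the rest of the notebook it can help to check which of these modules are importable at all. A minimal sketch using the standard library's `importlib.util.find_spec`; the module list and the helper name `missing_modules` are illustrative assumptions, and note that the import name (`webvtt`) differs from the PyPI package name (`webvtt-py`).

```python
import importlib.util

# Top-level module names imported above (illustrative list; adjust as needed)
REQUIRED = ["pandas", "matplotlib", "webvtt", "datetime", "json", "glob", "sys", "os"]

def missing_modules(names):
    """Return the names from `names` that cannot be found on the current Python path."""
    # find_spec locates a module without importing it; None means "not installed"
    return [name for name in names if importlib.util.find_spec(name) is None]

print(missing_modules(REQUIRED))
```

An empty list means every module in `REQUIRED` can be imported.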

In [2]:
# Working directories and locations

cwd = os.getcwd()
print(cwd)

/Users/dumoura/Dev/PDev/PoliticalRemix/Lives_da_Semana/Notebook_IngDef/Bolsonorao_Channel_Parsing


In [3]:
# Define working directories and locations

BASE_DIR = os.path.dirname(cwd) # project root
DATA_DIR = os.path.join(BASE_DIR, "data") # general data gathered during the project
META_DIR = os.path.join(BASE_DIR, "metadados") # metadata gathered during the project
CACHE_DIR = os.path.join(BASE_DIR, "cache") # intermediate material; can be deleted at the end if you see fit
LINK_DIR = os.path.join(BASE_DIR, "links") # intermediate material; can be deleted at the end if you see fit
SUB_DIR = os.path.join(CACHE_DIR, "subs") # working location for subtitle processing
VID_DIR = os.path.join(CACHE_DIR, "vids") # working location for video processing

# Create the working directories

os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(META_DIR, exist_ok=True)
os.makedirs(CACHE_DIR, exist_ok=True)
os.makedirs(SUB_DIR, exist_ok=True)
os.makedirs(VID_DIR, exist_ok=True)
os.makedirs(LINK_DIR, exist_ok=True)
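The six `os.makedirs` calls can also be driven by a single mapping, which keeps the folder names in one place. A sketch with `pathlib`, assuming the same layout as above (`data`, `metadados`, `links`, `cache/subs` and `cache/vids` under the project root); `make_workdirs` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path

# Folder layout relative to the project root (same names as the cell above)
LAYOUT = {
    "DATA_DIR": "data",
    "META_DIR": "metadados",
    "CACHE_DIR": "cache",
    "LINK_DIR": "links",
    "SUB_DIR": "cache/subs",
    "VID_DIR": "cache/vids",
}

def make_workdirs(base_dir):
    """Create every folder in LAYOUT under base_dir and return their paths by name."""
    paths = {}
    for name, relative in LAYOUT.items():
        path = Path(base_dir) / relative
        path.mkdir(parents=True, exist_ok=True)  # same effect as os.makedirs(..., exist_ok=True)
        paths[name] = path
    return paths
```

Usage: `dirs = make_workdirs(Path.cwd().parent)` mirrors `BASE_DIR = os.path.dirname(cwd)`; calling it again is harmless because existing folders are left as they are.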

In [4]:
# Check that the working directories were created correctly
#os.listdir(BASE_DIR)

In [6]:
# context: date suffix appended to file names, e.g. "_3-3-2021"

now = datetime.datetime.now()
year = now.year
day = now.day
month = now.month

context = f"_{month}-{day}-{year}"
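Because `context` is built from unpadded numbers, 3 March 2021 becomes `_3-3-2021`, which is exactly the suffix of `LivesBolsonaro_Metadado_3-3-2021.csv` listed further down. A `strftime("%m-%d-%Y")` call would instead zero-pad (`03-03-2021`) and no longer match those file names. A small sketch that isolates the rule so it can be checked against a fixed date; `build_context` is a hypothetical name.

```python
import datetime

def build_context(when=None):
    """Return the file-name suffix used in this notebook, e.g. '_3-3-2021'.

    Month and day are NOT zero-padded, matching f"_{month}-{day}-{year}".
    """
    if when is None:
        when = datetime.datetime.now()
    return f"_{when.month}-{when.day}-{when.year}"

print(build_context(datetime.date(2021, 3, 3)))  # _3-3-2021
```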

In [5]:
#print(context)

# Searching the verbal content of a channel's videos

Research goal:
- Survey what the president is saying (verbal content) about the pandemic.

Steps:
- Select the links to the corpus videos
    - Lives da Semana (the weekly live broadcasts)

- Download the subtitles for each video
- Build a single document
    - with data that makes it possible to retrieve meaningful, contextual information (video, timecoded breakdown, verbal content: what is being said)

- Create a DataFrame
- Save the document as .csv

### Select the links to the corpus videos

- Lives da Semana

In [4]:
os.listdir(META_DIR)

['.DS_Store', 'LivesBolsonaro_Metadado_3-3-2021.csv']

In [7]:
# open the metadata document LivesBolsonaro_Metadado{context}.csv; since context is today's date, this only finds the file if it was generated today
title = f"LivesBolsonaro_Metadado{context}"
df = pd.read_csv(f"{META_DIR}/{title}.csv")
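Reading `LivesBolsonaro_Metadado{context}.csv` only works on the day the metadata file was generated, since `context` is today's date. One alternative, sketched here under the assumption that every metadata file follows the `LivesBolsonaro_Metadado_<month>-<day>-<year>.csv` pattern, is to pick the newest matching file in `META_DIR`; `latest_metadata` is a hypothetical helper.

```python
import glob
import os
from datetime import datetime

def latest_metadata(meta_dir, prefix="LivesBolsonaro_Metadado"):
    """Return the path of the newest <prefix>_<M>-<D>-<YYYY>.csv in meta_dir, or None."""
    candidates = []
    for path in glob.glob(os.path.join(meta_dir, f"{prefix}_*.csv")):
        stem = os.path.splitext(os.path.basename(path))[0]
        date_part = stem[len(prefix) + 1:]  # drop "<prefix>_"
        try:
            # %m and %d also accept unpadded values such as "3-3-2021"
            stamp = datetime.strptime(date_part, "%m-%d-%Y")
        except ValueError:
            continue  # skip files that do not follow the naming pattern
        candidates.append((stamp, path))
    return max(candidates)[1] if candidates else None
```

Usage: `df = pd.read_csv(latest_metadata(META_DIR))`.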

In [8]:
len(df)
#print(df.videoId) 
#df.videoLink # apparently youtube-dl works better with the link than with the id

97

### Download the subtitles for each Live da Semana video

In [9]:
# Move into the cache/subs directory so that youtube-dl saves each video's subtitles there
os.chdir(f"{SUB_DIR}")
print(os.getcwd())

/Users/dumoura/Dev/PDev/PoliticalRemix/Lives_da_Semana/Notebook_IngDef/cache/subs


In [10]:
# youtube-dl: for each link in df, download the best available automatic subtitles [--write-auto-sub --sub-format best],
# convert them to vtt [--convert-subs vtt], skip the videos themselves [--skip-download], ignore errors [-i],
# and name each file after the video id [--id].
# No --sub-lang is given, so youtube-dl picks English when available (hence the ".en.vtt" file names).

for link in df.videoLink:
    os.system(f"youtube-dl {link} -i --id --write-auto-sub --sub-format best --convert-subs vtt --skip-download")
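`os.system` hands the whole string to the shell and discards failures silently. A sketch of the same download with `subprocess.run`, which passes the arguments as a list (no shell quoting issues), sets the working directory per call instead of relying on the earlier `os.chdir`, and records links whose youtube-dl run exited with a non-zero status. It assumes the `youtube-dl` executable is on the `PATH`; the helper names are hypothetical.

```python
import subprocess

def build_youtube_dl_command(link):
    """Return the youtube-dl arguments used in the loop above, one list item per argument."""
    return [
        "youtube-dl", link,
        "-i",                     # --ignore-errors: keep going if one video fails
        "--id",                   # name the output files after the video id
        "--write-auto-sub",       # fetch YouTube's automatic captions
        "--sub-format", "best",
        "--convert-subs", "vtt",  # convert the captions to WebVTT
        "--skip-download",        # captions only, no video file
    ]

def download_subtitles(links, workdir):
    """Run youtube-dl for each link inside workdir; return the links that exited non-zero."""
    failed = []
    for link in links:
        result = subprocess.run(build_youtube_dl_command(link), cwd=workdir)
        if result.returncode != 0:
            failed.append(link)
    return failed
```

Usage: `failed = download_subtitles(df.videoLink, SUB_DIR)`.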

### Criar documento único

In [11]:
# Build a single document
## 1. list every file in the directory ending in '.vtt'

vtt_dir = SUB_DIR
vtt_root = os.path.join(vtt_dir, "*.vtt")
file_list = glob.glob(vtt_root)

len(file_list)

77
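The metadata lists 97 videos but only 77 `.vtt` files were produced, so some videos apparently had no automatic captions or failed to download. A sketch for finding which ones, assuming the metadata's `videoId` column (referenced in the commented-out line above) holds plain YouTube ids and that youtube-dl named each file `<id>.<lang>.vtt`; `missing_subtitles` is a hypothetical helper.

```python
import os

def missing_subtitles(video_ids, vtt_paths):
    """Return, sorted, the ids in video_ids that have no matching '<id>.<lang>.vtt' file."""
    # 'J9u1Cl49xlw.en.vtt' -> 'J9u1Cl49xlw' (YouTube ids never contain dots)
    downloaded = {os.path.basename(path).split(".", 1)[0] for path in vtt_paths}
    return sorted(set(video_ids) - downloaded)
```

Usage: `missing_subtitles(df.videoId, file_list)`.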

In [None]:
### Build a single document
## 2. build a DataFrame from 'results' with the columns id, start, end and text

In [12]:
results = []

for file in file_list:
    data = webvtt.read(file)

    for sub in data:
        _id = data.file  # path of the .vtt file, which contains the video id
        start = sub.start  # start timestamp in text format
        end = sub.end  # end timestamp in text format
        text = sub.text  # caption text

        results.append([_id, start, end, text])

# build the DataFrame once, after all captions have been collected
results_df = pd.DataFrame(results, columns=['id', 'start', 'end', 'text'])
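`start` and `end` are kept as text (`HH:MM:SS.mmm`). To jump from a caption straight to that moment in the video, the timestamp can be converted to seconds and appended to the watch URL with YouTube's `t` parameter. A minimal sketch; the helper names are hypothetical, and it assumes a clean video id (without a `.en` language suffix).

```python
def vtt_timestamp_to_seconds(timestamp):
    """Convert a WebVTT timestamp such as '00:00:10.469' (or '00:10.469') to seconds."""
    parts = timestamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2]) if len(parts) > 1 else 0
    hours = int(parts[-3]) if len(parts) > 2 else 0  # the hours field is optional in WebVTT
    return hours * 3600 + minutes * 60 + seconds

def timed_link(video_id, start):
    """Link to video_id starting at the whole second where the caption begins."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(vtt_timestamp_to_seconds(start))}s"

print(timed_link("J9u1Cl49xlw", "00:00:17.609"))
# https://www.youtube.com/watch?v=J9u1Cl49xlw&t=17s
```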

In [13]:
# inspect results

for i in results_df.text:
    print(i)

it's good night everyone makes our farm lainé every Thursday at 7 pm
 hours of the night and let's go here quickly at the most 15 minutes is to put you already
 can happen is happening the government time são paulo today is celebrated
 in advance the day of our success and there is not in mine there was what I
 I said that in the press the press is important no one doubts
 social media are also important
 I said hurry that in spite of some mishaps between us is we must
 understand why the flame of democracy is not extinguished or rather the press
 functioning even some mishaps it is important for the maintenance of the
 flame of democracy is that old story that old
 story it's better the press sometimes get what they feel is the right press
 such logically on my part I do want to talk to the press
 we say what is important for every side over there is that we have two
 good images written in newspapers good audios good true videos above
 everything we can together right you can see the

 dataprev and yours also a second time week that
 it comes with all certainty I am in the family allowance and it was even for the guy wins 200
 family scholarship degrees will move to 600 1200 or 1800 are three groups i think
 that are a very important point the first is the single registration outside the
 family allowance which are around 10 million people we have the others
 payments between tuesday and wednesday next week we will also have is more than
 20 million people from the family allowance that we will pay on the normal day of 16 until the
 end of the month and fill registration which is what we are doing this week
 for informal workers who are informal workers the
 individual microentrepreneurs and also self-employed workers so it is the
 biggest program and if last point of insertion of people
 because 30 million Brazilians will have a free hand account an account
 digital in the savings box this was never done to cross it there it is on the screen
 here at the front a de

 military area of ​​the armed forces it is easy to come to our knowledge of your
 immediate superior that he follow chain of command he thinks he can win the
 that he thinks can be changed because I'm sure there will be
 sensitivity on the part of the defense ministry to correct is possible
 misconceptions that can happen we are human beings can happen there is no
 trophy doubt ask the question will you go tomorrow after the white house will not
 only goes to an arlington cemetery will deposit a wreath at the grave
 of the unknown soldier you can make a
 I'm not even going to talk about a comparison, but talk a little about how it is
 treatment of the military here the police forces respect for authority
 American here in your books simply
 you are proud of the american people on the streets the most american because the
 treatment given to the military here does not even compare with other countries and even
 in relation to brazil know that the military brazil any research of
 helipe 

 Russian can buy tchan I do not know if this intention is going to
 materialize or not but I am sure to materialize it certainly has a
 great the now yes a great strength is football from the northeast present here in
 Brazilian championship because the guy seems to have and wants to invest in
 fort and then it was wrong not that one I will not speak believe me if you want to have
 a car here in front of me the fortress fan on the right has the
 flamenco also you hurricane behind paranaense is the pig as the team for
 cheer each one here I don't have to be canceled mine will bring Fortaleza Ceará today
 hi haha ​​i don't do it in the face of the hour i'll break your side talk yours
 no name today already know game time 9 o'clock at night who has played and the
 Fortaleza will depend on Control C for Fortaleza, which is a film from Brazil and
 and that's what is South American right, guys, come on
 and quickly here the genius in Ceará will win tomorrow
 and the eight days of the geliyor

 our president rodrigo maia and davi a condo there we also have a good
 relationship that sheds a provisional measure for this change in the code
 of transit and the mayor owes three things here which i think is important
 the first of which is the official who is the first president since the referendum
 and respected the decision that the Brazilian population takes all the rest
 contradicted the decision a little bit that morning
 what we are guaranteeing with an arms possession decree and now the
 decree that organizes snipers and hunters collectors we we are
 guaranteeing the free option the free right the person chooses whether or not he wants to have
 a call now for her to be okay guys
 it has rules and conditions that it must comply with and finally it was a commitment
 from the president of the condition here a rural producer anywhere in the
 brazil the local producer she could have the weapon for her daily life
 a gun on a farm she’s like any other instrument she’s part of
 of

 countries or are doing business what do you produce there someone from another country
 to offer more tires is to sell here inside and outside
 my answer then answer the question of the
 of the rancher is that they spent nine years in the 0 to 0 losing inside the most
 small recovered now let's pass a little flight sacrifice to the president
 has stewardship has gaucho meat has the free car has no doubt day without
 no problem but i will defend my address here last week of course once
 per week logically my wife sent
 move to 2 is the rest I put xvii I put chicken what else is what solves the
 problem passes the crisis passes now tabulate there is only subsidy arranged
 to create a tax to give the rancher to go down to the house, there are those who act like
 way those who acted in previous governments put their economy on the canvas other
 countries from venezuela to brazil for the past year and a price freeze that
 is older remember this ends up not having the material on the shelf 

 employees is what is in the informality was already in the informality and lost its
 job I had the week that the present in the union of bars and restaurants spoke
 that this sector employs approximately 6 million people
 hi and most of them are people who are informal, so the government
 did everything possible ok and to meet then it is these people among other informal
 other professions will be more or less 20 million people I don't have much more
 much more than that in informality but they will not lose a job 20 minutes
 and the one decided so far has all 200 ml per person you four billion a month
 because this measure is worth three months, but for those who have nothing
 help and this is where we can go and say stay very clear a large part
 we can only be able to attend because the congress approved our
 request to declare a state of calamity, that is, the government can spend more than
 the ceiling, right, is that provided for in the specific legislation then it is the moment 

 curb and completed just congratulations to the Brazilian army congratulations to
 register earthquakes and in particular those of members of the engineering battalion of
 construction that participated in this work the news for you we already talked to
 economy we talked to the defense a soldier
 engaged as the loot a second lieutenant of the second Brazilian army right
 an army soldier and worked from second to second world war is of
 $ 25 so we are fighting all right depends
 that will depend on parliament if it is understood that it is fair I do not want to pass
 for 50 reais a soldier's daily so in my opinion second tarciso
 in the works all over brazil approximately one thousand soldiers are on this front of
 alonso so help is 50 a day instead of 25 that i feel very little
 I think 50 already helps our battalion soldiers
 engineering is construction congratulations Brazilian br 163 for
 the forecast is completed in December and in January, right
 together with tarciso inaugurated

 athletes and shape he's like big rabbit next democracy
 to see yesterday also to close I posted a story on facebook a lot
 dear people what happens the attorney general of the union he issues
 opinions and once there it seems to be published in the official union diary
 has the force of law and stayed there at lingo, because it helps to reform the
 federal police officers federal highway police officers
 civilians and the criminal police regarding the quality and
 personal integrality criticized me net that I forgot the military police leaves the
 market that the military poly has this right I have not forgotten is that it has
 right and another thing is not to forget ok is because the opinion of the lawyer already
 neither can he legislate he interprets the law judged to be precedents etc etc
 interprets everything and gives his opinion
 I am the one who decides the attorney general of the union and the work started then
 lawyer andrea mendonça and ended up as josé levy then signing 

 that something was happening regarding the decrease in viral load with
 that person who's changing me and nowadays is more and more
 Túlio being that our routine reduces the number of deaths even more who takes nor
 goes to the hospital very much is to be intubated in the building at work Tereza
 more or less you with the people she over 200 caught the virus none was
 to Hospital obviously then none died seeing it appear it was
 working from home huh what do you call work at home skin serious work
 ROblocks work ROblocks I had not already picked up on
 I'm at home and died I don't know if she took it like that I don't know that there are people
 that according to the communities right logically to these likes It can be
 that has some side effect I don't know you can't be but in fact the
 was used she didn’t use She didn’t she didn’t use the alternatives and then appear
 and what you know about viruses is and other health problems that had wax
 Field there compared to last year appreci

 supporting these measures of closing everything stays at home fine he is on the street
 handcuffs who are on the beach or looking like the police are similar to ours although
 with all due respect, the supreme federal court ruled that the
 relapses are the exclusive responsibility of governors and mayors then
 unemployment largely some governors to mayors have this
 responsibility what day will we stay open question then will you give time
 here and the class there of the young woman to do a
 comment there coming that is criticizing we have no problem
 in this 20 o'clock here so we have four questions right boy you can
 answer I answer do one is tomorrow it's just you live god you want
 in são vicente right and on the bridge there are two barreiros barreiros
 oe together with rosana valle ok seeing finally those of that work
 that so much disorder brought the population of são vicente santos and region first
 ask some some bids
 and then hi hi
 the most affected and least affected and

 so this is what i want to thank for every opportunity Thursday
 we will return if god wants to meet luciana we go to him is to expand the staff
 of my dear he was ours here and interpreter of
 pounds until next week we will rescue the
 players
the city of briefs on Marajó Island in Pará 19 hours to my left that the
 Fabiano interpreter of pounds here my side here our Admiral Bento
 Albuquerque Minister of Mines and Energy and here Pedro Guimarães
 president of Caixa Econômica Federal what we are doing here in the distant
 Marajó and we are here sister mission has more
 federal deputies with me here is Faria representing the squad
 a Cerebral Telebras It is also that deputies Marcelo Álvaro Antônio do
 Tourism will have tourism here but what time are you doing tourism here today
 hahaha and me Bruna Bianca who will deputy
 but here after work, the retired people here have people
 do it here, you see a lot of people so use our Zinho here at Caixa
 Even the old army ship was worth it, it

 very important thing here
 the north-south railway role a part of it completing its three and a half years
 for more than a thousand km so why would any minister this stretch
 of the highway was purchased at the full price offered by us and within
 approximately three years we will have here then concluded this north railway
 south connecting national port in tocantins in the far west of são paulo
 to exemplify it was a great job also from the deputy state of Goiás
 federal representatives are 17 federal deputies and three senators from the state of goiás
 we held meetings a meeting was on the eve of launching
 at the auction the minister was present the state governor the deputies
 federal senators and also a member of the public prosecutor's office at the
 tcu and is an example of interaction between the powers everything is right, the level of union
 union states chamber and senate discussing to promote brazil from a political
 efficient public that the next day we went so it is sã

 he will keep the veto be more 2 3 + 3 + 3 senator to resolve to return to keep
 beto to see what we see ok the veto will not be maintained so call me
 depend on me not hi total social media cruise
 that are secreted no one suffers more than I doubt that during the campaign
 look I was called a racist homophobic fascist chenofago doesn't like
 northeastern for everything over me ok pt one of your two inspections a
 a girl appears on the television with a swastika marked on her belly, right, and that of the pt
 said it was good the nose that did that after you want a skill and coming
 that I she herself made she allowed it to be done by the depth of the cuts
 300 fake news ok we survived all of that and more and more the population
 is getting it right and immediately knows what fake news is and
 all right now i'm being judged by several lawsuits asking for the impeachment of
 the plate doesn’t have this, this is what the next one looks at, just say that the most
 tricky is that I would

 there solidarity a concrete salad to avoid certain problem
 now we are going to qualify the judge just because there is an injunction to return there
 the menu of the supreme well if the judge months before if he cannot there against the
 another to do that is not worth french fries the other pleasure because he is a vegetarian not
 we are going to end up with red meat in the Federal Supreme Court this is going to
 each institution I will not criticize the Supreme because of that even I do not
 I do here this one day to try Lagoa here because I was my wife that I didn’t
 I want I want Lagoas guys
 the problem can be poor who can’t eat, okay, it’s not, it’s not for
 disqualify want to see something has a name there that was widely considered as
 supremagrill is with our dear there beauty André André Mendonça
 I use it when I could, André is not discarded, he’s already on the tape Jorge
 that tape and ok and that makes our why so connected
 also with seven months I would not speak here 

 defense until the government fears the best until the government ends defense is the service
 there is no deviation of function to put the civilian in defense is not a deviation of function
 started back there to meet the measures the defense 99 when it was when fernando
 henrique cardoso created the defense and created it by political imposition and not
 necessary military I don't like never liked
 the joy was free for three minutes right luiza maria of the times the illness boss who
 that's it and evading my military home was also a five-minute book is
 boring to work with honest people no felipe cardoso created the defense of heaven
 then the fear started, I already said I was going to put a defender in your general
 four-star theme started off by putting on a young shorts that is now the gift of
 impugnation to the question that the governor mouse what do you think of the general
 silva and luna commanding the binational itaipu there what the mayors of the region
 so it’s doing an

 giving a hug cannot be everyone is some more policemen is a matter of honor and
 satisfaction with the president, right, it's like it was in the past, too, sometimes that I
 opinion while parliamentary deputy say last week with the
 rodrigo maia making the bed he is the owner of the lack in parliament has helped
 enough to vote next week the agreement between brazil and the united states in the
 car launch center of alcantara increase the select group of countries that
 least until 20 years stopped by ideological position and the pt government never
 she never accepted that with an American she wanted to know that it was her secrecy in her
 than I was dancing the government pt wanted to see everything ended up not seeing anything ok
 hi and this market reserve really who asks to have access that the guy is
 launching will not be able to really did not get this thing ever there
 the head but it's one thing let's get a little bit of fake news here just one
 little bit of week only brasi

 we can do the same thing band am already talked to him in bed rodrigo
 Maya already talked to the colombo plane are the people who make the agenda in bed
 rodrigo maia was extremely sympathetic to this proposal
 now for us to really get it right that the resource comes here and be
 applied we have to revoke the decree and look
 only that environmental activism which repeals one law another law that changes the article of
 convention announces cláudia pedra is another article of the convention which raises the
 ordinance is another ordinance but the environmental decree that repeals one law and another is
 i want to play three four ten environmental decrees don't have to have ten laws 10
 rodrigo's project will show a very nice view of rio de janeiro
 I presenting the project intends in the next forty the project to revoke
 with the justification for that in that place but it is a tourist center of
 south america there you can be sure that the brazilian instead of wanting to cancún
 th

 disability is employment time service issues that come from outside Brazil to
 here you don’t have it formed inside here you can see it’s good to get you
 there is none here a school I speak the name does not seem to be in trouble but
 there particular renowned glue teaching the language the third sex speaking
 Minimax Green boys and girls if we had so much bath, right?
 a South Korea Japan tell me you could even do a
 I played until I thought it was wrong to make a joke, there's a lot of attention now in
 open school this what it is is not earned from it now the other side we had
 the collection of some 40 days the boys visiting Look at the Student's capacity
 Brazilian does not understand mathematics we went if I am not mistaken second or
 third place in I was joining radio there in the economy we were two times voice I
 it was on that day the young man's Brazilian capacity is name what is it that
 Tom is lacking, lacking discipline, lacking hierarchy, making them really
 teachers c

 it is as if a cockroach helps to transport everything cheaper in Brazil the freight has dropped
 price says the price drop you help everybody we're going to do the opposite
 instead of increasing strength or decreasing that what the economy running the most is
 wins but at the end of the line now it's not easy
 I want to thank that the people who are collaborating with the creation of the new
 party to the alliance for brazil
 the interested person not here helping us has to recognize the paid firm at the registry office
 about five reais and some cases they have to send me to run here
 so short there, after all, you didn't like it, you almost got a scratch off the file
 it weighs a lot for those who are on the tip of the tongue to thank you for your support
 collaboration for we make a new party to dispute the elections of Sunday 22
 a party of ours from brazil has this statute already has it tidy lets it go well
 of course, even if the new party or alliance is sanctioned in 2022
 cr

 in case you remove these pages here that deals
 of this type of subject that we have done here is enough in the violence
 I called the minister of health for a date in Mato Grosso do Sul
 the wife of the year for him and then the solution is that it was to tell you that he took
 will make a new booklet with less for cheaper and its figures
 no and we will quickly distribute collect these antennas for the way
 right in my head is right and let's not forget something here
 the last activity here right tomorrow international women's day and has
 in my mother cries with alloy port dealer is with 90 unit out
 from the ribeira valley i want to send a hug to all
 Brazilian women were kidnapped while we are here because of them
 that you are very, very important I am very important and our life the oil
 Brazil a kiss in everyone's heart is the president we have here is a
 interaction with our internet users and some questions they present to us
 I would like to submit to your presentation and

 banks formerly lent the money nobody knew hitting lent
 partner money is a similar deluxe took binding
 adoption intending to open this data April aid payment data
 emergency so raining increasing the transparent by all measures comes
 contributing for the federal government today to reach an interesting mark that
 is the not to be of corruption of the present government is the presence interests for
 nice people the corrupt at the end of the line we’re interested in preventing him from practicing
 production they divert public resources so so to a feeling many
 athletic so public times there is only combat so you see squeeze be
 more knocking on the door with someone yes this promotions keep happening and
 g1 control
 okay for sure that we only during this period of knowing from 19 to
 each and there were 111 joint operations with the federal police as a form with the
 general controllership of the union but only talking is preventive I will call
 attention to listen to a folded fabr

 at that time as a function and some problem I had as a ballet was not a crisis of the
 oil the second time the catholic is lacking oil that makes the price much more
 price of fuel in Brazil because the weight of the gasoline power I was there
 underneath it would be a show together here I don't know who it is
 more or less and maybe then the one below is the book as a logical day maybe I don't know
 semester that is the part that weighs the most in the taxpayer’s pocket
 is an average state tax of 30 for example
 so the federal government invested ethanol here in the album exactly decreases the
 dependence on importing from brazil was in this
 time I was already investing a lot in work I was already investing a lot
 but it is a little far from self-reliance
 but it’s something that we’re going to see for a week next week is for new york in
 a year the UN to make a statement there is in the face that you charge is because the
 most countries attack me in a very violent way is that I a

In [14]:
### Build a single document
## 3. Save the DataFrame built from 'results' (columns id, start, end and text).
results_df.to_csv(f"{DATA_DIR}/LivesBolsonaro_SubFirstSum{context}.csv")

## 1. Clean the document

In [30]:
df = pd.read_csv(f"{DATA_DIR}/LivesBolsonaro_SubFirstSum{context}.csv")

In [31]:
len(df)
# # df.id[0]

31706

In [32]:
# df.head(11)

In [33]:
# for i in results_df.id:
#      print(i)

In [35]:
# create columns with the video id and a link to each video
df[['dir_path','sub_doc']] = df.id.str.split(f"{SUB_DIR}/",expand=True)
df[['video_id', "doc"]] = df.sub_doc.str.split(".vtt",expand=True)
df["video_link"] = "https://www.youtube.com/watch?v=" + df.video_id
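youtube-dl names the files `<id>.<lang>.vtt` (here `<id>.en.vtt`), so splitting on `".vtt"` leaves the language code in `video_id`; that is why the links shown below end in `.en` (for example `watch?v=J9u1Cl49xlw.en`), which is not a valid YouTube video id. A sketch that takes the file name from the path and keeps only the part before the first dot, assuming the `id` column holds the `.vtt` paths as above; `add_video_columns` is a hypothetical helper.

```python
import os
import pandas as pd

def add_video_columns(df, path_column="id"):
    """Return a copy of df with 'video_id' and 'video_link' derived from the .vtt path."""
    out = df.copy()
    file_names = out[path_column].map(os.path.basename)        # 'J9u1Cl49xlw.en.vtt'
    out["video_id"] = file_names.str.split(".", n=1).str[0]     # 'J9u1Cl49xlw'
    out["video_link"] = "https://www.youtube.com/watch?v=" + out["video_id"]
    return out
```

Usage: `df = add_video_columns(df)`.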

In [36]:
# drop 'dir_path', 'doc' and 'sub_doc'
# rename 'id' to 'vid_path'

df.drop(['dir_path', "doc", 'sub_doc'], axis=1, inplace=True)
df.rename(columns={'id': 'vid_path'}, inplace = True)
df = df[['video_id','start', 'end', 'text','video_link', 'vid_path']]

In [37]:
df.head()

|   | video_id | start | end | text | video_link | vid_path |
|---|---|---|---|---|---|---|
| 0 | J9u1Cl49xlw.en | 00:00:00.000 | 00:00:10.469 | it's good night everyone makes our farm lainé ... | https://www.youtube.com/watch?v=J9u1Cl49xlw.en | /Users/dumoura/Dev/PDev/PoliticalRemix/Lives_d... |
| 1 | J9u1Cl49xlw.en | 00:00:10.469 | 00:00:17.609 | hours of the night and let's go here quickly ... | https://www.youtube.com/watch?v=J9u1Cl49xlw.en | /Users/dumoura/Dev/PDev/PoliticalRemix/Lives_d... |
| 2 | J9u1Cl49xlw.en | 00:00:17.609 | 00:00:23.580 | can happen is happening the government time s... | https://www.youtube.com/watch?v=J9u1Cl49xlw.en | /Users/dumoura/Dev/PDev/PoliticalRemix/Lives_d... |
| 3 | J9u1Cl49xlw.en | 00:00:23.580 | 00:00:29.460 | in advance the day of our success and there i... | https://www.youtube.com/watch?v=J9u1Cl49xlw.en | /Users/dumoura/Dev/PDev/PoliticalRemix/Lives_d... |
| 4 | J9u1Cl49xlw.en | 00:00:29.460 | 00:00:33.450 | I said that in the press the press is importa... | https://www.youtube.com/watch?v=J9u1Cl49xlw.en | /Users/dumoura/Dev/PDev/PoliticalRemix/Lives_d... |


In [38]:
# Save the document "LivesBolsonaro_SubtitleCl" as CSV in the "data" folder
df.to_csv(f"{DATA_DIR}/LivesBolsonaro_SubtitleCl{context}.csv")

## 2. First search of the document

- In which Live da Semana videos are the terms hydroxychloroquine, chloroquine, ivermectin or azithromycin mentioned?

In [39]:
#df = pd.read_csv(f"{DATA_DIR}/LivesBolsonaro_SubtitleC_{month}{day}{year}.csv")
title = f"/LivesBolsonaro_SubtitleCl{context}.csv"
df = pd.read_csv(f"{DATA_DIR}{title}")

In [40]:
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [45]:
# early-treatment terms: chloroquine, hydroxychloroquine, azithromycin, ivermectin
early_treatment = df[df.text.str.contains(r'chloroquine|hydroxychloroquine|azithromycin|ivermectin|early treatment', case=False)]
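To answer "which videos" directly, the matching captions can be grouped by `video_id` and counted. A sketch assuming the cleaned DataFrame above (columns `video_id` and `text`); `na=False` treats empty captions (read back from CSV as `NaN`) as non-matches instead of breaking the boolean mask. `mentions_per_video` is a hypothetical helper.

```python
import pandas as pd

def mentions_per_video(df, pattern):
    """Count caption lines matching `pattern` (case-insensitive regex) per video, most first."""
    hits = df[df["text"].str.contains(pattern, case=False, regex=True, na=False)]
    return hits.groupby("video_id").size().sort_values(ascending=False)
```

Usage: `counts = mentions_per_video(df, r"chloroquine|azithromycin|ivermectin|early treatment")`; `len(counts)` is then the number of distinct videos with at least one match.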

In [51]:
# len(early_treatment)

In [52]:
# early_treatment.head(5)

In [53]:
# for link in early_treatment.video_link.unique():
#     print(link)

# print(early_treatment.video_link.nunique())

**A**. These are the 46 videos above.

# Querying - Live da Semana

Which Live da Semana videos mention the drugs hydroxychloroquine and ivermectin?
When? How?

In [54]:
#df = pd.read_csv(f"{DATA_DIR}/LivesBolsonaro_SubtitleC_{month}{day}{year}.csv")
title = f"/LivesBolsonaro_SubtitleCl{context}.csv"
df = pd.read_csv(f"{DATA_DIR}{title}")

In [56]:
len(df)

31706

In [120]:
Chloroquine = df[df.text.str.contains(r'hydroxychloroquine|chloroquine', case=False)]
ChloroquineIvermectin = df[df.text.str.contains(r'hydroxychloroquine|chloroquine|ivermectin', case=False)]
EarlyTreatment = df[df.text.str.contains(r'early treatment', case=False)]
PacificWar = df[df.text.str.contains(r'the war of the pacific|pacific war', case=False)]
CapitaoCloroquina = df[df.text.str.contains(r'chloroquine|hydroxychloroquine|azithromycin|ivermectin|early treatment', case=False)]
# word boundaries (\b) stop short terms such as 'flu' and 'pt' from matching inside other words ('influence', 'accept')
Virus = df[df.text.str.contains(r'\b(?:little flu|covid-19|flu|colds|pandemic|coronavirus)\b', case=False)]
BrHealth = df[df.text.str.contains(r'mask|isolation|lockdown', case=False)]
Blame = df[df.text.str.contains(r'\b(?:media|folha|globo|pt)\b', case=False)]
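With plain alternations, short search terms also match inside unrelated words: `pt` (for the Workers' Party, PT) occurs in "accept", `media` in "immediately", `flu` in "influence". A small standard-library sketch comparing an unbounded pattern with one wrapped in `\b` word boundaries; the same pattern strings can be passed to `str.contains`. The group is non-capturing `(?:...)` because pandas warns when a `str.contains` pattern has capture groups.

```python
import re

# Unbounded: matches the terms anywhere, including inside longer words
LOOSE = re.compile(r"media|folha|globo|pt", re.IGNORECASE)
# Bounded: each term must start and end at a word boundary
BOUNDED = re.compile(r"\b(?:media|folha|globo|pt)\b", re.IGNORECASE)

for caption in ["we accept the result", "immediately", "the PT government", "globo said"]:
    print(f"{caption!r}: loose={bool(LOOSE.search(caption))} bounded={bool(BOUNDED.search(caption))}")
```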

In [118]:
# Chloroquine
# ChloroquineIvermectin
# EarlyTreatment
# PacificWar
# CapitaoCloroquina
# Virus
# BrHealth 
# Blame

In [121]:
# Save
Chloroquine.to_csv(f"{DATA_DIR}/LivesBolsonaro_chloroquine{context}.csv")
ChloroquineIvermectin.to_csv(f"{DATA_DIR}/LivesBolsonaro_ChloroquineIvermectin{context}.csv")
EarlyTreatment.to_csv(f"{DATA_DIR}/LivesBolsonaro_earlyTreatment{context}.csv")
PacificWar.to_csv(f"{DATA_DIR}/LivesBolsonaro_PacificWar{context}.csv")
CapitaoCloroquina.to_csv(f"{DATA_DIR}/LivesBolsonaro_CapitaoCloroquina{context}.csv")
Virus.to_csv(f"{DATA_DIR}/LivesBolsonaro_Virus{context}.csv")
BrHealth.to_csv(f"{DATA_DIR}/LivesBolsonaro_BrHealth{context}.csv")
Blame.to_csv(f"{DATA_DIR}/LivesBolsonaro_Blame{context}.csv")
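The eight filters and eight `to_csv` calls can be generated from one mapping of query names to patterns, which keeps each file name next to its pattern and avoids typos in the names. A sketch assuming the same `LivesBolsonaro_<name>{context}.csv` naming is wanted; `QUERIES` shows only three of the patterns, and `run_queries` is a hypothetical helper. Writing with `index=False` also avoids the extra `Unnamed: 0` column that later cells have to strip.

```python
import os
import pandas as pd

# Query name -> case-insensitive regex (a subset of the patterns above)
QUERIES = {
    "Chloroquine": r"hydroxychloroquine|chloroquine",
    "EarlyTreatment": r"early treatment",
    "BrHealth": r"mask|isolation|lockdown",
}

def run_queries(df, queries, out_dir, context):
    """Filter df by each pattern, save one CSV per query, and return the row counts."""
    counts = {}
    for name, pattern in queries.items():
        subset = df[df["text"].str.contains(pattern, case=False, regex=True, na=False)]
        subset.to_csv(os.path.join(out_dir, f"LivesBolsonaro_{name}{context}.csv"), index=False)
        counts[name] = len(subset)
    return counts
```

Usage: `run_queries(df, QUERIES, DATA_DIR, context)`.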

In [125]:
BrHealth = f"/LivesBolsonaro_BrHealth{context}.csv"
df = pd.read_csv(f"{DATA_DIR}{BrHealth}")
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [128]:
# Which videos mention masks, isolation or lockdown (BrHealth)
for link in df.video_link.unique():
    print(link)

https://www.youtube.com/watch?v=2tB4XLKXSeI.en
https://www.youtube.com/watch?v=F9jXlF2ExQE.en
https://www.youtube.com/watch?v=ZLIUvoZDSFc.en
https://www.youtube.com/watch?v=rSO0DszwUbA.en
https://www.youtube.com/watch?v=XceWFVE7QLc.en
https://www.youtube.com/watch?v=8oPisf3kbGI.en
https://www.youtube.com/watch?v=8Cn1PGmlJuk.en
https://www.youtube.com/watch?v=WLd2HmL3Ua0.en
https://www.youtube.com/watch?v=ALFq4eHRUo0.en
https://www.youtube.com/watch?v=Ea5ZK0Fr5TM.en
https://www.youtube.com/watch?v=YCliiy_yl9Y.en
https://www.youtube.com/watch?v=ZZj93uz78NE.en
https://www.youtube.com/watch?v=AhySjAMku18.en
https://www.youtube.com/watch?v=GVzh8k6YjKU.en
https://www.youtube.com/watch?v=S28DvOuB6cM.en
https://www.youtube.com/watch?v=P3B8L5ql5GM.en
https://www.youtube.com/watch?v=5wTrE6F5jlc.en
https://www.youtube.com/watch?v=UqEQfL6il8M.en
https://www.youtube.com/watch?v=oVIJD_tuRPY.en
https://www.youtube.com/watch?v=vNyBRsVZ0gg.en
https://www.youtube.com/watch?v=VuMbYrq_ys4.en
https://www.y

In [129]:
for texts in df.text:
    print(texts)

 don't wear a mask it's a chain there's no use for a project like this one is the skin
 good then prison sentence for those who wear a mask she fine right I see your love
 here you have it b ****** not going black but who wears old mask
 Brazilian B ****** cannot be arrested but those who are not wearing a mask will go home
 quarantine social isolation suspension of activities you who are not working
 we should in the middle and there is to go for vertical isolation
 complications there may be more serious complications then i'm wearing a mask
 appropriate mask for that to pass my side that I still remain under
 masked people here the interpreter the brand president everybody
 used to seeing that broad smile from the president came in a mask and saw the
 minister with this mask is different hers is blue and mine but this shape
 meet with a mask like this is the reason we put it on to show how
 president is going to have to dispatch from here we are going to recommend the isolation
 is 