In [1]:
import pandas as pd

## Loading Data
Loads the URLs collected by the Slack bot (one per line in `slack_output.txt`) into a pandas dataframe, merging them with any previously saved data.

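On the first run, when `text_data.csv` does not exist yet, the cell below starts from an empty dataframe instead of loading one.

In [14]:
import os

# If there is an existing data file, load it in; otherwise start with an empty dataframe.
if os.path.exists('text_data.csv'):
    df = pd.read_csv('text_data.csv', sep='\t')
else:
    df = pd.DataFrame(columns=['url', 'title', 'text'])

# Gets the new, de-duplicated url list (blank lines are ignored)
with open("slack_output.txt", "r") as f:
    urls_list = list({line.strip() for line in f if line.strip()})
urls = pd.DataFrame({'url': urls_list, 'title': None, 'text': None})

# Keeps only urls not already in the df
new_urls = urls[~urls['url'].isin(df['url'])]
df = pd.concat([df, new_urls], axis=0, ignore_index=True)

# Removes YouTube urls (both youtube.com and youtu.be contain "youtu")
df = df[~df['url'].str.contains("youtu", na=False)].reset_index(drop=True)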
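The merge-and-filter logic in the loading cell can be checked on in-memory data, without `slack_output.txt` or `text_data.csv`. A minimal sketch, assuming the bot output is one URL per line; `merge_new_urls` is a hypothetical helper name, not part of the original notebook:

```python
import pandas as pd


def merge_new_urls(existing: pd.DataFrame, raw_lines: list[str]) -> pd.DataFrame:
    """Hypothetical helper: append unseen, non-YouTube URLs to `existing`.

    `existing` must have 'url', 'title' and 'text' columns; new rows get
    None for title and text so they are picked up as unlabeled later.
    """
    # De-duplicate the incoming lines and drop blanks, keeping first-seen order
    seen = dict.fromkeys(line.strip() for line in raw_lines if line.strip())
    urls = pd.DataFrame({'url': list(seen), 'title': None, 'text': None})

    # Keep only URLs that are not already in the existing dataframe
    new_urls = urls[~urls['url'].isin(existing['url'])]
    merged = pd.concat([existing, new_urls], axis=0, ignore_index=True)

    # Drop YouTube links (both youtube.com and youtu.be contain "youtu")
    keep = ~merged['url'].str.contains("youtu", na=False)
    return merged[keep].reset_index(drop=True)


existing = pd.DataFrame({
    'url': ['https://example.com/a'],
    'title': ['A'],
    'text': ['Body A'],
})
lines = [
    'https://example.com/a',   # already stored
    'https://example.com/b',
    'https://example.com/b',   # duplicate
    '',                        # blank line
    'https://youtu.be/xyz',    # filtered out
]
result = merge_new_urls(existing, lines)
print(result['url'].tolist())
# ['https://example.com/a', 'https://example.com/b']
```

Using `dict.fromkeys` rather than `set` keeps the URLs in the order the bot logged them, so newly added rows get a predictable index order.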

## Filling Data
A small automated "interface" for copy-pasting each article's title and text body into the dataframe.
If the link is invalid (or you do not want to add it for any reason), press Enter at the title prompt (and again at the text prompt) to skip it; its title and text are then stored as a single space, which can be filtered out later.
If you make a mistake, interrupt the cell and rerun it: it picks up from the last save, i.e. the last fully submitted article.

In [17]:
"""
Here is the block for filling in the titles and text of each article.
This could be automated with web scraping, but for this fairly small number of articles it is simpler to copy them in by hand.
"""
import webbrowser

unlabeled_df = df[df['title'].isna()]

for i, row in unlabeled_df.iterrows():
    # Opens the article in the system's default web browser
    webbrowser.open(row.url)
    print("")
    print(f'Index: {i} / {df.shape[0]}')
    print("Url: " + row.url)
    title = input("Title: ").strip('\t')
    text = input("Text: ").strip('\t')
    
    # Gives skipped/invalid articles non-null (whitespace) values
    if title == "":
        title = " "
        text = " "
    
    # Assigns with a single .loc[row, column] call; chained indexing such as
    # df.loc[i]['title'] = ... may write to a temporary copy instead of df
    # (and never updates df under pandas Copy-on-Write)
    df.loc[i, 'title'] = title
    df.loc[i, 'text'] = text
    
    # Saves after every article so an interrupted run loses at most the current one
    df.to_csv('text_data.csv', sep='\t', index=False)
    
print("~~~~~~")
print("Done!")

~~~~~~
Done!
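Skipped articles are stored with a single space as both title and text, so they can be dropped before any analysis. A minimal sketch, using a small made-up dataframe in place of `text_data.csv`; it also treats a missing title as unlabeled, so rows that were never filled in are excluded too:

```python
import pandas as pd

# Made-up rows standing in for text_data.csv; the second row was skipped
df = pd.DataFrame({
    'url': ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'],
    'title': ['First article', ' ', 'Third article'],
    'text': ['Body of the first article', ' ', 'Body of the third article'],
})

# A row counts as skipped if its title is missing or only whitespace
skipped = df['title'].isna() | (df['title'].str.strip() == '')
labeled = df[~skipped].reset_index(drop=True)

print(labeled['title'].tolist())
# ['First article', 'Third article']
```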


In [30]:
df.head()

| | url | title | text |
|---|---|---|---|
| 0 | https://sifted.eu/articles/ai-girlfriends | This AI girlfriend startup is making $100k a m... | t was never going to take long for some dudes ... |
| 1 | https://www.cbsnews.com/news/scammers-ai-mimic... | CBS MORNINGS Scammers use AI to mimic voices... | Artificial intelligence is making phone scams ... |
| 2 | https://www.apa.org/monitor/2023/07/psychology... | AI is changing every aspect of psychology. Her... | In psychology practice, artificial intelligenc... |
| 3 | https://www.ft.com/content/858981e5-41e1-47f1-... | | |
| 4 | https://proceedings.mlr.press/v81/buolamwini18... | Gender Shades: Intersectional Accuracy Dispari... | Abstract Recent studies demonstrate that machi... |
