The Standard and Poor's (S&P) 500 Index is a collection of roughly 500 of the largest US public companies by market capitalization and is a very popular investment vehicle for the general public, with a historical average return of approximately 10% per year. As a result, many traders try to "beat the market" by trading individual companies in the hope of earning returns greater than the S&P 500. In this project, we will see whether the popular comedy TV show "South Park" can help traders beat the S&P 500 by trading the companies mentioned in the show's episodes.

The first step in this project is to create a list of publicly traded companies, together with their ticker symbols and the most relevant Wikipedia link for each company. That way, we can compare the Wikipedia links found in each episode's article against this list.
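
Once both sides use the same link form, the comparison described above amounts to a dictionary lookup. The sketch below is only an illustration under stated assumptions: `normalize_wiki_link` is a hypothetical helper, the sample data is made up, and it assumes company links are stored as full URLs while episode links are site-relative paths.

```python
from urllib.parse import unquote, urlparse

def normalize_wiki_link(link):
    """Reduce a full URL or a relative /wiki/ path to a bare article title."""
    path = urlparse(link).path  # drops scheme, host, and any #fragment
    if path.startswith("/wiki/"):
        path = path[len("/wiki/"):]
    # Decode percent-escapes so "Tesla%2C_Inc." and "Tesla,_Inc." compare equal
    return unquote(path)

# Hypothetical sample data, for illustration only
company_links = {
    "TSLA": "https://en.wikipedia.org/wiki/Tesla,_Inc.",
    "SBUX": "https://en.wikipedia.org/wiki/Starbucks",
}
episode_links = ["/wiki/Starbucks", "/wiki/Tamale", "/wiki/Tesla%2C_Inc."]

title_to_ticker = {normalize_wiki_link(url): ticker
                   for ticker, url in company_links.items()}
mentioned = sorted(
    title_to_ticker[normalize_wiki_link(link)]
    for link in episode_links
    if normalize_wiki_link(link) in title_to_ticker
)
print(mentioned)
```

Normalizing before comparing matters because the same article can appear percent-encoded in one place and unencoded in another.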

We will get the list of publicly traded companies from the Nasdaq website.

In [1]:
import pandas as pd
import wikipedia
import time
# # TODO: automate this instead of manually downloading and reading the CSV.
# # Source: https://www.nasdaq.com/market-activity/stocks/screener
# stocks = pd.read_csv("nasdaq_screener_1661885402576.csv")
# # Keep only the company ticker and name
# stocks = stocks[["Symbol", "Name"]]
# # Drop the share-type suffix and the whitespace it leaves behind
# stocks['Name'] = stocks['Name'].str.replace('Common Stock', '', regex=False).str.strip()
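
A plain string replacement handles only the exact suffix "Common Stock". As a rough sketch, a regular expression can also strip a share-class qualifier before the Wikipedia lookup. The helper `clean_company_name`, the suffix pattern, and the sample names are all assumptions about what the screener file contains, not something taken from it.

```python
import re

# Assumed suffix shapes: an optional "Class X" followed by the share type
SHARE_SUFFIX = re.compile(r"\s+(Class [A-Z]\s+)?(Common Stock|Ordinary Shares)\s*$")

def clean_company_name(name):
    """Strip a trailing share-type suffix and surrounding whitespace."""
    return SHARE_SUFFIX.sub("", name).strip()

# Hypothetical sample names, for illustration only
samples = [
    "Apple Inc. Common Stock",
    "Alphabet Inc. Class A Common Stock",
    "Starbucks Corporation",
]
cleaned = [clean_company_name(name) for name in samples]
print(cleaned)
```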

In [2]:
# wikilinks = []
# for company in stocks["Name"]:
#     try:
#         wikilinks.append(wikipedia.page(title = company, auto_suggest = False).url)
#     except Exception:  # e.g. page not found or ambiguous title
#         wikilinks.append("None")
# wikilinks

In [3]:
# # stocks1 = stocks.head()
# # stocks1['Name'] = stocks1['Name'].str.replace('Common Stock', '')
# # wikilinks = []
# # for company in stocks1["Name"]:
# #     try:
# #         wikilinks.append(wikipedia.page(title = company, auto_suggest = False).url)
# #     except:
# #         wikilinks.append("None")
# # wikilinks
# stocks["Wikilink"] = wikilinks
# #Only keep relevant companies
# stocks = stocks[stocks.Wikilink != "None"]

In [4]:
from bs4 import BeautifulSoup
import requests
url = "https://en.wikipedia.org/wiki/List_of_South_Park_episodes"
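page = requests.get(url)
# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(page.content, "html.parser")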
soup

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of South Park episodes - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"d53369d9-e678-42e2-a01c-36b934c51bce","wgCSPNonce":false,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_South_Park_episodes","wgTitle":"List of South Park episodes","wgCurRevisionId":1105579636,"wgRevisionId":1105579636,"wgArticleId":224267,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages containing links to subscription-only content","All articles with dead external links","Articles with dead

From the HTML above, we see that the tables we are interested in have the class "wikitable plainrowheaders wikiepisodetable". Each row that lists an episode is a tr element with the class "vevent". Finally, within each of these rows, the episode's air date is formatted as YYYY-MM-DD inside a span with the class "bday dtstart published updated".
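
Because the air dates are ISO-formatted (YYYY-MM-DD) strings, the standard library can turn them into date objects for later comparison with price data. The sketch below is a hedged illustration: `trade_window` and its 30-day `holding_days` default are hypothetical choices, not a strategy this project has settled on.

```python
from datetime import date, timedelta

def trade_window(air_date_str, holding_days=30):
    """Parse a YYYY-MM-DD air date and return (start, end) of a holding period."""
    air_date = date.fromisoformat(air_date_str)  # raises ValueError on a malformed date
    return air_date, air_date + timedelta(days=holding_days)

# Air date of the first episode as it appears on the list page
start, end = trade_window("1997-08-13")
print(start, end)
```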

In [5]:
epi = {}
epitracker = 1
for table in soup.find_all("table", class_="wikitable plainrowheaders wikiepisodetable"):
    for row in table.find_all("tr", class_="vevent"):
        # The first titled link in the row points to the episode's own article
        link = row.find("a", href=True, title=True)
        epi[epitracker] = {
            "date": row.find("span", class_="bday dtstart published updated").contents[0],
            "id": row.find("th").attrs["id"],
            "wikiLink": link.attrs["href"],
            "title": link.attrs["title"],
        }
        epitracker += 1
epi

{1: {'date': '1997-08-13',
  'id': 'ep1',
  'wikiLink': '/wiki/Cartman_Gets_an_Anal_Probe',
  'title': 'Cartman Gets an Anal Probe'},
 2: {'date': '1997-08-20',
  'id': 'ep2',
  'wikiLink': '/wiki/Weight_Gain_4000',
  'title': 'Weight Gain 4000'},
 3: {'date': '1997-08-27',
  'id': 'ep3',
  'wikiLink': '/wiki/Volcano_(South_Park)',
  'title': 'Volcano (South Park)'},
 4: {'date': '1997-09-03',
  'id': 'ep4',
  'wikiLink': '/wiki/Big_Gay_Al%27s_Big_Gay_Boat_Ride',
  'title': "Big Gay Al's Big Gay Boat Ride"},
 5: {'date': '1997-09-10',
  'id': 'ep5',
  'wikiLink': '/wiki/An_Elephant_Makes_Love_to_a_Pig',
  'title': 'An Elephant Makes Love to a Pig'},
 6: {'date': '1997-09-17',
  'id': 'ep6',
  'wikiLink': '/wiki/Death_(South_Park)',
  'title': 'Death (South Park)'},
 7: {'date': '1997-10-29',
  'id': 'ep7',
  'wikiLink': '/wiki/Pinkeye_(South_Park)',
  'title': 'Pinkeye (South Park)'},
 8: {'date': '1997-11-19',
  'id': 'ep8',
  'wikiLink': '/wiki/Starvin%27_Marvin',
  'title': "Starvin

In [6]:
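def urllist(bs):
    """Return the unique /wiki/ hrefs of all titled links inside a parsed element."""
    hrefs = set()
    for link in bs.find_all("a", href=True, title=True):
        linkref = link.get("href", "")
        # startswith avoids the IndexError that split("/")[1] raises on hrefs without a slash
        if linkref.startswith("/wiki/"):
            hrefs.add(linkref)
    return list(hrefs)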
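
Keeping every `/wiki/` link also keeps non-article pages such as `Special:` links, which can never be company articles. A possible refinement, sketched here as an assumption rather than as part of the notebook, is to drop links whose title starts with a known namespace prefix. `is_article_link`, the namespace list, and the sample hrefs are all hypothetical.

```python
from urllib.parse import unquote

# Assumed, non-exhaustive set of Wikipedia namespaces to skip
NON_ARTICLE_NAMESPACES = {
    "Special", "File", "Category", "Help", "Wikipedia",
    "Template", "Template_talk", "Talk", "Portal",
}

def is_article_link(href):
    """True for /wiki/ links that do not start with a known namespace prefix."""
    if not href.startswith("/wiki/"):
        return False
    title = unquote(href[len("/wiki/"):])
    prefix, sep, _ = title.partition(":")
    # A colon alone is not enough: article titles may contain one
    return not (sep and prefix in NON_ARTICLE_NAMESPACES)

# Made-up sample hrefs, for illustration only
sample = [
    "/wiki/Tamale",
    "/wiki/Special:Random",
    "#cite_note-1",
    "/wiki/The_SpongeBob_Movie:_Sponge_on_the_Run",
    "/wiki/File:Logo.png",
]
articles = [href for href in sample if is_article_link(href)]
print(articles)
```

Checking the prefix against a set, instead of rejecting every title with a colon, keeps articles whose titles legitimately contain one.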

In [7]:
for key in epi:
    epiurl = 'https://en.wikipedia.org' + epi[key]["wikiLink"]
    epipage = requests.get(epiurl)
    episoup = BeautifulSoup(epipage.content, "html.parser")
    epi[key]["reflinks"] = urllist(episoup)
    time.sleep(0.1)  # brief pause between requests to be polite to Wikipedia
df = pd.DataFrame.from_dict(epi, orient="index")
df

Unnamed: 0,date,id,wikiLink,title,reflinks
1,1997-08-13,ep1,/wiki/Cartman_Gets_an_Anal_Probe,Cartman Gets an Anal Probe,"[/wiki/South_Park_(season_16), /wiki/Tamale, /..."
2,1997-08-20,ep2,/wiki/Weight_Gain_4000,Weight Gain 4000,"[/wiki/South_Park_(season_16), /wiki/Special:R..."
3,1997-08-27,ep3,/wiki/Volcano_(South_Park),Volcano (South Park),"[/wiki/South_Park_(season_16), /wiki/Special:R..."
4,1997-09-03,ep4,/wiki/Big_Gay_Al%27s_Big_Gay_Boat_Ride,Big Gay Al's Big Gay Boat Ride,"[/wiki/South_Park_(season_16), /wiki/Cripple_F..."
5,1997-09-10,ep5,/wiki/An_Elephant_Makes_Love_to_a_Pig,An Elephant Makes Love to a Pig,"[/wiki/South_Park_(season_16), /wiki/Special:R..."
...,...,...,...,...,...
315,2022-03-02,ep315,/wiki/Back_to_the_Cold_War,Back to the Cold War,"[/wiki/Battle_of_Trostianets, /wiki/South_Park..."
316,2022-03-09,ep316,"/wiki/Help,_My_Teenager_Hates_Me!","Help, My Teenager Hates Me!","[/wiki/South_Park_(season_16), /wiki/Special:R..."
317,2022-03-16,ep317,/wiki/Credigree_Weed_St._Patrick%27s_Day_Special,Credigree Weed St. Patrick's Day Special,"[/wiki/South_Park_(season_16), /wiki/Special:R..."
318,2022-06-01,ep318,/wiki/South_Park_The_Streaming_Wars,South Park The Streaming Wars,"[/wiki/The_SpongeBob_Movie:_Sponge_on_the_Run,..."


In [8]:
totalurls = []
for key in epi:
    totalurls.extend(df["reflinks"][key])
# De-duplicate so each linked article is fetched only once
uniqueurls = list(set(totalurls))
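
De-duplicating the links discards which episode each link came from, and that association is needed later to pair a company with an air date. One way to keep it, sketched here with made-up data and a hypothetical `link_to_episodes` name, is an inverted index from link to episode keys.

```python
from collections import defaultdict

# Hypothetical miniature of the `epi` dictionary, for illustration only
episodes = {
    1: {"date": "1997-08-13", "reflinks": ["/wiki/Tamale", "/wiki/Colorado"]},
    2: {"date": "1997-08-20", "reflinks": ["/wiki/Colorado"]},
}

# Map each link to the list of episode keys whose article contains it
link_to_episodes = defaultdict(list)
for key, info in episodes.items():
    for link in set(info["reflinks"]):  # set() guards against repeated links
        link_to_episodes[link].append(key)

unique_links = list(link_to_episodes)  # same role as the de-duplicated list
print(dict(link_to_episodes))
```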


In [9]:
complete = 'https://en.wikipedia.org/wiki/Tesla_Inc'  # the complete URL
page = requests.get(complete)
soupy = BeautifulSoup(page.content, "html.parser")
identify = '/wiki/Ticker_symbol'
# Collect the links in the article's infobox; for a public company they should include `identify`
func_call = urllist(soupy.find("table", class_="infobox vcard"))
func_call

['/wiki/Automotive_industry',
 '/wiki/East_Asia',
 '/wiki/Gigafactory_Texas',
 '/wiki/United_States_dollar',
 '/wiki/Asset',
 '/wiki/Tesla_Model_3',
 '/wiki/Tesla_Energy',
 '/wiki/Chief_executive_officer',
 '/wiki/S%26P_100',
 '/wiki/S%26P_500',
 '/wiki/Net_income',
 '/wiki/Tesla_Megapack',
 '/wiki/Robyn_Denholm',
 '/wiki/International_Securities_Identification_Number',
 '/wiki/Nasdaq',
 '/wiki/Earnings_before_interest_and_taxes',
 '/wiki/DeepScale',
 '/wiki/Elon_Musk',
 '/wiki/Tesla_Model_S',
 '/wiki/Texas',
 '/wiki/Middle_East',
 '/wiki/San_Carlos,_California',
 '/wiki/Nasdaq-100',
 '/wiki/Europe',
 '/wiki/Tesla_Grohmann_Automation',
 '/wiki/Subsidiary',
 '/wiki/Equity_(finance)',
 '/wiki/Tesla_Powerpack',
 '/wiki/Renewable_energy_industry',
 '/wiki/Tesla_Powerwall',
 '/wiki/Oceania',
 '/wiki/Tesla_Model_X',
 '/wiki/North_America',
 '/wiki/Public_company',
 '/wiki/Chairman',
 '/wiki/Tesla_Model_Y',
 '/wiki/Austin,_Texas',
 '/wiki/Ticker_symbol',
 '/wiki/Southeast_Asia']

In [10]:
testy='https://en.wikipedia.org'+'/wiki/Tesla_Inc'
testy_pg=requests.get(testy)
soupytest=BeautifulSoup(testy_pg.content)
urllist(soupytest.find("table",class_="infobox vcard"))

['/wiki/Automotive_industry',
 '/wiki/East_Asia',
 '/wiki/Gigafactory_Texas',
 '/wiki/United_States_dollar',
 '/wiki/Asset',
 '/wiki/Tesla_Model_3',
 '/wiki/Tesla_Energy',
 '/wiki/Chief_executive_officer',
 '/wiki/S%26P_100',
 '/wiki/S%26P_500',
 '/wiki/Net_income',
 '/wiki/Tesla_Megapack',
 '/wiki/Robyn_Denholm',
 '/wiki/International_Securities_Identification_Number',
 '/wiki/Nasdaq',
 '/wiki/Earnings_before_interest_and_taxes',
 '/wiki/DeepScale',
 '/wiki/Elon_Musk',
 '/wiki/Tesla_Model_S',
 '/wiki/Texas',
 '/wiki/Middle_East',
 '/wiki/San_Carlos,_California',
 '/wiki/Nasdaq-100',
 '/wiki/Europe',
 '/wiki/Tesla_Grohmann_Automation',
 '/wiki/Subsidiary',
 '/wiki/Equity_(finance)',
 '/wiki/Tesla_Powerpack',
 '/wiki/Renewable_energy_industry',
 '/wiki/Tesla_Powerwall',
 '/wiki/Oceania',
 '/wiki/Tesla_Model_X',
 '/wiki/North_America',
 '/wiki/Public_company',
 '/wiki/Chairman',
 '/wiki/Tesla_Model_Y',
 '/wiki/Austin,_Texas',
 '/wiki/Ticker_symbol',
 '/wiki/Southeast_Asia']

In [19]:
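companies_mentioned = []
errors = []
identify = '/wiki/Ticker_symbol'
for url in uniqueurls:
    complete = 'https://en.wikipedia.org' + url  # the complete URL
    try:
        page = requests.get(complete)
        soupy = BeautifulSoup(page.content, "html.parser")
        infobox = soupy.find("table", class_="infobox vcard")
        # Pages without a company infobox cannot match, so skip them instead of logging an error
        if infobox is not None and identify in urllist(infobox):
            companies_mentioned.append(url)
    except Exception:
        # Record genuinely failed pages (e.g. network errors) without swallowing KeyboardInterrupt
        errors.append(url)
    time.sleep(0.1)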

KeyboardInterrupt: 
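
The run above was interrupted before it finished, which discards the progress made so far. A minimal sketch of one way to make such a long loop resumable is to save results to a JSON checkpoint file. `check_urls`, the checkpoint file name, and the stand-in checker `fake_is_company` are hypothetical; in the notebook the checker would be the request-and-infobox test.

```python
import json
import tempfile
from pathlib import Path

def check_urls(urls, is_company, checkpoint_path):
    """Classify each URL once, persisting results so an interrupted run can resume."""
    path = Path(checkpoint_path)
    results = json.loads(path.read_text()) if path.exists() else {}
    try:
        for url in urls:
            if url in results:
                continue  # already classified in an earlier run
            results[url] = bool(is_company(url))
    finally:
        # Runs even on KeyboardInterrupt, so completed work is never lost
        path.write_text(json.dumps(results))
    return results

# Stand-in for the network check, for illustration only
calls = []
def fake_is_company(url):
    calls.append(url)
    return url == "/wiki/Tesla,_Inc."

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checked.json"
    first = check_urls(["/wiki/Tamale"], fake_is_company, ckpt)
    # The second run reuses the checkpoint and only checks the new URL
    second = check_urls(["/wiki/Tamale", "/wiki/Tesla,_Inc."], fake_is_company, ckpt)

companies = [url for url, flag in second.items() if flag]
print(companies)
```

Writing the checkpoint in a `finally` block is a simple design choice: it trades a little safety against mid-write crashes for very little code.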

In [None]:
companies_mentioned