[Reference](https://medium.com/python-in-plain-english/4-python-libraries-to-help-you-make-money-from-webscraping-57ba6d8ce56d)

Web scraping involves three steps:
1. Access the webpage
2. Locate and parse the items to be scraped
3. Save the scraped items to a file

Libraries that help with these steps include Requests, Selenium, Beautiful Soup, pandas, and Scrapy.

# 1. Access the webpage

## Requests

```python
import requests

url = "http://eoddata.com/stocklist/NASDAQ/A.htm"
page = requests.get(url)
```
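
A plain `requests.get` call waits indefinitely by default and does not raise on HTTP error codes. A minimal sketch of a more defensive fetch follows; the helper name `fetch_html` and the 10-second timeout are assumptions chosen for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the page body as text, raising on network or HTTP errors."""
    # timeout stops the call from hanging if the server never answers
    response = requests.get(url, timeout=timeout)
    # raise_for_status raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the URL above should return the same text as `page.text`, provided the request succeeds.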

## Selenium

```python
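from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "http://eoddata.com/stocklist/NASDAQ/A.htm"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('/Downloads/chromedriver'))
driver.get(url)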
```

# 2. Locate and parse the items to be scraped

## Selenium

![selenium](https://miro.medium.com/max/1400/1*fGvsYMzcKogQBJsDPWuLqg.png)

```python
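from selenium.webdriver.common.by import By

# Selenium 4 removed find_elements_by_css_selector; use find_elements with By.CSS_SELECTOR
stock_symbol = driver.find_elements(By.CSS_SELECTOR, '#ctl00_cph1_divSymbols > table > tbody > tr > td:nth-child(1) > a')

# Collect the visible text of each matched link
symbol = [x.text for x in stock_symbol]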
```

## Beautiful Soup

```python
import requests
from bs4 import BeautifulSoup

url = "http://eoddata.com/stocklist/NASDAQ/A.htm"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
soup
```

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><link href="../../styles/jquery-ui-1.10.0.custom.min.css" rel="stylesheet" type="text/css"/><link href="../../styles/main.css" rel="stylesheet" type="text/css"/><link href="../../styles/button.css" rel="stylesheet" type="text/css"/><link href="../../styles/nav.css" rel="stylesheet" type="text/css"/>
<script src="/scripts/jquery-1.9.0.min.js" type="text/javascript"></script>
<script src="/scripts/jquery-ui-1.10.0.custom.min.js" type="text/javascript"></script>
<script type="text/javascript">		var _sf_startpt = (new Date()).getTime()</script>
<script src="scripts/jquery-1.4.2.min.js" type="text/javascript"></script>
<meta content="list of symbols for NASDAQ Stock Exchange,list of stock symbols,download symbols,stock symbols list,NASDAQ symbol list,NASDAQ stock ticker,NASDAQ stock list,NASDAQ stocks,ticker,tickers,sto
```

```python
soup.title
```

```
<title>
	List of Symbols for NASDAQ Stock Exchange [NASDAQ] Starting with A
</title>
```

```python
soup.title.name
```

```
'title'
```

```python
soup.title.string
```

```
'\r\n\tList of Symbols for NASDAQ Stock Exchange [NASDAQ] Starting with A\r\n'
```
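
The title string above still carries the line breaks and tab from the page source. A small sketch of cleaning it with `str.strip()` follows; the variable names are illustrative, and the literal is copied from the output shown above so the snippet runs without fetching the page.

```python
# Value of soup.title.string as shown in the output above
raw_title = '\r\n\tList of Symbols for NASDAQ Stock Exchange [NASDAQ] Starting with A\r\n'

# strip() removes leading and trailing whitespace, including \r, \n and \t
clean_title = raw_title.strip()
print(clean_title)
# → List of Symbols for NASDAQ Stock Exchange [NASDAQ] Starting with A
```

On a live `soup` object, `soup.title.get_text(strip=True)` should give the same cleaned text in one call.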

```python
soup.title.parent.name
```

```
'head'
```

```python
soup.p
```

```
<p>Copyright © 2003-2020 EODData, LLC. All rights reserved.</p>
```

```python
# soup.p['class']  # left commented out: this <p> has no class attribute, so the lookup raises KeyError
```

```python
soup.a
```

```
<a href="../../default.aspx" id="ctl00_Header1_HyperLink1" title="End of day stock quote data and history"><img alt="EODData" border="0" id="ctl00_Header1_Image2" src="../../images/logo.gif" style="width:270px; height:55px"/></a>
```

```python
soup.find_all('a')
```

```
[<a href="../../default.aspx" id="ctl00_Header1_HyperLink1" title="End of day stock quote data and history"><img alt="EODData" border="0" id="ctl00_Header1_Image2" src="../../images/logo.gif" style="width:270px; height:55px"/></a>,
 <a href="/default.aspx" target="" title="Home Page"><span>HOME</span></a>,
 <a href="/products/default.aspx" target="" title="Products and Services"><span>PRODUCTS &amp; SERVICES</span></a>,
 <a href="/howto/advancedget.aspx" target="" title="How To"><span>HOW TO</span></a>,
 <a href="/support/faq.aspx" target="" title="SUPPORT"><span>SUPPORT</span></a>,
 <a href="/about/default.aspx" target="" title="ABOUT"><span>ABOUT</span></a>,
 <a href="/myaccount/default.aspx" target="" title="My Account"><span>MY ACCOUNT</span></a>,
 <a href="/default.aspx" target="" title="Home Page"><span>Home Page</span></a>,
 <a href="/download.aspx" target="" title="Download"><span>Download</span></a>,
 <a href="/symbols.aspx" target="" title="Symbol Lists"><span>Symbol Lists</s
```
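
The `href` values in the links above are relative (`../../default.aspx`, `/download.aspx`), so they need to be resolved against the page URL before they can be requested. A minimal sketch using the standard library's `urljoin` follows; the three paths are copied from the output above, and the variable names are illustrative.

```python
from urllib.parse import urljoin

page_url = "http://eoddata.com/stocklist/NASDAQ/A.htm"

# A few href values taken from the find_all('a') output above
hrefs = ["../../default.aspx", "/download.aspx", "/symbols.aspx"]

# urljoin resolves each relative path against the URL of the page it came from
absolute_links = [urljoin(page_url, href) for href in hrefs]
for link in absolute_links:
    print(link)
```

With a live `soup`, the `hrefs` list could be built from `a.get("href")` for each tag returned by `soup.find_all('a')`, skipping tags where it returns `None`.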

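```python
soup.find(id="link3")  # no output: find returns None when no element has this id
```

```python
# Flatten the text of every <td> cell inside the symbols table
elements = []
table = soup.find('div', {'id': 'ctl00_cph1_divSymbols'})
for tr in table.find_all('tr'):
    for td in tr.find_all('td'):
        elements.append(td.text)
x = len(elements)

# The stride of 10 assumes each row contributes 10 cells: symbol first, name second
symbol = []
for y in range(0, x, 10):
    symbol.append(elements[y])

names = []
for y in range(1, x, 10):
    names.append(elements[y])
```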
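
The fixed stride of 10 breaks silently if the site adds or removes a column. One alternative, sketched here under stated assumptions, is to read the first two cells of each row directly. The HTML below is a hypothetical miniature of the symbols table (the column headers and the third column are invented for illustration), and the `sample_*` names are not from the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the symbols table; the real page has more columns
sample_html = """
<div id="ctl00_cph1_divSymbols">
  <table>
    <tr><th>Code</th><th>Name</th><th>Other</th></tr>
    <tr><td>AAL</td><td>American Airlines Gp</td><td>x</td></tr>
    <tr><td>AAPL</td><td>Apple Inc</td><td>x</td></tr>
  </table>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("div", {"id": "ctl00_cph1_divSymbols"})

sample_symbols, sample_names = [], []
for tr in sample_table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) < 2:
        # Header rows use <th>, so they contain no <td> cells and are skipped
        continue
    sample_symbols.append(cells[0].get_text(strip=True))
    sample_names.append(cells[1].get_text(strip=True))

print(sample_symbols)
print(sample_names)
```

Reading per row keeps each symbol paired with the name from the same `<tr>`, regardless of how many other columns the row has.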

```python
symbol
```

```
['AACG',
 'AACQ',
 'AACQU',
 'AACQW',
 'AAL',
 'AAME',
 'AAOI',
 'AAON',
 'AAPL',
 'AAWW',
 'AAXJ',
 'AAXN',
 'ABCB',
 'ABEO',
 'ABIO',
 'ABMD',
 'ABTX',
 'ABUS',
 'ACAD',
 'ACAM',
 'ACAMU',
 'ACAMW',
 'ACBI',
 'ACCD',
 'ACER',
 'ACET',
 'ACEV',
 'ACEVU',
 'ACEVW',
 'ACGL',
 'ACGLO',
 'ACGLP',
 'ACHC',
 'ACHV',
 'ACIA',
 'ACIU',
 'ACIW',
 'ACLS',
 'ACMR',
 'ACNB',
 'ACOR',
 'ACRS',
 'ACRX',
 'ACST',
 'ACT',
 'ACTCU',
 'ACTG',
 'ACWI',
 'ACWX',
 'ADAP',
 'ADBE',
 'ADES',
 'ADI',
 'ADIL',
 'ADILW',
 'ADMA',
 'ADMP',
 'ADMS',
 'ADP',
 'ADPT',
 'ADRE',
 'ADRO',
 'ADSK',
 'ADTN',
 'ADTX',
 'ADUS',
 'ADVM',
 'ADXN',
 'ADXS',
 'AEGN',
 'AEHR',
 'AEIS',
 'AEMD',
 'AEP',
 'AEPPL',
 'AEPPZ',
 'AERI',
 'AESE',
 'AEY',
 'AEYE',
 'AEZS',
 'AFIB',
 'AFIN',
 'AFINP',
 'AFMD',
 'AFYA',
 'AGBA',
 'AGBAR',
 'AGBAU',
 'AGBAW',
 'AGCUU',
 'AGEN',
 'AGFS',
 'AGIO',
 'AGLE',
 'AGMH',
 'AGNC',
 'AGNCM',
 'AGNCN',
 'AGNCO',
 'AGNCP',
 'AGRX',
 'AGTC',
 'AGYS',
 'AGZD',
 'AHACU',
 'AHCO',
 'AHPI',
 'AIA',
 'AI
```

```python
names
```

```
['Ata Creativity Global',
 'Artius Acquisition Inc Cl A',
 'Artius Acquisition Inc Unit',
 'Artius Acquisition Inc WT',
 'American Airlines Gp',
 'Atlantic Amer Cp',
 'Applied Optoelect',
 'Aaon Inc',
 'Apple Inc',
 'Atlas Air Ww',
 'All Country Asia Ex Japan Ishares MSCI',
 'Axon Inc',
 'Ameris Bancorp',
 'Abeona Therapeutics',
 'Arca Biopharma Inc',
 'Abiomed Inc',
 'Allegiance Banc CS',
 'Arbutus Biopharma Cp',
 'Acadia Pharmaceutica',
 'Acamar Partners Acquisition Corp Cl A',
 'Acamar Partners Acquisition Corp Units',
 'Acamar Partners Acquisition Corp WT',
 'Atlantic Capital',
 'Accolade Inc',
 'Acer Therapeutics Inc',
 'Adicet Bio Inc',
 'Ace Convergence Acquisition Corp. Cl A',
 'Ace Convergence Acquisition Corp',
 'Ace Convergence Acquisition Corp WT',
 'Arch Capital Grp Ltd',
 'Arch Capital Group Ltd ADR',
 'Arch Capital Group Ltd',
 'Acadia Healthcr Company',
 'Achieve Life Sciences Inc',
 'Acacia Communica',
 'AC Immune S.A.',
 'Aci Worldwide Inc',
 'Axcelis Tech Inc',
 'Acm
```
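
Since `symbol` and `names` are built from the same rows, they should have equal length and line up position by position. A quick sanity check, sketched with the first five entries copied from the outputs above (the `*_head` and `lookup` names are illustrative):

```python
# First five entries of each list, copied from the outputs above
symbol_head = ['AACG', 'AACQ', 'AACQU', 'AACQW', 'AAL']
names_head = [
    'Ata Creativity Global',
    'Artius Acquisition Inc Cl A',
    'Artius Acquisition Inc Unit',
    'Artius Acquisition Inc WT',
    'American Airlines Gp',
]

# A length mismatch would mean the stride-based slicing went out of step
if len(symbol_head) != len(names_head):
    raise ValueError("symbol and name lists have different lengths")

# zip pairs items by position; dict turns the pairs into a symbol -> name lookup
lookup = dict(zip(symbol_head, names_head))
print(lookup['AAL'])
# → American Airlines Gp
```

The same check can be run on the full lists before building a DataFrame from them.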

# 3. Save the scraped items to a file

```python
import pandas as pd

# Build the DataFrame from the two parallel lists
df = pd.DataFrame({'stock_symbol': symbol, 'stock_name': names})
df.head()
```

|   | stock_symbol | stock_name |
|---|---|---|
| 0 | AACG | Ata Creativity Global |
| 1 | AACQ | Artius Acquisition Inc Cl A |
| 2 | AACQU | Artius Acquisition Inc Unit |
| 3 | AACQW | Artius Acquisition Inc WT |
| 4 | AAL | American Airlines Gp |


```python
df.set_index('stock_symbol', inplace=True)
df.head()
```

| stock_symbol | stock_name |
|---|---|
| AACG | Ata Creativity Global |
| AACQ | Artius Acquisition Inc Cl A |
| AACQU | Artius Acquisition Inc Unit |
| AACQW | Artius Acquisition Inc WT |
| AAL | American Airlines Gp |


```python
df.to_json('NASDAQ Stock List')
```
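
It may help to see what shape of JSON `to_json` writes for a DataFrame indexed by symbol. The sketch below uses a two-row sample with values from the table above; `df_sample` and the other names are illustrative, and `to_json` is called without a path so that it returns a string instead of writing a file.

```python
import json

import pandas as pd

# Two-row sample with the same layout as df above
df_sample = pd.DataFrame(
    {
        "stock_symbol": ["AACG", "AAL"],
        "stock_name": ["Ata Creativity Global", "American Airlines Gp"],
    }
).set_index("stock_symbol")

# Default orient for a DataFrame is "columns": {column: {index: value}}
by_column = json.loads(df_sample.to_json())
print(by_column)

# orient="index" nests the other way round: {index: {column: value}}
by_symbol = json.loads(df_sample.to_json(orient="index"))
print(by_symbol)
```

The `orient="index"` layout is often easier to consume when the file will be used as a symbol lookup.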