'scraper.py' is a Python module that defines my own personal 'Scraper class'. This class has the following methods:

    'init': This is the constructor method that is called when an instance of the Scraper class is created. It takes the URL of the page to scrape (SOURCE: Wikipedia) and the API key for the Alpha Vantage API as arguments, and initializes the following instance variables:

    'self.url': The URL of the page to scrape
    'self.api_key': The API key for the Alpha Vantage API
    'self.base_url': The base URL for the Alpha Vantage API

    'parse_page': This method fetches the HTML content of the page at the URL specified in the 'self.url' instance variable, and parses it using the 'BeautifulSoup' library. It returns the 'BeautifulSoup' object representing the parsed HTML.

    'parse_table': This method takes the 'BeautifulSoup' object returned by 'parse_page' as an argument, and uses it to find the table on the page that you want to scrape. It then uses the pandas library to read the table data into a DataFrame and return it.

    'get_overview': This method makes a GET request to the Alpha Vantage API to get overview data for a given symbol. It takes the symbol as an argument, constructs the URL for the request using the 'self.base_url' and 'self.api_key' instance variables, and makes the request using the 'requests' library. It returns the response from the API as a dictionary.

    'get_data': This method iterates over the symbols in the 'self.df' DataFrame and calls 'get_overview' for each symbol to get the overview data. It appends the overview data to a new DataFrame called data and returns it.



In [10]:
#scraper.py
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
from urllib.parse import urlencode


class Scraper:
    def __init__(self, url, api_key):
        """
        Initialize the scraper object with the URL of the page to scrape
        and the API key for the Alpha Vantage API.
        """
        self.url = url
        self.api_url = "https://www.alphavantage.co/query"
        self.api_key = api_key
        
        # Fetch the page and parse the table
        self.soup = self.fetch_page()
        self.df = self.parse_table("Symbol")


    def fetch_page(self):
        """
        Fetch the HTML content of the page using the requests library.
        Return the BeautifulSoup object containing the parsed HTML.
        """
        # Use the requests library to fetch the HTML content of the page
        try:
            with requests.get(self.url) as r:
                r.raise_for_status()
                soup = BeautifulSoup(r.content, "html.parser")
        except requests.exceptions.RequestException as e:
            raise Exception("Error: Could not fetch the page") from e

        return soup

    def parse_table(self, index_col):
        """
        Use the `pandas` library to read the table from the HTML.
        Return the parsed table as a DataFrame.
        """
        # Find the table on the page using its attributes
        table = self.soup.find("table", attrs={"class": "wikitable sortable"})

        # Check if the table was found
        if table is None:
            raise Exception("Error: Invalid table number")

        # Parse the table using `pandas`
        df = pd.read_html(str(table), index_col="Symbol", header=0)[0]

        # Return the DataFrame containing the parsed table
        return df

    def get_overview(self, symbol):
        """
        Make a GET request to the Alpha Vantage API to get overview data for
        the given symbol.
        """
        # Set the parameters for the request
        params = {
            "function": "OVERVIEW",
            "symbol": symbol,
            "apikey": self.api_key
        }

        # Create the full URL using the base URL and the query string
        url = self.api_url + "?" + urlencode(params)

        # Make a GET request to the API
        try:
            r = requests.get(url)
            r.raise_for_status()
            response = r.json()
        except requests.exceptions.RequestException as e:
            # Handle any errors that occur when making the request
            raise Exception(f"Error: Could not get overview data for symbol {symbol}") from e
        except ValueError as e:
            # Handle any errors that occur when parsing the response
            raise Exception(f"Error: Could not parse response for symbol {symbol}") from e

        # Return the response from the API
        return response


    def get_data(self):
        """
        Get overview data for each symbol in the `df` DataFrame and append the
        data to a new DataFrame called `data`. Add a delay of 12 seconds
        between each request to ensure that only 5 requests are made per minute.
        """

        # Create an empty DataFrame to store the data
        data = pd.DataFrame()

        # Iterate over the symbols in the dataframe
        for symbol in self.df.index:
            # Get the overview data for the symbol
            response = self.get_overview(symbol)

            # Check if the response contains an error
            if "Error Message" in response:
                raise Exception(f"Error: {response['Error Message']}")

            # Append the overview data to the DataFrame
            data = data.append(response, ignore_index=True)

            # Check if there are more than 5 symbols in the DataFrame
            if len(self.df.index) > 5:
                # Wait 12 seconds before making the next request
                time.sleep(12)

        # Return the DataFrame containing the overview data
        return data


The following code creates a Scraper object and uses it to scrape a table from the specified URL using the provided API key. The table is then parsed and printed to the console.

In [11]:
# main.py
import time

# Import the scraper class from the scraper module
from scraper import Scraper

# Set the URL that you want to scrape
url = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"

# Set the API key
api_key = "83FRHS4AHOFR3RRT"

# Create a Scraper object
scraper = Scraper(url, api_key)

# Fetch the page
soup = scraper.fetch_page()

# Get the Table from Wikipedia
df = scraper.parse_table(soup)
print(df)

                         Company Exchange                        Industry  \
Symbol                                                                      
MMM                           3M     NYSE                    Conglomerate   
AXP             American Express     NYSE              Financial services   
AMGN                       Amgen   NASDAQ               Biopharmaceutical   
AAPL                       Apple   NASDAQ          Information technology   
BA                        Boeing     NYSE           Aerospace and defense   
CAT                  Caterpillar     NYSE         Construction and Mining   
CVX                      Chevron     NYSE              Petroleum industry   
CSCO                       Cisco   NASDAQ          Information technology   
KO                     Coca-Cola     NYSE                  Drink industry   
DOW                          Dow     NYSE               Chemical industry   
GS                 Goldman Sachs     NYSE              Financial services   

This code attempts to retrieve data using the get_data method of the Scraper object. If the method is successful, the data is printed to the console. If there is an error, the error message is printed instead. The try and except blocks are used to handle potential errors that may occur when calling the get_data method. I cancelled the code, but you can try it and it will work (Change your Alpha Vantage API key, please!)

In [8]:
# Get the data
try:
    data = scraper.get_data()
    print(data)
except Exception as e:
    print(e)

  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)
  data = data.append(response, ignore_index=True)


KeyboardInterrupt: 

In [2]:
scraper.get_overview("MMM")

{'Symbol': 'MMM',
 'AssetType': 'Common Stock',
 'Name': '3M Company',
 'Description': 'The 3M Company is an American multinational conglomerate corporation operating in the fields of industry, worker safety, US health care, and consumer goods. The company produces over 60,000 products under several brands, including adhesives, abrasives, laminates, passive fire protection, personal protective equipment, window films, paint protection films, dental and orthodontic products, electrical and electronic connecting and insulating materials, medical products, car-care products, electronic circuits, healthcare software and optical films. It is based in Maplewood, a suburb of Saint Paul, Minnesota.',
 'CIK': '66740',
 'Exchange': 'NYSE',
 'Currency': 'USD',
 'Country': 'USA',
 'Sector': 'LIFE SCIENCES',
 'Industry': 'SURGICAL & MEDICAL INSTRUMENTS & APPARATUS',
 'Address': '3M CENTER, BLDG. 220-13E-26A, ST PAUL, MN, US',
 'FiscalYearEnd': 'December',
 'LatestQuarter': '2022-09-30',
 'MarketCap

In [4]:
print(data.columns)

Index(['Symbol', 'AssetType', 'Name', 'Description', 'CIK', 'Exchange',
       'Currency', 'Country', 'Sector', 'Industry', 'Address', 'FiscalYearEnd',
       'LatestQuarter', 'MarketCapitalization', 'EBITDA', 'PERatio',
       'PEGRatio', 'BookValue', 'DividendPerShare', 'DividendYield', 'EPS',
       'RevenuePerShareTTM', 'ProfitMargin', 'OperatingMarginTTM',
       'ReturnOnAssetsTTM', 'ReturnOnEquityTTM', 'RevenueTTM',
       'GrossProfitTTM', 'DilutedEPSTTM', 'QuarterlyEarningsGrowthYOY',
       'QuarterlyRevenueGrowthYOY', 'AnalystTargetPrice', 'TrailingPE',
       'ForwardPE', 'PriceToSalesRatioTTM', 'PriceToBookRatio', 'EVToRevenue',
       'EVToEBITDA', 'Beta', '52WeekHigh', '52WeekLow', '50DayMovingAverage',
       '200DayMovingAverage', 'SharesOutstanding', 'DividendDate',
       'ExDividendDate'],
      dtype='object')


In [5]:
df.to_csv("data.csv")

This code defines a CSVReader class that has a read_csv method for reading the contents of a CSV file. When the method is called, it attempts to open the file at the specified file path and read its contents using the csv module. If the file is not found or there is an error reading the file, an error message is printed. The contents of each row in the CSV file are printed to the console.

In [2]:
import csv

class CSVReader:
  def __init__(self, filepath):
    self.filepath = filepath

  def read_csv(self):
    try:
      with open(self.filepath, 'r') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
          print(row)
    except FileNotFoundError:
      print(f"Error: The file at {self.filepath} could not be found.")
    except csv.Error:
      print(f"Error: An error occurred while reading the file at {self.filepath}.")


In [6]:
reader = CSVReader('data.csv')
data = reader.read_csv()
print(data)

['Symbol', 'Company', 'Exchange', 'Industry', 'Date added', 'Notes', 'Index weighting']
['MMM', '3M', 'NYSE', 'Conglomerate', '1976-08-09', 'As Minnesota Mining and Manufacturing', '2.41%']
['AXP', 'American Express', 'NYSE', 'Financial services', '1982-08-30', '', '3.02%']
['AMGN', 'Amgen', 'NASDAQ', 'Biopharmaceutical', '2020-08-31', '', '5.48%']
['AAPL', 'Apple', 'NASDAQ', 'Information technology', '2015-03-19', '', '2.84%']
['BA', 'Boeing', 'NYSE', 'Aerospace and defense', '1987-03-12', '', '3.36%']
['CAT', 'Caterpillar', 'NYSE', 'Construction and Mining', '1991-05-06', '', '4.52%']
['CVX', 'Chevron', 'NYSE', 'Petroleum industry', '2008-02-19', 'Also 1930-07-18 to 1999-11-01', '3.50%']
['CSCO', 'Cisco', 'NASDAQ', 'Information technology', '2009-06-08', '', '0.96%']
['KO', 'Coca-Cola', 'NYSE', 'Drink industry', '1987-03-12', 'Also 1932-05-26 to 1935-11-20', '1.22%']
['DOW', 'Dow', 'NYSE', 'Chemical industry', '1991-05-06', '', '0.98%']
['GS', 'Goldman Sachs', 'NYSE', 'Financial serv

In [9]:
######     TO DO: MAKE COLUMNS NUMERICS AND SELECT STOCKS BASED ON MY THRESHOLDS IN THE FOLLOWING CODE
######     THAT THRESHOLDS ARE WRONG, BUT I WILL USE THAT THs AS AN EXCERCISE. 
######     THE CORRECT WAY TO DO IT: CHANGE THE THRESHOLDS DEPENDING ON THE INDUSTRY, THE SECTOR, AND YOUR PREFERENCES!

In [None]:
# Set the thresholds for the market capitalization and earnings
market_cap_threshold = 5e9
earnings_threshold = 0

# Set the thresholds for the P/E and P/B ratios
pe_threshold = 15
ps_threshold = 3
pb_threshold = 1.5
peg_threshold = 1

# Set the threshold for the return on equity
roe_threshold = 0.15

# Set the threshold for the debt-to-equity ratio
debt_equity_threshold = 1

# Set the threshold for the current ratio
current_ratio_threshold = 1

# Identify undervalued stocks using the market capitalization, P/E and P/B ratios, and additional financial ratios
undervalued_stocks = data[(data['MarketCapitalization'] < market_cap_threshold) & 
                          (data['QuarterlyEarningsGrowthYOY'] > earnings_threshold) & 
                          (data['50DayMovingAverage'] > data['200DayMovingAverage']) & 
                          (data["AnalystTargetPrice"] > data['50DayMovingAverage']) & 
                          (data['PriceToSalesRatioTTM'] < ps_threshold) & 
                          (data['PEGRatio'] < peg_threshold) &
                          (data['PERatio'] < pe_threshold) &
                          (data['PriceToBookRatio'] < pb_threshold)]

# Print the list of undervalued stocks
#print('Undervalued stocks:')
#print(undervalued_stocks)
# Print the number of undervalued stocks
#print('\nNumber of undervalued stocks:', len(undervalued_stocks))