#Textual Business Information 
##Scraping qualitative information about companies from the web
The goal of this notebook is to show how to scrape business description data from the web, specifically from ***Google Finance***. 


The steps that this Colab notebook demonstrates are: 

1.   **Provide a list of companies:** A CSV file containing the ticker symbols.  
2.   **"Scrape" the raw HTML data from the website:** Loop through the list of tickers and get each business description using the `requests` and `BeautifulSoup` libraries. 
3.   **Convert the results to a DataFrame:** Store each ticker and its business description in a pandas DataFrame. 

###Load necessary libraries. 
* Note that "Beautiful Soup" is a Python library for parsing HTML and XML documents, such as the pages returned by websites.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
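
As a quick illustration of what Beautiful Soup does, the sketch below parses a small hand-written HTML snippet rather than a live page. The snippet and its `intro` class name are made up for this example, and it uses Python's built-in `html.parser` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, written by hand for illustration only
html = """
<html><body>
  <div class="intro">First paragraph.</div>
  <div class="intro">Second paragraph.</div>
  <p>Unrelated text.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; class_ filters on the CSS class
matches = soup.find_all("div", class_="intro")

# get_text() drops the markup and keeps only the text content
texts = [tag.get_text(strip=True) for tag in matches]
print(texts)
```

Note that `find_all` returns tag objects, not strings, so the text has to be extracted explicitly with `get_text()`.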




In [1]:
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd

###Load the tickers data. 

In [2]:
tickers = pd.read_csv('ticker_lect2.csv') # load the tickers from ticker_lect2.csv
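
The loading step relies on the CSV having a column named `Ticker`, since that is the column the download loop reads. The contents of `ticker_lect2.csv` are not shown in this notebook, so the sketch below builds an in-memory stand-in (an assumption, using the four tickers that appear in the output later) and shows one way to check and clean the column before scraping.

```python
import io
import pandas as pd

# Stand-in for ticker_lect2.csv; the real file's contents are an assumption
csv_text = "Ticker\nAAPL\nMSFT\nTSLA\nAMZN\n"
tickers_demo = pd.read_csv(io.StringIO(csv_text))

# Fail early if the expected column is missing
if "Ticker" not in tickers_demo.columns:
    raise ValueError("CSV needs a 'Ticker' column")

# Drop empty rows, trim whitespace, and remove duplicate symbols
symbols = (
    tickers_demo["Ticker"]
    .dropna()
    .astype(str)
    .str.strip()
    .drop_duplicates()
    .tolist()
)
print(symbols)
```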

###Download the raw data from the **Google Finance** website
* Loop through the tickers and, for each one, build its URL and retrieve its business description.
* Note that, for this example Colab notebook, the `ticker` variable is limited to the first 4 (`[:4]`) companies in the list (you can remove this limit to process a much larger set of companies).
 * Note that running this Colab notebook for a large set of hundreds or thousands of companies may take several hours.

In [3]:
# Loop over the tickers, storing each stock's URL and raw description
URL = [] # empty list for URLs
DES = [] # empty list for descriptions
ticker = tickers['Ticker'][:4] # for example purposes we limit the number of tickers to 4
for i in ticker:
  url = 'https://www.google.com/finance/quote/' + i + ':NASDAQ' # the exchange is hard-coded to NASDAQ
  # url = 'https://finance.yahoo.com/quote/' + i + '/profile' # alternative source: Yahoo Finance
  URL.append(url)
  page = requests.get(url) # visits the URL
  soup = BeautifulSoup(page.content, 'lxml')
  Business_Description = soup.find_all('div', class_='bLLb2d') # every <div> with the description class
  # htmldata = BeautifulSoup(page.content, 'html.parser')
  # Business_Description = htmldata.find('p', {'class': 'Mt(15px) Lh(1.6)'}) # finds the business description part in the HTML code
  DES.append(Business_Description)
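
The loop above has no timeout, status check, or pause between requests, which starts to matter on long runs. Below is a minimal sketch of a more defensive version of the fetch step. It assumes that a 10-second timeout and a 1-second pause are acceptable and that the `bLLb2d` class used above still marks the description on the page; the helper names `build_quote_url` and `fetch_description` are hypothetical.

```python
import time
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup


def build_quote_url(ticker, exchange="NASDAQ"):
    """Build a Google Finance quote URL in the same form as the loop above."""
    return "https://www.google.com/finance/quote/" + quote(ticker) + ":" + exchange


def fetch_description(ticker, pause=1.0, timeout=10):
    """Return the description text for one ticker, or None on failure."""
    url = build_quote_url(ticker)
    try:
        page = requests.get(url, timeout=timeout)
        page.raise_for_status()  # raise on 4xx/5xx responses
    except requests.RequestException as err:
        print(f"{ticker}: request failed ({err})")
        return None
    finally:
        time.sleep(pause)  # pause after every request, successful or not
    soup = BeautifulSoup(page.content, "html.parser")
    tag = soup.find("div", class_="bLLb2d")  # class name taken from the loop above
    return tag.get_text(strip=True) if tag else None
```

With these helpers, the loop body could reduce to something like `DES.append(fetch_description(i))`, and a failed or empty page would yield `None` instead of stopping the run.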

###Print out the raw business descriptions

In [4]:
# print(URL)
print(DES) # check the descriptions

[[<div class="bLLb2d">Apple Inc. is an American multinational technology company that specializes in consumer electronics, software and online services. Apple is the largest information technology company by revenue and, as of January 2021, it is the world's most valuable company, the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet, Amazon, Meta, and Microsoft.
Apple was founded as Apple Computer Company on April 1, 1976, by Steve Jobs, Steve Wozniak and Ronald Wayne to develop and sell Wozniak's Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977 and the company's next computer, the Apple II became a best seller. Apple went public in 1980, to instant financial success. The company went onto develop new computers featuring innovative graphical user interfaces, including the original Macintosh, announce

###Convert the results to a dataframe for easier viewing. 

In [5]:
# Create a new DataFrame that stores each ticker and its corresponding description
company_des = pd.DataFrame({'ticker': ticker, 'description': DES})
company_des.head()

|   | ticker | description |
|---|--------|-------------|
| 0 | AAPL | [[Apple Inc. is an American multinational tech... |
| 1 | MSFT | [[Microsoft Corporation is an American multina... |
| 2 | TSLA | [[Tesla, Inc. is an American electric vehicle ... |
| 3 | AMZN | [[Amazon.com, Inc. is an American multinationa... |
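
The `description` column above holds Beautiful Soup result lists rather than plain strings, which is why each cell starts with `[[`. The following is a minimal sketch of converting such results to text before building the DataFrame. It uses hand-written HTML in place of downloaded pages, with a shortened description and a made-up ticker `XXXX` for the no-match case; `to_text` is a hypothetical helper.

```python
import pandas as pd
from bs4 import BeautifulSoup


def to_text(result_set):
    """Join the text of all matched tags; return None if nothing matched."""
    parts = [tag.get_text(" ", strip=True) for tag in result_set]
    return " ".join(parts) if parts else None


# Hand-written stand-ins for downloaded pages (illustration only)
pages = {
    "AAPL": '<div class="bLLb2d">Apple Inc. is an American company.</div>',
    "XXXX": "<div>No description here.</div>",  # hypothetical ticker with no match
}

rows = []
for symbol, html in pages.items():
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all("div", class_="bLLb2d")
    rows.append({"ticker": symbol, "description": to_text(matches)})

clean_des = pd.DataFrame(rows)
print(clean_des)
```

Storing plain strings makes the column easier to use with pandas string methods such as `str.len()` or `str.contains()`, and missing descriptions show up as empty values that can be filtered with `dropna()`.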
