## Web Scraping with Beautiful Soup

> Build a small dataset: extract data from a web page and save it to a CSV file.

In [2]:
import re
import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup as bs

In [3]:
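# Load the HTML of the page from its URL

htmlpage = ur.urlopen("https://docs.python.org/3/library/cmath.html")
sp = bs(htmlpage, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning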
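
The loading step can also be wrapped so that the HTTP response is closed after reading and the parser is always named. This is a minimal sketch, not the notebook's own code: `parse_html`, `load_soup`, and the 10-second timeout are hypothetical choices made for illustration, and the inline `sample` markup is made up so the parsing step can be tried without network access.

```python
import urllib.request as ur
from bs4 import BeautifulSoup


def parse_html(markup):
    """Parse HTML with an explicitly named parser (hypothetical helper)."""
    return BeautifulSoup(markup, "html.parser")


def load_soup(url, timeout=10):
    """Fetch a page and parse it; the response is closed when the block exits."""
    with ur.urlopen(url, timeout=timeout) as response:
        return parse_html(response.read())


# Offline check of the parsing step on a small, made-up snippet
sample = '<dl><dt id="cmath.phase">cmath.phase(x)</dt><dd>Return the phase of x.</dd></dl>'
soup = parse_html(sample)
print(soup.dt["id"])  # cmath.phase
```

With network access, `load_soup("https://docs.python.org/3/library/cmath.html")` would return a soup object that can be used in the same way as `sp`.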


In [4]:
# Find all function names

names = sp.body.find_all('dt')
function_names = re.findall(r'id="cmath\.\w+', str(names))
function_names = [item[4:] for item in function_names]  # strip the leading 'id="'

print(function_names)

['cmath.phase', 'cmath.polar', 'cmath.rect', 'cmath.exp', 'cmath.log', 'cmath.log10', 'cmath.sqrt', 'cmath.acos', 'cmath.asin', 'cmath.atan', 'cmath.cos', 'cmath.sin', 'cmath.tan', 'cmath.acosh', 'cmath.asinh', 'cmath.atanh', 'cmath.cosh', 'cmath.sinh', 'cmath.tanh', 'cmath.isfinite', 'cmath.isinf', 'cmath.isnan', 'cmath.isclose', 'cmath.pi', 'cmath.e', 'cmath.tau', 'cmath.inf', 'cmath.infj', 'cmath.nan', 'cmath.nanj']
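
As an alternative to running a regular expression over the stringified tag list, the `id` attribute can be read from each tag directly. The sketch below assumes the `id` sits on the `<dt>` element itself; the `sample` markup is a made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the structure of the documentation page
sample = """
<dl><dt id="cmath.phase">cmath.phase(x)</dt><dd>...</dd></dl>
<dl><dt id="cmath.pi">cmath.pi</dt><dd>...</dd></dl>
<dl><dt>entry without an id</dt><dd>...</dd></dl>
"""
soup = BeautifulSoup(sample, "html.parser")

# id=True keeps only <dt> tags that carry an id attribute
function_names = [
    dt["id"]
    for dt in soup.find_all("dt", id=True)
    if dt["id"].startswith("cmath.")
]
print(function_names)  # ['cmath.phase', 'cmath.pi']
```

Reading the attribute avoids the slicing step and does not depend on how the tag happens to be serialized as text.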


In [5]:
# Find all function descriptions

desc = sp.body.find_all('dd')
function_usage = []

for item in desc:
    item = item.text
    item = item.replace("\n", " ")
    function_usage.append(item)

print(function_usage)

['Return the phase of x (also known as the argument of x), as a float.  phase(x) is equivalent to math.atan2(x.imag, x.real).  The result lies in the range [-π, π], and the branch cut for this operation lies along the negative real axis, continuous from above.  On systems with support for signed zeros (which includes most systems in current use), this means that the sign of the result is the same as the sign of x.imag, even when x.imag is zero: >>> phase(complex(-1.0, 0.0)) 3.141592653589793 >>> phase(complex(-1.0, -0.0)) -3.141592653589793   ', 'Return the representation of x in polar coordinates.  Returns a pair (r, phi) where r is the modulus of x and phi is the phase of x.  polar(x) is equivalent to (abs(x), phase(x)). ', 'Return the complex number x with polar coordinates r and phi. Equivalent to r * (math.cos(phi) + math.sin(phi)*1j). ', 'Return e raised to the power x, where e is the base of natural logarithms. ', 'Returns the logarithm of x to the given base. If the base is not
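
The printed descriptions still contain doubled and trailing spaces, because replacing `"\n"` with `" "` only swaps one whitespace character for another. A small sketch of a stricter cleanup follows; `normalize_whitespace` is a hypothetical helper name and `raw` is a made-up sample string.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


raw = "Return the phase of x,\nas a float.  phase(x) is equivalent to\nmath.atan2(x.imag, x.real).   "
print(normalize_whitespace(raw))
# Return the phase of x, as a float. phase(x) is equivalent to math.atan2(x.imag, x.real).
```

Calling `str.split()` with no arguments splits on any whitespace and drops empty pieces, so leading and trailing whitespace disappears as well.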

In [6]:
# Check that both lists have the same length

print(len(function_usage))
print(len(function_names))

30
30
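
Equal lengths show that the two lists can be combined, but not that each name sits next to its own description. One way to keep them aligned is to walk from each `<dt>` to the `<dd>` that follows it. This is a sketch under the assumption that every definition is a `<dt>` followed by a sibling `<dd>` inside the same `<dl>`; the `sample` markup is made up.

```python
from bs4 import BeautifulSoup

# Made-up markup: two complete entries and one <dt> without a description
sample = """
<dl><dt id="cmath.phase">cmath.phase(x)</dt>
    <dd>Return the phase of x.</dd></dl>
<dl><dt id="cmath.pi">cmath.pi</dt>
    <dd>The mathematical constant pi.</dd></dl>
<dl><dt id="cmath.orphan">cmath.orphan</dt></dl>
"""
soup = BeautifulSoup(sample, "html.parser")

pairs = []
for dt in soup.find_all("dt", id=True):
    dd = dt.find_next_sibling("dd")  # searches only within the same parent
    if dd is None:
        continue  # skip entries that have no description
    pairs.append((dt["id"], " ".join(dd.get_text().split())))

function_names = [name for name, _ in pairs]
function_usage = [usage for _, usage in pairs]
print(pairs)
```

Because each pair is built from one `<dt>` and its own sibling, a missing description drops that entry instead of shifting every later row.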


In [7]:
# Create the DataFrame

df = pd.DataFrame({'Function Name': function_names, 'Function Usage': function_usage})
df

,Function Name,Function Usage
0,cmath.phase,Return the phase of x (also known as the argum...
1,cmath.polar,Return the representation of x in polar coordi...
2,cmath.rect,Return the complex number x with polar coordin...
3,cmath.exp,"Return e raised to the power x, where e is the..."
4,cmath.log,Returns the logarithm of x to the given base. ...
5,cmath.log10,Return the base-10 logarithm of x. This has th...
6,cmath.sqrt,Return the square root of x. This has the same...
7,cmath.acos,Return the arc cosine of x. There are two bran...
8,cmath.asin,Return the arc sine of x. This has the same br...
9,cmath.atan,Return the arc tangent of x. There are two bra...


In [8]:
# Save the DataFrame as a CSV file

df.to_csv("cmath.csv")
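
By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` then loads back as a column called `Unnamed: 0`. If the index carries no information, passing `index=False` leaves it out. The sketch below uses a made-up two-row frame and an in-memory buffer instead of a file so that it runs without touching the disk.

```python
import io

import pandas as pd

# Made-up two-row frame with the same column names as the notebook's df
df = pd.DataFrame({
    "Function Name": ["cmath.phase", "cmath.polar"],
    "Function Usage": [
        "Return the phase of x.",
        "Return the representation of x in polar coordinates.",
    ],
})

buffer = io.StringIO()           # stands in for a file such as "cmath.csv"
df.to_csv(buffer, index=False)   # omit the row index from the output
buffer.seek(0)

restored = pd.read_csv(buffer)
print(list(restored.columns))  # ['Function Name', 'Function Usage']
```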