# 1. Demo: downloading files from websites

There are ```txt``` and ```pdf``` files on:

```https://sandeepmj.github.io/scrape-example-page/pages.html```

Do the following:

1. Download all ```pdf``` files.
2. Download all files (```txt``` and ```pdf```) in one pass.

In [6]:
## create new cells as necessary
# import libraries
from bs4 import BeautifulSoup  ## scrape info from web pages
import requests ## get web pages from server
import time # needed for its sleep function, which pauses between downloads
from random import randrange # generate random numbers
import wget # pull down documents and files from websites

In [4]:
## capture response
url = "https://sandeepmj.github.io/scrape-example-page/pages.html"
response = requests.get(url)
response.status_code

200

In [5]:
## making soup
soup = BeautifulSoup(response.text, "html.parser")

In [8]:
## finding section with pdf links
pdf_holder = soup.find("ul",class_="pdfs")
pdf_holder

<ul class="pdfs downloadable">
<p class="pages">Download this list of PDFs</p>
<li>PDF Document <a href="files/pdf_1.pdf">1</a> </li>
<li>PDF Document <a href="files/pdf_2.pdf">2</a></li>
<li>PDF Document <a href="files/pdf_3.pdf">3</a></li>
<li>PDF Document <a href="files/pdf_4.pdf">4</a></li>
<li>PDF Document <a href="files/pdf_5.pdf">5</a></li>
<li>PDF Document <a href="files/pdf_6.pdf">6</a></li>
<li>PDF Document <a href="files/pdf_7.pdf">7</a></li>
<li>PDF Document <a href="files/pdf_8.pdf">8</a></li>
<li>PDF Document <a href="files/pdf_9.pdf">9</a></li>
<li>PDF Document <a href="files/pdf_10.pdf">10</a></li>
</ul>

In [9]:
## extracting a tags from sections
## isolating link from a tags and combining it with base url
links_a_tag_pdf = pdf_holder.find_all("a")
all_pdf_links_fl = []
base_url = "https://sandeepmj.github.io/scrape-example-page/"
for a_tag in links_a_tag_pdf:
    all_pdf_links_fl.append(base_url+a_tag.get("href"))
all_pdf_links_fl

['https://sandeepmj.github.io/scrape-example-page/files/pdf_1.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_2.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_3.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_4.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_5.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_6.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_7.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_8.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_9.pdf',
 'https://sandeepmj.github.io/scrape-example-page/files/pdf_10.pdf']
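
Plain concatenation works here because every ```href``` is relative to the same folder. As a hedged alternative, the standard library's ```urljoin``` resolves each ```href``` against the page URL itself, which should also cope with absolute links or ```../``` paths. The ```hrefs``` list below is an illustrative stand-in, not scraped output.

```python
from urllib.parse import urljoin

# Page the links were scraped from (same url as in the cells above)
page_url = "https://sandeepmj.github.io/scrape-example-page/pages.html"

# Illustrative href values; in the notebook these come from a_tag.get("href")
hrefs = ["files/pdf_1.pdf", "files/pdf_2.pdf"]

# urljoin drops "pages.html" and resolves each href relative to the page's folder
full_links = [urljoin(page_url, href) for href in hrefs]
print(full_links)
```

With this approach no separate ```base_url``` variable is needed.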

In [11]:
## downloading pdfs
links_total = len(all_pdf_links_fl)

## enumerate supplies the running count, starting at 1
for link_count, link in enumerate(all_pdf_links_fl, start=1):
    print(f"Downloading link {link_count} of {links_total}")
    wget.download(link)
    snoozer = randrange(3,7)
    print(f"snoozing for {snoozer} before next link")
    time.sleep(snoozer)

Downloading link 1 of 10
100% [..........................................................] 12812 / 12812snoozing for 6 before next link
Downloading link 2 of 10
100% [..........................................................] 12897 / 12897snoozing for 4 before next link
Downloading link 3 of 10
100% [..........................................................] 12908 / 12908snoozing for 3 before next link
Downloading link 4 of 10
100% [..........................................................] 12843 / 12843snoozing for 6 before next link
Downloading link 5 of 10
100% [..........................................................] 12881 / 12881snoozing for 4 before next link
Downloading link 6 of 10
100% [..........................................................] 12906 / 12906snoozing for 6 before next link
Downloading link 7 of 10
100% [..........................................................] 12816 / 12816snoozing for 3 before next link
Downloading link 8 of 10
100% [.................
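
The same download loop is needed again for the full set of documents, so it could be wrapped in a reusable function. The following is a minimal sketch, not the notebook's method: it swaps ```wget``` for the standard library's ```urlretrieve```, and the names ```filename_from_url``` and ```download_all```, along with the ```fetch``` and ```pause``` parameters, are hypothetical.

```python
import os
import time
from random import randrange
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(link):
    """Return the last path segment of a URL, e.g. 'pdf_1.pdf'."""
    return os.path.basename(urlparse(link).path)


def download_all(links, low=3, high=7, fetch=urlretrieve, pause=time.sleep):
    """Download each link, snoozing a random number of seconds between files."""
    saved = []
    for count, link in enumerate(links, start=1):
        print(f"Downloading link {count} of {len(links)}")
        target = filename_from_url(link)
        fetch(link, target)  # urlretrieve(url, filename) saves the file locally
        saved.append(target)
        if count < len(links):  # no need to snooze after the last file
            snoozer = randrange(low, high)  # low..high-1, as in the loop above
            print(f"snoozing for {snoozer} before next link")
            pause(snoozer)
    return saved
```

Under these assumptions, ```download_all(all_pdf_links_fl)``` would fetch the ten PDFs into the current folder. The ```fetch``` and ```pause``` parameters exist only so the function can be tried without touching the network.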

In [13]:
## grabbing downloadable sections
docs_holder = soup.find_all("ul", class_="downloadable")

## extracting a tags from sections
all_doc_a_tags = [item.find_all("a") for item in docs_holder]

## flattening lists
import itertools
flat_a_tags = list(itertools.chain(*all_doc_a_tags))

## pulling url from a tags and combining with base url
doc_links = [base_url+item.get("href") for item in flat_a_tags]
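
Because ```find_all``` runs once per section, the result is a list of lists, which is why it has to be flattened. The sketch below compares three equivalent ways to flatten, using a made-up nested list in place of the real ```a``` tags.

```python
import itertools

# Stand-in for all_doc_a_tags: one inner list per downloadable section
nested = [["pdf link 1", "pdf link 2"], ["txt link 1"]]

# 1. Unpack the outer list into separate arguments (as in the cell above)
flat_star = list(itertools.chain(*nested))

# 2. Let chain walk the outer list itself, with no unpacking
flat_iter = list(itertools.chain.from_iterable(nested))

# 3. Nested list comprehension, no import required
flat_comp = [item for inner in nested for item in inner]

print(flat_star == flat_iter == flat_comp)
```

All three produce the same flat list; ```chain.from_iterable``` avoids unpacking a potentially long outer list into arguments.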

In [15]:
## downloading all documents
links_total = len(doc_links)

## enumerate supplies the running count, starting at 1
for link_count, link in enumerate(doc_links, start=1):
    print(f"Downloading link {link_count} of {links_total}")
    wget.download(link)
    snoozer = randrange(3,7)
    print(f"snoozing for {snoozer} before next link")
    time.sleep(snoozer)

Downloading link 1 of 20
100% [................................................................] 76 / 76snoozing for 6 before next link
Downloading link 2 of 20
100% [................................................................] 66 / 66snoozing for 4 before next link
Downloading link 3 of 20
100% [................................................................] 70 / 70snoozing for 4 before next link
Downloading link 4 of 20
100% [................................................................] 63 / 63snoozing for 6 before next link
Downloading link 5 of 20
100% [................................................................] 66 / 66snoozing for 6 before next link
Downloading link 6 of 20
100% [................................................................] 66 / 66snoozing for 3 before next link
Downloading link 7 of 20
100% [................................................................] 69 / 69snoozing for 3 before next link
Downloading link 8 of 20
100% [.................

# 2. Universal conversion function
Rewrite your function from last week so it can do both:

- take individual string values like ```$12.24267```, ```10,201``` and ```$12,501``` and convert them into floating point numbers like 12.24, 10201.0 and 12501.0

- take string values in lists and convert them to floating point numbers. (reminder: you use a zip function).

Test it on the numbers above and in this list:

In [25]:
## list of string numbers
string_numbers = ["$12.24267", "10,201", "$12,501", "42,901", "$902,091"]

In [55]:
def StringConvert(a_string):
    ## strip the dollar sign and thousands separators, then round to 2 decimals
    return round(float(a_string.replace("$","").replace(",","")),2)


In [57]:
list(map(StringConvert, string_numbers))

[12.24, 10201.0, 12501.0, 42901.0, 902091.0]

In [58]:
StringConvert("$12.24267")

12.24
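
The task asks for one function that accepts either a single string or a list. ```map``` handles the list case from outside the function; one possible way to fold both cases into the function itself is to check the argument's type. This is a minimal sketch under that assumption, and the name ```to_float``` is hypothetical.

```python
def to_float(value):
    """Convert one currency-style string, or a list of them, to rounded floats."""
    if isinstance(value, str):
        # Single string: strip "$" and "," before converting
        return round(float(value.replace("$", "").replace(",", "")), 2)
    # Anything else is treated as an iterable of strings
    return [to_float(item) for item in value]


print(to_float("$12.24267"))
print(to_float(["$12.24267", "10,201", "$12,501", "42,901", "$902,091"]))
```

The string check comes first because a string is itself iterable; without it, the function would try to convert each character separately.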