# The Basics of Web Scraping Using Python

In [6]:
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
import requests

Using a GET request to access the South African Weather Service (SAWS) website.

In [4]:
webpage = requests.get("https://www.weathersa.co.za")
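The request above can be made slightly more defensive. This is a minimal sketch: the helper name `fetch_html`, the 10-second timeout and the `User-Agent` string are illustrative assumptions, not anything the SAWS site requires. The timeout stops a stalled server from hanging the notebook, and `raise_for_status()` makes HTTP errors surface immediately.

```python
import requests

def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes, raising on HTTP errors."""
    # Hypothetical User-Agent string that identifies the script to the server
    headers = {"User-Agent": "weather-scraper-tutorial/0.1"}
    # Without a timeout, requests.get can wait indefinitely on a stalled server
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.content
```

Calling `fetch_html("https://www.weathersa.co.za")` would return the same bytes as `webpage.content`. A URL without a scheme, such as `"www.weathersa.co.za"`, raises `requests.exceptions.MissingSchema` before any network traffic.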

A status code of 200 shows that the request succeeded.

In [5]:
webpage

<Response [200]>
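200 is only one of the success codes. The sketch below uses a hypothetical `is_success` helper that treats any 2xx code as success and compares it with the built-in `Response.ok` property, which is `True` for any status below 400. The responses are built by hand so the check runs without a network call.

```python
import requests

def is_success(response: requests.Response) -> bool:
    """Return True for any 2xx status code (200 is the usual "OK")."""
    return 200 <= response.status_code < 300

# Responses built by hand so the check can be tried offline
ok_response = requests.Response()
ok_response.status_code = 200
not_found = requests.Response()
not_found.status_code = 404

print(is_success(ok_response), ok_response.ok)  # True True
print(is_success(not_found), not_found.ok)      # False False
```

The two checks differ for redirects: a 3xx response gives `ok == True` but `is_success(...) == False`.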

Creating a BeautifulSoup object from the page content to parse the HTML.

In [7]:
soup = bs(webpage.content, "html.parser")
soup

<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="IE=11.0" http-equiv="X-UA-Compatible"/>
<title>SAWS Home - WeatherSA Portal</title>
<script src="https://api.tiles.mapbox.com/mapbox-gl-js/v1.1.1/mapbox-gl.js"></script>
<link href="https://api.tiles.mapbox.com/mapbox-gl-js/v1.1.1/mapbox-gl.css" rel="stylesheet"/>
<link href="/css/carouseller.css" rel="stylesheet"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="/js/carouseller.js"></script>
<script src="/js/jquery.easing.1.3.js"></script>
<link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" rel="stylesheet"/>
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet"/>
<link href="/lib/leaflet/dist/leaflet.css" rel="styleshee
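The same parsing step on a small, hand-written stand-in for `webpage.content` (the HTML string is an assumption, trimmed from the output above). Naming the parser, here Python's built-in `"html.parser"`, avoids Beautiful Soup's warning about guessing a parser and keeps results consistent between machines with different parsers installed.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical stand-in for webpage.content, so the example runs offline
html = b"""<html><head><meta charset="utf-8"/>
<title>SAWS Home - WeatherSA Portal</title></head>
<body><div class="row">Pretoria</div></body></html>"""

# Naming the parser explicitly avoids bs4's warning about guessing one
soup = bs(html, "html.parser")

print(soup.title.string)  # SAWS Home - WeatherSA Portal
```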

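Passing an empty string to `class_` does not display the page's classes. The result starts with the whole `<html>` element, which has no `class` attribute: with the Beautiful Soup version used here, `class_=''` matched elements without a class rather than isolating any particular class.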

In [10]:
soup.find_all(class_='')

[<html>
 <head>
 <meta charset="utf-8"/>
 <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
 <meta content="IE=11.0" http-equiv="X-UA-Compatible"/>
 <title>SAWS Home - WeatherSA Portal</title>
 <script src="https://api.tiles.mapbox.com/mapbox-gl-js/v1.1.1/mapbox-gl.js"></script>
 <link href="https://api.tiles.mapbox.com/mapbox-gl-js/v1.1.1/mapbox-gl.css" rel="stylesheet"/>
 <link href="/css/carouseller.css" rel="stylesheet"/>
 <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
 <script src="/js/carouseller.js"></script>
 <script src="/js/jquery.easing.1.3.js"></script>
 <link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" rel="stylesheet"/>
 <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet"/>
 <link href="/lib/leaflet/dist/leaflet.css"
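To list the classes a page actually uses, `class_=True` selects every tag that has a `class` attribute, whatever its value, and each tag's `class` value is a list of names. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup as bs

# Invented markup for illustration
html = """
<div class="row"><span class="temp max">25°</span></div>
<div class="row"><span>Pretoria</span></div>
<p>No class here</p>
"""
soup = bs(html, "html.parser")

# class_=True keeps only tags that carry a class attribute
tagged = soup.find_all(class_=True)
print([tag.name for tag in tagged])  # ['div', 'span', 'div']

# tag["class"] is a list because class is a multi-valued attribute
class_names = sorted({name for tag in tagged for name in tag["class"]})
print(class_names)  # ['max', 'row', 'temp']
```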

Selecting all elements with the `row` class from the HTML.

In [8]:
soup.find_all(class_='row')

[<div class="row">
 <div class="col-md-2 sawslogo">
 <a href="/"> <img src="/images/SAWS logoFINAL.png" width="80%"/></a>
 </div>
 <div class="col-md-7">
 <!-- /72803759/WSA_Homepage_300x250_Middle_RHS -->
 <div id="div-gpt-ad-1518613981347-0">
 <!-- /72803759/WSA_Homepage_728x90_Top -->
 <div id="div-gpt-ad-1518613981347-0">
 <script>
                                 googletag.cmd.push(function () { googletag.display('div-gpt-ad-1518613981347-0'); });
                             </script>
 </div>
 </div>
 </div>
 <div class="col-md-1"></div>
 <div class="col-md-2 dealogo">
 <a href="https://www.environment.gov.za/" target="_blank"><img class="img-responsive" src="/images/DEFF_LOGO.png" width="100%"/></a>
 </div>
 </div>,
 <div class="row">
 <div class="col-md-8">
 <form action="/" class="search-form cmxform" id="form1" method="post" role="search">
 <div class="box">
 <div class="container-1">
 <fieldset>
 <input autocomplete="off" class="customPlaceholder" id="search" minlength="3" n
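`class_="row"` matches a tag when `row` is any one of its classes, not only when the attribute equals `"row"` exactly. The sketch below shows this on invented markup, together with the equivalent CSS selector through `soup.select`, which relies on the `soupsieve` package that recent `beautifulsoup4` releases install as a dependency.

```python
from bs4 import BeautifulSoup as bs

# Invented markup for illustration
html = """
<div class="row"><div class="col-md-2 sawslogo">logo</div></div>
<div class="row highlight"><div class="col-md-8">search</div></div>
<div class="rows">not a match</div>
"""
soup = bs(html, "html.parser")

# "row highlight" contains the class "row"; "rows" is a different class
rows = soup.find_all(class_="row")
print([r.get_text() for r in rows])  # ['logo', 'search']

# The same selection written as a CSS selector
print(len(soup.select("div.row")))  # 2
```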

The text of the `row` elements contains the geographical locations, so we collect the text of each one.

In [13]:
locations = [i.text for i in soup.find_all(class_='row')]

Inspecting the `locations` variable shows the different geographical locations, along with long runs of newline characters left over from the HTML layout.

In [14]:
locations 

['\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n',
 "\n\n\n\n\n\n\n\n\n\n\n\n\nSt Faith's (Cathula)\nHarding\nNew Hanover\nDumisa\nImpendle\nCamperdown\nRichmond\nEkuvukeni\nSobabili\nGoodhome\nDundee\neNdumeni\nNqutu\nNquthu\nPomeroy\nUtrecht\nDannhauser\nPaulpietersburg\nNongoma\nGaries\nHluhluwe\nHlabisa\nKwa Mbonambi\nMelmoth\nNkandla\nKwaDukusa\nStanger\nNdwedwe\nMaphumulo\nUnderberg\nUmzimkhulu\nDurban (Greyville)\nHillcrest\nMount Edgecombe\nAmanzimtoti\nPort Shepstone\nMargate\nCape Vidal\nJansenville\nPeddie\nAdelaide\nTarkastad\nMolteno\nLady Frere\nCofimvaba\nKomga\nButterworth\nDutywa\nLibode\nLusikisiki\nBizana\nNtabankulu\nTsolo\nMount ayliff\nMaclear\nBurgersdorp\nPlettenberg Bay\nTsitsikama\nCape St Francis\nCape Recife\nCape Padrone\nPort Alfred\nNgqushwa\nBuffalo City\nKei River\nWavecrest\nThe Haven\nCoffee Bay\nPort St Johns\nMkambati\nWild Coast Sun\nCeres\nSwellendam\nPrince Albert\nPort Nolloth\nKleinzee\nHondeklip Bay\nStrandfontein\nLamberts Bay\nCape Columbine\nLangebaan

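The runs of `\n` come from `.text`, which keeps every whitespace string between tags. `stripped_strings` yields each piece of text with surrounding whitespace removed and skips whitespace-only strings. The list markup below is hypothetical; the real structure of the SAWS location list may differ.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical markup for a block of location names
html = """
<div class="row">
  <ul>
    <li>St Faith's (Cathula)</li>
    <li>Harding</li>
    <li>New Hanover</li>
  </ul>
</div>
"""
soup = bs(html, "html.parser")
row = soup.find(class_="row")

# Whitespace-only strings between tags are dropped
names = list(row.stripped_strings)
print(names)  # ["St Faith's (Cathula)", 'Harding', 'New Hanover']
```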
Trimming both ends of `locations`: the slice `[2:-2]` drops the first two and the last two entries.

In [15]:
locations = locations[2:-2]

In [17]:
locations

 '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Marine Website Successfully Launched \n SAWS is proud to announce the launch of the new marine website. The highlights of the site include the recently operational, high-resolution Storm Surge and Wave Forecast models  \n\n\n\n\n\n\n\n Our New Feature: Storm Tracker \n \n\n\n\n\n\n\n\n  Twitter Feeds\n\n\n\n\nTweets by SAWeatherServic\n\n\n\n\n\n']
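A fixed slice depends on the page keeping the same layout. A sketch comparing the slice with filtering on content, using a hypothetical stand-in for `locations`:

```python
# Hypothetical stand-in: whitespace-only strings mixed with real text
locations = ["\n\n \n\n", "\nHarding\nDundee\n", "\nPopular Cities\n Today\n", "\n\n", "\nWeather Alerts\n"]

# The slice drops the first two and the last two items by position
print(locations[2:-2])  # ['\nPopular Cities\n Today\n']

# Filtering keeps every entry with visible text, wherever it sits in the list
non_blank = [text for text in locations if text.strip()]
print(len(non_blank))  # 3
```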

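Each remaining entry is split on commas and its first piece is printed. The output shows the popular cities, each followed by three lines of temperatures (one per day, in °C and then °F), and then the current weather alerts.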

In [48]:
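for c in locations:
    # Split each entry on commas and print only the first piece
    split_lines = c.split(",")
    print(split_lines[0])
# split_lines is overwritten on every pass, so it now holds only the last entry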

Popular Cities 
 Today 
 Friday 
 Saturday 


Pretoria
25° 19°  77 °  66 ° 
24° 17°  75 °  63 ° 
28° 17°  82 °  63 ° 


Johannesburg
23° 18°  73 °  64 ° 
22° 15°  72 °  59 ° 
24° 15°  75 °  59 ° 


Cape Town
25° 17°  77 °  63 ° 
25° 16°  77 °  61 ° 
24° 19°  75 °  66 ° 


Durban
24° 19°  75 °  66 ° 
26° 21°  79 °  70 ° 
26° 22°  79 °  72 ° 


Bloemfontein
27° 15°  81 °  59 ° 
29° 15°  84 °  59 ° 
30° 17°  86 °  63 ° 


Polokwane
25° 19°  77 °  66 ° 
24° 17°  75 °  63 ° 
32° 13°  90 °  55 ° 


Upington
34° 20°  93 °  68 ° 
33° 19°  91 °  66 ° 
36° 19°  97 °  66 ° 


Port Elizabeth
25° 20°  77 °  68 ° 
24° 17°  75 °  63 ° 
24° 18°  75 °  64 ° 


East London
22° 17°  72 °  63 ° 
25° 18°  77 °  64 ° 
25° 18°  77 °  64 ° 


 Weather Alerts 


 Disruptive Rain   Maquassi Hills / Wolmaranstad 
 2021/01/27 12:00:00 AM to 2021/01/29 11:59:59 PM 
 Localized flash flooding of low-lying areas


 Marine Website Successfully Launched 
 SAWS is proud to announce the launch 
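The printed block has a regular shape: a city name followed by one temperature line per day. A minimal parsing sketch, assuming the columns are the maximum and minimum in °C followed by the same pair in °F (25 °C is about 77 °F and 19 °C about 66 °F, which fits the first Pretoria line). The function name `parse_city_forecasts` is hypothetical, and the day labels are passed in because the header (`Today`, `Friday`, `Saturday`) changes with the date.

```python
import re

# Lines such as "25° 19°  77 °  66 °". Assumed order: max and min in °C,
# then the same pair in °F
TEMP_LINE = re.compile(r"^\s*(-?\d+)°\s+(-?\d+)°\s+(-?\d+)\s*°\s+(-?\d+)\s*°\s*$")

def parse_city_forecasts(text, days):
    """Turn the printed city block into one dict per city and day."""
    rows = []
    city, day_index = None, 0
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            # Temperature lines belong to the most recent name seen
            if city is not None and day_index < len(days):
                max_c, min_c, max_f, min_f = map(int, match.groups())
                rows.append({"city": city, "day": days[day_index],
                             "max_c": max_c, "min_c": min_c,
                             "max_f": max_f, "min_f": min_f})
                day_index += 1
        elif line.strip():
            # Any other non-empty line is treated as a new city name
            city, day_index = line.strip(), 0
    return rows

# Sample copied from the printed output above
sample = """Pretoria
25° 19°  77 °  66 °
24° 17°  75 °  63 °
28° 17°  82 °  63 °

Johannesburg
23° 18°  73 °  64 °
"""
rows = parse_city_forecasts(sample, days=("Today", "Friday", "Saturday"))
print(len(rows))  # 4
print(rows[0])  # {'city': 'Pretoria', 'day': 'Today', 'max_c': 25, 'min_c': 19, 'max_f': 77, 'min_f': 66}
```

Applied to `split_lines[0]` inside the loop over `locations`, this would give one record per city and day. Lines that are not temperatures, such as the header or the alert text, only update the current name and produce no rows.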

In [50]:
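# pd.DataFrame needs list-like data (a bare string raises ValueError): one row per non-empty printed line
Weather_forecasts = pd.DataFrame({"text": [line.strip() for c in locations for line in c.split(",")[0].splitlines() if line.strip()]})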
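Once the text is parsed into records, `pd.DataFrame` builds a table directly from a list of dictionaries, one row per dictionary and one column per key. A sketch using a few of today's values copied from the printed output; the column names are assumptions.

```python
import pandas as pd

# Records copied from the printed output (today's max/min in °C)
records = [
    {"city": "Pretoria", "day": "Today", "max_c": 25, "min_c": 19},
    {"city": "Johannesburg", "day": "Today", "max_c": 23, "min_c": 18},
    {"city": "Cape Town", "day": "Today", "max_c": 25, "min_c": 17},
    {"city": "Upington", "day": "Today", "max_c": 34, "min_c": 20},
]

# A list of dicts gives one row per dict and one column per key
weather_forecasts = pd.DataFrame(records)

# Label of the row with the highest maximum, then its city name
hottest = weather_forecasts.loc[weather_forecasts["max_c"].idxmax(), "city"]
print(hottest)  # Upington
```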