
## 1. Importing BeautifulSoup and Creating an Instance

This tutorial uses the `requests` library to fetch web page content, `beautifulsoup4` (imported as `bs4`) to parse it, and `pandas` to analyze the results. If any of them are not installed, install them with pip:

In [2]:
%pip install requests beautifulsoup4 pandas



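Next, use `requests.get()` to fetch the page and store the response in the `page` variable. The example below uses the National Weather Service forecast page for San Francisco; replace the URL with that of the page you want to scrape. `BeautifulSoup` then parses `page.content`, and `find()` and `find_all()` locate the seven-day forecast block and the individual forecast items inside it.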

In [5]:
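import requests
from bs4 import BeautifulSoup

# Fetch the National Weather Service forecast page for San Francisco.
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
# Parse the raw HTML with Python's built-in parser.
soup = BeautifulSoup(page.content, 'html.parser')
# The seven-day forecast lives in the element with this id.
seven_day = soup.find(id="seven-day-forecast")
# Each forecast period is a "tombstone-container" element.
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]
print(tonight.prettify())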

<div class="tombstone-container">
 <p class="period-name">
  Tonight
 </p>
 <p>
  <img alt="Tonight: Partly cloudy, with a low around 51. West wind 16 to 21 mph, with gusts as high as 26 mph. " class="forecast-icon" src="newimages/medium/nsct.png" title="Tonight: Partly cloudy, with a low around 51. West wind 16 to 21 mph, with gusts as high as 26 mph. "/>
 </p>
 <p class="temp temp-low">
  Low: 51 °F
 </p>
 <p class="short-desc">
  Partly Cloudy
 </p>
</div>
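
The `requests.get()` call above assumes the request succeeds. A minimal sketch of a more defensive fetch is shown here; the helper name `fetch_html` and the 10-second default timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    `fetch_html` and the default timeout are hypothetical choices for
    this sketch, not part of the original notebook.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Turn HTTP error statuses (4xx/5xx) into exceptions.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        print(f"Request failed: {exc}")
        return None
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., 'html.parser')` in place of `page.content`; a `None` result signals that there is nothing to parse.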



In [6]:

period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)

Tonight
Partly Cloudy
Low: 51 °F


In [7]:
img = tonight.find("img")
desc = img['title']
print(desc)


Tonight: Partly cloudy, with a low around 51. West wind 16 to 21 mph, with gusts as high as 26 mph. 
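
The lookups above assume every tag and attribute is present. `find()` returns `None` when nothing matches, and `img['title']` raises `KeyError` when the attribute is absent. The sketch below shows one way to guard against both; the inline `SAMPLE_HTML` is a made-up fragment (with the `title` attribute and the temperature paragraph deliberately left out), not the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the <img> has no title and there is no "temp" paragraph.
SAMPLE_HTML = (
    '<div class="tombstone-container">'
    '<p class="period-name">Tonight</p>'
    '<p><img alt="Tonight: Partly cloudy" src="newimages/medium/nsct.png"/></p>'
    '</div>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tonight = soup.find(class_="tombstone-container")

img = tonight.find("img")
# Tag.get() returns None for a missing attribute instead of raising KeyError.
desc = img.get("title") if img is not None else None
# Fall back to the alt text when there is no title.
fallback = img.get("title", img.get("alt")) if img is not None else None

# find() returns None when no element matches, so check before calling get_text().
temp_tag = tonight.find(class_="temp")
temp = temp_tag.get_text(strip=True) if temp_tag is not None else None

print(desc, fallback, temp)
```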


In [8]:
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
print(periods)
print(short_descs)
print(temps)
print(descs)

['Tonight', 'Monday', 'Monday Night', 'Tuesday', 'Tuesday Night', 'Wednesday', 'Wednesday Night', 'Juneteenth', 'Thursday Night']
['Partly Cloudy', 'BecomingSunny andBreezy', 'Mostly Clearand Breezythen MostlyClear', 'Sunny', 'Mostly Clear', 'Sunny', 'Mostly Clear', 'Sunny', 'Mostly Clear']
['Low: 51 °F', 'High: 64 °F', 'Low: 52 °F', 'High: 69 °F', 'Low: 54 °F', 'High: 71 °F', 'Low: 53 °F', 'High: 65 °F', 'Low: 53 °F']
['Tonight: Partly cloudy, with a low around 51. West wind 16 to 21 mph, with gusts as high as 26 mph. ', 'Monday: Partly sunny, then gradually becoming sunny, with a high near 64. Breezy, with a west wind 17 to 25 mph, with gusts as high as 32 mph. ', 'Monday Night: Mostly clear, with a low around 52. Breezy, with a west wind 13 to 22 mph, with gusts as high as 31 mph. ', 'Tuesday: Sunny, with a high near 69. West wind 7 to 12 mph increasing to 13 to 18 mph in the afternoon. Winds could gust as high as 24 mph. ', 'Tuesday Night: Mostly clear, with a low around 54. West w
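
Some short descriptions in the output above run words together (for example `'BecomingSunny andBreezy'`). This usually happens when the text is split across `<br>` tags, which `get_text()` drops without inserting a space. Assuming that is the cause here (the `<br>` markup in the sample below is an assumption, not copied from the page), passing a separator to `get_text()` restores the spacing:

```python
from bs4 import BeautifulSoup

# Assumed markup: line breaks between the words of the short description.
SAMPLE_HTML = '<p class="short-desc">Becoming<br/>Sunny and<br/>Breezy</p>'

tag = BeautifulSoup(SAMPLE_HTML, "html.parser").find(class_="short-desc")

# Default behaviour: text fragments are concatenated with no separator.
joined = tag.get_text()

# Join fragments with a space and strip surrounding whitespace from each one.
spaced = tag.get_text(separator=" ", strip=True)

print(repr(joined))
print(repr(spaced))
```

Applied to the notebook, the same arguments could be used in the `short_descs` list comprehension.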

In [9]:
import pandas as pd
weather = pd.DataFrame({
"period": periods,
"short_desc": short_descs,
"temp": temps,
"desc": descs
})

print(weather)

            period                              short_desc         temp  \
0          Tonight                           Partly Cloudy   Low: 51 °F   
1           Monday                 BecomingSunny andBreezy  High: 64 °F   
2     Monday Night  Mostly Clearand Breezythen MostlyClear   Low: 52 °F   
3          Tuesday                                   Sunny  High: 69 °F   
4    Tuesday Night                            Mostly Clear   Low: 54 °F   
5        Wednesday                                   Sunny  High: 71 °F   
6  Wednesday Night                            Mostly Clear   Low: 53 °F   
7       Juneteenth                                   Sunny  High: 65 °F   
8   Thursday Night                            Mostly Clear   Low: 53 °F   

                                                desc  
0  Tonight: Partly cloudy, with a low around 51. ...  
1  Monday: Partly sunny, then gradually becoming ...  
2  Monday Night: Mostly clear, with a low around ...  
3  Tuesday: Sunny, with a hig
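
Building the DataFrame from four independently selected lists works only while every forecast item contains all four fields; if one item lacks a temperature, for instance, the lists have different lengths and the columns no longer line up. A possible alternative, sketched here on a made-up HTML fragment (the second item deliberately omits the temperature and image), is to build one row per container. The helper name `field` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second container has no temperature and no image.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img title="Tonight: Partly cloudy, with a low around 51."/></p>
    <p class="temp temp-low">Low: 51 °F</p>
    <p class="short-desc">Partly Cloudy</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Monday</p>
    <p class="short-desc">Sunny</p>
  </div>
</div>
"""


def field(container, selector):
    """Return the text of the first match inside container, or None."""
    tag = container.select_one(selector)
    return tag.get_text(" ", strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for container in soup.select("#seven-day-forecast .tombstone-container"):
    img = container.select_one("img")
    rows.append({
        "period": field(container, ".period-name"),
        "short_desc": field(container, ".short-desc"),
        "temp": field(container, ".temp"),
        "desc": img.get("title") if img is not None else None,
    })

for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, with missing fields appearing as empty values rather than shifting later rows.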

In [10]:
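# Pull the first run of digits out of each temperature string.
# The raw string (r"...") keeps \d from being treated as an invalid escape.
temp_nums = weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False)
weather["temp_num"] = temp_nums.astype('int')
mean_temp = weather["temp_num"].mean()
print(mean_temp)
# Boolean mask: rows whose temperature label contains "Low" are night forecasts.
is_night = weather["temp"].str.contains("Low")
night_forecasts = weather[is_night]
night_forecasts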

59.111111111111114


|   | period | short_desc | temp | desc | temp_num |
|---|---|---|---|---|---|
| 0 | Tonight | Partly Cloudy | Low: 51 °F | Tonight: Partly cloudy, with a low around 51. ... | 51 |
| 2 | Monday Night | Mostly Clearand Breezythen MostlyClear | Low: 52 °F | Monday Night: Mostly clear, with a low around ... | 52 |
| 4 | Tuesday Night | Mostly Clear | Low: 54 °F | Tuesday Night: Mostly clear, with a low around... | 54 |
| 6 | Wednesday Night | Mostly Clear | Low: 53 °F | Wednesday Night: Mostly clear, with a low arou... | 53 |
| 8 | Thursday Night | Mostly Clear | Low: 53 °F | Thursday Night: Mostly clear, with a low aroun... | 53 |
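
To see what the regular expression and the `"Low"` filter do without pandas, the same steps can be sketched with the standard library alone. The `temps` list is copied from the output printed earlier; everything else is an illustrative re-implementation, not the notebook's method.

```python
import re

# Temperature strings copied from the scraped output shown earlier.
temps = ['Low: 51 °F', 'High: 64 °F', 'Low: 52 °F', 'High: 69 °F',
         'Low: 54 °F', 'High: 71 °F', 'Low: 53 °F', 'High: 65 °F',
         'Low: 53 °F']

# Same pattern as in the pandas cell: a named group matching one or more digits.
pattern = re.compile(r"(?P<temp_num>\d+)")

# search() finds the first run of digits in each string.
temp_nums = [int(pattern.search(t).group("temp_num")) for t in temps]

# Mean over all nine forecast periods.
mean_temp = sum(temp_nums) / len(temp_nums)

# Keep only the periods labelled "Low", i.e. the night forecasts.
night_temps = [n for t, n in zip(temps, temp_nums) if "Low" in t]

print(temp_nums)
print(mean_temp)
print(night_temps)
```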


Now you can run the original cell to create the BeautifulSoup instance: