#### Step 1

Importing requests to request URLs. I will be looking at the Met Office UK's website for data on High Temperatures in major Indian cities for the entire week.
This work is based on [Joe James' Python Web Scraping tutorial](https://www.youtube.com/watch?v=zD0FDYI5_rs)

In [16]:
import requests
r = requests.get('https://www.metoffice.gov.uk/weather/world/india/list')
print(len(r.text)) # Printing length to test

74029


#### Step 2 

Import Beautiful Soup and turn the HTML text into a bs4 object. I used the [Beautiful Soup Documentation here](https://beautiful-soup-4.readthedocs.io/en/latest/)

In [17]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text)
print(soup.title.string) # Printing the page title to test

India forecast locations - Met Office


Using the find_all method, the code finds all Links in the HTML with the tag 'a' and prints them out.

In [18]:
for link in soup.find_all('a'):
    print(link.get('href'))

#content
/
/weather
/research
/services
/about-us
/
/weather
/
/weather/maps-and-charts
/weather/forecast/uk
/weather/maps-and-charts/uk-weather-map
/weather/maps-and-charts/cloud-cover-map
/weather/maps-and-charts/precipitation-map
/weather/maps-and-charts/rainfall-radar-forecast-map
/weather/maps-and-charts/temperature-map
/weather/maps-and-charts/wind-map
/weather/maps-and-charts/wind-gusts-map
/weather/maps-and-charts/surface-pressure
/weather/world/list
/weather/climate
/weather/climate/climate-explained/index
/weather/climate-change/what-is-climate-change
/weather/climate-change/causes-of-climate-change
/weather/climate-change/effects-of-climate-change
/weather/climate-change/climate-change-questions
/weather/climate/uk-climate
/weather/climate/science
/weather/climate-change/organisations-and-reports
/weather/specialist-forecasts
/weather/specialist-forecasts/coast-and-sea
/weather/specialist-forecasts/mountain
/weather/specialist-forecasts/space-weather
/weather/learn-about
/we

#### Step 4

Next, the code creates base_url to save for later and an empty list called state_links to save all the state/city links from the HTML. It fetches all the links and appends them to state_link if they include '/weather/forecast' as all loinks to individual Indian states are in the format '/weather/forecast/string'. The only exception is 'weather/forecast/uk' which has a link to UK forecast, so the code ignores that.
Example page: [Agartala (India) weather - Met Office](https://www.metoffice.gov.uk/weather/forecast/wh0zu7npp#?date=2023-03-14)

In [5]:
base_url = 'https://www.metoffice.gov.uk'
state_links = []
for link in soup.find_all('a'):
    url = link.get('href')
    if url and '/weather/forecast/' in url and '/weather/forecast/uk' not in url:
        state_links.append(url)
print(len(state_links))

169


The rest of the code (before the final loop) only tests using the example page given above, which is the sixth page in the state_links. Steps 1 and 2 are repeated and the page is converted into a bs4 object

In [6]:
r = requests.get(base_url + state_links[5])
soup = BeautifulSoup(r.text)
print(soup.title.string)

Allahabad (India) weather - Met Office


#### Step 5

Next, it finds all span tags and saves them into a rows. This is where the High temperatures are saved in the page. The bs4 object is converted into a Python list and used to find all tags that include the keyword 'tab-temp-high' used in the HTML

In [7]:
rows = soup.find_all('span')
print(len(rows))

954


In [8]:
rows = [row for row in rows if 'tab-temp-high' in str(row)]
print(len(rows))
print(rows)

14
[<span class="tab-temp-high" data-value="33.44" title="Maximum daytime temperature">33°</span>, <span class="tab-temp-high" data-value="33.44" title="Maximum daytime temperature">33°</span>, <span class="tab-temp-high" data-value="33.51" title="Maximum daytime temperature">34°</span>, <span class="tab-temp-high" data-value="33.51" title="Maximum daytime temperature">34°</span>, <span class="tab-temp-high" data-value="31.54" title="Maximum daytime temperature">32°</span>, <span class="tab-temp-high" data-value="31.54" title="Maximum daytime temperature">32°</span>, <span class="tab-temp-high" data-value="29.79" title="Maximum daytime temperature">30°</span>, <span class="tab-temp-high" data-value="29.79" title="Maximum daytime temperature">30°</span>, <span class="tab-temp-high" data-value="29.61" title="Maximum daytime temperature">30°</span>, <span class="tab-temp-high" data-value="29.61" title="Maximum daytime temperature">30°</span>, <span class="tab-temp-high" data-value="28.13"

#### Step 7

Next, the code creates an empty list for high_temps and finds all the 'td's and appends them to the list to get a series of temperatures.

In [21]:
high_temps = []
for row in rows:
    tds = row.find_all('td')
    for i in range(1,14):
        high_temps.append(rows[i].text)
        
del high_temps[13:]
print(high_temps)

['32°', '32°', '32°', '32°', '32°', '29°', '29°', '27°', '27°', '27°', '27°', '29°', '29°']


Next, I used matplotlib to plot the temperatures from the list

In [27]:
import matplotlib.pyplot as plt

x = range(0,13)
y = high_temps

plt.plot(x,y)
plt.show()

TypeError: bad operand type for unary -: 'range'

#### Step 8

The code then finds the name of the state by picking up only the first word from the page title (name of city/state) which is in the format 'Allahabad (India) weather - Met Office' It then combines the name and the high_temps list into a dictionary.

## 

In [12]:
state = soup.title.string.split()[0]
print(state)

Allahabad


In [20]:
data = {}
data[state] = high_temps
print(data)

{'Vishakhapatnam': ['32°', '32°', '32°', '32°', '32°', '29°', '29°', '27°', '27°', '27°', '27°', '32°', '32°']}


### Final Loop

After the proof of concept for one page, I run all the steps 1-8 in a loop to get temperature data for every single state link on the page. This is then saved into a CSV file called high_temps.csv using the csv module

In [14]:
data = {}
for state_link in state_links:
    url = base_url + state_link
    r = requests.get(base_url + state_link)
    soup = BeautifulSoup(r.text)
    rows = soup.find_all('span')
    rows = [row for row in rows if 'tab-temp-high' in str(row)]
    high_temps = []
    for row in rows:
        tds = row.find_all('td')
        for i in range(1,12):
            high_temps.append(rows[i].text)
    del high_temps[13:]
    s = soup.title.string
    state = soup.title.string.split()[0]
    data[state] = high_temps
print(data)

{'Agartala': ['33°', '32°', '32°', '32°', '32°', '32°', '32°', '30°', '30°', '28°', '28°', '33°', '32°'], 'Agra': ['33°', '33°', '33°', '31°', '31°', '29°', '29°', '28°', '28°', '28°', '28°', '33°', '33°'], 'Ahmadabad': ['39°', '35°', '35°', '36°', '36°', '35°', '35°', '35°', '35°', '34°', '34°', '39°', '35°'], 'Aizawl': ['25°', '25°', '25°', '25°', '25°', '26°', '26°', '24°', '24°', '22°', '22°', '25°', '25°'], 'Akola': ['35°', '36°', '36°', '32°', '32°', '32°', '32°', '31°', '31°', '31°', '31°', '35°', '36°'], 'Allahabad': ['33°', '34°', '34°', '32°', '32°', '30°', '30°', '30°', '30°', '28°', '28°', '33°', '34°'], 'Ambala': ['30°', '30°', '30°', '30°', '30°', '27°', '27°', '25°', '25°', '26°', '26°', '30°', '30°'], 'Aminidivi': ['34°', '33°', '33°', '33°', '33°', '33°', '33°', '33°', '33°', '33°', '33°', '34°', '33°'], 'Amravati': ['34°', '35°', '35°', '31°', '31°', '31°', '31°', '30°', '30°', '30°', '30°', '34°', '35°'], 'Amritsar': ['31°', '31°', '31°', '30°', '30°', '27°', '27°', 

In [19]:
import csv

with open('high_temps.csv','w') as f:
    w = csv.writer(f)
    w.writerows(data.items())