# Week 1: Exercises

**Web and Social Network Analytics**

---

Complete these exercises to practice the web scraping and API techniques covered in the lecture notes.

**Disclaimer**: This educational content is provided for instructional purposes only. Always respect website terms of service and legal requirements when scraping.

---

## Setup

Run the cell below to import all required libraries.

In [None]:
# Standard libraries
import os
import time

# Web scraping
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

# Data handling
import pandas as pd

# For dynamic scraping (Exercise 3)
# from playwright.sync_api import sync_playwright

print('Libraries imported successfully!')

---

# Exercise 1: BeautifulSoup Basics (Easy)

## Task

Extract the **SCQF Level** from a course page on the University of Edinburgh's DRPS website.

**URL**: `http://www.drps.ed.ac.uk/24-25/dpt/cxcmse11427.htm`

**Expected Output**: The text of the table cell that mentions the SCQF level, for example "SCQF Level 11"

## Skills Practiced
- Fetching web pages with `urlopen`
- Parsing HTML with BeautifulSoup
- Finding elements by tag and class
- Searching for specific text content

## Hints

<details>
<summary>Hint 1: How to fetch the page</summary>

```python
url = 'http://www.drps.ed.ac.uk/24-25/dpt/cxcmse11427.htm'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
```
</details>

<details>
<summary>Hint 2: Finding the table</summary>

The SCQF information is in a table with class `sitstablegrid`. Use:
```python
table = soup.find('table', {'class': 'sitstablegrid'})
```
</details>

<details>
<summary>Hint 3: Searching for text</summary>

Loop through cells and check if 'SCQF' is in the text:
```python
for cell in table.find_all('td'):
    if 'SCQF' in cell.text:
        # Found it: print the cell text without surrounding whitespace
        print(cell.get_text(strip=True))
```
</details>
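
The three hints can be combined into one short search over the table cells. The sketch below runs against a small hand-written HTML fragment, not the real DRPS page: the fragment's contents and the `find_cell_text` helper are assumptions made for illustration, and the live page's layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates a course table; the real page may differ.
SAMPLE_HTML = """
<table class="sitstablegrid">
  <tr><td>School: Example School</td><td>College: Example College</td></tr>
  <tr><td>  SCQF Level 11  </td><td>Availability: Example</td></tr>
</table>
"""


def find_cell_text(html, keyword):
    """Return the text of the first matching <td>, or None if nothing matches."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'sitstablegrid'})
    if table is None:
        # The table was not found, so there is nothing to search.
        return None
    for cell in table.find_all('td'):
        text = cell.get_text(strip=True)
        if keyword in text:
            return text
    return None


result = find_cell_text(SAMPLE_HTML, 'SCQF')
print(result)
```

Returning `None` when the table or keyword is missing makes it easier to notice when the page structure is not what you expected.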

In [None]:
# Exercise 1: Your code here

# Step 1: Fetch the page
url = 'http://www.drps.ed.ac.uk/24-25/dpt/cxcmse11427.htm'

# Your code below:



---

# Exercise 2: Multi-Item Scraping (Medium)

## Task

Scrape quotes from **quotes.toscrape.com** and create a pandas DataFrame.

**Requirements**:
1. Scrape the **first 3 pages** of quotes
2. For each quote, extract:
   - The quote text
   - The author name
   - The tags (as a list)
3. Store in a DataFrame with columns: `text`, `author`, `tags`
4. Add a 1-second delay between page requests

**Expected**: A DataFrame with ~30 quotes (10 per page)

## Skills Practiced
- Handling pagination
- Extracting multiple data points
- Building structured datasets
- Respectful scraping with delays

## Hints

<details>
<summary>Hint 1: URL pattern for pagination</summary>

Pages follow the pattern:
- Page 1: `https://quotes.toscrape.com/page/1/`
- Page 2: `https://quotes.toscrape.com/page/2/`
- etc.
</details>
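
One possible way to organise the pagination loop is to separate URL construction from fetching, as in the sketch below. The helper names `page_urls` and `fetch_all`, and the idea of passing in a `fetch` function, are assumptions for illustration and are not part of the exercise template.

```python
import time

BASE_URL = 'https://quotes.toscrape.com/page/{}/'


def page_urls(first_page, last_page):
    """Build the list of page URLs from first_page to last_page inclusive."""
    return [BASE_URL.format(number) for number in range(first_page, last_page + 1)]


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            time.sleep(delay_seconds)
        pages.append(fetch(url))
    return pages


urls = page_urls(1, 3)
print(urls)
```

In the exercise, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`; keeping it as a parameter lets you try the loop with a fake function before making real requests.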

<details>
<summary>Hint 2: Finding quote elements</summary>

Each quote is in a `div` with class `quote`:
- Text: `span` with class `text`
- Author: `small` with class `author`
- Tags: `a` elements with class `tag`
</details>

<details>
<summary>Hint 3: Getting tags as a list</summary>

```python
tags = [tag.text for tag in quote.find_all('a', {'class': 'tag'})]
```
</details>
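
The element names from Hint 2 and the list comprehension from Hint 3 can be tried out on a static snippet first. The sketch below uses a hand-written fragment that only imitates the structure described in the hints; the sample quotes and the `parse_quotes` helper are assumptions for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical fragment following the structure described in Hint 2.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First example quote.</span>
  <span>by <small class="author">Author One</small></span>
  <div class="tags"><a class="tag" href="#">alpha</a> <a class="tag" href="#">beta</a></div>
</div>
<div class="quote">
  <span class="text">Second example quote.</span>
  <span>by <small class="author">Author Two</small></span>
  <div class="tags"></div>
</div>
"""


def parse_quotes(html):
    """Return one dict per quote with its text, author, and list of tags."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for quote in soup.find_all('div', {'class': 'quote'}):
        rows.append({
            'text': quote.find('span', {'class': 'text'}).get_text(strip=True),
            'author': quote.find('small', {'class': 'author'}).get_text(strip=True),
            # A quote without tags yields an empty list.
            'tags': [tag.get_text(strip=True) for tag in quote.find_all('a', {'class': 'tag'})],
        })
    return rows


df = pd.DataFrame(parse_quotes(SAMPLE_HTML), columns=['text', 'author', 'tags'])
print(df)
```

Inside the page loop, the rows returned for each page could be added to `all_quotes` with `all_quotes.extend(...)` before building the final DataFrame.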

In [None]:
# Exercise 2: Your code here

all_quotes = []

# Loop through pages 1 to 3
for page_num in range(1, 4):
    url = f'https://quotes.toscrape.com/page/{page_num}/'
    print(f'Scraping page {page_num}...')
    
    # Your code below:
    
    
    # Don't forget to add a delay!

# Create DataFrame
# df = pd.DataFrame(all_quotes)
# df.head(10)

In [None]:
# Display your results
# print(f'Total quotes: {len(df)}')
# df.head()

---

# Exercise 3: Dynamic Content with Playwright (Medium)

## Task

Scrape quotes from the **JavaScript-rendered** version of the quotes website: `https://quotes.toscrape.com/js/`

**Requirements**:
1. Use Playwright to render the JavaScript content
2. Extract all quotes from the first page
3. Create a DataFrame with `text` and `author` columns

**Why is this different?**
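The `/js/` version builds the quote elements in the browser with JavaScript, so the HTML returned by `urlopen` or `requests` does not contain them and BeautifulSoup alone finds no quotes.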
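
The effect can be reproduced offline with a small stand-in page. The HTML below is a hypothetical imitation of a JavaScript-driven page, not a copy of the real site: the data sits inside a `<script>` tag and no quote elements exist until a browser runs the script.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML, as a plain HTTP request might return it.
RAW_HTML = """
<html><body>
<div class="container"></div>
<script>
  var data = [{"text": "Example quote", "author": {"name": "Example Author"}}];
  // A browser would run code here that builds the quote elements.
</script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, 'html.parser')

# BeautifulSoup parses the markup but never executes the script.
quote_divs = soup.find_all('div', {'class': 'quote'})
print(len(quote_divs))

# The data is present only as text inside the script tag.
data_in_script = 'Example quote' in soup.find('script').string
print(data_in_script)
```

A tool such as Playwright runs the script in a real browser first, so the HTML it hands back already contains the generated elements.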

## Skills Practiced
- Setting up Playwright
- Waiting for dynamic content
- Combining Playwright with BeautifulSoup

## Hints

<details>
<summary>Hint 1: Setup Playwright</summary>

```python
# Install if needed:
# !pip install playwright
# !playwright install chromium

from playwright.sync_api import sync_playwright
```
</details>

<details>
<summary>Hint 2: Basic Playwright pattern</summary>

```python
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(url)
    page.wait_for_selector('.quote')  # Wait for content
    html = page.content()
    browser.close()
```
</details>

<details>
<summary>Hint 3: Then use BeautifulSoup</summary>

After getting the HTML from Playwright:
```python
soup = BeautifulSoup(html, 'html.parser')
quotes = soup.find_all('div', {'class': 'quote'})
```
</details>
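
Hints 2 and 3 can be wrapped into two small functions, one for rendering and one for parsing. This is a sketch under assumptions: the names `fetch_rendered_html` and `quotes_to_rows`, the 10-second timeout, and the sample fragment are all illustrative. Only the parsing helper is exercised here, on a hand-written fragment, so no browser is launched.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, selector='.quote', timeout_ms=10000):
    """Render the page in headless Chromium and return the resulting HTML."""
    # Imported here so the parsing helper works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Wait until at least one matching element exists.
            page.wait_for_selector(selector, timeout=timeout_ms)
            return page.content()
        finally:
            # Close the browser even if the wait times out.
            browser.close()


def quotes_to_rows(html):
    """Extract text and author from each rendered quote element."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for quote in soup.find_all('div', {'class': 'quote'}):
        rows.append({
            'text': quote.find('span', {'class': 'text'}).get_text(strip=True),
            'author': quote.find('small', {'class': 'author'}).get_text(strip=True),
        })
    return rows


# Hypothetical fragment standing in for HTML returned by Playwright.
RENDERED_SAMPLE = """
<div class="quote">
  <span class="text">Example quote.</span>
  <span>by <small class="author">Example Author</small></span>
</div>
"""

rows = quotes_to_rows(RENDERED_SAMPLE)
print(rows)
```

With Playwright installed, `quotes_to_rows(fetch_rendered_html(url))` would chain the two steps, and the result can be passed to `pd.DataFrame`.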

In [None]:
# Exercise 3: Setup
# Uncomment these lines if you haven't installed Playwright yet:
# !pip install playwright
# !playwright install chromium

In [None]:
# Exercise 3: Your code here

from playwright.sync_api import sync_playwright

url = 'https://quotes.toscrape.com/js/'

# Your code below:



In [None]:
# Display your results


---

# Exercise 4: API Data Retrieval (Easy-Medium)

## Task A: Weather API (Required)

Use the **Open-Meteo API** to get current weather for 3 Scottish cities.

**Cities and coordinates**:
- Edinburgh: (55.95, -3.19)
- Glasgow: (55.86, -4.25)
- Aberdeen: (57.15, -2.11)

**Requirements**:
1. Fetch current weather for each city
2. Extract temperature and wind speed
3. Create a DataFrame with columns: `city`, `temperature`, `windspeed`

## Skills Practiced
- Making API requests
- Parsing JSON responses
- Working with API parameters

## Hints

<details>
<summary>Hint 1: API endpoint and parameters</summary>

```python
url = 'https://api.open-meteo.com/v1/forecast'
params = {
    'latitude': 55.95,
    'longitude': -3.19,
    'current_weather': True
}
```
</details>

<details>
<summary>Hint 2: Making the request</summary>

```python
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors (4xx/5xx)
data = response.json()
```
</details>

<details>
<summary>Hint 3: Accessing weather data</summary>

```python
temperature = data['current_weather']['temperature']
windspeed = data['current_weather']['windspeed']
```
</details>
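
The parameter dictionary from Hint 1 and the key lookups from Hint 3 can be turned into reusable helpers. The sketch below is tested on a made-up payload that only mirrors the keys used in the hints; the helper names `build_params` and `extract_row` and the sample values are assumptions, not real API output.

```python
API_URL = 'https://api.open-meteo.com/v1/forecast'


def build_params(latitude, longitude):
    """Return query parameters in the same shape as Hint 1."""
    return {
        'latitude': latitude,
        'longitude': longitude,
        'current_weather': True,
    }


def extract_row(city, data):
    """Pull temperature and wind speed out of a parsed JSON response."""
    # Fall back to None instead of raising KeyError when a field is missing.
    current = data.get('current_weather', {})
    return {
        'city': city,
        'temperature': current.get('temperature'),
        'windspeed': current.get('windspeed'),
    }


# Hypothetical payload with invented values, shaped like the keys in Hint 3.
SAMPLE_RESPONSE = {
    'latitude': 55.95,
    'longitude': -3.19,
    'current_weather': {'temperature': 12.3, 'windspeed': 18.5},
}

row = extract_row('Edinburgh', SAMPLE_RESPONSE)
print(row)
```

In the exercise, looping over `cities.items()` and appending one such row per city to `weather_data` gives a list that `pd.DataFrame` can consume directly.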

In [None]:
# Exercise 4A: Your code here

import requests

cities = {
    'Edinburgh': (55.95, -3.19),
    'Glasgow': (55.86, -4.25),
    'Aberdeen': (57.15, -2.11)
}

weather_data = []

# Your code below:



In [None]:
# Display your results
# weather_df = pd.DataFrame(weather_data)
# weather_df

## Task B: Google Maps API (Optional - Advanced)

If you have a Google Cloud API key, try to fetch place details.

**Note**: This requires setting up Google Cloud Platform and enabling the Places API.

**Task**:
1. Get details for "Edinburgh Castle"
2. Extract the rating and number of reviews
3. Print the first 3 reviews (if available)
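
One possible starting point is sketched below. It assumes the legacy Places API web service (the Find Place and Place Details endpoints); newer Google Cloud projects may need the newer Places API instead, which uses different request and response formats. The helper names and the sample result are hypothetical, and the summary function is only run on invented data.

```python
FIND_PLACE_URL = 'https://maps.googleapis.com/maps/api/place/findplacefromtext/json'
DETAILS_URL = 'https://maps.googleapis.com/maps/api/place/details/json'


def find_place_params(query, api_key):
    """Parameters for looking up a place ID from free text."""
    return {'input': query, 'inputtype': 'textquery', 'fields': 'place_id', 'key': api_key}


def details_params(place_id, api_key):
    """Parameters for requesting rating and review fields for one place."""
    return {
        'place_id': place_id,
        'fields': 'name,rating,user_ratings_total,reviews',
        'key': api_key,
    }


def summarise_place(result, max_reviews=3):
    """Reduce a Place Details result dict to rating, review count, and review texts."""
    reviews = result.get('reviews', [])[:max_reviews]
    return {
        'rating': result.get('rating'),
        'review_count': result.get('user_ratings_total'),
        'reviews': [review.get('text', '') for review in reviews],
    }


# Hypothetical result with invented values, for trying the summary offline.
SAMPLE_RESULT = {
    'name': 'Example Place',
    'rating': 4.5,
    'user_ratings_total': 1200,
    'reviews': [{'text': 'Review A'}, {'text': 'Review B'},
                {'text': 'Review C'}, {'text': 'Review D'}],
}

summary = summarise_place(SAMPLE_RESULT)
print(summary)
```

With a valid key, each parameter dictionary would be passed to `requests.get(..., params=...)`, and the details response's `result` field handed to `summarise_place`.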

In [None]:
# Exercise 4B (Optional): Your code here

# api_key = 'YOUR_API_KEY_HERE'  # Replace with your key

# Your code below:



---

# Bonus Challenge

## Task: Combine Multiple Techniques

Create a function that:
1. Takes a city name as input
2. Uses Open-Meteo API to get current weather
3. Uses web scraping to get additional info (your choice of source)
4. Returns a dictionary with combined information

**This is open-ended** - be creative!

In [None]:
# Bonus Challenge: Your code here

def get_city_info(city_name, latitude, longitude):
    """
    Get comprehensive information about a city.
    
    Args:
        city_name: Name of the city
        latitude: City latitude
        longitude: City longitude
    
    Returns:
        Dictionary with city information
    """
    info = {'city': city_name}
    
    # Your code below:
    
    return info

# Test your function
# result = get_city_info('Edinburgh', 55.95, -3.19)
# print(result)

---

## Submission Checklist

Before submitting, make sure:

- [ ] Exercise 1 outputs the SCQF level
- [ ] Exercise 2 produces a DataFrame with ~30 quotes
- [ ] Exercise 3 successfully scrapes the JS-rendered page
- [ ] Exercise 4A shows weather for 3 cities
- [ ] All code cells run without errors
- [ ] You've added comments explaining your approach

---

*See Week1-Exercise-Solutions.ipynb for complete solutions*