## Exercise 4
Using the solutions from the previous exercises, write a script that retrieves the data from all categories and saves it to a CSV file named `coderslab-shop-data.csv`.

The file should be a table with the following headings:
```
name | price | description_short | qty | category
```

Set `|` as the column separator.

### Hints
- To track progress while retrieving category data, you can use the `tqdm` library (not discussed in class; see [tqdm on PyPI](https://pypi.org/project/tqdm/)).
- Set the file encoding explicitly (for example UTF-8) so that the currency symbol is written correctly, unless you removed the symbol earlier.
- It may help to set the newline character explicitly when opening the file.
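
The hints about the separator, encoding, and newline handling can be sketched with the standard `csv` module. This is a minimal sketch, not the required solution: the sample row, the `sample.csv` file name, and the temporary directory are made up for illustration, and real rows would come from the scraping functions.

```python
import csv
import os
import tempfile

# Hypothetical sample row; real rows would come from the scraping functions
fieldnames = ["name", "price", "description_short", "qty", "category"]
sample_rows = [
    {
        "name": "Sample mug",
        "price": "€11.90",
        "description_short": "White | ceramic",  # contains the separator
        "qty": "300",
        "category": "Accessories",
    }
]

out_path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# newline="" leaves line endings to the csv module;
# an explicit UTF-8 encoding keeps the currency symbol intact
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="|")
    writer.writeheader()
    writer.writerows(sample_rows)

# Read the raw text back to inspect the resulting lines
with open(out_path, newline="", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Because the description contains `|`, the writer wraps that field in quotes, so the row still splits into five columns when it is read back.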

In [1]:
# import required libraries here
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

r = requests.get('https://mystore-testlab.coderslab.pl/index.php?id_category=3&controller=category')
soup = BeautifulSoup(r.text, 'html.parser')

In [2]:
# copy-paste the previous function definitions here

def get_categories_urls(url):
    """Return the top-level categories found at `url` as a list of {"url", "name"} dicts."""

    # Fetch the HTML content
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find the panel with category names
    categories_panel = soup.find('ul', id='top-menu')

    # Extract category names and URLs
    categories = []
    for link in categories_panel.find_all('a', {"data-depth": "0"}):
        name = link.get_text().replace('\ue313\n\ue316\n\n\n', '').strip()
        href = link.get('href')
        categories.append({"url": href, "name": name})

    return categories

def download_category_items(category_url):
    # Fetch the HTML content of the category page
    response = requests.get(category_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all product elements
    products_panel = soup.find_all('article', class_='product-miniature')

    # Extract product URLs
    product_urls = []
    for product in products_panel:
        link_tag = product.find('a', class_='thumbnail')
        # Skip tiles without a thumbnail link instead of raising TypeError
        if link_tag and link_tag.get('href'):
            product_urls.append(link_tag['href'])

    return product_urls

def download_product_data(product_url):
    # Fetch the HTML content of the product page
    response = requests.get(product_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract product name
    name_tag = soup.find('h1', class_='h1')
    name = name_tag.get_text().strip() if name_tag else 'N/A'

    # Extract price and currency
    price_tag = soup.find('span', class_='current-price-value')
    price = price_tag.get_text().strip() if price_tag else 'N/A'

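    # Extract short description; match any id that starts with
    # 'product-description-short', since the numeric suffix is assumed to vary per product
    description_short_tag = soup.select_one('div[id^="product-description-short"]')
    description_short = description_short_tag.get_text().strip() if description_short_tag else 'N/A'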

    # Extract quantity - Assuming default to 0 if not found
    qty_tag = soup.find('span', {'data-stock': True})
    qty = qty_tag['data-stock'] if qty_tag else '0'

    # Return the extracted data
    return {
        'name': name,
        'price': price,
        'description_short': description_short,
        'qty': qty
    }

In [3]:
# get the list of available categories here
base_url = 'https://prod-kurs.coderslab.pl/index.php'
category_urls = get_categories_urls(base_url)

In [4]:
# get product information here
all_products_data = []

for category in tqdm(category_urls, desc="Processing categories"):
    category_url = category['url']
    category_name = category['name']
    
    product_urls = download_category_items(category_url)  # Passing the URL string

    for product_url in tqdm(product_urls, desc=f"Processing products in {category_name}"):
        product_data = download_product_data(product_url)  # Passing the URL string
        product_data['category'] = category_name
        all_products_data.append(product_data)

Processing categories:   0%|          | 0/3 [00:00<?, ?it/s]

Processing products in Clothes: 100%|██████████| 5/5 [00:03<00:00,  1.47it/s]
Processing products in Accessories: 100%|██████████| 12/12 [00:08<00:00,  1.42it/s]
Processing products in Art: 100%|██████████| 7/7 [00:04<00:00,  1.50it/s]
Processing categories: 100%|██████████| 3/3 [00:19<00:00,  6.66s/it]


In [5]:
# save results to CSV file here
import csv

fieldnames = ['name', 'price', 'description_short', 'qty', 'category']

with open('coderslab-shop-data.csv', mode='w', newline='', encoding='utf-8-sig') as file:
    # csv.DictWriter quotes any field that contains '|' or a line break,
    # so such values cannot shift the columns
    writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter='|', lineterminator='\n')
    writer.writeheader()
    writer.writerows(all_products_data)
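
To check the result, the file can be read back with the same separator and encoding. The sketch below assumes a `|`-separated file saved as `utf-8-sig`; `load_pipe_csv` is a hypothetical helper, and the demo writes a small temporary file with made-up content so that it runs on its own.

```python
import csv
import os
import tempfile


def load_pipe_csv(path):
    """Hypothetical helper: read a '|'-separated CSV into a list of dicts."""
    # utf-8-sig strips the byte-order mark, so the first header stays 'name'
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f, delimiter="|"))


# Demo on a small temporary file with made-up content
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8-sig") as f:
    f.write("name|price|description_short|qty|category\n")
    f.write("Sample mug|€11.90|White ceramic mug|300|Accessories\n")

products = load_pipe_csv(demo_path)
print(len(products), products[0]["category"])
```

For the real file, the same call would be `load_pipe_csv('coderslab-shop-data.csv')`; comparing `len(products)` with `len(all_products_data)` is a quick way to spot rows that were split or lost.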