# **Web Scraping Assignment - Data Analysis Course**

**Welcome** to the Web Scraping assignment. Your task is to scrape rental price data for apartments in Cluj-Napoca from the OLX website and create meaningful visual representations of the collected data. Use Python and relevant libraries to extract, process, and visualize the information.

---

### **Task Description**

The goal of this assignment is to scrape rental listings from OLX for apartments in Cluj-Napoca and analyze the relationship between apartment sizes, room categories, and rental prices. You will:

1. **Extract Data** from multiple pages of OLX rental listings in Cluj-Napoca. [Visit the website here](https://www.olx.ro/imobiliare/apartamente-garsoniere-de-inchiriat/cluj-napoca/).
2. **Visualize rental prices** as a function of apartment size.
3. **Categorize apartments by the number of rooms** and analyze price variations across these categories.
4. **Calculate average rental prices** for different size groups and room categories.
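
As a rough illustration of step 4, the sketch below computes average prices per room category and per size group with Pandas. The column names (`price_eur`, `size_m2`, `rooms`), the sample values, and the bin edges are all assumptions made for this example; your scraped data and grouping choices will differ.

```python
import pandas as pd

# Made-up listings for illustration only; real values come from the scrape.
listings = pd.DataFrame(
    {
        "price_eur": [350, 420, 500, 650, 800, 950],
        "size_m2": [28, 35, 48, 55, 72, 90],
        "rooms": [1, 1, 2, 2, 3, 3],
    }
)

# Bin apartment sizes into groups; these bin edges are an arbitrary choice.
size_bins = [0, 40, 60, 80, float("inf")]
size_labels = ["<40 m2", "40-60 m2", "60-80 m2", "80+ m2"]
listings["size_group"] = pd.cut(
    listings["size_m2"], bins=size_bins, labels=size_labels, right=False
)

# Average price per room category and per size group.
avg_by_rooms = listings.groupby("rooms")["price_eur"].mean()
avg_by_size = listings.groupby("size_group", observed=True)["price_eur"].mean()

print(avg_by_rooms)
print(avg_by_size)
```

With `right=False`, each bin includes its lower edge and excludes its upper edge, so a 40 m2 apartment lands in the `40-60 m2` group.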

---

### **Data Collection & Web Scraping**

- Use **BeautifulSoup** and **Requests** to extract the rental price, apartment size, and number of rooms from each OLX listing.
- Scrape data from **multiple pages** so that the sample is large enough to show meaningful trends.
- Organize the data into a structured format such as a **Pandas DataFrame** for easier analysis.
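
The parsing step could look roughly like the sketch below. It runs on a small hand-written HTML snippet, not on OLX itself: the class names `listing`, `price`, `size`, and `rooms` are hypothetical, as are the helper names `first_number` and `parse_listings` and the price format. Inspect the real page in your browser's developer tools to find the actual selectors.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: OLX's real tags and class names differ and may change.
SAMPLE_HTML = """
<div class="listing"><span class="price">450 €</span>
  <span class="size">52 m²</span><span class="rooms">2 camere</span></div>
<div class="listing"><span class="price">1 200 €</span>
  <span class="size">85,5 m²</span><span class="rooms">3 camere</span></div>
"""

def first_number(text):
    """Return the first number in text as a float, or None if there is none."""
    cleaned = text.replace("\xa0", " ")  # treat non-breaking spaces as spaces
    match = re.search(r"\d[\d ]*(?:[.,]\d+)?", cleaned)
    if match is None:
        return None
    # Assumes spaces separate thousands and a comma or dot marks decimals.
    return float(match.group().replace(" ", "").replace(",", "."))

def parse_listings(html):
    """Turn one page of listing cards into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        price = card.select_one(".price")
        size = card.select_one(".size")
        rooms = card.select_one(".rooms")
        if price is None or size is None or rooms is None:
            continue  # skip cards with missing fields
        rooms_value = first_number(rooms.get_text())
        rows.append({
            "price_eur": first_number(price.get_text()),
            "size_m2": first_number(size.get_text()),
            "rooms": int(rooms_value) if rooms_value is not None else None,
        })
    return rows

records = parse_listings(SAMPLE_HTML)
print(records)
```

To cover several pages, call `parse_listings` once per downloaded page and concatenate the results; `pd.DataFrame(records)` then turns the combined list of dicts into a DataFrame.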

---

### **Data Visualization Requirements**

- Create **several visualizations**, such as a scatter plot of price against size and a box plot of price per room category, to represent the data and highlight trends.
- Ensure the visualizations are easy to understand, with proper labels, titles, and legends.
- Focus on visualizing the relationship between rental prices, apartment sizes, and room categories.
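
One possible way to meet these requirements is a labelled scatter plot, sketched below with Matplotlib. The sample rows are made up, and the function name `plot_price_vs_size` and the tuple layout are assumptions for this example only.

```python
import matplotlib.pyplot as plt

# Made-up sample rows: (size in m², monthly price in EUR, number of rooms).
sample = [(28, 350, 1), (35, 420, 1), (48, 500, 2),
          (55, 650, 2), (72, 800, 3), (90, 950, 3)]

def plot_price_vs_size(rows):
    """Scatter price against size, one series per room category."""
    fig, ax = plt.subplots(figsize=(7, 4))
    for n_rooms in sorted({r[2] for r in rows}):
        sizes = [r[0] for r in rows if r[2] == n_rooms]
        prices = [r[1] for r in rows if r[2] == n_rooms]
        ax.scatter(sizes, prices, label=f"{n_rooms} room(s)")
    ax.set_xlabel("Apartment size (m²)")
    ax.set_ylabel("Monthly rent (EUR)")
    ax.set_title("Rental price vs. apartment size")
    ax.legend(title="Room category")
    return ax

ax = plot_price_vs_size(sample)
```

In a notebook the figure is displayed when the cell finishes; in a plain script, call `plt.show()` afterwards.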

---

### **Submission Instructions**

1. **Upload your solution to GitHub** in the same project as your previous assignment. If the dataset is too large to upload, provide a link to the data in your notebook.
2. Ensure that the code is well-documented and easy to follow.

---

### **Exploration and Tools**

For this assignment, you are encouraged to explore the following libraries:

- **Requests**
- **BeautifulSoup**

To get started with web scraping, check out these tutorials that provide step-by-step guidance:

- [Web Scraping Tutorial 1](https://colab.research.google.com/github/Giffy/AI_Intro-to-Machine-Learning/blob/master/Session-9/Intro_to_web_scraping.ipynb#scrollTo=nekvLjaS6WNz)  
- [Web Scraping Tutorial 2](https://colab.research.google.com/github/nestauk/im-tutorials/blob/3-ysi-tutorial/notebooks/Web-Scraping/Web%20Scraping%20Tutorial.ipynb#scrollTo=qK-EQGZ6DAuv)


### **Minimal Example**

Extracting all external links from the Wikipedia page about the blue whale:

In [1]:
%%capture
!pip install requests_html
!pip install lxml_html_clean
!pip install beautifulsoup4

In [2]:
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import requests

In [3]:
def get_page(website_url):
    """Download a page and return its raw HTML as bytes."""
    session = HTMLSession()
    # Send a browser-like User-Agent; some sites reject the default one.
    session.headers['user-agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    response = session.get(website_url)  # session headers are sent automatically
    return response.content

In [27]:
wiki_url = 'https://en.wikipedia.org/wiki/Blue_whale'
blue_whale_page = get_page(wiki_url)
soup = BeautifulSoup(blue_whale_page, 'html.parser')  # name the parser explicitly to avoid a bs4 warning

In [42]:
links = soup.find_all('a')

In [46]:
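# Keep only absolute http(s) URLs; internal article links on the page are relative (e.g. /wiki/...).
external_links = [link['href'] for link in links if link.get('href', '').startswith(('http://', 'https://'))]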
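
A stricter reading of "external" compares hostnames rather than inspecting the start of the URL. The sketch below does this with the standard library only; the helper name `is_external` and the sample `hrefs` are made up for illustration.

```python
from urllib.parse import urljoin, urlparse

def is_external(href, page_url):
    """Return True if href points to a different host than page_url."""
    absolute = urljoin(page_url, href)  # resolves relative and //host/... links
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        return False  # ignore mailto:, javascript:, and similar schemes
    return parsed.hostname != urlparse(page_url).hostname

page = "https://en.wikipedia.org/wiki/Blue_whale"
# Illustrative href values, not taken from the actual page.
hrefs = [
    "/wiki/Whale",
    "#cite_note-1",
    "https://www.iucnredlist.org/",
    "//de.wikipedia.org/wiki/Blauwal",
]
external = [h for h in hrefs if is_external(h, page)]
print(external)
```

Under this definition, links to other language editions of Wikipedia count as external because their hostname differs; compare registered domains instead if that is not what you want.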