In this notebook, I scrape rental listings for flats and apartments in Lagos from Nigeria Property Centre using Beautiful Soup.

This code block imports libraries required for web scraping, data analysis, visualization, and machine learning tasks:

- **requests**: Sends HTTP requests to fetch web content, such as HTML pages, from websites.
- **BeautifulSoup**: Parses and navigates HTML content, enabling extraction of specific data elements from web pages.
- **pandas**: Provides tools for data manipulation and analysis, allowing structured representation of data in tabular formats like DataFrames.
- **numpy**: Adds support for numerical computing, including arrays and mathematical operations.
- **seaborn**: A statistical data visualization library based on Matplotlib, used for creating visually appealing and informative plots.
- **matplotlib.pyplot**: The plotting interface for Matplotlib, used to create static, interactive, and animated visualizations.
- **sklearn.model_selection.train_test_split**: A scikit-learn function that splits a dataset into training and testing subsets, which is needed to build and evaluate machine learning models.


```python
# Import the needed libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
```

The code block below scrapes listings for flats and apartments for rent in Lagos from the Nigeria Property Centre website and saves them to a CSV file:

- **Initialize empty lists**:  
  `names`, `prices`, `addresses`, and `info` are created to store the scraped property names, prices, addresses, and additional information.

- **Iterate through pages**:  
  The `for` loop covers only one page of listings (`page=1`), because `range(1, 2)` yields just `1`. To collect more listings, raise the upper bound of the range.

- **Send HTTP request**:  
  `requests.get` fetches the HTML of each page URL, and BeautifulSoup parses it with Python's built-in `html.parser`.

1. **Scrape property names**:  
   - `find_all` locates the `<h4>` elements with the class `content-title`, which contain the property names.  
   - The text of each element is appended to the `names` list.

2. **Scrape property prices**:  
   - Prices come from `<span>` elements with the `price` class.  
   - Their text is appended to the `prices` list.

3. **Scrape property addresses**:  
   - Addresses come from `<address>` tags with the `voffset-bottom-10` class.  
   - Their text is appended to the `addresses` list.

4. **Scrape additional information**:  
   - Details such as the number of bedrooms and bathrooms come from `<ul>` elements with the `aux-info` class.  
   - Their text is appended to the `info` list.

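- **Clean scraped prices**:  
  The code removes entries from the `prices` list that consist only of a currency symbol (`₦` or `$`). Entries that hold the amount itself are kept.

- **Validate scraped data**:  
  The code prints the length of the `names`, `prices`, `addresses`, and `info` lists. All four must be equal, because `pd.DataFrame` raises a `ValueError` when the column lists have different lengths.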

- **Create a DataFrame**:  
  A pandas DataFrame, `df`, is constructed using the scraped data, with columns `Name`, `Price`, `Address`, and `Info`.

- **Save data to CSV**:  
  The DataFrame is exported to a CSV file named `Lagos_properties.csv` for further analysis or sharing.
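The extraction and cleaning steps above can be tried offline against a small HTML string. This is a minimal sketch: `SAMPLE_HTML` is hypothetical, simplified markup that only mimics the tag and class pairs the notebook targets, and the helper `extract` is an illustrative name, not part of the notebook. As the notebook's price filter suggests, the sample assumes the currency symbol sits in its own `price` span.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: it reuses the notebook's tag/class pairs but is NOT
# a copy of the real site's HTML.
SAMPLE_HTML = """
<div class="listing">
  <h4 class="content-title">2 Bedroom Flat</h4>
  <span class="price">₦</span><span class="price">1,500,000</span>
  <address class="voffset-bottom-10">Lekki, Lagos</address>
  <ul class="aux-info"><li>2 Bedrooms</li><li>2 Bathrooms</li></ul>
</div>
<div class="listing">
  <h4 class="content-title">Mini Flat</h4>
  <span class="price">₦</span><span class="price">800,000</span>
  <address class="voffset-bottom-10">Yaba, Lagos</address>
  <ul class="aux-info"><li>1 Bedroom</li><li>1 Bathroom</li></ul>
</div>
"""


def extract(soup, tag, css_class, sep=""):
    """Hypothetical helper: stripped text of every <tag class="css_class">."""
    return [el.get_text(sep, strip=True) for el in soup.find_all(tag, class_=css_class)]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
names = extract(soup, "h4", "content-title")
prices = extract(soup, "span", "price")
addresses = extract(soup, "address", "voffset-bottom-10")
# A space separator keeps the <li> items from running together.
info = extract(soup, "ul", "aux-info", sep=" ")

# Drop entries that are only a currency symbol, as the notebook does.
prices = [p for p in prices if p not in ("₦", "$")]

df = pd.DataFrame({"Name": names, "Price": prices, "Address": addresses, "Info": info})
print(df)
```

Unlike the notebook, this sketch strips surrounding whitespace with `get_text(strip=True)`, which makes the resulting columns easier to compare and clean.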


```python
names = []
prices = []
addresses = []
info = []

# range(1, 2) yields only page 1; raise the upper bound to scrape more pages
for page in range(1, 2):
    url = 'https://nigeriapropertycentre.com/for-rent/flats-apartments/lagos/showtype?page=' + str(page)
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.content, 'html.parser')

    # Property names: <h4 class="content-title">
    for tag in soup.find_all('h4', class_='content-title'):
        names.append(tag.text)

    # Prices: <span class="price">
    for tag in soup.find_all('span', class_='price'):
        prices.append(tag.text)

    # Addresses: <address class="voffset-bottom-10">
    for tag in soup.find_all('address', class_='voffset-bottom-10'):
        addresses.append(tag.text)

    # Bedrooms, bathrooms and other details: <ul class="aux-info">
    for tag in soup.find_all('ul', class_='aux-info'):
        info.append(tag.text)

# Drop entries that are only a currency symbol; entries holding the amount are kept
prices = [item for item in prices if item not in ['₦', '$']]

# All four lengths must match, otherwise pd.DataFrame raises a ValueError
print(len(names))
print(len(prices))
print(len(addresses))
print(len(info))

df = pd.DataFrame({'Name': names, 'Price': prices, 'Address': addresses, 'Info': info})

df.to_csv('Lagos_properties.csv')
```

```
21
21
21
21
```
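The loop above fetches only page 1 (`range(1, 2)`) and checks list lengths once, at the end. When scraping several pages, one way to keep rows aligned is to validate each page on its own before adding it. The sketch below assumes this design; `page_url`, `default_fetch`, `parse_page`, `scrape_pages`, the `fetch` parameter and the one-second `delay` are all hypothetical names and values, not part of the notebook. `fetch` is injectable so the parsing logic can be exercised without network access.

```python
import time
from typing import Callable, Iterable

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://nigeriapropertycentre.com/for-rent/flats-apartments/lagos/showtype?page="
COLUMNS = ["Name", "Price", "Address", "Info"]


def page_url(page: int) -> str:
    """Hypothetical helper: URL of one listing page."""
    return BASE_URL + str(page)


def default_fetch(url: str) -> bytes:
    """Hypothetical helper: download a page with requests."""
    import requests  # imported here so parsing can be tested without requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def parse_page(html) -> dict:
    """Extract the four columns from one page, using the notebook's selectors."""
    soup = BeautifulSoup(html, "html.parser")

    def texts(tag, css_class):
        return [el.text for el in soup.find_all(tag, class_=css_class)]

    prices = [p for p in texts("span", "price") if p not in ("₦", "$")]
    return {
        "Name": texts("h4", "content-title"),
        "Price": prices,
        "Address": texts("address", "voffset-bottom-10"),
        "Info": texts("ul", "aux-info"),
    }


def scrape_pages(
    pages: Iterable[int],
    fetch: Callable[[str], bytes] = default_fetch,
    delay: float = 1.0,  # assumed pause between requests
) -> pd.DataFrame:
    frames = []
    for page in pages:
        columns = parse_page(fetch(page_url(page)))
        lengths = {name: len(values) for name, values in columns.items()}
        if len(set(lengths.values())) != 1:
            # Skip a misaligned page instead of shifting every later row.
            print(f"Skipping page {page}: column lengths differ {lengths}")
        else:
            frames.append(pd.DataFrame(columns))
        time.sleep(delay)
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)
```

Usage might look like `df = scrape_pages(range(1, 6))`. Checking lengths per page means a page with a missing field is dropped on its own. With a single end-of-run check, one short list makes `pd.DataFrame` reject the whole run.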
