## Collecting Data 

### Web Scraping - Beautiful Soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It parses the markup into a tree that can then be navigated and searched, which makes it a common choice for web scraping. It does not read other formats such as PDF files or Word documents; their content has to be extracted with a dedicated library first.
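
To make the idea of a navigable, searchable tree concrete, here is a minimal sketch that parses a small inline HTML string. The markup, the `listing` class and the `/house/...` links are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<html>
  <body>
    <h1 id="title">Houses for rent</h1>
    <ul>
      <li class="listing"><a href="/house/1">Two-bedroom house</a></li>
      <li class="listing"><a href="/house/2">Three-bedroom house</a></li>
    </ul>
  </body>
</html>
"""

# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree by tag name: the first <h1> in the document
heading = soup.h1.get_text(strip=True)

# Search the tree for every <li> with the class "listing"
items = soup.find_all("li", class_="listing")
links = [item.a["href"] for item in items]
titles = [item.get_text(strip=True) for item in items]

print(heading)
print(links)
print(titles)
```

`soup.h1` walks to the first matching tag, while `find_all()` returns every match, so the two cover the usual "navigate" and "search" cases.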

#### Installation

To install Beautiful Soup, you can use the pip package manager:

```bash
pip install beautifulsoup4
```

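To use Beautiful Soup, you first create a BeautifulSoup object from the HTML or XML document that you want to parse. Pass the markup itself (a string or bytes) or an open file handle to the BeautifulSoup() constructor, together with the name of a parser; a bare file path is not read from disk. The cell below downloads a page with requests to obtain its markup: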

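In [16]:
import requests
from bs4 import BeautifulSoup

url = "https://www.sakile.co.ke/"

response = requests.get(url)

# .content is an attribute holding the raw response bytes, not a method;
# calling it as response.content() raises
# TypeError: 'bytes' object is not callable
content = response.content

# Build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(content, "html.parser")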
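
The same constructor also works without a network request. The following sketch, which assumes a throwaway file named `page.html` in a temporary directory (a hypothetical name), shows the two kinds of input it accepts: markup held in a string, and an open file handle for a page saved on disk.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
markup = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# 1) Parse markup that is already held in a string
soup_from_string = BeautifulSoup(markup, "html.parser")

# 2) Parse a saved file by passing an open file handle, not the path itself
with tempfile.TemporaryDirectory() as tmp_dir:
    page_path = Path(tmp_dir) / "page.html"
    page_path.write_text(markup, encoding="utf-8")
    with open(page_path, encoding="utf-8") as fh:
        soup_from_file = BeautifulSoup(fh, "html.parser")

title_from_string = soup_from_string.title.string
title_from_file = soup_from_file.title.string
print(title_from_string, title_from_file)
```

Both objects expose the same tree, so code written against a downloaded page can be tried on a saved copy first.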

## **Data Collection, Processing and Analysis**


In [8]:
import pandas as pd

data = pd.read_excel("data/sample.xlsx")

In [9]:
data.head(10)

| Unnamed: 0 | first_name | last_name | occupation | age | location | phone_number |
|---|---|---|---|---|---|---|
| 0 | Orelia | Godman | Sales Representative | 80 | 0803 Nevada Parkway | 915-517-9908 |
| 1 | Ulric | Village | Budget/Accounting Analyst I | 18 | 5949 Debra Drive | 206-741-4241 |
| 2 | Eal | Zappel | Operator | 57 | 71 Lillian Place | 932-565-8912 |
| 3 | Cortney | Stairs | Assistant Media Planner | 44 | 1 Messerschmidt Trail | 543-599-8289 |
| 4 | Sinclair | Simonian | Database Administrator III | 52 | 8688 Homewood Center | 660-563-7827 |
| 5 | Adrea | Domingues | Staff Scientist | 37 | 92 Dapin Circle | 856-449-3959 |
| 6 | Tasha | Liley | Business Systems Development Analyst | 42 | 07918 Village Lane | 852-272-4554 |
| 7 | Codee | Rotherforth | Media Manager III | 80 | 029 Pierstorff Drive | 122-589-6462 |
| 8 | Robbin | Forte | Systems Administrator III | 61 | 4561 Melvin Terrace | 239-411-6043 |
| 9 | Gerty | Brotherhead | General Manager | 55 | 7295 Ryan Place | 758-492-7773 |


### Data Collection

### Processing


### Analysis

In [18]:
import csv
import time
import requests
from bs4 import BeautifulSoup

url_base = 'https://www.buyrentkenya.com/houses-for-rent?page='
headers = {'User-Agent': 'Mozilla/5.0'}

houses_data = []

# Result pages 1 to 124 (range() excludes the upper bound)
for page in range(1, 125):
    url = url_base + str(page)
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(response.content, 'html.parser')

    # Candidate listing cards on the page
    listings = soup.find_all('div', class_='px-5')

    for listing in listings:
        title = listing.find('span', class_='hidden')
        location = listing.find('p', class_='text-sm')
        size = listing.find('a', class_='text-grey-500')
        price = listing.find('div', {'class': 'flex justify-between items-center'})

        # find() returns None when nothing matches; skip incomplete cards
        # instead of failing with AttributeError on .text
        if any(field is None for field in (title, location, size, price)):
            continue

        houses_data.append({
            'title': title.text.strip(),
            'location': location.text.strip(),
            'size': size.text.strip(),
            'rental fee': price.text.strip(),
        })

    time.sleep(1)  # pause between requests to avoid overloading the server

# Save the data to a CSV file
csv_file_path = 'data/houses-for-rent.csv'
with open(csv_file_path, 'w', newline='', encoding='utf-8') as f:
    csv_writer = csv.DictWriter(f, fieldnames=['title', 'location', 'size', 'rental fee'])
    csv_writer.writeheader()
    csv_writer.writerows(houses_data)
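
Before crawling all the result pages, it can help to try the selectors on a small snippet. The sketch below is one possible way to do that: the HTML is invented to mimic the class names used in the loop above (it is not copied from the site), and `text_or_none` is a hypothetical helper that keeps a row with `None` for any missing field rather than dropping it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in the scraping loop;
# the second card deliberately lacks a size and a price
sample_html = """
<div class="px-5">
  <span class="hidden">3 Bedroom House</span>
  <p class="text-sm">Karen, Nairobi</p>
  <a class="text-grey-500">3 bedrooms</a>
  <div class="flex justify-between items-center">KSh 150,000</div>
</div>
<div class="px-5">
  <span class="hidden">2 Bedroom House</span>
  <p class="text-sm">Kilimani, Nairobi</p>
</div>
"""


def text_or_none(parent, name, **kwargs):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.find(name, **kwargs)
    return element.get_text(strip=True) if element is not None else None


soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for listing in soup.find_all("div", class_="px-5"):
    rows.append({
        "title": text_or_none(listing, "span", class_="hidden"),
        "location": text_or_none(listing, "p", class_="text-sm"),
        "size": text_or_none(listing, "a", class_="text-grey-500"),
        "rental fee": text_or_none(
            listing, "div", class_="flex justify-between items-center"
        ),
    })

for row in rows:
    print(row)
```

Keeping partial rows makes it easier to see which selector stopped matching if the site's markup changes.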
