# LAND PRICE PREDICTION APP USING AWS SAGEMAKER - End-to-End
We will build a simple Land Price Prediction App to help people looking to buy land in Cameroon get the expected price for the quartier (neighbourhood) where they intend to buy.
Following the Best Practices for Machine Learning Projects on AWS, we will take these steps:
- I)   SCRAPING THE DATA
- II)  IMPORTING THE DATA INTO SAGEMAKER 
- III) EXPLORATORY DATA ANALYSIS IN SAGEMAKER
- IV)  FEATURE ENGINEERING IN SAGEMAKER
- V)   PREDICTIVE MODEL BUILDING AND DEPLOYMENT IN SAGEMAKER
- VI)  MODEL INFERENCE IN SAGEMAKER

### I) SCRAPING THE DATA
We will perform the following tasks to scrape the data we need:
- a.) Importing the necessary libraries
- b.) Writing the ETL functions to obtain the data
- c.) Scraping the data and storing it as a list of dictionaries
- d.) Saving the final scraped DataFrame to a CSV file using pandas

#### a.) Importing the necessary Libraries to scrape the data

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
```

#### b.) Writing ETL functions to Extract, Transform and Load the data into a List of Dictionaries

```python
# Create a function that uses requests and BeautifulSoup to collect the URLs of the
# individual land listings on one results page and add them to the global url_list
def get_urls(page_number):
    base_url = 'https://www.jumia.cm'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'}
    # headers must be passed by keyword: the second positional argument of requests.get is params
    request = requests.get(f'{base_url}/en/land-plots?page={page_number}&xhr=ugmii',
                           headers=headers, timeout=30)
    soup = BeautifulSoup(request.text, 'html.parser')
    # The code assumes each listing is an <article> whose first link points to the ad
    partial_url_list = soup.find_all('article')
    for partial_url in partial_url_list:
        new_url = base_url + partial_url.find('a')['href']
        url_list.append(new_url)
        print(f"Getting the Urls for page {page_number}")  # printed once per listing found
    return
```
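`get_urls` appends to the global `url_list` and only works against the live site, which makes it hard to check. As a hedged sketch, the link-extraction step can be written as a pure function over the page HTML using the standard library's `html.parser` instead of BeautifulSoup. It assumes, as `get_urls` does, that each listing is an `<article>` whose first link carries a relative `href`; `extract_listing_urls` and `_ArticleLinkParser` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ArticleLinkParser(HTMLParser):
    """Collect the first <a href="..."> found inside each <article> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # how many <article> tags we are currently inside
        self.found_link = False   # whether the current article already gave a link
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            if self.depth == 0:
                self.found_link = False  # a new top-level listing starts
            self.depth += 1
        elif tag == "a" and self.depth > 0 and not self.found_link:
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without a usable href
                self.hrefs.append(href)
                self.found_link = True

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1


def extract_listing_urls(html, base_url="https://www.jumia.cm"):
    """Return the absolute listing URLs found in one results page (hypothetical helper)."""
    parser = _ArticleLinkParser()
    parser.feed(html)
    parser.close()
    # urljoin keeps absolute hrefs unchanged and prefixes relative ones with base_url
    return [urljoin(base_url, href) for href in parser.hrefs]
```

For example, `extract_listing_urls('<article><a href="/en/x-1">Plot</a></article>')` returns `['https://www.jumia.cm/en/x-1']`. Unlike `partial_url.find('a')['href']`, an article without a usable link is skipped rather than raising an error.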

```python
# Create a function that downloads one listing URL and parses its HTML with BeautifulSoup
def extract_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36'}
    # Pass headers by keyword so the User-Agent is actually sent
    request = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(request.text, 'html.parser')
    return soup
```
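`extract_page` makes a single attempt per URL, and the notebook imports `time` without using it. One common way to be gentler with the site and to survive transient errors is to retry with a growing pause. The sketch below is generic and hedged: `fetch` stands for any callable that returns the page text or raises (for example a small wrapper that calls `requests.get(url, headers=headers, timeout=30)`, then `raise_for_status()`, and returns `.text`); `fetch_with_retry`, `retries` and `backoff` are hypothetical names.

```python
import time


def fetch_with_retry(fetch, url, retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on exceptions with exponential backoff.

    fetch:   callable taking a URL and returning the page text (assumed)
    retries: total number of attempts before giving up
    backoff: seconds to wait after the first failure; doubled after each failure
    sleep:   injectable pause function so the logic can be tested quickly
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:  # in practice, narrow this to requests.RequestException
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the last error
            sleep(backoff * 2 ** attempt)
```

With the defaults, a URL that keeps failing is tried three times, with pauses of 1 and 2 seconds in between.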

```python
# Create a function that pulls the price, location and area out of one listing page
# and appends them as a dictionary to the global land_data_list
def transform_page(soup):
    main_div = soup.find('div', class_='twocolumn')
    price = main_div.find('span', {'class': 'price'}).get_text(strip=True)
    # The location is taken from the second <dd> of the listing's details list
    location = main_div.select('dl > dd')[1].text.strip()
    try:
        # The area is taken from the second <h3>, with its "Area" label removed
        area = main_div.find_all('h3')[1].get_text(strip=True).replace('Area', "")
    except IndexError:
        # Listings without that heading get an empty area
        area = ''

    items = {
        'Price': price,
        'Location': location,
        'Area': area
    }
    land_data_list.append(items)

    print(f"Scrapping the page '{soup.find('title').text}'...")
    return
```

#### c.) Scraping and Storing the data into a List of Dictionaries

```python
# Extract the listing URLs from page 1 up to the number of pages required
# (range(1, 2) covers page 1 only; raise the upper bound to scrape more pages)
url_list = []
for page_number in range(1, 2):
    get_urls(page_number)
```

```
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
Getting the Urls for page 1
```
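The loop above only covers `range(1, 2)`, i.e. page 1. When more pages are scraped, it helps to pause between page requests and to keep each listing URL only once if it shows up on more than one page. A minimal sketch, where `links_for_page` is an assumed callable that returns the listing URLs of one page (for example, a variant of `get_urls` that returns its list instead of appending to `url_list`) and `collect_urls` is a hypothetical name:

```python
import time


def collect_urls(links_for_page, first_page, last_page, delay=1.0, sleep=time.sleep):
    """Gather listing URLs from pages first_page..last_page (inclusive).

    Order is preserved, repeated URLs are dropped, and `delay` seconds are
    waited between consecutive page requests (sleep is injectable for tests).
    """
    seen = {}
    for page_number in range(first_page, last_page + 1):
        if page_number > first_page:
            sleep(delay)  # do not hit the site back-to-back
        for url in links_for_page(page_number):
            seen.setdefault(url, None)  # dicts keep insertion order and ignore repeats
    return list(seen)
```

Using a `dict` rather than a `set` keeps the URLs in the order they were found, which makes the scraping log easier to follow.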


```python
# Extract and transform the data from every listing URL collected above
land_data_list = []
for url in url_list:
    page = extract_page(url)
    transform_page(page)
```

```
Scrapping the page 'Terrain  Titré à vendre Logbessou 200m2/1000m2 /500m2  | Douala | Jumia Deals'...
Scrapping the page 'Terrain Titré en Vente | Mfou | Jumia Deals'...
Scrapping the page 'Terrain  De 200 m² À Vendre | Mbalgong | Jumia Deals'...
Scrapping the page 'terrain titré à vendre à l'échangeur Ahala | Ahala | Jumia Deals'...
Scrapping the page 'Terrain Titré À Vendre À Nkolbisson (béatitude) | Nkolbisson | Jumia Deals'...
Scrapping the page 'Terrain à vendre  | Emana | Jumia Deals'...
Scrapping the page 'Terrain résidentiel à la cité chirac  | Yassa | Jumia Deals'...
Scrapping the page 'Terrain titré à vendre | Olembe | Jumia Deals'...
Scrapping the page 'Terrain De 500m² À Vendre | Bonaberi | Jumia Deals'...
```
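`transform_page` raises as soon as a listing does not match the expected layout (for example `AttributeError` when the `twocolumn` div or the price span is missing, or `IndexError` when the `dl > dd` list is shorter than expected), and a single bad listing then stops the whole loop. A hedged sketch of a driver that records such failures and keeps going; `scrape_all` is a hypothetical name, and `transform` is assumed to return the record dictionary rather than appending to a global list as `transform_page` does:

```python
def scrape_all(urls, extract, transform):
    """Run extract -> transform on every URL and return (records, failures).

    extract:   callable mapping a URL to a parsed page (e.g. a BeautifulSoup object)
    transform: callable mapping a parsed page to a dict describing one listing
    Parsing errors are collected as (url, message) pairs instead of stopping
    the run; any other exception (e.g. a network error) still propagates.
    """
    records, failures = [], []
    for url in urls:
        try:
            records.append(transform(extract(url)))
        except (AttributeError, IndexError, KeyError) as error:
            failures.append((url, f"{type(error).__name__}: {error}"))
    return records, failures
```

The `failures` list can be printed at the end to see which listings need a closer look at their HTML.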


#### d.) Saving the scraped data as a CSV file using pandas

```python
# Convert the list of dictionaries to a DataFrame, preview it and export it to CSV
df = pd.DataFrame(land_data_list)
print('Printing first 05 elements...')
print(df.head())
df.to_csv('land_price_data.csv', index=False)
```

```
Printing first 05 elements...
            Price    Location      Area
0      60,000FCFA      Douala   1000 m2
1         700FCFA        Mfou  30000 m2
2   1,500,000FCFA    Mbalgong    200 m2
3  10,000,003FCFA       Ahala    300 m2
4      10,000FCFA  Nkolbisson    500 m2
```
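The `Price` and `Area` columns are still strings such as `1,500,000FCFA` and `200 m2`, so they need converting to numbers before the analysis and modelling steps. A hedged sketch using only the standard library's `re`; `parse_price` and `parse_area` are hypothetical names, and the patterns only cover the formats visible in the output above (comma thousands separators, a trailing `FCFA`, areas in `m2`, with `m²` also accepted). Anything else comes back as `None`.

```python
import re

# Patterns for the formats seen in the scraped output, e.g. "1,500,000FCFA" and "200 m2"
_PRICE_RE = re.compile(r"^\s*(\d[\d,]*)\s*FCFA\s*$", re.IGNORECASE)
_AREA_RE = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*m(?:2|²)\s*$", re.IGNORECASE)


def parse_price(text):
    """Turn '1,500,000FCFA' into 1500000; return None for anything unrecognised."""
    if not isinstance(text, str):  # e.g. NaN after reading the CSV back with pandas
        return None
    match = _PRICE_RE.match(text)
    return int(match.group(1).replace(",", "")) if match else None


def parse_area(text):
    """Turn '200 m2' into 200.0 (square metres); return None for '' or other formats."""
    if not isinstance(text, str):
        return None
    match = _AREA_RE.match(text)
    return float(match.group(1).replace(",", "")) if match else None
```

With the DataFrame above, `df['Price'].map(parse_price)` and `df['Area'].map(parse_area)` would give numeric columns. The scraped fields do not say whether a price covers the whole plot or one square metre (compare 700FCFA for 30000 m2 with 10,000,003FCFA for 300 m2), so that should be checked before deriving a price per m².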
