# Project II: Pipelines (Parcel Estimator Tool)

## Purpose
The Parcel Estimator tool returns the following parcel data for a given zip code:
- Average price per acre.
- Average price.
- Average acreage.
- Most popular seller.
- Listing popularity for the area.
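As a rough illustration of how these per-zip metrics could be computed with pandas, here is a minimal sketch. The helper name `summarize_by_zip`, the choice of total price divided by total acreage for "price per acre", and the use of listing count as the "popularity" measure are all assumptions, not the tool's actual definitions. The sample rows are copied from the imported data shown in step 1.

```python
import pandas as pd


def summarize_by_zip(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-zip versions of the metrics listed above."""
    grouped = df.groupby('Zip Code')
    return pd.DataFrame({
        'avg_price': grouped['Parcel Price'].mean(),
        'avg_acres': grouped['Parcel Size (acres)'].mean(),
        # Assumption: price per acre = total asking price / total acreage in the zip
        'price_per_acre': grouped['Parcel Price'].sum() / grouped['Parcel Size (acres)'].sum(),
        # Most frequent listing author; mode() sorts ties, so take the first value
        'top_seller': grouped['Listing Author'].agg(lambda s: s.mode().iloc[0]),
        # Assumption: the number of listings stands in for "popularity"
        'listings': grouped.size(),
    })


# Rows taken from the imported data shown in step 1
sample = pd.DataFrame({
    'Zip Code': [79847] * 6 + [79855] * 4,
    'Parcel Size (acres)': [20.18, 20.0, 20.0, 20.0, 20.0, 18.0, 15.64, 11.0, 11.0, 11.0],
    'Parcel Price': [18000.0, 18000.0, 18000.0, 20000.0, 18000.0, 18000.0,
                     7500.0, 5500.0, 5500.0, 5500.0],
    'Listing Author': ['LandJakes'] * 6 + ['Auction Flippers LLC'] * 4,
})
summary = summarize_by_zip(sample)
print(summary)
```

Averaging per-listing price/acre ratios instead of dividing totals would weight small parcels more heavily; which definition fits best depends on how the tool's output is meant to be read.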


## How to Use
Run all cells in order. When prompted, enter a 5-digit zip code (it is stored in the variable "zip_code"). The exported CSV and charts are saved in the "output" folder.


## Tool Functionality
1. Import: Import previous tool data.
2. Scrape: Scrape Landmodo for parcel data and add it to the dataframe.
3. Clean: Clean the dataframe's new rows.
4. Visualize: Plot charts highlighting the differences in key metrics.
5. Export: Export the dataframe as a CSV file and the charts as PNG files.


## Resources
- Dataset: previous CSV exports, imported for comparison.
- Landmodo: landmodo.com
- Libraries: Pandas, NumPy, Requests, BeautifulSoup, Matplotlib, Plotly

## 1. Import previous tool data.

In [1]:
```python
pwd
```

```
'/Users/venice/Downloads/1.DS/projects/project-2-pipeline'
```

In [2]:
```python
# Import general libraries
import pandas as pd
import numpy as np
from datetime import date

# Import libraries needed for web scraping
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Import the project's data cleaning and enrichment helpers
# (e.g. missing_vals, input_zip, empty_page, new_search)
from src.cleaning import *

# Silence pandas' SettingWithCopyWarning about chained assignment
pd.options.mode.chained_assignment = None

# Import and inspect the previous CSV export.
# The file appears to be UTF-8: reading it as latin-1 turns dashes into "â".
df = pd.read_csv('input/LandModo_Data.csv', encoding='utf-8')
# df.sample(5)
df
```

| Unnamed: 0 | Zip Code | Location | State | Country | Parcel Size (acres) | Parcel Price | Listing Name | Listing Author | Post Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 79847 | Cornudas | TX | USA | 20.18 | 18000.0 | LAND FOR SALE – How to grow Old & Rich toget... | LandJakes | 04/14/2022 |
| 1 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres or Neighboring Lots in Cornuda... | LandJakes | 03/04/2022 |
| 2 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres Land in Hudspeth County, TX. N... | LandJakes | 03/04/2022 |
| 3 | 79847 | Cornudas | TX | USA | 20.0 | 20000.0 | Summery Living in The Sunny Texas! Own 20 Acre... | LandJakes | 03/04/2022 |
| 4 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres or Neighboring Lots in Cornuda... | LandJakes | 02/10/2022 |
| 5 | 79847 | Cornudas | TX | USA | 18.0 | 18000.0 | Invest 20 Acres Land in Hudspeth County, TX. N... | LandJakes | 02/10/2022 |
| 6 | 79855 | Van Horn | TX | USA | 15.64 | 7500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 7 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 8 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 9 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
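The "Unnamed: 0" header above most likely comes from a CSV that was saved together with its index and then read back without `index_col`. The sketch below reproduces that round trip on a two-row frame (values from the table above) and shows two standard pandas ways to avoid the extra column; the frame itself is only an illustration.

```python
import io

import pandas as pd

frame = pd.DataFrame({'Zip Code': [79847, 79855], 'Parcel Price': [18000.0, 7500.0]})

# Writing with the default index=True adds an unnamed leading index column
csv_text = frame.to_csv()
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())  # the first column comes back as "Unnamed: 0"

# Option 1: tell read_csv that the first column is the index
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: never write the index in the first place
clean = pd.read_csv(io.StringIO(frame.to_csv(index=False)))
```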


In [3]:
```python
# Count the missing and the filled values in each column
missing_vals(df)
```

```
Missing values:

Zip Code               0
Location               0
State                  0
Country                0
Parcel Size (acres)    0
Parcel Price           0
Listing Name           0
Listing Author         0
Post Date              0
dtype: int64

Total values:

Zip Code               18
Location               18
State                  18
Country                18
Parcel Size (acres)    18
Parcel Price           18
Listing Name           18
Listing Author         18
Post Date              18
dtype: int64
```
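`missing_vals` lives in "src/cleaning.py" and its implementation is not shown here. Judging only from the printed output, it reports missing and total counts per column; a minimal pandas equivalent might look like the sketch below. The name `missing_report` is hypothetical, and "total" is assumed to mean the non-null count.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for missing_vals(): missing vs. non-null counts per column."""
    return pd.DataFrame({
        'missing': df.isna().sum(),  # NaN/None cells per column
        'non_null': df.count(),      # filled cells per column
    })


demo = pd.DataFrame({
    'Zip Code': [79847, 79855, 79855],
    'Parcel Price': [18000.0, np.nan, 5500.0],
})
report = missing_report(demo)
print(report)
```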


## 2. Scrape Landmodo for parcel data.

### Request zip code to search.

In [4]:
```python
# Request a zip code from the user; input_zip() checks that it is a 5-digit number
zip_code = input_zip()  # test zip: 79847
```

```
Enter valid 5-digit Zip Code to search: 79847
```
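`input_zip` is defined in "src/cleaning.py" and is not shown here. A minimal sketch of the same idea, assuming a regex check and a re-prompt loop, is below; `is_valid_zip` and `ask_zip` are hypothetical names.

```python
import re

# Exactly five ASCII digits; [0-9] avoids matching other Unicode digits like \d would
ZIP_PATTERN = re.compile(r'[0-9]{5}')


def is_valid_zip(text: str) -> bool:
    """Hypothetical helper: True only for a 5-digit zip code (surrounding spaces allowed)."""
    return ZIP_PATTERN.fullmatch(text.strip()) is not None


def ask_zip(prompt: str = 'Enter valid 5-digit Zip Code to search: ') -> str:
    """Hypothetical helper: keep prompting until the user types a valid zip code."""
    while True:
        answer = input(prompt)
        if is_valid_zip(answer):
            return answer.strip()
        print('Please enter exactly five digits, e.g. 79847.')
```

Returning the zip code as a string rather than an `int` keeps leading zeros (for example "02134"), which an integer conversion would drop.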


### What information do I need to save?
Nine fields, one per dataframe column: Zip Code, Location, State, Country, Parcel Size (acres), Parcel Price, Listing Name, Listing Author, Post Date.
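Several of these fields can come from a single location string. Assuming the scraped location text has the same "City, ST 12345" shape as some listing names in the data above (e.g. "Van Horn, TX 79855"), a regex sketch could split it as follows. The `parse_location` helper and the hard-coded "USA" country are assumptions; the real scraper may extract these fields differently.

```python
import re
from typing import Optional

# Matches "City, ST 12345", e.g. "Van Horn, TX 79855"
LOCATION_RE = re.compile(
    r'^\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>[0-9]{5})\s*$'
)


def parse_location(text: str) -> Optional[dict]:
    """Hypothetical helper: split a location string into dataframe fields, or return None."""
    match = LOCATION_RE.match(text)
    if match is None:
        return None
    return {
        'Location': match.group('city').strip(),
        'State': match.group('state'),
        'Zip Code': match.group('zip'),
        'Country': 'USA',  # assumption: every Landmodo result here is in the US
    }


print(parse_location('Van Horn, TX 79855'))
```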

### Input zip code and loop through all pages.

In [5]:
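```python
# Landmodo search URL; {pg} is the results page and {zip_code} the search term
url_template = ('https://www.landmodo.com/properties?page={pg}'
                '&q={zip_code}&property_status=Land+for+Sale')

pg = 1  # start on the first results page
url = url_template.format(pg=pg, zip_code=zip_code)

# Start from the imported data so each scraped page is appended instead of
# overwriting the previous one (and so dfnew exists even if no page has results)
dfnew = df.copy()

# Loop through pages until empty_page() returns True
while not empty_page(url):
    dfnew = pd.concat([dfnew, new_search(url)], axis=0, ignore_index=True)
    pg += 1
    url = url_template.format(pg=pg, zip_code=zip_code)
dfnew
```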

| Unnamed: 0 | Zip Code | Location | State | Country | Parcel Size (acres) | Parcel Price | Listing Name | Listing Author | Post Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 79847 | Cornudas | TX | USA | 20.18 | 18000.0 | LAND FOR SALE – How to grow Old & Rich toget... | LandJakes | 04/14/2022 |
| 1 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres or Neighboring Lots in Cornuda... | LandJakes | 03/04/2022 |
| 2 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres Land in Hudspeth County, TX. N... | LandJakes | 03/04/2022 |
| 3 | 79847 | Cornudas | TX | USA | 20.0 | 20000.0 | Summery Living in The Sunny Texas! Own 20 Acre... | LandJakes | 03/04/2022 |
| 4 | 79847 | Cornudas | TX | USA | 20.0 | 18000.0 | Invest 20 Acres or Neighboring Lots in Cornuda... | LandJakes | 02/10/2022 |
| 5 | 79847 | Cornudas | TX | USA | 18.0 | 18000.0 | Invest 20 Acres Land in Hudspeth County, TX. N... | LandJakes | 02/10/2022 |
| 6 | 79855 | Van Horn | TX | USA | 15.64 | 7500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 7 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 8 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
| 9 | 79855 | Van Horn | TX | USA | 11.0 | 5500.0 | Van Horn, TX 79855 | Auction Flippers LLC | 06/06/2022 |
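An alternative shape for the pagination loop is sketched below. It builds the same query string with `urllib.parse.urlencode`, collects pages in a list and concatenates once at the end, and stops after `max_pages` as a safety limit. `fetch_page` is a hypothetical callable standing in for the project's `empty_page`/`new_search` pair; passing it in lets the loop run offline with fake pages.

```python
from typing import Callable, List
from urllib.parse import urlencode

import pandas as pd

BASE_URL = 'https://www.landmodo.com/properties'


def search_url(zip_code: str, page: int) -> str:
    """Build the Landmodo search URL; urlencode turns spaces into '+'."""
    query = {'page': page, 'q': zip_code, 'property_status': 'Land for Sale'}
    return f'{BASE_URL}?{urlencode(query)}'


def scrape_all_pages(zip_code: str,
                     fetch_page: Callable[[str], pd.DataFrame],
                     max_pages: int = 50) -> pd.DataFrame:
    """Collect pages until fetch_page returns an empty frame or max_pages is reached."""
    frames: List[pd.DataFrame] = []
    for page in range(1, max_pages + 1):
        page_df = fetch_page(search_url(zip_code, page))
        if page_df.empty:  # an empty page means there are no more results
            break
        frames.append(page_df)
    if not frames:
        return pd.DataFrame()
    # A single concat avoids re-copying the growing frame on every page
    return pd.concat(frames, ignore_index=True)


# Offline usage with a fake fetcher standing in for the real scraper
fake_pages = {search_url('79847', 1): pd.DataFrame({'Parcel Price': [18000.0]})}
print(scrape_all_pages('79847', lambda u: fake_pages.get(u, pd.DataFrame())))
```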


In [6]:
```python
# If no parcel data was found for the input zip code, ask the user to restart the tool
if len(dfnew) == len(df):
    print('This zip code has no matches. Please restart the tool to try another zip code.')
```

## 3. Cleaning
All cleaning is done in a separate module: see "cleaning.py" in the "src" folder.

### Additional cleaning:
- Rows with incorrect zip codes are removed.
- Searches with zero hits display a warning to the user.
- Missing values are replaced with nulls.
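A minimal sketch of the first and third cleaning rules is below. The helper name `clean_new_rows` is hypothetical, and two points are assumptions: that an "incorrect" zip code means any row whose zip differs from the one searched, and that missing values show up as blank strings in the scraped text.

```python
import numpy as np
import pandas as pd


def clean_new_rows(df: pd.DataFrame, searched_zip: str) -> pd.DataFrame:
    """Hypothetical cleaning pass for newly scraped rows."""
    out = df.copy()
    # Treat empty or whitespace-only strings as missing (NaN is pandas' null marker)
    out = out.replace(r'^\s*$', np.nan, regex=True)
    # Assumption: keep only rows whose zip matches the searched zip;
    # zfill(5) restores leading zeros if zips were stored as integers
    zips = out['Zip Code'].astype(str).str.zfill(5)
    return out[zips == searched_zip].reset_index(drop=True)


raw = pd.DataFrame({
    'Zip Code': [79847, 79855, 79847],
    'Listing Author': ['LandJakes', 'Auction Flippers LLC', '  '],
})
print(clean_new_rows(raw, '79847'))
```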

## 4. Visualize

Important information to chart:
- Average price and acreage per zip code
- Price and acreage range per zip code
- Zip code heat map (by number of hits and by price per acre)
- Previous charts, grouped by state or county
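For the "range per zip code" chart, the underlying numbers can be prepared with a pandas named aggregation. The sketch below is an assumption about how that table could be built (the helper name `ranges_by_zip` is hypothetical); the sample rows come from the imported data in step 1.

```python
import pandas as pd


def ranges_by_zip(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: min/max price and acreage per zip code."""
    return df.groupby('Zip Code').agg(
        price_min=('Parcel Price', 'min'),
        price_max=('Parcel Price', 'max'),
        acres_min=('Parcel Size (acres)', 'min'),
        acres_max=('Parcel Size (acres)', 'max'),
    )


sample = pd.DataFrame({
    'Zip Code': [79847, 79847, 79847, 79855, 79855],
    'Parcel Size (acres)': [20.18, 20.0, 18.0, 15.64, 11.0],
    'Parcel Price': [18000.0, 20000.0, 18000.0, 7500.0, 5500.0],
})
ranges = ranges_by_zip(sample)
print(ranges)
```

The resulting frame has one row per zip code, so it can be handed directly to Matplotlib or Plotly for range or error-bar style charts.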

### See "Visualization.ipynb" for the continuation.

## 5. Export the dataframe for use in the Visualization notebook.

In [7]:
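```python
# Save the combined data with today's date in the file name.
# index must be the boolean False: the string "False" is truthy, so the index would be written.
dfnew.to_csv(f'output/landmodo_search_{date.today()}.csv', index=False)
```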
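If the "output" folder does not exist yet, `to_csv` raises an error. A small sketch that creates the folder first and writes the dated file without the index is below; the helper name `export_search` is hypothetical, and the usage writes to a temporary directory so nothing is left behind.

```python
import tempfile
from datetime import date
from pathlib import Path

import pandas as pd


def export_search(df: pd.DataFrame, out_dir: str = 'output') -> Path:
    """Hypothetical helper: write df to <out_dir>/landmodo_search_<YYYY-MM-DD>.csv."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = folder / f'landmodo_search_{date.today()}.csv'
    df.to_csv(path, index=False)  # no index, so no "Unnamed: 0" column on re-import
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = export_search(pd.DataFrame({'Zip Code': [79847], 'Parcel Price': [18000.0]}), tmp)
    file_name = written.name
    round_trip = pd.read_csv(written)

print(file_name, round_trip.columns.tolist())
```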