# Project 1.) First Cloud Function

### Description : Post a cloud function that takes in a string of numbers and returns a JSON response containing the sum of all of the single digits

#### Example : input ="12345"
#### output = 1+2+3+4+5 = 15
#### returns {"Answer": 15}

In [1]:
import json

def sum_num(request):
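    # The client posts the digit string as the raw request body, not as JSON,
    # so read the body as text instead of calling request.get_json()
    str_number = request.get_data(as_text=True)
    # Keep only the digit characters and convert each one to an int
    list_num = [int(s) for s in str_number if s.isdigit()]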
    total_num = sum(list_num)

    return json.dumps({"Answer": total_num})
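Because the digit-summing logic does not depend on Cloud Functions, it can be checked locally before deploying. The sketch below pulls that logic into a hypothetical helper, `digit_sum_payload`, which is not part of the deployed function, and runs it on the example inputs from the description.

```python
import json


def digit_sum_payload(str_number: str) -> str:
    """Hypothetical helper: the digit-sum logic of sum_num without the HTTP request."""
    # Non-digit characters are skipped, matching the isdigit() filter in sum_num
    total = sum(int(ch) for ch in str_number if ch.isdigit())
    return json.dumps({"Answer": total})


for example in ["12345", "012937", "2", "9999999999999"]:
    print(example, "->", digit_sum_payload(example))
```

For "12345" this returns `{"Answer": 15}`, and the other three inputs give the same totals as the deployed function's responses below.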

## 1.b.) Query your cloud function using requests for example input "012937", "2" and "9999999999999"

In [2]:
import requests

url = 'https://us-central1-seismic-aloe-387921.cloudfunctions.net/function-1'

r = requests.post(url, "012937")

r.text

'{"Answer": 22}'

In [3]:
r = requests.post(url, "2")

r.text

'{"Answer": 2}'

In [4]:
r = requests.post(url, "9999999999999")

r.text

'{"Answer": 117}'
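The calls above pass the digit string as the second positional argument of `requests.post`, which is the `data` parameter, so it travels as a raw body with no JSON content type. One way to see the difference without contacting the deployed function is to build the requests with `requests.Request(...).prepare()` and inspect them. The URL below is a placeholder and nothing is sent.

```python
import json

import requests

# Placeholder URL: prepare() only builds the request, it never opens a connection
url = "https://example.com/function-1"

# What the notebook sends: a raw string body with no Content-Type header
raw = requests.Request("POST", url, data="012937").prepare()

# What request.get_json() on the server expects: a JSON body and JSON header
as_json = requests.Request("POST", url, json={"str_number": "012937"}).prepare()

print(repr(raw.body), raw.headers.get("Content-Type"))
print(as_json.body, as_json.headers.get("Content-Type"))
```

A handler that calls `request.get_json()` therefore does not receive the string sent by `requests.post(url, "012937")`; reading the raw body on the server, or posting with `json=` on the client, are the two consistent pairings.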

# Project 2.) Automated Webscraping

### Description : Find a website that is scrapable with Beautiful Soup and updates with some frequency. Build a cloud function to programmatically scrape the useful content

In [5]:
from google.cloud import storage
import os
from io import StringIO
from bs4 import BeautifulSoup
import requests
import pandas as pd
from datetime import date

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Credential.json"

In [6]:
client = storage.Client()
bucket_name = "econ446project2"
bucket = client.get_bucket(bucket_name)

df = pd.DataFrame()
df.to_csv("local_house.csv")

# File Path on Cloud
blob = bucket.blob("webscrape/house.csv")

# Local File path to post to the cloud
blob.upload_from_filename("local_house.csv")
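Writing an empty `pd.DataFrame()` with the default `index=True` produces a CSV whose header is a single unnamed index field, which `pd.read_csv` later reads back as a column named `Unnamed: 0` (the column dropped in 2.b). A possible alternative, sketched below, is to seed the bucket with a header-only CSV built in memory. The column list is an assumption based on the scraper's output, and the resulting string could be passed to `blob.upload_from_string` instead of writing `local_house.csv`.

```python
from io import StringIO

import pandas as pd

# Assumed column layout: the scraper's columns with Date stored as a regular column
COLUMNS = ["Date", "Low Price", "High Price", "Zipcode"]


def header_only_csv(columns=COLUMNS) -> str:
    """Return CSV text containing just the header row (no index column)."""
    return pd.DataFrame(columns=columns).to_csv(index=False)


seed = header_only_csv()
# Reading it back yields an empty frame with the expected columns and no "Unnamed: 0"
restored = pd.read_csv(StringIO(seed))
print(list(restored.columns), len(restored))
```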

In [29]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import date

def get_prices_location():
    headers = {
        'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9 "}
    URL = "https://www.apartmentfinder.com/California/Los-Angeles-Apartments"
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    prices = soup.find_all("span", attrs={"class": "altRentDisplay layout-hidden-xs"})
    locations = soup.find_all("address", attrs={"class": "flex-12 address ellipses"})
    clean_p = [p.text for p in prices]
    address = [l.text for l in locations]

    low_prices = []
    high_prices = []

    for item in clean_p:
        clean_string = item.strip()
        clean_string = clean_string.replace("\r\n", "").replace(",", "").replace("$", "").replace(" ", "")
        # Split a range such as "2950-5100" into its low and high parts
        # (named `parts` so it does not shadow the scraped `prices` list above)
        parts = clean_string.split("-")

        try:
            low_price = int(parts[0].strip())
        except ValueError:
            low_price = "not sure"

        if len(parts) > 1:
            try:
                high_price = int(parts[1].strip())
            except ValueError:
                high_price = "not sure"
        else:
            high_price = "not sure"

        low_prices.append(low_price)
        high_prices.append(high_price)

    zipcodes = []

    for item in address:
        clean_string = item.strip()
        clean_loc = clean_string.replace("\r\n", "")
        zipcodes.append(clean_loc[-5:])

    df = pd.DataFrame({'Date': [date.today()] * len(clean_p),
                       'Low Price': low_prices,
                       'High Price': high_prices,
                       'Zipcode': zipcodes})

    df.set_index('Date', inplace=True)  # Set 'Date' as the index

    return df
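The price-cleaning loop is the most fragile part of the scraper because it depends on how the site formats each rent string. Isolating it in a small function makes it testable on sample strings. `parse_price_range` below is a hypothetical helper, the input formats in the example are assumptions about the listing text, and it uses `None` where the notebook stores `"not sure"`.

```python
def parse_price_range(text: str):
    """Hypothetical helper: turn a rent string like '$2,950 - $5,100' into (low, high).

    Mirrors the cleaning steps in get_prices_location(): strip whitespace,
    drop line breaks, '$', ',' and spaces, then split on '-'.
    """
    cleaned = text.strip().replace("\r\n", "").replace(",", "").replace("$", "").replace(" ", "")
    parts = cleaned.split("-")

    def to_int(s):
        # Unparseable pieces (e.g. "CallforRent") become None
        try:
            return int(s)
        except ValueError:
            return None

    low = to_int(parts[0])
    high = to_int(parts[1]) if len(parts) > 1 else None
    return low, high


print(parse_price_range("$2,950 - $5,100"))
```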


In [30]:
get_prices_location() 

| Date | Low Price | High Price | Zipcode |
|---|---|---|---|
| 2023-06-01 | 2950 | 5100 | 90017 |
| 2023-06-01 | 2574 | 5810 | 90017 |
| 2023-06-01 | 2640 | 5094 | 90010 |
| 2023-06-01 | 2950 | 5471 | 90017 |
| 2023-06-01 | 3100 | 3400 | 90046 |
| 2023-06-01 | 1636 | 6972 | 90004 |
| 2023-06-01 | 2247 | 25000 | 90017 |
| 2023-06-01 | 3217 | 13266 | 90048 |
| 2023-06-01 | 2877 | 6057 | 90045 |
| 2023-06-01 | 2225 | 5066 | 90028 |


## 2.b.) Query your stored files

In [9]:
# Download the CSV that is already stored in the bucket and return it as a DataFrame
def download_data():
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Credential.json"

    client = storage.Client()
    bucket_name = "econ446project2"
    bucket = client.get_bucket(bucket_name)

    blob = bucket.blob("webscrape/house.csv")

    csv_data = blob.download_as_text()

    df = pd.read_csv(StringIO(csv_data))

    return df

In [10]:
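# Append newly scraped rows to the stored CSV and upload it without saving a local file
def post_data(request):
    old_data = download_data()
    # reset_index() turns the Date index back into a column so to_csv(index=False) keeps it
    new_data = get_prices_location().reset_index()
    df = pd.concat([old_data, new_data])

    csv_data = df.to_csv(index=False)

    client = storage.Client()
    bucket_name = "econ446project2"
    bucket = client.get_bucket(bucket_name)
    # Define the blob inside the function; a deployed Cloud Function cannot see notebook globals
    blob = bucket.blob("webscrape/house.csv")

    blob.upload_from_string(csv_data)

    return {"status": 200,
            "length_data": len(df)}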

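`post_data` round-trips the stored file through CSV text. One detail worth checking is the index: `get_prices_location()` returns `Date` as the index, and `to_csv(index=False)` drops the index, so the date only survives if it is first turned back into a column. The sketch below reproduces the append step in memory with a hypothetical `append_rows` helper so it runs without Cloud Storage; the stored row comes from the scraped output above and the later-dated row is hypothetical.

```python
from datetime import date
from io import StringIO

import pandas as pd


def append_rows(old_csv: str, new_rows: pd.DataFrame) -> str:
    """Append rows indexed by Date to existing CSV text and return the combined CSV text."""
    old = pd.read_csv(StringIO(old_csv))
    # reset_index() moves Date from the index into a column so index=False keeps it
    combined = pd.concat([old, new_rows.reset_index()], ignore_index=True)
    return combined.to_csv(index=False)


# Stored file with one row taken from the scraped output above
stored = "Date,Low Price,High Price,Zipcode\n2023-06-01,2950,5100,90017\n"
# Hypothetical new scrape on a later date, indexed by Date like get_prices_location()
fresh = pd.DataFrame(
    {"Date": [date(2023, 6, 2)], "Low Price": [3100], "High Price": [3400], "Zipcode": ["90046"]}
).set_index("Date")

updated = pd.read_csv(StringIO(append_rows(stored, fresh)))
print(updated)
```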
In [11]:
post_data("")

{'status': 200, 'length_data': 25}

In [12]:
df = download_data()
df = df.drop("Unnamed: 0", axis=1)  # Drop the "Unnamed: 0" column
df = df.set_index("Date")  # Set the "Date" column as the index

df

| Date | Low Price | High Price | Zipcode |
|---|---|---|---|
| 2023-06-01 | 2640 | 5094 | 90010 |
| 2023-06-01 | 2950 | 5471 | 90017 |
| 2023-06-01 | 3100 | 3400 | 90046 |
| 2023-06-01 | 1636 | 6972 | 90004 |
| 2023-06-01 | 2247 | 25000 | 90017 |
| 2023-06-01 | 3217 | 13266 | 90048 |
| 2023-06-01 | 2877 | 6057 | 90045 |
| 2023-06-01 | 2225 | 5066 | 90028 |
| 2023-06-01 | 3294 | 4512 | 90064 |
| 2023-06-01 | 3075 | 8100 | 90036 |


## 2.c.) State how this could be useful in a business setting

Market Analysis: The data includes price ranges and zip codes, which can be used to analyze the real estate market in different areas. By examining the low and high prices in each zip code, businesses can gain insights into pricing trends, demand, and market competitiveness.

Pricing Strategy: Businesses can use this data to determine competitive pricing for their properties or rentals. By comparing their prices with the average or median prices in the corresponding zip codes, they can adjust their pricing strategy to attract potential customers and remain competitive.
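The per-zip-code comparison described above can be computed directly from the stored CSV with a `groupby`. This sketch assumes the column names produced by `get_prices_location()` and coerces the `"not sure"` placeholders to missing values so the median ignores them. `zip_price_summary` is a hypothetical helper, and the example rows are the first five rows of the scraped output shown earlier.

```python
import pandas as pd


def zip_price_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Median low/high rent and listing count per zip code (hypothetical helper)."""
    # "not sure" entries become NaN and are skipped by median()
    prices = df[["Low Price", "High Price"]].apply(pd.to_numeric, errors="coerce")
    prices["Zipcode"] = df["Zipcode"].astype(str)
    summary = prices.groupby("Zipcode").median()
    summary["Listings"] = prices.groupby("Zipcode").size()
    return summary


sample = pd.DataFrame({
    "Low Price": [2950, 2574, 2640, 2950, 3100],
    "High Price": [5100, 5810, 5094, 5471, 3400],
    "Zipcode": [90017, 90017, 90010, 90017, 90046],
})
print(zip_price_summary(sample))
```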

Forecasting and Trend Analysis: By analyzing historical data over time, businesses can identify pricing trends and make predictions about future market conditions. This can assist in decision-making related to property investments, sales projections, or lease renewals.
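Once the scheduled scrape has accumulated several dates, a simple starting point for trend analysis is the median asking rent per scrape date and its change from the previous date. `daily_median_low_price` is a hypothetical helper that assumes the `Date` and `Low Price` columns of the stored CSV; it is descriptive only and is not a forecasting model.

```python
import pandas as pd


def daily_median_low_price(df: pd.DataFrame) -> pd.DataFrame:
    """Median 'Low Price' per scrape date and its percent change from the previous date."""
    # Non-numeric placeholders such as "not sure" are treated as missing
    low = pd.to_numeric(df["Low Price"], errors="coerce")
    daily = low.groupby(df["Date"]).median().sort_index()
    return pd.DataFrame({
        "median_low_price": daily,
        "pct_change": daily.pct_change() * 100,
    })
```

Applied to the output of `download_data()`, each row of the result is one scrape date; the first date has no previous value, so its `pct_change` is missing.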

# Project 3.) Machine Learning Cloud Function

### Description : Build a machine learning model using scikit-learn and make it queryable through cloud functions

In [13]:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from scipy.special import softmax
import joblib
from google.cloud import storage
import os
from io import StringIO
from io import BytesIO
import pandas as pd
import requests

In [14]:
bean = pd.read_excel("Dry_Bean_Dataset.xlsx")
bean

|  | Area | Perimeter | MajorAxisLength | MinorAxisLength | AspectRation | Eccentricity | ConvexArea | EquivDiameter | Extent | Solidity | roundness | Compactness | ShapeFactor1 | ShapeFactor2 | ShapeFactor3 | ShapeFactor4 | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28395 | 610.291 | 208.178117 | 173.888747 | 1.197191 | 0.549812 | 28715 | 190.141097 | 0.763923 | 0.988856 | 0.958027 | 0.913358 | 0.007332 | 0.003147 | 0.834222 | 0.998724 | SEKER |
| 1 | 28734 | 638.018 | 200.524796 | 182.734419 | 1.097356 | 0.411785 | 29172 | 191.272750 | 0.783968 | 0.984986 | 0.887034 | 0.953861 | 0.006979 | 0.003564 | 0.909851 | 0.998430 | SEKER |
| 2 | 29380 | 624.110 | 212.826130 | 175.931143 | 1.209713 | 0.562727 | 29690 | 193.410904 | 0.778113 | 0.989559 | 0.947849 | 0.908774 | 0.007244 | 0.003048 | 0.825871 | 0.999066 | SEKER |
| 3 | 30008 | 645.884 | 210.557999 | 182.516516 | 1.153638 | 0.498616 | 30724 | 195.467062 | 0.782681 | 0.976696 | 0.903936 | 0.928329 | 0.007017 | 0.003215 | 0.861794 | 0.994199 | SEKER |
| 4 | 30140 | 620.134 | 201.847882 | 190.279279 | 1.060798 | 0.333680 | 30417 | 195.896503 | 0.773098 | 0.990893 | 0.984877 | 0.970516 | 0.006697 | 0.003665 | 0.941900 | 0.999166 | SEKER |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13606 | 42097 | 759.696 | 288.721612 | 185.944705 | 1.552728 | 0.765002 | 42508 | 231.515799 | 0.714574 | 0.990331 | 0.916603 | 0.801865 | 0.006858 | 0.001749 | 0.642988 | 0.998385 | DERMASON |
| 13607 | 42101 | 757.499 | 281.576392 | 190.713136 | 1.476439 | 0.735702 | 42494 | 231.526798 | 0.799943 | 0.990752 | 0.922015 | 0.822252 | 0.006688 | 0.001886 | 0.676099 | 0.998219 | DERMASON |
| 13608 | 42139 | 759.321 | 281.539928 | 191.187979 | 1.472582 | 0.734065 | 42569 | 231.631261 | 0.729932 | 0.989899 | 0.918424 | 0.822730 | 0.006681 | 0.001888 | 0.676884 | 0.996767 | DERMASON |
| 13609 | 42147 | 763.779 | 283.382636 | 190.275731 | 1.489326 | 0.741055 | 42667 | 231.653248 | 0.705389 | 0.987813 | 0.907906 | 0.817457 | 0.006724 | 0.001852 | 0.668237 | 0.995222 | DERMASON |


In [15]:
bean_select = bean[["Area","Perimeter","MajorAxisLength","Class"]]
bean_select

|  | Area | Perimeter | MajorAxisLength | Class |
|---|---|---|---|---|
| 0 | 28395 | 610.291 | 208.178117 | SEKER |
| 1 | 28734 | 638.018 | 200.524796 | SEKER |
| 2 | 29380 | 624.110 | 212.826130 | SEKER |
| 3 | 30008 | 645.884 | 210.557999 | SEKER |
| 4 | 30140 | 620.134 | 201.847882 | SEKER |
| ... | ... | ... | ... | ... |
| 13606 | 42097 | 759.696 | 288.721612 | DERMASON |
| 13607 | 42101 | 757.499 | 281.576392 | DERMASON |
| 13608 | 42139 | 759.321 | 281.539928 | DERMASON |
| 13609 | 42147 | 763.779 | 283.382636 | DERMASON |


In [16]:
X = bean_select[["Area","Perimeter","MajorAxisLength"]]
y = bean["Class"]

In [17]:
# Standardize features
scaler = StandardScaler()
X = scaler.fit_transform(X)


# Initialize a k-nearest neighbors classifier (k = 5)
clf = KNeighborsClassifier(n_neighbors=5)


clf = clf.fit(X,y)
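The model above is fit on every row, so there is no estimate of how well it classifies unseen beans, even though `train_test_split` and `accuracy_score` are imported. A minimal hold-out check is sketched below. It wraps the scaler and the classifier in a scikit-learn pipeline so the scaler is fit on the training split only. `holdout_accuracy` is a hypothetical helper, and the example runs on three iris features as a stand-in because `Dry_Bean_Dataset.xlsx` is not available here; with the notebook's data it would be called on the unscaled `bean_select[["Area","Perimeter","MajorAxisLength"]]` and `bean["Class"]`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def holdout_accuracy(X, y, n_neighbors=5, seed=42):
    """Fit scaler + KNN on 80% of the rows and report accuracy on the held-out 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # The pipeline fits StandardScaler on the training split only
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Stand-in data: the first three iris features, since the bean spreadsheet is not bundled
iris = load_iris()
acc = holdout_accuracy(iris.data[:, :3], iris.target)
print(f"Hold-out accuracy: {acc:.3f}")
```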

In [18]:
import joblib
joblib.dump(clf, "Neuron.sav") 

joblib.dump(scaler, "PrePro.sav")

['PrePro.sav']

In [19]:
# Store Model on Cloud
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Gcredentials.json"
client = storage.Client() 
bucket = client.get_bucket("webscraping_machine_learning")
blob = bucket.blob("bean/neural_net.sav")
blob.upload_from_filename("Neuron.sav")

bucket = client.get_bucket("webscraping_machine_learning")
blob = bucket.blob("bean/preprocess.sav")
blob.upload_from_filename("PrePro.sav")

In [20]:
# Create Function to access the Model from Cloud
def load_scikit_model(file_name):
    bucket_name = "webscraping_machine_learning"
    source_blob = "bean/" + file_name
    
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Gcredentials.json"
    client = storage.Client()
    
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob)
    
    model_data = blob.download_as_bytes()
    
    model = joblib.load(BytesIO(model_data))
    return(model)
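`load_scikit_model` relies on `joblib.load` accepting a file-like object, which lets the model be read from the downloaded bytes without a temporary file. The same works in the other direction with `joblib.dump` into a `BytesIO`, whose bytes could be uploaded with `blob.upload_from_string` rather than writing `Neuron.sav` and `PrePro.sav` to disk first. The round trip below is a sketch with hypothetical helper names, using a small `StandardScaler` in place of the real model and no Cloud Storage calls.

```python
from io import BytesIO

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler


def model_to_bytes(obj) -> bytes:
    """Hypothetical helper: serialize a fitted estimator with joblib into bytes."""
    buffer = BytesIO()
    joblib.dump(obj, buffer)
    return buffer.getvalue()


def model_from_bytes(data: bytes):
    """Hypothetical helper: the inverse, as load_scikit_model does after download_as_bytes()."""
    return joblib.load(BytesIO(data))


scaler = StandardScaler().fit(np.array([[1.0, 10.0], [3.0, 30.0]]))
payload = model_to_bytes(scaler)
restored = model_from_bytes(payload)
print(restored.mean_)
```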

In [21]:
model = load_scikit_model("neural_net.sav")
preproc = load_scikit_model("preprocess.sav")

In [22]:
# Set up for the cloud function
import google
import joblib
import pandas
import requests
import sklearn
from urllib.parse import parse_qs
from google.cloud import storage
import os
from io import StringIO
from joblib import load
from io import BytesIO

In [23]:
# Create a cloud function
def dry_bean(request):
    try:
        model = load_scikit_model("neural_net.sav")
        preproc = load_scikit_model("preprocess.sav")

        # The body is form-encoded, e.g. "Area=31209&Perimeter=663&MajorAxisLength=219"
        query_string = request.get_data().decode()
        print(query_string)  # logged for debugging

        # parse_qs maps each key to a list of strings; keep the first value as a float
        params = {k: float(v[0]) for k, v in parse_qs(query_string).items()}

        # Same feature order as training; read from params instead of writing globals()
        X = preproc.transform([[params["Area"], params["Perimeter"], params["MajorAxisLength"]]])

        prediction = model.predict(X)[0]

        return {"status": 200,
                "prediction": prediction}
    except (KeyError, ValueError) as err:
        # Missing or non-numeric inputs: X may not exist yet, so report the error message
        return {"status": 400,
                "error": str(err)}

In [24]:
import session_info
session_info.show()

In [25]:
url = "https://us-central1-my-project-371104.cloudfunctions.net/Dry_Beans"
r = requests.post(url, { "Area" : 31209,
    "Perimeter" : 663,
    "MajorAxisLength" : 219})
r.text

'{"prediction":"SEKER","status":200}\n'
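Passing a dict as the second argument of `requests.post` sends it form-encoded (`Area=31209&Perimeter=663&MajorAxisLength=219`), which is why `dry_bean` parses the body with `parse_qs`. The sketch below isolates that parsing step in a hypothetical `parse_bean_features` helper so it can be checked without the model or the deployed URL. It returns the features in the order the scaler was fit on and converts with `float` so decimal inputs are accepted; a missing or blank field raises `KeyError`.

```python
from urllib.parse import parse_qs

# Feature order used when the scaler and classifier were fit
FEATURES = ["Area", "Perimeter", "MajorAxisLength"]


def parse_bean_features(body: str) -> list:
    """Hypothetical helper: turn a form-encoded body into an ordered feature row."""
    # parse_qs maps each key to a list of strings and drops blank values by default
    params = parse_qs(body)
    # KeyError if a field is missing or blank, ValueError if it is not numeric
    return [float(params[name][0]) for name in FEATURES]


row = parse_bean_features("Area=31209&Perimeter=663&MajorAxisLength=219")
print(row)  # [31209.0, 663.0, 219.0]
```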

## 3.b.) Make a user-friendly input page that takes the inputs to your ML model and displays the output. Post it to a shareable webpage. Link below

In [26]:
import ipywidgets as widgets
from IPython.display import display

In [27]:
text_area = widgets.Text(
    value = "",
    placeholder = "Type Bean Area",
    description = "Area",
    disabled = False)

text_perimeter = widgets.Text(
    value = "",
    placeholder = "Type Bean Perimeter",
    description = "Perimeter",
    disabled = False)

text_mal = widgets.Text(
    value = "",
    placeholder = "Type Bean MajorAxisLength",
    description = "MAL",
    disabled = False)

button = widgets.Button(description = "Click Here!")

def my_function(button):
    url = "https://us-central1-my-project-371104.cloudfunctions.net/Dry_Beans"
    r = requests.post(url, { "Area" : text_area.value,
    "Perimeter" : text_perimeter.value,
    "MajorAxisLength" : text_mal.value})
    
    print("Prediction of Bean Type: " , r.json()["prediction"])

button.on_click(my_function)
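The `widgets.Text` boxes return strings, and when the cloud function cannot parse them the response carries no `prediction`, so the callback fails on `r.json()["prediction"]`. Checking the inputs in the notebook before posting gives a clearer message. `check_inputs` below is a hypothetical helper: it confirms each value parses as a number and returns the stripped strings, leaving the numeric conversion to the cloud function. `widgets.FloatText` is another option that restricts input to numbers in the widget itself.

```python
def check_inputs(area: str, perimeter: str, major_axis_length: str) -> dict:
    """Hypothetical helper: validate widget text before posting to the cloud function."""
    fields = {"Area": area, "Perimeter": perimeter, "MajorAxisLength": major_axis_length}
    payload = {}
    for name, text in fields.items():
        cleaned = text.strip()
        try:
            float(cleaned)  # validation only; the cloud function does the conversion
        except ValueError:
            raise ValueError(f"{name} must be a number, got {text!r}") from None
        payload[name] = cleaned
    return payload


print(check_inputs("31209", " 663 ", "219"))
```

In `my_function`, the three `.value` strings could be passed through `check_inputs` inside a `try`/`except ValueError` that prints the message and returns early, and the returned dict posted in place of the literal dict.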

In [28]:
display(text_area)
display(text_perimeter)
display(text_mal)
display(button)


## 3.c.) Think of a company that would use the ML app you just built. Which employees could use this app, and what would they use it for? Write a short paragraph.

The ML app I just built predicts the type of a dry bean. In my opinion, employees at a bean processing factory could use it to predict the type of the beans they have just collected from farmers. After classifying the beans, they can process each type with a different processing method. Finally, they would distribute the processed beans to retail markets for sale.