<a href="https://colab.research.google.com/github/MarMarhoun/freelance_work/blob/main/side_projects/NLP_projs/eda_streamlit/out_det2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Outlier detection for an online price prediction dataset using Streamlit and TensorFlow

To extend outlier detection for an online price prediction dataset with Streamlit and TensorFlow, you can build a small, user-friendly app that lets users upload their dataset, preprocess the data, and visualize the outliers. Here's a code example that demonstrates this:

First, install the necessary libraries:

In [None]:
!pip install streamlit tensorflow numpy pandas scipy matplotlib

Create a Python script (e.g., app.py) with the following code:


In [None]:
import streamlit as st
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import zscore

# Let TensorFlow allocate GPU memory on demand (TF2 has no sessions to initialize).
# Streamlit re-runs the whole script on every interaction, so ignore the
# RuntimeError raised once the GPUs have already been initialized.
for gpu in tf.config.list_physical_devices('GPU'):
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError:
        pass

st.title("Outlier Detection for Online Price Prediction Dataset")

# Load dataset
uploaded_file = st.file_uploader("Upload your dataset (CSV format)", type="csv")
if uploaded_file:
    df = pd.read_csv(uploaded_file)

    # Preprocess data: z-score normalize every numerical column (NaNs are skipped)
    numerical_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    df[numerical_cols] = df[numerical_cols].apply(lambda s: zscore(s, nan_policy='omit'))

    # Visualize outliers: one box plot per numerical column
    if st.button("Visualize Outliers"):
        st.write("## Outlier Visualization")
        st.write("The following plots show the outliers in the dataset.")
        for col in numerical_cols:
            st.subheader(col)
            fig, ax = plt.subplots()
            df[col].plot(kind='box', ax=ax)  # draw on a fresh figure per column
            st.pyplot(fig)
            plt.close(fig)

Run the app using the following command:


In [None]:
!streamlit run app.py

This code example demonstrates how to create an interactive Streamlit app that allows users to upload their dataset (CSV format), preprocess the data (z-score normalization), and visualize the outliers in each numerical column.
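Z-score normalization on its own only rescales each column; the box plots then show the outliers visually. To flag them programmatically, a common rule marks values whose absolute z-score exceeds a cutoff. The sketch below assumes a cutoff of 3 (a widespread convention, not a value the app defines) and uses a hypothetical helper, `flag_zscore_outliers`; the population standard deviation (`ddof=0`) matches the default of `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd


def flag_zscore_outliers(df: pd.DataFrame, cutoff: float = 3.0) -> pd.DataFrame:
    """Return a boolean DataFrame marking numeric values with |z| > cutoff.

    Hypothetical helper: z = (x - mean) / std, using the population standard
    deviation (ddof=0) to match scipy.stats.zscore's default.
    """
    numeric = df.select_dtypes(include=[np.number])
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    return z.abs() > cutoff


# Illustrative data: one extreme price among otherwise identical values
prices = pd.DataFrame({"price": [10.0] * 19 + [200.0], "category": ["a"] * 20})
mask = flag_zscore_outliers(prices)
print(prices[mask.any(axis=1)])  # rows with at least one flagged value
```

One caveat on small samples: with `ddof=0` the largest attainable |z| is √(n−1), so a cutoff of 3 can never fire on a column with 10 or fewer rows.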

To further enhance the app, you can add more preprocessing options, use different outlier detection techniques, and improve data visualization. Additionally, you can integrate this app with your TensorFlow price prediction model to analyze the impact of outliers on your model's performance.
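As one example of a different technique, the interquartile-range (IQR) rule, also known as Tukey's fences, flags values below Q1 − k·IQR or above Q3 + k·IQR. Quartiles are not dragged along by extreme values the way the mean and standard deviation are, so the rule is more robust than a z-score cutoff. With k = 1.5 the fences coincide with matplotlib's default box-plot whiskers (`whis=1.5`), so it should flag the same points the app's box plots draw as fliers; z-scoring the columns first does not change that, because it is a linear rescaling. The helper name `iqr_outlier_mask` is hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Illustrative data with a single extreme value
prices = pd.Series([12, 13, 13, 14, 15, 15, 16, 90])
print(prices[iqr_outlier_mask(prices)])
```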

Based on the article you provided, I will add some features to the article's Streamlit app for deploying a deep learning classifier (a dog breed classifier): validation of the uploaded file, image preprocessing, and more interactive visualization.

Create a new file named enhanced_app.py and add the following code:

In [None]:
import os
import pickle

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st
from PIL import Image
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Paths to the saved models, class names and upload folder
current_path = os.path.abspath(os.getcwd())
model_path = os.path.join(current_path, 'static', 'dogbreed.h5')
feature_extractor_path = os.path.join(current_path, 'static', 'feature_extractor.h5')
class_names_path = os.path.join(current_path, 'static', 'dog_breeds_category.pickle')
images_dir = os.path.join(current_path, 'static', 'images')

@st.cache_resource  # load the models once instead of on every Streamlit rerun
def load_artifacts():
    model = load_model(model_path)
    feature_extractor = load_model(feature_extractor_path)
    with open(class_names_path, 'rb') as class_file:
        class_names = pickle.load(class_file)
    return model, feature_extractor, class_names

model, feature_extractor, class_names = load_artifacts()

# Predict the top-5 dog breeds for an image file
def predict_breed(image_path):
    image = load_img(image_path, target_size=(331, 331))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)  # add the batch dimension
    features = feature_extractor.predict(image)
    prediction = model.predict(features) * 100  # probabilities -> percent
    prediction = pd.DataFrame(np.round(prediction, 1), columns=class_names).transpose()
    prediction.columns = ['values']
    prediction = prediction.nlargest(5, 'values').reset_index()
    prediction.columns = ['name', 'values']
    return prediction

# Streamlit app
st.title("Dog Breed Classifier")
uploaded_file = st.file_uploader("Upload Image", type=["jpg", "jpeg"])

if uploaded_file is not None:
    os.makedirs(images_dir, exist_ok=True)  # create static/images if needed

    save_path = os.path.join(images_dir, uploaded_file.name)
    with open(save_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    try:
        with st.spinner("Classifying..."):
            result = predict_breed(save_path)
        st.success("Done!")

        st.subheader("Prediction")
        st.write(result)

        # Display the uploaded image
        st.write("## Uploaded Image")
        st.image(Image.open(uploaded_file))

        # Plot the top predictions
        st.subheader("Top Predictions")
        fig, ax = plt.subplots()
        sns.barplot(y="name", x="values", data=result,
                    order=result.sort_values('values', ascending=False)['name'], ax=ax)
        ax.set(xlabel='Confidence %', ylabel='Breed')
        st.pyplot(fig)
    finally:
        # Remove the uploaded image, even if the prediction failed
        os.remove(save_path)
else:
    st.write("Please upload an image to classify.")

Run the Streamlit app:


In [None]:
!streamlit run enhanced_app.py

This updated app includes the following enhancements:

- Validation of the uploaded file type
- Creation of an images directory if it does not already exist
- Saving the uploaded image to the images directory before processing
- Displaying the uploaded image along with the prediction result
- Plotting the top predictions using seaborn's barplot
- Removing the uploaded image from the images directory after processing
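Streamlit's `type=` argument on `st.file_uploader` only filters uploads by file extension, so a renamed file with the wrong content still gets through. As an extra check, the sketch below (a hypothetical `is_valid_image` helper, not part of the app above) also compares the file's first bytes against the JPEG (`FF D8 FF`) and PNG signatures. Inside the app it could be called as `is_valid_image(uploaded_file.name, uploaded_file.getvalue())` before the file is saved.

```python
import os

# Leading "magic" bytes for the image formats the classifier accepts
MAGIC_BYTES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
}


def is_valid_image(filename: str, data: bytes) -> bool:
    """Accept a file only if its extension is allowed and its content
    starts with the signature that extension implies."""
    ext = os.path.splitext(filename)[1].lower()
    signature = MAGIC_BYTES.get(ext)
    return signature is not None and data.startswith(signature)
```

Checking the signature is cheap and catches mislabeled files early, but it is not a full format validation; a truncated JPEG would still pass and fail later in `load_img`.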

To extend the classifier further, we can save the uploaded image before processing, let the user customize the outlier threshold (the minimum confidence, in percent, that the top prediction must reach), and add an input for choosing how many top predictions ('n') to display. Here's the updated code:

Create a directory tree as described in the article, and create a helper.py file to include the predictor function.
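The article's directory tree is not reproduced here. Judging only from the paths the code loads, the app needs at least a `static/` folder containing `feature_extractor.h5`, `dogbreed.h5` and `dog_breeds_category.pickle`, with uploads written to `static/images/`. A small startup check such as the hypothetical `missing_artifacts` helper below can report missing files up front instead of failing inside `load_model`.

```python
import os

# Files the helper code loads, relative to the working directory
REQUIRED_FILES = [
    os.path.join("static", "feature_extractor.h5"),
    os.path.join("static", "dogbreed.h5"),
    os.path.join("static", "dog_breeds_category.pickle"),
]


def missing_artifacts(base_dir: str = ".") -> list:
    """Return the required files that do not exist under base_dir."""
    return [p for p in REQUIRED_FILES if not os.path.isfile(os.path.join(base_dir, p))]
```

In main.py, a guard such as `if missing_artifacts(): st.error(...)` followed by `st.stop()` would halt the app with a readable message.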

Add the following code to the helper.py file:

In [None]:
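import os
import pickle
from functools import lru_cache

import numpy as np
import pandas as pd
from tensorflow.keras import models
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Portable paths (no Windows-only backslashes)
STATIC_DIR = os.path.join(os.getcwd(), 'static')
IMAGES_DIR = os.path.join(STATIC_DIR, 'images')

@lru_cache(maxsize=1)  # load the models once per process, not on every prediction
def load_models():
    feature_extractor = models.load_model(os.path.join(STATIC_DIR, 'feature_extractor.h5'))
    predictor_model = models.load_model(os.path.join(STATIC_DIR, 'dogbreed.h5'))
    with open(os.path.join(STATIC_DIR, 'dog_breeds_category.pickle'), 'rb') as handle:
        dog_breeds = pickle.load(handle)
    return feature_extractor, predictor_model, dog_breeds

def save_uploaded_file(uploaded_file):
    # Called by main.py: write the Streamlit upload to static/images, report success
    try:
        os.makedirs(IMAGES_DIR, exist_ok=True)
        with open(os.path.join(IMAGES_DIR, uploaded_file.name), 'wb') as f:
            f.write(uploaded_file.getbuffer())
        return True
    except OSError:
        return False

def predictor(img_path, threshold=3.0, n_top=5):
    """Return (1, DataFrame with 'name'/'values' columns for the n_top breeds),
    or (0, message) when the top confidence (in %) is below `threshold`."""
    feature_extractor, predictor_model, dog_breeds = load_models()

    img = load_img(img_path, target_size=(331, 331))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)  # add the batch dimension

    features = feature_extractor.predict(img)
    prediction = predictor_model.predict(features) * 100  # probabilities -> percent
    prediction_df = pd.DataFrame(np.round(prediction, 1), columns=dog_breeds).transpose()
    prediction_df.columns = ['values']
    prediction_df = prediction_df.nlargest(int(n_top), 'values')
    # Move the breed labels into a 'name' column, replacing '.' with '-'
    prediction_df.index = [str(x).replace('.', '-') for x in prediction_df.index]
    prediction_df = prediction_df.rename_axis('name').reset_index()

    if prediction_df.loc[0, 'values'] < threshold:
        return 0, 'No outlier detected'

    return 1, prediction_df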

Create the main.py file and add the following code:


In [None]:
import os

import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st
from PIL import Image

# predictor() and save_uploaded_file() are expected to live in helper.py
from helper import predictor, save_uploaded_file

st.title('Dog Breed Classifier')

uploaded_file = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    if save_uploaded_file(uploaded_file):
        # Display the image
        display_image = Image.open(uploaded_file)
        st.image(display_image)

        # Minimum top-1 confidence (%) and how many breeds to show
        threshold = st.slider('Outlier Threshold:', 1.0, 5.0, 3.0, 0.1)
        n_top = st.number_input('Number of Top Predictions:', 1, 10, 5)

        image_path = os.path.join('static', 'images', uploaded_file.name)
        prediction, prediction_df = predictor(image_path, threshold, n_top)

        if prediction == 0:
            st.text('No outlier detected')
        else:
            st.text('Predictions:')
            fig, ax = plt.subplots()
            sns.barplot(y='name', x='values', data=prediction_df,
                        order=prediction_df.sort_values('values', ascending=False)['name'], ax=ax)
            ax.set(xlabel='Confidence %', ylabel='Breed')
            st.pyplot(fig)
    else:
        st.error('Could not save the uploaded file.')

This updated code adds the threshold and n_top parameters, so the user can customize the outlier threshold and view the top 'n' predictions.
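To make the threshold's effect concrete, the sketch below reproduces the post-processing inside `predictor()` on a plain probability vector, without TensorFlow; the function `top_n_with_threshold`, the breed names and the probabilities are illustrative only. Note that `threshold` is compared with the top prediction's confidence in percent, so the slider's 1.0 to 5.0 range only rejects images for which even the best breed scores below 5%.

```python
import numpy as np
import pandas as pd


def top_n_with_threshold(probs, class_names, threshold=3.0, n_top=5):
    """Mirror predictor()'s post-processing on one probability vector.

    Returns (1, DataFrame with 'name'/'values' columns, highest first), or
    (0, message) when the top confidence in percent is below `threshold`.
    """
    percents = np.round(np.asarray(probs, dtype=float) * 100, 1)
    df = pd.DataFrame({"name": class_names, "values": percents})
    df = df.nlargest(int(n_top), "values").reset_index(drop=True)
    if df.loc[0, "values"] < threshold:
        return 0, "No outlier detected"
    return 1, df


# Illustrative probabilities for three hypothetical classes
flag, table = top_n_with_threshold([0.7, 0.2, 0.1], ["beagle", "pug", "boxer"], n_top=2)
print(flag)
print(table)
```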

To add advanced features to the existing code for stroke risk prediction, we can implement the following improvements:

- Allow users to input all parameters present in the dataset (age, gender, BMI, average glucose level, hypertension, heart disease).
- Save the model's predictions for user data.
- Display a summary of the user data provided.
Here's the updated code:

In [None]:
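import streamlit as st
import pandas as pd
import joblib

# Load your model once (replace 'your_model.joblib' with your actual file path)
@st.cache_resource
def load_stroke_model(path='your_model.joblib'):
    return joblib.load(path)

loaded_model = load_stroke_model()

# Set title of the Streamlit app
st.title("Stroke Risk Prediction")

# Add user input section
st.header("User Input")
user_age = st.slider("Age", min_value=0, max_value=100, value=30)
user_gender = st.radio("Gender", ["Male", "Female"])
user_bmi = st.number_input("BMI", min_value=10.0, max_value=50.0, value=25.0)
user_avg_glucose_level = st.number_input("Average Glucose Level", min_value=0.0, value=80.0)
user_hypertension = st.checkbox("Hypertension")
user_heart_disease = st.checkbox("Heart Disease")
user_submit = st.button("Predict")

# Prepare user data for prediction
if user_submit:
    # Create a one-row DataFrame from the user input
    # (column order must match the features the model was trained on)
    user_data_dict = {
        "age": user_age,
        "gender": user_gender,
        "bmi": user_bmi,
        "avg_glucose_level": user_avg_glucose_level,
        "hypertension": user_hypertension,
        "heart_disease": user_heart_disease
    }
    user_data = pd.DataFrame([user_data_dict])

    # Preprocess: encode categoricals with a fixed mapping. Casting a one-row
    # DataFrame with astype('category') would give every value the code 0.
    # Assumption: the model was trained with Female=0, Male=1 and 0/1 flags.
    model_input = user_data.copy()
    model_input['gender'] = model_input['gender'].map({'Female': 0, 'Male': 1})
    model_input[['hypertension', 'heart_disease']] = model_input[['hypertension', 'heart_disease']].astype(int)

    # Make prediction
    prediction = loaded_model.predict(model_input)
    prediction_text = "Low risk" if prediction[0] == 0 else "High risk"

    # Display summary of the user data provided (as entered, before encoding)
    st.subheader("User Data Summary")
    st.write(user_data)

    # Display prediction
    st.subheader("Stroke Risk Prediction")
    st.success(prediction_text)

    # Save each prediction in the session so earlier results stay available
    if 'history' not in st.session_state:
        st.session_state['history'] = []
    st.session_state['history'].append({**user_data_dict, 'prediction': prediction_text})
    history = pd.DataFrame(st.session_state['history'])
    st.subheader("Saved Predictions")
    st.dataframe(history)
    st.download_button("Download predictions (CSV)", data=history.to_csv(index=False),
                       file_name="predictions.csv", mime="text/csv")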

This updated code includes input fields for all relevant parameters, saves the model's predictions for user data, and displays a summary of the user data provided. Make sure to replace your_model.joblib with the actual file path of your trained model.
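A pitfall to avoid when encoding this kind of input: casting a single-row DataFrame with `astype('category')` infers the categories from that one row, so `.cat.codes` returns 0 for every column no matter what the user selected. The sketch below demonstrates the problem and a fixed mapping; the encoding `Female=0, Male=1` with 0/1 flags is an assumption and must be replaced with whatever encoding the model was actually trained on.

```python
import pandas as pd

# One row of user input, shaped like the stroke app's data
row = pd.DataFrame([{"gender": "Male", "hypertension": True, "heart_disease": False}])

# Pitfall: categories are inferred from this single row only,
# so every column gets code 0 regardless of the user's choice.
naive = row.astype("category").apply(lambda s: s.cat.codes)
print(naive.iloc[0].tolist())  # [0, 0, 0]

# Fixed encoding. Assumption: training used Female=0, Male=1 and 0/1 flags.
GENDER_CODES = {"Female": 0, "Male": 1}
fixed = pd.DataFrame({
    "gender": row["gender"].map(GENDER_CODES),
    "hypertension": row["hypertension"].astype(int),
    "heart_disease": row["heart_disease"].astype(int),
})
print(fixed.iloc[0].tolist())  # [1, 1, 0]
```

Keeping the mapping in one explicit dictionary also makes it easy to reuse the exact same encoding in the training script.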

