# Streamlit for Sales Predictions - Part A
**Student**: Matthew Malueg

**Tasks**
- In a notebook, prepare the dataset and save it as a new .csv file for the app
    - Correct items in Item_Fat_Content column
    - Drop Item_Identifier, Outlet_Identifier, Outlet_Establishment_Year
- In a .py file, the app should include
    - Title
    - Markdown header for each section
        - Interactive Pandas dataframe of prepared dataset
        - Button to trigger display of dataframe of Descriptive Statistics
        - Button to trigger display of Summary Information (output of .info)
        - Button to trigger display of Null Values

In [2]:
# Imports
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

In [4]:
# Load in the file
fpath = "Data/sales_predictions_2023.csv"
df = pd.read_csv(fpath)

In [5]:
# Drop unneeded columns
cols_to_drop = ['Item_Identifier', 'Outlet_Identifier', 'Outlet_Establishment_Year']
df = df.drop(columns=cols_to_drop)
df.head()

|   | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Size | Outlet_Location_Type | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.3 | Low Fat | 0.016047 | Dairy | 249.8092 | Medium | Tier 1 | Supermarket Type1 | 3735.138 |
| 1 | 5.92 | Regular | 0.019278 | Soft Drinks | 48.2692 | Medium | Tier 3 | Supermarket Type2 | 443.4228 |
| 2 | 17.5 | Low Fat | 0.01676 | Meat | 141.618 | Medium | Tier 1 | Supermarket Type1 | 2097.27 |
| 3 | 19.2 | Regular | 0.0 | Fruits and Vegetables | 182.095 | NaN | Tier 3 | Grocery Store | 732.38 |
| 4 | 8.93 | Low Fat | 0.0 | Household | 53.8614 | High | Tier 3 | Supermarket Type1 | 994.7052 |


In [6]:
# Address inconsistencies
df['Item_Fat_Content'].value_counts()

Low Fat    5089
Regular    2889
LF          316
reg         117
low fat     112
Name: Item_Fat_Content, dtype: int64

In [7]:
# Remap values so they are uniform
mapping_dict = {'LF': 'Low Fat', 'low fat': 'Low Fat', 'reg': 'Regular'}
df['Item_Fat_Content'] = df['Item_Fat_Content'].replace(mapping_dict)
df['Item_Fat_Content'].value_counts()

Low Fat    5517
Regular    3006
Name: Item_Fat_Content, dtype: int64
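
The remapping above lists each inconsistent label by hand, so a label that was missed would pass through unchanged. As a minimal sketch on made-up toy data (not the project dataset), the same `replace` call can be paired with a check that only the expected labels remain. The helper name `normalize_fat_content` and the variable `unexpected` are hypothetical and only serve the illustration.

```python
import pandas as pd


def normalize_fat_content(series: pd.Series) -> pd.Series:
    """Map the known label variants onto 'Low Fat' and 'Regular'."""
    mapping = {'LF': 'Low Fat', 'low fat': 'Low Fat', 'reg': 'Regular'}
    return series.replace(mapping)


# Toy values that mimic the variants seen in value_counts() above
toy = pd.Series(['Low Fat', 'LF', 'reg', 'low fat', 'Regular'])
cleaned = normalize_fat_content(toy)

# Any label outside the two expected categories would show up here
unexpected = set(cleaned.unique()) - {'Low Fat', 'Regular'}
print(cleaned.value_counts())
print("Unexpected labels:", unexpected)
```

An empty `unexpected` set suggests the mapping covered every variant present in the column.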

In [9]:
# Save processed dataset as new .csv
# df.to_csv('Data/sales_2023_cleaned.csv', index=False)
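
The `index=False` argument matters for the app: without it, the row index is written as an extra unnamed column that reappears as `Unnamed: 0` when the file is read back. The sketch below shows the difference on a small made-up frame, writing to in-memory `StringIO` buffers instead of real files so nothing on disk is touched. The toy values and variable names are assumptions for illustration.

```python
from io import StringIO

import pandas as pd

# Toy frame standing in for the cleaned dataset
toy = pd.DataFrame({
    'Item_Fat_Content': ['Low Fat', 'Regular'],
    'Item_MRP': [249.8092, 48.2692],
})

# Default behaviour: the index is written as a leading unnamed column
with_index = StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```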

In [11]:
"""

### Streamlit_Project_1_Core_A.py for Wk3 of Advanced Machine Learning
# Create a simple app to demonstrate the use of Streamlit features

# Imports
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

# Function for loading data
# Adding data caching
@st.cache_data
def load_data():
    fpath = "Data/sales_2023_cleaned.csv"
    df = pd.read_csv(fpath)
    return df

# load the data 
df = load_data()

##################################

# Add title
st.title("Sales Price Analysis")

# Display an interactive dataframe
st.header("Product Sales Data")
st.dataframe(df, width=800)

# Display Descriptive Statistics button
st.markdown('#### Descriptive Statistics')
if st.button('Show Descriptive Statistics'):
    st.dataframe(df.describe().round(2))

## Display Summary Information button
# Create a string buffer to capture content and write the info into the buffer
buffer = StringIO()
df.info(buf=buffer)
summary_info = buffer.getvalue()
st.markdown("#### Summary Info")
if st.button('Show Summary Info'):
    st.text(summary_info)

## Display Null Values button
st.markdown("#### Null Values")
if st.button('Show Null Values'):
    nulls = df.isna().sum()
    st.dataframe(nulls)

""";
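
The Summary Info section of the app relies on a buffer because `df.info()` prints its report and returns `None`, so there is nothing to pass to `st.text` directly. The sketch below isolates that step outside Streamlit so it can be tried in a notebook cell. The helper name `info_as_text` and the toy frame are hypothetical.

```python
from io import StringIO

import pandas as pd


def info_as_text(df: pd.DataFrame) -> str:
    """Capture the printed output of df.info() as a string."""
    buffer = StringIO()
    df.info(buf=buffer)  # writes into the buffer instead of stdout
    return buffer.getvalue()


# Toy frame with one missing value so the non-null counts differ
toy = pd.DataFrame({
    'Item_Weight': [9.3, None],
    'Item_Type': ['Dairy', 'Meat'],
})

text = info_as_text(toy)
print(text)
```

In the app, the returned string is what `st.text(summary_info)` displays when the button is clicked.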