## Build a Data Quality Dashboard

**Description**: Create a simple dashboard that displays data quality metrics using a library like `dash` or `streamlit`.

**Steps:**
1. Install Streamlit: `pip install streamlit`
2. Create a Python script named `dashboard.py`.
3. Run the dashboard: `streamlit run dashboard.py`
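
Step 2 asks for a standalone `dashboard.py`, since Streamlit apps are meant to be launched with `streamlit run` rather than executed inside a notebook kernel. One possible way to create the file from Python is sketched below. The names `DASHBOARD_SOURCE`, `write_dashboard`, and `streamlit_command` are hypothetical helpers introduced for illustration, and the script body is only a placeholder for the real dashboard code.

```python
import sys
from pathlib import Path

# Placeholder script body (an assumption); swap in the actual dashboard code.
DASHBOARD_SOURCE = '''\
import streamlit as st

st.title("Data Quality Dashboard")
st.write("Replace this placeholder with the dashboard code.")
'''


def write_dashboard(path="dashboard.py", source=DASHBOARD_SOURCE):
    """Write the dashboard source to disk and return the file path."""
    target = Path(path)
    target.write_text(source, encoding="utf-8")
    return target


def streamlit_command(script):
    """Build the launch command; `python -m streamlit run` mirrors `streamlit run`."""
    return [sys.executable, "-m", "streamlit", "run", str(script)]


script = write_dashboard()
# Print the command instead of running it, because the server blocks until stopped.
print(" ".join(streamlit_command(script)))
```

Printing the command keeps the sketch side-effect free apart from creating the file; the printed line can then be pasted into a terminal to start the server.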

In [1]:
# Write your code from here
!pip3 install streamlit


Defaulting to user installation because normal site-packages is not writeable
Collecting streamlit
  Downloading streamlit-1.45.1-py3-none-any.whl (9.9 MB)
Collecting packaging<25,>=20
  Downloading packaging-24.2-py3-none-any.whl (65 kB)
Collecting watchdog<7,>=2.1.5
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)
Collecting pydeck<1,>=0.8.0b4
  Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Collecting pyarrow>=7.0
  Downloading pyarrow

In [2]:
import streamlit as st
import pandas as pd

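# Helper Functions
def load_data(file_path):
    """Read a CSV file and return (DataFrame, None) or (None, error message)."""
    try:
        df = pd.read_csv(file_path)
        return df, None
    except Exception as e:
        # Return the message so the UI can display it instead of crashing
        return None, str(e)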

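def calculate_dqi(df):
    """Return (percentage of non-missing cells, number of missing cells)."""
    total_values = df.size
    if total_values == 0:
        return 0.0, 0
    # Cast to a plain int so the count is not passed around as a NumPy scalar
    missing_values = int(df.isnull().sum().sum())
    dqi = (1 - (missing_values / total_values)) * 100
    return dqi, missing_values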

# Streamlit UI
st.title("📊 Data Quality Dashboard")

# File input
file_path = st.text_input("Enter CSV File Path:", "/workspaces/AI_DATA_ANALYSIS_/src/Module 8/Data Quality Scoring & Reporting/students.csv")

if file_path:
    df, error = load_data(file_path)
    
    if error:
        st.error(f"Error loading file: {error}")
    else:
        st.success("File loaded successfully.")
        st.write("### Preview of Data")
        st.dataframe(df.head())

        dqi, missing_values = calculate_dqi(df)

        # Display Metrics
        st.metric(label="Data Quality Index (DQI)", value=f"{dqi:.2f}%")
        st.metric(label="Missing Values", value=missing_values)

        # Column-wise missing info
        st.write("### Missing Values by Column")
        missing_per_column = df.isnull().sum()
        st.bar_chart(missing_per_column)


2025-05-25 06:42:28.824 
  command:

    streamlit run /home/vscode/.local/lib/python3.10/site-packages/ipykernel_launcher.py [ARGUMENTS]
2025-05-25 06:42:28.833 Session state does not function when running a script without `streamlit run`
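
The warnings above come from running Streamlit calls inside the notebook kernel. The scoring logic itself does not need Streamlit, so it can be checked separately. The sketch below repeats `calculate_dqi` from the cell above and applies it to a small made-up DataFrame; the `sample` data is an assumption for illustration and is not taken from `students.csv`.

```python
import numpy as np
import pandas as pd


def calculate_dqi(df):
    """Return (percentage of non-missing cells, number of missing cells)."""
    total_values = df.size
    if total_values == 0:
        return 0.0, 0
    missing_values = int(df.isnull().sum().sum())
    dqi = (1 - (missing_values / total_values)) * 100
    return dqi, missing_values


# Made-up sample: 4 rows x 2 columns = 8 cells, 2 of them missing.
sample = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dev"],
    "score": [91.0, np.nan, 78.0, 85.0],
})

dqi, missing = calculate_dqi(sample)
print(f"DQI: {dqi:.2f}%, missing: {missing}")  # DQI: 75.00%, missing: 2

# Per-column counts, the same Series the dashboard passes to st.bar_chart
print(sample.isnull().sum().to_dict())  # {'name': 1, 'score': 1}
```

With 2 of 8 cells missing, the index is (1 - 2/8) * 100 = 75%, which matches what the dashboard metric would show for this sample.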


In [None]:
!streamlit run dashboard.py



Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.0.1.24:8501
  External URL: http://20.192.21.52:8501
