## **Name: Aqilah Jihan Nabila**
## **Division: Junior Data Scientist**

## **Assignment Version 2**
### **Colab Link:** https://colab.research.google.com/drive/1nvKbg7U5MQZeuNfaGMcTzhFejRFCUcem?usp=sharing

## **Note: If the charts do not appear, view them through the Colab link above**

# **Process**

In [31]:
# !pip install -q plotly pandas numpy ipywidgets

In [32]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from ipywidgets import widgets, HBox, VBox, Output
from IPython.display import display

df = pd.read_csv('/content/Melbourne_housing_Clean.csv')
df.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,68 Studley St,2.0,h,870000.0,SS,Jellis,2016-09-03,2.5,3067.0,...,1.0,1.0,126.0,136.0,1970.0,Yarra City Council,-37.8014,144.9958,Northern Metropolitan,4019.0
1,Abbotsford,85 Turner St,2.0,h,1480000.0,S,Biggin,2016-12-03,2.5,3067.0,...,1.0,1.0,202.0,136.0,1970.0,Yarra City Council,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,2.0,h,1035000.0,S,Biggin,2016-02-04,2.5,3067.0,...,1.0,0.0,156.0,136.0,1970.0,Yarra City Council,-37.8079,144.9934,Northern Metropolitan,4019.0
3,Abbotsford,18/659 Victoria St,3.0,u,870000.0,VB,Rounds,2016-02-04,2.5,3067.0,...,2.0,1.0,0.0,136.0,1970.0,Yarra City Council,-37.8114,145.0116,Northern Metropolitan,4019.0
4,Abbotsford,5 Charles St,3.0,h,1465000.0,SP,Biggin,2017-03-04,2.5,3067.0,...,2.0,0.0,134.0,136.0,1970.0,Yarra City Council,-37.8093,144.9944,Northern Metropolitan,4019.0


In [33]:
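# Preprocessing

# Columns needed for the visualization
kolom = ['Price','Distance','Landsize','Rooms','Regionname','Suburb','Address','Date']
df = df[kolom].dropna().copy()  # copy so the assignments below do not write to a slice

# Coerce numeric columns; unparseable values become NaN and are dropped
for c in ['Price','Distance','Landsize','Rooms']:
    df[c] = pd.to_numeric(df[c], errors='coerce')
df = df.dropna(subset=['Price','Distance','Landsize','Rooms'])

# Helper columns: sqrt-scaled land size (0 treated as 1) and a Rooms category capped at '4+'
df['Landsize_vis'] = np.sqrt(df['Landsize'].replace(0, np.nan).fillna(1))
df['Rooms_cat'] = df['Rooms'].apply(lambda r: str(int(r)) if r < 4 else '4+')
df.head()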


Unnamed: 0,Price,Distance,Landsize,Rooms,Regionname,Suburb,Address,Date,Landsize_vis,Rooms_cat
0,870000.0,2.5,126.0,2.0,Northern Metropolitan,Abbotsford,68 Studley St,2016-09-03,11.224972,2
1,1480000.0,2.5,202.0,2.0,Northern Metropolitan,Abbotsford,85 Turner St,2016-12-03,14.21267,2
2,1035000.0,2.5,156.0,2.0,Northern Metropolitan,Abbotsford,25 Bloomburg St,2016-02-04,12.489996,2
3,870000.0,2.5,0.0,3.0,Northern Metropolitan,Abbotsford,18/659 Victoria St,2016-02-04,1.0,3
4,1465000.0,2.5,134.0,3.0,Northern Metropolitan,Abbotsford,5 Charles St,2017-03-04,11.575837,3
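
The two helper columns can be checked in isolation. The sketch below restates the same rules in plain Python; the function names `rooms_category` and `landsize_vis` are illustrative and do not appear in the notebook.

```python
import math

def rooms_category(rooms: float) -> str:
    """Bucket a room count the same way as the Rooms_cat column: '1', '2', '3' or '4+'."""
    return str(int(rooms)) if rooms < 4 else "4+"

def landsize_vis(landsize: float) -> float:
    """Square root of the land size, with 0 treated as 1 (as in Landsize_vis)."""
    return math.sqrt(landsize if landsize != 0 else 1)

print([rooms_category(r) for r in (2.0, 3.0, 4.0, 7.0)])  # ['2', '3', '4+', '4+']
print(round(landsize_vis(126.0), 6))  # 11.224972, matching the first output row above
print(landsize_vis(0.0))  # 1.0, matching the row with Landsize 0
```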


In [34]:
# --- 1. Install packages & enable the Colab widget manager ---
!pip install -q plotly pandas numpy ipywidgets==8.1.1
from google.colab import output
output.enable_custom_widget_manager()

# --- 2. Import libraries ---
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from ipywidgets import widgets, HBox, VBox, Output
from IPython.display import display
from google.colab import files
import io

# --- 3. Upload the dataset ---
print("Please upload the module 4 dataset file (e.g. Melbourne_housing_Clean.csv)")
uploaded = files.upload()
fname = list(uploaded.keys())[0]
df = pd.read_csv(io.BytesIO(uploaded[fname]))
print("Dataset loaded successfully:", df.shape)

# --- 4. Preprocessing & sampling so the plot is not overcrowded ---
kolom = ['Price','Distance','Landsize','Rooms','Regionname','Suburb','Address','Date']
df = df[kolom].dropna().copy()
for c in ['Price','Distance','Landsize','Rooms']:
    df[c] = pd.to_numeric(df[c], errors='coerce')
df = df.dropna(subset=['Price','Distance','Landsize','Rooms'])

# Take a random sample to keep the plot light (approximately preserves the distribution);
# the min() guards against datasets with fewer than 3000 rows
df = df.sample(min(3000, len(df)), random_state=42)

# Helper columns
df['Landsize_vis'] = np.sqrt(df['Landsize'].replace(0, np.nan).fillna(1))
df['Rooms_cat'] = df['Rooms'].apply(lambda r: str(int(r)) if r < 4 else '4+')

# --- 5. Interactive filter widgets ---
regions = sorted(df['Regionname'].unique())

region_select = widgets.SelectMultiple(
    options=regions,
    value=tuple(regions[:3]),  # start with the first three regions selected
    description='Region',
    rows=5,
    layout=widgets.Layout(width='260px')
)

price_min, price_max = int(df['Price'].min()), int(df['Price'].max())
price_slider = widgets.IntRangeSlider(
    value=[price_min, int(price_max*0.6)],
    min=price_min,
    max=price_max,
    step=10000,
    description='Price range',
    layout=widgets.Layout(width='600px')
)

fig_out = Output()
stats_out = Output()

# --- 6. Figure-building function ---
def make_figure(df_f):
    # soft colors (ColorBrewer-style)
    colors = ['#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3',
              '#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd']
    rooms_sorted = sorted(df_f['Rooms_cat'].unique(), key=lambda x: (int(x[0]) if x!='4+' else 4))
    traces = []

    for i, r in enumerate(rooms_sorted):
        sub = df_f[df_f['Rooms_cat']==r]
        traces.append(go.Scatter(
            x=sub['Distance'],
            y=sub['Price'],
            mode='markers',
            name=f'Rooms {r}',
            marker=dict(
                size=np.clip(sub['Landsize_vis']/5, 4, 25),
                color=colors[i % len(colors)],
                opacity=0.65,
                line=dict(width=0.5, color='DimGray')
            ),
            customdata=sub[['Suburb','Address','Rooms','Landsize','Distance','Price','Date','Regionname']].values,
            hovertemplate=(
                "<b>%{customdata[0]}</b><br>%{customdata[1]}<br>"
                "Rooms: %{customdata[2]} • Landsize: %{customdata[3]} m²<br>"
                "Distance: %{x} km • Price: $%{y}<br>"
                "Region: %{customdata[7]}<extra></extra>"
            ),
            selected=dict(marker=dict(opacity=1, color='black', size=14)),
            unselected=dict(marker=dict(opacity=0.15))
        ))

    layout = go.Layout(
        title=dict(
            text="<b>Effect of Distance, Landsize, and Rooms on Price</b>",
            x=0.5, xanchor='center', font=dict(size=20)
        ),
        xaxis=dict(title='Distance from CBD (km)', gridcolor='LightGrey'),
        yaxis=dict(title='Price (AUD)', tickprefix='$', gridcolor='LightGrey'),
        plot_bgcolor='rgba(245,245,245,0.95)',
        paper_bgcolor='rgba(250,250,250,1)',
        dragmode='lasso',
        height=650,
        legend=dict(title='Rooms', orientation='h', y=-0.2)
    )
    return go.FigureWidget(data=traces, layout=layout)

# --- 7. Filter & update functions ---
def filter_df():
    reg = list(region_select.value)
    pmin, pmax = price_slider.value
    return df[(df['Regionname'].isin(reg)) & (df['Price'].between(pmin, pmax))]

def update(_=None):
    dff = filter_df()
    with fig_out:
        fig_out.clear_output(wait=True)
        fig = make_figure(dff)
        display(fig)

        def on_select(trace, points, selector):
            idx = points.point_inds
            if not idx:
                with stats_out:
                    stats_out.clear_output()
                    print("No points selected.")
                return
            # customdata columns follow the order passed to go.Scatter in make_figure
            rows = np.asarray(trace.customdata)[idx]
            s = pd.DataFrame(rows, columns=['Suburb','Address','Rooms','Landsize','Distance','Price','Date','Regionname'])
            s[['Price','Distance','Landsize']] = s[['Price','Distance','Landsize']].apply(pd.to_numeric, errors='coerce')

            with stats_out:
                stats_out.clear_output()
                print(f"Selected points: {len(s)}")
                print(f"Mean Price: ${s['Price'].mean():,.0f}")
                print(f"Median Landsize: {s['Landsize'].median():,.0f} m²")
                print(f"Mean Distance: {s['Distance'].mean():.2f} km")
                print("Rooms distribution:")
                display(s['Rooms'].value_counts().sort_index())

        for t in fig.data:
            t.on_selection(on_select)

region_select.observe(update, names='value')
price_slider.observe(update, names='value')

# --- 8. Render the interface ---
update()
controls = HBox([region_select, price_slider])
box = VBox([controls, fig_out, stats_out])
display(box)


Please upload the module 4 dataset file (e.g. Melbourne_housing_Clean.csv)


Saving Melbourne_housing_Clean.csv to Melbourne_housing_Clean (7).csv
Dataset loaded successfully: (34856, 21)


VBox(children=(HBox(children=(SelectMultiple(description='Region', index=(0, 1, 2), layout=Layout(width='260px…
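
The filter behind the two widgets combines a region membership test with a price range that includes both endpoints (pandas `Series.between` is inclusive on both sides by default). A minimal plain-Python sketch of that predicate follows; the `filter_rows` helper and the sample rows are illustrative only, and "Other Region" is a placeholder name rather than a value from the dataset.

```python
def filter_rows(rows, regions, price_min, price_max):
    """Keep rows whose region is selected and whose price lies in [price_min, price_max]."""
    selected = set(regions)
    return [
        row for row in rows
        if row["Regionname"] in selected and price_min <= row["Price"] <= price_max
    ]

# Small illustrative rows; "Other Region" is a made-up placeholder.
rows = [
    {"Regionname": "Northern Metropolitan", "Price": 870000},
    {"Regionname": "Northern Metropolitan", "Price": 1480000},
    {"Regionname": "Other Region", "Price": 900000},
]

kept = filter_rows(rows, ["Northern Metropolitan"], 870000, 1000000)
print(kept)  # only the 870000 row remains: the lower bound is inclusive
```

Selecting no region returns an empty result, which is why the chart is empty when every region is deselected.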

# Module 5 Interactive Visualization Report
### *Implementing Multivariate Exploration with Plotly Graph Objects*

> **Complex insight:** How **distance to the city centre (Distance)**, **land size (Landsize)**, and **number of rooms (Rooms)** relate to **property price (Price)** across metropolitan Melbourne.

---

## 1. Purpose of the Visualization
This visualization is designed to **support open-ended exploration** of the Melbourne property data so that users can:
- Observe **non-linear patterns** in the relationship between *Distance* and *Price*,  
- See how *Rooms* and *Landsize* affect price within a geographic context (*Regionname*),  
- Formulate **new hypotheses**, such as:  
  *“Do large houses with 4+ rooms stay expensive even when they are far from the city centre?”*

---

## 2. Cognitive Goal of Each Interaction

| Interaction Type | Cognitive Goal | Related Variables |
|------------------|------------------|------------------|
| **Hover** | Supports *data-driven attention*: users focus on the details of a specific point without losing the global context. *(Ware, 2013)* | Shows `Suburb`, `Address`, `Rooms`, `Landsize`, `Distance`, `Price`, and `Regionname` for the point. |
| **Filter (Region & Price range)** | Enables *comparison-by-subset* *(Shneiderman, 1996)*: comparing prices across regions and within a chosen price range. | `Regionname` and `Price`. |
| **Selection (Lasso)** | Supports *analytical reasoning*: users select a group of points to compute the point count, mean price, median land size, mean distance, and room-count distribution. | `Rooms`, `Landsize`, `Distance`, and `Price`. |

These interactions follow **Shneiderman’s Visual Information-Seeking Mantra (1996)**:  
> “Overview first, zoom and filter, then details-on-demand.”

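The statistics produced by the lasso selection can be sketched without any plotting library. This plain-Python version assumes each selected point is a `(rooms, landsize, distance, price)` tuple; the `summarize_selection` helper is hypothetical and only mirrors what the notebook's `on_select` callback prints.

```python
from collections import Counter
from statistics import mean, median

def summarize_selection(points):
    """Summarize selected points given as (rooms, landsize, distance, price) tuples."""
    if not points:
        return None  # nothing selected
    rooms, landsize, distance, price = zip(*points)
    return {
        "count": len(points),
        "mean_price": mean(price),
        "median_landsize": median(landsize),
        "mean_distance": mean(distance),
        "rooms_distribution": dict(sorted(Counter(rooms).items())),
    }

# Three of the sample rows shown in the notebook output above.
selected = [
    (2.0, 126.0, 2.5, 870000.0),
    (2.0, 202.0, 2.5, 1480000.0),
    (3.0, 134.0, 2.5, 1465000.0),
]
summary = summarize_selection(selected)
print(summary["count"], summary["median_landsize"], summary["rooms_distribution"])
# 3 134.0 {2.0: 2, 3.0: 1}
```
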
---

## 3. Comparison with Alternative Visual Encodings

| Visual Encoding | Strengths | Weaknesses |
|------------------|-----------|-------------|
| **Bubble Scatter Plot (chosen)** | Shows four variables at once (X, Y, color, size) and suits continuous data. | Becomes cluttered when there are many points (mitigated by sampling and transparency). |
| **Heatmap** | Shows the density distribution clearly. | Cannot show additional per-point variables such as *Landsize* and *Rooms*. |
| **3D Scatter Plot** | Engaging and dynamic. | Depth perception is often biased, which makes the plot hard to interpret (Ware, 2013). |

The **Bubble Scatter Plot** is therefore the most effective of these three options because it balances *complexity* and *clarity*.
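
To make the heatmap trade-off concrete: a density heatmap reduces each point to a count in a (Distance, Price) cell, so per-point attributes such as Landsize or Rooms are no longer visible. The sketch below uses a hypothetical `bin_counts` helper and arbitrarily chosen bin widths.

```python
from collections import Counter

def bin_counts(points, distance_step, price_step):
    """Count points per (distance bin, price bin) cell, as a density heatmap would."""
    counts = Counter()
    for distance, price in points:
        cell = (int(distance // distance_step), int(price // price_step))
        counts[cell] += 1
    return counts

# (Distance in km, Price in AUD) for the five sample rows shown in the notebook.
points = [(2.5, 870000), (2.5, 1480000), (2.5, 1035000), (2.5, 870000), (2.5, 1465000)]
grid = bin_counts(points, distance_step=5, price_step=500000)
print(dict(grid))  # {(0, 1): 2, (0, 2): 3}
```

Only the counts survive the binning, whereas the bubble scatter plot keeps color and size available for two more variables.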

---

## 4. Scientific Justification of the Visual Design

| Design Component | Decision | Scientific Basis |
|------------------|------------|---------------|
| **Chart type: Bubble Scatter Plot** | Shows the multivariate relationship between `Distance`, `Price`, `Landsize`, and `Rooms`. | X-Y position is the most accurate encoding for continuous values. *(Cleveland & McGill, 1984)* |
| **Pastel color scheme (ColorBrewer-style)** | Soft colors help distinguish categories without overstimulation. | Based on the *ColorBrewer Guidelines* (Brewer, 1999). |
| **Bubble size ∝ √Landsize** | Avoids perceptual bias when land size is read from marker area. | Based on *Stevens’ Power Law* (1957): perceived area is non-linear in physical area. |
| **Opacity 0.65 and a 3,000-point sample** | Reduces overplotting and improves readability. | Recommended by *Heer & Bostock (2010)* for large datasets. |
| **Layout with horizontal legend & soft grey background** | Keeps the user's focus on the main pattern. | *Gestalt Proximity* principle (Ware, 2013). |
| **Hover, Filter, and Selection interactions** | Strengthens hypothesis-driven exploration. | Based on the *Information Visualization Pipeline* (Card et al., 1999). |
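
The size encoding can be sanity-checked numerically. Plotly sizes scatter markers by diameter by default, so a diameter proportional to √Landsize gives a marker area proportional to Landsize, as long as the value is not clipped. The sketch below reuses the scaling from the notebook (divide by 5, clip to the range 4 to 25); the helper names and the two example land sizes are illustrative.

```python
import math

def marker_diameter(landsize, scale=5.0, lo=4.0, hi=25.0):
    """Marker diameter in px: sqrt(landsize) / scale, clipped to [lo, hi]."""
    return min(max(math.sqrt(landsize) / scale, lo), hi)

def marker_area(diameter):
    """Area of a circular marker with the given diameter."""
    return math.pi * (diameter / 2) ** 2

small, large = 900.0, 3600.0  # hypothetical land sizes in m², a 4x ratio
d_small, d_large = marker_diameter(small), marker_diameter(large)
print(d_small, d_large)  # 6.0 12.0
area_ratio = marker_area(d_large) / marker_area(d_small)
print(area_ratio)  # equals the land-size ratio while neither value is clipped
print(marker_diameter(126.0))  # 4.0: lots under 400 m² hit the lower clip
```

One consequence of the clip is that every lot below 400 m² is drawn at the same minimum size, so size differences are only readable for larger lots.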

---

## 5. Scientific Category of Each Interaction

| Interaction | Category (Yi et al., 2007) |
|------------|-----------------------------|
| Hover | **Abstract/Elaborate**: reveals more detail about a single point on demand. |
| Filter | **Filter**: shows only the data that meet the chosen region and price conditions. |
| Selection (Lasso) | **Select**: marks a group of points as interesting so they can be highlighted and summarized. |

📚 *Yi, J. S., Kang, Y. A., Stasko, J., & Jacko, J. A. (2007). Toward a deeper understanding of the role of interaction in information visualization. IEEE TVCG, 13(6), 1224–1231.*

---

## 6. Conclusion
This interactive visualization:
- Shows that **property price (Price)** tends to **decrease as Distance increases**,  
- While a higher **room count (Rooms)** and a larger **land size (Landsize)** are associated with higher prices,  
- Provides room for exploration through *hover*, *filter*, and *selection*,  
- Is designed on scientific principles to support effective perception and analysis.

The visualization not only conveys insight but also **helps users formulate new hypotheses** about property price dynamics in Melbourne.

---

## 7. Academic References
- Brewer, C. A. (1999). *Color Use Guidelines for Mapping and Visualization.*  
- Card, S., Mackinlay, J., & Shneiderman, B. (1999). *Readings in Information Visualization: Using Vision to Think.*  
- Cleveland, W. S., & McGill, R. (1984). *Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.*  
- Heer, J., & Bostock, M. (2010). *Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.*  
- Shneiderman, B. (1996). *The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations.*  
- Stevens, S. S. (1957). *On the Psychophysical Law.* Psychological Review, 64(3), 153–181.  
- Ware, C. (2013). *Information Visualization: Perception for Design.*  
- Yi, J. S., Kang, Y. A., Stasko, J., & Jacko, J. A. (2007). *Toward a deeper understanding of the role of interaction in information visualization.* IEEE TVCG, 13(6), 1224–1231.

---
