**Mount Google Drive and Import Libraries**



*   Mount Google Drive: Connects your Google Drive to Google Colab so the notebook can access files stored there.
*   Import Libraries: Loads various libraries used for data analysis, machine learning modeling, visualization, and geospatial analysis, such as pandas, tensorflow, matplotlib, plotly, and others.



In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Install pinned dependencies before importing them, so the imported
# TensorFlow matches the pinned version (Colab may ask for a runtime restart)
!pip install tensorflow==2.15
!pip install plotly_express
!pip install openpyxl

import tensorflow as tf
print(tf.__version__)

import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly
import plotly.offline as py
import plotly.graph_objs as go
import plotly_express as px

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, accuracy_score
from sklearn.metrics.pairwise import haversine_distances
from math import radians
from tensorflow import keras
from tensorflow.keras import layers, models
from sklearn.preprocessing import StandardScaler, LabelEncoder


**Load Dataset and Inspect Data**


*   Load Data: Reads the dataset from the Excel file stored in your Google Drive.
*   Inspect Data: Displays the first few rows of the dataset to check its structure.



In [None]:
file_path = r'/content/drive/MyDrive/capstone/dataset.xlsx'
df = pd.read_excel(file_path)

df.head()


**Detect and Remove Outliers**



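Outlier Detection: Uses the Interquartile Range (IQR) method to flag rows whose latitude or longitude lies more than 1.5 times the IQR below the first quartile or above the third quartile, then drops those rows.



In [None]:
# Flag outliers in the latitude and longitude columns using the IQR (Interquartile Range) method
Q1 = df['latitude'].quantile(0.25)
Q3 = df['latitude'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
lat_outliers = (df['latitude'] < lower_bound) | (df['latitude'] > upper_bound)

Q1 = df['longitude'].quantile(0.25)
Q3 = df['longitude'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
lon_outliers = (df['longitude'] < lower_bound) | (df['longitude'] > upper_bound)

# A row is an outlier if either coordinate is out of range
outliers = df[lat_outliers | lon_outliers]
print(f"Outliers found: {len(outliers)}")

# Remove the flagged rows before further analysis
df = df[~(lat_outliers | lon_outliers)].reset_index(drop=True)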
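
The two near-identical blocks above can be folded into one helper. The following is a minimal sketch rather than the notebook's own code: `iqr_outlier_mask` is a hypothetical name, and the sample latitudes are made up for illustration. `np.percentile` uses linear interpolation by default, which matches the default of the pandas `quantile` call used above.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up latitudes: four points close together and one obvious stray value.
sample_lat = [-6.17, -6.18, -6.20, -6.21, 10.0]
mask = iqr_outlier_mask(sample_lat)
print(mask)  # only the last value is flagged
```

Applied to the dataset, the masks for both columns could be combined with `|` and used to filter the DataFrame in one step.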


**Visualize Latitude and Longitude**

Data Visualization: Creates a scatter plot to visualize the distribution of data based on latitude and longitude.



In [None]:
plt.scatter(df['longitude'], df['latitude'])
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Scatter Plot of Latitude and Longitude')
plt.show()


**Visualize Historical Category Distribution**

Category Visualization: Creates a bar chart to visualize the distribution of historical categories in the dataset.

In [None]:
kategori_counts = df['kategori sejarah'].astype(str).str.lower().value_counts()
kategori_counts = kategori_counts.sort_values(ascending=False)

plt.figure(figsize=(12, 6))
plt.bar(kategori_counts.index, kategori_counts.values)
plt.xlabel('Historical Category')
plt.ylabel('Count')
plt.title('Number of Records by Historical Category')
plt.xticks(rotation=0, ha='center')
plt.tight_layout()
plt.show()


**K-Means Clustering Based on Latitude and Longitude**

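K-Means Clustering: Applies K-Means to the latitude and longitude data to group locations into 5 clusters, then compares the cluster IDs with the label-encoded Lokasi values. K-Means numbers its clusters arbitrarily, so the reported accuracy is only meaningful when the cluster IDs happen to line up with the encoded labels. The same caveat applies to the fixed names in cluster_mapping.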

In [None]:
X = df[['latitude', 'longitude']].values

label_encoder = LabelEncoder()
df['lokasi_encoded'] = label_encoder.fit_transform(df['Lokasi'])
y_true = df['lokasi_encoded'].values

cluster_mapping = {
    0: 'Jakarta Pusat',
    1: 'Jakarta Timur',
    2: 'Jakarta Utara',
    3: 'Jakarta Barat',
    4: 'Jakarta Selatan'
}

kmeans = KMeans(n_clusters=5, random_state=0, n_init=10)
df['cluster'] = kmeans.fit_predict(X)
df['predicted_lokasi'] = df['cluster'].map(cluster_mapping)

accuracy = accuracy_score(y_true, df['cluster'])
print(f"Clustering Accuracy: {accuracy}")
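
Because K-Means assigns cluster numbers arbitrarily, a fairer comparison first finds the one-to-one mapping from cluster IDs to label IDs that matches the most points. The sketch below is an illustration under stated assumptions, not part of the original notebook: `best_cluster_to_label_map` is a hypothetical helper, the toy labels are invented, and the brute-force search over permutations is only practical for a small number of clusters such as the 5 used here.

```python
from collections import Counter
from itertools import permutations

def best_cluster_to_label_map(y_true, clusters, n_classes):
    """Try every one-to-one assignment of cluster IDs to label IDs and
    keep the one that matches the most points."""
    pair_counts = Counter(zip(clusters, y_true))
    best_perm, best_hits = None, -1
    for perm in permutations(range(n_classes)):
        # perm[c] is the label ID assigned to cluster c
        hits = sum(pair_counts[(c, perm[c])] for c in range(n_classes))
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return dict(enumerate(best_perm)), best_hits / len(y_true)

# Toy example: the grouping is perfect, but the clusters are numbered differently.
toy_true = [0, 0, 1, 1, 2, 2]
toy_clusters = [2, 2, 0, 0, 1, 1]
mapping, aligned_accuracy = best_cluster_to_label_map(toy_true, toy_clusters, 3)
print(mapping, aligned_accuracy)
```

In the toy example a direct comparison of `toy_true` and `toy_clusters` would score 0, while the aligned accuracy is 1.0. For many clusters, `scipy.optimize.linear_sum_assignment` solves the same matching problem without enumerating permutations.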


**Visualize Clusters on a Map**

Map Visualization: Uses Plotly to create an interactive map, visualizing the clustering results based on latitude and longitude.

In [None]:
fig = px.scatter_mapbox(df, lat="latitude", lon="longitude", color="predicted_lokasi",
                        zoom=10, height=600, width=800,
                        mapbox_style="carto-positron",
                        title="Jakarta Regions based on Latitude and Longitude")
fig.update_layout(mapbox_zoom=10, mapbox_center={"lat": df['latitude'].mean(), "lon": df['longitude'].mean()})
fig.show()


**Find Nearest Data Points Using Haversine Distance**

Find Nearest Data Points: Defines a function to find the nearest data points to a given test point using the Haversine formula to calculate distances based on latitude and longitude.

In [None]:
def find_nearest_data_points(test_latitude, test_longitude, df, n_neighbors=3):
    # haversine_distances expects [latitude, longitude] pairs in radians
    test_point = np.array([[radians(test_latitude), radians(test_longitude)]])
    coords_rad = np.radians(df[['latitude', 'longitude']].values)
    # Angular distance times Earth's mean radius (6371 km) gives kilometres
    distances = haversine_distances(test_point, coords_rad) * 6371
    # Work on a copy so the caller's DataFrame is not modified
    result = df.copy()
    result['distance'] = distances[0]
    return result.sort_values(by='distance').head(n_neighbors)
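
For reference, the Haversine formula behind `haversine_distances` can be written out for a single pair of points. This is a minimal standalone sketch using only the standard library; `haversine_km` and `EARTH_RADIUS_KM` are names chosen here for illustration, and 6371 km is the same mean Earth radius used in the function above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the notebook

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of longitude along the equator is roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 2))
```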


**Prepare Data for TensorFlow Model**

Prepare Data for Training: Prepares the feature variables (latitude, longitude) and the target variable (cluster). The features are normalized using StandardScaler.

In [None]:
X = df[['latitude', 'longitude']].values
y = df['cluster'].values
scaler = StandardScaler()
X = scaler.fit_transform(X)
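
What `StandardScaler` does can be sketched in a few lines of NumPy: each column is shifted to zero mean and divided by its population standard deviation. The helper names `fit_standardizer` and `standardize` and the toy coordinates are assumptions made for this illustration. The key point is that new points must be transformed with the mean and standard deviation learned from the training data, not with their own.

```python
import numpy as np

def fit_standardizer(X):
    """Learn per-column mean and standard deviation from the training data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population std (ddof=0)
    std[std == 0] = 1.0   # avoid division by zero for constant columns
    return mean, std

def standardize(X, mean, std):
    """Apply a previously learned standardization to any set of points."""
    return (np.asarray(X, dtype=float) - mean) / std

# Made-up (latitude, longitude) pairs.
toy_X = np.array([[-6.10, 106.70], [-6.20, 106.80], [-6.30, 106.90]])
mean, std = fit_standardizer(toy_X)
toy_scaled = standardize(toy_X, mean, std)

# A new point equal to the training mean maps to (approximately) zero.
new_point_scaled = standardize([[-6.20, 106.80]], mean, std)
print(toy_scaled)
print(new_point_scaled)
```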

**Build and Train a Neural Network Model**

Build Neural Network Model: Constructs a Keras network with three ReLU dense layers (128, 64, and 32 units) and two dropout layers (rate 0.4) to reduce overfitting. The 5-unit softmax output gives one probability per cluster, and sparse_categorical_crossentropy is used as the loss because the targets are integer cluster IDs.

In [None]:
model = models.Sequential([
    layers.InputLayer(input_shape=(2,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(32, activation='relu'),
    layers.Dense(5, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = model.fit(X, y, epochs=40, batch_size=8, validation_split=0.2)
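
To make the output layer and loss concrete, the sketch below computes softmax probabilities and the sparse categorical cross-entropy by hand with NumPy. It is an illustration only: the logits and labels are invented, the function names are chosen here, and Keras additionally applies small numerical safeguards that are omitted.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    """Per-sample loss: negative log of the probability given to the true class.
    y_true holds integer class IDs, not one-hot vectors."""
    probs = np.asarray(probs, dtype=float)
    rows = np.arange(len(y_true))
    return -np.log(probs[rows, y_true])

# Two made-up samples with 5 classes each.
toy_logits = np.array([[2.0, 0.0, 0.0, 0.0, 0.0],   # favours class 0
                       [0.0, 0.0, 0.0, 0.0, 0.0]])  # no preference
toy_labels = np.array([0, 3])
toy_probs = softmax(toy_logits)
toy_loss = sparse_categorical_crossentropy(toy_labels, toy_probs)
print(toy_probs)
print(toy_loss)
```

The second sample spreads its probability evenly, so its loss is ln(5), while the first sample is penalized less because it favours the correct class.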


**Make Predictions with the Model**

Prediction: Uses the trained model to predict the cluster of a new test point.

In [None]:
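test_point = (-6.146975375216407, 106.75060382943789)  # (latitude, longitude)
test_point_for_model = scaler.transform(np.array([[test_point[0], test_point[1]]]))
# predict returns one probability per cluster; argmax picks the most likely cluster
predicted_cluster = np.argmax(model.predict(test_point_for_model), axis=1)[0]
predicted_cluster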


**Random Forest Model**

Random Forest: Implements a Random Forest classifier to predict the region for a new test point based on latitude and longitude.

In [None]:
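from sklearn.ensemble import RandomForestClassifier
# Use a separate name so the Keras model above is not overwritten
rf_model = RandomForestClassifier()
rf_model.fit(X, y)
test_point = (-6.146975375216407, 106.75060382943789)
# X was standardized, so the test point must go through the same scaler
predicted_region = rf_model.predict(scaler.transform([test_point]))
print("Predicted Region:", predicted_region)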


**Evaluate the Model**

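Model Evaluation: Evaluates the model on the same data used for fitting and prints the loss and accuracy. Because no held-out test set is involved, these numbers measure how well the model fits the data rather than how well it generalizes.

In [None]:
loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"Loss (full dataset): {loss:.4f}")
print(f"Accuracy (full dataset): {accuracy:.4f}")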
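
To estimate generalization, part of the data can be set aside before training and used only for evaluation. The sketch below shows one way to produce shuffled train and test indices with NumPy; `train_test_indices` is a hypothetical helper, and the sample count of 10 is a placeholder. Shuffling matters here because Keras takes its `validation_split` from the last samples of the arrays it is given, before any shuffling.

```python
import numpy as np

def train_test_indices(n_samples, test_fraction=0.2, seed=0):
    """Return shuffled, non-overlapping index arrays for training and testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = train_test_indices(n_samples=10, test_fraction=0.2)
print(train_idx, test_idx)

# With the notebook's arrays this would be used as, for example:
#   X_train, y_train = X[train_idx], y[train_idx]
#   X_test, y_test = X[test_idx], y[test_idx]
```

`sklearn.model_selection.train_test_split` offers the same functionality, including stratification by class.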


**Save the Model and Convert to TensorFlow.js Format**

Save and Convert Model: Saves the trained model and converts it into TensorFlow.js format to be used in web applications.

In [None]:
saved_model_path = "./my_model.h5"
model.save(saved_model_path)

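# The tensorflowjs package provides the tensorflowjs_converter command used below
!pip install tensorflowjs
!pip install tensorflow_decision_forests==1.8.1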

!tensorflowjs_converter \
    --input_format=keras \
    {saved_model_path} \
    "./"

!zip model.zip *.bin model.json
