# **Solution: Implementing a Detection System for Potential Pipe Leaks**

## **Implementation Steps**

**Collect Data:**
The required data includes pipe leak records, pipe condition, water pressure, temperature, pipe age, and other relevant factors from the past 10 years.

In [41]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
data = pd.read_csv("database.csv")
data

Unnamed: 0,Report Number,Supplemental Number,Accident Year,Accident Date/Time,Operator ID,Operator Name,Pipeline/Facility Name,Pipeline Location,Pipeline Type,Liquid Type,...,Other Fatalities,Public Fatalities,All Fatalities,Property Damage Costs,Lost Commodity Costs,Public/Private Property Damage Costs,Emergency Response Costs,Environmental Remediation Costs,Other Costs,All Costs
0,20100016,17305,2010,1/1/2010 7:15 AM,32109,ONEOK NGL PIPELINE LP,KINDER MORGAN JCT,ONSHORE,ABOVEGROUND,"HVL OR OTHER FLAMMABLE OR TOXIC FLUID, GAS",...,,,,110.0,1517.0,0.0,0.0,0.0,0.0,1627
1,20100254,17331,2010,1/4/2010 8:30 AM,15786,PORTLAND PIPELINE CORP,24-INCH MAIN LINE,ONSHORE,ABOVEGROUND,CRUDE OIL,...,,,,4000.0,8.0,0.0,0.0,0.0,0.0,4008
2,20100038,17747,2010,1/5/2010 10:30 AM,20160,"PETROLOGISTICS OLEFINS, LLC",,ONSHORE,ABOVEGROUND,"HVL OR OTHER FLAMMABLE OR TOXIC FLUID, GAS",...,,,,0.0,200.0,0.0,0.0,0.0,0.0,200
3,20100260,18574,2010,1/6/2010 7:30 PM,11169,"ENBRIDGE ENERGY, LIMITED PARTNERSHIP",SUPERIOR TERMINAL,ONSHORE,UNDERGROUND,CRUDE OIL,...,,,,200.0,40.0,0.0,11300.0,0.0,0.0,11540
4,20100030,16276,2010,1/7/2010 1:00 PM,300,"PLAINS PIPELINE, L.P.",RED RIVER EAST,ONSHORE,UNDERGROUND,CRUDE OIL,...,,,,20000.0,150.0,0.0,7500.0,2000.0,0.0,29650
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2790,20170015,22020,2016,12/27/2016 8:40 AM,32334,TC OIL PIPELINE OPERATIONS INC,KEYSTONE,ONSHORE,ABOVEGROUND,CRUDE OIL,...,,,,0.0,15.0,0.0,0.0,61000.0,0.0,61015
2791,20170028,22046,2016,12/28/2016 4:20 PM,4906,EXXONMOBIL PIPELINE CO,BRRF - CHOCTAW ETHANE/PROPANE MIX SYSTEM,ONSHORE,UNDERGROUND,"HVL OR OTHER FLAMMABLE OR TOXIC FLUID, GAS",...,,,,0.0,5400.0,0.0,0.0,0.0,100000.0,105400
2792,20170027,22045,2016,12/29/2016 6:40 AM,39145,ENBRIDGE STORAGE (CUSHING) L.L.C.,CUSHING CENTRAL TERMINAL,ONSHORE,TANK,CRUDE OIL,...,,,,7000.0,50.0,0.0,5000.0,3000.0,0.0,15050
2793,20170024,22032,2017,1/3/2017 10:00 AM,32147,MARATHON PIPE LINE LLC,MIDLAND STATION,ONSHORE,UNDERGROUND,"REFINED AND/OR PETROLEUM PRODUCT (NON-HVL), LI...",...,,,,11852.0,11.0,0.0,29565.0,0.0,0.0,41428
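
Before any cleaning, it can help to check how much of each column is actually populated, since the preview above shows several fatality columns without values. The following is a minimal sketch and not part of the original notebook; `summarize_frame` and the small `sample` table are hypothetical stand-ins for `data`, and the values are illustrative only.

```python
import numpy as np
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, share of missing values, and unique-value count per column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })
    # Columns with the most missing values come first
    return summary.sort_values("missing_share", ascending=False)


# Tiny stand-in for the real table (hypothetical values)
sample = pd.DataFrame({
    "Accident Year": [2010, 2010, 2011, 2011],
    "Pipeline Type": ["ABOVEGROUND", "UNDERGROUND", None, "TANK"],
    "All Fatalities": [np.nan, np.nan, np.nan, np.nan],
    "All Costs": [1627, 4008, 200, 11540],
})

audit = summarize_frame(sample)
print(audit)
```

On the real table, the same call would be `summarize_frame(data)`. Columns that are almost entirely missing are candidates for dropping rather than filling.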


**Analyze Data:** Analyze the collected data. Identify patterns or trends in past pipe leaks, for example whether certain seasons are more prone to leaks, or whether factors such as pipe age or material type affect leak frequency.

In [42]:
# Prepare the data (handle missing values with a forward fill)
data = data.ffill()

if 'Accident Date/Time' in data.columns:
    data['Accident Date/Time'] = pd.to_datetime(data['Accident Date/Time'])
else:
    print("Column 'Accident Date/Time' was not found in the dataset.")
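
The seasonal question raised in the analysis step can be explored once the timestamp column is parsed. The sketch below is an illustration under assumptions: the `events` table is made up, and the explicit format string assumes every timestamp looks like the ones in the preview (month/day/year with an AM/PM clock).

```python
import pandas as pd

# Illustrative timestamps in the same style as 'Accident Date/Time'
events = pd.DataFrame({
    "Accident Date/Time": [
        "1/1/2010 7:15 AM",
        "1/4/2010 8:30 AM",
        "7/15/2011 1:00 PM",
        "12/27/2016 8:40 AM",
        "1/3/2017 10:00 AM",
    ]
})

# Assumed format: month/day/year, 12-hour clock
events["Accident Date/Time"] = pd.to_datetime(
    events["Accident Date/Time"], format="%m/%d/%Y %I:%M %p"
)

# Count incidents per calendar month (1 = January ... 12 = December),
# keeping months without incidents as zero
monthly_counts = (
    events["Accident Date/Time"].dt.month
    .value_counts()
    .reindex(range(1, 13), fill_value=0)
)
print(monthly_counts)
```

Comparing these counts across several years, rather than in a single total, would show whether a peak repeats or is a one-off.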


In [43]:
# Create the target column from 'Unintentional Release (Barrels)'
data['Leak'] = data['Unintentional Release (Barrels)'] > 0

# Drop columns that are not relevant to the label
kolom_drop = ['Report Number', 'Supplemental Number', 'Operator ID', 'Operator Name',
              'Pipeline/Facility Name', 'Accident City', 'Accident County', 'Accident Date/Time',
              'Pipeline Location', 'Liquid Name']
data = data.drop(columns=[col for col in kolom_drop if col in data.columns])
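
Because each row appears to describe a reported accident, the `Leak` target may turn out to be almost always `True`, which would leave a classifier with little to learn. The sketch below shows one way to check this; `target_summary` is a hypothetical helper and the release volumes are invented for illustration.

```python
import pandas as pd


def target_summary(target: pd.Series) -> dict:
    """Summarize a boolean target: row count, share of True, number of classes."""
    return {
        "n": int(target.size),
        "positive_share": float(target.mean()),
        "n_classes": int(target.nunique()),
    }


# Illustrative release volumes in barrels (not taken from the dataset)
release = pd.Series([21.0, 0.12, 2.0, 700.0, 0.0])
leak = release > 0

summary = target_summary(leak)
print(summary)
```

If `n_classes` comes back as 1 on the real data, a different target definition or additional records of non-leaking pipes would be needed before modeling.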



In [44]:
# Check which columns still contain string values
for col in data.columns:
    if data[col].dtype == 'object':
        unique_values = data[col].unique()
        print(f"Column {col} has values: {unique_values}")

# Remove rows containing the strings 'NO' and 'YES', which are treated as invalid here.
# Note: every 'NO'/'YES' cell becomes NaN, and dropna() then drops any row that has one.
data.replace(["NO", "YES"], np.nan, inplace=True)
data.dropna(inplace=True)

Column Pipeline Type has values: ['ABOVEGROUND' 'UNDERGROUND' 'TANK' 'TRANSITION AREA']
Column Liquid Type has values: ['HVL OR OTHER FLAMMABLE OR TOXIC FLUID, GAS' 'CRUDE OIL'
 'REFINED AND/OR PETROLEUM PRODUCT (NON-HVL), LIQUID'
 'CO2 (CARBON DIOXIDE)'
 'BIOFUEL / ALTERNATIVE FUEL(INCLUDING ETHANOL BLENDS)']
Column Liquid Subtype has values: ['LPG (LIQUEFIED PETROLEUM GAS) / NGL (NATURAL GAS LIQUID)' 'OTHER HVL'
 'GASOLINE (NON-ETHANOL)' 'DIESEL, FUEL OIL, KEROSENE, JET FUEL'
 'ANHYDROUS AMMONIA' 'OTHER'
 'MIXTURE OF REFINED PRODUCTS (TRANSMIX OR OTHER MIXTURE)' 'BIODIESEL']
Column Accident State has values: ['KS' 'ME' 'LA' 'WI' 'TX' 'ND' 'OK' 'IL' 'MN' 'NY' 'CA' 'IN' 'CO' 'MS'
 'NJ' 'WA' 'IA' 'NC' 'MO' 'NM' 'PA' 'FL' 'VA' 'WY' 'KY' 'TN' 'MI' 'ID'
 'GA' 'NV' 'OH' 'SD' 'AK' 'SC' 'UT' 'NE' 'MT' 'AL' 'AR' 'MD' 'PR' 'CT'
 'OR' 'WV' 'HI' 'MA']
Column Cause Category has values: ['INCORRECT OPERATION' 'MATERIAL/WELD/EQUIP FAILURE'
 'NATURAL FORCE DAMAGE' 'EXCAVATION DAMAGE' '

In [45]:
# Make sure the dataset is not empty after cleaning
if data.empty:
    print("The dataset is empty after removing the string values 'NO' and 'YES'. No data to process.")
else:
    # Plot the distribution of leak vs. no leak
    # Fix the bar order so the tick labels below always match the bars
    sns.countplot(data=data, x='Leak', order=[False, True])
    plt.title('Leak vs. No-Leak Distribution')
    plt.xlabel('Leak')
    plt.ylabel('Count')
    plt.xticks([0, 1], ['No Leak', 'Leak'])
    plt.show()

    # Plot the causes of leaks
    if 'Cause Category' in data.columns:
        plt.figure(figsize=(10, 6))
        sns.countplot(data=data, y='Cause Category', order=data['Cause Category'].value_counts().index)
        plt.title('Distribution of Leak Causes')
        plt.xlabel('Count')
        plt.ylabel('Cause Category')
        plt.show()

    # Plot the number of leaks per year
    if 'Accident Year' in data.columns:
        plt.figure(figsize=(10, 6))
        sns.countplot(data=data, x='Accident Year')
        plt.title('Number of Leaks per Year')
        plt.xlabel('Year')
        plt.ylabel('Number of Leaks')
        plt.xticks(rotation=45)
        plt.show()



The dataset is empty after removing the string values 'NO' and 'YES'. No data to process.
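
The empty result above follows from treating `YES` and `NO` as invalid: if every row contains at least one such flag, every row is dropped. One alternative, sketched here under assumptions, is to encode those flags as 1 and 0 and keep the rows. The helper `encode_yes_no` is hypothetical, and the two flag column names in `sample` are assumed examples rather than names confirmed by the output shown in this notebook.

```python
import numpy as np
import pandas as pd


def encode_yes_no(df: pd.DataFrame) -> pd.DataFrame:
    """Map columns that contain only 'YES'/'NO' (plus missing values) to 1/0."""
    out = df.copy()
    for col in out.columns:
        values = set(out[col].dropna().unique())
        # Only touch columns whose non-missing values are all YES or NO
        if values and values <= {"YES", "NO"}:
            out[col] = out[col].map({"YES": 1, "NO": 0})
    return out


sample = pd.DataFrame({
    "Liquid Ignition": ["NO", "YES", "NO"],       # assumed flag column
    "Pipeline Shutdown": ["YES", np.nan, "NO"],   # assumed flag column
    "All Costs": [1627, 4008, 200],
})

encoded = encode_yes_no(sample)
print(encoded)
print(len(encoded), "rows kept")
```

With this approach no rows are removed, and remaining missing values can be handled per column instead of with a blanket `dropna()`.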


**Collect Additional Data:** Besides the pipe leak data, additional data should also be collected, such as weather information (temperature, rainfall), soil conditions around the pipes, and operational information from the mineral water plant (for example, water pressure and water flow).
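
Additional sources such as weather records would need to be joined to the incident table on a shared key. The sketch below assumes a hypothetical daily `weather` table keyed by date and state; its column names (`temp_c`, `rain_mm`) and all values are placeholders, since no such table appears in this notebook.

```python
import pandas as pd

# Two illustrative incident rows
incidents = pd.DataFrame({
    "Accident Date/Time": pd.to_datetime(["2010-01-01 07:15", "2010-01-04 08:30"]),
    "Accident State": ["KS", "ME"],
})

# Hypothetical daily weather table (placeholder names and values)
weather = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-01", "2010-01-04", "2010-01-04"]),
    "Accident State": ["KS", "ME", "TX"],
    "temp_c": [-5.0, -8.5, 12.0],
    "rain_mm": [0.0, 3.2, 0.0],
})

# Reduce the incident timestamp to a calendar date so it can serve as a join key
incidents["date"] = incidents["Accident Date/Time"].dt.normalize()

# Left join keeps every incident; validate guards against duplicated weather rows
enriched = incidents.merge(
    weather,
    how="left",
    on=["date", "Accident State"],
    validate="many_to_one",
)
print(enriched)
```

A left join is used so that incidents without a matching weather record are kept with missing weather values instead of being silently dropped.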

In [46]:
# Encode the categorical columns
categorical_columns = ['Pipeline Type', 'Liquid Type', 'Liquid Subtype', 'Accident State',
                       'Cause Category', 'Cause Subcategory']
categorical_columns = [col for col in categorical_columns if col in data.columns]  # Keep only columns present in the data
data = pd.get_dummies(data, columns=categorical_columns)

# Check that all columns are now numeric
for col in data.columns:
    if data[col].dtype == 'object':
        print(f"Column {col} still contains string values!")



Column Shutdown Date/Time still contains string values!
Column Restart Date/Time still contains string values!
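
The two remaining string columns are timestamps, so one option is to turn them into a numeric duration before modeling. This is a sketch under assumptions: the format string is assumed to match the one seen in `Accident Date/Time`, the rows are invented, and `Downtime Hours` is a hypothetical feature name.

```python
import pandas as pd

sample = pd.DataFrame({
    "Shutdown Date/Time": ["1/6/2010 7:30 PM", None],
    "Restart Date/Time": ["1/7/2010 1:30 AM", None],
})

fmt = "%m/%d/%Y %I:%M %p"  # assumed timestamp format

# Unparseable or missing values become NaT instead of raising an error
shutdown = pd.to_datetime(sample["Shutdown Date/Time"], format=fmt, errors="coerce")
restart = pd.to_datetime(sample["Restart Date/Time"], format=fmt, errors="coerce")

# Hours between shutdown and restart; NaN where either timestamp is missing
sample["Downtime Hours"] = (restart - shutdown).dt.total_seconds() / 3600
sample = sample.drop(columns=["Shutdown Date/Time", "Restart Date/Time"])
print(sample)
```

Rows without a shutdown keep a missing duration, which can later be filled with 0 if a missing shutdown is known to mean the pipeline kept running.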


**Predictive Modeling:** Use statistical modeling or machine learning techniques to build a prediction model. The model uses historical data to estimate the likelihood of future pipe leaks based on the factors identified earlier.

In [47]:
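# Imports needed for modeling
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Features and label (remaining string columns are excluded, since the model cannot use them)
X = data.drop('Leak', axis=1).select_dtypes(exclude='object')
y = data['Leak']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build and train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Plot the ten most important features
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.title('Top 10 Most Important Features')
plt.show()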

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
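
The error above is what `train_test_split` raises when it receives no rows. A small guard can make that failure explicit before training starts. The sketch below is one possible approach, not the notebook's method: `train_if_possible` and its `min_rows` threshold are hypothetical, the demo data is synthetic, and the `stratify` argument is an addition to keep class proportions equal across the split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_if_possible(X: pd.DataFrame, y: pd.Series, min_rows: int = 10):
    """Return (model, test accuracy), or None when the data cannot be split."""
    if len(X) < min_rows or y.nunique() < 2:
        print(f"Skipping training: {len(X)} rows, {y.nunique()} class(es).")
        return None
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic demo data: the label depends mostly on the sign of f1
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"f1": rng.normal(size=200), "f2": rng.normal(size=200)})
y_demo = pd.Series(X_demo["f1"] + 0.1 * rng.normal(size=200) > 0)

result = train_if_possible(X_demo, y_demo)
empty_result = train_if_possible(X_demo.iloc[0:0], y_demo.iloc[0:0])
```

The second call mimics the empty dataset from this notebook and returns `None` with a message instead of raising. When defining `X` on the real data, the column the target is derived from should also be left out, otherwise the model can read the answer directly from its inputs.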