## **1. Introduction**
---
**MILESTONE 3**  

**Name**       : Devin Lee   

**Batch**       : HCK-009  

**Dataset**    : Landslide Risk Assessment Factors  

**Background** : This dataset was compiled by Mohammad Rahdan. It contains information on the factors that influence landslides in Iran and covers more than 4,000 landslide hazard zones. Each zone is described by factors that can trigger a landslide, such as slope, climate, and tectonic activity, and also by whether human activity contributes.

**Objective**   : The goal of this analysis is to identify which factors can trigger landslides and to recommend measures that prevent or minimize them, or at least avoid loss of life.

# **2. Import Libraries**

In [19]:
import sqlalchemy as db
from sqlalchemy import create_engine
import pandas as pd
import numpy as np
import re
import warnings
from great_expectations.data_context import FileDataContext
import plotly.express as px

# **3. Data Loading**
<pre>
- ID
</pre>
<pre>
- LONG                  : The longitude of the landslide point
</pre>
<pre>
- LAT                   : The latitude of the landslide point
</pre>
<pre>
- SUB_Basin             : The name of the watershed of the landslide point
</pre>
<pre>
- Elevation             : The elevation of the landslide point from sea level (m)
</pre>
<pre>
- AAP(mm)               : The average annual precipitation at the landslide point
</pre>
<pre>
- RiverDIST(m)          : The distance of the landslide point from the river
</pre>
<pre>
- FaultDIST(m)          : The distance of the landslide point from the fault
</pre>
<pre>
- Landuse_Type          : The landuse type at the landslide point
</pre>
<pre>
- Slope(Percent)        : The slope at the landslide point (Values range from 0 to 100%)
</pre>
<pre>
- Slope(Degrees)        : The slope at the landslide point (Values range from 0 to 90)
</pre>
<pre>
- GEO_UNIT              : The geology unit of the landslide point
</pre>
<pre>
- DES_GEOUNI            : The description of the geology unit of the landslide point
</pre>
<pre>
- Climate_Type          : The climate type of the landslide point
</pre>
<pre>
- DES_ClimateType       : The description of the climate type of the landslide point
</pre>
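
The two slope columns describe the same quantity in different units. A minimal sketch of the standard conversion, assuming slope percent is defined as 100 × rise / run; the helper names `percent_to_degrees` and `degrees_to_percent` are hypothetical and not part of the dataset or this notebook:

```python
import math

def percent_to_degrees(slope_percent: float) -> float:
    """Convert a slope in percent (100 * rise / run) to an angle in degrees."""
    return math.degrees(math.atan(slope_percent / 100.0))

def degrees_to_percent(slope_degrees: float) -> float:
    """Convert a slope angle in degrees to percent (100 * rise / run)."""
    return 100.0 * math.tan(math.radians(slope_degrees))

# The first row of the dataset lists 42.240669 percent next to 22.899523 degrees.
print(percent_to_degrees(42.240669))
```

Under this definition a 45 degree slope equals 100 percent, so steeper slopes would exceed 100 percent.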

## *Pre-Loading*

In [20]:
df = pd.read_csv('Landslide_Factors_IRAN.csv')
df

Unnamed: 0,ID,LONG,LAT,SUB_Basin,Elevation,AAP(mm),RiverDIST(m),FaultDIST(m),Landuse_Type,Slop(Percent),Slop(Degrees),GEO_UNIT,DES_GEOUNI,Climate_Type,DES_ClimateType
0,1,52.326,27.763,Mehran,617.0,137,1448.705292,40639.578900,poorrange,42.240669,22.899523,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,2,52.333,27.772,Mehran,944.0,137,344.299484,40135.029130,mix(woodland_x),68.219116,34.301464,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,3,52.326,27.763,Mehran,617.0,137,1448.705292,40639.578900,poorrange,42.240669,22.899523,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,4,52.333,27.694,Mehran,55.0,137,1889.828623,42189.544420,rock,12.141766,6.922833,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,5,52.324,27.682,Mehran,20.0,137,874.201691,43010.084000,poorrange,2.216230,1.269598,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4290,4291,48.055,38.453,Karasu,1362.0,542,438.505912,13834.693060,modrange,27.209608,15.221452,Eav,Early Archean volcanic rocks,SA-K-W,"Hot and dry, with a dry period longer than the..."
4291,4292,48.809,38.439,Vilascay / Lankaran / Tangarud,20.0,779,681.559483,10.379438,mix(agri_X),4.432460,2.537951,Qtr,Travertine,PH-C-W,"Warm and dry, with a dry period longer than th..."
4292,4293,48.555,38.441,Karasu,1572.0,779,268.529472,5376.553979,mix(woodland_x),20.652390,11.668892,Eav,Early Archean volcanic rocks,SH-K-M,"Hot and dry, with a dry period longer than the..."
4293,4294,48.562,38.442,Karasu,1491.0,779,130.758834,5342.636975,modforest,15.542682,8.834613,Eav,Early Archean volcanic rocks,SH-K-M,"Hot and dry, with a dry period longer than the..."


## *Loading by SQL Alchemy*

In [21]:
# connecting postgresql database to python
engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost:5432/milestone')
df = pd.read_sql_table('table_m3', engine)
df

Unnamed: 0,ID,LONG,LAT,SUB_Basin,Elevation,AAP_(mm),RiverDIST_(m),FaultDIST_(m),Landuse_Type,Slop(Percent),Slop(Degrees),GEO_UNIT,DES_GEOUNI,Climate_Type,DES_ClimateType
0,1.0,52.326,27.763,Mehran,617.0,137.0,1448.705292,40639.578900,poorrange,42.240669,22.899523,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,2.0,52.333,27.772,Mehran,944.0,137.0,344.299484,40135.029130,mix(woodland_x),68.219116,34.301464,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,3.0,52.326,27.763,Mehran,617.0,137.0,1448.705292,40639.578900,poorrange,42.240669,22.899523,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,4.0,52.333,27.694,Mehran,55.0,137.0,1889.828623,42189.544420,rock,12.141766,6.922833,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,5.0,52.324,27.682,Mehran,20.0,137.0,874.201691,43010.084000,poorrange,2.216230,1.269598,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4290,4291.0,48.055,38.453,Karasu,1362.0,542.0,438.505912,13834.693060,modrange,27.209608,15.221452,Eav,Early Archean volcanic rocks,SA-K-W,"Hot and dry, with a dry period longer than the..."
4291,4292.0,48.809,38.439,Vilascay / Lankaran / Tangarud,20.0,779.0,681.559483,10.379438,mix(agri_X),4.432460,2.537951,Qtr,Travertine,PH-C-W,"Warm and dry, with a dry period longer than th..."
4292,4293.0,48.555,38.441,Karasu,1572.0,779.0,268.529472,5376.553979,mix(woodland_x),20.652390,11.668892,Eav,Early Archean volcanic rocks,SH-K-M,"Hot and dry, with a dry period longer than the..."
4293,4294.0,48.562,38.442,Karasu,1491.0,779.0,130.758834,5342.636975,modforest,15.542682,8.834613,Eav,Early Archean volcanic rocks,SH-K-M,"Hot and dry, with a dry period longer than the..."


# **4. Data Cleaning**

- Checking Data

1. Checking Special Characters
---
This step checks whether the data (here, the column names) contains special characters. If it does, and the characters are not needed, they are removed. The check also feeds into the cleaning function, so that new input data containing special characters is handled automatically.

In [22]:
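# Remove digits and special characters from column names (keep letters, spaces, underscores, parentheses)
df.columns = [re.sub(r'[^a-zA-Z _()]', '', col) for col in df.columns]

# Strip any trailing '@!#$%^&*' characters (written as a character class, because an unescaped '$' mid-pattern acts as an anchor)
df.columns = [re.sub(r'[@!#$%^&*]+$', '', col) for col in df.columns]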
df.sample(5)

Unnamed: 0,ID,LONG,LAT,SUB_Basin,Elevation,AAP_(mm),RiverDIST_(m),FaultDIST_(m),Landuse_Type,Slop(Percent),Slop(Degrees),GEO_UNIT,DES_GEOUNI,Climate_Type,DES_ClimateType
3400,3400.0,48.472,37.889,Qezel Owzan,2186.0,450.0,1112.438144,1917.684554,mix(verylowforest_x),9.265309,5.293519,Eav,Early Archean volcanic rocks,SA-K-M,"Moderate, with a moderate and hot and humid an..."
10,11.0,53.167,27.489,Mehran,940.0,128.0,982.950499,41682.44522,poorrange,39.747475,21.676573,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
4027,4024.0,55.942,37.651,Gorgan,1128.0,142.0,1257.439485,2084.531364,mix(bagh_X),12.5556,7.156381,Qsw,Swamp and marsh,A-C-W,"Hot and dry, with a dry period longer than the..."
2527,2528.0,52.515,36.302,Babol,276.0,254.0,58.339693,2372.557473,denseforest,8.067199,4.612176,"Mm,s,l","Mixture of mudstone, siltstone, and limestone",SH-C-W,"Hot and dry, with a dry period longer than the..."
1397,1394.0,47.028,33.837,Gamasb,1293.0,179.0,701.036438,6406.052082,agri,22.997997,12.951675,EMas-sb,Undivided Asmari and Shahbazan Formation,SA-C-W,"Hot and dry, with a dry period longer than the..."


In [23]:
df.columns

Index(['ID', 'LONG', 'LAT', 'SUB_Basin', 'Elevation', 'AAP_(mm)',
       'RiverDIST_(m)', 'FaultDIST_(m)', 'Landuse_Type', 'Slop(Percent)',
       'Slop(Degrees)', 'GEO_UNIT', 'DES_GEOUNI', 'Climate_Type',
       'DES_ClimateType'],
      dtype='object')
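
The column-name check can also be tried in isolation. A minimal sketch, assuming the same whitelist of letters, spaces, underscores, and parentheses; the helper `sanitize_column` and the sample headers are hypothetical:

```python
import re

def sanitize_column(name: str) -> str:
    """Keep only letters, spaces, underscores, and parentheses."""
    return re.sub(r'[^a-zA-Z _()]', '', name)

# Hypothetical raw headers, as a new input file might deliver them
raw_columns = ['AAP(mm)', 'Slop(Percent)#', 'GEO_UNIT!', 'Zone2']
print([sanitize_column(c) for c in raw_columns])
```

Note that digits are removed as well, so a header such as `Zone2` would become `Zone`.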

2. Round to Two Decimal Places
---

In [24]:
# Columns to exclude from formatting
exclude = ['LONG', 'LAT']

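# Format numeric values as two-decimal strings in all columns except the excluded ones
# (note: '{:.2f}'.format returns str, so the formatted columns no longer have a numeric dtype)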
formatted_columns = df.drop(columns=exclude).applymap(lambda x: '{:.2f}'.format(x) if isinstance(x, (int, float)) else x)

# Concatenate the excluded columns back
df = pd.concat([df[exclude], formatted_columns], axis=1)


In [25]:
df.head(5)

Unnamed: 0,LONG,LAT,ID,SUB_Basin,Elevation,AAP_(mm),RiverDIST_(m),FaultDIST_(m),Landuse_Type,Slop(Percent),Slop(Degrees),GEO_UNIT,DES_GEOUNI,Climate_Type,DES_ClimateType
0,52.326,27.763,1.0,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,52.333,27.772,2.0,Mehran,944.0,137.0,344.3,40135.03,mix(woodland_x),68.22,34.3,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,52.326,27.763,3.0,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,52.333,27.694,4.0,Mehran,55.0,137.0,1889.83,42189.54,rock,12.14,6.92,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,52.324,27.682,5.0,Mehran,20.0,137.0,874.2,43010.08,poorrange,2.22,1.27,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."
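
Formatting with `'{:.2f}'.format` produces text rather than numbers. As a hedged alternative, `DataFrame.round` keeps the values numeric. The sketch below uses a hypothetical two-row frame that only mimics a few of the dataset's columns:

```python
import pandas as pd

# Hypothetical two-row frame that mimics a few of the dataset's columns
sample = pd.DataFrame({
    'LONG': [52.326, 52.333],
    'RiverDIST(m)': [1448.705292, 344.299484],
    'Landuse_Type': ['poorrange', 'rock'],
})

# String formatting turns floats into text, so the column is no longer numeric
as_text = sample['RiverDIST(m)'].map('{:.2f}'.format)

# DataFrame.round with a per-column dict rounds the listed columns and leaves the rest alone
rounded = sample.round({'RiverDIST(m)': 2})

print(as_text.dtype, rounded['RiverDIST(m)'].dtype)
```

With rounding, a later `astype` conversion back to `float64` would not be needed for these columns.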


3. Lowercase Column

In [26]:
df.columns = [x.lower() for x in df.columns]
df.head()

Unnamed: 0,long,lat,id,sub_basin,elevation,aap_(mm),riverdist_(m),faultdist_(m),landuse_type,slop(percent),slop(degrees),geo_unit,des_geouni,climate_type,des_climatetype
0,52.326,27.763,1.0,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,52.333,27.772,2.0,Mehran,944.0,137.0,344.3,40135.03,mix(woodland_x),68.22,34.3,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,52.326,27.763,3.0,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,52.333,27.694,4.0,Mehran,55.0,137.0,1889.83,42189.54,rock,12.14,6.92,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,52.324,27.682,5.0,Mehran,20.0,137.0,874.2,43010.08,poorrange,2.22,1.27,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."


4. Handling Missing Values  
This step handles missing values: cells that contain a single space are treated as missing, and rows with any missing value are dropped.

In [27]:
df.replace(' ', np.nan, inplace=True)
df = df.dropna()
df.isna().sum()

long               0
lat                0
id                 0
sub_basin          0
elevation          0
aap_(mm)           0
riverdist_(m)      0
faultdist_(m)      0
landuse_type       0
slop(percent)      0
slop(degrees)      0
geo_unit           0
des_geouni         0
climate_type       0
des_climatetype    0
dtype: int64
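
The replacement above only matches cells that hold exactly one space. A minimal sketch of a broader check, assuming that empty and whitespace-only strings should also count as missing; the sample frame is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three kinds of blank cells
sample = pd.DataFrame({
    'sub_basin': ['Mehran', ' ', '', '   '],
    'elevation': [617.0, 944.0, 55.0, 20.0],
})

# Treat empty and whitespace-only strings as missing, then drop those rows
cleaned = sample.replace(r'^\s*$', np.nan, regex=True).dropna()
print(len(cleaned))
```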

5. Checking Type

In [28]:
df.dtypes

long               float64
lat                float64
id                  object
sub_basin           object
elevation           object
aap_(mm)            object
riverdist_(m)       object
faultdist_(m)       object
landuse_type        object
slop(percent)       object
slop(degrees)       object
geo_unit            object
des_geouni          object
climate_type        object
des_climatetype     object
dtype: object
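
Most numeric columns show up as `object` here because their values were stored as text. One way to recover a numeric dtype is `pd.to_numeric`; the sketch below is an illustration on hypothetical data, not the notebook's own conversion step:

```python
import pandas as pd

# Hypothetical columns holding numbers that were stored as text
sample = pd.DataFrame({
    'elevation': ['617.00', '944.00', 'n/a'],
    'sub_basin': ['Mehran', 'Mehran', 'Mand'],
})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising
sample['elevation'] = pd.to_numeric(sample['elevation'], errors='coerce')
print(sample.dtypes)
```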

6. Rename Columns

In [29]:
# Rename columns (keys must match the current names, which include an underscore before the unit)
df.rename(columns={
    "riverdist_(m)": "river_distance_(m)",
    "faultdist_(m)": "fault_distance_(m)"
}, inplace=True)
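
By default `DataFrame.rename` ignores mapping keys that match no column, so a misspelled key passes silently. A minimal sketch of how `errors='raise'` surfaces the problem, using a hypothetical one-row frame:

```python
import pandas as pd

sample = pd.DataFrame({'riverdist_(m)': [1448.71], 'faultdist_(m)': [40639.58]})

# The key below has no underscore before '(' and therefore matches no column
mapping = {'riverdist(m)': 'river_distance_(m)'}

unchanged = sample.rename(columns=mapping)          # silently does nothing
try:
    sample.rename(columns=mapping, errors='raise')  # reports the missing key
except KeyError as exc:
    print('rename failed:', exc)
```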

## Combine all cleaning steps into one function

In [30]:
def clean(df):
    # NOTE: the argument is overwritten here; the function always re-reads the raw CSV
    df = pd.read_csv('Landslide_Factors_IRAN.csv')
    # Remove digits and special characters from column names (keep letters, spaces, underscores, parentheses)
    df.columns = [re.sub(r'[^a-zA-Z _ ()]', '', col) for col in df.columns]

    # missing value
    df.replace(' ', np.nan, inplace=True)
    df = df.dropna()
    
    # Strip any trailing '@!#$%^&*' characters from each column name
    df.columns = [re.sub(r'[@!#$%^&*]+$', '', col) for col in df.columns]
    
    # Convert column names to lowercase
    df.columns = [x.lower() for x in df.columns]

    # Format numbers to two decimal places
    exclude = ['id','long', 'lat']
    formatted_columns = df.drop(columns=exclude).applymap(lambda x: '{:.2f}'.format(x) if isinstance(x, (int, float)) else x)
    df = pd.concat([df[['id']], df[exclude[1:]], formatted_columns], axis=1)
    
    # Rename columns
    df.rename(columns={
        "riverdist(m)": "river_distance_(m)",
        "faultdist(m)": "fault_distance_(m)"
    }, inplace=True)

    # Convert data types (aap(mm), river_distance_(m) and fault_distance_(m) are not listed, so they stay object)
    data_types = {
        'id': 'float64',
        'long': 'float64',
        'lat': 'float64',
        'sub_basin': 'string',
        'elevation': 'float64',
        'landuse_type': 'string',
        'slop(percent)': 'float64',
        'slop(degrees)': 'float64',
        'geo_unit': 'string',
        'des_geouni': 'string',
        'climate_type': 'string',
        'des_climatetype': 'string'
    }

    # Apply the data types to the DataFrame
    df = df.astype(data_types)

    # Return the DataFrame
    return df

df = clean(df)
df.to_csv('P2M3_devin_lee_data_clean.csv', index=False)

In [31]:
df.columns

Index(['id', 'long', 'lat', 'sub_basin', 'elevation', 'aap(mm)',
       'river_distance_(m)', 'fault_distance_(m)', 'landuse_type',
       'slop(percent)', 'slop(degrees)', 'geo_unit', 'des_geouni',
       'climate_type', 'des_climatetype'],
      dtype='object')

In [32]:
df.iloc[:,5:8].columns[1]

'river_distance_(m)'

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4240 entries, 0 to 4294
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   id                  4240 non-null   float64
 1   long                4240 non-null   float64
 2   lat                 4240 non-null   float64
 3   sub_basin           4240 non-null   string 
 4   elevation           4240 non-null   float64
 5   aap(mm)             4240 non-null   object 
 6   river_distance_(m)  4240 non-null   object 
 7   fault_distance_(m)  4240 non-null   object 
 8   landuse_type        4240 non-null   string 
 9   slop(percent)       4240 non-null   float64
 10  slop(degrees)       4240 non-null   float64
 11  geo_unit            4240 non-null   string 
 12  des_geouni          4240 non-null   string 
 13  climate_type        4240 non-null   string 
 14  des_climatetype     4240 non-null   string 
dtypes: float64(6), object(3), string(6)
memory usage: 530.0+ KB


In [34]:
df.head(5)

Unnamed: 0,id,long,lat,sub_basin,elevation,aap(mm),river_distance_(m),fault_distance_(m),landuse_type,slop(percent),slop(degrees),geo_unit,des_geouni,climate_type,des_climatetype
0,1.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,2.0,52.333,27.772,Mehran,944.0,137.0,344.3,40135.03,mix(woodland_x),68.22,34.3,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,3.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,4.0,52.333,27.694,Mehran,55.0,137.0,1889.83,42189.54,rock,12.14,6.92,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,5.0,52.324,27.682,Mehran,20.0,137.0,874.2,43010.08,poorrange,2.22,1.27,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."


# **5. Validation**

In [35]:
# Create a data context

context = FileDataContext.create(project_root_dir='./')

1. Connect to Datasource

In [36]:
# Give a name to a Datasource. This name must be unique between Datasources.
datasource_name = 'csv-data-landslide'
datasource = context.sources.add_pandas(datasource_name)

# Give a name to a data asset
asset_name = 'landslide_asset'
path_to_data = 'P2M3_devin_lee_data_clean.csv'
asset = datasource.add_csv_asset(asset_name, filepath_or_buffer=path_to_data)

# Build batch request
batch_request = asset.build_batch_request()

2. Create an Expectation Suite

In [37]:
# Create an expectation suite
expectation_suite_name = 'expectation-landslide'
context.add_or_update_expectation_suite(expectation_suite_name)

# Create a validator using above expectation suite
validator = context.get_validator(
    batch_request = batch_request,
    expectation_suite_name = expectation_suite_name
)

# Check the validator
validator.head()

Calculating Metrics:   0%|          | 0/1 [00:00<?, ?it/s]

Unnamed: 0,id,long,lat,sub_basin,elevation,aap(mm),river_distance_(m),fault_distance_(m),landuse_type,slop(percent),slop(degrees),geo_unit,des_geouni,climate_type,des_climatetype
0,1.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,2.0,52.333,27.772,Mehran,944.0,137.0,344.3,40135.03,mix(woodland_x),68.22,34.3,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,3.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,4.0,52.333,27.694,Mehran,55.0,137.0,1889.83,42189.54,rock,12.14,6.92,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,5.0,52.324,27.682,Mehran,20.0,137.0,874.2,43010.08,poorrange,2.22,1.27,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."


3. Expectations

`To Be Unique`

In [38]:
# Expectation 3 : Column `sub_basin` must be unique

validator.expect_column_values_to_be_unique('sub_basin')

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": false,
  "result": {
    "element_count": 4240,
    "unexpected_count": 4230,
    "unexpected_percent": 99.76415094339622,
    "partial_unexpected_list": [
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mand",
      "Mand",
      "Mand",
      "Mehran",
      "Shoor",
      "Shoor",
      "Shoor",
      "Mehran",
      "Mehran"
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 99.76415094339622,
    "unexpected_percent_nonmissing": 99.76415094339622
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
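
The uniqueness result can be approximated outside the validator. A rough pure-Python illustration, assuming that every occurrence of a repeated value counts as unexpected (an assumption that is consistent with 4,230 of the 4,240 rows being flagged above); `count_non_unique` and the sample list are hypothetical:

```python
from collections import Counter

def count_non_unique(values):
    """Count values that occur more than once (every occurrence is counted)."""
    counts = Counter(values)
    return sum(1 for v in values if counts[v] > 1)

# Hypothetical sample: 'Mehran' and 'Mand' repeat, 'Shoor' appears once
sample = ['Mehran', 'Mehran', 'Mand', 'Mand', 'Mehran', 'Shoor']
print(count_non_unique(sample))
```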

`to be between min_value and max_value`

In [39]:
# Expectation 4 : Column `slop(degrees)` must be between 0 and 20 degrees

validator.expect_column_values_to_be_between(
    column='slop(degrees)', min_value=0, max_value=20
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": false,
  "result": {
    "element_count": 4240,
    "unexpected_count": 751,
    "unexpected_percent": 17.712264150943398,
    "partial_unexpected_list": [
      22.9,
      34.3,
      22.9,
      20.85,
      20.04,
      21.1,
      21.68,
      20.08,
      29.63,
      35.27,
      24.18,
      31.33,
      23.93,
      43.61,
      31.11,
      23.7,
      31.08,
      23.95,
      32.3,
      33.54
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 17.712264150943398,
    "unexpected_percent_nonmissing": 17.712264150943398
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

`to be in set`

In [40]:
# Expectation 6 : every value in `sub_basin` must be in the set ['mehran'] (lowercase, so the capitalized 'Mehran' values in the data do not match)
validator.expect_column_values_to_be_in_set('sub_basin', ['mehran'])

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": false,
  "result": {
    "element_count": 4240,
    "unexpected_count": 4240,
    "unexpected_percent": 100.0,
    "partial_unexpected_list": [
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mehran",
      "Mand",
      "Mand",
      "Mand",
      "Mehran",
      "Shoor",
      "Shoor",
      "Shoor",
      "Mehran",
      "Mehran"
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 100.0,
    "unexpected_percent_nonmissing": 100.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
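
The set check compares strings exactly, so `'Mehran'` and `'mehran'` are different values. A minimal pandas sketch of the difference between an exact and a lowercased comparison, on hypothetical values spelled as in the dataset:

```python
import pandas as pd

# Hypothetical values in the same spelling as the dataset
sub_basin = pd.Series(['Mehran', 'Mand', 'Shoor', 'Mehran'])

allowed = {'mehran'}

exact = sub_basin.isin(allowed)               # case-sensitive comparison
folded = sub_basin.str.lower().isin(allowed)  # compare lowercased values

print(exact.sum(), folded.sum())
```

Either lowercasing the column before validation or listing the values with their original capitalization would make such a comparison match.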

`to be in type list`

In [41]:
# Expectation 7 : Column `river_distance_(m)` must be of type integer or float

validator.expect_column_values_to_be_in_type_list('river_distance_(m)', ['integer', 'float'])

Calculating Metrics:   0%|          | 0/1 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

In [42]:
# Expectation 7 : Column `fault_distance_(m)` must be of type integer or float

validator.expect_column_values_to_be_in_type_list('fault_distance_(m)', ['integer', 'float'])

Calculating Metrics:   0%|          | 0/1 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

`Median to Between`

In [43]:
validator.expect_column_median_to_be_between(
    column='slop(degrees)', 
    min_value=0, 
    max_value=20
)

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "observed_value": 11.58
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

In [44]:
df.head(5)

Unnamed: 0,id,long,lat,sub_basin,elevation,aap(mm),river_distance_(m),fault_distance_(m),landuse_type,slop(percent),slop(degrees),geo_unit,des_geouni,climate_type,des_climatetype
0,1.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
1,2.0,52.333,27.772,Mehran,944.0,137.0,344.3,40135.03,mix(woodland_x),68.22,34.3,KEpd-gu,Keewatin Epedotic quartz diorite,A-M-VW,"Warm and humid, with a humid period longer tha..."
2,3.0,52.326,27.763,Mehran,617.0,137.0,1448.71,40639.58,poorrange,42.24,22.9,EOas-ja,"Undivided Asmari and Jahrum Formation , regard...",A-M-VW,"Warm and humid, with a humid period longer tha..."
3,4.0,52.333,27.694,Mehran,55.0,137.0,1889.83,42189.54,rock,12.14,6.92,Mlmmi,Low weathering grey marls alternating with ba...,A-M-VW,"Warm and humid, with a humid period longer tha..."
4,5.0,52.324,27.682,Mehran,20.0,137.0,874.2,43010.08,poorrange,2.22,1.27,MuPlaj,"Brown to grey , calcareous , feature - formin...",A-M-VW,"Warm and humid, with a humid period longer tha..."


`Mean Between`

In [45]:
validator.expect_column_mean_to_be_between(
    column='elevation', 
    min_value=0, 
    max_value=800
)

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

{
  "success": false,
  "result": {
    "observed_value": 1400.7282971698114
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

`Null Checking`

In [46]:
validator.expect_column_values_to_not_be_null('sub_basin')

Calculating Metrics:   0%|          | 0/6 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "element_count": 4240,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

# **6. Analysis Conclusions**

- Areas with urban land use show a higher average slope angle. This suggests that development in Iran's urban areas may require careful planning to mitigate landslide risk, particularly when selecting construction sites and designing infrastructure.

- Sub-basins at higher elevations may carry a greater landslide risk. This mapping matters for water resource management and land-use planning, and for identifying areas that may need preventive measures against landslides.

- Some areas lie close to active fault zones. Areas nearest to a fault tend to have a higher landslide risk, so development in these zones should be monitored.

- The mean slope of 12.741 degrees indicates that the study area as a whole is fairly steep, which adds to the landslide risk, especially when combined with unstable geological or weather conditions.

- Most of the area has a hot, dry climate with a dry period longer than the humid period. This can reduce vegetation and soil moisture, so that when the rainy season arrives the soil becomes more prone to landslides because of the drastic change in moisture.

Insight:

Local governments can review development in hazard zones. Priority should go to local fault zones, where construction close to the fault is not recommended. Rock conditions in these areas should also be assessed: the zones to avoid are those where the rock texture and structure are dry and lack the capacity to bear loads from above. This analysis needs follow-up testing, such as laboratory tests and pressure tests. Governments can also reduce landslide risk by establishing reforestation zones.