# üìù Data Update: Change Governorate_Lahj to 'Tuban'

This notebook demonstrates how to modify the `Governorate_Lahj` column values to 'Tuban'.

**Prerequisites:** Run the [Column_Renaming_Tutorial.ipynb](Column_Renaming_Tutorial.ipynb) first to ensure columns are renamed.

---

## Step 1: Import Libraries and Load Data

First, let's import pandas and load our renamed dataset.

In [None]:
import pandas as pd
from datetime import datetime

# Load the dataset with renamed columns
df = pd.read_excel('Irrigation_DS.xlsx')

print("‚úì Dataset loaded successfully!")
print(f"Shape: {df.shape[0]} rows √ó {df.shape[1]} columns")

## Step 2: Examine Current Governorate_Lahj Values

Let's see what values currently exist in the `Governorate_Lahj` column.

In [None]:
print("CURRENT VALUES IN 'Governorate_Lahj' COLUMN:")
print("=" * 80)

print(f"\nColumn data type: {df['Governorate_Lahj'].dtype}")
print(f"\nUnique values:")
print(df['Governorate_Lahj'].unique())

print(f"\nValue counts:")
print(df['Governorate_Lahj'].value_counts())

print(f"\nNull/missing values: {df['Governorate_Lahj'].isna().sum()}")
print(f"Total rows: {len(df)}")

## Step 3: Display Sample Data Before Change

Let's look at some sample rows to see the current state.

In [None]:
print("Sample rows before update (showing location-related columns):")
print("=" * 80)

# Display relevant columns
location_cols = ['Governorate', 'Governorate_Lahj', 'Village_Name', 'Surveyor_Name']
display(df[location_cols].head(10))

## Step 4: Create Backup Before Making Changes

**Important:** Always create a backup before modifying data!

In [None]:
# Create timestamped backup
backup_filename = f'Irrigation_DS_before_tuban_update_{datetime.now().strftime("%Y%m%d_%H%M%S")}.xlsx'

print(f"Creating backup: {backup_filename}")
df.to_excel(backup_filename, index=False)

print("‚úì Backup created successfully!")
print(f"\nYou can restore from this backup if needed.")

## Step 5: Update Governorate_Lahj to 'Tuban'

Now let's change all values in the `Governorate_Lahj` column to 'Tuban'.

In [None]:
# Store the count of rows being updated
rows_to_update = len(df)
old_values = df['Governorate_Lahj'].unique()

# Update the column
df['Governorate_Lahj'] = 'Tuban'

print("="* 80)
print("‚úì UPDATE COMPLETED SUCCESSFULLY!")
print("=" * 80)
print(f"\nRows updated: {rows_to_update}")
print(f"Old values: {old_values}")
print(f"New value: 'Tuban'")

## Step 6: Verify the Change

Let's verify that all values were successfully updated.

In [None]:
print("VERIFICATION AFTER UPDATE:")
print("=" * 80)

print(f"\nUnique values in 'Governorate_Lahj':")
unique_after = df['Governorate_Lahj'].unique()
print(unique_after)

print(f"\nValue counts:")
print(df['Governorate_Lahj'].value_counts())

# Verify all rows have 'Tuban'
all_tuban = (df['Governorate_Lahj'] == 'Tuban').all()

if all_tuban:
    print("\n‚úì SUCCESS: All rows now have 'Tuban' in Governorate_Lahj column!")
else:
    print("\n‚ö† WARNING: Some rows may not have been updated correctly.")

## Step 7: Display Sample Data After Change

Let's view the same sample rows to confirm the change.

In [None]:
print("Sample rows after update (showing location-related columns):")
print("=" * 80)

# Display the same columns as before
display(df[location_cols].head(10))

print("\n" + "=" * 80)
print("Note: All 'Governorate_Lahj' values should now show 'Tuban'")
print("=" * 80)

## Step 8: Compare Before and After

Let's create a summary comparison.

In [None]:
print("BEFORE vs AFTER COMPARISON:")
print("=" * 80)

comparison_data = {
    'Metric': ['Old Values', 'New Values', 'Rows Updated', 'Update Status'],
    'Value': [str(list(old_values)), "['Tuban']", rows_to_update, '‚úì Complete']
}

comparison_df = pd.DataFrame(comparison_data)
display(comparison_df)

## Step 9: Save the Updated Dataset

Finally, save the changes back to the Excel file.

In [None]:
# Save to the same filename
output_filename = 'Irrigation_DS.xlsx'

print(f"Saving updated data to: {output_filename}")
df.to_excel(output_filename, index=False)

print("\n" + "=" * 80)
print("‚úì DATA UPDATE COMPLETE!")
print("=" * 80)
print(f"\n‚úì Column 'Governorate_Lahj' updated to: 'Tuban'")
print(f"‚úì Total rows updated: {rows_to_update}")
print(f"‚úì Changes saved to: {output_filename}")
print(f"‚úì Backup available at: {backup_filename}")
print("\nüéâ Your dataset is ready with updated Governorate_Lahj values!")

---

## üìä Summary

### What We Accomplished:
1. ‚úÖ Loaded the irrigation dataset
2. ‚úÖ Examined current values in `Governorate_Lahj` column
3. ‚úÖ Created timestamped backup for safety
4. ‚úÖ Updated all values to 'Tuban'
5. ‚úÖ Verified the changes were applied correctly
6. ‚úÖ Saved the updated dataset

### Key Changes:
- **Column Modified**: `Governorate_Lahj`
- **New Value**: `'Tuban'` (for all rows)
- **Rows Affected**: All 30 rows in the dataset
- **Backup Created**: Yes (timestamped)

### Files:
- **Updated File**: `Irrigation_DS.xlsx`
- **Backup File**: `Irrigation_DS_before_tuban_update_[timestamp].xlsx`

---

**Next Steps:**
- Begin your data analysis with the updated dataset
- Use the standardized governorate name for filtering and grouping
- Create visualizations and reports with consistent location data