<a href="https://colab.research.google.com/github/Kingoric/Karamoja-Project/blob/main/Karamoja_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Overview

**Objective:**

The objective of this project is to assess crop yields of maize and sorghum across various districts and subcounties in Uganda, with a focus on identifying key factors affecting agricultural productivity and providing actionable insights for stakeholders, including farmers, government agencies, and agricultural businesses.

**Data Overview:**

The analysis uses the following datasets:

1. Uganda_Subcounties: Contains geographic data and administrative boundaries of subcounties in Uganda.
2. Uganda_Karamoja_Subcounty_Crop_Yield_Population: Includes crop yield data (maize and sorghum) and population information for subcounties in the Karamoja region.
3. Uganda_Karamoja_District_Crop_Yield_Population: Provides similar data but aggregated at the district level within Karamoja.
4. Uganda_Districts: Contains geographic data and administrative boundaries of districts in Uganda.
5. Crop_Type_Map_Sorghum and Crop_Type_Map_Maize: These datasets focus on mapping the crop types to specific geographic regions and yield data.

# Data Cleaning
Data cleaning is needed to handle missing values, correct inconsistent data, and remove duplicates.
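Before cleaning, it can help to measure how many duplicates and missing values a table actually contains. The block below is a minimal sketch of such a check. The helper name `quality_report` and the sample rows are made up for illustration; only the column names `SUBCOUNTY_NAME` and `POP` come from the project tables.

```python
import pandas as pd

def quality_report(df):
    """Count the problems that the cleaning step targets (hypothetical helper)."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
    }

# Made-up rows for illustration only, not real project data
sample = pd.DataFrame({
    "SUBCOUNTY_NAME": ["A", "A", "B", None],
    "POP": [100, 100, None, 300],
})

report = quality_report(sample)
print(report)
```

Running the same function again after cleaning gives a quick before/after comparison.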

In [None]:
#import os to handle file paths
import os

**Code for Loading and Cleaning Shapefiles**




**Geopandas installation**

Shapefiles can be loaded with the geopandas library, so I install geopandas first:



In [None]:
!pip install geopandas




**Geopandas import and file loading**



In [None]:
import geopandas as gpd
import os

# Path to shapefiles
shapefiles_folder_path = '/content/drive/MyDrive/DATA/DATA/SHAPEFILES'

# List of shapefile names
shapefiles = [
    'Uganda_Subcounties.shp',
    'Uganda_Districts.shp',
    'Crop_Type_Map_Sorghum.shp',
    'Crop_Type_Map_Maize.shp'
]

# Function to clean and save shapefile
def clean_and_save_shapefile(filename):
    shapefile_path = os.path.join(shapefiles_folder_path, filename)
    gdf = gpd.read_file(shapefile_path)
    gdf = gdf[gdf.is_valid]  # Remove invalid geometries
    gdf = gdf.to_crs(epsg=4326)  # Convert to WGS84 (EPSG:4326)
    cleaned_path = os.path.join(shapefiles_folder_path, f'cleaned_{filename}')
    gdf.to_file(cleaned_path)
    print(f"Cleaned shapefile saved to {cleaned_path}")

# Clean and save each shapefile
for shapefile in shapefiles:
    clean_and_save_shapefile(shapefile)


Cleaned shapefile saved to /content/drive/MyDrive/DATA/DATA/SHAPEFILES/cleaned_Uganda_Subcounties.shp
Cleaned shapefile saved to /content/drive/MyDrive/DATA/DATA/SHAPEFILES/cleaned_Uganda_Districts.shp
Cleaned shapefile saved to /content/drive/MyDrive/DATA/DATA/SHAPEFILES/cleaned_Crop_Type_Map_Sorghum.shp
Cleaned shapefile saved to /content/drive/MyDrive/DATA/DATA/SHAPEFILES/cleaned_Crop_Type_Map_Maize.shp
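The cleaning function above drops invalid geometries outright. An alternative, sketched here on made-up shapes rather than the project shapefiles, is to count the invalid geometries first and try to repair them with `shapely.validation.make_valid`, so that no features are lost. The polygons and the `name` column are assumptions for illustration.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.validation import make_valid

# Made-up shapes: a self-intersecting "bowtie" (invalid) and a plain square (valid)
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
gdf = gpd.GeoDataFrame({"name": ["bowtie", "square"]}, geometry=[bowtie, square])

# Count invalid geometries before deciding how to handle them
n_invalid = int((~gdf.is_valid).sum())
print(f"Invalid geometries: {n_invalid}")

# Repair invalid geometries instead of dropping the rows
repaired = gdf.copy()
repaired["geometry"] = repaired.geometry.apply(
    lambda geom: geom if geom.is_valid else make_valid(geom)
)
print(repaired.is_valid.all())
```

Whether to repair or drop depends on the data: a repaired shape can change geometry type (for example, a polygon may become a multipolygon), which is worth checking before mapping.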


**Loading tables**

In [None]:
import pandas as pd

# Path to tables folder
tables_folder_path = '/content/drive/MyDrive/DATA/DATA/TABLES'

# Load the CSV files
subcounty_yield_population_csv = 'Uganda_Karamoja_Subcounty_Crop_Yield_Population.csv'
district_yield_population_csv = 'Uganda_Karamoja_District_Crop_Yield_Population.csv'

# Full paths
subcounty_yield_population_path = os.path.join(tables_folder_path, subcounty_yield_population_csv)
district_yield_population_path = os.path.join(tables_folder_path, district_yield_population_csv)

# Load CSV files into DataFrames
df_subcounty_yield_population = pd.read_csv(subcounty_yield_population_path)
df_district_yield_population = pd.read_csv(district_yield_population_path)

# Display first few rows of each DataFrame
print("Uganda_Karamoja_Subcounty_Crop_Yield_Population")
print(df_subcounty_yield_population.head())

print("\nUganda_Karamoja_District_Crop_Yield_Population")
print(df_district_yield_population.head())

# Data Cleaning

# Remove any duplicate rows
df_subcounty_yield_population = df_subcounty_yield_population.drop_duplicates()
df_district_yield_population = df_district_yield_population.drop_duplicates()

# Handle missing values: fill gaps in numeric columns with 0
df_subcounty_yield_population = df_subcounty_yield_population.fillna(
    {col: 0 for col in df_subcounty_yield_population.select_dtypes(include='number').columns}
)
df_district_yield_population = df_district_yield_population.fillna(
    {col: 0 for col in df_district_yield_population.select_dtypes(include='number').columns}
)

# Remove any rows still missing values in non-numeric columns (e.g., names)
df_subcounty_yield_population = df_subcounty_yield_population.dropna()
df_district_yield_population = df_district_yield_population.dropna()


# Save cleaned data
cleaned_subcounty_yield_population_path = os.path.join(tables_folder_path, 'cleaned_Uganda_Karamoja_Subcounty_Crop_Yield_Population.csv')
cleaned_district_yield_population_path = os.path.join(tables_folder_path, 'cleaned_Uganda_Karamoja_District_Crop_Yield_Population.csv')

df_subcounty_yield_population.to_csv(cleaned_subcounty_yield_population_path, index=False)
df_district_yield_population.to_csv(cleaned_district_yield_population_path, index=False)

print(f"\nCleaned Uganda_Karamoja_Subcounty_Crop_Yield_Population CSV saved to {cleaned_subcounty_yield_population_path}")
print(f"Cleaned Uganda_Karamoja_District_Crop_Yield_Population CSV saved to {cleaned_district_yield_population_path}")


Uganda_Karamoja_Subcounty_Crop_Yield_Population
   OBJECTID       SUBCOUNTY_NAME DISTRICT_NAME    POP        Area Karamoja  \
0       263              KACHERI        KOTIDO  17244  1067176155        Y   
1       264               KOTIDO        KOTIDO  52771   597575188        Y   
2       265  KOTIDO TOWN COUNCIL        KOTIDO  27389    23972401        Y   
3       266         NAKAPERIMORU        KOTIDO  38775   419111591        Y   
4       267           PANYANGARA        KOTIDO  65704   880955930        Y   

   S_Yield_Ha   M_Yield_Ha  Crop_Area_Ha     S_Area_Ha   M_Area_Ha  \
0  354.207411  1137.467019   7023.533691   6434.342449  528.124229   
1  367.890523  1162.996687  13587.990760  12455.592640  824.767081   
2  369.314177  1167.005832   1656.531855   1520.322052    8.561644   
3  283.324569   852.366578   7087.823334   6761.488901   45.721712   
4  373.836926  1283.859882  10398.249390  10111.198130  172.611914   

     S_Prod_Tot     M_Prod_Tot  
0  2.279092e+06  600723.89290

# Correlation Analysis

This section calculates correlation coefficients between maize yield and other variables (population, crop area, and sorghum yield) to check whether these variables are statistically related.
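For reference, pandas' `DataFrame.corr()` computes the Pearson correlation coefficient by default. The sketch below writes that formula out in plain Python so the meaning of the numbers is explicit. The function name `pearson_r` and the input values are made up for illustration and are not project data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: the second series rises in step with the first
maize = [1.0, 2.0, 3.0, 4.0]
sorghum = [2.0, 4.0, 6.0, 8.0]
r_perfect = pearson_r(maize, sorghum)

# Reversing the second series gives a perfectly negative relationship
r_inverse = pearson_r(maize, [8.0, 6.0, 4.0, 2.0])
print(r_perfect, r_inverse)
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.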

In [None]:


# Load the subcounty-level data
df = pd.read_csv('/content/drive/MyDrive/DATA/DATA/TABLES/Uganda_Karamoja_Subcounty_Crop_Yield_Population.csv')

# Calculate correlation
correlations = df[['M_Yield_Ha', 'POP', 'Crop_Area_Ha', 'S_Yield_Ha']].corr()
print(correlations)


              M_Yield_Ha       POP  Crop_Area_Ha  S_Yield_Ha
M_Yield_Ha      1.000000  0.135996      0.263790    0.624494
POP             0.135996  1.000000      0.392587   -0.081385
Crop_Area_Ha    0.263790  0.392587      1.000000    0.171402
S_Yield_Ha      0.624494 -0.081385      0.171402    1.000000



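**Insights**
  1. Low correlation with population: maize yield is only weakly related to population (r ≈ 0.14).
  2. Strongest correlation with sorghum yield: subcounties with higher sorghum yields tend to have higher maize yields (r ≈ 0.62).
  3. Weak-to-moderate correlation with crop area: maize yield rises slightly with total crop area (r ≈ 0.26).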
  
**Recommendations**


1. Explore non-population factors affecting maize yields, such as soil quality or climate.

2. Improve agricultural practices in regions with high yields for both crops.

3. Optimize crop area management to boost maize yields.



In [None]:
# Load the district-level data
df = pd.read_csv('/content/drive/MyDrive/DATA/DATA/TABLES/Uganda_Karamoja_District_Crop_Yield_Population.csv')

# Calculate correlation
correlations = df[['M_Yield_Ha', 'POP', 'Crop_Area_Ha', 'S_Yield_Ha']].corr()
print(correlations)

              M_Yield_Ha       POP  Crop_Area_Ha  S_Yield_Ha
M_Yield_Ha      1.000000 -0.032883      0.300017    0.565185
POP            -0.032883  1.000000      0.424930    0.024405
Crop_Area_Ha    0.300017  0.424930      1.000000    0.206778
S_Yield_Ha      0.565185  0.024405      0.206778    1.000000
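A correlation coefficient alone does not say whether a relationship is statistically significant, especially when there are few rows, as is typical for district-level data. One way to check is `scipy.stats.pearsonr`, which returns both the coefficient and a p-value. The sketch below uses made-up yield values, not the project data, and assumes scipy is installed.

```python
from scipy.stats import pearsonr

# Made-up yields for a handful of districts (illustration only)
maize_yield = [980, 1040, 1100, 1150, 1210, 1300, 1400]
sorghum_yield = [250, 300, 280, 340, 330, 390, 400]

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(maize_yield, sorghum_yield)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the correlation is unlikely to be due to chance alone, though it still says nothing about causation.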


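**Insights:**

1. Low correlation with population: at district level, maize yield is essentially uncorrelated with population (r ≈ -0.03).

2. Strongest correlation with sorghum yield: districts with higher sorghum yields tend to have higher maize yields (r ≈ 0.57).

3. Moderate correlation with crop area: maize yield rises somewhat with total crop area (r ≈ 0.30).

**Recommendations:**

1. Investigate non-population factors affecting maize yields, such as soil quality or climate.

2. Enhance agricultural practices in regions with high yields for both crops.

3. Optimize crop area management to boost maize yields.
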
# Data Visualization using Tableau
For visualization with crosstabs, geographic maps, scatter plots, heatmaps, and similar charts, I used Tableau.

The published project workbook is available here:
[Karamoja Crop Yield and Food Security Dashboard](https://public.tableau.com/views/projectworkbook_17251817703950/KaramojaCropYieldandFoodSecurityDashboard?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link)