# Project 2: Karamoja Region Productivity Report

#### **Business Understanding**

Karamoja is a region in Uganda facing food insecurity.

Reasons for food insecurity:

a. Low production due to intense droughts.

b. Low production due to pest and disease outbreaks.

c. Limited visibility into the state of the region for NGOs, which rely on inadequate information to prioritize help.

**Business Need**: 
Visibility of the overall state of the Karamoja Region.

*Solution*: 
1. Develop a food security monitoring tool for decision making.

2. Measure yield for the main staple foods, i.e., sorghum and maize.

#### **Data Understanding** 

Main Crops of focus: Sorghum and Maize.

Research Objectives and Research Questions:
1. To find the 2 districts with the lowest sorghum and maize yields in the Karamoja region.

    a. *Which districts have the lowest sorghum and maize yields in Karamoja?*

2. To find the 6 subcounties with the lowest maize yields relative to their population size.

    b. *Which are the 6 subcounties with the lowest maize yields relative to their population size?*

3. To find the 2 districts with the lowest total productivity of maize and sorghum.

    c. *Which 2 districts have the lowest total productivity of maize and sorghum?*

4. To find the district with the highest crop area allocation for sorghum and maize in Karamoja.

    d. *Which district has the highest crop area allocation of sorghum and maize in Karamoja?*

Tableau Dashboard link: https://public.tableau.com/app/profile/pauline.armani/viz/Project2Dashboard_17390316658160/KaramojaFoodProductivity?publish=yes

PowerPoint Presentation link: https://docs.google.com/presentation/d/1GI46fcSOQqiWPYIRQCKA1QsSHX4jU0xEn9xAkwTCpP4/edit#slide=id.p1

In [None]:
#import pandas and numpy packages
import pandas as pd
import numpy as np

#Data Ingestion
district_df = pd.read_csv('Uganda_Karamoja_District_Crop_Yield_Population.csv')
subcounty_df = pd.read_csv('Uganda_Karamoja_Subcounty_Crop_Yield_Population.csv')

# Inner-join each district row to its subcounties (district "NAME" = subcounty "DISTRICT_NAME")
df = district_df.merge(subcounty_df, left_on="NAME", right_on="DISTRICT_NAME", how="inner")

df.head()
df.columns


Index(['OBJECTID_x', 'NAME', 'POP_x', 'Area_x', 'S_Yield_Ha_x', 'M_Yield_Ha_x',
       'Crop_Area_Ha_x', 'S_Area_Ha_x', 'M_Area_Ha_x', 'S_Prod_Tot_x',
       'M_Prod_Tot_x', 'OBJECTID_y', 'SUBCOUNTY_NAME', 'DISTRICT_NAME',
       'POP_y', 'Area_y', 'Karamoja', 'S_Yield_Ha_y', 'M_Yield_Ha_y',
       'Crop_Area_Ha_y', 'S_Area_Ha_y', 'M_Area_Ha_y', 'S_Prod_Tot_y',
       'M_Prod_Tot_y'],
      dtype='object')
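
The `_x` and `_y` suffixes in the merged columns are pandas defaults and do not say which table a column came from. Below is a minimal sketch, assuming the same key columns (`NAME` and `DISTRICT_NAME`), of how explicit suffixes and the `validate` argument could make the join easier to read and check. The toy frames and their values are invented for illustration and are not taken from the CSV files.

```python
import pandas as pd

# Toy stand-ins for the district and subcounty tables (values are made up).
district_toy = pd.DataFrame({"NAME": ["ABIM", "MOROTO"], "POP": [100, 200]})
subcounty_toy = pd.DataFrame({
    "SUBCOUNTY_NAME": ["A1", "A2", "M1"],
    "DISTRICT_NAME": ["ABIM", "ABIM", "MOROTO"],
    "POP": [40, 60, 200],
})

merged = district_toy.merge(
    subcounty_toy,
    left_on="NAME",
    right_on="DISTRICT_NAME",
    how="inner",
    suffixes=("_district", "_subcounty"),  # replaces the default _x / _y
    validate="one_to_many",                # raises if a district name is duplicated
)

# Overlapping column names now carry a descriptive suffix.
print(merged.columns.tolist())

# With an inner join, subcounties whose district name has no match are dropped,
# so comparing row counts is a quick check for mismatched names.
print(len(merged) == len(subcounty_toy))
```

If the row counts differ on the real data, a likely cause is a spelling or letter-case mismatch between the two district name columns.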

#### Research Objective 1

1. To find the 2 districts with the lowest average sorghum and maize yields in the Karamoja region.

*RQ1. Which districts have the lowest average sorghum and maize yields in Karamoja?*

> The analysis showed that Moroto district had the lowest maize yield per ha (355), followed by Napak (854). Moroto also had the lowest sorghum yield per ha (128), followed by Napak (137). See Figure 1 in the dashboard.




In [None]:
# Find the 2 districts with the lowest sorghum yield per hectare
lowest_sorghum_districts = district_df.nsmallest(2, 'S_Yield_Ha')[['NAME', 'S_Yield_Ha']]
print("2 districts with the lowest Sorghum yield:\n", lowest_sorghum_districts)

# Find the 2 districts with the lowest maize yield per hectare
lowest_maize_districts = district_df.nsmallest(2, 'M_Yield_Ha')[['NAME', 'M_Yield_Ha']]
print("\n2 districts with the lowest Maize yield:\n", lowest_maize_districts)




2 districts with the lowest Sorghum yield:
      NAME  S_Yield_Ha
4  MOROTO         128
6   NAPAK         137

2 districts with the lowest Maize yield:
      NAME  M_Yield_Ha
4  MOROTO         355
6   NAPAK         854


#### Research Objective 2

RO2: To find the 6 subcounties with the lowest maize yields relative to their population size.

*RQ2. Which are the 6 subcounties with the lowest maize yields relative to their population size?*

A treemap was used to identify the 6 subcounties with the lowest maize yield relative to their population size. It showed that Katikekile, Nadunget, Northern Division, Rupa, Southern Division, and Tapac had the lowest maize yield, each with an average maize yield of 355 per ha and a population of 127,811. This suggests that these 6 subcounties have the greatest maize shortage and should be considered first when NGOs allocate donations.
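
The treemap result could also be cross-checked in pandas. Below is a minimal sketch, assuming that "relative to population size" means maize yield per hectare divided by population, which is only one possible reading. The helper `lowest_maize_yield_per_capita`, the derived column `M_Yield_Per_Capita`, and the toy values are hypothetical. The column names `SUBCOUNTY_NAME`, `DISTRICT_NAME`, `POP`, and `M_Yield_Ha` follow the subcounty table.

```python
import pandas as pd

def lowest_maize_yield_per_capita(subcounties: pd.DataFrame, n: int = 6) -> pd.DataFrame:
    """Return the n subcounties with the lowest M_Yield_Ha / POP ratio."""
    # Drop rows with zero population to avoid dividing by zero.
    ranked = subcounties.loc[subcounties["POP"] > 0].copy()
    # Assumed metric: maize yield per hectare divided by population.
    ranked["M_Yield_Per_Capita"] = ranked["M_Yield_Ha"] / ranked["POP"]
    columns = ["SUBCOUNTY_NAME", "DISTRICT_NAME", "POP", "M_Yield_Ha", "M_Yield_Per_Capita"]
    return ranked.nsmallest(n, "M_Yield_Per_Capita")[columns]

# Toy data (made-up values), only to show the call.
toy = pd.DataFrame({
    "SUBCOUNTY_NAME": ["S1", "S2", "S3", "S4"],
    "DISTRICT_NAME": ["D1", "D1", "D2", "D2"],
    "POP": [1000, 500, 2000, 0],
    "M_Yield_Ha": [300.0, 400.0, 200.0, 100.0],
})
result = lowest_maize_yield_per_capita(toy, n=2)
print(result)
```

On the real data the call would be `lowest_maize_yield_per_capita(subcounty_df)`. A different definition of the ratio, such as total maize production per person, may give a different ranking.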


#### Research Objective 3

RO3. To find the 2 districts with the lowest total productivity of maize and sorghum.

*RQ3. Which 2 districts have the lowest total productivity of maize and sorghum?*

The 2 districts with the lowest total productivity of sorghum and maize are Moroto (M_Prod_Tot = 422,468; S_Prod_Tot = 606,944) and Abim (S_Prod_Tot = 1,471,506; M_Prod_Tot = 1,922,567). The subcounty-level sums printed below differ slightly from these figures (for example, about 607,597 versus 606,944 for Moroto sorghum), but the ranking is the same.

In [39]:
# Group by District and sum up relevant columns for total productivity
district_productivity = subcounty_df.groupby("DISTRICT_NAME").agg(
    S_Prod_Tot=("S_Prod_Tot", "sum"),
    M_Prod_Tot=("M_Prod_Tot", "sum")
).reset_index()

# Compute the total productivity per district
district_productivity["Total_Prod"] = district_productivity["S_Prod_Tot"] + district_productivity["M_Prod_Tot"]

# Find the top 2 districts with the lowest total productivity of sorghum and Maize
top_2_districts = district_productivity.nsmallest(2, "Total_Prod")[["DISTRICT_NAME", "S_Prod_Tot", "M_Prod_Tot", "Total_Prod"]]

# Print the results
print("Top 2 districts with the lowest total productivity of Maize and Sorghum:\n", top_2_districts)

Top 2 districts with the lowest total productivity of Maize and Sorghum:
   DISTRICT_NAME    S_Prod_Tot    M_Prod_Tot    Total_Prod
4        MOROTO  6.075967e+05  4.221161e+05  1.029713e+06
0          ABIM  1.472671e+06  1.922133e+06  3.394804e+06


#### Research Objective 4
RO4. To find the district with the highest crop area allocation for sorghum and maize in Karamoja.

RQ4. *Which district has the highest crop area allocation of sorghum and maize in Karamoja?*

Kotido district had the largest sorghum area allocation (50,247 ha) out of a total crop area of 53,033 ha in the district.
Kaabong, on the other hand, had the largest maize area allocation, with 7,394 ha of maize.
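
The notebook has no code cell for this objective. Below is a minimal sketch of how the district with the largest area per crop could be looked up, assuming the district table columns `NAME`, `S_Area_Ha`, and `M_Area_Ha`. The helper `district_with_largest_area` and the toy values are hypothetical.

```python
import pandas as pd

def district_with_largest_area(districts: pd.DataFrame, area_column: str):
    """Return (district name, area) for the row with the largest value in area_column."""
    # idxmax returns the index label of the maximum value.
    top_row = districts.loc[districts[area_column].idxmax()]
    return top_row["NAME"], top_row[area_column]

# Toy data (made-up values), only to show the call.
toy = pd.DataFrame({
    "NAME": ["D1", "D2", "D3"],
    "Crop_Area_Ha": [100.0, 80.0, 60.0],
    "S_Area_Ha": [90.0, 30.0, 20.0],
    "M_Area_Ha": [5.0, 40.0, 10.0],
})
sorghum_top = district_with_largest_area(toy, "S_Area_Ha")
maize_top = district_with_largest_area(toy, "M_Area_Ha")
print("Largest sorghum area:", sorghum_top)
print("Largest maize area:", maize_top)
```

On the real data the same helper would be called with `district_df` in place of `toy`.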




#### Recommendations

1. Moroto has the most severe food shortage, owing to low crop area allocation, low total productivity, and low average yields for both sorghum and maize. As a result, NGOs should treat Moroto as a priority for support.

2. Future regional planning should assess the major causes of low productivity in Moroto, such as pest outbreaks and drought intensity, to help increase yields in its subcounties.

3. Sorghum takes up a larger share of the available crop area than maize. However, the average production and total yield of sorghum are lower than those of maize. NGOs can therefore advise farmers to revise their crop allocation to raise average yields in the area.
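
The comparison in recommendation 3 could be made explicit by computing each crop's share of the district crop area. Below is a minimal sketch, assuming the district table columns `Crop_Area_Ha`, `S_Area_Ha`, and `M_Area_Ha`. The helper `add_area_shares`, the derived columns `S_Area_Share` and `M_Area_Share`, and the toy values are hypothetical.

```python
import pandas as pd

def add_area_shares(districts: pd.DataFrame) -> pd.DataFrame:
    """Add sorghum and maize shares of the total crop area as new columns."""
    out = districts.copy()
    out["S_Area_Share"] = out["S_Area_Ha"] / out["Crop_Area_Ha"]
    out["M_Area_Share"] = out["M_Area_Ha"] / out["Crop_Area_Ha"]
    return out

# Toy data (made-up values), only to show the call.
toy = pd.DataFrame({
    "NAME": ["D1", "D2"],
    "Crop_Area_Ha": [100.0, 50.0],
    "S_Area_Ha": [80.0, 10.0],
    "M_Area_Ha": [10.0, 30.0],
})
shares = add_area_shares(toy)
print(shares[["NAME", "S_Area_Share", "M_Area_Share"]])
```

Placing these shares next to `S_Yield_Ha` and `M_Yield_Ha` for each district would show where a large sorghum allocation coincides with a lower yield.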

