<img src="https://www-kiva-org.global.ssl.fastly.net/rgit2a3310b93e9c26beb06ce2915185c0953fc3ba8f/img/kiva_k_cutout_new.jpg" style ="float:right;" width ="50" height = "75">

# <font color ='blue'>KIVA CROWDFUNDING. </font>

# Models to estimate the poverty levels of residents in the regions where Kiva has active loans.

# Assignment 1 

 ### <font color='red'>by James Gikunju Kibugu </font>

### Kiva, an online crowdfunding platform, is inviting the community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans. 

### The aim is to explore the data using Python to help Kiva understand its borrowers and their poverty levels, so as to better assess and maximize the impact of its work. Participants should develop their own creative approaches to addressing the objective.

##### Problem Statement.
For the locations in which Kiva has active loans, your objective is to pair Kiva's data with additional data sources `to estimate the welfare level of borrowers in specific regions, based on shared economic and demographic characteristics.` 

A good solution would connect the features of each loan or product to one of several poverty mapping datasets, which indicate the average level of welfare in a region on as granular a level as possible. Many datasets indicate the poverty rate in a given area, with varying levels of granularity. Kiva would like to be able to disaggregate these regional averages by gender, sector, or borrowing behavior in order to estimate a Kiva borrower’s level of welfare using all of the relevant information about them. Strong submissions will attempt to map vaguely described locations to more accurate geocodes.


This file contains records from the Kiva Data Snapshot and can be matched to the loan theme regions to get a loan’s location.

Column descriptions:

1. id: Unique ID for loan (Loan ID)
2. Loan Theme ID: Unique ID for loan theme
3. Loan Theme Type: General description of the loan theme category
4. Partner ID: Unique ID for field partners (Partner ID)


### 1. INGESTING DATA SETS.

In [3]:
import pandas as pd
import numpy as np

##### 1.1 IMPORTING THE KIVA LOAN DATA AND DEALING WITH MISSING VALUES IN IT.

In [4]:
df1=pd.read_csv('KIVA/kiva_loans.csv')
df1.head()

Unnamed: 0,id,funded_amount,loan_amount,activity,sector,use,country_code,country,region,currency,partner_id,posted_time,disbursed_time,funded_time,term_in_months,lender_count,tags,borrower_genders,repayment_interval,date
0,653051,300.0,300.0,Fruits & Vegetables,Food,"To buy seasonal, fresh fruits to sell.",PK,Pakistan,Lahore,PKR,247.0,2014-01-01 06:12:39+00:00,2013-12-17 08:00:00+00:00,2014-01-02 10:06:32+00:00,12.0,12,,female,irregular,2014-01-01
1,653053,575.0,575.0,Rickshaw,Transportation,to repair and maintain the auto rickshaw used ...,PK,Pakistan,Lahore,PKR,247.0,2014-01-01 06:51:08+00:00,2013-12-17 08:00:00+00:00,2014-01-02 09:17:23+00:00,11.0,14,,"female, female",irregular,2014-01-01
2,653068,150.0,150.0,Transportation,Transportation,To repair their old cycle-van and buy another ...,IN,India,Maynaguri,INR,334.0,2014-01-01 09:58:07+00:00,2013-12-17 08:00:00+00:00,2014-01-01 16:01:36+00:00,43.0,6,"user_favorite, user_favorite",female,bullet,2014-01-01
3,653063,200.0,200.0,Embroidery,Arts,to purchase an embroidery machine and a variet...,PK,Pakistan,Lahore,PKR,247.0,2014-01-01 08:03:11+00:00,2013-12-24 08:00:00+00:00,2014-01-01 13:00:00+00:00,11.0,8,,female,irregular,2014-01-01
4,653084,400.0,400.0,Milk Sales,Food,to purchase one buffalo.,PK,Pakistan,Abdul Hakeem,PKR,245.0,2014-01-01 11:53:19+00:00,2013-12-17 08:00:00+00:00,2014-01-01 19:18:51+00:00,14.0,16,,female,monthly,2014-01-01


In [5]:
df1.columns

Index(['id', 'funded_amount', 'loan_amount', 'activity', 'sector', 'use',
       'country_code', 'country', 'region', 'currency', 'partner_id',
       'posted_time', 'disbursed_time', 'funded_time', 'term_in_months',
       'lender_count', 'tags', 'borrower_genders', 'repayment_interval',
       'date'],
      dtype='object')

In [6]:
df1['repayment_interval'].unique()

array(['irregular', 'bullet', 'monthly', 'weekly'], dtype=object)

In [7]:
df1['borrower_genders'].unique()

array(['female', 'female, female', 'female, female, female', ...,
       'female, female, male, female, female, female, female, female, female, female, male, male, female, female, male, female, female, female, female, female, female, female',
       'male, female, female, female, female, female, female, female, male, male, female, male, female, male, male, male',
       'female, female, female, male, female, female, female, male, female, female, female, male, female, male, female, female, female, female, female, female, female, female, female, female, female, female, female, female, male'],
      dtype=object)
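
The `borrower_genders` values are comma-separated lists, one entry per borrower, so they are hard to use directly for any breakdown by gender. A minimal sketch of turning them into counts follows; the sample series and the column names `n_female`, `n_male` and `n_borrowers` are assumptions made for illustration, not part of the Kiva data.

```python
import pandas as pd

# Hypothetical sample of the borrower_genders column, including a missing entry.
genders = pd.Series(["female", "female, female, male", None, "male"])

# Split on ", " into tokens; a missing entry becomes a single empty token.
tokens = genders.fillna("").str.split(", ")

# Count exact tokens so that "male" is not matched inside "female".
n_female = tokens.apply(lambda parts: parts.count("female"))
n_male = tokens.apply(lambda parts: parts.count("male"))

summary = pd.DataFrame({"n_female": n_female, "n_male": n_male})
summary["n_borrowers"] = summary["n_female"] + summary["n_male"]
print(summary)
```

Counting tokens rather than substrings matters here, because a plain substring search for `male` would also match every `female`.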

In [8]:
df1.isnull().sum()

id                         0
funded_amount              0
loan_amount                0
activity                   0
sector                     0
use                     4232
country_code               8
country                    0
region                 56800
currency                   0
partner_id             13507
posted_time                0
disbursed_time          2396
funded_time            48331
term_in_months             0
lender_count               0
tags                  171416
borrower_genders        4221
repayment_interval         0
date                       0
dtype: int64

A number of values are missing in the country_code column. We will identify the affected rows so that we can replace the missing values with the correct country code.


In [9]:
# Select three columns so that we can see which countries have a missing country code.
df=df1[['country_code','country','id']]

In [10]:
null_data = df[df.isnull().any(axis=1)]
null_data

Unnamed: 0,country_code,country,id
202537,,Namibia,851360
202823,,Namibia,851368
344929,,Namibia,991853
351177,,Namibia,998555
420953,,Namibia,1068167
421218,,Namibia,1068159
487207,,Namibia,1147852
487653,,Namibia,1147866


In [11]:
# Namibia's country code is 'NA', which pd.read_csv parses as a missing value by default, so restore it here.
df1['country_code']=df1['country_code'].fillna('NA')
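
The Namibian codes are missing because `pd.read_csv` treats the string `NA` as a missing-value marker by default. A minimal sketch of avoiding the problem at load time is shown below, using a small in-memory CSV as a stand-in for `kiva_loans.csv`; the sample rows are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for kiva_loans.csv.
sample_csv = (
    "id,country_code,country,region\n"
    "1,NA,Namibia,\n"
    "2,KE,Kenya,Nairobi\n"
    "3,PK,Pakistan,Lahore\n"
)

# Default parsing: the string "NA" is read as a missing value.
default_df = pd.read_csv(io.StringIO(sample_csv))

# Keep "NA" as text and treat only empty fields as missing.
kept_df = pd.read_csv(
    io.StringIO(sample_csv),
    keep_default_na=False,
    na_values=[""],
)

print(default_df["country_code"].isnull().sum())
print(kept_df["country_code"].tolist())
```

With `keep_default_na=False`, only the markers listed in `na_values` count as missing, so empty fields such as an absent region are still detected.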

In [12]:
df1.isnull().sum()

id                         0
funded_amount              0
loan_amount                0
activity                   0
sector                     0
use                     4232
country_code               0
country                    0
region                 56800
currency                   0
partner_id             13507
posted_time                0
disbursed_time          2396
funded_time            48331
term_in_months             0
lender_count               0
tags                  171416
borrower_genders        4221
repayment_interval         0
date                       0
dtype: int64

In [13]:
# Identify the missing region values and try to replace them.
df_reg=df1[['country','region']]

In [14]:
# Take an explicit copy so that the assignment below does not trigger SettingWithCopyWarning.
null_reg = df_reg[df_reg.isnull().any(axis=1)].copy()
# Mark each missing region with 1 so that summing per country counts the missing values.
null_reg['region'] = null_reg['region'].fillna(1)


In [15]:
null_reg.groupby('country')[['region']].sum()

Unnamed: 0_level_0,region
country,Unnamed: 1_level_1
Albania,315
Armenia,5
Azerbaijan,170
Belize,2
Benin,495
Bhutan,1
Bolivia,97
Brazil,41
Burkina Faso,6
Burundi,56
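
The same per-country count of missing regions can be obtained without filling in placeholder values. The sketch below uses a small made-up frame in place of `df1`; the name `missing_by_country` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the country and region columns of the loan data.
loans = pd.DataFrame({
    "country": ["Albania", "Albania", "Belize", "Kenya"],
    "region": [np.nan, np.nan, np.nan, "Nairobi"],
})

# isnull() gives True/False per row; summing within each country counts the True values.
missing_by_country = loans["region"].isnull().groupby(loans["country"]).sum()

# Keep only the countries that actually have missing regions.
missing_by_country = missing_by_country[missing_by_country > 0]
print(missing_by_country)
```

This leaves the original column untouched, so the placeholder value 1 never ends up in the data by accident.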


#### Try to replace the missing partner_id values using the Partner ID from the loan theme data.

In [16]:
# Select the loans whose partner_id is missing, so that they can be looked up in the loan theme data.
df_pa_id=df1[['id','partner_id']]

null_pa_id=df_pa_id[df_pa_id.isnull().any(axis=1)]
null_pa_id.head()

Unnamed: 0,id,partner_id
5,1080148,
67,1080150,
99,1080153,
114,1080151,
195,1080149,


In [25]:
# Loan IDs and Partner IDs from the loan theme data (df3, loaded in section 1.3).
df3pa = df3[['id','Partner ID']]
df3pa.head()

Unnamed: 0,id,Partner ID
0,638631,151.0
1,640322,151.0
2,641006,160.0
3,641019,160.0
4,641594,336.0


In [26]:
# Check whether the loan theme data holds a Partner ID for the loans that are missing partner_id.
dfpa_kA=pd.merge(df3pa,null_pa_id,on ='id')
dfpa_kA.isnull().sum()

id                0
Partner ID    13507
partner_id    13507
dtype: int64
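
The lookup above can also be written as a direct fill attempt, which reports how many values remain missing afterwards. This is a minimal sketch on made-up rows, not the actual Kiva data; `still_missing` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the loan data and the loan theme data.
loans = pd.DataFrame({"id": [1, 2, 3], "partner_id": [247.0, np.nan, np.nan]})
themes = pd.DataFrame({"id": [1, 2, 3], "Partner ID": [247.0, 151.0, np.nan]})

# A left merge keeps every loan, whether or not a theme row exists for it.
merged = loans.merge(themes, on="id", how="left")

# Fill partner_id from Partner ID wherever the loan data has a gap.
merged["partner_id"] = merged["partner_id"].fillna(merged["Partner ID"])

still_missing = int(merged["partner_id"].isnull().sum())
print(still_missing)
```

In the toy data one gap is filled and one remains; on the real data the output above suggests that nothing would be filled, since both columns are missing for the same loans.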

##### Conclusion: no Partner ID was recorded in either the loan theme data or the loan data for these 13,507 loans, so the missing values cannot be filled from these sources.

In [18]:
df1.describe()

Unnamed: 0,id,funded_amount,loan_amount,partner_id,term_in_months,lender_count
count,671205.0,671205.0,671205.0,657698.0,671205.0,671205.0
mean,993248.6,785.995061,842.397107,178.199616,13.739022,20.590922
std,196611.3,1130.398941,1198.660073,94.247581,8.598919,28.459551
min,653047.0,0.0,25.0,9.0,1.0,0.0
25%,823072.0,250.0,275.0,126.0,8.0,7.0
50%,992780.0,450.0,500.0,145.0,13.0,13.0
75%,1163653.0,900.0,1000.0,204.0,14.0,24.0
max,1340339.0,100000.0,100000.0,536.0,158.0,2986.0


##### 1.2 IMPORTING THE KIVA MPI REGION AND LOCATION DATA.

In [19]:
df2=pd.read_csv('KIVA/kiva_mpi_region_locations.csv')
df2.head()

Unnamed: 0,LocationName,ISO,country,region,world_region,MPI,geo,lat,lon
0,"Badakhshan, Afghanistan",AFG,Afghanistan,Badakhshan,South Asia,0.387,"(36.7347725, 70.81199529999999)",36.734772,70.811995
1,"Badghis, Afghanistan",AFG,Afghanistan,Badghis,South Asia,0.466,"(35.1671339, 63.7695384)",35.167134,63.769538
2,"Baghlan, Afghanistan",AFG,Afghanistan,Baghlan,South Asia,0.3,"(35.8042947, 69.2877535)",35.804295,69.287754
3,"Balkh, Afghanistan",AFG,Afghanistan,Balkh,South Asia,0.301,"(36.7550603, 66.8975372)",36.75506,66.897537
4,"Bamyan, Afghanistan",AFG,Afghanistan,Bamyan,South Asia,0.325,"(34.8100067, 67.8212104)",34.810007,67.82121


In [20]:
df2.columns

Index(['LocationName', 'ISO', 'country', 'region', 'world_region', 'MPI',
       'geo', 'lat', 'lon'],
      dtype='object')

In [21]:
df2.isnull().sum()

LocationName    1788
ISO             1764
country         1764
region          1788
world_region    1764
MPI             1788
geo                0
lat             1880
lon             1880
dtype: int64
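
Rows without an MPI value or coordinates cannot contribute to a poverty estimate, so one possible way to handle them is to drop them before merging. The sketch below assumes that approach and uses made-up rows; the frame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MPI region data.
mpi = pd.DataFrame({
    "LocationName": ["Region A, Country X", None, "Region B, Country X"],
    "MPI": [0.387, np.nan, 0.301],
    "lat": [36.7, np.nan, np.nan],
    "lon": [70.8, np.nan, np.nan],
})

# Keep rows that have an MPI value.
with_mpi = mpi.dropna(subset=["MPI"])

# Stricter variant: also require coordinates.
with_coords = mpi.dropna(subset=["MPI", "lat", "lon"])

print(len(with_mpi), len(with_coords))
```

Which variant is appropriate depends on whether the later steps join on region names or on coordinates.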

##### 1.3 IMPORTING THE LOAN THEMES AND LOAN IDS.

In [22]:
df3=pd.read_csv('KIVA/loan_theme_ids.csv')
df3.head()

Unnamed: 0,id,Loan Theme ID,Loan Theme Type,Partner ID
0,638631,a1050000000skGl,General,151.0
1,640322,a1050000000skGl,General,151.0
2,641006,a1050000002X1ij,Higher Education,160.0
3,641019,a1050000002X1ij,Higher Education,160.0
4,641594,a1050000002VbsW,Subsistence Agriculture,336.0


In [23]:
df3.isnull().sum()

id                     0
Loan Theme ID      14813
Loan Theme Type    14813
Partner ID         14813
dtype: int64

In [28]:
# List the distinct loan theme types.
df3['Loan Theme Type'].unique()

array(['General', 'Higher Education', 'Subsistence Agriculture',
       'Extreme Poverty', 'Underserved', 'Mobile Transactions', 'Green',
       'End Consumer Finance', 'Agriculture', 'Vulnerable Women',
       'Full Tuition', 'Business in a Box', 'Organic Conversion',
       'Startup', 'Youth', 'Rural Inclusion', 'WLIFT', 'Retailer Finance',
       'FUSAI', 'Water', 'Unbanked', 'Conflict Zone',
       "Hai Duong Women's Union", 'At-Risk Youth', 'Housing Improvement',
       'Haiti', 'Youth Entrepreneurship', 'Primary/Secondary Education',
       'Vulnerable Populations', 'Murabaha', 'Small Enterprise',
       'Agricultural Equipment', 'Artisan', 'Murabaha Youth',
       'Kiva City LA', 'Disaster Recovery', 'First/Second Chance',
       'Women Without Poverty', 'Agricultural Infrastructure',
       'CAMEO Partnership', 'Safe Water System for Institution',
       'Rural Conflict Zone', 'SME', 'Biodigester',
       'Clients below the poverty line', 'Distributor Finance - India',
       ...


##### 1.4 IMPORTING THE LOAN THEMES BY REGION DATA SET AND DEALING WITH ANY MISSING VALUES.


In [77]:
df4=pd.read_csv('KIVA/loan_themes_by_region.csv')
df4.head()

Unnamed: 0,Partner ID,Field Partner Name,sector,Loan Theme ID,Loan Theme Type,country,forkiva,region,geocode_old,ISO,...,amount,LocationName,geocode,names,geo,lat,lon,mpi_region,mpi_geo,rural_pct
0,9,KREDIT Microfinance Institution,General Financial Inclusion,a1050000000slfi,Higher Education,Cambodia,No,Banteay Meanchey,"(13.75, 103.0)",KHM,...,450,"Banteay Meanchey, Cambodia","[(13.6672596, 102.8975098)]",Banteay Meanchey Province; Cambodia,"(13.6672596, 102.8975098)",13.66726,102.89751,"Banteay Mean Chey, Cambodia","(13.6672596, 102.8975098)",90.0
1,9,KREDIT Microfinance Institution,General Financial Inclusion,a10500000068jPe,Vulnerable Populations,Cambodia,No,Battambang Province,,KHM,...,20275,"Battambang Province, Cambodia","[(13.0286971, 102.989615)]",Battambang Province; Cambodia,"(13.0286971, 102.989615)",13.028697,102.989615,"Banteay Mean Chey, Cambodia","(13.6672596, 102.8975098)",90.0
2,9,KREDIT Microfinance Institution,General Financial Inclusion,a1050000000slfi,Higher Education,Cambodia,No,Battambang Province,,KHM,...,9150,"Battambang Province, Cambodia","[(13.0286971, 102.989615)]",Battambang Province; Cambodia,"(13.0286971, 102.989615)",13.028697,102.989615,"Banteay Mean Chey, Cambodia","(13.6672596, 102.8975098)",90.0
3,9,KREDIT Microfinance Institution,General Financial Inclusion,a10500000068jPe,Vulnerable Populations,Cambodia,No,Kampong Cham Province,"(12.0, 105.5)",KHM,...,604950,"Kampong Cham Province, Cambodia","[(12.0982918, 105.3131185)]",Kampong Cham Province; Cambodia,"(12.0982918, 105.3131185)",12.098292,105.313119,"Kampong Cham, Cambodia","(11.9924294, 105.4645408)",90.0
4,9,KREDIT Microfinance Institution,General Financial Inclusion,a1050000002X1Uu,Sanitation,Cambodia,No,Kampong Cham Province,"(12.0, 105.5)",KHM,...,275,"Kampong Cham Province, Cambodia","[(12.0982918, 105.3131185)]",Kampong Cham Province; Cambodia,"(12.0982918, 105.3131185)",12.098292,105.313119,"Kampong Cham, Cambodia","(11.9924294, 105.4645408)",90.0


In [67]:
df4.columns

Index(['Partner ID', 'Field Partner Name', 'sector', 'Loan Theme ID',
       'Loan Theme Type', 'country', 'forkiva', 'region', 'geocode_old', 'ISO',
       'number', 'amount', 'LocationName', 'geocode', 'names', 'geo', 'lat',
       'lon', 'mpi_region', 'mpi_geo', 'rural_pct'],
      dtype='object')

In [68]:
df4.isnull().sum()

Partner ID                0
Field Partner Name        0
sector                    0
Loan Theme ID             0
Loan Theme Type           0
country                   0
forkiva                   0
region                    0
geocode_old           14536
ISO                      14
number                    0
amount                    0
LocationName              0
geocode                2074
names                  2075
geo                       0
lat                    2074
lon                    2074
mpi_region               14
mpi_geo                6065
rural_pct              1392
dtype: int64

In [69]:
df4[['ISO','country']].isnull().sum()

ISO        14
country     0
dtype: int64

In [70]:
df4[['ISO','country']]

Unnamed: 0,ISO,country
0,KHM,Cambodia
1,KHM,Cambodia
2,KHM,Cambodia
3,KHM,Cambodia
4,KHM,Cambodia
5,KHM,Cambodia
6,KHM,Cambodia
7,KHM,Cambodia
8,KHM,Cambodia
9,KHM,Cambodia


In [71]:
df_ISO=df4[['country','ISO']]

null_ISO=df_ISO[df_ISO.isnull().any(axis=1)]
null_ISO

Unnamed: 0,country,ISO
12101,Kosovo,
12102,Kosovo,
12103,Kosovo,
12104,Kosovo,
12105,Kosovo,
12106,Kosovo,
12107,Kosovo,
12108,Kosovo,
12109,Kosovo,
12110,Kosovo,


In [85]:
# Kosovo has no official ISO 3166-1 alpha-3 code, so RKS is used here; the code for Cote d'Ivoire is CIV. Fill in the ISO values for both countries.
df4.loc[df4['country'] == 'Kosovo', 'ISO'] = 'RKS'


In [86]:
df4.loc[df4["country"] == "Cote D'Ivoire", 'ISO'] = 'CIV'
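
The two assignments above can be combined into one mapping that only touches rows where the ISO value is missing. This is a sketch on a made-up frame; the dictionary name `iso_fixes` is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the country and ISO columns of the loan theme data.
themes = pd.DataFrame({
    "country": ["Kosovo", "Cote D'Ivoire", "Cambodia"],
    "ISO": [None, None, "KHM"],
})

# Codes to use for countries whose ISO value is missing.
iso_fixes = {"Kosovo": "RKS", "Cote D'Ivoire": "CIV"}

# map() looks up each country in the dictionary; fillna() uses the result only where ISO is missing.
themes["ISO"] = themes["ISO"].fillna(themes["country"].map(iso_fixes))
print(themes)
```

Because `fillna` only fills gaps, existing codes such as KHM are never overwritten, and further countries can be handled by adding entries to the dictionary.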

In [88]:
df4.isnull().sum()

Partner ID                0
Field Partner Name        0
sector                    0
Loan Theme ID             0
Loan Theme Type           0
country                   0
forkiva                   0
region                    0
geocode_old           14536
ISO                       0
number                    0
amount                    0
LocationName              0
geocode                2074
names                  2075
geo                       0
lat                    2074
lon                    2074
mpi_region               14
mpi_geo                6065
rural_pct              1392
dtype: int64

In [95]:
df_region=df4[['country','geo','mpi_region']]
null_region=df_region[df_region.isnull().any(axis=1)]
null_region.head()

Unnamed: 0,country,geo,mpi_region
12101,Kosovo,"(42.3701844, 21.1483281)",
12102,Kosovo,"(42.3701844, 21.1483281)",
12103,Kosovo,"(42.3701844, 21.1483281)",
12104,Kosovo,"(42.6374365, 21.0931113)",
12105,Kosovo,"(42.6014008, 21.1918761)",


### 2. MERGING THE DATA SETS.

In [97]:
# The first data sets to merge are the loans (df1) and the loan themes (df3), which share the unique identifier id.
df13=pd.merge(df1,df3, on = 'id')
df13.head()

Unnamed: 0,id,funded_amount,loan_amount,activity,sector,use,country_code,country,region,currency,...,funded_time,term_in_months,lender_count,tags,borrower_genders,repayment_interval,date,Loan Theme ID,Loan Theme Type,Partner ID
0,653053,575.0,575.0,Rickshaw,Transportation,to repair and maintain the auto rickshaw used ...,PK,Pakistan,Lahore,PKR,...,2014-01-02 09:17:23+00:00,11.0,14,,"female, female",irregular,2014-01-01,a1050000000sjEC,Underserved,247.0
1,653068,150.0,150.0,Transportation,Transportation,To repair their old cycle-van and buy another ...,IN,India,Maynaguri,INR,...,2014-01-01 16:01:36+00:00,43.0,6,"user_favorite, user_favorite",female,bullet,2014-01-01,a1050000002VkWz,Underserved,334.0
2,653063,200.0,200.0,Embroidery,Arts,to purchase an embroidery machine and a variet...,PK,Pakistan,Lahore,PKR,...,2014-01-01 13:00:00+00:00,11.0,8,,female,irregular,2014-01-01,a1050000000sjEC,Underserved,247.0
3,653084,400.0,400.0,Milk Sales,Food,to purchase one buffalo.,PK,Pakistan,Abdul Hakeem,PKR,...,2014-01-01 19:18:51+00:00,14.0,16,,female,monthly,2014-01-01,a1050000000wf22,General,245.0
4,1080148,250.0,250.0,Services,Services,purchase leather for my business using ksh 20000.,KE,Kenya,,KES,...,2014-01-29 14:14:57+00:00,4.0,6,,female,irregular,2014-01-01,,,
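
`pd.merge` performs an inner join by default, so any loan without a matching row in the loan theme data is silently dropped. A minimal sketch of a more defensive merge follows, using made-up frames; the `validate` and `indicator` options are standard `merge` parameters, while the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the loan data and the loan theme data.
loans = pd.DataFrame({"id": [1, 2, 3], "loan_amount": [300.0, 575.0, 150.0]})
themes = pd.DataFrame({"id": [1, 2], "Loan Theme Type": ["Underserved", "General"]})

# how="left" keeps every loan; validate raises an error if id is duplicated on either side;
# indicator adds a _merge column showing where each row came from.
merged = loans.merge(
    themes, on="id", how="left", validate="one_to_one", indicator=True
)

# Loans that found no theme row.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), unmatched["id"].tolist())
```

Comparing `len(merged)` with the length of the loan data is a quick check that the merge neither lost nor duplicated loans.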


In [127]:
df13.columns

Index(['id', 'funded_amount', 'loan_amount', 'activity', 'sector', 'use',
       'country_code', 'country', 'region', 'currency', 'partner_id',
       'posted_time', 'disbursed_time', 'funded_time', 'term_in_months',
       'lender_count', 'tags', 'borrower_genders', 'repayment_interval',
       'date', 'Loan Theme ID', 'Loan Theme Type', 'Partner ID'],
      dtype='object')

In [130]:
df13['sector'].unique()

array(['Transportation', 'Arts', 'Food', 'Services', 'Agriculture',
       'Manufacturing', 'Wholesale', 'Retail', 'Clothing', 'Construction',
       'Health', 'Education', 'Personal Use', 'Housing', 'Entertainment'],
      dtype=object)

In [100]:
df13.isnull().sum()

id                         0
funded_amount              0
loan_amount                0
activity                   0
sector                     0
use                     4231
country_code               0
country                    0
region                 56799
currency                   0
partner_id             13507
posted_time                0
disbursed_time          2396
funded_time            48330
term_in_months             0
lender_count               0
tags                  171411
borrower_genders        4220
repayment_interval         0
date                       0
Loan Theme ID          13507
Loan Theme Type        13507
Partner ID             13507
dtype: int64

In [106]:
df13[['id','partner_id','Partner ID']].head()

Unnamed: 0,id,partner_id,Partner ID
0,653053,247.0,247.0
1,653068,334.0,334.0
2,653063,247.0,247.0
3,653084,245.0,245.0
4,1080148,,


In [110]:
df24=pd.merge(df2,df4)
df24.head()

Unnamed: 0,LocationName,ISO,country,region,world_region,MPI,geo,lat,lon,Partner ID,...,Loan Theme Type,forkiva,geocode_old,number,amount,geocode,names,mpi_region,mpi_geo,rural_pct
0,"Rio de Janeiro, Brazil",BRA,Brazil,Rio de Janeiro,Latin America and Caribbean,0.011,"(-22.9068467, -43.1728965)",-22.906847,-43.172897,225,...,Artisan,No,,6,6900,"[(-22.9068467, -43.1728965)]",Rio de Janeiro; Rio de Janeiro; State of Rio d...,"Rio de Janeiro, Brazil","(-22.9068467, -43.1728965)",0.0
1,"Thimphu, Bhutan",BTN,Bhutan,Thimphu,South Asia,0.016,"(27.4727924, 89.6392863)",27.472792,89.639286,534,...,Artisan,No,,2,20000,"[(27.4727924, 89.6392863)]",Thimphu; Thimphu; Bhutan,"Thimphu, Bhutan","(27.4727924, 89.6392863)",
2,"Douala, Cameroon",CMR,Cameroon,Douala,Sub-Saharan Africa,0.024,"(4.0510564, 9.7678687)",4.051056,9.767869,217,...,Extreme Poverty,No,,127,54575,"[(4.0510564, 9.7678687)]",Douala; Wouri; Littoral; Cameroon,"Douala, Cameroon","(4.0510564, 9.7678687)",32.0
3,"Douala, Cameroon",CMR,Cameroon,Douala,Sub-Saharan Africa,0.024,"(4.0510564, 9.7678687)",4.051056,9.767869,217,...,Agriculture,No,,3,1500,"[(4.0510564, 9.7678687)]",Douala; Wouri; Littoral; Cameroon,"Douala, Cameroon","(4.0510564, 9.7678687)",32.0
4,"Chimaltenango, Guatemala",GTM,Guatemala,Chimaltenango,Latin America and Caribbean,0.082,"(14.6631591, -90.8246386)",14.663159,-90.824639,55,...,General,No,"(14.6666667, -90.9166667)",214,86300,"[(14.6631591, -90.8246386)]",Chimaltenango; Chimaltenango Department; Guate...,"Chimaltenango, Guatemala","(14.6631591, -90.8246386)",60.0
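
When `pd.merge` is called without `on`, it joins on every column name the two frames share. For `df2` and `df4` that includes the float columns `lat` and `lon`, where a small rounding difference is enough to prevent a match. The sketch below illustrates the effect on made-up rows; the rounding mismatch and the choice of keys are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the MPI data and the loan themes by region data.
mpi = pd.DataFrame({
    "LocationName": ["Region A, Country X"],
    "country": ["Country X"],
    "region": ["Region A"],
    "lat": [4.051056],
    "MPI": [0.024],
})
themes = pd.DataFrame({
    "LocationName": ["Region A, Country X"],
    "country": ["Country X"],
    "region": ["Region A"],
    "lat": [4.0510564],  # same place, different rounding
    "amount": [1500],
})

# Columns that an implicit merge would use as join keys.
shared = [col for col in mpi.columns if col in themes.columns]

# Implicit merge: lat is part of the key, so the rows do not match.
implicit = pd.merge(mpi, themes)

# Explicit merge on the text keys only; the two lat columns are kept side by side.
explicit = pd.merge(
    mpi, themes,
    on=["LocationName", "country", "region"],
    suffixes=("_mpi", "_theme"),
)
print(shared, len(implicit), len(explicit))
```

Listing the shared columns before merging makes the join keys visible, and naming them in `on` keeps the result stable if either file gains a column later.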


In [113]:
df1234=pd.merge(df13,df24)
df1234.head()

Unnamed: 0,id,funded_amount,loan_amount,activity,sector,use,country_code,country,region,currency,...,Field Partner Name,forkiva,geocode_old,number,amount,geocode,names,mpi_region,mpi_geo,rural_pct
0,678313,400.0,400.0,Agriculture,Agriculture,To buy supplies and fertilizers.,NI,Nicaragua,Matagalpa,NIO,...,PAC,No,,42,34475,"[(12.9290069, -85.9151211)]",Matagalpa; Matagalpa Department; Nicaragua,"Matagalpa, Nicaragua","(12.9290069, -85.9151211)",65.0
1,678300,1200.0,1200.0,Agriculture,Agriculture,to buy supplies for her farming business.,NI,Nicaragua,Matagalpa,NIO,...,PAC,No,,42,34475,"[(12.9290069, -85.9151211)]",Matagalpa; Matagalpa Department; Nicaragua,"Matagalpa, Nicaragua","(12.9290069, -85.9151211)",65.0
2,678307,1200.0,1200.0,Farm Supplies,Agriculture,to buy supplies and fertilisers for his coffee...,NI,Nicaragua,Matagalpa,NIO,...,PAC,No,,42,34475,"[(12.9290069, -85.9151211)]",Matagalpa; Matagalpa Department; Nicaragua,"Matagalpa, Nicaragua","(12.9290069, -85.9151211)",65.0
3,678304,3275.0,3600.0,Agriculture,Agriculture,to purchase inputs for his coffee crops.,NI,Nicaragua,Matagalpa,NIO,...,PAC,No,,42,34475,"[(12.9290069, -85.9151211)]",Matagalpa; Matagalpa Department; Nicaragua,"Matagalpa, Nicaragua","(12.9290069, -85.9151211)",65.0
4,678315,1675.0,2000.0,Agriculture,Agriculture,to buy supplies and fertilizers,NI,Nicaragua,Matagalpa,NIO,...,PAC,No,,42,34475,"[(12.9290069, -85.9151211)]",Matagalpa; Matagalpa Department; Nicaragua,"Matagalpa, Nicaragua","(12.9290069, -85.9151211)",65.0


In [114]:
df1234.isnull().sum()

id                       0
funded_amount            0
loan_amount              0
activity                 0
sector                   0
use                      0
country_code             0
country                  0
region                   0
currency                 0
partner_id               0
posted_time              0
disbursed_time           0
funded_time            808
term_in_months           0
lender_count             0
tags                  1802
borrower_genders         0
repayment_interval       0
date                     0
Loan Theme ID            0
Loan Theme Type          0
Partner ID               0
LocationName             0
ISO                      0
world_region             0
MPI                      0
geo                      0
lat                      0
lon                      0
Field Partner Name       0
forkiva                  0
geocode_old           6476
number                   0
amount                   0
geocode                  0
names                    0
...

In [121]:
# We use the geocode column and ignore geocode_old, which has many missing values.
# Number of columns in the merged data.
len(df1234.columns)

40

#### Column descriptions:
`id:` -  Unique ID for loan (Loan ID)

`Loan Theme ID:` -  Unique ID for loan theme

`Loan Theme Type:`  - General description of the loan theme category 

`Partner ID:`  - Unique ID for field partners (Partner ID) 

`MPI:` - Global Multidimensional Poverty Index

Let us segment the data so that we can find the MPI for the county and try to develop a model that is able to ...

In [None]:
dfkenya['country']
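
`dfkenya` is not defined anywhere in the cells shown. One plausible definition, assuming it is meant to hold the Kenyan rows of the merged data, is sketched below on a made-up frame; the region names and values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the merged data (df1234 in the notebook).
merged = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Nicaragua"],
    "region": ["Region A", "Region B", "Region C"],
    "MPI": [0.2, 0.3, 0.1],
    "loan_amount": [250.0, 400.0, 1200.0],
})

# Keep only the Kenyan loans; copy() makes the subset safe to modify later.
dfkenya = merged[merged["country"] == "Kenya"].copy()

# Average MPI per region within Kenya.
mpi_by_region = dfkenya.groupby("region")["MPI"].mean()
print(mpi_by_region)
```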

In [125]:
df1234['sector'].nunique()

2
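
Only 2 sectors remain in `df1234`, although `df13` had 15. One likely explanation is that both inputs have a column named `sector` (the loan's sector in the loan data, the field partner's sector in the loan themes by region data), so the merge without `on` used it as a join key and kept only rows where the two happened to agree. The sketch below reproduces that effect on made-up frames; the choice of `Partner ID` as the explicit key is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the loan data: sector describes the loan.
loans = pd.DataFrame({
    "id": [1, 2, 3],
    "sector": ["Food", "Agriculture", "Retail"],
    "Partner ID": [9, 10, 9],
})

# Hypothetical stand-in for the theme data: sector describes the field partner.
themes = pd.DataFrame({
    "Partner ID": [9, 10],
    "sector": ["General Financial Inclusion", "Agriculture"],
})

# Implicit merge joins on both shared columns: Partner ID and sector.
implicit = pd.merge(loans, themes)

# Explicit merge joins on Partner ID only and keeps both sector columns.
explicit = pd.merge(
    loans, themes, on="Partner ID", suffixes=("_loan", "_partner")
)
print(implicit["sector"].nunique(), explicit["sector_loan"].nunique())
```

In the toy data the implicit merge keeps a single loan, while the explicit merge keeps all three and preserves both meanings of `sector` under separate names.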