# Oyo State location data analysis

## Introduction
* With two-thirds of the world’s population now connected by mobile devices, location data has emerged as one of the most powerful and important data sources. Location data is used to solve hard problems—from providing firefighters and emergency medical technicians key information during times of human crisis, helping us avoid bad traffic, helping us make a better choice in situating important parastatals such as companies, restaurants and business enterprises and to making self-driving cars smarter.
* At the same time, data overall has hit a rough patch in our public discourse. California granting the state’s citizens unprecedented control of their data, while Facebook’s Cambridge Analytica troubles made headline news for several days in Spring, 2018, shining an intense spotlight on digital platforms’ data-sharing practices. It’s imperative that businesses be transparent and responsible for the information they collect. We cannot violate customers’ trust and lose out on the opportunities for location data to make lives better. Businesses need to address these issues and create standards for how marketers access, use and share data. 
* We are in an era of big data and with more devices, the amount of data generated will keep increasing, data will be more accessible than ever before and location data will become more accurate.

## Statement of the problem
* Decision to bring up an organization is a tough decision. After an entrepreneur decided what type of business he want to set up, there comes the problem of where and how it should be set up. Location of a business affect the profit of the business therefore getting a good location cannot be taken with levity.
* Location data have proved resourceful in many areas and have affect our life. In this data research, location data will be utilized in exploring Oyo State Local Government Areas. Oyo State is an inland state in south-western Nigeria, with its capital at Ibadan which is the third most populous city in Nigeria and the country’s largest city by geographical area. These qualities of hers make it popular among other state in the country and make it a good spot for situating commercials, industries, recreational centres and refreshment centres.

## Significance of the Study
* This study will be of great benefit to entrepreneurs, government and other parastatals who might want to start up something significant. Exploring the city will give insights into where a business could be situated given the large state it is. It could also help bridge the gaps between customers and producers, government and citizens as decision making will be made easier with the study.


Importing necessary libraries

In [69]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 40)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Using pandas html reader, we want to scrape the website to obtain the table as pandas dataframe
* The data used for this project was sourced from the Oyo state website, oyostate.gov.ng. The data features were local government, the headquarters where the local government is situated, date of creation of the local government,  total number of wards in the local government, area population from 2006 census and landmass of the local government. 

In [23]:
#!pip install lxml
url = 'https://oyostate.gov.ng/ministry-of-local-government-and-chieftaincy-matters/detailed-information-of-the-33-local-governments-in-brief/'
oyo_page = pd.read_html(url)

In [24]:
oyo_df = oyo_page[0]

In [25]:
oyo_df.head()

Unnamed: 0,0,1,2,3,4,5,6
0,S/N,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,TOTAL NOOF WARDS,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
1,1,AFIJIO,Jobele,1989,10,134173,800
2,2,AKINYELE,Moniya,1976,12,211359,575
3,3,ATIBA,Ofa-Meta,1996,10,169702,219.753
4,4,ATISBO,Tede,1996,10,110792,315.23


The columns are labelled using the default label for loading in pandas dataframe, the Oyo state page didn't have the columns as columns. By manually reseting the columns to the necessary columns and resetting the index to S/N

In [27]:
oyo_df.columns = oyo_df.iloc[0,:]
oyo_df.drop([0],inplace = True)
oyo_df.set_index('S/N', inplace = True)
oyo_df.head()

Unnamed: 0_level_0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,TOTAL NOOF WARDS,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
S/N,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,AFIJIO,Jobele,1989,10,134173,800.0
2,AKINYELE,Moniya,1976,12,211359,575.0
3,ATIBA,Ofa-Meta,1996,10,169702,219.753
4,ATISBO,Tede,1996,10,110792,315.23
5,EGBEDA,Egbeda,1989,11,319388,410.0


Let save the dataframe as a csv file for easy access later 

In [28]:
oyo_df.to_csv('Oyo_LGA.csv', index = False)

We will drop total no of wards as the feature won't be useful in our exploratory data analysis and also our geospatial analysis of Oyo State

In [2]:
# Let first load our dataframe
oyo_lga = pd.read_csv('Oyo_LGA.csv')
oyo_lga

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,TOTAL NOOF WARDS,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
0,AFIJIO,Jobele,1989,10,134173,800.0
1,AKINYELE,Moniya,1976,12,211359,575.0
2,ATIBA,Ofa-Meta,1996,10,169702,219.753
3,ATISBO,Tede,1996,10,110792,315.23
4,EGBEDA,Egbeda,1989,11,319388,410.0
5,IBADAN NORTH,Agodi-Gate,1991,12,856988,420.0
6,IBADAN NORTH EAST,Iwo-Road,1991,12,330399,125.0
7,IBADAN NORTH WEST,Onireke,1991,11,152834,238.0
8,IBADAN SOUTH EAST,Mapo,1991,12,266457,805.37
9,IBADAN SOUTH WEST,Ring-Road,1991,12,283098,244.55


In [3]:
oyo_lga.drop([  'TOTAL NOOF WARDS'], axis = 1,inplace = True)

In [4]:
oyo_lga.head()

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
0,AFIJIO,Jobele,1989,134173,800.0
1,AKINYELE,Moniya,1976,211359,575.0
2,ATIBA,Ofa-Meta,1996,169702,219.753
3,ATISBO,Tede,1996,110792,315.23
4,EGBEDA,Egbeda,1989,319388,410.0


**Let check the local government with the most population and with the least population**

In [5]:
oyo_lga[oyo_lga['AREA POPULATION(2006 CENSUS)'] == pd.Series.max(oyo_lga['AREA POPULATION(2006 CENSUS)'])]
#pd.Series.max(oyo_lga['AREA POPULATION(2006 CENSUS)'])

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
5,IBADAN NORTH,Agodi-Gate,1991,856988,420.0


From the above result, Ibadan North LGA with headquarter located at Agodi-Gate has the highest population. We can easily assume there will be a lot of activities going on in the LGA. We will see that in our coming analysis

In [6]:
oyo_lga[oyo_lga['AREA POPULATION(2006 CENSUS)'] == pd.Series.min(oyo_lga['AREA POPULATION(2006 CENSUS)'])]


Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2)
22,OGBOMOSO SOUTH,Ajaawa,1989,73939,4159.51


Ogbomosho South has the least population with the 2006 census yielding 73939 citizens.

**Let check the local government with the most landmass and with the landmass**

In [7]:
print(oyo_lga[oyo_lga['LANDMASS(KM2)'] == pd.Series.min(oyo_lga['LANDMASS(KM2)'])])

print(oyo_lga[oyo_lga['LANDMASS(KM2)'] == pd.Series.max(oyo_lga['LANDMASS(KM2)'])])


   LOCAL GOVERNMENT HEADQUARTERS DATE OFCREATION  \
21   OGBOMOSO NORTH    Arowomole            1991   

    AREA POPULATION(2006 CENSUS)  LANDMASS(KM2)  
21                        113853           15.0  
   LOCAL GOVERNMENT HEADQUARTERS DATE OFCREATION  \
29         OYO WEST    Ojongbodu       Dec. 1996   

    AREA POPULATION(2006 CENSUS)  LANDMASS(KM2)  
29                        154532        5193.77  


Ogbomosho North has the smallest landmass and Oyo West has the highest

**These two simple EDAs will be useful when we later explore locations around the local government**

To get the latitude and longitude of each local government, I will utilize the Nominatim module in geopy package to get the latitude and longitude of each local government. But  instead of using the local government name, we will rather use the headquarters which I assume will be less prone to bug.
* We will first create a function to do this and then use the apply function to apply the function to every rows in headquarters

In [21]:
def latitude_longitude(headquarter):
    try:
        address = str(headquarter) # ensure it is in string format
    
        geolocator = Nominatim(user_agent="NG_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        return [latitude, longitude]
    except AttributeError:
        return None
      
    


In [9]:
oyo_lga['Latitude_longitude'] = oyo_lga['LOCAL GOVERNMENT'].apply(latitude_longitude)
#geolocator = Nominatim(user_agent="NG_explorer")
#location = geolocator.geocode('Ogbomoso South')
#latitude = location.latitude
#longitude = location.longitude
#latitude

In [10]:
oyo_lga

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),Latitude_longitude
0,AFIJIO,Jobele,1989,134173,800.0,"(7.747811049999999, 3.8965833157288534)"
1,AKINYELE,Moniya,1976,211359,575.0,"(7.5978324, 3.9162521)"
2,ATIBA,Ofa-Meta,1996,169702,219.753,"(8.235692499999999, 3.8685804390458687)"
3,ATISBO,Tede,1996,110792,315.23,"(8.403148250000001, 3.251587245035839)"
4,EGBEDA,Egbeda,1989,319388,410.0,"(5.234722, 6.7525)"
5,IBADAN NORTH,Agodi-Gate,1991,856988,420.0,"(7.4028109, 3.8717884151139517)"
6,IBADAN NORTH EAST,Iwo-Road,1991,330399,125.0,"(7.3939066, 3.9273483432755025)"
7,IBADAN NORTH WEST,Onireke,1991,152834,238.0,"(7.4028109, 3.8717884151139517)"
8,IBADAN SOUTH EAST,Mapo,1991,266457,805.37,"(7.355048, 3.903528862708936)"
9,IBADAN SOUTH WEST,Ring-Road,1991,283098,244.55,"(7.3733705, 3.8590379450057544)"


From the look of the table, we noticed that Oorelope Local Government didn't get a latitude and longitude, with trial and error analysiis, it was known that the Nominator module will work if we use 'Orelope' instead of 'Oorelope'. Let use that

In [11]:
oyo_lga.loc[25, 'LOCAL GOVERNMENT'] = 'ORELOPE'

In [23]:
oyo_lga['Latitude_longitude'] = oyo_lga['LOCAL GOVERNMENT'].apply(latitude_longitude)

In [24]:
oyo_lga.loc[25, 'Latitude_longitude']
oyo_lga.columns

Index(['LOCAL GOVERNMENT', 'HEADQUARTERS', 'DATE OFCREATION',
       'AREA POPULATION(2006 CENSUS)', 'LANDMASS(KM2)', 'Latitude_longitude',
       'longitude'],
      dtype='object')

Let separate the latitude_longitude to different columns and remove it

In [32]:
oyo_lga['latitude'] = pd.Series([x[0] for x in oyo_lga.iloc[:,5]])
oyo_lga['longitude'] = pd.Series([x[1] for x in oyo_lga.iloc[:,5]])
oyo_lga.drop('Latitude_longitude',axis = 1, inplace = True)
oyo_lga.head()

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),longitude,latitude
0,AFIJIO,Jobele,1989,134173,800.0,3.896583,7.747811
1,AKINYELE,Moniya,1976,211359,575.0,3.916252,7.597832
2,ATIBA,Ofa-Meta,1996,169702,219.753,3.86858,8.235692
3,ATISBO,Tede,1996,110792,315.23,3.251587,8.403148
4,EGBEDA,Egbeda,1989,319388,410.0,6.7525,5.234722


Let get the geographical coordinate of Oyo state and plot it using Folium superimposing the LGA

In [35]:
address = 'Oyo, NG'

geolocator = Nominatim(user_agent="ng_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 8.2151249, 3.5642897.


In [40]:
map_oyo = folium.Map(location=[latitude, longitude], zoom_start=8)

# add markers to map
for lat, lng, label in zip(oyo_lga['latitude'], oyo_lga['longitude'], oyo_lga['HEADQUARTERS']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_oyo)  
    
map_oyo

We are going to start utilizing the Foursquare API

In [41]:
# Let use the API to get the venues that are near each  LGA. First let define our API credentials
CLIENT_ID = 'Q01WSW3AWAQBTALTI0YGKK3DLWIG0FTGM5NMYGYQ5T2BZX4T' # your Foursquare ID
CLIENT_SECRET = '1HN3XSSWFB351EBR2JOZGSOX33IIJFODTLL5ZRMMB3SGFM2J' # your Foursquare Secret
VERSION = '20201213' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: Q01WSW3AWAQBTALTI0YGKK3DLWIG0FTGM5NMYGYQ5T2BZX4T
CLIENT_SECRET:1HN3XSSWFB351EBR2JOZGSOX33IIJFODTLL5ZRMMB3SGFM2J


Let explore the first LGA in our dataset

In [43]:
LGA_latitude = oyo_lga.loc[0, 'latitude'] 
LGA_longitude = oyo_lga.loc[0, 'longitude'] 

LGA_name = oyo_lga.loc[0, 'LOCAL GOVERNMENT'] 

print('Latitude and longitude values of {} are {}, {}.'.format(LGA_name, 
                                                               LGA_latitude, 
                                                               LGA_longitude))

Latitude and longitude values of AFIJIO are 7.747811049999999, 3.8965833157288534.


Let us get the top 50 venues that are in Afijio LGA within a radius of 30km

In [51]:
LIMIT = 50
radius = 30000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    LGA_latitude, 
    LGA_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=Q01WSW3AWAQBTALTI0YGKK3DLWIG0FTGM5NMYGYQ5T2BZX4T&client_secret=1HN3XSSWFB351EBR2JOZGSOX33IIJFODTLL5ZRMMB3SGFM2J&v=20201213&ll=7.747811049999999,3.8965833157288534&radius=30000&limit=50'

In [52]:
# Send the get request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fd61a9f6713ee2216c5f1b8'},
 'response': {'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 8.01781132000027,
    'lng': 4.168562347705378},
   'sw': {'lat': 7.4778107799997295, 'lng': 3.6246042837523293}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '56995d0e38facdc3fbd80c7b',
       'name': 'Ola Royal Hotel',
       'location': {'lat': 7.822221,
        'lng': 3.919077,
        'labeledLatLngs': [{'label': 'display',
          'lat': 7.822221,
          'lng': 3.919077}],
        'distance': 8646,
        'cc': 'NG',
        'country': 'Nigeria',
        'formattedAddress': ['Nigeria']},
       'categ

We know all the informations are in the items key. Let define the function; get_category and clean the json file to be structured into a pandas data frame

In [53]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    

In [55]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Ola Royal Hotel,Hotel Pool,7.822221,3.919077
1,Oparinde market,Market,7.8291,3.905739
2,Ice cream joint.Ajayi crowther university.,Dessert Shop,7.826751,3.922422
3,"Durbar Stadium, Oyo.",Soccer Field,7.82983,3.924249
4,Yellow corner.Ajayi Crowther University.,Breakfast Spot,7.833991,3.922273
5,Munchies,Diner,7.851275,3.945329
6,Port Harcourt,Art Gallery,7.790288,4.154203


We get 6 locations within 30km radius of Afijio LGA, we could see a stadium, school restaurants are located. But there are actually few restaurant like in the area. This could be a potential spot for situating restaurants considering the population of the LGA, 134,173 and the few places. Let explore further more..


**Let explore location within 30km of all the LGA in Oyo State using the FourSquare API**

In [63]:
def getNearbyVenues(names, latitudes, longitudes, radius=30000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['LGA', 
                  'LGA Latitude', 
                  'LGA Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [64]:
oyo_venues = getNearbyVenues(names = oyo_lga['LOCAL GOVERNMENT'], latitudes = oyo_lga['latitude'], longitudes = oyo_lga['longitude'])

AFIJIO
AKINYELE
ATIBA
ATISBO
EGBEDA
IBADAN NORTH
IBADAN NORTH EAST
IBADAN NORTH WEST
IBADAN SOUTH EAST
IBADAN SOUTH WEST
IBARAPA CENTRAL
IBARAPA EAST
IBARAPA NORTH
IDO
IREPO
ISEYIN
ITESIWAJU
IWAJOWA
KAJOLA
LAGELU
OGO OLUWA
OGBOMOSO NORTH
OGBOMOSO SOUTH
OLORUNSOGO
OLUYOLE
ORELOPE
ONA ARA
ORIRE
OYO EAST
OYO WEST
SAKI EAST
SAKI WEST
SURULERE


In [65]:
print(oyo_venues.shape)
oyo_venues.head()

(270, 7)


Unnamed: 0,LGA,LGA Latitude,LGA Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,AFIJIO,7.747811,3.896583,Ola Royal Hotel,7.822221,3.919077,Hotel Pool
1,AFIJIO,7.747811,3.896583,Oparinde market,7.8291,3.905739,Market
2,AFIJIO,7.747811,3.896583,Ice cream joint.Ajayi crowther university.,7.826751,3.922422,Dessert Shop
3,AFIJIO,7.747811,3.896583,"Durbar Stadium, Oyo.",7.82983,3.924249,Soccer Field
4,AFIJIO,7.747811,3.896583,Yellow corner.Ajayi Crowther University.,7.833991,3.922273,Breakfast Spot


Let check the number of venue returned for each LGA

In [66]:
oyo_venues['LGA'].value_counts()

OLUYOLE              22
IBADAN SOUTH EAST    20
AKINYELE             20
IBADAN NORTH EAST    20
LAGELU               20
ONA ARA              20
IBADAN NORTH         19
IBADAN NORTH WEST    19
IBADAN SOUTH WEST    19
OGBOMOSO NORTH        9
OGBOMOSO SOUTH        9
SURULERE              8
OLORUNSOGO            8
AFIJIO                7
OYO EAST              5
OGO OLUWA             4
IDO                   4
IBARAPA EAST          3
IBARAPA CENTRAL       3
OYO WEST              3
IWAJOWA               3
SAKI WEST             3
EGBEDA                3
SAKI EAST             3
ISEYIN                2
ATIBA                 2
IBARAPA NORTH         2
IREPO                 2
ATISBO                2
ORELOPE               2
ORIRE                 2
KAJOLA                1
ITESIWAJU             1
Name: LGA, dtype: int64

Let find out how many unique values can be get from the returned venue

In [67]:
print('There are {} uniques categories.'.format(len(oyo_venues['Venue Category'].unique())))

There are 49 uniques categories.


# Analyse each LGA

In [70]:
# one hot encoding
oyo_onehot = pd.get_dummies(oyo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
oyo_onehot['LGA'] = oyo_venues['LGA'] 

# move neighborhood column to the first column
fixed_columns = [oyo_onehot.columns[-1]] + list(oyo_onehot.columns[:-1])
oyo_onehot = oyo_onehot[fixed_columns]

oyo_onehot.head()

Unnamed: 0,LGA,African Restaurant,Airport,Art Gallery,Bagel Shop,Bank,Bar,Bay,Big Box Store,Breakfast Spot,Buffet,Burger Joint,Bus Station,Campground,Caribbean Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Diner,Fast Food Restaurant,Furniture / Home Store,Go Kart Track,Harbor / Marina,Hotel,Hotel Pool,Italian Restaurant,Lake,Leather Goods Store,Market,Mobile Phone Shop,Motel,Movie Theater,Moving Target,Optical Shop,Park,Pharmacy,Pizza Place,Playground,Pub,Restaurant,Shopping Mall,Soccer Field,Soup Place,Street Art,Trail,Turkish Restaurant,Vietnamese Restaurant,Water Park
0,AFIJIO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,AFIJIO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,AFIJIO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,AFIJIO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,AFIJIO,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by LGA and by taking the mean of the frequency of occurrence of each category

In [71]:
oyo_grouped = oyo_onehot.groupby('LGA').mean().reset_index()
oyo_grouped

Unnamed: 0,LGA,African Restaurant,Airport,Art Gallery,Bagel Shop,Bank,Bar,Bay,Big Box Store,Breakfast Spot,Buffet,Burger Joint,Bus Station,Campground,Caribbean Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Diner,Fast Food Restaurant,Furniture / Home Store,Go Kart Track,Harbor / Marina,Hotel,Hotel Pool,Italian Restaurant,Lake,Leather Goods Store,Market,Mobile Phone Shop,Motel,Movie Theater,Moving Target,Optical Shop,Park,Pharmacy,Pizza Place,Playground,Pub,Restaurant,Shopping Mall,Soccer Field,Soup Place,Street Art,Trail,Turkish Restaurant,Vietnamese Restaurant,Water Park
0,AFIJIO,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0
1,AKINYELE,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.15,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.1,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.05
2,ATIBA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ATISBO,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,EGBEDA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,IBADAN NORTH,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.052632,0.0,0.0,0.157895,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.052632
6,IBADAN NORTH EAST,0.05,0.05,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.15,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.1,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.05
7,IBADAN NORTH WEST,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.052632,0.0,0.0,0.157895,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.052632
8,IBADAN SOUTH EAST,0.05,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.15,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.1,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.05
9,IBADAN SOUTH WEST,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.052632,0.0,0.0,0.157895,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.052632


In [72]:
oyo_grouped.shape

(33, 50)

Let get the top 3 venues for each LGA

In [74]:
num_top_venues = 3

for hood in oyo_grouped['LGA']:
    print("----"+hood+"----")
    temp = oyo_grouped[oyo_grouped['LGA'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AFIJIO----
         venue  freq
0  Art Gallery  0.14
1   Hotel Pool  0.14
2       Market  0.14


----AKINYELE----
                  venue  freq
0         Shopping Mall  0.15
1  Fast Food Restaurant  0.15
2                 Hotel  0.10


----ATIBA----
                venue  freq
0       Shopping Mall   0.5
1       Moving Target   0.5
2  African Restaurant   0.0


----ATISBO----
          venue  freq
0       Airport   0.5
1  Optical Shop   0.5
2      Pharmacy   0.0


----EGBEDA----
                  venue  freq
0  Fast Food Restaurant  0.67
1            Playground  0.33
2    African Restaurant  0.00


----IBADAN NORTH----
                  venue  freq
0         Shopping Mall  0.16
1  Fast Food Restaurant  0.16
2                 Hotel  0.11


----IBADAN NORTH EAST----
                  venue  freq
0         Shopping Mall  0.15
1  Fast Food Restaurant  0.15
2                 Hotel  0.10


----IBADAN NORTH WEST----
                  venue  freq
0         Shopping Mall  0.16
1  Fast Food 

From the analysis above, it can be concluded that most of the significant locations that are in Oyo state are located in the Local Government in the capital, Ibadan. Still there are still less in Ibadan and venue are significantly low in other local government. People seeking to situate their organization in a place with less competition could consider Oyo state in general.

**Let  us put the analysis above in a pandas data frame**

In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [99]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['LGA']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
LGA_venues_sorted = pd.DataFrame(columns=columns)
LGA_venues_sorted['LGA'] = oyo_grouped['LGA']

for ind in np.arange(oyo_grouped.shape[0]):
    LGA_venues_sorted.iloc[ind, 1:] = return_most_common_venues(oyo_grouped.iloc[ind, :], num_top_venues)

LGA_venues_sorted.head()

Unnamed: 0,LGA,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,AFIJIO,Art Gallery,Hotel Pool,Diner,Soccer Field,Dessert Shop
1,AKINYELE,Shopping Mall,Fast Food Restaurant,Hotel,Convenience Store,Restaurant
2,ATIBA,Shopping Mall,Moving Target,Water Park,Bus Station,Fast Food Restaurant
3,ATISBO,Airport,Optical Shop,Water Park,Campground,Furniture / Home Store
4,EGBEDA,Fast Food Restaurant,Playground,Water Park,Bus Station,Furniture / Home Store


### Clustering algorithm

In [100]:
# set number of clusters
kclusters = 3

oyo_grouped_clustering = oyo_grouped.drop('LGA', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=10).fit(oyo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 2, 0, 0, 0, 0, 0, 0])

Let create a dataframe that include the cluster label  and the LGA

In [101]:
# add clustering labels
LGA_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

oyo_merged = oyo_lga

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
oyo_merged = oyo_merged.join(LGA_venues_sorted.set_index('LGA'), on='LOCAL GOVERNMENT')

oyo_merged.head() # check the last columns!

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,AFIJIO,Jobele,1989,134173,800.0,3.896583,7.747811,1,Art Gallery,Hotel Pool,Diner,Soccer Field,Dessert Shop
1,AKINYELE,Moniya,1976,211359,575.0,3.916252,7.597832,0,Shopping Mall,Fast Food Restaurant,Hotel,Convenience Store,Restaurant
2,ATIBA,Ofa-Meta,1996,169702,219.753,3.86858,8.235692,1,Shopping Mall,Moving Target,Water Park,Bus Station,Fast Food Restaurant
3,ATISBO,Tede,1996,110792,315.23,3.251587,8.403148,2,Airport,Optical Shop,Water Park,Campground,Furniture / Home Store
4,EGBEDA,Egbeda,1989,319388,410.0,6.7525,5.234722,0,Fast Food Restaurant,Playground,Water Park,Bus Station,Furniture / Home Store


Let us make a meaning out of our clusters

In [103]:
cluster_one = oyo_merged[oyo_merged['Cluster Labels'] == 0]
cluster_one

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,AKINYELE,Moniya,1976,211359,575.0,3.916252,7.597832,0,Shopping Mall,Fast Food Restaurant,Hotel,Convenience Store,Restaurant
4,EGBEDA,Egbeda,1989,319388,410.0,6.7525,5.234722,0,Fast Food Restaurant,Playground,Water Park,Bus Station,Furniture / Home Store
5,IBADAN NORTH,Agodi-Gate,1991,856988,420.0,3.871788,7.402811,0,Fast Food Restaurant,Shopping Mall,Hotel,Convenience Store,Restaurant
6,IBADAN NORTH EAST,Iwo-Road,1991,330399,125.0,3.927348,7.393907,0,Shopping Mall,Fast Food Restaurant,Hotel,Convenience Store,Restaurant
7,IBADAN NORTH WEST,Onireke,1991,152834,238.0,3.871788,7.402811,0,Fast Food Restaurant,Shopping Mall,Hotel,Convenience Store,Restaurant
8,IBADAN SOUTH EAST,Mapo,1991,266457,805.37,3.903529,7.355048,0,Shopping Mall,Fast Food Restaurant,Hotel,Convenience Store,Restaurant
9,IBADAN SOUTH WEST,Ring-Road,1991,283098,244.55,3.859038,7.373371,0,Fast Food Restaurant,Shopping Mall,Hotel,Convenience Store,Restaurant
10,IBARAPA CENTRAL,Igbo-Ora,1996,116809,480.424,3.247705,7.411516,0,Bank,Restaurant,Buffet,Water Park,Campground
11,IBARAPA EAST,Eruwa,1989,118288,705.78,3.459362,7.584067,0,Street Art,Bank,Concert Hall,Water Park,Campground
12,IBARAPA NORTH,Ayete,1999,101092,427.857,3.173101,7.644575,0,Bar,Concert Hall,Water Park,Campground,Furniture / Home Store


In [106]:
cluster_two = oyo_merged[oyo_merged['Cluster Labels'] == 1]
cluster_two

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,AFIJIO,Jobele,1989,134173,800.0,3.896583,7.747811,1,Art Gallery,Hotel Pool,Diner,Soccer Field,Dessert Shop
2,ATIBA,Ofa-Meta,1996,169702,219.753,3.86858,8.235692,1,Shopping Mall,Moving Target,Water Park,Bus Station,Fast Food Restaurant
15,ISEYIN,Iseyin,1976,260000,988.54,3.595614,7.971011,1,Go Kart Track,Moving Target,Campground,Furniture / Home Store,Fast Food Restaurant
28,OYO EAST,Kosobo,Dec. 1996,118465,365.5,4.029546,7.8708,1,Art Gallery,Dessert Shop,Soccer Field,Moving Target,Water Park
29,OYO WEST,Ojongbodu,Dec. 1996,154532,5193.77,3.816631,7.957193,1,Dessert Shop,Soccer Field,Moving Target,Water Park,Bus Station


In [105]:
cluster_three = oyo_merged[oyo_merged['Cluster Labels'] == 2]
cluster_three

Unnamed: 0,LOCAL GOVERNMENT,HEADQUARTERS,DATE OFCREATION,AREA POPULATION(2006 CENSUS),LANDMASS(KM2),longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,ATISBO,Tede,1996,110792,315.23,3.251587,8.403148,2,Airport,Optical Shop,Water Park,Campground,Furniture / Home Store
18,KAJOLA,Okeho,1976,139412,4329.0,3.333333,8.083333,2,Airport,Water Park,Campground,Furniture / Home Store,Fast Food Restaurant


Let visualize the clusters


In [109]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=8)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(oyo_merged['latitude'], oyo_merged['longitude'], oyo_merged['LOCAL GOVERNMENT'], oyo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

* Cluster one can be named a set of LGA where reception, restaurants and relaxation centres are highly concentrated, it consist mostly of LGA in Ibadan.
* Cluster two can be named a set of LGA with their venue concentrated to be sport centre, recreation centre and fast food joint. Al LGA except Iseyin are known to come from the town named Oyo in the state.
* Cluster three tends to be a commercial area because of the presence of airport


## CONCLUSION
* From my analysis of the local government in Oyo state, I was able to conclude that Oyo state has low level of socialization and industrialization. It makes it a convenient and resourceful place to situate business, industries, recreation centres, restaurants and shops and many more. 
This study was performed to gain insights about how location data of Oyo state can be used to achieve great feat in decision making; from entrepreneurs to government. This study was not an official study and there are still room for many analyses to explore.
Future study might include; 
1. Considering the population of each local government area and how sparse the venues within the area is, how viable is it to set up a business or company in the area?
2. Considering the social and financial status of family in each local government area and how the venues in the area are distributed, what is the probability of sale increase if a restaurant or coffee shop is established in the area?

