<a href="https://cognitiveclass.ai"><img src = "https://www.parkia.es/wp-content/uploads/2018/01/bilbo-8.jpg" width = 100% align="center"> </a>

<h1 align=center><font size = 5>Business Analysis in Bilbao City</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Introduction to the problem</a>

2. <a href="#item2">Methodology</a>

3. <a href="#item3">Results</a>

4. <a href="#item4">Discussion</a>

5. <a href="#item5">Conclusion</a>    

</font>
</div>

Before we get the data and start exploring it, let's import all the dependencies we will need.

```python
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (the pandas.io.json path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium --yes
import folium # map rendering library

print('Libraries imported.')
```

Libraries imported.


<a id='item1'></a>

## 1. Introduction to the problem

Every analysis should start with a problem to be solved, so we will state an assumption and, starting from it and with the tools at hand, try to reach conclusions that help solve the problem or make the right decisions.

Let's imagine a group of investors who plan to open a business in Bilbao. They have not settled on a business idea yet, so they want to analyze the leisure and services offering in the most populated area of Bilbao, since the business will be a street-level activity open to the public. They also want the business to target the population group with the greatest representation in the city.

Based on this assumption and the data available, the ultimate goal is to make a recommendation to the investors that meets these requirements.

<a id='item2'></a>

## 2. Methodology

The first step is to collect relevant data that will help us analyze the problem. We need demographic data for the city, with the population broken down by age or at least by age range. Bilbao City Council publishes demographic data about the capital of Biscay on its public Bilbao Open Data website, which we will use as the starting point for the demographic study.

To present these data more visually, for example on a map, the population must be divided into districts or neighborhoods. A GeoJSON map with the city's divisions, together with their most representative geographical coordinates, is very valuable for processing such data and presenting it in a friendly way. Fortunately, a community called Bilbao Data Lab has published a GeoJSON file with the districts of Bilbao, originally created to map school zoning, which we will use to geolocate our own data. This file is available in their public GitHub repository.

We also need to analyze the most representative businesses in the most populated area, so we need a source for that data. One option is the Foursquare public API, which returns the most representative venues around a given location.
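Because the study needs the population grouped into age ranges, the single-year counts in the census can be collapsed into brackets. The following is a minimal sketch, assuming the single-year age columns are named `'0'` to `'111'` as in the population CSV loaded later in this notebook; the bracket edges in `AGE_BINS` and the helper `population_by_age_range` are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Illustrative age brackets (an assumption): [0, 15), [15, 30), [30, 45), [45, 65), [65, 112)
AGE_BINS = [0, 15, 30, 45, 65, 112]
AGE_LABELS = ['0-14', '15-29', '30-44', '45-64', '65+']


def population_by_age_range(df: pd.DataFrame) -> pd.Series:
    """Add up the single-year age columns ('0' ... '111') of all rows into age brackets."""
    # Age columns are the ones whose name is a non-negative integer
    age_cols = [c for c in df.columns if str(c).isdigit()]
    per_age = df[age_cols].sum()  # residents of each single year of age, over all rows
    ages = pd.Series([int(c) for c in per_age.index], index=per_age.index)
    # right=False makes each bracket closed on the left: 15 goes to '15-29', not '0-14'
    brackets = pd.cut(ages, bins=AGE_BINS, labels=AGE_LABELS, right=False)
    # observed=False keeps empty brackets in the result with a total of 0
    return per_age.groupby(brackets, observed=False).sum()
```

Calling `.idxmax()` on the returned series gives the most represented bracket, which is the population group the investors want to target.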
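A sketch of how such a request could be built and its answer summarised is shown here. It assumes the legacy Foursquare v2 `venues/explore` endpoint used in many tutorials, with its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters and a JSON layout of `response → groups → items → venue → categories`; newer Foursquare accounts may need the current Places API instead. The version date, radius and limit defaults, as well as the helper names, are hypothetical placeholders.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumed legacy Foursquare v2 endpoint; check the current Foursquare documentation before use
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lon, radius=500, limit=100, version='20180604'):
    """Build the request URL for venues within `radius` metres of (lat, lon)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,            # API version date (placeholder)
        'll': f'{lat},{lon}',    # "latitude,longitude"
        'radius': radius,
        'limit': limit,
    }
    return f'{EXPLORE_URL}?{urlencode(params)}'


def count_categories(explore_json):
    """Count the primary category of every venue in a parsed explore response."""
    items = explore_json['response']['groups'][0]['items']
    return Counter(item['venue']['categories'][0]['name']
                   for item in items
                   if item['venue']['categories'])  # skip venues without a category
```

With real credentials, `count_categories(requests.get(build_explore_url(...)).json()).most_common(10)` would list the ten most frequent venue categories around the chosen point.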

#### Retrieving the data

First of all, let's obtain Bilbao's coordinates:

##### Use the geopy library to get the latitude and longitude values of Bilbao City.

```python
address = 'Bilbao'

geolocator = Nominatim(user_agent="bio_explorer")
location = geolocator.geocode(address)
if location is None:  # geocode() returns None when the address cannot be resolved
    raise ValueError('Could not geocode {}'.format(address))
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bilbao are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Bilbao are 43.2630051, -2.9349915.


Bilbao has a total of 8 boroughs and 40 neighborhoods. In order to segment the neighborhoods and explore them, we essentially need a dataset that contains the 8 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset is freely available on the web through the Open Data section of Bilbao's official website: https://www.bilbao.eus/opendata/es/inicio

We will process these data later; the website offers a lot of information.

To show the data more graphically, we also need a map with the borough boundaries of Bilbao, so that densities can be displayed in a choropleth map.
These boundaries are available as a GeoJSON file in BilbaoDataLab's public GitHub repository.
The map with the divisions looks as follows; the colours are based on the borough ID:
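The population CSV loaded later in this notebook already carries a `Latitud`/`Longitud` pair for each neighborhood, so the neighborhood dataset described above could be derived from it directly. A minimal sketch, assuming the column names shown in the `head()` output further down (`DISTRITO`, `BARRIO`, `Latitud`, `Longitud`, `TOTAL`); `neighborhood_table` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def neighborhood_table(df: pd.DataFrame) -> pd.DataFrame:
    """One row per neighborhood with its borough, coordinates and total residents."""
    table = (df.groupby(['DISTRITO', 'BARRIO'])
               .agg(Latitude=('Latitud', 'first'),    # every row of a neighborhood has the same coordinates
                    Longitude=('Longitud', 'first'),
                    Population=('TOTAL', 'sum'))      # men + women
               .reset_index()
               .rename(columns={'DISTRITO': 'Borough', 'BARRIO': 'Neighborhood'}))
    # Most populated neighborhoods first
    return table.sort_values('Population', ascending=False, ignore_index=True)
```

The first row of the result points directly at the most populated neighborhood, the area whose businesses the investors want analysed.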

```python
# Prepare the GeoJSON data of Bilbao to display a choropleth map of its districts
with open('distritos-bilbao.json') as json_data:
    bilbao_data = json.load(json_data)

# All the relevant data is in the *features* key, a list with one entry per neighborhood
bilbao_features = bilbao_data['features']

rows = []
for data in bilbao_features:
    borough = data['properties']['distrito']            # borough (district) ID
    neighborhood_name = data['properties']['BAR_DS_O']  # neighborhood name

    # First vertex of the neighborhood polygon; GeoJSON stores points as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates'][0][0][0]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon,
                 'Count': borough})  # the borough ID is also the value used to colour the map

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the list of rows
neighborhoods = pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude', 'Count'])
neighborhoods['Count'] = pd.to_numeric(neighborhoods['Count'])  # the colour scale needs numeric values

# Create the map with the boroughs coloured
bio_geo = 'distritos-bilbao.json' # geojson file

# Create a plain map of Bilbao
bio_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# Generate the choropleth map (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=bio_geo,
    data=neighborhoods,
    columns=['Borough', 'Count'],
    key_on='feature.properties.distrito',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Boroughs in Bilbao City by borough ID',
).add_to(bio_map)

# Display the map
bio_map
```
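The loop above takes the first vertex of each polygon as the neighborhood's coordinates, which is a point on its boundary rather than in its middle. A more representative point is the area-weighted centroid of the outer ring, computed with the shoelace formula. A minimal sketch, assuming the features are GeoJSON `Polygon` or `MultiPolygon` geometries (the `[0][0][0]` indexing above suggests `MultiPolygon`); `ring_centroid` and `feature_lat_lon` are hypothetical helpers. Working on raw degrees is an approximation that is adequate for neighborhood-sized areas, not a geodesic centroid.

```python
def ring_centroid(ring):
    """Area-weighted centroid (x, y) of a closed ring [[lon, lat], ..., first point repeated]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for p, q in zip(ring, ring[1:]):
        x0, y0, x1, y1 = p[0], p[1], q[0], q[1]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError('degenerate ring with zero area')
    # Shoelace centroid: C = sum / (6 * A) with A = area2 / 2, i.e. sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


def feature_lat_lon(feature):
    """(latitude, longitude) of the centroid of a feature's first outer ring."""
    geometry = feature['geometry']
    if geometry['type'] == 'Polygon':
        ring = geometry['coordinates'][0]
    else:  # MultiPolygon: outer ring of the first polygon, as the loop above uses
        ring = geometry['coordinates'][0][0]
    lon, lat = ring_centroid(ring)  # GeoJSON order is [longitude, latitude]
    return lat, lon
```

Inside the loop, `neighborhood_lat, neighborhood_lon = feature_lat_lon(data)` would replace the three lines that read the first vertex. Because the signed area and the weighted sums change sign together, the result does not depend on whether the ring is stored clockwise or counter-clockwise.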

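#### Load and explore the data from Bilbao OpenData

Next, let's load the population data downloaded from Bilbao Open Data into a *pandas* dataframe. The file is a semicolon-separated CSV encoded in ISO-8859-1 (Latin-1), so both the separator and the encoding have to be passed to `pd.read_csv`.

```python
df_bio = pd.read_csv('Bilbao_population.csv', encoding="ISO-8859-1", sep=';')
```

Let's take a quick look at the data.

```python
df_bio.head()
```

Each row holds the residents of one sex (`SEXO`: `HOMBRES` for men, `MUJERES` for women) in one neighborhood (`BARRIO`) of a district (`DISTRITO`), together with the neighborhood's coordinates (`Latitud`, `Longitud`) and the reference date (`FEC_OFI_AYTO`). The columns `0` to `111` give the number of residents of each age, and `TOTAL` is their sum.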

Unnamed: 0,FEC_OFI_AYTO,COD. DISTR.,DISTRITO,COD. BARRIO,BARRIO,Latitud,Longitud,SEXO,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,TOTAL
0,01/01/2018,1.0,DEUSTU,101.0,SAN IGNACIO,43.281709,-2.962711,HOMBRES,47,46,48,58,55,59,55,45,57,70,55,56,61,70,57,62,70,65,57,61,48,46,54,43,57,57,56,69,43,52,58,57,84,58,70,66,59,70,82,85,90,93,93,78,88,96,90,111,90,108,109,102,127,134,106,100,99,87,93,91,83,63,66,72,70,64,59,59,42,60,47,30,49,41,36,35,30,40,25,46,42,38,62,51,48,39,31,21,21,15,19,17,8,6,8,4,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,5803
1,01/01/2018,1.0,DEUSTU,101.0,SAN IGNACIO,43.281709,-2.962711,MUJERES,47,47,35,54,65,42,52,46,34,53,52,37,57,60,59,63,56,64,53,45,65,58,52,52,50,62,51,61,54,61,39,68,56,73,77,72,78,90,90,80,92,85,98,90,107,102,100,106,119,109,128,133,112,121,109,117,101,101,97,97,80,85,71,78,68,74,69,64,72,65,61,67,54,68,63,69,65,73,45,69,74,99,81,67,84,69,59,58,47,48,41,47,37,26,24,15,11,8,6,2,2,1,2,0,1,0,2,0,0,0,0,0,6705
2,01/01/2018,1.0,DEUSTU,102.0,ELORRIETA,43.282791,-2.965837,HOMBRES,2,6,7,6,7,9,8,4,12,11,8,12,7,7,6,5,9,9,9,7,8,14,4,5,4,7,1,7,6,5,4,4,5,5,2,2,4,7,4,9,14,7,7,9,11,20,14,9,16,16,12,16,21,15,15,12,8,15,7,11,7,10,10,9,9,8,5,4,5,3,5,4,4,7,4,2,3,1,2,1,4,0,2,2,3,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,629
3,01/01/2018,1.0,DEUSTU,102.0,ELORRIETA,43.282791,-2.965837,MUJERES,7,6,1,4,5,2,11,7,9,11,2,14,7,8,7,12,6,7,3,8,5,5,5,3,5,7,4,4,1,4,5,4,5,8,7,8,6,8,8,11,11,15,11,16,6,22,15,14,17,20,18,14,16,8,12,9,15,12,11,13,20,8,3,14,6,7,4,6,5,5,2,7,4,2,2,1,3,5,0,2,3,5,3,3,4,5,4,5,3,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,668
4,01/01/2018,1.0,DEUSTU,103.0,IBARREKOLANDA,43.274479,-2.958697,HOMBRES,32,32,25,33,30,32,29,30,37,36,35,24,39,43,33,38,29,49,42,41,35,40,54,34,51,43,49,56,51,66,71,43,61,74,62,62,58,50,73,56,59,71,63,54,68,56,56,49,65,53,67,70,81,78,72,85,76,63,87,79,82,74,95,75,85,84,78,61,78,63,60,72,66,52,49,46,33,30,37,30,48,40,43,34,48,37,27,27,17,19,5,12,7,8,1,2,0,1,1,2,0,0,0,0,0,0,0,0,0,0,0,0,4759
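To colour the district map by population rather than by borough ID, the rows can be added up per district over both sexes and all neighborhoods. A minimal sketch, assuming the column names shown above; `population_by_district` is a hypothetical helper, and whether the `COD. DISTR.` codes match the `distrito` property of the GeoJSON file is an assumption that should be checked against the file.

```python
import pandas as pd


def population_by_district(df: pd.DataFrame) -> pd.DataFrame:
    """Total residents per district, adding up both sexes and every neighborhood."""
    totals = df.groupby(['COD. DISTR.', 'DISTRITO'])['TOTAL'].sum().reset_index()
    # District codes are read as floats (1.0); cast to int so they can be matched to integer IDs
    totals['COD. DISTR.'] = totals['COD. DISTR.'].astype(int)
    # Most populated district first
    return totals.sort_values('TOTAL', ascending=False, ignore_index=True)
```

If the codes do match, passing `population_by_district(df_bio)` as `data` with `columns=['COD. DISTR.', 'TOTAL']` to the choropleth call above would shade each district by its number of residents.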
